Language Modeling

Language Models

NGrams.LanguageModel - Type
LanguageModel(N; bos, eos, estimator=NGrams.MLE())

Create an N-gram language model, estimating probabilities with estimator.

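A minimal usage sketch; the bos and eos values below are illustrative placeholders, not documented defaults:

```julia
using NGrams

# Build a trigram model; "<s>" and "</s>" are assumed sentence-boundary markers.
lm = LanguageModel(3; bos="<s>", eos="</s>", estimator=NGrams.MLE())
```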

Training

NGrams.fit! - Function
NGrams.fit!(lm::LanguageModel, tokens)

Train the language model by observing a sequence of tokens.

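Continuing the sketch above with an illustrative token sequence:

```julia
tokens = ["the", "cat", "sat", "on", "the", "mat"]
NGrams.fit!(lm, tokens)
```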

Probability and Smoothing

NGrams.MLE - Type
NGrams.MLE()

Maximum Likelihood Estimation for n-gram language modeling.

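For reference, MLE assigns the standard count ratio (textbook notation, not quoted from this package):

```latex
P_{\mathrm{MLE}}(w_i \mid w_{i-N+1}^{i-1})
  = \frac{C(w_{i-N+1}^{i})}{C(w_{i-N+1}^{i-1})}
```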
NGrams.AddK - Type
NGrams.AddK(k::Number)

Add-k probability smoothing for n-gram language modeling.

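In the textbook formulation, with V the vocabulary size (the standard definition, assumed rather than quoted from this package):

```latex
P_{\mathrm{AddK}}(w_i \mid h) = \frac{C(h, w_i) + k}{C(h) + kV}
```

Larger k pushes the distribution toward uniform; values of k between 0 and 1 are common in practice.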
NGrams.Laplace - Type
NGrams.Laplace()

Laplace (add-1) smoothing for n-gram language modeling.

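Laplace smoothing is the k = 1 special case of add-k smoothing, so the two estimators below should be interchangeable (an assumed equivalence based on the standard definitions, not on this package's docs):

```julia
NGrams.Laplace()  # add-1 smoothing
NGrams.AddK(1)    # the same estimate, written as add-k
```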
NGrams.LinearInterpolation - Type
NGrams.LinearInterpolation(λ)

Linear interpolation for probability smoothing in n-gram language modeling.

λ should be a vector or tuple of linear coefficients for smoothing the model. The coefficients are ordered according to the n-gram order of the model; i.e., the first element is the weight for the highest-order model (no backoff), and the final element is the weight for the unigram model.

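A sketch for a trigram model illustrating the ordering described above; the specific weights are arbitrary (they should sum to 1), and the bos/eos values are placeholders:

```julia
# Weights: 0.6 on the trigram estimate, 0.3 on the bigram backoff,
# 0.1 on the unigram model.
est = NGrams.LinearInterpolation((0.6, 0.3, 0.1))
lm_interp = LanguageModel(3; bos="<s>", eos="</s>", estimator=est)
```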
NGrams.AbsoluteDiscounting - Type
NGrams.AbsoluteDiscounting(d::Number)

Absolute discounting for n-gram language modeling.

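In the textbook form, a fixed discount d is subtracted from every observed count and the freed probability mass is redistributed to a lower-order backoff distribution (the standard formulation; the package's exact variant may differ):

```latex
P_{\mathrm{abs}}(w_i \mid h)
  = \frac{\max(C(h, w_i) - d,\; 0)}{C(h)} + \lambda(h)\, P(w_i \mid h')
```

Here h' is the history with its oldest token dropped, and λ(h) is chosen so the probabilities sum to 1.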

Sampling from a Language Model

NGrams.sample - Function
NGrams.sample([rng::AbstractRNG,] lm, [vocabulary])

Sample a single token from the language model.

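A usage sketch; passing an rng is optional per the signature above, and the seed value is illustrative:

```julia
using Random

rng = MersenneTwister(42)       # illustrative seed for reproducibility
token = NGrams.sample(rng, lm)  # draw one token from the trained model
```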
NGrams.generate - Function
NGrams.generate(lm, num_words=1, text_seed=[])

Randomly generate num_words tokens from the language model.

If text_seed is provided, the output is conditioned on that history. The seed is included in the return value and counts against num_words.

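For example, with illustrative seed tokens (since the seed counts against num_words, the result holds ten tokens in total, the first two being the seed):

```julia
text = NGrams.generate(lm, 10, ["the", "cat"])
```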