Language Modeling

Language Models

NGrams.LanguageModel - Type
LanguageModel(N; bos, eos, estimator=NGrams.MLE())

Create an N-gram language model, estimating probabilities with estimator.

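A minimal usage sketch; the bos and eos values below are illustrative placeholders, not documented defaults:

```julia
using NGrams

# Build a trigram model; "<s>" and "</s>" are assumed sentence-boundary markers.
lm = LanguageModel(3; bos="<s>", eos="</s>", estimator=NGrams.MLE())
```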

Training

NGrams.fit! - Function
NGrams.fit!(lm::LanguageModel, tokens)

Train the language model by observing a sequence of tokens.

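Continuing the sketch above with an illustrative token sequence:

```julia
tokens = ["the", "cat", "sat", "on", "the", "mat"]
NGrams.fit!(lm, tokens)
```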

Probability and Smoothing

NGrams.MLE - Type
NGrams.MLE()

Maximum Likelihood Estimation for n-gram language modeling.

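For reference, MLE assigns the standard count ratio (textbook notation, not quoted from this package):

```latex
P_{\mathrm{MLE}}(w_i \mid w_{i-N+1}^{i-1})
  = \frac{C(w_{i-N+1}^{i})}{C(w_{i-N+1}^{i-1})}
```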
NGrams.AddK - Type
NGrams.AddK(k::Number)

Add-k probability smoothing for n-gram language modeling.

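In the textbook formulation, with V the vocabulary size (the standard definition, assumed rather than quoted from this package):

```latex
P_{\mathrm{AddK}}(w_i \mid h) = \frac{C(h, w_i) + k}{C(h) + kV}
```

Larger k pushes the distribution toward uniform; values of k between 0 and 1 are common in practice.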
NGrams.Laplace - Type
NGrams.Laplace()

Laplace (add-1) smoothing for n-gram language modeling.

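Laplace smoothing is the k = 1 special case of add-k smoothing, so the two estimators below should be interchangeable (an assumed equivalence based on the standard definitions, not on this package's docs):

```julia
NGrams.Laplace()  # add-1 smoothing
NGrams.AddK(1)    # the same estimate, written as add-k
```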
NGrams.LinearInterpolation - Type
NGrams.LinearInterpolation(λ)

Linear interpolation for probability smoothing in n-gram language modeling.

λ should be a vector or tuple of linear coefficients for smoothing the model. The coefficients are ordered according to the n-gram order of the model; i.e., the first element is the weight for the highest-order model (no backoff), and the final element is the weight for the unigram model.

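A sketch for a trigram model illustrating the ordering described above; the specific weights are arbitrary (they should sum to 1), and the bos/eos values are placeholders:

```julia
# Weights: 0.6 on the trigram estimate, 0.3 on the bigram backoff,
# 0.1 on the unigram model.
est = NGrams.LinearInterpolation((0.6, 0.3, 0.1))
lm_interp = LanguageModel(3; bos="<s>", eos="</s>", estimator=est)
```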
NGrams.AbsoluteDiscounting - Type
NGrams.AbsoluteDiscounting(d::Number)

Absolute discounting for n-gram language modeling.

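In the textbook form, a fixed discount d is subtracted from every observed count and the freed probability mass is redistributed to a lower-order backoff distribution (the standard formulation; the package's exact variant may differ):

```latex
P_{\mathrm{abs}}(w_i \mid h)
  = \frac{\max(C(h, w_i) - d,\; 0)}{C(h)} + \lambda(h)\, P(w_i \mid h')
```

Here h' is the history with its oldest token dropped, and λ(h) is chosen so the probabilities sum to 1.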

Sampling from a Language Model

NGrams.sample - Function
NGrams.sample([rng::AbstractRNG,] lm, [vocabulary])

Sample a single token from the language model.

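A usage sketch; passing an rng is optional per the signature above, and the seed value is illustrative:

```julia
using Random

rng = MersenneTwister(42)       # illustrative seed for reproducibility
token = NGrams.sample(rng, lm)  # draw one token from the trained model
```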
NGrams.generate - Function
NGrams.generate(lm, num_words=1, text_seed=[])

Randomly generate num_words tokens from the language model.

If text_seed is provided, the output is conditioned on that history. The seed is included in the return value and counts against num_words.

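For example, with illustrative seed tokens (since the seed counts against num_words, the result holds ten tokens in total, the first two being the seed):

```julia
text = NGrams.generate(lm, 10, ["the", "cat"])
```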