Trigram Model
A trigram, sometimes called a 3-gram, is a sequence of three consecutive words; it is a special case of an n-gram. Trigrams capture more context than unigrams (one word) or bigrams (two words).
For example, the sentence “I am learning NLP” contains the trigrams “I am learning” and “am learning NLP”.
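As a minimal sketch (in Python, with an illustrative helper name), the trigrams of a tokenised sentence can be extracted with a sliding window:

```python
def ngrams(tokens, n=3):
    """Return all n-grams (as tuples) in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sentence = "I am learning NLP".split()
print(ngrams(sentence, 3))
# -> [('I', 'am', 'learning'), ('am', 'learning', 'NLP')]
```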
Trigrams are used in several natural language processing tasks, especially:

Language Modelling
- A trigram language model is an n-gram model in which a word’s probability is conditioned on the two words immediately preceding it. This rests on the Markov assumption, sometimes described as a restricted horizon, which states that a tag or word depends only on a limited number of preceding elements (k = 2 for trigrams).
- Using counts from a training corpus, the probability of a word w_i given the two preceding words (w_{i-2}, w_{i-1}) can be estimated as P(w_i | w_{i-2}, w_{i-1}) = C(w_{i-2}, w_{i-1}, w_i) / C(w_{i-2}, w_{i-1}) (see the sketch after this list).
- Pseudo-words (such as sentence-start markers) are usually prepended to sentences that have fewer than two preceding words, so that trigram probabilities always have the required context.
- Because they take more context into account, trigram models often outperform unigram and bigram models. This is reflected in lower perplexity values: a lower perplexity indicates a better language model, and trigram models predict the next word more accurately than unigram models, so they assign word sequences a higher probability.
- In practice, when enough training data is available, trigram models are more common than bigram models; with sufficient data, even higher-order n-grams such as 4-grams and 5-grams may be used.
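A minimal sketch of these estimates in plain Python, using a toy corpus; the <s> and </s> boundary pseudo-words and the corpus itself are illustrative assumptions, and no smoothing is applied, so unseen trigrams still receive zero probability:

```python
import math
from collections import Counter

def padded(tokens):
    # Two assumed start pseudo-words <s> give every word two preceding tokens;
    # an end marker </s> is a common convention added here for completeness.
    return ["<s>", "<s>"] + tokens + ["</s>"]

corpus = [
    "I am learning NLP".split(),
    "I am learning fast".split(),
    "you are learning NLP".split(),
]

tri_counts, ctx_counts = Counter(), Counter()
for sent in corpus:
    toks = padded(sent)
    for i in range(len(toks) - 2):
        tri_counts[tuple(toks[i:i + 3])] += 1   # C(w_{i-2}, w_{i-1}, w_i)
        ctx_counts[tuple(toks[i:i + 2])] += 1   # C(w_{i-2}, w_{i-1})

def p_trigram(w2, w1, w):
    # MLE estimate P(w | w2, w1) = C(w2, w1, w) / C(w2, w1)
    return tri_counts[(w2, w1, w)] / ctx_counts[(w2, w1)] if ctx_counts[(w2, w1)] else 0.0

def perplexity(tokens):
    # Perplexity of one sentence under the model; lower is better.
    toks = padded(tokens)
    log_prob, n = 0.0, 0
    for i in range(2, len(toks)):
        p = p_trigram(toks[i - 2], toks[i - 1], toks[i])
        log_prob += math.log(p)   # fails on unseen trigrams: no smoothing in this sketch
        n += 1
    return math.exp(-log_prob / n)

print(p_trigram("I", "am", "learning"))          # 1.0 in this toy corpus
print(perplexity("I am learning NLP".split()))   # low, since the sentence was seen in training
```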
Part-of-Speech (POS) Tagging
- A trigram tagger is one kind of statistical tagger. It assigns a word a tag by considering the probability of that tag given the two tags that immediately precede it; such a model is referred to as second-order (see the example after this list).
- Trigram taggers are more powerful than unigram taggers, which ignore previous tags, or bigram taggers, which remember only one preceding tag. By examining the context of the two previous tags, they can help resolve tagging ambiguity: for example, they can distinguish ‘is clearly marked’ (BEZ RB VBN) from ‘he clearly marked’ (PN RB VBD), because these tag trigrams are more common.
- In POS tagging, trigram taggers are typically preferred over bigram taggers. Trigram taggers are based on the classical Markov model (MM) used in Church’s (1988) early, influential work on statistical tagging.
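As one concrete illustration (a sketch, not the only way to build such a tagger), NLTK ships n-gram taggers that can be chained with backoff; this assumes a reasonably recent NLTK installation and downloads its small treebank sample:

```python
import nltk
from nltk.corpus import treebank
from nltk.tag import UnigramTagger, BigramTagger, TrigramTagger

nltk.download("treebank", quiet=True)   # small tagged sample bundled with NLTK

tagged_sents = treebank.tagged_sents()
train, test = tagged_sents[:3000], tagged_sents[3000:]

# Chain the taggers: the trigram tagger backs off to a bigram tagger,
# which in turn backs off to a unigram tagger for unseen contexts.
uni = UnigramTagger(train)
bi = BigramTagger(train, backoff=uni)
tri = TrigramTagger(train, backoff=bi)

print(tri.accuracy(test))   # held-out accuracy (use .evaluate() on older NLTK versions)
print(tri.tag("he clearly marked the form".split()))
```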
Sparsity and Smoothing
- As with bigram models, data sparsity is a major problem for trigram models. With simple frequency counting, many possible trigrams will not appear in the training corpus, leading to zero probability estimates.
- Smoothing methods are therefore essential for good performance: they redistribute some probability mass to unseen trigrams.
- Interpolated models combine trigram probabilities with those from lower-order models (bigrams and unigrams) using weighted sums (see the sketch after this list).
- Backoff models instead fall back to a bigram or unigram probability when the trigram probability is zero or unreliable.
- Good-Turing estimation is also applied to trigram frequencies to improve estimates, especially for unseen events.
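A self-contained sketch of linear interpolation and a crude backoff, again in plain Python on a toy corpus; the lambda weights are illustrative and would normally be tuned on held-out data, and real backoff models (such as Katz backoff) also discount and renormalise probability mass:

```python
from collections import Counter

corpus = [
    "<s> <s> I am learning NLP </s>".split(),
    "<s> <s> I am learning fast </s>".split(),
]

uni, bi, tri = Counter(), Counter(), Counter()
total = 0
for toks in corpus:
    total += len(toks)
    uni.update(toks)
    bi.update(zip(toks, toks[1:]))
    tri.update(zip(toks, toks[1:], toks[2:]))

def p_interpolated(w2, w1, w, lambdas=(0.6, 0.3, 0.1)):
    # Weighted sum of trigram, bigram, and unigram MLE estimates.
    l3, l2, l1 = lambdas
    p3 = tri[(w2, w1, w)] / bi[(w2, w1)] if bi[(w2, w1)] else 0.0
    p2 = bi[(w1, w)] / uni[w1] if uni[w1] else 0.0
    p1 = uni[w] / total
    return l3 * p3 + l2 * p2 + l1 * p1

def p_backoff(w2, w1, w):
    # Crude backoff: use the highest-order estimate whose counts are non-zero.
    if tri[(w2, w1, w)]:
        return tri[(w2, w1, w)] / bi[(w2, w1)]
    if bi[(w1, w)]:
        return bi[(w1, w)] / uni[w1]
    return uni[w] / total

print(p_interpolated("am", "learning", "NLP"))    # non-zero despite tiny counts
print(p_backoff("you", "were", "learning"))       # unseen trigram and bigram -> unigram estimate
```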
Feature Representation
- Word bigrams and trigrams are frequently used as features in many natural language processing models. They capture local word sequences such as “New York” or “not good”, which can be more informative than individual words (see the example after this list).
- Word n-grams beyond trigrams are less common because sparsity grows quickly, although 4-gram and 5-gram letter (character) combinations are occasionally used.
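As a hedged illustration of n-grams as features, scikit-learn’s CountVectorizer (a common choice, assumed here along with a recent scikit-learn version) can extract unigram, bigram, and trigram counts in one pass:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the food was not good", "New York is not far from here"]

# ngram_range=(1, 3) keeps unigrams, bigrams, and trigrams as features.
vectoriser = CountVectorizer(ngram_range=(1, 3))
X = vectoriser.fit_transform(docs)

print(vectoriser.get_feature_names_out())   # includes 'not good' and 'new york' (lowercased)
print(X.toarray())                          # per-document n-gram counts
```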
Although “trigram” is the commonly used term, the sources note that a 4-gram model would more properly be called a “tetragram”, since the Greek root “gramme” would technically pair with Greek prefixes. In practice, though, the field freely mixes Latin, Greek, and English prefixes, as in the names unigram, bigram, and trigram.