from Wiktionary, Creative Commons Attribution/Share-Alike License.
- noun linguistics An n-gram consisting of a single item from a sequence.
Subsequently upon receiving a query, a set of features corresponding to the query, such as the length and/or frequency of the query, unigram probabilities of respective words and/or groups of words in the query, presence of pre-designated words or phrases in the query, or the like, can be generated.
(If you're following the play at home, note that all the conditional entropies are in nats, and all are determined relative to just one preceding character - the "number of tokens" on the X axis is not clearly explained in the text, but it appears to mean that they calculated the conditional entropy of the choices following the commonest (unigram) symbol, and then for the two commonest, the three commonest, etc.)
Similarly, if there are no counts to compute P(w_k | w_{k-1}), the unigram probability P(w_k) could be considered.
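The fallback in the sentence above can be illustrated with a minimal sketch. It assumes plain relative-frequency estimates from a toy token list, and the names `train_counts` and `fallback_prob` are hypothetical, chosen only for this illustration. This is a naive fallback rather than a normalized backoff model: a full scheme would also discount the observed counts and scale the lower-order estimate by a backoff weight.

```python
from collections import Counter


def train_counts(tokens):
    """Count unigrams and adjacent-pair bigrams in a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def fallback_prob(prev, word, unigrams, bigrams):
    """Estimate P(word | prev); fall back to P(word) if the bigram is unseen.

    Assumption: unsmoothed relative frequencies, so the results do not
    form a properly normalized distribution.
    """
    if bigrams[(prev, word)] > 0:
        # Bigram was observed: use count(prev, word) / count(prev).
        return bigrams[(prev, word)] / unigrams[prev]
    # No bigram counts available: consider the unigram probability instead.
    total = sum(unigrams.values())
    return unigrams[word] / total


# Toy data, invented for this example.
tokens = "the cat sat on the mat".split()
unigrams, bigrams = train_counts(tokens)

seen = fallback_prob("the", "cat", unigrams, bigrams)    # bigram observed
unseen = fallback_prob("cat", "mat", unigrams, bigrams)  # falls back to unigram
```

Here "the cat" occurs in the toy data, so `seen` is a bigram estimate, while "cat mat" does not, so `unseen` is just the unigram relative frequency of "mat".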
Thus for a trigram grammar, the format of each N-gram is the following: N-gram Format unigram log P*(w_k) w_k log α(w_k) bigram log P*(w_k