N-Gram Based Approaches


1 N-Gram Based Approaches
n-gram: a sequence of n words, often used in information retrieval and language modeling to estimate the likelihood that a phrase will appear.
N-gram based approaches build probabilistic models of n-grams from a given corpus of text and tag new utterances using these models.
Example: “I don’t know what to say”
1-gram (unigram): I, don’t, know, what, to, say
2-gram (bigram): I don’t, don’t know, know what, what to, to say
3-gram (trigram): I don’t know, don’t know what, know what to, etc.
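As a rough illustration of the extraction step, here is a minimal Python sketch; whitespace tokenization and the function name are assumptions, since the slides do not say how utterances are tokenized.

# Minimal n-gram extraction sketch (assumes whitespace tokenization).
def ngrams(utterance, n):
    tokens = utterance.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(ngrams("I don't know what to say", 2))
# [('I', "don't"), ("don't", 'know'), ('know', 'what'), ('what', 'to'), ('to', 'say')]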

2 N-Gram Motivation
Advantages
- Encode not just keywords but also word ordering, automatically
- Models are not biased by hand-coded lists of words; they depend entirely on real data
- Learning the features of each affect type is relatively fast and easy
- Human intuition is often incorrect and misses subtleties in language
Disadvantages
- Long-range dependencies are not captured
- Dependent on having a corpus of data to train from
- Sparse data for low-frequency affect tags adversely affects the quality of the n-gram model

3 N-Gram Approaches
Naïve Approach: standard n-grams only
Weighted Approach: weight the longer n-grams higher in the stochastic model
Lengths Approach: include a length-of-utterance factor, capturing the differences in utterance length between affect tags
Weights with Lengths Approach: combine the Weighted and Lengths approaches
Analytical Approach: include word repetition as a factor in the models, isolating acknowledgement utterances from other types

4 Naïve Approach
P(tag_i | utt) = max_{j,k} P(tag_i | ngram_{j,k})
Find the highest-probability n-gram in a given utterance utt for each possible tag tag_i and choose the tag with the highest probability.
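A minimal sketch of this argmax, assuming a lookup table prob that maps (tag, n-gram) pairs to P(tag | ngram) estimated from the training corpus; the data structure and function names are illustrative, not taken from the slides.

def naive_tag(utterance, tags, prob, max_n=5):
    # Pick the tag whose single best n-gram in the utterance scores highest.
    tokens = ["<s>"] + utterance.split()   # "<s>" start marker, as in the worked examples
    best_tag, best_p = None, 0.0
    for tag in tags:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                p = prob.get((tag, tuple(tokens[i:i + n])), 0.0)  # P(tag | ngram), 0 if unseen
                if p > best_p:
                    best_tag, best_p = tag, p
    return best_tag, best_p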

5 Naïve Approach Example
I don’t want to be chained to a wall.

N-gram   Tag   Top N-gram            Probability
1        GEN   don’t                 0.665
2        GEN   to a                  0.692
3        GEN   <s> I don’t           0.524
4        DTL   don’t want to be      0.833
5        DTL   I don’t want to be    1.00

6 Weighted Approach
P(tag_i | utt) = ∑_{k=0..m} ( max_j P(tag_i | ngram_{j,k}) ) * weight_k
weight_k = hand-coded weight for each n-gram length, k = {1, 2, 3, 4, 5}
weight_k = { 0.4, 0.4, 0.5, 0.8, 0.8 }
For each n-gram length, take the highest-probability n-gram in utt for tag tag_i, multiply it by a weight based on the size of the n-gram (5-grams contain more information than 1-grams), and sum the weighted values.
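A sketch of the weighted sum, reusing the assumed prob lookup from the Naïve sketch and the hand-coded weights listed above.

WEIGHTS = {1: 0.4, 2: 0.4, 3: 0.5, 4: 0.8, 5: 0.8}   # weight_k from the slide

def weighted_score(utterance, tag, prob, max_n=5):
    # Sum over n-gram lengths: best n-gram probability for this tag times the length weight.
    tokens = ["<s>"] + utterance.split()
    score = 0.0
    for n in range(1, max_n + 1):
        candidates = [prob.get((tag, tuple(tokens[i:i + n])), 0.0)
                      for i in range(len(tokens) - n + 1)]
        if candidates:
            score += max(candidates) * WEIGHTS[n]
    return score

The tag with the largest weighted_score would then be chosen.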

7 Weighted Approach Example
I don’t want to be chained to a wall.

N-gram   Tag   Top N-gram               Probability
1        GEN   don’t                    0.665
         DTL   want                     0.452
2        GEN   to a                     0.692
         DTL   want to                  0.443
3        GEN   <s> I don’t              0.524
         DTL   I don’t want             0.592
4        GEN   I don’t want to          0.27
         DTL   don’t want to be         0.833
5        GEN   <s> I don’t want to      0.25
         DTL   I don’t want to be       1.00

GEN sum (w/weights): 1.255
DTL sum (w/weights): 2.086

8 Lengths Approach
P(tag_i | utt) = ( max_{j,k} P(tag_i | ngram_{j,k}) ) * lenWeight_{i,m}
lenWeight_{i,m} = probability that a sentence m is tagged with tag_i based on m’s length, computed using the average lengths and standard deviations in the training data.
Find the highest-probability n-gram in utt for tag tag_i and multiply it by the probability that a sentence of utt’s length is tagged tag_i.
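The slides only say that lenWeight is computed from the average lengths and standard deviations in the training data; one plausible realization is a normal density over utterance length, sketched below. The Gaussian form and the variable names are assumptions.

import math

def length_weight(utt_len, mean_len, std_len):
    # Assumed form: normal density of the utterance length under the tag's
    # length distribution (mean and standard deviation come from training data).
    var = std_len ** 2
    return math.exp(-((utt_len - mean_len) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

# Combined with the Naïve score from the earlier sketch (hypothetical usage):
# score = best_ngram_prob * length_weight(len(utterance.split()), mean_len[tag], std_len[tag])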

9 Lengths Approach Example
I don’t want to be chained to a wall.

N-gram   Tag   Top N-gram            Probability * lenWeight
1        GEN   don’t                 0.665 * … = 0.026
2        GEN   to a                  0.692 * … = 0.027
3        GEN   <s> I don’t           0.524 * … = 0.021
4        DTL   don’t want to be      0.833 * … = 0.019
5        DTL   I don’t want to be    1.000 * … = 0.023

10 Weights with Lengths Approach
I don’t want to be chained to a wall.
Weighted Approach:
GEN sum (w/weights): 1.255
DTL sum (w/weights): 2.086
With Lengths:
GEN sum (w/weights) * … = …
DTL sum (w/weights) * … = …
Adding the length weight changes the tag choice from DTL to GEN.

11 Analytical Approach
Many acknowledgement (ACK) utterances were being mistagged as GEN by the previous approaches. Most of the errors came from grounding that involved word repetition:
A: so then you check that your tire is not flat.
B: check the tire
We created a model that takes into account word repetition in adjacent utterances in a dialogue. We also include a length probability to capture the Lengths Approach. Only unigrams are used to avoid sparseness in the training data.

12 Analytical Approach
P(w_1 | T) * P(w_2 | T) * … * P(w_n | T)    (unigram probabilities)
* P(Rw_1 | Ow_1, L, Lp, T) * … * P(Rw_n | Ow_n, L, Lp, T)    (probability that each word is repeated, given that it occurred in the previous utterance and the lengths of both utterances)
* P(L | T) * P(T)    (length probability times the tag’s overall probability)
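A sketch of this scoring product, with assumed lookup tables (unigram for P(w | T), rep_prob for the repetition factors, len_prob for P(L | T), prior for P(T)); the conditioning of the repetition factor is simplified here to whether the word occurred in the previous utterance plus the two utterance lengths.

def analytical_score(utterance, prev_utterance, tag,
                     unigram, rep_prob, len_prob, prior, eps=1e-6):
    # Score one candidate tag T for an utterance, following the product above.
    words = utterance.split()
    prev_words = set(prev_utterance.split())
    L, Lp = len(words), len(prev_words)
    score = len_prob.get((L, tag), eps) * prior.get(tag, eps)     # P(L | T) * P(T)
    for w in words:
        score *= unigram.get((tag, w), eps)                       # P(w_i | T)
        occurred = w in prev_words                                 # did w occur in the previous utterance?
        score *= rep_prob.get((occurred, L, Lp, tag), eps)         # simplified P(Rw_i | Ow_i, L, Lp, T)
    return score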

13 N-Gram Approaches Results
6-Fold Cross Validation on the UR Marriage Corpus
Naive     Weighted   Lengths   Weights with Lengths   Analytical
66.80%    67.43%     64.35%    66.02%                 66.60%

6-Fold Cross Validation on the Switchboard Corpus
Naive     Weighted   Lengths   Weights with Lengths   Analytical
68.41%    68.77%     69.01%    70.08%                 61.40%

14 CATS
CATS: an Automated Tagging System for affect and other similar information-retrieval tasks.
Written in Java for cross-platform interoperability.
Implements the Naïve approach with unigrams and bigrams only.
Builds the stochastic models automatically from a tagged corpus that the user loads through the GUI.
Automatically tags new data using the user’s models.
Each tag also receives a confidence score, allowing the user to hand-check the dialogue quickly and with greater confidence.

15 The CATS GUI provides a clear workspace for text and tags.
Tagging new data and training on old data are done with a mouse click.

16 Customizable models are available. Create your own list of tags, provide a training corpus, and build a new model.

17 Tags are marked with confidence scores based on the probabilistic models.

