Published by Kristen Hush. Modified over 4 years ago.

Center for PersonKommunikation P.1 N-grams
Sentence: S = w1 w2 ... wQ
Exact sentence probability (chain rule): P(S) = P(w1 w2 ... wQ) = P(w1) P(w2|w1) P(w3|w1 w2) ... P(wQ|w1 w2 ... wQ-1)
Approximate conditional word probability: P(wQ|w1 w2 ... wQ-1) ≈ p(wQ|wQ-N+1 ... wQ-1), where N is a constant "windowing" size: unigram (N = 1), bigram (N = 2), trigram (N = 3)
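The chain-rule decomposition and its N-gram approximation can be sketched as follows. This is a minimal maximum-likelihood bigram sketch on a hypothetical toy corpus; the corpus, the function names `p` and `sentence_prob`, and the window size are illustrative assumptions, not part of the original slides.

```python
from collections import defaultdict

# Hypothetical toy corpus; in practice counts come from a large training corpus.
corpus = "the cat sat on the mat the cat ate".split()

N = 2  # bigram window (N = 2)

# Count N-grams and their (N-1)-gram prefixes.
ngram_counts = defaultdict(int)
prefix_counts = defaultdict(int)
for i in range(len(corpus) - N + 1):
    ngram = tuple(corpus[i:i + N])
    ngram_counts[ngram] += 1
    prefix_counts[ngram[:-1]] += 1

def p(word, *history):
    """Maximum-likelihood estimate of p(wq | wq-N+1 ... wq-1)."""
    prefix = tuple(history)[-(N - 1):]
    if prefix_counts[prefix] == 0:
        return 0.0
    return ngram_counts[prefix + (word,)] / prefix_counts[prefix]

def sentence_prob(words):
    """Approximate P(S) as P(w1) times a product of windowed conditionals."""
    prob = corpus.count(words[0]) / len(corpus)  # P(w1)
    for q in range(1, len(words)):
        prob *= p(words[q], *words[:q])
    return prob
```

With the toy corpus above, `p("cat", "the")` is 2/3, since two of the three bigrams starting with "the" continue with "cat".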

Center for PersonKommunikation P.2 Trigram smoothing (Jelinek)
Used when there are insufficient data to estimate trigram probabilities directly; interpolate with lower-order estimates:
P(w3|w1 w2) = p1 · F(w1,w2,w3)/F(w1,w2) + p2 · F(w2,w3)/F(w2) + p3 · F(w3)/ΣF(wi)
Where:
F is the number of occurrences of the word string in its argument
ΣF(wi) is the number of words in the corpus
p1, p2, p3 are positive values and p1 + p2 + p3 = 1
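The interpolation above can be sketched directly from counts. This is a toy sketch under assumptions: the corpus and the weight values p1, p2, p3 are made up for illustration (in practice the weights are tuned, e.g. on held-out data), and the function name `p_interp` is hypothetical.

```python
from collections import Counter

# Hypothetical toy corpus; p1, p2, p3 are illustrative, not tuned values.
words = "a b c a b d a b c a c".split()
p1, p2, p3 = 0.6, 0.3, 0.1  # positive, and p1 + p2 + p3 = 1

tri = Counter(zip(words, words[1:], words[2:]))  # F(w1,w2,w3)
bi = Counter(zip(words, words[1:]))              # F(w1,w2)
uni = Counter(words)                             # F(w)
total = len(words)                               # sum of F(wi) over the corpus

def p_interp(w1, w2, w3):
    """Interpolated trigram estimate P(w3 | w1 w2)."""
    f_tri = tri[(w1, w2, w3)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    f_bi = bi[(w2, w3)] / uni[w2] if uni[w2] else 0.0
    f_uni = uni[w3] / total
    return p1 * f_tri + p2 * f_bi + p3 * f_uni
```

Because each of the three relative frequencies sums to 1 over the vocabulary, the interpolated estimate is a proper distribution over w3 whenever the history has been seen.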

Center for PersonKommunikation P.3 Clustering words in N-grams
N-grams of word classes (categorical N-grams):
– Words are "replaced" by (semantic, syntactic) categories before training (e.g. "w_day" for Monday, Tuesday, ...)
– Data-driven clustering
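The word-to-class replacement step can be sketched as a simple lookup applied before counting. The class map and the helper name `to_classes` are hypothetical; real systems would use larger hand-built category lists or data-driven clusters.

```python
# Hypothetical class map: words are replaced by categories before N-gram training.
word2class = {
    "monday": "w_day", "tuesday": "w_day", "friday": "w_day",
    "one": "w_num", "two": "w_num",
}

def to_classes(tokens):
    """Map each token to its class; words without a class stay as themselves."""
    return [word2class.get(t.lower(), t.lower()) for t in tokens]

# N-gram counts are then collected over class sequences, so "meet me Monday"
# and "meet me Friday" contribute to the same trigram (meet, me, w_day),
# pooling sparse counts across the class members.
print(to_classes("meet me Monday".split()))  # ['meet', 'me', 'w_day']
```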

Center for PersonKommunikation P.4 N-gram problems
Long-distance dependencies exceeding N, e.g. Danish gender/number agreement: [kommoden/bordet/stolene] i værelset på tredje etage skal males [rød/rødt/røde] ("the dresser/the table/the chairs in the room on the third floor must be painted red", where the form of "red" must agree with the distant subject noun)
Stochastic grammars "freeze" human verbal behaviour at the state reflected in the training data, but verbal behaviour may change. An adaptive approach?
Finding corpora that reflect how humans will communicate with the final system
– (Human-human dialogues vs. Wizard-of-Oz experiments)
