
1 NLP

2 Introduction to NLP

3 Approaches to part-of-speech tagging
 – Rule-based
 – Stochastic
    – HMM (generative)
    – Maximum Entropy MM (discriminative)
 – Transformation-based

4 Find the tag sequence that maximizes the probability formula
 – P(word|tag) * P(tag|previous n tags)
A bigram-based HMM tagger chooses the tag t_i for word w_i that is most probable given the previous tag t_{i-1} and the current word w_i:
 – t_i = argmax_j P(t_j | t_{i-1}, w_i)
 – t_i = argmax_j P(t_j | t_{i-1}) P(w_i | t_j)   (HMM equation for a single tag)
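For a single position, this decision rule can be written out directly. Below is a minimal Python sketch, with transition and emission values borrowed from the "race" example later in the deck; the tables and the best_tag helper are illustrative placeholders, not a full tagger.

trans_prob = {("TO", "VB"): 0.83, ("TO", "NN"): 0.00047}         # P(tag | previous tag)
emit_prob  = {("VB", "race"): 0.00012, ("NN", "race"): 0.00057}  # P(word | tag)

def best_tag(prev_tag, word, tags=("VB", "NN")):
    # t_i = argmax_j P(t_j | t_{i-1}) * P(w_i | t_j)
    return max(tags, key=lambda t: trans_prob.get((prev_tag, t), 0.0)
                                   * emit_prob.get((t, word), 0.0))

print(best_tag("TO", "race"))  # VB: the TO -> VB transition dominates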

5 T = argmax_T P(T|W)
 – where T = t_1, t_2, ..., t_n
By Bayes' theorem
 – P(T|W) = P(T) P(W|T) / P(W)
Thus we are attempting to choose the sequence of tags that maximizes the right-hand side of the equation
 – P(W) can be ignored (it is the same for every candidate tag sequence)
 – P(T) is called the prior; P(W|T) is called the likelihood

6 Complete formula
 – P(T) P(W|T) = Π P(w_i | w_1 t_1 ... w_{i-1} t_{i-1} t_i) P(t_i | t_1 ... t_{i-2} t_{i-1})
Simplification 1:
 – P(W|T) = Π P(w_i | t_i)
Simplification 2:
 – P(T) = Π P(t_i | t_{i-1})
Bigram approximation
 – T = argmax_T P(T|W) = argmax_T Π P(w_i | t_i) P(t_i | t_{i-1})
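The argmax over whole tag sequences is normally computed with the Viterbi dynamic program. Here is a minimal sketch under the bigram approximation above; trans_prob, emit_prob, and the start symbol "<s>" are assumed inputs that would be estimated from a tagged corpus (as on the next slide).

def viterbi(words, tags, trans_prob, emit_prob, start="<s>"):
    # best[i][t] = (probability of the best tag path ending in tag t at position i, backpointer)
    best = [{t: (trans_prob.get((start, t), 0.0) * emit_prob.get((t, words[0]), 0.0), None)
             for t in tags}]
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            p, prev = max((best[i - 1][s][0] * trans_prob.get((s, t), 0.0)
                           * emit_prob.get((t, words[i]), 0.0), s) for s in tags)
            best[i][t] = (p, prev)
    # trace back from the highest-scoring final tag
    t = max(tags, key=lambda t: best[-1][t][0])
    path = [t]
    for i in range(len(words) - 1, 0, -1):
        t = best[i][t][1]
        path.append(t)
    return list(reversed(path))

With tables estimated as on slide 7, viterbi("The rich like to travel .".split(), tagset, trans, emit) would return the highest-probability tag sequence for the example sentence.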

7 The probabilities are estimated from counts in a tagged corpus:
 – P(NN|JJ) = C(JJ,NN)/C(JJ) = 22301/89401 = .249
 – P(this|DT) = C(DT,this)/C(DT) = 7037/103687 = .068
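These are maximum-likelihood estimates from tag-bigram and word-tag counts. A minimal sketch of collecting them, assuming tagged_sentences is a placeholder for a corpus of (word, tag) lists such as the Penn Treebank:

from collections import Counter

def estimate(tagged_sentences):
    # tagged_sentences: iterable of sentences, each a list of (word, tag) pairs
    tag_count, prev_count, tag_bigram, tag_word = Counter(), Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        prev = "<s>"
        for word, tag in sent:
            tag_count[tag] += 1
            prev_count[prev] += 1
            tag_bigram[(prev, tag)] += 1
            tag_word[(tag, word)] += 1
            prev = tag
    trans = {(p, t): c / prev_count[p] for (p, t), c in tag_bigram.items()}  # P(t | p)
    emit = {(t, w): c / tag_count[t] for (t, w), c in tag_word.items()}      # P(w | t)
    return trans, emit

Here trans[("JJ", "NN")] plays the role of P(NN|JJ) = C(JJ,NN)/C(JJ) above.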

8 The/DT rich/JJ like/VBP to/TO travel/VB ./.

9 Two candidate tag sequences for "The rich like to travel .", differing in the tag of "travel":
 – The/DT rich/NN like/VBP to/TO travel/NN ./.
 – The/DT rich/NN like/VBP to/TO travel/VB ./.

10 P(NN|TO) = .00047      P(VB|TO) = .83
P(race|NN) = .00057       P(race|VB) = .00012
P(NR|VB) = .0027          P(NR|NN) = .0012
P(VB|TO) P(NR|VB) P(race|VB) = .00000027
P(NN|TO) P(NR|NN) P(race|NN) = .00000000032
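The two products can be checked directly; this just restates the slide's arithmetic in Python.

p_vb = 0.83 * 0.0027 * 0.00012     # P(VB|TO) * P(NR|VB) * P(race|VB)
p_nn = 0.00047 * 0.0012 * 0.00057  # P(NN|TO) * P(NR|NN) * P(race|NN)
print(p_vb)  # ≈ 2.7e-07
print(p_nn)  # ≈ 3.2e-10  -> the verb reading of "race" wins by about three orders of magnitude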

11 Data set
 – Training set
 – Development set
 – Test set
Tagging accuracy
 – how many tags are right (see the sketch after this slide)
Results
 – Accuracy around 97% on the Penn Treebank, trained on 800,000 words
 – (50-85% on unknown words; 50% for trigrams)
 – Upper bound around 98% because of noise (e.g., errors and inconsistencies in the data, such as NN vs. JJ)
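A minimal sketch of the accuracy computation; gold and predicted are assumed to be parallel lists of tag sequences.

def accuracy(gold, predicted):
    # fraction of tokens whose predicted tag matches the gold tag
    correct = total = 0
    for g_sent, p_sent in zip(gold, predicted):
        for g, p in zip(g_sent, p_sent):
            correct += (g == p)
            total += 1
    return correct / total

print(accuracy([["DT", "JJ"]], [["DT", "NN"]]))  # 0.5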

12 Transformation-based tagging [Brill 1995]
Example
 – P(NN|sleep) = .9
 – P(VB|sleep) = .1
 – Rule: change NN to VB when the previous tag is TO (sketched after this list)
Types of rules:
 – The preceding (following) word is tagged z
 – The word two before (after) is tagged z
 – One of the two preceding (following) words is tagged z
 – One of the three preceding (following) words is tagged z
 – The preceding word is tagged z and the following word is tagged w
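A minimal sketch of applying that single transformation; the rule encoding is simplified, and real transformation-based learning chooses rules by scoring many candidate transformations against a training corpus.

def apply_rule(tags, from_tag="NN", to_tag="VB", trigger_prev="TO"):
    # "Change from_tag to to_tag when the previous tag is trigger_prev"
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == from_tag and out[i - 1] == trigger_prev:
            out[i] = to_tag
    return out

print(apply_rule(["PRP", "VBP", "TO", "NN"]))  # ['PRP', 'VBP', 'TO', 'VB'], e.g. "I want to sleep"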


16 New domains
 – Lower performance
Distributional clustering
 – Combine statistics about semantically related words (see the sketch after this slide)
 – Example: names of companies
 – Example: days of the week
 – Example: animals
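One way to read the clustering idea: a rare word borrows the tag statistics of its cluster. The clusters and counts below are invented purely for illustration.

from collections import Counter

cluster_of = {"IBM": "COMPANY", "Xerox": "COMPANY", "Monday": "WEEKDAY", "Tuesday": "WEEKDAY"}
cluster_tag_counts = {
    "COMPANY": Counter({"NNP": 950, "NN": 50}),
    "WEEKDAY": Counter({"NNP": 990, "NN": 10}),
}

def cluster_emission(word, tag):
    # back off from the word itself to its distributional cluster
    counts = cluster_tag_counts[cluster_of[word]]
    return counts[tag] / sum(counts.values())

print(cluster_emission("Xerox", "NNP"))  # 0.95, even if "Xerox" itself is rare in training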

17 Jason Eisner's awesome interactive spreadsheet about learning HMMs
 – http://cs.jhu.edu/~jason/papers/#eisner-2002-tnlp
 – http://cs.jhu.edu/~jason/papers/eisner.hmm.xls

18 NLP

