1 Fall 2005 Lecture Notes #8 EECS 595 / LING 541 / SI 661 Natural Language Processing

2 Evaluation of NLP systems

3 The classical pipeline (for supervised learning)
– Training set / dev set / test set
– Dumb baseline
– Intelligent baseline
– Your algorithm
– Human ceiling
– Accuracy / precision / recall
– Multiple references
– Statistical significance
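The accuracy/precision/recall bullet above can be made concrete with a small sketch. The function name and the toy gold/predicted sets are illustrative, not from the lecture:

```python
# Minimal sketch: precision, recall, and F1 over sets of predicted items.
# `gold` and `predicted` are hypothetical example inputs.

def precision_recall_f1(gold, predicted):
    """Compute precision, recall, and F1 from gold and predicted item sets."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # true positives: items in both sets
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f = precision_recall_f1({"a", "b", "c", "d"}, {"b", "c", "e"})
print(p, r, f)  # 2 of 3 predictions correct; 2 of 4 gold items found
```

The "dumb baseline / intelligent baseline / your algorithm / human ceiling" ladder is then a matter of running the same metric over each system's output and comparing.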

4 Special cases
– Document retrieval systems
– Part-of-speech tagging
– Parsing: labeled recall, labeled precision, crossing brackets
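The parsing metrics listed above can be sketched over constituents represented as (label, start, end) spans. This representation and the function names are illustrative assumptions, not the lecture's notation:

```python
# Sketch of PARSEVAL-style parsing evaluation, assuming constituents are
# (label, start, end) tuples over word positions.

def labeled_prf(gold, predicted):
    """Labeled precision and recall: exact (label, span) matches."""
    matched = len(set(gold) & set(predicted))
    return matched / len(predicted), matched / len(gold)

def crossing_brackets(gold, predicted):
    """Count predicted spans that overlap a gold span without nesting."""
    def crosses(a, b):
        return a[0] < b[0] < a[1] < b[1] or b[0] < a[0] < b[1] < a[1]
    return sum(
        any(crosses((s, e), (gs, ge)) for _, gs, ge in gold)
        for _, s, e in predicted
    )

gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5)]
pred = [("S", 0, 5), ("NP", 0, 2), ("VP", 1, 5)]
lp, lr = labeled_prf(gold, pred)
cb = crossing_brackets(gold, pred)
print(lp, lr, cb)  # the mis-spanned VP hurts both metrics and crosses the NP
```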

5 Word classes and part-of-speech tagging

6 Part-of-speech tagging
– Problems: transport, object, discount, address
– More problems: content
– French: est, président, fils
– "Book that flight": what is the part of speech associated with "book"?
– POS tagging: assigning parts of speech to words in a text
– Three main techniques: rule-based tagging, stochastic tagging, transformation-based tagging

7 Rule-based POS tagging
– Use a dictionary or FST to find all possible parts of speech for each word
– Use disambiguation rules (e.g., the sequence ART+V is forbidden)
– Typically hundreds of manually designed constraints

8 Example in French ("the average uranium content of rivers, though difficult to calculate")
  ^                                    beginning of sentence
  La        rf b nms u                 article
  teneur    nfs nms                    noun, feminine singular
  moyenne   jfs nfs v1s v2s v3s        adjective, feminine singular
  en        p a b                      preposition
  uranium   nms                        noun, masculine singular
  des       p r                        preposition
  rivières  nfp                        noun, feminine plural
  ,         x                          punctuation
  bien_que  cs                         subordinating conjunction
  délicate  jfs                        adjective, feminine singular
  à         p                          preposition
  calculer  v                          verb

9 Sample rules
– BS3 BI1: a BS3 (3rd-person subject personal pronoun) cannot be followed by a BI1 (1st-person indirect personal pronoun). In the example "il nous faut" ("we need"), "il" has the tag BS3MS and "nous" has the tags [BD1P BI1P BJ1P BR1P BS1P]. The negative constraint "BS3 BI1" rules out BI1P, leaving only 4 alternatives for "nous".
– N K: the tag N (noun) cannot be followed by the tag K (interrogative pronoun). An example from the test corpus: "... fleuve qui ..." ("... river, that ..."). Since "qui" can be tagged both as E (relative pronoun) and K (interrogative pronoun), the tagger chooses E, because an interrogative pronoun cannot follow a noun.
– R V: a word tagged R (article) cannot be followed by a word tagged V (verb). Example: "l'appelle" ("calls him/her"). The word "appelle" can only be a verb, but "l'" can be either an article or a personal pronoun, so the rule eliminates the article tag, giving preference to the pronoun.
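The BS3 BI1 rule above can be sketched as a negative-constraint filter over candidate tag lists. The tag names follow the slide; the pruning procedure and the three-letter class prefix convention are illustrative assumptions:

```python
# Sketch of negative-constraint disambiguation: a forbidden tag-class
# bigram removes candidate tags for the next word. The assumption that a
# tag's class is its first three characters (BS3MS -> BS3) is illustrative.

FORBIDDEN = {("BS3", "BI1")}  # (previous tag class, banned following class)

def prune(prev_tags, next_tags):
    """Drop any candidate banned after every possible previous tag."""
    kept = []
    for tag in next_tags:
        banned = all((p[:3], tag[:3]) in FORBIDDEN for p in prev_tags)
        if not banned:
            kept.append(tag)
    return kept

# "il nous faut": "il" is BS3MS; "nous" is five-ways ambiguous.
remaining = prune(["BS3MS"], ["BD1P", "BI1P", "BJ1P", "BR1P", "BS1P"])
print(remaining)  # BI1P eliminated, 4 alternatives left, as on the slide
```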

10 Confusion matrix

         IN    JJ    NN   NNP    RB   VBD   VBN
  IN      -    .2                .7
  JJ     .2     -   3.3   2.1   1.7    .2   2.7
  NN           8.7    -    .2
  NNP    .2   3.3   4.1     -    .2
  RB    2.2   2.0    .5           -
  VBD          .3    .5                  -   4.4
  VBN         2.8               2.6      -

Most confusing: NN vs. NNP vs. JJ; VBD vs. VBN vs. JJ

11 HMM tagging
– T = argmax P(T|W), where T = t1, t2, …, tn
– By Bayes' theorem: P(T|W) = P(T) P(W|T) / P(W)
– Thus we choose the tag sequence that maximizes the right-hand side of the equation
– P(W) is the same for every candidate tag sequence, so it can be ignored
– P(T) P(W|T) = ?
– P(T) is called the prior; P(W|T) is called the likelihood

12 HMM tagging (cont'd)
– P(T) P(W|T) = ∏ P(wi | w1 t1 … wi-1 ti-1 ti) P(ti | t1 … ti-2 ti-1)
– Simplification 1: P(W|T) ≈ ∏ P(wi | ti)
– Simplification 2: P(T) ≈ ∏ P(ti | ti-1)
– T = argmax P(T|W) ≈ argmax ∏ P(wi | ti) P(ti | ti-1)

13 Estimates
– P(NN|DT) = C(DT,NN)/C(DT) = 56509/116454 = .49
– P(is|VBZ) = C(VBZ,is)/C(VBZ) = 10073/21627 = .47
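The estimates above are maximum-likelihood ratios of bigram counts to unigram counts, which is trivial to sketch in code. The counts are the slide's; the function name is illustrative:

```python
# Sketch: maximum-likelihood estimates as count ratios, using the counts
# given on the slide.

def mle(bigram_count, unigram_count):
    """P(y|x) estimated as C(x,y) / C(x)."""
    return bigram_count / unigram_count

p_nn_given_dt = mle(56509, 116454)   # P(NN|DT) = C(DT,NN)/C(DT)
p_is_given_vbz = mle(10073, 21627)   # P(is|VBZ) = C(VBZ,is)/C(VBZ)
print(round(p_nn_given_dt, 2), round(p_is_given_vbz, 2))  # .49 and .47
```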

14 Example
– Secretariat/NNP is/VBZ expected/VBN to/TO race/VB tomorrow/NR
– People/NNS continue/VBP to/TO inquire/VB the/AT reason/NN for/IN the/AT race/NN for/IN outer/JJ space/NN
– TO: to+VB (to sleep), to+NN (to school)

15 Example
– NNP VBZ VBN TO VB NR: Secretariat is expected to race tomorrow
– NNP VBZ VBN TO NN NR: Secretariat is expected to race tomorrow

16 Example (cont'd)
– P(NN|TO) = .00047          P(VB|TO) = .83
– P(race|NN) = .00057        P(race|VB) = .00012
– P(NR|VB) = .0027           P(NR|NN) = .0012
– P(VB|TO) P(NR|VB) P(race|VB) = .00000027
– P(NN|TO) P(NR|NN) P(race|NN) = .00000000032
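Multiplying out the slide's probabilities confirms which tag path wins:

```python
# Sketch: comparing the two candidate paths for "to race tomorrow",
# using the probabilities given on the slide.

p_vb_path = 0.83 * 0.0027 * 0.00012      # P(VB|TO) P(NR|VB) P(race|VB)
p_nn_path = 0.00047 * 0.0012 * 0.00057   # P(NN|TO) P(NR|NN) P(race|NN)
print(p_vb_path, p_nn_path, p_vb_path > p_nn_path)
# race/VB wins by roughly three orders of magnitude, as the slide shows
```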

17 Decoding
– Decoding: finding the sequence of states that is the source of a sequence of observations
– Viterbi decoding (dynamic programming): finding the optimal sequence of tags
– Input: an HMM and a sequence of words; output: a sequence of states
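The dynamic program behind Viterbi decoding can be sketched for a bigram HMM. The dictionary-based representation of the model and the toy probabilities (reusing numbers from the "race" example, with an invented emission for "to") are illustrative assumptions:

```python
# Sketch of Viterbi decoding for a bigram HMM tagger. Model parameters are
# plain dicts; probabilities below are partly invented for illustration.

def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words`."""
    # v[i][t]: probability of the best tag path ending in tag t at position i
    v = [{t: start_p.get(t, 0.0) * emit_p.get((t, words[0]), 0.0)
          for t in tags}]
    back = [{}]
    for i in range(1, len(words)):
        v.append({})
        back.append({})
        for t in tags:
            best_prev, best_score = max(
                ((p, v[i - 1][p] * trans_p.get((p, t), 0.0)) for p in tags),
                key=lambda x: x[1],
            )
            v[i][t] = best_score * emit_p.get((t, words[i]), 0.0)
            back[i][t] = best_prev
    # Trace back pointers from the best final tag
    last = max(tags, key=lambda t: v[-1][t])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return list(reversed(path))

tags = ["TO", "VB", "NN"]
start_p = {"TO": 1.0}                                  # invented start prob.
trans_p = {("TO", "VB"): 0.83, ("TO", "NN"): 0.00047}  # from the slides
emit_p = {("TO", "to"): 1.0,                           # invented emission
          ("VB", "race"): 0.00012, ("NN", "race"): 0.00057}
print(viterbi(["to", "race"], tags, start_p, trans_p, emit_p))
```

For each position the algorithm keeps only the best path into each tag, which is what makes the search linear in sentence length rather than exponential.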

18 Transformation-based learning
– P(NN|race) = .98, P(VB|race) = .02
– Rule: change NN to VB when the previous tag is TO
– Types of rules:
–The preceding (following) word is tagged z
–The word two before (after) is tagged z
–One of the two preceding (following) words is tagged z
–One of the three preceding (following) words is tagged z
–The preceding word is tagged z and the following word is tagged w
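Applying the slide's transformation can be sketched as a pass over an initially tagged sentence. The function and the list-of-pairs representation are illustrative assumptions:

```python
# Sketch: applying one Brill-style transformation ("change NN to VB when
# the previous tag is TO") to an initial most-likely-tag assignment.

def apply_rule(tagged, from_tag, to_tag, prev_tag):
    """Rewrite from_tag -> to_tag wherever the preceding tag is prev_tag."""
    out = list(tagged)
    for i in range(1, len(out)):
        word, tag = out[i]
        if tag == from_tag and out[i - 1][1] == prev_tag:
            out[i] = (word, to_tag)
    return out

# The initial tagger picks NN for "race", since P(NN|race) = .98
initial = [("expected", "VBN"), ("to", "TO"), ("race", "NN")]
print(apply_rule(initial, "NN", "VB", "TO"))
# the transformation corrects "race" to VB after "to"
```

In full transformation-based learning, candidate rules of the listed types are scored by how many errors they fix on the training corpus, and the best rule is applied and learned at each iteration.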

