Presentation is loading. Please wait.

Presentation is loading. Please wait.

Introduction to Syntax, with Part-of-Speech Tagging Owen Rambow September 17 & 19.

Similar presentations


Presentation on theme: "Introduction to Syntax, with Part-of-Speech Tagging Owen Rambow September 17 & 19."— Presentation transcript:

1 Introduction to Syntax, with Part-of-Speech Tagging Owen Rambow rambow@cs.columbia.edu September 17 & 19

2 Admin Stuff (Homework) Ex 2.6: every possible time expression, not just those patterns that are listed in book! Email Ani if can’t access links on homework page

3 What is Syntax? Study of structure of language Specifically, goal is to relate surface form (e.g., interface to phonological component) to semantics (e.g., interface to semantic component) Morphology, phonology, semantics farmed out (mainly), issue is word order and structure Representational device is tree structure

4 What About Chomsky? At birth of formal language theory (comp sci) and formal linguistics Major contribution: syntax is cognitive reality Humans able to learn languages quickly, but not all languages  universal grammar is biological Goal of syntactic study: find universal principles and language-specific parameters Specific Chomskyan theories change regularly These ideas adopted by almost all contemporary syntactic theories

5 Types of Linguistic Activity Descriptive: provide account of syntax of a language; often good enough for NLP engineering work Explanatory: provide principles-and- parameters style account of syntax of (preferably) several languages Prescriptive: “prescriptive linguistics” is an oxymoron

6 Structure in Strings Some words: the a small nice big very boy girl sees likes Some good sentences: o the boy likes a girl o the small girl likes the big girl o a very small nice boy sees a very nice boy Some bad sentences: o *the boy the girl o *small boy likes nice girl Can we find subsequences of words (constituents) which in some way behave alike?

7 Structure in Strings Proposal 1 Some words: the a small nice big very boy girl sees likes Some good sentences: o (the) boy (likes a girl) o (the small) girl (likes the big girl) o (a very small nice) boy (sees a very nice boy) Some bad sentences: o *(the) boy (the girl) o *(small) boy (likes the nice girl)

8 Structure in Strings Proposal 2 Some words: the a small nice big very boy girl sees likes Some good sentences: o (the boy) likes (a girl) o (the small girl) likes (the big girl) o (a very small nice boy) sees (a very nice boy) Some bad sentences: o *(the boy) (the girl) o *(small boy) likes (the nice girl) This is better proposal: fewer types of constituents

9 More Structure in Strings Proposal 2 -- ctd Some words: the a small nice big very boy girl sees likes Some good sentences: o ((the) boy) likes ((a) girl) o ((the) (small) girl) likes ((the) (big) girl) o ((a) ((very) small) (nice) boy) sees ((a) ((very) nice) girl) Some bad sentences: o *((the) boy) ((the) girl) o *((small) boy) likes ((the) (nice) girl)

10 From Substrings to Trees (((the) boy) likes ((a) girl)) boy the likes girl a

11 Node Labels? ((the) boy) likes ((a) girl) Deliberately chose constituents so each one has one non-bracketed word: the head Group words by distribution of constituents they head (part-of-speech, POS): o Noun (N), verb (V), adjective (Adj), adverb (Adv), determiner (Det) Category of constituent: XP, where X is POS o NP, S, AdjP, AdvP, DetP

12 Node Labels (((the/ Det ) boy/ N ) likes/ V ((a/ Det ) girl/ N )) boy the likes girl a DetP NP DetP S

13 Word Classes (=POS) Heads of constituents fall into distributionally defined classes Additional support for class definition of word class comes from morphology

14 Some Points on POS Tag Sets Possible basic set: N, V, Adj, Adv, P, Det, Aux, Comp, Conj 2 supertypes: open- and closed-class o Open: N, V, Adj, Adv o Closed: P, Det, Aux, Comp, Conj Many subtypes: o eats/V  eat/VB, eat/VBP, eats/VBZ, ate/VBD, eaten/VBN, eating/VBG, o Reflect morphological form & syntactic function

15 More on POS Tag Sets: Problematic Cases o adjective or participle?  a seen event, a rarely seen event, an unseen event, an event rarely seen in Idaho, *a rarely seen in Idaho event o noun or adjective?  a child seat, *a very child seat, *this seat is child o preposition or particle?  he threw out the garbage, he threw the garbage out, he threw the garbage out the door, *he threw the garbage the door out

16 The Penn TreeBank POS Tag Set Penn Treebank: hand-annotated corpus of Wall Street Journal, 1M words 46 tags Some particularities: o to /TO not disambiguated o Auxiliaries and verbs not distinguished

17 Part-of-Speech Tagging Problem: assign POS tags to words in a sentence o fruit flies like a banana

18 Part-of-Speech Tagging Problem: assign POS tags to words in a sentence o fruit/N flies/N like/V a/DET banana/N

19 Part-of-Speech Tagging Problem: assign POS tags to words in a sentence o fruit/N flies/N like/V a/DET banana/N o fruit/N flies/V like/P a/DET banana/N

20 Part-of-Speech Tagging Problem: assign POS tags to words in a sentence o fruit/N flies/N like/V a/DET banana/N o fruit/N flies/V like/P a/DET banana/N 2nd example: o the/Det flies/N like/V a/Det banana/N

21 Part-of-Speech Tagging Problem: assign POS tags to words in a sentence o fruit/N flies/N like/V a/DET banana/N o fruit/N flies/V like/P a/DET banana/N 2nd example: o the/Det flies/N like/V a/Det banana/N Useful for parsing, but also partial parsing/chunking, IR, etc

22 Approaches to POS Tagging Hand-written rules Statistical approaches Machine learning of rules (e.g., decision trees or transformation-based learning) Role of corpus: o No corpus (hand-written) o No machine learning (hand-written) o Unsupervised learning from raw data o Supervised learning from annotated data

23 Methodological Points When looking at problem in NLP, need to know how to evaluate Possible evaluations: o against annotated naturally occurring corpus o against hand-crafted corpus o against human task performance Need to know baseline: how well does simple method do? Need to do topline: given evaluation, what is meaninful best result?

24 Methodological Points When looking at problem in NLP, need to know how to evaluate Possible evaluations: o against naturally occurring annotated corpus (POS tagging: 96%) o against hand-crafted corpus o against human task performance Need to know baseline: how well does simple method do? (POS tagging: 91%) Need to do topline: given evaluation, what is meaninful best result? (POS tagging: 97%)

25 Reminder: Bayes’s Law 10 students, of which 4 women & 3 smokers Probability that randomly chosen student is woman: p(w) = 0.4 Prob that rcs is smoker: p(s) = 0.3 Prob that rcs is female smoker: p(s,w) = 0.2 women smokers

26 Reminder: Bayes’s Law 10 students, of which 4 women & 3 smokers Prob that rcs is female smoker: p(s,w) = 0.2 women smokers Prob that a rc woman is a smoker: p(s|w) = 0.5 Probability that randomly chosen student is woman: p(w) = 0.4

27 Reminder: Bayes’s Law 10 students, of which 4 women & 3 smokers Prob that rcs is female smoker: p(s,w) = 0.2 = p(w) p(s|w) women smokers Prob that a rc woman is a smoker: p(s|w) = 0.5 Probability that randomly chosen student is woman: p(w) = 0.4

28 Reminder: Bayes’s Law 10 students, of which 4 women & 3 smokers Prob that rcs is smoker: p(s) = 0.3 Prob that rcs is female smoker: p(s,w) = 0.2 = p(s) p(w|s) women smokers Prob that rc smoker is a woman: p(w|s) = 0.66

29 Reminder: Bayes’s Law (end) p(s,w) = p(s) p(w|s) p(s,w) = p(w) p(s|w) So: p(s) = p(w) p(s|w) / p(w|s) p(s|w) = p(s) p(w|s) / p(w) prior probabilitylikelihood

30 Statistical POS Tagging Want to choose most likely string of tags (T), given the string of words (W) W = w 1, w 2, …, w n T = t 1, t 2, …, t n I.e., want argmax T p(T | W) Problem: sparse data

31 Statistical POS Tagging (ctd) p(T|W) = p(T,W) / p(W) = p(W|T) p (T) / p(W) argmax T p(T|W) = argmax T p(W|T) p (T) / p(W) = argmax T p(W|T) p (T)

32 Statistical POS Tagging (ctd) p(T) = p(t 1, t 2, …, t n-1, t n ) = p(t n | t 1, …, t n-1 ) p (t 1, …, t n-1 ) = p(t n | t 1, …, t n-1 ) p(t n-1 | t 1, …, t n-2 ) p (t 1, …, t n-2 ) =  i p(t i | t 1, …, t i-1 )   i p(t i | t i-2, t i-1 )  trigram (n-gram)

33 Statistical POS Tagging (ctd) p(W|T) = p(w 1, w 2, …, w n | t 1, t 2, …, t n ) =  i p(w i | w 1, …, w i-1, t 1, t 2, …, t n )   i p(w i | t i )

34 Statistical POS Tagging (ctd) argmax T p(T|W) = argmax T p(W|T) p (T)  argmax T  i p(w i | t i ) p(t i | t i-2, t i-1 ) Relatively easy to get data for parameter estimation (next slide) But: need smoothing for unseen words Easy to determine the argmax (Viterbi algorithm in time linear in sentence length)

35 Probability Estimation for trigram POS Tagging Maximum-Likelihood Estimation p’ ( w i | t i ) = c( w i, t i ) / c( t i ) p’ ( t i | t i-2, t i-1 ) = c( t i, t i-2, t i-1 ) / c( t i-2, t i-1 )

36 Statistical POS Tagging Method common to many tasks in speech & NLP “Noisy Channel Model”, Hidden Markov Model


Download ppt "Introduction to Syntax, with Part-of-Speech Tagging Owen Rambow September 17 & 19."

Similar presentations


Ads by Google