
# Introduction to Syntax, with Part-of-Speech Tagging

Owen Rambow, September 17 & 19



## Admin Stuff

- These slides are available at http://www.cs.columbia.edu/~rambow/teaching.html
- For the Eliza homework, you may use a tagger or chunker if you want; details at http://www.cs.columbia.edu/~ani/cs4705.html
- Special office hours (Ani): today after class, and tomorrow at 10am in CEPSR 721

## Statistical POS Tagging

We want to choose the most likely string of tags T, given the string of words W:

- W = w₁, w₂, …, wₙ
- T = t₁, t₂, …, tₙ

That is, we want argmax_T p(T | W). Problem: sparse data.

## Statistical POS Tagging (ctd)

p(T|W) = p(T,W) / p(W) = p(W|T) p(T) / p(W)

argmax_T p(T|W) = argmax_T p(W|T) p(T) / p(W) = argmax_T p(W|T) p(T)

(The denominator p(W) does not depend on T, so it can be dropped from the argmax.)

## Statistical POS Tagging (ctd)

p(T) = p(t₁, t₂, …, tₙ)
= p(tₙ | t₁, …, tₙ₋₁) p(t₁, …, tₙ₋₁)
= p(tₙ | t₁, …, tₙ₋₁) p(tₙ₋₁ | t₁, …, tₙ₋₂) p(t₁, …, tₙ₋₂)
= ∏ᵢ p(tᵢ | t₁, …, tᵢ₋₁)
≈ ∏ᵢ p(tᵢ | tᵢ₋₂, tᵢ₋₁)   ← trigram (n-gram) approximation

## Statistical POS Tagging (ctd)

p(W|T) = p(w₁, w₂, …, wₙ | t₁, t₂, …, tₙ)
= ∏ᵢ p(wᵢ | w₁, …, wᵢ₋₁, t₁, t₂, …, tₙ)
≈ ∏ᵢ p(wᵢ | tᵢ)

## Statistical POS Tagging (ctd)

argmax_T p(T|W) = argmax_T p(W|T) p(T) ≈ argmax_T ∏ᵢ p(wᵢ | tᵢ) p(tᵢ | tᵢ₋₂, tᵢ₋₁)

- It is relatively easy to get data for parameter estimation (next slide)
- But: we need smoothing for unseen words
- The argmax is easy to determine (Viterbi algorithm, in time linear in sentence length)
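The Viterbi search mentioned above can be sketched as follows. This is a minimal illustration using a bigram transition model p(tᵢ | tᵢ₋₁) rather than the trigram model from the slides, and all probability tables below are made-up toy numbers, not estimates from a real corpus:

```python
# Viterbi sketch for argmax_T p(W|T) p(T), simplified to bigram transitions.
# Toy probabilities only; a real tagger would estimate and smooth these.

def viterbi(words, tags, p_start, p_trans, p_emit):
    """Return the most likely tag sequence for words (linear in length)."""
    # best[i][t] = (score of best path tagging words[:i+1] ending in tag t,
    #               backpointer to the previous tag on that path)
    best = [{t: (p_start.get(t, 0.0) * p_emit.get((words[0], t), 0.0), None)
             for t in tags}]
    for i in range(1, len(words)):
        col = {}
        for t in tags:
            score, prev = max(
                (best[i - 1][u][0] * p_trans.get((u, t), 0.0)
                 * p_emit.get((words[i], t), 0.0), u)
                for u in tags)
            col[t] = (score, prev)
        best.append(col)
    last = max(tags, key=lambda t: best[-1][t][0])   # best final tag
    path = [last]
    for i in range(len(words) - 1, 0, -1):           # follow backpointers
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

tags = ["Det", "N", "V"]
p_start = {"Det": 0.8, "N": 0.1, "V": 0.1}
p_trans = {("Det", "N"): 0.9, ("N", "V"): 0.8, ("V", "Det"): 0.7}
p_emit = {("the", "Det"): 0.9, ("a", "Det"): 0.8, ("boy", "N"): 0.5,
          ("girl", "N"): 0.4, ("likes", "V"): 0.6}

print(viterbi("the boy likes a girl".split(), tags, p_start, p_trans, p_emit))
# → ['Det', 'N', 'V', 'Det', 'N']
```

Extending this to the trigram model of the slides means indexing the dynamic-programming table by pairs of tags rather than single tags; the structure of the search is unchanged.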

## Probability Estimation for Trigram POS Tagging

Maximum-likelihood estimation:

- p′(wᵢ | tᵢ) = c(wᵢ, tᵢ) / c(tᵢ)
- p′(tᵢ | tᵢ₋₂, tᵢ₋₁) = c(tᵢ₋₂, tᵢ₋₁, tᵢ) / c(tᵢ₋₂, tᵢ₋₁)
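The two maximum-likelihood estimates above can be sketched directly with `collections.Counter`; the tiny tagged corpus below is invented for illustration (two sentences, with `<s>` padding for sentence starts):

```python
from collections import Counter

# Toy tagged corpus (made-up data, just to exercise the count formulas).
corpus = [[("the", "Det"), ("boy", "N"), ("likes", "V"), ("a", "Det"), ("girl", "N")],
          [("a", "Det"), ("girl", "N"), ("sees", "V"), ("the", "Det"), ("boy", "N")]]

word_tag = Counter()   # c(w_i, t_i)
tag = Counter()        # c(t_i)
trigram = Counter()    # c(t_{i-2}, t_{i-1}, t_i)
bigram = Counter()     # c(t_{i-2}, t_{i-1})

for sent in corpus:
    tags = ["<s>", "<s>"] + [t for _, t in sent]   # pad for sentence start
    for w, t in sent:
        word_tag[w, t] += 1
        tag[t] += 1
    for a, b, c in zip(tags, tags[1:], tags[2:]):
        trigram[a, b, c] += 1
        bigram[a, b] += 1

def p_emit(w, t):
    """p'(w_i | t_i) = c(w_i, t_i) / c(t_i)"""
    return word_tag[w, t] / tag[t]

def p_trans(t, prev2, prev1):
    """p'(t_i | t_{i-2}, t_{i-1}) = c(t_{i-2}, t_{i-1}, t_i) / c(t_{i-2}, t_{i-1})"""
    return trigram[prev2, prev1, t] / bigram[prev2, prev1]

print(p_emit("boy", "N"))  # → 0.5  ("boy" is 2 of the 4 N tokens)
```

As the slides note, these raw estimates assign zero probability to unseen words and tag trigrams, which is why smoothing is needed in practice.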

## Statistical POS Tagging (ctd)

This method is common to many tasks in speech & NLP: the “noisy channel model”, a hidden Markov model.

## Back to Syntax

(((the/Det) boy/N) likes/V ((a/Det) girl/N))

(Slide figure: the corresponding phrase-structure tree, with nonterminal nodes S, NP, and DetP over the words.)

- nonterminal symbols = constituents
- terminal symbols = words
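As an aside, the bracketed structure above can be written as a nested Python structure, with `leaves` recovering the terminal yield (the words). The encoding here (label-first tuples, `(word, POS)` leaves) is just one convenient choice, not a standard:

```python
# The bracketed phrase-structure from the slide as nested tuples.
# Interior nodes: (label, child, child, ...); leaves: (word, POS).
tree = ("S",
        ("NP", ("DetP", ("the", "Det")), ("boy", "N")),
        ("likes", "V"),
        ("NP", ("DetP", ("a", "Det")), ("girl", "N")))

def leaves(node):
    """Return the terminal symbols (the words) left to right."""
    if len(node) == 2 and isinstance(node[1], str):
        return [node[0]]                       # (word, POS) leaf
    return [w for child in node[1:] for w in leaves(child)]

print(" ".join(leaves(tree)))  # → the boy likes a girl
```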

## Phrase Structure and Dependency Structure

(Slide figure: the phrase-structure tree for "the boy likes a girl" shown next to its dependency structure: likes/V governs boy/N and girl/N, and each noun governs its determiner, the/Det or a/Det.)

## Types of Dependency

(Slide figure: a dependency tree over likes/V, boy/N, girl/N, the/Det, a/Det, small/Adj, very/Adv, and sometimes/Adv, with arcs labeled Subj, Obj, Adj(unct), and Fw (function word).)

## Grammatical Relations

Types of relations between words:

- Arguments: subject, object, indirect object, prepositional object
- Adjuncts: temporal, locative, causal, manner, …
- Function words

## Subcategorization

The list of arguments of a word (typically a verb), with features about their realization (POS, perhaps case, verb form, etc.), in canonical order (Subject - Object - IndObj).

Examples:

- like: N-N, N-V(to-inf)
- see: N, N-N, N-N-V(inf)

Note: J&M discuss subcategorization only within the VP.
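A subcategorization lexicon like the one above can be represented as a simple dictionary from a verb to its list of frames; the entries below just transcribe the slide's two examples, and the frame notation is kept as-is:

```python
# Subcategorization frames, in the slide's notation: each frame lists the
# argument categories in canonical order (Subject - Object - IndObj).
SUBCAT = {
    "like": [["N", "N"], ["N", "V(to-inf)"]],          # "I like her", "I like to swim"
    "see":  [["N"], ["N", "N"], ["N", "N", "V(inf)"]],
}

def frames(verb):
    """Return the known subcategorization frames for a verb (empty if unlisted)."""
    return SUBCAT.get(verb, [])

print(frames("like"))  # → [['N', 'N'], ['N', 'V(to-inf)']]
```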

## Where is the VP?

(Slide figure: two phrase-structure trees for "the boy likes a girl" — one flat, with S directly dominating the subject NP, the verb, and the object NP; and one in which S dominates the subject NP and a VP node grouping the verb with the object NP.)

## Where is the VP?

- The existence of VP is a linguistic (empirical) claim, not a methodological claim
- Semantic evidence???
- Syntactic evidence:
  - VP-fronting (and quickly clean the carpet he did!)
  - VP-ellipsis (He cleaned the carpets quickly, and so did she)
  - Adjuncts can appear before and after the VP, but not inside it (He often eats beans, *He eats often beans)
- Note: in all-right-branching structures, the issue is different again

## Penn Treebank, Again

- A syntactically annotated corpus (phrase structure)
- The PTB is not naturally occurring data! It represents a particular linguistic theory (but a fairly “vanilla” one)
- Particularities:
  - Very indirect representation of grammatical relations (need for head percolation tables)
  - Completely flat structure inside NP (brown bag lunch, pink-and-yellow child seat)
  - Flat Ss, flat VPs

## Context-Free Grammars

- Defined in formal language theory (computer science)
- Components: terminals, nonterminals, a start symbol, rules
- A string-rewriting system: start with the start symbol, rewrite using the rules, and stop when only terminals are left

## CFG: Example

Rules:

- S → NP VP
- VP → V NP
- NP → Det N | AdjP NP
- AdjP → Adj | Adv AdjP
- N → boy | girl
- V → sees | likes
- Adj → big | small
- Adv → very
- Det → a | the

Example sentence: the very small boy likes a girl
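A left-most derivation with these rules can be sketched as follows. The rules are transcribed from the slide; `choices` selects which alternative of each rule to apply at each step, and the particular sequence below derives "the boy likes a girl":

```python
# The example grammar: nonterminal -> list of alternative right-hand sides.
RULES = {
    "S":    [["NP", "VP"]],
    "VP":   [["V", "NP"]],
    "NP":   [["Det", "N"], ["AdjP", "NP"]],
    "AdjP": [["Adj"], ["Adv", "AdjP"]],
    "N":    [["boy"], ["girl"]],
    "V":    [["sees"], ["likes"]],
    "Adj":  [["big"], ["small"]],
    "Adv":  [["very"]],
    "Det":  [["a"], ["the"]],
}

def leftmost_derivation(choices):
    """Rewrite the left-most nonterminal once per choice; return all steps."""
    sentential = ["S"]
    steps = [list(sentential)]
    for choice in choices:
        i = next(i for i, s in enumerate(sentential) if s in RULES)
        sentential[i:i + 1] = RULES[sentential[i]][choice]
        steps.append(list(sentential))
    return steps

# S -> NP VP -> Det N VP -> the N VP -> the boy VP -> ... -> the boy likes a girl
steps = leftmost_derivation([0, 0, 1, 0, 0, 1, 0, 0, 1])
for s in steps:
    print(" ".join(s))
```

The sequence of `steps` is exactly the derivation history that, as the next slide notes, can also be represented as a phrase-structure tree.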

## Derivations in CFGs

A string-rewriting system: we derive a string (the derived structure), but the derivation history is represented by a phrase-structure tree (the derivation structure)!

## Grammar Equivalence and Normal Form

- Different grammars can generate the same set of strings (weak equivalence)
- Different grammars can assign the same set of derivation trees (strong equivalence)

## Nobody Uses CFGs Only (Except Intro NLP Courses)

- All major syntactic theories (Chomsky, LFG, HPSG, TAG-based theories) represent both phrase structure and dependency, in one way or another
- All successful parsers currently use statistics about phrase structure and about dependency

## Massive Ambiguity of Syntax

For a typical sentence and a wide-coverage grammar, there can be thousands of derivations!

Example:

- The large head master told the man that he gave money and shares in a letter on Wednesday

## Some Syntactic Constructions: Wh-Movement

## Control

## Raising


