# Probabilistic Parsing (Ling 571, Fei Xia), Week 5: 10/25-10/27/05


Outline Lexicalized CFG (recap) Hw5 and Project 2 Parsing evaluation measures: ParseVal Collins' parser TAG Parsing summary

Lexicalized CFG recap

Important equations

Lexicalized CFG Lexicalized rules suffer from a sparse data problem, so the rule probability is decomposed: –First generate the head –Then generate the unlexicalized rule
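
The slide's equations did not survive the transcript; the following is a sketch of one common head-first factorization for lexicalized PCFGs (in the style of Charniak's parser). The notation X(h) for a nonterminal X headed by word h is my assumption, not the slide's.

```latex
% Sketch of a head-first lexicalized PCFG factorization (notation assumed).
% Each constituent's head word is generated before the constituent is
% expanded; the head child Y_H inherits the parent's head word h.
P\bigl(X(h) \to Y_1(h_1)\cdots Y_n(h_n)\bigr)
  \approx \underbrace{P(X \to Y_1\cdots Y_n \mid X, h)}_{\text{unlexicalized rule}}
  \times \prod_{i \neq H} \underbrace{P(h_i \mid Y_i, X, h)}_{\text{dependent head words}}
```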

Lexicalized models

An example he likes her

Estimate parameters

Building a statistical tool Design a model: –Objective function: generative model vs. discriminative model –Decomposition: independence assumptions –The types of parameters and the parameter size Training: estimate the model parameters –Supervised vs. unsupervised –Smoothing methods Decoding: find the best structure under the model (see the sketch below)
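
In symbols (a sketch I am adding; the slide gives no formulas): a generative model is trained to fit the joint distribution, a discriminative model the conditional one, and decoding searches for the highest-scoring tree.

```latex
% Generative vs. discriminative objectives, and decoding (sketch).
\text{generative:}\quad \hat\theta = \arg\max_\theta \prod_i P_\theta(T_i, S_i)
\qquad
\text{discriminative:}\quad \hat\theta = \arg\max_\theta \prod_i P_\theta(T_i \mid S_i)
% Decoding; for a generative model P(S) is constant, so the argmaxes agree.
T^* = \arg\max_T P(T \mid S) = \arg\max_T P(T, S)
```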

Team Project 1 (Hw5) Form a team: programming language, schedule, expertise, etc. Understand the lexicalized model. Design the training algorithm. Work out the decoding (parsing) algorithm: augment the CYK algorithm (a plain-PCFG version is sketched below). Illustrate the algorithms with a real example.
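
As a starting point, here is a minimal CYK decoder for an unlexicalized PCFG in Chomsky Normal Form, run on the "he likes her" example; the lexicalized model in Hw5 needs a richer chart. The toy grammar, its probabilities, and the data structures are my own illustration, not course-provided code.

```python
import math
from collections import defaultdict

# Toy CNF PCFG (assumed format): binary rules indexed by their child pair,
# lexical rules indexed by word; the probabilities are illustrative.
binary = {("NP", "VP"): [("S", 1.0)],
          ("V", "NP"):  [("VP", 1.0)]}
lexical = {"he":    [("NP", 0.5)],
           "her":   [("NP", 0.5)],
           "likes": [("V", 1.0)]}

def cyk(words):
    n = len(words)
    best = defaultdict(dict)            # best[(i, j)][A] = (log prob, backpointer)
    for i, w in enumerate(words):       # fill in the diagonal with lexical rules
        for A, p in lexical.get(w, []):
            best[(i, i + 1)][A] = (math.log(p), w)
    for span in range(2, n + 1):        # widen spans bottom-up
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):   # try every split point
                for B, (pb, _) in best[(i, k)].items():
                    for C, (pc, _) in best[(k, j)].items():
                        for A, p in binary.get((B, C), []):
                            score = math.log(p) + pb + pc
                            if A not in best[(i, j)] or score > best[(i, j)][A][0]:
                                best[(i, j)][A] = (score, (k, B, C))
    return best

def tree(best, A, i, j):                # follow backpointers to print the parse
    back = best[(i, j)][A][1]
    if isinstance(back, str):           # lexical entry
        return f"({A} {back})"
    k, B, C = back
    return f"({A} {tree(best, B, i, k)} {tree(best, C, k, j)})"

chart = cyk("he likes her".split())
print(tree(chart, "S", 0, 3))           # (S (NP he) (VP (V likes) (NP her)))
```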

Team Project 2 Task: parse real data with a real grammar extracted from a treebank. Parser: PCFG or lexicalized PCFG. Training data: English Penn Treebank, sections 02-21. Development data: section 00.

Team Project 2 (cont) Hw6: extract a PCFG from the treebank. Hw7: make sure your parser works given a real grammar and real sentences; measure parsing performance. Hw8: improve parsing results. Hw10: write a report and give a presentation.

Parsing evaluation measures

Evaluation of parsers: ParseVal Labeled recall: # of correct constituents in the parser output / # of constituents in the gold standard. Labeled precision: # of correct constituents in the parser output / # of constituents in the parser output. Labeled F-measure: the harmonic mean of labeled precision and recall. Complete match: % of sents where recall and precision are both 100%. Average crossing: # of crossing brackets per sent. No crossing: % of sents which have no crossing brackets. (A constituent is correct if a constituent with the same label and span occurs in the gold standard.)
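
In symbols (the standard ParseVal definitions; the letter names C, G, S are mine): with C correct constituents, G constituents in the gold standard, and S constituents in the system output,

```latex
% Standard ParseVal measures (notation assumed).
LR = \frac{C}{G} \qquad
LP = \frac{C}{S} \qquad
F = \frac{2 \cdot LP \cdot LR}{LP + LR}
```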

An example Gold standard: (VP (V saw) (NP (Det the) (N man)) (PP (P with) (NP (Det a) (N telescope)))) Parser output: (VP (V saw) (NP (NP (Det the) (N man)) (PP (P with) (NP (Det a) (N telescope)))))

ParseVal measures Gold standard: (VP, 1, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6) System output: (VP, 1, 6), (NP, 2, 6), (NP, 2, 3), (PP, 4, 6), (NP, 5, 6) Recall=4/4, Prec=4/5, crossing=0
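
A small sketch of how these numbers can be computed from (label, start, end) lists; the function and the crossing test are my own illustration (EVALB's actual bookkeeping differs in details). Spans are inclusive word indices, as on the slide.

```python
# Sketch: ParseVal-style scoring over (label, start, end) constituents.
# Spans are inclusive word indices, as in the slide's example.

def parseval(gold, system):
    gold_set, sys_set = set(gold), set(system)
    correct = len(gold_set & sys_set)            # same label AND same span
    recall = correct / len(gold_set)
    precision = correct / len(sys_set)
    f1 = 2 * precision * recall / (precision + recall)

    def crosses(a, b):
        # Overlap without containment (labels are ignored for crossing).
        return a[0] < b[0] <= a[1] < b[1] or b[0] < a[0] <= b[1] < a[1]

    crossing = sum(1 for (_, i, j) in sys_set
                   if any(crosses((i, j), (gi, gj)) for (_, gi, gj) in gold_set))
    return recall, precision, f1, crossing

gold = [("VP", 1, 6), ("NP", 2, 3), ("PP", 4, 6), ("NP", 5, 6)]
system = [("VP", 1, 6), ("NP", 2, 6), ("NP", 2, 3), ("PP", 4, 6), ("NP", 5, 6)]
print(parseval(gold, system))   # (1.0, 0.8, 0.888..., 0)
```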

A different annotation Gold standard: (VP (V saw) (NP (Det the) (N' (N man))) (PP (P with) (NP (Det a) (N' (N telescope))))) Parser output: (VP (V saw) (NP (Det the) (N' (N man) (PP (P with) (NP (Det a) (N' (N telescope)))))))

ParseVal measures (cont) Gold standard: (VP, 1, 6), (NP, 2, 3), (N', 3, 3), (PP, 4, 6), (NP, 5, 6), (N', 6, 6) System output: (VP, 1, 6), (NP, 2, 6), (N', 3, 6), (PP, 4, 6), (NP, 5, 6), (N', 6, 6) Recall=4/6, Prec=4/6, crossing=1

EVALB A tool that calculates the ParseVal measures. To run it: evalb -p parameter_file gold_file system_output. A copy is available in my dropbox. You will need it for Team Project 2.

Summary of parsing evaluation measures ParseVal is the most widely used: F-measure is the most important measure. The results depend on the annotation style. EVALB is a tool that calculates the ParseVal measures. Other measures are used too: e.g., accuracy of dependency links.

History-based models

History-based approaches A history-based approach maps (T, S) into a decision sequence $d_1, \ldots, d_n$. The probability of tree T for sentence S is then $P(T, S) = \prod_{i=1}^{n} P(d_i \mid d_1, \ldots, d_{i-1})$.

History-based models (cont) PCFGs can be viewed as history-based models. There are other history-based models: –Magerman's parser (1995) –Collins' parsers (1996, 1997, …) –Charniak's parsers (1996, 1997, …) –Ratnaparkhi's parser (1997)

Collins’ models Model 1: Generative model of (Collins, 1996) Model 2: Add complement/adjunct distinction Model 3: Add wh-movement

Model 1 First generate the head constituent label, then generate the left and right dependents.
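
A sketch of Model 1's decomposition, following the notation of Collins (1997) but omitting the distance features he also conditions on. For a rule $P(h) \to L_n(l_n) \cdots L_1(l_1)\, H(h)\, R_1(r_1) \cdots R_m(r_m)$:

```latex
% Collins Model 1 (sketch; distance conditioning omitted).
% L_{n+1} and R_{m+1} are STOP symbols that terminate each side.
P(\mathit{RHS} \mid P, h)
  = P_h(H \mid P, h)
  \times \prod_{i=1}^{n+1} P_l\bigl(L_i(l_i) \mid P, H, h\bigr)
  \times \prod_{j=1}^{m+1} P_r\bigl(R_j(r_j) \mid P, H, h\bigr)
```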

Model 1 (cont)

An example Sentence: Last week Marks bought Brooks.

Model 2 Generate a head label H. Choose left and right subcat frames. Generate left and right arguments. Generate left and right modifiers.
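
The extra step in symbols (again a sketch after Collins (1997), with distances omitted): the left and right subcat frames LC and RC are multisets of required arguments, generated once after the head and then consumed as arguments are generated.

```latex
% Collins Model 2 (sketch): subcat frames generated once, then consumed.
% LC_i, RC_j: the part of the subcat frame not yet fulfilled on that side.
P_h(H \mid P, h)\;
P_{lc}(LC \mid P, H, h)\;
P_{rc}(RC \mid P, H, h)
\times \prod_i P_l\bigl(L_i(l_i) \mid P, H, h, LC_i\bigr)
\times \prod_j P_r\bigl(R_j(r_j) \mid P, H, h, RC_j\bigr)
```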

An example

Model 3 Add traces and wh-movement. Given that the LHS of a rule has a gap, there are three ways to pass down the gap: –Head: S(+gap) → NP VP(+gap) –Left: S(+gap) → NP(+gap) VP –Right: SBAR(that)(+gap) → WHNP(that) S(+gap)

Parsing results

| | LR | LP |
| --- | --- | --- |
| Model 1 | 87.4% | 88.1% |
| Model 2 | 88.1% | 88.6% |
| Model 3 | 88.1% | 88.6% |

TAG TAG basics Extensions of TAG: –Lexicalized TAG (LTAG) –Synchronous TAG (STAG) –Multi-component TAG (MCTAG) –…

TAG basics A tree-rewriting formalism (Joshi et al., 1975). It can generate mildly context-sensitive languages. The primitive elements of a TAG are elementary trees. Elementary trees are combined by two operations: substitution and adjoining (a code sketch of both operations follows the derivation examples below). TAG has been used in –parsing, semantics, discourse, etc. –machine translation, summarization, generation, etc.

Two types of elementary trees Initial tree (anchored by "draft"): (S NP↓ (VP (V draft) NP↓)). Auxiliary tree (anchored by "still"): (VP (ADVP (ADV still)) VP*).

Substitution operation

They draft policies

They still draft policies
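
The two operations are easy to see in code. Below is a minimal sketch, assuming trees are nested Python lists [label, child, ...], with substitution sites marked by a trailing "!" and foot nodes by a trailing "*"; this encoding and the helper names are mine, not the course's format. It rebuilds the derived tree for "they still draft policies" from the elementary trees above.

```python
import copy

# Minimal TAG operations on trees encoded as nested lists: [label, child, ...].
# Leaves are strings; "NP!" marks a substitution site, "VP*" a foot node.

def substitute(tree, initial):
    """Replace the first substitution site matching initial's root label."""
    target = initial[0] + "!"
    def walk(node):
        for i, child in enumerate(node):
            if child == target:                  # leaf substitution site
                node[i] = copy.deepcopy(initial)
                return True
            if isinstance(child, list) and walk(child):
                return True
        return False
    tree = copy.deepcopy(tree)
    walk(tree)
    return tree

def adjoin(tree, aux):
    """Adjoin auxiliary tree aux (root X, foot leaf 'X*') at the first X node."""
    root, foot = aux[0], aux[0] + "*"
    def plug(node, subtree):                     # splice subtree in at the foot
        for i, child in enumerate(node):
            if child == foot:
                node[i] = subtree
                return True
            if isinstance(child, list) and plug(child, subtree):
                return True
        return False
    def walk(node):
        for i, child in enumerate(node):
            if isinstance(child, list):
                if child[0] == root:             # adjunction site found
                    new_aux = copy.deepcopy(aux)
                    plug(new_aux, child)         # old subtree hangs off the foot
                    node[i] = new_aux
                    return True
                if walk(child):
                    return True
        return False
    tree = copy.deepcopy(tree)
    walk(tree)
    return tree

draft = ["S", "NP!", ["VP", ["V", "draft"], "NP!"]]
still = ["VP", ["ADVP", ["ADV", "still"]], "VP*"]
t = substitute(draft, ["NP", ["PN", "they"]])    # they draft NP!
t = substitute(t, ["NP", ["N", "policies"]])     # they draft policies
t = adjoin(t, still)                             # they still draft policies
print(t)
```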

Derivation tree [figure: the elementary trees, the derived tree, and the corresponding derivation tree for the example]

Derived tree vs. derivation tree The mapping between the two is not one-to-one. Finding the best derivation is not the same as finding the best derived tree.

Wh-movement Example: "What do they draft?" [figure: the elementary trees, including an auxiliary tree anchored by "do", and the derived tree with a co-indexed trace]

Long-distance wh-movement Example: "What does John think they draft?" [figure: the same derivation with an auxiliary tree anchored by "think" adjoined in, stretching the wh-dependency]

Example: "Who did you have dinner with?" [figure: the elementary trees, including an auxiliary tree for the stranded preposition "with", and the derived tree]

TAG extensions Lexicalized TAG (LTAG) Synchronous TAG (STAG) Multi-component TAG (MCTAG) …

STAG The primitive elements in STAG are elementary tree pairs. It has been used for machine translation (MT).

Summary of TAG A formalism beyond CFG. Primitive elements are trees, not rules. Extended domain of locality. Two operations: substitution and adjoining. Parsing algorithms exist (CYK-style, O(n^6) time). Statistical parsers for TAG. Algorithms for extracting TAGs from treebanks.

Parsing summary

Types of parsers Phrase structure vs. dependency tree. Statistical vs. rule-based. Grammar-based or not. Supervised vs. unsupervised. Our focus:  phrase structure  mainly statistical  mainly grammar-based: CFG, TAG  supervised

Grammars Chomsky hierarchy: –Unrestricted grammar (type 0) –Context-sensitive grammar (type 1) –Context-free grammar (type 2) –Regular grammar (type 3)  Human languages are beyond context-free. Other formalisms: –HPSG, LFG –TAG –Dependency grammars

Parsing algorithms for CFG Top-down. Bottom-up. Top-down with bottom-up filtering. Earley algorithm. CYK algorithm: –requires the CFG to be in CNF –can be augmented to handle PCFG, lexicalized CFG, etc.

Extensions of CFG PCFG: find the most likely parse tree. Lexicalized CFG: –uses weaker independence assumptions –accounts for certain types of lexical and structural dependencies

Beyond CFG History-based models –Collins' parsers TAG –Tree-rewriting formalism –Mildly context-sensitive grammar –Many extensions: LTAG, STAG, …

Statistical approach Modeling: –choose the objective function –decompose the function: common equations (joint, conditional, marginal probabilities), independence assumptions Training: –supervised vs. unsupervised –smoothing Decoding: –dynamic programming –pruning

Evaluation of parsers Accuracy: ParseVal. Robustness. Resources needed. Efficiency. Richness of the output.

Other things Converting into CNF: –CFG –PCFG –Lexicalized CFG (a binarization sketch follows) Treebank annotation –Tagset: syntactic labels, POS tags, function tags, empty categories –Format: indentation, brackets
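
To make the CNF-conversion step concrete, here is a minimal sketch of the binarization part for a PCFG: long right-hand sides are split using intermediate symbols that carry probability 1.0, so every tree's total probability is preserved. The rule format and the naming scheme for intermediate symbols are my assumptions; removing unary and empty rules is a separate step not shown here.

```python
# Sketch: binarize PCFG rules with more than two RHS symbols.
# rules: list of (lhs, rhs_tuple, prob); format is an assumption.

def binarize(rules):
    """Return an equivalent grammar with at most two symbols per RHS."""
    out = []
    for lhs, rhs, p in rules:
        while len(rhs) > 2:
            # Intermediate symbol for the tail of the RHS; probability 1.0
            # keeps each tree's total probability unchanged.
            mid = f"{lhs}|{'+'.join(rhs[1:])}"
            out.append((lhs, (rhs[0], mid), p))
            lhs, rhs, p = mid, rhs[1:], 1.0
        out.append((lhs, rhs, p))
    return out

print(binarize([("VP", ("V", "NP", "PP"), 0.2)]))
# [('VP', ('V', 'VP|NP+PP'), 0.2), ('VP|NP+PP', ('NP', 'PP'), 1.0)]
```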