Building a statistical tool Design a model: –Objective function: generative model vs. discriminative model –Decomposition: independence assumption –The types of parameters and parameter size Training: estimate model parameters –Supervised vs. unsupervised –Smoothing methods Decoding:
Team Project 1 (Hw5) Form a team: programming language, schedule, expertise, etc. Understand the lexicalized model Design the training algorithm Work out the decoding (parsing) algorithm: augment the CYK algorithm. Illustrate the algorithms with a real example.
Team Project 2 Task: parse real data with a real grammar extracted from a treebank. Parser: PCFG or lexicalized PCFG Training data: English Penn Treebank Sections 02-21 Development data: Section 00
Team Project 2 (cont) Hw6: extract a PCFG from the treebank Hw7: make sure your parser works given a real grammar and real sentences; measure parsing performance Hw8: improve parsing results Hw10: write a report and give a presentation
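Extracting a PCFG from a treebank is usually done by relative-frequency estimation, P(A → β) = count(A → β) / count(A). The sketch below illustrates that idea under stated assumptions: the function names (`read_tree`, `count_rules`, `extract_pcfg`) are hypothetical, the input is a single toy tree rather than the Penn Treebank, and the preprocessing real treebank trees need (for example removing empty elements and function tags) is omitted.

```python
from collections import Counter, defaultdict

def read_tree(s):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                          # skip "("
        tree = [tokens[pos]]              # node label
        pos += 1
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                tree.append(node())       # a subtree
            else:
                tree.append(tokens[pos])  # a word
                pos += 1
        pos += 1                          # skip ")"
        return tree

    return node()

def count_rules(tree, counts):
    """Add every rule used in `tree` to counts[lhs][rhs]."""
    lhs, children = tree[0], tree[1:]
    rhs = tuple(c[0] if isinstance(c, list) else c for c in children)
    counts[lhs][rhs] += 1
    for c in children:
        if isinstance(c, list):
            count_rules(c, counts)

def extract_pcfg(trees):
    """Relative-frequency estimate: P(lhs -> rhs) = count(rule) / count(lhs)."""
    counts = defaultdict(Counter)
    for t in trees:
        count_rules(read_tree(t), counts)
    return {
        (lhs, rhs): n / sum(rhs_counts.values())
        for lhs, rhs_counts in counts.items()
        for rhs, n in rhs_counts.items()
    }

# Toy "treebank" with a single tree.
toy = ["(VP (V saw) (NP (NP (Det the) (N man)) (PP (P with) (NP (Det a) (N telescope)))))"]
pcfg = extract_pcfg(toy)
for (lhs, rhs), p in sorted(pcfg.items()):
    print(lhs, "->", " ".join(rhs), round(p, 3))
```

In this toy tree NP is expanded three times, twice as Det N and once as NP PP, so the two NP rules receive probabilities 2/3 and 1/3. No smoothing is applied, so unseen rules get probability zero.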
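Evaluation of parsers: ParseVal Labeled recall: # of correct constituents in the parser output / # of constituents in the gold standard Labeled precision: # of correct constituents in the parser output / # of constituents in the parser output Labeled F-measure: 2 × precision × recall / (precision + recall) A constituent is correct if its label and span match a gold-standard constituent. Complete match: % of sentences where recall and precision are both 100% Average crossing: # of crossing brackets per sentence No crossing: % of sentences that have no crossing brackets.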
An example Gold standard: (VP (V saw) (NP (Det the) (N man)) (PP (P with) (NP (Det a) (N telescope)))) Parser output: (VP (V saw) (NP (NP (Det the) (N man)) (PP (P with) (NP (Det a) (N telescope)))))
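The labeled measures for this example can be worked out with a short script. This is a minimal sketch, not EVALB itself: the function names are hypothetical, and it assumes that preterminal (part-of-speech) nodes are not counted as constituents and that a constituent matches when its label, start, and end all agree.

```python
from collections import Counter

def labeled_spans(bracketed):
    """Return a Counter of (label, start, end) for all non-preterminal nodes."""
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    spans = Counter()
    pos = 0         # index into tokens
    word_index = 0  # number of words consumed so far

    def parse():
        nonlocal pos, word_index
        label = tokens[pos + 1]
        pos += 2                      # skip "(" and the label
        start = word_index
        if tokens[pos] != "(":        # preterminal: (TAG word)
            word_index += 1
            pos += 1
            is_preterminal = True
        else:
            is_preterminal = False
            while tokens[pos] == "(":
                parse()
        pos += 1                      # skip ")"
        if not is_preterminal:
            spans[(label, start, word_index)] += 1

    parse()
    return spans

def parseval(gold, system):
    """Labeled precision, recall, and F-measure for one sentence."""
    g, s = labeled_spans(gold), labeled_spans(system)
    matched = sum((g & s).values())   # multiset intersection
    precision = matched / sum(s.values())
    recall = matched / sum(g.values())
    f = 2 * precision * recall / (precision + recall) if matched else 0.0
    return precision, recall, f

gold = "(VP (V saw) (NP (Det the) (N man)) (PP (P with) (NP (Det a) (N telescope))))"
system = "(VP (V saw) (NP (NP (Det the) (N man)) (PP (P with) (NP (Det a) (N telescope)))))"
precision, recall, f = parseval(gold, system)
print(precision, recall, f)
```

Under these assumptions the gold tree has 4 constituents and the parser output has 5, of which 4 match, so labeled recall is 4/4 and labeled precision is 4/5. The extra NP spanning "the man with a telescope" nests with every gold bracket, so there are no crossing brackets.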
EVALB A tool that calculates ParseVal measures To run it: evalb -p parameter_file gold_file system_output A copy is available in my dropbox You will need it for Team Project 2
Summary of parsing evaluation measures ParseVal is widely used: F-measure is the most important The results depend on annotation style EVALB is a tool that calculates ParseVal measures Other measures are used too: e.g., accuracy of dependency links
History-based approaches Map (T, S) into a decision sequence $d_1, \dots, d_n$ Probability of tree T for sentence S is: $P(T, S) = \prod_{i=1}^{n} P(d_i \mid d_1, \dots, d_{i-1})$
History-based models (cont) PCFGs can be viewed as a history-based model There are other history-based models –Magerman’s parser (1995) –Collins’ parsers (1996, 1997, …) –Charniak’s parsers (1996, 1997, …) –Ratnaparkhi’s parser (1997)
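To make the PCFG case concrete: if each decision is a rule expansion, taken in top-down, left-to-right order, and the history is reduced to the nonterminal being expanded, the probability of a tree is simply the product of its rule probabilities. The sketch below illustrates this; the grammar, its probabilities, and the example tree are made up for illustration.

```python
# Toy rule probabilities (made up for illustration).
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("they",)): 0.4,
    ("NP", ("plans",)): 0.6,
    ("VP", ("V", "NP")): 1.0,
    ("V", ("draft",)): 1.0,
}

def decisions(tree):
    """Yield the rules of `tree` in top-down, left-to-right order.

    A tree is a tuple (label, child, ...); a word is a plain string.
    """
    label, children = tree[0], tree[1:]
    yield (label, tuple(c[0] if isinstance(c, tuple) else c for c in children))
    for c in children:
        if isinstance(c, tuple):
            yield from decisions(c)

def tree_prob(tree):
    """Product over decisions of P(d_i | history).

    For a PCFG the only part of the history that matters is the
    left-hand-side nonterminal, so the lookup key is the rule itself.
    """
    p = 1.0
    for d in decisions(tree):
        p *= RULE_PROB[d]
    return p

tree = ("S", ("NP", "they"), ("VP", ("V", "draft"), ("NP", "plans")))
sequence = list(decisions(tree))
prob = tree_prob(tree)
print(sequence)
print(prob)
```

Richer history-based models keep the same product form but condition each decision on more of the history, such as head words or previously generated siblings.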
Collins’ models Model 1: Generative model of (Collins, 1996) Model 2: Add complement/adjunct distinction Model 3: Add wh-movement
Model 1 First generate the head constituent label Then generate left and right dependents
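A rough sketch of this decomposition for a single rule follows. It assumes the head label is generated with P_h(H | P, h), and each left or right dependent with P_l(L | P, H, h) or P_r(R | P, H, h), ending each side with a STOP symbol. The sketch is simplified: it generates dependent labels only, leaving out the dependents' head words and the distance measure, and every probability in it is made up for illustration.

```python
STOP = "STOP"

# Made-up probabilities for illustration only.
P_HEAD = {("V", "VP", "saw"): 0.9}         # P_h(H | P, h)
P_LEFT = {(STOP, "VP", "V", "saw"): 1.0}   # P_l(L | P, H, h)
P_RIGHT = {                                # P_r(R | P, H, h)
    ("NP", "VP", "V", "saw"): 0.5,
    ("PP", "VP", "V", "saw"): 0.25,
    (STOP, "VP", "V", "saw"): 0.25,
}

def rule_prob(parent, head_word, head, left, right):
    """Probability of parent(head_word) -> left ... head ... right.

    `left` and `right` list the dependents from the head outward.
    """
    p = P_HEAD[(head, parent, head_word)]
    for dep in list(left) + [STOP]:        # left dependents, then STOP
        p *= P_LEFT[(dep, parent, head, head_word)]
    for dep in list(right) + [STOP]:       # right dependents, then STOP
        p *= P_RIGHT[(dep, parent, head, head_word)]
    return p

# VP(saw) -> V(saw) NP PP: head V, no left dependents, right dependents NP and PP.
p = rule_prob("VP", "saw", "V", left=[], right=["NP", "PP"])
print(p)
```

Because each dependent is generated separately, the model can assign probability to rules it never saw as a whole in training, which is the main point of the decomposition.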
Model 3 Add Trace and wh-movement Given that the LHS of a rule has a gap, there are three ways to pass down the gap –Head: S(+gap) → NP VP(+gap) –Left: S(+gap) → NP(+gap) VP –Right: SBAR(that)(+gap) → WHNP(that) S(+gap)
Parsing results (LR = labeled recall, LP = labeled precision)
         LR      LP
Model 1  87.4%   88.1%
Model 2  88.1%   88.6%
Model 3  88.1%   88.6%
TAG TAG basics Extensions of TAG –Lexicalized TAG (LTAG) –Synchronous TAG (STAG) –Multi-component TAG (MCTAG) –…
TAG basics A tree-rewriting formalism (Joshi et al., 1975) It can generate mildly context-sensitive languages. The primitive elements of a TAG are elementary trees. Elementary trees are combined by two operations: substitution and adjoining. TAG has been used in –parsing, semantics, discourse, etc. –Machine translation, summarization, generation, etc.
Two types of elementary trees Initial tree: (S NP (VP (V draft) NP)) Auxiliary tree: (VP (ADVP (ADV still)) VP*)
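The two combining operations can be sketched on the initial tree for "draft" and the auxiliary tree for "still". This is a minimal illustration under its own conventions, not a full TAG implementation: trees are nested tuples, substitution sites are written with a trailing `!` (standing in for the usual down-arrow), the foot node carries a trailing `*`, and each operation applies at the first matching node in preorder. The function names and the initial tree for "they" are assumptions made for the example.

```python
def replace_first(tree, matches, build):
    """Rewrite the first node (preorder) satisfying `matches`.

    Returns (new_tree, replaced). Trees are tuples (label, child, ...);
    words are plain strings.
    """
    if isinstance(tree, str):
        return tree, False
    if matches(tree):
        return build(tree), True
    children, done = [], False
    for child in tree[1:]:
        if not done:
            child, done = replace_first(child, matches, build)
        children.append(child)
    return (tree[0], *children), done

def substitute(tree, initial):
    """Plug `initial` into the first substitution site with its root label."""
    site = (initial[0] + "!",)
    new, done = replace_first(tree, lambda n: n == site, lambda n: initial)
    if not done:
        raise ValueError("no substitution site for " + initial[0])
    return new

def adjoin(tree, aux):
    """Adjoin `aux` at the first node labeled like its root.

    The node's subtree is excised and re-attached at the foot node of `aux`.
    """
    label, foot = aux[0], (aux[0] + "*",)

    def build(node):
        return replace_first(aux, lambda n: n == foot, lambda n: node)[0]

    new, done = replace_first(tree, lambda n: n[0] == label, build)
    if not done:
        raise ValueError("no adjunction site for " + label)
    return new

draft = ("S", ("NP!",), ("VP", ("V", "draft"), ("NP!",)))   # initial tree
still = ("VP", ("ADVP", ("ADV", "still")), ("VP*",))        # auxiliary tree
they = ("NP", ("PN", "they"))                               # initial tree

derived = adjoin(substitute(draft, they), still)
print(derived)
```

Substitution only fills a leaf, while adjoining splices a tree into the middle of another one; that second operation is what takes TAG beyond context-free power. The object NP site is left unfilled here to keep the example short.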
Derivation tree [Figure: elementary trees, the derived tree, and the derivation tree]
Derived tree vs. derivation tree The mapping is not 1-to-1. Finding the best derivation is not the same as finding the best derived tree.
Wh-movement What do they draft ? [Figure: elementary trees for "draft", "do" (auxiliary tree with foot node S*), "they", and "what", and the derived tree]
Long-distance wh-movement What does John think they draft ? [Figure: auxiliary trees for "does" and "think" (foot node S*), the elementary tree for "draft", and the derived tree]
Who did you have dinner with? [Figure: elementary trees for "have", "who", and "with", and the derived tree]
TAG extensions Lexicalized TAG (LTAG) Synchronous TAG (STAG) Multi-component TAG (MCTAG) …
STAG The primitive elements in STAG are elementary tree pairs. Used for MT
Summary of TAG A formalism beyond CFG Primitive elements are trees, not rules Extended domain of locality Two operations: substitution and adjoining Parsing algorithm: Statistical parser for TAG Algorithms for extracting TAG from treebanks.
Types of parsers Phrase structure vs. dependency tree Statistical vs. rule-based Grammar-based or not Supervised vs. unsupervised Our focus: Phrase structure Mainly statistical Mainly Grammar-based: CFG, TAG Supervised
Grammars Chomsky hierarchy: –Unrestricted grammar (type 0) –Context-sensitive grammar –Context-free grammar –Regular grammar Human languages are beyond context-free Other formalisms –HPSG, LFG –TAG –Dependency grammars
Parsing algorithm for CFG Top-down Bottom-up Top-down with bottom-up filter Earley algorithm CYK algorithm –Requiring CFG to be in CNF –Can be augmented to deal with PCFG, lexicalized CFG, etc.
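One common way to augment CYK for a PCFG is to store, for each span and nonterminal, the probability of the best subtree together with a backpointer (a Viterbi-style chart). The sketch below assumes a grammar already in CNF. The toy grammar and all its probabilities are made up, and the binary rule VP → VP PP stands in for the ternary VP → V NP PP, which CNF does not allow.

```python
from collections import defaultdict

# Toy CNF grammar with made-up probabilities.
BINARY = {  # (B, C) -> list of (A, P(A -> B C))
    ("V", "NP"): [("VP", 0.7)],
    ("VP", "PP"): [("VP", 0.3)],
    ("Det", "N"): [("NP", 0.8)],
    ("NP", "PP"): [("NP", 0.2)],
    ("P", "NP"): [("PP", 1.0)],
}
LEXICAL = {  # word -> list of (A, P(A -> word))
    "saw": [("V", 1.0)],
    "the": [("Det", 0.5)], "a": [("Det", 0.5)],
    "man": [("N", 0.5)], "telescope": [("N", 0.5)],
    "with": [("P", 1.0)],
}

def cyk(words):
    """Fill best[(i, j)][A] = (best probability, backpointer)."""
    n = len(words)
    best = defaultdict(dict)
    for i, w in enumerate(words):                 # spans of width 1
        for a, p in LEXICAL.get(w, []):
            best[(i, i + 1)][a] = (p, w)
    for width in range(2, n + 1):                 # wider spans, bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):             # split point
                for b, (pb, _) in best[(i, k)].items():
                    for c, (pc, _) in best[(k, j)].items():
                        for a, pr in BINARY.get((b, c), []):
                            p = pr * pb * pc
                            if p > best[(i, j)].get(a, (0.0, None))[0]:
                                best[(i, j)][a] = (p, (k, b, c))
    return best

def build(best, i, j, a):
    """Follow backpointers to print the best tree for A over span (i, j)."""
    _, back = best[(i, j)][a]
    if isinstance(back, str):
        return f"({a} {back})"
    k, b, c = back
    return f"({a} {build(best, i, k, b)} {build(best, k, j, c)})"

words = "saw the man with a telescope".split()
chart = cyk(words)
prob = chart[(0, len(words))]["VP"][0]
tree = build(chart, 0, len(words), "VP")
print(prob, tree)
```

With these toy numbers the chart prefers attaching the PP to the VP over attaching it to the NP. A lexicalized version would additionally index each chart entry by its head word, which increases the number of entries per span.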
Extensions of CFG PCFG: find the most likely parse tree Lexicalized CFG: –Uses weaker independence assumptions –Accounts for certain types of lexical and structural dependency