Parsing Techniques for Lexicalized Context-Free Grammars*
Giorgio Satta, University of Padua
* Joint work with: Jason Eisner, Mark-Jan Nederhof
Summary Part I: Lexicalized Context-Free Grammars –motivations and definition –relation with other formalisms Part II: standard parsing –TD techniques –BU techniques Part III: novel algorithms –BU enhanced –TD enhanced
Lexicalized grammars
each rule is specialized for one or more lexical items
advantages over non-lexicalized formalisms:
–express syntactic preferences that are sensitive to lexical words
–control word selection
Syntactic preferences
adjuncts:
–Workers [ dumped sacks ] into a bin
–*Workers dumped [ sacks into a bin ]
N-N compound:
–[ hydrogen ion ] exchange
–*hydrogen [ ion exchange ]
Word selection
lexical:
–Nora convened the meeting
–?Nora convened the party
semantics:
–Peggy solved two puzzles
–?Peggy solved two goats
world knowledge:
–Mary shelved some books
–?Mary shelved some cooks
Lexicalized CFG
Motivations:
–study computational properties common to generative formalisms used in state-of-the-art real-world parsers
–develop parsing algorithms that can be directly applied to these formalisms
Lexicalized CFG
Parse tree for "dumped sacks into a bin":
[VP[dump][sack] [VP[dump][sack] [V[dump] dumped ] [NP[sack] [N[sack] sacks ]]] [PP[into][bin] [P[into] into ] [NP[bin] [Det[a] a ] [N[bin] bin ]]]]
Lexicalized CFG
Context-free grammars with:
–alphabet V_T: dumped, sacks, into, ...
–delexicalized nonterminals V_D: NP, VP, ...
–nonterminals V_N: NP[sack], VP[dump][sack], ...
Lexicalized CFG
Delexicalized nonterminals encode:
–word sense: N, V, ...
–grammatical features: number, tense, ...
–structural information: bar level, subcategorization state, ...
–other constraints: distribution, contextual features, ...
Lexicalized CFG
productions have two forms:
–V[dump] → dumped
–VP[dump][sack] → VP[dump][sack] PP[into][bin]
lexical elements in lhs inherited from rhs
Lexicalized CFG
production is k-lexical: k occurrences of lexical elements in rhs
–NP[bin] → Det[a] N[bin] is 2-lexical
–VP[dump][sack] → VP[dump][sack] PP[into][bin] is 4-lexical
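A minimal sketch of the k-lexicality count (the pair encoding of nonterminals is a hypothetical representation for illustration, not from the talk):

```python
# A nonterminal such as VP[dump][sack] is modeled as ("VP", ("dump", "sack")).
def lexicality(rhs):
    """k-lexicality of a production: the number of occurrences of
    lexical elements among its right-hand-side nonterminals."""
    return sum(len(heads) for _, heads in rhs)

# NP[bin] -> Det[a] N[bin] is 2-lexical:
print(lexicality([("Det", ("a",)), ("N", ("bin",))]))                   # 2
# VP[dump][sack] -> VP[dump][sack] PP[into][bin] is 4-lexical:
print(lexicality([("VP", ("dump", "sack")), ("PP", ("into", "bin"))]))  # 4
```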
LCFG at work
2-lexical CFG:
–Alshawi 1996: Head Automata
–Eisner 1996: Dependency Grammars
–Charniak 1997: CFG
–Collins 1997: generative model
LCFG at work
Probabilistic LCFG G′ is strongly equivalent to probabilistic grammar G iff:
–there is a mapping between derivations that is a homomorphism in each direction
–derivation probabilities are preserved
LCFG at work
From Charniak 1997 to 2-lex CFG:
NP_S[profits] → ADJ_NP[corporate] N_NP[profits]
with probability
Pr2(NP → ADJ N | NP, S, profits) · Pr1(corporate | ADJ, NP, profits) · Pr1(profits | N, NP, profits)
LCFG at work
From Collins 1997 (Model #2) to 2-lex CFG:
VP_{S,{NP-C},left}[bought] → NP_VP[IBM] VP_{S,{},left}[bought]
with probability
Pr_left(NP, IBM | VP, S, bought, left, {NP-C})
LCFG at work
Major limitation: cannot capture relations involving lexical items outside the actual constituent (cf. history-based models)
Example: with sister constituents V[d0] and NP[d1][d0], and PP[d2][d3] attaching under NP[d1], we cannot look at d0 when computing the PP attachment
LCFG at work
lexicalized context-free parsers that are not LCFG:
–Magerman 1995: Shift-Reduce+
–Ratnaparkhi 1997: Shift-Reduce+
–Chelba & Jelinek 1998: Shift-Reduce+
–Hermjakob & Mooney 1997: LR
Related work
Other frameworks for the study of lexicalized grammars:
–Carroll & Weir 1997: Stochastic Lexicalized Grammars; emphasis on expressiveness
–Goodman 1997: Probabilistic Feature Grammars; emphasis on parameter estimation
Summary Part I: Lexicalized Context-Free Grammars –motivations and definition –relation with other formalisms Part II: standard parsing –TD techniques –BU techniques Part III: novel algorithms –BU enhanced –TD enhanced
Standard Parsing
standard parsing algorithms (CKY, Earley, LC, ...) run on LCFG in time O(|G| |w|^3)
for 2-lex CFG (simplest case), |G| grows with |V_D|^3 |V_T|^2 !!
Goal: get rid of the |V_T| factors
Standard Parsing: TD
Result (to be refined): algorithms satisfying the correct-prefix property are unlikely to run on LCFG in time independent of |V_T|
Correct-prefix property
Earley, Left-Corner, GLR, ...: at each left-to-right reading position, the input consumed so far is a prefix of some sentence derivable from S
On-line parsing
No grammar precompilation (Earley): both G and w are fed directly to the parser, which produces the output
Standard Parsing: TD
Result: on-line parsers with the correct-prefix property cannot run in time O(f(|V_D|, |w|)), for any function f
Off-line parsing
Grammar is precompiled (Left-Corner, LR): G is first compiled into C(G); the parser then runs on C(G) and w to produce the output
Standard Parsing: TD
Fact: we can simulate a nondeterministic FA M on w in time O(|M| |w|)
Conjecture: fix a polynomial p. We cannot simulate M on w in time p(|w|) unless we spend exponential time in precompiling M
Standard Parsing: TD
Assume our conjecture holds true
Result: off-line parsers with the correct-prefix property cannot run in time O(p(|V_D|, |w|)), for any polynomial p, unless we spend exponential time in precompiling G
Standard Parsing: BU
Common practice in lexicalized grammar parsing:
–select productions that are lexically grounded in w
–parse BU with the selected subset of G
Problem: this removes the |V_T| factors but introduces new |w| factors !!
Standard Parsing: BU
Running time is O(|V_D|^3 |w|^5) !!
Combination step: A[d2] → B[d1] C[d2], with B[d1] spanning (i, k) and C[d2] spanning (k, j)
Time charged:
–i, k, j : |w|^3
–A, B, C : |V_D|^3
–d1, d2 : |w|^2
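The five-index combination step above can be sketched as a naive bottom-up recognizer (the encodings of rules and lexicon are assumptions for illustration, not the talk's implementation):

```python
from collections import defaultdict

def naive_bilexical_cky(words, rules, lex):
    """Naive bottom-up recognizer for a 2-lexical CFG (a sketch).

    rules: set of (A, B, C, head) -- production A -> B C over
           delexicalized nonterminals; head is 0 or 1, picking the
           child whose lexical head A inherits.
    lex:   word -> set of delexicalized preterminals.

    Chart items are (A, d): A headed at position d, over span [i, j).
    The nested loops over i, k, j plus the two head positions d1, d2
    are what yield the O(|V_D|^3 |w|^5) running time.
    """
    n = len(words)
    chart = defaultdict(set)                       # (i, j) -> {(A, d)}
    for d, w in enumerate(words):
        for A in lex[w]:
            chart[(d, d + 1)].add((A, d))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):              # i, k, j : |w|^3
                for (B, d1) in chart[(i, k)]:      # d1 : |w|
                    for (C, d2) in chart[(k, j)]:  # d2 : |w|
                        for (A, B2, C2, head) in rules:
                            if (B2, C2) == (B, C):
                                chart[(i, j)].add((A, (d1, d2)[head]))
    return chart

# Toy run: "dumped sacks" with VP -> V NP, head on the left child.
chart = naive_bilexical_cky(["dumped", "sacks"],
                            {("VP", "V", "NP", 0)},
                            {"dumped": {"V"}, "sacks": {"NP"}})
print(("VP", 0) in chart[(0, 2)])   # True
```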
Standard BU : Exhaustive
Standard BU : Pruning
Summary Part I: Lexicalized Context-Free Grammars –motivations and definition –relation with other formalisms Part II: standard parsing –TD techniques –BU techniques Part III: novel algorithms –BU enhanced –TD enhanced
BU enhanced
Result: parsing with 2-lex CFG in time O(|V_D|^3 |w|^4)
Remark: result transfers to the models in Alshawi 1996, Eisner 1996, Charniak 1997, Collins 1997
Remark: the technique extends to improve parsing of Lexicalized Tree-Adjoining Grammars
Algorithm #1
Basic step in naive BU: combine B[d1] spanning (i, k) with C[d2] spanning (k, j) into A[d2] spanning (i, j)
Idea: indices d1 and j can be processed independently
Algorithm #1
Step 1: combine B[d1] spanning (i, k) with the head d2 of the predicted sibling C[d2], summing out d1, to obtain a partial item for A[d2] spanning (i, k)
Step 2: combine the partial item for A[d2] spanning (i, k) with C[d2] spanning (k, j) to obtain A[d2] spanning (i, j)
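The two steps can be sketched for head-final binary rules (head inherited from the right child; the head-initial case is symmetric). The rule encoding and the `bilex` predicate are assumptions for illustration: no inference step enumerates more than four positions at once, which is the source of the O(|w|^4) bound.

```python
from collections import defaultdict

def algo1_recognize(words, rules, lex, bilex):
    """Sketch of Algorithm #1.  rules: (A, B, C) with A's head inherited
    from C; bilex(h, d) licenses word h taking word d as a dependent.
    Step 1 touches indices (i, j, d1, d2); Step 2 touches (i, k, j, d2):
    the pair d1, j is never enumerated together."""
    n = len(words)
    chart = defaultdict(set)     # (i, j) -> {(A, d)}
    partial = defaultdict(set)   # (i, k) -> {(A, C, d2)}: A -> B . C[d2]
    for d, w in enumerate(words):
        for A in lex[w]:
            chart[(d, d + 1)].add((A, d))
    for span in range(1, n + 1):
        for i in range(n - span + 1):
            j = i + span
            # Step 2: complete a partial item over (i, k) with the
            # sibling C[d2] over (k, j).
            for k in range(i + 1, j):
                for (A, C, d2) in partial[(i, k)]:
                    if (C, d2) in chart[(k, j)]:
                        chart[(i, j)].add((A, d2))
            # Step 1: once span (i, j) is finished, absorb each B[d1]
            # on it, summing out d1 while fixing the sibling head d2.
            for (B, d1) in chart[(i, j)]:
                for (A, B2, C) in rules:
                    if B2 == B:
                        for d2 in range(j, n):
                            if bilex(words[d2], words[d1]):
                                partial[(i, j)].add((A, C, d2))
    return chart

chart = algo1_recognize(["IBM", "bought"], {("S", "NP", "VP")},
                        {"IBM": {"NP"}, "bought": {"VP"}},
                        lambda h, d: True)
print(("S", 1) in chart[(0, 2)])   # True
```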
BU enhanced
Upper bound provided by Algorithm #1: O(|w|^4)
Goal: can we go down to O(|w|^3) ?
Spine
The spine of a parse tree is the path from the root to the root's head
Example: for "IBM bought Lotus last week", the spine is S[buy] – VP[buy] – V[buy] – bought
Spine projection
The spine projection is the yield of the subtree composed of the spine and all sibling nodes of spine nodes
Example: NP[IBM] bought NP[Lotus] AdvP[week]
Split Grammars
Split spine projections at the head
Problem: how much information do we need to store in order to construct new grammatical spine projections from splits ?
Split Grammars
Fact: the set of spine projections is a linear context-free language
Definition: a 2-lex CFG is split if its set of spine projections is a regular language
Remark: for split grammars, we can recombine splits using finite information
Split Grammars
Non-split grammar: unbounded # of dependencies between left and right dependents of the head, as in spines that pair each left dependent AdvP[a] with a matching right dependent AdvP[b]
linguistically unattested and unlikely
Split Grammars
Split grammar: finite # of dependencies between left and right dependents of the lexical head
Split Grammars
Precompile the grammar so that splits are derived separately
Example: the spine S[buy] – VP[buy] – V[buy] – bought is replaced by states r1[buy], r2[buy], r3[buy]; r3[buy] is a split symbol
Split Grammars
t : max # of states per spine automaton
g : max # of split symbols per spine automaton (g < t)
m : # of delexicalized nonterminals that are maximal projections
BU enhanced
Result: parsing with split 2-lexical CFG in time O(t^2 g^2 m^2 |w|^3)
Remark: the models in Alshawi 1996, Charniak 1997 and Collins 1997 are not split
Algorithm #2
Idea:
–recognize left and right splits separately
–collect head dependents one split at a time
Algorithm #2
Example: splits of the spine projection NP[IBM] bought NP[Lotus] AdvP[week]
Algorithm #2
[Diagram: Step 1 combines an item for B[d1] with split state s1 ending at position k with the right split of head d2 (states s2, r2); Step 2 extends the resulting item leftward to position i with the left split of d2]
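The split-head idea behind Algorithm #2 can be illustrated with a simplified dependency-style sketch, in the spirit of Eisner's bilexical algorithm: left and right dependents of a head are collected on separate half-spans, so no chart item ever carries two head positions, and the three remaining position indices give cubic time. The `link` predicate is a hypothetical stand-in for the grammar:

```python
def split_head_recognize(words, link):
    """O(|w|^3) split-head recognizer (simplified dependency sketch).
    link(h, d): word h may take word d as a dependent.
    Returns True iff word 0 can head the whole sentence."""
    n = len(words)
    # C[i][j][1]: complete half-constituent over [i, j], head at i;
    # C[i][j][0]: head at j.  I[i][j][d]: incomplete counterpart,
    # created when the dependency arc is added.
    C = [[[False, False] for _ in range(n)] for _ in range(n)]
    I = [[[False, False] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        C[i][i][0] = C[i][i][1] = True
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # attach: join a right half and a left half with a new arc
            for k in range(i, j):
                if C[i][k][1] and C[k + 1][j][0]:
                    if link(words[i], words[j]):
                        I[i][j][1] = True      # arc i -> j
                    if link(words[j], words[i]):
                        I[i][j][0] = True      # arc j -> i
            # complete: absorb further material under the new dependent
            for k in range(i + 1, j + 1):
                if I[i][k][1] and C[k][j][1]:
                    C[i][j][1] = True
            for k in range(i, j):
                if C[i][k][0] and I[k][j][0]:
                    C[i][j][0] = True
    return C[0][n - 1][1]

link = lambda h, d: (h, d) in {("bought", "Lotus"), ("bought", "week")}
print(split_head_recognize(["bought", "Lotus", "week"], link))   # True
```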
Algorithm #2 : Exhaustive
Algorithm #2 : Pruning
Related work
Cubic-time algorithms for lexicalized grammars:
–Sleator & Temperley 1991: Link Grammars
–Eisner 1997: Bilexical Grammars (improved by transfer of Algorithm #2)
TD enhanced
Goal: introduce TD prediction for 2-lexical CFG parsing, without |V_T| factors
Remark: must relax left-to-right parsing (because of the previous results)
TD enhanced
Result: TD parsing with 2-lex CFG in time O(|V_D|^3 |w|^4)
Open: O(|w|^3) extension to split grammars
TD enhanced
Strongest version of the correct-prefix property: at each reading position, the input consumed so far is a prefix of some sentence derivable from S
Data Structures
Productions with lhs A[d]:
–A[d] → X1[d1] X2[d2]
–A[d] → Y1[d3] Y2[d2]
–A[d] → Z1[d2] Z2[d1]
The sequences of rhs lexical elements (d1 d2, d3 d2, d2 d1) are stored in a trie for A[d]
Data Structures
Rightmost subsequence recognition by precompiling the input w into a deterministic FA
[Diagram: subsequence automaton for an input string over {a, b, c}]
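The precompiled automaton can be sketched as a table of rightmost previous occurrences (a minimal sketch; function names are hypothetical):

```python
def rightmost_prev_table(w):
    """prv[k][a] = rightmost position j < k with w[j] == a (absent if
    none).  This is a tabular form of the deterministic FA obtained by
    precompiling the input w; building it is O(|w| * alphabet)."""
    prv = [{}]
    for a in w:
        row = dict(prv[-1])
        row[a] = len(prv) - 1
        prv.append(row)
    return prv

def has_rightmost_subsequence(prv, symbols, k):
    """Scan `symbols` right to left through the table: True iff they
    occur as a subsequence of w[0:k]; each symbol is matched at its
    rightmost possible position, in O(len(symbols)) total."""
    for a in reversed(symbols):
        if a not in prv[k]:
            return False
        k = prv[k][a]
    return True

prv = rightmost_prev_table("abacb")
print(has_rightmost_subsequence(prv, "ab", 5))   # True
print(has_rightmost_subsequence(prv, "cc", 5))   # False
```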
Algorithm #3
Item representation: [A[d], i, j, k]
–i, j indicate the extension of the A[d] partial analysis
–k indicates the rightmost possible position for completion of the A[d] analysis
Algorithm #3: Prediction
Step 1: find the rightmost subsequence before k for some A[d2] production
Step 2: make the Earley prediction
Conclusions
standard parsing techniques are not suitable for processing lexicalized grammars
novel algorithms have been introduced, using enhanced dynamic programming
work to be done: extension to history-based models
The End
Many thanks for helpful discussion to: Jason Eisner, Mark-Jan Nederhof