Presentation transcript: "Parsing with Soft and Hard Constraints on Dependency Length" (Jason Eisner & Noah A. Smith, IWPT 2005)

1 Parsing with Soft and Hard Constraints on Dependency Length. Jason Eisner and Noah A. Smith, Department of Computer Science / Center for Language and Speech Processing, Johns Hopkins University. {jason,nasmith}@cs.jhu.edu

2 Premise: Many parsing consumers (IE, ASR, MT) will benefit more from fast, precise partial parsing than from full, deep parses that are slow to build. Here at IWPT 2005: Burstein; Sagae & Lavie; Tsuruoka & Tsujii; Dzikovska & Rosé; ...

3 Outline of the Talk: the short-dependency preference; review of split bilexical grammars (SBGs) and the O(n³) algorithm; soft constraints: modeling dependency length, with experiments; hard constraints: constraining dependency length in a parser, with an O(n) algorithm (same grammar constant as SBG) and experiments.

4 Short-Dependency Preference. A word's dependents (adjuncts, arguments) tend to fall near it in the string.

5 Length of a dependency ≈ surface distance between head and dependent. (Figure: an example sentence whose dependency arcs have lengths 1, 1, 1, and 3.)

6 (Figure: histogram of dependency length, x-axis = length, y-axis = fraction of all dependencies.) About 50% of English dependencies have length 1, another 20% have length 2, 10% have length 3, ...
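
The histogram on this slide can be recomputed from any dependency treebank. Below is a minimal sketch, assuming each sentence is already available as a list of (head, dependent) position pairs; that input format is an assumption for the sketch, not something specified in the talk.

```python
from collections import Counter

def length_histogram(parsed_sentences):
    """Fraction of dependencies at each surface distance.

    `parsed_sentences` is assumed to yield one parse per sentence as a
    list of (head_position, dependent_position) pairs, with word
    positions 1..n and 0 standing for the wall $.
    """
    counts = Counter()
    for arcs in parsed_sentences:
        for head, dep in arcs:
            if head == 0:            # arcs from the wall have no surface length
                continue
            counts[abs(head - dep)] += 1
    total = sum(counts.values())
    return {length: count / total for length, count in sorted(counts.items())}

# On English Treebank dependencies this comes out roughly as
# {1: 0.5, 2: 0.2, 3: 0.1, ...}, as the slide reports.
```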

7 Related Ideas. Score parses based on what's between a head and child (Collins, 1997; Zeman, 2004; McDonald et al., 2005). Assume short → faster human processing (Church, 1980; Gibson, 1998). "Attach low" heuristic for PPs in English (Frazier, 1979; Hobbs and Bear, 1990). Obligatory and optional re-orderings in English (see paper).

8 Split Bilexical Grammars (Eisner, 1996; 2000). Bilexical: capture relationships between two words using rules of the form X[p] → Y[p] Z[c], X[p] → Z[c] Y[p], X[w] → w; grammar size = N³|Σ|². Split: a head's left children are conditionally independent of its right children, given the parent (equivalent to split HAGs; Eisner and Satta, 1999).

9 Generating with SBGs. 1. Start with the left wall $. 2. Generate the root w0. 3. Generate left children w-1, w-2, ..., w-ℓ from the FSA λw0. 4. Generate right children w1, w2, ..., wr from the FSA ρw0. 5. Recurse on each wi for i in {-ℓ, ..., -1, 1, ..., r}, sampling αi (steps 2-4). 6. Return α-ℓ ... α-1 w0 α1 ... αr. (Figure: the root w0 with left children generated by λw0 and right children generated by ρw0.)
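
A sketch of steps 1-6 as code. The head automata λw and ρw are abstracted here as caller-supplied functions that sample a word's left or right children (nearest child first); that interface is an assumption made for the sketch, not the authors' implementation.

```python
def generate_subtree(word, sample_left_children, sample_right_children):
    """Recursively generate the yield of the subtree rooted at `word`.

    sample_left_children(word)  plays the role of the FSA lambda_w,
    sample_right_children(word) plays the role of the FSA rho_w;
    each returns that side's children, nearest-to-farthest.
    """
    left, right = [], []
    for child in sample_left_children(word):      # w_-1, w_-2, ..., w_-l
        left.append(generate_subtree(child, sample_left_children,
                                     sample_right_children))
    for child in sample_right_children(word):     # w_1, w_2, ..., w_r
        right.append(generate_subtree(child, sample_left_children,
                                      sample_right_children))
    # assemble alpha_-l ... alpha_-1  w0  alpha_1 ... alpha_r
    out = []
    for subtree in reversed(left):                # farthest left child first
        out += subtree
    out.append(word)
    for subtree in right:
        out += subtree
    return out

def generate_sentence(sample_root, sample_left_children, sample_right_children):
    root = sample_root()                          # step 2: root generated from $
    return generate_subtree(root, sample_left_children, sample_right_children)
```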

10 Naive Recognition/Parsing. (Figure: chart items over the example sentence "It takes two to tango", indexed by positions i, j, k and head positions.) O(n⁵) combinations; O(n⁵ N³) if N nonterminals.

11 Cubic Recognition/Parsing (Eisner & Satta, 1999). (Figure: "It takes two to tango" decomposed into triangles and trapezoids, plus the goal item.) One trapezoid per dependency; a triangle is a head with some left (or right) subtrees.

12 Cubic Recognition/Parsing (Eisner & Satta, 1999). (Figure: the four combination rules over span endpoints i, j and split point k, plus the goal rule over span 0..n.) O(n³) combinations for the span rules and O(n) combinations for the goal rule; O(n³ g² N) if N nonterminals and polysemy g.
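
A compact sketch of the triangle/trapezoid dynamic program. To keep it short it scores each dependency with a single arc weight and ignores the head-automaton state, so the grammar constant (the g² N factor) is suppressed; `score(head, dep)` is an assumed log-weight function, not something defined in the talk.

```python
def eisner_parse_score(n, score):
    """Best (Viterbi) parse score over words 1..n, with 0 as the wall $.

    Triangles ("complete" items) are heads that have gathered some left
    or right subtrees; trapezoids ("incomplete" items) each correspond
    to exactly one dependency.  Runtime is O(n^3).
    """
    NEG = float("-inf")
    # [i][j][0] = leftward item over span i..j, [i][j][1] = rightward item
    tri = [[[NEG, NEG] for _ in range(n + 1)] for _ in range(n + 1)]
    trap = [[[NEG, NEG] for _ in range(n + 1)] for _ in range(n + 1)]
    for i in range(n + 1):
        tri[i][i][0] = tri[i][i][1] = 0.0

    for width in range(1, n + 1):
        for i in range(n + 1 - width):
            j = i + width
            # one trapezoid per dependency: j -> i (leftward) or i -> j (rightward)
            for k in range(i, j):
                inside = tri[i][k][1] + tri[k + 1][j][0]
                trap[i][j][0] = max(trap[i][j][0], inside + score(j, i))
                trap[i][j][1] = max(trap[i][j][1], inside + score(i, j))
            # triangles: a head plus some left (or right) subtrees
            for k in range(i, j):
                tri[i][j][0] = max(tri[i][j][0], tri[i][k][0] + trap[k][j][0])
            for k in range(i + 1, j + 1):
                tri[i][j][1] = max(tri[i][j][1], trap[i][k][1] + tri[k][j][1])
    return tri[0][n][1]   # goal item: the wall's rightward triangle over 0..n
```

With (Viterbi) weights attached to items and an agenda ordered by weight, the same item types support the best-first strategy described on the next slide.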

13 Implementation. Augment items with (Viterbi) weights; order by weight. Agenda-based, best-first algorithm. We use Dyna [see the HLT-EMNLP paper] to implement all parsers here. Count the number of items built as a measure of runtime.

14 Very Simple Model for λw and ρw: p(child | first, parent, direction), p(stop | first, parent, direction), p(child | not first, parent, direction), p(stop | not first, parent, direction). *We parse POS tag sequences, not words. (Figure: λtakes and ρtakes generating the neighbors of "takes" in "It takes two to ...".)
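
A sketch of this model as a sampler: each side of each parent is a two-state automaton (first child vs. later children), with a stop probability and a child distribution in each state. The dict layout, keyed by (first, parent, direction), is just an assumed encoding of the four distributions listed on the slide.

```python
import random

def sample_one_side(parent_tag, direction, p_stop, p_child):
    """Sample the children on one side of `parent_tag`, nearest first.

    p_stop[(first, parent_tag, direction)]  -> probability of stopping now
    p_child[(first, parent_tag, direction)] -> dict mapping child tag -> prob
    `first` is True until the first child on this side has been emitted.
    (Following the slide, these are POS tags, not words.)
    """
    children, first = [], True
    while random.random() >= p_stop[(first, parent_tag, direction)]:
        dist = p_child[(first, parent_tag, direction)]
        tags = list(dist)
        child = random.choices(tags, weights=[dist[t] for t in tags])[0]
        children.append(child)
        first = False
    return children
```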

15 Baseline. (Table: test-set recall (%) of 73, 61, and 77, with test-set runtimes of 90, 149, and 49 items/word, for the three test languages.)

16 Improvements to the 73% baseline: parse words, not tags; smoothing / max ent; bigger FSAs / more nonterminals; LTAG, CCG, etc.; special NP treatment, punctuation; train discriminatively; ... or model dependency length?

17 Modeling Dependency Length. (Figure: a tree over words r, a, b, c, d, e, f.) p′ = p · p(3 | r, a, L) · p(2 | r, b, L) · p(1 | b, c, R) · p(1 | r, d, R) · p(1 | d, e, R) · p(1 | e, f, R). *When running the parsing algorithm, just multiply in these probabilities at the appropriate time. (The resulting model is deficient.)
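
A sketch of how the length factors slot into a parser like the one sketched above: when a dependency is completed (a trapezoid is built), its weight also picks up p(length | parent, child, direction). The table names below are assumptions for the sketch; as the slide notes, the resulting model is deficient, since the lengths are already determined by the tree and some probability mass is wasted on inconsistent configurations.

```python
import math

def length_augmented_score(head, dep, tags, attach_logprob, p_length):
    """Add the length factor to an arc's existing attachment log-probability.

    attach_logprob : the model's usual log p(child | ...) for this arc
    p_length[(parent_tag, child_tag, direction)] : dict length -> probability
    tags[0] is assumed to be the wall symbol $.
    """
    direction = "R" if dep > head else "L"
    key = (tags[head], tags[dep], direction)
    length = abs(head - dep)
    return attach_logprob + math.log(p_length[key].get(length, 1e-12))
```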

18 Modeling Dependency Length. (Table: adding the length model changes test-set recall from 73, 61, and 77 to 76, 62, and 75 (+4.1%, +1.6%, -2.6%), and cuts runtime from 90, 149, and 49 items/word to 67, 103, and 31 (-26%, -31%, -37%).)

19 Conclusion (I). Modeling dependency length can cut the runtime of simple models by 26-37%, with effects ranging from -3% to +4% on recall. (The loss in recall is perhaps due to deficient/MLE estimation.)

20 Going to Extremes. Longer dependencies are less likely. What if we eliminate them completely?

21 Hard Constraints. Disallow dependencies between words at distance > b... Risk: the best parse is contrived, or there is no parse at all! Solution: allow fragments (partial parsing; Hindle, 1990, inter alia). Why not model the sequence of fragments?

22 From SBG to Vine SBG. For an SBG, L(λ$) = {ε} and L(ρ$) ⊆ Σ: the wall ($) has one child. For a vine SBG, L(ρ$) ⊆ Σ⁺: the wall has a sequence of children.

23 Building a Vine SBG Parser. Grammar: generates a sequence of trees from $. Parser: recognizes sequences of trees without long dependencies. Need to modify the training data so the model is consistent with the parser.
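
A sketch of that training-data modification, which the following slides illustrate on a Penn Treebank sentence: every dependency longer than b is broken, and the orphaned dependent (with its whole subtree) is grafted onto the wall $. The head-array encoding is an assumption made for the sketch.

```python
def graft_long_dependencies(heads, b):
    """Return a head array in which no word-to-word dependency is longer than b.

    `heads[d]` is the head position of word d (positions 1..n), with 0
    meaning the wall $; heads[0] is unused.  Dependencies from $ (the
    vine) are left alone, since they are not length-bounded.
    """
    new_heads = list(heads)
    for dep in range(1, len(heads)):
        head = heads[dep]
        if head != 0 and abs(head - dep) > b:
            new_heads[dep] = 0          # graft this subtree onto the wall
    return new_heads
```

Training on the grafted trees keeps the model consistent with a parser that simply cannot build the long dependencies.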

24 (Figure, from the Penn Treebank: the dependency tree for "According to some estimates, the rule changes would cut insider filings by more than a third.", with each dependency labeled by its length; most lengths are 1 or 2, and the longest are 8 and 9.)

25 (Figure: the same sentence with b = 4; the dependencies of length 8 and 9 have been cut and their subtrees grafted onto $.)

26 (Figure: the same sentence with b = 3; the length-4 dependency is also cut.)

27 (Figure: the same sentence with b = 2; the length-3 dependency is also cut.)

28 (Figure: the same sentence with b = 1; only length-1 dependencies remain.)

29 (Figure: the same sentence with b = 0; every word hangs directly from $.)

30 Observation. Even for small b, "bunches" can grow to arbitrary size, but arbitrary center-embedding is out.

31 Vine SBG is Finite-State. We could compile it into an FSA and get O(n) parsing! Problem: what's the grammar constant? (Figure: an FSA scanning "According to some estimates, the rule changes would cut insider ..."; its state must remember that "insider" has no parent yet and that "cut", "would", and $ can still take more children.) The number of such states is exponential.

32 Alternative. Instead, we adapt an SBG chart parser, which implicitly shares fragments of stack state, to the vine case, eliminating unnecessary work.

33 Quadratic Recognition/Parsing. (Figure: the same combination rules over i, j, k, plus the goal rule.) Only construct trapezoids such that k − i ≤ b; the O(n³) rule combinations drop to O(nb²) and O(n²b).
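
Relative to the cubic sketch after slide 12, the hard bound mainly changes which trapezoids may be built; a minimal version of that filter is below (the vine's own arcs from $ are exempt). The full algorithm also strings the resulting bounded-width pieces along the vine, as the next slide shows, which is where the O(nb) term comes from.

```python
def trapezoid_allowed(i, j, b):
    """May a trapezoid (i.e., a dependency) spanning positions i..j be built?

    Position 0 is the wall $.  Arcs from the wall form the vine and are
    not length-bounded; every other dependency must have length <= b.
    """
    return i == 0 or (j - i) <= b

# Inside eisner_parse_score, guard the trapezoid rule with:
#     if trapezoid_allowed(i, j, b):
#         trap[i][j][0] = ...   # and likewise for trap[i][j][1]
```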

34 (Figure: "According to some, the new changes would cut insider filings by more than a third." parsed with b = 4; the fragment roots ",", "According", "changes", "would", "cut", and "." hang from $ along the vine, every triangle and trapezoid has width ≤ 4, and the vine itself is built in O(nb).)

35 Parsing Algorithm. Same grammar constant as Eisner and Satta (1999). Runtime drops from O(n³) to O(nb²), including some overhead (a low-order term) for constructing the vine. Reality check: is it worth it?

36 Results: Penn Treebank. (Figure: accuracy and runtime as the bound b varies from 1 to 20.) *Evaluation against the original ungrafted Treebank; non-punctuation only.

37 Results: Chinese Treebank. (Figure: accuracy and runtime as the bound b varies from 1 to 20.) *Evaluation against the original ungrafted Treebank; non-punctuation only.

38 Results: TIGER Corpus. (Figure: accuracy and runtime as the bound b varies from 1 to 20.) *Evaluation against the original ungrafted Treebank; non-punctuation only.

39 Type-Specific Bounds. b can be specific to the dependency type: e.g., b(V-O) can be longer than b(S-V). With b specific to ⟨parent, child, direction⟩, gradually tighten the bounds based on training data.
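
One simple way to derive type-specific bounds from training data (an illustration of "gradually tighten", not necessarily the authors' exact procedure): for each ⟨parent, child, direction⟩, keep the smallest bound that still covers a chosen fraction of that type's training dependencies.

```python
from collections import defaultdict

def type_specific_bounds(training_arcs, keep=0.99):
    """Map each <parent_tag, child_tag, direction> to a length bound.

    `training_arcs` is assumed to yield (parent_tag, child_tag, direction,
    length) tuples, one per training dependency.  Lowering `keep`
    tightens the bounds further.
    """
    lengths = defaultdict(list)
    for parent, child, direction, length in training_arcs:
        lengths[(parent, child, direction)].append(length)
    bounds = {}
    for key, observed in lengths.items():
        observed.sort()
        cutoff = min(len(observed) - 1, int(keep * len(observed)))
        bounds[key] = observed[cutoff]
    return bounds
```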

40 With type-specific bounds, runtime is cut by 50% for English (no accuracy loss), 55% for Chinese (no loss), and 44% for German (2% loss).

41 Related Work. Nederhof (2000) surveys finite-state approximation of context-free languages (CFG → FSA). We limit all dependency lengths (not just center-embedding) and derive weights from the Treebank (not by approximation). Using a chart parser keeps the grammar constant reasonable.

42 Future Work: apply to state-of-the-art parsing models; better parameter estimation; applications: MT, IE, grammar induction.

43 Conclusion (II). Dependency length can be a helpful feature for improving the speed and accuracy (or trading off between them) of simple parsing models that consider dependencies.

44 This Talk in a Nutshell. Length of a dependency ≈ surface distance. Formal results: a hard bound b on dependency length yields a regular language and allows O(nb²) parsing. Empirical results (English, Chinese, German): hard constraints cut runtime in half or more with no accuracy loss (English, Chinese) or by 44% with -2.2% accuracy (German); soft constraints affect the accuracy of simple models by -3% to +4% and cut runtime by 25% to 40%.

