Dependency Parsing Prashanth Mannem


1 Dependency Parsing Prashanth Mannem prashanth@research.iiit.ac.in

2 Outline
- Introduction
  - Phrase Structure Grammar
  - Dependency Grammar
  - Comparison
- Dependency Parsing
  - Formal definition
- Parsing Algorithms
  - Introduction
  - Dynamic programming
  - Constraint satisfaction
  - Deterministic search

3 Introduction
- The syntactic parsing of a sentence consists of finding the correct syntactic structure of that sentence in a given formalism/grammar.
- Dependency Grammar (DG) and Phrase Structure Grammar (PSG) are two such formalisms.

4 Phrase Structure Grammar (PSG)
- Breaks a sentence into constituents (phrases), which are then broken into smaller constituents
- Describes phrase structure and clause structure, e.g. NP, PP, VP
- Structures are often recursive: the clever tall blue-eyed old ... man

5 Draw the phrase structure tree
- Red figures on the screen indicated falling stocks

6 Phrase Structure Tree
(tree diagram)

7 Dependency Grammar
- Syntactic structure consists of lexical items, linked by binary asymmetric relations called dependencies
- Interested in grammatical relations between individual words (governing & dependent words)
- Does not propose a recursive structure; rather a network of relations
- These relations can also have labels

8 Draw the dependency tree
- Red figures on the screen indicated falling stocks

9 Dependency Tree
(tree diagram)

10 Dependency Tree Example
- Phrasal nodes are missing in the dependency structure when compared to the constituency structure.

11 Dependency Tree with Labels
(labeled tree diagram)

12 Comparison
- Dependency structures explicitly represent
  - head-dependent relations (directed arcs)
  - functional categories (arc labels)
  - possibly some structural categories (parts of speech)
- Phrase structures explicitly represent
  - phrases (non-terminal nodes)
  - structural categories (non-terminal labels)
  - possibly some functional categories (grammatical functions)

13 Learning DG over PSG
- Dependency parsing is more straightforward: parsing can be reduced to labeling each token w_i with its head w_j
- Direct encoding of predicate-argument structure
- Fragments are directly interpretable
- Dependency structure is independent of word order
  - Suitable for free word order languages (like Indian languages)

14 Outline
- Introduction
  - Phrase Structure Grammar
  - Dependency Grammar
  - Comparison
- Dependency Parsing
  - Formal definition
- Parsing Algorithms
  - Introduction
  - Dynamic programming
  - Constraint satisfaction
  - Deterministic search

15 Dependency Tree
- Formal definition
  - An input word sequence w_1 ... w_n
  - A dependency graph D = (W, E), where
    - W is the set of nodes, i.e. the word tokens in the input sequence
    - E is the set of unlabeled tree edges (w_i, w_j), with w_i, w_j ∈ W
    - (w_i, w_j) indicates an edge from w_i (parent) to w_j (child)
- The task of mapping an input string to a dependency graph satisfying certain conditions is called dependency parsing
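One common concrete encoding of D = (W, E) is a head index per token; a minimal sketch using the running example sentence (the index values are read off the example tree, -1 marks the root):

    words = ["Red", "figures", "on", "the", "screen",
             "indicated", "falling", "stocks"]
    # heads[i] = index of the parent of words[i]; -1 marks the root
    heads = [1, 5, 1, 4, 2, -1, 7, 5]
    edges = [(words[h], words[d]) for d, h in enumerate(heads) if h != -1]
    print(edges)  # [('figures', 'Red'), ('indicated', 'figures'), ...]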

16 Well-formedness
- A dependency graph is well-formed iff
  - Single head: each word has exactly one head
  - Acyclic: the graph is acyclic
  - Connected: the graph is a single tree containing all the words of the sentence
  - Projective: if word A depends on word B, then all words between A and B are also subordinate to B (i.e. dominated by B)
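A sketch of a checker for these conditions, assuming the head-array encoding from the previous slide (heads[i] is the single head of token i, so Single-Head is built into the encoding; -1 marks the root):

    def is_well_formed(heads):
        if heads.count(-1) != 1:                  # exactly one root
            return False
        # acyclic + connected: every token must reach the root
        for i in range(len(heads)):
            seen, j = set(), i
            while heads[j] != -1:
                if j in seen:
                    return False                  # cycle found
                seen.add(j)
                j = heads[j]

        def dominates(h, j):                      # does h dominate j?
            while j != -1:
                if j == h:
                    return True
                j = heads[j]
            return False

        # projective: every word between a dependent d and its head h
        # must be dominated by h
        for d, h in enumerate(heads):
            if h == -1:
                continue
            lo, hi = min(d, h), max(d, h)
            if not all(dominates(h, j) for j in range(lo + 1, hi)):
                return False
        return True

    print(is_well_formed([1, 5, 1, 4, 2, -1, 7, 5]))  # True (example tree)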

17 Non-projective dependency tree
- Ram saw a dog yesterday which was a Yorkshire Terrier
  (the arc from "dog" to the relative clause crosses the arc to "yesterday": crossing lines)
- English has very few non-projective cases.

18 Outline
- Introduction
  - Phrase Structure Grammar
  - Dependency Grammar
  - Comparison and Conversion
- Dependency Parsing
  - Formal definition
- Parsing Algorithms
  - Introduction
  - Dynamic programming
  - Constraint satisfaction
  - Deterministic search

19 Dependency Parsing
- Dependency-based parsers can be broadly categorized into
  - grammar-driven approaches: parsing done using grammars
  - data-driven approaches: parsing by training on annotated/un-annotated data

20 Dependency Parsing
- Dependency-based parsers can be broadly categorized into
  - grammar-driven approaches: parsing done using grammars
  - data-driven approaches: parsing by training on annotated/un-annotated data
- These approaches are not mutually exclusive

21 Covington's Incremental Algorithm
- Incremental parsing in O(n^2) time by trying to link each new word to each preceding one [Covington 2001]:

    PARSE(x = (w_1, ..., w_n))
      for i = 1 up to n
        for j = i - 1 down to 1
          LINK(w_i, w_j)

22 Covington's Incremental Algorithm
- Incremental parsing in O(n^2) time by trying to link each new word to each preceding one [Covington 2001]:

    PARSE(x = (w_1, ..., w_n))
      for i = 1 up to n
        for j = i - 1 down to 1
          LINK(w_i, w_j)

- Different conditions, such as Single-Head and Projectivity, can be incorporated into the LINK operation (see the sketch below).
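A runnable sketch of the same loop, with LINK enforcing the Single-Head condition; plausible(head, dep) is a hypothetical stand-in for the grammar's licensing test:

    def parse(words, plausible):
        heads = [None] * len(words)         # heads[i] = head of word i
        for i in range(1, len(words)):      # each new word ...
            for j in range(i - 1, -1, -1):  # ... against each preceding one
                link(i, j, heads, plausible)
        return heads

    def link(i, j, heads, plausible):
        # Try j as head of i, then i as head of j; the heads[] checks
        # enforce Single-Head. Projectivity checks could be added here too.
        if heads[i] is None and plausible(j, i):
            heads[i] = j
        elif heads[j] is None and plausible(i, j):
            heads[j] = i

    # toy grammar: a head must be adjacent to its dependent
    print(parse("the dog barks".split(), lambda h, d: abs(h - d) == 1))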

23 Parsing Methods
- Three main traditions
  - Dynamic programming: CYK, Eisner, McDonald
  - Constraint satisfaction: Maruyama, Foth et al., Duchier
  - Deterministic search: Covington, Yamada and Matsumoto, Nivre

24 Dynamic Programming
- Basic idea: treat dependencies as constituents
- Use, e.g., a CYK parser (with minor modifications)

25 Dependency Chart Parsing
- The grammar is regarded as context-free, in which each node is lexicalized
- Chart entries are subtrees, i.e., words with all their left and right dependents
- Problem: different entries for subtrees spanning the same sequence of words with different heads
- Time requirement: O(n^5)

26 Generic Chart Parsing

    for each of the O(n^2) substrings,
      for each of O(n) ways of splitting it,
        for each of ≈ S analyses of the first half,
          for each of ≈ S analyses of the second half,
            for each of ≈ c ways of combining them:
              combine, and add the result to the chart if best

[cap spending] + [at $300 million] = [[cap spending] [at $300 million]]
≈ S analyses each, giving ≈ cS^2 combined analyses, of which we keep ≈ S

Total: O(n^3 S^2 c)

Slide from [Eisner, 1997]

27 Headed constituents ...
- ... have too many signatures.
- How bad is Θ(n^3 S^2 c)?
  - For unheaded constituents, S is constant: NP, VP, ... (similarly for dotted trees), so Θ(n^3).
  - But when different heads mean different signatures, the average substring has Θ(n) possible heads and S = Θ(n) possible signatures, so Θ(n^5).

Slide from [Eisner, 1997]

28 Dynamic Programming Approaches
- Original version [Hays 1964] (grammar-driven)
- Link grammar [Sleator and Temperley 1991] (grammar-driven)
- Bilexical grammar [Eisner 1996] (data-driven)
- Maximum spanning tree [McDonald 2006] (data-driven)

29 Eisner 1996
- Two novel aspects:
  - modified parsing algorithm
  - probabilistic dependency parsing
- Time requirement: O(n^3)
- Modification: instead of storing subtrees, store spans
  - Span: a substring such that no interior word links to any word outside the span
  - Idea: in a span, only the boundary words are active, i.e. still need a head or a child
  - One or both of the boundary words can be active
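The span idea is commonly implemented as a dynamic program over complete and incomplete items; below is a compact score-based sketch (arc-factored decoding in the usual textbook formulation, not Eisner's original probabilistic model; score[h][m] and the toy matrix are illustrative):

    NEG = float("-inf")

    def eisner(score):                  # score[h][m]: score of arc h -> m
        n = len(score)                  # nodes 0..n-1, ROOT at index 0
        # C/I[s][t][d]: best complete/incomplete span over s..t;
        # d = 1: head at the left end s, d = 0: head at the right end t
        C = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
        I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
        bC = [[[0, 0] for _ in range(n)] for _ in range(n)]
        bI = [[[0, 0] for _ in range(n)] for _ in range(n)]
        for k in range(1, n):
            for s in range(n - k):
                t = s + k
                # incomplete items: place the arc between the endpoints
                r = max(range(s, t), key=lambda r: C[s][r][1] + C[r + 1][t][0])
                both = C[s][r][1] + C[r + 1][t][0]
                I[s][t][0] = both + score[t][s]
                I[s][t][1] = both + score[s][t]
                bI[s][t][0] = bI[s][t][1] = r
                # complete items: absorb one finished half
                r0 = max(range(s, t), key=lambda r: C[s][r][0] + I[r][t][0])
                C[s][t][0] = C[s][r0][0] + I[r0][t][0]
                bC[s][t][0] = r0
                r1 = max(range(s + 1, t + 1), key=lambda r: I[s][r][1] + C[r][t][1])
                C[s][t][1] = I[s][r1][1] + C[r1][t][1]
                bC[s][t][1] = r1

        heads = [None] * n              # heads[0] stays None for ROOT

        def rec(s, t, d, complete):     # read the tree off the backpointers
            if s == t:
                return
            if complete:
                r = bC[s][t][d]
                if d:
                    rec(s, r, 1, False); rec(r, t, 1, True)
                else:
                    rec(s, r, 0, True); rec(r, t, 0, False)
            else:
                heads[t if d else s] = s if d else t
                r = bI[s][t][d]
                rec(s, r, 1, True); rec(r + 1, t, 0, True)

        rec(0, n - 1, 1, True)
        return heads

    # toy usage: ROOT + 3 words with hypothetical arc scores
    S = [[0, 2, 9, 1], [0, 0, 1, 0], [0, 8, 0, 7], [0, 0, 2, 0]]
    print(eisner(S))                    # [None, 2, 0, 2]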

30 Example
_ROOT_ Red figures on the screen indicated falling stocks

31 Example
_ROOT_ Red figures on the screen indicated falling stocks
Spans: [Red figures] and [indicated falling stocks]

32 Assembly of correct parse
_ROOT_ Red figures on the screen indicated falling stocks
Start by combining adjacent words into minimal spans, e.g. [Red figures], [figures on], [on the]

33 Assembly of correct parse
Combine spans which overlap in one word; this word must be governed by a word in the left or right span.
[on the] + [the screen] → [on the screen]

34 Assembly of correct parse
Combine spans which overlap in one word; this word must be governed by a word in the left or right span.
[figures on] + [on the screen] → [figures on the screen]

35 Assembly of correct parse
Combine spans which overlap in one word; this word must be governed by a word in the left or right span.
[Red figures] + [figures on the screen] → invalid span: the overlap word "figures" is governed by "indicated", which lies in neither span.

36 Assembly of correct parse
Combine spans which overlap in one word; this word must be governed by a word in the left or right span.
[indicated falling] + [falling stocks] → [indicated falling stocks]

37 Eisner's Model
- Recursive generation: each word generates its actual dependents
- Two Markov chains:
  - left dependents
  - right dependents

38 Eisner's Model

P(tw(1) ... tw(n), links) = Π_{i=1..n} P(lc(i) | tw(i)) × P(rc(i) | tw(i))

where tw(i) is the i-th tagged word, and lc(i) & rc(i) are the left and right children of the i-th word.

P(lc(i) | tw(i)) = Π_{j=1..|lc(i)|} P(tw(lc_j(i)) | t(lc_{j-1}(i)), tw(i))

where lc_j(i) is the j-th left child of the i-th word and t(lc_{j-1}(i)) is the tag of the preceding left child.

39 McDonald's Maximum Spanning Trees
- Score of a dependency tree = sum of the scores of its dependencies
- Scores are independent of other dependencies
- If scores are available, parsing can be formulated as a maximum spanning tree problem
- Two cases:
  - projective: use Eisner's parsing algorithm
  - non-projective: use the Chu-Liu-Edmonds algorithm [Chu and Liu 1965, Edmonds 1967]
- Uses online learning to determine the weight vector w
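For the non-projective case, a sketch using a generic Chu-Liu/Edmonds implementation; this assumes networkx's maximum_spanning_arborescence, and the tokens and arc scores are invented for illustration:

    import networkx as nx

    tokens = ["ROOT", "Red", "figures", "indicated", "stocks"]
    # scores[(h, m)]: score of the arc h -> m (hypothetical numbers,
    # in McDonald's setting derived from the learned weight vector w)
    scores = {(0, 3): 12, (3, 2): 10, (3, 4): 9, (2, 1): 8,
              (0, 2): 3, (1, 2): 2, (2, 4): 2, (4, 3): 1}

    G = nx.DiGraph()
    for (h, m), s in scores.items():
        G.add_edge(h, m, weight=s)

    # no arcs enter node 0, so the arborescence is rooted at ROOT
    tree = nx.maximum_spanning_arborescence(G)
    for h, m in sorted(tree.edges(), key=lambda e: e[1]):
        print(tokens[h], "->", tokens[m])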

40 Parsing Methods
- Three main traditions
  - Dynamic programming: CYK, Eisner, McDonald
  - Constraint satisfaction: Maruyama, Foth et al., Duchier
  - Deterministic parsing: Covington, Yamada and Matsumoto, Nivre

41 Constraint Satisfaction
- Uses Constraint Dependency Grammar
- The grammar consists of a set of boolean constraints, i.e. logical formulas that describe well-formed trees
- A constraint is a logical formula with variables that range over a set of predefined values
- Parsing is defined as a constraint satisfaction problem
- Constraint satisfaction removes values that contradict constraints

42 Constraint Satisfaction
- Parsing is an eliminative process rather than a constructive one such as CFG parsing
- Constraint satisfaction in general is NP-complete
  - parser design must ensure practical efficiency
- Different approaches
  - constraint propagation techniques that ensure local consistency [Maruyama 1990]
  - weighted CDG [Foth et al. 2000, Menzel and Schroder 1998]

43 Maruyama's Constraint Propagation
- Three steps:
  1. Form an initial constraint network using a "core" grammar
  2. Remove local inconsistencies
  3. If ambiguity remains, add new constraints and repeat step 2
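A toy illustration of the eliminative style (the constraints are invented and far simpler than Maruyama's actual grammars): each word is a variable whose domain is its set of candidate heads, and propagation deletes values that contradict the constraints:

    tokens = ["Red/ADJ", "figures/NOUN", "indicated/VERB", "stocks/NOUN"]
    pos = [t.split("/")[1] for t in tokens]

    # initial network: the domain of word i is its candidate head
    # positions (-1 encodes the root)
    domains = {i: {h for h in range(-1, len(tokens)) if h != i}
               for i in range(len(tokens))}

    def consistent(i, h):
        # invented unary constraints: only a verb may be the root;
        # an adjective must attach to a noun on its right
        if h == -1:
            return pos[i] == "VERB"
        if pos[i] == "ADJ":
            return pos[h] == "NOUN" and h > i
        return True

    # step 2: remove local inconsistencies
    for i in domains:
        domains[i] = {h for h in domains[i] if consistent(i, h)}
    print(domains)   # surviving head candidates per word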

44 Weighted Constraint Parsing
- Robust parser which uses soft constraints
- Each constraint is assigned a weight between 0.0 and 1.0
  - weight 0.0: hard constraint, can only be violated when no other parse is possible
- Constraints are assigned manually (or estimated from a treebank)
- Efficiency: uses a heuristic transformation-based constraint resolution method

45 Transformation-Based Constraint Resolution
- Heuristic search, very efficient
- Idea: first construct an arbitrary dependency structure, then try to correct errors
- Error correction by transformations
- Selection of transformations based on the constraints that cause conflicts
- Anytime property: the parser maintains a complete analysis at any time, so it can be stopped at any point and still return a complete analysis
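A toy sketch of the repair loop (the penalty function, data, and transformations are all invented; real systems score weighted constraint violations and also keep the structure a tree):

    import itertools

    tokens = ["Red", "figures", "indicated", "stocks"]

    def penalty(heads):                     # lower is better
        p = 1.0 if heads.count(-1) != 1 else 0.0   # exactly one root
        for i, h in enumerate(heads):
            if h != -1 and abs(h - i) > 2:  # soft distance constraint
                p += 0.3
        return p

    # 1) arbitrary initial analysis: attach everything to the last word
    heads = [len(tokens) - 1] * (len(tokens) - 1) + [-1]

    # 2) anytime repair: re-attach single words while it reduces conflicts
    improved = True
    while improved:
        improved = False
        for i, h in itertools.product(range(len(tokens)),
                                      range(-1, len(tokens))):
            cand = heads[:i] + [h] + heads[i + 1:]
            if h != i and penalty(cand) < penalty(heads):
                heads, improved = cand, True
    print(heads)        # a complete analysis is available at any point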

46 Parsing Methods
- Three main traditions
  - Dynamic programming: CYK, Eisner, McDonald
  - Constraint satisfaction: Maruyama, Foth et al., Duchier
  - Deterministic parsing: Covington, Yamada and Matsumoto, Nivre

47 Deterministic Parsing
- Basic idea: derive a single syntactic representation (dependency graph) through a deterministic sequence of elementary parsing actions
  - sometimes combined with backtracking or repair
- Motivation:
  - psycholinguistic modeling
  - efficiency
  - simplicity

48 Shift-Reduce Type Algorithms
- Data structures:
  - a stack [..., w_i]_S of partially processed tokens
  - a queue [w_j, ...]_Q of remaining input tokens
- Parsing actions are built from atomic actions:
  - adding arcs (w_i → w_j, w_i ← w_j)
  - stack and queue operations
- Left-to-right parsing
- Restricted to projective dependency graphs

49 Yamada's Algorithm
- Three parsing actions:
  - Shift: [...]_S [w_i, ...]_Q ⇒ [..., w_i]_S [...]_Q
  - Left: [..., w_i, w_j]_S [...]_Q ⇒ [..., w_i]_S [...]_Q, adding the arc w_i → w_j
  - Right: [..., w_i, w_j]_S [...]_Q ⇒ [..., w_j]_S [...]_Q, adding the arc w_i ← w_j
- Multiple passes over the input give time complexity O(n^2)

50 Yamada and Matsumoto
- Parsing in several rounds: deterministic bottom-up, O(n^2)
- Looks at pairs of words
- Three actions: Shift, Left, Right
- Shift: shifts focus to the next word pair

51 Yamada and Matsumoto
- Right: decides that the right word depends on the left word
- Left: decides that the left word depends on the right one

52 Parsing Algorithm
- Go through each pair of words and decide which action to take
- If a relation was detected in a pass, do another pass
- E.g. "the little girl":
  - first pass: relation between "little" and "girl"
  - second pass: relation between "the" and "girl"
- The decision on the action depends on the word pair and its context (see the sketch below)
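A minimal runnable sketch of this pass-based scheme, with the gold tree standing in for the trained classifier that picks the action (action naming follows slide 51; a word is only attached once it has collected all of its own dependents):

    def parse(n, gold):                    # gold[i] = head of word i, -1 = root
        heads = [None] * n
        words = list(range(n))             # words still unattached
        changed = True
        while changed and len(words) > 1:  # one more pass after any change
            changed = False
            i = 0
            while i < len(words) - 1:
                a, b = words[i], words[i + 1]
                done_a = all(heads[m] is not None
                             for m in range(n) if gold[m] == a)
                done_b = all(heads[m] is not None
                             for m in range(n) if gold[m] == b)
                if gold[a] == b and done_a:    # Left: left depends on right
                    heads[a] = b
                    del words[i]
                    changed = True
                elif gold[b] == a and done_b:  # Right: right depends on left
                    heads[b] = a
                    del words[i + 1]
                    changed = True
                else:
                    i += 1                     # Shift: focus on next pair
        return heads

    # "the little girl": first pass links little -> girl,
    # second pass links the -> girl
    print(parse(3, [2, 2, -1]))            # [2, 2, None]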

53 Nivre's Algorithm
- Four parsing actions:
  - Shift: [...]_S [w_i, ...]_Q ⇒ [..., w_i]_S [...]_Q
  - Reduce: [..., w_i]_S [...]_Q ⇒ [...]_S [...]_Q (only if ∃ w_k : w_k → w_i)
  - Left-Arc: [..., w_i]_S [w_j, ...]_Q ⇒ [...]_S [w_j, ...]_Q, adding w_i ← w_j (only if ¬∃ w_k : w_k → w_i)
  - Right-Arc: [..., w_i]_S [w_j, ...]_Q ⇒ [..., w_i, w_j]_S [...]_Q, adding w_i → w_j (only if ¬∃ w_k : w_k → w_j)

54 Nivre's Algorithm
- Characteristics:
  - arc-eager processing of right dependents
  - a single pass over the input gives worst-case complexity O(2n), i.e. at most 2n transitions
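A minimal sketch of the arc-eager transition loop, with the gold tree serving as the oracle in place of a trained classifier (unlabeled and illustrative; gold[i] is the head of token i, and ROOT is token 0):

    def arc_eager(gold):
        n = len(gold)
        stack, queue = [0], list(range(1, n))
        heads = [None] * n
        while queue:
            s, q = stack[-1], queue[0]
            if s != 0 and gold[s] == q:     # Left-Arc: w_s <- w_q
                heads[s] = q
                stack.pop()
            elif gold[q] == s:              # Right-Arc: w_s -> w_q
                heads[q] = s
                stack.append(queue.pop(0))
            elif heads[s] is not None and any(
                    gold[q] == k or gold[k] == q for k in stack[:-1]):
                stack.pop()                 # Reduce: s has its head and q
                                            # connects further down the stack
            else:
                stack.append(queue.pop(0))  # Shift
        return heads

    # _ROOT_ Red figures on the screen indicated falling stocks
    gold = [-1, 2, 6, 2, 5, 3, 0, 8, 6]
    print(arc_eager(gold))                  # [None, 2, 6, 2, 5, 3, 0, 8, 6]

Running this on the example sentence reproduces the action sequence traced on the following slides.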

55 Example: _ROOT_ Red figures on the screen indicated falling stocks
S = [_ROOT_], Q = [Red, figures, on, the, screen, indicated, falling, stocks]

56 Example: Shift ("Red" moves onto the stack)

57 Example: Left-arc (Red ← figures)

58 Example: Shift ("figures" moves onto the stack)

59 Example: Right-arc (figures → on)

60 Example: Shift ("the" moves onto the stack)

61 Example: Left-arc (the ← screen)

62 Example: Right-arc (on → screen)

63 Example: Reduce ("screen" already has its head)

64 Example: Reduce ("on" already has its head)

65 Example: Left-arc (figures ← indicated)

66 Example: Right-arc (_ROOT_ → indicated)

67 Example: Shift ("falling" moves onto the stack)

68 Example: Left-arc (falling ← stocks)

69 Example: Right-arc (indicated → stocks)

70 Example: Reduce

71 Example: Reduce

72 Classifier-Based Parsing
- Data-driven deterministic parsing:
  - deterministic parsing requires an oracle
  - an oracle can be approximated by a classifier
  - a classifier can be trained using treebank data
- Learning algorithms:
  - support vector machines (SVM) [Kudo and Matsumoto 2002, Yamada and Matsumoto 2003, Isozaki et al. 2004, Cheng et al. 2004, Nivre et al. 2006]
  - memory-based learning (MBL) [Nivre et al. 2004, Nivre and Scholz 2004]
  - maximum entropy modeling (MaxEnt) [Cheng et al. 2005]

73 Feature Models
- Learning problem: approximate a function from parser states (represented by feature vectors) to parser actions, given a training set of gold-standard derivations
- Typical features: tokens and POS tags of
  - the target words
  - the linear context (neighbors in S and Q)
  - the structural context (parents, children, siblings in the graph)
- Structural features cannot be used in dynamic programming algorithms
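A small illustration of such a feature function over a transition-based configuration; the template names (s0.word, q0.tag, ...) are invented for illustration:

    def features(stack, queue, heads, words, tags):
        f = {}
        if stack:
            s0 = stack[-1]
            f["s0.word"], f["s0.tag"] = words[s0], tags[s0]   # target word
            if heads[s0] is not None:
                f["s0.head.tag"] = tags[heads[s0]]            # structural context
        if queue:
            q0 = queue[0]
            f["q0.word"], f["q0.tag"] = words[q0], tags[q0]   # target word
            if len(queue) > 1:
                f["q1.tag"] = tags[queue[1]]                  # linear context
        return f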

74 Dependency Parsers for Download
- MST parser by Ryan McDonald
- MaltParser by Joakim Nivre
- Stanford parser

75 Summary
- Provided an introduction to dependency parsing and various dependency parsing algorithms
- Read up on Nivre's and McDonald's tutorial on dependency parsing at ESSLLI '07


