
1 Experiments with a Multilanguage Non-Projective Dependency Parser Giuseppe Attardi Dipartimento di Informatica Università di Pisa

2 Aims and Motivation
Efficient parser for use in demanding applications like QA and Opinion Mining
Can tolerate a small drop in accuracy
Customizable to the needs of the application
Deterministic bottom-up parser

3 Annotator for Italian TreeBank

4 Statistical Parsers
Probabilistic generative models of language which include parse structure (e.g. Collins 1997)
Conditional parsing models (Charniak 2000; McDonald 2005)

5 Global Linear Model
X: set of sentences
Y: set of possible parse trees
Learn a function F : X → Y
Choose the highest scoring tree as the most plausible:
F(x) = argmax y ∈ GEN(x) W · Φ(x, y)
Involves just learning the weights W

6 Feature Vector
A set of functions h₁ … h_d define a feature vector
Φ(x) = ⟨h₁(x), …, h_d(x)⟩

7 Constituent Parsing
GEN: e.g. a CFG
hᵢ(x) are based on aspects of the tree
e.g. h(x) = # of times the rule A → B C occurs in x

8 Dependency Parsing
GEN generates all possible maximum spanning trees
First order factorization:
Φ(y) = Σ (h, m) ∈ y φ(h, m)
Second order factorization (McDonald 2006), scoring pairs of adjacent arcs:
Φ(y) = Σ (h, s, m) ∈ y φ(h, s, m)
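
As a minimal illustration of the first-order factorization, a Python sketch in which the tree score is a sum of independent per-arc scores; the feature templates and weights here are invented:

```python
def arc_features(sentence, head, mod):
    """Sparse feature vector phi(h, m) for one arc, as a dict.
    These templates are illustrative, not the paper's."""
    h, m = sentence[head], sentence[mod]
    return {
        f"hw={h}|mw={m}": 1.0,            # head word / modifier word pair
        f"dist={abs(head - mod)}": 1.0,   # arc length
    }

def score_tree(sentence, arcs, weights):
    """First-order (arc-factored) score: the tree score is the sum of
    independent scores of its (head, modifier) arcs."""
    return sum(weights.get(f, 0.0) * v
               for h, m in arcs
               for f, v in arc_features(sentence, h, m).items())

# Usage: score a tiny one-arc tree with weights learned elsewhere.
w = {"hw=saw|mw=girl": 0.7, "dist=1": 0.1}
print(score_tree(["saw", "girl"], [(0, 1)], w))  # ≈ 0.8
```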

9 Dependency Tree
Word-word dependency relations
Far easier to understand and to annotate
Example: Rolls-Royce Inc. said it expects its sales to remain steady

10 Shift/Reduce Dependency Parser
Traditional statistical parsers are trained directly on the task of selecting a parse tree for a sentence
Instead, a Shift/Reduce parser is trained to predict the sequence of parse actions required to build the parse tree

11 Grammar Not Required
A traditional parser requires a grammar for generating candidate trees
A Shift/Reduce parser needs no grammar

12 Parsing as Classification
Parsing based on Shift/Reduce actions
Learn from an annotated corpus which action to perform at each step
Proposed by Yamada & Matsumoto (2003) and Nivre (2003)
Uses only local information, but can exploit history

13 Variants for Actions
– Shift, Left, Right
– Shift, Reduce, Left-arc, Right-arc
– Shift, Reduce, Left, WaitLeft, Right, WaitRight
– Shift, Left, Right, Left2, Right2

14 Parser Actions
[Figure: the actions Shift, Left, and Right illustrated on the POS-tagged sentence "I/PP saw/VVD a/DT girl/NN with/IN the/DT glasses/NNS ./SENT", with top and next markers]

15 Dependency Graph
Let R = {r₁, …, rₘ} be the set of permissible dependency types
A dependency graph for a sequence of words W = w₁ … wₙ is a labeled directed graph D = (W, A), where:
(a) W is the set of nodes, i.e. word tokens in the input string,
(b) A is a set of labeled arcs (wᵢ, r, wⱼ), with wᵢ, wⱼ ∈ W and r ∈ R,
(c) for every wⱼ ∈ W, there is at most one arc (wᵢ, r, wⱼ) ∈ A.

16 Parser State
The parser state is a quadruple ⟨S, I, T, A⟩, where:
– S is a stack of partially processed tokens
– I is a list of (remaining) input tokens
– T is a stack of temporary tokens
– A is the arc relation for the dependency graph
(w, r, h) ∈ A represents an arc w → h, tagged with dependency r
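
As a minimal sketch, the quadruple could be represented as a Python structure; field names and types are ours, not the parser's actual code:

```python
from dataclasses import dataclass, field

@dataclass
class ParserState:
    """The quadruple <S, I, T, A> from the slide above."""
    S: list = field(default_factory=list)  # stack of partially processed tokens
    I: list = field(default_factory=list)  # remaining input tokens
    T: list = field(default_factory=list)  # temporary tokens (for Extract/Insert)
    A: set = field(default_factory=set)    # labeled arcs (dependent, relation, head)
```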

17 Which Orientation for Arrows?
Some authors draw a dependency link as an arrow from dependent to head (Yamada-Matsumoto)
Some authors draw a dependency link as an arrow from head to dependent (Nivre, McDonald)
This causes confusion, since actions are termed Left/Right according to the direction of the arrow

18 Parser Actions
Shift  ⟨S, n|I, T, A⟩ → ⟨n|S, I, T, A⟩
Right  ⟨s|S, n|I, T, A⟩ → ⟨S, n|I, T, A ∪ {(s, r, n)}⟩
Left   ⟨s|S, n|I, T, A⟩ → ⟨S, s|I, T, A ∪ {(n, r, s)}⟩

19 Parser Algorithm
The parsing algorithm is fully deterministic:

  Input Sentence: (w₁, p₁), (w₂, p₂), …, (wₙ, pₙ)
  S = <>
  I = <(w₁, p₁), …, (wₙ, pₙ)>
  T = <>
  A = {}
  while I ≠ <> do begin
    x = getContext(S, I, T, A);
    y = estimateAction(model, x);
    performAction(y, S, I, T, A);
  end
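
A runnable Python sketch (ours, not the parser's actual implementation) of this loop with the three basic transitions from slide 18; the temporary stack T, used only by the Extract/Insert actions, is omitted, and a toy rule stands in for the trained classifier:

```python
def shift(S, I, A):
    S.append(I.pop(0))

def right(S, I, A, r="dep"):
    # Right: pop s from the stack, add arc s -> n (s depends on next token n)
    s = S.pop()
    A.add((s, r, I[0]))

def left(S, I, A, r="dep"):
    # Left: pop s, consume n, add arc n -> s, and push s back onto the input
    s = S.pop()
    n = I.pop(0)
    A.add((n, r, s))
    I.insert(0, s)

def parse(tokens, estimate_action):
    """Deterministic loop: while input remains, ask the classifier
    for an action and perform it."""
    S, I, A = [], list(tokens), set()
    while I:
        action = estimate_action(S, I, A)  # a trained classifier in the real parser
        action(S, I, A)
    return A

# Toy stand-in for the trained model: attach leftward whenever possible.
def toy_model(S, I, A):
    return left if S else shift

print(parse(["saw", "girl"], toy_model))  # {('girl', 'dep', 'saw')}
```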

20 Learning Phase

21 Learning Features

feature  value
W        word
L        lemma
P        part of speech (POS) tag
M        morphology: e.g. singular/plural
W<       word of the leftmost child node
L<       lemma of the leftmost child node
P<       POS tag of the leftmost child node, if present
M<       whether the leftmost child node is singular/plural
W>       word of the rightmost child node
L>       lemma of the rightmost child node
P>       POS tag of the rightmost child node, if present
M>       whether the rightmost child node is singular/plural

22 Learning Event
[Figure: dependency tree fragment over the tokens leggi/NOM, le/DET, anti/ADV, che/PRO, ,/PON, Serbia/NOM, erano/VER, discusse/ADJ, che/PRO, Sosteneva/VER, divided into left context, target nodes, right context]

Context features:
(-3, W, che), (-3, P, PRO), (-2, W, leggi), (-2, P, NOM), (-2, M, P), (-2, W<, le), (-2, P<, DET), (-2, M<, P), (-1, W, anti), (-1, P, ADV), (0, W, Serbia), (0, P, NOM), (0, M, S), (+1, W, che), (+1, P, PRO), (+1, W>, erano), (+1, P>, VER), (+1, M>, P), (+2, W, ,), (+2, P, PON)
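
A sketch of how such (offset, feature, value) events could be generated; the token representation and helper are ours, and child-node features (W<, P>, …) are omitted for brevity:

```python
def context_features(tokens, i, window=(-3, 3)):
    """Emit (offset, feature, value) tuples around target position i.
    Each token is a dict with 'W' (word) and 'P' (POS tag)."""
    events = []
    for off in range(window[0], window[1] + 1):
        j = i + off
        if 0 <= j < len(tokens):
            events.append((off, "W", tokens[j]["W"]))
            events.append((off, "P", tokens[j]["P"]))
    return events

toks = [{"W": "anti", "P": "ADV"}, {"W": "Serbia", "P": "NOM"}, {"W": "che", "P": "PRO"}]
print(context_features(toks, 1, window=(-1, 1)))
# [(-1, 'W', 'anti'), (-1, 'P', 'ADV'), (0, 'W', 'Serbia'),
#  (0, 'P', 'NOM'), (1, 'W', 'che'), (1, 'P', 'PRO')]
```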

23 Parser Architecture
Modular learners architecture:
– MaxEntropy, MBL, SVM, Winnow, Perceptron
Classifier combinations: e.g. multiple MEs, SVM + ME
Features can be selected

24 Features Used in Experiments
LemmaFeatures     -2 -1 0 1 2 3
PosFeatures       -2 -1 0 1 2 3
MorphoFeatures    -1 0 1 2
PosLeftChildren   2
PosLeftChild      -1 0
DepLeftChild      -1 0
PosRightChildren  2
PosRightChild     -1 0
DepRightChild     -1
PastActions       1

25 Projectivity
An arc wᵢ → wₖ is projective iff ∀j, i < j < k or i > j > k, wᵢ →* wⱼ
A dependency tree is projective iff every arc is projective
Intuitively: arcs can be drawn on a plane without intersections
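
A small runnable check of this definition, assuming the tree is given as a head-index array (our representation):

```python
def is_projective(head):
    """head[j] = index of token j's head, -1 for the root.
    An arc i -> k is projective iff every j strictly between i and k
    is a descendant of i (reachable from i via head links)."""
    def dominates(i, j):
        while j != -1:
            if j == i:
                return True
            j = head[j]
        return False

    for k, i in enumerate(head):
        if i == -1:
            continue
        lo, hi = min(i, k), max(i, k)
        if not all(dominates(i, j) for j in range(lo + 1, hi)):
            return False
    return True

print(is_projective([-1, 0, 1]))  # True: a simple chain 0 <- 1 <- 2
print(is_projective([2, -1, 1]))  # False: arc 2 -> 0 spans the root at 1
```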

26 Non-Projective
Většinu těchto přístrojů lze take používat nejen jako fax, ale
(Czech: "Most of these devices can also be used not only as a fax, but…")

27 Actions for Non-Projective Arcs
Right2   ⟨s₁|s₂|S, n|I, T, A⟩ → ⟨s₁|S, n|I, T, A ∪ {(s₂, r, n)}⟩
Left2    ⟨s₁|s₂|S, n|I, T, A⟩ → ⟨s₂|S, s₁|I, T, A ∪ {(n, r, s₂)}⟩
Right3   ⟨s₁|s₂|s₃|S, n|I, T, A⟩ → ⟨s₁|s₂|S, n|I, T, A ∪ {(s₃, r, n)}⟩
Left3    ⟨s₁|s₂|s₃|S, n|I, T, A⟩ → ⟨s₂|s₃|S, s₁|I, T, A ∪ {(n, r, s₃)}⟩
Extract  ⟨s₁|s₂|S, n|I, T, A⟩ → ⟨n|s₁|S, I, s₂|T, A⟩
Insert   ⟨S, I, s₁|T, A⟩ → ⟨s₁|S, I, T, A⟩
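
A sketch of Right2 and Left2 in the same style as the earlier loop, with the stack top at the end of a Python list (our naming):

```python
def right2(S, I, A, r="dep"):
    """Attach the token *below* the stack top to the next input token,
    leaving the top in place: reaches past one intervening token."""
    s1 = S.pop()
    s2 = S.pop()
    A.add((s2, r, I[0]))  # arc s2 -> n
    S.append(s1)          # s1 stays on the stack

def left2(S, I, A, r="dep"):
    """The next input token becomes a dependent of the second stack item;
    the former stack top s1 is pushed back onto the input."""
    s1 = S.pop()
    s2 = S.pop()
    n = I.pop(0)
    A.add((n, r, s2))     # arc n -> s2
    S.append(s2)
    I.insert(0, s1)

S, I, A = ["a", "b"], ["c"], set()
right2(S, I, A)
print(S, I, A)  # ['b'] ['c'] {('a', 'dep', 'c')}
```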

28 Example
Right2 (nejen → ale) and Left3 (fax → Většinu)
Většinu těchto přístrojů lze take používat nejen jako fax, ale

29 Example
[Figure: the same sentence with tokens reordered by the parser: Většinu těchto přístrojů lze take používat nejen fax ale jako ,]

30 Example
Extract followed by Insert (Dutch):
zou gemaakt moeten worden in → zou moeten worden gemaakt in

31 Effectiveness for Non-Projectivity
Training data for Czech contains 28,081 non-projective relations
26,346 (93%) can be handled by Left2/Right2
1,683 (6%) by Left3/Right3
52 (0.2%) require Extract/Insert

32 Experiments
3 classifiers: one to decide between Shift/Reduce, one to decide which Reduce action, and a third to choose the dependency in case of a Left/Right action
2 classifiers: one to decide which action to perform and a second to choose the dependency

33 CoNLL-X Shared Task
To assign labeled dependency structures for a range of languages by means of a fully automatic dependency parser
Input: tokenized and tagged sentences
Tags: token, lemma, POS, morpho features, ref. to head, dependency label
For each token, the parser must output its head and the corresponding dependency relation
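
For concreteness, a sketch (ours) of reading this input format: CoNLL-X data is tab-separated with ten columns per token (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), a blank line between sentences, and '_' for missing values; the sample content below is invented:

```python
def read_conllx(lines):
    """Yield sentences as lists of token dicts from CoNLL-X format."""
    cols = ["ID", "FORM", "LEMMA", "CPOSTAG", "POSTAG",
            "FEATS", "HEAD", "DEPREL", "PHEAD", "PDEPREL"]
    sent = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:            # blank line ends a sentence
            if sent:
                yield sent
                sent = []
        else:
            sent.append(dict(zip(cols, line.split("\t"))))
    if sent:
        yield sent

sample = ("1\tRolls-Royce\tRolls-Royce\tN\tNNP\t_\t2\tSBJ\t_\t_\n"
          "2\tsaid\tsay\tV\tVBD\t_\t0\tROOT\t_\t_\n\n")
for s in read_conllx(sample.splitlines(True)):
    print([(t["FORM"], t["HEAD"], t["DEPREL"]) for t in s])
```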

34 CoNLL-X: Collections

                        Ar    Cn    Cz    Dk    Du    De    Jp    Pt    Sl    Sp    Se    Tr    Bu
K tokens                54   337 1,249    94   195   700   151   207    29    89   191    58   190
K sents                1.5  57.0  72.7   5.2  13.3  39.2  17.0   9.1   1.5   3.3  11.0   5.0  12.8
Tokens/sentence       37.2   5.9  17.2  18.2  14.6  17.8   8.9  22.8  18.7  27.0  17.3  11.5  14.8
CPOSTAG                 14    22    12    10    13    52    20    15    11    15    37    14    11
POSTAG                  19   303    63    24   302    52    77    21    28    38    37    30    53
FEATS                   19     0    61    47    81     0     4   146    51    33     0    82    50
DEPREL                  27    82    78    52    26    46     7    55    25    21    56    25    18
% non-proj. relations  0.4   0.0   1.9   1.0   5.4   2.3   1.1   1.3   1.9   0.1   1.0   1.5   0.4
% non-proj. sentences 11.2   0.0  23.2  15.6  36.4  27.8   5.3  18.9  22.2   1.7   9.8  11.6   5.4

35 CoNLL: Evaluation Metrics
Labeled Attachment Score (LAS)
– proportion of "scoring" tokens that are assigned both the correct head and the correct dependency relation label
Unlabeled Attachment Score (UAS)
– proportion of "scoring" tokens that are assigned the correct head
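
Both metrics are simple ratios; a sketch that treats every token as a scoring token (the shared task additionally excluded some punctuation):

```python
def attachment_scores(gold, pred):
    """gold, pred: lists of (head, deprel) per token, same length.
    Returns (LAS, UAS)."""
    n = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / n
    las = sum(g == p for g, p in zip(gold, pred)) / n
    return las, uas

gold = [(2, "SBJ"), (0, "ROOT"), (2, "OBJ")]
pred = [(2, "SBJ"), (0, "ROOT"), (2, "ATT")]  # right head, wrong label
print(attachment_scores(gold, pred))          # (0.666..., 1.0)
```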

36 Shared Task Unofficial Results

             Maximum Entropy                     MBL
Language     LAS %  UAS %  Train s  Parse s     LAS %  UAS %  Train s  Parse s
Arabic       56.43  70.96      181      2.6     59.70  74.69       24      950
Bulgarian    82.88  87.39      452      1.5     79.17  85.92       88      353
Chinese      81.69  86.76    1,156      1.8     72.17  83.08      540      478
Czech        62.10  73.44   13,800     12.8     69.20  80.22      496   13,500
Danish       77.49  83.03      386      3.2     78.46  85.21       52      627
Dutch        70.49  74.99      679      3.3     72.47  77.61      132      923
Japanese     84.17  87.15      129      0.8     85.19  87.79       44       97
German       80.01  83.37    9,315      4.3     79.79  84.31    1,399    3,756
Portuguese   79.40  87.70    1,044      4.9     80.97  87.74      160      670
Slovene      61.97  74.78       98      3.0     62.67  76.60       16      547
Spanish      72.35  76.06      204      2.4     74.37  79.70       54      769
Swedish      78.35  84.68    1,424      2.9     74.85  83.73       96    1,177
Turkish      58.81  69.79      177      2.3     47.58  65.25       43      727

37 CoNLL-X: Comparative Results

             LAS               UAS
Language     Average  Ours     Average  Ours
Arabic       59.94    59.70    73.48    74.69
Bulgarian    79.98    82.88    85.89    87.39
Chinese      78.32    81.69    84.85    86.76
Czech        67.17    69.20    77.01    80.22
Danish       78.31    78.46    84.52    85.21
Dutch        70.73    72.47    75.07    77.71
Japanese     85.86    85.19    89.05    87.79
German       78.58    80.01    82.60    84.31
Portuguese   80.63    80.97    86.46    87.74
Slovene      65.16    62.67    76.53    76.60
Spanish      73.52    74.37    77.76    79.70
Swedish      76.44    78.35    84.21    84.68
Turkish      55.95    58.81    69.35    69.79

Average scores are over the 36 participant submissions.

38 Performance Comparison
Running Maltparser 0.4 on the same Xeon 2.8 GHz machine
Training on swedish/talbanken: 390 min
Test on CoNLL Swedish: 13 min

39 Italian Treebank
Official announcement:
– CNR ILC has agreed to provide the SI-TAL collection for use at CoNLL
Working on completing annotation and converting to CoNLL format
Semiautomated process: heuristics + manual fixup

40 DgAnnotator
A GUI tool for:
– Annotating texts with dependency relations
– Visualizing and comparing trees
– Generating corpora in XML or CoNLL format
– Exporting DG trees to PNG
Demo
Available at: http://medialab.di.unipi.it/Project/QA/Parser/DgAnnotator/

41 Future Directions
Opinion Extraction
– Finding opinions (positive/negative)
– Blog track in TREC 2006
Intent Analysis
– Determine author intent, such as: problem (description, solution), agreement (assent, dissent), preference (likes, dislikes), statement (claim, denial)

42 References
G. Attardi. 2006. Experiments with a Multilanguage Non-projective Dependency Parser. In Proc. of CoNLL-X.
H. Yamada, Y. Matsumoto. 2003. Statistical Dependency Analysis with Support Vector Machines. In Proc. of IWPT-2003.
J. Nivre. 2003. An Efficient Algorithm for Projective Dependency Parsing. In Proc. of IWPT-2003, pages 149–160.

