Treebanks as Training Data for Parsers Joakim Nivre Växjö University and Uppsala University


1 Treebanks as Training Data for Parsers
Joakim Nivre
Växjö University and Uppsala University
E-mail: nivre@msi.vxu.se

2 Q1: What do you really care about when you’re building a parser?
For parsing unrestricted text, I care about the joint optimization of:
– Robustness
– Disambiguation
– Accuracy
– Efficiency
Requirement on syntactic annotation:
– Balance between expressivity and complexity

3 Example: Mildly Non-Projective Dependency Structures
Dependency structure in two treebanks:
– Strictly projective (efficiently parsable): PDT: 75%, DDT: 85%
– Unrestricted non-projective (often intractable): PDT: 100%, DDT: 100%
– Well-nested, gap degree ≤ 1: PDT: 99.5%, DDT: 99.7%
Design choice in treebank annotation?
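To make the "strictly projective" class on this slide concrete, here is a minimal Python sketch of a projectivity check. It assumes a hypothetical encoding that is not from the slides: `heads[d]` is the head of token `d`, tokens are numbered from 1, 0 is an artificial root, and `heads[0]` is an unused placeholder. The input is assumed to be a well-formed tree with no cycles. The sketch tests strict projectivity only; it does not compute gap degree or well-nestedness.

```python
def is_dominated_by(heads, node, ancestor):
    """Return True if `ancestor` is reached by following head links up from `node`."""
    while node != 0:
        if node == ancestor:
            return True
        node = heads[node]
    # We reached the artificial root, which dominates every token.
    return ancestor == 0


def non_projective_arcs(heads):
    """List the arcs (head, dependent) that violate projectivity.

    An arc is projective if every token strictly between the head and
    the dependent is dominated by the head.
    """
    violations = []
    for dependent in range(1, len(heads)):
        head = heads[dependent]
        left, right = min(head, dependent), max(head, dependent)
        for between in range(left + 1, right):
            if not is_dominated_by(heads, between, head):
                violations.append((head, dependent))
                break
    return violations


# Hypothetical three-token tree: token 2 is the root, tokens 1 and 3 depend on it.
projective_tree = [0, 2, 0, 2]
# Hypothetical four-token tree: the arc 4 -> 2 spans token 3, which 4 does not dominate.
crossing_tree = [0, 3, 4, 0, 3]

print(non_projective_arcs(projective_tree))
print(non_projective_arcs(crossing_tree))
```

A tree is strictly projective exactly when the returned list is empty, so the share of such trees in a treebank is what the first bullet reports for PDT and DDT.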

4 Q2: What works, what doesn’t?
Anything works?
– Top systems in CoNLL 2006 shared task:
    MSTParser: Global, exhaustive, graph-based
    MaltParser: Local, greedy, stack-based
– Features more important than parsers?
But not for all languages?
– Results from CoNLL 2007 shared task:
    Configurational languages ≈ 85% LAS (Catalan, Chinese, English, Italian)
    Richly inflected languages ≈ 75% LAS (Arabic, Basque, Czech, Greek, Hungarian, Turkish)
Treebank problem or parser problem?
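The LAS figures on this slide are labeled attachment scores: the share of tokens that receive both the correct head and the correct dependency label. The following Python sketch shows the basic computation, together with the unlabeled variant (UAS). The function name, the `(head, label)` tuple encoding, and the example labels are assumptions made for illustration. Details of the official shared-task scoring, such as punctuation handling, are not modeled.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) as fractions in [0, 1].

    `gold` and `predicted` are equal-length lists of (head, label) pairs,
    one pair per token.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty sentence")
    # UAS counts tokens whose head is correct, ignoring the label.
    heads_correct = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    # LAS counts tokens whose head and label are both correct.
    both_correct = sum(g == p for g, p in zip(gold, predicted))
    total = len(gold)
    return heads_correct / total, both_correct / total


# Hypothetical four-token sentence with made-up labels.
gold = [(2, "subj"), (0, "root"), (2, "obj"), (3, "mod")]
predicted = [(2, "subj"), (0, "root"), (2, "iobj"), (2, "mod")]

uas, las = attachment_scores(gold, predicted)
print(f"UAS = {uas:.2f}, LAS = {las:.2f}")  # UAS = 0.75, LAS = 0.50
```

Here token 3 has the right head but the wrong label, and token 4 has the wrong head, so three of four heads and two of four labeled arcs are correct.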

5 Q3: What information is useful, what is not?
Word level:
– Morphological analysis (lemma, derivation, inflection)
– Hierarchical parts-of-speech (incl. features)
Sentence level:
– Complete structural annotation (phrases, heads)
– Complete functional annotation (syntactic relations)
– Deep/non-local dependencies
Integrated morpho-syntactic annotation:
– The key to parsing richly inflected languages?

6 Skipping a few questions …
Q4: How does grammar writing interact with treebanking?
– No idea. Not my cup of tea.
Q5: What methodological lessons can be drawn for treebanking?
Q6: What are advantages and disadvantages of preprocessing the data to be treebanked with an automatic parser?
– Don’t know. Never got funding to build a real treebank.

7 Q7: Advantages of a phrase structure and/or a dependency treebank?
Obvious answer:
– Phrase structure is good for phrase structure parsing.
– Dependency is good for dependency parsing.
Methodological point:
– Parsing lossy conversions can be questionable.
Remedy:
– Make annotations (just) rich enough to support both.
– Annotation scheme:
    Minimal source annotation
    Well-defined conversions to target annotations
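As an illustration of why a conversion can be lossy, the Python sketch below turns a head-marked phrase structure tree into dependency heads. This is a generic head-based conversion chosen for the example, not the annotation scheme proposed in the talk. The tree encoding is hypothetical: a leaf is a token index (1-based), and an internal node is a tuple `(label, head_child_index, children)`.

```python
def lexical_head(node):
    """Return the token that heads `node` by following head-marked children."""
    if isinstance(node, int):
        return node
    _label, head_index, children = node
    return lexical_head(children[head_index])


def to_dependencies(node, heads=None, governor=0):
    """Map each token to its head token (0 is an artificial root)."""
    if heads is None:
        heads = {}
    if isinstance(node, int):
        heads[node] = governor
        return heads
    _label, head_index, children = node
    head_token = lexical_head(node)
    for i, child in enumerate(children):
        # The head child inherits the governor of the whole phrase;
        # every other child attaches to the phrase's lexical head.
        to_dependencies(child, heads, governor if i == head_index else head_token)
    return heads


# Hypothetical tree for a three-token sentence such as "the dog barks":
# S -> NP VP (head: VP), NP -> 1 2 (head: token 2), VP -> 3.
tree = ("S", 1, [("NP", 1, [1, 2]), ("VP", 0, [3])])

print(to_dependencies(tree))  # {1: 2, 2: 3, 3: 0}
```

The output keeps only head links: the phrase labels S, NP, and VP are gone, and the original bracketing cannot in general be recovered from it. A parser trained on such output is trained on less information than the source treebank contains.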

