Treebanks: Layering the Annotation. Jan Hajič, Institute of Formal and Applied Linguistics. Workshop on Treebanking, NAACL-HLT 2007, Rochester, April 26, 2007.


1 Treebanks: Layering the Annotation
Jan Hajič
Institute of Formal and Applied Linguistics, School of Computer Science, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
Workshop on Treebanking, NAACL-HLT 2007, Rochester, April 26, 2007

2 Layering the PDT 2.0
4 (5) stand-off layers:
- Deep structure (t): syntax & semantics; dependency & non-dependency links
- Surface structure (a): dependency, function
- Morphology (m): lemma, tag (detailed)
- Word (token) (w)
- Audio/auto transcript (z)
“PML” scheme (XML-based)

3 The Links
Within t-layer:
- Co-reference links
  - Pronoun to antecedent (future: full coref chains)
  - Complement to 2nd governor, etc.
Lexicon links:
- Verbs, nouns, adjectives, adverbs to dictionary entry
  - Word sense disambiguated, valency/frame-based
t-layer to a-layer:
- Which a-node the t-node “comes from”
- No restrictions (crossing, many-to-many, …)
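The layering and the ID-based links described above can be sketched roughly as follows. This is a minimal illustration only, assuming a simplified in-memory model: the `Node` class, the ID scheme (`w1`, `m1`, …), the attribute names, and the example sentence are all hypothetical and do not reproduce the actual PML schema or PDT 2.0 data.

```python
from dataclasses import dataclass, field

# Hypothetical, simplified model of stand-off layers; NOT the actual PML schema.
@dataclass
class Node:
    node_id: str                                # unique ID, analogous to an XML ID
    attrs: dict = field(default_factory=dict)   # annotation carried by this node
    refs: list = field(default_factory=list)    # IDs of nodes this node points to

def build_index(*layers):
    """Map every node ID to its node across all layers, rejecting duplicate IDs."""
    index = {}
    for layer in layers:
        for node in layer:
            if node.node_id in index:
                raise ValueError(f"duplicate ID: {node.node_id}")
            index[node.node_id] = node
    return index

def tokens_for(node, index):
    """Follow references downward until word-layer tokens are reached."""
    if "token" in node.attrs:
        return [node.attrs["token"]]
    tokens = []
    for ref in node.refs:
        tokens.extend(tokens_for(index[ref], index))
    return tokens

# Invented example; label values are illustrative only.
w_layer = [Node("w1", {"token": "He"}), Node("w2", {"token": "has"}),
           Node("w3", {"token": "slept"})]
m_layer = [Node("m1", {"lemma": "he"}, ["w1"]), Node("m2", {"lemma": "have"}, ["w2"]),
           Node("m3", {"lemma": "sleep"}, ["w3"])]
a_layer = [Node("a1", {"function": "Sb"}, ["m1"]), Node("a2", {"function": "AuxV"}, ["m2"]),
           Node("a3", {"function": "Pred"}, ["m3"])]
# One t-node may point to several a-nodes (no restriction on the mapping).
t_layer = [Node("t1", {"label": "ACT"}, ["a1"]),
           Node("t2", {"label": "PRED"}, ["a2", "a3"])]

index = build_index(w_layer, m_layer, a_layer, t_layer)
covered = tokens_for(index["t2"], index)
```

Because every layer refers to the one below only by ID, an upper layer can be added or revised without touching the files of the lower layers, which is the property the stand-off design relies on.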

4 The Questions I
Have choices made in the underlying annotation influenced “upper” layer choices?
- Minimal or none, thanks to the stand-off annotation style and the many-to-many references/links allowed (XML IDs)
Added annotation (over surface syntax):
- Node order (information structure), deep dependencies, 30+ node labels (time, modalities, semantic POS, number, pronoun classes, …), co-reference, valency dictionary (~ “frame files”) links (word sense annotation), “empty” nodes (args), …

5 The Questions II
Hard to circumvent syntactic choices?
- Not really… (again, thanks to XML stand-off)
- Only 1 label at surface syntactic level (function)
- Dependency(-only): no problem (no need to refer to phrases; all are represented by subtrees)
…but there will be a problem with the t-layer
- When referring from some “higher” (“logic”) layer:
  - (Probably) need to refer to labels (attributes)
- Solution:
  - Add IDs to attributes (should be easy, in fact: XML ID…)
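The proposed solution, giving individual attributes their own IDs so that a higher layer can point at a label rather than at a whole node, might look like the sketch below. The ID scheme (`node_id.attribute_name`), the function name, and the attribute values are assumptions made for illustration; the slides do not specify how such IDs would be formed.

```python
def add_attribute_ids(node_id, attrs):
    """Return {attribute_id: (name, value)} for one node.

    Hypothetical scheme: the attribute ID is the node ID plus the
    attribute name, which stays unique as long as node IDs are unique.
    """
    return {f"{node_id}.{name}": (name, value) for name, value in attrs.items()}

# Invented t-node with two of the label types mentioned in the slides.
attribute_table = add_attribute_ids("t2", {"time": "past", "number": "sg"})

# A hypothetical higher-layer annotation can now refer to a single label.
higher_layer_ref = "t2.time"
referenced_label = attribute_table[higher_layer_ref]
```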

6 The Questions III
Desirable characteristics … for adding layers
- Stand-off annotation
- Proper IDs for in-, between-layer reference
  - In advance, if possible, but usually can be added later
- Quality control!!
  - Easier with layers: cross-layer constraints
  - Invisible to annotators -> catch random errors
  - Links (between-layer type) can be pre-annotated
- PS vs. dep.: impact on additional annotation
  - Not observed
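A cross-layer constraint of the kind mentioned under quality control could be checked automatically along these lines. This is a sketch under stated assumptions: the two checks (no dangling references, no unreferenced lower-layer nodes), the function names, and the data are invented examples, not the constraints actually used for PDT 2.0.

```python
def find_dangling_refs(upper_layer, lower_ids):
    """Return (node_id, ref) pairs where an upper-layer node points to a missing ID.

    upper_layer: dict mapping node ID -> list of referenced lower-layer IDs
    lower_ids:   set of IDs that actually exist in the lower layer
    """
    problems = []
    for node_id, refs in upper_layer.items():
        for ref in refs:
            if ref not in lower_ids:
                problems.append((node_id, ref))
    return problems

def find_unreferenced(upper_layer, lower_ids):
    """Return lower-layer IDs that no upper-layer node refers to, sorted."""
    referenced = {ref for refs in upper_layer.values() for ref in refs}
    return sorted(lower_ids - referenced)

# Invented data containing one annotation slip: "a3" should point to "m3".
a_to_m = {"a1": ["m1"], "a2": ["m2"], "a3": ["m9"]}
m_ids = {"m1", "m2", "m3"}

dangling = find_dangling_refs(a_to_m, m_ids)
unreferenced = find_unreferenced(a_to_m, m_ids)
```

Checks like these run over the finished files rather than in the annotation tool, so annotators do not see them, which fits the point that such constraints catch random errors.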

