
1 April 26, 2007, Workshop on Treebanking, NAACL-HLT 2007, Rochester
Treebanks and Parsing
Jan Hajič
Institute of Formal and Applied Linguistics, School of Computer Science, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic

2 Questions Covered
- Q1 What do you care … building a parser?
- Q2 What works, what doesn’t?
- Q3 What info is useful, what not?
- Q4 How does grammar writing interact with treebank building (TB)?
- Q5 Methodological lessons learned from TB?
- Q6 (Dis)advantages of pre-parsing for TB?
- Q7 Phrase-structure vs. dependency?

3 Q1 What do we really care about… building a parser
- What will its output be used for:
  - Deep (semantic structure) parsing
  - Translation
  - Question answering
  - … etc.
- Conversion of annotation into “features”
  - Locality good (with today’s parsers)
- Accuracy
- Size, speed, … (the practical things)

4 Q3 What info is useful
- Hard to say
- MST (McDonald), Collins, Charniak surface syntax parsers (Czech):
  - No function tags used
  - Reduced tagset (1100 -> 43)
    -> Hand-made reduction worked best! (POS, case if possible)
  - Lemmatization, word forms used
  - Empty categories, co-indexation not used (not present)
  - Adjunct/argument distinction not used
  - Subcat frames not used (not present)
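
The "POS, case if possible" reduction can be illustrated with a small sketch. It assumes the 15-position tags of the Prague Dependency Treebank, where position 1 is the part of speech and position 5 is the case; the function name `reduce_tag` and the exact rule are hypothetical and are not the hand-made 1100 -> 43 mapping the slide refers to.

```python
# Hypothetical sketch of a "POS, case if possible" tag reduction.
# Assumes PDT-style positional tags: index 0 = POS, index 4 = case.
# This is NOT the actual hand-made 43-tag mapping from the talk.

CASE_POSITION = 4               # 0-based index of the case slot
CASE_VALUES = set("1234567")    # the seven Czech cases


def reduce_tag(tag: str) -> str:
    """Keep the POS letter and append the case digit when one is present."""
    pos = tag[0]
    case = tag[CASE_POSITION] if len(tag) > CASE_POSITION else "-"
    return pos + case if case in CASE_VALUES else pos


if __name__ == "__main__":
    for full_tag in ("NNFS1-----A----", "VB-S---3P-AA---", "RR--4----------"):
        print(full_tag, "->", reduce_tag(full_tag))
```

A rule like this collapses gender, number, person, tense and similar distinctions, which is one way a tagset in the thousands can shrink to a few dozen tags while keeping the case information that matters for attachment.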

5 Q5 Lessons learned
- ! For parsing only !
- Separated surface and deep annotation is good
  - Even then, Czech parsing lags behind English
- 1 million word treebank is far from enough…
  - … for languages with rich inflection, that is
  - Need for tagset reduction
- “Local” information helps
  - Often can be extracted automatically from the annotated treebank
- Lexicalized PS/dependency not much difference (so far)

6 Q6 (Dis)advantages of pre-parsing (surface)
- Speed
  - Up to 50% faster (100% increase in throughput)
  - … therefore cheaper
- Consistency better
- Labeling
  - Color codes for uncertainty of label assignment
- Disadvantage: “strange” errors
  - Can be checked for automatically with cross-checking
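
The two speed figures are consistent if "50% faster" is read as 50% less annotation time per sentence (an assumption about the slide's wording). With $t$ the time per sentence without pre-parsing and $t_{\text{pre}}$ the time with it:

```latex
\[
t_{\text{pre}} = (1 - 0.5)\,t = \frac{t}{2},
\qquad
\frac{\text{throughput}_{\text{pre}}}{\text{throughput}}
  = \frac{1/t_{\text{pre}}}{1/t}
  = \frac{t}{t_{\text{pre}}}
  = 2 .
\]
```

Halving the time per sentence doubles the number of sentences annotated per hour, which is a 100% increase in throughput.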

7 Q7 Phrase structure vs. dependency
- If …
  - (phrase structure) has heads marked, AND
  - (dependency) has tags suitable for phrase labels and no non-projectivity
- Then … essentially the same thing
- Else … ?? determining heads; branching & labels; projectivization
- Done on Czech: Collins parser 98, ACL ’99
  - Dependency -> lexicalized PS (parsing) -> Dep.
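
The equivalence claimed above can be sketched in code: when every phrase marks its head child, a dependency tree can be read off directly, and a projectivity test says whether a dependency tree can be turned back into a phrase structure without crossing brackets. This is a minimal illustration with made-up names (`Node`, `to_dependencies`, `is_projective`), not the conversion used in the Czech experiments.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Phrase-structure node; leaves carry a 1-based word position."""
    label: str
    word_index: Optional[int] = None
    children: List["Node"] = field(default_factory=list)
    head: int = 0  # index of the head child within `children`


def lexical_head(node: Node) -> int:
    """Follow head children down to the word that heads this phrase."""
    while node.children:
        node = node.children[node.head]
    return node.word_index


def to_dependencies(root: Node) -> Dict[int, int]:
    """Map each word to its governing word (0 is the artificial root)."""
    deps: Dict[int, int] = {}

    def visit(node: Node) -> None:
        governor = lexical_head(node)
        for i, child in enumerate(node.children):
            if i != node.head:
                deps[lexical_head(child)] = governor
            visit(child)

    visit(root)
    deps[lexical_head(root)] = 0
    return deps


def is_projective(deps: Dict[int, int]) -> bool:
    """True if every word between a head and its dependent descends from
    that head. Assumes `deps` is a well-formed tree (no cycles)."""
    def has_ancestor(word: int, ancestor: int) -> bool:
        while word != 0:
            word = deps[word]
            if word == ancestor:
                return True
        return False

    for dependent, governor in deps.items():
        low, high = sorted((dependent, governor))
        for between in range(low + 1, high):
            if not has_ancestor(between, governor):
                return False
    return True


if __name__ == "__main__":
    # "John saw Mary": S -> NP VP (head VP); VP -> V NP (head V)
    tree = Node("S", head=1, children=[
        Node("NP", children=[Node("N", word_index=1)]),
        Node("VP", head=0, children=[
            Node("V", word_index=2),
            Node("NP", children=[Node("N", word_index=3)]),
        ]),
    ])
    deps = to_dependencies(tree)
    print(deps, is_projective(deps))
```

The reverse direction (dependency to phrase structure) additionally needs the choices the slide lists under "Else": how to branch, which phrase labels to assign, and how to projectivize non-projective trees.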

