Prague Dependency Treebank(s) Workshop at LSA2011, Part II Jan Hajič, Zdeňka Urešová Institute of Formal and Applied Linguistics School of Computer Science.


Prague Dependency Treebank(s) Workshop at LSA2011, Part II Jan Hajič, Zdeňka Urešová Institute of Formal and Applied Linguistics School of Computer Science Faculty of Mathematics and Physics Charles University, Prague Czech Republic

July 30, 2011, LSA 2011: Prague Dependency Treebanks II

Part II - Syntax and Semantics
  Tectogrammatical representation
  Valency lexicon
  Languages: Czech, Arabic and English
  Technical issues
    Annotation scheme and format
    Tools for annotation
  Applications
  Summary, pointers, conclusion

PDT Annotation Layers
  L0 (w) Words (tokens): automatic segmentation and markup only
  L1 (m) Morphology: tag (full morphology, 13 categories), lemma
  L2 (a) Analytical layer (surface syntax): dependency, analytical dependency function
  L3 (t) Tectogrammatical layer (“deep” syntax): dependency, functor (detailed), grammatemes, ellipsis solution, coreference, topic/focus (deep word order), valency lexicon
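The layering can be pictured as four linked record types, each pointing down to the layer below. The following is only a minimal sketch: the class names, field names and the placeholder tag value are assumptions made for illustration, not the PDT data format. The example sentence is the passive construction "Kniha byla přeložena" used later in the slides, and the t_lemma "#Gen" for the added actor is likewise a placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WToken:
    """w-layer: a raw token."""
    id: str
    form: str


@dataclass
class MNode:
    """m-layer: lemma and tag for one token (tag values here are placeholders)."""
    id: str
    w_ref: str
    lemma: str
    tag: str


@dataclass
class ANode:
    """a-layer: surface-syntax node with an analytical function."""
    id: str
    m_ref: str
    afun: str
    parent: Optional[str] = None


@dataclass
class TNode:
    """t-layer: deep-syntax node; an added (elided) node has no a-layer links."""
    id: str
    t_lemma: str
    functor: str
    a_refs: List[str] = field(default_factory=list)
    parent: Optional[str] = None


def surface_forms(t: TNode, a: Dict[str, ANode], m: Dict[str, MNode],
                  w: Dict[str, WToken]) -> List[str]:
    """Follow the t -> a -> m -> w links and return the linked word forms."""
    return [w[m[a[ref].m_ref].w_ref].form for ref in t.a_refs]


w = {f"w{i}": WToken(f"w{i}", form)
     for i, form in enumerate(["Kniha", "byla", "přeložena"], start=1)}
m = {
    "m1": MNode("m1", "w1", "kniha", "TAG"),
    "m2": MNode("m2", "w2", "být", "TAG"),
    "m3": MNode("m3", "w3", "přeložit", "TAG"),
}
a = {
    "a1": ANode("a1", "m1", "Sb", parent="a3"),
    "a2": ANode("a2", "m2", "AuxV", parent="a3"),
    "a3": ANode("a3", "m3", "Pred"),
}
pred = TNode("t1", "přeložit", "PRED", a_refs=["a2", "a3"])
patient = TNode("t2", "kniha", "PAT", a_refs=["a1"], parent="t1")
actor = TNode("t3", "#Gen", "ACT", parent="t1")  # added node, no surface link

print(surface_forms(pred, a, m, w))
print(surface_forms(actor, a, m, w))
```

The auxiliary "byla" has its own a-layer node but no t-layer node of its own; it is reachable only through the links of the predicate, which is the sense in which auxiliaries "disappear" on the t-layer.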

Layer 3 (t-layer): Tectogrammatical
  Underlying (deep) syntax
  4 sublayers (integrated):
    dependency structure, (detailed) functors, valency annotation
    topic/focus and deep word order
    coreference (mostly grammatical only)
    all the rest (grammatemes): detailed functors; underlying gender, number, ...
  Total 39 attributes (vs. 5 at m-layer, 2 at a-layer)

Analytical vs. Tectogrammatical
  [Figure: analytical and tectogrammatical trees compared (TR: sublayer 1 only shown). Callouts: underlying verb + tense; deep function; elided Actor in; prepositions out; another ellipsis...]

Layer 3: Tectogrammatical
  Underlying (deep) syntax
  4 sublayers:
    dependency structure, (detailed) functors
    topic/focus and deep word order
    coreference (mostly grammatical only)
    all the rest (grammatemes): detailed functors; underlying gender, number, ...

Tectogrammatical Functors
  “Actants”: ACT, PAT, EFF, ADDR, ORIG
    modify: verbs, nouns, adjectives
    cannot repeat in a clause, usually obligatory
  Free modifications (~ 50), semantically defined
    can repeat; optional, sometimes obligatory
    Ex.: LOC, DIR1, ...; TWHEN, TTILL, ...; RSTR; BEN, ATT, ACMP, INTT, MANN; MAT, APP; ID, DPHR, ...
  Special
    Coordination, Rhematizers, Foreign phrases, ...
  [Slide labels: syntactic, semantic]

Tectogrammatical Example
  Analytical verb form:
    (he) allowed would-be to-be enrolled
    směl by být zapsán
  Additional attributes (grammatemes): conditional + “allow”
  [Figure callout: Collapsed]

Tectogrammatical Example
  Passive construction (action)
    (The) book has-been translated [by Mr. X]
    Kniha byla přeložena
  [Figure callouts: Disappeared, Added]

Tectogrammatical Example
  Object
    (he) gave him a-book
    dal mu knihu
  Obj goes into ACT, PAT, ADDR, EFF or ORIG based on governor’s valency frame

Tectogrammatical Example
  Incomplete phrases
    Peter works well, but Paul badly
    Petr pracuje dobře, ale Pavel špatně
  [Figure callout: Added]


Deep Word Order, Topic/Focus
  Example: Baker bakes rolls. vs. Baker IC bakes rolls.
  Analytical dep. tree: [figure]

Deep Word Order, Topic/Focus
  Deep word order:
    from “old” information to the “new” one (left-to-right) at every level (head included)
    projectivity by definition (almost...)
    i.e., partial level-based order -> total d.w.o.
  Topic/focus/contrastive topic
    attribute of every node (t, f, c)
    restricted by d.w.o. and other constraints
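The step from a partial, level-based order to a total deep word order can be sketched as a recursive flattening. This is an illustration only: the input format (a mapping from each node to the left-to-right list of itself and its children) and the node labels are assumptions, and the input is assumed to be a tree.

```python
from typing import Dict, List


def total_deep_order(node: str, local_order: Dict[str, List[str]]) -> List[str]:
    """Flatten per-level orders into one total order.

    local_order[node] lists the node itself and its children, left to right
    (hypothetical input format). Leaves may be omitted from the mapping.
    """
    level = local_order.get(node, [node])
    if level.count(node) != 1:
        raise ValueError(f"{node!r} must appear exactly once in its own level")
    result: List[str] = []
    for item in level:
        if item == node:
            result.append(node)          # the head takes its slot among its children
        else:
            result.extend(total_deep_order(item, local_order))
    return result


# Illustrative levels only; the labels are not taken from an annotated tree.
levels = {
    "bake": ["baker", "bake", "roll"],
    "roll": ["famous", "roll"],
}
order = total_deep_order("bake", levels)
print(order)
```

Because every subtree is emitted as one contiguous block, the resulting order is projective by construction, which matches the "projectivity by definition" remark on the slide (the slide's "almost" exceptions are not modeled here).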


Coreference
  Grammatical
    relative clauses: which, who
      Peter and Paul, who ...
    control (infinitival constructions)
      John promised to go ...
    reflexive pronouns: {him,her,them}self(-ves)
      Mary saw herself in ...
  [Figure: tree with nodes John, go, he, home, promise and functors PRED, ACT, PAT, ACT, DIR3]

Coreference
  Textual
    Ex.: Peter moved to Iowa after he finished his PhD.


Grammatemes
  Detailed functors (subfunctors)
    only for some functors:
      TWHEN: before/after
      LOC: next-to, behind, in-front-of, ...
      also: ACMP, BEN, CPR, DIR1, DIR2, DIR3, EXT
  Lexical (underlying)
    number (SG/PL), tense, modality, degree of comparison, ...
    strictly only where necessary (agreement!)

Example - simplified view
  Se zuby jsem měl v minulosti jen problémy.
  With teeth I-have had in the-past only problems.

Fully Annotated Sentence
  The boundaries of some problems seem to be clearer after they were revived by Havel’s speech.

Arabic Example: Tectogrammatics
  In the section on literature, the magazine presented the issue of the Arabic language and the dangers that threaten it.

English PDT-style Annotation
  Morphology and Syntax
    By conversion
  Tectogrammatical annotation
    Guidelines (English TR: by S. Cinková)
    Pre-annotation
      Transformation from Penn Treebank & Propbank (Palmer, Kingsbury) by Z. Žabokrtský et al.
    Valency
      From Propbank Frame Files (Cinková, Šindlerová, Nedolužko, Semecký)

Example - English TR
  [Figure labels: Words; Dependencies; Sem. function; Valency (predicates); Coref (BBN); Named Entities (BBN)]

Valency in the PDT
  Valency: specific ability of a word to combine itself with other units of meaning
  [Figure: example trees with labels: dát (give), Eva, matka (mother), ACT, ADDR; pršet (rain), zítra (tomorrow), TWHEN; plakat (cry), Adam, noc (night), ACT, TWHEN; dar (gift), PAT; neděle (Sunday), TWHEN. Callouts: "Specific behavior", "Modifies anything"]

Valency - Basic Principles
  inner participants vs. free modifications (arguments vs. adjuncts)
  obligatory vs. optional modifications (the dialogue test)

Inner Participant vs. Free Modification
  Inner participants (5): ACT(or), PAT(ient), ADDR(essee), EFF(ect), ORIG(in)
    each occurs just with particular verbs
    each modifies the verb only once (in a clause)
  Free modifications (70): Location (LOC, DIR1, ...), Time (TWHEN, TTILL, ...), Manner, Intention, ...
    can modify in principle any verb
    can be repeated (within the same clause)
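The "only once per clause" property of inner participants lends itself to a simple consistency check over the functors of a verb's dependents. The sketch below is a simplification under stated assumptions: the function name is hypothetical, the input is a flat list of functor labels for one verb, and coordination (where several nodes can legitimately share a functor) is not handled.

```python
from collections import Counter
from typing import Iterable, List

# The five inner participants named on the slide.
INNER_PARTICIPANTS = {"ACT", "PAT", "ADDR", "EFF", "ORIG"}


def repeated_inner_participants(functors: Iterable[str]) -> List[str]:
    """Return the inner-participant functors that occur more than once."""
    counts = Counter(f for f in functors if f in INNER_PARTICIPANTS)
    return sorted(f for f, n in counts.items() if n > 1)


# Free modifications may repeat, inner participants may not.
print(repeated_inner_participants(["ACT", "PAT", "TWHEN", "TWHEN"]))
print(repeated_inner_participants(["ACT", "ACT", "LOC"]))
```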

Inner Participants
  syntactic criteria - Actor and Patient
  semantic criteria for other inner participants (if a verb has more than two arguments)
  Argument shifting: Actor, Patient, Addressee, Origin, Effect
    Petr has dug a hole.
      Semantic Effect (as a cognitive role) shifted to the position of Patient.
    The teacher asked a pupil.
      Semantic Addressee shifted to the position of Patient.

Obligatory vs. Optional: The Dialogue Test
  A: John left. B: From where? A: *I don't know.
    "from where": obligatory modification
  A: John left. B: To where? A: I don't know.
    "to where": optional modification
  Answering a question about a semantically obligatory modification, the speaker cannot say: I don't know.

Valency frame
  Structure: one meaning of the word -> one valency frame
  Contents: functor, obligatoriness, surface form
  [Figure labels: obligatory, optional, argument, adjunct]
  word: leave
    meaning 1: sb left sth
      frame1: ACT PAT
    meaning 2: sb left from somewhere
      frame2: ACT DIR1

Valency lexicon: PDT-VALLEX
  8500 verb senses / valency frames
  9000 noun senses / valency frames
  some adjectives and adverbs
  PDT-VALLEX entry
    verb: dosáhnout
    meaning 1: to reach sth
    meaning 2: to get sb to do sth
    meaning 3: ...
    meaning 4: ...

The PDT-VALLEX editor
  [Screenshot: editor window; senses shown: ‘lay down’, resign, win, ask]

Valency Lexicon and TrEd
  [Screenshot: frame selection for "to write sth (about sth)"]

Corpus and Valency Lexicon
  Corpus: occurrences of "uzavřít" (to close): Sentence 2035, Sentence 15345, Sentence 51042
  Lexicon:
    ENTRY: uzavřít
      vf 1: ACT(.1) CPHR({smlouva}.4)   ex.: u. dohodu (close a contract)
      vf 2: ACT(.1) PAT(.4)             ex.: u. pokoj (close a room, house)
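The link between a corpus occurrence and a lexicon frame can be sketched as a lookup keyed by lemma and frame identifier. This is an illustration, not the PDT-VALLEX file format: the dictionary layout, the frame ids "vf1" and "vf2", and the occurrence record are assumptions. The frame strings themselves are copied from the slide, and the parser only splits them into (functor, form) pairs without interpreting the form notation.

```python
import re
from typing import Dict, List, Tuple

# One slot looks like FUNCTOR(form), e.g. ACT(.1) or CPHR({smlouva}.4).
SLOT = re.compile(r"(\w+)\(([^)]*)\)")


def parse_frame(frame: str) -> List[Tuple[str, str]]:
    """Split a frame string into (functor, surface-form string) pairs."""
    return [(m.group(1), m.group(2)) for m in SLOT.finditer(frame)]


# Hypothetical in-memory lexicon holding the two frames shown on the slide.
lexicon: Dict[str, Dict[str, str]] = {
    "uzavřít": {
        "vf1": "ACT(.1) CPHR({smlouva}.4)",
        "vf2": "ACT(.1) PAT(.4)",
    }
}

# Hypothetical corpus occurrence: a t-node that points at one frame.
occurrence = {"t_lemma": "uzavřít", "val_frame": "vf2"}

frame = lexicon[occurrence["t_lemma"]][occurrence["val_frame"]]
print(parse_frame(frame))
```

Storing only a frame identifier on the corpus node keeps the sense distinction in one place, so a correction to a lexicon entry applies to every occurrence that points at it.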

Valency and Text Generation
  Tectogrammatical Representation has all the information to (re)generate the surface form of the sentence:
    in a “generalized” form
    non-redundant (almost... but for generation, it is o.k.)
  ...except the links to a-layer, however
    links used only for training [statistical models for] parsing/generation modules
    not present when e.g. doing text planning, translation, ...
  valency dictionary: form of “learned” knowledge

Valency and Text Generation
  Using valency for...
    ...getting the correct (lemma, tag) of verb arguments
  Example:
    t-tree: starat_se PRED (“to take care of”), Martin ACT, tygr PAT (“tiger”)
    VALLEX entry: starat (se) ACT(.1) PAT(o.[.4])
    Result: Martin se stará o tygry. “Martin takes care of tigers.”
  [Figure: surface-side labels Martin, starat, V, o, tygr, se]
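How a generator might read the surface requirements off a frame slot can be sketched as follows. The reading of the form notation used here is an assumption inferred from the slides (an optional preposition lemma before the dot, then a case number, so that "o.[.4]" means preposition "o" plus case 4); the function name and return format are hypothetical, and no inflection is attempted.

```python
import re
from typing import Dict, Optional, Union

# Assumed reading of a form string: [preposition] "." [case digit, possibly bracketed]
FORM = re.compile(r"^(?P<prep>[^.\[\]{}]*)\.(?:\[?\.?(?P<case>\d)\]?)?$")


def parse_form(form: str) -> Optional[Dict[str, Union[str, int, None]]]:
    """Return the preposition and case requested by a form string, or None."""
    match = FORM.match(form)
    if match is None:
        return None  # notation not covered by this sketch
    case = match.group("case")
    return {
        "preposition": match.group("prep") or None,
        "case": int(case) if case is not None else None,
    }


# Slots of the entry for "starat (se)" as given on the slide.
print(parse_form(".1"))      # ACT slot
print(parse_form("o.[.4]"))  # PAT slot
```

With these requirements, a morphological generator would still have to produce the actual word forms ("Martin", "o tygry"); that step is outside this sketch.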

The Annotation Process
  4 sublayers
    work on structure first, rest in parallel
  Structure
    automatic preprocessing - programmed conversion from analytical layer annotation
  Grammatemes
    mostly automatically (based on lower layers’ annotation), manual checking, corrections
  Cross-sublayer/cross-layer checking
    partly automatic, then manual

The Annotation Process Scheme

Tectogrammatical Annotation Tools
  Manual annotation
    4 groups of annotators ~ 4 sublayers
  Special graphical tool (TrEd)
    Customizable graphical tree editor
  Preprocessing
    Data from analytical layer, preprocessed
    Online dependency function preassignment

The [Manual] Annotation Tool
  Perl/PerlTk based, platform-independent
    Linux, Windows 95/98/2000, Solaris, ...
  Perl as the “macro” language
    “unlimited” online processing capability
  Flexibility for interactive checking
    split screen, graphical “diff” function
  Customization, printing, “plugins”, ...

The Annotation Scheme
  XML + principles of linear- and tree-based standoff annotation
    PML (Prague Markup Language)
  Layer schemes (Relax NG)
    PDT/PADT: t(ecto), a(nalytic), m(orphology), ...
    English: + phrase-based (p-layer)

PML/XML Annotation Layers
  Strictly top-down links
  w+m+a can be easily “knitted”
  API for cross-layer access (programming)
  PML Schema / Relax NG
  [z and audio layers: used for spoken data (audio as layer “-1”)]
  LFG analogy: f-struct, Φ, c-struct
  [Figure: layer stack including z-layer and audio; example tokens BYL BYS ČELO LESA ...]

The Prague Markup Language
  Example m-layer data, linked to w-layer (values only):
    manual
    w#w-tr/_12941_01_00013.fs-s1w4   (pointer to w-layer)
    basic
    pocházela
    pocházet_:T
    VpQW---XR-AA
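Resolving the pointer from the m-layer to the w-layer starts with splitting the reference string. The sketch below assumes the reference has the shape `<file alias>#<node id>`, which is how the value on the slide reads; the function name is hypothetical, and nothing here uses a PML library.

```python
from typing import Tuple


def split_reference(ref: str) -> Tuple[str, str]:
    """Split 'alias#id' into (alias, id); the '#' separator is an assumption."""
    alias, sep, node_id = ref.partition("#")
    if not sep or not alias or not node_id:
        raise ValueError(f"not an 'alias#id' reference: {ref!r}")
    return alias, node_id


alias, node_id = split_reference("w#w-tr/_12941_01_00013.fs-s1w4")
print(alias)    # which referenced file (here: the w-layer)
print(node_id)  # identifier of the token inside that file
```

Keeping only identifiers in the upper layer is what makes the annotation standoff: the w-layer file stays untouched while the m-layer adds the lemma and tag.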

PDT 2.0: The Data
  Data sizes

Searching the Treebanks
  TrEd extension: PML-TQ
    Backend: database server
    Frontend: TrEd or Web browser
  Web access
    Sample data (Czech, English [soon]): anonymous / anonymous
    Full access (LSA 2011 participants only, 2011): LSA2011 / UC.Boulder
    Full access: licence needed for the corpora
  Available later this year at

Using the Results: Parsing
  Several parsers of Czech
    Analytical layer dependency syntax
    Trained on PDT 1.0 data, 1.2 mil. words
    Collins(98), Charniak(00), Žabokrtský(02), Ribarov(04), Nivre(05), Zeman(05), McDonald(05), CoNLL’06 (19 parsers)
  Best results
    accuracy: percent of correct dependencies
    84-85% for a single parser, > 86% for a combination
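The accuracy measure named above, the percentage of correct dependencies, can be computed by comparing the predicted head of every word with the gold head. A minimal sketch, assuming each sentence is given as a list of head indices (0 for the root); the function name and the toy data are illustrative, not taken from any of the cited parsers.

```python
from typing import Sequence


def attachment_accuracy(gold_heads: Sequence[int],
                        predicted_heads: Sequence[int]) -> float:
    """Fraction of words whose predicted head equals the gold head."""
    if len(gold_heads) != len(predicted_heads):
        raise ValueError("gold and predicted sequences differ in length")
    if not gold_heads:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold_heads, predicted_heads))
    return correct / len(gold_heads)


# Toy three-word sentence: the parser attaches the last word to the wrong head.
gold = [2, 0, 2]
predicted = [2, 0, 1]
print(f"{attachment_accuracy(gold, predicted):.1%}")
```

This variant ignores the dependency labels (analytical functions); a labeled score would additionally require the label of each correct attachment to match.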

Tectogrammatical Parsing
  Newest results:
    4 phases
    Transformation-based learning (FnTBL)
    Largely language independent
    Coreference: >90%
  Results by attribute, for manual vs. automatic m- and a-layer annotation:

    Attribute       manual    auto
    structure       89.3 %    76.4 %
    functor         85.5 %    77.4 %
    val_frame.rf    92.3 %    90.9 %
    t_lemma         93.5 %    90.9 %
    nodetype        94.5 %    92.6 %
    gram/sempos     93.8 %    91.5 %
    a/lex.rf        96.5 %    95.1 %
    a/aux.rf        94.3 %    90.3 %
    is_member       94.3 %    89.5 %
    is_generated    96.6 %    95.2 %
    deepord         68.0 %    66.7 %

Tectogrammatical Layer in Machine Translation
  The Translation (“Vauquois”) triangle
  [Figure labels: transfer; source; target; Tectogrammatical Representation; Surface Syntax; Morphology; Generation; Cz; En]

Dependency trees in MT
  According to his opinion UAL's executives were misinformed about the financing of the original transaction.
  Podle jeho názoru bylo vedení UAL o financování původní transakce nesprávně informováno.
  Transfer:
    - structure (~0)
    - lexical
    - functions
    - grammatical

Analytical Layer Correspondence

Tectogrammatical Correspondence
  The [Homestead’s] only remaining baker bakes the most famous rolls to the north of Long River.
  ‘al-xabaaz ‘al-’axiir ‘al-baaqii [fii Homestead] yaśmacu ‘ashhar ‘al-kruasaanaat ilaa shimaal min Long River.

Valency and Translation
  leave:
    leave-1: to leave [from] somewhere
    leave-2: to leave sth for sb
  Translating (from English into Czech):
    which equivalent to choose?
      nechat vs. odjet/opustit
    which prepositions, cases, ... to use?
      accusative vs. “z” (“from”) with genitive vs. ...?

Valency and Translation
  leave-1 -> nechat-3
    ACT() PAT() LOC()  ->  ACT(.1) PAT(.4) LOC()
  leave-2 -> odjet-1
    ACT() DIR1(from.)  ->  ACT(.1) DIR1(z.[.2])
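Frame-to-frame transfer of this kind can be sketched as a table lookup: the source sense selects the target sense, and each source functor is mapped to a target functor with its required surface form. The dictionary layout and function name below are assumptions; the sense pairs and form strings are copied from this slide as listed.

```python
from typing import Dict, List, Tuple

# source sense -> (target sense, {source functor: (target functor, target form)})
FRAME_MAP: Dict[str, Tuple[str, Dict[str, Tuple[str, str]]]] = {
    "leave-1": ("nechat-3", {"ACT": ("ACT", ".1"),
                             "PAT": ("PAT", ".4"),
                             "LOC": ("LOC", "")}),
    "leave-2": ("odjet-1", {"ACT": ("ACT", ".1"),
                            "DIR1": ("DIR1", "z.[.2]")}),
}


def transfer(sense: str, functors: List[str]) -> Tuple[str, List[Tuple[str, str]]]:
    """Map a source sense and its filled slots to the target sense and slots."""
    target, slots = FRAME_MAP[sense]
    return target, [slots[f] for f in functors]


print(transfer("leave-2", ["ACT", "DIR1"]))
```

Once the verb sense is known, the choice of Czech verb and the choice of preposition and case follow from one lookup, which is the point the slide makes about "which prepositions, cases, ... to use".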

To summarize...
  PDT is/has (a)...
    Dependency-based treebanking project
      Czech (other languages: Eng, Ar)
      Ongoing projects (other inst.): Italian, Old Greek, Latin, ...
    ~ 1 mil. words
      sufficient size for ML experiments
    4 layers of annotation
      (token, morphology, syntax, deep syntax/semantics++)
      independent and full information at all levels, but...
      interlinked (for the development of parsers/generators)
    Valency dictionary
      integrated (links from data)

Some pointers
  Current version of PDT: v2.0, LDC2006T01
    all three levels, 1.9/1.5/0.8 Mwords
  Research -> Corpora (Treebank(s))
    LDC2001T10 (PDT v1.0), LDC2004T23 (PADT 1.0), LDC2004T25 (PCEDT 1.0), LDC2006T01 (PDT 2.0)
  Workshop 2002
    Using TL for MT Generation
    1st version of English dep. Treebank
  This workshop page, many links to resources, tools