The PDT Morphology and Surface Syntax
Jan Hajič
Institute of Formal and Applied Linguistics, School of Computer Science, Faculty of Mathematics and Physics, Charles University, Prague, Czech Republic
Companions Semantic Representation and Dialog Interfacing Workshop, March 5, 2008

Morphology (m-layer)
Prerequisites for the manual annotation process:
- tokenized data
- annotation guidelines
- annotation tool
- manual decision-making support: an offline (or online) morphological analyzer
- quality-checking tool
- process description
Results (manually annotated data) are to be used for tagger training, linguistic research, as a basis for further annotation, ...

Morphological Attributes
Tag: 13 categories, written as a 15-character positional string (positions 13 and 14 are reserved).
Example: AAFP3----3N---- for nejnezajímavějším “(to) the most uninteresting”, position by position:
 1  A  adjective
 2  A  regular
 3  F  feminine
 4  P  plural
 5  3  dative
 6  -  no possessor's gender
 7  -  no possessor's number
 8  -  no person
 9  -  no tense
10  3  superlative
11  N  negated
12  -  no voice
13  -  reserve1
14  -  reserve2
15  -  base variant
Lemma: POS-unique identifier, e.g. books (verb) -> book-1, went -> go, to (prep.) -> to-1
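The positional encoding can be illustrated with a short Python sketch that splits a tag into named positions. The position names (POS, SubPOS, Gender, ...) are modeled on the labels commonly used for the PDT tagset and are an assumption here, as is the helper name `decode_tag`. The code only reads the string. It does not check whether a value is legal for its position.

```python
# Position names for the 15-character positional tag (13 categories + 2 reserved slots).
# These labels are an assumption modeled on common PDT tagset documentation.
POSITIONS = (
    "POS", "SubPOS", "Gender", "Number", "Case",
    "PossGender", "PossNumber", "Person", "Tense", "Grade",
    "Negation", "Voice", "Reserve1", "Reserve2", "Var",
)


def decode_tag(tag: str) -> dict:
    """Return {position name: value}, skipping positions marked '-' (not applicable)."""
    if len(tag) != len(POSITIONS):
        raise ValueError(f"expected {len(POSITIONS)} characters, got {len(tag)}: {tag!r}")
    return {name: value for name, value in zip(POSITIONS, tag) if value != "-"}


print(decode_tag("AAFP3----3N----"))
```

For the example tag, only POS, SubPOS, Gender, Number, Case, Grade and Negation carry values. Every other position is '-'.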

Morphological Tagset
13 categories, 4452 plausible tags (combinations)

Morphological Analysis
Formally: $\mathrm{MA}: A^{+} \to \mathrm{Pow}(L \times T)$, $\mathrm{MA}(f) = \{[l, t]\}$, where $f \in A^{+}$ is the token, $l \in L$ a lemma and $t \in T$ a tag.
- tokens are taken in isolation
- no attempt is made to resolve ambiguity, e.g. auxiliaries vs. full verbs
Example: MA(“má”) = { [mít, VB-S---3P-AA---], [můj, PSFS1-S], [můj, PSFS5-S], [můj, PSNP1-S], [můj, PSNP4-S], [můj, PSNP5-S] }
(mít, lit. “to have”, here “has”; můj, lit. “my”)
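As a sketch, MA can be modeled as a lookup from a token to a set of (lemma, tag) pairs. The lexicon below is a hypothetical one-entry fragment built from the “má” example. Its possessive tags appear in the shortened form printed on the slide. `morph_analyze` is an illustrative name, not the real analyzer's interface. Context is never consulted, which mirrors “tokens taken in isolation”.

```python
# Hypothetical toy lexicon: surface form -> list of (lemma, tag) readings.
# The possessive tags are copied in the shortened form printed on the slide.
LEXICON = {
    "má": [
        ("mít", "VB-S---3P-AA---"),
        ("můj", "PSFS1-S"),
        ("můj", "PSFS5-S"),
        ("můj", "PSNP1-S"),
        ("můj", "PSNP4-S"),
        ("můj", "PSNP5-S"),
    ],
}


def morph_analyze(token: str) -> set:
    """MA(f): every (lemma, tag) pair for the token, looked up in isolation (no context)."""
    return set(LEXICON.get(token, []))


readings = morph_analyze("má")
print(len(readings))                              # 6
print(sorted({lemma for lemma, _ in readings}))   # ['mít', 'můj']
```

Unknown tokens simply yield the empty set. A real analyzer would also need a strategy for out-of-vocabulary words.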

Morphological Analysis: Implementation
- dictionary-based: covers 800k lemmas, ~20 million forms (with tags)
- implemented in C
- standard (regular) derivations generated on the fly, e.g. spojit (join): spojený (joined), spojeně (joinedly), spojenost (joinedliness); spojitelný (joinable), spojitelně (joinably), spojitelnost (joinability)
- irregular forms listed in the dictionary (with tags)
- no phonological processing (concatenation only)
- grammatical prefixes only: negation, superlative
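The restriction to grammatical prefixes and plain concatenation can be illustrated with a toy Python analyzer. It strips nej- (superlative) and ne- (negation) from the front of a form, looks the remainder up, and rewrites the degree and negation positions of the tag. `BASE_FORMS` and `analyze_with_prefixes` are hypothetical, and the single dictionary entry is assumed for the example. The actual analyzer is the C implementation described above.

```python
# Hypothetical dictionary fragment: the comparative (dative plural feminine) form,
# stored without any grammatical prefix.
BASE_FORMS = {
    "zajímavějším": [("zajímavý", "AAFP3----2A----")],
}

GRADE, NEGATION = 9, 10  # 0-based indices of tag positions 10 (degree) and 11 (negation)


def set_position(tag: str, index: int, value: str) -> str:
    """Return a copy of the tag with one position replaced."""
    return tag[:index] + value + tag[index + 1:]


def analyze_with_prefixes(form: str) -> set:
    """Try every split into [nej-][ne-]base by plain string matching (no phonology),
    look the base up, and adjust the tag for each prefix that was stripped."""
    results = set()
    for sup in ("nej", ""):
        for neg in ("ne", ""):
            prefix = sup + neg
            if not form.startswith(prefix):
                continue
            for lemma, tag in BASE_FORMS.get(form[len(prefix):], []):
                if sup and tag[GRADE] != "2":
                    continue  # nej- only turns a comparative into a superlative
                if neg and tag[NEGATION] != "A":
                    continue  # the base form must be affirmative
                if sup:
                    tag = set_position(tag, GRADE, "3")
                if neg:
                    tag = set_position(tag, NEGATION, "N")
                results.add((lemma, tag))
    return results


print(analyze_with_prefixes("nejnezajímavějším"))
```

For nejnezajímavějším this yields the lemma zajímavý with the tag AAFP3----3N----, the tag from the attribute example. In this sketch the lemma stays the affirmative, positive-degree form because both prefixes are treated as grammatical categories.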

The Morphological Annotation Tool (LAW)

The Process of Morphological Annotation
From tokenized to annotated text:
- tokenized text (automatic, w-layer) + morphological dictionary → automatic morphological analysis → text with morphological interpretations
- manual morphological disambiguation (DA), following the annotation guidelines → text with the selected interpretation
- manual adjudication → annotated text (m-layer)
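Here is a minimal Python sketch of the hand-off from manual disambiguation to adjudication. It assumes, for illustration, that each text is disambiguated independently by two annotators; the slide does not say how many. A token goes to the adjudicator when the two choices differ or when the chosen reading is not among the analyzer's candidates. `Token` and `adjudication_queue` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Token:
    form: str
    candidates: set  # (lemma, tag) pairs offered by the automatic morphological analysis


def adjudication_queue(tokens, choices_a, choices_b):
    """Indices of tokens that need manual adjudication: the two annotators' choices
    differ, or the agreed choice is not one of the analyzer's candidates."""
    queue = []
    for i, (token, a, b) in enumerate(zip(tokens, choices_a, choices_b)):
        if a != b or a not in token.candidates:
            queue.append(i)
    return queue


verb = ("mít", "VB-S---3P-AA---")
poss = ("můj", "PSFS1-S")
ma = Token("má", {verb, poss})
print(adjudication_queue([ma, ma], [verb, verb], [verb, poss]))  # [1]
```

In this sketch, tokens that are not queued keep the reading both annotators agreed on.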

PDT – Syntactic Annotation
Surface syntax annotation:
- dependency-based surface syntax
- comparable to Penn Treebank annotation
- convertible: dependency ↔ parse trees
Deep syntactic/semantic annotation:
- dependency trees
- different topology
- high level of generalization and formalization
- many node attributes
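One direction of the dependency ↔ parse tree conversion can be sketched as a naive projection. Every word that has dependents becomes a bracketed phrase containing itself and its dependents' phrases in surface order. The sketch assumes a projective tree and produces unlabeled brackets only. A real conversion to Penn Treebank-style trees would also need phrase labels and further rules. `to_brackets` is a hypothetical helper.

```python
def to_brackets(words, heads):
    """Project a dependency tree onto an unlabeled bracketing.

    words[i] is the (i+1)-th token; heads[i] is the 1-based position of its governor,
    or 0 if it hangs on the root. Assumes a projective tree.
    """
    children = {i: [] for i in range(len(words) + 1)}
    for i, head in enumerate(heads, start=1):
        children[head].append(i)

    def phrase(i):
        if not children[i]:
            return words[i - 1]  # a word without dependents is just itself
        # The head and its dependents' phrases, in surface word order.
        parts = [words[j - 1] if j == i else phrase(j) for j in sorted(children[i] + [i])]
        return "[" + " ".join(parts) + "]"

    return " ".join(phrase(r) for r in children[0])


print(to_brackets(["he", "gave", "Mary", "a", "book"], [2, 0, 2, 5, 2]))
# [he gave Mary [a book]]
```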

Analytical Syntax (a-layer)
Dependency + analytical function: each dependent node is linked to its governor, and the analytical function labels that relation.
Example: The influence of the Mexican crisis on Central and Eastern Europe has apparently been underestimated.

Analytical Syntax: Functions
- Main (for [main] semantic lexemes): Pred, Sb, Obj, Adv, Atr, Atv(V), AuxV, Pnom
- “Double” dependency: AtrAdv, AtrObj, AtrAtr
- Special (function words, punctuation, ...):
  - reflexives, particles: AuxT, AuxR, AuxO, AuxZ, AuxY
  - prepositions/conjunctions: AuxP, AuxC
  - punctuation, graphics: AuxX, AuxS, AuxG, AuxK
  - structural ellipsis: ExD
  - coordination etc.: Coord, Apos

Example
All came from Cray Research.

Surface Syntax Example
Complete sentence: Sb, Pred, Obj
Resistance needs courage.
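The same example can be written down as an a-layer tree in Python. The `ANode` class is a hypothetical stand-in for the PDT data format. The technical root (AuxS, with the placeholder form "#") and the final period (AuxK) are added following the function inventory listed above. Each printed edge shows the governor, the dependent and the dependent's analytical function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ANode:
    order: int              # surface word order; 0 is the technical root
    form: str
    afun: str               # analytical function of this node w.r.t. its governor
    parent: Optional[int]   # order of the governor; None only for the technical root


TREE = [
    ANode(0, "#", "AuxS", None),        # technical sentence root (placeholder form)
    ANode(1, "Resistance", "Sb", 2),
    ANode(2, "needs", "Pred", 0),
    ANode(3, "courage", "Obj", 2),
    ANode(4, ".", "AuxK", 0),           # sentence-final punctuation
]


def edges(tree):
    """Yield (governor form, dependent form, dependent's afun) for every non-root node."""
    by_order = {node.order: node for node in tree}
    for node in tree:
        if node.parent is not None:
            yield by_order[node.parent].form, node.form, node.afun


for governor, dependent, afun in edges(TREE):
    print(f"{governor} -> {dependent} ({afun})")
```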

Surface Syntax Example
Analytical verb form: he would be allowed to be enrolled

Surface Syntax Example
Predicate with copula (state): you were fired

Surface Syntax Example
Passive construction (action): (The) book has been translated [by Mr. X]

Surface Syntax Example
Complement: she left crying

Surface Syntax Example
Object: he gave Mary a book

Surface Syntax Example
Object (also used for the infinitive in analytical verb forms): he wants to learn

Surface Syntax Example
Relative clause (embedded): the woman, who had a French accent, was very pretty

Surface Syntax Example
Coordination: ... (to) magic, mysticism(,) etc.
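In the a-layer, a coordination is headed by the conjunction node (Coord), and the conjuncts hang below it. The sketch below assumes the PDT 2.0 convention of marking conjuncts with an `is_member` flag. It shows how the "effective" governor of a conjunct is found by looking through the Coord node. A modifier attached directly to the Coord node is treated as shared by all conjuncts. The sentence "Peter and Paul came." and all names in the code are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

COORD_LIKE = ("Coord", "Apos")


@dataclass
class ANode:
    form: str
    afun: str
    parent: Optional[int]      # index of the governor in the node list; None for the root
    is_member: bool = False    # True for a conjunct of a Coord/Apos node


# Hypothetical sentence "Peter and Paul came." (index 0 is the technical root).
NODES = [
    ANode("#", "AuxS", None),
    ANode("Peter", "Sb", 2, is_member=True),
    ANode("and", "Coord", 4),
    ANode("Paul", "Sb", 2, is_member=True),
    ANode("came", "Pred", 0),
    ANode(".", "AuxK", 0),
]


def members(nodes, c):
    """Indices of the conjuncts of coordination node c, expanding nested coordinations."""
    result = []
    for j, node in enumerate(nodes):
        if node.parent == c and node.is_member:
            result.extend(members(nodes, j) if node.afun in COORD_LIKE else [j])
    return result


def effective_parents(nodes, i):
    """Indices of the real governor(s) of node i, looking through coordination nodes."""
    node = nodes[i]
    # A conjunct hangs on its Coord/Apos node; climb past it (and any coordination
    # that node is itself a member of).
    while node.is_member and node.parent is not None and nodes[node.parent].afun in COORD_LIKE:
        node = nodes[node.parent]
    parent = node.parent
    if parent is None:
        return []
    # A modifier attached directly to a coordination node is shared by all its conjuncts.
    return members(nodes, parent) if nodes[parent].afun in COORD_LIKE else [parent]


print(effective_parents(NODES, 1), effective_parents(NODES, 3))  # [4] [4]
```

Both conjuncts resolve to "came" even though their direct parent in the tree is "and". This is why tools that query PDT-style trees usually distinguish direct from effective parents.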

Surface Syntax Example
Apposition: cheap, i.e. under five dollars

Surface Syntax Example
Incomplete phrases: Peter works well, but Paul badly

Surface Syntax Example
Variants (equality): he bought shoes for his son

XML Annotation Layers (English)
- strictly top-down links
- the w, m and a layers can easily be “knitted” together
- API for cross-layer access (programming)
- PML Schema / Relax NG
[With slight modifications, the format can also be used for spoken data (audio as layer “-1”).]
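Knitting can be sketched by following the top-down references from the a-layer to the m-layer, and from there to the w-layer. Plain Python dictionaries stand in for the three XML files here. The IDs, the `Knitted` record and the Penn Treebank-style English tags are made up for illustration. No actual PML parsing or PML API is shown.

```python
from typing import NamedTuple, Optional


class Knitted(NamedTuple):
    a_id: str
    form: str
    lemma: str
    tag: str
    afun: str
    parent: Optional[str]   # a-node ID of the governor; None = attached to the (omitted) root


# Hypothetical stand-ins for the three layers; every link points one layer down.
W_LAYER = {"w1": "Resistance", "w2": "needs", "w3": "courage", "w4": "."}
M_LAYER = {  # m-node ID -> (w-layer ID, lemma, tag)
    "m1": ("w1", "resistance", "NN"),
    "m2": ("w2", "need", "VBZ"),
    "m3": ("w3", "courage", "NN"),
    "m4": ("w4", ".", "."),
}
A_LAYER = {  # a-node ID -> (m-layer ID, afun, governor's a-node ID)
    "a1": ("m1", "Sb", "a2"),
    "a2": ("m2", "Pred", None),
    "a3": ("m3", "Obj", "a2"),
    "a4": ("m4", "AuxK", None),
}


def knit(a_layer, m_layer, w_layer):
    """Merge the layers into one record per a-node by following a -> m -> w links."""
    rows = []
    for a_id, (m_ref, afun, parent) in a_layer.items():
        w_ref, lemma, tag = m_layer[m_ref]   # a-layer refers down to the m-layer
        form = w_layer[w_ref]                # m-layer refers down to the w-layer
        rows.append(Knitted(a_id, form, lemma, tag, afun, parent))
    return rows


for row in knit(A_LAYER, M_LAYER, W_LAYER):
    print(row)
```

Because every link in this sketch points one layer down, knitting is a single pass over the a-layer with two lookups per node. A dangling reference surfaces immediately as a `KeyError`.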