CGMIL 2008 - Hyderabad - India An Italian-English dependency parser and its [possible] application to Hindi Leonardo Lesmo Natural.

Presentation transcript:

CGMIL Hyderabad - India
An Italian-English dependency parser and its [possible] application to Hindi
Leonardo Lesmo, Natural Language Processing Group (Dip. Informatica – Univ. Torino)

OUTLINE
- The Turin University Parser
- Performance
- The Turin University Treebank (TUT)
- Mapping between TUT and AnnCorra
- Current activities and the future

THE PARSER
[Figure: parser architecture. Components shown: Lexical access, Dictionary, Morphology, POS tagging, Tagging rules, Chunking, Chunking rules, Analysis of Conjunctions, Verbal Attachment, Verbal subcategories, Verbal frames, Segmentation, Post-processing]

AN EXAMPLE
Input: When the man that you mentioned sent me that beautiful message, I fell in love with him
After chunking: When [the man] that you mentioned sent me [that beautiful message], I fell [in love] [with him]
After segmentation: {{When [the man] {that you mentioned} sent me [that beautiful message]}, I fell [in love] [with him] }
Next step: caseframing

THE FINAL RESULT
[Figure: dependency tree of the example sentence, rooted at "to fall". Arc labels shown: verb-subj, verb-obj, verb-indobj, verb+fin-rmod-time, conj-arg, prep-arg, rmod, det+def-arg, adjc+qualif-rmod, verb-rmod+relcl, verb-indcomp*locut]

THE ACTUAL FORMAT
1 When (WHEN CONJ SUBORD TIME) [7;VERB+FIN-RMOD-TIME]
2 the (THE ART DEF ALLVAL ALLVAL) [7;VERB-SUBJ]
3 man (MAN NOUN COMMON M SING) [2;DET+DEF-ARG]
4 that (THAT PRON RELAT ALLVAL ALLVAL LSUBJ+OBL) [6;VERB-OBJ]
5 you (YOU PRON PERS ALLVAL ALLVAL 2 LSUBJ+LOBJ+LIOBJ+OBL) [6;VERB-SUBJ]
6 mentioned (MENTION VERB MAIN IND PAST ALLVAL ALLVAL) [3;VERB-RMOD-RELCL]
7 sent (SEND VERB MAIN IND PAST ALLVAL ALLVAL) [1;CONJ-ARG]
8 me (I PRON PERS ALLVAL SING 1 LOBJ+LIOBJ+OBL) [7;VERB-INDCOMPL-THEME]
9 that (THAT ADJ DEMONS ALLVAL SING) [7;VERB-OBJ]
10 beautiful (BEAUTIFUL ADJ QUALIF ALLVAL ALLVAL) [11;ADJC+QUALIF-RMOD]
11 message (MESSAGE NOUN COMMON N SING) [9;DET+DEF-ARG]
12 , (#\, PUNCT) [14;SEPARATOR]
13 I (I PRON PERS ALLVAL SING 1 LSUBJ) [14;VERB-SUBJ]
14 fell (FALL VERB MAIN IND PAST ALLVAL ALLVAL) [0;TOP-VERB]
15 in (IN PREP MONO) [14;PREP-RMOD]
16 love (LOVE NOUN COMMON N SING) [15;PREP-ARG]
17 with (WITH PREP MONO) [14;PREP-RMOD]
18 him (HE PRON PERS M SING 3 LOBJ+LIOBJ+OBL) [17;PREP-ARG]
19 . (#\. PUNCT) [14;END]
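The line format above (index, word form, a parenthesised lemma with morphological features, then a bracketed head index and arc label) can be read mechanically. The following is only a minimal sketch under that assumption; the names `Token`, `parse_line` and `parse_sentence` are illustrative and are not part of the Turin parser.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    index: int            # position of the word in the sentence (1-based)
    form: str             # surface form
    lemma: str            # first item inside the parentheses
    features: List[str]   # remaining items inside the parentheses
    head: int             # index of the governing word (0 = root)
    label: str            # arc label linking the word to its head


# index, form, (LEMMA FEATURES...), [head;LABEL]
# \s* after the index tolerates punctuation glued to it, e.g. "12, (...)".
LINE_RE = re.compile(
    r"^(\d+)\s*(\S+)\s+\((\S+)\s*(.*?)\)\s+\[(\d+);([^\]]+)\]\s*$"
)


def parse_line(line: str) -> Token:
    """Parse one token line; raise ValueError if it does not fit the pattern."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    index, form, lemma, feats, head, label = match.groups()
    return Token(int(index), form, lemma, feats.split(), int(head), label)


def parse_sentence(lines: List[str]) -> List[Token]:
    """Parse all non-empty lines of one sentence."""
    return [parse_line(line) for line in lines if line.strip()]


sample = [
    "13 I (I PRON PERS ALLVAL SING 1 LSUBJ) [14;VERB-SUBJ]",
    "14 fell (FALL VERB MAIN IND PAST ALLVAL ALLVAL) [0;TOP-VERB]",
    r"12 , (#\, PUNCT) [14;SEPARATOR]",
]
tokens = parse_sentence(sample)
root = next(t for t in tokens if t.head == 0)
```

Here `root.form` is the word attached to head 0, which in the slide's example is the main verb "fell".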

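Results: Evalita 2007
[Table: columns LAS, UAS, LAS2, Participant. Participants: UniTo_Lesmo, UniPi_Attardi, IIIT_Mannem, UniStuttIMS_Schielen, UPenn_Champollion, UniRoma2_Zanzotto. The scores are not preserved in this transcript; the only surviving value is *85.46*, adjacent to UPenn_Champollion]
LAS: Labeled Attachment Score (correct attachment and correct label)
UAS: Unlabeled Attachment Score (correct attachment)
LAS2: Correct Label Score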
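The three scores can be computed by comparing, word by word, the predicted (head, label) pair with the gold one. This is a small sketch of the usual definitions, not the official Evalita scorer; the function name `attachment_scores` is an assumption, and details such as the treatment of punctuation are ignored.

```python
from typing import Dict, List, Tuple

Arc = Tuple[int, str]  # (head index, arc label) for one word


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Dict[str, float]:
    """Return LAS, UAS and LAS2 as fractions of the words in the sentence."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    n = len(gold)
    pairs = list(zip(gold, predicted))
    uas = sum(g[0] == p[0] for g, p in pairs) / n    # head correct
    las2 = sum(g[1] == p[1] for g, p in pairs) / n   # label correct
    las = sum(g == p for g, p in pairs) / n          # head and label correct
    return {"LAS": las, "UAS": uas, "LAS2": las2}


gold = [(7, "VERB-SUBJ"), (2, "DET+DEF-ARG"), (7, "VERB-OBJ"), (0, "TOP-VERB")]
pred = [(7, "VERB-SUBJ"), (3, "DET+DEF-ARG"), (7, "VERB-INDOBJ"), (0, "TOP-VERB")]
scores = attachment_scores(gold, pred)
```

In this toy example one word has the wrong head and another the wrong label, so UAS and LAS2 are both 0.75 while LAS is 0.5.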

Comparison with CoNLL
[Table: LAS and UAS, each reported for CoNLL and for EPT, for UniPi_Attardi, IIIT_Mannem and UniStuttIMS_Schielen. The scores are not preserved in this transcript]
CoNLL: International contest for dependency parsers (multilanguage)

The Turin University Treebank (TUT)
Current size:
- Italian: 2200 sentences, … tokens (4635 traces; 6704 punctuation)
- English: 150 sentences, 4250 tokens (253 traces; 513 punctuation)
English not yet online (under test)

Parts of Speech (and “subtypes”)
1. ADJ (adjectives)
- DEITT (deictic) next
- DEMONS (demonstrative) such, this, that
- EXCLAM (exclamative)
- INDEF (indefinite) numerous, certain, few
- INTERR (interrogative) what, which
- ORDIN (ordinal) first, twentieth, last
- ORDINSUFF (ordinal suffixes) nd, rd, th, st
- POSS (possessive) my, your, their
- QUALIF (qualificative) nice, big, English
2. ADV (adverbs)
- ADFIRM (adfirmative)
- ADVERS (adversative) although, though
- COMPAR (comparative) less, more
- CONCESS (concessive) also
- DOUBT (doubt) perhaps
- EXPLIC (explicative) that_is
- INTERJ (interjections) at_any_rate
- INTERR (interrogative) how, where, when, why
- LIMIT (limit) just, only
- LOC (locative) there, within, below, here
- MANNER (manner) aloud, alright, well
- NEG (negation) not
- QUANT (quantification) little, rather, too
- REASON (motivation) in_fact
- STRENG (strengthening) even, moreover
- SUPERL (superlative) most
- TIME (time) sometime, afterward, already

Parts of Speech (and “subtypes”) 2
3. ART (articles)
- DEF (definite) the
- INDEF (indefinite) a, another
- GENITIVE (genitive) 's
4. CONJ (conjunctions)
- COORD (coordinative) and, but, or, neither, nor
- SUBORD (subordinative) since, that, to, unless
- COMPAR (comparative) than
5. DATE (dates) 08/06/…
6. INTERJ (interjections) alas
7. MARKER (markers)
8. NOUN (nouns)
- COMMON house, boy, chair
- PROPER Mary, Italia, Italy, England
9. NUM (numbers) zero, twenty, 127, …
10. PHRAS (phrasals) yes, no
11. PREDET (predeterminers) all, both
12. PREP (prepositions)
- MONO of, to, from, in
- POLI during, above, under, in front of

Parts of Speech (and “subtypes”) 3
13. PRON (pronouns)
- DEMONS (demonstrative) this, that
- EXCLAM (exclamative) what
- INDEF (indefinite) everything, nobody, something
- INTERR (interrogative) what, who
- LOC (locative) Italian: ne, ci, vi
- ORDIN (ordinals) first, second, fiftieth
- PERS (personal) I, you, we, her
- POSS (possessive) mine, yours
- REFL-IMPERS (reflexive-impersonal) ci, vi, si, se
- RELAT (relative) that, who, which, where
14. PUNCT (punctuation)
15. SPECIAL (special)
16. VERB (verbs)
- MAIN (all standard verbs) go, eat, give, be (in “to be intelligent”)
- AUX (auxiliaries) be (in “to be kissed”)
- MOD (modals) must, can, will

The labelling scheme
[Figure: hierarchy of arc labels. Nodes shown: Top, Dependent, Function, Nofunction, Arg, Modifier, Apposition, Rmod; argument labels adjc-arg, advb-arg, conj-arg, noun-arg, verb-arg; verb arguments verb-subj, verb-obj, verb-indobj, verb-indcompl, verb-predcompl]

The NOFUNCTION labels
Nofunction
- Aux: Aux+passive, Aux+tense, Aux+progressive
- Contin: Contin+denom, Contin+locut, Contin+prep
- Coordinator: Coordantec, Coord, Coord2nd
- Emptycompl
- Interjection
- Separator
- Visitor
- Verb-expletive

Some examples
Auxiliaries
- Aux+progressive: I am looking for …
- Aux+tense: … the debate has - to quite some extent - suffered from …
- Aux+passive: … whose historical experience is not marked by …
Continuations
- Contin+locut: … convinced of the feasibility … in order to reinforce …
- Contin+prep: … grown out of the millenniums …
- Contin+denom: Samuel Alexander asserted …

Visitors (and traces)
Example: The question of what we might consider to be an adequate …
[Figure: dependency tree of the example, containing several traces and one visitor arc. Arc labels shown: prep-rmod, prep-arg, det+def-arg, verb-subj, verb-obj, verb-predcompl+obj, verb-rmod+relcl, verb+modal-indcompl, visitor]

Coordination
- base: … is tautologous and without ontologic commitment … (coord+base, coord2nd+base)
- compar: … were more like mythical heroes than like the omnipotent God … (coord+compar, coord2nd+compar, coordantec+compar)
- correlat: … neither John nor his friends … (coord2nd+correlat, coord+correlat, coordantec+correlat)
… and “word” traces
- compar: … Samuel asserted that mentality emerged … and then t asserted t Samuel that … (coord+base, coord2nd+base)

The AnnCorra scheme
- It is chunk-based (some elementary subtrees are left unanalysed)
- It involves 28 relations (arc labels) and 25 different POS (table below)
- There are some non-dependency labels (e.g. ccof for coordination)
- Some POS are merged (e.g. Demonstratives include both Adj and Pron)

Mapping category labels
Category          AnnCorra   TUT
Common Noun       NN         NOUN (common)
Proper Noun       NNP        NOUN (proper)
Location, Time    NST        ADV (time), ADV (loc)
Pronoun           PRP        PRON except the ones in Demonstrative and Question
Adjective         JJ         ADJ except the ones in Demonstrative and Question
Adverb            RB         ADV (with some exceptions)
Demonstrative     DEM        PRON (demons), ADJ (demons)
Question Words    WQ         ADJ (interr), ADV (interr), PRON RELAT????, PRON (interr)
Main verb         VM         VERB (main)
Verb Aux          VAUX       VERB (aux), VERB (mod)
Post position     PSP        PREP
Particles         RP         None
Conjuncts         CC         CONJ
Quantifiers       QF         DET, PREDET
Cardinal numb     QC         NUM
Ordinal numb      QO         ADJ (ordin), PRON (ordin)
Classifier        CL         None
Intensifier       INTF       ADV (quant)
Interjection      INJ        INTERJ
Negation          NEG        ADV (neg)
Quotative         UT         None
Sym               SYM        SPECIAL or PUNCT
Compounds         *C         None
Reduplicative     RDP        None
Echo              ECH        None
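Read from the TUT side, the category mapping amounts to a lookup on (category, subtype) with a fallback on the category alone. The sketch below encodes only what the slide lists; the function name `tut_to_anncorra` is hypothetical, relative pronouns are left unmapped because the slide marks them with "????", and the unspecified adverb exceptions are not modelled.

```python
from typing import Optional

# Subtype-specific rows of the table take precedence over category-level rows.
SUBTYPE_TO_ANNCORRA = {
    ("NOUN", "COMMON"): "NN", ("NOUN", "PROPER"): "NNP",
    ("ADV", "TIME"): "NST", ("ADV", "LOC"): "NST",
    ("PRON", "DEMONS"): "DEM", ("ADJ", "DEMONS"): "DEM",
    ("ADJ", "INTERR"): "WQ", ("ADV", "INTERR"): "WQ", ("PRON", "INTERR"): "WQ",
    ("VERB", "MAIN"): "VM", ("VERB", "AUX"): "VAUX", ("VERB", "MOD"): "VAUX",
    ("ADJ", "ORDIN"): "QO", ("PRON", "ORDIN"): "QO",
    ("ADV", "QUANT"): "INTF", ("ADV", "NEG"): "NEG",
}

# Category-level rows, used when no subtype-specific row applies.
CATEGORY_TO_ANNCORRA = {
    "PRON": "PRP", "ADJ": "JJ", "ADV": "RB", "PREP": "PSP", "CONJ": "CC",
    "DET": "QF", "PREDET": "QF", "NUM": "QC", "INTERJ": "INJ",
    "SPECIAL": "SYM", "PUNCT": "SYM",
}

# Marked "????" on the slide: deliberately left without a target tag.
UNRESOLVED = {("PRON", "RELAT")}


def tut_to_anncorra(category: str, subtype: Optional[str] = None) -> Optional[str]:
    """Return the AnnCorra POS tag for a TUT category/subtype, or None."""
    key = (category.upper(), subtype.upper() if subtype else None)
    if key in UNRESOLVED:
        return None
    if key in SUBTYPE_TO_ANNCORRA:
        return SUBTYPE_TO_ANNCORRA[key]
    return CATEGORY_TO_ANNCORRA.get(key[0])
```

For instance, `tut_to_anncorra("PRON", "PERS")` falls back to the category row and yields `"PRP"`, while `tut_to_anncorra("PRON", "DEMONS")` yields `"DEM"`.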

Mapping arc labels
The argument (karaka) labels
- k1 (karta): the primary (or “most independent”) participant in the action (similar to agent) → VERB-SUBJ
- k2 (karma): the secondary participant (often, the patient) → VERB-OBJ
- k3 (karana): the instrument → VERB-INDCOMPL-MEANSMANNER
- k4 (sampradana): the recipient or beneficiary of an action → VERB-INDOBJ
- k5 (apadana): the stationary element in a separation → ????
- k7 (adhikarana): the locus (spatial, temporal or abstract) of karta or karma. It is tagged as k7p, k7t or k7 depending on the type of location → VERB-INDCOMPL-LOC
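The karaka-to-TUT correspondences listed on this slide can be written down as a small lookup table. This is only an illustration of the slide's content: `karaka_to_tut` is a hypothetical helper, k5 is mapped to `None` because the slide leaves it open ("????"), and treating k7p and k7t exactly like k7 is an assumption.

```python
from typing import Optional

KARAKA_TO_TUT = {
    "k1": "VERB-SUBJ",                   # karta
    "k2": "VERB-OBJ",                    # karma
    "k3": "VERB-INDCOMPL-MEANSMANNER",   # karana
    "k4": "VERB-INDOBJ",                 # sampradana
    "k5": None,                          # apadana: no TUT label given on the slide
    "k7": "VERB-INDCOMPL-LOC",           # adhikarana
}


def karaka_to_tut(tag: str) -> Optional[str]:
    """Map an AnnCorra karaka tag to a TUT arc label (None if unresolved)."""
    tag = tag.lower()
    if tag in ("k7p", "k7t"):   # assumed to share the k7 mapping
        tag = "k7"
    if tag not in KARAKA_TO_TUT:
        raise KeyError(f"karaka tag not covered by the slide: {tag}")
    return KARAKA_TO_TUT[tag]
```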

Mapping the structure
Chunk-based structure of AnnCorra
[Figure: TUT tree over the words I, must, read, the, book and a trace t (arcs: verb+modal-indcompl, verb-subj, verb-obj, det+def-arg), compared with the AnnCorra chunk structure: (must read) with I as k1 and (the book) as k2]

Current activities and the future
A word about semantics: DTS
[Figure: dependency tree over the words two, students, three, difficult, theorems (arcs: verb-subj, verb-obj, det+quantif-arg, det+indef-arg, adjc+qualif-rmod) and the corresponding DTS graph with predicates study', student', theorem', difficult', plus quant(x), quant(y), restr(x) and restr(y) annotations]

Disambiguation: Semdep arcs
[Figures: three DTS graphs over study', student', theorem', difficult' and a context node CTX, one per reading]
Reading 1: 2x [ student'(x) ∧ 3y [ theorem'(y) ∧ study'(x,y) ]]
Reading 2: 3y [ theorem'(y) ∧ 2x [ student'(x) ∧ study'(x,y) ]]
Any more readings? ??? Branching Quantification (Independent Set)

Current activities and the future
- Practical semantic interpretation based on ontological knowledge for DB access
- Extension of the treebank with semantic annotation (in cooperation with Johan Bos)
- Development of a graphical interface with an online server (Java implementation and socket-based connection with a Lisp server)
- Automatic analysis of legal texts for extracting information about rule amendments (date, modified text, new text)

The future (last but not least)
- Morphological analysis of Hindi (mid-way)
- Development and testing of a Hindi parser and of mapping rules from Hindi to English and vice versa
In cooperation with IIIT Hyderabad

More on Parsing 1
Function:
[Figure: a head word HEAD = w_i within the sequence w_1, w_2, …, w_i-1, w_i, w_i+1, w_i+2, …, w_n, with each candidate dependency arc marked “?”]
Structure:
(head-category head-subcategory (dependent-position (dependent-category (dependent-constraints))) ARC-LABEL)

More on Parsing 2
Examples:
(ART DEF (before (PREDET (agree))) PDETMOD)
- tutti (cat=PREDET, gender=m, number=pl) “all”
- i (cat=ART, subcat=DEF, gender=m, number=pl) “the”
- resulting arc: PDETMOD
(NOUN COMMON (chunk-follows (ADJ (agree) (subcat qualif))) ADJCMOD-QUALIF)
- molto (cat=ADV) “very”
- bello (cat=ADJ, subcat=QUALIF, gender=m, number=sing) “nice”
- giardino (cat=NOUN, gender=m, number=sing) “garden”
- resulting arc: ADJCMOD-QUALIF
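To make the first rule concrete, here is a rough sketch of how a rule of this shape might be checked against a head and a candidate dependent. It is not the Turin parser's code (which the slides describe as Lisp-based): the classes `Word` and `Rule`, the function `apply_rule`, and the reading of `before` as "the dependent precedes the head" and of `agree` as "same gender and number" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Word:
    form: str
    position: int            # linear position in the sentence
    feats: Dict[str, str]    # e.g. {"cat": "ART", "subcat": "DEF", ...}


@dataclass
class Rule:
    head_cat: str
    head_subcat: str
    dep_position: str        # only "before" is handled in this sketch
    dep_cat: str
    agree: bool
    label: str


def agrees(a: Word, b: Word) -> bool:
    """Assumed agreement test: identical gender and number values."""
    return all(a.feats.get(k) == b.feats.get(k) for k in ("gender", "number"))


def apply_rule(rule: Rule, head: Word, dep: Word) -> Optional[str]:
    """Return the arc label if the rule licenses dep as a dependent of head."""
    if head.feats.get("cat") != rule.head_cat:
        return None
    if head.feats.get("subcat") != rule.head_subcat:
        return None
    if dep.feats.get("cat") != rule.dep_cat:
        return None
    if rule.dep_position == "before" and not dep.position < head.position:
        return None
    if rule.agree and not agrees(head, dep):
        return None
    return rule.label


# (ART DEF (before (PREDET (agree))) PDETMOD)
predet_rule = Rule("ART", "DEF", "before", "PREDET", True, "PDETMOD")
tutti = Word("tutti", 0, {"cat": "PREDET", "gender": "m", "number": "pl"})
i_art = Word("i", 1, {"cat": "ART", "subcat": "DEF", "gender": "m", "number": "pl"})
label = apply_rule(predet_rule, head=i_art, dep=tutti)
```

With the feature values from the slide, `label` is `"PDETMOD"`; changing the number of either word makes the agreement test fail and the rule returns `None`.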

More on Parsing 3
Verb subcategorization classes:
[Figure: hierarchy of subcategorization classes (verbs, nosubj-verbs, subj-verbs, obj-verbs, indobj-verbs, trans, trans-indobj, basic-trans, empty-modal, modal, ssubj-inf-verbs) linked to dictionary entries: bisognare (need), camminare (walk), dovere (must), potere (can)]

More on Parsing 4
Transformations: basic class (e.g. trans) → transformed classes (e.g. trans, trans+passivization, trans+infinitivization, trans+prodrop, trans+passivization+infinitivization, …)
Example transformation:
(infinitivization replacing (subj-verbs) (is-inf-form tr-verb v-casefr) (cancel-case s-subj))
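The effect of the example transformation can be pictured as deriving a new class whose case frame lacks the surface subject. The following is a simplified sketch, not the parser's actual mechanism: `VerbClass` and `infinitivization` are illustrative names, the case `s-obj` and the case frame assigned to `trans` are assumptions, and the `is-inf-form` condition on the verb is not modelled.

```python
from typing import FrozenSet, NamedTuple, Optional


class VerbClass(NamedTuple):
    name: str
    cases: FrozenSet[str]   # surface cases in the class's case frame


def infinitivization(verb_class: VerbClass) -> Optional[VerbClass]:
    """Derive the infinitival variant of a class by cancelling s-subj.

    Returns None when the class has no surface subject, mirroring the
    restriction of the transformation to subj-verbs.
    """
    if "s-subj" not in verb_class.cases:
        return None
    return VerbClass(
        name=verb_class.name + "+infinitivization",
        cases=verb_class.cases - {"s-subj"},
    )


# Assumed case frame for the basic class "trans": a subject and an object.
trans = VerbClass("trans", frozenset({"s-subj", "s-obj"}))
trans_inf = infinitivization(trans)
```

Applied to `trans`, the sketch yields a class named `trans+infinitivization` whose only remaining case is the assumed `s-obj`.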