CGMIL Hyderabad - India
An Italian-English dependency parser and its [possible] application to Hindi
Leonardo Lesmo
Natural Language Processing Group (Dip. Informatica – Univ. Torino)
OUTLINE
§ The Turin University Parser
§ Performance
§ The Turin University Treebank (TUT)
§ Mapping between TUT and AnnCorra
§ Current activities and the future
THE PARSER
[Figure: parser architecture. Boxes in the diagram: Dictionary, Morphology, Lexical access, POS tagging, Tagging rules, Chunking, Chunking rules, Analysis of Conjunctions, Verbal Attachment, Verbal subcategories, Verbal frames, Segmentation, Post-processing]
AN EXAMPLE
Input: When the man that you mentioned sent me that beautiful message, I fell in love with him
Chunking: When [the man] that you mentioned sent me [that beautiful message], I fell [in love] [with him]
Segmentation: {{When [the man] {that you mentioned} sent me [that beautiful message]}, I fell [in love] [with him] }
Case framing: produces the final dependency structure
THE FINAL RESULT
[Figure: dependency tree of the example sentence, rooted in "to fall". Arc labels shown: verb-subj, verb-obj, verb-indobj, verb+fin-rmod-time, verb-rmod+relcl, verb-indcomp*locut, conj-arg, prep-arg, det+def-arg, adjc+qualif-rmod, rmod]
THE ACTUAL FORMAT
1 When (WHEN CONJ SUBORD TIME) [7;VERB+FIN-RMOD-TIME]
2 the (THE ART DEF ALLVAL ALLVAL) [7;VERB-SUBJ]
3 man (MAN NOUN COMMON M SING) [2;DET+DEF-ARG]
4 that (THAT PRON RELAT ALLVAL ALLVAL LSUBJ+OBL) [6;VERB-OBJ]
5 you (YOU PRON PERS ALLVAL ALLVAL 2 LSUBJ+LOBJ+LIOBJ+OBL) [6;VERB-SUBJ]
6 mentioned (MENTION VERB MAIN IND PAST ALLVAL ALLVAL) [3;VERB-RMOD-RELCL]
7 sent (SEND VERB MAIN IND PAST ALLVAL ALLVAL) [1;CONJ-ARG]
8 me (I PRON PERS ALLVAL SING 1 LOBJ+LIOBJ+OBL) [7;VERB-INDCOMPL-THEME]
9 that (THAT ADJ DEMONS ALLVAL SING) [7;VERB-OBJ]
10 beautiful (BEAUTIFUL ADJ QUALIF ALLVAL ALLVAL) [11;ADJC+QUALIF-RMOD]
11 message (MESSAGE NOUN COMMON N SING) [9;DET+DEF-ARG]
12 , (#\, PUNCT) [14;SEPARATOR]
13 I (I PRON PERS ALLVAL SING 1 LSUBJ) [14;VERB-SUBJ]
14 fell (FALL VERB MAIN IND PAST ALLVAL ALLVAL) [0;TOP-VERB]
15 in (IN PREP MONO) [14;PREP-RMOD]
16 love (LOVE NOUN COMMON N SING) [15;PREP-ARG]
17 with (WITH PREP MONO) [14;PREP-RMOD]
18 him (HE PRON PERS M SING 3 LOBJ+LIOBJ+OBL) [17;PREP-ARG]
19 . (#\. PUNCT) [14;END]
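The line format above can be read mechanically. What follows is a minimal Python sketch, assuming every line has exactly the shape shown on the slide: token index, word form, a parenthesised lemma followed by morphological features, and a bracketed `head;label` pair. The names `Token`, `LINE_RE` and `parse_line` are illustrative and are not part of the Turin parser.

```python
import re
from typing import List, NamedTuple


class Token(NamedTuple):
    index: int           # position of the word in the sentence (1-based)
    form: str            # surface form
    lemma: str           # first item inside the parentheses
    features: List[str]  # remaining items inside the parentheses
    head: int            # index of the governing word (0 for the root)
    label: str           # arc label


# Assumed shape: INDEX FORM (LEMMA FEATURES...) [HEAD;LABEL]
LINE_RE = re.compile(
    r"^(\d+)\s*(\S+)\s+\((\S+)\s*(.*?)\)\s+\[(\d+);([^\]]+)\]$"
)


def parse_line(line: str) -> Token:
    """Parse one line of the format shown on the slide."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognised line: {line!r}")
    index, form, lemma, feats, head, label = match.groups()
    return Token(int(index), form, lemma, feats.split(), int(head), label)


SAMPLE = [
    r"12, (#\, PUNCT) [14;SEPARATOR]",
    "13 I (I PRON PERS ALLVAL SING 1 LSUBJ) [14;VERB-SUBJ]",
    "14 fell (FALL VERB MAIN IND PAST ALLVAL ALLVAL) [0;TOP-VERB]",
    "15 in (IN PREP MONO) [14;PREP-RMOD]",
]

tokens = [parse_line(line) for line in SAMPLE]
root = next(t for t in tokens if t.head == 0)
dependents = [t.form for t in tokens if t.head == root.index]
```

In this fragment `root` is the token for "fell", and `dependents` collects the words attached to it.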
Results: Evalita 2007
[Table: columns LAS, UAS, LAS2, Participant; rows UniTo_Lesmo, UniPi_Attardi, IIIT_Mannem, UniStuttIMS_Schielen, UPenn_Champollion, UniRoma2_Zanzotto; only the value *85.46* is legible]
LAS: Labeled Attachment Score
UAS: Unlabeled Attachment Score (correct attachment)
LAS2: Correct Label Score
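The three scores can be illustrated with a short computation. This is a sketch under stated assumptions, not the official Evalita scorer: every token counts equally (a real evaluation may, for instance, treat punctuation differently), and each token is given as a `(head, label)` pair. The function name `attachment_scores` is hypothetical.

```python
from typing import Dict, List, Tuple

Arc = Tuple[int, str]  # (head index, arc label) for one token


def attachment_scores(gold: List[Arc], predicted: List[Arc]) -> Dict[str, float]:
    """Compute LAS, UAS and LAS2 over two aligned token sequences."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and aligned")
    n = len(gold)
    # UAS: the head is correct, whatever the label.
    uas = sum(g[0] == p[0] for g, p in zip(gold, predicted)) / n
    # LAS2: the label is correct, whatever the head.
    las2 = sum(g[1] == p[1] for g, p in zip(gold, predicted)) / n
    # LAS: both head and label are correct.
    las = sum(g == p for g, p in zip(gold, predicted)) / n
    return {"LAS": las, "UAS": uas, "LAS2": las2}


gold = [(7, "VERB-SUBJ"), (2, "DET+DEF-ARG"), (0, "TOP-VERB"), (3, "PREP-RMOD")]
pred = [(7, "VERB-SUBJ"), (3, "DET+DEF-ARG"), (0, "TOP-VERB"), (3, "PREP-ARG")]
scores = attachment_scores(gold, pred)
# scores == {"LAS": 0.5, "UAS": 0.75, "LAS2": 0.75}
```

The second token has the right label on the wrong head and the fourth has the right head with the wrong label, so each counts for only one of UAS and LAS2 and for neither in LAS.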
Comparison with CoNLL
[Table: columns LAS (CoNLL, EPT) and UAS (CoNLL, EPT); rows UniPi_Attardi, IIIT_Mannem, UniStuttIMS_Schielen]
CoNLL: international contest for dependency parsers (multilingual)
The Turin University Treebank (TUT)
Current size:
- Italian: 2200 sentences, […] tokens (4635 traces; 6704 punctuation)
- English: 150 sentences, 4250 tokens (253 traces; 513 punctuation)
English not yet online (under test)
Parts of Speech (and “subtypes”)
1. ADJ (adjectives)
- DEITT (deictic) next
- DEMONS (demonstrative) such, this, that
- EXCLAM (exclamative)
- INDEF (indefinite) numerous, certain, few
- INTERR (interrogative) what, which
- ORDIN (ordinal) first, twentieth, last
- ORDINSUFF (ordinal suffixes) nd, rd, th, st
- POSS (possessive) my, your, their
- QUALIF (qualificative) nice, big, English
2. ADV (adverbs)
- ADFIRM (affirmative)
- ADVERS (adversative) although, though
- COMPAR (comparative) less, more
- CONCESS (concessive) also
- DOUBT (doubt) perhaps
- EXPLIC (explicative) that_is
- INTERJ (interjections) at_any_rate
- INTERR (interrogative) how, where, when, why
- LIMIT (limit) just, only
- LOC (locative) there, within, below, here
- MANNER (manner) aloud, alright, well
- NEG (negation) not
- QUANT (quantification) little, rather, too
- REASON (motivation) in_fact
- STRENG (strengthening) even, moreover
- SUPERL (superlative) most
- TIME (time) sometime, afterward, already
Parts of Speech (and “subtypes”) 2
3. ART (articles)
- DEF (definite) the
- INDEF (indefinite) a, another
- GENITIVE (genitive) 's
4. CONJ (conjunctions)
- COORD (coordinative) and, but, or, neither, nor
- SUBORD (subordinative) since, that, to, unless
- COMPAR (comparative) than
5. DATE (dates) 08/06/
6. INTERJ (interjections) alas
7. MARKER (markers)
8. NOUN (nouns)
- COMMON house, boy, chair
- PROPER Mary, Italia, Italy, England
9. NUM (numbers) zero, twenty, 127
10. PHRAS (phrasals) yes, no
11. PREDET (predeterminers) all, both
12. PREP (prepositions)
- MONO of, to, from, in
- POLI during, above, under, in front of
Parts of Speech (and “subtypes”) 3
13. PRON (pronouns)
- DEMONS (demonstrative) this, that
- EXCLAM (exclamative) what
- INDEF (indefinite) everything, nobody, something
- INTERR (interrogative) what, who
- LOC (locative) Italian: ne, ci, vi
- ORDIN (ordinals) first, second, fiftieth
- PERS (personal) I, you, we, her
- POSS (possessive) mine, yours
- REFL-IMPERS (reflexive-impersonal) ci, vi, si, se
- RELAT (relative) that, who, which, where
14. PUNCT (punctuation)
15. SPECIAL (special)
16. VERB (verbs)
- MAIN (all standard verbs) go, eat, give, be (in “to be intelligent”)
- AUX (auxiliaries) be (in “to be kissed”)
- MOD (modals) must, can, will
The labelling scheme
[Figure: label hierarchy. Nodes: Top, Dependent, Function, Nofunction, Arg, Modifier, Apposition, Rmod. Argument labels: adjc-arg, advb-arg, conj-arg, noun-arg, verb-arg. Verbal argument labels: verb-subj, verb-obj, verb-indobj, verb-indcompl, verb-predcompl]
The NOFUNCTION labels
Nofunction:
- Aux: Aux+passive, Aux+tense, Aux+progressive
- Contin: Contin+denom, Contin+locut, Contin+prep
- Coordinator: Coordantec, Coord, Coord2nd
- Emptycompl
- Interjection
- Separator
- Visitor
- Verb-expletive
Some examples
Auxiliaries
- Aux+progressive: I am looking for …
- Aux+tense: … the debate has - to quite some extent - suffered from …
- Aux+passive: … whose historical experience is not marked by …
Continuations
- Contin+locut: … convinced of the feasibility … in order to reinforce …
- Contin+prep: … grown out of the millenniums …
- Contin+denom: Samuel Alexander asserted …
Visitors (and traces)
Example: The question of what we might consider to be an adequate …
[Figure: dependency tree of the example, with traces and a visitor arc. Arc labels shown: prep-rmod, prep-arg, det+def-arg, verb-subj, verb-obj, verb-predcompl+obj, verb-rmod+relcl, verb+modal-indcompl, visitor]
Coordination
- base: … is tautologous and without ontologic commitment … (labels: coord+base, coord2nd+base)
- compar: … were more like mythical heroes than like the omnipotent God … (labels: coordantec+compar, coord+compar, coord2nd+compar)
- correlat: … neither John nor his friends … (labels: coordantec+correlat, coord+correlat, coord2nd+correlat)
… and “word” traces
- compar: … Samuel asserted that mentality emerged … and then t asserted t Samuel that … (labels: coord+base, coord2nd+base)
The AnnCorra scheme
- It is chunk-based (some elementary subtrees are left unanalysed)
- It involves 28 relations (arc labels) and 25 different POS tags (table below)
- There are some non-dependency labels (e.g. ccof for coordination)
- Some POS are merged (e.g. Demonstratives include both Adj and Pron)
Mapping category labels
AnnCorra category   AnnCorra tag   TUT
Common Noun         NN             NOUN (common)
Proper Noun         NNP            NOUN (proper)
Location, Time      NST            ADV (time), ADV (loc)
Pronoun             PRP            PRON except the ones in Demonstrative and Question
Adjective           JJ             ADJ except the ones in Demonstrative and Question
Adverb              RB             ADV (with some exceptions)
Demonstrative       DEM            PRON (demons), ADJ (demons)
Question Words      WQ             ADJ (interr), ADV (interr), PRON RELAT????, PRON (interr)
Main verb           VM             VERB (main)
Verb Aux            VAUX           VERB (aux), VERB (mod)
Post position       PSP            PREP
Particles           RP             None
Conjuncts           CC             CONJ
Quantifiers         QF             DET, PREDET
Cardinal numb       QC             NUM
Ordinal numb        QO             ADJ (ordin), PRON (ordin)
Classifier          CL             None
Intensifier         INTF           ADV (quant)
Interjection        INJ            INTERJ
Negation            NEG            ADV (neg)
Quotative           UT             None
Sym                 SYM            SPECIAL or PUNCT
Compounds           *C             None
Reduplicative       RDP            None
Echo                ECH            None
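Read from the TUT side, the table amounts to a lookup from a TUT category and subtype to an AnnCorra tag. The Python sketch below is one possible encoding, not the authors' mapping rules. It assumes that a subtype-specific row takes precedence over a category-wide one, leaves the row the slide marks with "????" (PRON RELAT) unresolved, and ignores the unspecified "exceptions" for adverbs. The names `SUBTYPE_MAP`, `CATEGORY_MAP` and `tut_to_anncorra` are hypothetical.

```python
from typing import Optional

# Rows of the table that depend on the TUT subtype.
SUBTYPE_MAP = {
    ("NOUN", "COMMON"): "NN", ("NOUN", "PROPER"): "NNP",
    ("ADV", "TIME"): "NST", ("ADV", "LOC"): "NST",
    ("PRON", "DEMONS"): "DEM", ("ADJ", "DEMONS"): "DEM",
    ("ADJ", "INTERR"): "WQ", ("ADV", "INTERR"): "WQ", ("PRON", "INTERR"): "WQ",
    ("VERB", "MAIN"): "VM", ("VERB", "AUX"): "VAUX", ("VERB", "MOD"): "VAUX",
    ("ADJ", "ORDIN"): "QO", ("PRON", "ORDIN"): "QO",
    ("ADV", "QUANT"): "INTF", ("ADV", "NEG"): "NEG",
}

# Rows that apply to a whole TUT category (used when no subtype row matches).
CATEGORY_MAP = {
    "PRON": "PRP", "ADJ": "JJ", "ADV": "RB", "PREP": "PSP", "CONJ": "CC",
    "DET": "QF", "PREDET": "QF", "NUM": "QC", "INTERJ": "INJ",
    "SPECIAL": "SYM", "PUNCT": "SYM",
}

# Marked "????" on the slide: deliberately left without a tag here.
UNRESOLVED = {("PRON", "RELAT")}


def tut_to_anncorra(category: str, subtype: Optional[str] = None) -> Optional[str]:
    """Return the AnnCorra tag for a TUT category/subtype, or None if unmapped."""
    key = (category.upper(), subtype.upper() if subtype else None)
    if key in UNRESOLVED:
        return None
    if key in SUBTYPE_MAP:
        return SUBTYPE_MAP[key]
    return CATEGORY_MAP.get(key[0])


examples = [
    tut_to_anncorra("NOUN", "COMMON"),  # "NN"
    tut_to_anncorra("PRON", "PERS"),    # "PRP" (category-wide row)
    tut_to_anncorra("PRON", "RELAT"),   # None (open question on the slide)
]
```

TUT categories that have no row in the table, such as ART, also come back as `None`, which makes the gaps in the mapping easy to list.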
Mapping arc labels
The argument (karaka) labels, each followed by the corresponding TUT label:
§ k1 (karta): the primary (or “most independent”) participant in the action (similar to agent). TUT: VERB-SUBJ
§ k2 (karma): the secondary participant (often, the patient). TUT: VERB-OBJ
§ k3 (karana): the instrument. TUT: VERB-INDCOMPL-MEANSMANNER
§ k4 (sampradana): the recipient or beneficiary of an action. TUT: VERB-INDOBJ
§ k5 (apadana): the stationary element in a separation. TUT: ????
§ k7 (adhikarana): the locus (spatial, temporal or abstract) of karta or karma; tagged as k7p, k7t or k7 depending on the type of location. TUT: VERB-INDCOMPL-LOC
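The correspondences listed above can be written down as a small table in code. This is only a sketch of the AnnCorra-to-TUT direction as stated on the slide. The reverse direction is not one-to-one, since k7, k7p and k7t all correspond to VERB-INDCOMPL-LOC. The entry for k5 is kept as `None` because the slide leaves it open. The names `KARAKA_TO_TUT` and `karaka_to_tut` are hypothetical.

```python
from typing import Optional

KARAKA_TO_TUT = {
    "k1": "VERB-SUBJ",                  # karta
    "k2": "VERB-OBJ",                   # karma
    "k3": "VERB-INDCOMPL-MEANSMANNER",  # karana
    "k4": "VERB-INDOBJ",                # sampradana
    "k5": None,                         # apadana: "????" on the slide
    "k7": "VERB-INDCOMPL-LOC",          # adhikarana
    "k7p": "VERB-INDCOMPL-LOC",         # adhikarana, location subtype
    "k7t": "VERB-INDCOMPL-LOC",         # adhikarana, location subtype
}


def karaka_to_tut(label: str) -> Optional[str]:
    """Return the TUT label for a karaka label; raise KeyError if not listed."""
    return KARAKA_TO_TUT[label.lower()]


mapped = [karaka_to_tut(k) for k in ("k1", "k2", "k7t")]
```

Raising `KeyError` for labels that are not on the slide, rather than returning `None`, keeps "not yet decided" (k5) distinct from "not covered".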
Mapping the structure
Chunk-based structure of AnnCorra
[Figure: TUT tree over the words I, must, read, the, book and a trace t (arcs: verb+modal-indcompl, verb-subj, verb-obj, det+def-arg), next to the AnnCorra chunk structure: (must read), with I as k1 and (the book) as k2]
Current activities and the future
A word about semantics: DTS
[Figure: dependency tree over the words students, heard, theorems, two, three, difficult (arcs: verb-subj, verb-obj, det+quantif-arg, det+indef-arg, adjc+qualif-rmod) and the corresponding DTS graph over the predicates study', student', theorem', difficult', annotated with quant(x), quant(y), restr(x) and restr(y)]
Disambiguation: Semdep arcs
[Figure: DTS graphs over study', student', theorem', difficult' with a CTX node, one graph per reading]
Reading 1: $\exists_{2}x\,[\mathit{student}'(x) \land \exists_{3}y\,[\mathit{theorem}'(y) \land \mathit{study}'(x,y)]]$
Reading 2: $\exists_{3}y\,[\mathit{theorem}'(y) \land \exists_{2}x\,[\mathit{student}'(x) \land \mathit{study}'(x,y)]]$
Any more readings? Branching Quantification (Independent Set)
Current activities and the future
- Practical semantic interpretation based on ontological knowledge for DB access
- Extension of the treebank with semantic annotation (in cooperation with Johan Bos)
- Development of a graphical interface with an online server (Java implementation and socket-based connection with a Lisp server)
- Automatic analysis of legal texts for extracting information about rule amendments (date, modified text, new text)
The future (last but not least)
- Morphological analysis of Hindi (mid-way)
- Development and testing of a Hindi parser and of mapping rules from Hindi to English and vice versa
In cooperation with IIIT Hyderabad
More on Parsing 1
Function:
[Figure: word sequence w1 w2 … w(i-1) w(i) w(i+1) w(i+2) … wn, with HEAD = w(i) and question marks on the surrounding words]
Structure:
(head-category head-subcategory (dependent-position (dependent-category (dependent-constraints))) ARC-LABEL)
More on Parsing 2
Examples:
(ART DEF (before (PREDET (agree))) PDETMOD)
- i (cat=ART, subcat=DEF, gender=m, number=pl): the
- tutti (cat=PREDET, gender=m, number=pl): all
- resulting arc: PDETMOD
(NOUN COMMON (chunk-follows (ADJ (agree) (subcat qualif))) ADJCMOD-QUALIF)
- bello (cat=ADJ, subcat=QUALIF, gender=m, number=sing): nice
- giardino (cat=NOUN, gender=m, number=sing): garden
- molto (cat=ADV): very
- resulting arc: ADJCMOD-QUALIF
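The first rule above can be sketched in Python to show how such a rule might be checked against a word sequence. This is an illustration under assumptions, not the parser's rule interpreter: `before` is taken to mean "immediately preceding the head" and `agree` to mean "same gender and number". The names `Word`, `agree` and `apply_before_rule` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    form: str
    cat: str
    subcat: Optional[str] = None
    gender: Optional[str] = None
    number: Optional[str] = None


def agree(a: Word, b: Word) -> bool:
    """Assumed reading of the (agree) constraint: same gender and number."""
    return a.gender == b.gender and a.number == b.number


def apply_before_rule(
    words: List[Word], head_cat: str, head_subcat: str, dep_cat: str, label: str
) -> List[Tuple[int, int, str]]:
    """Return (head index, dependent index, label) for each head that is
    immediately preceded by an agreeing word of the dependent category."""
    arcs = []
    for i in range(1, len(words)):
        head, dep = words[i], words[i - 1]
        if (
            head.cat == head_cat
            and head.subcat == head_subcat
            and dep.cat == dep_cat
            and agree(head, dep)
        ):
            arcs.append((i, i - 1, label))
    return arcs


# "tutti i": the predeterminer precedes the definite article and agrees with it.
phrase = [
    Word("tutti", "PREDET", gender="m", number="pl"),
    Word("i", "ART", "DEF", "m", "pl"),
]
arcs = apply_before_rule(phrase, "ART", "DEF", "PREDET", "PDETMOD")
# arcs == [(1, 0, "PDETMOD")]
```

The `chunk-follows` position of the second rule would need a chunk-level comparison instead of adjacent words, which this sketch does not attempt.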
More on Parsing 3
Verb subcategorization classes:
[Figure: hierarchy of subcategorization classes (verbs, nosubj-verbs, subj-verbs, obj-verbs, indobj-verbs, trans, trans-indobj, basic-trans, modal, empty-modal, ssubj-inf-verbs), linked to dictionary entries: bisognare (need), camminare (walk), dovere (must), potere (can)]
More on Parsing 4
Transformations:
basic class (e.g. trans) → transformed classes (e.g. trans, trans+passivization, trans+infinitivization, trans+prodrop, trans+passivization+infinitivization, …)
Example transformation:
(infinitivization replacing (subj-verbs) (is-inf-form tr-verb v-casefr) (cancel-case s-subj))
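The example transformation can be paraphrased in Python as a sketch of the idea. It is not the parser's Lisp implementation. It assumes that a case frame is an ordered collection of case names, that the rule applies to any verb whose class descends from subj-verbs, and that `is-inf-form` simply tests whether the verb is an infinitive. The case name `s-obj` and the function names are hypothetical.

```python
from typing import Iterable, Tuple

CaseFrame = Tuple[str, ...]


def cancel_case(frame: CaseFrame, case: str) -> CaseFrame:
    """Return a copy of the case frame without the given case."""
    return tuple(c for c in frame if c != case)


def infinitivization(
    verb_classes: Iterable[str], is_infinitive: bool, frame: CaseFrame
) -> CaseFrame:
    """Mirror of the slide's rule: for subj-verbs in the infinitive,
    the subject case is cancelled; otherwise the frame is unchanged."""
    if "subj-verbs" in set(verb_classes) and is_infinitive:
        return cancel_case(frame, "s-subj")
    return frame


# A transitive verb inherits from subj-verbs; its basic frame is assumed
# to be (s-subj, s-obj) for the purpose of this example.
basic_trans = ("s-subj", "s-obj")
infinitive_frame = infinitivization(["trans", "subj-verbs"], True, basic_trans)
finite_frame = infinitivization(["trans", "subj-verbs"], False, basic_trans)
# infinitive_frame == ("s-obj",); finite_frame == ("s-subj", "s-obj")
```

Composite classes such as trans+passivization+infinitivization would correspond to applying several such functions in sequence, each taking the frame produced by the previous one.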