1 CGMIL 2008 - Hyderabad - India An Italian-English dependency parser and its [possible] application to Hindi Leonardo Lesmo (lesmo@di.unito.it) Natural Language Processing Group (Dip. Informatica – Univ. Torino) (http://www.di.unito.it/gull)

2 OUTLINE
- The Turin University Parser
- Performance
- The Turin University Treebank (TUT)
- Mapping between TUT and AnnCorra
- Current activities and the future

3 THE PARSER
[Figure: architecture of the parser. Labels shown: Lexical access, Morphology, Dictionary, POS tagging, Tagging rules, Chunking, Chunking rules, Analysis of Conjunctions, Segmentation, Verbal Attachment, Verbal subcategories, Verbal frames, Post-processing.]

4 AN EXAMPLE
Input: When the man that you mentioned sent me that beautiful message, I fell in love with him
Chunking: When [the man] that you mentioned sent me [that beautiful message], I fell [in love] [with him]
Segmentation: {{When [the man] {that you mentioned} sent me [that beautiful message]}, I fell [in love] [with him] }
Next step: caseframing

5 THE FINAL RESULT
[Figure: dependency tree of the example sentence, with the verb "to fall" at the top. Arc labels shown: verb-subj, verb-obj, verb-indobj, verb-indcomp*locut, verb+fin-rmod-time, verb-rmod+relcl, conj-arg, prep-arg, det+def-arg, adjc+qualif-rmod, rmod.]

6 THE ACTUAL FORMAT
1 When (WHEN CONJ SUBORD TIME) [7;VERB+FIN-RMOD-TIME]
2 the (THE ART DEF ALLVAL ALLVAL) [7;VERB-SUBJ]
3 man (MAN NOUN COMMON M SING) [2;DET+DEF-ARG]
4 that (THAT PRON RELAT ALLVAL ALLVAL LSUBJ+OBL) [6;VERB-OBJ]
5 you (YOU PRON PERS ALLVAL ALLVAL 2 LSUBJ+LOBJ+LIOBJ+OBL) [6;VERB-SUBJ]
6 mentioned (MENTION VERB MAIN IND PAST ALLVAL ALLVAL) [3;VERB-RMOD-RELCL]
7 sent (SEND VERB MAIN IND PAST ALLVAL ALLVAL) [1;CONJ-ARG]
8 me (I PRON PERS ALLVAL SING 1 LOBJ+LIOBJ+OBL) [7;VERB-INDCOMPL-THEME]
9 that (THAT ADJ DEMONS ALLVAL SING) [7;VERB-OBJ]
10 beautiful (BEAUTIFUL ADJ QUALIF ALLVAL ALLVAL) [11;ADJC+QUALIF-RMOD]
11 message (MESSAGE NOUN COMMON N SING) [9;DET+DEF-ARG]
12 , (#\, PUNCT) [14;SEPARATOR]
13 I (I PRON PERS ALLVAL SING 1 LSUBJ) [14;VERB-SUBJ]
14 fell (FALL VERB MAIN IND PAST ALLVAL ALLVAL) [0;TOP-VERB]
15 in (IN PREP MONO) [14;PREP-RMOD]
16 love (LOVE NOUN COMMON N SING) [15;PREP-ARG]
17 with (WITH PREP MONO) [14;PREP-RMOD]
18 him (HE PRON PERS M SING 3 LOBJ+LIOBJ+OBL) [17;PREP-ARG]
19 . (#\. PUNCT) [14;END]
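Each record in this format has a token index, the word form, a parenthesised list of lexical features, and a bracketed pair giving the head index and the arc label. A minimal sketch of a reader for such records follows; the `Token` type, the `parse_record` function and the regular expression are illustrative assumptions, not part of the Turin parser's distribution.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    index: int
    form: str
    features: list[str]   # lemma, category, subtype, ... as written in the record
    head: int             # 0 marks the root of the sentence
    label: str


# index, form, "(features)", "[head;LABEL]" (pattern is an assumption based on the slide)
RECORD = re.compile(r"^(\d+)\s+(\S+)\s+\((.*)\)\s+\[(\d+);([^\]]+)\]$")


def parse_record(line: str) -> Token:
    """Parse one token record; raise ValueError if the line does not match."""
    match = RECORD.match(line.strip())
    if match is None:
        raise ValueError(f"not a token record: {line!r}")
    index, form, features, head, label = match.groups()
    return Token(int(index), form, features.split(), int(head), label)


# A few records copied from the example sentence.
SAMPLE = [
    r"13 I (I PRON PERS ALLVAL SING 1 LSUBJ) [14;VERB-SUBJ]",
    r"14 fell (FALL VERB MAIN IND PAST ALLVAL ALLVAL) [0;TOP-VERB]",
    r"15 in (IN PREP MONO) [14;PREP-RMOD]",
    r"16 love (LOVE NOUN COMMON N SING) [15;PREP-ARG]",
    r"19 . (#\. PUNCT) [14;END]",
]

tokens = [parse_record(line) for line in SAMPLE]
root = next(token for token in tokens if token.head == 0)
print(root.form, root.label)
```

Keeping the features as a plain list avoids guessing at the meaning of each position, which differs between categories (for instance, pronouns carry a person value and a list of admissible cases).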

7 Results: Evalita 2007

LAS     UAS     LAS2    Participant
86.94   90.90   91.59   UniTo_Lesmo
77.88   88.43   83.00   UniPi_Attardi
75.12   85.81   82.05   IIIT_Mannem
74.85   85.88   81.59   UniStuttIMS_Schielen
*       85.46   *       UPenn_Champollion
47.62   62.11   54.90   UniRoma2_Zanzotto

LAS: Labeled Attachment Score
UAS: Correct Attachment Score
LAS2: Correct Label Score
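The three scores can be read as percentages over tokens: UAS counts tokens whose head is correct, LAS2 counts tokens whose label is correct, and LAS counts tokens where both are correct. The sketch below computes them under that reading; the function name and the `(head, label)` input layout are assumptions, and any Evalita-specific conventions (for example the treatment of punctuation) are not modelled.

```python
def attachment_scores(gold, predicted):
    """Return LAS, UAS and LAS2 as percentages.

    gold and predicted are equal-length sequences of (head, label) pairs,
    one pair per token.
    """
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and aligned")
    total = len(gold)
    pairs = list(zip(gold, predicted))
    head_ok = sum(g[0] == p[0] for g, p in pairs)       # correct attachment
    label_ok = sum(g[1] == p[1] for g, p in pairs)      # correct label
    both_ok = sum(tuple(g) == tuple(p) for g, p in pairs)
    return {
        "LAS": 100.0 * both_ok / total,
        "UAS": 100.0 * head_ok / total,
        "LAS2": 100.0 * label_ok / total,
    }


# Toy four-token sentence: one wrong label (token 3) and one wrong head (token 4).
gold = [(2, "VERB-SUBJ"), (0, "TOP-VERB"), (2, "VERB-OBJ"), (2, "END")]
pred = [(2, "VERB-SUBJ"), (0, "TOP-VERB"), (2, "VERB-SUBJ"), (3, "END")]
scores = attachment_scores(gold, pred)
print(scores)
```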

8 Comparison with CoNLL

                        LAS              UAS
Participant             CoNLL   EPT      CoNLL   EPT
UniPi_Attardi           81.34   77.88    85.54   88.43
IIIT_Mannem             78.67   75.12    82.91   85.81
UniStuttIMS_Schielen    80.46   74.85    84.54   85.88

CoNLL: international contest for dependency parsers (multilingual)

9 The Turin University Treebank (TUT)
Current size:
- Italian: 2200 sentences, 62445 tokens (4635 traces; 6704 punctuation)
- English: 150 sentences, 4250 tokens (253 traces; 513 punctuation)
English not yet online (under test)

10 Parts of Speech (and “subtypes”)
1. ADJ (adjectives)
 - DEITT (deictic): next
 - DEMONS (demonstrative): such, this, that
 - EXCLAM (exclamative)
 - INDEF (indefinite): numerous, certain, few
 - INTERR (interrogative): what, which
 - ORDIN (ordinal): first, twentieth, last
 - ORDINSUFF (ordinal suffixes): nd, rd, th, st
 - POSS (possessive): my, your, their
 - QUALIF (qualificative): nice, big, English
2. ADV (adverbs)
 - ADFIRM (affirmative)
 - ADVERS (adversative): although, though
 - COMPAR (comparative): less, more
 - CONCESS (concessive): also
 - DOUBT (doubt): perhaps
 - EXPLIC (explicative): that_is
 - INTERJ (interjections): at_any_rate
 - INTERR (interrogative): how, where, when, why
 - LIMIT (limit): just, only
 - LOC (locative): there, within, below, here
 - MANNER (manner): aloud, alright, well
 - NEG (negation): not
 - QUANT (quantification): little, rather, too
 - REASON (motivation): in_fact
 - STRENG (strengthening): even, moreover
 - SUPERL (superlative): most
 - TIME (time): sometime, afterward, already

11 Parts of Speech (and “subtypes”) 2
3. ART (articles)
 - DEF (definite): the
 - INDEF (indefinite): a, another
 - GENITIVE (genitive): 's
4. CONJ (conjunctions)
 - COORD (coordinative): and, but, or, neither, nor
 - SUBORD (subordinative): since, that, to, unless
 - COMPAR (comparative): than
5. DATE (dates): 08/06/2008
6. INTERJ (interjections): alas
7. MARKER (markers)
8. NOUN (nouns)
 - COMMON: house, boy, chair
 - PROPER: Mary, Italia, Italy, England
9. NUM (numbers): zero, twenty, 127, 3.14
10. PHRAS (phrasals): yes, no
11. PREDET (predeterminers): all, both
12. PREP (prepositions)
 - MONO: of, to, from, in
 - POLI: during, above, under, in front of

12 Parts of Speech (and “subtypes”) 3
13. PRON (pronouns)
 - DEMONS (demonstrative): this, that
 - EXCLAM (exclamative): what
 - INDEF (indefinite): everything, nobody, something
 - INTERR (interrogative): what, who
 - LOC (locative): Italian ne, ci, vi
 - ORDIN (ordinals): first, second, fiftieth
 - PERS (personal): I, you, we, her
 - POSS (possessive): mine, yours
 - REFL-IMPERS (reflexive-impersonal): ci, vi, si, se
 - RELAT (relative): that, who, which, where
14. PUNCT (punctuation)
15. SPECIAL (special)
16. VERB (verbs)
 - MAIN (all standard verbs): go, eat, give, be (in “to be intelligent”)
 - AUX (auxiliaries): be (in “to be kissed”)
 - MOD (modals): must, can, will

13 The labelling scheme
[Figure: hierarchy of arc labels. Node labels shown: Top; Dependent; Function; Nofunction; Arg; Modifier; adjc-arg, advb-arg, conj-arg, noun-arg, verb-arg; verb-subj, verb-obj, verb-indobj, verb-indcompl, verb-predcompl; Apposition; Rmod.]

14 The NOFUNCTION labels
Nofunction
 - Aux: Aux+passive, Aux+tense, Aux+progressive
 - Contin: Contin+denom, Contin+locut, Contin+prep
 - Coordinator: Coordantec, Coord, Coord2nd
 - Emptycompl
 - Interjection
 - Separator
 - Visitor
 - Verb-expletive

15 Some examples
Auxiliaries
 - Aux+progressive: I am looking for …
 - Aux+tense: … the debate has - to quite some extent - suffered from …
 - Aux+passive: … whose historical experience is not marked by …
Continuations
 - Contin+locut: … convinced of the feasibility … in order to reinforce …
 - Contin+prep: … grown out of the millenniums …
 - Contin+denom: Samuel Alexander asserted …

16 Visitors (and traces)
Example: The question of what we might consider to be an adequate …
[Figure: dependency tree of the example, including trace nodes. Arc labels shown: prep-rmod, prep-arg, det+def-arg, verb-subj, verb-obj, verb-predcompl+obj, verb-rmod+relcl, verb+modal-indcompl, visitor.]

17 Coordination
 - base: … is tautologous and without ontologic commitment … (coord+base, coord2nd+base)
 - compar: … were more like mythical heroes than like the omnipotent God … (coord+compar, coord2nd+compar, coordantec+compar)
 - correlat: … neither John nor his friends … (coord2nd+correlat, coord+correlat, coordantec+correlat)
… and “word” traces
 - compar: … Samuel asserted that mentality emerged … and then t asserted t Samuel that … (coord+base, coord2nd+base)

18 The AnnCorra scheme
- It is chunk-based (some elementary subtrees are left unanalysed)
- It involves 28 relations (arc labels) and 25 different POS (table below)
- There are some non-dependency labels (e.g. ccof for coordination)
- Some POS are merged (e.g. Demonstratives include both Adj and Pron)

19 Mapping category labels

Category         AnnCorra   TUT
Common Noun      NN         NOUN (common)
Proper Noun      NNP        NOUN (proper)
Location, Time   NST        ADV (time), ADV (loc)
Pronoun          PRP        PRON except the ones in Demonstrative and Question
Adjective        JJ         ADJ except the ones in Demonstrative and Question
Adverb           RB         ADV (with some exceptions)
Demonstrative    DEM        PRON (demons), ADJ (demons)
Question Words   WQ         ADJ (interr), ADV (interr), PRON RELAT????, PRON (interr)
Main verb        VM         VERB (main)
Verb Aux         VAUX       VERB (aux), VERB (mod)
Post position    PSP        PREP
Particles        RP         None
Conjuncts        CC         CONJ
Quantifiers      QF         DET, PREDET
Cardinal numb    QC         NUM
Ordinal numb     QO         ADJ (ordin), PRON (ordin)
Classifier       CL         None
Intensifier      INTF       ADV (quant)
Interjection     INJ        INTERJ
Negation         NEG        ADV (neg)
Quotative        UT         None
Sym              SYM        SPECIAL or PUNCT
Compounds        *C         None
Reduplicative    RDP        None
Echo             ECH        None
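Read from the TUT side, the category mapping amounts to a lookup on (category, subtype) with a per-category default. The sketch below is one possible encoding; the function and table names are hypothetical, the reading of "with some exceptions" for ADV as the subtypes listed in other rows is an assumption, and the relative pronouns marked "????" on the slide are deliberately left unmapped.

```python
from typing import Optional

# Subtype-specific rows of the table (TUT category, TUT subtype) -> AnnCorra tag.
BY_SUBTYPE = {
    ("NOUN", "COMMON"): "NN", ("NOUN", "PROPER"): "NNP",
    ("ADV", "TIME"): "NST", ("ADV", "LOC"): "NST",
    ("PRON", "DEMONS"): "DEM", ("ADJ", "DEMONS"): "DEM",
    ("ADJ", "INTERR"): "WQ", ("ADV", "INTERR"): "WQ", ("PRON", "INTERR"): "WQ",
    ("VERB", "MAIN"): "VM", ("VERB", "AUX"): "VAUX", ("VERB", "MOD"): "VAUX",
    ("ADJ", "ORDIN"): "QO", ("PRON", "ORDIN"): "QO",
    ("ADV", "QUANT"): "INTF", ("ADV", "NEG"): "NEG",
}

# Fallback when no subtype-specific row applies.
BY_CATEGORY = {
    "PRON": "PRP", "ADJ": "JJ", "ADV": "RB", "PREP": "PSP", "CONJ": "CC",
    "PREDET": "QF", "NUM": "QC", "INTERJ": "INJ", "SPECIAL": "SYM", "PUNCT": "SYM",
}


def tut_to_anncorra(category: str, subtype: Optional[str] = None) -> Optional[str]:
    """Return the AnnCorra tag for a TUT category/subtype, or None if unmapped."""
    if (category, subtype) == ("PRON", "RELAT"):
        return None  # left open ("????") in the table
    if (category, subtype) in BY_SUBTYPE:
        return BY_SUBTYPE[(category, subtype)]
    return BY_CATEGORY.get(category)


print(tut_to_anncorra("ADV", "NEG"), tut_to_anncorra("ADV", "MANNER"))
```

The AnnCorra tags with no TUT counterpart (RP, CL, UT, *C, RDP, ECH) never appear as results, which matches the "None" rows of the table.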

20 Mapping arc labels: the argument (karaka) labels
- k1 (karta): the primary (or “most independent”) participant in the action (similar to agent) -> VERB-SUBJ
- k2 (karma): the secondary participant (often, the patient) -> VERB-OBJ
- k3 (karana): the instrument -> VERB-INDCOMPL-MEANSMANNER
- k4 (sampradana): the recipient or the beneficiary of an action -> VERB-INDOBJ
- k5 (apadana): the stationary element in a separation -> ????
- k7 (adhikarana): the locus (spatial, temporal or abstract) of karta or karma. It is tagged as k7p, k7t or k7 depending on the type of location -> VERB-INDCOMPL-LOC
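The karaka correspondences can be held in a small table, with the three k7 variants sharing one TUT label. This is a minimal sketch with hypothetical names; k5 is kept as `None` because the slide leaves its TUT counterpart open.

```python
# AnnCorra karaka label -> TUT arc label, as listed on the slide.
KARAKA_TO_TUT = {
    "k1": "VERB-SUBJ",
    "k2": "VERB-OBJ",
    "k3": "VERB-INDCOMPL-MEANSMANNER",
    "k4": "VERB-INDOBJ",
    "k5": None,                      # marked "????" on the slide
    "k7": "VERB-INDCOMPL-LOC",
}


def karaka_to_tut(label: str):
    """Map a karaka label to its TUT arc label; k7p and k7t are treated as k7."""
    base = "k7" if label in ("k7p", "k7t") else label
    if base not in KARAKA_TO_TUT:
        raise KeyError(f"no mapping listed for {label!r}")
    return KARAKA_TO_TUT[base]


print(karaka_to_tut("k1"), karaka_to_tut("k7t"))
```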

21 Mapping the structure
Chunk-based structure of AnnCorra
[Figure: the TUT tree over the words I, must, read, the, book and a trace t (arc labels: verb+modal-indcompl, verb-subj, verb-obj, det+def-arg), next to the AnnCorra chunk structure: (must read), with I as k1 and (the book) as k2.]

22 Current activities and the future
A word about semantics: DTS
[Figure: a dependency tree (nodes: students, heard, theorems, three, two, difficult; arc labels: verb-subj, verb-obj, det+quantif-arg, det+indef-arg, adjc+qualif-rmod) and the corresponding DTS graph over the predicates student', theorem', difficult' and study' with the variables x and y, annotated with quant(x), quant(y), restr(x) and restr(y).]

23 Disambiguation: Semdep arcs
[Figure: three copies of the DTS graph (study', student', theorem', difficult', variables x and y), each with a context node CTX and a different set of SemDep arcs.]
Readings:
- $2x\,[\mathit{student}'(x) \land 3y\,[\mathit{theorem}'(y) \land \mathit{study}'(x,y)]]$
- $3y\,[\mathit{theorem}'(y) \land 2x\,[\mathit{student}'(x) \land \mathit{study}'(x,y)]]$
Any more readings? Branching Quantification (Independent Set)

24 Current activities and the future
- Practical semantic interpretation based on ontological knowledge for DB access
- Extension of the treebank with semantic annotation (in cooperation with Johan Bos)
- Development of a graphical interface with an online server (Java implementation and socket-based connection with a Lisp server)
- Automatic analysis of legal texts for extracting information about rule amendments (date, modified text, new text)

25 The future (last but not least)
- Morphological analysis of Hindi (mid-way)
- Development and testing of a Hindi parser and of mapping rules from Hindi to English and vice versa
In cooperation with IIIT Hyderabad

26 More on Parsing 1
Function:
[Figure: the word sequence w1, w2, …, wi-1, wi, wi+1, wi+2, …, wn with HEAD = wi and question marks on the other words.]
Structure:
(head-category head-subcategory (dependent-position (dependent-category (dependent-constraints))) ARC-LABEL)

27 More on Parsing 2
Examples:
(ART DEF (before (PREDET (agree))) PDETMOD)
 - tutti (cat=PREDET, gender=m, number=pl) "all"
 - i (cat=ART, subcat=DEF, gender=m, number=pl) "the"
 - arc label: PDETMOD
(NOUN COMMON (chunk-follows (ADJ (agree) (subcat qualif))) ADJCMOD-QUALIF)
 - giardino (cat=NOUN, gender=m, number=sing) "garden"
 - molto (cat=ADV) "very"
 - bello (cat=ADJ, subcat=QUALIF, gender=m, number=sing) "nice"
 - arc label: ADJCMOD-QUALIF
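The first rule can be mimicked with a small matcher. This is only a sketch under stated assumptions: `before` is read as "the dependent immediately precedes the head", `agree` as "gender and number are identical", and the `Word`, `Rule` and `apply_rule` names are hypothetical. The `chunk-follows` position of the second rule is not implemented.

```python
from dataclasses import dataclass


@dataclass
class Word:
    form: str
    cat: str
    subcat: str = ""
    gender: str = ""
    number: str = ""


@dataclass
class Rule:
    head_cat: str
    head_subcat: str
    position: str          # only "before" is supported in this sketch
    dep_cat: str
    constraints: tuple     # e.g. ("agree",)
    label: str


def agree(a: Word, b: Word) -> bool:
    # Assumed meaning of the "agree" constraint.
    return a.gender == b.gender and a.number == b.number


def apply_rule(rule: Rule, words: list, head_index: int):
    """Return (dependent_index, head_index, label) if the rule fires, else None."""
    head = words[head_index]
    if (head.cat, head.subcat) != (rule.head_cat, rule.head_subcat):
        return None
    if rule.position != "before":
        raise NotImplementedError(rule.position)
    dep_index = head_index - 1          # assumed: immediately preceding word
    if dep_index < 0:
        return None
    dep = words[dep_index]
    if dep.cat != rule.dep_cat:
        return None
    if "agree" in rule.constraints and not agree(head, dep):
        return None
    return (dep_index, head_index, rule.label)


# (ART DEF (before (PREDET (agree))) PDETMOD) applied to "tutti i"
pdetmod = Rule("ART", "DEF", "before", "PREDET", ("agree",), "PDETMOD")
phrase = [
    Word("tutti", "PREDET", gender="m", number="pl"),
    Word("i", "ART", subcat="DEF", gender="m", number="pl"),
]
arc = apply_rule(pdetmod, phrase, head_index=1)
print(arc)
```

Returning an index pair plus a label keeps the sketch close to the head/label pairs of the output format, where each token records the index of its head.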

28 More on Parsing 3
Verb subcategorization classes:
[Figure: hierarchy of subcategorization classes (nodes: verbs, nosubj-verbs, subj-verbs, obj-verbs, indobj-verbs, ssubj-inf-verbs, basic-trans, trans, trans-indobj, empty-modal, modal), with dictionary entries linked to the classes: bisognare (need), camminare (walk), dovere (must), potere (can).]

29 More on Parsing 4
Transformations:
basic class (e.g. trans) -> transformed classes (e.g. trans, trans+passivization, trans+infinitivization, trans+prodrop, trans+passivization+infinitivization, …)
Example transformation:
(infinitivization replacing (subj-verbs) (is-inf-form tr-verb v-casefr) (cancel-case s-subj))
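The example transformation can be read as: for a class with a subject, if the verb is in an infinitive form, derive a new class whose case frame lacks the `s-subj` case. The sketch below encodes only that reading; the dictionary representation of a case frame, the `s-obj` case name and the `infinitivize` function are assumptions made for illustration, and the other transformations (passivization, prodrop) are not modelled.

```python
def infinitivize(class_name: str, caseframe: dict, is_inf_form: bool):
    """Derive the infinitival variant of a class, cancelling the subject case.

    caseframe maps case names to arc labels (hypothetical representation).
    The class is returned unchanged if the verb is finite or has no s-subj.
    """
    if not is_inf_form or "s-subj" not in caseframe:
        return class_name, dict(caseframe)
    reduced = {case: label for case, label in caseframe.items() if case != "s-subj"}
    return class_name + "+infinitivization", reduced


# Hypothetical case frame for the basic class "trans".
trans = {"s-subj": "VERB-SUBJ", "s-obj": "VERB-OBJ"}
derived = infinitivize("trans", trans, is_inf_form=True)
print(derived)
```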

