English version A method for top-down and deterministic parsing of multilingual corpora : application : computing subject-verb links Jacques Vergne GREYC.


1 English version
A method for top-down and deterministic parsing of multilingual corpora
Application: computing subject-verb links
Jacques Vergne, GREYC - Université de Caen
TALN 2002

2 English version 24/6/2002 © Jacques Vergne TALN
Features of the experiment
- experimenting with, exploring, explaining and transmitting deterministic parsing methods
- choice of a classical task, limited and (apparently) simple: detecting and linking subjects and verbs in clauses
- with the smallest possible software (program + resources)

3 Linking subject and verb
- linking the pronoun or chunk subject to the verbal chunk in every clause
- multilingual corpus (English, German, French, Italian, Spanish) with language identification: how generic is the method?
- top-down: document -> clause and chunk (with partial chunking, without going down to the word level)
- written in Perl:
  - sentence parsing: 40 Kb
  - resources: 20 Kb for 5 languages

4 How to do without a dictionary?
- with beginnings of clauses and beginnings of chunks
- with determiner / verbal-ending pairs
|| | L'euro | rend déjà d'éminents services
| Dans les deux cas | ces systèmes | d'armes | disposent de radars
|| | Questo tema | rischia di essere la questione sociale del futuro
|| | La Bolsa de Tokio | cerró ayer a su nivel más bajo en 17 años
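The determiner / verbal-ending idea can be illustrated with a minimal Python sketch (the deck says the original program is in Perl; this is not the author's code). The word lists are copied from the French resource slide of this deck, while the function names `endings_for` and `find_agreeing_verb` are hypothetical. The sketch only checks number agreement between a determiner and the ending of a following word; it does no chunking and would over-match on nouns ending in "e" or "a".

```python
# Hypothetical sketch: number agreement between a determiner and a verbal ending.
# Lists are copied from the French resource slide; the function names are assumptions.
SING_DETERMINERS = set(
    "un|une|le|la|l|ce|cet|cette|sa|son|notre|leur|tout|toute|chaque|aucun|aucune".split("|")
)
PLUR_DETERMINERS = set("les|ces|ses|leurs|nos|tous|toutes|plusieurs".split("|"))
SING_ENDINGS = tuple(
    "e|a|ed|pand|end|ond|erd|ord|oud|et|it|ît|tient|vient|pent|sent|eint|ort|ut|ût".split("|")
)
PLUR_ENDINGS = tuple("ent|ont".split("|"))


def endings_for(determiner):
    """Return the verbal endings that agree in number with the determiner, or None."""
    det = determiner.rstrip("'").lower()
    if det in SING_DETERMINERS:
        return SING_ENDINGS
    if det in PLUR_DETERMINERS:
        return PLUR_ENDINGS
    return None


def find_agreeing_verb(determiner, following_words):
    """Return the first following word whose ending agrees with the determiner."""
    endings = endings_for(determiner)
    if endings is None:
        return None
    for word in following_words:
        if word.endswith(endings):  # str.endswith accepts a tuple of suffixes
            return word
    return None


print(find_agreeing_verb("L'", ["euro", "rend", "déjà"]))
print(find_agreeing_verb("ces", ["systèmes", "d'armes", "disposent", "de", "radars"]))
```

On the two French examples of the slide, the sketch picks "rend" after "L'" and "disposent" after "ces". No dictionary lookup is involved, only closed lists of determiners and endings.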

5 How to do without a dictionary?
- with beginnings of clauses and beginnings of chunks
- with determiner / verbal-ending pairs
|| | Das Sternbild nämlich | steht in dieser Jahreszeit besonders tief am Himmel
|| Bis Ende Oktober | schließt sich | der Reigen in Connecticut, Massachusetts und Rhode Island
|| | The costs | mount rapidly,
|| But | the Pentagon move | represents the first significant federal call-up

6 Resources: all for French
beginnings of clause:
"à condition que|à condition qu|ainsi que|ainsi qu|auquel|auxquels|combien|comme|comment|dont|dés que|dés qu|lorsque|lorsqu|même si|où| parce que |parce qu|pourquoi|quand|alors que|alors qu|bien que|bien qu|quoi que|quoi qu|tandis que|tandis qu|tant que|tant qu|puisque|puisqu|sans que|sans qu|que|qu|qui|sauf si|si"
"et donc|et encore|et ensuite|et même|et non|et pas|et pourtant| et |ou bien|ou même|ou encore|ou|mais aussi|mais|car|mais|or|puis"
beginnings of chunk:
"quant à|quant au|quant aux|grâce à|grâce au|grâce aux|face à|face au|face aux|à partir de|à partir du|à partir d|à|À|afin de|afin d|aprés|au-delà d|au-delà de|au-delà du|au-delà des|au|aux|auprés d|auprés de|auprés du|auprés des|autour d|autour de|autour du|autour des|avant| avec |chez|contre|dans|de par|d'entre|d'où|d|de|des|du|depuis|devant|dés|durant| en tant que|en tant qu|en|entre|hors d|hors de|hors du|hors des|jusque|jusqu'à|jusqu'au|jusqu'aux|lors d|lors de|lors du|lors des|malgré|outre|par|parmi|pendant|pour|près de|près d|sans|sauf|sous|selon|sur|vers|via|voire"
"un|une|le|la|l|ce|cet| cette |sa|son|notre|leur|tout|toute|chaque|aucun|aucune| Un|Une|Le|La|L|Ce|Cet|Cette|Sa|Son|Notre|Leur|Tout|Toute|Chaque|Aucun|Aucune"
"les| ces |ses|leurs|nos|tous|toutes|plusieurs|deux|trois|quatre|cinq|six|sept|huit|neuf|dix|d'autres|certains|quelques| Les|Ces|Ses|Leurs|Nos|Tous|Toutes|Plusieurs|Deux|Trois|Quatre|Cinq|Six|Sept|Huit|Neuf|Dix|D'autres|Certains|Quelques"
subject pronouns:
"je|j|tu| il |elle|l'on|on|c|ça|cela|ceci"
" ils |elles|nous|vous"
auxiliaries:
"a|avait|aura|ait|aurait|est|était| sera |serait|va|allait|ira|faisait|fera"
"ont|avaient|auront|aient|auraient|sont|étaient|seront|seraient|vont|allaient|iront|font|faisaient|feront"
verbal endings:
"e| a |ed|pand|end|ond|erd|ord|oud|et|it|ît|tient|vient|pent|sent|eint|ort|ut|ût"
"ent|ont"
clitics:
"n'| ne |m'|me |t'|te |s'|se |s'en |s'y |lui |leur |en |y |le |la |les |l'"

7 Resources: all for English
beginnings of clause:
"although|as if|as|because|before|how|if|since|than|that|though|unless|until|whatever|what| when | where|whether|while|who|which|whom|whose|why"
" and |but|or|nor"
beginnings of chunk:
"about|according to|across|after|against|along|amid|among|around|such as|at|because of|behind| between|by|despite|due to|during|except for|for|from|in order to| in |inside|into|instead of|like|of|off|on|out of| over|per|prior to|less than|more than|throughout|through|to|toward|under|unlike|via|within|without|with"
"such a| a |an|another|this|any|each|one|Such a|A|An|Another|This|Any|Each|One"
"many|most|much of|much|plenty of|several|some|such| these |those|both|two|three|four|five|six|seven| eight|nine|ten|a few|Many|Most|Much of|Much|Plenty of|Several|Some|Such|These|Those|Both|Two| Three|Four|Five|Six|Seven|Eight|Nine|Ten|A few"
" the |our|your|its|his|her|their|The|Our|Your|Its|His|Her|Their"
subject pronouns:
"I| he |she|it"
"we| they |you"
auxiliaries:
"has| is |was|does|says|tells|hasn't|isn't|wasn't|doesn't"
"have| are |were|do|say|tell|haven't|aren't|weren't|don't"
" had |will|would|shall|should|may|might|must|cannot|can|could|did|said|told|hadn't|wouldn't|shouldn't|may| mustn't|can't|couldn't|didn't|won't "
verbal endings:
" s |ed"
"a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|t|u|v|w|x|y|z"
clitics:
""

8 Resources: all for German
beginnings of clause:
"dass| daß |in denen|indessen|dessen|indem|nachdem|ob|obwohl| was|warum|wer|weil|wenn|wie|wo|wofür|worauf|worin"
"aber|oder| und "
beginnings of chunk:
"dem|den|des|diesem|diesen|dieser|dieses| einem |einen|einer|eines|meinem|meinen|meiner|meines| deinem|deinen|deiner|deines|seinem|seinen|seiner|seines|ab|als|am|an|anhand|auf|aus|bei|bis|durch|für| gegen|gen|hinter|ihren|im|innerhalb|ins|in| mit |nach|neben|ohne|pro|seit|über|Über|um|unseren|unter|vom| von|vor|während|wegen|zum|zur|zu|zwischen"
"der|das| ein |eine|dieser|diese|kein|keine|ihres|ihr| Der|Das|Ein|Eine|Dieser|Diese|Kein|Keine|Ihres|Ihr|die|meine|seine|viel|Die|Meine|Seine|Viel"
" die |meine|seine|ihre|viele|alle|zwei|Die|Meine|Seine|Ihre|Viele|Alle|Zwei"
subject pronouns:
"ich|er|sie| es |man|Ich|Er|Sie|Es|Man"
" wir |Sie|sie|Wir"
auxiliaries:
"habe| hat |hatte|bin|ist|sei|wäre|war|wird|werde|wurde|darf|dürfte|kann|konnte|könnte|könne|lässt|muss|soll|will|wollte"
"haben|hatten| sind |waren|werden|wurden|worden|können|könnten|lassen|müssen|mussten| sollen|sollten"
verbal endings:
"b|nd| te |e|f|ag|ng|ah|hm|t"
" en |rn"
clitics:
""
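The resource strings above are written as regular-expression alternations. As a rough illustration of how such a string could be turned into a matcher, here is a minimal Python sketch (the deck says the original is in Perl; this is not the author's code). The helper name `compile_resource` is hypothetical, and two choices are assumptions of the sketch, not facts from the slides: spaces around an entry are treated as insignificant, and every entry is matched as a whole word.

```python
import re


def compile_resource(alternation):
    """Compile a 'a|b|c' resource string into a whole-word regex.

    Assumptions of this sketch: spaces around an entry are not significant,
    and entries are matched between word boundaries.
    """
    entries = [e.strip() for e in alternation.split("|") if e.strip()]
    return re.compile(r"\b(?:%s)\b" % "|".join(re.escape(e) for e in entries))


# Subset of the English "beginnings of clause" list, in the order of the slide.
CLAUSE_STARTS = compile_resource("although|as if|as|because|before|how|if")

print(CLAUSE_STARTS.findall("He left as if nothing had happened"))
# With the shorter entry first, only "as" is found:
print(compile_resource("as|as if").findall("He left as if nothing had happened"))
```

Alternation is tried left to right, so a multi-word entry has to precede its own prefix to be matched as a whole. The slides do list "as if" before "as" and "much of" before "much", which is consistent with that constraint.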

9 Parsing and hierarchies of grains: purely top-down parser
[Diagram: going down in the hierarchy of physical grains]
- physical grains: document, textual zones (extracting), sentences (segmenting / punctuation)
- intermediary grains: proto-clauses (segmenting / written forms), proto-chunks (tagging / written forms)
- computed grains: clauses (validating, segmenting, linking), chunks

10 Parsing process
[Flowchart; labels as recoverable from the slide]
- 1 sentence -> segmentation / written forms, using beginnings of clause -> proto-clauses (= hypotheses on clauses)
- partial chunking, using beginnings of chunks
- linking subject - verb, using auxiliaries, subject pronouns, verbal endings
- diagnostic: subject & verb? sentence?
- standard process: clauses (= 1 proto-clause)
- post-processing: cutting, linking proto-clauses -> clauses (= 1/2 proto-clause, 2 proto-clauses)

11 Standard process: example 1
Je n'ai jamais dit que l'euro allait remplacer le dollar. (Ouest-France, 18/10/2001)
tagging beginnings of proto-clauses -> segmentation into proto-clauses:
0 : Je n'ai jamais dit
que 1 : que l'euro allait remplacer le dollar
. 2 : .
proto-clause = clause
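The segmentation step on this slide can be approximated by cutting the sentence in front of every clause-beginning word. The following Python sketch is only an illustration under stated assumptions: `CLAUSE_BEGINNINGS` is a small subset of the French list, `split_proto_clauses` is a hypothetical name, and the final full stop stays attached to the last piece, whereas the slide isolates it.

```python
import re

# Small subset of the French "beginnings of clause" list (assumption: whole words only).
CLAUSE_BEGINNINGS = "parce que|lorsque|que|qui|si"


def split_proto_clauses(sentence, beginnings=CLAUSE_BEGINNINGS):
    """Cut a sentence in front of every clause-beginning word."""
    pattern = re.compile(r"\b(?:%s)\b" % beginnings)
    cuts = [m.start() for m in pattern.finditer(sentence)]
    bounds = [0] + cuts + [len(sentence)]
    pieces = (sentence[a:b].strip() for a, b in zip(bounds, bounds[1:]))
    return [p for p in pieces if p]  # drop the empty piece when the sentence starts with a mark


print(split_proto_clauses("Je n'ai jamais dit que l'euro allait remplacer le dollar."))
```

The result is a list of hypotheses on clauses, one per mark, which later steps may validate, cut or link.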

12 Standard process: example 1
tagging beginnings of chunks -> partial chunking in the written form of the proto-clause
tagging subject pronouns, auxiliaries -> counting subject pronouns and auxiliaries
0 : Je n'ai jamais dit [nbpp=1 nbV=1]
que 1 : que l'euro allait remplacer le dollar [nbpp=0 nbV=0]
. 2 : .

13 Standard process: example 1
for every proto-clause: detecting and linking subject and verb
|| 0 : | Je | n'ai jamais dit [nbV=1 saturS=1]
que|| 1 : que | l'euro | allait remplacer le dollar [nbV=1 saturS=1]
. 2 : .

14 Standard process: example 1
diagnostic of every clause and of the sentence
|| 0 : | Je | n'ai jamais dit [nbV=1 saturS=1]
que|| 1 : que | l'euro | allait remplacer le dollar [nbV=1 saturS=1]
. 2 : .
every clause has its subject and its verb, and the sentence has a main clause (without a mark)
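The diagnostic stated on this slide amounts to two checks: every clause has one verb and a saturated subject, and at least one clause carries no clause-beginning mark. A minimal Python sketch follows; the `Clause` record and the `diagnose` function are hypothetical, and only the counter names `nbV` and `saturS` come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Clause:
    text: str
    mark: str = ""   # clause-beginning word, "" when the clause has none
    nbV: int = 0     # number of verbs found in the clause
    saturS: int = 0  # 1 when the verb has its subject


def diagnose(clauses):
    """True when every clause has its subject and verb and one clause is unmarked."""
    complete = all(c.nbV == 1 and c.saturS == 1 for c in clauses)
    has_main_clause = any(c.mark == "" for c in clauses)
    return complete and has_main_clause


example_1 = [
    Clause("Je n'ai jamais dit", mark="", nbV=1, saturS=1),
    Clause("que l'euro allait remplacer le dollar", mark="que", nbV=1, saturS=1),
]
print(diagnose(example_1))
```

When the diagnostic fails, for instance because a proto-clause holds two verbs or none, the deck hands the sentence over to post-processing.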

15 Standard process: example 2
Eine spektakuläre Operation gelang ihm im November 1974, als er ein Spenderherz transplantierte, ohne das Herz des Empfängers zu entfernen. (Der Spiegel, 2/9/2001)
tagging the beginnings of proto-clauses -> segmentation into proto-clauses:
0 : Eine spektakuläre Operation gelang ihm im November 1974,
als 1 : als er ein Spenderherz transplantierte,
ohne 2 : ohne das Herz des Empfängers zu entfernen
. 3 : .

16 Standard process: example 2
tagging beginnings of chunks -> partial chunking in the written form of the proto-clause
tagging pronouns, auxiliaries -> counting pronouns and auxiliaries
0 : Eine spektakuläre Operation gelang ihm im November 1974, [nbpp=0 nbV=0]
als 1 : als er ein Spenderherz transplantierte, [nbpp=1 nbV=0]
ohne 2 : ohne das Herz des Empfängers zu entfernen
. 3 : .
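The counting step can be sketched as a lookup of each written form in two closed lists. This Python sketch is an illustration, not the author's Perl: the sets are subsets of the German resource lists of this deck, `count_marks` is a hypothetical name, and only the counter names `nbpp` and `nbV` come from the slides.

```python
import re

# Subsets of the German resource lists (subject pronouns, auxiliaries).
SUBJECT_PRONOUNS = set("ich|er|sie|es|man|Ich|Er|Sie|Es|Man|wir|Wir".split("|"))
AUXILIARIES = set(
    "habe|hat|hatte|bin|ist|war|wird|wurde|kann|muss|will|haben|sind|waren|werden".split("|")
)


def count_marks(proto_clause):
    """Count subject pronouns (nbpp) and auxiliaries (nbV) among the written forms."""
    tokens = re.findall(r"\w+", proto_clause)
    return {
        "nbpp": sum(t in SUBJECT_PRONOUNS for t in tokens),
        "nbV": sum(t in AUXILIARIES for t in tokens),
    }


print(count_marks("Eine spektakuläre Operation gelang ihm im November 1974,"))
print(count_marks("als er ein Spenderherz transplantierte,"))
```

On the first two proto-clauses of the German example this gives the counts shown on the slide: no pronoun and no auxiliary in proto-clause 0, one pronoun ("er") and no auxiliary in proto-clause 1. The verbs "gelang" and "transplantierte" are not auxiliaries, so they are left to the next step.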

17 Standard process: example 2
for every proto-clause: detecting and linking subject and verb
|| 0 : | Eine spektakuläre Operation | gelang ihm im November 1974, [nbV=1 saturS=1]
als|| 1 : als | er ein Spenderherz | transplantierte, [nbV=1 saturS=1]
ohne 2 : ohne das Herz des Empfängers zu entfernen
. 3 : .

18 Standard process: example 2
diagnostic of every clause and of the sentence
|| 0 : | Eine spektakuläre Operation | gelang ihm im November 1974, [nbV=1 saturS=1]
als|| 1 : als | er ein Spenderherz | transplantierte, [nbV=1 saturS=1]
ohne 2 : ohne das Herz des Empfängers zu entfernen
. 3 : .
every clause has its subject and its verb, and the sentence has a main clause (without a mark)

19 Post-processing: proto-clause -> clause
2 operations are possible:
- cutting: 1 proto-clause => 2 clauses
- linking: 2 proto-clauses => 1 clause

20 Post-processing: cutting a proto-clause into 2 clauses
Result of the standard process: 2 verbs in 1 proto-clause => searching for a cut point
Although|| 0 : Although | they | have not ruled out a possibility [nbV=1 saturS=1]
that 1 : that another criminal could be behind the anthrax attacks, investigators are intensely looking at evidentiary threads linking the letters to the hijackers [nbV=2]
. 2 : .

21 Post-processing: cutting a proto-clause into 2 clauses
Although|| 0 : Although | they | have not ruled out a possibility [nbV=1 saturS=1]
that||, 1 : that | another criminal | could be behind the anthrax attacks, [nbV=1 saturS=1]
|| 2 : | investigators | are intensely looking at evidentiary threads linking the letters to the hijackers [nbV=1 saturS=1]
. 3 : .
Cut on the comma: every clause now has its subject and its verb, and the sentence has a main clause (without a mark)
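The search for a cut point can be sketched as trying each comma of the proto-clause and keeping the first one that leaves exactly one verb on each side. The Python below is a simplified illustration, not the author's Perl: `cut_on_comma` and `count_verbs` are hypothetical names, and verbs are counted from a subset of the English auxiliary list only, ignoring verbal endings.

```python
import re

# Subset of the English auxiliary lists of this deck.
AUXILIARIES = set("has|is|was|does|have|are|were|do|had|will|would|can|could|did".split("|"))


def count_verbs(text):
    """Count auxiliaries among the written forms (simplified stand-in for nbV)."""
    return sum(t in AUXILIARIES for t in re.findall(r"[\w']+", text))


def cut_on_comma(proto_clause):
    """Cut at the first comma that leaves exactly one verb on each side."""
    for m in re.finditer(",", proto_clause):
        left = proto_clause[: m.end()]
        right = proto_clause[m.end():].strip()
        if count_verbs(left) == 1 and count_verbs(right) == 1:
            return [left, right]
    return [proto_clause]  # no suitable cut point found


proto_clause = (
    "that another criminal could be behind the anthrax attacks, "
    "investigators are intensely looking at evidentiary threads "
    "linking the letters to the hijackers"
)
for piece in cut_on_comma(proto_clause):
    print(piece)
```

On the slide's example the only comma separates "could" from "are", so the proto-clause with nbV=2 becomes two clauses with one verb each.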

22 Post-processing: linking 2 proto-clauses
Result of the standard process: 2 proto-clauses have no verb => trying to link them
0 : Eine junge Südafrikanerin, [nbV=0]
|die| 1 : |die 1969 ein neues Herz | erhielt, [nbV=1 saturS=1]
2 : überlebte damit zwölf Jahre [nbV=0]
. 3 : .

23 Post-processing: linking 2 proto-clauses
linking proto-clause 0 to proto-clause 2 by the "ping-pong" process (S_en_attente = subject waiting):
0 : | Eine junge Südafrikanerin, [nbV=0 S_en_attente=1] (ping of the subject)
|die| 1 : |die 1969 ein neues Herz | erhielt, [nbV=1 saturS=1]
2 : überlebte damit zwölf Jahre [nbV=0]

24 Post-processing: linking 2 proto-clauses
linking proto-clause 0 to proto-clause 2 by the "ping-pong" process:
| 0 : | Eine junge Südafrikanerin, [nbV=0 S_en_attente=0 lienS=2] (ping of the subject)
|die| 1 : |die 1969 ein neues Herz | erhielt, [nbV=1 saturS=1]
| 2 : | überlebte damit zwölf Jahre [nbV=1 saturS=1 lienS=0] (pong of the verb)
. 3 : .
every clause now has its subject and its verb, and the sentence has a main clause (without a mark)
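The "ping-pong" linking can be sketched as a single left-to-right pass over the proto-clauses: a verbless proto-clause that holds a subject candidate leaves it waiting (ping), and a later verbless proto-clause whose first word has an agreeing verbal ending picks it up (pong). The Python below is a minimal sketch under assumptions: the `ProtoClause` record, its `subject_number` field and the `ping_pong` function are hypothetical, and only `nbV`, `saturS` and `lienS` are names from the slides. The endings are the German lists of this deck.

```python
from dataclasses import dataclass
from typing import List, Optional

SING_ENDINGS = tuple("b|nd|te|e|f|ag|ng|ah|hm|t".split("|"))
PLUR_ENDINGS = tuple("en|rn".split("|"))


@dataclass
class ProtoClause:
    text: str
    subject_number: Optional[str] = None  # "sing"/"plur" when a subject candidate was found
    nbV: int = 0
    saturS: int = 0
    lienS: Optional[int] = None           # index of the linked proto-clause


def ping_pong(clauses: List[ProtoClause]) -> List[ProtoClause]:
    waiting = None  # index of the proto-clause whose subject is waiting
    for i, c in enumerate(clauses):
        if c.nbV == 0 and c.subject_number is not None:
            waiting = i  # ping: the subject candidate waits for a verb
        elif c.nbV == 0 and waiting is not None:
            words = c.text.split()
            number = clauses[waiting].subject_number
            endings = SING_ENDINGS if number == "sing" else PLUR_ENDINGS
            if words and words[0].endswith(endings):
                # pong: a waiting subject exists and the verbal ending agrees
                c.nbV, c.saturS, c.lienS = 1, 1, waiting
                clauses[waiting].lienS = i
                waiting = None
    return clauses


clauses = ping_pong([
    ProtoClause("Eine junge Südafrikanerin,", subject_number="sing"),
    ProtoClause("die 1969 ein neues Herz erhielt,", nbV=1, saturS=1),
    ProtoClause("überlebte damit zwölf Jahre"),
])
for c in clauses:
    print(c.nbV, c.saturS, c.lienS, c.text)
```

On the German example, proto-clause 0 ends with lienS=2 and proto-clause 2 with nbV=1, saturS=1 and lienS=0, the values shown on the slide. The relative clause in between is left untouched.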

25 Post-processing: cutting a proto-clause into 2 clauses + linking 2 proto-clauses
Result of the standard process: 1 proto-clause has no verb => trying to cut and link
0 : Les tueurs, [nbV=0]
|qui| 1 : |qui | ont assassiné Rehavam Zeevi, ministre israélien du Tourisme, appartiennent au camp des ennemis de la paix [nbV=1 saturS=1]
. 2 : .

26 Post-processing: cutting a proto-clause into 2 clauses + linking 2 proto-clauses
"ping-pong" process: ping of the subject = putting a subject candidate in a waiting position
0 : | Les tueurs, [nbV=0 S_en_attente=plur] (ping of the subject?)
|qui|, 1 : |qui | ont assassiné Rehavam Zeevi, ministre israélien du Tourisme, appartiennent au camp des ennemis de la paix [nbV=1 saturS=1]
cutting proto-clause 1 into 2 proto-clauses:

27 Post-processing: cutting a proto-clause into 2 clauses + linking 2 proto-clauses
cutting proto-clause 1 into 2 proto-clauses:
0 : | Les tueurs, [nbV=0 S_en_attente=plur] (ping of the subject?)
|qui|, 1 : |qui | ont assassiné Rehavam Zeevi, ministre israélien du Tourisme, [nbV=1 saturS=1]
2 : appartiennent au camp des ennemis de la paix [nbV=0]

28 Post-processing: cutting a proto-clause into 2 clauses + linking 2 proto-clauses
"ping-pong" process: pong of the verb = a waiting subject candidate & agreeing verbal ending
| 0 : | Les tueurs, [nbV=0 S_en_attente=0 lienS=2] (ping of the subject?)
|qui| 1 : |qui | ont assassiné Rehavam Zeevi, ministre israélien du Tourisme, [nbV=1 saturS=1]
| 2 : | appartiennent au camp des ennemis de la paix [nbV=1 saturS=1 lienS=0] (pong of the verb)
. 3 : .
every clause now has its subject and its verb, and the sentence has a main clause (without a mark)

29 Implementation of the linguistic model
[Diagram; labels as recoverable from the slide]
- physical grains: sentences; intermediary grains: proto-clauses; computed grains: clauses. These grains are represented in a repetitive structure.
- proto-chunks and chunks: these grains are tagged in the written forms of the (proto-)clauses, in the repetitive structure of the (proto-)clauses.

30 Aims of the "Groupe Syntaxe" of the GREYC
searching for minimal solutions: for a given task, minimising the means
- very small programs
- very simple algorithms
- deterministic solutions (without enumerating combinations)
  - computing on forms and their positions
- minimal linguistic bases
  - using very few properties, only those which are useful in the process
  - very few resources (typographical, morphological)

31 Very small programs!
- how? by using very general linguistic properties, defined by comprehension (intensionally) and not by extension (enumeration)
- why? because these properties are interesting:
  - few, abstract: understanding, modelling
  - operative, efficient: acting

32 Conclusions
- classical tasks are feasible with minimal means (almost no dictionary)
- other tasks: computing reported speech, locating explanations; cf. Nadine Lucas (GREYC) and Emmanuel Giguet (LATTICE)
- with fewer means, work is easier:
  - fewer lexical resources => lower cost
  - easy to add a new language
  - always above the word level
- the beginnings of a promising approach; still a long way to go...

33 End of the lecture
Your questions?

34 To download
- this presentation is available for download
- see also my presentation at TALN 2001: Parsing natural languages: from "combinatory" to "deterministic" parsing
- see also the Coling 2000 tutorial "Trends in Robust Parsing" (presentation and references)


36 Parsing and hierarchies of grains: classical parsers
[Diagram; labels as recoverable from the slide]
- physical grains: document, sentences, tokens (segmenting); top-down in the hierarchy of physical grains
- computed grains: recursive phrases, sentence (grouping tokens and phrases); bottom-up in the hierarchy of computed grains

37 Parsing and hierarchies of grains: 1998 parser
[Diagram; labels as recoverable from the slide]
- physical grains: document, sentences, tokens (segmenting); top-down in the hierarchy of physical grains
- computed grains: chunks (grouping tokens, linking chunks); bottom-up in the hierarchy of computed grains

38 Parsing and hierarchies of grains: GREYC parser
[Diagram; labels as recoverable from the slide]
- physical grains: document, textual zones, tokens (extracting, segmenting); top-down in the hierarchy of physical grains
- computed grains: chunks, clauses, sentences (grouping and linking); bottom-up in the hierarchy of computed grains


