Presentation on theme: "25 juin 2002TALN 2002-Michel Généreux1 Un analyseur sémantique pour les langues naturelles basé sur des exemples (An Example-Based Semantic Parser for."— Presentation transcript:
25 June 2002, TALN 2002 - Michel Généreux
Un analyseur sémantique pour les langues naturelles basé sur des exemples (An Example-Based Semantic Parser for Natural Language)
Michel Généreux - Austrian Research Institute for Artificial Intelligence
Introduction - Motivations
Build a robust, portable (in practice, only a set of new annotated examples is needed), wide-coverage parser that deals well with real data (hesitations, recognition errors, idiomatic expressions, ...).
In the current context of:
– large quantities of information accessible from the Internet
– speech recognition development
Natural Language Interfaces (NLI) have become very attractive as a way to easily access information and/or speak to a computer (e.g. multimodality).
Improve upon other corpus-based semantic parsers:
– Provides an open and flexible model: the statistical model can be adapted to different types of training corpus
– Gives context a crucial role in the parsing process
– Abstracts over topics for changing domains
– Not rule-based (burden of creating hand-crafted rules, portability)
System architecture
Introduction - System characteristics
An empirical method for semantic parsing of natural language: learn to parse a new sentence by looking at previous examples.
A shift-reduce type parsing paradigm where the operators are based on domain-specific semantic concepts, obtained from a lexicon.
A statistically trained model "specializes" the parser by guiding the runtime beam-like search of possible parses. Decisions at the parsing stage are made on the basis of three criteria: the similarity between the contexts in which the action took place, the similarity between the final meaning representations and, finally, the sheer number of occurrences of those actions and final representations.
Finally, a module is provided so that training and parsing can also be done on a changing domain such as newspaper browsing.
Introduction
What can a semantic parser learn from a training corpus of examples?
Training([Maria,suchst,einen,großen,Hund],suchen(Maria,groß(Hund))).
– It can learn that the predicate groß/1 combines best as the argument of suchen/2, and not the other way round.
Training([Einen,großen,Hund,suchst,Maria],suchen(Maria,groß(Hund))).
– It can learn that word order may not really matter.
Training([Na,ja,einen,großen,Hund,um,suchst,Maria],suchen(Maria,groß(Hund))).
– It can learn that some words or hesitations don't matter.
Training([Paul,kicked,the,bucket],die(Paul)); Training([Paul,kicked,the,ball],kick(Paul,ball)).
– Using contextual information, it can learn how to disambiguate meanings.
Training([Big,deal,that,Maria,is,looking,for,a,big,dog],search(Maria,big(dog))).
– It can learn that the same word may or may not participate in the meaning.
The corpus may give the parser direct examples of how to conduct its actions, and contextual information found in those examples may give the parser additional clues to interpret word order and disambiguate meanings.
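The contextual disambiguation idea on this slide can be illustrated with a minimal sketch (not the system's actual implementation): training pairs of token lists and logical forms, with the meaning of an ambiguous word chosen by context overlap. The `disambiguate` function and its overlap measure are illustrative stand-ins for the similarity criterion described later.

```python
# Toy training pairs mirroring the slide's "kicked the bucket/ball" examples.
TRAINING = [
    (["Paul", "kicked", "the", "bucket"], "die(Paul)"),
    (["Paul", "kicked", "the", "ball"], "kick(Paul,ball)"),
]

def disambiguate(tokens):
    """Pick the logical form whose training context overlaps most with
    the new utterance (a crude stand-in for context similarity)."""
    def overlap(a, b):
        return len(set(a) & set(b))
    return max(TRAINING, key=lambda ex: overlap(ex[0], tokens))[1]

print(disambiguate(["Paul", "kicked", "the", "red", "ball"]))    # kick(Paul,ball)
print(disambiguate(["poor", "Paul", "kicked", "the", "bucket"]))  # die(Paul)
```

The set intersection also reflects the slide's point that word order and extra words ("Big deal that ...") need not block interpretation.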
Introduction
How will the parser use that information to parse new sentences?
– When parsing a new sentence, each step (action) is compared (including its context) to those generated during the training phase.
– The higher the similarity and the frequency, the higher the action ranks.
– A full parse is ranked according to the individual rankings of its actions and the ranking of the final state.
– A final choice is made among a set of full parses (limited by the search beam).
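A hedged sketch of the ranking step above: a full parse is scored from its per-action scores combined with the final-state score, and the best candidate within the beam wins. The scores below are made-up numbers; the real system derives them from context similarity and frequency.

```python
from math import prod

def rank_parse(action_scores, final_state_score):
    """Score a full parse: product of its action scores,
    times the score of the final state."""
    return prod(action_scores) * final_state_score

# Two candidate parses (action scores, final-state score), illustrative only.
candidates = [
    ([0.9, 0.8, 0.7], 0.5),   # parse A
    ([0.6, 0.9, 0.9], 0.4),   # parse B
]
best = max(candidates, key=lambda c: rank_parse(*c))
print(rank_parse(*best))
```

Whether the scores combine multiplicatively or by some weighted scheme is an assumption here; the slides only say that both action rankings and the final-state ranking contribute.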
Training phase - Data
"Training" - Data
Origin of the training and test data: mainly from Wizard-of-Oz experiments, completed by some artificial examples, in the domain of newspaper searching and browsing.
Examples of annotation:
– Command: training([Zurück,bitte],zurück).
– Search: training([Ich,suche,jetzt,etwas,über,topic(1)],suchen([topic(1)],zeitung(_),zeit(_))).
– Command+search: training([Zurück,zu,den,Begriffen,topic(1),und,Politik],section_zurück(Politik,[topic(1)])).
"Training" - Data examples
Written corpus:
"Artikel über das Wiener Neujahrskonzert suche ich."
"Artikel über das Neujahrskonzert als festen Bestandteil des kulturellen Lebens suche ich."
Spoken corpus:
"ich möchte jetzt eine neue Suche beginnen"
"die Kosovo-Krise un und Bill Clinton"
"Bill Clinton hält eine Ansprache vor der Öffentlichkeit"
Artificial data:
"Bitte geben Sie mir einen Text zum Thema Kosovo-Krise. Aber bitte nur in der Zeitung Salzburger Nachrichten von vor einer Woche."
"Salzburger Nachrichten und Kosovo-Krise suche ich jetzt."
"Training"
The training phase uses an overly general parser, which produces all possible paths of actions (limited only by the training beam) to be taken in order to get from each training example utterance to its semantic representation. In the process, it records successful actions (called op), as well as the different final states. To each of them, uniquely defined, it assigns a frequency* measure, defined as follows:
Frequency** = Occurrences_of_an_action_in_a_specific_context / Total_number_of_occurrences_of_this_action
* the measure is similar for final states
** a particular action is an action WITH arguments
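The frequency measure defined above can be sketched directly: count how often an action (with its arguments) was recorded in a specific context, and divide by the total occurrences of that action. The observation list below is illustrative, not real training output.

```python
# Recorded (action, context) observations from a hypothetical training run.
observed = [
    ("sHIFT(ich)", "ctx_a"),
    ("sHIFT(ich)", "ctx_a"),
    ("sHIFT(ich)", "ctx_b"),
]

def frequency(action, context):
    """Occurrences of the action in this context, over all
    occurrences of the same action (with the same arguments)."""
    in_context = sum(1 for a, c in observed if (a, c) == (action, context))
    total = sum(1 for a, _ in observed if a == action)
    return in_context / total

print(frequency("sHIFT(ich)", "ctx_a"))  # 2/3
```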
"Training" - The statistics file
The overlyGeneralParser parses the training file to generate the statistics file. Every step needed to go from the topicalized phrase to the meaning is recorded, as well as the final states themselves. Final states are simply the states of the parse stack at the end of the parse. Each of them (actions and final states) is assigned a frequency measure as described previously.
Each line has one of the following formats (recall that op is a container for any action):
op(ACTION#PARSE_STACK#INPUT_STRING#FREQUENCY).
final(FINAL_STATE#FREQUENCY).
with
– the input string: [Ich,suche,einen,Artikel,über,Bush]
– the parse stack: [concept1:[context1],concept2:[context2],...], e.g. [suchen(_,zeitung(_),zeit(_)):[suche,einen,Artikel],start:[ich]]
Here are two examples:
op(sHIFT(ich)#[start:[]]#[ich,suche,einen,Text,for,topic(1),topic(2),bitte,bearbeiten,Sie,meinen,Suchauftrag]#0.3333).
final([bestätigen(neue_suche):[bearbeiten,Sie,meinen,Suchauftrag],start:[ich,möchte,jetzt,eine]]#0.2).
These lines are used by the specializedParser to compute the best parse.
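As a rough illustration of the record formats above, a naive line reader can split on the `#` field separator. This is only a sketch: the real file contains Prolog terms, and the actual system presumably reads them with a Prolog reader rather than string splitting.

```python
def read_record(line):
    """Split one op(...)/final(...) record into its kind,
    its '#'-separated fields, and its trailing frequency."""
    kind, _, body = line.partition("(")
    fields = body.rstrip(").").split("#")
    freq = float(fields[-1])
    return kind, fields[:-1], freq

kind, fields, freq = read_record(
    "op(sHIFT(ich)#[start:[]]#[ich,suche]#0.3333)."
)
print(kind, fields[0], freq)  # op sHIFT(ich) 0.3333
```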
The parsing phase
Parsing - Topic extraction
Statistical semantic parsing
Syntactic parsing: PCFG
– P = max_t P(t,s|G): we try to find the parse P with the highest probability, given a grammar G, where t is a parse tree, s a sentence, and each grammar rule is assigned a probability according to its frequency in a corpus.
– P(S) = 0.6 * 0.3 * 0.4 * 0.7 * 0.2 * 0.3 * 0.2 * 0.1 = 0.0006048
Semantic parsing:
– P = max_f P(f,s|L): we try to find the parse P with the highest probability, given a first-order logical language L, where f is a formula, s a sentence, and each formula is assigned a probability according to its frequency in a corpus.
– Probability(kick(Paul,ball)) = P(iNTRODUCE(Paul)) * P(iNTRODUCE(kick(_,_))) * P(dROP(Paul,kick(_,_))) * P(sHIFT(the)) * P(iNTRODUCE(ball)) * P(dROP(ball,kick(Paul,_))) * P(kick(Paul,ball))
The actual parsing of the input phrase is done by a specializedParser. It is specialized in the sense that it uses a statistical model to process all the information available from the training phase in order to get the best possible parse (the one with the highest probability).
The search space is limited by the search beam parameter.
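The product on this slide can be computed directly. The probabilities below are made-up values (chosen to echo the PCFG example's magnitude); only the multiplicative structure comes from the slide.

```python
from math import prod

# Hypothetical probabilities for each action in the kick(Paul,ball)
# derivation, plus the final-state probability (last entry).
action_probs = {
    "iNTRODUCE(Paul)": 0.6,
    "iNTRODUCE(kick(_,_))": 0.3,
    "dROP(Paul,kick(_,_))": 0.4,
    "sHIFT(the)": 0.7,
    "iNTRODUCE(ball)": 0.2,
    "dROP(ball,kick(Paul,_))": 0.3,
    "kick(Paul,ball)": 0.2,
}
print(prod(action_probs.values()))  # product of all seven factors
```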
Statistical parsing: overview
Parsing - Adapting the model to different types of corpus
Small training set:
– threshold
– training beam
Small lexicon (narrow domain):
– Pop
– Pfinal
Elements of the parser
Actions:
– sHIFT(word_to_be_shifted): puts the first word from the input string at the end of the context of the concept on top of the parse stack
– iNTRODUCE(concept_to_be_introduced): takes a concept from the semantic lexicon and puts it on top of the parse stack
– dROP(source_term,target_term): attempts to place a term from the parse stack as an argument of another term of the parse stack
The semantic lexicon: lexicon(CONCEPT,[TRIGGERING_PHRASE]).
– lexicon(topic(1),[topic(1)]).
– lexicon(suchen(_,zeitung(_),zeit(_)),[suche]).
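The three actions above can be sketched in Python on a parse stack of (concept, context) pairs and an input word list. This is a simplification under stated assumptions: concepts are opaque strings here, and dROP fills the first `_` slot by string substitution, whereas the real system manipulates logical terms with unification.

```python
def sHIFT(stack, words):
    """Move the first input word into the context of the top concept."""
    concept, context = stack[0]
    return [(concept, context + [words[0]])] + stack[1:], words[1:]

def iNTRODUCE(stack, words, concept):
    """Push a lexicon concept, with an empty context, onto the stack."""
    return [(concept, [])] + stack, words

def dROP(stack, words):
    """Place the top term as an argument of the term below it
    (simplified: substitute into the first open '_' slot)."""
    (src, src_ctx), (tgt, tgt_ctx) = stack[0], stack[1]
    return [(tgt.replace("_", src, 1), tgt_ctx + src_ctx)] + stack[2:], words

stack, words = [("start", [])], ["ich", "suche", "Bush"]
stack, words = sHIFT(stack, words)
stack, words = iNTRODUCE(stack, words, "suchen(_,zeitung,zeit)")
stack, words = sHIFT(stack, words)
print(stack[0])  # ('suchen(_,zeitung,zeit)', ['suche'])
```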
Parsing - The SR parser variant: a Shift-Introduce-Drop parser
We are now ready to present the variant of the shift-reduce parser we are using. The algorithm previously introduced must be modified as follows:
1. Try to introduce a new concept or shift a word. *
2. If possible, make one dROP action. *
3. If there are more words in the input string, go back to Step 1. Otherwise stop. *
The backtracking mechanism ensures that ALL possible actions will be executed.
Parsing - An example
We now show a parse for "Ich suche einen Artikel über Bush". We assume for simplicity that the parser always takes the best available action. The following trace presents the successive actions taken by the parser.
The initial parse stack and input string are:
– [start:[]] and [ich,suche,einen,Artikel,topic(1)] (note: the PP is topicalized)
Here is the complete parse, using the two previous lexical entries (lexicon(topic(1),[topic(1)]) and lexicon(suchen(_,zeitung(_),zeit(_)),[suche])). Each line represents a parse state:
– sHIFT(ich)#[start:[]]#[ich,suche,einen,Artikel,topic(1)]
The word ich is not in the semantic lexicon, so the only possible action is to shift it onto the parse stack.
– iNTRODUCE(suchen(_,zeitung(_),zeit(_)))#[start:[ich]]#[suche,einen,Artikel,topic(1)]
The word suche is in the lexicon, so it can be introduced as a new predicate on the parse stack. Another possibility would be to shift it.
– sHIFT(einen)#[suchen(_,zeitung(_),zeit(_)):[suche],start:[ich]]#[einen,Artikel,topic(1)]
The word einen is not in the lexicon, so it must be shifted.
– sHIFT(Artikel)#[suchen(_,zeitung(_),zeit(_)):[suche,einen],start:[ich]]#[Artikel,topic(1)]
Parsing - An example (continued)
The word Artikel is not in the lexicon (it is actually a relevant_word for the newspaper domain) and is therefore shifted.
– iNTRODUCE(topic(1))#[suchen(_,zeitung(_),zeit(_)):[suche,einen,Artikel],start:[ich]]#[topic(1)]
topic(1) is in the lexicon, so it can be introduced.
– dROP(topic(1),suchen(_,zeitung(_),zeit(_)))#[topic(1):[topic(1)],suchen(_,zeitung(_),zeit(_)):[suche,einen,Artikel],start:[ich]]#
We can drop the predicate topic(1) into the first argument of the suchen predicate. The final parse stack, or final state, is:
– [suchen([topic(1)],zeitung(_),zeit(_)):[suche,einen,Artikel],start:[ich]]
In the final stage, the parser simply puts back the meaning for topic(1) collected during the topic-extraction phase, and the final parse, without contextual information, is:
– suchen([Bush],zeitung(_),zeit(_))
which signals a search for the topic Bush with no specific newspaper or time frame.
Note: a parse WITH statistics is a parse in which each of the previous actions, plus the final state, is given a probability according to similarity and frequency.
Results
Testing: from a pool of 90 examples, 9 different subsets of 10 test examples were used as test data (training beam of 1), the remaining 80 being the training examples. The parser averages 62% correctness when parsing a new sentence. Although it is difficult to compare the approach quantitatively with others, accuracy is therefore slightly lower than that of other approaches. I believe there are mainly three reasons for that:
– The very low number (80) of training examples, compared to the 560, 225 and 4000 sentences of other approaches.
– The lack of extensive testing on what the best settings for the default values of the weighting parameters would be. Only a set of rather intuitive values was used.
– The assimilation of natural language utterances to sets instead of lists.
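The evaluation scheme above amounts to a 9-fold split of the 90-example pool: each fold holds out 10 test examples and trains on the other 80. A minimal sketch of that split (the fold boundaries here are contiguous slices, an assumption; the slide does not say how the subsets were drawn):

```python
def folds(examples, n_folds=9):
    """Yield (train, test) splits: each fold holds out one slice."""
    size = len(examples) // n_folds
    for i in range(n_folds):
        test = examples[i * size:(i + 1) * size]
        train = examples[:i * size] + examples[(i + 1) * size:]
        yield train, test

pool = list(range(90))   # stand-ins for the 90 annotated examples
splits = list(folds(pool))
print(len(splits), len(splits[0][0]), len(splits[0][1]))  # 9 80 10
```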
Results - Learning curve: training with a beam of 1, parsing with a beam of 10
Results - Parsing demonstration
Ongoing research
– Testing with larger domains and bigger corpora, to see if the approach scales up well
– See if the model can be simplified (in terms of number of switches)
– Establish a clear relation between the statistical parameters (switches) and the type of corpus, in terms of efficiency and accuracy
– Include a treatment of questions
– Include automated acquisition of a lexicon, such as the one proposed in C. Thompson and R. Mooney, "Semantic lexicon acquisition for learning natural language interfaces", in 6th Workshop on Very Large Corpora, August 1998
– Another useful tool to enlarge our semantic dictionary would be WordNet
Next step: Discourse
Extend the approach to discourse using Discourse Representation Theory (DRT):
– DRT structures can also be mapped to logical structures such as the ones used to represent isolated sentences.
– By mapping discourse DRT structures, the parser could be trained to resolve typical discourse problems such as pronoun resolution.
Conclusion
Corpus-based methods offer a more effective way to deal with real data, and statistics offers an efficient and robust way to model and implement such methods on a computer. There is no need to rely upon hand-crafted rules; only a set of training examples is needed. The parser learns efficient ways of parsing new sentences by collecting statistics on the context in which each parsing action takes place. Compared to similar systems using machine-learning techniques, ours offers an approach in which linguistics, through context, can play a decisive role. The parser configuration can be changed in many ways to fit different types of corpus or domains. Some testing remains to be done to see how well the model scales up, but so far, promising results have paved the way for using the system for even more sophisticated analyses, such as discourse information.