Interaction Grammars and their implementation in LEOPAR Guy Perrier University Nancy2 - LORIA (Nancy)
1 - Why a new linguistic formalism?
Some crucial points in the design of a linguistic formalism:
o The form of the basic building blocks,
o The composition rules,
o The syntax-semantics interface.
Among the usual formalisms, none prevails over all the others.
1 - Why a new linguistic formalism?
The originality of Interaction Grammars (CoLing 2000):
o For the syntax, the basic building blocks are underspecified trees represented in the form of tree descriptions (this aspect comes from formalisms stemming from TAG);
o The composition of underspecified trees to build completely specified trees is performed by superposition under the control of a polarity system. Polarity neutralization expresses the saturation of syntactic structures (this aspect comes from Categorial Grammars).
2 - The importance of an experimental approach
The relevance of a linguistic formalism can only be established by confronting it with real corpora. The development of the LEOPAR parser answers this ambition.
The change of scale requires two conditions:
o parsing algorithms that are efficient enough to overcome the resulting explosion of ambiguity;
o lexicons and grammars with large coverage.
3 - The formalism of Interaction Grammars
The basic syntactic objects are tree descriptions: a tree description is a set of relations and properties on tree nodes representing syntactic constituents.
Relations are (immediate and large) dominance relations or (immediate and large) precedence relations.
Nodes are labelled with feature structures describing properties of syntactic constituents. Feature values are atoms or disjunctions of atoms, and they can be shared by several features.
3 - The formalism of Interaction Grammars
Features are polarized:
o negative features (f ← v) represent expected resources;
o positive features (f → v) represent available resources;
o neutral features (f = v) represent properties which do not behave as consumable resources.
3 - The formalism of Interaction Grammars
A syntactic description represents an underspecified syntactic tree. In other words, it represents a family of syntactic trees which are the models of the description.
Among all models of a description, only neutral and minimal models are linguistically relevant:
o A neutral model realizes the neutralisation of every negative feature with a positive feature and conversely.
o A minimal model adds a minimum of information to the description.
3 - The formalism of Interaction Grammars
The construction of neutral and minimal models for a description is performed by iterating the operation of feature neutralisation: this operation consists in merging two nodes labelled with two dual features (f ← v and f → v).
The neutralisation of two features entails a partial tree superposition, obtained by propagating the constraints that define the description.
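The node merge behind feature neutralisation can be illustrated with a small Python sketch. It is only a toy model under stated assumptions: nodes are plain dictionaries, polarities are written "+", "-" and "=", a merged pair of dual features is recorded as neutral, and the propagation of dominance and precedence constraints (the partial tree superposition) is left out. The names merge_polarity and merge_nodes are hypothetical and do not come from LEOPAR.

```python
def merge_polarity(p, q):
    """Combine two polarities: '+' and '-' cancel, '=' leaves the other unchanged."""
    if p == "=":
        return q
    if q == "=":
        return p
    if p != q:
        return "="   # dual features neutralise each other
    return None      # '+' with '+' or '-' with '-' cannot be merged


def merge_nodes(a, b):
    """Merge two nodes (feature -> (polarity, set of atoms)); None on failure."""
    merged = {}
    for feat in a.keys() | b.keys():
        if feat not in a or feat not in b:
            # Feature present on one side only: copy it as is.
            merged[feat] = a.get(feat) or b.get(feat)
            continue
        (p, vals_a), (q, vals_b) = a[feat], b[feat]
        pol = merge_polarity(p, q)
        vals = vals_a & vals_b   # disjunctions of atoms are intersected
        if pol is None or not vals:
            return None
        merged[feat] = (pol, vals)
    return merged


# Hypothetical nodes: a subject slot expected by a verb, and a noun phrase.
subject_slot = {"cat": ("-", frozenset({"np"})), "num": ("=", frozenset({"sg"}))}
noun_phrase = {"cat": ("+", frozenset({"np"})), "num": ("=", frozenset({"sg", "pl"}))}

merged = merge_nodes(subject_slot, noun_phrase)
```

In this toy run the expected and the available cat feature cancel, and the number disjunction is narrowed to the singular. Merging two nodes that both carry a positive cat feature fails, which reflects the reading of polarities as consumable resources.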
3 - Modelling of syntactic phenomena in French
Barriers to extraction
o L'invitation que Jean demande à Marie
o L'invitation que Jean pense demander à Marie
o * L'invitation que Marie connaît Jean qui demande
Pied piping
o A la femme de qui Jean demande-t-il une invitation ?
o A la femme de qui Jean pense-t-il demander une invitation ?
Negation (ne … personne, ne … aucun)
o Personne ne demande une invitation à Marie.
o Jean ne demande aucune invitation à Marie.
o Jean ne demande une invitation à personne.
o Jean ne demande une invitation à la femme d'aucun ingénieur.
4 - Principle of the LEOPAR parser
LEOPAR is developed inside the Calligramme team by Guillaume Bonfante, Bruno Guillaume, Sylvain Pogodalla and Guy Perrier. This work started in 2003. After a first release of the parser, a second release is now available. It comprises 17,000 lines of OCaml code.
The parser is freely downloadable under the CeCILL licence at the URL: http://www.loria.fr/equipes/calligramme/leopar/download.html
4 - Principle of the LEOPAR parser
Parsing of the sentence: Jean a demandé une invitation à Marie
Steps shown: tokenization, then lexical selection.
[Figure: each word of the sentence with its candidate lexical descriptions (ProperNoun, CommonN, StandardDet, Avoir, InfCompl, N0VN1, N0VN1aN2, N0VS1aN2, NaN1deN2, VerbPrep, ...); the per-word counts multiply to 2560 possible lexical selections.]
4 - Principle of the LEOPAR parser
Input filtering
[Figure: the same lexical selection after input filtering; of the 2560 possible selections, 3 remain, built from the descriptions ProperNoun, Avoir, N0VN1aN2, StandardDet, CommonN and VerbPrep.]
4 - Principle of the LEOPAR parser
Parsing
[Figure: parse tree of "Jean a demandé une invitation à Marie" built from the 3 remaining lexical selections, with nodes labelled S, NP, V, Aux, Det, N, PP and Prep.]
5 - Input filtering
Principle: an input choice can lead to a parse only if its polarity balance is null for every feature and for every feature value. This is a global input filtering criterion.
For every feature value, we build an automaton which counts polarities. A path in the automaton represents an input choice, and we keep it only if the polarity balance is null along this path for the feature value considered.
Because feature values can take the form of disjunctions, the automaton can be nondeterministic. It is determinised by computing possible polarity intervals instead of precise values.
Filtering can be improved in different ways: bounding polarity intervals, using specific properties of coordination, adding probabilities.
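The counting criterion can be illustrated with a short Python sketch. It is a brute-force stand-in, not the automaton used in LEOPAR: it enumerates every input choice and keeps those whose polarity interval contains zero for each feature value. The toy lexicon, its polarity counts and the function names are all assumptions made for illustration.

```python
from itertools import product

# Each lexical entry maps a (feature, value) pair to an interval (lo, hi)
# of possible polarity contributions: +1 per positive feature, -1 per
# negative feature. An interval wider than a point stands for a
# disjunctive value whose contribution is not yet known.
# Toy lexicon, invented for illustration only.
TOY_LEXICON = {
    "Jean": [{("cat", "np"): (1, 1)}],
    "dort": [
        {("cat", "np"): (-1, -1)},  # hypothetical entry expecting one np
        {("cat", "np"): (-2, -2)},  # hypothetical entry expecting two np
    ],
}


def balance(selection):
    """Sum the polarity intervals over all entries of one input choice."""
    total = {}
    for entry in selection:
        for key, (lo, hi) in entry.items():
            cur_lo, cur_hi = total.get(key, (0, 0))
            total[key] = (cur_lo + lo, cur_hi + hi)
    return total


def is_balanced(selection):
    """An input choice survives only if 0 lies in every summed interval."""
    return all(lo <= 0 <= hi for lo, hi in balance(selection).values())


def filter_selections(words, lexicon=TOY_LEXICON):
    """Enumerate all input choices and keep the balanced ones."""
    choices = [lexicon[w] for w in words]
    return [sel for sel in product(*choices) if is_balanced(sel)]


kept = filter_selections(["Jean", "dort"])
```

Here only the choice that pairs "Jean" with the one-np entry of "dort" survives. An automaton over word positions would share the partial sums between input choices instead of recomputing them for each one, which is what makes the criterion practical on real sentences.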
6 - Parsing
The principle is to build a neutral and minimal model of the syntactic description corresponding to every path in the automaton.
The current strategy implemented in LEOPAR is a left-to-right strategy. In order to reduce the search space, a bound is put on the number of active polarities allowed during the parsing process.
The automaton is visited from left to right. If the number of active polarities in the current description is under the bound, we take a shift step in the automaton, which extends the current description. Otherwise, we take a reduce step: we bring the number of active polarities back under the bound by performing neutralisations.
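The alternation of shift and reduce steps can be sketched in Python as follows. This is a strong simplification under stated assumptions: a description is reduced to a flat list of active polarities, a neutralisation just cancels one positive and one negative polarity with the same value, and there is no search over alternative neutralisations. The lexicon, the treatment of the final full stop and the function names are hypothetical.

```python
# Toy lexicon: each word contributes a list of (value, sign) polarities,
# +1 for an available resource and -1 for an expected one.
# Invented for illustration only.
TOY_LEXICON = {
    "Jean": [("np", +1)],
    "dort": [("np", -1), ("s", +1)],
    ".": [("s", -1)],
}


def reduce_once(active):
    """Neutralise one pair of dual polarities; return False if none exists."""
    for i, (value, sign) in enumerate(active):
        dual = (value, -sign)
        if dual in active:
            active.pop(i)
            active.remove(dual)
            return True
    return False


def parse(words, lexicon, bound):
    """Return True if every polarity is neutralised without exceeding the bound."""
    active = []
    position = 0
    while position < len(words):
        if len(active) < bound:
            # Shift step: read the next word and add its polarities.
            active.extend(lexicon[words[position]])
            position += 1
        elif not reduce_once(active):
            # Reduce step failed: the bound is reached and nothing cancels.
            return False
    # End of input: neutralise whatever is left.
    while reduce_once(active):
        pass
    return not active


ok_with_bound_3 = parse(["Jean", "dort", "."], TOY_LEXICON, 3)
ok_with_bound_1 = parse(["Jean", "dort", "."], TOY_LEXICON, 1)
```

With a bound of 3 the toy sentence is fully neutralised. With a bound of 1 the parser gets stuck after the first word, which illustrates how a bound that is too tight loses parses that exist.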
6 - Parsing
The strategy has two drawbacks: because of the bound on the number of active polarities, it is not complete; and, to avoid producing the same solution several times, the sequence of neutralisations must respect a fixed order.
The parsing efficiency can be improved by using a top-down strategy.
Robustness can be taken into account by using a bottom-up strategy.
7 - Lexical and grammatical resources with large coverage
The construction and maintenance of large lexicons and grammars requires reconciling the size of such resources with linguistic (readability) and computing (efficiency) constraints.
These resources should be reusable as much as possible for other formalisms.
All the resources which we produce are freely available.
8 - A lexicon independent of the formalism
The lexicon used by the parser is not built directly: it results from the combination of a morpho-syntactic lexicon independent of the formalism with a grammar written in the formalism of Interaction Grammars.
The morpho-syntactic lexicon results from the combination of a morphological lexicon with a syntactic lexicon.
We have built a syntactic lexicon with 400 entries in order to test LEOPAR on the French sentences of the TSNLP (Test Suite for Natural Language Processing).
In a joint work with Claire Gardent, Bruno Guillaume and Ingrid Falk, we have designed a method to extract a lexicon from the LADL tables. With this method, we have produced a lexicon from 11 tables and 2000 verbs.
9 - A two-level grammar: source and object
The principle is to consider two levels for the grammar:
o A source grammar is written by a human in a high-level language well suited to the expression of linguistic regularities.
o The source grammar is compiled into an object grammar which is directly usable in an NLP system.
Denys Duchier, Joseph Le Roux, Yannick Parmentier and Benoît Crabbé (LORIA) have developed a grammatical description language associated with a compiler. The system is called XMG (eXtensible MetaGrammar).
We used XMG to produce a French interaction grammar (740 descriptions).
10 - Prospects
To develop more efficient parsing strategies which integrate robustness.
To integrate semantics.
To extend the coverage of the French grammar.
To improve the efficiency of the parser by using statistics.