Mathieu Lafourcade (LIRMM) & Christian Boitet (GETA, CLIPS), LREC-2002, Las Palmas, 31 May 2002

1 UNL Lexical Selection with Conceptual Vectors
LREC-2002, Las Palmas, May 2002
Mathieu Lafourcade, LIRMM, Montpellier, Mathieu.Lafourcade@lirmm.fr, http://www.lirmm.fr/~lafourca
Christian Boitet, GETA, CLIPS, IMAG, Grenoble, Christian.Boitet@imag.fr, http://www-clips.imag.fr/geta

2 Outline
- The problem: disambiguation in UNL-French deconversion
  - Finding the known UW nearest to an unknown UW
  - Finding the best French lemma for a given UW
- Conceptual vectors
  - Nature & example on French (873 dimensions)
  - Building (Dec. 2001: 64,000 terms, 210,000 CVs)
- CVD (CV Disambiguation) running for French
  - Recooking the vectors attached to a document tree
  - Placing each recooked vector in the word sense tree
- Using CVD in UNL-French deconversion: ongoing

3 The UNL-FR deconversion process
UNL-L1 Graph
  -> Validation & Localization -> UNL-FRA Graph (UW)
  -> Lexical Transfer (with conceptual vectors computations) -> UNL-FRA Graph (French LU)
  -> Graph to tree conversion -> "UNL Tree"
  -> Structural transfer -> GMA structure
  -> Paraphrase choice -> UMA structure
  -> Syntactic generation -> UMC structure
  -> Morphological generation -> French utterance

4 The problem: disambiguation in UNL-French deconversion
- Find the known UW nearest to an unknown UW
  - known UWs (in KB context):
      obj(open(icl>occur),door)    a door opens
      obj(open(icl>do),door)       one opens a door
  - input graph:
      obj(open(icl>occur,ins>concrete thing),door)
      ins(open(icl>occur,ins>concrete thing),key…)
      a key opens a door / a door opens with a key
  - ==> choose nearest open(icl>occur) for correct result
- Find best French lemma for a UW in a given context
  - meeting(icl>event) ==> réunion [ACTION, DURATION…]
                           rencontre [EVENT, MOMENT…]

5 How to solve them?
1. unknown UW → best known UW
   1. Accessing KB in real time impractical (web server)
   2. KB not enough: still many possible candidates
2. known UW → best LU
   1. Often no clear symbolic conditions for selection
   2. Possibility to transform UNL → LUfr dictionary into a kind of neural net (cf. MSR MindNet)
3. a possible unifying solution: lexical selection through DCV (Disambiguation using Conceptual Vectors), which works quite well for French in large-scale experiments

6 Conceptual vectors
- CV = vector in concept space (4th level in Larousse)
  V(to tidy up) = CHANGE [0.84], VARIATION [0.83], EVOLUTION [0.82], ORDER [0.77], SITUATION [0.76], STRUCTURE [0.76], RANK [0.76] …
  V(to cut) = GAME [0.8], LIQUID [0.8], CROSS [0.79], PART [0.78], MIXTURE [0.78], FRACTION [0.75], TORTURE [0.75], WOUND [0.75], DRINK [0.74] …
- Global vector of a term = normalized sum of the CVs of its meanings/senses
  V(head) = HEAD [0.83], BEGINNING [0.75], ANTERIORITY [0.74], PERSON [0.74], INTELLIGENCE [0.68], HIERARCHY [0.65], …
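
The "normalized sum" rule for the global vector of a term can be sketched as follows. This is only an illustration, assuming Euclidean normalization and a toy 3-dimensional concept space; the function names and the two sense vectors are made up for the example and do not come from the slides.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]

def global_vector(sense_vectors):
    """Global vector of a term: normalized component-wise sum of its sense CVs."""
    summed = [sum(components) for components in zip(*sense_vectors)]
    return normalize(summed)

# Toy concept space with 3 dimensions (the slides use 873).
# Both sense vectors are hypothetical values chosen for illustration.
head_senses = [
    [1.0, 0.0, 0.0],  # hypothetical sense: body part
    [0.0, 1.0, 0.0],  # hypothetical sense: leader
]
head = global_vector(head_senses)
print(head)
```

With two orthogonal senses of equal weight, the global vector lies halfway between them, which is why a polysemous term such as "head" activates several unrelated concepts at once.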

7 Conceptual vectors and sense space
- Conceptual vector model
  - Reminiscent of vector models (Salton et al.) & Sowa
  - Applied on preselected concepts (not terms)
  - Concepts are not independent
- Set of k basic concepts
  - Thesaurus Larousse = 873 concepts (translation of Roget's)
  - A vector = an 873-tuple of reals in [0..1]
  - Encoding for each dimension: C = 2^15, i.e. [0..32767]
- Sense space = vector space + vector set
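
The slide only states that each dimension is encoded on 2^15 values, [0..32767]. One plausible reading, shown here as a sketch, is a linear quantization of reals in [0..1] to integer codes; the linear mapping and the names `SCALE`, `encode` and `decode` are assumptions, not something the slides specify.

```python
SCALE = 2 ** 15 - 1  # 32767, the largest integer code on each dimension

def encode(vector):
    """Map reals in [0, 1] to integer codes in [0, 32767] (assumed linear mapping)."""
    for x in vector:
        if not 0.0 <= x <= 1.0:
            raise ValueError(f"component {x} is outside [0, 1]")
    return [round(x * SCALE) for x in vector]

def decode(codes):
    """Map integer codes back to reals in [0, 1]."""
    return [c / SCALE for c in codes]

codes = encode([0.0, 0.25, 0.84, 1.0])
restored = decode(codes)
print(codes, restored)
```

Under this assumption the round-trip error on any component is at most half a quantization step, that is, below 1/32767.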

8 Thematic relatedness
- Conceptual vector distance
  - Angular distance: D_A(x, y) = angle(x, y)
  - 0 <= D_A(x, y) <= π
- Interpretation
  - if D_A(x, y) = 0: x // y (colinear): same idea
  - if D_A(x, y) = π/2: x ⊥ y (orthogonal): nothing in common
  - if D_A(x, y) = π: D_A(x, y) = D_A(x, -x): -x is the anti-idea of x
[Figure: vectors x, x' and y, with angle α]
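
The slide gives the angular distance only as angle(x, y). A minimal sketch, assuming the usual definition of that angle as the arccosine of the normalized dot product; the function name is illustrative.

```python
import math

def angular_distance(x, y):
    """Angle between two non-zero vectors, in radians, in [0, pi]."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("angular distance is undefined for a zero vector")
    # Clamp to [-1, 1] to protect acos against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))
    return math.acos(cosine)

same_idea = angular_distance([1.0, 0.0], [2.0, 0.0])          # colinear
nothing_in_common = angular_distance([1.0, 0.0], [0.0, 3.0])  # orthogonal
anti_idea = angular_distance([1.0, 2.0], [-1.0, -2.0])        # x against -x
print(same_idea, nothing_in_common, anti_idea)
```

The three calls correspond to the three cases listed on the slide. Because the distance depends only on direction, the length of the vectors does not matter.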

9 Collection process
- Start from a few handcrafted term/meanings/vectors
- Loop, running constantly on Lafourcade's Mac:
  - choose a word at random (with or without a CV)
  - find NL definitions of its senses (mainly on the Web)
  - for each sense definition SD:
    - analyze SD into linguistic tree TreeDef
    - attach existing or null CVs to lexical nodes of TreeDef
    - iterate propagation of CVs in TreeDef (ling. rules used here) until CV(root) converges or limit of cycle numbers is reached
    - CV(sense) ← CV(root(TreeDef))
  - use vector distance to arrange the CVs of senses into a binary « discrimination tree »
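
The inner loop of the collection process (propagate CVs in the tree until CV(root) converges or a cycle limit is reached) can be sketched as below. The slides do not give the propagation rules, so both rules here are stand-in assumptions: upward, a node takes the normalized sum of its children; downward, a child reinforces the components it shares with its parent. The `Node` class, `cook` and its parameters are hypothetical names.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]

def angle(x, y):
    """Angle between two vectors; 0.0 if either one is a null vector."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0 or ny == 0:
        return 0.0
    cosine = sum(a * b for a, b in zip(x, y)) / (nx * ny)
    return math.acos(max(-1.0, min(1.0, cosine)))

class Node:
    def __init__(self, vector=None, children=()):
        self.vector = vector
        self.children = list(children)

def propagate_up(node):
    """ASSUMED rule: an internal node is the normalized sum of its children."""
    if node.children:
        for child in node.children:
            propagate_up(child)
        columns = zip(*(child.vector for child in node.children))
        node.vector = normalize([sum(column) for column in columns])

def propagate_down(node):
    """ASSUMED rule: a child reinforces the components it shares with its parent."""
    for child in node.children:
        child.vector = normalize(
            [c + c * p for c, p in zip(child.vector, node.vector)]
        )
        propagate_down(child)

def cook(root, epsilon=1e-6, max_cycles=50):
    """Iterate down/up propagation until CV(root) stops moving or the limit is hit."""
    propagate_up(root)
    for cycle in range(1, max_cycles + 1):
        previous = root.vector
        propagate_down(root)
        propagate_up(root)
        if angle(previous, root.vector) < epsilon:
            return cycle
    return max_cycles

# Toy definition tree: two lexical leaves with hypothetical 3-dimensional CVs.
tree = Node(children=[Node([1.0, 0.0, 0.0]), Node([0.6, 0.8, 0.0])])
cycles = cook(tree)
print(cycles, tree.vector)
```

With these stand-in rules the vectors drift toward the dominant shared concept, which is a property of the assumed rules rather than a claim about the actual system. The part that mirrors the slide is the stopping condition: convergence of CV(root) or the cycle limit, whichever comes first.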

10 An example discrimination tree

11 Status on French CVs
- By Dec. 2001
  - 64,000 terms
  - 210,000 CVs
  - Average of 3.3 senses/term
- Method
  - robot to access web lexicon servers
  - large coverage French analyzer by J. Chauché in Sigmart
- See more details on http://www.lirmm.fr/~lafourca

12 Disambiguation in French
- Recook the vectors attached to a document tree
  - Take a document
  - Analyze it with Sigmart analyzer into ONE possibly big tree (30 pages OK as a unit)
  - Use the same process as for processing definitions
  - Final CV(root) usable as thematic classifier of document
  - Final CV(lexemes) used as « sense in context »
- Place each recooked vector in the discrimination tree
  - Walk down the discrimination tree, using vector distance
  - Stop at nearest node:
    - If leaf node, full disambiguation (relative to available sense set)
    - If internal node, partial disambiguation (subset of senses)
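
The placement step (walk down the discrimination tree and stop at the nearest node) might look like the sketch below. It assumes a binary tree whose nodes each carry a vector and the set of senses they cover, and it assumes "nearest" means smallest angular distance; `SenseNode`, `place` and the toy vectors are illustrative, not taken from the slides.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

def angular_distance(x, y):
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    cosine = sum(a * b for a, b in zip(x, y)) / (nx * ny)
    return math.acos(max(-1.0, min(1.0, cosine)))

@dataclass
class SenseNode:
    vector: List[float]
    senses: List[str]
    left: Optional["SenseNode"] = None
    right: Optional["SenseNode"] = None

    def is_leaf(self):
        return self.left is None and self.right is None

def place(node, context_vector):
    """Walk down while some child is strictly nearer to the context than the current node."""
    while not node.is_leaf():
        children = [c for c in (node.left, node.right) if c is not None]
        best = min(children, key=lambda c: angular_distance(c.vector, context_vector))
        here = angular_distance(node.vector, context_vector)
        if angular_distance(best.vector, context_vector) >= here:
            break  # the current node is the nearest one: partial disambiguation
        node = best
    return node

# Toy discrimination tree for a word with two hypothetical senses.
sense_a = SenseNode([1.0, 0.0], ["sense A"])
sense_b = SenseNode([0.0, 1.0], ["sense B"])
root = SenseNode([0.7071, 0.7071], ["sense A", "sense B"], sense_a, sense_b)

full = place(root, [0.9, 0.1])      # clearly on the side of sense A
partial = place(root, [0.7, 0.72])  # close to the root, between both senses
print(full.senses, partial.senses)
```

Stopping at a leaf returns a single sense (full disambiguation); stopping at an internal node returns the subset of senses below it (partial disambiguation), as described on the slide.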

13 Example with some ambiguities
The white ants strike rapidly the trusses of the roof

14 Initialize: attach CVs to lexemes
The white ants strike rapidly the trusses of the roof

15 Up / Down propagation of the CVs

16 Result: sense selection
The white ants strike rapidly the trusses of the roof

17 Disambiguation in UNL-French deconversion
- Our set-up
  - Example input UNL-graph
  - Outline of the process
- Two usages of DCV (disambiguation with CV)
  - Finding the known UW nearest to an unknown UW
  - Finding the best French lemma for a given UW

18 A UNL input graph
"Ronaldo has headed the ball into the left corner of the goal"

19 Corresponding UNL-tree with CVs attached: localization DCV
[Figure: UNL tree; node labels and attached vectors as they appear on the slide]
- Root: score(icl>event,agt>human,fld>sport).@entry.@past.@complete
  V = V_event(score) + V_human(score) + V_sport(score)
- 1- Ronaldo: agt, V(human)
- 1- goal(icl>thing): obj, V_thing(goal) (node shown twice)
- corner: plt, V_place(corner)
- left: mod, V(left)
- head(pof>body): ins, V_body(head)
- 2- Ronaldo: pos, V(human)

20 Result of first step: the « best » UWs
- The vector contextualization generalizes both kinds of localization (lexical and cultural).
- On each node, the selected UW is the one in the UNL-French database whose vector is the closest to the contextualized vector.
- Formulas used for up and down propagation: [formulas not preserved in this transcript]

21 Second step: select the « best » LUs
- Depending on the strategy of the generator, a lexical unit (LU) may be
  - a lemma
  - a whole derivational family (pay, payment, payable…)
- Dictionary: → { }
- Input:
- Output: LU_i with nearest CV_i
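
Selecting "the LU_i with nearest CV_i" can be sketched as a nearest-neighbour lookup. The dictionary layout (a UW mapped to a list of LU and CV pairs), the three toy dimensions and all vector values are assumptions for illustration; only the UW meeting(icl>event) and its two French lemmas come from the slides.

```python
import math

def angular_distance(x, y):
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    cosine = sum(a * b for a, b in zip(x, y)) / (nx * ny)
    return math.acos(max(-1.0, min(1.0, cosine)))

# Hypothetical CVs over three toy dimensions: (ACTION, DURATION, MOMENT).
DICTIONARY = {
    "meeting(icl>event)": [
        ("réunion", [0.8, 0.6, 0.0]),
        ("rencontre", [0.3, 0.0, 0.9]),
    ],
}

def select_lu(uw, context_vector, dictionary=DICTIONARY):
    """Return the LU whose CV is nearest (smallest angle) to the contextualized vector."""
    candidates = dictionary[uw]
    lemma, _ = min(
        candidates, key=lambda entry: angular_distance(entry[1], context_vector)
    )
    return lemma

long_meeting = select_lu("meeting(icl>event)", [0.7, 0.7, 0.1])
chance_meeting = select_lu("meeting(icl>event)", [0.1, 0.0, 1.0])
print(long_meeting, chance_meeting)
```

The same lookup would serve the first step as well, with candidate UWs from the UNL-French database in place of candidate LUs.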

22 Conclusion
- Another case of fruitful integration of symbolic & numerical methods
- Further work planned
  - integration into running UNL-FR server
  - work on feed-back (Pr SU's line of thought)
    - if user corrects the choice of LU for chosen UW
    - or worse, if user chooses a LU corresponding to another UW!
    - ==> then recompute vectors by giving more weight to chosen CVs
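
The planned feedback step is described only as "recompute vectors by giving more weight to chosen CVs". One way to read it, sketched here under that assumption, is a weighted variant of the normalized sum in which the sense chosen by the user gets a larger weight; the additive `boost` and all names are hypothetical.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]

def weighted_global_vector(sense_vectors, weights):
    """Normalized weighted sum of the sense CVs of a term."""
    summed = [
        sum(w * c for w, c in zip(weights, column))
        for column in zip(*sense_vectors)
    ]
    return normalize(summed)

def reinforce(weights, chosen_index, boost=0.5):
    """ASSUMED update rule: add a fixed boost to the weight of the chosen sense."""
    return [w + boost if i == chosen_index else w for i, w in enumerate(weights)]

# Two hypothetical sense CVs with equal initial weights.
senses = [[1.0, 0.0], [0.0, 1.0]]
weights = [1.0, 1.0]
before = weighted_global_vector(senses, weights)

# The user corrects the system and picks the second sense.
weights = reinforce(weights, chosen_index=1)
after = weighted_global_vector(senses, weights)
print(before, after)
```

After the correction, the recomputed vector leans toward the chosen sense, so the same choice becomes more likely in similar contexts.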

