Mathieu Lafourcade (LIRMM) & Christian Boitet (GETA, CLIPS). LREC-2002, Las Palmas, 31 May 2002.


UNL Lexical Selection with Conceptual Vectors
Mathieu Lafourcade (LIRMM, Montpellier) & Christian Boitet (GETA, CLIPS, IMAG, Grenoble)

Outline
- The problem: disambiguation in UNL-French deconversion
  - finding the known UW nearest to an unknown UW
  - finding the best French lemma for a given UW
- Conceptual vectors
  - nature & example on French (873 dimensions)
  - building (Dec. 2001: 64,000 terms, 210,000 CVs)
- CVD (CV Disambiguation) running for French
  - recooking the vectors attached to a document tree
  - placing each recooked vector in the word-sense tree
- Using CVD in UNL-French deconversion: ongoing

The UNL-FR deconversion process
1. Validation & localization: UNL-FRA graph (UWs) → UNL-L1 graph
2. Lexical transfer, with conceptual vector computations: → UNL-FRA graph (French LUs)
3. Graph-to-tree conversion: → "UNL tree"
4. Structural transfer: → GMA structure
5. Paraphrase choice: → UMA structure
6. Syntactic generation: → UMC structure
7. Morphological generation: → French utterance

The problem: disambiguation in UNL-French deconversion
1. Find the known UW nearest to an unknown UW.
   Known UWs (in KB context):
     obj(open(icl>occur), door)        "a door opens"
     obj(open(icl>do), door)           "one opens a door"
   Input graph:
     obj(open(icl>occur, ins>concrete thing), door)
     ins(open(icl>occur, ins>concrete thing), key…)
     "a key opens a door" / "a door opens with a key"
   ==> choose the nearest, open(icl>occur), for the correct result.
2. Find the best French lemma for a UW in a given context.
   meeting(icl>event) ==> réunion [ACTION, DURATION…] or rencontre [EVENT, MOMENT…]

How to solve them?
1. Unknown UW → best known UW
   - Accessing the KB in real time is impractical (web server).
   - The KB is not enough: many candidates may still remain.
2. Known UW → best LU
   - There are often no clear symbolic conditions for selection.
   - One possibility: transform the UNL → LU-fr dictionary into a kind of neural net (cf. MSR MindNet).
3. A possible unifying solution: lexical selection through DCV (Disambiguation using Conceptual Vectors), which works quite well for French in large-scale experiments.

Conceptual vectors
A CV is a vector in concept space (4th level in the Larousse thesaurus).
V(to tidy up) = CHANGE [0.84], VARIATION [0.83], EVOLUTION [0.82], ORDER [0.77], SITUATION [0.76], STRUCTURE [0.76], RANK [0.76], …
V(to cut) = GAME [0.8], LIQUID [0.8], CROSS [0.79], PART [0.78], MIXTURE [0.78], FRACTION [0.75], TORTURE [0.75], WOUND [0.75], DRINK [0.74], …
The global vector of a term is the normalized sum of the CVs of its meanings/senses:
V(head) = HEAD [0.83], BEGINNING [0.75], ANTERIORITY [0.74], PERSON [0.74], INTELLIGENCE [0.68], HIERARCHY [0.65], …
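The "normalized sum" construction of a term's global vector can be sketched in a few lines of Python. This is a toy illustration: it uses 3 dimensions instead of 873, and the sense vectors are invented.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean norm (unchanged if it is zero)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v

def global_vector(sense_cvs):
    """Global CV of a term: normalized sum of the CVs of its senses."""
    dim = len(sense_cvs[0])
    return normalize([sum(cv[i] for cv in sense_cvs) for i in range(dim)])

# Toy example: a term with two senses in a 3-concept space.
senses = [[0.9, 0.1, 0.0],   # sense 1
          [0.1, 0.9, 0.0]]   # sense 2
print(global_vector(senses))  # both strong concepts survive in the global CV
```

Summing before normalizing means a concept activated by several senses is reinforced, which is why the global CV of an ambiguous term mixes all of its thematic fields.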

Conceptual vectors and sense space
- Conceptual vector model: reminiscent of vector-space models (Salton et al.) and of Sowa.
- Applied to preselected concepts (not terms); concepts are not independent.
- Set of k basic concepts: Thesaurus Larousse = 873 concepts (a translation of Roget's).
- A vector is an 873-tuple of reals in [0..1]; each dimension is encoded on C = 2^15 values.
- Sense space = vector space + vector set.
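The per-dimension encoding (a real in [0..1] stored on C = 2^15 levels) amounts to fixed-point quantization. A minimal sketch; the exact rounding scheme is my assumption, not stated on the slide:

```python
C = 2 ** 15  # 32768 levels per dimension, as on the slide

def encode(x):
    """Quantize a CV component x in [0, 1] to an integer in [0, C-1]."""
    return min(C - 1, max(0, int(round(x * (C - 1)))))

def decode(k):
    """Map the stored integer back to a real in [0, 1]."""
    return k / (C - 1)

# Round-trip error is bounded by half a quantization step, about 1.5e-5:
print(abs(decode(encode(0.76)) - 0.76))
```

At 15 bits per dimension, an 873-dimensional vector fits in under 1.7 KB, which matters when storing hundreds of thousands of CVs.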

Thematic relatedness: conceptual vector distance
Angular distance: D_A(x, y) = angle(x, y), with 0 ≤ D_A(x, y) ≤ π.
Interpretation:
- D_A(x, y) = 0: x and y collinear, same idea
- D_A(x, y) = π/2: x and y orthogonal, nothing in common
- D_A(x, y) = π: D_A(x, y) = D_A(x, −x), i.e. −x is the anti-idea of x
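The angular distance and its three limiting cases can be checked directly. A minimal sketch (real CVs have 873 non-negative components, so the π case is a formal limit rather than something that occurs between actual CVs):

```python
import math

def angular_distance(x, y):
    """D_A(x, y): the angle between vectors x and y, in [0, pi]."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    # Clamp against floating-point drift before acos.
    return math.acos(max(-1.0, min(1.0, dot / (nx * ny))))

x = [1.0, 0.0]
print(angular_distance(x, [2.0, 0.0]))   # 0     -> collinear: same idea
print(angular_distance(x, [0.0, 1.0]))   # pi/2  -> orthogonal: nothing in common
print(angular_distance(x, [-1.0, 0.0]))  # pi    -> -x is the anti-idea of x
```

Note that the angle ignores vector length, so a term's CV and any positive rescaling of it count as the same idea.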

Collection process (running constantly on Lafourcade's Mac)
Start from a few handcrafted term/meaning/vector entries, then:
1. Choose a word at random (with or without a CV).
2. Find NL definitions of its senses (mainly on the Web).
3. For each sense definition SD:
   a. analyze SD into a linguistic tree TreeDef;
   b. attach existing or null CVs to the lexical nodes of TreeDef;
   c. iterate propagation of CVs in TreeDef (linguistic rules are used here) until CV(root) converges or the limit on the number of cycles is reached;
   d. CV(sense) ← CV(root(TreeDef)).
4. Use vector distance to arrange the CVs of the senses into a binary "discrimination tree".
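The inner loop of step 3 (propagate CVs through TreeDef until CV(root) converges or the cycle limit is hit) can be sketched as follows. This is a simplified stand-in: the real system applies linguistic rules during propagation, while here each pass merely replaces a node's CV by the normalized sum of its own and its children's CVs.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean norm (unchanged if it is zero)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else v

class DefNode:
    """Node of an analyzed definition tree; cv is None until a CV is attached."""
    def __init__(self, cv=None, children=()):
        self.cv = cv
        self.children = list(children)

def bottom_up(node):
    """One propagation pass: a node's CV becomes the normalized sum of its
    own CV (if any) and its children's CVs, children updated first."""
    vecs = [v for v in (bottom_up(c) for c in node.children) if v is not None]
    if node.cv is not None:
        vecs.append(node.cv)
    if vecs:
        dim = len(vecs[0])
        node.cv = normalize([sum(v[i] for v in vecs) for i in range(dim)])
    return node.cv

def collect_sense_cv(root, max_cycles=20, eps=1e-9):
    """Iterate propagation until CV(root) stops moving or the cycle limit is hit."""
    prev = None
    for _ in range(max_cycles):
        cur = bottom_up(root)
        if prev is not None and sum((a - b) ** 2 for a, b in zip(prev, cur)) < eps:
            break
        prev = list(cur)
    return root.cv

# A tiny definition tree: two lexical leaves under a root with no CV yet.
tree = DefNode(children=[DefNode([1.0, 0.0]), DefNode([0.0, 1.0])])
print(collect_sense_cv(tree))  # roughly [0.707, 0.707]
```

The cycle limit guards against definitions whose vectors oscillate instead of settling, exactly as the slide's "until CV(root) converges or limit of cycle numbers is reached".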

An example discrimination tree

Status on French CVs
By Dec. 2001: 64,000 terms, 210,000 CVs, an average of 3.3 senses per term.
Method:
- a robot to access web lexicon servers
- the large-coverage French analyzer by J. Chauché, in Sigmart
See more details on

Disambiguation in French
Recook the vectors attached to a document tree:
- Take a document.
- Analyze it with the Sigmart analyzer into ONE possibly big tree (30 pages is OK as a unit).
- Use the same process as for processing definitions.
- The final CV(root) is usable as a thematic classifier of the document.
- The final CVs of the lexemes are used as "senses in context".
Place each recooked vector in the discrimination tree:
- Walk down the discrimination tree, using vector distance.
- Stop at the nearest node:
  - if it is a leaf node: full disambiguation (relative to the available sense set);
  - if it is an internal node: partial disambiguation (a subset of senses).
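The walk down the discrimination tree can be sketched like this. It is a toy 2-D illustration: the node vectors are invented, and the stopping rule (descend only while a child is strictly nearer than the current node) is my reading of "stop at nearest node".

```python
import math

def angular_distance(x, y):
    """Angle between conceptual vectors x and y, in [0, pi]."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / (nx * ny))))

class SenseNode:
    """Binary discrimination tree: a leaf holds one sense, an internal node a subset."""
    def __init__(self, cv, senses, left=None, right=None):
        self.cv, self.senses, self.left, self.right = cv, senses, left, right

def place(vector, node):
    """Walk down from the root, moving to a child only while it is strictly
    nearer to the recooked vector than the current node."""
    while True:
        children = [c for c in (node.left, node.right) if c is not None]
        if not children:
            return node  # leaf: full disambiguation
        nearest = min(children, key=lambda c: angular_distance(vector, c.cv))
        if angular_distance(vector, nearest.cv) >= angular_distance(vector, node.cv):
            return node  # internal stop: partial disambiguation (subset of senses)
        node = nearest

# Toy tree for the two senses of "open" from the earlier slide:
leaf_occur = SenseNode([1.0, 0.0], ["open(icl>occur)"])
leaf_do = SenseNode([0.0, 1.0], ["open(icl>do)"])
root = SenseNode([0.7071, 0.7071], ["open(icl>occur)", "open(icl>do)"],
                 leaf_occur, leaf_do)
print(place([0.9, 0.1], root).senses)  # ['open(icl>occur)']
```

A recooked vector close to one sense reaches a leaf; a vector that stays thematically between the two senses stops at the internal node, returning the subset rather than forcing a choice.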

Example with some ambiguities: "The white ants strike rapidly the trusses of the roof"

Initialize: attach CVs to lexemes: "The white ants strike rapidly the trusses of the roof"

Up / down propagation of the CVs

Result: sense selection: "The white ants strike rapidly the trusses of the roof"

Disambiguation in UNL-French deconversion
- Our set-up
- Example input UNL graph
- Outline of the process
- Two usages of DCV (disambiguation with CV):
  - finding the known UW nearest to an unknown UW
  - finding the best French lemma for a given UW

A UNL input graph: "Ronaldo has headed the ball into the left corner of the goal"

Corresponding UNL tree with CVs attached: localization DCV
- Ronaldo (agt): V(human); Ronaldo (pos): V(human)
- head(pof>body) (ins): V_body(head)
- corner (plt): V_place(corner)
- left (mod): V(left)
- goal(icl>thing) (obj): V_thing(goal)
- score: V = V_event(score) + V_human(score) + V_sport(score)

Result of the first step: the "best" UWs
The vector contextualization generalizes both kinds of localization (lexical and cultural). On each node, the selected UW is the one in the UNL-French database whose vector is closest to the contextualized vector.
Formulas used for up and down propagation: (not reproduced in this transcript)

Second step: select the "best" LUs
Depending on the strategy of the generator, a lexical unit (LU) may be:
- a lemma, or
- a whole derivational family (pay, payment, payable…).
Dictionary: maps each UW to a set of candidate LUs with their CVs.
Input: a UW with its contextualized CV. Output: the LU_i with the nearest CV_i.
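This second step is again a nearest-CV lookup, this time over the candidate LUs of a UW. A toy 2-D sketch for meeting(icl>event); the lemma vectors are invented for illustration:

```python
import math

def angular_distance(x, y):
    """Angle between conceptual vectors x and y, in [0, pi]."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return math.acos(max(-1.0, min(1.0, dot / (nx * ny))))

def select_lu(uw_context_cv, candidate_lus):
    """candidate_lus maps each French LU to its CV; return the LU whose CV
    is nearest (in angular distance) to the UW's contextualized CV."""
    return min(candidate_lus,
               key=lambda lu: angular_distance(uw_context_cv, candidate_lus[lu]))

# Toy illustration: the two candidate lemmas for meeting(icl>event).
lus = {"réunion": [0.9, 0.2], "rencontre": [0.2, 0.9]}
print(select_lu([0.8, 0.3], lus))  # réunion
```

The same distance serves both DCV usages (nearest known UW, then best LU), so one contextualized vector per node drives the whole lexical selection.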

Conclusion
Another case of fruitful integration of symbolic and numerical methods.
Further work planned:
- integration into the running UNL-FR server
- work on feedback (Prof. Su's line of thought):
  - if the user corrects the choice of LU for the chosen UW,
  - or worse, if the user chooses an LU corresponding to another UW,
  ==> then recompute the vectors, giving more weight to the chosen CVs.