An Unsupervised WSD Algorithm for a NLP System
Iulia Nica, Andrés Montoyo, Sonia Vázquez and Mª Antonia Martí

Index
- Introduction
- Architecture for the NLP System
- WSD Method
- Evaluation
- Conclusions
- Future Work

Introduction
Current information systems need Natural Language Processing (NLP) techniques. A central problem of natural language is ambiguity, which can be phonological, morphological, syntactic, semantic or pragmatic. Resolving lexical ambiguity is necessary for NLP applications such as Machine Translation, Information Retrieval and Information Extraction.

Introduction
Word Sense Disambiguation (WSD) is an intermediate task that attempts to resolve lexical ambiguity by assigning each word occurrence its appropriate sense.
WSD uses two information sources:
- the context;
- external knowledge sources.
WSD approaches:
- knowledge-driven;
- data-driven.

Introduction
Characteristics of the WSD method:
- knowledge-driven;
- unsupervised;
- information sources: EuroWordNet (EWN) and a large untagged corpus;
- sense assignment uses paradigmatic information;
- easily adaptable to other languages.

Architecture for the NLP System
INPUT: untagged text
1. POS analyser (MACO): extracts all possible POS tags.
2. POS tagger (RELAX): selects only one morphosyntactic category.
3. Shallow parser (TACAT): identifies the sentence's constituents.
4. WSD module: uses the corpus and the sense discriminators, i.e. the set of nouns derived from the lexical-semantic relations of EWN.
OUTPUT: text annotated with POS tags, chunks and noun senses.

WSD method
The method operates on paradigmatic information: it extracts paradigmatic information for an ambiguous occurrence and maps it onto the paradigmatic information in the lexicon. It rests on the premise that semantically similar words can substitute for each other in the same context and, conversely, that words which can be interchanged in a context are likely to be semantically close.

WSD method
The method searches a POS-tagged corpus for syntactic patterns (the EFE News Agency corpus, over 70M words). Patterns are identified by a structural criterion, using a list of basic patterns and search schemes. Each syntactic pattern is identified at both the lemma and the POS level.

WSD method
Syntactic patterns: X-R-Y
- X and Y are lexical content units (nouns, adjectives, verbs and adverbs).
- R is a relational element (function words: prepositions, conjunctions, ...).
- The pattern expresses a syntactic relation between X and Y.
Examples:
- grano (noun) de (preposition) azúcar (noun), 'grain of sugar'
- pasaje (noun) subterráneo (adjective), 'underground passage'

WSD method
Basic patterns:
- N , N
- N C N
- N P N
- N A
- N V
- A N
- V N
Conjunctions = {y, e, o, u}
Legend: N = noun; R = adverb; A = adjective; V = verb (participle); C* = conjunction; D = determiner; P = preposition.

WSD method
Each basic pattern can be realised discontinuously in texts, so we pre-establish morphosyntactic schemes for searching the patterns, e.g.:
- N (((R) R) A/V), ((D) D) (((R) R) A/V) N
- N (((R) R) A/V) C* ((D) D) (((R) R) A/V) N
- N (((R) R) A/V) P ((D) D) (((R) R) A/V) N
- N ((R) R) A (C* ((R) R) A/V)
- N ((R) R) V (C* ((R) R) A/V)
- (A/V C* ((D) D) (((R) R)) A N
- (A/V C* ((D) D) (((R) R)) V N
Units in brackets are optional; units separated by a bar (/) are alternatives for the same position.
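
The search schemes behave like regular expressions over POS tags: brackets mark optional units and the bar marks alternatives. The Python sketch below compiles a scheme into a regular expression over a space-separated tag string. It is an illustration under our own assumptions (one tag symbol per token, tags written as in the legend, e.g. `C*` or `,`), not the system's actual implementation; `scheme_to_regex` and `matches` are hypothetical names, and only schemes with balanced brackets are handled.

```python
import re


def scheme_to_regex(scheme: str) -> "re.Pattern[str]":
    """Compile a search scheme into a regex over space-terminated POS tags.

    '(' ... ')' marks an optional unit; 'A/V' means A or V at one position.
    """
    parts = []
    for tok in re.findall(r"\(|\)|[^\s()]+", scheme):
        if tok == "(":
            parts.append("(?:")  # open an optional group
        elif tok == ")":
            parts.append(")?")  # close the group and make it optional
        else:
            alternatives = "|".join(re.escape(t) for t in tok.split("/"))
            parts.append(f"(?:{alternatives}) ")  # one tag plus its separator
    return re.compile("".join(parts))


def matches(scheme: str, tags: list) -> bool:
    """True if the whole tag sequence is one realisation of the scheme."""
    return scheme_to_regex(scheme).fullmatch(" ".join(tags) + " ") is not None


N_P_N = "N (((R) R) A/V) P ((D) D) (((R) R) A/V) N"
print(matches(N_P_N, ["N", "P", "N"]))                 # True (grano de azúcar)
print(matches(N_P_N, ["N", "R", "A", "P", "D", "N"]))  # True
print(matches(N_P_N, ["N", "P", "D", "D", "D", "N"]))  # False: at most two D
```

Encoding the optional brackets as `(?:...)?` keeps the nesting of the original notation, so the bounded repetitions (at most two adverbs, at most two determiners) fall out of the scheme itself rather than being coded separately.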

WSD method
For each search scheme we define decomposition rules that extract the basic patterns. Each unit of the sequence is also considered at the lemma level. Example:
- sequence: Coronas danesas y suecas ('Danish and Swedish crowns'), N A C* A
- decomposition rule: N A C* A → N A + N A
- extracted basic patterns: corona danesa, corona sueca
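
A decomposition rule can be represented as a mapping from the tag sequence found by a search scheme to the token positions that form each basic pattern. The Python sketch below is hypothetical: `DECOMPOSITION_RULES` holds only two rules that appear in these slides (N A C* A, and the N A C N sequence of the órgano example, with the conjunction written `C*` in both), tokens are assumed to arrive as (form, lemma, tag) triples, and the output is given at the lemma level.

```python
# Hypothetical rule table: tag sequence found by a search scheme ->
# groups of token positions, one group per extracted basic pattern.
DECOMPOSITION_RULES = {
    ("N", "A", "C*", "A"): [(0, 1), (0, 3)],     # N A C* A -> N A + N A
    ("N", "A", "C*", "N"): [(0, 1), (0, 2, 3)],  # N A C* N -> N A + N C* N
}


def decompose(tokens):
    """Split a matched sequence of (form, lemma, tag) tokens into basic
    patterns, returned as lemma strings. Unknown sequences yield []."""
    tags = tuple(tag for _, _, tag in tokens)
    groups = DECOMPOSITION_RULES.get(tags, [])
    return [" ".join(tokens[i][1] for i in group) for group in groups]


sequence = [("órganos", "órgano", "N"), ("dañados", "dañado", "A"),
            ("o", "o", "C*"), ("partes", "parte", "N")]
print(decompose(sequence))  # ['órgano dañado', 'órgano o parte']
```

Storing rules as index groups keeps them independent of the actual words, so the same rule serves every sequence with that tag shape.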

WSD method
Information is extracted from two sources:
- the corpus (paradigmatic information);
- the sentence (syntagmatic information).
Paradigmatic information is extracted by exploiting the syntactic patterns. Example: obra, concierto and pieza can all fill the same position before 'para órgano' (paradigmatic relations), while the units of a sequence such as 'pieza para órgano' are linked by syntagmatic relations.
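
Extracting paradigmatic information amounts to holding the rest of a basic pattern fixed and collecting, from the corpus, the words that occupy the ambiguous position. Here is a minimal Python sketch, assuming the corpus has already been reduced to basic patterns stored as lemma tuples. The function name and the toy corpus are invented for illustration: the nouns come from the órgano example, but the co-occurrences below are not real corpus data.

```python
def paradigmatic_substitutes(pattern, position, corpus_patterns):
    """Words that fill `position` in corpus patterns identical to `pattern`
    everywhere else (the "__ - R - Y" search)."""
    found = set()
    for candidate in corpus_patterns:
        if len(candidate) != len(pattern):
            continue
        same_context = all(
            c == p
            for i, (c, p) in enumerate(zip(candidate, pattern))
            if i != position
        )
        # Keep only genuine substitutes, not the ambiguous word itself.
        if same_context and candidate[position] != pattern[position]:
            found.add(candidate[position])
    return found


# Toy corpus of lemma-level basic patterns (illustrative only).
corpus = [("árbol", "dañado"), ("cabeza", "dañado"), ("órgano", "dañado"),
          ("árbol", "alto"), ("pieza", "para", "órgano")]
print(paradigmatic_substitutes(("órgano", "dañado"), 0, corpus))
```

A real system would index the corpus by context rather than scanning it linearly; the linear scan is kept here only for clarity.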

WSD method
Sense discriminators obtained from EWN:
- selection of all nouns related to each sense through the different lexical-semantic relations;
- elimination of the elements shared by different senses;
- result: disjoint sets of nouns, one per sense of the word.
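
The elimination step is a set difference against the union of the other senses' sets. The Python sketch below assumes the EWN lookup has already produced one noun set per sense; the slides do not say how that lookup is done, so it is left out. The toy input, including the noun shared by two senses, is invented.

```python
def sense_discriminators(related):
    """Turn per-sense related-noun sets into disjoint discriminator sets.

    related: sense id -> set of nouns reached through lexical-semantic relations.
    A noun related to more than one sense is dropped from all of them.
    """
    discriminators = {}
    for sense, nouns in related.items():
        others = set().union(*(n for s, n in related.items() if s != sense))
        discriminators[sense] = nouns - others
    return discriminators


# Toy input: "trozo" is (hypothetically) related to both senses 3 and 4.
related = {1: {"flor", "semilla"},
           3: {"riñón", "ojo", "trozo"},
           4: {"teclado", "pedal", "trozo"}}
print(sense_discriminators(related))
```

Making the sets disjoint is what lets a single match in the commutative test point to exactly one sense.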

WSD method
Commutative test:
- Hypothesis: if two words can be interchanged in a given context, they are likely to be semantically close.
- Application: if the ambiguous word can be replaced by a sense discriminator inside a syntactic pattern, it is assigned the sense corresponding to that discriminator.
The algorithm operates on words from a sense-untagged corpus.

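WSD method
Commutative Test Algorithm:
1. Take the pattern X – R – Y that contains the ambiguous word X.
2. Search the corpus for the pattern __ – R – Y and collect the words X_k that occupy the position of X.
3. Check each X_k against the discriminators d_ij of the sense discriminator sets SD_1, ..., SD_n.
4. YES: if some X_k coincides with a discriminator d_i0j of SD_i0, X is assigned sense i0 (X_i0 – R – Y).
   NO: otherwise the sense of X is left undetermined (X_? – R – Y).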
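
Because the discriminator sets are disjoint, the test reduces to intersecting the set of collected words with each SD_i. The Python sketch below is our own reading of the diagram, not the authors' code. The function name is hypothetical, and the sketch returns every matching sense instead of choosing one, because the slides do not say what happens when several senses match. The toy discriminator sets are invented.

```python
def commutative_test(substitutes, discriminators):
    """Senses whose discriminator set shares at least one word with `substitutes`.

    substitutes: words found in the ambiguous position (corpus fillers of
        __ - R - Y for H1, or nouns from the sentence for H2).
    discriminators: sense id -> disjoint set of discriminator nouns (SD_i).
    An empty result corresponds to the NO branch (sense left undetermined).
    """
    words = set(substitutes)
    return {sense for sense, sd in discriminators.items() if words & sd}


# Toy discriminator sets (invented for illustration).
sd = {1: {"flor", "semilla"}, 3: {"cabeza", "riñón"}, 4: {"teclado", "pedal"}}
print(commutative_test({"árbol", "cabeza", "país"}, sd))  # {3}
print(commutative_test({"país"}, sd))                     # set()
```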

WSD method
The WSD module uses two heuristics:
- H1: the Commutative Test Algorithm applied to the paradigmatic information (the nouns obtained by substituting the ambiguous occurrence in the pattern).
- H2: the Commutative Test Algorithm applied to the syntagmatic information (the nouns obtained from the sentence).
The two heuristics act as voters in the sense assignment.
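
The slides say that H1 and H2 act as voters but not how votes are counted or how ties are broken. The Python sketch below assumes simple majority voting: each heuristic votes for every sense its commutative test supports, and a tie at the top means no answer. Both the function name and this policy are assumptions.

```python
from collections import Counter


def vote(*heuristic_senses):
    """Combine the senses proposed by each heuristic by simple majority.

    Each argument is the set of senses one heuristic supports (possibly empty).
    Returns the single sense with the most votes, or None if no heuristic
    answered or the top senses are tied (an assumed tie policy).
    """
    votes = Counter()
    for senses in heuristic_senses:
        votes.update(senses)  # one vote per supported sense
    if not votes:
        return None
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


print(vote({3}, {3}))     # 3
print(vote({3}, set()))   # 3: one heuristic alone can decide
print(vote({1}, {3}))     # None: tie
```

Letting a single heuristic decide when the other is silent gives the combination at least the coverage of either heuristic alone.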

WSD method
Example (ambiguous word: órgano):
Los enormes y continuados progresos científicos y técnicos de la Medicina actual han logrado hacer descender espectacularmente la mortalidad infantil, erradicar multitud de enfermedades hasta hace poco mortales, sustituir mediante trasplante o implantación órganos dañados o partes del cuerpo inutilizadas y alargar las expectativas de vida.
('The enormous and continued scientific and technical progress of present-day medicine has succeeded in dramatically lowering infant mortality, eradicating many diseases that until recently were fatal, replacing damaged organs or disabled body parts through transplantation or implantation, and extending life expectancy.')
1. POS tagging of the input text.
2. Identification of syntactic patterns, using the search schemes and the decomposition rules:
   - sequence found by a search scheme: órganos dañados o partes (N A C N)
   - decomposition: N A + N C N
   - final result: órgano dañado; órgano o parte
3. Extraction of information:
   - from the corpus: mediador, terreno, chófer, árbol, cabeza, planeta, parte, incremento, totalidad, guerrilla, programa, mitad, país, temporada, artículo, tercio
   - from the sentence: progreso, científico, mortalidad, multitud, enfermedad, mortal, trasplante, implantación, órgano, parte, cuerpo, expectativa, vida
4. Extraction of the sense discriminator sets:
   - Sense 1: órgano vegetal, espora, flor, pera, manzana, bellota, hinojo, semilla, poro, píleo, carpóforo, ...
   - Sense 2: agencia, unidad administrativa, banco central, servicio secreto, seguridad social, FBI, ...
   - Sense 3: parte del cuerpo, trozo, músculo, riñón, oreja, ojo, glándula, lóbulo, tórax, dedo, articulación, rasgo, facción, ...
   - Sense 4: instrumento de viento, instrumento musical, mecanismo, aparato, teclado, pedal, corneta, ...
   - Sense 5: periódico, publicación, medio de comunicación, método, serie, serial, número, ejemplar, ...
5. Commutative test, with S1 = nouns from the corpus (Heuristic 1) and S2 = nouns from the sentence (Heuristic 2):
   - Heuristic 1: S1 ∩ SD1 = ∅, S1 ∩ SD2 = ∅, S1 ∩ SD3 ≠ ∅, S1 ∩ SD4 = ∅, S1 ∩ SD5 = ∅
   - Heuristic 2: S2 ∩ SD1 = ∅, S2 ∩ SD2 = ∅, S2 ∩ SD3 ≠ ∅, S2 ∩ SD4 = ∅, S2 ∩ SD5 = ∅
6. Final sense assignment: órgano#3, 'a fully differentiated structural and functional unit in an animal that is specialized for some particular function'.

Evaluation
The WSD method was tested on the Spanish Lexical Sample task of Senseval-2. For the evaluation we selected all 17 nouns of this task and applied the two heuristics, H1 and H2, both separately and combined.

Evaluation
Results obtained:
Heuristic   Precision   Recall   Coverage
H1          0.54        0.11     0.21
H2          0.59        0.04     0.07
H1 + H2     0.56        0.15     0.27
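
The three scores are linked by the standard Senseval definitions: precision = correct / attempted, recall = correct / total, and coverage = attempted / total. It follows that recall = precision × coverage. The Python sketch below checks the table rows against that identity, allowing for the two-decimal rounding. The function name `scores` is our own.

```python
def scores(correct, attempted, total):
    """Senseval-style scores: precision over attempted items, recall over all."""
    precision = correct / attempted if attempted else 0.0
    recall = correct / total
    coverage = attempted / total
    return precision, recall, coverage


# Rows of the results table: (precision, recall, coverage).
RESULTS = {
    "H1": (0.54, 0.11, 0.21),
    "H2": (0.59, 0.04, 0.07),
    "H1 + H2": (0.56, 0.15, 0.27),
}

for name, (p, r, c) in RESULTS.items():
    # Recall should equal precision x coverage, up to rounding.
    print(f"{name}: P*C = {p * c:.3f}, reported R = {r:.2f}")
```

All three rows agree to within 0.005, which confirms that the low recall values reflect low coverage rather than low precision.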

Evaluation
In Senseval-2, the scores reported for the individual words fell within the following ranges:
- Precision: 51.4% – 71.2%
- Recall: 50.3% – 71.2%
- Coverage: 98% – 100%

Conclusions
- This WSD method can be used as a module in an NLP system that prepares input text for a real application.
- It does not depend on a corpus tagged at the syntactic or semantic level.
- It requires only minimal preprocessing (POS tagging) of the input text and of the search corpus.

Future work
- Study different ways of improving the WSD process.
- Apply new algorithms to the information associated with the ambiguous occurrence.
- Combine the method with other, data-driven WSD methods.

Thank you!
