Word Sense Disambiguation. By Mahmood Soltani, Tehran University, 2009/12/24


1 Word Sense Disambiguation. By Mahmood Soltani, Tehran University, 2009/12/24

2 Agenda
Introduction
Knowledge sources
  Lexical
  Learned world
WSD approaches
  Knowledge-based
  Corpus-based
    Unsupervised approaches
    Supervised approaches
Some approaches in detail
WSD 12/24/09 Tehran University

3 Introduction
Defining word sense: the phenomenon of lexical ambiguity is traditionally subdivided into polysemy and homonymy.
Polysemy: one word has several related meanings.
Homonymy: two words have the same lexical form but different etymologies and unrelated meanings.
In dictionaries, each word is listed with a number of discrete senses and subsenses, possibly different from dictionary to dictionary.
The first step in the task of WSD is to determine the different senses of all words in the text to be disambiguated: the sense inventory.

4 Introduction (cont.)
What is word sense disambiguation?
WSD refers to a task that automatically assigns a sense, selected from a set of pre-defined word senses, to an instance of a polysemous word in a particular context.
Identify the correct sense of an ambiguous word in a sentence.
Determine which of the senses of an ambiguous word is invoked in a particular use of the word.
…

5 Introduction (cont.)
Word sense disambiguation steps:
Step 1: pre-define the senses, as one of:
  A list of senses such as those found in everyday dictionaries
  A group of features, categories, or associated words
  An entry in a transfer dictionary which includes translations in another language
Step 2: assign words to senses, using:
  The context of the word to be disambiguated
  External knowledge

6 Introduction (cont.)
Where is WSD used?
Sense disambiguation is an “intermediate task”, which is not an end in itself but is necessary at one level or another to accomplish most natural language processing tasks:
  Machine translation
  Information retrieval
  Speech processing
  Text processing (spelling detection)

7 Introduction (cont.)
Conceptual model of WSD
WSD is the matching of sense knowledge and word context. Sense knowledge can be either lexical knowledge defined in dictionaries or world knowledge learned from training corpora.

8 Knowledge Sources
Lexical knowledge
  Lexical knowledge is usually released with a dictionary. It can be either symbolic or empirical. It is the foundation of unsupervised WSD approaches.
Learned world knowledge
  World knowledge is too complex or trivial to be verbalized completely, so it is a smart strategy to acquire it automatically, on demand, from the context of training corpora using machine learning techniques.
Trend
  Use the interaction of multiple knowledge sources to approach WSD.

9 Lexical Knowledge
Sense frequency
  Usage frequency of each sense of a word.
Sense gloss
  Sense definition and examples. By counting common words between the gloss and the context of the target word, we can nai…
Concept tree
  Represents the related concepts of the target in the form of a semantic network, as is done by WordNet.
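The gloss-counting idea on this slide can be sketched as follows. This is a minimal illustration, not the presenter's implementation: the sense inventory, the glosses, the frequency counts and the stopword list are all invented for the example, and sense frequency is used only as a tie-breaker when no gloss overlaps the context.

```python
# Toy sense inventory (hypothetical): glosses and usage counts are invented.
SENSES = {
    "bank": [
        {"id": "bank/finance",
         "gloss": "an institution that accepts deposits and lends money",
         "freq": 20},
        {"id": "bank/river",
         "gloss": "sloping land beside a river or lake",
         "freq": 5},
    ]
}
STOPWORDS = {"a", "an", "and", "the", "that", "or", "of", "to", "in", "on", "beside"}


def content_words(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def gloss_overlap_sense(word, context):
    """Pick the sense whose gloss shares the most content words with the context.

    Ties (including zero overlap everywhere) fall back to the more frequent sense.
    """
    ctx = content_words(context)

    def score(sense):
        return (len(ctx & content_words(sense["gloss"])), sense["freq"])

    return max(SENSES[word], key=score)["id"]


river_example = gloss_overlap_sense("bank", "he sat on the bank of the river")
fallback_example = gloss_overlap_sense("bank", "she went to the bank")
```

In the first call the word "river" overlaps the second gloss; in the second call nothing overlaps, so the higher toy frequency decides.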

10 Lexical Knowledge (cont.)
Selectional restrictions
  Syntactic and semantic restric…
Subject code
  Refers to the category to which one sense of the target word belongs.
Part of speech (POS)
  POS is associated with a subset of the word senses in both WordNet and LDOCE. That is, given the POS of the target, we may fully or partially disambiguate its sense (Stevenson & Wilks, 2001).
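A small sketch of how POS can narrow the candidate senses. The inventory below is hypothetical and far smaller than WordNet or LDOCE; it only shows that one POS may leave a single sense (full disambiguation) while another leaves several (partial disambiguation).

```python
# Hypothetical sense inventory indexed by word and part of speech.
SENSES_BY_POS = {
    "book": {
        "noun": ["book/volume", "book/record"],
        "verb": ["book/reserve"],
    }
}


def senses_for(word, pos):
    """Return the senses of `word` compatible with the given POS tag."""
    return SENSES_BY_POS.get(word, {}).get(pos, [])


verb_senses = senses_for("book", "verb")   # one sense left: fully disambiguated
noun_senses = senses_for("book", "noun")   # two senses left: partially disambiguated
```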

11 Learned World Knowledge
Domain-specific knowledge
  Like selectional restrictions, it is a semantic restriction placed on the use of each sense of the target word, but the restriction is more specific.
Parallel corpora
  Also called bilingual corpora: one serves as the primary language and the other as a secondary language. Using third-party software packages, we can align the major words (verbs and nouns) between the two languages. Because the translation process implies that aligned word pairs share the same sense or concept, we can use this information to sense-tag the major words in the primary language (Bhattacharya et al. 2004).
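The parallel-corpus idea can be illustrated with a deliberately simplified sketch. It assumes word alignment has already been done by some external tool, and it uses a hand-written, hypothetical mapping from French translations to English sense labels; it is not the probabilistic model of Bhattacharya et al.

```python
# Hypothetical mapping from aligned French words to English sense labels.
TRANSLATION_TO_SENSE = {"banque": "bank/finance", "rive": "bank/river"}


def tag_from_alignment(aligned_pairs):
    """Tag each primary-language word with the sense implied by its translation.

    aligned_pairs: (source_word, aligned_target_word) tuples from a word aligner.
    Words whose translation is not in the mapping are tagged None.
    """
    return [(src, TRANSLATION_TO_SENSE.get(tgt)) for src, tgt in aligned_pairs]


tagged = tag_from_alignment([("bank", "banque"), ("bank", "rive"), ("bank", "talus")])
```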

12 Early WSD Work in MT
The first attempts at automated sense disambiguation were made in the context of machine translation.
Weaver (1949) outlines the basis of an approach to WSD.
Reifler’s (1955) “semantic coincidences” between a word and its context: the role of syntactic relations was also recognized.
Weaver’s (1949) Memorandum discusses the role of the domain in sense disambiguation.
Oswald (1952, 1957), Oettinger (1955), etc.: micro-glossaries contain only the meanings of a given word that are relevant for texts in a particular domain.
Several researchers attempted to devise an “interlingua” based on logical and mathematical principles.
…
Without large-scale resources, most of these ideas remained untested and, to a large extent, forgotten until several decades later.

13 WSD Approaches
There are three ways to approach the problem of WSD:
  A knowledge-based approach, which uses an explicit lexicon (machine-readable dictionary (MRD), thesaurus) or ontology (e.g. WordNet).
  Corpus-based disambiguation, where the relevant information about word senses is gathered from training on a large corpus.
  A hybrid approach, combining aspects of both of the aforementioned methodologies.

14 Knowledge-Based Approach
WSD systems building on the information contained in MRDs use the available material in various ways:
  Lesk (1986) was the first to use dictionary definitions for WSD: the method counts overlapping content words in the sense definitions and in the definitions of context words occurring nearby.
  Yarowsky (1992): the sense of a word is defined as its category in Roget's International Thesaurus.
  WordNet includes various potential sources of information.
  Leacock et al. (1998) employ WordNet to counter data sparseness.
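The definition-overlap idea attributed to Lesk above can be sketched roughly as below. The glosses are paraphrased toy entries, the stopword list is ad hoc, and the scoring (summing overlaps against every gloss of every context word) is one simple reading of the method, so treat it as an assumption-laden illustration rather than the original algorithm.

```python
# Toy machine-readable dictionary (hypothetical, paraphrased glosses).
GLOSSES = {
    "pine": {
        "pine/tree": "kinds of evergreen tree with needle-shaped leaves",
        "pine/waste": "waste away through sorrow or illness",
    },
    "cone": {
        "cone/solid": "solid body which narrows to a point",
        "cone/shape": "something of this shape whether solid or hollow",
        "cone/fruit": "fruit of certain evergreen trees",
    },
}
STOPWORDS = {"a", "of", "with", "through", "or", "to", "which", "this",
             "whether", "away", "certain", "something", "kinds"}


def gloss_words(gloss):
    """Content words of a gloss (no stemming, exact string match only)."""
    return {w for w in gloss.split() if w not in STOPWORDS}


def lesk(target, context_words):
    """Choose the sense of `target` whose gloss overlaps most with the
    glosses of the nearby context words."""
    best_sense, best_score = None, -1
    for sense, gloss in GLOSSES[target].items():
        words = gloss_words(gloss)
        score = sum(
            len(words & gloss_words(other))
            for cw in context_words
            for other in GLOSSES.get(cw, {}).values()
        )
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


pine_sense = lesk("pine", ["cone"])
cone_sense = lesk("cone", ["pine"])
```

Here "evergreen" is the only shared content word, which is enough to select the tree sense of "pine" and the fruit sense of "cone". Without stemming, "tree" and "trees" do not match.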

15 Corpus-Based Approach
A corpus-based approach extracts information on word senses from a large annotated data collection.
Distributional information about an ambiguous word refers to:
  The frequency distribution of its senses
  Collocational or co-occurrence information
  Part of speech
  …

16 Corpus-Based Approach (cont.)
There are two possible approaches to corpus-based WSD systems:
  Supervised approaches use annotated training data and basically amount to a classification task.
  Unsupervised algorithms are applied to raw text material; annotated data is only needed for evaluation. They correspond to a clustering task rather than a classification task.
Bootstrapping looks like a supervised approach, but it needs only a few seeds instead of a large number of training examples.
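To make the bootstrapping idea concrete, here is a toy self-training loop. It is a sketch under strong assumptions: contexts are already tokenized, function words have been removed, the seed words are hand-picked, and a word is promoted to a new indicator as soon as it has been seen with only one sense. Real bootstrapping systems use confidence thresholds that this example omits.

```python
def bootstrap(contexts, seeds, rounds=3):
    """Label contexts starting from a few seed words.

    contexts: list of token lists, each containing the ambiguous word.
    seeds: dict mapping an indicator word to a sense label.
    Returns a dict from context index to the assigned sense.
    """
    indicators = dict(seeds)
    labels = {}
    for _ in range(rounds):
        # Label every context whose indicator words all agree on one sense.
        for i, tokens in enumerate(contexts):
            votes = {indicators[t] for t in tokens if t in indicators}
            if len(votes) == 1:
                labels[i] = votes.pop()
        # Promote words seen with exactly one sense to new indicators.
        seen = {}
        for i, sense in labels.items():
            for t in contexts[i]:
                seen.setdefault(t, set()).add(sense)
        for t, senses in seen.items():
            if len(senses) == 1 and t not in indicators:
                indicators[t] = next(iter(senses))
    return labels


contexts = [
    ["plant", "life", "species"],
    ["plant", "manufacturing", "workers"],
    ["plant", "species", "garden"],
    ["plant", "workers", "strike"],
]
labels = bootstrap(contexts, {"life": "living", "manufacturing": "factory"})
```

Only the first two contexts contain a seed. After the first round, "species" and "workers" become indicators, which lets the second round label the remaining two contexts.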

17 Corpus-Based Approach (cont.)
Supervised
  Exemplar-based: the k-nearest neighbor technique has been employed most.
  Rule-based: use algorithms, e.g. decision lists, which search for discriminatory features in the training data.
  Probabilistic: use of different probabilistic classifiers. Despite its relative simplicity, naive Bayes has been frequently applied.
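A decision list, as mentioned under the rule-based heading, can be sketched like this. The example assumes exactly two senses, uses single context words as features, and ranks them by a smoothed log-likelihood ratio; the smoothing constant and the training sentences are invented for illustration.

```python
import math
from collections import defaultdict


def train_decision_list(examples, smoothing=0.1):
    """Build a ranked list of (strength, feature, sense) rules.

    examples: list of (tokens, sense) pairs with exactly two distinct senses.
    """
    sense_a, sense_b = sorted({sense for _, sense in examples})
    counts = defaultdict(lambda: defaultdict(float))
    for tokens, sense in examples:
        for t in set(tokens):
            counts[t][sense] += 1
    rules = []
    for t, c in counts.items():
        ratio = math.log((c[sense_a] + smoothing) / (c[sense_b] + smoothing))
        if ratio != 0:  # skip features that do not discriminate at all
            rules.append((abs(ratio), t, sense_a if ratio > 0 else sense_b))
    # Strongest evidence first; feature name breaks ties deterministically.
    rules.sort(key=lambda r: (-r[0], r[1]))
    return rules


def classify(rules, tokens, default):
    """Apply the first (strongest) rule whose feature occurs in the context."""
    present = set(tokens)
    for _, feature, sense in rules:
        if feature in present:
            return sense
    return default


train = [
    (["bass", "fishing", "lake"], "fish"),
    (["bass", "caught", "lake"], "fish"),
    (["bass", "guitar", "player"], "music"),
    (["bass", "guitar", "band"], "music"),
]
rules = train_decision_list(train)
guitar_case = classify(rules, ["bass", "guitar", "solo"], default="fish")
lake_case = classify(rules, ["the", "bass", "lake"], default="music")
```

The word "bass" itself occurs equally often with both senses, so it yields no rule; the decision rests on "guitar" or "lake".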

18 Corpus-Based Approach (cont.)
Unsupervised
  Cluster the contexts of an ambiguous word into a number of groups and discriminate between them without labeling them.
  A clear disadvantage is that, so far, the performance of unsupervised systems is considerably lower than that of supervised systems.
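Clustering contexts without sense labels can be illustrated with a very simple greedy scheme. The slide does not name a particular clustering algorithm, so the choice below (single-pass "leader" clustering with Jaccard similarity and an arbitrary threshold) is purely an assumption for the example.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_contexts(contexts, target, threshold=0.2):
    """Group contexts of `target` by shared words, without sense labels.

    Each context joins the most similar existing cluster if the similarity
    reaches `threshold`; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of context indices.
    """
    clusters = []  # each cluster: {"words": set of words, "members": [indices]}
    for i, tokens in enumerate(contexts):
        bag = set(tokens) - {target}  # the target word is shared by all contexts
        best = max(clusters, key=lambda c: jaccard(bag, c["words"]), default=None)
        if best is not None and jaccard(bag, best["words"]) >= threshold:
            best["members"].append(i)
            best["words"] |= bag
        else:
            clusters.append({"words": set(bag), "members": [i]})
    return [c["members"] for c in clusters]


contexts = [
    ["bank", "money", "loan"],
    ["bank", "river", "water"],
    ["bank", "loan", "interest"],
    ["bank", "water", "fishing"],
]
groups = cluster_contexts(contexts, "bank")
```

The output groups are unlabeled: the method separates the two usages of "bank" but does not say which group is the financial sense.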

19 Naive Bayes Classifier
[Figure: training and test phases]
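Since the slide's figure did not survive, here is a minimal sketch of what the training and test phases of a naive Bayes sense classifier typically look like. It assumes bag-of-words context features and add-one smoothing; the training sentences are invented, and nothing here is taken from the original figure.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Training phase: count senses and the context words seen with each sense."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(tokens)
        vocab.update(tokens)
    return sense_counts, word_counts, vocab


def classify_nb(model, tokens):
    """Test phase: return the sense maximizing log P(s) + sum of log P(w | s)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())

    def log_posterior(sense):
        denom = sum(word_counts[sense].values()) + len(vocab)  # add-one smoothing
        score = math.log(sense_counts[sense] / total)
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][t] + 1) / denom)
        return score

    return max(sorted(sense_counts), key=log_posterior)


train = [
    (["bass", "fishing", "lake"], "fish"),
    (["bass", "caught", "lake"], "fish"),
    (["bass", "guitar", "player"], "music"),
    (["bass", "guitar", "band"], "music"),
]
model = train_nb(train)
music_case = classify_nb(model, ["bass", "guitar"])
fish_case = classify_nb(model, ["lake", "bass"])
```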

20 K-NN
[Figure: training and test phases]
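As the figure for this slide is missing, the following sketch shows one common way to apply k-nearest neighbors to WSD. It assumes contexts are compared as word sets with Jaccard similarity and that the sense is chosen by majority vote among the k most similar training examples; both choices and the data are illustrative assumptions.

```python
from collections import Counter


def knn_sense(train, tokens, k=3):
    """Return the majority sense among the k training contexts most similar
    to `tokens` (Jaccard similarity over word sets)."""
    query = set(tokens)

    def similarity(example):
        bag = set(example[0])
        return len(query & bag) / (len(query | bag) or 1)

    # "Training" is just storing the examples; the work happens at test time.
    neighbours = sorted(train, key=similarity, reverse=True)[:k]
    return Counter(sense for _, sense in neighbours).most_common(1)[0][0]


train = [
    (["bass", "fishing", "lake"], "fish"),
    (["bass", "caught", "lake"], "fish"),
    (["bass", "guitar", "player"], "music"),
    (["bass", "guitar", "band"], "music"),
]
music_case = knn_sense(train, ["bass", "guitar", "amp"])
fish_case = knn_sense(train, ["bass", "lake", "boat"])
```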

21 Any questions?

