Finding Predominant Word Senses in Untagged Text. Diana McCarthy, Rob Koeling, Julie Weeds & John Carroll, Department of Informatics, University of Sussex

Presentation transcript:

Finding Predominant Word Senses in Untagged Text. Diana McCarthy, Rob Koeling, Julie Weeds & John Carroll, Department of Informatics, University of Sussex. ACL 2004

Introduction In word sense disambiguation, the heuristic of choosing the most common sense is extremely powerful, even though it does not take the surrounding context into account and assumes the availability of hand-tagged data of reasonable quality. One would expect the frequency distribution of the senses to depend on the domain of the text. We present work on the use of an automatically acquired thesaurus and the WordNet similarity package to find the predominant sense; this does not require any hand-tagged text such as SemCor.

In SENSEVAL-2, even systems which show superior performance to this heuristic often fall back on it where the evidence from the context is not sufficient. There is therefore a strong case for obtaining the predominant sense from untagged corpus data, so that a WSD system can be tuned to the domain at hand.

SemCor comprises a relatively small sample of 250,000 words, so for a word such as tiger (audacious person vs. carnivorous animal) it gives little evidence about sense frequencies. Our work is aimed at discovering the predominant senses from raw text: hand-tagged data is not always available, and our method can produce predominant senses for whatever domain is required. We believe that automatic means of finding a predominant sense can be useful for systems that use it as a back-off, and for lexical acquisition, given the limited size of hand-tagged resources.

Many researchers have developed thesauruses from automatically parsed data. Each target word is entered with an ordered list of "nearest neighbors", ordered by their "distributional similarity" with the target word. Distributional similarity is a measure indicating the degree to which two words co-occur in the same contexts. The quality and similarity of the neighbors pertaining to different senses will reflect the dominance of those senses. For example, star in a corpus provided by Lin has the ordered neighbors: superstar, player, teammate, …, galaxy, sun, world, …

Method We use a thesaurus based on the method of Lin (1998), which provides the k nearest neighbors to each target word along with their distributional similarity scores. We then use the WordNet similarity package to weight the contribution that each neighbor makes to the various senses of the target word. Let N_w = { n_1, n_2, …, n_k } be the top-scoring k neighbors with distributional similarity scores { dss(w, n_1), dss(w, n_2), …, dss(w, n_k) }. Each sense ws_i of w is ranked using: Prevalence(ws_i) = Σ_{n_j ∈ N_w} dss(w, n_j) × wnss(ws_i, n_j) / Σ_{ws_i′ ∈ senses(w)} wnss(ws_i′, n_j), where wnss(ws_i, n_j) is the maximum WordNet similarity score between sense ws_i and any sense of the neighbor n_j.
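The ranking above can be sketched in a few lines of Python. This is a minimal illustration with made-up dss and wnss scores for the star example from the earlier slide; the sense labels and all numbers are hypothetical, not taken from WordNet or the paper:

```python
def prevalence_scores(dss, wnss):
    """Score each sense ws_i of w by summing, over neighbours n_j,
    dss(w, n_j) * wnss(ws_i, n_j) / sum over all senses ws_i' of wnss(ws_i', n_j)."""
    senses = list(wnss)            # wnss maps sense -> {neighbour: WordNet similarity}
    scores = {}
    for s in senses:
        total = 0.0
        for n, d in dss.items():   # dss maps neighbour -> distributional similarity
            norm = sum(wnss[s2].get(n, 0.0) for s2 in senses)
            if norm > 0:
                total += d * wnss[s].get(n, 0.0) / norm
        scores[s] = total
    return scores

# Hypothetical scores for "star" and its neighbours from the slide.
dss = {"superstar": 0.15, "player": 0.12, "galaxy": 0.08, "sun": 0.07}
wnss = {
    "star (celebrity)": {"superstar": 0.9, "player": 0.6, "galaxy": 0.1, "sun": 0.1},
    "star (celestial body)": {"superstar": 0.1, "player": 0.1, "galaxy": 0.8, "sun": 0.9},
}
scores = prevalence_scores(dss, wnss)
predominant = max(scores, key=scores.get)
```

With these invented numbers the celebrity sense wins, because the highest-weighted neighbors (superstar, player) are most similar to it.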

Acquiring the Automatic Thesaurus The thesaurus was acquired using the method described by Lin (1998). For input we use grammatical relation data extracted with an automatic parser. A noun w is described using a set of co-occurrence triples and associated frequencies, where r is a grammatical relation and x is a possible co-occurrence with w in that relation. For every pair of nouns, where each noun has a total frequency of more than 9 in the triple data, we compute their distributional similarity. If T(w) is the set of co-occurrence types (r, x) such that I(w, r, x) is positive, then the distributional similarity of two nouns w and n is: dss(w, n) = Σ_{(r,x) ∈ T(w) ∩ T(n)} ( I(w, r, x) + I(n, r, x) ) / ( Σ_{(r,x) ∈ T(w)} I(w, r, x) + Σ_{(r,x) ∈ T(n)} I(n, r, x) )

Automatic Retrieval and Clustering of Similar Words Dekang Lin, Proceedings of COLING-ACL 98. The meaning of an unknown word can often be inferred from its context. We use a broad-coverage parser to extract dependency triples from a text corpus. A dependency triple consists of two words and the grammatical relationship between them. The triples extracted from "I have a brown dog" include (have subj I), (have obj dog), (dog adj-mod brown) and (dog det a). The description of a word w consists of the frequency counts of all dependency triples that match the pattern (w, *, *); for example, the description of the word cell is the set of frequency counts of all triples with cell in the first position.
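A word's description, i.e. the frequency counts of all triples matching (w, *, *), is easy to sketch. The triples below are a hypothetical toy corpus, with w taken to be the first element of each triple:

```python
from collections import Counter

# Hypothetical toy corpus of dependency triples (word, relation, word').
triples = [
    ("have", "subj", "I"), ("have", "obj", "dog"),
    ("dog", "adj-mod", "brown"), ("dog", "det", "a"),
    ("have", "obj", "dog"), ("have", "obj", "cat"),
]

def description(w, triples):
    """Frequency counts of all dependency triples matching the pattern (w, *, *)."""
    return Counter((r, x) for (head, r, x) in triples if head == w)

desc = description("have", triples)
# desc[("obj", "dog")] counts how often "dog" appears as the object of "have"
```

In Lin's actual method these counts come from a broad-coverage parser run over a large corpus; the Counter here just stands in for that frequency table.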

Measure the amount of information in the statement that a randomly selected triple is (w, r, w'), when we do not know the value of ||w, r, w'||. An occurrence of the triple (w, r, w') can be regarded as the co-occurrence of three events. A: a randomly selected word is w. B: a randomly selected dependency type is r. C: a randomly selected word is w'. Assuming that A and C are conditionally independent given B, the probability is given by: P(A, B, C) = P(B) × P(A|B) × P(C|B)

Measure the amount of information when we do know the value of ||w, r, w'||; the difference between the two is the information contained in ||w, r, w'|| = c: I(w, r, w') = log( ||w, r, w'|| × ||*, r, *|| / ( ||w, r, *|| × ||*, r, w'|| ) ). Let T(w) be the set of pairs (r, w') such that I(w, r, w') is positive, and define the similarity sim(w_1, w_2) between words w_1 and w_2 as follows: sim(w_1, w_2) = Σ_{(r,w) ∈ T(w_1) ∩ T(w_2)} ( I(w_1, r, w) + I(w_2, r, w) ) / ( Σ_{(r,w) ∈ T(w_1)} I(w_1, r, w) + Σ_{(r,w) ∈ T(w_2)} I(w_2, r, w) )
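A toy implementation of these two quantities might look like the following; the triple counts are invented purely for illustration:

```python
import math
from collections import Counter

# Hypothetical triple counts ||w, r, w'||.
counts = Counter({
    ("dog", "obj-of", "have"): 3, ("dog", "adj-mod", "brown"): 2,
    ("cat", "obj-of", "have"): 2, ("cat", "adj-mod", "brown"): 1,
    ("stone", "obj-of", "throw"): 4,
})

def c(w=None, r=None, x=None):
    """Summed count, with None acting as the wildcard *."""
    return sum(v for (w_, r_, x_), v in counts.items()
               if (w is None or w_ == w)
               and (r is None or r_ == r)
               and (x is None or x_ == x))

def I(w, r, x):
    """I(w, r, w') = log(||w,r,w'|| * ||*,r,*|| / (||w,r,*|| * ||*,r,w'||))."""
    num = counts[(w, r, x)] * c(r=r)
    den = c(w=w, r=r) * c(r=r, x=x)
    return math.log(num / den) if num > 0 and den > 0 else 0.0

def T(w):
    """Pairs (r, w') with positive mutual information for w."""
    return {(r, x) for (w_, r, x) in counts if w_ == w and I(w_, r, x) > 0}

def sim(w1, w2):
    shared = T(w1) & T(w2)
    num = sum(I(w1, r, x) + I(w2, r, x) for r, x in shared)
    den = (sum(I(w1, r, x) for r, x in T(w1)) +
           sum(I(w2, r, x) for r, x in T(w2)))
    return num / den if den else 0.0
```

On these counts, dog and cat share their only informative feature (being the object of have), so sim("dog", "cat") is 1.0, while dog and stone share nothing informative and score 0.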

The WordNet Similarity Package The WordNet similarity package supports a range of similarity scores. lesk: maximizes the number of overlapping words between the glosses, or definitions, of the senses. jcn: each synset is incremented with the frequency counts from the corpus of all words belonging to that synset; calculate the "information content" IC(s) = -log(p(s)); then D_jcn(s_1, s_2) = IC(s_1) + IC(s_2) - 2 × IC(s_3), where s_3 is the most informative superordinate synset of s_1 and s_2, and jcn(s_1, s_2) = 1 / D_jcn(s_1, s_2).
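The jcn score can be sketched directly from these definitions. The sense counts and the tiny taxonomy below are hypothetical; in practice the counts come from a corpus and the superordinate synset from WordNet:

```python
import math

# Hypothetical sense counts over a tiny taxonomy; counts propagate up,
# so the root ("entity") covers every occurrence.
freq = {"entity": 100, "animal": 40, "dog": 10, "cat": 8}
TOTAL = freq["entity"]

def IC(s):
    """Information content: IC(s) = -log p(s)."""
    return -math.log(freq[s] / TOTAL)

def jcn(s1, s2, s3):
    """jcn(s1, s2) = 1 / (IC(s1) + IC(s2) - 2 * IC(s3)),
    where s3 is the most informative common superordinate of s1 and s2."""
    d = IC(s1) + IC(s2) - 2 * IC(s3)
    return 1.0 / d if d > 0 else float("inf")

score = jcn("dog", "cat", "animal")
```

Note that a more informative (deeper, rarer) shared superordinate raises IC(s_3), shrinks the distance D_jcn, and so raises the similarity score.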

Experiment with SemCor We generated a thesaurus entry for all polysemous nouns which occurred more than twice in SemCor and more than 9 times in the BNC in the grammatical relations used. jcn uses frequency counts from the BNC, and the number of neighbors k in each thesaurus entry is set to 50. We measured the accuracy of finding the predominant sense in SemCor, and the WSD accuracy on SemCor when using our first sense in all contexts.

We chose jcn for the remaining experiments because it gave good results for finding the predominant sense and is more efficient. There are cases where the acquired first sense disagrees with SemCor yet is intuitively plausible: pipe → tobacco pipe vs. tube made of metal or plastic used to carry water, oil, gas etc., with nearest neighbors tube, cable, wire, tank, hole, cylinder, …; soil → filth, stain, the state of being unclean vs. dirt, ground, earth, which seems intuitive given the expected usage in modern British English.

Experiment on the SENSEVAL-2 English All-Words Data To see how well the predominant sense performs on a WSD task, we use the SENSEVAL-2 all-words data. We do not propose this as a WSD method in itself; however, it is important to know its performance for any system that uses it. We generate a thesaurus entry for all polysemous nouns in WordNet and compare the results against using the first sense from SemCor and from the SENSEVAL-2 all-words data itself. Monosemous items are trivially labeled.

The automatically acquired predominant sense performs nearly as well as the hand-tagged SemCor first sense, while using only raw text with no manual labeling. The items not covered by our method were those with insufficient grammatical relations for the tuples employed; today and one each occurred 5 times in the test data. Extending the grammatical relations used for building the thesaurus should improve the coverage.

Experiment with Domain-Specific Corpora A major motivation of our work is to capture changes in the ranking of senses for documents from different domains. We selected two domains, SPORTS (35317 documents) and FINANCE, from the Reuters corpus, and acquired a thesaurus from each of these corpora. We selected a number of words and evaluated them qualitatively; the words were not chosen randomly, since we anticipated different predominant senses for them. Additionally, we evaluated quantitatively using the Subject Field Codes resource, which annotates WordNet synsets with domain labels. We selected words that have at least one SPORTS and at least one ECONOMY label, resulting in 38 words.

The results are summarized below with the WordNet sense number for each word. Most words show a change in predominant sense. The first senses of words like division, tie and goal shift towards more specific senses. The first sense of share remains the same, but the sense stock certificate is ranked higher for the FINANCE domain.

The figure shows the distribution of domain labels of the predominant senses acquired from the SPORTS and FINANCE corpora for the set of 38 words. Both domains have a similar percentage of factotum (domain-independent) labels. As expected, the other peaks correspond to the economy label for the FINANCE corpus and the sports label for the SPORTS corpus.

Conclusions We have devised a method that uses raw corpus data to automatically find the predominant sense of nouns in WordNet. We have demonstrated the possibility of finding predominant senses in domain-specific corpora. In future work we will investigate the effect of frequency and of the choice of distributional similarity measure, and apply the method to words with parts of speech other than noun. It would also be possible to use this method with another sense inventory, given a measure of semantic relatedness between the neighbors and the senses.