Automatic acquisition for low frequency lexical items
Nuria Bel, Sergio Espeja, Montserrat Marimon

Lexical acquisition
Induction of information about the linguistic properties of lexical items on the basis of their occurrences in texts. Historically, two lines of research:
– Induction of patterns of behavior to predict classes and their members.
– Linguistically justified patterns are sought as cues for classifying words into predefined classes.

The problem
Both approaches use as much data as possible and run into problems with low frequency items (patterns or words). Brent (1993): a pattern that is more frequent than others cannot be due to chance (noise). According to Zipf's principle (1935), most of the words in a corpus of any length will occur few times, often just once. And informative patterns have a very unbalanced distribution; for instance, in a corpus of more than 3M words:
– "applicable" occurs 440 times, 37% of them with "to"
– "favorable" occurs 60 times, 5% of them with "to"

State of the art
Trying to eliminate noise with preprocessed text (Briscoe and Carroll, 1997; Korhonen, 2002) and different statistical filters. Using linguistic generalizations to obtain better distributions (Chesley and Salmon-Alt, 2006; Preiss et al., 2007). These improve precision (approx. 80%) but not recall (approx. 50%).

State of the art (2)
Taking Levin classes based on distributional differences, which serve as cues, Merlo and Stevenson (2001) and Joanis et al. (2007) obtained about 70% accuracy with a Decision Tree. Korhonen (2002) used WordNet classes to obtain probabilities to smooth her statistical filters and obtained 71.2% recall. Linguistic information helps because it is independent of frequency information. But the cues are uncertain: optional occurrence of cues (silence or negative evidence), cues shared by two classes, …

Our proposal: Classes, features and cues
To compensate for the lack of occurrences with linguistic knowledge. Lexical classes are based on particular properties, i.e. grammatical features, each having one or more cues. We have built a Bayesian classifier for each grammatical feature instead of for each class. We use probabilistic information obtained from the linguistic definition of the classes, instead of from data: it is not affected by frequency phenomena and unbalanced classes. A sketch of this per-feature architecture follows.
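As an illustration only: the feature names below follow the slides, while the interfaces and function names are invented placeholders, not the authors' implementation.

```python
# One independent binary Bayesian classifier per grammatical feature,
# instead of a single multi-class classifier over whole lexical classes.
# Priors and cue likelihoods are read off the linguistic definition of the
# classes (see the following slides), not estimated from corpus counts.

FEATURES = ("trans", "intrans", "mass", "pcomp", "count")

def classify_word(occurrences, feature_classifiers):
    """Predict every grammatical feature of a word independently.

    `feature_classifiers` maps a feature name to a callable that decides
    +feature / -feature from the word's occurrences in the corpus.
    """
    return {feat: feature_classifiers[feat](occurrences) for feat in FEATURES}
```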

A probabilistic version of lexical classes
If linguistic classes are based on different grammatical features (typed grammars), and each feature can be related to different contexts of occurrence (cues), then we can predict the contexts in which a noun of a particular class will appear without looking at the data.

A probabilistic version of lexical classes
P(count | trans) = 6/6; P(mass | trans) = 3/6
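These proportions can be read directly off a table of class definitions. The toy table below is invented so that it reproduces the figures on this slide; the real inventory of classes and cues comes from the grammar, not from this sketch.

```python
# Hypothetical class definitions: each lexical class lists the grammatical
# features it carries and the cues (contexts) its definition licenses.
# Six classes carry "trans"; all six license "count" contexts and three
# license "mass" contexts, matching P(count|trans) = 6/6, P(mass|trans) = 3/6.
CLASSES = {
    "c1": {"features": {"trans"}, "cues": {"count", "mass"}},
    "c2": {"features": {"trans"}, "cues": {"count", "mass"}},
    "c3": {"features": {"trans"}, "cues": {"count", "mass"}},
    "c4": {"features": {"trans"}, "cues": {"count"}},
    "c5": {"features": {"trans"}, "cues": {"count"}},
    "c6": {"features": {"trans"}, "cues": {"count"}},
}

def likelihood(cue, feature, classes):
    """P(cue | feature): share of the classes carrying `feature` that license `cue`."""
    carriers = [c for c in classes.values() if feature in c["features"]]
    return sum(cue in c["cues"] for c in carriers) / len(carriers)

print(likelihood("count", "trans", CLASSES))  # 1.0  (6/6)
print(likelihood("mass", "trans", CLASSES))   # 0.5  (3/6)
```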

A probabilistic version of lexical classes
Linguistic classes provide us with the likelihood information, P(v|sf), which normally has to be calculated from sample data and which is affected by Zipfian frequency phenomena. We can tune this likelihood on the basis of known characteristics: the uncertainty of silence.

Assigning features to words
(Anderson, 1991; Xu and Tenenbaum, 2007): Z: σ → SF, where σ is a word's signature, the set of its occurrences σ = {v1, v2, ..., vz} in a given corpus.
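A minimal sketch of how Z could be computed, under a naive Bayes reading of the slides: the occurrences v1, ..., vz are treated as independent given the feature value, and silence (an occurrence showing no cue) gets hand-set likelihoods rather than zero. The negative-hypothesis likelihoods and the silence values are invented placeholders, not the paper's actual tuning.

```python
import math

def Z(signature, prior, p_cue_f, p_cue_not_f, silence_f=0.5, silence_not_f=0.9):
    """Z: sigma -> SF for one grammatical feature.

    `signature` is the word's set of occurrences {v1, ..., vz}, each given as
    the set of cues it exhibits. Decide +feature vs -feature by
    P(sf | sigma) proportional to P(sf) * prod_i P(v_i | sf).
    """
    log_yes, log_no = math.log(prior), math.log(1.0 - prior)
    for occurrence in signature:
        cues_seen = [c for c in p_cue_f if c in occurrence]
        if not cues_seen:
            # Silence is uncertain evidence: a +feature word may simply not
            # have shown its optional cues, so absence is only mildly negative.
            log_yes += math.log(silence_f)
            log_no += math.log(silence_not_f)
            continue
        for c in cues_seen:
            log_yes += math.log(p_cue_f[c])
            log_no += math.log(p_cue_not_f[c])
    return "+feature" if log_yes > log_no else "-feature"

# A hapax noun observed once in a "count" context, with the likelihoods
# derived from the toy class definitions above:
print(Z([{"count"}], prior=0.5,
        p_cue_f={"count": 1.0, "mass": 0.5},
        p_cue_not_f={"count": 0.2, "mass": 0.2}))  # -> "+feature"
```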

Evaluation
CT-IULA, approx. 1M PoS-tagged words. The gold standard was the manually encoded lexicon of the Spanish Resource Grammar. The baseline is a majority-based classifier computed from the gold-standard lexicon. Same materials as used in Bel et al. (2007), to compare with a Decision Tree (Weka; Witten and Frank, 2005).

Results
Test set of 50 randomly chosen Spanish nouns occurring just once in the corpus.
[Table: accuracy of the Baseline vs. Z for trans, intrans, mass, pcomp, count, and the total; the numeric values did not survive the transcript.]

Compared with DT (Bel et al. 2007)
[Table: precision, recall, and F-score of the Decision Tree (DT) vs. Z for trans, intrans, mass, pcomp, and count, reported both for nouns occurring just once and for randomly chosen Spanish nouns; the numeric values did not survive the transcript.]

Conclusions
Our general conclusion, based on these experiments, is that linguistic knowledge, obtained by abstraction and generalization, can be used in conjunction with powerful probabilistic methods and techniques to overcome the unbalanced distribution of linguistic data in particular, and to support the acquisition of lexical information in general.