A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources Author: Carmen Banea, Rada Mihalcea, Janyce Wiebe

Presentation transcript:

A Bootstrapping Method for Building Subjectivity Lexicons for Languages with Scarce Resources Author: Carmen Banea, Rada Mihalcea, Janyce Wiebe Source: LREC 2008

Motivation
Most available resources for subjectivity analysis focus on English
  – Lexicons
  – Manually labeled corpora
Previously proposed methods rely on advanced language processing tools
  – Syntactic parsers
  – Information extraction

Goal
Minimize the required resources
Build a subjectivity lexicon using
  – a small seed set of subjective words
  – an online dictionary
  – a small raw corpus
A bootstrapping process ranks new candidate words based on a similarity measure

Related Work
Turney (2002) starts with seeds and uses a PMI similarity method to grow the seed list from Web data
  – Annotated for polarity
  – Web data used as a very large corpus

Bootstrapping
Starts from manually selected seeds
Depends on an online dictionary
The lexicon is expanded with related words at each iteration
Candidates are filtered using a measure of word similarity

Seed set
60 seeds form the seed set
  – evenly sampled from verbs, nouns, adjectives and adverbs
Manually selected from
  1. the XI-th grade curriculum for Romanian Language and Literature
  2. translations of instances appearing in the OpinionFinder strong subjective lexicon
A similar seed set can easily be obtained for any other language

Sample of initial seed set

Dictionary
For each seed word, collect all the open-class words appearing in its definition as new related words
  – include synonyms and antonyms if available
  – expand all the possible meanings of a seed word
  – incorrect meanings are filtered out using the measure of word similarity
Use an online Romanian dictionary

Bootstrapping iterations
Continue to the next iteration until a maximum number of iterations is reached
Part-of-speech information is not maintained throughout the bootstrapping process
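The expand-then-filter loop described on the slides above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `bootstrap_lexicon`, the in-memory `dictionary` mapping, the pluggable `similarity` callback, the English toy words and the toy scores are all hypothetical, and the early exit when no candidate survives is an added convenience rather than something stated in the slides.

```python
from typing import Callable, Dict, List, Set


def bootstrap_lexicon(
    seeds: Set[str],
    dictionary: Dict[str, List[str]],
    similarity: Callable[[str, Set[str]], float],
    threshold: float = 0.4,
    max_iterations: int = 5,
) -> Set[str]:
    """Grow a lexicon from seeds by dictionary expansion and similarity filtering."""
    lexicon = set(seeds)
    frontier = set(seeds)  # words whose dictionary entries are expanded next
    for _ in range(max_iterations):
        # Expansion: collect the words related to the frontier via the dictionary.
        candidates: Set[str] = set()
        for word in frontier:
            candidates.update(dictionary.get(word, []))
        candidates -= lexicon  # ignore words that are already in the lexicon
        # Filtering: keep candidates that are similar enough to the ORIGINAL seeds.
        accepted = {c for c in candidates if similarity(c, seeds) > threshold}
        if not accepted:
            break  # assumption: stop early when an iteration adds nothing
        lexicon |= accepted
        frontier = accepted
    return lexicon


# Hypothetical toy data: each entry lists open-class words from a definition.
toy_dictionary = {
    "love": ["affection", "feeling", "person"],
    "affection": ["fondness", "table"],
    "hate": ["dislike", "feeling"],
}
# Hypothetical similarity scores standing in for a corpus-based measure.
toy_scores = {
    "affection": 0.8, "feeling": 0.6, "person": 0.1,
    "fondness": 0.7, "table": 0.05, "dislike": 0.75,
}


def toy_similarity(word: str, seeds: Set[str]) -> float:
    return toy_scores.get(word, 0.0)


one_pass = bootstrap_lexicon({"love", "hate"}, toy_dictionary, toy_similarity,
                             max_iterations=1)
two_passes = bootstrap_lexicon({"love", "hate"}, toy_dictionary, toy_similarity,
                               max_iterations=2)
```

In this toy run the second iteration reaches "fondness" only through "affection", a word accepted in the first iteration, which is why the lexicon keeps growing with the number of iterations.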

Filtering
Calculate a measure of similarity between the original seeds and each of the possible candidates
Two corpus-based measures of similarity
  1. Pointwise Mutual Information (PMI)
  2. Latent Semantic Analysis (LSA)
  – both methods provided similar results
  – the LSA method was significantly faster and required less training data
Candidates with an LSA score higher than 0.4 are considered for expansion
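As an illustration of the first measure, the sketch below computes PMI(x, y) = log2(p(x, y) / (p(x) p(y))) from sentence-level co-occurrence counts. The choice of the sentence as the co-occurrence window, the helper name `make_pmi` and the toy corpus are assumptions made for the example; the slides do not say how the probabilities were estimated.

```python
import math
from collections import Counter
from itertools import combinations


def make_pmi(sentences):
    """Return a function pmi(x, y) estimated from sentence-level co-occurrence."""
    n = len(sentences)
    word_count = Counter()  # number of sentences containing each word
    pair_count = Counter()  # number of sentences containing both words of a pair
    for sentence in sentences:
        words = set(sentence)
        word_count.update(words)
        pair_count.update(frozenset(p) for p in combinations(sorted(words), 2))

    def pmi(x, y):
        joint = pair_count[frozenset((x, y))]
        if joint == 0:
            return float("-inf")  # the two words never co-occur
        # p(x, y) / (p(x) * p(y)) simplifies to joint * n / (count(x) * count(y))
        return math.log2(joint * n / (word_count[x] * word_count[y]))

    return pmi


# Hypothetical four-sentence corpus, already tokenized.
toy_corpus = [
    ["wonderful", "beautiful", "day"],
    ["wonderful", "beautiful", "song"],
    ["table", "day"],
    ["table", "song"],
]
pmi = make_pmi(toy_corpus)
score_related = pmi("wonderful", "beautiful")
score_chance = pmi("wonderful", "day")
score_never = pmi("wonderful", "table")
```

Words that always appear together score above zero, words that co-occur only as often as chance predicts score zero, and words that never co-occur are mapped to negative infinity.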

LSA
Evaluates semantic similarity
Automatically acquired in an unsupervised way from a corpus
  – Corpus: e.g., British National Corpus
  – Latent Semantic Analysis (LSA) yields a vector space model
  – Factor analysis and dimension reduction
  – Allows a homogeneous representation of words, word sets, sentences and texts as vectors, from which a similarity measure can be computed

LSA
Here the LSA module was trained on a half-million-word Romanian corpus
  – Consisting of a manually translated version of the SemCor balanced corpus (Miller et al., 1993)
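For reference, the textbook form of LSA behind the two slides above can be written as follows. The symbols are illustrative: X is an m × n term-by-document count matrix and k is the number of retained dimensions. Representing a word set as the sum of its word vectors is one common choice and is an assumption here; the slides do not state how the seed set was represented.

```latex
% Truncated singular value decomposition of the term-by-document matrix
X \approx X_k = U_k \Sigma_k V_k^{\top}, \qquad k \ll \min(m, n)

% Reduced vector of word i: row i of U_k \Sigma_k
\mathbf{w}_i = (U_k \Sigma_k)_{i,:}

% A word set S as the sum of its word vectors (assumed representation)
\mathbf{s} = \sum_{j \in S} \mathbf{w}_j

% Cosine similarity between a candidate c and the seed set S
\operatorname{sim}(c, S) =
  \frac{\mathbf{w}_c \cdot \mathbf{s}}
       {\lVert \mathbf{w}_c \rVert \, \lVert \mathbf{s} \rVert}
```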

Variable filter
The subjectivity lexicons consist of a ranked list of candidates
  – in decreasing order of similarity
A variable filtering threshold can be used to select the most closely related candidates
  – thresholds used: 0.40, 0.50, 0.55, 0.60
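A small sketch of how a variable threshold trims the ranked list. The candidate names and scores are made up, and the strict comparison (score higher than the threshold) is an assumption carried over from the filtering slide.

```python
def filter_ranked(scores, threshold):
    """Rank candidates by decreasing similarity and keep those above the threshold."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [word for word, score in ranked if score > threshold]


# Hypothetical candidates with hypothetical similarity scores.
toy_candidates = {"w1": 0.72, "w2": 0.58, "w3": 0.52, "w4": 0.45, "w5": 0.30}

# Lexicon size for each threshold listed on the slide.
sizes = {t: len(filter_ranked(toy_candidates, t)) for t in (0.40, 0.50, 0.55, 0.60)}
```

Raising the threshold shrinks the lexicon, keeping only the candidates most closely related to the seeds.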

Evaluation
Subjectivity lexicon
  – LSA similarity threshold of 0.5
  – five bootstrapping iterations
  – resulted in a lexicon of 3,913 entries
Used in a rule-based sentence-level subjectivity classifier
  – Subjective sentence: contains three or more entries that appear in the subjectivity lexicon
Gold-standard corpus
  – consisting of 504 Romanian sentences
  – manually annotated for subjectivity
  – the agreement of the two annotators is 0.83 (κ = 0.67)
  – resulting in a gold-standard dataset with 272 (54%) subjective sentences and 232 (46%) objective sentences
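The classification rule on this slide is simple enough to sketch directly. The whitespace tokenization, the lowercasing, the decision to count every matching token (rather than distinct lexicon entries) and the English toy lexicon are assumptions made for the example.

```python
def is_subjective(sentence, lexicon, min_hits=3):
    """Label a sentence subjective if it contains at least `min_hits` lexicon entries."""
    tokens = sentence.lower().split()  # assumption: naive whitespace tokenization
    hits = sum(1 for token in tokens if token in lexicon)
    return hits >= min_hits


# Hypothetical toy lexicon (the real lexicon is Romanian and has 3,913 entries).
toy_lexicon = {"love", "wonderful", "hate", "terrible"}

label_a = is_subjective("I love this wonderful terrible movie", toy_lexicon)
label_b = is_subjective("I love this movie", toy_lexicon)
label_c = is_subjective("The table is brown", toy_lexicon)
```

Only the first sentence reaches three lexicon hits, so it is the only one labeled subjective; the other two fall back to the objective class.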

Lexicon Acquisition Entries in the lexicon

Sentence-level subjectivity classification
Using the lexicon alone
  – rule-based subjectivity classifier with an overall F-measure of 61.69%
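For readers unfamiliar with the metric, the sketch below computes precision, recall and F-measure for one class from gold and predicted labels. The label lists are invented, and the slides do not specify how the "overall" figure aggregates the subjective and objective classes, so this shows only the per-class computation.

```python
def precision_recall_f(gold, predicted, positive=1):
    """Return (precision, recall, F-measure) for the `positive` class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-measure: harmonic mean of precision and recall
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = subjective, 0 = objective.
toy_gold = [1, 1, 1, 0, 0]
toy_pred = [1, 1, 0, 1, 0]
p, r, f = precision_recall_f(toy_gold, toy_pred)
```

Here two of the three predicted subjective sentences are correct and two of the three gold subjective sentences are found, so precision, recall and F-measure all equal 2/3.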

Compare our results with other rule-based methods
Mihalcea et al., 2007
  – subjectivity lexicon: translation of the English subjectivity lexicon
  – 2,282 entries with a confidence label of strong, neutral or weak, as flagged by the OpinionFinder lexicon
The bootstrapping method achieves a significant improvement of 18.03% in the overall F-measure

Conclusion
The method quickly generates a large subjectivity lexicon
The lexicon can be used to build rule-based sentence-level subjectivity classifiers
The system proposes a possible path towards identifying subjectivity in low-resource languages
Future work
  – variations of the bootstrapping mechanism
  – other similarity measures