Generating High-Coverage Semantic Orientation Lexicons From Overtly Marked Words and a Thesaurus † Institute for Advanced Computer Studies and CLIP lab.

Generating High-Coverage Semantic Orientation Lexicons From Overtly Marked Words and a Thesaurus
Saif Mohammad †, Cody Dunne ‡, and Bonnie Dorr † ∗
† Institute for Advanced Computer Studies and CLIP lab
‡ Human-Computer Interaction Lab
Department of Computer Science, University of Maryland
∗ Human Language Technology Center of Excellence

Evaluative sentences
Sony’s new digital camera is fabulous.
The characters in the movie are flawed.
Creative solutions are valued.
Singapore has an immaculate transportation system.
Our waters have never been more contaminated.
Generating Semantic Orientation Lexicons. Mohammad, Dunne, Dorr.

Semantic orientation
Positive semantic orientation (SO) (or polarity)
  ◦ Term is often used to convey favorable sentiment or evaluation of the target.
  ◦ E.g.: excellent, happy, honest, …
Negative semantic orientation
  ◦ Term is often used to convey unfavorable sentiment or evaluation of the target.
  ◦ E.g.: poor, sad, dishonest, …

Applications
Automatic product recommendation systems (Tatemura, 2000; Terveen et al., 1997)
Question answering (Somasundaran et al., 2007; Lita et al., 2005)
Summarizing multiple viewpoints and opinions (Seki et al., 2004; Mohammad et al., 2008a)
Identifying flames (Spertus, 1997)
Appropriate ad placement (Jin et al., 2007)

Manually created lexicons
General Inquirer (GI) (Stone et al., 1966)
  ◦ has labels for only about 3,600 entries
Pittsburgh subjectivity lexicon (PSL) (Wilson et al., 2005)
  ◦ draws from the General Inquirer and other sources
  ◦ has labels for only about 8,000 words

Automatically created lexicons
Hatzivassiloglou and McKeown (1997)
  ◦ A supervised algorithm to determine the semantic orientation of adjectives
Turney and Littman lexicon (TLL) (2003)
  ◦ Exploit tendency to co-occur with a seed set
  ◦ Need very large corpora (100 billion words)
Esuli and Sebastiani (2006): SentiWordNet (SWN)
  ◦ Attach labels to WordNet synsets
  ◦ Use supervised classifiers
  ◦ Need significant manual annotation

Semantic oppositeness scale
[Figure: scale running from antonymous (big–small) to not antonymous (big–large).]
Many antonym pairs have opposite semantic orientation (one positive, one negative):
good–bad; beautiful–ugly; honest–dishonest

Detecting word-pair antonymy: Mohammad, Dorr, Hirst (2008)
Use affix patterns to identify seed pairs of strong antonyms.
Use a Roget-like thesaurus to identify near-synonyms of seed words.
Mark pairs of words near-synonymous to seed pairs as contrasting.
The degree of antonymy is proportional to their tendency to co-occur.
Created a list of more than 3 million strongly antonymous word pairs.

Our approach
Identify a seed set of positive and negative words:
  ◦ From edicts of marking theory
Identify their synonyms:
  ◦ Use a Roget-like thesaurus
Mark as negative:
  ◦ words synonymous with a negative seed
Mark as positive:
  ◦ words synonymous with a positive seed

Step 1: Identify seed words
From marking theory:
  ◦ Overtly marked words tend to be negative.
    E.g., undo, unhappy, dishonest, immobile
  ◦ Their unmarked counterparts tend to be positive.
    E.g., do, happy, honest, mobile
Exceptions exist:
  ◦ impartial–partial, unbiased–biased, unstuck–stuck

Affix patterns

word1    word2    # of word pairs    example pairs
X        disX     382                honest–dishonest
X        imX      196                possible–impossible
X        inX      691                consistent–inconsistent
X        malX     28                 adroit–maladroit
X        misX     146                fortune–misfortune
X        nonX     73                 sense–nonsense
X        unX      844                happy–unhappy
X        Xless    208                gut–gutless
lX       illX     25                 legal–illegal
rX       irX      48                 responsible–irresponsible
Xless    Xful     51                 harmless–harmful
Total             2692
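As a rough illustration of how affix patterns could yield seed pairs, the sketch below checks, for each word in a vocabulary, whether a prefixed form is also in the vocabulary. It covers only the prefix patterns from the table. The function names, the vocabulary argument, and the "il"/"ir" encoding of the lX–illX and rX–irX rows are assumptions made for this example, not the authors' implementation.

```python
# Prefix patterns from the table, written as (required first letter of the
# unmarked word, prefix added to form the marked word). "il" + "legal" gives
# the lX/illX row and "ir" + "responsible" the rX/irX row (an assumed encoding).
PREFIX_PATTERNS = [
    ("", "dis"), ("", "im"), ("", "in"), ("", "mal"), ("", "mis"),
    ("", "non"), ("", "un"), ("l", "il"), ("r", "ir"),
]


def find_seed_pairs(vocabulary):
    """Return sorted (unmarked, marked) pairs whose two words are both in vocabulary."""
    vocab = set(vocabulary)
    pairs = set()
    for word in vocab:
        for first_letter, prefix in PREFIX_PATTERNS:
            if not word.startswith(first_letter):
                continue
            marked = prefix + word
            if marked in vocab:
                pairs.add((word, marked))
    return sorted(pairs)


def seed_lexicon(pairs):
    """Label unmarked words positive and overtly marked words negative."""
    lexicon = {}
    for unmarked, marked in pairs:
        lexicon[unmarked] = "positive"
        lexicon[marked] = "negative"
    return lexicon
```

Pattern matching of this kind also picks up exceptions such as impartial–partial, so the resulting seeds are noisy by design.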

Step 2: Identify synonyms of seed words
Take synonyms from a Roget-like thesaurus
  ◦ We used the Macquarie Thesaurus
  ◦ Has 98,000 word-types

Thesaurus categories
All words classified into ~1000 categories:
ability absence accept accompanied action affect affirm agree allow approach ask assemble attack attitude awareness
be beautiful beings belief better big blood body breath calm care for careful cause certain change
choice clean clear collect colors comfort concern conflict connect continue control convex correct count courtesy …

Example category entry
369 HONESTY
  adj. paragraph: honest, above board, authentic, bona fide, legit, …
  adj. paragraph: reliable, sound, steadfast, trustworthy, trusty, …
  noun paragraph: honesty, incorruptness, integrity, probity, sincerity, …
  noun paragraph: bona fides, reliability, soundness, trueness, trustiness, …

Step 2: Identify synonyms of seed words
369 HONESTY
Words in each paragraph of the category entry are near-synonyms.

Step 3: Mark as positive synonyms of positive seeds
Seed pair: honest (positive) – dishonest (negative)
Seed pair: reliable (positive) – unreliable (negative)
369 HONESTY
  adj. paragraph: honest, above board, authentic, bona fide, legit, …
  adj. paragraph: reliable, sound, steadfast, trustworthy, trusty, …
  noun paragraph: honesty, incorruptness, integrity, probity, sincerity, …
  noun paragraph: bona fides, reliability, soundness, trueness, trustiness, …

Step 4: Mark as negative synonyms of negative seeds
Seed pair: honest (positive) – dishonest (negative)
370 DISHONESTY
  adj. paragraph: crooked, dishonest, knavish, shady, unjust, …
  noun paragraph: crookedness, dishonesty, fraudulence, improbity, trickery, …
  …

Majority voting
All words in a paragraph are assigned identical orientation.
If multiple seeds occur in the same paragraph:
  ◦ simple voting determines orientation.
369 HONESTY, noun paragraph: honesty, incorruptness, integrity, probity, sincerity, …
Seed pairs (positive–negative) and the vote cast by the member that appears in this paragraph:
  honesty–dishonesty: +
  corruptness–incorruptness: −
  probity–improbity: +
  sincerity–insincerity: +

Majority voting
369 HONESTY, noun paragraph: honesty, incorruptness, integrity, probity, sincerity, …
Positive orientation has majority, so all words in the paragraph are marked positive.
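The propagation and voting steps can be sketched as follows. This is a minimal illustration, assuming the thesaurus is available as a dictionary from paragraph identifiers to word lists; the function name, the data layout, and the choice to leave tied or seedless paragraphs unlabeled are assumptions of this example, since the slides do not specify them.

```python
from collections import Counter


def label_paragraphs(paragraphs, seeds):
    """Assign one orientation to every word of each thesaurus paragraph.

    paragraphs: dict mapping a paragraph id to a list of near-synonyms.
    seeds: dict mapping a seed word to "positive" or "negative".
    Returns a dict keyed by (word, paragraph id).
    """
    lexicon = {}
    for para_id, words in paragraphs.items():
        # Each seed word found in the paragraph casts one vote.
        votes = Counter(seeds[w] for w in words if w in seeds)
        pos, neg = votes["positive"], votes["negative"]
        if pos == neg:
            continue  # assumption: no seeds or a tie leaves the paragraph unlabeled
        label = "positive" if pos > neg else "negative"
        for w in words:
            lexicon[(w, para_id)] = label
    return lexicon
```

With the HONESTY noun paragraph and the seeds honesty (+), incorruptness (−), probity (+), and sincerity (+), every word in the paragraph, including incorruptness and the non-seed integrity, ends up positive.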

Sense and word lexicons
Macquarie Semantic Orientation Lexicon (MSOL)
  ◦ Assigns orientation to word–category combinations
  ◦ Categories are coarse word senses
Most natural language text is not sense disambiguated
We create word lexicons from MSOL and SentiWordNet
  ◦ By choosing for each word the orientation most common amongst its senses
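Collapsing a sense-level lexicon into a word-level one might look like the sketch below. It assumes the sense lexicon is a dictionary keyed by (word, category) pairs; the function name and the decision to drop words whose senses tie are assumptions of this example.

```python
from collections import Counter, defaultdict


def word_level_lexicon(sense_lexicon):
    """Give each word the orientation most common among its senses."""
    votes = defaultdict(Counter)
    for (word, _category), label in sense_lexicon.items():
        votes[word][label] += 1
    lexicon = {}
    for word, counts in votes.items():
        pos, neg = counts["positive"], counts["negative"]
        if pos != neg:  # assumption: words with tied senses are left out
            lexicon[word] = "positive" if pos > neg else "negative"
    return lexicon
```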

Size of lexicons
SentiWordNet (SWN)
  ◦ 56,200 entries (85.1% positive and 14.9% negative)
Affix seeds lexicon (ASL)
  ◦ 5,031 entries (47.3% positive and 52.7% negative)
MSOL(ASL)
  ◦ 51,157 entries (66.8% positive and 33.2% negative)
  ◦ 3,643 multi-word expressions
MSOL(ASL and GI)
  ◦ Uses both affix pairs and GI entries as seeds
  ◦ 76,400 entries (39.9% positive and 60.1% negative)
  ◦ Available for download: #SemanticOrientation

Intrinsic evaluation: The percentage of GI entries that match those of the automatically generated lexicons.
[Figure: F-score chart.]

Extrinsic evaluation
Gold standard of phrases manually annotated with semantic orientation:
  ◦ MPQA corpus (version 1.1)
  ◦ positive phrases (1726) and negative phrases (4485)
A simple algorithm to determine the polarity of a phrase:
  ◦ If the target phrase has a negative word, then the phrase is marked negative.
  ◦ If the target phrase has no negative word and has at least one positive word, then it is marked positive.
  ◦ Otherwise, the classifier refrains from assigning a tag.
Even better accuracies can be obtained with supervised classifiers and more sophisticated context features (Choi and Cardie, 2008).
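The three rules of the simple phrase tagger translate directly into code. The sketch below assumes a word-level lexicon stored as a dictionary and whitespace tokenization with lowercasing; both are simplifications introduced for this example (multi-word lexicon entries, for instance, are not matched).

```python
def tag_phrase(phrase, lexicon):
    """Return "negative", "positive", or None (no tag) for a phrase."""
    # Assumption: lowercase and split on whitespace to get the words.
    labels = [lexicon.get(word) for word in phrase.lower().split()]
    if "negative" in labels:
        return "negative"  # any negative word makes the phrase negative
    if "positive" in labels:
        return "positive"  # no negative word and at least one positive word
    return None            # otherwise refrain from tagging
```

Checking for negative words first means that a phrase containing both positive and negative words is tagged negative, as the rules require.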

Extrinsic evaluation: Performance of phrase polarity tagging. No semantic-orientation labeled data used.
[Figure: F-score chart.]

Extrinsic evaluation: Performance of phrase polarity tagging. Using GI labels.
[Figure: F-score chart.]

Orientation of thesaurus categories
[Figure: graph of thesaurus categories. Red: negative; Blue: positive; Size of node: intensity; Edge: oppositeness]

Pollyanna Hypothesis
People use positive expressions more frequently than negative expressions (Boucher and Osgood, 1969; Kelly, 2000).
[Figure: chart of the percentage of entries.]

Summary
Created a high-coverage semantic orientation lexicon:
  ◦ using only affix rules and a Roget-like thesaurus.
  ◦ no manually annotated semantic orientation labels required.
The lexicon:
  ◦ has about twenty times the number of entries in GI.
  ◦ has entries for both single words and common multi-word expressions.
  ◦ is more useful in phrase-polarity annotation than SentiWordNet, GI, or the Turney lexicon.

Future work
Creating even better semantic orientation lexicons by combining:
  ◦ our approach (affix rules and thesaurus)
  ◦ with the Turney–Littman 2003 method (co-occurrence statistics).
Create orientation lexicons for resource-poor languages.
  ◦ use a bilingual dictionary
  ◦ use English thesaurus
  ◦ use affix rules from both (multiple) languages.

Automatic approaches: sentiment analysis
Those that rely on a lexical-semantic resource
  ◦ Most use WordNet
  ◦ Strapparava and Valitutti, 2004; Hu and Liu, 2004; Kamps et al., 2004; Takamura et al., 2005; Esuli and Sebastiani, 2006; Andreevskaia and Bergler, 2006; Kanayama and Nasukawa, 2006
Those that rely only on text corpora
  ◦ Hatzivassiloglou and McKeown, 1997; Turney and Littman, 2003; Yu and Hatzivassiloglou, 2003; Grefenstette et al., 2004
