Identifying Comparative Sentences in Text Documents

Identifying Comparative Sentences in Text Documents Nitin Jindal and Bing Liu University of Illinois SIGIR 2006

Introduction Comparisons are one of the most convincing ways of evaluation. Much of this information is available on the Web: customer reviews, forum discussions, and blogs. Useful for product manufacturers and for potential customers (to make purchasing decisions).

Comparisons vs. Opinions Comparisons can be either objective or subjective. Comparative sentences have different language constructs from typical opinion sentences. Comparative sentences may contain some indicators. Car X is much better than Car Y (subjective). Car X is two feet longer than Car Y (objective).

Related Work Linguistics: based on grammars (syntax and semantics) and logic (gradability), which is more for human consumption than for automatic identification. Opinion tasks: opinion extraction and classification problems, which are quite different from comparison identification.

Comparatives (Linguistic) Comparatives are used to express explicit orderings between objects with respect to the degree or amount to which they possess some gradable property. John is taller than he was => John is tall to degree d

Comparatives (Linguistic) Two broad types: Metalinguistic Comparatives: compare properties of one entity. Ronaldo is angrier than upset. Propositional Comparatives: compare between two propositions. Three subcategories:

Comparatives (Propositional) Nominal Comparatives: (two sets of entities) Paul ate more grapes than bananas. Adjectival Comparatives: (than, as good as) Ford is cheaper than Volvo. Adverbial Comparatives: (occur after a verb phrase) Tom ate more quickly than Jane.

Superlatives Adjectival Superlatives: John is the tallest person. Adverbial Superlatives: Jill did her homework most frequently. Equality: conjunctions like and, or, … John and Sue both like sushi.

POS involved NN: Noun; NNP: Proper noun; VBZ: Verb, present tense, 3rd person singular; JJ: Adjective; RB: Adverb; JJR: Adjective, comparative; JJS: Adjective, superlative; RBR: Adverb, comparative; RBS: Adverb, superlative

Limitations of linguistic classification. Non-comparatives with comparative words: many non-comparatives contain comparative words. In the context of speed, faster means better. John has to try his best to win this game. Limited coverage: many comparatives contain no comparative words. In market capital, Intel is way ahead of AMD. Nokia, Samsung, both cell phones perform badly on heat dissipation index. The M7500 earned a World bench score of 85, whereas Asus A3V posted a mark of 89.

Enhancements First limitation: use machine learning methods to distinguish comparatives from non-comparatives. Second limitation: add user preferences (I prefer Intel to AMD = Intel is better than AMD) and implicit comparatives (Camera X has 2 MP, whereas camera Y has 5 MP).

Types of Comparatives Non-Equal Gradable: greater or less than type, including user preferences. Equative (Gradable): equal to type. Superlative (Gradable): greater or less than all others type. Non-Gradable: A is similar to B; A has feature F1 while B has F2; A has feature F but B doesn’t.

Tasks Identifying comparative sentences from a given text data set. Extracting comparative relations from sentences. (Mining comparative sentences and relations, AAAI 2006)

Class Sequential Rules with Multiple Minimum Supports In a class sequential rule, the sequential pattern is on the left and the class is on the right. Patterns are selected around keywords: POS tags (JJR, RBR, JJS, RBS) + Words (favor, prefer, win, beat, but…) + Phrases (number one, up against). Using keywords alone gives precision P = 32% and recall R = 94%.
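The keyword step can be sketched as a simple filter over a POS-tagged sentence. This is a minimal illustration, not the authors' implementation: the `word/TAG` input format follows the tagged example elsewhere in the deck, the keyword sets contain only the items named on the slide (the full list is elided there), and the function names are hypothetical.

```python
# Indicators named on the slide; the slide's word list is truncated ("..."),
# so these sets are deliberately incomplete.
COMPARATIVE_TAGS = {"JJR", "RBR", "JJS", "RBS"}
KEYWORDS = {"favor", "prefer", "win", "beat", "but"}
KEY_PHRASES = [("number", "one"), ("up", "against")]


def parse_tagged(sentence):
    """Split 'word/TAG word/TAG ...' into lowercase (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition("/")
        pairs.append((word.lower(), tag))
    return pairs


def has_keyword(tagged):
    """Return True if any comparative tag, keyword, or key phrase occurs."""
    words = [word for word, _ in tagged]
    if any(tag in COMPARATIVE_TAGS for _, tag in tagged):
        return True
    if any(word in KEYWORDS for word in words):
        return True
    bigrams = set(zip(words, words[1:]))
    return any(phrase in bigrams for phrase in KEY_PHRASES)


comparative = parse_tagged("ford/NNP is/VBZ cheaper/JJR than/IN volvo/NNP")
plain = parse_tagged("john/NNP likes/VBZ sushi/NN")
false_alarm = parse_tagged(
    "john/NNP has/VBZ to/TO try/VB his/PRP$ best/JJS to/TO win/VB this/DT game/NN"
)
print(has_keyword(comparative), has_keyword(plain), has_keyword(false_alarm))
```

The third sentence is flagged even though it is not a comparison, which is the kind of false positive that would explain high recall paired with low precision.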

Support and Confidence Using the minimum support of 20% and minimum confidence of 40%, one of the discovered CSRs is:
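As a rough illustration of what the two thresholds measure, the sketch below computes support and confidence for a rule X -> y over a toy sequence database. It assumes the usual definitions, which the slide does not spell out: support is the fraction of all sequences that contain X and carry class y, and confidence is the fraction of sequences containing X that carry class y. The database, the pattern, and the function names are invented for illustration.

```python
def contains(sequence, pattern):
    """True if pattern occurs in sequence as an ordered subsequence (gaps allowed)."""
    remaining = iter(sequence)
    # 'item in remaining' advances the iterator, so order is respected.
    return all(item in remaining for item in pattern)


def support_and_confidence(database, pattern, label):
    """Compute (support, confidence) of the rule pattern -> label."""
    covered = [cls for seq, cls in database if contains(seq, pattern)]
    matched = sum(1 for cls in covered if cls == label)
    support = matched / len(database)
    confidence = matched / len(covered) if covered else 0.0
    return support, confidence


# Toy database of (sequence, class) pairs; not taken from the paper.
toy_db = [
    (["NN", "VBZ", "moreJJR", "NN"], "comparative"),
    (["NN", "VBZ", "RB", "moreJJR", "IN"], "comparative"),
    (["NN", "VBZ", "JJ"], "non-comparative"),
    (["NN", "moreJJR", "NN"], "non-comparative"),
    (["DT", "NN", "VBD"], "non-comparative"),
]

support, confidence = support_and_confidence(toy_db, ["NN", "moreJJR"], "comparative")
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Under these assumptions, a rule is kept only if both values meet their minimum thresholds.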

Building the Sequence DB Example sentence: this/DT camera/NN has/VBZ significantly/RB more/JJR noise/NN at/IN iso/NN 100/CD than/IN the/DT nikon/NN 4500/CD. Resulting sequence: {NN}{VBZ}{RB}{moreJJR}{NN}{IN}{NN} -> comparative. Sequences that exceed the 60% confidence threshold become rules. Minimum support = 10%. 13 manual rules with conjunctions such as whereas/IN, but/CC, however/RB, while/IN, though/IN, although/IN, etc.
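The example above can be reproduced with a short sketch. It assumes a window of three tokens on each side of the keyword, which is inferred from the seven-item sequence on the slide rather than stated there; only the keyword keeps its word form, and every other token is reduced to its POS tag. The function name and the choice to stop at the first keyword are simplifications for illustration.

```python
COMPARATIVE_TAGS = {"JJR", "RBR", "JJS", "RBS"}


def build_sequence(tagged_sentence, radius=3):
    """Return the tag sequence around the first comparative keyword, or None.

    radius=3 is an assumption read off the slide's example.
    """
    tokens = []
    for token in tagged_sentence.split():
        word, _, tag = token.rpartition("/")
        tokens.append((word, tag))

    for i, (_, tag) in enumerate(tokens):
        if tag in COMPARATIVE_TAGS:
            start = max(0, i - radius)
            window = tokens[start:i + radius + 1]
            sequence = []
            for offset, (w, t) in enumerate(window):
                # Keep the word only for the keyword itself.
                sequence.append(w + t if start + offset == i else t)
            return sequence
    return None


sentence = (
    "this/DT camera/NN has/VBZ significantly/RB more/JJR noise/NN "
    "at/IN iso/NN 100/CD than/IN the/DT nikon/NN 4500/CD"
)
print(build_sequence(sentence))
```

Each such sequence, paired with the sentence's class label, would form one entry in the sequence database.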

Classification Learning Machine learning methods: Feature Set = {X | X is the sequential pattern in CSR X → y} ∪ {Z | Z is the pattern in a manual rule Z → y}
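One way to read this feature set is as a binary vector with one position per rule pattern, set to 1 when the pattern occurs in the sentence's sequence. The sketch below shows only that encoding step. The patterns are made up for illustration, the item name `whereasIN` is a hypothetical encoding of a manual rule, and no particular learner is implied.

```python
def contains(sequence, pattern):
    """True if pattern occurs in sequence as an ordered subsequence (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def to_feature_vector(sequence, patterns):
    """Encode a sequence as 0/1 indicators, one per pattern."""
    return [1 if contains(sequence, pattern) else 0 for pattern in patterns]


# Hypothetical patterns: two mined CSR left-hand sides and one manual rule.
example_patterns = [
    ("NN", "moreJJR"),
    ("moreJJR", "IN"),
    ("whereasIN",),
]

example_sequence = ["NN", "VBZ", "RB", "moreJJR", "NN", "IN", "NN"]
example_vector = to_feature_vector(example_sequence, example_patterns)
print(example_vector)
```

Vectors of this form, together with the comparative or non-comparative label, could then be passed to any standard classifier.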

Data Preparation Consumer reviews on products such as digital cameras, DVD players, MP3 players and cellular phones. Forum discussions on topics such as Intel vs. AMD, Coke vs. Pepsi, and Microsoft vs. Google. News articles on topics such as automobiles, iPods, and soccer vs. football.

Number of Sentences in Data Sets

Experimental Results (1)

Experimental Results (2) Reviews: recall low, precision high -> short sentences, so patterns are hard to find. Articles and forums: recall high, precision low -> long sentences, so patterns are found too easily or too many patterns are found.
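For reference, precision (P) and recall (R) as used in these results can be computed as in the sketch below. The predictions and labels are invented toy values, not figures from the paper, and the function name is hypothetical.

```python
def precision_recall(predicted, actual):
    """Precision and recall for the positive (comparative) class."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    false_pos = sum(1 for p, a in zip(predicted, actual) if p and not a)
    false_neg = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Toy data: True means "comparative sentence".
toy_predicted = [True, True, True, False, True]
toy_actual = [True, False, False, False, True]
toy_precision, toy_recall = precision_recall(toy_predicted, toy_actual)
print(toy_precision, toy_recall)
```

A system that flags too many sentences behaves like this toy case: every true comparative is found, but half of the flagged sentences are wrong.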

Conclusion and Future Work Identifying comparative sentences. Analyzing different types of comparative sentences. Studying how to automatically classify subjective and objective comparisons.