Entity Set Expansion in Opinion Documents Lei Zhang Bing Liu University of Illinois at Chicago.


Introduction to opinion mining Opinion mining is the computational study of opinions and sentiments expressed in text. Why opinion mining now? Mainly because of the Web, we can now obtain huge volumes of opinionated text.

Why opinion mining is important Whenever we need to make a decision, we would like to hear others' advice. In the past: Individuals asked friends or family; businesses used surveys and consultants. Now, with word of mouth on the Web, people can express their opinions in reviews, forum discussions, blogs, and more.

An example of an opinion mining application

What is an entity in opinion documents An entity can be a product, service, person, organization, or event in an opinion document. Basically, opinion mining extracts opinions expressed on entities and their attributes. "I bought a Sony camera yesterday, and its picture quality is great." Here, picture quality is the product attribute and Sony is the entity.

Why we need entity extraction Without knowing the entity, a piece of opinion has little value. Companies want to know their competitors in the market. Entity extraction is the first step toward understanding the competitive landscape from opinion documents.

Related work Named entity recognition (NER) aims to identify entities such as names of persons, organizations, and locations in natural language text. Our problem is similar to the NER problem, but with some differences: 1. Fine-grained entity classes (products, services) rather than coarse-grained entity classes (people, locations, organizations). 2. We only want a specific type, e.g., a particular type of drug name. 3. Neologisms, e.g., "Sammy" (Sony), "SE" (Sony-Ericsson). 4. Feature sparseness (lack of contextual patterns). 5. Data noise (over-capitalization, under-capitalization).

NER methods Supervised learning methods are the currently dominant technique for the NER problem: Hidden Markov Models (HMM), Maximum Entropy models (ME), Support Vector Machines (SVM), Conditional Random Fields (CRF). Shortcomings: they rely on large sets of labeled examples, and labeling is labor-intensive and time-consuming.

NER methods Unsupervised learning methods are mainly clustering: gathering named entities from clustered groups based on the similarity of their contexts. These techniques rely on lexical resources (e.g., WordNet), on lexical patterns, and on statistics computed over a large unannotated corpus. Shortcoming: low precision and recall.

NER methods Semi-supervised learning methods show promise for identifying and labeling entities. Starting with a set of seed entities, semi-supervised methods use either class-specific patterns to populate an entity class or distributional similarity to find terms similar to the seeds. Specific methods: bootstrapping, co-training, distributional similarity.

Our problem is a set expansion problem To find competing entities, the extracted entities must be relevant, i.e., they must be of the same class/type as the user-provided entities. The user can only provide a few names because there are so many different brands and models. Our problem is thus a set expansion problem, which expands a set of given seed entities.

Set expansion problem Given a set Q of seed entities of a particular class C and a set D of candidate entities, we wish to determine which of the entities in D belong to C. That is, we "grow" the class C based on the set of seed examples Q. This is a classification problem; in practice, however, it is often solved as a ranking problem.

Distributional similarity Distributional similarity is a classical method for the set expansion problem. (Basic idea: words with similar meanings tend to appear in similar contexts.) It compares the distribution of words surrounding a candidate entity with that of the seed entities, and then ranks the candidate entities by their similarity values. Our results show this approach is inaccurate.
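The ranking idea above can be sketched in a few lines: build a context-word count vector for each entity and rank candidates by cosine similarity to the seed's vector. The corpus and entity names below are made-up examples, not data from the paper.

```python
# Minimal sketch of set expansion by distributional similarity.
from collections import Counter
import math

def context_vector(entity, sentences, window=3):
    """Count words appearing within `window` tokens of the entity."""
    vec = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == entity:
                lo, hi = max(0, i - window), i + window + 1
                vec.update(t for t in tokens[lo:hi] if t != entity)
    return vec

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

sentences = [
    "i bought a nokia phone yesterday".split(),
    "the sony phone i bought works great".split(),
    "my canon camera takes great pictures".split(),
]
seed_vec = context_vector("nokia", sentences)
candidates = ["sony", "canon"]
ranked = sorted(candidates,
                key=lambda c: cosine(seed_vec, context_vector(c, sentences)),
                reverse=True)
print(ranked)   # → ['sony', 'canon']
```

Here "sony" outranks "canon" because it shares more context words ("phone", "bought") with the seed "nokia".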

Bayesian sets Bayesian Sets is based on Bayesian inference and was designed specifically for the set expansion problem. It learns from a seed set (i.e., a positive set P) and an unlabeled candidate set U.

Bayesian sets Given a data set D and a query Q ⊆ D (the seed set), we aim to rank the elements e of D by how well they would "fit into" a set that includes Q. Define a score for each e: score(e) = p(e | Q) / p(e). From Bayes' rule, the score can be rewritten as: score(e) = p(e, Q) / (p(e) p(Q)).

Bayesian sets Intuitively, the score compares the probability that e and Q were generated by the same model with the same unknown parameters θ to the probability that e and Q came from models with different parameters θ and θ′.

Bayesian sets With binary feature vectors and an independent Beta-Bernoulli model per feature j, compute the following for N seeds q_1, …, q_N: α̃_j = α_j + Σ_i q_ij, β̃_j = β_j + N − Σ_i q_ij, and p(e | Q) = ∏_j (α̃_j / (α̃_j + β̃_j))^{e_j} (β̃_j / (α̃_j + β̃_j))^{1 − e_j}.

Bayesian sets The final score is linear in the features and can be computed as: log score(e) = c + Σ_j w_j e_j, where c = Σ_j [log(α_j + β_j) − log(α_j + β_j + N) + log β̃_j − log β_j] and w_j = log α̃_j − log α_j − log β̃_j + log β_j. Here α and β are hyperparameters obtained from the data. According to the Bayesian Sets algorithm, the top-ranked entities should be highly related to the seed set Q.
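The scoring equations above can be implemented in a few lines of NumPy. This is a sketch following the standard Bayesian Sets formulation (hyperparameters set from the empirical feature means); the toy feature matrix is made up.

```python
# Sketch of the Bayesian Sets log-score for binary feature vectors.
import numpy as np

def bayesian_sets_scores(X, seed_idx, c=2.0):
    """X: (n_items, n_features) binary matrix; seed_idx: indices of the seeds.
    Returns one log-score per item (higher = fits the seed set better)."""
    m = np.clip(X.mean(axis=0), 1e-6, 1 - 1e-6)   # empirical feature means
    alpha, beta = c * m, c * (1 - m)              # Beta-Bernoulli hyperparameters
    N = len(seed_idx)
    s = X[seed_idx].sum(axis=0)                   # per-feature seed counts
    alpha_t, beta_t = alpha + s, beta + N - s
    # log score(e) = const + sum_j w_j e_j  (linear in the binary features)
    w = np.log(alpha_t / alpha) - np.log(beta_t / beta)
    const = np.sum(np.log(alpha + beta) - np.log(alpha + beta + N)
                   + np.log(beta_t) - np.log(beta))
    return const + X @ w

X = np.array([[1, 1, 0, 0],   # item 0 (seed)
              [1, 1, 1, 0],   # item 1 (seed)
              [1, 1, 0, 1],   # item 2: shares the seed features
              [0, 0, 1, 1]])  # item 3: unlike the seeds
scores = bayesian_sets_scores(X, seed_idx=[0, 1])
print(scores.argmin())        # → 3  (the item least like the seeds)
```

Items sharing features with the seeds receive large weights w_j, so item 3, which shares none, ranks last.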

Direct application of Bayesian sets For seeds and candidate entities, the feature vectors are created as follows: (1) A set of features is first designed to represent each entity. (2) For each entity, identify all the sentences in the corpus that contain the entity; based on their contexts, produce a single feature vector to represent the entity. But this produces poor results. (Reasons: first, Bayesian Sets uses binary features, so multiple occurrences of an entity in the corpus, which give rich contextual information, are not fully exploited; second, the number of seeds is very small, so the result is not reliable.)

Improving Bayesian sets We propose a more sophisticated way to use Bayesian Sets, consisting of two steps: (1) Feature identification: a set of features to represent each entity is designed. (2) Data generation: (a) multiple feature vectors per entity, (b) feature reweighting, (c) candidate entity ranking.

How to get candidate entities We use part-of-speech (POS) tags NNP, NNPS, and CD as entity indicators: a phrase (possibly one word) consisting of a sequence of NNP, NNPS, and CD POS tags is one candidate entity (e.g., "Nokia/NNP N97/CD" forms the single entity "Nokia N97").
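The grouping rule above amounts to taking maximal runs of NNP/NNPS/CD tokens as one candidate. A sketch, assuming the input has already been POS-tagged by any tagger (the example tokens are hand-tagged):

```python
# Candidate extraction from POS-tagged text: maximal runs of NNP/NNPS/CD
# tokens become one candidate entity.
ENTITY_TAGS = {"NNP", "NNPS", "CD"}

def extract_candidates(tagged):
    candidates, run = [], []
    for word, tag in tagged:
        if tag in ENTITY_TAGS:
            run.append(word)          # extend the current entity run
        elif run:
            candidates.append(" ".join(run))
            run = []
    if run:                           # flush a run that ends the sentence
        candidates.append(" ".join(run))
    return candidates

tagged = [("I", "PRP"), ("bought", "VBD"), ("a", "DT"),
          ("Nokia", "NNP"), ("N97", "CD"), ("yesterday", "NN")]
print(extract_candidates(tagged))     # → ['Nokia N97']
```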

How to identify features Like a typical learning algorithm, one has to design a set of features for learning. Our feature set consists of two subsets: Entity word features (EWF): characterize the words of the entities themselves; this set of features is completely domain independent (e.g., "Sony", "IBM"). Surrounding word features (SWF): the surrounding words of a candidate entity (e.g., "I bought the Sony tv yesterday").

Data generation For each candidate entity, several feature vectors are generated, which causes problems of feature sparseness and entity ranking. We propose two techniques to deal with these problems: (1) feature reweighting and (2) candidate entity ranking.

Feature reweighting Recall the feature weight from Bayesian Sets: w_j = log α̃_j − log α_j − log β̃_j + log β_j, with α_j = k·m_j and β_j = k·(1 − m_j). N is the number of items in the seed set, q_ij is feature j of seed entity q_i, m_j is the mean of feature j over all possible entities, and k is a scaling factor (we use 1). In order to make a positive contribution to the final score of entity e, w_j must be greater than zero.

Feature reweighting This means that for feature j to be effective (w_j > 0), the seed data mean must be greater than the candidate data mean on feature j. Due to idiosyncrasies of the data, there are many high-quality features whose seed data mean may nonetheless be less than the candidate data mean.

Feature reweighting For example, in the drug data set, "prescribe" is a very good entity feature: "prescribe EN/NNP" (EN represents an entity, NNP its POS tag) strongly suggests that EN is a drug. However, the mean of this feature in the seed set is less than its candidate set mean, which means it is worse than having no feature at all!

Feature reweighting In order to fully utilize all features, we change the original m_j to a rescaled mean m′_j = m_j / t, using a scaling factor t chosen to force all feature weights w_j > 0. The idea is that we intentionally lower the candidate data mean so that all the features found in the seed data can be utilized. To determine t, we require Σ_i q_ij / m′_j to be greater than N for all features j.
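A hedged sketch of this rescaling step. The exact selection rule for t is an assumption reconstructed from the slide: t is the smallest factor making Σ_i q_ij / (m_j / t) > N hold for every feature present in the seed data; all numbers below are toy values.

```python
# Rescale candidate means m_j to m'_j = m_j / t so every feature found in
# the seed data gets a positive weight (assumed condition: sum_j q_ij / m'_j > N).
def rescale_means(seed_counts, means, N, eps=1e-6):
    t = 1.0
    for s, m in zip(seed_counts, means):
        if s > 0:                        # only features present in the seeds
            t = max(t, N * m / s)        # factor needed for this feature
    t += eps                             # make the inequality strict
    return [m / t for m in means], t

means, t = rescale_means(seed_counts=[2, 1], means=[0.5, 0.9], N=3)
print(all(s / m > 3 for s, m in zip([2, 1], means)))   # → True
```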

Identifying high-quality features Two features A and B with the same feature frequency get the same feature weight. But in some cases, for feature A all feature counts may come from only one entity in the seed set, while for feature B the feature counts come from different entities (e.g., "bought" + "Nokia", "Motorola", "SE"). Feature B is a better feature than feature A because it is shared by, or associated with, more entities.

Identifying high-quality features We use a quality factor r_j = h / T for feature j, where h is the number of unique seed entities that have the j-th feature and T is the total number of entities in the seed set.
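A tiny sketch of this quality factor; treating r_j as the fraction h / T follows directly from the definitions above, but how r_j is combined with the weights is not shown here.

```python
# Feature quality r_j = h_j / T: the fraction of seed entities exhibiting
# feature j. Each seed is represented as the set of feature ids it has.
def feature_quality(seed_feature_sets, num_features):
    T = len(seed_feature_sets)
    return [sum(1 for fs in seed_feature_sets if j in fs) / T
            for j in range(num_features)]

# feature 0 appears for all three seeds, feature 1 for only one of them
quality = feature_quality([{0}, {0, 1}, {0}], num_features=2)
print(quality)   # → [1.0, 0.3333333333333333]
```

A feature seen across many different seeds (like feature 0) gets a higher quality score than one concentrated in a single seed.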

Candidate entity ranking Each unique candidate entity may generate multiple feature vectors, and it is highly desirable to rank the correct and frequent entities at the top. For ranking, M_d is the median of all feature vector scores of candidate entity d, n is the candidate entity's frequency, and fs(d), the final score for the candidate entity, combines M_d and n.

Additional technique: enlarging the seed set We enlarge the seed set using some high-precision syntactic coordination patterns (EN is an entity name): EN [or | and] EN; from EN to EN; neither EN nor EN; prefer EN to EN; [such as | especially | including] EN (, EN)* [or | and] EN. E.g., "Nokia and Samsung do not produce smart phones."
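Four of the coordination patterns above can be sketched as regexes. This is a simplification: matching capitalized tokens as entity mentions is an assumption for the demo, whereas real use would match POS-tagged candidate entities; the example sentence is adapted from the slide.

```python
# Regex sketch of seed expansion via coordination patterns: when one side
# of a pattern is a known seed, the other side is likely an entity too.
import re

ENT = r"[A-Z][\w-]*"                     # demo stand-in for an entity mention
PATTERNS = [
    rf"({ENT}) (?:or|and) ({ENT})",
    rf"from ({ENT}) to ({ENT})",
    rf"neither ({ENT}) nor ({ENT})",
    rf"prefer ({ENT}) to ({ENT})",
]

def expand_seeds(text, seeds):
    found = set(seeds)
    for pat in PATTERNS:
        for match in re.finditer(pat, text):
            ents = set(match.groups())
            if ents & found:             # one side is a known seed,
                found |= ents            # so adopt the other side as well
    return found

text = "Nokia and Samsung do not produce smart phones, but I prefer Apple to Nokia."
print(sorted(expand_seeds(text, {"Nokia"})))   # → ['Apple', 'Nokia', 'Samsung']
```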

Additional technique: bootstrapping Bayesian Sets This strategy again tries to find more seeds, but using Bayesian Sets itself: run the algorithm iteratively, and at the end of each iteration add the top k ranked entities to the seed set (k = 5 in our experiments). The iteration ends when no new entity is added to the current seed list.
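The bootstrapping loop itself is independent of the ranker. In the sketch below the scoring function is a stand-in (simple context-feature overlap), not Bayesian Sets, and all entity names and features are made up.

```python
# Bootstrapping loop: repeatedly rank candidates against the current seed
# set, promote the top k, and stop when nothing new is added.
def bootstrap(seeds, candidates, features, rank, k=5, max_iters=10):
    seeds = set(seeds)
    for _ in range(max_iters):
        pool = [c for c in candidates if c not in seeds]
        top = rank(seeds, pool, features)[:k]   # k best-ranked candidates
        if not top:
            break                               # no new entity: stop
        seeds |= set(top)
    return seeds

def overlap_rank(seeds, pool, features):
    """Stand-in ranker: score = number of features shared with the seed set."""
    seed_feats = set().union(*(features[s] for s in seeds))
    scored = [(len(features[c] & seed_feats), c) for c in pool]
    return [c for score, c in sorted(scored, reverse=True) if score > 0]

features = {"nokia": {"bought", "phone"}, "sony": {"phone", "great"},
            "canon": {"camera", "great"}, "paris": {"visited"}}
result = bootstrap({"nokia"}, ["sony", "canon", "paris"], features,
                   overlap_rank, k=1)
print(sorted(result))   # → ['canon', 'nokia', 'sony']
```

With k = 1, "sony" is adopted first (shares "phone"), which then lets "canon" in via "great"; "paris" never shares a feature and is correctly excluded.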

The whole algorithm

Experiment results

Similar web-based systems Google Sets Boo! Wa!

Experiment results

Thank you