
1 Mining and Summarizing Customer Reviews
Intelligent Database Systems Lab, National Yunlin University of Science and Technology (國立雲林科技大學)
Advisor: Dr. Hsu
Reporter: Chun Kai Chen
Authors: Minqing Hu and Bing Liu (SIGKDD 2004)

2 Outline
• Motivation
• Objective
• Introduction
• Feature-based opinion summarization
• Experimental Evaluation
• Conclusions
• Personal Opinion

3 Motivation
• As e-commerce becomes more popular, the number of customer reviews that a product receives grows rapidly
  - it is difficult for a potential customer to read them all and make an informed decision on whether to purchase the product
  - it is difficult for the manufacturer of the product to keep track of and manage customer opinions

4 Objective
• In this research, we aim to mine and summarize all the customer reviews of a product
  - mine only the features of the product on which customers have expressed opinions, and whether those opinions are positive or negative
  - do not summarize the reviews in the traditional sense of selecting or rewriting sentences

5 Introduction (1/2)
• Given a set of customer reviews of a particular product, the task involves three subtasks:
  - identifying the features of the product on which customers have expressed their opinions (called product features)
  - for each feature, identifying the review sentences that give positive or negative opinions
  - producing a summary using the discovered information

6 Introduction (2/2)
• Our task differs from traditional text summarization [15, 39, 36] in a number of ways
  - First, a summary in our case is structured, rather than another (but shorter) free-text document as produced by most text summarization systems
  - Second, we are only interested in the features of the product and the opinions on them; we do not summarize the reviews by selecting or rewriting a subset of the original sentences to capture their main points, as traditional text summarization does

7 Feature-based opinion summarization
[Figure: overview diagram of the feature-based opinion summarization steps (Sections 3.1 to 3.7). Recoverable labels: example sentence “The pictures are very clear.”; Apriori algorithm (nouns only); compactness pruning; redundancy pruning; adjectives as opinion words; WordNet (synonyms / antonyms) for adjectives; positive orientation (e.g., beautiful, awesome); negative orientation (e.g., disappointing).]

8 Part-of-Speech Tagging (POS)
• Product features are usually nouns or noun phrases in review sentences
  - the NLProcessor linguistic parser [31] is used to parse each review, split the text into sentences, and produce a part-of-speech tag for each word
• A transaction file is then created for the generation of frequent features in the next step
  - each line includes only the identified nouns and noun phrases of one sentence
  - some pre-processing of words is also performed, including removal of stopwords, stemming, and fuzzy matching
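As a rough illustration of the transaction file, the sketch below groups consecutive noun tokens of each sentence into one item. It assumes the sentences are already tagged with Penn Treebank tags; the tagged input is hand-written rather than NLProcessor output, the stopword list is hypothetical, and stemming and fuzzy matching are left out.

```python
# Penn Treebank noun tags. The tagged input is hand-made for illustration.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
STOPWORDS = {"thing", "one"}  # hypothetical, deliberately tiny stopword list

def noun_phrases(tagged_sentence):
    """Group runs of consecutive noun tokens into lowercased phrases."""
    phrases, current = [], []
    for word, tag in tagged_sentence:
        if tag in NOUN_TAGS:
            current.append(word.lower())
        else:
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases

def build_transactions(tagged_sentences):
    """Return one transaction (list of noun phrases) per sentence."""
    return [
        [p for p in noun_phrases(sentence) if p not in STOPWORDS]
        for sentence in tagged_sentences
    ]

tagged = [
    [("The", "DT"), ("pictures", "NNS"), ("are", "VBP"),
     ("very", "RB"), ("clear", "JJ"), (".", ".")],
    [("The", "DT"), ("battery", "NN"), ("life", "NN"),
     ("is", "VBZ"), ("short", "JJ"), (".", ".")],
]
transactions = build_transactions(tagged)
print(transactions)
```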

9 Frequent Features Identification
• Due to the difficulty of natural language understanding, some types of sentences are hard to deal with
  - explicit feature: “The pictures are very clear.”
  - implicit feature (size): “While light, it will not easily fit in pockets.”
  - we focus on finding features that appear explicitly as nouns or noun phrases in the reviews
  - we focus on finding frequent features (finding infrequent features is discussed later)
• We run the association miner CBA [26], which is based on the Apriori algorithm in [1], on the transaction set of nouns/noun phrases produced in the previous step
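The frequent-itemset step can be sketched with a small level-wise (Apriori-style) miner. This is not CBA itself; the function name, the `min_support` threshold, and the `max_size` cap are assumptions chosen for the toy data.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support=0.5, max_size=3):
    """Level-wise search: return {itemset: count} for itemsets meeting min_support."""
    n = len(transactions)
    tsets = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in tsets:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c / n >= min_support}
    result = dict(current)

    size = 2
    while current and size <= max_size:
        items = sorted({i for s in current for i in s})
        # Apriori property: every (size-1)-subset of a candidate must be frequent.
        candidates = [
            frozenset(c) for c in combinations(items, size)
            if all(frozenset(sub) in current for sub in combinations(c, size - 1))
        ]
        current = {}
        for cand in candidates:
            count = sum(1 for t in tsets if cand <= t)
            if count / n >= min_support:
                current[cand] = count
        result.update(current)
        size += 1
    return result

transactions = [
    ["battery", "life"],
    ["battery", "life", "picture"],
    ["picture"],
    ["picture", "zoom"],
]
result = frequent_itemsets(transactions, min_support=0.5)
for itemset, count in sorted(result.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), count)
```

Note that the itemsets are unordered, which is why the pruning steps that follow are needed.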

10 Frequent Features Identification: Feature Pruning
• Compactness pruning
  - applies to candidate features that contain at least two words
  - association mining does not consider the position of the items in a sentence
  - aims to prune those candidates whose words do not appear together in a specific order
• Redundancy pruning
  - applies to candidate features that contain a single word
  - uses p-support (pure support) to identify redundant features
  - for instance, "life" by itself is not a useful feature, while "battery life" is a meaningful feature phrase
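A minimal sketch of both pruning steps follows. The slide does not give the exact definitions, so several things here are assumptions: a phrase counts as compact in a sentence when its words occur in order with at most `max_gap` positions between neighbours, a phrase is kept when it is compact in at least `min_compact` sentences, and p-support counts the sentences that contain a word but no kept phrase that includes it. All thresholds and function names are illustrative.

```python
def is_compact(phrase, words, max_gap=3):
    """True if the phrase words occur in order, neighbours at most max_gap apart."""
    def match(k, pos):
        if k == len(phrase):
            return True
        for j in range(pos + 1, min(pos + max_gap + 1, len(words))):
            if words[j] == phrase[k] and match(k + 1, j):
                return True
        return False
    return any(w == phrase[0] and match(1, i) for i, w in enumerate(words))

def compactness_prune(phrases, sentences, min_compact=2):
    """Keep multi-word candidates that are compact in enough sentences."""
    return [
        p for p in phrases
        if sum(is_compact(p, s) for s in sentences) >= min_compact
    ]

def p_support(word, phrases, sentences):
    """Count sentences containing the word but no kept phrase that includes it."""
    supersets = [p for p in phrases if word in p]
    return sum(
        1 for s in sentences
        if word in s and not any(is_compact(p, s) for p in supersets)
    )

def redundancy_prune(words, phrases, sentences, min_p_support=3):
    """Drop single-word candidates whose pure support is too low."""
    return [w for w in words if p_support(w, phrases, sentences) >= min_p_support]

sentences = [s.split() for s in [
    "the battery life is short",
    "battery life could be better",
    "picture quality changed my life",
    "the picture is clear",
    "great picture",
    "picture is sharp",
]]
phrases = compactness_prune([("battery", "life"), ("life", "picture")], sentences)
features = redundancy_prune(["life", "picture"], phrases, sentences)
print(phrases, features)
```

On this toy data "life" is pruned because it almost always appears inside "battery life", while "picture" survives on its own.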

11 Opinion Words Extraction
• We now identify opinion words
  - words that people use to express a positive or negative opinion
  - they are primarily used to express subjective opinions
  - this paper uses adjectives as opinion words
  - for example, in “The strap is horrible and gets in the way of parts of the camera you need access to.”, "horrible" is the effective opinion of "strap"
  - effective opinions are useful when predicting the orientation of opinion sentences
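The extraction step might look like the sketch below. It assumes pre-tagged sentences, single-word features, and that the effective opinion of a feature is simply the adjective closest to it by token distance; the function name and the hand-written tags are illustrative.

```python
ADJ_TAGS = {"JJ", "JJR", "JJS"}  # Penn Treebank adjective tags

def extract_opinions(tagged_sentence, features):
    """Return (all adjectives, {feature: nearest adjective}) for one sentence.

    Sentences that mention none of the features yield no opinion words.
    """
    words = [w.lower() for w, _ in tagged_sentence]
    feature_pos = [i for i, w in enumerate(words) if w in features]
    if not feature_pos:
        return [], {}
    adj_pos = [i for i, (_, tag) in enumerate(tagged_sentence) if tag in ADJ_TAGS]
    opinion_words = [words[i] for i in adj_pos]
    effective = {}
    for f in feature_pos:
        if adj_pos:
            nearest = min(adj_pos, key=lambda i: abs(i - f))
            effective[words[f]] = words[nearest]
    return opinion_words, effective

# Opening of the slide's example sentence, tagged by hand.
sentence = [
    ("The", "DT"), ("strap", "NN"), ("is", "VBZ"), ("horrible", "JJ"),
    ("and", "CC"), ("gets", "VBZ"), ("in", "IN"), ("the", "DT"), ("way", "NN"),
]
opinion_words, effective = extract_opinions(sentence, {"strap"})
print(opinion_words, effective)
```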

12 Orientation Identification for Opinion Words (1/2)
• For each opinion word
  - identify its semantic orientation (by training)
  - the result is used to predict the semantic orientation of each opinion sentence
• Words may encode an orientation state
  - a positive orientation (e.g., beautiful, awesome)
  - a negative orientation (e.g., disappointing)
  - no orientation (e.g., external, digital) [17]
• In this work
  - we are interested in only positive and negative orientations

13 Orientation Identification for Opinion Words (2/2)
• Unfortunately
  - dictionaries and similar sources do not include semantic orientation information for each word
• In this research
  - the adjective synonym sets and antonym sets in WordNet [29] are used to predict the semantic orientations of adjectives
  - adjectives that WordNet cannot recognize are discarded, as they may not be valid words
  - adjectives whose orientations cannot be found are also removed from the opinion word list
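One way to picture the synonym/antonym idea is a propagation from a few seed adjectives with known orientation. In the sketch below, two small hand-made dictionaries stand in for the WordNet synonym and antonym sets, and the seed list, function name, and +1/-1 encoding are assumptions.

```python
def predict_orientations(adjectives, seeds, synonyms, antonyms):
    """Propagate +1/-1 orientations from seed words.

    A synonym of a known word shares its orientation; an antonym gets the
    opposite one. Returns (known orientations, words left unresolved).
    """
    known = dict(seeds)
    pending = [a for a in adjectives if a not in known]
    changed = True
    while changed and pending:
        changed = False
        for word in list(pending):
            orientation = None
            for syn in synonyms.get(word, ()):
                if syn in known:
                    orientation = known[syn]
                    break
            if orientation is None:
                for ant in antonyms.get(word, ()):
                    if ant in known:
                        orientation = -known[ant]
                        break
            if orientation is not None:
                known[word] = orientation
                pending.remove(word)
                changed = True
    return known, pending

# Toy lexical relations standing in for WordNet lookups.
seeds = {"good": 1, "bad": -1}
synonyms = {"great": ["good"], "superb": ["great"]}
antonyms = {"poor": ["good"]}

known, unresolved = predict_orientations(
    ["superb", "great", "poor", "digital"], seeds, synonyms, antonyms
)
print(known, unresolved)
```

Here "superb" is resolved only on the second pass, once "great" has become known, and "digital" stays unresolved, so it would be dropped from the opinion word list.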

14 Infrequent Feature Identification
• There are some features that only a small number of people talk about
  - association mining is unable to identify such features
• How to extract these infrequent features
  - use the noun/noun phrase nearest to an opinion word
  - drawback: this may also find nouns/noun phrases that are irrelevant to the given product
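The nearest-noun heuristic can be sketched as follows, assuming pre-tagged sentences and that only sentences with an opinion word but no frequent feature are examined. The example sentences, tags, and function name are made up for illustration.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # Penn Treebank noun tags

def infrequent_features(tagged_sentences, frequent_features, opinion_words):
    """For sentences with an opinion word but no frequent feature,
    collect the noun nearest to each opinion word."""
    found = set()
    for sentence in tagged_sentences:
        words = [w.lower() for w, _ in sentence]
        if any(w in frequent_features for w in words):
            continue
        noun_pos = [i for i, (_, tag) in enumerate(sentence) if tag in NOUN_TAGS]
        if not noun_pos:
            continue
        for i, w in enumerate(words):
            if w in opinion_words:
                nearest = min(noun_pos, key=lambda j: abs(j - i))
                found.add(words[nearest])
    return found

tagged = [
    [("The", "DT"), ("picture", "NN"), ("is", "VBZ"), ("amazing", "JJ")],
    [("The", "DT"), ("software", "NN"), ("is", "VBZ"), ("amazing", "JJ")],
]
extra = infrequent_features(tagged, {"picture"}, {"amazing"})
print(extra)
```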

15 Predicting the Orientations of Opinion Sentences
• In general
  - we use the dominant orientation of the opinion words in the sentence to determine the orientation of the sentence
• A sentence with the same number of positive and negative opinion words needs separate handling
• Cases considered
  - case 1: the user likes or dislikes most or all of the features in one sentence, e.g. “overall this is a good camera with a really good picture clarity & an exceptional close-up shooting capability.”
  - case 2: the user likes or dislikes most of the features in one sentence, but there is an equal number of positive and negative opinion words, e.g. “the auto and manual along with movie modes are very easy to use, but the software is not intuitive.”
  - case 3: all the other cases
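The dominant-orientation rule amounts to summing word orientations. The sketch below also includes a tie-break that the slide does not spell out: falling back first to the effective opinions and then to the orientation of the previous sentence is an assumption made for illustration, as are the toy lexicon and the function name. Negation handling is omitted.

```python
def sentence_orientation(opinion_words, orientations, effective=(), previous=0):
    """Return +1, -1, or 0 for a sentence.

    The sign of the summed word orientations decides. On a tie, the
    effective opinions are tried next, then the previous sentence's
    orientation (both fallbacks are assumptions, not from the slide).
    """
    score = sum(orientations.get(w, 0) for w in opinion_words)
    if score == 0:
        score = sum(orientations.get(w, 0) for w in effective)
    if score == 0:
        return previous
    return 1 if score > 0 else -1

# Toy orientation lexicon.
orientations = {"good": 1, "exceptional": 1, "easy": 1, "clumsy": -1}

clear_case = sentence_orientation(["good", "good", "exceptional"], orientations)
tie_effective = sentence_orientation(["easy", "clumsy"], orientations, effective=["clumsy"])
tie_previous = sentence_orientation(["easy", "clumsy"], orientations, previous=1)
print(clear_case, tie_effective, tie_previous)
```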

16 Summary Generation
• After all the previous steps, we are ready to generate the final feature-based review summary
  - for each feature, a count is computed to show how many reviews give positive/negative opinions on it
  - the features are ranked according to their frequency
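A minimal sketch of the summary step, assuming each opinion sentence has already been reduced to a (review id, feature, orientation) triple. Using the number of opinionated reviews as the ranking frequency is an assumption; the function name and data are illustrative.

```python
from collections import defaultdict

def summarize(opinion_sentences):
    """Build [(feature, positive_reviews, negative_reviews)], most frequent first.

    opinion_sentences: iterable of (review_id, feature, orientation),
    where orientation is +1 or -1. Reviews are counted once per feature
    and orientation, even if several sentences agree.
    """
    reviews = defaultdict(lambda: {1: set(), -1: set()})
    for review_id, feature, orientation in opinion_sentences:
        reviews[feature][orientation].add(review_id)
    summary = [(f, len(r[1]), len(r[-1])) for f, r in reviews.items()]
    # Rank by total number of opinionated reviews, then alphabetically.
    summary.sort(key=lambda row: (-(row[1] + row[2]), row[0]))
    return summary

opinion_sentences = [
    (1, "picture", 1),
    (1, "picture", 1),       # second positive sentence in the same review
    (2, "picture", 1),
    (3, "picture", -1),
    (1, "battery life", -1),
]
summary = summarize(opinion_sentences)
for feature, pos, neg in summary:
    print(f"{feature}: positive={pos} negative={neg}")
```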

17 Experimental Evaluation
• We now evaluate FBS (Feature-Based Summarization) from three perspectives
  - the effectiveness of feature extraction
  - the effectiveness of opinion sentence extraction
  - the accuracy of orientation prediction for opinion sentences
• We conducted experiments on the customer reviews of five electronics products
  - 2 digital cameras, 1 DVD player, 1 mp3 player, and 1 cellular phone
• The reviews were collected from two websites
  - Amazon.com and C|net.com

18 Experimental Evaluation: Discovered Features
• To evaluate the discovered features
  - a human tagger manually read all the reviews and produced a manual feature list for each product
  - Columns 3-8 clearly demonstrate the effectiveness of the two pruning techniques
  - Columns 9 and 10 give the results after infrequent feature identification is done; recall improves dramatically
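Comparing the discovered features with the manual feature list comes down to precision and recall. The sketch below uses made-up feature lists, not the paper's data.

```python
def precision_recall(discovered, manual):
    """Precision and recall of discovered features against a manual list."""
    discovered, manual = set(discovered), set(manual)
    hits = len(discovered & manual)
    precision = hits / len(discovered) if discovered else 0.0
    recall = hits / len(manual) if manual else 0.0
    return precision, recall

# Hypothetical lists for one product.
discovered = ["picture", "battery life", "life", "zoom"]
manual = ["picture", "battery life", "zoom", "flash", "size"]

precision, recall = precision_recall(discovered, manual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Pruning a spurious candidate such as "life" raises precision, while adding correct infrequent features raises recall.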


20 Conclusion
• In this paper
  - the authors proposed a set of techniques for mining and summarizing product reviews, based on data mining and natural language processing methods
• The objective
  - to provide a feature-based summary of a large number of customer reviews of a product sold online
• Experimental results
  - indicate that the proposed techniques are very promising in performing their tasks

21 Personal Opinion
• Strength
  - proposes a new, valid method of mining customer reviews
• Weakness
  - features must be explicitly mentioned
  - opinion words must be adjectives
• Application
• Future Work

