
SELC: A Self-Supervised Model for Sentiment Classification. Likun Qiu, Weishi Zhang, Changjian Hu, Kai Zhao. CIKM 2009. Speaker: Yu-Cheng Hsieh.





2 Outline
- Introduction
- Important concepts
- Classification methods: lexicon-based methods, corpus-based methods
- Two phases of the proposed model: Basic SELC Model, SELC Model
- Experiments
- Discussion and error analysis
- Conclusion and future work

3 Introduction
- People evaluate products by expressing their feelings about them. Assigning positive or negative sentiment values to product reviews is referred to as sentiment classification.
- To build a self-supervised model, both lexicon-based and corpus-based methods are used.

4 Features of the proposed model
- Domain independence.
- Exploits the complementarity between lexicon-based and corpus-based methods to improve performance.
- No manually annotated training data is needed for the corpus-based method.

5 Important concepts
- Indirect expression of negative sentiment: conveying a negative feeling with positive words (negation + positive sentiment). Ex: 不"好" [bu-hao] = not "good".
- Indirect expressions of negative sentiment are much more frequent than indirect expressions of positive sentiment (about 6:1).
- In negative documents, 63% of positive words are used to express negative sentiment.

6 Important concepts (Cont.)
- A lexical item is taken as the processing unit.
- A lexical item is a sequence of Chinese characters, excluding punctuation marks and negation words. Ex: … , XXXXX 。 …

7 Classification Methods: Lexicon-based Method
Steps:
1. Given a sentiment vocabulary V, each item in V is assigned a sentiment score, and a set of training data including many reviews.
2. For each review, check whether it contains items in V, then classify the review according to the summation of their sentiment scores.
3. Take the lexical items in the classified reviews as the new vocabulary set.
4. Assign sentiment scores to the items in the new vocabulary set.
5. Using the new V, repeat steps 2 to 5.
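The classification rule in step 2 can be sketched as follows (a minimal sketch; the function and variable names are ours, not from the slides):

```python
# Lexicon-based classification: sum the scores of vocabulary items
# contained in a review and classify by the sign of the total.

def classify_review(review_tokens, vocabulary):
    """vocabulary: dict mapping item -> sentiment score (+1 / -1)."""
    total = sum(vocabulary.get(tok, 0) for tok in review_tokens)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "uncertain"

vocab = {"good": 1, "bad": -1, "excellent": 1}
print(classify_review(["a", "good", "phone"], vocab))     # positive
print(classify_review(["bad", "battery", "bad"], vocab))  # negative
```

Reviews whose total is exactly zero cannot be decided by the lexicon alone, which is one motivation for the Uncertain Set introduced later.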

8 Classification Methods: Lexicon-based Method (Cont.)
Features:
- Unsupervised.
- Domain independence.
- Uses a general sentiment item list.
- Positive classification bias, that is, higher precision on negative reviews.

9 Classification Methods: Corpus-based Method
Steps:
1. Given two sets P and N, representing the positive and negative review sets respectively.
2. A general sentiment dictionary is used as the feature set.
3. Input a set R including many new reviews.
4. According to the features of the reviews in P and N, decide the class of each review in R.

10 Classification Methods: Corpus-based Method (Cont.)
Features:
- Supervised learning achieves higher performance than unsupervised learning.
- Domain dependence.
- Negative classification bias, that is, higher precision on positive reviews.

11 Two phases of the proposed model: Phase 1
Phase 1: Basic SELC Model

12 Two phases of the proposed model: Phase 1 (Cont.)
Initiation Step
- A sentiment vocabulary V, a list of items initialized from a sentiment dictionary.
- Each item in the sentiment dictionary is assigned a sentiment score: positive words get +1, negative words get -1.

13 Two phases of the proposed model: Phase 1 (Cont.)
Step 1: Computation of Review Sentiment Score
- Each review is divided into zones by punctuation marks.
- For each zone, check whether it contains items in V. If so, take those items as effective items.
- Each effective item i of a zone is scored by ItemScore(i) = Length(i) × SentiScore(i) × Neg(i), where Length(i) is the length of the item, SentiScore(i) is the sentiment score of the effective item, and Neg(i) is the negation check coefficient with default value 1 (flipped when the item is preceded by a negation word).
* A longer effective item carries more feeling about the product.

14 Two phases of the proposed model: Phase 1 (Cont.)
Step 1: Computation of Review Sentiment Score (Cont.)
- Sum up the scores of all effective items of a zone to get the ZoneScore.
- If the ZoneScore is greater than zero, the zone is classified as positive; if smaller than zero, negative.
- Sum up all ZoneScores to get the ReviewScore.
- If the ReviewScore is greater than zero, the review is classified as positive; if smaller than zero, negative.
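Step 1 can be sketched as follows, assuming ItemScore(i) = Length(i) × SentiScore(i) × Neg(i) as described above (function names are ours; Neg is modeled as -1 for a negated item):

```python
# Zone- and review-level scoring: each effective item contributes
# length * sentiment score * negation coefficient.

def item_score(item, senti_score, negated=False):
    neg = -1 if negated else 1      # negation check coefficient, default 1
    return len(item) * senti_score * neg

def review_score(zones, vocab):
    """zones: list of token lists (split on punctuation);
    vocab: item -> sentiment score."""
    total = 0
    for zone in zones:
        zone_score = sum(item_score(t, vocab[t]) for t in zone if t in vocab)
        total += zone_score         # positive zone adds, negative subtracts
    return total

vocab = {"好": 1, "差": -1}
zones = [["好"], ["差", "差"]]
print(review_score(zones, vocab))   # 1 - 2 = -1, so the review is negative
```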

15 Two phases of the proposed model: Phase 1 (Cont.)
Step 2: Review Sentiment Classification with Ratio Control
- Cpositive: number of reviews with a positive ReviewScore.
- Cnegative: number of reviews with a negative ReviewScore.
- Ratio control limits how many positively classified reviews are accepted relative to negatively classified ones, countering the positive classification bias of the lexicon-based method.
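The exact control rule did not survive in this transcript; the following is only a plausible sketch of the idea, capping the accepted positive reviews at a fixed ratio of the accepted negative ones and keeping the most confident positives first:

```python
# Hypothetical ratio-control sketch (names and the cap rule are our
# assumptions, not the slide's exact formula).

def ratio_control(scored_reviews, ratio=1.0):
    """scored_reviews: list of (review_id, review_score).
    Returns (positive_ids, negative_ids) under the ratio constraint."""
    pos = sorted([r for r in scored_reviews if r[1] > 0],
                 key=lambda r: -r[1])        # most confident positives first
    neg = [r for r in scored_reviews if r[1] < 0]
    limit = int(ratio * len(neg))            # cap the positive side
    pos = pos[:limit]
    return [r[0] for r in pos], [r[0] for r in neg]

pos_ids, neg_ids = ratio_control([("a", 5), ("b", 2), ("c", 1), ("d", -3)])
print(pos_ids, neg_ids)   # ['a'] ['d']
```

Reviews rejected by the cap ("b" and "c" here) would remain unlabeled rather than feeding the positive bias back into retraining.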

16 Two phases of the proposed model: Phase 1 (Cont.)
Step 3: Iterative Retraining
- Take the lexical items that occur at least twice in the classified reviews as candidate items.
- Fp(i) and Fn(i) stand for the frequency of candidate item i in positive and negative reviews respectively.
- If a candidate item in a positive review is preceded by a negation word, the Fp(i) of that item is reduced by 1, and vice versa.

17 Two phases of the proposed model: Phase 1 (Cont.)
Step 3: Iterative Retraining (Cont.)
- The difference between the two frequencies of each candidate item is measured by Diff(i) = Fp(i) - Fn(i).
- The sentiment score of each item in V is recalculated from Diff(i): positive when Diff(i) > 0, negative when Diff(i) < 0.
- Only candidate items with |Diff(i)| ≥ Threshold are added to V; Threshold = 1.
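The retraining step can be sketched as follows, assuming Diff(i) = Fp(i) - Fn(i) with threshold 1 (the slide's exact formulas did not survive in this transcript; names are ours):

```python
# Rebuild the vocabulary from classified reviews: keep items occurring
# at least twice and score them by the sign of their frequency difference.
from collections import Counter

def retrain_vocab(pos_reviews, neg_reviews, threshold=1):
    fp, fn = Counter(), Counter()
    for r in pos_reviews:
        fp.update(r)                 # Fp(i): frequency in positive reviews
    for r in neg_reviews:
        fn.update(r)                 # Fn(i): frequency in negative reviews
    vocab = {}
    for item in set(fp) | set(fn):
        if fp[item] + fn[item] < 2:  # candidate items occur at least twice
            continue
        diff = fp[item] - fn[item]
        if abs(diff) >= threshold:
            vocab[item] = 1 if diff > 0 else -1
    return vocab

v = retrain_vocab([["great", "great"], ["great"]], [["awful", "awful"]])
print(sorted(v.items()))   # [('awful', -1), ('great', 1)]
```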

18 Two phases of the proposed model: Phase 1 (Cont.)
Step 4: Iteration Control
- The iterative process completes when there is no difference in the classification results between two iterations.
- Once the iterative process completes, go to the Uncertain Set Processing Step.

19 Two phases of the proposed model: Phase 1 (Cont.)
Uncertain Set Processing Step
- Npos: number of positive zones in a review.
- Nneg: number of negative zones in a review.
- Reviews whose zone counts do not give a confident decision are placed into the Uncertain Set, to be handled in Phase 2.

20 Two phases of the proposed model: Phase 2
Phase 2: SELC Model

21 Two phases of the proposed model: Phase 2 (Cont.)
Corpus-based Supervised Method
- SVM is chosen as the machine-learning method.
- A general sentiment dictionary is used as the feature set.
- TF-IDF is used to compute the feature weights.

22 Two phases of the proposed model: Phase 2 (Cont.)
Integration Process
- Designed to process the reviews in the Uncertain Set.
- Deals with the positive-classification bias of the lexicon-based method and the negative-classification bias of the corpus-based method.

23 Experiments: Data and Tools
- 7,779 product reviews written in Chinese.
- The reviews are divided into 10 domains, indexed C1 to C10.
- Each sub-corpus has an equal number of positive and negative reviews.
- The HowNet Sentiment Dictionary is used as the sentiment dictionary (4,566 positive words and 4,370 negative words).
- WEKA 3.4.11 is used to implement SVM.
- The results reported in "Automatic Seed Word Selection for Unsupervised Sentiment Classification of Chinese Text" are taken as the baseline.

24 Experiments (Cont.) [results figure]

25 Experiments (Cont.)
- V1: ratio control is removed.
- V2: a different seed set is used.
- V3: 6 negation words are used ( 不, 不會, 沒有, 沒, 雖然, 雖, 盡管, 缺, 缺乏, 無 ).

26 Experiments (Cont.)
- SELC* Model = Basic SELC Model without Uncertain Set Processing.
- SVM-HowNet: uses 10-fold cross-validation.

27 Experiments (Cont.) [results figure]

28 Experiments (Cont.) [results figure]

29 Experiments (Cont.) [results figure]

30 Discussion and Error Analysis
- Ratio control slows the growth of positive items, thus overcoming the positive classification bias.
- A general sentiment dictionary is used to replace an automatically generated seed set.
- The use of the supervised method improves the overall performance.
- Most errors are caused by ambiguous sentiment. Ex: 優點 "多" [many advantages] vs. 缺點 "多" [many drawbacks]: the same word 多 ("many") is positive in one context and negative in the other.

31 Conclusion and Future Work
- Proposed a novel approach that successfully integrates a corpus-based model with a lexicon-based model.
- Presented several strategies to overcome the positive/negative classification bias through ratio control.
- Many complicated constructions are involved in the indirect expression of negative sentiment. Ex: 實現 ("achieve"): positive word; 避免 ("avoid"): negative word.

32 The End




