
1 Baseline Evaluation: An Empirical Study of the Performance of Machine Learning Algorithms in Short Snippet Sentiment Analysis
Farag Saad (farag.saad@gesis.org)
i-KNOW 2014, Graz, Austria, 18-09-2014

2 Outline
Introduction
Sentiment Analysis
Sentiment Classification
Feature extraction
Feature weighting
Classification algorithms (Binarized Multinomial Naïve Bayes)
Training data
Evaluation
Classification performance comparison between various classifiers
Is feature selection useful for classification?
Does up-weighting adjectives improve classifiers' performance?
Conclusion

3 Introduction
With the emergence of Web 2.0, the Internet has become more user-interactive, and many users generate content daily. These rich opinions are important. However, user-generated content:
lacks organization
contains improper structure
includes long reviews
Therefore, automatically mining user-generated content is very difficult, but a very important task to achieve.

4 Sentiment Analysis
Sentiment analysis (aka opinion mining) attempts to identify the opinion/sentiment that a person may hold towards an object (Bing Liu, 2010).
Our task is to determine, first, whether a piece of text is objective or subjective, e.g.:
"Yesterday I bought a Nikon camera" is an objective text
"The video capability is truly amazing" is a subjective text
Second, to detect the text's polarity: positive or negative sentiment. However:

5 Sentiment Analysis
A piece of text can fall between positive and negative: "With the exception of burst shooting, this camera's performance is excellent"
The sentiment might be expressed explicitly or implicitly, e.g., "poor picture quality" is explicit sentiment, while "The laptop battery lasted for 3 hours" is implicit sentiment
The sentiment is domain-dependent, e.g., "gangster kills a guy in a fight" bears a negative sentiment, while "fight illness with healthy food" bears a positive sentiment

6 Sentiment Classification
Consists of three main steps:
Feature extraction
Unigram features
Data preprocessing: remove stop words but keep the rest
Feature reduction (select only the most useful features)
Feature weighting (see the code sketch below)
Term frequency (TF)
Term frequency & inverse document frequency (TF-IDF)
Term presence (TP)
Part of speech (only adjectives are selected)
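As a concrete illustration of the weighting schemes above, here is a minimal sketch using scikit-learn; the paper does not name its toolkit, so the library choice and the toy sentences are assumptions:

```python
# Sketch: unigram features under the three weighting schemes.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

sentences = ["the video capability is truly amazing",  # toy examples,
             "poor picture quality"]                   # not the real data

# Term frequency (TF): raw unigram counts, with stop words removed.
X_tf = CountVectorizer(stop_words="english").fit_transform(sentences)

# Term presence (TP): binary 0/1 indicators instead of counts.
X_tp = CountVectorizer(stop_words="english", binary=True).fit_transform(sentences)

# TF-IDF: counts re-weighted by inverse document frequency.
X_tfidf = TfidfVectorizer(stop_words="english").fit_transform(sentences)
```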

7 Sentiment Classification Algorithms
Naïve Bayes (NB) with its variations, Support Vector Machine (SVM), and J48
We will focus on describing the best-performing classifier, the Binarized Multinomial NB

8 The Binarized Multinomial NB
Given an unlabeled set of test sentences, where $t_i$ denotes the $i$-th test sentence and $w_t$ denotes a word within it, and given a set of manually annotated training sentences with their sentiment polarities, where $c_j$ denotes a polarity class, classification follows Bayes' theorem:

$$P(c_j \mid t_i) = \frac{P(c_j)\, P(t_i \mid c_j)}{P(t_i)} \qquad (1)$$

Under the multinomial model, with $x_t$ the count of word $w_t$ in $t_i$ and $V$ the vocabulary:

$$P(t_i \mid c_j) = \Big(\sum_{t=1}^{n} x_t\Big)! \; \prod_{t=1}^{|V|} \frac{P(w_t \mid c_j)^{x_t}}{x_t!} \qquad (2)$$

9 The Binarized Multinomial NB
The probability $P(w_t \mid c_j)$, based on a set of documents $D$, is computed as follows:
1. Eliminate duplicates within each document in $D$: for each word $w_t$ in a document $d_j$, only one instance is kept
2. Concatenate all documents resulting from the first step into a single document $d_k$
3. Count the number of occurrences of each $w_t$ in $d_k$
4. Apply Laplace smoothing to avoid zero estimates:

$$P(w_t \mid c_j) = \frac{f(w_t, c_j) + \mu}{f(c_j) + |V|} \qquad (3)$$
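A minimal from-scratch sketch of these steps, assuming whitespace-tokenized sentences; the variable names and the smoothing constant are illustrative, not the paper's settings:

```python
from collections import Counter
import math

def train_bmnb(sentences, labels, mu=1.0):
    """Binarized Multinomial NB training following Eqs. (1)-(3)."""
    vocab = {w for s in sentences for w in s.split()}
    priors, cond = {}, {}
    for c in set(labels):
        # Step 1: binarize -- keep one instance of each word per sentence.
        docs = [set(s.split()) for s, y in zip(sentences, labels) if y == c]
        # Steps 2-3: concatenate and count occurrences, f(w_t, c_j).
        counts = Counter(w for d in docs for w in d)
        total = sum(counts.values())                      # f(c_j)
        priors[c] = len(docs) / len(sentences)            # P(c_j)
        # Step 4: Laplace smoothing, Eq. (3).
        cond[c] = {w: (counts[w] + mu) / (total + len(vocab)) for w in vocab}
    return priors, cond, vocab

def classify(sentence, priors, cond, vocab):
    """Argmax of Eq. (1) in log space; P(t_i) is constant across classes,
    and the multinomial coefficient in Eq. (2) cancels for binary counts."""
    words = set(sentence.split()) & vocab
    return max(priors, key=lambda c: math.log(priors[c])
               + sum(math.log(cond[c][w]) for w in words))
```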

10 The Training/Test Data
Data collection: Blitzer et al. (2007)
The 12-domain dataset is binary-annotated (+/-)
It consists of reviews for different product types such as books, apparel, health and personal care, magazines, etc.
The selected products contain an equal number of sentences from positive and negative reviews: each product contains 1,000 positive and 1,000 negative sentences
The total number of annotated sentences across all products is 24,000
J. Blitzer, M. Dredze, and F. Pereira. Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification. ACL 2007.

11 Evaluation
Previous studies are inconsistent with regard to algorithm performance
We designed a binary classification task in which each sentence is a test instance and the target class attribute is either positive or negative
Selected classifier models are trained to predict the class of any unlabeled test instance

12 Evaluation
Three main experiments:
A comparison of the classifiers' performance
The effect of feature selection methods (mainly Information Gain)
Does up-weighting adjectives lead to a classification improvement? (see the sketch after this list)
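A sketch of what up-weighting adjectives could look like, using NLTK's POS tagger; the boost factor and tokenization are assumptions, not the paper's settings:

```python
from collections import Counter
import nltk

nltk.download("punkt", quiet=True)                       # tokenizer model
nltk.download("averaged_perceptron_tagger", quiet=True)  # POS tagger model

def weighted_unigrams(sentence, boost=2.0):
    """Unigram counts where adjective tokens (JJ/JJR/JJS) are boosted."""
    tokens = nltk.word_tokenize(sentence)
    counts = Counter(tokens)
    return {word: counts[word] * (boost if tag.startswith("JJ") else 1.0)
            for word, tag in nltk.pos_tag(tokens)}

print(weighted_unigrams("The video capability is truly amazing"))
# "amazing" gets twice the weight of the other unigrams
```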

13 Evaluation / Cross-Validation
[Diagram: ten runs (Exp. 1-10), each holding out a different fold for testing and training on the rest]
For each dataset: break the data into 10 folds
For each fold: select the current fold as the test set and train the classifier on the remaining 9 folds
Compute the average classification performance over the 10 runs (a sketch of this protocol follows below)
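A sketch of this protocol with the four classifier families, again assuming scikit-learn; `load_domain` is a hypothetical loader, `LinearSVC` stands in for the SVM, and `DecisionTreeClassifier` (CART) stands in for J48 (a C4.5 implementation):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB, MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

sentences, labels = load_domain("apparel")   # hypothetical loader
# Term-presence features; binary counts also binarize the multinomial NB.
X = CountVectorizer(stop_words="english", binary=True).fit_transform(sentences)

classifiers = {
    "NB": BernoulliNB(),
    "MNB/BMNB": MultinomialNB(),
    "SVM": LinearSVC(),
    "J48 stand-in": DecisionTreeClassifier(),
}
for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, labels, cv=10, scoring="f1_macro")
    print(f"{name}: mean F1 over 10 folds = {scores.mean():.3f}")
```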

14 Evaluation
Corpora, followed by F1 per classifier (NB, MNB/BMNB, SVM, J48), each under TF, TP, and TF-IDF weighting:
Apparel: 0.747 0.781 0.685 0.828 0.827 0.811 0.699 0.762 0.815 0.736 0.737 0.734
Baby: 0.706 0.756 0.700 0.797 0.803 0.792 0.655 0.503 0.790 0.693 0.701 0.686
Books: 0.658 0.697 0.767 0.769 0.763 0.588 0.555 0.768 0.636 0.638 0.656
Camera & Photo: 0.698 0.728 0.798 0.665 0.820 0.735 0.727
DVD: 0.682 0.726 0.661 0.773 0.782 0.780 0.597 0.547 0.786 0.672 0.676 0.673
Electronics: 0.708 0.680 0.777 0.779 0.614 0.678 0.670 0.671
Health & personal care: 0.707 0.72 0.810 0.800 0.640 0.642 0.808 0.695
Kitchen & housewares: 0.681 0.739 0.691 0.794 0.690 0.793 0.724 0.717
Magazines: 0.722 0.821 0.822 0.823 0.621 0.562 0.826 0.778
Software: 0.630 0.709 0.731 0.806 0.583 0.719 0.713 0.712
Sport & outdoors: 0.758 0.725 0.801 0.789 0.743
Video: 0.703 0.745 0.688 0.765 0.647 0.714 0.729
Table 1: The polarity classification results using F1-measure for different classifiers applied on 12 test data domains (the best-performing method for each domain is bold and underlined on the original slide).

15 Evaluation
Corpora, followed by F1 per classifier (NB, MNB/BMNB, SVM, J48), each under TF, TP, and TF-IDF weighting:
Apparel: 0.780 0.789 0.746 0.848 0.852 0.839 0.785 0.792 0.826 0.741 0.743
Baby: 0.760 0.771 0.735 0.819 0.818 0.808 0.769 0.764 0.812 0.716 0.718 0.705
Books: 0.703 0.709 0.701 0.766 0.777 0.763 0.693 0.695 0.774 0.667 0.664 0.674
Camera & Photo: 0.744 0.742 0.817 0.828 0.825 0.748 0.739 0.719
DVD: 0.751 0.715 0.783 0.805 0.702 0.725 0.691 0.685 0.704
Electronics: 0.720 0.707 0.813 0.806 0.738 0.683
Health & personal care: 0.757 0.827 0.821 0.824 0.807 0.69
Kitchen & housewares: 0.736 0.759 0.833 0.829 0.728 0.734
Magazines: 0.761 0.843 0.849 0.858 0.755 0.753
Software: 0.670 0.724 0.747 0.815 0.669 0.714 0.713
Sport & outdoors: 0.778 0.800 0.820 0.781 0.809 0.723 0.706
Video: 0.749 0.776 0.801 0.721 0.727 0.730
Table 2: The impact of the Information Gain (IG) feature selection method on the classifiers' performance using F1-measure (the best-performing method for each domain is bold and underlined on the original slide).
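Information Gain is the mutual information between a feature and the class label, so scikit-learn's `mutual_info_classif` can play the IG scorer in a sketch like this; the number of retained features is an assumption:

```python
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Score each unigram feature by mutual information with the class label
# and keep the k highest-scoring features (k = 1000 is illustrative).
selector = SelectKBest(score_func=mutual_info_classif, k=1000)
X_ig = selector.fit_transform(X, labels)   # X, labels as in the CV sketch
```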

16 Evaluation
The average F1 improvement obtained with Information Gain feature selection, per feature weighting method (cf. Tables 1 and 2):
Weighting method: Improvement
TF: 6.10%
TP: 5.62%
TF-IDF: 2.63%
Overall average: 4.7%
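For reference, the overall average is approximately the simple mean of the three per-scheme improvements:

$$\frac{6.10\% + 5.62\% + 2.63\%}{3} = \frac{14.35\%}{3} \approx 4.78\% \approx 4.7\%$$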

17 Conclusion
We conducted a series of comparative experiments to compare the performance of various machine learning classifiers on the sentiment analysis task
We studied the impact of feature selection methods on classification performance
The best classification results were achieved by the BMNB classifier
Using feature selection methods led to a significant increase in all classifiers' performance under the different feature weighting methods

18 Conclusion
Based on the experiments carried out in this paper, our findings raise the possibility that the BMNB classifier performs best in short snippet sentiment analysis
We further support the recent finding of Wang and Manning (2012*): in short snippet sentiment analysis, MNB actually performs better than other classifiers, particularly the SVM classifier
* S. Wang and C. D. Manning. Baselines and bigrams: Simple, good sentiment and topic classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, Volume 2, ACL '12, pages 90-94, 2012.

