Baseline Evaluation: An Empirical Study of the Performance of Machine Learning Algorithms in Short Snippet Sentiment Analysis
Farag Saad, i-KNOW 2014, Graz, Austria
Outline
Introduction
Sentiment Analysis
Sentiment Classification: feature extraction, feature weighting, classification algorithms (Binarized Multinomial Naïve Bayes), training data
Evaluation: classification performance comparison between various classifiers; is feature selection useful for classification?; does up-weighting adjectives improve the classifiers' performance?
Conclusion
Introduction
With the emergence of Web 2.0, the Internet has become more user-interactive and many users generate content daily. These rich opinions are important. However, user-generated content lacks organization, often has improper structure, and reviews are long. Automatically mining user-generated content is therefore a very difficult but very important task.
Sentiment Analysis
Also known as opinion mining, sentiment analysis attempts to identify the opinion/sentiment that a person may hold towards an object (Bing Liu, 2010). Our task is first to determine whether a piece of text is objective or subjective, e.g., "Yesterday I bought a Nikon camera" is an objective text, while "The video capability is truly amazing" is a subjective one. Second, we detect the text's polarity: positive or negative sentiment. However, sentiment analysis faces several challenges.
Sentiment Analysis
A piece of text can fall between positive and negative: "With the exception of burst shooting, this camera's performance is excellent." The sentiment may be expressed explicitly or implicitly, e.g., "poor picture quality" is an explicit sentiment, while "The laptop battery lasted for 3 hours" is an implicit one. The sentiment is also domain dependent, e.g., "gangster kills a guy in a fight" bears a negative sentiment, while "fight illness with healthy food" bears a positive one.
Sentiment Classification
Sentiment classification consists of three main steps: feature extraction, feature weighting, and classification.
Feature extraction: unigram features; data preprocessing (remove stop words, keep the rest); feature reduction (select only the most useful features).
Feature weighting: term frequency (TF); term frequency-inverse document frequency (TF-IDF); term presence (TP); part of speech (only adjectives are selected).
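The extraction and weighting steps above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation; the toy stop-word list and example sentences are our own assumptions:

```python
import math
from collections import Counter

STOP_WORDS = {"the", "a", "is", "and", "of", "to"}  # toy stop-word list (assumption)

def unigrams(sentence):
    """Extract unigram features after removing stop words."""
    return [w for w in sentence.lower().split() if w not in STOP_WORDS]

def term_frequency(tokens):
    """TF: raw count of each term in the document."""
    return dict(Counter(tokens))

def term_presence(tokens):
    """TP: binary indicator, 1 if the term occurs at all."""
    return {w: 1 for w in tokens}

def tf_idf(tokens, corpus):
    """TF-IDF: term frequency scaled by log(N / document frequency)."""
    n_docs = len(corpus)
    weights = {}
    for w, count in Counter(tokens).items():
        df = sum(1 for doc in corpus if w in doc)  # number of docs containing w
        weights[w] = count * math.log(n_docs / df)
    return weights

corpus = [unigrams(s) for s in (
    "the picture quality is amazing",
    "the battery is poor",
    "amazing battery life",
)]
```

Term presence discards how often a word repeats within a snippet, which is often enough for short texts; TF-IDF additionally down-weights words that appear in many documents.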
Sentiment Classification Algorithms
Naïve Bayes (NB) and its variations, Support Vector Machine (SVM), and J48. We will focus on describing the best-performing classifier: the Binarized Multinomial NB.
The Binarized Multinomial NB
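Binarized (Boolean) Multinomial NB is standard multinomial Naive Bayes with per-document word counts clipped to 1, i.e., each word is counted at most once per document. A minimal from-scratch sketch with Laplace smoothing (our own illustration, not the authors' code):

```python
import math
from collections import defaultdict

class BinarizedMultinomialNB:
    """Multinomial NB trained on binarized counts (each word counted once per document)."""

    def fit(self, docs, labels):
        n = len(labels)
        self.classes = sorted(set(labels))
        self.prior = {c: labels.count(c) / n for c in self.classes}
        self.word_count = {c: defaultdict(int) for c in self.classes}
        self.total = {c: 0 for c in self.classes}
        vocab = set()
        for tokens, c in zip(docs, labels):
            for w in set(tokens):          # binarization: count each word once per doc
                self.word_count[c][w] += 1
                self.total[c] += 1
                vocab.add(w)
        self.vocab_size = len(vocab)
        return self

    def predict(self, tokens):
        best_class, best_logprob = None, -math.inf
        for c in self.classes:
            lp = math.log(self.prior[c])
            for w in set(tokens):          # binarize the test document as well
                # Laplace (add-one) smoothing avoids zero probabilities
                lp += math.log((self.word_count[c][w] + 1) /
                               (self.total[c] + self.vocab_size))
            if lp > best_logprob:
                best_class, best_logprob = c, lp
        return best_class
```

Usage on a toy training set:

```python
clf = BinarizedMultinomialNB().fit(
    [["truly", "amazing"], ["poor", "quality"]], ["pos", "neg"])
```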
The Training/Test Data
Data collection (Blitzer et al., 2007): the dataset covers 12 domains, each binary annotated (+/-). It consists of reviews for different product types such as books, apparel, health and personal care, magazines, etc. The selected products contain an equal number of sentences from positive and negative reviews: each product contains 1,000 positive and 1,000 negative sentences, for a total of 24,000 annotated sentences across all products.
J. Blitzer, M. Dredze, and F. Pereira. Biographies, Bollywood, Boom-boxes and Blenders: Domain Adaptation for Sentiment Classification. In Proceedings of ACL 2007.
Evaluation
Previous studies are inconsistent with regard to algorithm performance. We designed a binary classification task in which each sentence is a test instance and the target class attribute is either positive or negative. We train the selected classifier models to predict the class of any unlabeled test instance.
Evaluation
Three main experiments:
1. A comparison of the classifiers' performance.
2. What is the effect of feature selection methods (mainly Information Gain)?
3. Does up-weighting adjectives lead to a classification improvement?
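Information Gain for a binary "word appears in the document" feature is H(class) − H(class | feature); words with the highest IG are kept and the rest are discarded. A minimal sketch (our own illustration of the standard definition):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(docs, labels, word):
    """IG of the binary feature 'word appears in the document'."""
    n = len(labels)
    with_w = [lab for doc, lab in zip(docs, labels) if word in doc]
    without = [lab for doc, lab in zip(docs, labels) if word not in doc]
    # Weighted conditional entropy H(class | feature), skipping empty partitions
    conditional = sum(len(part) / n * entropy(part)
                      for part in (with_w, without) if part)
    return entropy(labels) - conditional
```

A word that perfectly separates the classes has IG equal to the full class entropy; a word distributed evenly across classes has IG near zero.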
Evaluation / Cross-Validation
For each data set, break the data into 10 folds. For each fold: select the current fold as the test set, and train the classifier on the rest (9 folds). Compute the average classification performance over the 10 runs.
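The 10-fold procedure above can be sketched as follows; `train_and_eval` is a hypothetical callback (our assumption) that trains on the 9 training folds and returns a score, such as F1, on the held-out fold:

```python
def cross_validate(docs, labels, train_and_eval, k=10):
    """k-fold cross-validation: each fold serves exactly once as the test set."""
    scores = []
    for i in range(k):
        test_idx = set(range(i, len(docs), k))   # every k-th instance goes to fold i
        train_x = [d for j, d in enumerate(docs) if j not in test_idx]
        train_y = [lab for j, lab in enumerate(labels) if j not in test_idx]
        test_x = [docs[j] for j in sorted(test_idx)]
        test_y = [labels[j] for j in sorted(test_idx)]
        scores.append(train_and_eval(train_x, train_y, test_x, test_y))
    return sum(scores) / k                        # average over the k runs
```

In practice the data would be shuffled (or stratified by class) before assigning folds; the round-robin split here is the simplest deterministic variant.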
Evaluation
Table 1: Polarity classification results (F1-Measure) for the NB, MNB/BMNB, SVM, and J48 classifiers under TF, TP, and TF-IDF feature weighting, applied to 12 test data domains (Apparel, Baby, Books, Camera & Photo, DVD, Electronics, Health & Personal Care, Kitchen & Housewares, Magazines, Software, Sports & Outdoors, Video); the best-performing method for each domain is shown in bold and underlined. (The numeric table values were not preserved in this transcript.)
Evaluation
Table 2: The impact of the feature selection method Information Gain (IG) on the classifiers' performance (F1-Measure), for the same classifiers, weighting schemes, and 12 domains as Table 1; the best-performing method for each domain is shown in bold and underlined. (The numeric table values were not preserved in this transcript.)
Evaluation
Average classifier improvement per weighting method (Table 2, with IG feature selection):
Weighting method | Improvement
TF               | 6.10%
TP               | 5.62%
TF-IDF           | 2.63%
Overall average  | 4.7%
Conclusion
We conducted a series of comparative experiments to compare the performance of various machine learning classifiers on the sentiment analysis task, and studied the impact of feature selection methods on classification performance. The best classification results were obtained with the BMNB classifier. Using feature selection methods led to a significant increase in all classifiers' performance across the different feature weighting methods.
Conclusion
Based on the experiments carried out in this paper, our findings raise the possibility that the BMNB classifier performs best in short snippet sentiment analysis. We further support the recent finding of Wang and Manning (2012*) that, in short snippet sentiment analysis, MNB actually performs better than other classifiers, particularly the SVM classifier.
* S. Wang and C. D. Manning. Baselines and Bigrams: Simple, Good Sentiment and Topic Classification. In Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics - Volume 2, ACL '12, pages 90-94, 2012.