SENTIMENT ANALYSIS OF ARABIC SOCIAL NETWORKS. PRESENTED BY ESHRAG REFAEE. SUPERVISORS: DR. VERENA RIESER & PROF. ROB POOLEY.


1 SENTIMENT ANALYSIS OF ARABIC SOCIAL NETWORKS. PRESENTED BY ESHRAG REFAEE. SUPERVISORS: DR. VERENA RIESER & PROF. ROB POOLEY

2 OUTLINE The concept of sentiment analysis; Arabic as a morphologically rich language; Aims of the research; Sentiment analysis in English and Arabic literature; Twitter corpus: collection and annotation; Empirical work; Results and evaluation; Future work

3 SENTIMENT ANALYSIS Definition: Analysing and understanding people’s sentiments, evaluations, opinions, attitudes, and emotions from written text. Research on sentiment analysis (SA) appeared in the early 2000s (Liu, 2012). SA is one of the most active research areas in NLP.

4 APPLICATIONS In addition to its significance as a major sub-field of Natural Language Processing (NLP) research, subjectivity and sentiment analysis (SSA) has several potential applications: • Commercial applications, e.g. measuring the success of a product • Social applications • Political applications • Economic applications

5 SENTIMENT ANALYSIS OF SOCIAL NETWORKS The growing importance of sentiment analysis coincides with the growth of social media such as reviews, forum discussions, and micro-blogs. A social network like Twitter, with more than 500 million active users (ALEXA, 2012), provides a global arena for users to share views, attitudes, and preferences, and to discuss points of agreement and/or conflict. In March 2012, Twitter became available in Arabic (Twitter Blog, 2012).

6 ABOUT ARABIC Arabic is the language of an aggregate population of over 300 million people. It is the first language of the 22 member countries of the Arab League and an official language in three others (Habash, 2010).

7 ABOUT ARABIC The Arabic language can be classified into three major levels: • Classical Arabic (CA) • Modern Standard Arabic (MSA) • Arabic Dialects (AD). Social networks use AD and MSA side by side (Al-Sabbagh and Girju, 2012).

8 AIMS Address the bottleneck in the availability of NLP resources for studying SA of the Arabic micro-blog genre by constructing a corpus of Arabic tweets, a subset of which is annotated for sentiment analysis. Use the corpus to build and test models of sentiment analysis. Employ freely available Arabic NLP tools to annotate language-specific features, including part-of-speech tags and morphological analysis. Evaluate the quality of these features by measuring their contribution to the SA classification task.

9 AIMS OF THIS RESEARCH Construct a corpus of Arabic tweets for sentiment analysis. Build and test classification models for automatic sentiment analysis. Explore distant supervision approaches to build efficient models for the changing Twitter stream.

10 SENTIMENT ANALYSIS OF ENGLISH TEXT
Columns: Publication; Feature sets; Classification schemes; Results; Targeted language.
Feature-set columns (the per-publication marks are not preserved in this transcript): word tokens, semantic feat., stylistic feat., n-grams, morph, unique, domain, POS, user: PER/ORG, statistical feat.
• Yu, H., & Hatzivassiloglou, V. (2003): NB; Acc. 91; English (newswire articles, question-answering).
• Abbasi et al. (2008): SVM, 10-fold CV, 2-stage classification; best Acc. 91.70; English and Arabic forums, movie reviews.
• Osherenko (2008): SVM; precision 44%, recall 42%; English (759 sentences).
• Wilson et al. (2009): BoosTexter, TiMBL, Ripper, SVM; (1) perfect neutral classification (manual): BL 78.7, SVM 81.6; (2) automatic neutral detection: SVM 64; neutral-polar: SVM 75.3; English (question-answering opinion corpus).
• Bifet and Frank (2010): multinomial NB, SGD; best Acc. 86.11 (NB), 86.26 (SGD); English tweets (automatic annotation using emoticons).
• Pak and Paroubek (2010): NB, SVM; 60% F; English tweets.
• Purver and Battersby (2012): SVM, 10-fold CV; six-class emotion detection, 77.5% F for happiness on manual test set; English tweets, distant learning (automatic annotation using emoticons), noisy labels.

11 SENTIMENT ANALYSIS OF ARABIC TEXT
Columns: Publication; Feature sets; Classification schemes; Results; Targeted language.
Feature-set columns (the per-publication marks are not preserved in this transcript): word tokens, semantic feat., stylistic feat., n-grams, morph, unique, domain, POS, user: PER/ORG, statistical feat.
• Abbasi et al. (2008): SVM, 10-fold CV, 2-stage classification; best Acc. 91.70; English and Arabic forums, movie reviews.
• Farra et al. (2010): SVM, J48, 10-fold CV; Acc. grammatical 89.3 / semantic 80; Arabic movie reviews (44).
• Abdul-Mageed et al. (2011): SVM, 2-stage classification (-neutral), manual polarity MSA lexicon; stem+morph+ADJ 90.93 F, 5-fold CV 95.52 F (with the best config.); Modern Standard Arabic.
• El-Halees (2011): max entropy, k-nearest, NB, SVM; best Acc. 84.34; Arabic forum posts (1143).
• Itani et al. (2012): Naïve Bayes; best Acc. 85.6; Arabic (Facebook posts).
• Mourad and Darwish (2013): NB and SVM, 2-stage (sentiment: only positive vs. negative), 10-fold CV; best Acc. on tweets: SUBJ 64.1, SENTI 72.5; Arabic tweets (2,300, manual annotation).

12 APPROACH AND METHODOLOGY Arabic Twitter corpus: build and annotate a Twitter corpus for SSA. Machine learning algorithm: apply a machine learning scheme: Support Vector Machines (SVM), Naïve Bayes (NB), Decision Tree (J48). Build a sentiment classifier: learn a statistical classifier that assigns a given text to (1) subjective vs. objective, then (2) subjective positive vs. subjective negative. Evaluate and test how well the models generalise: 10-fold cross-validation and an independent test set.
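The two-level scheme on this slide (subjective vs. objective first, then positive vs. negative) can be sketched as a simple cascade. This is only an illustration: the function names, the cue-word sets, and the keyword-based stand-in classifiers are hypothetical and are not the trained SVM, NB, or J48 models used in the work.

```python
# Hypothetical cue words used only to make the sketch runnable;
# the real system learns its decisions from annotated tweets.
POSITIVE_CUES = {"great", "beautiful"}
NEGATIVE_CUES = {"unfortunately", "bad"}


def toy_is_subjective(text):
    """Stage 1 stand-in: subjective if any polarity cue word occurs."""
    tokens = set(text.lower().split())
    return bool(tokens & (POSITIVE_CUES | NEGATIVE_CUES))


def toy_polarity(text):
    """Stage 2 stand-in: positive if a positive cue occurs, else negative."""
    tokens = set(text.lower().split())
    return "positive" if tokens & POSITIVE_CUES else "negative"


def two_stage_classify(text, is_subjective, polarity):
    """Run stage 2 only on texts that stage 1 marks as subjective."""
    if not is_subjective(text):
        return "objective"
    return polarity(text)


if __name__ == "__main__":
    for tweet in ["How great you are",
                  "Unfortunately we use the iPhone",
                  "A new reported death case in China"]:
        print(tweet, "->", two_stage_classify(tweet, toy_is_subjective, toy_polarity))
```

Passing the two stages in as functions keeps the cascade independent of the learner, so either stage could be swapped for a trained model.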

13 BUILDING TRAINING SET 1: DEFINING THE ANNOTATION SCHEME
Columns: Label; Definition; Example.
• Polar: Positive or negative emotion, evaluation, or attitude. Example: السياحة في اليمن جمال لا يصدق (Tourism in Yemen, unbelievable beauty)
• Positive: Clear positive indicator. Example: كم انت عظيم يا بشار الاسد (How great you are, Bashar Al-Asad)
• Negative: Clear negative indicator. Example: حنا للأسف نستخدم ايفون (Unfortunately, we use the iPhone)
• Neutral: Simple factual statement/news. Example: وفاة جديدة بإتش 7 إن 9 بالصين (A new reported death case with H7N9 in China)
• Neutral: Open questions with no emotions indicated. Example: كيف انقطعت الإنترنت عن سوريا؟ (How was the Internet disconnected from Syria?)
• Neutral: Undeterminable indicators / neither positive nor negative. Example: لمساواة في قمع الحريات الشخصية عدل (Equality in suppressing personal freedoms is justice)

14 BUILDING TRAINING SET 2: AGREEMENT STUDY We conducted an inter-annotator agreement study on a subset of 677 of the annotated tweets. We use Cohen’s Kappa (Cohen, 1960), which measures the degree of agreement among the assigned labels, correcting for agreement by chance: $\kappa = \frac{\Pr(a) - \Pr(e)}{1 - \Pr(e)}$, where Pr(a) is the observed agreement among annotators, and Pr(e) is the probability of agreement by chance among annotators. The overall observed agreement is 84.79% and the resulting weighted Kappa reached 0.756, which indicates reliable annotations.
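As a rough illustration of the agreement measure described above, the following sketch computes plain (unweighted) Cohen's Kappa for two annotators from Pr(a) and Pr(e). The slide reports a weighted Kappa, which this sketch does not reproduce; the function name and the toy labels are assumptions made for the example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Unweighted Cohen's Kappa for two annotators' label sequences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Pr(a): share of items on which both annotators chose the same label.
    pr_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Pr(e): chance agreement from each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    pr_e = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if pr_e == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (pr_a - pr_e) / (1 - pr_e)


if __name__ == "__main__":
    annotator_1 = ["pos", "pos", "neg", "neg"]
    annotator_2 = ["pos", "neg", "neg", "neg"]
    print(cohen_kappa(annotator_1, annotator_2))
```

In this toy case the annotators agree on 3 of 4 items (Pr(a) = 0.75) and chance agreement is 0.5, so Kappa is 0.5.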

15 OUR ARABIC TWITTER CORPUS • Refaee, E. and Rieser, V. (2014). An Arabic Twitter Corpus for Subjectivity and Sentiment Analysis. Proceedings of the 9th International Conference on Language Resources and Evaluation (LREC 2014), Reykjavik, Iceland. • Corpus freely available from the LREC repository.

16 APPROACH AND METHODOLOGY Arabic Twitter corpus: build and annotate a Twitter corpus for SSA. Machine learning algorithm: apply a machine learning scheme: Support Vector Machines (SVM), Naïve Bayes (NB), Decision Tree (J48). Build a sentiment classifier: learn a statistical classifier that assigns a given text to (1) subjective vs. objective, then (2) subjective positive vs. subjective negative. Evaluate and test how well the models generalise: 10-fold cross-validation and an independent test set.

17 BUILDING TRAINING SET: FEATURE EXTRACTION & FEATURE VECTOR CONSTRUCTION Pipeline stages, in slide order: raw tweets; an Arabic Twitter corpus; text clean-up; sentiment annotation; feature extraction; pre-processing: build feature vector; classifier/learner; class of a new document.
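The clean-up and feature-vector stages listed above could look roughly like the sketch below. It assumes a binary word-presence representation, whitespace tokenisation, and removal of URLs and @-mentions; none of these choices is stated on the slide, and the function names are hypothetical.

```python
import re

# Assumption: clean-up removes URLs and @-mentions before tokenising.
NOISE = re.compile(r"https?://\S+|@\w+")


def clean_and_tokenise(text):
    """Strip URLs and mentions, then split on whitespace."""
    return NOISE.sub(" ", text).split()


def build_vocabulary(tokenised_tweets):
    """Map every distinct token to a fixed column index."""
    vocab = sorted({tok for tweet in tokenised_tweets for tok in tweet})
    return {tok: i for i, tok in enumerate(vocab)}


def to_feature_vector(tokens, vocab_index):
    """Binary presence vector; tokens outside the vocabulary are ignored."""
    vector = [0] * len(vocab_index)
    for tok in tokens:
        column = vocab_index.get(tok)
        if column is not None:
            vector[column] = 1
    return vector


if __name__ == "__main__":
    training = [clean_and_tokenise(t) for t in
                ["@user great trip http://t.co/x", "bad trip"]]
    vocab_index = build_vocabulary(training)
    print(vocab_index)
    print(to_feature_vector(clean_and_tokenise("great great day"), vocab_index))
```

The same vocabulary built from the training tweets is reused for new documents, so every vector has the same length and column meaning.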

18 EXPERIMENTAL SETTINGS a. Machine learners We use the implementations of the following algorithms provided by the WEKA data mining package, version 3.7.9 (Witten and Frank, 2005). • Naïve Bayes (NB) • Trees (J48) NB is a simple probabilistic classifier that assumes the features are independent. J48 (WEKA's implementation of the C4.5 algorithm) generates a decision tree that is used for classification.
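To make the independence assumption concrete, here is a minimal multinomial Naïve Bayes sketch with add-one smoothing. It is not WEKA's NaiveBayes implementation; the function names and the English toy tokens are assumptions chosen for readability.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect per-class document counts and per-class token counts."""
    class_counts = Counter(labels)
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab


def predict_nb(model, tokens):
    """Pick the class with the highest log prior plus summed token log-likelihoods."""
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # unseen tokens carry no evidence here
                # Independence assumption: each token contributes separately.
                score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    docs = [["great", "beauty"], ["great", "trip"],
            ["bad", "service"], ["unfortunately", "bad"]]
    labels = ["pos", "pos", "neg", "neg"]
    model = train_nb(docs, labels)
    print(predict_nb(model, ["great"]), predict_nb(model, ["bad", "service"]))
```

Summing log-probabilities token by token is exactly where the independence assumption enters: no interaction between tokens is modelled.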

19 EXPERIMENTAL SETTINGS a. Machine learners We use the implementations of the following algorithms provided by the WEKA data mining package, version 3.7.9 (Witten and Frank, 2005). • Support Vector Machines (SVM), trained with Sequential Minimal Optimization, SMO (Platt, 1999) • ZeroR (baseline scheme) An SVM aims to identify the optimal hyperplane that linearly separates the data instances with the maximum margin.
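The ZeroR baseline named above ignores all features and always predicts the most frequent training class. A minimal sketch of that behaviour follows; the class and method names are hypothetical and this is not WEKA's code.

```python
from collections import Counter


class ZeroR:
    """Majority-class baseline: ignores features entirely."""

    def fit(self, labels):
        # Remember the most frequent label in the training data.
        self.majority_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, n_instances):
        # Every instance receives the same majority label.
        return [self.majority_] * n_instances


if __name__ == "__main__":
    train_labels = ["subjective"] * 3 + ["objective"] * 2
    baseline = ZeroR().fit(train_labels)
    predictions = baseline.predict(len(train_labels))
    hits = sum(p == g for p, g in zip(predictions, train_labels))
    print(baseline.majority_, hits / len(train_labels))
```

The accuracy of such a baseline equals the share of the majority class in the evaluated data, which gives a floor that any feature-based model should exceed.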

20 EXPERIMENTAL SETTINGS b. Evaluation metrics The results are evaluated with respect to two statistical measurements. The F-measure (F) is the harmonic mean of precision and recall: $F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$, where precision is the ratio of retrieved instances that are relevant, and recall is the ratio of relevant instances that are retrieved. The accuracy is the percentage of correctly classified instances: $\text{accuracy} = \frac{\text{correctly classified instances}}{\text{all instances}} \times 100$. For all experiments, machine learners were run 100 times for each data set (10 repetitions × 10-fold cross-validation).
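The two metrics defined above can be computed directly from gold and predicted labels, as in the sketch below. The function names and the toy labels are assumptions; accuracy is returned as a fraction rather than a percentage.

```python
def precision_recall_f(gold, predicted, target):
    """Precision, recall, and F-measure for one target class."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == target and p == target for g, p in pairs)
    fp = sum(g != target and p == target for g, p in pairs)
    fn = sum(g == target and p != target for g, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)


def accuracy(gold, predicted):
    """Fraction of instances whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


if __name__ == "__main__":
    gold = ["pos", "pos", "neg", "neg", "pos"]
    pred = ["pos", "neg", "neg", "pos", "pos"]
    print(precision_recall_f(gold, pred, "pos"))
    print(accuracy(gold, pred))
```

In the toy data the "pos" class has two true positives, one false positive, and one false negative, so precision, recall, and F are all 2/3, while accuracy is 3/5.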

21 RESULTS AND EVALUATION 2-level classification: subjective vs. objective
Feature set       Baseline   SVM
Tokens            55.25      94.55
Morph feat.       55.25      95.64
Semantic feat.    55.25      96.02
Stylistic feat.   55.25      96.05

22 RESULTS AND EVALUATION 2-level classification: positive vs. negative
Feature set       Baseline   SVM
Tokens            50.16      88.21
Morph feat.       50.16      89.55
Semantic feat.    50.16      91.69
Stylistic feat.   50.16      92.1

23 RESULTS AND EVALUATION Single-level classification: positive vs. negative vs. neutral
Feature set       Baseline   SVM
Tokens            55.25      92.29
Morph feat.       55.25      92.47
Semantic feat.    55.25      93.22
Stylistic feat.   55.25      93.46

24 CURRENT DIRECTION OF RESEARCH Apply semi-supervised learning to automatically annotate the rest of our Twitter corpus. Investigate distant learning approaches to build a large training set for model optimisation. Build a high-quality polarity lexicon for automatically identifying the overall sentiment orientation of a given text. Explore culture-related features that can detect cultural references in user-generated text.

25 THANKS @eshragR

