14/12/2009 ICON
Dipankar Das and Sivaji Bandyopadhyay
Department of Computer Science & Engineering, Jadavpur University, Kolkata, India
ICON 2009
Emotion Tagging – A Comparative Study on Bengali and English Blogs
Outline
Motivation
Resources
Word Level Tagging
- Baseline Model
- Morphology
- CRF based Model
Sentence Level Tagging
Evaluation
Conclusion
Motivation (1/3)
In psychology and common use, emotion is an aspect of a person's mental state of being, normally based in or tied to the person’s internal (physical) and external (social) sensory feeling (Zhang et al., 2008).
Motivation (2/3)
Natural Language Processing (NLP) tasks
- Tracking users’ emotion (products, events, politics)
- Customer relationship management
- Question Answering (QA) systems
- Modern Information Retrieval (IR) systems
Motivation (3/3)
Blogs
- Communicative and informative repository of text-based emotional content in Web 2.0 (Lin et al., 2007)
- Online diary of the bloggers
- Blog posts annotated by other bloggers
- Large data suitable for machine learning
Recognition of emotion from written text
Resources (1/4)
Bengali blog
- Web blog archive ( different comic related topics and user comments sentences
English blog
- Saima Aman and Stan Szpakowicz. Identifying Expressions of Emotion in Text. V. Matoušek and P. Mautner (Eds.): TSD 2007, LNAI 4629, pp. 196– sentences
Resources (2/4)
English Sentiment Lexicon
- SentiWordNet (Esuli et al., 2006)
- WordNet Affect lists (WAL) (Strapparava et al., 2004)
Updating of WAL
- Inadequate number of emotion word entries
- Retrieved synsets from English SentiWordNet
- Update with synsets
Resources (3/4)
No sentiment lexicon exists for Bengali
Translation of both SentiWordNet and the WordNet Affect lists into Bengali
- Using Bengali synsets (English to Bengali bilingual synset dictionary being developed as part of the English to Indian Languages Machine Translation (EILMT) project, a TDIL project undertaken by a consortium of premier institutes and sponsored by MCIT, Govt. of India)
WAL (termed the Emotion List)
Resources (4/4)
A knowledge base for Emoticons
Word Level Tagging
Semi-automatic annotation
Emotion tag assigned to a word with the help of the Emotion List
Other, non-emotional words tagged as neutral
Stemming process
Verified by linguists
700 sentences for training; 300 and 200 sentences as development and test sets
Baseline Model
Identifies word level emotion tagging accuracies for each emotion class
Words carry no prior knowledge of word features
Six separate modules, one for each of the six emotion classes
Each word is passed through the six modules
Each word is tagged with the emotion tag of the emotion class in which it appears
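The baseline described on this slide can be pictured as a plain list lookup. The following is a minimal sketch, not the authors' implementation: the word lists are invented stand-ins for the Emotion List, and the names `EMOTION_LISTS`, `make_module` and `tag_words` are hypothetical.

```python
# Hypothetical miniature emotion lists (stand-ins for the real Emotion List).
EMOTION_LISTS = {
    "happy": {"joy", "delight", "glad"},
    "sad": {"grief", "sorrow", "unhappy"},
    "anger": {"rage", "furious"},
    "disgust": {"nausea", "revolting"},
    "fear": {"afraid", "terror"},
    "surprise": {"astonished", "amazed"},
}


def make_module(emotion, words):
    """Build one module: it returns its emotion tag if the word is in its list."""
    def module(token):
        return emotion if token.lower() in words else None
    return module


# Six separate modules, one per emotion class.
MODULES = [make_module(emotion, words) for emotion, words in EMOTION_LISTS.items()]


def tag_words(tokens):
    """Pass every word through the six modules; unmatched words are neutral."""
    tagged = []
    for token in tokens:
        tag = "neutral"
        for module in MODULES:
            result = module(token)
            if result is not None:
                tag = result
                break
        tagged.append((token, tag))
    return tagged


print(tag_words("Such a delight".split()))
```

No word features are used here, which matches the "no prior knowledge" condition of the baseline; inflected forms such as "delighted" are missed, which is the gap the morphology step addresses.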
Morphology
Minimizes errors in recognizing emotional words
Bengali, like other Indian languages, is morphologically very rich
Different suffixes (e.g. for verbs, the features are Tense, Aspect, and Person)
Stemmer uses a suffix list to identify the stem form
For English, the Porter stemmer (Porter, 1997)
3.65% and 6.03% improvement over the baseline system in average accuracy on the Bengali and English test sets, respectively
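A suffix-list stemmer of the kind mentioned on this slide can be sketched as longest-match suffix stripping followed by the list lookup. This is an illustration under stated assumptions: the suffix list below uses English endings as a stand-in for the Bengali suffix list, which the slides do not reproduce, and `strip_suffix`, `tag_with_stemming` and the `min_stem_len` guard are hypothetical.

```python
def strip_suffix(word, suffixes, min_stem_len=3):
    """Remove the longest matching suffix, keeping at least min_stem_len characters."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


def tag_with_stemming(token, emotion_lists, suffixes):
    """Look up the surface form first, then its stem; fall back to neutral."""
    surface = token.lower()
    for candidate in (surface, strip_suffix(surface, suffixes)):
        for emotion, words in emotion_lists.items():
            if candidate in words:
                return emotion
    return "neutral"


# Assumed example data: English endings standing in for the Bengali suffix list.
SUFFIXES = ["s", "ed", "ing", "ness"]
LISTS = {"happy": {"delight"}, "sad": {"sad"}}

for word in ["delighted", "sadness", "table"]:
    print(word, tag_with_stemming(word, LISTS, SUFFIXES))
```

Trying the surface form before the stem keeps exact list entries from being altered by over-stemming.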
Baseline vs. Morphology (Result)
CRF based Model (1/4)
10 active features (Das and Bandyopadhyay, 2009a)
· POS information (adjective, verb, noun, adverb)
· First sentence in a topic
· SentiWordNet emotion word (delight…)
· Reduplication (so-so, good-good..)
· Question words (what, why…)
· Colloquial / Foreign words
· Special punctuation symbols
· Quoted sentence (“you are 2 good man”)
· Sentence Length (>=8,<15)
· Emoticons
Different unigram and bi-gram context features (word level as well as POS tag level) and their combinations
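To make the feature list concrete, here is a rough sketch of how a few of these features, plus unigram and bigram context, might be encoded as one feature dictionary per token, a representation that many CRF toolkits accept. It is not the authors' feature extractor: `QUESTION_WORDS`, `SPECIAL_PUNCT`, the feature names and the POS tags in the example are all assumptions, and only a subset of the ten features is shown.

```python
QUESTION_WORDS = {"what", "why", "who", "when", "where", "how"}  # assumed list
SPECIAL_PUNCT = {"!", "?"}  # assumed; the slide does not preserve its symbol list


def is_reduplication(token):
    """True for hyphenated repeats such as 'so-so' or 'good-good'."""
    parts = token.lower().split("-")
    return len(parts) == 2 and parts[0] == parts[1] and parts[0] != ""


def token_features(sentence, i, emotion_words):
    """sentence is a list of (word, pos) pairs; returns features for token i."""
    word, pos = sentence[i]
    feats = {
        "word": word.lower(),
        "pos": pos,
        "emotion_word": word.lower() in emotion_words,
        "reduplication": is_reduplication(word),
        "question_word": word.lower() in QUESTION_WORDS,
        "special_punct": word in SPECIAL_PUNCT,
        # Sentence length window as given on the slide (>=8, <15).
        "length_in_window": 8 <= len(sentence) < 15,
    }
    if i > 0:
        prev_word, prev_pos = sentence[i - 1]
        feats["prev_word"] = prev_word.lower()
        feats["prev_pos"] = prev_pos
        feats["pos_bigram"] = prev_pos + "|" + pos
    else:
        feats["BOS"] = True
    if i < len(sentence) - 1:
        next_word, next_pos = sentence[i + 1]
        feats["next_word"] = next_word.lower()
        feats["next_pos"] = next_pos
    else:
        feats["EOS"] = True
    return feats


example = [("why", "WRB"), ("so-so", "JJ"), ("delight", "NN"), ("!", ".")]
print(token_features(example, 1, {"delight"}))
```

Sentence level cues such as the length window are copied onto every token so that a word level sequence model can condition on them.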
CRF based Model (2/4)
Feature Analysis
- Frequencies
- Combination of multiple features vs. single feature
- Feature with a passive role (e.g. First sentence in a topic) in the English blog corpus but an active one for topic, user comment, or title sentences of the Bengali blog
- Special punctuation symbols, with their frequencies and attachments, give 3% and 6% improvement for Bengali and English, respectively
- Length of a sentence (> eight and < fifteen words per sentence)
- Each feature is added only if including it alongside the pre-selected features improves accuracy
- Accuracy improvement of 20.83% for Bengali and 24.33% for English over the baseline model
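The rule "add a feature only if it improves accuracy alongside the pre-selected features" reads like greedy forward selection. A minimal sketch follows, assuming an `evaluate` callback that trains and scores a model for a given feature set; the callback, the function name `select_features` and the toy scores are hypothetical and are not results from the study.

```python
def select_features(candidates, evaluate, preselected=()):
    """Greedy forward selection: keep a feature only if accuracy improves."""
    selected = list(preselected)
    best = evaluate(selected)
    for feature in candidates:
        trial = selected + [feature]
        score = evaluate(trial)
        if score > best:
            selected, best = trial, score
    return selected, best


# Toy stand-in for training and scoring on a development set.
# The numbers are invented percentage points, purely for illustration.
TOY_GAIN = {"pos": 5, "emoticon": 3, "first_sentence": 0, "question_word": -1}


def toy_evaluate(features):
    return 50 + sum(TOY_GAIN[name] for name in features)


chosen, accuracy = select_features(
    ["pos", "first_sentence", "emoticon", "question_word"], toy_evaluate
)
print(chosen, accuracy)
```

In this toy run a feature with zero gain is left out, which mirrors the "passive role" features noted on the slide; the outcome depends on the order in which candidates are tried.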
CRF based Model (3/4)
CRF based Model (4/4)
Sentence Level Tagging (1/2)
Sense_Tag_Weight (STW)
- Select the six basic words “happy”, “sad”, “anger”, “disgust”, “fear” and “surprise” as seed words for the six emotions
- Retrieve positive and negative scores from English SentiWordNet for each synset in which each seed word appears
- The average retrieved score is fixed as the Sense_Tag_Weight (STW) of that particular emotion tag
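The slide does not spell out exactly how the retrieved scores are averaged. One possible reading, sketched below, takes the mean of every positive and negative score retrieved for a seed word. Both that reading and the score values are assumptions made for illustration; `SEED_SYNSET_SCORES` and `sense_tag_weight` are hypothetical names, and only two of the six seed words are shown.

```python
# Invented (positive, negative) SentiWordNet-style scores for the synsets
# that contain each seed word. These are NOT values from SentiWordNet.
SEED_SYNSET_SCORES = {
    "happy": [(0.875, 0.0), (0.75, 0.0), (0.5, 0.125)],
    "sad": [(0.0, 0.75), (0.0, 0.25)],
}


def sense_tag_weight(synset_scores):
    """Assumed reading: mean of all retrieved positive and negative scores."""
    values = [score for pair in synset_scores for score in pair]
    if not values:
        return 0.0
    return sum(values) / len(values)


STW = {seed: sense_tag_weight(scores) for seed, scores in SEED_SYNSET_SCORES.items()}
print(STW)
```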
Sentence Level Tagging (2/2)
Sense_Weight_Score (SWS) for each emotion tag
- $SWS_i = \frac{STW_i \times N_i}{\sum_{j=1}^{7} STW_j \times N_j}, \quad i \in \{1, \dots, 7\}$
- $SWS_i$ is the sentence level Sense_Weight_Score for the emotion tag $i$
- $N_i$ is the number of occurrences of that emotion tag in the sentence
- Sentence level emotion tag: $SET = \max_{i=1,\dots,7}(SWS_i)$
- Sentences are of neutral type if $SWS_i$ is zero (0) for all emotion tags $i$
Post-processing for handling negative words (Das and Bandyopadhyay, 2009b)
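The scoring rule on this slide can be worked through in a few lines. The sketch below assumes seven tags (six emotions plus neutral), a neutral weight of zero, and invented STW values; none of these numbers come from the study, and the post-processing for negative words is not covered.

```python
from collections import Counter

# Assumed weights for illustration only; neutral is given weight 0 here.
EXAMPLE_STW = {
    "happy": 0.5, "sad": 0.25, "anger": 0.25, "disgust": 0.25,
    "fear": 0.25, "surprise": 0.5, "neutral": 0.0,
}


def sense_weight_scores(word_tags, stw):
    """SWS_i = (STW_i * N_i) / sum_j (STW_j * N_j) for every tag i."""
    counts = Counter(word_tags)  # N_i; missing tags count as 0
    total = sum(stw[tag] * counts[tag] for tag in stw)
    if total == 0:
        return {tag: 0.0 for tag in stw}
    return {tag: stw[tag] * counts[tag] / total for tag in stw}


def sentence_emotion_tag(word_tags, stw):
    """Pick the tag with the highest SWS; all-zero scores mean neutral."""
    scores = sense_weight_scores(word_tags, stw)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "neutral"


tags = ["neutral", "happy", "neutral", "sad", "happy"]
print(sense_weight_scores(tags, EXAMPLE_STW))
print(sentence_emotion_tag(tags, EXAMPLE_STW))
```

With these assumed weights, two "happy" words and one "sad" word give a denominator of 0.5 × 2 + 0.25 × 1 = 1.25, so the "happy" score is 1.0 / 1.25 = 0.8 and the sentence is tagged "happy".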
Evaluation (1/2)
Accuracies
- Measured by counting the sentences whose system-assigned emotion tag matches the emotion tag corresponding to their emotion class
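As a small illustration of this counting, the sketch below computes accuracy per emotion class from two parallel lists of sentence tags. The function name `per_class_accuracy` and the example tags are hypothetical; the slides do not describe the evaluation script itself.

```python
from collections import Counter


def per_class_accuracy(gold, predicted):
    """Fraction of sentences in each gold class whose system tag matches."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    totals, correct = Counter(), Counter()
    for gold_tag, system_tag in zip(gold, predicted):
        totals[gold_tag] += 1
        if gold_tag == system_tag:
            correct[gold_tag] += 1
    return {tag: correct[tag] / totals[tag] for tag in totals}


# Invented tags, for illustration only.
gold_tags = ["happy", "happy", "sad", "fear"]
system_tags = ["happy", "sad", "sad", "happy"]
print(per_class_accuracy(gold_tags, system_tags))
```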
Evaluation (2/2)
Loss in accuracies
- Frequent use of metaphoric words in blogs
Bengali blogs collected from comic articles
Emotions such as “happy”, “sad”, and “surprise” are present in sufficient numbers in the blog corpus
An adequate number of training examples for a particular emotion tag improves the accuracy of that tag
Conclusion
Handling of metaphors
Phrase level analysis concerning genre of corpus
Document level emotion identification
More emotion annotated data
- To improve the performance
- Suitable for machine learning approach
Thank you
Questions?