
1 Emotion Tagging – A Comparative Study on Bengali and English Blogs
Dipankar Das and Sivaji Bandyopadhyay
Department of Computer Science & Engineering, Jadavpur University, Kolkata-700032, India
ICON 2009, 14/12/2009

2 Outline
• Motivation
• Resources
• Word Level Tagging
  - Baseline Model
  - Morphology
  - CRF based Model
• Sentence Level Tagging
• Evaluation
• Conclusion

3 Motivation (1/3)
• In psychology and common use, emotion is an aspect of a person's mental state of being, normally based in or tied to the person’s internal (physical) and external (social) sensory feeling (Zhang et al., 2008)

4 Motivation (2/3)
• Natural Language Processing (NLP) tasks
  - Tracking users’ emotion (products, events, politics)
  - Customer relationship management
  - Question Answering (QA) systems
  - Modern Information Retrieval (IR) systems

5 Motivation (3/3)
• Blogs
  - Communicative and informative repository of text-based emotional content in Web 2.0 (Lin et al., 2007)
  - Online diary of the bloggers
  - Blog posts annotated by other bloggers
  - Large data suitable for machine learning
• Recognition of emotion from written text

7 Resources (1/4)
• Bengali blog
  - Web blog archive (www.amarblog.com)
  - 14 different comic related topics and user comments
  - 1200 sentences
• English blog
  - Saima Aman and Stan Szpakowicz. 2007. Identifying Expressions of Emotion in Text. In V. Matoušek and P. Mautner (Eds.): TSD 2007, LNAI 4629, pp. 196–205
  - 1200 sentences

8 Resources (2/4)
• English Sentiment Lexicon
  - SentiWordNet (Esuli et al., 2006)
  - WordNet Affect lists (WAL) (Strapparava et al., 2004)
• Updating of WAL
  - WAL has an inadequate number of emotion word entries
  - Synsets retrieved from English SentiWordNet
  - WAL updated with the retrieved synsets

9 Resources (3/4)
• No sentiment lexicon in Bengali
• Both SentiWordNet and WordNet Affect lists translated into Bengali
• Translation
  - Using Bengali synsets (English to Bengali bilingual synset dictionary being developed as part of the English to Indian Languages Machine Translation (EILMT) project, a TDIL project undertaken by the consortium of different premier institutes and sponsored by MCIT, Govt. of India)
• WAL (termed as Emotion List)

10 Resources (4/4)
• A knowledge base for Emoticons

12 Word Level Tagging
• Semi-automatic annotation
• Emotion tag assigned to a word with the help of the Emotion List
• Other, non-emotional words tagged as neutral
• Stemming process
• Verified by linguists
• 700 sentences for training, 300 for development and 200 for testing

14 Baseline Model
• Identify word level emotion tagging accuracies for each emotion class
• No prior knowledge regarding word features is used for any word
• Six separate modules for six emotion classes
• Words passed through the six separate modules
• Each word tagged with the emotion tag of the emotion class in which that word appears
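
The baseline lookup described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the word lists are hypothetical toy data standing in for the Emotion List, and the function names `tag_word` and `tag_sentence` are assumptions.

```python
# Hypothetical toy emotion lists; the real lists would come from the
# translated and updated WordNet Affect lists (the "Emotion List").
EMOTION_LISTS = {
    "happy": {"joy", "delight", "glad"},
    "sad": {"grief", "sorrow"},
    "anger": {"rage", "fury"},
    "disgust": {"nausea"},
    "fear": {"terror", "dread"},
    "surprise": {"astonishment", "wonder"},
}


def tag_word(word):
    """Check the word against each of the six class lists in turn.

    Returns the first emotion class whose list contains the word,
    or "neutral" when no list contains it.
    """
    token = word.lower()
    for emotion, words in EMOTION_LISTS.items():
        if token in words:
            return emotion
    return "neutral"


def tag_sentence(words):
    """Tag every word of a tokenized sentence."""
    return [(w, tag_word(w)) for w in words]
```

Calling `tag_sentence(["Pure", "delight"])` tags "delight" as happy and leaves "Pure" neutral. No word features are used, which matches the "no prior knowledge" setting of the baseline.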

16 Morphology
• Minimize errors in recognizing emotional words
• Bengali, like other Indian languages, is morphologically very rich
• Different suffixes (e.g. for verbs, the features are Tense, Aspect, and Person)
• Stemmer uses a suffix list to identify the stem form
• For English, Porter stemmer (Porter, 1997)
• 3.65% and 6.03% improvement over the baseline system in average accuracies on the Bengali and English test sets, respectively
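
A suffix-list stemmer of the kind mentioned above might look like the sketch below. The suffix list is a hypothetical English-like placeholder (the slides do not give the Bengali suffix list), and the longest-match rule and minimum stem length are assumptions made for illustration.

```python
# Hypothetical suffix list for illustration only; the actual Bengali
# suffix list used by the authors is not given in the slides.
SUFFIXES = ["ing", "ed", "ly", "es", "s"]


def stem(word, suffixes=SUFFIXES, min_stem_len=3):
    """Strip the longest matching suffix, keeping at least min_stem_len characters."""
    # Try longer suffixes first so that the longest match wins.
    for suffix in sorted(suffixes, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word
```

Stemming before the emotion list lookup lets an inflected form such as "delighted" match the list entry "delight", which is one plausible source of the accuracy gain reported on the slide.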

17 Baseline vs. Morphology (Result)

19 CRF based Model (1/4)
• 10 active features (Das and Bandyopadhyay, 2009a)
  - POS information (adjective, verb, noun, adverb)
  - First sentence in a topic
  - SentiWordNet emotion word (delight…)
  - Reduplication (so-so, good-good..)
  - Question words (what, why…)
  - Colloquial / Foreign words
  - Special punctuation symbols (!,@,?..)
  - Quoted sentence (“you are 2 good man”)
  - Sentence Length (>=8,<15)
  - Emoticons
• Different unigram and bi-gram context features (word level as well as POS tag level) and their combinations
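
Some of the features listed above can be expressed as a per-token feature dictionary, the usual input shape for CRF toolkits. The sketch below is an assumption about how such features could be encoded, not the authors' feature extractor: the function name `token_features`, the feature keys, the question word list and the `<BOS>`/`<EOS>` padding symbols are all hypothetical, and only a subset of the ten features is covered.

```python
import re

# Hypothetical resources for illustration.
QUESTION_WORDS = {"what", "why", "who", "when", "where", "how"}
SPECIAL_PUNCT = set("!@?")


def token_features(tokens, pos_tags, i, emotion_words):
    """Build a feature dictionary for the token at position i."""
    word = tokens[i]
    lower = word.lower()
    feats = {
        "word": lower,
        "pos": pos_tags[i],
        "is_emotion_word": lower in emotion_words,
        "is_question_word": lower in QUESTION_WORDS,
        "has_special_punct": any(ch in SPECIAL_PUNCT for ch in word),
        # Reduplication such as "so-so" or "good-good".
        "is_reduplication": bool(re.fullmatch(r"(\w+)-\1", lower)),
        # Sentence length feature, using the ">=8, <15" bounds from the slide.
        "sentence_length_8_to_14": 8 <= len(tokens) < 15,
    }
    # Unigram context at word level and POS level.
    feats["prev_word"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_word"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    prev_pos = pos_tags[i - 1] if i > 0 else "<BOS>"
    feats["prev_pos"] = prev_pos
    # Bigram context at POS level.
    feats["prev_pos+pos"] = prev_pos + "+" + pos_tags[i]
    return feats
```

A sentence is then represented as a list of such dictionaries, one per token, paired with the list of word level emotion tags.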

20 CRF based Model (2/4)
• Feature Analysis
  - Frequencies
  - Combination of multiple features vs. single feature
  - Some features are passive in one corpus but active in the other (e.g. "First sentence in a topic" is passive for the English blog corpus but active for topic, user comment and title sentences of the Bengali blog)
  - Special punctuation symbols (!, @, ? etc.), their frequencies and attachments give 3% and 6% improvement for Bengali and English, respectively
  - Length of a sentence (> eight and < fifteen words per sentence)
  - Each feature is added only if its inclusion along with the pre-selected features improves accuracy
  - Accuracy improvement of 20.83% for Bengali and 24.33% for English over the baseline model
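
The rule of adding a feature only when it improves accuracy alongside the already selected features is a greedy forward selection. A minimal sketch follows; `evaluate` is a hypothetical callback that, in a real setup, would train the CRF with the given feature set and return accuracy on the development set.

```python
def forward_select(candidates, evaluate):
    """Greedy forward feature selection.

    candidates: feature names, tried in the given order.
    evaluate:   hypothetical callback mapping a list of feature names
                to an accuracy score (e.g. development set accuracy).
    """
    selected = []
    best = evaluate(selected)
    for feature in candidates:
        score = evaluate(selected + [feature])
        # Keep the feature only if it improves on the current best score.
        if score > best:
            selected.append(feature)
            best = score
    return selected, best
```

With this scheme a feature that is passive for one corpus is simply never selected for it, while the same candidate list can yield a different selected set for the other corpus.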

21 CRF based Model (3/4)

22 CRF based Model (4/4)

24 Sentence Level Tagging (1/2)
• Sense_Tag_Weight (STW)
  - Select the six basic words “happy”, “sad”, “anger”, “disgust”, “fear” and “surprise” as seed words for the six emotions
  - Retrieve the positive and negative scores from English SentiWordNet for each synset in which each of the seed words appears
  - The average retrieved score is fixed as the Sense_Tag_Weight (STW) of that particular emotion tag
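
One possible reading of the STW computation is sketched below. The slide does not say how the positive and negative scores are combined before averaging, so pooling all retrieved scores into a single average is an assumption, as is the function name `sense_tag_weight`. The input is a plain list of (positive, negative) score pairs, one per synset of the seed word; the values in the usage note are made up and are not real SentiWordNet scores.

```python
def sense_tag_weight(synset_scores):
    """Average all retrieved scores for one seed word.

    synset_scores: list of (positive, negative) score pairs, one pair per
    synset containing the seed word. Pooling both score types into one
    average is an assumption; the slide does not specify the combination.
    """
    scores = [s for pos, neg in synset_scores for s in (pos, neg)]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, `sense_tag_weight([(0.75, 0.0), (0.5, 0.25)])` averages the four made-up scores 0.75, 0.0, 0.5 and 0.25.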

25 Sentence Level Tagging (2/2)
• Sense_Weight_Score (SWS) for each emotion tag
  - $SWS_i = \dfrac{STW_i \cdot N_i}{\sum_{j=1}^{7} STW_j \cdot N_j}$, where $i$ is one of the seven tags indexed by $j$
  - $SWS_i$ is the sentence level Sense_Weight_Score for the emotion tag $i$
  - $N_i$ is the number of occurrences of that emotion tag in the sentence
  - The sentence level emotion tag (SET) is the tag $i$ that attains $\max_{i=1,\dots,7} SWS_i$
  - Sentences are of neutral type if $SWS_i$ is zero for all emotion tags $i$
• Post-processing for handling negative words (Das and Bandyopadhyay, 2009b)
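
The scoring on this slide can be sketched in a few lines. The code assumes that word level tags are already available, that tags missing from the weight table (such as neutral) contribute nothing, and that the STW values passed in are made-up numbers; the function name `sentence_emotion` is hypothetical, and the negation post-processing is not covered.

```python
from collections import Counter


def sentence_emotion(word_tags, stw):
    """Pick the sentence level emotion tag from word level tags.

    word_tags: emotion tags of the words in one sentence.
    stw:       mapping from emotion tag to its Sense_Tag_Weight.
    Returns (tag, scores), where scores maps each tag to its SWS.
    """
    # N_i: occurrences of each weighted tag in the sentence.
    counts = Counter(tag for tag in word_tags if tag in stw)
    denominator = sum(stw[tag] * n for tag, n in counts.items())
    if denominator == 0:
        # All scores are zero, so the sentence is neutral.
        return "neutral", {}
    sws = {tag: stw[tag] * n / denominator for tag, n in counts.items()}
    return max(sws, key=sws.get), sws
```

Because every SWS shares the same denominator, the winning tag is simply the one with the largest product of weight and count; the normalization only makes the scores sum to one.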

27 Evaluation (1/2)
• Accuracies
  - Computed by counting the number of sentences whose system-assigned emotion tag matches the emotion tag of the sentence's annotated emotion class

28 Evaluation (2/2)
• Loss in accuracies
  - Frequent use of metaphoric words in blogs
• Bengali blogs collected from comic articles
• Emotions such as “happy”, “sad”, and “surprise” occur in sufficient number in the blog corpus
• An adequate number of training examples for a particular emotion tag improves the accuracy of that tag

30 Conclusion
• Handling of metaphors
• Phrase level analysis concerning genre of corpus
• Document level emotion identification
• More emotion annotated data
  - To improve the performance
  - Suitable for machine learning approach

31 Thank you

32 Questions?

