
1 SemEval 2013 Task 2 Labs AVAYA: Sentiment Analysis in Twitter with Self-Training and Polarity Lexicon Expansion Lee Becker, George Erhart, David Skiba, and Valentine Matula June 16, 2013

2 Participation SemEval 2013 Task 2
Subtasks: A: Contextual Polarity Disambiguation, B: Message Polarity Classification. Training Conditions: Constrained, Unconstrained. Testing Conditions: Tweet, SMS

3 Guiding Intuitions Boost recall of positive/negative instances (A,B)
Don’t worry about neutral instances (A,B) Encode polarity cues into features (A,B) Exploit the context (A)

4 System Overview: Task B Constrained
[Diagram: sentiment-labeled tweets + polarity lexicon -> feature extraction -> constrained model]

5 System Overview: Task B Unconstrained
[Diagram: unlabeled tweets -> constrained model -> auto-labeled tweets; auto-labeled tweets + expanded polarity lexicon -> feature extraction -> unconstrained model]

6 Overview: Task A Models
[Diagram: constrained model: sentiment-labeled contexts + polarity lexicon -> feature extraction -> constrained model; unconstrained model: sentiment-labeled contexts + expanded polarity lexicon -> feature extraction -> unconstrained model]

7 Preprocessing
Normalization: URLs, @Mentions
NLP pipeline: written in the ClearTK framework with ClearNLP wrappers. Tokenization (preserves emoticons and URLs), POS tagging, lemmatization, dependency parsing. PTB POS -> ArkTweet POS (Gimpel et al., 2011); dependencies -> collapsed dependencies

8 Resources MPQA Subjectivity Lexicon (Wilson, Wiebe, and Hoffmann, 2005)
Hand-Crafted Negation Word Dictionary Hand-Crafted Emoticon Polarity Dictionary

9 Task B Features
Polarized Bag-of-Words: an easy way to double the feature space (e.g. happy vs. NOT_happy)
Negation Window: "I am not too happy about this, but I'm still pumped and thrilled for tomorrow."
Features: Token, Token + PTB POS, Token + Simplified POS, Lemma, Lemma + PTB POS, Lemma + Simplified POS
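The polarized bag-of-words idea can be sketched as below: tokens following a negation cue get a NOT_ prefix until the window expires or a clause boundary resets it, doubling the feature space. The negation word list, window length, and boundary tokens are illustrative assumptions, not the authors' exact dictionaries.

```python
# Sketch of a negation-window polarized bag-of-words.
# NEGATION_WORDS, WINDOW, and the boundary set are assumptions.
NEGATION_WORDS = {"not", "no", "never", "n't", "cannot"}
WINDOW = 3  # assumed number of tokens affected by a negation cue

def polarized_bow(tokens):
    features = []
    negated = 0
    for tok in tokens:
        low = tok.lower()
        if low in NEGATION_WORDS:
            negated = WINDOW        # open a negation window
            features.append(low)
            continue
        if low in {",", ".", ";", "but"}:
            negated = 0             # clause boundary closes the window
        features.append("NOT_" + low if negated > 0 else low)
        if negated > 0:
            negated -= 1
    return features

print(polarized_bow("I am not too happy about this , but I'm still pumped".split()))
```

On the slide's example sentence this yields NOT_too and NOT_happy while "pumped" and "thrilled", which fall after "but", stay unprefixed.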

10 Task B Features
Message Polarity Features: word sentiment counts (pos|neg), emoticon sentiment counts (pos|neg), net word polarity, net emoticon polarity
Microblogging Features: ALL CAPS word counts, counts of words with repeated characters (yaaaaay, booooo), emphasis (*yes*), winning sports score (Nuggets 15-0)
PTB POS tag counts
Collapsed Dependency Relations (with negation incorporated): Text-Text, Lemma+Simplified POS - Lemma+Simplified POS, POS - Lemma
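The microblogging surface features listed above are simple to compute; a minimal sketch follows. The regular expressions and feature names are assumptions for illustration, not the authors' implementation.

```python
import re

# Sketch of the microblogging surface-feature counts from the slide.
def microblog_features(text):
    tokens = text.split()
    return {
        # words written entirely in capitals (length > 1 to skip "I")
        "all_caps": sum(1 for t in tokens if t.isupper() and len(t) > 1),
        # a character repeated 3+ times, as in "yaaaaay" or "booooo"
        "elongated": sum(1 for t in tokens if re.search(r"(\w)\1{2,}", t)),
        # *starred* emphasis
        "emphasis": sum(1 for t in tokens if re.fullmatch(r"\*\w+\*", t)),
        # a score pattern like "15-0"
        "sports_score": 1 if re.search(r"\b\d{1,3}-\d{1,3}\b", text) else 0,
    }

print(microblog_features("YAAAAY *yes* the Nuggets won 15-0 BOOM"))
```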

11 Task B: Constrained Model
LIBLINEAR with the logistic regression loss function. Heavily boosted negative-polarity instances: wpositive = 1, wnegative = 25, wneutral = 1
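The per-class weighting on the slide can be reproduced with scikit-learn's LIBLINEAR-backed logistic regression. The toy texts and vectorizer here are stand-ins; only the class_weight mapping mirrors the slide.

```python
# Sketch: class-weighted logistic regression over LIBLINEAR.
# Training data is a toy stand-in; the class weights are from the slide.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["love this", "hate this", "ok I guess", "so happy", "awful day", "whatever fine"]
labels = ["positive", "negative", "neutral", "positive", "negative", "neutral"]

X = CountVectorizer().fit_transform(texts)
clf = LogisticRegression(
    solver="liblinear",  # the LIBLINEAR backend
    class_weight={"positive": 1, "negative": 25, "neutral": 1},  # weights from the slide
)
clf.fit(X, labels)
print(sorted(clf.classes_))
```

Boosting the negative class this heavily trades precision for recall on negative tweets, matching the "boost recall of positive/negative instances" intuition from slide 3.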

12 Polarity Lexicon Expansion: Overview
[Diagram: MPQA subjectivity lexicon + auto-labeled tweets = expanded polarity lexicon]

13 Polarity Lexicon Expansion: Pointwise Mutual Information
Based on semantic orientation for sentiment (Turney, 2002). Intuition: use co-occurrence statistics to measure how strongly each word is associated with a polarity class.
PMI(word, sentiment) = log2 [ p(word, sentiment) / (p(word) p(sentiment)) ]
polarity(word) = sgn(PMI(word, positive) - PMI(word, negative))
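The two formulas above can be computed directly from co-occurrence counts over auto-labeled messages. This is a minimal sketch of that computation; the toy corpus and the small smoothing constant are illustrative assumptions.

```python
import math
from collections import Counter

# PMI-based polarity scoring over (tokens, sentiment) pairs,
# following the slide's formulas. Smoothing constant is assumed.
def pmi_polarity(labeled_messages):
    word_sent = Counter()  # (word, sentiment) co-occurrence counts
    word = Counter()
    sent = Counter()
    total = 0
    for tokens, sentiment in labeled_messages:
        for t in tokens:
            word_sent[(t, sentiment)] += 1
            word[t] += 1
            sent[sentiment] += 1
            total += 1

    def pmi(w, s):
        p_ws = (word_sent[(w, s)] + 0.5) / total  # +0.5 smoothing (assumed)
        return math.log2(p_ws / ((word[w] / total) * (sent[s] / total)))

    # polarity(word) = sgn(PMI(word, positive) - PMI(word, negative))
    return {w: 1 if pmi(w, "positive") > pmi(w, "negative") else -1 for w in word}

corpus = [(["great", "day"], "positive"), (["bad", "day"], "negative"),
          (["great", "fun"], "positive")]
print(pmi_polarity(corpus))
```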

14 Polarity Lexicon Expansion: From Tweets to Lexicon
Differences from Turney (2002): classifier output instead of seed words; words instead of word phrases.
Procedure: applied the constrained model to ~475k unlabeled tweets, then filtered and balanced the corpus via classifier confidence score thresholds: 50,789 positive instances (> 0.9), 59,029 negative instances (> 0.7), 70,601 neutral instances (> 0.8).
Removed: words with f(word) < 10, neutral-polarity words, single-character words ('a', 'j', 'I', etc.), numbers (1, 20, 1000), punctuation.
Merged with the MPQA subjectivity lexicon. Final lexicon size: 11,740 entries.
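The filtering step can be sketched as below: keep auto-labeled tweets only above the per-class confidence thresholds, then drop rare words when building the lexicon. The thresholds and the f(word) < 10 cutoff come from the slide; the data structures and function names are assumptions.

```python
from collections import Counter

# Per-class confidence thresholds from the slide.
THRESHOLDS = {"positive": 0.9, "negative": 0.7, "neutral": 0.8}

def filter_auto_labeled(predictions):
    """predictions: iterable of (tokens, label, confidence) triples."""
    return [(toks, lab) for toks, lab, conf in predictions
            if conf > THRESHOLDS[lab]]

def prune_rare(word_polarity, word_freq, min_freq=10):
    """Drop lexicon entries with f(word) < min_freq, per the slide."""
    return {w: p for w, p in word_polarity.items() if word_freq[w] >= min_freq}

kept = filter_auto_labeled([
    (["love", "it"], "positive", 0.95),
    (["love", "it"], "positive", 0.85),  # below the 0.9 positive threshold
    (["ugh"], "negative", 0.75),
])
print(kept)
```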

15 Task B: Unconstrained Model
Self-trained model: ~470k instances auto-labeled by the constrained model plus ~10k original instances, with the expanded polarity lexicon. Heavily discounted neutral instances: wpositive = 2, wnegative = 5, wneutral = 0.1
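The self-training recipe described above can be sketched as a short loop: train on gold data, auto-label unlabeled tweets, keep confident labels, and retrain on the union. MajorityStub, the threshold value, and the fit/predict_proba interface are placeholders, not the authors' LIBLINEAR system.

```python
class MajorityStub:
    """Toy stand-in classifier (illustration only, not the authors' model)."""
    def fit(self, data):
        labels = [label for _, label in data]
        self.majority = max(set(labels), key=labels.count)
    def predict_proba(self, text):
        # return (label, confidence); a real model would score `text`
        return self.majority, 0.9

def self_train(model, labeled, unlabeled, threshold=0.8):
    model.fit(labeled)                        # 1. train on gold-labeled data
    auto = []
    for x in unlabeled:
        label, conf = model.predict_proba(x)  # 2. auto-label unlabeled tweets
        if conf > threshold:                  # 3. keep only confident labels
            auto.append((x, label))
    model.fit(labeled + auto)                 # 4. retrain on gold + auto-labeled
    return model
```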

16 Task B Results

Tweet:
System                Fpos+   Fneg-   Fneu    Favg+/-  Rank
NRC-Canada            .733    .647    .744    .690     1
Avaya-Unconstrained   .700    .582    .713    .641     5
Avaya-Constrained     .669    .548    -       .608     12
Mean                  .626    .450    -       .538     -

SMS:
System                Fpos+   Fneg-   Fneu    Favg+/-  Rank
NRC-Canada            .730    .639    .799    .685     -
Avaya-Unconstrained   .648    .553    .778    .600     4
Avaya-Constrained     .633    .557    .759    .595     -
Mean                  .546    .456    .627    .501     -

17 Task A: Features
Same as Task B: polarized bag of words; contextual polarity features (word sentiment counts (pos|neg), emoticon sentiment counts (pos|neg), net word polarity, net emoticon polarity); microblogging features; PTB POS tags.
Additional features: scoped dependencies, dependency paths.

18 Task A Features: Scoped Dependencies
Example: "You do not want to miss this tomorrow night." (dependency relations: root, nsubj, neg, xcomp, aux, tmod)
Extracted features: OUT_neg_nsubj(want, you), OUT_neg(want, not), IN_xcomp(want, miss), IN_aux(miss, to), OUT_tmod(miss, tomorrow)
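The IN_/OUT_ features above can be generated by walking the dependency edges and checking each against the target phrase's scope. The scope test below (dependent token inside the span) is an assumption about the exact definition, and the composite OUT_neg_nsubj feature is omitted for brevity.

```python
# Sketch of scoped dependency features: each edge is prefixed IN_ or OUT_
# depending on whether it falls inside the target phrase's scope.
# The scope test (dependent inside the span) is an assumption.
def scoped_dep_features(edges, scope):
    """edges: (relation, head, dependent) triples; scope: set of in-span tokens."""
    feats = []
    for rel, head, dep in edges:
        prefix = "IN" if dep in scope else "OUT"
        feats.append(f"{prefix}_{rel}({head},{dep})")
    return feats

edges = [("nsubj", "want", "you"), ("neg", "want", "not"),
         ("xcomp", "want", "miss"), ("aux", "miss", "to"),
         ("tmod", "miss", "tomorrow")]
print(scoped_dep_features(edges, scope={"miss", "to"}))
```

On the slide's example this reproduces OUT_nsubj(want,you), OUT_neg(want,not), IN_xcomp(want,miss), IN_aux(miss,to), and OUT_tmod(miss,tomorrow).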

19 Task A Features: Dependency Paths
Example: "Criminals killed Sadat and in the process they killed Egypt." (dependency relations: dobj, conj, root)
POS Path: {NNP} dobj < {VBD} < conj {VBD} < root
Sentiment POS Path: {^/neutral} < {V/negative} < {V/negative} < {root}
In Subject: False; In Object: True

20 Task A Models
Constrained: MPQA Subjectivity Lexicon. Unconstrained: expanded polarity lexicon.
LIBLINEAR with wpositive = 11, wnegative = 2, wneutral = 1

21 Task A Results

Tweet:
System                Fpos+   Fneg-   Fneu    Favg+/-  Rank
NRC-Canada            .910    .869    .110    .889     1
Avaya-Unconstrained   .898    .849    .311    .874     2
Avaya-Constrained     .896    .843    .309    .870     3
Mean                  .773    .677    .115    .725     -

SMS:
System                Fpos+   Fneg-   Fneu    Favg+/-  Rank
GU-MLT-LT             .865    .902    .086    .884     -
Avaya-Unconstrained   .842    -       .138    .858     -
Avaya-Constrained     .823    .856    .125    .839     4
Mean                  .710    .698    .099    .704     -

22 Discussion
Dictionary expansion via supervised sentiment models provides a relatively simple way to grow the feature space and improve coverage. Dependency-based features provide additional context and richer information.
Future work: ablation studies; better tuning of self-training.

23 Thank you! Task 2 Organizers and Participants SemEval 2013 Organizers
Anonymous Reviewers

