Identifying Sarcasm in Twitter: A Closer Look
Roberto Gonzalez, Smaranda Muresan, Nina Wacholder
Aim of the study
– To construct a corpus of sarcastic utterances that have been explicitly labeled as sarcastic by their authors themselves (#sarcasm, #sarcastic).
– To illustrate how difficult it is to distinguish sarcastic sentences from positive or negative ones.
Data
– The data for the study is divided into three sets of 900 tweets each: sarcastic, positive and negative.
– Each set is culled from Twitter using the corresponding hashtags:
  – Sarcasm: #sarcasm, #sarcastic
  – Positive: #happy, #joy, #lucky
  – Negative: #sadness, #frustrated, #angry
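A minimal Python sketch of the hashtag-based collection step, assuming the tweets have already been downloaded as plain strings. The helper names `label_from_hashtags` and `build_sets` are hypothetical, and discarding tweets that match no class or more than one class is an assumption; the slide does not say how such tweets were handled.

```python
import re

# Hashtags listed on the slide for each class.
CLASS_TAGS = {
    "sarcastic": {"#sarcasm", "#sarcastic"},
    "positive": {"#happy", "#joy", "#lucky"},
    "negative": {"#sadness", "#frustrated", "#angry"},
}
HASHTAG_RE = re.compile(r"#\w+")


def label_from_hashtags(text):
    """Return the single class whose hashtags occur in the tweet, else None."""
    tags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    matches = [label for label, wanted in CLASS_TAGS.items() if tags & wanted]
    # Assumption: tweets matching zero classes or several classes are dropped.
    return matches[0] if len(matches) == 1 else None


def build_sets(tweets, per_class=900):
    """Fill each class with up to `per_class` tweets, in input order."""
    sets = {label: [] for label in CLASS_TAGS}
    for text in tweets:
        label = label_from_hashtags(text)
        if label is not None and len(sets[label]) < per_class:
            sets[label].append(text)
    return sets
```

Capping every class at the same size keeps the three sets balanced, matching the 900/900/900 split on the slide.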
Data preprocessing
– Tweets in which #sarcasm or #sarcastic appears in the middle of the tweet were removed.
– The remaining tweets were checked manually to make sure the tags were not part of the tweet's content, e.g. “I really love #sarcasm”.
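The automatic part of this filter might look like the sketch below. It assumes that "in the middle of the tweet" means anywhere other than a run of label hashtags at the very end, and it strips those trailing tags from the kept text, which is also an assumption (it keeps the label out of the unigram features). The function name `keep_and_strip` is hypothetical. Note that “I really love #sarcasm” passes this automatic check, which is why a manual review is still needed.

```python
# Label hashtags from the slide.
LABEL_TAGS = {"#sarcasm", "#sarcastic"}


def _norm(token):
    # Lowercase and drop trailing punctuation such as "#sarcasm!".
    return token.lower().rstrip(".,!?;:")


def keep_and_strip(text):
    """Return the tweet without its trailing label tags, or None to drop it."""
    tokens = text.split()
    end = len(tokens)
    # Peel label hashtags off the end of the tweet.
    while end > 0 and _norm(tokens[end - 1]) in LABEL_TAGS:
        end -= 1
    body = tokens[:end]
    if end == len(tokens) or not body:
        return None  # no label tag at the end, or the tweet is only tags
    if any(_norm(tok) in LABEL_TAGS for tok in body):
        return None  # a label tag is also used mid-tweet
    # Assumption: removing the label tags from the kept text.
    return " ".join(body)
```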
Lexical features
– Unigrams
– Dictionary-based features:
  – Pennebaker et al. (LIWC):
    – Linguistic processes (adverbs, pronouns)
    – Psychological processes (positive and negative emotion)
    – Personal concerns (work, achievement)
    – Spoken categories (assent, non-fluencies)
  – WordNet Affect
  – A list of interjections and punctuation marks
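The sketch below shows one way to turn a tweet into unigram and dictionary-category counts. LIWC and WordNet Affect are not bundled here, so `TOY_DICT`, `INTERJECTIONS` and the category names are hypothetical stand-ins; entries ending in `*` mimic LIWC-style prefix matching, and treating only `!` and `?` as punctuation is a simplifying assumption.

```python
import re
from collections import Counter

# Hypothetical toy lexicon standing in for LIWC / WordNet Affect categories.
# Entries ending in "*" match any word with that prefix, as in LIWC dictionaries.
TOY_DICT = {
    "posemo": ["love", "great", "happ*"],
    "negemo": ["hate", "sad*", "angr*"],
    "assent": ["yes", "ok", "yeah"],
}
INTERJECTIONS = {"oh", "wow", "ugh", "yay"}  # hypothetical list
TOKEN_RE = re.compile(r"[a-z']+|[!?]")  # words, plus "!" and "?" as tokens


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def matches(token, entry):
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def extract_counts(text):
    """Count unigrams, dictionary categories, interjections and punctuation."""
    counts = Counter()
    for tok in tokenize(text):
        if tok in ("!", "?"):
            counts["punct:" + tok] += 1
            continue
        counts["uni:" + tok] += 1
        for category, entries in TOY_DICT.items():
            if any(matches(tok, entry) for entry in entries):
                counts["dict:" + category] += 1
        if tok in INTERJECTIONS:
            counts["interj"] += 1
    return counts
```

For example, `extract_counts("Oh yeah, I just LOVE being sad!!")` fires both the positive and negative emotion categories, the kind of mixed signal that makes sarcasm hard to separate from plain positive or negative tweets.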
Classification
– Classifiers: logistic regression and a support vector machine trained with SMO (sequential minimal optimization)
– Features used:
  – Unigrams
  – Presence of dictionary features (LIWC + _P)
  – Frequency of dictionary features (LIWC + _F)
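The difference between the _P and _F feature variants, and a plain logistic-regression classifier over them, can be sketched as follows. The slide does not define how frequency is normalized, so dividing by tweet length is an assumption; the training loop is ordinary full-batch gradient descent, not the authors' setup, and the SMO-trained SVM is not sketched. All function names are hypothetical.

```python
import math


def presence(counts):
    """_P variant: 1.0 for every category that occurs in the tweet."""
    return {f"{cat}_P": 1.0 for cat, n in counts.items() if n > 0}


def frequency(counts, n_tokens):
    """_F variant: category count divided by tweet length (assumed normalization)."""
    return {f"{cat}_F": n / n_tokens for cat, n in counts.items() if n > 0}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w, b, x):
    return b + sum(w.get(k, 0.0) * v for k, v in x.items())


def train_logreg(X, y, epochs=200, lr=0.5):
    """Binary logistic regression on sparse dict features (full-batch gradient descent)."""
    w, b = {}, 0.0
    for _ in range(epochs):
        grad_w, grad_b = {}, 0.0
        for x, target in zip(X, y):
            err = sigmoid(score(w, b, x)) - target  # gradient of the log loss wrt the score
            grad_b += err
            for k, v in x.items():
                grad_w[k] = grad_w.get(k, 0.0) + err * v
        b -= lr * grad_b / len(X)
        for k, g in grad_w.items():
            w[k] = w.get(k, 0.0) - lr * g / len(X)
    return w, b


def predict(w, b, x):
    return 1 if score(w, b, x) > 0 else 0
```

With presence features, a tweet that mentions a category once and one that mentions it four times look identical; the frequency variant keeps that difference, which is the distinction the _P and _F feature sets capture.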
Comparison against human performance
– 3 judges were asked to classify tweets as sarcastic, positive or negative (90 tweets per category).
– S-N-P (sarcastic vs. negative vs. positive): 50% agreement (k = )
– S-NS (sarcastic vs. non-sarcastic): 71.67% agreement (k = )
– Emoticon-based S-NS: 89% agreement (k = 0.74)
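Agreement among more than two judges is commonly measured with Fleiss' kappa. The sketch below assumes that statistic; the slide does not say which kappa was used or how the percentage agreement was computed, so the mean pairwise agreement returned here may not match the slide's figures. It also shows one way to collapse the three-way labels into the S-NS setting. The names `fleiss_kappa` and `collapse_to_s_ns` are hypothetical.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Return (mean observed agreement, Fleiss' kappa).

    `ratings` holds one list of labels per tweet; every tweet must be
    labeled by the same number of judges (at least two).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()  # how often each category was used overall
    p_bar = 0.0
    for item in ratings:
        if len(item) != n_raters:
            raise ValueError("every item needs the same number of ratings")
        counts = Counter(item)
        totals.update(counts)
        # Fraction of agreeing judge pairs for this item.
        p_bar += (sum(c * c for c in counts.values()) - n_raters) / (n_raters * (n_raters - 1))
    p_bar /= n_items
    # Chance agreement from the overall category proportions.
    p_e = sum((t / (n_items * n_raters)) ** 2 for t in totals.values())
    if p_e == 1.0:  # only one category ever used: kappa undefined, treated as 1.0 here
        return p_bar, 1.0
    return p_bar, (p_bar - p_e) / (1.0 - p_e)


def collapse_to_s_ns(ratings):
    """Map three-way labels (S, N, P) onto sarcastic (S) vs. non-sarcastic (NS)."""
    return [["S" if label == "S" else "NS" for label in item] for item in ratings]
```

On a toy set of four tweets rated by three judges, `[["S","S","S"], ["N","N","N"], ["P","P","N"], ["S","N","S"]]`, the three-way kappa is 7/15 (about 0.47), while after collapsing to S-NS it rises to 23/35 (about 0.66): merging positive and negative removes disagreements between them, in line with the higher S-NS agreement on the slide.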