Semi-Supervised Recognition of Sarcastic Sentences in Twitter and Amazon. Dmitry Davidov, Oren Tsur, Ari Rappoport.

Presentation transcript:

Semi-Supervised Recognition of Sarcastic Sentences in Twitter and Amazon. Dmitry Davidov, Oren Tsur, Ari Rappoport

Sarcasm: Definition "Sarcasm is a sophisticated form of speech act in which the speakers convey their message in an implicit way." "The activity of saying or writing the opposite of what you mean, or of speaking in a way intended to make someone else feel stupid or angry." – Macmillan English Dictionary (2007)

Examples Twitter: "This is what I get to study tonight…! Yippy #sarcasm" "Ahhhh the feeling you get while driving back to boarding school. The best. #sarcasm" Amazon: "Finally pens for women! I don’t know what I have been doing all my life writing with men’s pens." "Defective by Design."

SASI – Semi-Supervised Sarcasm Identification Trains a classifier to recognize sarcastic patterns in a semi-supervised setting. The classifier then scores each sentence on a sarcasm scale from 1 (absence of sarcasm) to 5 (clearly sarcastic).

Seed data for Training (Amazon) 80 positive and 505 negative examples, extended to 471 positive and 5,020 negative examples using the Yahoo! BOSS API. Data was preprocessed to replace occurrences of authors, products, companies, book titles, usernames, and links with [AUTHOR], [PRODUCT], [COMPANY], [TITLE], [USER], and [LINK]. This reduces the specificity of the extracted patterns.

Seed data for Training (Twitter) Positive examples are the same as those used for Amazon (cross-domain); negative examples were hand annotated. Data was preprocessed to replace occurrences of usernames, links, and hash-tags with [USER], [LINK], and [HASHTAG].
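A minimal sketch of the placeholder substitution described on the two seed-data slides, shown for the Twitter case (the Amazon case also needs author, product, company, and title metadata, which is omitted here). The regular expressions and the function name are illustrative assumptions, not taken from the paper.

```python
import re

def preprocess_tweet(text: str) -> str:
    """Replace links, @-mentions, and hash-tags with the generic tags from
    the slides, so that extracted patterns stay domain-general."""
    text = re.sub(r"https?://\S+", "[LINK]", text)   # URLs -> [LINK]
    text = re.sub(r"@\w+", "[USER]", text)           # @mentions -> [USER]
    text = re.sub(r"#\w+", "[HASHTAG]", text)        # hash-tags -> [HASHTAG]
    return text

print(preprocess_tweet("@jane best start of a day ever, see http://example.com #sarcasm"))
# -> "[USER] best start of a day ever, see [LINK] [HASHTAG]"
```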

Testing data: Amazon product reviews for 120 products and 5.9 million tweets.

Pattern extraction Words were classified as high-frequency words (HFW) or content words (CW) based on corpus frequency: HFWs occur at least 100 times per million words and CWs at most 1,000 times per million. Patterns such as "[COMPANY] CW does not CW much" and "about CW CW or CW CW" are extracted.

Pattern extraction (contd.) To reduce the number of patterns: remove patterns that occur in only one review, and remove ambivalent patterns, i.e. patterns that are not indicative of either the sarcastic or the non-sarcastic class.
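A minimal sketch of the HFW/CW split and pattern generation described on the two slides above. The frequency table is a toy stand-in for real corpus counts, the thresholds are the ones quoted on the slide, and reducing pattern generation to sliding windows over the tagged sentence is a simplification of the paper's pattern definition.

```python
FREQ_PER_MILLION = {  # toy per-million frequencies; a real run would use corpus counts
    "does": 5000, "not": 8000, "much": 900, "about": 4000, "or": 9000,
    "battery": 40, "last": 60,
}

HFW_MIN = 100    # high-frequency word: at least 100 occurrences per million
CW_MAX = 1000    # content word: at most 1000 occurrences per million

def tag_word(word: str) -> str:
    """Keep HFWs verbatim and collapse content words into a CW slot.
    Words eligible for both ranges are treated as HFWs here, a simplification."""
    if word.startswith("[") and word.endswith("]"):
        return word                     # [COMPANY], [USER], ... kept as-is
    freq = FREQ_PER_MILLION.get(word.lower(), 0)
    if freq >= HFW_MIN:
        return word.lower()             # high-frequency word: kept verbatim
    if freq <= CW_MAX:
        return "CW"                     # content word: becomes a slot
    return word.lower()                 # cannot occur with these thresholds

def candidate_patterns(sentence: str, min_len: int = 3, max_len: int = 6):
    """Slide a window over the HFW/CW skeleton and keep windows mixing both kinds."""
    skeleton = [tag_word(w) for w in sentence.split()]
    for size in range(min_len, max_len + 1):
        for i in range(len(skeleton) - size + 1):
            window = skeleton[i:i + size]
            if "CW" in window and any(t != "CW" for t in window):
                yield " ".join(window)

# includes the slide example "[COMPANY] CW does not CW much" among its outputs
print(list(candidate_patterns("[COMPANY] battery does not last much")))
```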

Feature Vectors Each pattern is used as one element of the feature vector F = [p1, p2, p3, ..., pn], where pi = 1 for an exact match of pattern i, α for a sparse match, γ · (n/N) for an incomplete match (only n of the pattern's N components appear), and 0 for no match.
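A minimal sketch of these match weights. The values ALPHA = GAMMA = 0.1 are assumptions (the slide does not give them), "CW" components are taken to match any single word, and the ordered non-contiguous scan below is a greedy approximation of the paper's sparse and incomplete match rules.

```python
ALPHA = 0.1   # weight of a sparse match (assumed value)
GAMMA = 0.1   # weight of an incomplete match (assumed value)

def _matches(component: str, word: str) -> bool:
    return component == "CW" or component.lower() == word

def match_value(pattern: list[str], sentence: str) -> float:
    """Return the feature value p_i of one pattern for one sentence."""
    words = sentence.lower().split()
    N = len(pattern)

    # exact match: the pattern appears as a contiguous subsequence
    for i in range(len(words) - N + 1):
        if all(_matches(c, w) for c, w in zip(pattern, words[i:i + N])):
            return 1.0

    # greedy in-order scan for sparse / incomplete matches
    matched, pos = 0, 0
    for c in pattern:
        while pos < len(words) and not _matches(c, words[pos]):
            pos += 1
        if pos < len(words):
            matched += 1
            pos += 1
    if matched == N:
        return ALPHA                   # sparse match: all components, extra words in between
    if matched > 0:
        return GAMMA * matched / N     # incomplete match: n of N components, in order
    return 0.0                         # no match

def feature_vector(patterns: list[list[str]], sentence: str) -> list[float]:
    return [match_value(p, sentence) for p in patterns]

pattern = "[COMPANY] CW does not CW much".split()
print(match_value(pattern, "[COMPANY] battery does not last much"))   # 1.0
```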

Classification Algorithm Feature vectors are created for the seed data and the test data and compared. A vector v in the test set is assigned the label
Label(v) = (1/k) * [ Σ_{i=1..k} Count(Label(t_i)) * Label(t_i) ] / [ Σ_{j=1..k} Count(Label(t_j)) ],
where t_1, ..., t_k are the k seed vectors with the lowest Euclidean distance to v.
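A minimal sketch of this weighted k-nearest-neighbour step. It drops the leading 1/k factor shown on the slide so that the output stays on the 1-5 label scale, and it approximates Count(Label(t_i)) by the label's frequency in the seed set; both choices are simplifying assumptions, and the toy vectors and labels are illustrative only.

```python
import math
from collections import Counter

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def knn_label(v, seed_vectors, seed_labels, k=5):
    """Weighted average of the labels of the k seed vectors closest to v."""
    label_counts = Counter(seed_labels)            # stand-in for Count(Label(t_i))
    nearest = sorted(range(len(seed_vectors)),
                     key=lambda i: euclidean(v, seed_vectors[i]))[:k]
    num = sum(label_counts[seed_labels[i]] * seed_labels[i] for i in nearest)
    den = sum(label_counts[seed_labels[i]] for i in nearest)
    return num / den if den else 0.0

# toy usage: three-dimensional feature vectors, labels on the 1-5 sarcasm scale
seeds = [[1.0, 0.1, 0.0], [0.9, 0.0, 0.1], [0.0, 0.0, 0.0], [0.1, 0.0, 0.0]]
labels = [5, 4, 1, 1]
print(round(knn_label([0.8, 0.1, 0.0], seeds, labels, k=3), 2))   # 2.75
```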

Baseline and Evaluation The baseline for the Amazon set marks as sarcastic those reviews that combine a low star rating with strongly positive wording. For the Twitter set, 1,500 tweets tagged with #sarcasm served as a (noisy) gold standard. Five-fold cross-validation was performed. In addition, a random sample of 90 positively and 90 negatively ranked sentences from the test data was annotated via Mechanical Turk (κ = 0.34 for Amazon, κ = 0.41 for Twitter).
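The agreement figures above are kappa statistics. The sketch below computes Cohen's kappa for two raters on toy labels; the paper's Mechanical Turk setting, with several annotators per item, would more likely use Fleiss' kappa. All data here is illustrative, not the paper's annotations.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two annotators."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# toy binary sarcastic / not-sarcastic judgements from two annotators
a = [1, 1, 0, 1, 0, 0, 1, 0]
b = [1, 0, 0, 1, 0, 1, 1, 0]
print(round(cohen_kappa(a, b), 2))   # 0.5
```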

Five-Fold Evaluation (Amazon)

Five-Fold Evaluation (Twitter)

Final evaluation results