BEHAVIORAL PREDICTION OF TWITTER USERS BASED ON TEXTUAL INFORMATION Shiyao Wang.



Viral Event: Ice Bucket Challenge
- Purpose: to promote awareness of amyotrophic lateral sclerosis (ALS)
- Major activity: dump a bucket of ice water on someone’s head and encourage donations towards ALS research
- Rule: an individual can challenge others (usually 3) to take the challenge. Whoever receives the challenge can either take it within 24 hours or make a donation to the ALS research foundation. In most cases people take the challenge before nominating others.
- Went viral on social networking sites (SNS)

Our Goal
- Analyze the spread pattern of this event, primarily on Twitter
- Classify user behavior based on tweets
- Look for potential correlations between the information cascade and offline behaviors within the Twitter network
- Perform further analysis on this rich data set

Data
- [...] million tweets purchased from Gnip, a third-party Twitter data provider
- All tweets contain keywords or hashtags related to the Ice Bucket Challenge
- Among all tweets, 5.44 million were original
- A total of 5.56 million users were included, and 2.51 million of them published original tweets

Text-based Classifier for User Behavior
- Goal: predict whether the user has taken the Ice Bucket Challenge (IBC)
- Data: tweets related to the IBC (text containing keywords or hashtags)

Initial Approach
- Manual labeling:
  - Identify whether there are strong signs of the user taking the challenge
  - Based on both the tweet text and the attached multimedia information (primarily URLs linking to other SNS)
  - Method debatable
- Feature selection:
  - Keywords (first person, third person, take, nominate, etc.)
  - N-grams
  - URL type (type of web page being linked to)
  - User statistics (number of followers/followees, etc.)
  - Other features

Current Approach
- Feature selection:
  - Keyword replacement in tweet text:
    - URLs were checked and converted into keywords such as URL_S (URL linking to an SNS)
    - Hashtags were converted to HASHTAG (or HASHTAG_CH if they contain IBC-related keywords)
    - Mentions were converted to MENTION
  - N-grams based on the modified tweet text
  - POS tags based on the modified tweet text
  - Roughly [...] features in total
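The keyword replacement step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the author's code: the keyword list `IBC_KEYWORDS`, the domain list `SNS_DOMAINS`, the fallback token `URL` for non-SNS links, and all function names are hypothetical, and the sketch only inspects the URL string, whereas the slides say URLs were "checked" (which may have involved resolving shortened links).

```python
import re

# Hypothetical lists; the slides do not give the actual keywords or domains.
IBC_KEYWORDS = ("icebucket", "strikeoutals")
SNS_DOMAINS = ("instagram.com", "facebook.com", "youtube.com", "vine.co")

URL_RE = re.compile(r"https?://\S+")
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@\w+")


def _url_token(match):
    # URL_S marks a link to a social networking site; plain "URL" is an assumed fallback.
    url = match.group(0).lower()
    return "URL_S" if any(domain in url for domain in SNS_DOMAINS) else "URL"


def _hashtag_token(match):
    # HASHTAG_CH marks hashtags that contain a challenge-related keyword.
    tag = match.group(1).lower()
    return "HASHTAG_CH" if any(key in tag for key in IBC_KEYWORDS) else "HASHTAG"


def normalize_tweet(text):
    """Replace URLs, hashtags and mentions with placeholder keywords."""
    text = URL_RE.sub(_url_token, text)  # URLs first, so '#' fragments are not treated as hashtags
    text = HASHTAG_RE.sub(_hashtag_token, text)
    text = MENTION_RE.sub("MENTION", text)
    return text


def word_ngrams(text, n):
    """Return whitespace-token n-grams of the (already normalized) text."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

For example, `word_ngrams(normalize_tweet(tweet), 2)` would produce bigram features over the modified text, so that all SNS links or all challenge hashtags map to the same feature instead of thousands of distinct strings.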

Previous Toy Classifier
- Data downloaded using the Twitter API
- 580 tweets included, of which 155 were labeled as positive (26.7%)
- Best result given by NaiveBayes with 10-fold CV
- Positive class F-Measure: [...]
- ROC: 0.924

Real Data Classifier
- Randomly selected 500 original tweets from the database
- Manual labeling was performed, including opening links from tweets to find signs of taking the challenge (different from the toy classifier)
- 58 of the 500 tweets were labeled as positive (11.6%)

Classifier Building
- Various classifiers were tested, including NaiveBayes, Random Forest/Tree, J48, Logistic, SMO, SVM, etc.
- Oversampling of positive instances in the training set was implemented; different ratios between positive and negative instances were tested
- Manual cross-validation was implemented
- NaiveBayes and SVM with a linear kernel work best at this point
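One way to combine manual cross-validation with oversampling is to duplicate positive instances inside each training fold only, so that no copy of a test tweet leaks into training. The sketch below is an assumption about how this could be done in plain Python, not the procedure actually used; the function names, the `ratio` parameter (target positives per negative) and the random duplication strategy are all hypothetical.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Split instance indices into k folds with a similar positive rate in each."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin across folds
    return folds


def oversample_positives(indices, labels, ratio, seed=0):
    """Duplicate positives at random until there are about ratio * (number of negatives)."""
    rng = random.Random(seed)
    pos = [i for i in indices if labels[i] == 1]
    neg = [i for i in indices if labels[i] == 0]
    if not pos:
        return list(indices)
    target = int(ratio * len(neg))
    extra = [rng.choice(pos) for _ in range(max(0, target - len(pos)))]
    return neg + pos + extra


def cross_validation_splits(labels, k=10, ratio=1.0, seed=0):
    """Yield (train_indices, test_indices); only the training part is oversampled."""
    folds = stratified_folds(labels, k, seed)
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield oversample_positives(train, labels, ratio, seed + f), test
```

Each yielded pair can then be fed to whichever classifier is being evaluated; varying `ratio` corresponds to testing different positive-to-negative ratios, while the test folds keep the original class balance.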

Results

Problems
1. Labeling method is debatable
2. Highly unbalanced dataset and small number of instances
3. Weka’s ROC and the manually calculated ROC (based on Python’s sklearn) were slightly different
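When two tools report slightly different ROC values, an independent calculation can serve as a reference point. The sketch below computes the area under the ROC curve directly from its pairwise definition; it is a hypothetical helper, not Weka's or sklearn's implementation, and the slides do not state why the two values differed. Details such as how tied scores are treated are one place where implementations can diverge, which is why the tie rule is made explicit here.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

The double loop is quadratic in the number of instances, which is acceptable for a few hundred labeled tweets; a rank-based version would be preferable for much larger sets.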