Who Needs Polls? Gauging Public Opinion from Twitter Data
David Cummings, Haruki Oh, Ningxuan (Jason) Wang


From Tweets to Poll Numbers
Motivation: Millions of dollars are spent on polling every year, covering politics, the economy, and entertainment. Meanwhile, millions of posts appear on Twitter every day. Can we model public opinion using tweets?
Data: 476 million tweets from June to December 2009, courtesy of Jure Leskovec; public polls from The Gallup Organization (presidential approval, economic confidence) and Rasmussen Reports (generic Congressional ballot).
Goal: high correlation with public opinion polls. All correlation figures use a 6-day smoothing window unless a different window is stated.

Approach 1: Volume
The simplest metric: the percentage of tweets that mention a given topic in a certain time window.
Moderate negative correlation (-36.3%, -35.7%) for the economy and the Congressional ballot. A possible explanation: people mention a topic more often when they want to complain about it.
Higher correlation (52.4%) for Obama.
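The volume metric can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes tweets arrive as (date, text) pairs, uses a one-day window, and treats a case-insensitive substring match as a "mention". The function name `daily_volume_share` and the sample tweets are made up for the example.

```python
from collections import Counter
from datetime import date


def daily_volume_share(tweets, keywords):
    """Return {day: fraction of that day's tweets mentioning any keyword}.

    Assumption: a "mention" is a case-insensitive substring match.
    """
    totals = Counter()
    hits = Counter()
    lowered = [k.lower() for k in keywords]
    for day, text in tweets:
        totals[day] += 1
        text_l = text.lower()
        if any(k in text_l for k in lowered):
            hits[day] += 1
    return {day: hits[day] / totals[day] for day in sorted(totals)}


# Made-up sample data, for illustration only.
tweets = [
    (date(2009, 6, 1), "Obama speaks on health care today"),
    (date(2009, 6, 1), "lunch was great"),
    (date(2009, 6, 2), "the economy is rough"),
    (date(2009, 6, 2), "jobs report out, economy still weak"),
    (date(2009, 6, 2), "watching a movie"),
    (date(2009, 6, 2), "obama on tv again"),
]
share = daily_volume_share(tweets, ["obama"])
```

The resulting daily series is what would then be smoothed and compared against the poll series.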

Approach 2: Generic Sentiment
Can we distinguish between positive and negative sentiment of tweets?
We use the subjective polarity lexicon from OpinionFinder (University of Pittsburgh), scoring words as follows:

Word             Polarity          Score
“conceited”      strong negative   -10
“ironic”         weak negative     -5
“trendy”         weak positive     +5
“illuminating”   strong positive   +10

Sum the word scores for a tweet to classify it as positive, negative, or neutral; then subtract the negative count from the positive count and normalize over the window.
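A rough sketch of this scoring scheme is shown below. The four lexicon entries are the examples from the slide; the real lexicon is far larger. The tokenizer, the function names, and the choice to normalize by the total number of tweets in the window are assumptions made for illustration.

```python
import re

# Toy lexicon: only the four example words from the slide.
LEXICON = {"conceited": -10, "ironic": -5, "trendy": 5, "illuminating": 10}


def classify_by_lexicon(text, lexicon=LEXICON):
    """Sum word scores; the sign of the total decides the label."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(lexicon.get(w, 0) for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def net_sentiment(texts, lexicon=LEXICON):
    """(positive count - negative count) / tweets in the window.

    Assumption: "normalize over window" means dividing by the window's
    total tweet count.
    """
    labels = [classify_by_lexicon(t, lexicon) for t in texts]
    if not labels:
        return 0.0
    return (labels.count("positive") - labels.count("negative")) / len(labels)


# Made-up window of tweets, for illustration only.
window = [
    "such a trendy and illuminating talk",  # +15 -> positive
    "he is so conceited",                   # -10 -> negative
    "ironic but trendy",                    # -5 + 5 = 0 -> neutral
    "nothing to see here",                  # no lexicon words -> neutral
    "trendy cafe downtown",                 # +5 -> positive
]
score = net_sentiment(window)  # (2 - 1) / 5
```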

Approach 2: Generic Sentiment
Good results on economic confidence: 60.4% correlation, or 70.1% correlation on a 15-day window.
Poor performance on presidential approval and the Congressional ballot: -24.5% and 21.5% correlation, respectively.
Is sentiment about politics expressed differently?
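The correlation figures depend on how the daily series is smoothed before it is compared with the polls. The slides do not say which smoothing was used, so the sketch below assumes a trailing moving average and a plain Pearson correlation; the function names and the numbers are invented for the example and are not results from the project.

```python
from math import sqrt


def trailing_mean(values, window):
    """Average each point with up to (window - 1) preceding points."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up daily sentiment scores and poll values, for illustration only.
toy_sentiment = [0.10, -0.05, 0.20, 0.15, 0.30, 0.05, 0.25, 0.40]
toy_poll = [50.0, 50.5, 51.0, 51.0, 52.0, 52.5, 52.0, 53.0]

r_6_day = pearson(trailing_mean(toy_sentiment, 6), toy_poll)
```

A longer window averages out more day-to-day noise, which is one plausible reason the 15-day figures are higher than the 6-day ones.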

Approach 3: LM-based Classification
Train three language models (positive, negative, and neutral) on hand-classified data.
Classify each tweet according to the language model that assigns it the highest probability.
Applied to the case of Obama: we manually classified 3,633 tweets, for example:
“can we all talk about how awesome Obama is?”
“that Obama sticker on your car might as well say ‘Yes I’m stupid’ #tcot #iamthemob #teaparty #glennbeck”
We then tested the language models: the best performer was a linearly interpolated bigram model.
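One way such a classifier could look is sketched below: one linearly interpolated bigram model per class, with each tweet assigned to the class whose model gives it the highest log-probability. The interpolation weight (0.7), the add-one smoothing of the unigram term, whitespace tokenization, and all names are assumptions; the slides do not give these details. The training tweets are invented.

```python
import math
from collections import Counter


class InterpolatedBigramLM:
    """P(w | prev) = lam * P_bigram(w | prev) + (1 - lam) * P_unigram(w)."""

    def __init__(self, lam=0.7):  # lam is an assumed value, not from the slides
        self.lam = lam
        self.unigrams = Counter()
        self.bigrams = Counter()
        self.history = Counter()  # how often each token appears as a bigram history
        self.total = 0

    def train(self, texts):
        for text in texts:
            tokens = ["<s>"] + text.lower().split() + ["</s>"]
            self.unigrams.update(tokens[1:])
            self.total += len(tokens) - 1
            for prev, cur in zip(tokens, tokens[1:]):
                self.bigrams[(prev, cur)] += 1
                self.history[prev] += 1

    def log_prob(self, text, vocab_size):
        tokens = ["<s>"] + text.lower().split() + ["</s>"]
        total_log_prob = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            # Add-one smoothed unigram keeps unseen words at nonzero probability.
            p_uni = (self.unigrams[cur] + 1) / (self.total + vocab_size)
            seen = self.history[prev]
            p_bi = self.bigrams[(prev, cur)] / seen if seen else 0.0
            total_log_prob += math.log(self.lam * p_bi + (1 - self.lam) * p_uni)
        return total_log_prob


def train_models(labeled):
    """labeled maps a class name to a list of hand-classified tweets."""
    models, vocab = {}, set()
    for label, texts in labeled.items():
        lm = InterpolatedBigramLM()
        lm.train(texts)
        models[label] = lm
        vocab.update(lm.unigrams)
    return models, len(vocab) + 1  # +1 reserves mass for unknown words


def classify_tweet(text, models, vocab_size):
    """Pick the class whose model assigns the tweet the highest log-probability."""
    return max(models, key=lambda label: models[label].log_prob(text, vocab_size))


# Invented training data, for illustration only.
models, vocab_size = train_models({
    "positive": ["obama is awesome", "love obama today"],
    "negative": ["obama is stupid", "hate obama so much"],
    "neutral": ["obama speaks at noon", "obama visits ohio"],
})
label = classify_tweet("obama is awesome", models, vocab_size)
```

Comparing log-probabilities rather than raw probabilities avoids numeric underflow on longer tweets.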

Approach 3: LM-based Classification
Much-improved results on presidential approval: 49.4% correlation.
Throwing out retweets and duplicate tweets helps a little more: 55.9% correlation.
Finally, combining both volume and LM-based sentiment gives the best results: 63.3% correlation, or 69.6% correlation on a 15-day window.
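The retweet and duplicate filtering step might look roughly like the sketch below. The slides do not describe how retweets or duplicates were detected, so both heuristics here are assumptions: a tweet counts as a retweet if it starts with "RT", and two tweets count as duplicates if they match after lowercasing, URL removal, and whitespace collapsing. All names and sample tweets are hypothetical.

```python
import re


def normalize(text):
    """Lowercase, drop URLs, and collapse whitespace for duplicate detection."""
    text = re.sub(r"https?://\S+", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def is_retweet(text):
    """Heuristic: the tweet begins with the token "RT" (any case)."""
    return re.match(r"\s*rt\b", text, flags=re.IGNORECASE) is not None


def drop_retweets_and_duplicates(texts):
    """Keep the first occurrence of each distinct non-retweet, in order."""
    seen = set()
    kept = []
    for text in texts:
        if is_retweet(text):
            continue
        key = normalize(text)
        if key in seen:
            continue
        seen.add(key)
        kept.append(text)
    return kept


# Made-up sample tweets, for illustration only.
sample = [
    "Obama rocks",
    "RT @bob: Obama rocks",
    "obama   rocks",
    "Economy news http://x.co/1",
    "art is nice",
]
kept = drop_retweets_and_duplicates(sample)
```

Filtering before classification keeps one widely repeated message from dominating a day's sentiment count.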
