Prediction and sentiment analysis Mahsa Elyasi.

Word Salad: Relating Food Prices and Descriptions V. Chahuneau, K. Gimpel, B. R. Routledge, L. Scherlis, N. A. Smith

Motivation 2 pcs chicken meal: $4.99. Chicken Quesadillas, made with fresh salsa, Jack and Cheddar cheese: $6.99. Caesar Salad, romaine hearts, croutons, shaved Parmesan cheese and classic Caesar dressing: $9.95. Poulet Cajun: $28.00.

Data Location (city, neighborhood) Services available (delivery, wifi) Ambience (good for groups, noise level) Price range ($ to $$$$) 7 U.S. cities

Data Distribution of prices & stars

Models Linear regression, logistic regression Features: – METADATA: – MENUNAMES: n-grams – MENUDESC: n-grams – MENTION: n-grams (word + ITEM + word)

Item price prediction Predict the price of each item on a menu

Item price prediction Baselines – Predict mean – Predict median – Regression Evaluation – Mean absolute error – Mean relative error Item’s price = w * x
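
The baselines and error measures on this slide can be sketched in a few lines of Python. This is only an illustration: the training prices are the four menu prices from the Motivation slide, the test prices are made up, and the definition of mean relative error as the average of |actual - predicted| / actual is an assumption, since the slide does not spell it out. `linear_price` stands in for "item's price = w * x" with hypothetical weights and features.

```python
from statistics import mean, median


def mean_absolute_error(actual, predicted):
    """Average of |actual - predicted| over all items."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_relative_error(actual, predicted):
    """Average of |actual - predicted| / actual (assumed definition)."""
    return sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


def linear_price(weights, features):
    """Dot product w * x, as in the slide's regression formula."""
    return sum(w * x for w, x in zip(weights, features))


# Prices taken from the Motivation slide, used here as a toy training set.
train_prices = [4.99, 6.99, 9.95, 28.00]
# Hypothetical held-out prices, for illustration only.
test_prices = [5.00, 10.00]

mean_baseline = [mean(train_prices)] * len(test_prices)
median_baseline = [median(train_prices)] * len(test_prices)

mae_mean = mean_absolute_error(test_prices, mean_baseline)
mae_median = mean_absolute_error(test_prices, median_baseline)
mre_median = mean_relative_error(test_prices, median_baseline)
```

With skewed prices like these, the median baseline lands closer to typical items than the mean baseline does, which is one reason to report both.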

Item price prediction [Results table; values not preserved in the transcript. Columns: errors in $ and in %, number of features with non-zero weight, total number of features]

Item price prediction MENUDESC-authenticity

Item price prediction MENUDESC-size

Price range prediction For each restaurant on its Yelp page McCullagh ordinal regression
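
As a rough sketch of how an ordinal regression of the McCullagh (proportional odds) type turns a linear score into probabilities over ordered classes such as $ to $$$$: the model assumes P(y <= j) = sigmoid(theta_j - score) with increasing thresholds theta_j. The threshold values and scores below are hypothetical, and nothing here reproduces the fitted model from the paper.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def ordinal_probs(score, thresholds):
    """Class probabilities under a cumulative logit (proportional odds) model.

    thresholds must be increasing; len(thresholds) + 1 classes are returned.
    """
    cumulative = [sigmoid(t - score) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(y = j) = P(y <= j) - P(y <= j - 1)
        previous = c
    return probs


# Hypothetical thresholds separating the four price ranges $, $$, $$$, $$$$.
thresholds = [-1.0, 0.0, 1.0]
neutral = ordinal_probs(0.0, thresholds)
expensive = ordinal_probs(3.0, thresholds)
```

A larger score (for example from feature weights on menu text) shifts probability mass toward the higher price ranges while the class ordering is preserved.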

Polarity prediction

Joint price star prediction

From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series B. O’Connor, R. Balasubramanyan, B. R. Routledge, N. A. Smith

Measuring public opinion through social media?

Text Data: Twitter Twitter is large and public Sources – Archiving the Twitter Streaming API – Scrape of earlier messages via API Sizes – 0.7 billion messages, Jan 2008 – Oct 2009 – 1.5 billion messages, Jan 2008 – May 2010 Issues: identifying user location, message language, and age; misleading information; the user population is changing; Republicans are less likely to use social media for political purposes

Poll Data Consumer confidence – Index of Consumer Sentiment (ICS) – Gallup Daily 2008 Presidential Elections – Pollster.com 2009 Presidential Job Approval – Gallup Daily

Text Analysis Message retrieval – Identify messages relating to the topic. Consumer confidence: job, jobs, economy. Presidential approval: obama. Election: obama, mccain. Opinion estimation – Positive opinion – Negative opinion Slide annotations: news; lying; can vote; location; age; informal language; weak word = strong word; weight
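
The message retrieval step can be pictured as plain keyword matching. The sketch below uses the topic keywords listed on the slide; the tokenizer and the function name `topics_of` are assumptions made for illustration, not the authors' code.

```python
import re

# Topic keywords as listed on the slide.
TOPIC_KEYWORDS = {
    "consumer confidence": {"job", "jobs", "economy"},
    "presidential approval": {"obama"},
    "election": {"obama", "mccain"},
}


def tokenize(message):
    """Lowercase the message and split it into alphabetic tokens (assumed)."""
    return re.findall(r"[a-z']+", message.lower())


def topics_of(message):
    """Return the sorted list of topics whose keywords appear in the message."""
    tokens = set(tokenize(message))
    return sorted(
        topic for topic, keywords in TOPIC_KEYWORDS.items() if tokens & keywords
    )
```

For example, `topics_of("Lost my job today")` returns `["consumer confidence"]`, and a message mentioning "obama" matches both the approval and the election topic.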

Sentiment analysis: word counting Within topical messages, count messages containing positive and negative words. Lexicon: words marked as + or –. This list is not well suited for social media English – “sucks”, “:)”, “:(”
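
A minimal sketch of the word-counting idea, assuming a message counts as positive if it contains any positive lexicon word and as negative if it contains any negative one (so a single message can count as both). The tiny lexicon here is hypothetical and stands in for the real word list, which the slide does not reproduce.

```python
# Hypothetical mini-lexicon, for illustration only.
POSITIVE_WORDS = {"good", "great", "hope"}
NEGATIVE_WORDS = {"bad", "lost", "fear"}


def polarity_flags(message):
    """Return (has_positive_word, has_negative_word) for one message."""
    tokens = set(message.lower().split())
    return bool(tokens & POSITIVE_WORDS), bool(tokens & NEGATIVE_WORDS)


def count_polar_messages(messages):
    """Count messages with at least one positive / negative lexicon word."""
    positive = negative = 0
    for message in messages:
        has_pos, has_neg = polarity_flags(message)
        positive += has_pos
        negative += has_neg
    return positive, negative


topical_messages = [
    "great jobs report",
    "lost my job",
    "good news and bad news",
    "jobs",
]
pos_count, neg_count = count_polar_messages(topical_messages)
```

Tokens such as “sucks” or emoticons would be missed by a lexicon like this one, which is the limitation the slide points out.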

Sentiment ratio over Messages For one day t and topic word, compute score
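
The formula on this slide did not survive in the transcript. Going by the slide title, a plausible reading is that the day's score is the number of topical messages with a positive word divided by the number with a negative word; the sketch below assumes exactly that, and the daily counts are invented for illustration.

```python
def sentiment_ratio(positive_count, negative_count):
    """Ratio of positive to negative message counts for one day and topic.

    Returns None when there are no negative messages, since the ratio is
    undefined in that case (a design choice of this sketch).
    """
    if negative_count == 0:
        return None
    return positive_count / negative_count


# Hypothetical (positive, negative) counts per day for one topic word.
daily_counts = {
    "day 1": (30, 20),
    "day 2": (12, 24),
    "day 3": (5, 0),
}
daily_ratio = {
    day: sentiment_ratio(pos, neg) for day, (pos, neg) in daily_counts.items()
}
```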

Sentiment Ratio Moving Average High day-to-day volatility: average over the last k days. Keyword “jobs”, k = 1, 7, 30, compared with Gallup
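
Averaging over the last k days can be sketched as a trailing moving average. How the first k - 1 days are handled is not stated on the slide; this sketch simply averages over however many days are available so far.

```python
def moving_average(series, k):
    """Trailing moving average: each value is the mean of the last k entries.

    For the first k - 1 positions, the window is shorter (assumption).
    """
    smoothed = []
    for t in range(len(series)):
        window = series[max(0, t - k + 1): t + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


# Hypothetical daily sentiment ratios.
daily = [1.0, 2.0, 3.0, 4.0]
unsmoothed = moving_average(daily, 1)
two_day = moving_average(daily, 2)
```

With k = 1 the series is unchanged; larger k trades responsiveness for a smoother curve, which is the contrast the slide draws between k = 1, 7 and 30.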

Correlation Analysis: smoothed comparisons, “jobs” sentiment Figure annotations: stock market goes down (marked twice); stock market goes up

Predicting polls L + k days are necessary to cover the start of the text sentiment window. Text sentiment is a poor predictor of consumer confidence.
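
One way to check whether sentiment leads the polls is to correlate the (smoothed) sentiment on day t with the poll value on day t + L. The sketch below computes a Pearson correlation at a given lag; the two series are invented, and it illustrates the idea rather than reproducing the paper's forecasting setup.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def lagged_correlation(sentiment, poll, lag):
    """Correlate sentiment[t] with poll[t + lag]; both series share a calendar."""
    if lag > 0:
        return pearson(sentiment[:-lag], poll[lag:])
    return pearson(sentiment, poll)


# Hypothetical series in which the poll follows sentiment two days later.
sentiment = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
poll = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
r_lag0 = lagged_correlation(sentiment, poll, 0)
r_lag2 = lagged_correlation(sentiment, poll, 2)
```

Because the lag shortens the overlapping span, and smoothing with window k shortens it further, the usable series starts L + k days in, which matches the remark on the slide.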

Presidential elections and job approval Looks easy: simple decline, r = 72.5%, k = 15. The sentiment ratio correlates negatively with the election: r = -8%

"I Wanted to Predict Elections with Twitter and all I got was this Lousy Paper" -- A Balanced Survey on Election Prediction using Twitter Data D Gayo-Avello

Flaws in using Twitter Data for Election Prediction It’s not prediction at all. Chance is not a valid baseline. There is no commonly accepted way of “counting votes” in Twitter. There is no commonly accepted way of interpreting reality. Sentiment analysis methods are only slightly better than random classifiers. All the tweets are assumed to be trustworthy. Demographics are neglected. Self-selection bias is simply ignored.

Recommendations for using Twitter Data for Election Prediction There are elections virtually all the time, so if you claim to have a prediction method, you should predict an election in the future! Check how much influence incumbency has in the elections you are trying to predict. Your baseline should not be chance but predicting that the incumbent will win. Apply that baseline to prior elections. Not all elections are as important as a presidential election. Only a small amount of data is available.

Recommendations for using Twitter Data for Election Prediction Clearly define what counts as a “vote” and provide sound and compelling arguments supporting your definition. Clearly define the golden truth you are using: use the “real thing”. How do you filter your data? Why are you using some of the users and not others?

Recommendations for using Twitter Data for Election Prediction Sentiment analysis is a core task. – We should first work on sentiment analysis in politics before trying to predict elections. Credibility should be a major concern. – Remove spammers

Recommendations for using Twitter Data for Election Prediction Adjust your prediction according to: – the participation of the different groups in prior elections of the kind you are trying to predict – the membership of users in each of those groups. The silent majority is a huge problem.
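
One simple way to picture such an adjustment is to reweight group-level support by each group's share of actual voters rather than by its share of Twitter users. The slide does not prescribe a method, so this is only an assumed illustration; the groups and all numbers are hypothetical.

```python
def weighted_share(support_by_group, weight_by_group):
    """Weighted average of group-level support, normalised by total weight."""
    total_weight = sum(weight_by_group.values())
    weighted = sum(
        support_by_group[group] * weight
        for group, weight in weight_by_group.items()
    )
    return weighted / total_weight


# Hypothetical support for a candidate within each age group on Twitter.
support = {"18-29": 0.7, "30+": 0.4}
# Hypothetical share of each group among Twitter users in the sample.
twitter_share = {"18-29": 0.6, "30+": 0.4}
# Hypothetical share of each group among voters in the prior election.
turnout_share = {"18-29": 0.2, "30+": 0.8}

raw_estimate = weighted_share(support, twitter_share)
adjusted_estimate = weighted_share(support, turnout_share)
```

In this toy case the raw estimate overstates support because the younger group is over-represented on Twitter relative to its turnout. Users who never post, the silent majority, remain invisible to any such reweighting.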

Relevant prior Art Modeling Public Mood and Emotion: Twitter Sentiment and Socio-Economic Phenomena Bollen, J., Pepe, A., and Mao, H. – Definition of data and mood assessment – Data cleaning, parsing and normalization – Time series production: aggregation of POMS mood scores over time; application of mood (not sentiment). This paper does not describe any predictive method. It used the 2008 U.S. election of Obama, but no conclusions are inferred regarding the predictability of elections. Bollen: “we assess the validity of our sentiment analysis by examining the effects of particular events, namely the U.S. Presidential election of November 4, 2008, and the Thanksgiving holiday in the U.S., on our time series.”

Relevant prior Art Paper 2 (From Tweets to Polls): no correlation was found between electoral polls and Twitter sentiment data

Relevant prior Art Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment Tumasjan, A., Sprenger, T.O., Sandner, P.G., and Welpe, I.M. – Used LIWC to analyze the tweets related to the different parties running (German 2009 election) – The mere count of tweets mentioning a party or candidate accurately predicted the election results – They claim that the MAE of the “prediction” based on Twitter data was rather close to that of actual polls.

Relevant prior Art Why the Pirate Party Won the German Election of 2009 or The Trouble With Predictions: A Response to previous slide Jungherr, A., Jürgens, P., and Schoen, H. – The method by Tumasjan et al. was based on arbitrary choices, taking into account not all the parties running in the elections but just those represented in congress – Results varied depending on the time window used to compute them.

Relevant prior Art Where There is a Sea There are Pirates: A Response to previous slide Tumasjan, A., Sprenger, T.O., Sandner, P.G., and Welpe, I.M. Twitter data is not meant to replace polls but to complement them

Relevant prior Art Understanding the Demographics of Twitter Users Mislove, A., Lehmann, S., Ahn, Y.Y., Onnela, J.P., and Rosenquist, J.N. The methods applied are simple but quite compelling. All of the data was inferred from the users’ profiles. This is consistent with some of the findings of Gayo-Avello [8]