Forecasting with Twitter data
Presented by: Thusitha Chandrapala (20064923)
Paper by: Marta Arias, Argimiro Arratia, and Ramon Xuriguera

What information do Twitter messages carry?
Twitter information
  ▫ Sentiment analysis: are people happy or unhappy about a certain topic?
  ▫ Volume: the number of tweets about a given topic
Does Twitter really help in predicting time series data?
  ▫ A moving stream of information

Motivation of the paper
Use three different forecasting model families, vary their parameters systematically, and analyze under which conditions Twitter information is actually useful
Test for non-linearity and causality between the Twitter data and the target
Introduce the summary tree

Related work
Stock market prediction
  ▫ Bollen et al.
      ▪ Twitter -> sentiment -> predict the Dow Jones Industrial Average
  ▫ Wolfram et al.
      ▪ Twitter as an additional source of features, no sentiment analysis
Movie box office income
  ▫ Mishne et al.
      ▪ Correlation, blog posts
  ▫ Asur et al.
      ▪ Predict sales

Workflow
1) Collecting data
2) Cleaning and preprocessing
3) Sentiment analysis
4) Prediction model

Preprocessing
Language detection
Negation handling: “I like this…” and “I don’t like this…” are treated as two distinct features
Relevance filtering and topic classification using LDA
  ▫ Latent Dirichlet Allocation
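The negation-handling step can be illustrated with a small sketch. The slide does not say how negated phrases are turned into separate features, so the `NOT_` prefix scheme, the `NEGATIONS` word list, and the function name `mark_negation` below are all assumptions chosen for illustration.

```python
import re

# Hypothetical negation word list; the slides do not specify one.
NEGATIONS = {"not", "no", "never", "don't", "didn't", "can't", "won't", "isn't"}
PUNCTUATION = {".", ",", "!", "?", ";", ":"}


def mark_negation(text):
    """Prefix every token after a negation word with NOT_ until punctuation."""
    # Normalise curly apostrophes so "don’t" and "don't" are the same token.
    text = text.replace("\u2019", "'").lower()
    tokens = re.findall(r"[a-z']+|[.,!?;:]", text)
    out, negated = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negated = False          # negation scope ends at punctuation
            out.append(tok)
        elif tok in NEGATIONS:
            negated = True           # start a negation scope
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out


print(mark_negation("I like this"))
print(mark_negation("I don't like this"))
```

With this scheme "like" and "NOT_like" become two different features, so a downstream classifier can learn opposite weights for them.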

Sentiment classification
Whether the text contains a negative or positive impression of a given subject
Approach 1:
  ▫ Automatic tagging to extract training instances
      ▪ :) :D - happy sentiment
      ▪ :( - unhappy sentiment
  ▫ Binary classification problem: train a naïve Bayes classifier
  ▫ Use different dictionaries as features
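A minimal sketch of Approach 1, assuming a plain bag-of-words multinomial naïve Bayes with add-one smoothing. The slides only state that emoticons supply the labels and that naïve Bayes is the classifier, so the tokenisation, the smoothing, and every function name here are illustrative assumptions, and the example tweets are invented.

```python
import math
from collections import Counter

HAPPY = (":)", ":D")
UNHAPPY = (":(",)


def emoticon_label(tweet):
    """Return 'pos', 'neg', or None when there is no (or a mixed) emoticon signal."""
    has_pos = any(e in tweet for e in HAPPY)
    has_neg = any(e in tweet for e in UNHAPPY)
    if has_pos == has_neg:
        return None
    return "pos" if has_pos else "neg"


def tokens(tweet):
    """Remove the labelling emoticons so the classifier cannot simply memorise them."""
    for e in HAPPY + UNHAPPY:
        tweet = tweet.replace(e, " ")
    return tweet.lower().split()


def train_nb(tweets):
    """Count words per class, using emoticons as automatic labels."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for t in tweets:
        label = emoticon_label(t)
        if label is not None:
            doc_counts[label] += 1
            word_counts[label].update(tokens(t))
    return word_counts, doc_counts


def classify(model, tweet):
    """Pick the class with the highest log-posterior (assumes both classes were seen)."""
    word_counts, doc_counts = model
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("pos", "neg"):
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for w in tokens(tweet):
            if w in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Invented example tweets, for illustration only.
training = ["love this phone :)", "great movie :D", "hate the delay :(", "terrible service :("]
model = train_nb(training)
print(classify(model, "love this movie"))
print(classify(model, "terrible delay"))
```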

Sentiment index
A time series of sentiment values
  ▫ The daily value is the percentage of positive or negative tweets out of the total number of messages on a specific topic that day
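As a rough sketch, the daily index could be computed as below. The slide does not give the exact formula, so using the share of positive tweets per day is an assumption, as are the function name and the input layout; the sample rows are invented.

```python
from collections import defaultdict


def daily_series(labelled_tweets):
    """Build two daily series from (date, label) pairs, label being 'pos' or 'neg'.

    Returns (volume, sentiment_index): tweets per day and percent positive per day.
    """
    total = defaultdict(int)
    positive = defaultdict(int)
    for day, label in labelled_tweets:
        total[day] += 1
        if label == "pos":
            positive[day] += 1
    days = sorted(total)
    volume = {d: total[d] for d in days}
    sentiment_index = {d: 100.0 * positive[d] / total[d] for d in days}
    return volume, sentiment_index


# Invented sample data, for illustration only.
sample = [
    ("2013-01-01", "pos"), ("2013-01-01", "neg"),
    ("2013-01-01", "pos"), ("2013-01-01", "pos"),
    ("2013-01-02", "neg"), ("2013-01-02", "pos"),
]
volume, index = daily_series(sample)
print(volume)
print(index)
```

The same pass also yields the volume series, which is the other Twitter signal the slides consider.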

Training the model
ARMA: Auto-Regressive Moving Average
  ▫ y[t] = a·x[t] + b·x[t-1] + … + m·y[t-1] + n·y[t-2] + …
Simplified prediction:
  ▫ A binary prediction of whether y[t] > y[t-1]
  ▫ Uses past values of the target series itself and of the Twitter time series
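The simplified binary task can be set up roughly as follows. This sketch assumes `y` is the target series and `x` the Twitter series (volume or sentiment index), and it uses only past values as features, following the last bullet; the function name and the `lags` parameter are illustrative and not taken from the paper.

```python
def make_direction_dataset(y, x, lags=2):
    """Turn two aligned series into (features, label) rows.

    features: the last `lags` values of y followed by the last `lags` values of x
    label:    1 if y[t] > y[t-1], else 0
    """
    if len(y) != len(x):
        raise ValueError("series must be aligned and of equal length")
    rows = []
    for t in range(lags, len(y)):
        past_y = [y[t - k] for k in range(1, lags + 1)]
        past_x = [x[t - k] for k in range(1, lags + 1)]
        label = 1 if y[t] > y[t - 1] else 0
        rows.append((past_y + past_x, label))
    return rows


# Invented toy series, for illustration only.
target = [1, 2, 3, 2, 4]
twitter = [10, 20, 30, 40, 50]
for features, label in make_direction_dataset(target, twitter):
    print(features, label)
```

Dropping the `past_x` part gives the baseline without Twitter data, so the two feature sets can be fed to the same classifier and their accuracies compared.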

Model parameters
Target time series: stock market returns; movie box office revenue
Twitter series: volume; sentiment index
Forecasting model family: linear models; support vector machines; neural networks
Result: does including Twitter data increase classification accuracy by 5%?

Study details
Stock market prediction targets
  ▫ Companies: Apple, Google, …
  ▫ General market indices: S&P100, S&P500
Box office data
  ▫ Daily sales revenue series

Summary tree
Helps to identify model parameters that lead to consistently positive or negative results
Decision tree structure
  ▫ Nodes are the different parameters
  ▫ Leaves: the result
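To make the idea concrete, here is a small sketch that builds such a tree from a table of experiments. The slides do not say which tree-learning algorithm is used, so the ID3-style information-gain split below is an assumption, and the experiment rows are invented toy data, not results from the paper.

```python
import math
from collections import Counter

RESULT = "improved"  # hypothetical name for the outcome column


def entropy(rows):
    """Shannon entropy of the outcome column."""
    counts = Counter(r[RESULT] for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def split(rows, param):
    """Group experiment rows by the value of one parameter."""
    groups = {}
    for r in rows:
        groups.setdefault(r[param], []).append(r)
    return groups


def information_gain(rows, param):
    remainder = sum(len(g) / len(rows) * entropy(g) for g in split(rows, param).values())
    return entropy(rows) - remainder


def build_tree(rows, params):
    """Nodes are parameters, leaves are the majority outcome."""
    outcomes = Counter(r[RESULT] for r in rows)
    if len(outcomes) == 1 or not params:
        return outcomes.most_common(1)[0][0]
    best = max(params, key=lambda p: information_gain(rows, p))
    rest = [p for p in params if p != best]
    return {best: {v: build_tree(g, rest) for v, g in split(rows, best).items()}}


# Invented toy experiments, for illustration only.
experiments = [
    {"model": "linear", "series": "volume", "improved": False},
    {"model": "linear", "series": "sentiment", "improved": False},
    {"model": "svm", "series": "volume", "improved": True},
    {"model": "svm", "series": "sentiment", "improved": True},
]
print(build_tree(experiments, ["model", "series"]))
```

In this toy table the model family alone separates the outcomes, so the tree has a single node; a real experiment table would usually need deeper splits.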

Results: Stock market data
Summary of prediction results:
  ▫ In general, linear models do not provide a significant performance improvement with either Twitter volume or sentiment-based information
  ▫ Non-linear models can give an improvement
  ▫ Neural-network-based models gave the best performance

Results: Movie box office
Summary:
  ▫ Sentiment analysis did not have a positive impact
  ▫ Volume information had a positive impact with linear regression and SVM

Conclusion
In general, Twitter information, when used with non-linear models, increases the prediction accuracy for long-term stock market predictions
Twitter volume had a linear relationship with movie sales, but sentiment analysis had none

Appendix
Logarithmic returns of the series
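For reference, the logarithmic return of a price series P is r[t] = ln(P[t] / P[t-1]). A minimal sketch, with an illustrative function name and made-up prices:

```python
import math


def log_returns(prices):
    """Logarithmic returns r[t] = ln(P[t] / P[t-1]) for a list of positive prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# Made-up prices, for illustration only.
prices = [100.0, 110.0, 99.0]
returns = log_returns(prices)
print(returns)
# Log returns are additive: their sum equals ln(last / first).
print(sum(returns), math.log(prices[-1] / prices[0]))
```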

Testing model adequacy
Testing the relationship between the Twitter time series and the time series to be forecast
Neglected nonlinearity
  ▫ Are the two time series non-linearly related?
Granger causality
  ▫ X -> Y or Y -> X?
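As a reminder of what the Granger test checks, the standard textbook formulation is sketched below; the slides do not state which variant or lag order the paper uses, so the lag order p and the symbols are assumptions. X is said to Granger-cause Y if adding past values of X to an autoregression of Y significantly reduces the residual sum of squares (RSS), with n the number of observations used in the regressions.

```latex
\begin{aligned}
\text{restricted:}\quad & y_t = c + \sum_{i=1}^{p} a_i\, y_{t-i} + \varepsilon_t \\
\text{unrestricted:}\quad & y_t = c + \sum_{i=1}^{p} a_i\, y_{t-i} + \sum_{i=1}^{p} b_i\, x_{t-i} + u_t \\
H_0:\quad & b_1 = \dots = b_p = 0 \\
F \;=\; & \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/p}{\mathrm{RSS}_u/(n - 2p - 1)}
\end{aligned}
```

Swapping the roles of x and y gives the test in the other direction, which is how the question "X -> Y or Y -> X?" is answered.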