Forecasting with Twitter data Presented by : Thusitha Chandrapala 20064923 MARTA ARIAS, ARGIMIRO ARRATIA, and RAMON XURIGUERA.

2 What information do Twitter messages carry? Twitter information ▫Sentiment analysis: are people happy or unhappy about a certain topic? ▫Volume: number of tweets about a given topic Does Twitter really help in predicting time series data? ▫Twitter provides a continuously moving stream of information

3 Motivation of the paper Use three different forecasting model families, vary their parameters systematically, and analyze under which conditions Twitter information is actually useful Test for non-linearity and causality between the Twitter data and the target series Introduce the summary tree

4 Related work Stock market prediction ▫Bollen et al.: Twitter -> sentiment -> predict the Dow Jones Industrial Average ▫Wolfram et al.: Twitter as an additional source of features, without sentiment analysis Movie box office income ▫Mishne et al.: correlation analysis using blog posts ▫Asur et al.: predicting sales

5 Workflow 1) Collecting data 2) Cleaning and preprocessing 3) Sentiment analysis 4) Prediction model

6 Preprocessing ▫Language detection ▫Negation handling: “I like this…” and “I don’t like this…” yield two different features rather than the same “like” feature ▫Relevance filtering and topic classification: using LDA (Latent Dirichlet Allocation)
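One common way to realize this kind of negation handling (an assumption; the slide does not say how the paper implements it) is to prefix every token that follows a negation word with `NOT_` until the next punctuation mark, so that "like" and "NOT_like" become distinct features. The negation list, the scope-ending punctuation, and the helper names below are hypothetical, chosen only for illustration:

```python
import re

# Hypothetical negation cues; a real system would use a longer list.
NEGATIONS = {"not", "no", "never", "don't", "doesn't", "didn't", "isn't", "can't", "won't"}
# Punctuation assumed to close the scope of a negation.
SCOPE_END = {".", ",", "!", "?", ";", ":"}

def tokenize(text):
    """Lowercase words (keeping apostrophes) and single punctuation marks."""
    return re.findall(r"[a-z']+|[.,!?;:]", text.lower())

def mark_negation(text):
    """Prefix tokens inside a negation scope with NOT_ (hypothetical helper)."""
    out = []
    negated = False
    for tok in tokenize(text):
        if tok in SCOPE_END:
            negated = False          # punctuation ends the negation scope
            out.append(tok)
        elif tok in NEGATIONS:
            negated = True           # start a new negation scope
            out.append(tok)
        else:
            out.append("NOT_" + tok if negated else tok)
    return out

print(mark_negation("I like this."))        # ['i', 'like', 'this', '.']
print(mark_negation("I don't like this."))  # ['i', "don't", 'NOT_like', 'NOT_this', '.']
```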

7 Sentiment classification: decide whether a text expresses a positive or negative impression of a given subject Approach 1: ▫Automatic tagging to extract training instances: :) and :D mark happy sentiment, :( marks unhappy sentiment ▫Binary classification problem: train a naïve Bayes classifier ▫Use different dictionaries as features
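The emoticon-based tagging and naïve Bayes training above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses a multinomial naïve Bayes with Laplace smoothing over a plain bag of words (instead of the dictionaries mentioned on the slide), removes the emoticons from the text so the classifier cannot simply learn the label, and trains on a handful of made-up tweets:

```python
import math
import re
from collections import Counter, defaultdict

HAPPY = (":)", ":D")   # emoticons that tag a tweet as happy
UNHAPPY = (":(",)      # emoticons that tag a tweet as unhappy

def auto_label(tweet):
    """Return 'pos', 'neg', or None when no (or conflicting) emoticons appear."""
    pos = any(e in tweet for e in HAPPY)
    neg = any(e in tweet for e in UNHAPPY)
    if pos == neg:
        return None
    return "pos" if pos else "neg"

def strip_emoticons(tweet):
    for e in HAPPY + UNHAPPY:
        tweet = tweet.replace(e, " ")
    return tweet

def words(text):
    return re.findall(r"[a-z']+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(doc)
        self.vocab = {w for c in self.word_counts.values() for w in c}
        self.totals = {y: sum(c.values()) for y, c in self.word_counts.items()}
        return self

    def predict(self, doc):
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for y in self.class_counts:
            score = math.log(self.class_counts[y] / n)   # log prior
            for w in doc:
                if w in self.vocab:                       # skip unseen words
                    score += math.log((self.word_counts[y][w] + 1) / (self.totals[y] + v))
            if score > best_score:
                best, best_score = y, score
        return best

# Made-up tweets; the last one has no emoticon and is dropped from training.
raw = ["love the new phone :)", "great movie tonight :D",
       "this is awful :(", "hate waiting in line :(", "no emoticon here"]
train = [(words(strip_emoticons(t)), auto_label(t)) for t in raw]
train = [(d, y) for d, y in train if y is not None]
docs, labels = zip(*train)
clf = NaiveBayes().fit(docs, labels)
print(clf.predict(words("what a great phone")))  # pos
print(clf.predict(words("awful line")))          # neg
```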

9 Sentiment index: a time series of sentiment values ▫The daily value is calculated from the daily percentage of positive and negative tweets over the total number of messages on a specific topic
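A minimal sketch of how such a daily index could be computed from classified tweets. It assumes each message arrives as a `(day, label)` pair, and that messages the classifier does not mark as positive or negative carry a hypothetical `"neu"` label but still count toward the daily total. The dates and labels are invented for illustration. How the two percentages are combined into a single index value is not stated on the slide, so the sketch returns both:

```python
from collections import defaultdict

def sentiment_index(labeled_tweets):
    """labeled_tweets: iterable of (day, label), label in {'pos', 'neg', 'neu'}.
    Returns {day: (pct_pos, pct_neg)} as percentages of all messages that day."""
    counts = defaultdict(lambda: {"pos": 0, "neg": 0, "total": 0})
    for day, label in labeled_tweets:
        c = counts[day]
        c["total"] += 1              # the daily total is also the volume series
        if label in ("pos", "neg"):
            c[label] += 1
    return {
        day: (100.0 * c["pos"] / c["total"], 100.0 * c["neg"] / c["total"])
        for day, c in sorted(counts.items())
    }

tweets = [("2012-03-01", "pos"), ("2012-03-01", "neg"), ("2012-03-01", "pos"),
          ("2012-03-01", "neu"), ("2012-03-02", "neg"), ("2012-03-02", "neu")]
print(sentiment_index(tweets))
# {'2012-03-01': (50.0, 25.0), '2012-03-02': (0.0, 50.0)}
```

The same per-day counts also give the Twitter volume series (`c["total"]`), so both Twitter inputs can come out of one pass over the data.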

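10 Training the model ARMA: AutoRegressive Moving Average ▫y[t] = a·x[t] + b·x[t-1] + … + m·y[t-1] + n·y[t-2] + …, where x is the Twitter series and y is the target series (with no moving-average error terms, this form is strictly an autoregressive model with exogenous inputs) Simplified prediction: ▫A binary prediction of whether y[t] > y[t-1] ▫Inputs: past values of the target series itself and of the Twitter time series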
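The linear form above can be fitted by ordinary least squares on a matrix of lagged values, and the binary "up/down" prediction then compares the fitted y[t] with the observed y[t-1]. The sketch below is an illustration under assumptions, not the paper's code: it uses NumPy, a constant term, one extra Twitter lag (q = 1) and two target lags (p = 2), and synthetic data made up for the example. It follows the slide in including x[t], which presumes same-day Twitter data is available before y[t] is known. Accuracy here is measured in-sample only; a real evaluation would use held-out data:

```python
import numpy as np

def design_matrix(y, x, p, q):
    """Rows [1, x[t], ..., x[t-q], y[t-1], ..., y[t-p]] with target y[t]."""
    start = max(p, q)
    rows, targets = [], []
    for t in range(start, len(y)):
        row = [1.0]
        row += [x[t - k] for k in range(q + 1)]     # a*x[t] + b*x[t-1] + ...
        row += [y[t - k] for k in range(1, p + 1)]  # m*y[t-1] + n*y[t-2] + ...
        rows.append(row)
        targets.append(y[t])
    return np.array(rows), np.array(targets)

def fit_linear(y, x, p=2, q=1):
    """Least-squares estimate of [const, a, b, ..., m, n, ...]."""
    X, target = design_matrix(y, x, p, q)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef

def predict_up(y, x, coef, p=2, q=1):
    """Binary forecast: True where the fitted y[t] exceeds the observed y[t-1]."""
    X, _ = design_matrix(y, x, p, q)
    start = max(p, q)
    return X @ coef > np.asarray(y[start - 1:-1])

# Synthetic data: y depends on its own past and on x[t-1].
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

coef = fit_linear(y, x)              # order: const, a, b, m, n
up_hat = predict_up(y, x, coef)
up_true = y[2:] > y[1:-1]
print(coef.round(2), (up_hat == up_true).mean())
```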

11 Model parameters ▫Target time series: stock market (returns), movie box office (revenue) ▫Twitter series: volume, sentiment index ▫Forecasting model family: linear models, support vector machines, neural networks Result: does including Twitter data increase classification accuracy by 5%?
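The parameters above form a grid that is crossed to produce one experiment per combination. The sketch below only enumerates that grid and encodes the success criterion; it does not train any models. Reading "5%" as five percentage points of accuracy (rather than a relative 5% gain) is an assumption, and the names `twitter_helps` and the label strings are hypothetical:

```python
from itertools import product

TARGETS = ["stock market returns", "box office revenue"]
TWITTER_SERIES = ["volume", "sentiment index"]
MODEL_FAMILIES = ["linear", "SVM", "neural network"]

def twitter_helps(acc_without, acc_with, threshold=5.0):
    """Assumed criterion: accuracy (in percent) rises by at least `threshold` points."""
    return acc_with - acc_without >= threshold

# One experiment per (target, Twitter series, model family) combination.
configs = list(product(TARGETS, TWITTER_SERIES, MODEL_FAMILIES))
print(len(configs))  # 12
for target, series, family in configs[:3]:
    print(target, "|", series, "|", family)
```

In a full study, each configuration would be run once with and once without the Twitter input, and `twitter_helps` applied to the two accuracies.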

12 Study details Stock market prediction targets ▫Companies: Apple, Google, … ▫General market indices: S&P 100, S&P 500 Box office data ▫Daily sales revenue series

13 Summary Tree Helps identify the model parameters that lead to consistently positive or negative results Decision tree structure ▫Nodes: the different parameters ▫Leaves: the result
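A summary tree of this kind can be grown like any decision tree, with experiment parameters as features and the outcome ("+" if Twitter data helped, "-" otherwise) as the label. The slide does not name the induction algorithm, so the sketch below uses ID3-style information-gain splitting as an illustrative choice. The experiment log uses abstract parameter values (`m1`, `s1`, …) and is hypothetical, not the paper's results:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, param):
    """Entropy reduction obtained by splitting on one parameter."""
    remainder = 0.0
    for value in set(r[param] for r in rows):
        subset = [lab for r, lab in zip(rows, labels) if r[param] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

def build_tree(rows, labels, params):
    """Leaves are outcomes; internal nodes are (parameter, {value: subtree})."""
    if len(set(labels)) == 1 or not params:
        return Counter(labels).most_common(1)[0][0]
    best = max(params, key=lambda p: info_gain(rows, labels, p))
    rest = [p for p in params if p != best]
    branches = {}
    for value in sorted(set(r[best] for r in rows)):
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return (best, branches)

# Hypothetical experiment log with abstract parameter values (not real results).
rows = [
    {"model": "m1", "series": "s1"},
    {"model": "m1", "series": "s2"},
    {"model": "m1", "series": "s2"},
    {"model": "m2", "series": "s1"},
    {"model": "m2", "series": "s2"},
    {"model": "m2", "series": "s2"},
]
labels = ["+", "+", "+", "+", "-", "-"]
print(build_tree(rows, labels, ["model", "series"]))
# ('model', {'m1': '+', 'm2': ('series', {'s1': '+', 's2': '-'})})
```

Reading the tree from root to leaf gives the parameter combinations that consistently produce a "+" or "-" outcome.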

14 Summary Tree

15 Results: Stock market data Summary of prediction results: ▫Linear models generally do not provide a significant performance improvement with either Twitter volume or sentiment-based information ▫Non-linear models can give an improvement ▫Neural-network-based models gave the best performance

16 Results: Stock market data

17 Results: Movie box office Summary: ▫Sentiment analysis did not have a positive impact ▫Volume information had a positive impact with Linear regression and SVM

18 Conclusion In general, Twitter information used with non-linear models increases prediction accuracy for long-term stock market predictions Twitter volume had a linear relationship with movie sales, but the sentiment index did not

19 Appendix Logarithmic returns of the series
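Logarithmic returns are defined as r[t] = ln(P[t] / P[t-1]), where P is the price (or revenue) series. A short sketch follows; the price values are arbitrary examples. One convenient property, checked in the test, is that log returns add up: their sum over a period equals the log of the overall price ratio.

```python
import math

def log_returns(prices):
    """r[t] = ln(prices[t] / prices[t-1]); the result is one element shorter."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

r = log_returns([100.0, 110.0, 99.0])
print(r)  # first value is ln(1.1), second is ln(0.9), up to rounding
```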

20 Testing model adequacy: test the relationship between the Twitter time series and the time series to be forecast Neglected nonlinearity ▫Are the two time series non-linearly related? Granger causality ▫Does X help predict Y (X -> Y), or does Y help predict X (Y -> X)?
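The Granger causality question can be illustrated with the classic F-test. It compares a regression of Y on its own p lags (restricted) with one that also includes p lags of X (unrestricted); a large F statistic suggests that X helps predict Y. This is a minimal NumPy sketch on made-up data, not the paper's test procedure. The lag order p = 2 is an arbitrary choice. Turning the statistic into a p-value would require the F(p, df_den) distribution, for example via `scipy.stats.f.sf`, which is omitted here. `statsmodels.tsa.stattools.grangercausalitytests` provides a ready-made version. The neglected-nonlinearity test is not sketched.

```python
import numpy as np

def lagged(series, p):
    """Columns series[t-1], ..., series[t-p] for t = p, ..., len(series)-1."""
    n = len(series)
    return np.column_stack([series[p - k:n - k] for k in range(1, p + 1)])

def granger_f(y, x, p=2):
    """F statistic for H0: lags of x do not help predict y (tests x -> y)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[p:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged(y, p)])          # y's own lags only
    unrestricted = np.hstack([restricted, lagged(x, p)])  # plus lags of x

    def rss(X):
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        resid = target - X @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    df_den = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / df_den)

# Made-up data in which x drives y with a one-step delay.
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
y[1:] = 0.8 * x[:-1] + 0.1 * rng.normal(size=n - 1)

f_xy = granger_f(y, x)  # does x Granger-cause y?
f_yx = granger_f(x, y)  # does y Granger-cause x?
print(round(f_xy, 1), round(f_yx, 2))
```

Running the test in both directions answers the "X -> Y or Y -> X" question on the slide: here only the first statistic is large.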

