Happiness and Stocks Ali Javed, Tim Stevens


Happiness and Stocks Ali Javed, Tim Stevens Department of Computer Science STAT295– Introduction to Statistical Learning in R

Outline
- Introduction
- Background and Dataset
- Experimental Setup
- Evaluation
- Conclusion

Applying statistical methods to understand market behavior has been a topic of research for decades. The market depends on a vast number of variables, not all of which have been digitized. Sentiment analysis using data from social media websites and the wider internet is a recent topic of interest among researchers [https://www.marketpsych.com].

Data
From Sep 9, 2008 to Oct 10, 2017.
- Hedonometer: daily metric of happiness using Twitter data. Range: 0-10. Mean: 6.02. Standard deviation: 6.04.
- S&P500 index: range 676 - 2537.
- NASDAQCOM: range 1268 - 6534.
Both S&P500 and NASDAQCOM show an increasing trend throughout.

Proposed Research Problem
- To what extent is there a relationship between "happiness" and stock market prices?
- Which features have the strongest relationship with the S&P500 index?
Features created for the project:
- Happiness value
- S&P500 value
- NASDAQCOM value
- Lag variables
- Change variables
- Direction variables
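The lag, change, and direction variables listed above can be derived from a daily series in a few lines. What follows is a minimal Python sketch, not the authors' code (the course itself uses R). The function name `make_features` and the sample values are hypothetical. It assumes that "change" is the difference from the value `lag` days earlier and that "direction" records whether that change is positive; the slides do not define either term.

```python
from typing import Optional


def make_features(values: list[float], lag: int = 1) -> list[dict]:
    """Build lag, change, and direction features for a daily series.

    Assumption: change = today's value minus the value `lag` days earlier,
    and direction = "Up" when that change is positive, otherwise "Down".
    Days without enough history get None for all three features.
    """
    rows = []
    for i, value in enumerate(values):
        previous: Optional[float] = values[i - lag] if i >= lag else None
        change = None if previous is None else value - previous
        direction = None if change is None else ("Up" if change > 0 else "Down")
        rows.append(
            {"value": value, "lag": previous, "change": change, "direction": direction}
        )
    return rows


# Hypothetical index values, for illustration only (not the project's data).
sample = [100.0, 102.5, 101.0, 101.0]
for row in make_features(sample):
    print(row)
```

Treating an unchanged day as "Down" is an arbitrary choice in this sketch; a three-level direction variable would be an equally reasonable alternative.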

Happiness Data
Day        Happiness Value  Change
Monday     6.014            -0.016
Tuesday    6.015             0.000
Wednesday  6.016             6.003
Thursday   6.021             0.006
Friday     6.039             0.022

SP500 and HAPPINESS

SP500 and HAPPINESS
Conclusion: no correlation between happiness and SP500.
- Low Pearson correlation coefficients (< 0.1) for every pair of happiness and S&P500 variables.
- Very poor model performance: logistic regression for SP500_direction ~ HAPPINESS with a 75/25 train/test split reaches 56.7% accuracy.
- Naïve guessing of SP500_direction yields 56% accuracy.
- QDA and LDA perform similarly.
- KNN performs worse: 53% accuracy (K = 3 with 10-fold CV).
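The comparison with naive guessing is what makes 56.7% a poor result. A minimal Python sketch of such a baseline follows. It assumes that "naive guessing" means always predicting the most frequent direction seen in the training labels; the slides do not say how the 56% was obtained. The function name and the labels are invented for illustration.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    majority_label, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority_label)
    return hits / len(test_labels)


# Invented labels: 56 "Up" days out of 100 in both splits.
train = ["Up"] * 56 + ["Down"] * 44
test = ["Up"] * 56 + ["Down"] * 44
baseline = majority_baseline_accuracy(train, test)
print(f"baseline accuracy: {baseline:.2f}")
```

A classifier is only informative if its test accuracy clearly exceeds this number, which 56.7% against 56% does not.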

SP500 and HAPPINESS
95% confidence interval of AUC: 0.48 to 0.59 (the interval includes 0.5, the AUC of random guessing)
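The slides do not say how the AUC interval was computed. As one possible approach, the sketch below computes the AUC from its pairwise (Mann-Whitney) definition and attaches a percentile bootstrap interval. The bootstrap is a swapped-in technique chosen for simplicity, and the function names and synthetic scores are assumptions rather than anything from the project.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (the Mann-Whitney formulation)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def bootstrap_auc_ci(scores, labels, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    indices = range(len(scores))
    estimates = []
    while len(estimates) < n_resamples:
        sample = [rng.choice(indices) for _ in indices]
        sampled_labels = [labels[i] for i in sample]
        # A resample needs both classes, otherwise the AUC is undefined.
        if 0 < sum(sampled_labels) < len(sampled_labels):
            estimates.append(auc([scores[i] for i in sample], sampled_labels))
    estimates.sort()
    lower = estimates[int((1 - level) / 2 * n_resamples)]
    upper = estimates[int((1 + level) / 2 * n_resamples) - 1]
    return lower, upper


# Synthetic example: scores that carry no information about the labels.
rng = random.Random(1)
labels = [rng.randint(0, 1) for _ in range(100)]
scores = [rng.random() for _ in labels]
low, high = bootstrap_auc_ci(scores, labels)
print(f"AUC = {auc(scores, labels):.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

An interval that straddles 0.5, as the one reported for SP500 and HAPPINESS does, means the model cannot be distinguished from random guessing.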

KNN
Accuracy estimated using cross-validation with 10 folds, repeated 15 times. Highest accuracy of 53% at K = 3.
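Choosing K by repeated cross-validation can be sketched as follows. This is a simplified Python illustration under stated assumptions: a single numeric feature, absolute difference as the distance, and synthetic data. The function names are hypothetical and the code is not the authors' R implementation.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training points closest to x (one feature)."""
    nearest = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def repeated_cv_accuracy(xs, ys, k, folds=10, repeats=15, seed=0):
    """Mean KNN accuracy over `repeats` random partitions into `folds` folds."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        order = list(range(len(xs)))
        rng.shuffle(order)
        for f in range(folds):
            test_idx = order[f::folds]  # every folds-th shuffled index
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            train_x = [xs[i] for i in train_idx]
            train_y = [ys[i] for i in train_idx]
            hits = sum(
                knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test_idx
            )
            accuracies.append(hits / len(test_idx))
    return sum(accuracies) / len(accuracies)


# Synthetic data: the label depends on the feature, plus some noise.
rng = random.Random(42)
xs = [rng.uniform(-1, 1) for _ in range(120)]
ys = [int(x + rng.gauss(0, 0.3) > 0) for x in xs]
results = {k: repeated_cv_accuracy(xs, ys, k) for k in (1, 3, 5, 15, 35)}
best_k = max(results, key=results.get)
print(results, "best K:", best_k)
```

Repeating the 10-fold split several times averages out the randomness of any single partition, which matters when the differences between values of K are as small as they are here.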

SP500 AND NASDAQCOM

SP500 AND NASDAQCOM
- Pearson correlation coefficient: 0.9949.
- SP500_direction ~ NASDAQCOM with logistic regression: 89.5% accuracy on test data.
- Similar results with LDA and QDA.
- KNN: 86.5% test accuracy with K = 35 and 10-fold CV.
- Not useful: both values are available at the same time, so one cannot be used to forecast the other.
- Both are weighted averages of stock prices.
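For reference, the Pearson correlation coefficient quoted above is the covariance of the two series divided by the product of their standard deviations. A minimal Python sketch of that formula follows; the series are invented for illustration and are not the index data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented series that rise together, loosely mimicking two trending indices.
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.1, 3.9, 6.2, 8.1, 9.8]
print(f"r = {pearson(a, b):.4f}")
```

Any two series that trend upward over the same period will show a high coefficient on their levels, so a value near 1 says little about whether one can forecast the other.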

SP500 AND NASDAQCOM
95% confidence interval of AUC: 0.943 to 0.974

KNN
Accuracy estimated using cross-validation with 10 folds, repeated 15 times. Highest accuracy of 86.5% at K = 35.

Hosmer and Lemeshow
- SP500~HAPPINESS: chi-squared = 10.77, p-value = 0.21
- SP500~NASDAQCOM: chi-squared = 16.293, p-value = 0.03
- A p-value < 0.05 implies significant evidence of lack of fit. Higher chi-squared values are worse.
- Why? Our otherwise well-fitting model may flounder on more subtle changes in the data (e.g. SP500 increases by 1), which indicates a lack of fit.
- A p-value > 0.05 is not an indication of a good fit, only an indication that there is no significant evidence of a lack of fit.
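The Hosmer-Lemeshow test sorts observations by predicted probability, splits them into groups, and compares observed with expected event counts in each group. The Python sketch below is an illustration under assumptions: it uses 10 equal-size groups (a common default; the slides do not state the number used) and a closed-form chi-squared tail that is only valid for an even number of degrees of freedom. The function names and the synthetic data are hypothetical.

```python
import math
import random


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable with even df.

    For df = 2m the survival function has the closed form
    exp(-x/2) * sum_{i=0}^{m-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow statistic and p-value (df = groups - 2)."""
    ranked = sorted(zip(probs, outcomes))
    n = len(ranked)
    statistic = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(p for p, _ in chunk)  # expected number of events
        observed = sum(y for _, y in chunk)  # observed number of events
        mean_p = expected / size
        statistic += (observed - expected) ** 2 / (size * mean_p * (1 - mean_p))
    return statistic, chi2_sf_even_df(statistic, groups - 2)


# Synthetic data that is calibrated by construction.
rng = random.Random(0)
probs = [rng.uniform(0.05, 0.95) for _ in range(500)]
outcomes = [int(rng.random() < p) for p in probs]
stat, p_value = hosmer_lemeshow(probs, outcomes)
print(f"statistic = {stat:.2f}, p-value = {p_value:.3f}")

# p-values implied by the reported statistics if 10 groups (8 df) were used.
for label, chi in (("SP500~HAPPINESS", 10.77), ("SP500~NASDAQCOM", 16.293)):
    print(label, round(chi2_sf_even_df(chi, 8), 3))
```

With 8 degrees of freedom the two reported statistics give tail probabilities close to the p-values on the slide, which suggests, but does not confirm, that 10 groups were used.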

For the Future
- Pick a dataset with a higher likelihood of correlation.
- Create fewer features.
- Diversify with more models, such as lasso or random forests.

Conclusion
- No correlation between happiness and stock data.
- The two indices are, unsurprisingly, strongly correlated and can reliably predict each other for the same time period.
- Validating with multiple tests and multiple models is important.
- Beating the market is no easy task.