Predicting Market Movements: From Breaking News to Emerging Social Media Dr. Hsinchun Chen Director, Artificial Intelligence Lab University of Arizona.


1 Predicting Market Movements: From Breaking News to Emerging Social Media Dr. Hsinchun Chen Director, Artificial Intelligence Lab University of Arizona hchen@eller.arizona.edu http://ai.arizona.edu Acknowledgements: NSF CRI; NSF EXP-LA; DOD DTRA, CTFP, NPS; (ARFL WMD, CIA, FBI)

2 PREDICTING MARKET MOVEMENTS

3 Predicting Markets
Markets: international markets, emerging markets, import/export markets, financial market, stock market, commodity market, retail market
Economics (macro), international relations (trade, geopolitics), finance (international/banking/stock), accounting (market return), marketing (sales/retailing)
US (NSF SBE, social behavioral economics; governments, think tanks), Europe/Asia
Business school research is not science (cannot be funded by NSF in US)!
Economics, finance, accounting, political science, social science, marketing, computer science (small, no funding in US!), MIS (business intelligence)
Geopolitical/econ/finance/accounting models/theories, market metrics/parameters, analytical techniques, results interpretations, predicting markets
EMH (efficient market hypothesis), RWT (random walk theory), CAPM (capital asset pricing model), quant/algorithm trading

4 Research Opportunities
Sophisticated econ/finance/accounting/marketing models/theories, established analytical techniques and metrics (numeric), abundant structured databases (financial metrics, economic indicators, stock quotes)
New, diverse unstructured (text) web-enabled business data sources, e.g., 10K/10Q SEC reports, mass media news, local news, Internet news, financial blogs, investor forums, tweets…
Topic extraction, named entity recognition, sentiment/affect analysis, multilingual language models, social network analysis, statistical machine learning, temporal data/text mining, time-series analysis…

5 Nerds on Wall Street Future technological stars…(1) Advanced electronic market tools; (2) Understanding both quantitative and qualitative information… The Text Frontier, Collective Intelligence, Social Media, and Market Monitors Stocks are stories, bonds are mathematics. David Leinweber, 2009

6 AZ BIZ INTEL: BUSINESS MASS MEDIA, SOCIAL MEDIA, TEXT ANALYTICS, SENTIMENT ANALYSIS, SPIKE DETECTION, FINANCE/ACCOUNTING/MARKETING MODELING, PREDICTING MARKET MOVEMENTS

7 Business Intelligence & Analytics
$3B BI revenue in 2009 (Gartner, 2006)
The Data Deluge (The Economist, March 2010); internet traffic 667 Exabytes by 2013 (Cisco); total amount of information in 2010, 1.2 Zettabyte (KB-MB-GB-TB-PB-EB-ZB-YB)
$9.4B BI software M&A spending in 2010 and $14.1B by 2014 (Forrester)
IBM spent $14B in BI in five years; $9B BI revenue in 2010 (USA Today, November 2010); 24 acquisitions, 10,000 BI software developers, 8,000 BI consultants, 200 BI mathematicians
Acquired i2/COPLINK in 2011

8 BI: skills, technologies, applications, and practices used to help an enterprise better understand its business and market.
Technologies: data warehousing; Extraction, Transformation, and Load (ETL); Business Performance Management (BPM); visual dashboards; and advanced knowledge discovery using data and text mining
BI 2.0: web intelligence, web analytics, web 2.0, social media analytics, opinion mining; cloud computing and web services; real-time monitoring and mining; enterprise performances (marketing/accounting/finance/healthcare)

9 AZ BIZ INTEL
Mass media, social media contents
Text & social media analytics techniques
Finance/accounting/marketing models (Tetlock/Columbia, Antweiler/UBC, Das/Santa Clara)
NYU (Dhar), Arizona (Dhaliwal, Kelly, Jiang, Lusch, Yong), National Taiwan U (Li, Hong, Lu)
Bag of words, named entities, proper nouns, topics (1-, 2-, 3-grams)
Sentiment/valence, lexicons, machine learning, stakeholder analysis, EFLS analysis
Time series models, spike detection, decaying function, trading windows, targeted sentiment
Econometrics/regression models (R-sqr, p-value), 10-fold validation (F, accuracy), simulated trading (cost, frequency, exit)

10 AZ ONLINE WOM

11 AZ WOM: events, volume, sentiment

12 Results
Evolution of online WOM through new-product lifecycle
WOM communication starts early in preproduction, becomes highly active before movie release, then diminishes gradually
Valence has a clear decreasing trend over time, indicating that WOM becomes more negative after movie release
Subjectivity, number of sentences, and number of valence words stay stable over time

13 IT'S THE BUZZ!

14 AZ STOCK TRACKER I & II

15 Literature Review: Stock Performance Prediction
Theoretical perspectives on stock behavior
Efficient market hypothesis (Fama 1964): price of a stock reflects all available information; market reacts instantaneously; impossible to outperform
Random walk theory (Malkiel 1973): price of a stock varies randomly over time; future prediction, outperforming the market is impossible
Pessimistic assessments of the predictability of stock behavior refuted through empirical studies (Lo and MacKinlay 1988; Jaffe et al. 1989; Pesaran and Timmermann 1995)

16 Literature Review: Stock Performance Prediction
Predominant approaches to stock prediction
Fundamentalists utilize fundamental and financial measures of economy, industry, and firm
Economy and sector indicators, financial ratios of the firm
Fama-French three-factor model (Fama and French 1993): market return, market capitalization, book-to-market ratio
Currency exchange rates, interest rates, dividends
Technicians utilize historical time-series information of the stock and market behavior: historical price, volatility, trading volume
Various machine learning models applied: regression, ANN, ARIMA, support vector machines

17 Literature Review: Stock Performance Prediction
In addition to financial and stock variables, researchers have incorporated firm-related news article measures
Developed trend-based language models for news articles (Lavrenko et al. 2000)
Categorized press releases as good, bad, or neutral (Mittermayer 2004)
Examined various textual representations of news articles (Schumaker and Chen 2009a; 2009b)
But few have incorporated firm-related web forums
Thomas and Sycara (2000) utilize text classifications of discussions on Raging Bull to inform stock trading strategies

18 Literature Review: Firm-Related Web Forums and Stock
Studies relating web forums and stock behavior examined firm-related web forums on major web portals
Early studies focused on activity, without content analysis; they supported market efficiency, with only concurrent relationships identified (Wysocki 1998; Tumarkin and Whitelaw 2001)
Subsequently challenged: forum activity predicted stock behavior (Antweiler and Frank 2002; 2004; Das and Chen 2007)
Analysis advanced to measure opinions in discussions, using bullishness classifiers to distinguish investment positions (Antweiler and Frank 2004; Das and Chen 2007)
Classified buy, hold, or sell positions with 60–70% accuracy
Identified predictive relationships between forum discussion sentiment and subsequent stock returns, volatility, trading volume
Shortcomings: retrospective analyses, shareholder perspective of major forums

19 AZ FinText: numbers + text
Techniques: bag of words, named entities, proper nouns, past stock prices + SVR
Testbed: S&P 500, 5 weeks, Oct-Nov 2005, 2,809 news, 10M stock quotes, GICS industry classification
Evaluation: return, vs. quant funds; 20-minute prediction

20 AZ FinText in the news Thursday, June 10, 2010 AI That Picks Stocks Better Than the Pros A computer science professor uses textual analysis of articles to beat the market. WSJ WSJ Technology News and Insights June 21, 2010, 1:45 PM ET Using Artificial Intelligence to Digest News, Trade Stocks

21 AZ STOCK TRACKER I: mass, social media, topic, volume, sentiment
[System diagram; labels: Data collection; Online news; Web Forums; Spider/Parser; Database; Message; Author; Sentiment; Topic; Conversation analysis; Traffic dynamics; Topic extraction; Mutual information phrase extractor; Discussion topics; Sentiment identification; Sentiment grader; Message sentiments; Sentiment aggregator; Topic correlation and evolution; Sentiment correlation and evolution; Active topics and sentiments; Market prediction]

22 User-Generated Contents (UGC): Conversations of 30,000 Wal-Mart Constituents and 500,000 Responses
Data source | Duration | # of Threads | # of Messages | # of Users
Wall Street Journal - WalMart-related News (WSJ) | Aug 1999 - Mar 2007 | N/A | 4,081 | 657
Yahoo! Finance - WalMart Message Board (YAHOO) | Jan 1999 - Jun 2008 | 139,062 | 441,954 | 25,500
Walmart-blows Forum - Employee Department Board (EMP) | Dec 2003 - Oct 2008 | 7,440 | 102,240 | 2,930
Walmart-blows Forum - WalMart Sucks Board (WSB) | Nov 2003 - Nov 2008 | 1,354 | 19,624 | 1,855
Wakeupwalmart Forum - General WalMart Discussion Board (GDB) | Aug 2005 - Nov 2008 | 2,136 | 23,940 | 967

23 Post Dynamics

24 Sentiment Trend

25 Market Modeling
Correlation table (columns: Return, Volatility, Trading Volume). Correlation coefficients with p<0.10 are shown (two-tailed test); values are listed in the order they appear in each row.
Return: 1
Volatility: 0.0348, 1
Trading Volume: 1
Sentiment: 0.0338
Disagreement: -0.0507, -0.03578
Message Volume: -0.3186, 0.3131
Message Length: 0.0473, -0.1840
Subjectivity:
Sentiment One Day Lag:
Disagreement One Day Lag: -0.0527, -0.0475
Message Volume One Day Lag: -0.3433, 0.3026
Message Length One Day Lag: 0.0859, -0.1795
Subjectivity One Day Lag: -0.0425
Correlation
Sentiment expressed in the forum contemporaneously correlates significantly with stock return
Disagreement, volume, and length expressed in the forum also hold significant correlations with volatility and trading volume

26 Market Predictive Results (contd)
Overall Forum | Market t | Sentiment t-1 | Disagreement t-1 | Message Volume t-1 | Message Length t-1 | Subjectivity t-1
Return t | 0.8723*** (31.33) | 0.0025 (0.31) | 0.0000 (0.04) | -0.0007** (-2.29) | 0.0002 (1.42) | 0.0015 (1.46)
Volatility t | -0.0010 (-0.25) | 0.0074 (0.47) | -0.0023*** (-4.94) | -0.0122*** (-19.09) | 0.0030*** (7.82) | 0.0149*** (7.27)
Trading Volume t | 0.7627*** (15.06) | -0.4275** (-2.06) | 0.0140** (2.29) | 0.1957*** (23.18) | -0.0668*** (-13.24) | -0.3014*** (-11.11)
Note: *p<0.10; **p<0.05; ***p<0.01
Predictive regression (t-1)
The significant measures of forum discussions identified in contemporaneous regressions maintain their significance in the predictive regression models
Additionally, sentiment expressed in the web forum holds a significant relationship with the trading volume on the following day
Positive sentiment reduces trading volume; negative sentiment induces trading activity
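The idea of a predictive (t-1) regression can be illustrated with a minimal sketch. It assumes a single regressor and plain ordinary least squares, which is a simplification of the multi-variable models summarized on the slide; the function name `lagged_ols` and the toy series are hypothetical.

```python
def lagged_ols(y, x):
    """Regress y[t] on x[t-1] with one regressor; return (intercept, slope).

    Illustrative simplification: the slide's models use several lagged
    forum measures plus a market term, not a single variable.
    """
    ys = y[1:]    # dependent variable on day t
    xs = x[:-1]   # forum measure on day t-1
    n = len(ys)
    if n < 2:
        raise ValueError("need at least three observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((a - mean_x) ** 2 for a in xs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Toy data constructed so that y[t] = 1 + 2 * x[t-1] exactly.
x_series = [1.0, 2.0, 3.0, 4.0, 5.0]
y_series = [0.0, 3.0, 5.0, 7.0, 9.0]
intercept, slope = lagged_ols(y_series, x_series)
```

The one-day shift between `ys` and `xs` is what makes the fit predictive rather than contemporaneous.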

27 AZ STOCK TRACKER II: stakeholder analysis

28 Experimental Design: Description of Prediction Models
Variable | Description
Dependent:
RETURN t | Stock return on day t (log difference of share price)
Fundamental:
FFSIZE | Fama-French firm size (prior year; market capitalization = share price * shares outstanding)
FFBTM | Fama-French book-to-market ratio (prior year; book value / market value of shares)
FFMARKET t-1 | Fama-French market return on day t – 1 (log difference of S&P 500 index price)
FFMARKET t-2 | Fama-French market return on day t – 2 (log difference of S&P 500 index price)
Technical:
RETURN t-1 | Stock return on day t – 1 (log difference of share price)
RETURN t-2 | Stock return on day t – 2 (log difference of share price)
VOLATILITY t-1 | Stock price volatility on day t – 1 (volatility modeled using a GARCH(1,1))
VOLATILITY t-2 | Stock price volatility on day t – 2 (volatility modeled using a GARCH(1,1))
VOLUME t-1 | Stock trading volume on day t – 1 (in log)
VOLUME t-2 | Stock trading volume on day t – 2 (in log)
DAY d t | Dummy variables for trading day of the week on day t
t = days (t = 1, 2, …, n); day of the week (d = 1, …, 4)
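The return variables above are defined as log differences of price, with one-day and two-day lags as predictors. A minimal sketch of that construction follows; the function names, dictionary keys, and prices are hypothetical, and the volatility (GARCH) and volume variables are left out.

```python
import math


def log_returns(prices):
    """Daily log return: log(P_t) - log(P_{t-1}) for consecutive prices."""
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


def lagged_rows(returns, lags=(1, 2)):
    """Pair each day's return with its lagged values.

    Days without a full lag history are skipped.
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(returns)):
        row = {"RETURN_t": returns[t]}
        for k in lags:
            row[f"RETURN_t-{k}"] = returns[t - k]
        rows.append(row)
    return rows


# Hypothetical closing prices growing 10% per day.
prices = [100.0, 110.0, 121.0, 133.1]
returns = log_returns(prices)
rows = lagged_rows(returns)
```

Three returns are available here, so only one row has both lags present.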

29 Experimental Design: Description of Prediction Models
Variable | Description
Forum:
MESSAGES t-1 | Number of messages posted in the forum on day t – 1 (in log (1 + messages))
LENGTH t-1 | Average length of messages posted in the forum on day t – 1 (in number of sentences)
SENTI t-1 | Average sentiment of messages posted in the forum on day t – 1
VARSENTI t-1 | Variance in sentiment of messages posted in the forum on day t – 1
SUBJ t-1 | Average subjectivity of messages posted in the forum on day t – 1
VARSUBJ t-1 | Variance in subjectivity of messages posted in the forum on day t – 1
Stakeholder:
MESSAGES s t-1 | Number of messages posted by stakeholder cluster s on day t – 1 (in log (1 + messages))
LENGTH s t-1 | Average length of messages posted by stakeholder cluster s on day t – 1 (in number of sentences)
SENTI s t-1 | Average sentiment of messages posted by stakeholder cluster s on day t – 1
VARSENTI s t-1 | Variance in sentiment of messages posted by stakeholder cluster s on day t – 1
SUBJ s t-1 | Average subjectivity of messages posted by stakeholder cluster s on day t – 1
VARSUBJ s t-1 | Variance in subjectivity of messages posted by stakeholder cluster s on day t – 1
t = days (t = 1, 2, …, n); stakeholder clusters (s = 1, 2, …, c)
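As a rough illustration of how the forum-level measures could be aggregated per day, the sketch below computes MESSAGES, LENGTH, SENTI, and VARSENTI from a list of scored messages. The input tuple layout, the function name, and the use of population variance are assumptions; the slides do not say which variance estimator was used, and subjectivity would be handled the same way as sentiment.

```python
import math
from collections import defaultdict
from statistics import mean, pvariance


def daily_forum_measures(messages):
    """Aggregate (day, n_sentences, sentiment) tuples into daily measures."""
    by_day = defaultdict(list)
    for day, n_sentences, sentiment in messages:
        by_day[day].append((n_sentences, sentiment))

    measures = {}
    for day, items in by_day.items():
        lengths = [n for n, _ in items]
        sentiments = [s for _, s in items]
        measures[day] = {
            "MESSAGES": math.log(1 + len(items)),  # log(1 + message count)
            "LENGTH": mean(lengths),               # average sentences per message
            "SENTI": mean(sentiments),             # average sentiment
            "VARSENTI": pvariance(sentiments),     # assumed: population variance
        }
    return measures


# Hypothetical scored messages: (day, sentences, sentiment score).
sample = [("2006-04-03", 2, 0.5), ("2006-04-03", 4, -0.5), ("2006-04-04", 3, 1.0)]
daily = daily_forum_measures(sample)
```

To use these as t – 1 predictors, each day's measures would be paired with the following trading day's return. Restricting the input to one stakeholder cluster's messages would give the stakeholder-level variants.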

30 Experimental Design: Description of Prediction Models
Baseline Model – Baseline-FF: fundamental variables (Fama-French model)
Baseline Model – Baseline-Tech: technical variables (lagged stock returns, volatility, trading volume, day-of-week dummies)
Baseline Model – Baseline-Comp: comprehensive (all fundamental and technical variables)
Where t = days (t = 1, 2, …, n); day of the week (d = 1, …, 4)

31 Experimental Design: Description of Prediction Models
Forum models: comprehensive baseline variables plus forum-level measures
Where t = days (t = 1, 2, …, n); day of the week (d = 1, …, 4); stakeholder clusters (s = 1, 2, …, c)

32 Experimental Design: Description of Prediction Models
Stakeholder models: comprehensive baseline variables plus stakeholder group-level forum measures
Where t = days (t = 1, 2, …, n); day of the week (d = 1, …, 4); stakeholder clusters (s = 1, 2, …, c); index k = (((c - 1) * 6) + 15)

33 Experimental Design: Social Media Data
A 17-month period was utilized for analysis and experimentation: November 1, 2005 to March 31, 2007
The first five months were utilized to calibrate the initial stock return prediction models: November 1, 2005 – March 31, 2006
Calibrated models were applied for prediction during each trading day in the next month
Each subsequent month, new models were calibrated using the five previous months of time-series variables, for stock return prediction during the next month of trading
In total, stock return prediction was performed daily for one year (250 trading days): April 1, 2006 – March 31, 2007
Forum | Messages | Discussion Threads | Stakeholders | Messages per Thread | Messages per Stakeholder
Yahoo Finance – WMT (finance.yahoo.com) | 134,201 | 40,633 | 5,533 | 3.30 | 24.25
Wal-Mart Blows (www.walmartblows.com) | 55,125 | 3,690 | 1,461 | 14.94 | 37.73
Wakeup Wal-Mart (www.wakeupwalmart.com) | 10,797 | 1,306 | 915 | 8.27 | 11.80
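The rolling calibration schedule described above (five months of calibration, one month of prediction, advanced monthly) can be sketched as follows. The helper names are hypothetical; only the dates and window lengths come from the slide.

```python
def month_range(start_year, start_month, count):
    """Return `count` consecutive month labels such as '2005-11'."""
    labels = []
    year, month = start_year, start_month
    for _ in range(count):
        labels.append(f"{year}-{month:02d}")
        month += 1
        if month > 12:
            month = 1
            year += 1
    return labels


def rolling_windows(months, train_size=5):
    """Pair each prediction month with the `train_size` months before it."""
    return [
        (months[i - train_size:i], months[i])
        for i in range(train_size, len(months))
    ]


# 17 months: November 2005 through March 2007.
months = month_range(2005, 11, 17)
windows = rolling_windows(months)
```

With 17 months and a five-month calibration window, this yields 12 prediction months, April 2006 through March 2007, consistent with the one year of daily predictions reported on the slide.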

34 Results and Discussion
Hypothesis testing results
Hypothesis | Result
H1.1 Baseline-Comp model > Baseline-FF model | Partially supported
H1.2 Baseline-Comp model > Baseline-Tech model | Rejected
H2 Forum-level models > best baseline models | Rejected
H3.1 Stakeholder-level models > best baseline models | Supported
H3.2 Stakeholder-level models > forum-level models | Partially supported
H4.1 Social network > discussion content representation | Partially supported
H4.2 Writing style > discussion content representation | Rejected
H4.3 Social network > writing style representation | Partially supported
H5.1 ANN > OLS | Rejected
H5.2 SVR > OLS | Partially supported
H5.3 SVR > ANN | Partially supported

35 Results and Discussion
Wal-Mart stock return prediction model results
Baseline models using fundamental and technical variables
Results across 250 trading days forecasted
Baselines for simulated trading (initial investment of $10,000):
Holding Wal-Mart stock for the year results in $10,096
Holding S&P 500 for the year results in $11,012
Model | OLS $ | OLS Accuracy | ANN $ | ANN Accuracy | SVR $ | SVR Accuracy
Baseline-FF | $9,787 | 55.20% | $9,998 | 44.40% | $9,408 | 51.20%
Baseline-Tech | $8,799 | 57.20% | $9,702 | 57.60% | $9,503 | 56.40%
Baseline-Comp | $10,763 | 54.40% | $10,418 | 56.80% | $10,645 | 56.80%
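The slides report a final dollar value and an accuracy figure for each model but do not spell out the trading rule. The sketch below is therefore an assumption throughout: it holds the stock on days with a positive predicted return, stays in cash otherwise, and scores accuracy as the share of days where the predicted and actual directions agree. Transaction costs are ignored, and all names and numbers are hypothetical.

```python
import math


def simulate_trading(predicted, actual, initial=10_000.0):
    """Return (final_wealth, directional_accuracy).

    Assumed rule (not taken from the slides): be long for the day when the
    predicted log return is positive, otherwise hold cash.
    """
    wealth = initial
    hits = 0
    for pred, real in zip(predicted, actual):
        if pred > 0:
            wealth *= math.exp(real)   # apply the realized log return
        if (pred > 0) == (real > 0):
            hits += 1                  # predicted direction was right
    return wealth, hits / len(actual)


# Hypothetical three-day example.
predicted = [0.01, -0.01, 0.02]
actual = [math.log(1.10), math.log(0.90), math.log(0.95)]
final_wealth, accuracy = simulate_trading(predicted, actual)
```

Under this rule the buy-and-hold baselines correspond to a prediction that is positive every day.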

36 Results and Discussion
Wal-Mart stock return prediction model results
Incorporating the Wakeup Wal-Mart web forum
Results across 250 trading days forecasted
Model | OLS $ | OLS Accuracy | ANN $ | ANN Accuracy | SVR $ | SVR Accuracy
Best Baseline | $10,763 | 57.20% | $10,418 | 57.60% | $10,645 | 56.80%
Forum | $10,367 | 57.60% | $10,397 | 59.20% | $10,303 | 59.20%
Stakeholder-SN | $9,873 | 55.20% | $10,930 | 57.20% | $10,669 | 59.20%
Stakeholder-Content | $10,689 | 60.40% | $11,595 | 60.40% | $11,976 | 61.20% *
Stakeholder-Style | $10,271 | 56.00% | $9,653 | 56.80% | $9,305 | 56.00%
Stakeholder-SN+Content | $10,384 | 61.60% | $13,066 | 60.80% | $11,866 | 62.80% **
Stakeholder-SN+Style | $10,744 | 60.00% | $10,792 | 60.40% | $11,249 | 57.60%
Stakeholder-Content+Style | $10,696 | 59.20% | $10,590 | 56.40% | $10,603 | 58.80%
Stakeholder-SN+Content+Style | $10,976 | 58.00% | $10,778 | 56.40% | $10,881 | 59.60%
Pair-wise t-test; improvement over best baseline model at * p < 0.10, ** p < 0.05

37 AZ STOCK TRACKER III

38 Introduction
Forward-looking statements (FLS) refer to projections, forecasts, or other predictive statements made by firm management; Section 21E of the Securities Exchange Act (1934)
Extended forward-looking statements (EFLS): statements that may have implications for a firm's future development
Similar to FLS, but broader, including information from information intermediaries (e.g., newspapers, newswires) and individuals (e.g., blogs)

39 Recognizing EFLS
EFLS: extends FLS to include statements about firms' future performance from other sources such as financial press, analysts' reports, and individuals
Goal | Recognition Task | Definition
EFLS Recognition | Future Timing (FT) | Primary content is about future events or states
EFLS Recognition | Explicit Uncertainty (EU) | Explicit accounts of doubt or unreliability
EFLS Recognition | Overall Assessment (ALL) | Affect decision makers' belief about a firm's future cash flow
EFLS Sentiment | Positive (POS) | Positive impact on the belief
EFLS Sentiment | Negative (NEG) | Negative impact on the belief

40 AZ STOCK TRACKER III: EFLS

41 Summary of Annotation Results
Category | Agreement | Cohen's Kappa
ALL | 0.91 (0.88, 0.93) | 0.81 (0.76, 0.86)
POS | 0.90 (0.88, 0.93) | 0.79 (0.73, 0.85)
NEG | 0.89 (0.86, 0.91) | 0.77 (0.71, 0.82)
Note: (95% CI) from 1,000 bootstrappings
High kappa values (>0.7) on risks support the coding scheme being empirically valid
Agreement upper bound 89% to 91% (for ALL, POS, and NEG)
Reference Standard Dataset: 2539 sentences in total
Category | Count | Percent
ALL | 1157 | 46%
POS | 836 | 33%
NEG | 904 | 36%
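For readers unfamiliar with the statistics in this table, here is a minimal sketch of Cohen's kappa for two annotators with a bootstrap confidence interval. The percentile method, the function names, and the toy labels are assumptions; the slide states only that the 95% CIs come from 1,000 bootstrap resamples.

```python
import random


def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    labels = set(a) | set(b)
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0  # degenerate sample: both raters used a single label
    return (observed - expected) / (1 - expected)


def bootstrap_ci(a, b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for kappa (assumed method), resampling items."""
    rng = random.Random(seed)
    n = len(a)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(cohens_kappa([a[i] for i in idx], [b[i] for i in idx]))
    stats.sort()
    low = stats[int(n_boot * alpha / 2)]
    high = stats[min(n_boot - 1, int(n_boot * (1 - alpha / 2)))]
    return low, high


# Hypothetical binary annotations (1 = EFLS, 0 = not EFLS) from two raters.
rater_a = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
rater_b = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]
kappa = cohens_kappa(rater_a, rater_b)
ci_low, ci_high = bootstrap_ci(rater_a, rater_b)
```

In this toy example the raters agree on 9 of 10 items and chance agreement is 0.5, so kappa is 0.8.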

42 Experiment 1: Sentence-Level Evaluation
Model | Accuracy | F-Measure | Recall | Precision
LASSO | 67.1% | 66.5% | 83.8% | 55.1%
ENET75 | 69.3% | 68.0% | 87.7% | 55.6%
ENET50 | 68.9% | 68.7% | 90.5% | 55.4%
ENET25 | 69.4% | 68.9% | 91.2% | 55.4%
SVM | 69.5% | 70.2% | 83.9% | 60.3%
SVM w/IG | 69.1% | 68.9% | 84.3% | 58.3%
FKC | 64.7% | 50.9% | 69.7% | 40.1%
OF_PN | 54.8% | 27.9% | 19.1% | 51.4%
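The F-measure column is consistent with the harmonic mean of the precision and recall columns (the balanced F1 form, which is an inference from the numbers rather than something the slide states). A short check using three rows of the table:

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
reported = {
    "LASSO": (0.551, 0.838),
    "SVM": (0.603, 0.839),
    "FKC": (0.401, 0.697),
}
computed = {name: f_measure(p, r) for name, (p, r) in reported.items()}
# computed values round to 66.5%, 70.2%, and 50.9%, matching the table
```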

43 EFLS Impacts: Hypotheses Development
[Diagram; labels: Private Signals; Public Signals]

44 Hypotheses Development (Contd.)
Hypothesis 1: Firms with lower EFLS intensity are associated with higher expected return.

45 Hypotheses Development (Contd.)

46 Control Variables
Variable | Definition
Number of news articles mentioning firm i in month t.
Logarithm of market value, computed using the closing market price of month t-1.
Logarithm of book-to-market ratio, computed following Fama and French (1993).
Log(Dollar trading volume of firm i in month t)
Log(variance); variance of firm i in month t is computed using daily stock returns.
Proportion of individual ownership of stock i, using the latest available data, computed by aggregating 13f filings (Fang and Peress 2009).
Log(1+number of analysts covering firm i in month t).
Log(1+standard deviation of analysts' earnings predictions).

47 Firm-Level Performance Evaluation (Contd.)
Empirical Model 1:
Empirical Model 2:
Hypothesis 1 Predicts Negative b1
Hypothesis 2 Predicts b1 0

48 Experiment Two: Firm-Level Evaluation
Research Testbed: January 1986 to May 2008, 1,134,321 Wall Street Journal news articles
Merged with CRSP, Compustat, and IBES
Stock prices lower than $5 at the end of a month were removed (Cohen and Frazzini 2008; Fang and Peress 2009)
1,274,711 firm-months, spanning 269 months

49 Expected Return and EFLS Intensity
Variable Value Variable Value Variable Value -0.0026 * -0.0052 ** -0.0039 Control Variables 0.00069 *** 0.00068 *** 0.00067 *** -0.00081 -0.0012 -0.0015 -0.0019 ** -0.0019 *** 0.0025 *** -0.046 *** 0.00042 Intercept 0.039 *** Intercept 0.039 *** Intercept 0.039 *** 0.0031
***, **, * indicate statistical significance at the 0.01, 0.05, and 0.1 levels, respectively.

50 Volatility and EFLS Intensity
Variable Value Variable Value Variable Value -0.074 *** -0.196 *** -0.254 *** Control Variables 0.012 *** -0.105 *** -0.103 *** -0.110 *** 0.108 *** 0.565 *** -0.222 *** -0.066 *** -0.615 *** -0.616 *** 0.071 *** 0.016 *** 0.017 *** 0.095 *** Intercept -1.568 *** Intercept -1.566 *** Intercept -1.566 *** 0.57
***, **, * indicate statistical significance at the 0.01, 0.05, and 0.1 levels, respectively.

51 Take-Away and WIP (20%)
Mass and social media texts provide additional signals for market prediction (in addition to numbers)
Message volume important; aggregate sentiment may not (EMH)
Business sentiment processing difficult; may require additional content pre-processing (stakeholder; EFLS)
Predicting return hard; predicting volatility easier (VIX Chicago Board)
Large-scale stock news tracking and text analytics can be automated
Trading windows; decay function; targeted sentiment; extensive trading periods (up/down); industry and news category (oil/banking); firm & index size (Russell/NYSE); emerging markets (China)
All the firms (10K), all the news (1M each), all the time ???
Trading strategy ???

52 AZ BIZ INTEL System Design
[System diagram; labels in transcript order: Predefined Data Sources; Data Sources for US Public Companies; SEC/Edgar; NYSE.com; NASDAQ.com; Finance.Yahoo.com; Company Information Database; Ticker; CUSIP; CIK; PERMNO; Company Keywords; Company Name; Dynamic Data Sources; Blogs; News; Search Engines; WSJ; Twitter; Basic Information; Yahoo Finance Forums; Company Websites; Stock Exchange; 10K Report; Data Collection; Data Processing; Transformation/Integration; Topics & Sentiments; Time Series / Burst; Risk Model; SNA; Data Analysis; Analytic Approaches; Performance Indicators; Cross Media Analysis; Single Media Analysis; Predictive Analysis; Visualization; Static Figures/Dashboards; Interactive Applications; Simulated Trading]

53 Hsinchun Chen, Ph.D. Artificial Intelligence Lab, University of Arizona hchen@eller.arizona.edu http://ai.arizona.edu

