Applications of news analytics in finance: a review Gautam Mitra Co-author Leela Mitra.

1 Applications of news analytics in finance: a review Gautam Mitra Co-author Leela Mitra

2 Summary and scope In this talk we set out a structured (reading) guide to the published research outputs: Journal papers, white papers, case studies which are emerging in the domain of “news analytics” applied to finance. We aim to provide insight into the subtle interplay of information technology (including AI), the quantitative models and behavioural biases in the context of trading and investment decisions. Applications such as low frequency and high frequency trading are presented; some desirable/potential applications are discussed.

3 Outline Introduction News data Data sources Pre analysis of data Determining sentiment scores General overview Das and Chen Lo Models and applications in summary form (abnormal ) Returns Volatility and risk control Desirable industry applications Summary and discussions

4 News. Market Environment. Sentiment. Investment Decisions. Risk Control. Introduction

5 Traders [ High Frequency ] Fund Managers [ Low Frequency ] Desktop Market Data NewsWire Data WareHouse DataMart Introduction

6 R & D Challenge: identify the killer application. Smart investors rapidly analyse/digest information (news stories/announcements; stock price moves, i.e. market reactions) and act promptly to take trading/investment decisions. Can a machine act intelligently (AI) to compete with or outsmart humans? Introduction

7 At the very least, can we have IT/AI tools which help humans make good investment decisions? (Intelligence amplification.) Thus three disciplines converge: Information Systems; AI, in particular Natural Language Processing; Financial Engineering/quantitative modelling (including behavioural finance). Introduction

8 Introduction Data → analysis → datamart → quant models. Data: mainstream news, pre-news, Web 2.0 and social media; (numeric) financial market data. Pre-analysis: classifiers, sentiment scores. Analysis: consolidated datamart; updated beliefs, ex-ante view of market environment. Quant models: 1. Return predictions 2. Fund management / trading decisions 3. Volatility estimates and risk control

9 Outline Introduction News data Data sources Pre analysis of data Determining sentiment scores General overview Das and Chen Lo Models and applications in summary form (abnormal) Returns Volatility and risk control Desirable industry applications Summary and discussions

10 News data: Data sources Sources of news/informational flows (Leinweber). News: mainstream media and reputable sources; newswires to traders' desks; newspapers, radio and TV. Pre-news: source data such as SEC reports and filings, government agency reports, scheduled announcements, macroeconomic news, industry stats, company earnings reports… Social media: blogs, websites and message boards; quality can vary significantly; barriers to entry are low; human behaviour and agendas.

11 News data: Data sources Web-based news: individual investors pay more attention to it than institutional investors do (Das and Rieger). “Collective intelligence”: the collective opinion of a large group of people with no ulterior motives may be useful. The SEC does monitor message boards, but vetting of information is far from perfect. Financial news can be split between scheduled news (synchronous) and unscheduled news (asynchronous, event driven).

12 News data: Data sources Scheduled news (Synchronous) Arrives at pre scheduled times Much of pre news Structured format Often basic numerical format Typically macro economic announcements and earnings announcements

13 News data: Data sources Macroeconomic announcements: widely used in automated trading; impact large and most liquid markets (foreign exchange, government debt, futures markets); naturally affect trading strategies. Speed and accuracy are key, so technology requirements are substantial. Providers in this space: Trade the News, Need to Know News, Market News International, Thomson Reuters, Dow Jones, Bloomberg… Earnings announcements: directly influence stock prices; widely anticipated and used in trading strategies.

14 News data: Data sources Unscheduled news (Asynchronous, event driven) Arrives unexpectedly over time Mainstream news and social media Unstructured, qualitative, textual form Non-numeric Difficult to process quickly and quantitatively May contain information about effect and cause of an event To be applied in quant models needs to be converted to an input time series

15 Outline Introduction News data Data sources Pre analysis of data Determining sentiment scores General overview Das and Chen Lo Models and applications in summary form (abnormal) Returns Volatility and risk control Desirable industry applications Summary and discussions

16 News data: Pre analysis of data Collecting, cleaning and analysing news data is challenging. Major newswire providers collect news from a wide range of sources, e.g. the Factiva database from Dow Jones, with news from 400 sources. Tagging (machine readable metadata): major newswire providers tag incoming news stories; reporters tag stories as they enter them into the system; machine learning techniques are also used to identify relevant tags (RavenPack). Tagging turns unstructured stories into a basic machine readable form: tags are held in XML and reveal the story's topic areas and other useful metadata.

17 News data: Pre analysis of data Need to identify news which is relevant and current “Information events” distinguish stories reporting on old news from genuinely “new” news Tetlock et al. event study shows “information leakage”

18 News data: Pre analysis of data Need to identify news which is relevant and current. Reuters gives, for each article, a relevance score, which measures how much the article is about a particular company, and a novelty/uniqueness score, which captures the repetition among articles. RavenPack distinguishes stories which are events, i.e. which carry the first mention of a particular theme; stories which are not events are excluded, to minimise the number of duplicate stories.
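
The relevance and first-mention filtering described on this slide can be sketched roughly as follows. This is only an illustration: the field names (`company`, `theme`, `relevance`) and the 0.8 cut-off are hypothetical and do not reflect the actual Reuters or RavenPack data formats.

```python
def select_events(stories, min_relevance=0.8):
    """Keep relevant stories that are the first mention of a (company, theme) pair.

    `stories` is a list of dicts in arrival order; the keys used here are
    hypothetical placeholders, not a vendor schema.
    """
    seen = set()
    events = []
    for story in stories:
        key = (story["company"], story["theme"])
        if story["relevance"] >= min_relevance and key not in seen:
            seen.add(key)          # later stories on the same theme are repeats
            events.append(story)
    return events


feed = [
    {"company": "ACME", "theme": "earnings", "relevance": 0.95},
    {"company": "ACME", "theme": "earnings", "relevance": 0.90},  # repeat
    {"company": "ACME", "theme": "merger", "relevance": 0.30},    # barely about ACME
]
events = select_events(feed)
```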

19 News data: Pre analysis of data Classification of news. Tagged stories provide hundreds of event types. Need to distinguish what types of news are relevant to our application. The market may react differently to different types of news, e.g. Moniz et al. find the market reacts more strongly to earnings news than to strategic news. Different news is available for different assets: larger companies with more liquid stock tend to have higher news coverage.

20 News data: Pre analysis of data Classification of news Accounting related news Earnings  Announcements of earnings  Restatements of Operating Results etc.. Trading updates  Announcements of Sales/Trading Statement etc… Strategic news M&A Related  M&A Rumours and discussion  M&A Transaction announcements etc… Restructuring issues etc…

21 News data: Pre analysis of data Relationship of different news items / Independence of news… important consideration Seasonality of news (Hafez, Lo, Moniz) Need to be able to identify unexpected newsflow from variation due to seasonality Hourly, daily and weekly seasonality Intraday - larger volumes of newsflow just before opening of European, US and Asian stockmarkets (Hafez)
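
One simple way to separate unexpected newsflow from intraday seasonality is to compare each observation with the typical volume for its hour of day. The sketch below assumes hourly story counts and an in-sample average profile; it is not the method used by Hafez, Lo or Moniz.

```python
from collections import defaultdict
from statistics import mean


def hourly_profile(counts):
    """Average story count for each hour of day; counts is a list of (hour, n_stories)."""
    by_hour = defaultdict(list)
    for hour, n in counts:
        by_hour[hour].append(n)
    return {hour: mean(values) for hour, values in by_hour.items()}


def unexpected_newsflow(counts):
    """Ratio of each observation to the typical volume for its hour (1.0 = normal)."""
    profile = hourly_profile(counts)
    return [n / profile[hour] if profile[hour] else 0.0 for hour, n in counts]


# Illustrative data: busy pre-open hour (8) versus a quiet midday hour (13).
history = [(8, 120), (8, 80), (13, 20), (13, 60)]
ratios = unexpected_newsflow(history)
```

A ratio well above 1 flags more news than is usual for that time of day, which is the quantity of interest once the seasonal pattern is removed.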

22 News data: Pre analysis of data Illustration of Seasonality (Hafez, RavenPack)

23 Outline Introduction News data Data sources Pre analysis of data Determining sentiment scores General overview Das and Chen Lo Models and applications in summary form (abnormal) Returns Volatility and risk control Desirable industry applications Summary and discussions

24 Determining sentiment scores Informational content of news: converting qualitative data into a quantitative form is challenging. Distinguish the sentiment of stories (positive/negative) and the scale of positivity/negativity, giving sentiment scores. Consider the story's context and language: how positively/negatively a human interprets the story, i.e. its emotive content. Expert classification. Psychosocial dictionaries, e.g. General Inquirer. Different groups of people are affected by events differently, or have different interpretations of the same events, so conflicts may arise.

25 Determining sentiment scores Market based measures (Lo, Moniz et al. and Lavrenko): the market's lagged relative change in returns/volatility for a particular asset (asset class). Machine learning and natural language techniques can be used to determine the sentiment of incoming stories, giving sentiment indices over time. Index validation: to use an index we must be able to find a relationship with relevant market variables.

26 Outline Introduction News data Data sources Pre analysis of data Determining sentiment scores General overview Das and Chen Lo Models and applications in summary form (abnormal) Returns Volatility and risk control Desirable industry applications Summary and discussions

27 Das and Chen extract investor sentiment from stock message boards for Morgan Stanley High Tech (MSH) Index Web scraper program downloads tech sector message board messages Five algorithms with different conceptual underpinnings are used to classify each message Voting scheme is then applied

28 Das and Chen Three supplementary databases. Dictionary: the nature of the word (noun, adjective, adverb). Lexicon: collection of hand picked words which form variables for statistical inference within the algorithms. Grammar: training corpus of base messages used in determining in-sample statistical information, which is then applied to the out-of-sample messages. Lexicon and grammar jointly determine the context of the sentiment.

29 Das and Chen Five algorithms : (=Classifiers) 1. Naïve classifier Based on word count of positive and negative connotation words 2.Vector distance classifier Each of the D words in the lexicon is assigned a dimension in vector space Each training message is pre classified as positive, negative or neutral Each new message is classified by comparison to the cluster of pre trained vectors and is assigned the same classification as that vector with which it has the smallest angle
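
A minimal sketch of the first two classifiers, under stated assumptions: the lexicon entries are made up for illustration, the naive classifier simply nets positive against negative word counts, and the vector distance classifier picks the training message with the largest cosine (equivalently, the smallest angle). Das and Chen's actual lexicon and preprocessing are not reproduced here.

```python
import math

# Hypothetical lexicon entries, for illustration only.
POSITIVE = {"buy", "strong", "upgrade"}
NEGATIVE = {"sell", "weak", "downgrade"}
LEXICON = sorted(POSITIVE | NEGATIVE)   # one vector dimension per lexicon word


def naive_classify(message):
    """Net count of positive minus negative connotation words."""
    words = message.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def to_vector(message):
    """Count of each lexicon word in the message."""
    words = message.lower().split()
    return [words.count(term) for term in LEXICON]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def vector_distance_classify(message, training):
    """Label of the pre-classified training message with the smallest angle."""
    vec = to_vector(message)
    best = max(training, key=lambda item: cosine(vec, to_vector(item[0])))
    return best[1]


training = [("strong buy", "positive"), ("weak sell downgrade", "negative")]
label = vector_distance_classify("buy buy upgrade", training)
```

Note that a message containing no lexicon words has a zero vector, so every cosine is 0 and the first training message wins by default; a real implementation would need to treat that case explicitly.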

30 Das and Chen 3. Discriminant based classifier: the naïve classifier (NC) weights all words within the lexicon equally. The discriminant based classification method replaces this simple word count with a weighted word count. The weights reflect how well a particular lexicon word discriminates between the different message categories. 4. Adjective-adverb phrase classifier: this is based on the assumption that phrases which use adjectives and adverbs emphasize sentiment and require greater weight. It uses a word count, but only over those words within phrases containing adjectives and adverbs.

31 Das and Chen 5.Bayesian classifier Given the class of each message in the training set we can determine the frequency with which a lexical word appears in a particular class. For a new message we are able to compute the probability it falls within a particular class given its component lexicon words The message is classified as being from the category with the highest probability. Voting scheme … final classification based on achieving majority amongst classifiers Reduces number of messages classified Enhances classification accuracy
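
The Bayesian classifier and the voting scheme can be sketched as below. This is a generic naive Bayes over lexicon words with add-one (Laplace) smoothing, which is an assumption made here to avoid zero probabilities and is not stated on the slide. The vote returns `None` when no class reaches a strict majority, which is why voting reduces the number of messages classified.

```python
import math
from collections import Counter, defaultdict


def train_bayes(training, lexicon):
    """Count lexicon-word occurrences per class; training is a list of (message, label)."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for message, label in training:
        class_counts[label] += 1
        for word in message.lower().split():
            if word in lexicon:
                word_counts[label][word] += 1
    return word_counts, class_counts


def bayes_classify(message, word_counts, class_counts, lexicon):
    """Class with the highest posterior log-probability given the lexicon words."""
    total_messages = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        log_p = math.log(class_counts[label] / total_messages)   # class prior
        for word in message.lower().split():
            if word in lexicon:
                # Add-one smoothing (an assumption, see lead-in).
                log_p += math.log((word_counts[label][word] + 1)
                                  / (total_words + len(lexicon)))
        scores[label] = log_p
    return max(scores, key=scores.get)


def majority_vote(labels):
    """Label backed by a strict majority of classifiers, else None (unclassified)."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


lexicon = {"buy", "sell"}
word_counts, class_counts = train_bayes(
    [("buy buy", "positive"), ("sell now", "negative")], lexicon)
predicted = bayes_classify("buy", word_counts, class_counts, lexicon)
```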

32 Das and Chen Ambiguity - stock message boards messages often highly ambiguous Use General Inquirer … determine optimism score Filter in and consider only most highly optimistic stories in positive category Filter in and consider only the most highly pessimistic scores in the negative category Number of false positive in classification declines Disagreement – 0 no disagreement; 1 high disagreement

33 Das and Chen Relationship between sentiment indices and market variables ? Nature of sentiment index? Positive sentiment bias Fig shows histogram of normalised sentiment for a stock…positively skewed RavenPack find positive bias in classifiers … more marked in bull markets

34 Das and Chen Relationship between sentiment indices and market variables. Sentiment and stock levels are related, but determining the precise nature of the price relationship is difficult. Sentiment is inversely related to disagreement: as disagreement rises, sentiment falls. Sentiment is correlated with posting volume: as discussion increases, this indicates optimism about the stock is rising. Strong relationship between message volume and volatility (also Antweiler and Frank (2004)). Strong relationship between trading volume and volatility.

35 Outline Introduction News data Data sources Pre analysis of data Determining sentiment scores General overview Das and Chen Lo Models and applications in summary form (abnormal) Returns Volatility and risk control Desirable industry applications Summary and discussions

36 Lo Reuters NewsScope Event Indices (NEI) are constructed to have predictive power for returns and realised volatility. Integrated framework: returns and volatility are used in calibrating the indices. News data: Reuters news alerts, quick news flashes issued when newsworthy events occur; timely and relevant. Tags are machine readable. Headlines are concise with a small vocabulary… good for machine learning analysis.

37 Lo The following parameters are used: a list of keywords and phrases with real valued weights $w$; a rolling “sentiment window” of size $r$ (say 5/10 minutes); a rolling calibration window of size $R$ (say 90 days). $f_t$ is the vector of keyword frequencies over the sentiment window ending at time $t$. The raw score is defined as $s_t = w \cdot f_t = \sum_i w_i f_{i,t}$; this will tend to be high when news volume is high, hence the normalised score.

38 Lo Normalised score: at all times t in the R days of the calibration window, record the raw score and the news volume. The normalised score $S_t$ is determined by comparing the current raw score against the raw scores recorded when news volume equalled the current news volume. For example, $S_t = 0.92$ means that, 92% of the time that news volume was at its current level, the raw score was less than it currently is.
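
The raw and normalised scores can be sketched as follows, assuming the raw score is the weighted sum of keyword frequencies in the sentiment window and the normalised score is the fraction of calibration-window observations, at the same news volume, whose raw score was lower. The keywords, weights and history are illustrative values only.

```python
def raw_score(weights, frequencies):
    """Weighted sum of keyword frequencies over the current sentiment window."""
    return sum(weights[k] * frequencies.get(k, 0) for k in weights)


def normalised_score(current_raw, current_volume, history):
    """Empirical percentile of the raw score, conditional on news volume.

    history: list of (raw_score, news_volume) pairs from the calibration window.
    Returns None when no past observation has the current news volume.
    """
    peers = [raw for raw, volume in history if volume == current_volume]
    if not peers:
        return None
    return sum(raw < current_raw for raw in peers) / len(peers)


weights = {"profit warning": -2.0, "upgrade": 1.5}        # hypothetical weights
frequencies = {"upgrade": 2, "profit warning": 1}         # counts in the window
score = raw_score(weights, frequencies)                   # -2.0 + 3.0

history = [(0.2, 3), (0.5, 3), (1.4, 3), (0.9, 3), (5.0, 10)]
percentile = normalised_score(score, 3, history)
```

Matching on exactly equal news volume follows the slide's wording; with sparse data one might instead bucket volumes, which would be a departure from what is described.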

39 Lo Model calibration Determine keywords Create list of keywords by hand Tool to extract news from periods when scores are high… determine whether keywords are legitimate or need adjusting Optimal weights for intraday return sentiment index regress word frequencies against intraday returns Optimal weights for intraday volatility sentiment index regress word frequencies against (deseasonalised) intraday realised volatility
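
Regressing word frequencies against intraday returns amounts to a least-squares fit of the keyword weights. The sketch below solves the normal equations in plain Python with no intercept term; the omission of an intercept and the toy data are assumptions for illustration, and the same routine would apply with deseasonalised realised volatility as the target.

```python
def ols_weights(X, y):
    """Least-squares weights w minimising ||Xw - y||^2 via the normal equations.

    X: list of rows of keyword frequencies (one row per sentiment window).
    y: list of intraday returns (or deseasonalised volatility) for those windows.
    """
    n = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(n)]
         + [sum(row[i] * target for row, target in zip(X, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        if abs(A[col][col]) < 1e-12:
            raise ValueError("keyword frequencies are collinear")
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


# Toy data generated from weights (0.5, -1.0), so the fit should recover them.
X = [[1, 0], [0, 1], [1, 1], [2, 1]]
y = [0.5, -1.0, -0.5, 0.0]
fitted = ols_weights(X, y)
```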

40 Lo Model calibration Determining optimal weights more general classification problem Other techniques…machine learning…perceptron algorithm, support vector machines…
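
As an illustration of treating weight determination as a classification problem, here is a minimal perceptron that learns keyword weights from windows labelled +1 (e.g. price rose) or -1 (price fell). The labelling scheme and the toy data are assumptions, not taken from Lo's study.

```python
def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Classic perceptron; samples is a list of (feature_vector, label in {+1, -1})."""
    n = len(samples[0][0])
    weights = [0.0] * n
    bias = 0.0
    for _ in range(epochs):
        for x, label in samples:
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if label * activation <= 0:            # misclassified: nudge towards label
                weights = [w + learning_rate * label * xi
                           for w, xi in zip(weights, x)]
                bias += learning_rate * label
    return weights, bias


def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1


# Feature 0 ~ frequency of a "good news" keyword, feature 1 ~ a "bad news" keyword.
samples = [([2, 0], 1), ([1, 0], 1), ([0, 2], -1), ([0, 1], -1)]
weights, bias = train_perceptron(samples)
```

The perceptron only converges when the classes are linearly separable; support vector machines, also mentioned on the slide, handle the non-separable case more gracefully.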

41 Lo Index validation: to establish the empirical significance of the indices, use event study analysis. An event is defined as the (return/volatility sentiment) index exceeding a threshold value (0.995). Remove events that follow within less than one hour of another event, so as to consider only “new” events. Tests of the null hypothesis that the distribution of returns / deseasonalised realised volatility is the same before and after an event: visual inspection; t-test for equality of means; Levene's test for change in standard deviation; chi-squared goodness of fit.
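
The event definition and the test for equality of means can be sketched as below. Two assumptions are made: an exceedance is dropped when it falls within 60 minutes of the last retained event (the slide does not say whether dropped exceedances also reset the clock), and the t statistic is the unequal-variance (Welch) form. Levene's test and the chi-squared test are not shown.

```python
import math
from statistics import mean, variance


def detect_events(index, threshold=0.995, min_gap=60):
    """Times at which the index exceeds the threshold, at least min_gap minutes apart.

    index: list of (minute, normalised_score) in time order.
    """
    events = []
    for minute, score in index:
        if score > threshold and (not events or minute - events[-1] >= min_gap):
            events.append(minute)
    return events


def welch_t(before, after):
    """t statistic for a difference in means, allowing unequal variances."""
    standard_error = math.sqrt(variance(before) / len(before)
                               + variance(after) / len(after))
    return (mean(after) - mean(before)) / standard_error


index = [(0, 0.999), (30, 0.998), (90, 0.997), (100, 0.5)]
events = detect_events(index)

# Illustrative pre- and post-event observations (e.g. realised volatility).
t_stat = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```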

42 Lo Index validation – to establish empirical significance of indices… event study analysis

43 Lo

44 RavenPack Sentiment Scores

45 Reuters NewsScope Sentiment Engine

46 Outline Introduction News data Data sources Pre analysis of data Determining sentiment scores General overview Das and Chen Lo Models and applications in summary form (abnormal) Returns Volatility and risk control Desirable industry applications Summary and discussions

47 Average Stock Price Reaction to Negative News Events. Source: Macquarie Quant Research, May 2009. Model & Applications… (abnormal) Returns

48 Average Stock Price Reaction to Positive News Events. Source: Macquarie Quant Research, May 2009. Model & Applications… (abnormal) Returns

49 Traders and quant managers … identify and exploit asset mispricings before they correct … generate alpha News data can be used Stock picking and generating trading signal Factor models Exploit behavioural biases in investor decisions

50 Model & Applications… (abnormal) Returns Stock picking and generating trading signals. Li (2006): simple ranking procedure to identify stocks with positive and negative sentiment, using 10-K SEC filings for non-financial firms, 1994 – 2005. Risk sentiment measure: count the number of times the words risk, risks, risky, uncertain, uncertainty and uncertainties appear in the management discussion and analysis section. Strategy: long in low risk sentiment stocks, short in high risk sentiment stocks … reasonable level of returns. Leinweber (2010): event studies based on Reuters NewsScope Sentiment Engine.
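
The word-count measure and the long/short ranking can be sketched as follows. The tickers, the sample text and the choice of one stock per leg are hypothetical; the sketch uses the raw count exactly as the slide describes it and does not attempt to reproduce any further adjustments in Li's paper.

```python
import re

RISK_WORDS = {"risk", "risks", "risky", "uncertain", "uncertainty", "uncertainties"}


def risk_sentiment(text):
    """Number of risk-related words in a management discussion and analysis section."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(token in RISK_WORDS for token in tokens)


def long_short(scores, n=1):
    """Go long the n lowest risk-sentiment stocks and short the n highest."""
    ranked = sorted(scores, key=scores.get)
    return ranked[:n], ranked[-n:]


sample = "Risks remain; the outlook is uncertain and risky. Uncertainty persists."
count = risk_sentiment(sample)

# Hypothetical tickers with illustrative counts.
longs, shorts = long_short({"AAA": 2, "BBB": 9, "CCC": 5})
```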

51 Model & Applications… (abnormal) Returns Factor models: CAPM (Sharpe 1964; Lintner 1965), APT (Ross 1976) … additional sources of information to the market. “Profits may be viewed as the economic rents which accrue to [the] competitive advantage of … superior information, superior technology, financial innovation” (Lo). Tetlock, Saar-Tsechansky and Macskassy (2008): investors' perception is determined from their “information sets”.

52 Model & Applications… (abnormal) Returns Factor models. “Information sets”: 1. analysts' forecasts, 2. quantifiable publicly disclosed accounting variables, 3. linguistic descriptions of a firm's current and future profit generating activities. If 1. and 2. are incomplete or biased, 3. may give relevant information. Macquarie report, Cahan et al.: news sentiment data in multifactor models. Results are positive … such an approach does add value. In particular they note the value of this source of information during the credit crisis, when determining fundamentals (which traditional quant factors are based on) was problematic.

53 Model & Applications… (abnormal) Returns Behavioural biases. Behavioural economists challenge the assumption that markets act rationally … EMH → AMH (Lo). They propose that individuals display certain biased behaviour; due to these biases they systematically deviate from optimal (rational) trading behaviour. Behavioural biases, rather than risk based explanations, are used to explain (abnormal) returns.

54 Model & Applications… (abnormal) Returns Behavioural biases. Odean and Barber (2007) find evidence that individual investors have a tendency to buy attention grabbing stocks. Professional investors are better equipped to assess a wider range of stocks, so they are less prone to buying attention grabbing stocks. Da, Engelberg and Gao also consider how the amount of attention a stock receives affects the cross-section of returns. They use the frequency of Google searches for a particular company as a measure of attention, and find some evidence that changes in investor attention can predict the cross-section of returns.

55 Model & Applications… (abnormal) Returns Behavioural biases. Chan (2003) finds stocks with major public news exhibit momentum over the following month. In contrast, stocks with large price movements but an absence of news tend to show return reversals in the following month. This would support a trading strategy based on momentum reinforced with news signals. Moniz et al. (2009) find a strategy based on earnings momentum reinforced by newsflow is effective.

56 Outline Introduction News data Data sources Pre analysis of data Determining sentiment scores General overview Das and Chen Lo Models and applications in summary form (abnormal) Returns Volatility and risk control Desirable industry applications Summary and discussions

57 Applications: Risk management Traditionally historic asset price data has been used to estimate risk measures. ex post retrospective measures fail to account for developments in the market environment, investor sentiment and knowledge Significant changes in the market environment Traditional measures can fail to capture the true level of risk (Mitra, Mitra and diBartolomeo 2009; diBartolomeo and Warrick 2005) Incorporating measures or observations of the market environment in risk estimation is important

58 Applications: Risk management The risk structure of assets may change over time Patton and Verardo find news impacts beta of stocks and in particular most of beta increase comes from rising covariance, suggesting there is contagion in information content of news releases.

59 Applications: Risk management Relationship between information release and volatility widely reported Ederington and Lee (1993) macro economic announcements and foreign exchange and interest rate futures Stock message board activity is a good predictor of volatility Antweiler and Frank (2004); Wysocki (1999) GARCH model with news inputs Kalev et al. (2004); Robertson, Geva and Wolff (2007)
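
One common way to feed news into a GARCH model is to add a news variable to the conditional variance equation. The recursion below is a generic GARCH(1,1) with one exogenous news term, offered as a sketch: the exact specifications in Kalev et al. and in Robertson, Geva and Wolff may differ, and the parameter values are illustrative rather than estimated.

```python
def garch_news_variance(returns, news, omega, alpha, beta, gamma, initial_variance):
    """Conditional variance path for a GARCH(1,1) with an exogenous news input.

    sigma2[t+1] = omega + alpha * returns[t]**2 + beta * sigma2[t] + gamma * news[t]
    where news[t] might be a news count or sentiment-intensity measure (assumed).
    """
    variances = [initial_variance]
    for shock, news_level in zip(returns, news):
        variances.append(omega
                         + alpha * shock ** 2
                         + beta * variances[-1]
                         + gamma * news_level)
    return variances


# Illustrative inputs: the second period has a burst of 10 news items.
path = garch_news_variance(
    returns=[1.0, 2.0], news=[0, 10],
    omega=0.1, alpha=0.2, beta=0.5, gamma=0.01, initial_variance=1.0)
```

With a positive `gamma`, heavier newsflow raises the next-period variance forecast beyond what past returns alone imply, which is the mechanism by which news can improve short-term risk estimates.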

60 Outline Introduction News data Data sources Pre analysis of data Determining sentiment scores General overview Das and Chen Lo Models and applications in summary form (abnormal) Returns Volatility and risk control Desirable industry applications Summary and discussions

61 Desirable Industry Applications 1. Enhanced strategies (asset management). Low frequency: portfolio (rebalancing) early trigger based on “draw down” rules/risk. High frequency: trading “wish to trade” signals; trading “have to/need to trade (sell and buy)” signals. News analytics market views taken into consideration for the “optimal trade execution” algorithms {VWAP, Almgren & Chriss, Lo & Bertsimas}.

62 2. Risk Control and Compliance. improved short term risk estimate. Enhanced downside risk estimate; (improving scenario generators by using sentiment scores). ??? Wolf Detection; Signal to stop trading in a specific stock/asset. Desirable Industry Applications

63 3. Post trade analysis (reporting). 4. Refine fundamental research ( results /figures) 5. Use by regulator/public body (government treasuries) to take a prior view of the “impact” of (economic and other) announcements Desirable Industry Applications

64 Outline Introduction News data Data sources Pre analysis of data Determining sentiment scores General overview Das and Chen Lo Models and applications in summary form (abnormal) Returns Volatility and risk control Desirable industry applications Summary and discussions

65 Summary & discussions Applications of (semi-)automated news analytics in finance are growing in importance. Pay back can be substantial to: Investment Managers Traders Internal Risk Auditors Regulators

66 Knowledge and Skills from three different disciplines: Information Systems. Artificial Intelligence. Financial Engineering & quantitative modelling (including behavioural finance). are required in various degrees to progress the field/make substantial impact. Summary & discussions

67 THANK YOU FOR YOUR ATTENTION …ANY QUESTIONS…?

