Different levels of information in financial data: an overview of some widely investigated databases Salvatore Miccichè Observatory.

1 Different levels of information in financial data: an overview of some widely investigated databases Salvatore Miccichè http://lagash.dft.unipa.it Observatory of Complex Systems Dipartimento di Fisica e Tecnologie Relative Università degli Studi di Palermo GIACS Conference “Data in Complex Systems” - Palermo, 7-9 April 2008

2 Overview of Databases. Observatory of Complex Systems: S. Miccichè, F. Lillo, R. N. Mantegna, M. Tumminello, G. Vaglica, C. Coronnello, M. Spanò. Econophysics, Bioinformatics, Stochastic Processes.

3 We will present an overview of some widely investigated financial and economic databases. Most financial databases include data on transaction prices, bid and ask quotes, and transaction volumes. In some financial databases, the coded identity of the market members acting on the order book is also available. The economic databases we will discuss contain financial and economic information on over ten million public and private companies operating in Europe and the USA. Overview of Databases. What do we do with them?

4 Why physicists are interested in financial markets. Financial markets can be considered model complex systems: many agents and factors interact, and the interactions are not always clear or known (no equations, no Hamiltonians?). G. Parisi, cond-mat/0205297, “Complex Systems: a Physicist's Viewpoint”: “A system is complex if its behaviour crucially depends on the details of the system.” Econophysics is a recently established discipline whose main aim is to model some of the stylized facts empirically observed in the study of financial markets. Overview of Databases: financial databases

5 Methods of statistical physics can be applied: stochastic processes (Brownian motion, superdiffusivity, power-law tails, long-range correlation, ...), scaling, network theory, clustering techniques, random matrix theory, ..., agent-based models, .... Last but not least, there is a huge amount of data: in 1995, 1 CD per month; in 2003, 12-13 CDs per month. Overview of Databases: financial databases

6 FINANCIAL databases: TAQ, Euronext, BI, TSE, LSE, BME, MTS. Overview of Databases

7 Size (≈ 1 TB in total). Overview of Databases: financial databases
Trade and Quote (NYSE):
- 1995: 6.3 GB
- 1996: 8.1 GB
- 1997: 13.5 GB
- 1998: 20.0 GB
- 1999: 27.1 GB
- 2000: 63.1 GB
- 2001: approx. 110 GB
- 2002: approx. 180 GB
- 2003: approx. 215 GB
Rebuild Order Book (LSE): 2002, 19.5 GB (now also 2004, 2005, 2006)
OPEN BOOK (NYSE): 2002, approx. 110 GB
Tokyo (TSE): 2002 trades, 1.6 GB
EURONEXT: 2002, 6.7 GB
MTS: 4/2003-3/2004, 4.0 GB
MILANO (BI): 2002 trades, 2.14 GB; 2002 best quotes, 2.43 GB
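
As a quick arithmetic check of the "≈ 1 TB" figure, the sketch below simply adds up the sizes listed on this slide. It takes the listed values (including the "approx." ones) at face value and leaves out the later ROB years (2004-2006), for which no sizes are given; the dictionary keys are just descriptive labels.

```python
# Dataset sizes in GB as listed on slide 7 (approximate values taken as given).
# ROB 2004-2006 is mentioned without sizes, so it is not included here.
SIZES_GB = {
    "TAQ (NYSE) 1995-2003": [6.3, 8.1, 13.5, 20.0, 27.1, 63.1, 110.0, 180.0, 215.0],
    "ROB (LSE) 2002": [19.5],
    "OPEN BOOK (NYSE) 2002": [110.0],
    "TSE trades 2002": [1.6],
    "Euronext 2002": [6.7],
    "MTS 4/2003-3/2004": [4.0],
    "BI (Milano) 2002 trades + best quotes": [2.14, 2.43],
}


def total_size_gb(sizes):
    """Sum every listed size, in GB."""
    return sum(sum(values) for values in sizes.values())


total = total_size_gb(SIZES_GB)
print(f"listed total: {total:.1f} GB")
```

The listed items add up to roughly 790 GB; the gap to about 1 TB presumably comes from the later ROB years, whose sizes are not given here.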

8 Transaction prices Quotes Overview of Databases: financial databases

9 To start with: given a price S(t) at time t, the price return r(t) over a time horizon Δt is $r(t) = \ln S(t+\Delta t) - \ln S(t)$. ARBITRAGE. Overview of Databases: financial databases – transaction prices - synchronized
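
A minimal sketch of the return computation, assuming the logarithmic definition $r(t) = \ln S(t+\Delta t) - \ln S(t)$ and a price series already sampled at a fixed Δt. The function name `log_returns` is illustrative.

```python
import math


def log_returns(prices):
    """Price returns r(t) = ln S(t + dt) - ln S(t) for a series sampled at a fixed dt."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be positive")
    # one return per consecutive pair of prices, so len(result) == len(prices) - 1
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]


returns = log_returns([100.0, 101.0, 99.5, 100.2])
```

A convenient property of this definition is that returns over consecutive intervals add up: the sum of the returns equals the return over the whole period.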

10 Multivariate description: comovements, Δt = op-cl (open to close), 1995-2003. Overview of Databases: financial databases – transaction prices - synchronized

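11 We are looking for a possible collective stochastic dynamics and/or links between the price returns (or volatilities) of different stocks. Price returns clusters. Cross-correlation. The clustering procedure is based on a similarity measure, the correlation coefficient
$\rho_{ij} = \frac{\langle r_i r_j \rangle - \langle r_i \rangle \langle r_j \rangle}{\sqrt{(\langle r_i^2 \rangle - \langle r_i \rangle^2)(\langle r_j^2 \rangle - \langle r_j \rangle^2)}}$,
where the r_i are the price return time series. Correlation → distance $d_{ij} = \sqrt{2(1-\rho_{ij})}$ → subdominant ultrametric distance → Hierarchical Tree (HT) and Minimum Spanning Tree (MST). Multivariate description at any Δt. Overview of Databases: financial databases – transaction prices - synchronized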
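
A plain-Python sketch of the first two steps of the procedure, assuming the Pearson correlation $\rho_{ij}$ between return series and the metric distance $d_{ij} = \sqrt{2(1-\rho_{ij})}$. The helper names are illustrative, and the series are assumed to be synchronized (same length, same time grid).

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient rho_ij between two equal-length return series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def distance(rho):
    """Metric distance d_ij = sqrt(2 (1 - rho_ij)): 0 for rho = 1, 2 for rho = -1."""
    # max(0, ...) guards against tiny negative values from floating-point rounding
    return math.sqrt(max(0.0, 2.0 * (1.0 - rho)))


def distance_matrix(series):
    """Symmetric N x N distance matrix for a list of return series (one per stock)."""
    n = len(series)
    return [[distance(pearson(series[i], series[j])) if i != j else 0.0
             for j in range(n)] for i in range(n)]
```

Unlike the correlation coefficient itself, $d_{ij}$ satisfies the triangle inequality. This is what allows the subsequent hierarchical (ultrametric) and tree constructions.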

12 Multivariate description. Compare the dynamics of price returns of stocks traded at different exchanges: industry sector identification at different time horizons; sector dynamics; LSE and NYSE; are there common (stylized) facts? Single Linkage Clustering Analysis → MST construction (N-1 links). At each step, when two elements, an element and a cluster, or two clusters p and q merge into a wider single cluster t, the distance d_tr between the new cluster t and any cluster r is recursively given by $d_{tr} = \min\{d_{pr}, d_{qr}\}$, i.e. the distance between clusters t and r is the shortest distance between any element of t and any element of r. Planar Maximally Filtered Graph (3(N-2) links). Overview of Databases: financial databases – transaction prices - synchronized
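
The single-linkage recursion $d_{tr} = \min\{d_{pr}, d_{qr}\}$ can be sketched directly as an agglomeration loop on a symmetric distance matrix (for example, one built from $d_{ij}$). This is a naive illustration with roughly O(N³) cost, which is adequate for about 100 stocks. The function names are hypothetical.

```python
def _key(a, b):
    """Order a pair of cluster ids so each distance is stored once."""
    return (a, b) if a < b else (b, a)


def single_linkage(dist):
    """Single-linkage clustering of N elements from a symmetric distance matrix.

    Returns the N-1 merges as (members_p, members_q, height) in merge order.
    After clusters p and q merge into t, d_tr = min(d_pr, d_qr) for every other r.
    """
    n = len(dist)
    clusters = {i: frozenset([i]) for i in range(n)}
    d = {(a, b): dist[a][b] for a in range(n) for b in range(a + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        # closest pair of current clusters
        (p, q), height = min(d.items(), key=lambda item: item[1])
        members_p, members_q = clusters.pop(p), clusters.pop(q)
        merges.append((members_p, members_q, height))
        t = next_id  # new cluster ids are always larger than existing ones
        next_id += 1
        # recursive single-linkage update for every remaining cluster r
        for r in clusters:
            d[(r, t)] = min(d[_key(p, r)], d[_key(q, r)])
        # forget every distance that involves the merged clusters p or q
        d = {k: v for k, v in d.items() if p not in k and q not in k}
        clusters[t] = members_p | members_q
    return merges


merges = single_linkage([[0, 1, 3], [1, 0, 2], [3, 2, 0]])
```

The merge heights give the subdominant ultrametric distance between two elements: the height at which they first end up in the same cluster. The N-1 merges correspond one-to-one to the N-1 links of the MST. This sketch, however, does not record which pair of elements realizes each link.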

13 Synchronized data. We consider high-frequency (intraday) data for:
NYSE: the 100 most capitalized stocks in 2002;
LSE: the 92 most traded stocks in 2002.
Data sources: the Trades And Quotes (TAQ) database maintained by NYSE (1995-2003) and the Rebuild Order Book (ROB) database maintained by LSE (2002).
Transactions do not occur at the same time for all stocks, so we have to synchronize (homogenize) the data on the following time horizons:
NYSE: 5 min, 15 min, 30 min, 65 min, 195 min, 1 day (trading time 6 h 30′);
LSE: 5 min, 15 min, 51 min, 102 min, 255 min, 1 day (trading time 8 h 30′).
Overview of Databases: financial databases – transaction prices - synchronized
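
The slide does not specify how the irregular trades are synchronized. A common choice, assumed here, is the previous-tick rule: at each grid time, take the price of the last trade at or before that time. The function name and the seconds-based time convention are hypothetical. The constants also check that each listed horizon divides the trading day into a whole number of intervals.

```python
import bisect

NYSE_DAY_MIN = 6 * 60 + 30  # 390 minutes of trading
LSE_DAY_MIN = 8 * 60 + 30   # 510 minutes of trading
# every horizon listed on the slide fits an integer number of times into the day
assert all(NYSE_DAY_MIN % m == 0 for m in (5, 15, 30, 65, 195, NYSE_DAY_MIN))
assert all(LSE_DAY_MIN % m == 0 for m in (5, 15, 51, 102, 255, LSE_DAY_MIN))


def sample_previous_tick(times, prices, open_time, close_time, step):
    """Sample an irregular trade series on a regular grid (previous-tick rule).

    `times` are sorted trade times (e.g. seconds from midnight), `prices` the
    matching trade prices. Grid points before the first trade get None.
    """
    grid = list(range(open_time, close_time + 1, step))
    sampled = []
    for g in grid:
        # index of the last trade with time <= g
        idx = bisect.bisect_right(times, g) - 1
        sampled.append(prices[idx] if idx >= 0 else None)
    return grid, sampled
```

The same grid is used for every stock, so the sampled prices of different stocks become directly comparable. Returns on each horizon can then be computed from consecutive grid prices.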

14 The set of investigated stocks
Sector                      NYSE (100 stocks)   LSE (92 stocks)
01 Technology                       8                  4
02 Financial                       24                 20
03 Energy                           3                  3
04 Consumer non-Cyclical           11                 12
05 Consumer Cyclical                2                 10
06 Healthcare                      12                  6
07 Basic Materials                  6                  5
08 Services                        20                 19
09 Utilities                        2                  6
10 Capital Goods                    6                  5
11 Transportation                   2                  2
12 Conglomerates                    4                  0

15 Daily data: SLCA, hierarchy & topology. [Figure panels: NYSE, 1 day; LSE, 1 day.] High level of correlation. Overview of Databases: financial databases – transaction prices - synchronized

16 Daily data: PMFG. [Figure panels: NYSE, 1 day; LSE, 1 day.] Overview of Databases: financial databases – transaction prices - synchronized

17 5-min data: SLCA, hierarchy & topology. [Figure panels: LSE, 5 min; NYSE, 5 min.] Financial: 04 out of 20; Services: 02 out of 19. Overview of Databases: financial databases – transaction prices - synchronized

18 5-minute data: PMFG. [Figure panels: LSE, 5 min; NYSE, 5 min.] Overview of Databases: financial databases – transaction prices - synchronized

19 Conclusions. The system is more hierarchically and topologically structured at daily time horizons, confirming that the market needs a finite amount of time to assess the correct degree of cross-correlation between pairs of stocks. The Financial and Energy sectors appear to be structured even at short time horizons (LSE more than NYSE). Overview of Databases: financial databases – transaction prices - synchronized overnight

20 A possible use of tick-by-tick data. Overview of Databases: financial databases – transaction prices – tick-by-tick. The “extreme events” we consider are related to the first crossing of either of the two barriers delimiting an interval of width 2L. The Mean Exit Time (MET) is simply the expected value of the time interval needed to first cross one of the barriers. Financial interest: the MET provides a timescale for market movements. [Figure: GE stock; dashed black = original data, magenta = shuffled returns only.]
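
A minimal empirical MET estimator, under stated assumptions. The barriers are placed at ±L in cumulative log-return from each starting tick, giving an interval of width 2L. Time is counted in transactions, every tick is used as a starting point, and windows that never exit before the data end are discarded, which biases the estimate slightly downward. The shuffled variant mimics the "shuffled returns" comparison by destroying temporal correlations while keeping the return distribution. All names are hypothetical.

```python
import random


def exit_time(returns, start, half_width):
    """Ticks until the cumulative log-return from `start` first reaches +/-half_width.

    Returns None if the series ends before either barrier is crossed.
    """
    x = 0.0
    for steps, r in enumerate(returns[start:], start=1):
        x += r
        if abs(x) >= half_width:
            return steps
    return None


def mean_exit_time(returns, half_width):
    """Empirical MET: average exit time over all starting ticks that do exit."""
    times = [exit_time(returns, s, half_width) for s in range(len(returns))]
    times = [t for t in times if t is not None]
    return sum(times) / len(times) if times else float("nan")


def shuffled_met(returns, half_width, seed=0):
    """MET after randomly permuting the returns (removes temporal correlations)."""
    shuffled = list(returns)
    random.Random(seed).shuffle(shuffled)
    return mean_exit_time(shuffled, half_width)
```

A gap between `mean_exit_time` and `shuffled_met` on real tick data would point to the role of correlations in the return sequence. The naive double loop is quadratic in the worst case, so very long series would call for a smarter scan.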

21 A possible use of tick-by-tick data. Overview of Databases: financial databases – transaction prices – tick-by-tick. Quotes: time between consecutive quotes.

22 Another database: MTS. Overview of Databases: financial databases – bonds. These data concern bonds traded in the European markets and managed by MTS Group, a firm based in Italy. The bonds we have considered are those traded continuously in Italy throughout the year from April 2003 to March 2004.

23 Order book data. Order book data allow one to follow the details of price formation in a financial market. The state of the complete order book can be visualized at any time by using a schematic representation. Overview of Databases: financial databases – order book data

24 Order book data: time evolution. The real behavior of a typical stock over a short time. [Figure legend: sell limit orders; buy limit orders; ○ sell market orders; x buy market orders. Axes: time (s); price × 100.] Overview of Databases: financial databases – order book data

25 Representation of the order book focusing on the time dependence of the order flow (the plot refers to a stock traded at the London Stock Exchange). Overview of Databases: financial databases – order book data. Order book data: time evolution

26 A very special day (20 Sept 2002) Overview of Databases: financial databases – order book data Order book data: time evolution

27 (Coded) Identity Overview of Databases: financial databases

28 Tick-by-tick data, volume and identity. Overview of Databases: financial databases – order book data. In the LSE and BME databases, the coded identity of the market members (brokerages) acting on the order book is also available. For LSE we obtained these data under a special confidentiality agreement: e.g., people who use these data MUST be traceable. For BME, the identity is transparent in the market.

29 Inventory variation. The inventory variation of firm i is the value (i.e. price times volume) of an asset exchanged as a buyer minus the value exchanged as a seller in a given time interval T:
$v_i(t) = \sum_{j} \epsilon_{i,j}\, p_{i,j}\, V_{i,j}$,
where the sum runs over the transactions j of firm i in the interval, p_{i,j} is the price, V_{i,j} the volume and ε_{i,j} the sign (+1 for buys, -1 for sells). Data: 2001-2004; firms i = 1, …, 69 (BBVA); most active stocks: BBVA, TEF, SAN, REP. In this talk, we focus on T = 1 trading day. Overview of Databases: financial databases – order book data. Tick-by-tick data, volume and identity
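
A sketch of the daily inventory variation $v_i(t) = \sum_j \epsilon_{i,j} p_{i,j} V_{i,j}$. It assumes a hypothetical flat record format with one row per firm side of each transaction: (firm, day, price, volume, sign), with sign +1 when the firm buys and -1 when it sells. Real order book files would first have to be reshaped into this form.

```python
from collections import defaultdict


def inventory_variation(trades):
    """Daily inventory variation per firm.

    `trades` is an iterable of (firm, day, price, volume, sign), with sign = +1
    for a buy and -1 for a sell by that firm. Returns {(firm, day): v}.
    """
    v = defaultdict(float)
    for firm, day, price, volume, sign in trades:
        if sign not in (1, -1):
            raise ValueError("sign must be +1 (buy) or -1 (sell)")
        # value bought counts positively, value sold negatively
        v[(firm, day)] += sign * price * volume
    return dict(v)


example = inventory_variation([
    ("A", 1, 10.0, 100, 1),   # firm A buys 100 shares at 10.0
    ("A", 1, 11.0, 50, -1),   # firm A sells 50 shares at 11.0
    ("B", 1, 10.0, 20, -1),   # firm B sells 20 shares at 10.0
])
```

Since each transaction has a buying and a selling side, the full market-wide sum of v over all firms on a day would be zero. The example above lists only a subset of the counterparties.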

30 Inventory variation correlation matrix (69 × 69, BBVA, 2003), obtained by ordering the firms in the rows and columns according to the correlation of their inventory variation with the price return. Overview of Databases: financial databases – order book data. Tick-by-tick data, volume and identity

31 A classification of brokerages/firms based on the correlation between each firm's inventory variation and the price return of the traded stock: “trending” firms (momentum traders), “reversing” firms (contrarian traders) and “noisy” firms. Overview of Databases: financial databases – order book data. Tick-by-tick data, volume and identity

32 BBVA 2003.
“Reversing”: negative correlation between inventory variation and price return.
“Noisy”: correlation between inventory variation and price return within noise confidence levels.
“Trending”: positive correlation between inventory variation and price return.
Number of firms in the groups: 37, 21, 11.
Trending: positively correlated with price return; large institutions; acting on long time scales, splitting large orders to build portfolio positions while minimizing price impact; their trading activity tends to be localized in time.
Reversing: negatively correlated with price return; large and small institutions; typically acting on short time scales, continuously reverting their position in the market; their trading activity tends to be homogeneous in time.
Noisy: poorly correlated with price return; large and small institutions.
Overview of Databases: financial databases – order book data. Tick-by-tick data, volume and identity
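
The three-way classification can be sketched by correlating each firm's daily inventory variation with the daily price return. Firms are labeled by the sign of the correlation once it leaves a noise band, and sorted by correlation as in the ordered matrix. The noise band used here, |ρ| < z/√T (the large-T approximation for the correlation of independent series, z = 1.96 for about 95%), is an assumption and not necessarily the criterion used in the talk. Names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def classify_firms(inventory, returns, z=1.96):
    """Label firms 'trending', 'reversing' or 'noisy'.

    inventory: {firm: [v_i(t) per day]}, returns: [r(t) per day], aligned in time.
    Returns (firm, rho, label) tuples sorted by rho.
    """
    T = len(returns)
    threshold = z / math.sqrt(T)  # assumed noise band for |rho|
    rows = []
    for firm, v in inventory.items():
        rho = pearson(v, returns)
        if rho > threshold:
            label = "trending"
        elif rho < -threshold:
            label = "reversing"
        else:
            label = "noisy"
        rows.append((firm, rho, label))
    # sort by rho, the ordering used for the inventory correlation matrix
    return sorted(rows, key=lambda row: row[1])


rows = classify_firms(
    {"A": [1, -2, 3, -1], "B": [-1, 2, -3, 1], "C": [5, 5, 5, 5]},
    [0.01, -0.02, 0.03, -0.01],
)
```

With only a few days the noise band is very wide; with a year of daily data (T ≈ 250) it narrows to about ±0.12.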

33 ECONOMIC databases: Amadeus, Compustat, INPS. Overview of Databases: economic databases

34 AMADEUS is a comprehensive pan-European database containing financial information on over 10 million public and private companies in 38 European countries. It provides standardised annual accounts (for up to 10 years, consolidated and unconsolidated), financial ratios, activities and ownership for approximately 9 million companies throughout Europe, including Eastern Europe. A standard company report includes 24 balance sheet items, 25 profit and loss account items and 26 ratios; descriptive information, including trade description and activity codes (NACE 1, NAICS or US SIC can be used across the database); and ownership information. A news module contains information from Reuters, Dow Jones and the FT, as well as M&A news and rumours from ZEPHYR. AMADEUS also contains security and price information and links to an executive report with integral graphs, plus a report comparing the financials of the company's default peer group. Overview of Databases: economic databases

35 Logarithmic growth rate. The growth of a firm was first described by Gibrat in 1931. His model concerns the logarithmic growth rate
$r_i(t) = \ln S_i(t+1) - \ln S_i(t)$,
where S(t) is some proxy of firm size: total assets, employees, sales, revenue/turnover, …. The Gibrat model is based on: 1) the law of proportionate effects: r_i(t) is independent of the initial size of the firm; 2) r_i(t) and r_j(t) are uncorrelated. By making use of (i) the Central Limit Theorem and (ii) the additional assumption of independence, one can show that growth should be log-normally distributed, i.e. the logarithmic growth rate should be Gaussian. Overview of Databases: economic databases
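
The Central Limit Theorem step can be illustrated with a toy simulation of the law of proportionate effects. Each period, each firm's size is multiplied by a growth factor that is independent of its size and of the other firms. The uniform range [0.8, 1.2] is a hypothetical choice: a single-period log growth rate is then clearly non-Gaussian, yet the sum over many periods (the log size) comes out close to Gaussian.

```python
import math
import random
import statistics


def simulate_gibrat(n_firms, n_periods, low=0.8, high=1.2, seed=0):
    """Final log sizes of firms growing under Gibrat's law of proportionate effects.

    Each period every firm's size is multiplied by a factor drawn uniformly from
    [low, high], independently of its size and of the other firms.
    All firms start from size 1 (log size 0).
    """
    rng = random.Random(seed)
    log_sizes = []
    for _ in range(n_firms):
        log_size = 0.0
        for _ in range(n_periods):
            log_size += math.log(rng.uniform(low, high))  # r_i(t), size-independent
        log_sizes.append(log_size)
    return log_sizes


def skewness(xs):
    """Sample skewness; close to 0 for a Gaussian sample."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)


log_sizes = simulate_gibrat(2000, 200)
```

With these parameters the mean log growth per period is about -0.0067, so after 200 periods the log sizes should centre near -1.35 with small skewness. The slides that follow examine how real firm data depart from this picture.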

36 Overview of Databases: economic databases. All data are aggregated; IC fixed; AMADEUS database. Log-normal → Laplacian → what else?
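
One simple way to ask "log-normal or Laplacian?" of a sample of logarithmic growth rates is to compare the maximized log-likelihoods of a Gaussian fit and a Laplace fit. Both families have two parameters, so the comparison is equivalent to comparing AIC. This is a generic model-comparison sketch, not necessarily the analysis behind the slide. The synthetic samples only demonstrate that the comparison picks the right family.

```python
import math
import random
import statistics


def gaussian_loglik(xs):
    """Maximized log-likelihood of a Gaussian fit (MLE mean and variance)."""
    n = len(xs)
    mu = statistics.fmean(xs)
    var = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def laplace_loglik(xs):
    """Maximized log-likelihood of a Laplace fit (MLE: median and mean absolute deviation)."""
    n = len(xs)
    m = statistics.median(xs)
    b = sum(abs(x - m) for x in xs) / n
    return -n * (math.log(2 * b) + 1)


def compare_fits(xs):
    """Positive values favour the Laplace fit, negative values the Gaussian fit."""
    return laplace_loglik(xs) - gaussian_loglik(xs)


# Synthetic check: the difference of two Exp(1) variables is Laplace(0, 1).
rng = random.Random(1)
laplace_sample = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(5000)]
gauss_sample = [rng.gauss(0.0, 1.0) for _ in range(5000)]
```

Applied to log growth rates, a positive `compare_fits` value would indicate tent-shaped (Laplacian) rather than Gaussian growth. Tails heavier than either family would call for further candidates.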

37 Overview of Databases: economic databases. Z-transform. The data allow disaggregation by economic sector of activity (within sectors).
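
Assuming that "Z-transform" here means standardization (subtracting the mean and dividing by the standard deviation) computed separately within each sector, growth rates from sectors with different typical growth and volatility can be pooled on a common scale. The function name is hypothetical. Each group must contain at least two distinct values, otherwise its standard deviation is zero.

```python
import statistics
from collections import defaultdict


def z_transform_by_group(values, groups):
    """Standardize values within each group: z = (x - mean_g) / std_g."""
    by_group = defaultdict(list)
    for x, g in zip(values, groups):
        by_group[g].append(x)
    # per-group mean and population standard deviation
    stats = {g: (statistics.fmean(xs), statistics.pstdev(xs)) for g, xs in by_group.items()}
    return [(x - stats[g][0]) / stats[g][1] for x, g in zip(values, groups)]


z = z_transform_by_group([1.0, 3.0, 10.0, 30.0], ["a", "a", "b", "b"])
```

The same function applies unchanged to year-by-year disaggregation, using the year (or a (sector, year) pair) as the group label.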

38 Overview of Databases: economic databases Data allow disaggregation year-by-year

39 Overview of Databases: economic databases Exploring the role of correlation between firms Shuffling experiments

40 Overview of Databases: economic databases Conclusions The availability of accurate databases allows for the inspection of the role that different variables play in the system.

41 The End micciche@unipa.it http://lagash.dft.unipa.it

