© 2014 CY Lin, Columbia University E6893 Big Data Analytics – Lecture 4: Big Data Analytics Algorithms 1 E6893 Big Data Analytics: Financial Market Volatility.

Presentation transcript:

E6893 Big Data Analytics: Financial Market Volatility
Final Project Presentation
Jimmy Zhong, Tim Wu, Oliver Zhou, John Terzis
December 22, 2014

Feature Selection/Extraction using Hadoop
The MapReduce programming model was used to generate a feature matrix from raw price data across hundreds of symbols. Raw price data is first joined on timestamp with feature symbols drawn from a fixed, user-determined set of features. Feature extraction runs on the reducer, which creates forward- and backward-looking volatility values for each timestamp of each symbol. The resulting feature matrix contains over 300 columns, up from a starting point of 12. The feature matrix can be transformed further with a script that performs time-series clustering on intra-day price activity.
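The reducer step described on this slide can be sketched in plain Python. This is only an illustration under stated assumptions: the slides do not define the volatility measure or the window length, so the sketch assumes the sample standard deviation of log returns over a fixed number of observations. The names `realized_vol`, `reduce_symbol`, and `run` are hypothetical, and the in-memory grouping stands in for Hadoop's shuffle phase.

```python
import math
from collections import defaultdict

def realized_vol(prices):
    """Sample standard deviation of log returns; None if fewer than 2 returns."""
    rets = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    if len(rets) < 2:
        return None
    mean = sum(rets) / len(rets)
    return math.sqrt(sum((r - mean) ** 2 for r in rets) / (len(rets) - 1))

def reduce_symbol(rows, window=3):
    """Reducer for one symbol: rows is a list of (timestamp, price) pairs.

    For every timestamp it emits a backward-looking volatility (the window
    ending at that timestamp) and a forward-looking one (the window starting
    at it). The window length is an assumption, not taken from the slides.
    """
    rows = sorted(rows)
    prices = [p for _, p in rows]
    out = []
    for i, (ts, price) in enumerate(rows):
        back = realized_vol(prices[max(0, i - window): i + 1])
        fwd = realized_vol(prices[i: i + window + 1])
        out.append({"timestamp": ts, "price": price,
                    "vol_back": back, "vol_fwd": fwd})
    return out

def run(records, window=3):
    """Group (symbol, timestamp, price) records by symbol, then reduce each group."""
    groups = defaultdict(list)
    for symbol, ts, price in records:
        groups[symbol].append((ts, price))
    return {sym: reduce_symbol(rows, window) for sym, rows in groups.items()}
```

Timestamps near the start or end of a series get `None` for the window that lacks enough observations, so a downstream step can drop or impute those rows.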

Supervised Learning on Spark using MLlib
Spark was installed, and PySpark was used to perform cross-validated ridge regression with stochastic gradient descent. The goal was a regressor that can predict volatility over some forward-looking interval (60 minutes, 1 day, 10 days, etc.) for a given symbol. A combination of MLlib and scikit-learn was used, since MLlib did not yet have Python bindings for cross-validated splitting of the dataset. Spark was run on data held in HDFS. The results were tested on a hold-out sample, and R^2 was calculated to show how much of the variance the regressor could explain.
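The workflow on this slide (ridge regression fitted by SGD, a cross-validated choice of penalty, then R^2 on a hold-out sample) can be illustrated without Spark. The sketch below is a pure-Python stand-in, not the project's MLlib/scikit-learn code; the synthetic data, the candidate penalties, the learning rate, and all function names are assumptions made for illustration.

```python
import random

def predict(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def sgd_ridge(X, y, lam, lr=0.05, epochs=50, seed=0):
    """Minimize squared error + lam * ||w||^2 with per-sample SGD updates."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    idx = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(idx)
        for i in idx:
            err = predict(w, b, X[i]) - y[i]
            # Gradient of err^2 + lam * ||w||^2; the bias is not penalized.
            w = [wj - lr * (2 * err * xj + 2 * lam * wj) for wj, xj in zip(w, X[i])]
            b -= lr * 2 * err
    return w, b

def r_squared(y_true, y_pred):
    """Share of variance explained: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def cv_score(X, y, lam, k=3):
    """Mean validation R^2 over k interleaved folds."""
    scores = []
    for fold in range(k):
        tr = [i for i in range(len(X)) if i % k != fold]
        va = [i for i in range(len(X)) if i % k == fold]
        w, b = sgd_ridge([X[i] for i in tr], [y[i] for i in tr], lam)
        scores.append(r_squared([y[i] for i in va],
                                [predict(w, b, X[i]) for i in va]))
    return sum(scores) / k

def demo(seed=1):
    """Synthetic example: pick lam by CV, then report R^2 on a hold-out set."""
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(120)]
    y = [2.0 * a - 1.0 * c + 0.5 + rng.gauss(0, 0.1) for a, c in X]
    X_tr, y_tr, X_ho, y_ho = X[:90], y[:90], X[90:], y[90:]
    best_lam = max([0.0, 0.01, 0.1], key=lambda lam: cv_score(X_tr, y_tr, lam))
    w, b = sgd_ridge(X_tr, y_tr, best_lam)
    return best_lam, r_squared(y_ho, [predict(w, b, x) for x in X_ho])
```

Calling `demo()` returns the selected penalty and the hold-out R^2. With real price data, the folds would normally respect time order so that later observations never inform earlier predictions.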

Time-Series Analysis: Multistep-ahead forecasting based on a GARCH model and VaR calculation
Motivation: Real-world financial time series exhibit volatility clustering; that is, periods of relative calm are interrupted by bursts of volatility. An extreme market movement can represent a significant downside risk to an investor's security portfolio. Using the RHadoop ecosystem to forecast future volatility and calculate Value at Risk (VaR) can help investors prepare for losses arising from natural or man-made catastrophes, even of a magnitude not experienced before.
Algorithm:
1. Used Pig and a Python script to pre-process the raw data (AAPL), then loaded it into RStudio.
2. Applied the R code (TimeSeriesAnalysis.R) and calculated the return in percent.
3. Applied GARCH modeling to forecast future volatility and calculate VaR.
4. Applied Extreme Value Theory (EVT) to fit a generalized Pareto distribution (GPD) to the tails.
Result:
1. Calculated the volatility forecast and the VaR at the 99% confidence level (the loss is expected to be exceeded only 1% of the time). In this example, AAPL (2008 – 2009), we calculated that with 99% probability the monthly return is above 4%.
2. Used a statistical hypothesis test (Ljung-Box) for autocorrelation in squared returns (p-value ~0; the null hypothesis of no autocorrelation in the squared returns is rejected at the 1% significance level). A GARCH model should therefore be employed in modeling the return time series.
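Step 3 of the algorithm can be sketched as follows. The slides do not give the model order or the fitted coefficients, so this assumes a GARCH(1,1) model with hypothetical parameters `omega`, `alpha`, and `beta`, and it computes VaR under a normal-returns assumption rather than the EVT/GPD tail fit of step 4. The project's own implementation was in R (TimeSeriesAnalysis.R); this Python version only illustrates the recursion.

```python
import math
from statistics import NormalDist

def garch11_forecast(omega, alpha, beta, last_resid, last_var, horizon):
    """Expected conditional variances for steps t+1 .. t+horizon.

    One step ahead:  var[t+1] = omega + alpha * resid[t]^2 + beta * var[t]
    Further ahead:   E var[t+h] = omega + (alpha + beta) * E var[t+h-1]
    """
    forecasts = [omega + alpha * last_resid ** 2 + beta * last_var]
    for _ in range(horizon - 1):
        forecasts.append(omega + (alpha + beta) * forecasts[-1])
    return forecasts

def parametric_var(variances, confidence=0.99):
    """Loss over the whole horizon that is exceeded with probability
    1 - confidence, assuming zero-mean, serially uncorrelated normal returns."""
    z = NormalDist().inv_cdf(confidence)
    return z * math.sqrt(sum(variances))

if __name__ == "__main__":
    # Hypothetical parameters in (percent return)^2 units, not fitted to AAPL.
    path = garch11_forecast(omega=0.1, alpha=0.1, beta=0.8,
                            last_resid=3.0, last_var=1.0, horizon=10)
    print("variance forecasts:", [round(v, 3) for v in path])
    print("10-step 99% VaR (%):", round(parametric_var(path), 3))
```

When `alpha + beta < 1`, the forecasts decay geometrically toward the long-run variance `omega / (1 - alpha - beta)`, which is how a GARCH forecast expresses a burst of volatility fading over time.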

Time-Series Analysis: Multistep-ahead forecasting based on a GARCH model and VaR calculation (continued)


Figures: tail of the AAPL % return data; quantile-quantile plot

K-Means Clustering
Goal: attempt to relate different time intervals to stock volatility through clustering.
Symbols: AIG, AMZN, PEP
Vector dimensions: Normalized Volume, Symbol Volatility +1 Day, VIX Volatility +1 Day, Time Interval
Time intervals: Period of Day, Day of Week, Fiscal Quarter, Year
K-means clustering was run in R and Hadoop with 3 to 4 clusters.
The Euclidean distance measure was used, since all features were real-valued.
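For reference, k-means with Euclidean distance (Lloyd's algorithm) can be sketched in a few lines of Python. This is a generic illustration, not the project's R/Hadoop code; the function names, the random initialization, and the iteration cap are assumptions.

```python
import math
import random

def assign(points, centroids):
    """Index of the nearest centroid (Euclidean distance) for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)
```

Each point would be one feature vector, for example (normalized volume, symbol volatility +1 day, VIX volatility +1 day, encoded time interval). Because Euclidean distance is scale-sensitive, features on very different scales are usually standardized before clustering.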

Cluster Results
No strong correlation between time intervals and symbol volatility across all three sectors.
No strong correlation between VIX volatility and symbol volatility.
There is a significant relationship between volume and symbol volatility.

Logistic Regression
Goal: use a classification model to separate out variables during feature selection and identify which ones have the best predictive power.
Stock symbols tested: AIG, AMZN, PEP
Parameters in dataset: Normalized Volume, Symbol Volatility +1 Day, VIX Volatility +1 Day, Time Interval
The target was predicting when Symbol VIX Volatility would rise over 0.25, which historically is a rough cutoff for regime changes from low- to high-volatility market cycles.
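The setup on this slide (a binary target from the 0.25 volatility cutoff, then a logistic model per feature set) can be sketched as below. The slides do not say which library or optimizer was used, so this is a plain batch-gradient-descent illustration; the function names, learning rate, and epoch count are assumptions, and features are assumed to be centered or normalized beforehand.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def label_high_vol(vols, threshold=0.25):
    """1 when volatility is above the cutoff from the slide, else 0."""
    return [1 if v > threshold else 0 for v in vols]

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that the row belongs to the high-volatility class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

Fitting one model per candidate feature and comparing the resulting scores is one simple way to rank features by predictive power, which matches the stated goal of the slide.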

Logistic Regression Results
Measured by AUC (area under the ROC curve): 1 means the classes are separated perfectly, 0.5 is no better than random, and 0 means every ranking is inverted.
Little to no relationship between time intervals and symbol volatility, though that may be skewed by market crashes.
VIX volatility and symbol volatility are almost completely randomly related.
There is a significant relationship between volume and symbol volatility.
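AUC has a direct interpretation that a short sketch makes concrete: it is the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half. The function below is an illustrative pairwise implementation (quadratic in the number of examples), not the tool the project used.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

A score that carries no information about the label, such as a constant, yields 0.5. That is the sense in which the slide describes VIX volatility and symbol volatility as almost randomly related.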

Questions?