
POLYNOMIAL TIME HEURISTIC OPTIMIZATION METHODS APPLIED TO PROBLEMS IN COMPUTATIONAL FINANCE
Ph.D. dissertation of Fogarasi Norbert, M.Sc.
Supervisor: Dr. Levendovszky János, D.Sc., Doctor of the Hungarian Academy of Sciences
Department of Telecommunications, Budapest University of Technology and Economics
Budapest, 20 May 2014

Outline of Presentation
Introduction
Motivation: Computational Finance and NP-hard problems
My contributions
Thesis Group I: Mean reverting portfolio selection
Thesis Group II: Optimal scheduling on identical machines
Summary of results and real-world applications

Computational Finance and NP-hard Problems
Relatively new branch of computer science (Markowitz, 1950s: Modern Portfolio Theory; Nobel Prize in 1990).
Numerical methods and algorithms with a strong focus on applicability (quantitative study of markets, arbitrage, options pricing, mortgage securitization).
Recent focus: algorithmic trading, quantitative investing, high frequency trading.
Since the 2008 financial crisis, the financial services industry has faced new challenges:
- regulatory pressure (timely reporting, transparency)
- high-frequency trading (flash crashes)
- unprecedented attention on cost and efficiency
Focus of interest: finding quick (polynomial time) approximate solutions to difficult (exponential-time, NP-hard) problems, in order to pave the way towards a safer financial world.

Computational Finance: Open Issues and My Contribution
NP-hard problems which need fast suboptimal solutions!
Challenges: real-time portfolio identification; overnight Monte-Carlo risk calculation scheduling.
My contribution: polynomial time approximation using stochastic optimization; polynomial time heuristic scheduling algorithms.

My Contribution (cont'd)
Finding polynomial time approximate solutions to NP-hard problems:
- Mean reverting portfolio selection (Thesis Group I)
- Task scheduling on identical machines (Thesis Group II)
Goals: show measurable improvement over existing approximate methods; prove practical applicability in real-world settings; achieve very fast runtimes for high frequency trading, timely regulatory reporting and hardware cost savings.
5 refereed journal publications, 1 conference presentation:
1. Fogarasi, N., Levendovszky, J. (2012) A simplified approach to parameter estimation and selection of sparse, mean reverting portfolios. Periodica Polytechnica, 56/1.
2. Fogarasi, N., Levendovszky, J. (2012) Improved parameter estimation and simple trading algorithm for sparse, mean-reverting portfolios. Annales Univ. Sci. Budapest., Sect. Comp., 37.
3. Fogarasi, N., Tornai, K., Levendovszky, J. (2012) A novel Hopfield neural network approach for minimizing total weighted tardiness of jobs scheduled on identical machines. Acta Univ. Sapientiae, Informatica, 4/1.
4. Tornai, K., Fogarasi, N., Levendovszky, J. (2013) Improvements to the Hopfield neural network solution to the total weighted tardiness scheduling problem. Periodica Polytechnica, 57/1.
5. Fogarasi, N., Levendovszky, J. (2013) Sparse, mean reverting portfolio selection using simulated annealing. Algorithmic Finance, 2/3-4.
6. Fogarasi, N., Levendovszky, J. (2012) Combinatorial methods for solving the generalized eigenvalue problem with cardinality constraint for mean reverting trading. 9th Joint Conf. on Math. and Comp. Sci., February 2012, Siófok, Hungary.

Summary of numerical results on real-world problems

Field | Real-world problem | Traditional approaches (avg) | Proposed new method (avg) | Improvement
Portfolio optimization | Convergence trading on US S&P 500 stock data | 11.6% (S&P 500 index return) | 34% | 22.4%
Schedule optimization | Morgan Stanley overnight scheduling problem | (LWPF performance) | (PSHNN performance) | 10%

Thesis Group I: Mean Reverting Portfolio Selection
Modern Portfolio Theory (MPT): maximize expected return for a given amount of risk.
Profitability vs. predictability: mean-reverting portfolios have a large degree of predictability.
Therefore, we can develop profitable convergence trading strategies (~35% annual return on a portfolio selected from the S&P 500).

Intuitive task description
Asset prices form a multi-dimensional time series; find the optimal linear combination, under a cardinality constraint, that exhibits mean reversion.
My contribution: developing novel algorithms for identifying mean reverting portfolios with cardinality constraints, plus trading and performance analysis.
(Figure: trading with a mean reverting portfolio: buy below the mean, sell above it for a profit.)

Thesis Group I: Problem Description
How do we identify mean reverting portfolios based on multivariate historical time series?
Constraint: sparse portfolio (limited transaction costs; an easier to understand and interpret strategy).
d'Aspremont, A. (2011) Identifying small mean-reverting portfolios. Quantitative Finance, 11:3. (Ecole Polytechnique; Paribas London; Ph.D. Stanford; Postdoc Berkeley, Princeton)

Thesis Group I: The Model
Mean reversion: the portfolio value p(t) is an Ornstein-Uhlenbeck process, dp(t) = λ(μ − p(t))dt + σ dW(t).
Key parameter λ: large λ means fast return to the mean and the smallest uncertainty in stationary behaviour.
Notation: s_i(t) is the price of asset i at time instant t; x_i is the quantity at hand of asset i, so p(t) = Σ_i x_i s_i(t).
CHALLENGE: select the portfolio vector x that maximizes mean reversion.
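The O-U dynamics above can be illustrated with a short simulation. This is a sketch only: the parameter values and the Euler-Maruyama discretization below are my own illustrative choices, not the dissertation's.

```python
import numpy as np

def simulate_ou(lam, mu, sigma, p0, dt, n_steps, seed=0):
    """Euler-Maruyama simulation of the Ornstein-Uhlenbeck SDE
    dp(t) = lam * (mu - p(t)) dt + sigma dW(t).
    Larger lam means faster reversion to the long-term mean mu."""
    rng = np.random.default_rng(seed)
    p = np.empty(n_steps)
    p[0] = p0
    for t in range(1, n_steps):
        p[t] = (p[t - 1]
                + lam * (mu - p[t - 1]) * dt
                + sigma * np.sqrt(dt) * rng.standard_normal())
    return p

# Example: strong mean reversion pulls the path toward mu = 100
path = simulate_ou(lam=5.0, mu=100.0, sigma=0.5, p0=90.0, dt=0.01, n_steps=1000)
```

With a large λ, the path quickly settles into a narrow band around μ, which is exactly the predictability that convergence trading exploits.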

The Discrete Model: VAR(1)
First-order vector autoregressive process: s_t = A s_{t-1} + W_t, where W_t ~ N(0, K).
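A minimal simulation of this discrete model (assuming a stable A, i.e. spectral radius below 1, so the process is stationary); the matrices here are illustrative:

```python
import numpy as np

def simulate_var1(A, K, T, seed=0):
    """Simulate the VAR(1) process s_t = A s_{t-1} + w_t, w_t ~ N(0, K)."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    L = np.linalg.cholesky(K)          # K = L L' for correlated innovations
    s = np.zeros((T, n))
    for t in range(1, T):
        s[t] = A @ s[t - 1] + L @ rng.standard_normal(n)
    return s

# Example: a stable 2-asset VAR(1)
A = np.array([[0.5, 0.1],
              [0.0, 0.6]])
K = 0.01 * np.eye(2)
path = simulate_var1(A, K, 500)
```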

Optimal Portfolio as a Generalized Eigenvalue Problem
Maximizing predictability leads to the generalized eigenvalue problem max_x (x^T A G A^T x) / (x^T G x), under the constraint card(x) ≤ k.
Problem: develop a fast solution to the generalized eigenvalue problem under the cardinality constraint, which is NP-hard. Can it be approximated in polynomial time?

Thesis I.1: Estimation of Model Parameters
Given n×T historical VAR(1) data s_t, we need to estimate A, K (the covariance matrix of W) and G (the covariance matrix of s_t).
A and K can be estimated using maximum likelihood; G can be estimated using the sample covariance.
Classical research focuses on regularization techniques (Dempster 1972; Banerjee et al. 2008; d'Aspremont et al. 2008; Rothman et al. 2008).

Thesis I.1: Estimation of Covariance
My novel approach: use the sample covariance and an iterative recursive estimate in tandem to approximate G.
From the definition of VAR(1), in the stationary case we have the Lyapunov relationship G = A G A^T + K.
However, the direct solution may be non-positive definite, so we introduce a numerical method that ensures positive definiteness, starting from G(0) = sample covariance.
The residual of the iteration also gives a goodness-of-fit measure: it is close to 0 for generated VAR(1) data, and shows how well the VAR(1) assumption works for real data.
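The iteration suggested by the Lyapunov relationship can be sketched as follows. Note that the eigenvalue-clipping step is my own illustrative way of enforcing positive semi-definiteness; the dissertation's exact correction method may differ.

```python
import numpy as np

def estimate_G(A, K, G0, n_iter=200, tol=1e-10):
    """Fixed-point iteration on the Lyapunov relation G = A G A' + K,
    started from G0 (e.g. the sample covariance). Converges when the
    spectral radius of A is below 1."""
    G = G0.copy()
    for _ in range(n_iter):
        G_next = A @ G @ A.T + K
        done = np.linalg.norm(G_next - G) < tol
        G = G_next
        if done:
            break
    # Symmetrize and clip tiny negative eigenvalues for numerical safety
    G = (G + G.T) / 2
    w, V = np.linalg.eigh(G)
    return V @ np.diag(np.clip(w, 1e-12, None)) @ V.T
```

For a stable A the fixed point is the stationary covariance, so the returned G satisfies the Lyapunov relation up to the tolerance.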

Thesis I.1: Numerical Results
(Figure: average norm of the estimation error versus t for n=8 and σ=0.1, 0.3, 0.5, generating 100 independent time series for each t.)

Cardinality Reduction by Exhaustive Search
Asset selection by dimension reduction: visit all index sets fulfilling the cardinality constraint (moving from the big space of all n assets to small subspaces of only a few assets), and run an eigenvalue solver in each small subspace.
This yields the solution with the required cardinality (a sparse portfolio), but the complexity is combinatorial in n. Is there any better solution?

Polynomial Time Heuristic Approaches: Greedy Method (d'Aspremont 2011)
Let I_k be the set of indices belonging to the k non-zero components of x.
On each iteration, we consider adding each of the remaining n−k dimensions and choose the one that yields the largest maximal eigenvalue.
This amounts to solving n−k generalized eigenvalue problems of size k+1, giving polynomial runtime.
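A sketch of this greedy forward selection, assuming the objective is the largest generalized eigenvalue of the pair (A, B) restricted to the chosen support; the matrix names are illustrative stand-ins for the numerator and denominator matrices of the problem:

```python
import numpy as np

def max_gen_eig(A, B, idx):
    """Largest generalized eigenvalue of (A, B) restricted to index set idx."""
    idx = list(idx)
    As, Bs = A[np.ix_(idx, idx)], B[np.ix_(idx, idx)]
    return np.max(np.real(np.linalg.eigvals(np.linalg.solve(Bs, As))))

def greedy_selection(A, B, k):
    """Forward greedy: grow the support one index at a time, keeping the
    candidate that yields the largest maximal generalized eigenvalue."""
    n = A.shape[0]
    support = []
    for _ in range(k):
        best = max(
            (i for i in range(n) if i not in support),
            key=lambda i: max_gen_eig(A, B, support + [i]),
        )
        support.append(best)
    return sorted(support)
```

Each of the k rounds solves at most n small eigenvalue problems, which is the source of the polynomial runtime quoted on the slide.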

Polynomial Time Heuristic Approaches: Truncation Method (Fogarasi et al. 2012)
Compute the unconstrained solution, then use its k heaviest dimensions to solve the constrained problem.
A super fast heuristic (only 2 eigenvalue computations).
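The truncation method can be sketched in a few lines: solve the dense generalized eigenvalue problem once, keep the k largest-magnitude coordinates of the top eigenvector, and re-solve restricted to that support. A and B again stand for the numerator and denominator matrices of the generalized eigenvalue problem; this is an illustrative sketch, not the paper's exact implementation:

```python
import numpy as np

def truncation_method(A, B, k):
    """Unconstrained top generalized eigenvector, truncated to its k
    heaviest coordinates, then re-solved on that support."""
    vals, vecs = np.linalg.eig(np.linalg.solve(B, A))
    x = np.real(vecs[:, np.argmax(np.real(vals))])
    support = np.argsort(-np.abs(x))[:k]          # k heaviest dimensions
    idx = np.ix_(support, support)
    vals2, vecs2 = np.linalg.eig(np.linalg.solve(B[idx], A[idx]))
    x_sparse = np.zeros(A.shape[0])
    x_sparse[support] = np.real(vecs2[:, np.argmax(np.real(vals2))])
    return x_sparse
```

Exactly two eigen-decompositions are performed, which is why this is the fastest of the heuristics compared later in the runtime analysis.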

Thesis I.2: Novel Approach: Application of SA by Random Projection
Restrict the portfolio vector x to integer values, of which only k are non-zero.
Consider the energy function to be minimized (the negative of the mean reversion objective).
At each step of the algorithm, we consider a neighboring state w' of the current state w and decide between moving and staying; at each step, a random projection of the vector onto an appropriate subspace is performed.

Thesis I.2: Novel Approach: Application of SA by Random Projection (cont'd)
- The cardinality constraint can easily be built into the neighbor function.
- The starting point can be selected as the Greedy solution.
- A memory feature can be built in to ensure the solution is at least as good as the starting point.
- Periodic reverts to the starting point improve performance.
- The cooling schedule can be set to be fast enough for the specific application.
- The procedure can be stopped at any point; an adaptive stopping condition has also been developed.
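A compact sketch of the annealing loop with the features above (cardinality-preserving neighbor move, best-so-far memory). The objective, cooling schedule and constants are illustrative choices, and the dissertation's random-projection neighbor may differ in detail:

```python
import numpy as np

def sa_portfolio(A, B, k, n_steps=2000, T0=1.0, cooling=0.995, seed=0):
    """Simulated annealing over k-sparse supports. The neighbor move swaps
    one index inside the support for one outside it, so the cardinality
    constraint is built into the neighbor function; a memory feature keeps
    the best state ever visited."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]

    def objective(support):
        idx = np.ix_(support, support)
        return np.max(np.real(np.linalg.eigvals(
            np.linalg.solve(B[idx], A[idx]))))

    support = list(rng.choice(n, size=k, replace=False))
    f = objective(support)
    best, best_f = support[:], f
    T = T0
    for _ in range(n_steps):
        new = support[:]
        candidates = [i for i in range(n) if i not in support]
        new[rng.integers(k)] = candidates[rng.integers(len(candidates))]
        f_new = objective(new)
        # Metropolis rule: always accept improvements, sometimes accept worse
        if f_new > f or rng.random() < np.exp((f_new - f) / T):
            support, f = new, f_new
            if f > best_f:
                best, best_f = support[:], f   # memory feature
        T *= cooling
    return sorted(best), best_f
```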

Thesis I.2: Numerical Results
For n=10, k=5, Greedy and SA both find the theoretical best in 70% of cases, but in 11% of the remaining 30% of cases, SA outperforms Greedy.
For larger problem sizes SA performs even better (e.g., for n=20, k=10 it outperforms Greedy in 25% of the cases).

Thesis I.2: Runtime Analysis
- Truncation method: sub-second portfolio selection; can be used in real-time algorithmic trading.
- Greedy: seconds to compute; can be used in intraday trading.
- Simulated Annealing: minutes to compute; improves upon Greedy and can be used to fine-tune intraday trading.
- Exhaustive: impractical for n>20; can be used for low frequency trading.
(Figure: CPU runtime in seconds versus total number of assets n, to compute a full set of sparse portfolios with cardinality ranging from 1 to n.)

Thesis I.3: Portfolio Mean Estimation
Given historical portfolio valuations p_t, and assuming they follow an O-U process, estimate μ.
Classical methods in the literature: sample mean estimate; least squares regression; maximum likelihood estimator (numerically complex).
I developed a novel mean estimation method based on pattern matching and decision theory.

Thesis I.3: Novel Portfolio Parameter Estimation Using Pattern Matching
Starting from the definition of the Ornstein-Uhlenbeck process and taking the expected value, we obtain a family of expected mean-reversion trajectories ("patterns").
We use maximum likelihood estimation techniques to decide which pattern the observations match most closely, and thereby determine the long-term mean μ, where U is the time correlation matrix of p_t.
This estimate is more accurate than the sample mean and more resilient to small λ than linear regression.

Thesis I.4: Simple Convergence Trading Model
We decide whether μ(t) < μ by observing only p(t), using an approach based on decision theory.
We can use this simplified model to prove the economic viability of our algorithms and to compare them to each other.

Thesis I.4: Simple Convergence Trading Model (cont'd)
If the process is in its stationary state, then the samples are generated by a Gaussian distribution.
Having observed a sample, we accept the stationary hypothesis, which holds with a known error probability; for a given rate of acceptable error ε, we can select a decision threshold α accordingly.
The trading strategy can then be summarized as follows: depending on the observed sample and the accepted hypothesis, the action is either Buy/Hold (hold the portfolio) or No Action/Sell (hold cash).
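The slide elides the exact decision rule, so the following is only a hypothetical threshold version of such a strategy: buy when the observed price is significantly below the estimated long-term mean (by α standard deviations of the stationary distribution), and sell once it reverts above the mean. The function name and thresholds are my own illustration:

```python
def trading_signal(p_t, mu, sigma, alpha):
    """Hypothetical one-sided Gaussian test for convergence trading:
    buy when p_t < mu - alpha*sigma (price significantly below the mean),
    sell when p_t >= mu (reversion complete), otherwise hold."""
    if p_t < mu - alpha * sigma:
        return "buy"
    if p_t >= mu:
        return "sell"
    return "hold"
```

Raising α lowers the probability of a false "buy" at the cost of fewer trades, which is the ε-versus-α trade-off described above.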

Thesis Group I: S&P 500 Test
Consider the 500 stocks that make up the S&P 500 over the examined period, and select the K=4 stock portfolio that maximizes mean reversion. Repeat for 250 trading days (1 year).
The S&P 500 went up by 11.6%; our method generates a 34% return.
(Figure: minimum, maximum, average and final portfolio values, starting from 100%.)

Thesis Group I: Summary
Thesis I.1: New numerical method for estimating the covariance matrix of a VAR(1) process (Periodica Polytechnica, 2011).
Thesis I.2: Adapted simulated annealing to the problem of maximizing mean reversion under a cardinality constraint (Algorithmic Finance, 2013).
Thesis I.3: Novel mean estimation technique for O-U processes using pattern matching (Annales Univ. Sci. Budapest, 2012).
Thesis I.4: Simple trading strategy based on a decision theoretic formulation (Joint Conf. on Math. and Comp. Sci.).

Thesis Group II: Optimal Scheduling
Complex portfolios are evaluated and risk managed using Monte-Carlo simulations at many financial institutions (e.g. Morgan Stanley).
Future trajectories of market variables are simulated, portfolio value/risk is evaluated on each trajectory, and a weighted average is used.
Each night, a changed portfolio needs to be evaluated and risk managed with new market data and model parameters.
We need a quick way to schedule tens of thousands of jobs on tens of thousands of machines in a near-optimal way.
Why? $10M/year is spent on hardware, and timely responses must be given to clients and regulators regarding portfolio values and VaR.
My novel method saved 53 minutes on top priority jobs running for 12 hours overnight, compared to the next best heuristic.

Thesis Group II: Problem Formulation
Scheduling jobs on a finite number V of identical processors under constraints on the completion times.
Given n users/jobs with sizes x_i, cutoff times K_i and weights/priorities w_i.
Scheduling matrix C, where C_ij = 1 if job i is processed at time step j.
Jobs can stop and restart on a different machine (preemption).
Example: V=2, n=3, x={2,3,1}, K={3,3,3}. (Figure: scheduling matrix with jobs as rows and time steps as columns.)

Thesis Group II: Problem Formulation (cont'd)
Define the tardiness of job i as T_i = max(0, F_i − K_i), where F_i is the finishing time of job i as per C.
Minimizing the Total Weighted Tardiness (TWT) is stated as min_C Σ_i w_i T_i, under the constraints that each job i receives exactly x_i time steps of processing and at most V jobs are processed in any time step.
Example: V=2, n=3, x={2,3,2}, K={3,3,3}, w={3,2,1}. Not all jobs can complete before their cutoff times, but an optimal TWT schedule exists. (Figure: an optimal scheduling matrix with jobs as rows and time steps as columns.)
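Given a 0/1 scheduling matrix C, the TWT objective can be computed directly; this helper is a sketch (it assumes every job appears in at least one time step), and the example schedule below is one feasible schedule for the slide's instance, not necessarily the optimal one:

```python
import numpy as np

def total_weighted_tardiness(C, K, w):
    """TWT of a schedule matrix C (n jobs x T time steps), where
    C[i, j] = 1 if job i is processed at time step j.
    F_i is the last time step in which job i runs (1-based);
    the tardiness of job i is max(0, F_i - K_i)."""
    C = np.asarray(C)
    finish = np.array([np.max(np.nonzero(row)[0]) + 1 for row in C])
    tardiness = np.maximum(0, finish - np.asarray(K))
    return int(np.dot(w, tardiness))

# A feasible schedule for the slide's example (V=2, x={2,3,2},
# K={3,3,3}, w={3,2,1}): each column has at most 2 ones, and
# each row i has exactly x_i ones.
C = [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 0, 1, 1]]
```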

Heuristic Approaches to TWT
1990 Du and Leung prove that TWT is NP-hard.
1979 Dogramaci, Sulkis: simple heuristic.
1983 Rachamadugu: myopic heuristic; compares to Earliest Due Date (EDD) and Weighted Shortest Processing Time (WSPT).
1998 Azizoglu: branch and bound heuristic; too slow for more than 15 jobs.
1994 Koulamas: KPM algorithm.
2000 Armentano: tabu search.
1995 Guinet: simulated annealing, lower bound.
2002 Sen, 2008 Biskup: surveys of existing methods.
2000: artificial neural network approaches to scheduling problems.
2004 Maheswaran: Hopfield Neural Network approach to single machine TWT on a specific 10-job problem.

Thesis II.1: Novel Approach: TWT to QP
The Hopfield Neural Network (HNN) is a recurrent neural network that is good at solving quadratic optimization problems in Lyapunov form.
Our task is to transform the TWT problem into a quadratic optimization problem.
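For reference, a toy asynchronous Hopfield recursion on binary states; the thresholding and sign conventions here follow one common textbook form, not necessarily the exact recursion (with constants α, β, γ) used in the dissertation:

```python
import numpy as np

def hopfield_minimize(W, b, y0, n_sweeps=50):
    """Asynchronous Hopfield recursion on binary states: repeatedly set
    y_i <- step(W[i] . y + b[i]) one coordinate at a time. For symmetric W
    with zero diagonal, this converges to a local minimum of the quadratic
    Lyapunov function L(y) = -1/2 y' W y - b' y."""
    y = np.array(y0, dtype=float)
    for _ in range(n_sweeps):
        changed = False
        for i in range(len(y)):
            new = 1.0 if W[i] @ y + b[i] >= 0 else 0.0
            if new != y[i]:
                y[i] = new
                changed = True
        if not changed:      # fixed point reached
            break
    return y
```

Because the update only ever decreases the Lyapunov function, convergence to a fixed point is guaranteed, which is the property Thesis II.2 exploits.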

Thesis II.1: Novel Approach: TWT to QP (cont'd)
Move the constraints into the objective function as penalty terms; each member of the resulting sum can be converted to quadratic Lyapunov form separately, bringing the whole expression into quadratic form.

Thesis II.1: Novel Approach: TWT to QP (cont'd)
Results of the matrix conversions: the explicit weight matrix and bias vector of the quadratic form.

Thesis II.2: Applying HNN
Hopfield (1982) proved that the recursion converges to a fixed point, thereby minimizing a quadratic Lyapunov function.
I implemented this in MATLAB, including the systematic selection of the heuristic constants α, β and γ.
I also developed algorithms to validate and, if needed, correct the resulting schedule matrix.

Thesis II.2: HNN Outperforms Other Simple Heuristics
For each problem size (number of jobs), 100 random problems were generated and the average TWT was computed and plotted.

Thesis II.2: HNN Outperforms Other Simple Heuristics (cont'd)
The outperformance over simple heuristics in the literature (LWPF: Largest Weighted Process First; WSPT: Weighted Shortest Processing Time; EDD: Earliest Due Date) is consistent over a broad spectrum of problems.
(Figure: percentage outperformance versus job size.)

Thesis II.3: Further Improving HNN
Smart HNN (SHNN): use the result of LWPF as the starting point for HNN rather than random starting points. This speeds up HNN due to the single starting point, but still requires multiple iterations due to the setting of the heuristic constants.
Perturbed Smart HNN (PSHNN): consider random perturbations of the LWPF solution as starting points for HNN, in order to avoid getting stuck in local minima.

Thesis II.3: Further Improving HNN (cont'd)
Perturbed Largest Weighted Path First (PLWPF): a simple but surprisingly well performing heuristic. The idea is to avoid getting stuck in local minima by trying starting points near the LWPF solution.
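The perturbation idea is generic and can be sketched independently of the scheduler: evaluate the base heuristic's solution plus random perturbations of it, and keep the best. Everything here (function names, and the inversion-count stand-in for the TWT evaluation) is illustrative:

```python
import random

def perturbed_restarts(base_solution, perturb, evaluate, n_restarts=20, seed=0):
    """PLWPF-style loop: score the base heuristic solution and random
    perturbations of it, returning the best found. `perturb` maps a
    solution to a nearby one (e.g. swap two jobs in a priority order)."""
    rng = random.Random(seed)
    best, best_cost = base_solution, evaluate(base_solution)
    for _ in range(n_restarts):
        cand = perturb(base_solution, rng)
        cost = evaluate(cand)
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost

# Hypothetical usage: a job priority order as the solution, and the
# number of out-of-order pairs (inversions) as a stand-in cost.
base = [1, 0, 2, 3]

def swap_two(order, rng):
    i, j = rng.sample(range(len(order)), 2)
    out = order[:]
    out[i], out[j] = out[j], out[i]
    return out

def inversions(order):
    return sum(a > b for i, a in enumerate(order) for b in order[i + 1:])

best, cost = perturbed_restarts(base, swap_two, inversions)
```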

Thesis II.3: Further Improving HNN (cont'd)
For small job sizes, we compare performance to the theoretical best, found by exhaustive search, over 100 randomly generated problems per job size.
PSHNN consistently outperforms the other methods, but there is room for improvement.

Thesis II.3: Further Improving HNN (cont'd)
For small job sizes, we compare performance to the theoretical best, found by exhaustive search, over 100 randomly generated problems per job size.
PSHNN outperforms the other methods by an increasing margin as job size grows.

Thesis Group II: Practical Application
Scheduling of Monte Carlo simulation based risk calculations at Morgan Stanley, run overnight for trading and regulatory reporting: 100 portfolios, 556 jobs, 792 seconds average job size.
7% improvement over HNN and 10% over LWPF (the best method in the literature prior to my study); 53 minutes saved on the top 3 priority jobs compared to the next best heuristic.
(Table: weighted-sum increment relative to PSHNN for PSHNN, PLWPF, HNN, LWPF and EDD.)

Thesis Group II: Optimal Scheduling: Summary
Thesis II.1: I converted the TWT problem to quadratic form, including the constraints via heuristic constants.
Thesis II.2: I applied the Hopfield Neural Network (HNN) and found approximate solutions in polynomial time; I showed that the HNN solution outperforms other simple heuristics on a large set of random problems.
Thesis II.3: I improved HNN by intelligent selection of the starting point and by random perturbations.
(Acta Univ. Sapientiae 2012; Periodica Polytechnica.)

Numerical results on real-world problems

Field | Real-world problem | Traditional approaches (avg) | Proposed new method (avg) | Improvement
Portfolio optimization | Convergence trading on US S&P 500 stock data | 11.6% (S&P 500 index return) | 34% | 22.4%
Schedule optimization | Morgan Stanley overnight scheduling problem | (LWPF performance) | (PSHNN performance) | 10%

Summary of My Contribution
I found a generic approach to approximating NP-hard problems in polynomial time using heuristic methods.
I proved its practical effectiveness and applicability on real-world instances of two very difficult open problems.
This can speed up financial calculations and their scheduling, providing faster, more timely data to banks, clients and financial regulators, which benefits society as a whole.

Thank You For Your Attention!
Questions and Answers

Q: Regarding the description of the HNN, the state transition rule is asynchronous, i.e. only one of the state variables (elements of the vector y) is updated at a time. What was the reason for using only asynchronous updates instead of also testing synchronous ones, which would later be more suitable for massively parallel implementations?
A: Synchronous updating implies updating the nodes at exactly the same time, which requires a "global clock tick" that is unrealistic for biological/physical applications (R. Rojas: Neural Networks, Springer-Verlag, Berlin, 1996).
On CPUs, only a "quasi-synchronous" implementation is possible (see next slide).
Due to the inherently sequential updating and the storage/copying overhead, this implementation is slower than asynchronous updating on CPUs.
For a hardware-level implementation, synchronous updating is indeed faster, but such hardware was not available, so I put this beyond the scope of my dissertation (see p. 57, paragraph 1).

Questions and Answers (cont'd)
(Figure: update schemes: (1) asynchronous; (2) truly synchronous hardware implementation; (3) "quasi-synchronous" cyclic asynchronous updating with l = k mod N.)
