Genetic Programming for Financial Trading Nicolas NAVET INRIA, France AIECON NCCU, Taiwan Tutorial at CIEF 2006, Kaohsiung, Taiwan, 08/10/2006
2 Outline of the talk (1/2) PART 1 : Genetic programming (GP) ? GP among machine learning techniques GP on the symbolic regression problem Pitfalls of GP PART 2 : GP for financial trading Various schemes How to implement it ? Experiments : GP at work
3 Outline of the talk (2/2) PART 3 : Analyzing GP results Why are GP results usually inconclusive ? Benchmarking with Zero-intelligence trading strategies Lottery Trading Answering the questions : is there anything to learn on the data at hand ? is GP effective at this task ? PART 4 : Perspectives
4 GP is a Machine Learning technique The ultimate goal of machine learning is automatic programming, that is, computers programming themselves. More achievable goal: build computer-based systems that can adapt and learn from their experience. ML algorithms originate from many fields: mathematics (logic, statistics), bio-inspired techniques (neural networks), evolutionary computing (Genetic Algorithm, Genetic Programming), swarm intelligence (ants, bees)
5 Evolutionary Computing Algorithms that make use of mechanisms inspired by natural evolution, such as Survival of the fittest among an evolving population of solutions Reproduction and mutation Prominent representatives: Genetic Algorithm (GA) Genetic Programming (GP) : GP is a branch of GA where the genetic code of a solution is of variable length Over the last 50 years, evolutionary algorithms have proved to be very efficient for finding approximate solutions to algorithmically complex problems
6 Two main problems in Machine Learning Classification : model output is a prediction whether the input belongs to some particular class Examples : Human being recognition in image analysis, spam detection, credit scoring, market timing decisions Regression : prediction of the system's output for a specific input Example: predict tomorrow's opening price for a stock given closing price, market trend, other stock exchanges, …
7 Functioning scheme of ML Learning on a training interval Use of the model outside the training interval
9 Genetic programming Generate a population of random programs Evaluate their quality (fitness) Create better programs by applying genetic operators, e.g. - mutation - combination (crossover) GP is the process of evolving a population of computer programs, which are candidate solutions, according to evolutionary principles (e.g. survival of the fittest)
10 In GP, programs are represented by trees (1/3) Trees are a very general representation form : Formula : functions terminals
11 In GP, programs are represented by trees (2/3) Logical formula :
12 In GP, programs are represented by trees (3/3) Trading rule formula : BUY IF (VOL>10) AND (Moving Average(25) > Moving Average(45)) Picture from [BhPiZu02]
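To make the tree representation concrete, here is a minimal sketch in Python that encodes the trading rule of this slide as nested tuples and evaluates it recursively. The encoding, the FUNCTIONS table, the "MA" node and the env dictionary are illustrative assumptions, not the representation used in [BhPiZu02] or in the author's software.

```python
import operator

# Function set: name -> implementation (illustrative names, not from the slides).
FUNCTIONS = {
    "AND": lambda a, b: bool(a and b),
    ">": operator.gt,
}

def evaluate(node, env):
    """Recursively evaluate a tree encoded as nested tuples (name, arg1, arg2, ...)."""
    if isinstance(node, tuple):
        name, *args = node
        if name == "MA":
            # Moving average of the last n prices; assumes at least n prices exist.
            n = args[0]
            return sum(env["prices"][-n:]) / n
        return FUNCTIONS[name](*(evaluate(arg, env) for arg in args))
    if isinstance(node, str):
        return env[node]   # named terminal, e.g. "VOL"
    return node            # numeric constant

# BUY IF (VOL > 10) AND (Moving Average(25) > Moving Average(45))
rule = ("AND", (">", "VOL", 10), (">", ("MA", 25), ("MA", 45)))
```

With a steadily rising price series and VOL = 12 the rule evaluates to True; lowering VOL to 5 makes it False.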
13 Preliminary steps of GP The user has to define : the set of terminals the set of functions how to evaluate the quality of an individual: the fitness measure parameters of the run : e.g. number of individuals of the population the termination criterion
14 Symbolic regression : a problem GP is good at … Symbolic means that one looks for both - the functional form - the value of the parameters, e.g. Differs from other regressions where one solely looks for the best coefficient values for a pre-fixed model. Usually the choice of the model is the most difficult issue ! Symbolic regression : find a function that fits well a set of experimental data points
15 Symbolic regression Given a set of points : Find the function s.t. as far as possible : Possible fitness function : GP functions : GP terminals :
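The formulas of this slide did not survive in the transcript. As a hedged illustration only, one commonly used fitness for symbolic regression is the sum of absolute errors over the data points (lower is better). The function name and the example target below are assumptions, not the slide's own definitions.

```python
def sum_abs_error(candidate, points):
    """Sum of |candidate(x) - y| over the data points; 0 means a perfect fit."""
    return sum(abs(candidate(x) - y) for x, y in points)

# Hypothetical data set sampled from x**2 + x on a few points.
points = [(x, x**2 + x) for x in range(-5, 6)]

perfect = lambda x: x * (x + 1)   # same function, different functional form
linear = lambda x: x              # misses the quadratic term
```

Here sum_abs_error(perfect, points) is 0, while the linear candidate is penalised by the sum of x**2 over the sample.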
16 GP Operators : biologically inspired … Recombination (aka crossover) : 2 individuals share genetic material and create one or several offspring Mutation : introduce genetic diversity by random changes in the genetic code Reproduction : individual survives as is in the next generation
17 Selection Operators for Crossover/reproduction Fitness proportionate : each individual is selected with a probability that depends on the value of its fitness Tournament selection of size n : n individuals are randomly chosen and the best is kept Rank based : each individual is selected with a probability function of its rank according to the fitness order General principles : in GP the fittest individuals should have more chance to survive and transmit their genetic code
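Tournament selection of size n, as described above, can be sketched in a few lines. This assumes that a higher fitness value is better; the function and parameter names are illustrative.

```python
import random

def tournament_select(population, fitness, n, rng=random):
    """Draw n distinct individuals at random and keep the fittest one."""
    contestants = rng.sample(population, n)
    return max(contestants, key=fitness)
```

A larger n increases selection pressure: with n equal to the population size the best individual is always returned, with n = 1 selection is purely random.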
18 Standard Recombination (aka crossover) Standard recombination : exchange two randomly chosen sub-trees among the parents +
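A possible sketch of standard recombination, assuming trees are encoded as nested tuples (function name first, then the arguments) and leaves are plain values. The helper names are hypothetical; real GP systems usually add depth limits and bias the choice of crossover points.

```python
import random

def subtree_paths(tree, path=()):
    """Yield the path (tuple of child indices) of every node of the tree."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from subtree_paths(child, path + (i,))

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of tree where the node at path is replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(parent_a, parent_b, rng=random):
    """Exchange one randomly chosen sub-tree between the two parents."""
    path_a = rng.choice(list(subtree_paths(parent_a)))
    path_b = rng.choice(list(subtree_paths(parent_b)))
    sub_a = get_subtree(parent_a, path_a)
    sub_b = get_subtree(parent_b, path_b)
    return (replace_subtree(parent_a, path_a, sub_b),
            replace_subtree(parent_b, path_b, sub_a))
```

Because the two sub-trees are swapped, the total number of nodes in the two offspring equals the total number of nodes in the two parents.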
19 Mutation Operator 1 : standard mutation Standard mutation : replacement of a sub-tree with a randomly generated one
20 Mutation Operator 2 : swap sub-tree mutation Swap sub-tree Mutation : swap two sub-trees of an individual
21 Mutation Operator 3 : shrink mutation Shrink Mutation : replacing a branch (a node with one or more arguments) with one of its child nodes
22 Other Mutation Operators Swap mutation (not to be confused with swap sub-tree mutation) : exchanging the function associated with a node for one having the same number of arguments Headless Chicken crossover : mutation implemented as a crossover between a program and a newly generated random program …
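Shrink mutation can be sketched as follows, again assuming a nested-tuple encoding in which every function node has at least one argument. The helper names are hypothetical.

```python
import random

def node_paths(tree, path=()):
    """Yield the path of every node (tuple of child indices from the root)."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from node_paths(child, path + (i,))

def get_node(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_node(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_node(tree[i], path[1:], new),) + tree[i + 1:]

def shrink_mutation(tree, rng=random):
    """Replace one randomly chosen branch with one of its own children."""
    branches = [p for p in node_paths(tree) if isinstance(get_node(tree, p), tuple)]
    if not branches:          # a single leaf cannot shrink
        return tree
    path = rng.choice(branches)
    child = rng.choice(get_node(tree, path)[1:])
    return replace_node(tree, path, child)
```

The result is always strictly smaller than the input when the input contains a branch, which is why this operator is sometimes used against code bloat.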
23 Reproduction / Elitism Operators Reproduction : an individual is reproduced in the next generation without any modification Elitism : the best n individuals are kept in the next generation
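Putting the pieces of the previous slides together, a generational GP loop might look like the sketch below. It is a simplified illustration, not the author's implementation: the problem-specific parts (init, fitness, crossover, mutate) are passed in as functions, higher fitness is assumed to be better, and the mutation probability and elite size are arbitrary assumptions.

```python
import random

def evolve(init, fitness, crossover, mutate, pop_size=500, generations=20,
           tournament=6, p_mutation=0.1, elite=1, rng=random):
    """Generational loop with elitism, tournament selection, crossover, mutation."""
    population = [init(rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Elitism: the best `elite` individuals survive unchanged.
        next_gen = sorted(population, key=fitness, reverse=True)[:elite]
        while len(next_gen) < pop_size:
            # Tournament selection of both parents.
            mother = max(rng.sample(population, tournament), key=fitness)
            father = max(rng.sample(population, tournament), key=fitness)
            child = crossover(mother, father, rng)
            if rng.random() < p_mutation:
                child = mutate(child, rng)
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)
```

The termination criterion here is simply a fixed number of generations; a fitness threshold would be another common choice.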
24 GP is no silver bullet …
25 GP Issue 1 : how to choose the function set ? 1. The problem cannot be solved if the set of functions is not sufficient … 2. But non-relevant functions uselessly increase the search space … Problem : there is no automatic way to decide a priori which functions are relevant and to build a sufficient function set …
26 Problem cannot be solved if the set of functions is not sufficient : illustration
SETUP
Generating function : [formula not in transcript]
GP functions : [set not in transcript], with and without sin(x)
GP terminals : [set not in transcript]
Number of generations : 20
Number of individuals : 500
Standard GP operators : crossover, mutation, reproduction, tournament selection of size 6, …
27 Results with sin(x) in the function set Typical outcome :
28 Results without sin(x) in the function set Typical outcome :
29 Yes, sin(x) can be approximated by its Taylor series.. Problem 1 : there is little hope to discover that.. sin(x) and Taylor approximations of degree 1, 3, 5, 7, 9, 11, 13 [image Wikipedia] Problem 2 : what happens outside the training interval ?
30 Composition of the function set is crucial : illustration GP functions : Same experimental setup as before Subset is extraneous in this context …
31 Function set containing redundant functions (1/2) Typical outcome :
32 Function set containing redundant functions (2/2) On average, with the extraneous functions the best solution is 10% farther from the curve in the training interval (much more outside!) With the extraneous functions, the average solution is better.. because the tree is more likely to contain a trigonometric function
33 GP Issue 2 : code bloat Solutions increase in size over generations … Same experimental setup as before
34 GP Issue 2 : code bloat non-effective code !! aka introns Much of the genetic code has no influence on the fitness.. but may constitute a useful reserve of genetic material
35 Code bloat: why is it a problem ? 1. Solutions are hard to understand : learning something from huge solutions is almost impossible.. One has no confidence using programs one does not understand ! 2. Much of the computing power is spent manipulating non-contributing code, which may slow down the search
36 Countermeasures.. (1/2) Static limit of the tree depth Dynamic maximum tree depth [SiAl03] : the limit is increased each time an outstanding individual deeper than the current limit is found Limit the probability of longer-than-average individuals to be chosen by reducing their fitness Apply operators that ensure limited code growth Discard newly created individuals whose behavior is too close to that of their parents (e.g. behavior for regression problems could be position of the points [Str03]) …
37 Countermeasures.. (2/2) Possible : symbolic simplification of the tree Needs to be further investigated ! preliminary experiments [TeHe04] show that simplification does not necessarily help (introns may constitute a useful reserve of genetic materials) can be simplified into :
38 GP Issue 3 : GP can be disappointing outside the training set and such a behavior can hardly be predicted …
39 GP Issue 3 : explanation (1/2) Usually GP functions are implemented to have the closure property: each function must be able to handle every possible value What to do with : division by 0 ? sqrt(x) with x < 0 ? … Solution: protected operators, eg. the division : if (abs(denominator) < value-near-0) return 1;
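The protected division given on this slide, written out as runnable Python. The threshold value is an assumption (the slide only says "value-near-0"), and the protected square root shown here, which takes the absolute value of its argument, is a common convention rather than something stated in the slides.

```python
import math

EPSILON = 1e-9  # assumed threshold; the slide only says "value-near-0"

def protected_div(numerator, denominator, eps=EPSILON):
    """Division that never fails: returns 1 when the denominator is near 0."""
    if abs(denominator) < eps:
        return 1.0
    return numerator / denominator

def protected_sqrt(x):
    """Square root defined for every real input (common convention: sqrt(|x|))."""
    return math.sqrt(abs(x))
```

Note the discontinuity this introduces: protected_div(1, 1e-8) is about 1e8, whereas protected_div(1, 1e-10) is 1. An evolved tree may fit the training points well and still jump wildly when an out-of-sample input drives a denominator close to zero.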
40 GP Issue 3 : explanation (2/2) In our case, fragment of the best GP tree : [formula not in transcript] Why did it not occur on the training interval ? - no training points chosen such that [condition not in transcript]
41 GP Issue 4 : standard GP is not good at finding numerical constants (1/3) Where do numerical values come from ? Ephemeral random constants : random values inserted at the leaves of the GP trees during the creation of initial population Use of arithmetic operators on existing numerical constants Generation by combination of variables/functions: Lately, many studies show that standard GP is not good at finding constants …
42 GP Issue 4 : standard GP is not good at finding numerical constants (2/2) Experiment : find a constant function equal to the numeric constant Typical outcome: error
43 GP Issue 4 : standard GP is not good at finding numerical constants (3/3) There are several more efficient schemes for constants generation in GP [Dem05] : - local optimization [ZuPiMa01], - numeric mutation [EvFe98], - … One of them should be implemented otherwise 1) computation time is lost searching for constants 2) solutions may tend to be bigger
44 Some (personal) conclusions on GP (1/3) GP is undoubtedly a powerful technique : Efficient for predicting / classifying.. but not more than other techniques Symbolic representation of the created solutions may help to give good insight into the system under study.. not only the best solutions are interesting but also how the population has evolved over time GP is a tool to learn knowledge …
45 Some (personal) conclusions on GP (2/3) Powerful tool but... a good knowledge of the application field is required for choosing the right function set prior experience with GP is mandatory to avoid common mistakes - there is no theory to tell us what to do ! it tends to create solutions too big to be analyzable -> countermeasures should be implemented fine-tuning the GP parameters is very time-consuming
46 Some (personal) conclusions on GP (3/3) How to analyze the results of GP ? efficiency can hardly be predicted, it varies from problem to problem … and from GP run to GP run if results are not very positive : is it because there is no good solution ? or GP is not effective and further work is needed ? There are solutions – part 3 of the talk
Part 2 : GP for financial trading
48 Why is GP an appealing technique for financial trading ? Easy to implement / robust evolutionary technique Trading rules (TR) should adapt to a changing environment - GP may simulate this evolution Solutions are produced under a symbolic form that can be understood and analyzed GP may serve as a knowledge discovery tool (e.g. evolution of the market)
49 GP for financial trading GP for composing portfolio (not discussed here, see [Wag03] ) GP for evolving the structure of neural networks used for prediction (not discussed here, see [GoFe99] ) GP for predicting price evolution (briefly discussed here, see [Kab02] ) Most common : GP for inducing technical trading rules
50 Predicting price evolution : general comments.. Long term forecast of stock prices remains a fantasy [Kab02] Swing trading or intraday trading Many other (more?) efficient ML tools : e.g. SVM and NN GP is anyway useful for ensemble methods CIEF Tutorial 1 by Prof. Fyfe - today 1h30 pm ! 2 excellent starting points : [Kab02] : single-day-trading-strategy based on the forecasted spread [SaTe01]: winner of the CEC2000 Dow-Jones Prediction - Prediction t+1, t+2, t+3,…, t+h - a solution has one tree per forecast horizon
51 Predicting price evolution : fitness function Definition of the fitness function has been shown to be crucial, e.g. [SaTe01]; there are many possible choices : (Normalized) Mean square error Mean Absolute Percentage Error (1- ) statistic = 1 - MAPE / MAPE-Random-Walk Directional symmetry index (DS) DS weighted by the direction and amplitude of the error … Issue : a meaningful fitness function is not always GP friendly …
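Two of the fitness measures listed above can be sketched as follows. The sketch assumes that the random-walk forecast for day t is simply the actual value of day t-1, and that MAPE is expressed as a fraction rather than a percentage (the ratio is unaffected). Function names are illustrative.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, as a fraction; actual values must be non-zero."""
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def one_minus_mape_ratio(actual, forecast):
    """1 - MAPE(model) / MAPE(random walk), scored from the second point onward.

    forecast[t] is the model's prediction of actual[t]; forecast[0] is ignored
    because the random walk has no prediction for the first point.
    """
    random_walk = actual[:-1]          # yesterday's value predicts today
    return 1 - mape(actual[1:], forecast[1:]) / mape(actual[1:], random_walk)
```

The statistic equals 1 for a perfect forecast, 0 for a forecast no better than the random walk, and is negative when the model does worse than the random walk.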
52 Inducing technical trading rules Training interval Validation interval Out-of-sample interval 1 ) Creation of the trading rules using GP 2) Selection of the best resulting strategies Further selection on unseen data - One strategy is chosen for out-of-sample Performance evaluation
53 Steps of the algorithm (1/3) 1. Extracting training time series from the database 2. Preprocessing : cleaning, sampling, averaging, normalizing, …
54 Steps of the algorithm (2/3) 3. GP on the training set 3.1 Creation of the individuals 3.2 Evaluation Trading Rules Interpreter Trading Sequence Simulator 0,1,1,1,0,0,0,0,1,1,1 Fitness $$ 3.3 Selection of the individuals 4. Analysis of the evolution : statistics, html files … Non-preprocessed series !
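The evaluation step (3.2) turns a 0/1 trading sequence into a monetary fitness. A minimal sketch of such a simulator is given below, under assumptions that are not in the slides: long-only trading, positions[t] held between prices[t] and prices[t + 1], and a proportional cost charged on every change of position.

```python
def accumulated_return(positions, prices, cost=0.0):
    """Compound return of holding the asset whenever positions[t] == 1.

    Requires len(prices) == len(positions) + 1. A still-open final position
    is not charged an exit cost.
    """
    wealth = 1.0
    previous = 0
    for t, position in enumerate(positions):
        if position != previous:
            wealth *= 1.0 - cost                 # pay for entering or exiting
        if position == 1:
            wealth *= prices[t + 1] / prices[t]  # earn the asset's return
        previous = position
    return wealth - 1.0
```

The prices fed to the simulator should be the raw series, in line with the slide's remark that fitness is computed on non-preprocessed data.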
55 Steps of the algorithm (3/3) 5. Evaluate selected individuals on the validation set 6. Evaluate best individual out-of sample $1 $2 $3
GP at work : Demo on the Taiwan Capitalization Weighted Stock Index
Part 3 : Analyzing GP results
58 One may cast doubts on GP efficiency.. Highly heuristic - no theory ! Problems on which GP has been shown not to be significantly better than random search Few clear-cut successes reported in the financial literature GP embeds little domain specific knowledge yet.. Doubts on the efficiency of GP to use the available computing time : code bloat bad at finding numerical constants best solutions are sometimes found very early in the run.. Variability of the results ! e.g. returns : [values not in transcript]
59 Possible pretest : measure of predictability of the financial time-series Serial correlation Kolmogorov complexity Lyapunov exponent Unit root analysis Comparison with results on surrogate data : shuffled series (e.g. Kaboudan statistics)... Actual question : how predictable for a given horizon with a given cost function?
60 In practice, some predictability does not imply profitability.. Volatility may not be sufficient to cover round-trip transactions costs! Not the right trading instrument at hand.. typically short selling not available Prediction horizon must be large enough!
61 Pretest methodology Compare GP with several variants of Random search algorithms Zero-Intelligence Strategies - ZIS Random trading behaviors Lottery trading - LT Statistical hypotheses testing Null : GP does not outperform ZIS Null : GP does not outperform LT Issue : how to best constrain randomness ?
Pretest 1 : GP versus Zero-Intelligence strategies (= Equivalent search intensity Random Search (ERS) with validation stage) - Null hypothesis H 1,0 : GP does not outperform equivalent random search - Alternative hypothesis is H 1,1
63 Pretest 1 : GP vs zero-intelligence strategies H 1,0 cannot be rejected – interpretation : There is nothing to learn or GP is not very effective Training interval Validation interval Out-of-sample interval 1 ) Creation of the trading rules using GP 2) Selection of the best resulting strategies Further selection on unseen data - One strategy is chosen for out-of-sample Performance evaluation ERS
64 Pretest 4 : GP vs lottery trading Lottery trading (LT) = random trading behavior according to the outcome of a r.v. (e.g. Bernoulli law) Issue 1 : if LT tends to hold positions (short, long) for less time than GP, transactions costs may advantage GP.. Issue 2 : it might be an advantage or a disadvantage for LT to trade much less or much more than GP. ex: downward oriented market with no short-sell
65 Frequency and intensity of a trading strategy Frequency : average number of transactions per unit of time Intensity : proportion of time where a position is held For pretest 4 : We impose that the average frequency and intensity of LT are equal to those of GP Implementation : generate random trading sequences having the right characteristics 0,0,1,1,1,0,0,0,0,0,1,1,1,1,1,1,0,0,1,1,0,1,0,0,0,0,0,0,1,1,1,1,1,1,…
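One way to generate random trading sequences with prescribed characteristics is sketched below. It is an assumption-laden illustration, not the procedure of [ChNa06]: intensity is fixed through the number of 1s in the sequence, and frequency through the number of holding periods (runs of consecutive 1s), each run standing for one entry and one exit.

```python
import random

def composition(total, parts, rng):
    """Random split of `total` into `parts` positive integers."""
    cuts = sorted(rng.sample(range(1, total), parts - 1))
    return [b - a for a, b in zip([0] + cuts, cuts + [total])]

def lottery_sequence(length, n_ones, n_runs, rng=random):
    """Random 0/1 sequence with exactly n_ones ones grouped in n_runs runs."""
    n_zeros = length - n_ones
    if not (1 <= n_runs <= n_ones and n_zeros >= n_runs - 1):
        raise ValueError("incompatible length, n_ones and n_runs")
    runs = composition(n_ones, n_runs, rng)
    # Zeros go into n_runs + 1 gaps; the interior gaps need at least one zero.
    spare = n_zeros - (n_runs - 1)
    extra = [g - 1 for g in composition(spare + n_runs + 1, n_runs + 1, rng)]
    gaps = [extra[0]] + [e + 1 for e in extra[1:-1]] + [extra[-1]]
    sequence = []
    for gap, run in zip(gaps, runs):
        sequence += [0] * gap + [1] * run
    return sequence + [0] * gaps[-1]
```

For example, lottery_sequence(250, 100, 8) yields a one-year daily sequence with an intensity of 40% and eight separate holding periods.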
66 Training interval Validation interval Out-of-sample interval 1 ) Creation of the trading rules using GP 2) Selection of the best resulting strategies Further selection on unseen data - One strategy is chosen for out-of-sample Performance evaluation Pretest 4 : implementation 0,0,1,1,1,0,0,0,0,0,1,… Lottery trading
Answering question 1 : is there anything to learn on the training data at hand ?
68 Question 1 : pretests involved Starting point: if a set of search algorithms do not outperform LT, it gives evidence that there is nothing to learn.. Pretest 4 : GP vs Lottery Trading Null hypothesis H 4,0 : GP does not outperform LT Pretest 5 : Equivalent Random Search (ZIS) vs Lottery Trading Null hypothesis H 5,0 : ERS does not outperform LT
69 Question 1 : some answers... R means that the null hypothesis H i,0 cannot be rejected - ¬R means that it is rejected, i.e. we should favor H i,1
Case 1 : H 4,0 = R, H 5,0 = R : there is nothing to learn
Case 2 : H 4,0 = ¬R, H 5,0 = ¬R : there is something to learn
Case 3 : H 4,0 = ¬R, H 5,0 = R : there may be something to learn - ERS might not be powerful enough
Case 4 : H 4,0 = R, H 5,0 = ¬R : there may be something to learn - GP evolution process is detrimental
Answering question 2 : is GP effective ?
71 Question 2 : some answers... Question 2 cannot be answered if there is nothing to learn (case 1) Case 4 provides us with a negative answer.. In case 2 and 3, run pretest 1 : GP vs Equivalent random search Null hypothesis H 1,0 : GP does not outperform ERS If one cannot reject H 1,0 GP shows no evidence of efficiency…
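The decision logic for questions 1 and 2 can be summarised in code. This is a sketch of the reasoning as reconstructed from the interpretations given on the slides; the function names and the boolean encoding (True when the corresponding null hypothesis is rejected) are assumptions.

```python
def question_1(gp_beats_lt, ers_beats_lt):
    """Return (case number, interpretation) from pretests 4 and 5.

    gp_beats_lt  is True when H 4,0 is rejected (GP outperforms LT).
    ers_beats_lt is True when H 5,0 is rejected (ERS outperforms LT).
    """
    if not gp_beats_lt and not ers_beats_lt:
        return 1, "there is nothing to learn"
    if gp_beats_lt and ers_beats_lt:
        return 2, "there is something to learn"
    if gp_beats_lt:
        return 3, "there may be something to learn - ERS might not be powerful enough"
    return 4, "there may be something to learn - GP evolution process is detrimental"

def question_2(case, gp_beats_ers=None):
    """Answer 'is GP effective?' given the case and, if needed, pretest 1."""
    if case == 1:
        return "cannot be answered"
    if case == 4:
        return "no"
    # Cases 2 and 3: pretest 1 (GP vs ERS) decides.
    return "yes" if gp_beats_ers else "no evidence of efficiency"
```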
Pretests at work Methodology : Draw conclusions from pretests using our own programs and compare with results in the literature [ChKuHo06] on the same time series
73 Setup : GP control parameters - same as in [ChKuHo06]
74 Setup : statistics, data, trading scheme
Hypothesis testing with Student's t-test with a 95% confidence level
Pretests with samples made of 50 GP runs, 50 ERS runs and 100 LT runs
Data : indexes of 3 stock exchanges : Canada, Taiwan and Japan
Daily trading with short selling
Training of 3 years - Validation of 2 years
Out-of-sample periods : [periods not in transcript]
Data normalized with a 250 days moving average
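A hedged sketch of how such a comparison could be coded with the Python standard library only. It is not the exact test used in the study: it computes Welch's unequal-variance t statistic and, because the samples are large (50 and 100 runs), compares it with a one-sided normal critical value instead of the exact Student quantile.

```python
from statistics import NormalDist, mean, variance

def outperforms(sample_a, sample_b, confidence=0.95):
    """One-sided test of the null 'A does not outperform B'.

    Returns (reject_null, t_statistic). Both samples need a non-zero variance.
    """
    standard_error = (variance(sample_a) / len(sample_a)
                      + variance(sample_b) / len(sample_b)) ** 0.5
    t_statistic = (mean(sample_a) - mean(sample_b)) / standard_error
    critical_value = NormalDist().inv_cdf(confidence)  # about 1.645 at 95%
    return t_statistic > critical_value, t_statistic
```

In the setting of the slides, sample_a would hold the out-of-sample returns of the 50 GP runs and sample_b those of the 100 LT runs (pretest 4) or of the 50 ERS runs (pretest 1).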
75 Results on actual data (1/2) Evidence that there is something to learn : 4 markets out of 9 (C3,J2,T1,T3) Experiments in [ChKuHo06], with another GP implementation, show that GP performs very well on these 4 markets Evidence that there is nothing to learn : 3 (C1,J3,T2) In [ChKuHo06], there is only one (C1) where GP has positive return (but less than B&H)
76 Results on actual data (2/2) GP effective : 3 markets out of 6 In these 3 markets, GP outperforms Buy and Hold - same outcome as in [ChKuHo06] Preliminary conclusion : one can rely on pretests.. When there is nothing to learn, no GP implementation did well (except in one case) When there is something to learn, at least one implementation did well (always) When our GP is effective, GP in [ChKuHo06] is effective too (always)
77 Further conclusion 1. Our GP implementation is more efficient than random search : no case where ERS outperforms LT and GP did not 2. But only slightly more efficient … one would expect many more cases where GP does better than LT and ERS does not Our GP is actually able to take advantage of regularities in data … but only of simple ones
Part 4 : Perspectives in the field of GP for financial trading
79 Rethinking fitness functions Fitness functions : accumulated return, risk-adjusted return, … Issue : on some problems [LaPo02], GP is only marginally better than random search because the fitness function induces a "difficult" landscape … Come up with GP-friendly fitness functions … From [LaPo02]
80 Preprocessing of the data : still an open issue Studies in forecasting show the importance of preprocessing – for GP, often, normalization with MA(250) is used - with benefits [ChKuHo06] Length of MA should change according to markets volatility, regime changes, etc ? Why not consider : MACD, Exponential MA, differencing, rate of change, log value, FFT, wavelet, …
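As an illustration of the MA(250) normalization mentioned above, the sketch below divides each price by the trailing moving average of the last `window` prices, current day included. The exact form used in [ChKuHo06] is not given in the slides, so this particular definition is an assumption.

```python
def normalize_by_moving_average(prices, window=250):
    """Return prices[t] / MA_window(t) for every t with a full window available."""
    normalized = []
    for t in range(window - 1, len(prices)):
        moving_average = sum(prices[t - window + 1 : t + 1]) / window
        normalized.append(prices[t] / moving_average)
    return normalized
```

The first window - 1 observations are dropped, so a 250-day window costs about one year of daily data at the start of the series.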
81 Data division scheme There is evidence that GP performs poorly when the characteristics of the training interval are very different from the out-of- sample interval … Characterization of the current market condition : mean reverting, trend following... Relearning on a smaller interval if needed ?
82 More extensive tests are needed.. automating the test A comprehensive test for daily indexes was done in [ChKuHo06], none exists for individual stocks and intraday data … Automated testing on several hundreds of stocks is fully feasible … but requires a software infrastructure and much computing power
83 Ensemble methods : combining trading rules In ML, ensemble methods have proven to be very effective Majority rule tested in [ChKuHo06] with some success Efficiency requirement : accuracy (better than random) and diversity (uncorrelated errors) – what does it mean for trading rules? More fine grained selection / weighting scheme may lead to better results …
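The majority rule can be sketched as below for 0/1 trading signals. Resolving ties by staying out of the market is an assumption of this sketch, not something taken from [ChKuHo06].

```python
def majority_vote(signals):
    """Combine several 0/1 trading sequences of equal length, one per rule.

    At each time step the ensemble holds a position (1) only if strictly more
    than half of the rules do; ties resolve to 0.
    """
    n_rules = len(signals)
    return [1 if 2 * sum(votes) > n_rules else 0 for votes in zip(*signals)]
```

A weighted variant would replace the plain sum with a sum of per-rule weights, for instance based on validation performance.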
84 Embed more domain specific knowledge Black-box algorithms are usually outperformed by domain-specific algorithms Domain-specific language is limited as yet … Enrich primitive set with volume, indexes, bid/ask spread, … Enrich function set with cross-correlation, predictability measure, …
85 References (1/2)
[ChKuHo06] S.-H. Chen, T.-W. Kuo and K.-M. Hoi, Genetic Programming and Financial Trading: How Much about "What we Know", 4th NTU International Conference on Economics, Finance and Accounting, April
[ChNa06] S.-H. Chen and N. Navet, Pretests for genetic-programming evolved trading programs : zero-intelligence strategies and lottery trading, Proc. ICONIP2006.
[SiAl03] S. Silva and J. Almeida, Dynamic Maximum Tree Depth - A Simple Technique for Avoiding Bloat in Tree-Based GP, GECCO 2003, LNCS 2724, pp. 1776–1787.
[Str03] M.J. Streeter, The Root Causes of Code Growth in Genetic Programming, EuroGP 2003.
[TeHe04] M.D. Terrio, M. I. Heywood, On Naïve Crossover Biases with Reproduction for Simple Solutions to Classification Problems, GECCO 2004.
[ZuPiMa01] G. Zumbach, O.V. Pictet, and O. Masutti, Genetic Programming with Syntactic Restrictions applied to Financial Volatility Forecasting, Olsen & Associates, Research Report.
[EvFe98] M. Evett, T. Fernandez, Numeric Mutation Improves the Discovery of Numeric Constants in Genetic Programming, Genetic Programming 1998: Proceedings of the Third Annual Conference, 1998.
86 References (2/2)
[Kab02] M. Kaboudan, GP Forecasts of Stock Prices for Profitable Trading, Evolutionary computation in economics and finance.
[SaTe01] M. Santini, A. Tettamanzi, Genetic Programming for Financial Series Prediction, Proceedings of EuroGP'2001.
[BhPiZu02] S. Bhattacharyya, O. V. Pictet, G. Zumbach, Knowledge-Intensive Genetic Discovery in Foreign Exchange Markets, IEEE Transactions on Evolutionary Computation, vol 6, n° 2, April
[LaPo02] W.B. Langdon, R. Poli, Foundations of Genetic Programming, Springer Verlag.
[Kab00] M. Kaboudan, Genetic Programming Prediction of Stock Prices, Computational Economics, vol 16.
[Wag03] L. Wagman, Stock Portfolio Evaluation: An Application of Genetic-Programming-Based Technical Analysis, Genetic Algorithms and Genetic Programming at Stanford 2003.
[GoFe99] W. Golubski and T. Feuring, Evolving Neural Network Structures by Means of Genetic Programming, Proceedings of EuroGP'99.
[Dem05] I. Dempsey, Constant Generation for the Financial Domain using Grammatical Evolution, Proceedings of the 2005 workshops on Genetic and evolutionary computation, pp 350 – 353, Washington, June 2005.