

1
Learning to Trade via Direct Reinforcement
John Moody
International Computer Science Institute, Berkeley & J E Moody & Company LLC, Portland
Global Derivatives Trading & Risk Management, Paris, May 2008

2
Learning to Trade via Direct Reinforcement Global Derivatives Trading & Risk Management – May 2008
What is Reinforcement Learning?
RL considers a goal-directed “learning” agent interacting with an uncertain environment that attempts to maximize reward / utility.
RL is an active paradigm:
– The agent “learns” by “trial & error” discovery
– Actions result in reinforcement
RL Paradigms:
– Value Function Learning (Dynamic Programming)
– Direct Reinforcement (Adaptive Control)

3
I. Why Direct Reinforcement?
Direct Reinforcement Learning:
– Finds predictive structure in financial data
– Integrates forecasting with decision making
– Balances risk vs. reward
– Incorporates transaction costs
Discover Trading Strategies!

4
Optimizing Trades based on Forecasts
Indirect Approach:
– Two sets of parameters (forecaster and trading rule)
– Forecast error is not utility
– The forecaster ignores transaction costs
– Information bottleneck

5
Learning to Trade via Direct Reinforcement
Trader Properties:
– One set of parameters
– A single utility function U that includes transaction costs
– Direct mapping from inputs to actions
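The three trader properties above can be made concrete with a small sketch. It assumes a single-layer recurrent trader in the spirit of [4]: one parameter vector maps lagged returns and the previous position straight to a position, with no separate forecaster in between. The function names `trader_signal` and `trader_position` are hypothetical, as is the parameter layout (bias, recurrent weight u, one weight per lagged return). This is an illustration, not the presentation's code.

```python
import math
from typing import Sequence


def trader_signal(theta: Sequence[float], lagged_returns: Sequence[float],
                  prev_position: float) -> float:
    """Continuous trading signal in (-1, 1) computed from a single parameter vector."""
    # Hypothetical layout: theta = (bias, u, w_1, ..., w_m); u weights F_{t-1}
    bias, u, *w = theta
    if len(w) != len(lagged_returns):
        raise ValueError("expected one weight per lagged return")
    activation = bias + u * prev_position + sum(wi * ri for wi, ri in zip(w, lagged_returns))
    return math.tanh(activation)


def trader_position(theta: Sequence[float], lagged_returns: Sequence[float],
                    prev_position: float) -> int:
    """Discrete long/short position in {-1, +1}; a zero signal is mapped to long."""
    return 1 if trader_signal(theta, lagged_returns, prev_position) >= 0 else -1
```

As an example, `trader_position((0.0, 0.5, 1.0, -1.0), [0.01, 0.02], 1)` stays long: the activation is 0.5 + 0.01 - 0.02 = 0.49. The previous position enters as an input, and that recurrence is what lets a trader learn to avoid needless, costly position changes.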

6
Direct RL Trader (USD/GBP): Return_A = 15%, SR_A = 2.3, DDR_A = 3.3 (subscript A: annualized)

7
II. Direct Reinforcement: Algorithms & Illustrations
Algorithms:
– Recurrent Reinforcement Learning (RRL)
– Stochastic Direct Reinforcement (SDR)
Illustrations:
– Sensitivity to Transaction Costs
– Risk-Averse Reinforcement

8
Learning to Trade via Direct Reinforcement
DR Trader:
– Recurrent policy (trading signals, portfolio weights): $F_t = F(\theta; F_{t-1}, I_t)$, where $I_t$ is the information available at time $t$
– Takes action $F_t$, receives reward $R_t$ (trading return with transaction costs)
– Causal performance function (generally path-dependent): $U_T = U(R_1, \ldots, R_T)$
– Learns the policy by varying the parameters $\theta$
GOAL: Maximize performance or marginal performance

9
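Recurrent Reinforcement Learning (RRL) (Moody & Wu 1997)
Deterministic gradient (batch):
$$\frac{dU_T(\theta)}{d\theta} = \sum_{t=1}^{T} \frac{dU_T}{dR_t}\left\{\frac{dR_t}{dF_t}\frac{dF_t}{d\theta} + \frac{dR_t}{dF_{t-1}}\frac{dF_{t-1}}{d\theta}\right\}$$
with recursion:
$$\frac{dF_t}{d\theta} = \frac{\partial F_t}{\partial \theta} + \frac{\partial F_t}{\partial F_{t-1}}\frac{dF_{t-1}}{d\theta}$$
Stochastic gradient (on-line):
$$\frac{dU_t(\theta_t)}{d\theta_t} \approx \frac{dU_t}{dR_t}\left\{\frac{dR_t}{dF_t}\frac{dF_t}{d\theta_t} + \frac{dR_t}{dF_{t-1}}\frac{dF_{t-1}}{d\theta_{t-1}}\right\}$$
stochastic recursion:
$$\frac{dF_t}{d\theta_t} \approx \frac{\partial F_t}{\partial \theta_t} + \frac{\partial F_t}{\partial F_{t-1}}\frac{dF_{t-1}}{d\theta_{t-1}}$$
Stochastic parameter update (on-line):
$$\Delta\theta_t = \rho\,\frac{dU_t(\theta_t)}{d\theta_t}$$
Constant $\rho$: adaptive learning. Declining $\rho$: stochastic approximation.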
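A minimal on-line RRL loop following the stochastic recursion and update rule above might look like the sketch below. It rests on several assumptions:
– a tanh trader $F_t = \tanh(\theta \cdot x_t)$ with $x_t = [1, r_{t-m+1}, \ldots, r_t, F_{t-1}]$
– the single-asset trading return $R_t = \mu(F_{t-1} r_t - \delta|F_t - F_{t-1}|)$
– the simplest additive utility $U_t = R_t$, so that $dU_t/dR_t = 1$

The function name `rrl_online` and all default settings are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _sign(x: float) -> int:
    return (x > 0) - (x < 0)


def rrl_online(returns: Sequence[float], m: int = 2, rho: float = 0.01,
               mu: float = 1.0, delta: float = 0.001) -> Tuple[List[float], List[float]]:
    """Sketch of on-line RRL with U_t = R_t (so dU_t/dR_t = 1).

    Assumed trader: F_t = tanh(theta . x_t), x_t = [1, r_{t-m+1}, ..., r_t, F_{t-1}].
    Returns the positions F_t and rewards R_t.
    """
    n = m + 2                          # bias, m return weights, recurrent weight u
    theta = [0.0] * n
    f_prev, df_prev = 0.0, [0.0] * n   # F_{t-1} and dF_{t-1}/dtheta_{t-1}
    positions, rewards = [], []
    for t in range(m - 1, len(returns)):
        r_t = returns[t]
        x = [1.0, *returns[t - m + 1:t + 1], f_prev]
        f = math.tanh(sum(w * xi for w, xi in zip(theta, x)))
        # Trading return with proportional transaction costs
        reward = mu * (f_prev * r_t - delta * abs(f - f_prev))
        s = _sign(f - f_prev)
        dr_df = -mu * delta * s                 # dR_t/dF_t
        dr_df_prev = mu * r_t + mu * delta * s  # dR_t/dF_{t-1}
        # Stochastic recursion: dF_t/dtheta ~ (1 - F_t^2) * (x_t + u * dF_{t-1}/dtheta)
        u = theta[-1]
        df = [(1.0 - f * f) * (xi + u * dp) for xi, dp in zip(x, df_prev)]
        # Update: Delta theta_t = rho * dU_t/dtheta_t, using dU_t/dR_t = 1
        theta = [w + rho * (dr_df * g_now + dr_df_prev * g_prev)
                 for w, g_now, g_prev in zip(theta, df, df_prev)]
        positions.append(f)
        rewards.append(reward)
        f_prev, df_prev = f, df
    return positions, rewards
```

On a steadily rising price (`rrl_online([0.01] * 500)`), the learned position drifts long. Replacing the constant `dU_t/dR_t = 1` with a risk-sensitive marginal utility, such as the Differential Sharpe Ratio gradient, gives a risk-averse variant without changing the rest of the loop.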

10
Structure of Traders
Single Asset:
– Price series $z_t$
– Return series $r_t = z_t - z_{t-1}$
Traders:
– Discrete position size $F_t \in \{-1, 0, 1\}$
– Recurrent policy $F_t = F(\theta; F_{t-1}, I_t)$
Observations:
– Full system state is not known
Simple Trading Returns and Profit:
$$R_t = \mu\left(F_{t-1}\, r_t - \delta\,|F_t - F_{t-1}|\right), \qquad P_T = \sum_{t=1}^{T} R_t$$
Transaction costs: represented by the cost rate $\delta$ ($\mu$ is the fixed position size).
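The trading-return and profit definitions above translate almost line for line into code. This is a short sketch assuming additive profits and a positions list aligned with the prices, where `positions[t]` is the position $F_t$ chosen at time $t$. The helper names `trading_returns` and `profit` are hypothetical.

```python
from typing import List, Sequence


def trading_returns(prices: Sequence[float], positions: Sequence[float],
                    mu: float = 1.0, delta: float = 0.0) -> List[float]:
    """R_t = mu * (F_{t-1} * r_t - delta * |F_t - F_{t-1}|) for t = 1..T."""
    if len(prices) != len(positions):
        raise ValueError("prices and positions must be aligned")
    rewards = []
    for t in range(1, len(prices)):
        r_t = prices[t] - prices[t - 1]                      # price change over the period
        holding = positions[t - 1] * r_t                     # gain/loss of the position held into t
        cost = delta * abs(positions[t] - positions[t - 1])  # cost of changing position at t
        rewards.append(mu * (holding - cost))
    return rewards


def profit(rewards: Sequence[float]) -> float:
    """Additive profit P_T = sum of R_t."""
    return sum(rewards)
```

Consider prices `[100, 101, 100, 102]`, positions `[1, 1, -1, -1]` and `delta=0.5`. The rewards are `[1.0, -2.0, -2.0]`. The reversal from long to short is charged twice the cost rate because $|F_t - F_{t-1}| = 2$.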

11
Risk-Averse Reinforcement: Financial Performance Measures
Performance Functions:
– Path independent: standard utility functions of final wealth
– Path dependent: functions of the whole return sequence $R_1, \ldots, R_T$
Performance Ratios:
– Sharpe Ratio: $S_T = \mathrm{Average}(R_t) / \mathrm{StdDev}(R_t)$
– Downside Deviation Ratio: $DDR_T = \mathrm{Average}(R_t) / DD_T$, with $DD_T = \left(\frac{1}{T}\sum_{t=1}^{T}\min\{R_t, 0\}^2\right)^{1/2}$
For Learning:
– Per-period returns $R_t$
– Marginal performance, e.g. the Differential Sharpe Ratio
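Here is a minimal sketch of the two performance ratios above, computed per period with no annualization. It uses the population standard deviation (dividing by T) so that the Sharpe ratio's denominator matches the 1/T normalization of $DD_T$; a sample standard deviation would be an equally reasonable choice. The function names are illustrative.

```python
import math
from typing import Sequence


def sharpe_ratio(returns: Sequence[float]) -> float:
    """S_T = Average(R_t) / StdDev(R_t), population standard deviation."""
    T = len(returns)
    mean = sum(returns) / T
    std = math.sqrt(sum((r - mean) ** 2 for r in returns) / T)
    if std == 0.0:
        raise ValueError("returns have zero variance")
    return mean / std


def downside_deviation_ratio(returns: Sequence[float]) -> float:
    """DDR_T = Average(R_t) / DD_T, with DD_T = sqrt(mean of min(R_t, 0)^2)."""
    T = len(returns)
    mean = sum(returns) / T
    dd = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / T)
    if dd == 0.0:
        raise ValueError("no negative returns: downside deviation is zero")
    return mean / dd
```

For returns `[0.02, -0.01, 0.03, -0.02]`, the Sharpe ratio is $1/\sqrt{17} \approx 0.243$ and the DDR is $1/\sqrt{5} \approx 0.447$. The DDR is larger because only the losses count as risk, so large gains are not penalized.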

12
Long / Short Trader Simulation
– Sensitivity to Transaction Costs
– Learns from scratch and on-line
– Moving average Sharpe Ratio with $\eta = 0.01$

13
Trader Simulation: Transaction Costs vs. Performance
100 runs; costs = 0.2%, 0.5%, and 1.0%
[Figure panels: Sharpe Ratio; Trading Frequency]

14
Minimizing Downside Risk: Artificial Price Series with Heavy Tails

15
Comparison of Risk-Averse Traders: Underwater Curves

16
Comparison of Risk-Averse Traders: Draw-Downs
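An underwater curve shows how far cumulative profit sits below its running peak. The maximum draw-down is the depth of the lowest point on that curve. This sketch assumes additive profits starting from zero; a multiplicative-wealth version would measure the fall relative to the peak instead. The function names are illustrative.

```python
from typing import List, Sequence


def underwater_curve(rewards: Sequence[float]) -> List[float]:
    """Cumulative profit minus its running maximum (always <= 0), starting from P_0 = 0."""
    curve, wealth, peak = [], 0.0, 0.0
    for r in rewards:
        wealth += r
        peak = max(peak, wealth)
        curve.append(wealth - peak)
    return curve


def max_drawdown(rewards: Sequence[float]) -> float:
    """Largest peak-to-trough fall of cumulative profit (a non-negative number)."""
    return -min(underwater_curve(rewards), default=0.0)
```

For rewards `[1.0, -2.0, 0.5, 2.0, -1.0]`, the underwater curve is `[0.0, -2.0, -1.5, 0.0, -1.0]` and the maximum draw-down is 2.0. A risk-averse trader, for example one maximizing the DDR, would aim to keep this curve shallow.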

17
III. Direct Reinforcement vs. Dynamic Programming
Algorithms:
– Value Function Method (Q-Learning)
– Direct Reinforcement Learning (RRL)
Illustration:
– Asset Allocation: S&P 500 & T-Bills
– RRL vs. Q-Learning

18
RL Paradigms Compared
Value Function Learning:
– Origins: Dynamic Programming
– Learn “optimal” Q-Function, Q: (state, action) → value
– Solve Bellman’s Equation
– Action: “Indirect”
Direct Reinforcement:
– Origins: Adaptive Control
– Learn “good” Policy P, P: observations → p(action)
– Optimize “Policy Gradient”
– Action: “Direct”
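To make the “indirect” side of the comparison concrete, here is a sketch of the standard one-step tabular Q-learning update. It is a sample-based solution of Bellman's equation, and the action is only read off the learned value function afterwards. The dictionary-based Q-table and the function names are assumptions for illustration. The direct approach instead adjusts policy parameters along a performance gradient, for example RRL.

```python
from typing import Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_learning_update(q: QTable, state: Hashable, action: Hashable, reward: float,
                      next_state: Hashable, actions: Sequence[Hashable],
                      alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)); unseen pairs default to 0."""
    best_next = max(q.get((next_state, a2), 0.0) for a2 in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def greedy_action(q: QTable, state: Hashable, actions: Sequence[Hashable]) -> Hashable:
    """'Indirect' action: the argmax of the learned value function in this state."""
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

As an example, take $Q(1, 1) = 2$, reward 1, $\alpha = 0.5$ and $\gamma = 0.9$. The update moves $Q(0, 1)$ from 0 halfway to the target $1 + 0.9 \cdot 2 = 2.8$, giving 1.4.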

19
S&P-500 / T-Bill Asset Allocation: Maximizing the Differential Sharpe Ratio

20
S&P-500: Opening Up the Black Box
85 series: learned relationships are nonstationary over time

21
Closing Remarks
Direct Reinforcement Learning:
– Discovers Trading Opportunities in Markets
– Integrates Forecasting with Trading
– Maximizes Risk-Adjusted Returns
– Optimizes Trading with Transaction Costs
Direct Reinforcement Offers Advantages Over:
– Trading based on Forecasts (Supervised Learning)
– Dynamic Programming RL (Value Function Methods)
Illustrations:
– Controlled Simulations
– FX Currency Trader
– Asset Allocation: S&P 500 vs. Cash &

22
Selected References:
[1] John Moody and Lizhong Wu. Optimization of trading systems and portfolios. Decision Technologies for Financial Engineering.
[2] John Moody, Lizhong Wu, Yuansong Liao, and Matthew Saffell. Performance functions and reinforcement learning for trading systems and portfolios. Journal of Forecasting, 17.
[3] Jonathan Baxter and Peter L. Bartlett. Direct gradient-based reinforcement learning: Gradient estimation algorithms.
[4] John Moody and Matthew Saffell. Learning to trade via direct reinforcement. IEEE Transactions on Neural Networks, 12(4), July.
[5] Carl Gold. FX Trading via Recurrent Reinforcement Learning. Proceedings of IEEE CIFEr Conference, Hong Kong.
[6] John Moody, Y. Liu, M. Saffell and K.J. Youn. Stochastic Direct Reinforcement: Application to Simple Games with Recurrence. In Artificial Multiagent Learning, Sean Luke et al. eds, AAAI Press, 2004.

23
Supplemental Slides
– Differential Sharpe Ratio
– Portfolio Optimization
– Stochastic Direct Reinforcement (SDR)

24
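Maximizing the Sharpe Ratio
Sharpe Ratio: $S_T = \mathrm{Average}(R_t) / \mathrm{StdDev}(R_t)$
Exponential Moving Average Sharpe Ratio:
$$S_t = \frac{A_t}{K_\eta\,(B_t - A_t^2)^{1/2}}$$
with time scale $\eta^{-1}$ and $K_\eta = \left(\frac{1 - \eta/2}{1 - \eta}\right)^{1/2}$, where $A_t = A_{t-1} + \eta\,(R_t - A_{t-1})$ and $B_t = B_{t-1} + \eta\,(R_t^2 - B_{t-1})$ are exponential moving estimates of the first and second moments of $R_t$.
Motivation: the EMA Sharpe ratio emphasizes recent patterns and can be updated incrementally.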
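The incremental update above needs only two running moments per trader. Here is a minimal sketch, assuming the moments start at zero unless initial values are supplied. The class name `EMASharpe` is hypothetical, and NaN is returned while the variance estimate is still zero.

```python
import math


class EMASharpe:
    """EMA Sharpe ratio S_t = A_t / (K_eta * sqrt(B_t - A_t^2)), updated one return at a time."""

    def __init__(self, eta: float, a0: float = 0.0, b0: float = 0.0):
        if not 0.0 < eta < 1.0:
            raise ValueError("eta must lie in (0, 1)")
        self.eta = eta
        self.a = a0  # A_t: moving estimate of the mean of R_t
        self.b = b0  # B_t: moving estimate of the mean of R_t^2
        self.k = math.sqrt((1.0 - eta / 2.0) / (1.0 - eta))  # normalization K_eta

    def update(self, r_t: float) -> float:
        """Incorporate one return and return S_t (NaN while the variance estimate is zero)."""
        self.a += self.eta * (r_t - self.a)
        self.b += self.eta * (r_t * r_t - self.b)
        variance = self.b - self.a * self.a
        if variance <= 0.0:
            return math.nan
        return self.a / (self.k * math.sqrt(variance))
```

Each call costs O(1), so the trader can track its own risk-adjusted performance on-line. With `eta = 0.01`, the effective averaging window is about 100 periods.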

25
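Differential Sharpe Ratio for Adaptive Optimization
Expand to first order in $\eta$:
$$S_t \approx S_{t-1} + \eta \left.\frac{dS_t}{d\eta}\right|_{\eta=0} + O(\eta^2)$$
Define the Differential Sharpe Ratio as:
$$D_t \equiv \frac{dS_t}{d\eta} = \frac{B_{t-1}\,\Delta A_t - \frac{1}{2}A_{t-1}\,\Delta B_t}{\left(B_{t-1} - A_{t-1}^2\right)^{3/2}}$$
where $\Delta A_t = R_t - A_{t-1}$ and $\Delta B_t = R_t^2 - B_{t-1}$.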
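The formula for $D_t$ can be coded directly from the previous moments $A_{t-1}$, $B_{t-1}$ and the new return $R_t$. This is a sketch only; the function name is hypothetical. As a consistency check, for small $\eta$ the value $\eta D_t$ matches the one-step change in $A/\sqrt{B - A^2}$ after updating the moments.

```python
def differential_sharpe(r_t: float, a_prev: float, b_prev: float) -> float:
    """D_t = (B_{t-1} dA_t - 0.5 A_{t-1} dB_t) / (B_{t-1} - A_{t-1}^2)^(3/2)."""
    delta_a = r_t - a_prev            # Delta A_t
    delta_b = r_t * r_t - b_prev      # Delta B_t
    variance = b_prev - a_prev * a_prev
    if variance <= 0.0:
        raise ValueError("need B_{t-1} > A_{t-1}^2")
    return (b_prev * delta_a - 0.5 * a_prev * delta_b) / variance ** 1.5
```

Take $A_{t-1} = 0.01$ and $B_{t-1} = 0.0005$ (a standard deviation of 0.02). A return of 0.03 gives $D_t = 1.0$. A return equal to the running mean, 0.01, still gives a positive $D_t = 0.25$, because it lowers the variance estimate.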

26
Learning with the Differential SR
Evaluate the “Marginal Utility” Gradient:
$$\frac{dD_t}{dR_t} = \frac{B_{t-1} - A_{t-1} R_t}{\left(B_{t-1} - A_{t-1}^2\right)^{3/2}}$$
Motivation for DSR:
– isolates the contribution of $R_t$ to $S_t$ (“marginal utility” $D_t$);
– provides interpretability;
– adapts to changing market conditions;
– facilitates efficient on-line learning (stochastic optimization).
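In on-line learning, each step evaluates $D_t$ and its gradient from the previous moments, then updates $A_t$ and $B_t$. This is a sketch assuming exactly that ordering; the function name `dsr_step` is hypothetical. Within RRL, the returned gradient would take the place of $dU_t/dR_t$.

```python
from typing import Tuple


def dsr_step(r_t: float, a_prev: float, b_prev: float,
             eta: float) -> Tuple[float, float, float, float]:
    """Return (D_t, dD_t/dR_t, A_t, B_t) for one on-line step."""
    variance = b_prev - a_prev * a_prev
    if variance <= 0.0:
        raise ValueError("need B_{t-1} > A_{t-1}^2")
    scale = variance ** 1.5
    d_t = (b_prev * (r_t - a_prev) - 0.5 * a_prev * (r_t * r_t - b_prev)) / scale
    grad = (b_prev - a_prev * r_t) / scale  # "marginal utility" gradient dD_t/dR_t
    # Moments are updated only after D_t is evaluated, so D_t uses A_{t-1}, B_{t-1}
    a_t = a_prev + eta * (r_t - a_prev)
    b_t = b_prev + eta * (r_t * r_t - b_prev)
    return d_t, grad, a_t, b_t
```

The gradient $(B_{t-1} - A_{t-1}R_t)/(\cdot)^{3/2}$ falls as $R_t$ grows. Once $R_t$ exceeds $B_{t-1}/A_{t-1}$ the gradient turns negative, so very large gains (which inflate the variance estimate) are not rewarded without limit.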

27
Trader Simulation: Transaction Costs vs. Performance
100 runs; costs = 0.2%, 0.5%, and 1.0%
[Figure panels: Trading Frequency; Cumulative Profit; Sharpe Ratio]

28
Portfolio Optimization (3 Securities)

29
Stochastic Direct Reinforcement: Probabilistic Policies

30
Learning to Trade
Single Asset:
– Price series
– Return series
Trader:
– Discrete position size
– Recurrent policy
Observations:
– Full system state is not known
Simple Trading Returns and Profit, with transaction cost rate.

31
Why does Reinforcement need Recurrence?
Consider a learning agent with a stochastic policy function whose inputs include recent observations o and actions a. Why should past actions (recurrence) be included?
Examples:
– Games (observations o are the opponent’s actions)
– Trading financial markets
In General:
– Model the opponent’s responses o to previous actions a
– Minimize transaction costs and market impact
Recurrence enables discovery of better policies that capture an agent’s impact on the world!

32
Stochastic Direct Reinforcement (SDR): Maximize Performance
– Expected total performance of a sequence of T actions
– Maximize performance via direct gradient ascent
– Must evaluate the total policy gradient for a parameterized policy

33
Stochastic Direct Reinforcement (SDR): Maximize Performance
The goal of SDR is to maximize the expected total performance of a sequence of T actions via direct gradient ascent. This requires evaluating the total policy gradient for a parameterized policy.
Notation: the complete history contains all past observations and actions; a partial history of length (n, m) contains the most recent n observations and m actions.

34
Stochastic Direct Reinforcement: First-Order Recurrent Policy Gradient
For first-order recurrence (m = 1), the policy gives the conditional probability of the current action given the previous action (and the current observations).
The probabilities of current actions depend on the probabilities of prior actions: the probability of the current action is obtained by weighting the conditional policy by the probability of each possible prior action.
The total (recurrent) policy gradient is therefore the partial (naïve) policy gradient of the conditional policy plus a term propagated through the prior-action probabilities.
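Here is a minimal sketch of the first-order recursion just described. It rests on several assumptions:
– two actions $a \in \{-1, +1\}$
– a hypothetical logistic policy $P(a_t = +1 \mid a_{t-1}, o_t) = \sigma(\theta \cdot [1, o_t, a_{t-1}])$
– observations that are not influenced by the agent's own actions

It tracks $p_t = P(a_t = +1)$ and $dp_t/d\theta$ via $p_t = \sum_{a_{t-1}} P(+1 \mid a_{t-1}, o_t)\,P(a_{t-1})$. With `recurrent=False`, only the partial (naïve) term is kept. This covers the action-probability gradient only, not the full SDR performance gradient.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def action_prob_gradients(theta: Sequence[float], observations: Sequence[float],
                          p0: float = 0.5, recurrent: bool = True
                          ) -> Tuple[List[float], List[List[float]]]:
    """Return p_t = P(a_t = +1) and dp_t/dtheta for each step (hypothetical logistic policy)."""
    p_prev, dp_prev = p0, [0.0] * 3
    probs, grads = [], []
    for o_t in observations:
        p_t, dp_t = 0.0, [0.0] * 3
        # Sum over the prior action: (a_{t-1}, P(a_{t-1}), dP(a_{t-1})/dtheta)
        branches = ((1.0, p_prev, dp_prev), (-1.0, 1.0 - p_prev, [-g for g in dp_prev]))
        for a_prev, weight, dweight in branches:
            x = (1.0, o_t, a_prev)
            pi = _sigmoid(sum(th * xi for th, xi in zip(theta, x)))
            p_t += pi * weight
            for k in range(3):
                dp_t[k] += pi * (1.0 - pi) * x[k] * weight  # partial (naive) term
                if recurrent:
                    dp_t[k] += pi * dweight[k]              # term through P(a_{t-1})
        probs.append(p_t)
        grads.append(dp_t)
        p_prev, dp_prev = p_t, dp_t
    return probs, grads
```

The total gradient agrees with a finite difference of $p_T$ with respect to $\theta$. The naïve gradient differs from it whenever the previous action actually changes the current action probability.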

35
SDR Trader Simulation with Transaction Costs

36
Trading Frequency vs. Transaction Costs
[Figure panels: Recurrent SDR; Non-Recurrent]

37
Sharpe Ratio vs. Transaction Costs
[Figure panels: Recurrent SDR; Non-Recurrent]
