1
**Learning to Trade via Direct Reinforcement**

John Moody, International Computer Science Institute, Berkeley & J E Moody & Company LLC, Portland

Global Derivatives Trading & Risk Management, Paris, May 2008

2
**What is Reinforcement Learning?**

- RL considers a goal-directed “learning” agent interacting with an uncertain environment. The agent attempts to maximize reward / utility.
- RL is an active paradigm:
  - The agent “learns” by “trial & error” discovery.
  - Actions result in reinforcement.
- RL paradigms:
  - Value Function Learning (Dynamic Programming)
  - Direct Reinforcement (Adaptive Control)

3
**I. Why Direct Reinforcement?**

Direct Reinforcement Learning:

- Finds predictive structure in financial data
- Integrates forecasting w/ decision making
- Balances risk vs. reward
- Incorporates transaction costs
- Discovers trading strategies!

4
**Optimizing Trades based on Forecasts**

Indirect approach:

- Two sets of parameters
- Forecast error is not utility
- The forecaster ignores transaction costs
- Information bottleneck: trading decisions see only the forecast, not the full input information

5
**Learning to Trade via Direct Reinforcement**

Trader properties:

- One set of parameters
- A single utility function U that includes transaction costs
- Direct mapping from inputs to actions

6
**Direct RL Trader (USD/GBP): Annualized Return = 15%, Sharpe Ratio = 2.3, Downside Deviation Ratio = 3.3**

7
**II. Direct Reinforcement: Algorithms & Illustrations**

- Recurrent Reinforcement Learning (RRL)
- Stochastic Direct Reinforcement (SDR)
- Illustrations:
  - Sensitivity to Transaction Costs
  - Risk-Averse Reinforcement

8
**Learning to Trade via Direct Reinforcement**

- DR Trader:
  - Recurrent policy (trading signals, portfolio weights)
  - Takes action, receives reward (trading return w/ transaction costs)
  - Causal performance function (generally path-dependent)
  - Learns the policy by varying its parameters
- GOAL: Maximize performance or marginal performance
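
A minimal sketch of a recurrent trading policy of the kind listed above, assuming a single-layer `tanh` trader whose inputs are the most recent returns plus its own previous position. The function `recurrent_positions` and its parameters `w`, `u`, `b` are hypothetical illustrations, not the author's code.

```python
import math
from typing import List, Sequence


def recurrent_positions(returns: Sequence[float], w: Sequence[float],
                        u: float, b: float) -> List[float]:
    """Positions F_t = tanh(w . x_t + u * F_{t-1} + b), each in [-1, 1].

    x_t holds the len(w) most recent returns (r_t, r_{t-1}, ...), zero-padded
    at the start of the series. The initial previous position is taken as 0.
    """
    m = len(w)
    positions: List[float] = []
    prev = 0.0  # F_{t-1}: the recurrent input that feeds the last action back in
    for t in range(len(returns)):
        # Window of the m most recent returns, newest first.
        x = [returns[t - k] if t - k >= 0 else 0.0 for k in range(m)]
        z = sum(wi * xi for wi, xi in zip(w, x)) + u * prev + b
        prev = math.tanh(z)
        positions.append(prev)
    return positions
```

Taking the sign of each output gives a discrete long/short signal; the continuous output is what lets the parameters be adjusted by gradient methods.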

9
**Recurrent Reinforcement Learning (RRL) (Moody & Wu 1997)**

- Deterministic gradient (batch), computed with a recursion through the trader's previous output
- Stochastic gradient (on-line), computed with a stochastic recursion
- Stochastic parameter update (on-line)
- Constant learning rate: adaptive learning; declining learning rate: stochastic approximation
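
For reference, the RRL gradient in the form published by Moody & Saffell [4] reads as follows, with trader output F_t, trading return R_t, performance U, parameters θ and learning rate ρ (symbols follow that paper; this is a reconstruction from [4], not a copy of the slide). The first line is the batch gradient with its recursion, the second the on-line approximations, the third the on-line update.

```latex
\begin{aligned}
\frac{dU_T(\theta)}{d\theta} &= \sum_{t=1}^{T} \frac{dU_T}{dR_t}
  \left\{ \frac{dR_t}{dF_t}\frac{dF_t}{d\theta}
        + \frac{dR_t}{dF_{t-1}}\frac{dF_{t-1}}{d\theta} \right\},
\qquad
\frac{dF_t}{d\theta} = \frac{\partial F_t}{\partial \theta}
  + \frac{\partial F_t}{\partial F_{t-1}}\frac{dF_{t-1}}{d\theta} \\[4pt]
\frac{dU_t(\theta_t)}{d\theta_t} &\approx \frac{dU_t}{dR_t}
  \left\{ \frac{dR_t}{dF_t}\frac{dF_t}{d\theta_t}
        + \frac{dR_t}{dF_{t-1}}\frac{dF_{t-1}}{d\theta_{t-1}} \right\},
\qquad
\frac{dF_t}{d\theta_t} \approx \frac{\partial F_t}{\partial \theta_t}
  + \frac{\partial F_t}{\partial F_{t-1}}\frac{dF_{t-1}}{d\theta_{t-1}} \\[4pt]
\Delta\theta_t &= \rho \, \frac{dU_t(\theta_t)}{d\theta_t}
\end{aligned}
```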

10
**Structure of Traders**

- Single asset:
  - Price series
  - Return series
- Traders:
  - Discrete position size
  - Recurrent policy
- Observations: the full system state is not known
- Simple trading returns and profit
- Transaction costs, represented by a cost rate
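
A sketch of the return and profit bookkeeping described above, assuming the additive form used in Moody & Saffell [4] with the risk-free rate omitted: R_t = μ(F_{t-1} r_t − δ|F_t − F_{t-1}|), with price returns r_t = p_t − p_{t-1}, position size μ and transaction cost rate δ; profit is P_T = Σ R_t. The function names are hypothetical.

```python
from typing import List, Sequence


def trading_returns(prices: Sequence[float], positions: Sequence[float],
                    mu: float = 1.0, delta: float = 0.0) -> List[float]:
    """Additive per-period trading returns with proportional transaction costs.

    R_t = mu * (F_{t-1} * r_t - delta * |F_t - F_{t-1}|), r_t = p_t - p_{t-1},
    for t = 1..T. positions[t] is the position F_t chosen at time t.
    """
    if len(prices) != len(positions):
        raise ValueError("prices and positions must have equal length")
    out: List[float] = []
    for t in range(1, len(prices)):
        r_t = prices[t] - prices[t - 1]
        held = positions[t - 1]                              # position held over period t
        cost = delta * abs(positions[t] - positions[t - 1])  # paid only when trading
        out.append(mu * (held * r_t - cost))
    return out


def profit(returns: Sequence[float]) -> float:
    """Cumulative additive profit P_T = sum of R_t."""
    return sum(returns)
```

Because the cost term depends on the change F_t − F_{t-1}, the return at time t depends on the previous action, which is why the trader's policy is recurrent.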

11
**Risk-Averse Reinforcement: Financial Performance Measures**

- Performance functions:
  - Path independent: standard utility functions
  - Path dependent
  - Performance ratios: Sharpe Ratio, Downside Deviation Ratio
- For learning:
  - Per-period returns
  - Marginal performance, e.g. the Differential Sharpe Ratio
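
The two performance ratios named above can be sketched as follows, assuming per-period returns, no risk-free rate, and the definitions in [4]: Sharpe ratio = mean(R) / std(R), and Downside Deviation Ratio = mean(R) / DD, where DD = sqrt((1/T) Σ min(R_t, 0)²) penalizes only losses. Function names are illustrative.

```python
import math
from typing import Sequence


def sharpe_ratio(returns: Sequence[float]) -> float:
    """Per-period Sharpe ratio: mean / population standard deviation."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / n
    if var == 0:
        raise ValueError("Sharpe ratio undefined for zero-variance returns")
    return mean / math.sqrt(var)


def downside_deviation_ratio(returns: Sequence[float]) -> float:
    """Mean return divided by the downside deviation (losses only)."""
    n = len(returns)
    mean = sum(returns) / n
    dd = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / n)
    if dd == 0:
        raise ValueError("DDR undefined when there are no negative returns")
    return mean / dd
```

Both are path dependent: they cannot be written as a sum of independent per-period utilities, which is what motivates the marginal (differential) versions used for on-line learning.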

12
**Long/Short Trader Simulation: Sensitivity to Transaction Costs**

- Learns from scratch and on-line
- Moving-average Sharpe ratio with adaptation rate 0.01

13
**Trader Simulation: Sharpe Ratio and Trading Frequency**

Figure: Transaction costs vs. performance (100 runs; costs = 0.2%, 0.5%, and 1.0%). Panels: Sharpe Ratio, Trading Frequency.

14
**Minimizing Downside Risk: Artificial Price Series w/ Heavy Tails**

15
**Comparison of Risk-Averse Traders: Underwater Curves**

16
**Comparison of Risk-Averse Traders: Draw-Downs**

17
**III. Direct Reinforcement vs. Dynamic Programming**

- Algorithms:
  - Value Function Method (Q-Learning)
  - Direct Reinforcement Learning (RRL)
- Illustration: Asset Allocation, S&P 500 & T-Bills
  - RRL vs. Q-Learning

18
**RL Paradigms Compared**

| | Value Function Learning | Direct Reinforcement |
|---|---|---|
| Origins | Dynamic Programming | Adaptive Control |
| Learns | “Optimal” Q-Function; Q: state × action → value | “Good” Policy P; P: observations → p(action) |
| Method | Solve Bellman’s Equation | Optimize “Policy Gradient” |
| Action | “Indirect” | “Direct” |
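
To make the “indirect” column concrete, here is a sketch of one standard tabular Q-learning step, Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)], followed by greedy action selection. This is textbook Q-learning with hypothetical function names and defaults, not the implementation used in the S&P 500 comparison.

```python
from typing import Dict, Hashable, Sequence, Tuple

QTable = Dict[Tuple[Hashable, Hashable], float]


def q_learning_update(q: QTable, state: Hashable, action: Hashable, reward: float,
                      next_state: Hashable, actions: Sequence[Hashable],
                      alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One tabular step toward the Bellman target; unseen entries default to 0."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def greedy_action(q: QTable, state: Hashable, actions: Sequence[Hashable]) -> Hashable:
    """The 'indirect' action choice: argmax over learned state-action values."""
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

The policy here is never represented explicitly; it is read off the value table. A direct reinforcement trader instead parameterizes the mapping from observations to actions and adjusts those parameters along the performance gradient.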

19
**S&P-500 / T-Bill Asset Allocation: Maximizing the Differential Sharpe Ratio**

20
**S&P-500: Opening Up the Black Box**

85 series: learned relationships are nonstationary over time.

21
**Closing Remarks**

- Direct Reinforcement Learning:
  - Discovers trading opportunities in markets
  - Integrates forecasting w/ trading
  - Maximizes risk-adjusted returns
  - Optimizes trading w/ transaction costs
- Direct Reinforcement offers advantages over:
  - Trading based on forecasts (supervised learning)
  - Dynamic programming RL (value function methods)
- Illustrations:
  - Controlled simulations
  - FX currency trader
  - Asset allocation: S&P 500 vs. cash

22
**Selected References**

- [1] John Moody and Lizhong Wu. Optimization of trading systems and portfolios. Decision Technologies for Financial Engineering, 1997.
- [2] John Moody, Lizhong Wu, Yuansong Liao, and Matthew Saffell. Performance functions and reinforcement learning for trading systems and portfolios. Journal of Forecasting, 17, 1998.
- [3] Jonathan Baxter and Peter L. Bartlett. Direct gradient-based reinforcement learning: Gradient estimation algorithms.
- [4] John Moody and Matthew Saffell. Learning to trade via direct reinforcement. IEEE Transactions on Neural Networks, 12(4), July 2001.
- [5] Carl Gold. FX Trading via Recurrent Reinforcement Learning. Proceedings of IEEE CIFEr Conference, Hong Kong, 2003.
- [6] John Moody, Y. Liu, M. Saffell, and K.J. Youn. Stochastic Direct Reinforcement: Application to Simple Games with Recurrence. In Artificial Multiagent Learning, Sean Luke et al., eds., AAAI Press, 2004.

23
**Supplemental Slides**

- Differential Sharpe Ratio
- Portfolio Optimization
- Stochastic Direct Reinforcement (SDR)

24
**Maximizing the Sharpe Ratio**

- Exponential Moving Average (EMA) Sharpe Ratio, computed from moving averages of returns with a fixed time scale
- Motivation: the EMA Sharpe ratio emphasizes recent patterns and can be updated incrementally.
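
A sketch of the incremental update, assuming the moment form used in [4]: A_t = A_{t-1} + η(R_t − A_{t-1}) and B_t = B_{t-1} + η(R_t² − B_{t-1}) are exponential moving estimates of the first and second moments of returns, and S_t = A_t / sqrt(B_t − A_t²). Reference [4] also scales by a normalization factor, which this sketch leaves out; the class name `EMASharpe` is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class EMASharpe:
    """EMA Sharpe ratio updated one return at a time.

    A and B are EMAs of R_t and R_t**2 with adaptation rate eta
    (time scale roughly 1/eta). No normalization factor is applied.
    """
    eta: float = 0.01
    A: float = 0.0
    B: float = 0.0

    def update(self, r: float) -> float:
        self.A += self.eta * (r - self.A)
        self.B += self.eta * (r * r - self.B)
        var = self.B - self.A ** 2
        # Undefined until the EMAs register some variance; report 0 instead.
        return self.A / math.sqrt(var) if var > 0 else 0.0
```

Each update costs O(1) time and memory, which is what makes the measure usable inside an on-line learning loop.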

25
**Differential Sharpe Ratio for Adaptive Optimization**

- Expand the EMA Sharpe ratio to first order in its adaptation rate.
- Define the Differential Sharpe Ratio as the first-order term of this expansion.
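
In the form given in [4], the expansion reads S_t ≈ S_{t-1} + η D_t + O(η²), with D_t = (B_{t-1} ΔA_t − ½ A_{t-1} ΔB_t) / (B_{t-1} − A_{t-1}²)^{3/2}, where ΔA_t = R_t − A_{t-1} and ΔB_t = R_t² − B_{t-1}. The sketch below computes D_t and, as a sanity check, the one-step EMA Sharpe ratio (without normalization factor) whose derivative in η at η = 0 it should equal. Function names are illustrative.

```python
import math


def differential_sharpe(a_prev: float, b_prev: float, r: float) -> float:
    """D_t = (B_{t-1} dA_t - 0.5 A_{t-1} dB_t) / (B_{t-1} - A_{t-1}**2)**1.5."""
    var = b_prev - a_prev ** 2
    if var <= 0:
        raise ValueError("requires B_{t-1} > A_{t-1}**2")
    d_a = r - a_prev          # Delta A_t = R_t - A_{t-1}
    d_b = r * r - b_prev      # Delta B_t = R_t**2 - B_{t-1}
    return (b_prev * d_a - 0.5 * a_prev * d_b) / var ** 1.5


def ema_sharpe_after(a_prev: float, b_prev: float, r: float, eta: float) -> float:
    """S_t = A_t / sqrt(B_t - A_t**2) after one EMA step of size eta."""
    a = a_prev + eta * (r - a_prev)
    b = b_prev + eta * (r * r - b_prev)
    return a / math.sqrt(b - a * a)
```

For example, with A_{t-1} = 0.1, B_{t-1} = 0.05 and R_t = 0.3, D_t = (0.05·0.2 − 0.5·0.1·0.04) / 0.04^{1.5} = 0.008 / 0.008 = 1.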

26
**Learning with the Differential SR**

- Evaluate the “marginal utility” gradient
- Motivation for DSR:
  - isolates the contribution of the current return to the Sharpe ratio (“marginal utility”);
  - provides interpretability;
  - adapts to changing market conditions;
  - facilitates efficient on-line learning (stochastic optimization).
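
A compact sketch of on-line RRL driven by the differential Sharpe ratio. Differentiating D_t with respect to R_t gives dD_t/dR_t = (B_{t-1} − A_{t-1} R_t) / (B_{t-1} − A_{t-1}²)^{3/2}, which plays the role of dU_t/dR_t in the RRL update. Assumptions, all illustrative rather than the settings behind the reported experiments: a single-layer `tanh` trader whose continuous output is used as the position (a relaxation of discrete positions so that the gradient exists), unit position size, no risk-free rate, and a `warmup` period in which only the moving averages are updated. The function name and all defaults are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def rrl_dsr_trader(returns: Sequence[float], m: int = 3, delta: float = 0.001,
                   rho: float = 0.01, eta: float = 0.01, warmup: int = 50,
                   seed: int = 0) -> Tuple[List[float], List[float]]:
    """On-line RRL trader ascending the differential Sharpe ratio (sketch).

    Policy:  F_t = tanh(theta . x_t), x_t = [1, r_t, ..., r_{t-m+1}, F_{t-1}]
    Return:  R_t = F_{t-1} * r_t - delta * |F_t - F_{t-1}|
    Returns the positions F_t and trading returns R_t.
    """
    rng = random.Random(seed)
    n = m + 2                                   # bias + m lagged returns + F_{t-1}
    theta = [rng.uniform(-0.1, 0.1) for _ in range(n)]
    dF_prev = [0.0] * n                         # dF_{t-1}/dtheta, carried forward
    F_prev, A, B = 0.0, 0.0, 0.0                # previous position, EMAs of R and R^2
    positions: List[float] = []
    rewards: List[float] = []
    for t, r in enumerate(returns):
        lags = [returns[t - k] if t >= k else 0.0 for k in range(m)]
        x = [1.0] + lags + [F_prev]
        F = math.tanh(sum(w * xi for w, xi in zip(theta, x)))
        # Recurrence: dF_t/dtheta = (1 - F_t^2) * (x_t + theta_F * dF_{t-1}/dtheta)
        g = 1.0 - F * F
        dF = [g * (xi + theta[-1] * dp) for xi, dp in zip(x, dF_prev)]
        R = F_prev * r - delta * abs(F - F_prev)
        s = 0.0 if F == F_prev else math.copysign(1.0, F - F_prev)
        dR_dF = -delta * s                      # dR_t/dF_t
        dR_dFprev = r + delta * s               # dR_t/dF_{t-1}
        var = B - A * A
        if t >= warmup and var > 1e-12:
            dD_dR = (B - A * R) / var ** 1.5    # marginal utility of R_t under the DSR
            for i in range(n):
                theta[i] += rho * dD_dR * (dR_dF * dF[i] + dR_dFprev * dF_prev[i])
        A += eta * (R - A)                      # EMAs updated after using A_{t-1}, B_{t-1}
        B += eta * (R * R - B)
        positions.append(F)
        rewards.append(R)
        F_prev, dF_prev = F, dF
    return positions, rewards
```

The update uses only quantities available at time t, so the trader learns from scratch as returns arrive; a larger `delta` pushes the learned policy toward fewer position changes.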

27
**Trader Simulation: Transaction Costs vs. Performance**

Figure: 100 runs; costs = 0.2%, 0.5%, and 1.0%. Panels: Trading Frequency, Cumulative Profit, Sharpe Ratio.

28
**Portfolio Optimization (3 Securities)**

29
**Stochastic Direct Reinforcement: Probabilistic Policies**

30
**Learning to Trade**

- Single asset:
  - Price series
  - Return series
- Trader:
  - Discrete position size
  - Recurrent policy
- Observations: the full system state is not known
- Simple trading returns and profit
- Transaction cost rate

31
**Why does Reinforcement need Recurrence?**

- Consider a learning agent with a stochastic policy function whose inputs include recent observations o and actions a.
- Why should past actions (recurrence) be included?
- Examples:
  - Games (observations o are the opponent’s actions)
  - Trading financial markets
- In general:
  - Model the opponent’s responses o to previous actions a
  - Minimize transaction costs and market impact
- Recurrence enables discovery of better policies that capture an agent’s impact on the world!
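
A sketch of such a policy, assuming a softmax over three actions (short, flat, long) whose inputs are a bias, the current observation, and a one-hot code of the agent’s previous action. The feature layout and all names (`features`, `action_probs`, `sample_episode`) are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

ACTIONS = (-1, 0, 1)  # short, flat, long


def features(obs: float, prev_action: int) -> List[float]:
    """Bias, current observation, and a one-hot code of the previous action."""
    return [1.0, obs] + [1.0 if prev_action == a else 0.0 for a in ACTIONS]


def action_probs(theta: Dict[int, Sequence[float]], obs: float,
                 prev_action: int) -> Dict[int, float]:
    """Softmax policy P(a_t | o_t, a_{t-1}; theta); theta[a] has 5 weights."""
    x = features(obs, prev_action)
    scores = {a: sum(w * xi for w, xi in zip(theta[a], x)) for a in ACTIONS}
    top = max(scores.values())                  # subtract max for numerical stability
    exps = {a: math.exp(v - top) for a, v in scores.items()}
    z = sum(exps.values())
    return {a: e / z for a, e in exps.items()}


def sample_episode(theta: Dict[int, Sequence[float]], observations: Sequence[float],
                   seed: int = 0) -> List[int]:
    """Sample a_1..a_T; each sampled action becomes the next recurrent input."""
    rng = random.Random(seed)
    prev, actions = 0, []
    for o in observations:
        p = action_probs(theta, o, prev)
        prev = rng.choices(ACTIONS, weights=[p[a] for a in ACTIONS])[0]
        actions.append(prev)
    return actions
```

A large weight on “repeat the previous action” makes the agent hold positions, which is one way a recurrent policy can learn to limit transaction costs; a policy without the one-hot input cannot express that preference.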

32
**Stochastic Direct Reinforcement (SDR): Maximize Performance**

- Expected total performance of a sequence of T actions
- Maximize performance via direct gradient ascent
- Must evaluate the total policy gradient for a parameterized policy

33
**Stochastic Direct Reinforcement (SDR): Maximize Performance**

- The goal of SDR is to maximize the expected total performance of a sequence of T actions via direct gradient ascent.
- This requires evaluating the total policy gradient for a parameterized policy.
- Notation: the complete history is distinguished from a partial history of length (n,m).
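
One generic way to write this objective, assuming only that each per-period performance U_t depends on the history H_t of observations and actions up to time t, and that the parameters θ enter only through the action probabilities (the symbols H_t and ρ are assumptions here, not necessarily the slide’s notation):

```latex
\begin{aligned}
U_T(\theta) &= \sum_{t=1}^{T} \mathbb{E}_\theta\!\left[U_t\right]
             = \sum_{t=1}^{T} \sum_{H_t} P(H_t;\theta)\,U_t(H_t), \\
\frac{dU_T}{d\theta} &= \sum_{t=1}^{T} \sum_{H_t} U_t(H_t)\,\frac{dP(H_t;\theta)}{d\theta},
\qquad
\theta \leftarrow \theta + \rho\,\frac{dU_T}{d\theta}.
\end{aligned}
```

Because P(H_t; θ) multiplies action probabilities from all earlier periods, its derivative includes how earlier actions depend on θ; this is the “total” policy gradient that a recurrent policy requires.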

34
**Stochastic Direct Reinforcement: First Order Recurrent Policy Gradient**

- For first-order recurrence (m=1), the conditional action probability is given by the policy.
- The probabilities of current actions depend upon the probabilities of prior actions.
- The total (recurrent) policy gradient is computed recursively, combining the partial (naïve) policy gradient with the gradients of the prior action probabilities.
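
A sketch of the first-order recursion, assuming observations are exogenous and a hypothetical two-parameter softmax policy over {−1, 0, +1} (a trend weight on a·o_t and a “stickiness” weight on repeating a_{t-1}). It propagates P(a_t) = Σ_{a'} P(a_t | a', o_t; θ) P(a_{t-1} = a') and its total derivative dP(a_t)/dθ = Σ_{a'} [∂P(a_t | a')/∂θ · P(a') + P(a_t | a') · dP(a')/dθ]; the naïve gradient keeps only the first term. All names are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

ACTIONS = (-1, 0, 1)  # short, flat, long
Probs = Dict[int, float]
Grads = Dict[int, List[float]]


def _feat(a: int, prev: int, obs: float) -> Tuple[float, float]:
    # Hypothetical features = d(logit_a)/d(theta): (a * obs, [a == prev]).
    return (a * obs, 1.0 if a == prev else 0.0)


def cond_probs(theta: Sequence[float], prev: int, obs: float) -> Tuple[Probs, Grads]:
    """Softmax P(a | prev, obs; theta) and its partial gradient (len(theta) == 2)."""
    logits = {a: sum(t * f for t, f in zip(theta, _feat(a, prev, obs))) for a in ACTIONS}
    top = max(logits.values())
    e = {a: math.exp(v - top) for a, v in logits.items()}
    z = sum(e.values())
    p = {a: e[a] / z for a in ACTIONS}
    k = len(theta)
    mean_f = [sum(p[b] * _feat(b, prev, obs)[i] for b in ACTIONS) for i in range(k)]
    dp = {a: [p[a] * (_feat(a, prev, obs)[i] - mean_f[i]) for i in range(k)] for a in ACTIONS}
    return p, dp


def marginals_and_gradients(theta: Sequence[float], observations: Sequence[float],
                            a0: int = 0, recurrent: bool = True) -> List[Tuple[Probs, Grads]]:
    """Per step, (P(a_t), dP(a_t)/dtheta); recurrent=False gives the naive gradient."""
    k = len(theta)
    P: Probs = {a: 1.0 if a == a0 else 0.0 for a in ACTIONS}  # known initial action
    dP: Grads = {a: [0.0] * k for a in ACTIONS}
    history = []
    for obs in observations:
        newP: Probs = {a: 0.0 for a in ACTIONS}
        newdP: Grads = {a: [0.0] * k for a in ACTIONS}
        for prev in ACTIONS:
            p, dp = cond_probs(theta, prev, obs)
            for a in ACTIONS:
                newP[a] += p[a] * P[prev]
                for i in range(k):
                    newdP[a][i] += dp[a][i] * P[prev]      # partial (naive) term
                    if recurrent:
                        newdP[a][i] += p[a] * dP[prev][i]  # term through prior actions
        P, dP = newP, newdP
        history.append((P, dP))
    return history
```

A finite-difference check on θ agrees with the recurrent gradient, while the naïve gradient differs from it as soon as the action probabilities depend on the previous action.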

35
**SDR Trader Simulation w/ Transaction Costs**

36
**Trading Frequency vs. Transaction Costs**

Figure panels: Recurrent SDR vs. Non-Recurrent.

37
**Sharpe Ratio vs. Transaction Costs**

Figure panels: Recurrent SDR vs. Non-Recurrent.