Approximate Dynamic Programming & Adaptive Traffic Control
Chen Cai, Benjamin Heydecker
Presentation for the 4th CREST Open Workshop, Operation Research for Software Engineering Methods, London, 2010
Contents
1. Dynamic Programming
2. Curse of Dimensionality
3. Approximate Dynamic Programming
4. Adaptive Traffic Signal Control
1. Dynamic Programming
What does it do?
–Sequential decision-making for discrete systems
–Iterative computing rather than enumeration
–Global optimality
[Figure: timeline t_0, t_1, t_2, t_3, …, t_{m-2}, t_{m-1}, t_m, divided into Stage 0 through Stage m-1]
How does it do it?
–DP decomposes a complex problem into a group of sub-problems called stages; by recursively finding the optimal solution at each stage, it arrives at the globally optimal solution.
–It can be interpreted mathematically as a recursion over stages, computed for all i_t at stage t.
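The recursion over stages can be sketched in code. The slide's own equation is not available here, so the sketch below assumes the standard textbook form J_t(i_t) = min over u_t of E_w{ g(i_t, u_t, w_t) + J_{t+1}(i_{t+1}) } with J_m = 0. The toy single-queue model (capacity `CAP`, the `step` and `cost` functions, the arrival probabilities) is entirely hypothetical and is only there to make the recursion runnable.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def backward_induction(
    states: Sequence[int],
    decisions: Sequence[int],
    outcomes: Sequence[Tuple[int, float]],   # (random outcome w, probability)
    step: Callable[[int, int, int], int],    # next state from (i, u, w)
    cost: Callable[[int, int, int], float],  # one-stage cost g(i, u, w)
    horizon: int,
) -> Tuple[List[Dict[int, float]], List[Dict[int, int]]]:
    """Finite-horizon DP: compute J_t(i) for every state i, from the last stage back."""
    J: List[Dict[int, float]] = [dict() for _ in range(horizon + 1)]
    policy: List[Dict[int, int]] = [dict() for _ in range(horizon)]
    for i in states:
        J[horizon][i] = 0.0  # assumed terminal cost of zero
    for t in range(horizon - 1, -1, -1):
        for i in states:
            best_u, best_val = decisions[0], float("inf")
            for u in decisions:
                # Expected one-stage cost plus cost-to-go of the next state.
                val = sum(
                    p * (cost(i, u, w) + J[t + 1][step(i, u, w)])
                    for w, p in outcomes
                )
                if val < best_val:
                    best_u, best_val = u, val
            J[t][i] = best_val
            policy[t][i] = best_u
    return J, policy


# Hypothetical toy model: one queue of at most CAP vehicles.
CAP = 3


def step(i: int, u: int, w: int) -> int:
    # u = 1 discharges one vehicle (if any); w vehicles then arrive.
    return min(CAP, max(i - u, 0) + w)


def cost(i: int, u: int, w: int) -> float:
    # One unit of delay per queued vehicle per stage.
    return float(i)


J, policy = backward_induction(
    states=range(CAP + 1),
    decisions=[0, 1],
    outcomes=[(0, 0.5), (1, 0.5)],
    step=step,
    cost=cost,
    horizon=3,
)
print(J[0], policy[0])
```

Each stage reuses the cost-to-go of the following stage, so the work grows linearly with the number of stages rather than with the number of possible decision sequences.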
2. Curse of Dimensionality
State Space
–i_t = (i_t(1), i_t(2), …, i_t(K)) is K-dimensional; each i_t(n) takes one of M_i possible values, so the total number of states at each step t is M_i^K
Decision Space
–u_t = (u_t(1), u_t(2), …, u_t(N)) is N-dimensional; each u_t(n) may take one of M_u possible values, so the total number of eligible decisions is M_u^N
Information Space
–w_t = (w_t(1), w_t(2), …, w_t(L)) is L-dimensional; each w_t(n) takes one of M_w possible values, so the size of the information space is M_w^L
Three curses of dimensionality
–Computational demand is M_i^K × M_w^L × M_u^N (state × information × decision)
–In the case that K=10, L=5, and N=5, and M_i = M_w = M_u = 10, the total computational demand is 10^10 × 10^5 × 10^5 = 10^20
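The growth of the three spaces can be checked with a few lines of arithmetic. This is a minimal sketch that assumes every dimension takes 10 possible values (M_i = M_w = M_u = 10), which is one reading of the slide; the function name `space_sizes` is hypothetical.

```python
def space_sizes(K: int, L: int, N: int, M_i: int, M_w: int, M_u: int):
    """Return the sizes of the state, information and decision spaces, and their product."""
    n_states = M_i ** K       # state space: K dimensions, M_i values each
    n_information = M_w ** L  # information space: L dimensions, M_w values each
    n_decisions = M_u ** N    # decision space: N dimensions, M_u values each
    return n_states, n_information, n_decisions, n_states * n_information * n_decisions


# Assumed values: 10 possible values per dimension.
n_states, n_information, n_decisions, total = space_sizes(
    K=10, L=5, N=5, M_i=10, M_w=10, M_u=10
)
print(f"states={n_states:.0e} information={n_information:.0e} "
      f"decisions={n_decisions:.0e} total={total:.0e}")
```

Adding one more state dimension multiplies the total by M_i, which is why exact DP becomes impractical for real-time use on even moderately sized problems.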
3. Approximate Dynamic Programming
What does it do?
–Reduces computational demand
How does it do it?
–Model approximation: models describe system dynamics. A complex system is hard to model and may be only partially observable
–Policy approximation: parameterisation that captures the relationship between control policy and state variables
–Function approximation: parameterisation of the value function
Approximation equation: parameterisation of the value function
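The equations for this slide are not available here. As an illustration only, one common way to parameterise a value function is a linear combination of features. The symbols below (parameter vector r, feature functions φ_p, number of features P, one-stage cost g) are assumptions for this sketch and may differ from the authors' notation.

```latex
% Assumed linear architecture: the value of state i is a weighted sum of P features.
\tilde{J}(i, r) \;=\; \sum_{p=1}^{P} r_p \, \phi_p(i) \;=\; r^{\top} \phi(i)

% Decisions are then chosen against the approximation instead of the exact J_{t+1}:
u_t^{*} \;=\; \arg\min_{u_t} \; \mathbb{E}_{w_t}\!\left\{ g(i_t, u_t, w_t) + \tilde{J}(i_{t+1}, r) \right\}
```

With this form only the P parameters in r need to be stored and updated, rather than one value for each of the M_i^K states.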
Progressive update of the approximation function
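A progressive update can be sketched as follows. The slide does not state the update rule, so this example assumes a linear approximation r·φ(i) and a temporal-difference style correction; the names `approx_value` and `td_update`, the step size, and the sample numbers are all hypothetical.

```python
from typing import List, Sequence


def approx_value(r: Sequence[float], phi: Sequence[float]) -> float:
    """Assumed linear approximation: weighted sum of state features."""
    return sum(r_p * phi_p for r_p, phi_p in zip(r, phi))


def td_error(r, phi_now, phi_next, stage_cost, discount=1.0) -> float:
    """Gap between the one-step target and the current estimate."""
    target = stage_cost + discount * approx_value(r, phi_next)
    return target - approx_value(r, phi_now)


def td_update(r, phi_now, phi_next, stage_cost,
              step_size=0.1, discount=1.0) -> List[float]:
    """Move each parameter a small step in the direction that reduces the gap."""
    err = td_error(r, phi_now, phi_next, stage_cost, discount)
    return [r_p + step_size * err * phi_p for r_p, phi_p in zip(r, phi_now)]


# Hypothetical observation: features before and after one transition, and its cost.
r0 = [0.0, 0.0]
phi_now, phi_next, observed_cost = [1.0, 2.0], [1.0, 1.0], 3.0

err_before = td_error(r0, phi_now, phi_next, observed_cost)
r1 = td_update(r0, phi_now, phi_next, observed_cost)
err_after = td_error(r1, phi_now, phi_next, observed_cost)
print(r1, err_before, err_after)
```

Each observed transition nudges the parameters, so the approximation improves while the controller runs instead of being computed in full beforehand.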
4. ADP in Adaptive Traffic Signal Control
–Adaptive traffic signal control is a complex problem
–Real-time dynamic decision-making reduces vehicle delays and stops substantially
[Figure: Sensing, Control, Real world]
Numerical example
Link: L1, L2, L6, L7
Flow rate (vehicles per hour)
Downstream: L3, L5, L8, L4, L8
Turning ratio: 100%, 25%, 75%, 100%, 25%, 75%
Signal sequences
[Figure: signal sequences by link; recovered labels: Link 1 / Signal 1, Link 2 / Signal 2, Link 6 / Signal 4, Link 7 / Signal 5, Link 3 / Signal, Link 8 / Signal]
–Up to 60% reduction in vehicle delays in comparison with optimised fixed-time plans
–Fully adaptive and applicable to distributed network control
–Computational demand is manageable by real-time systems
5. Conclusion
–Dynamic programming is the only exact solution to sequential decision-making for discrete systems
–DP is difficult to use for real-time control because of its computational demand
–Approximation to DP can reduce dimensionality and therefore make problem-solving tractable
–ADP is a general framework in which various approximation architectures and machine learning techniques can be used
–An adaptive traffic signal controller using ADP demonstrated promising results in reducing vehicle delays