Approximate Dynamic Programming & Adaptive Traffic Control
Chen Cai, Benjamin Heydecker
Presentation for the 4th CREST Open Workshop: Operational Research for Software Engineering Methods, London, 2010

Contents
1. Dynamic Programming
2. Curse of Dimensionality
3. Approximate Dynamic Programming
4. Adaptive Traffic Signal Control

1. Dynamic Programming

What does it do?
–Sequential decision-making for discrete systems
–Iterative computation rather than enumeration
–Global optimality
[Figure: time axis divided into decision stages, times t_0, t_1, t_2, t_3, …, t_{m-2}, t_{m-1}, t_m; Stage 0 through Stage m-1, each stage of duration Δt]

1. Dynamic Programming
How does it do it?
–DP decomposes a complex problem into a group of sub-problems called stages; by recursively finding the optimal solution at each stage, its solution converges to global optimality.
–It can be mathematically interpreted as the recursion
  J_t(i_t) = min_{u_t} E_{w_t}[ g_t(i_t, u_t, w_t) + J_{t+1}(i_{t+1}) ],
computed recursively for all i_t at stage t.
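A minimal sketch of this backward recursion in Python. The state set, one-stage cost g_t, and dynamics f below are hypothetical placeholders chosen only to make the example self-contained; the presentation does not specify them.

```python
# Backward induction for J_t(i) = min_u E_w[ g_t(i,u,w) + J_{t+1}(f(i,u,w)) ].
# STATES, DECISIONS, NOISE, cost, and step are illustrative assumptions.

STATES = range(5)             # i_t: discrete system states
DECISIONS = range(2)          # u_t: admissible decisions
NOISE = [(0, 0.5), (1, 0.5)]  # w_t: (value, probability) pairs
HORIZON = 4                   # number of stages m

def cost(i, u, w):
    """Illustrative one-stage cost g_t(i_t, u_t, w_t)."""
    return (i - u) ** 2 + w

def step(i, u, w):
    """Illustrative system dynamics i_{t+1} = f(i_t, u_t, w_t)."""
    return min(max(i + u - w, 0), len(STATES) - 1)

# Terminal values J_m(i); here simply zero.
J = {i: 0.0 for i in STATES}

# Recurse backwards over stages, computing J_t for every state i_t.
for t in reversed(range(HORIZON)):
    J = {
        i: min(
            sum(p * (cost(i, u, w) + J[step(i, u, w)]) for w, p in NOISE)
            for u in DECISIONS
        )
        for i in STATES
    }

print(J)  # J_0(i): optimal expected cost-to-go from each initial state
```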

2. Curse of Dimensionality

State Space
–i_t = (i_t(1), i_t(2), …, i_t(K)) is K-dimensional; each i_t(n) takes one of M_i possible values, so the total number of states at each stage t is M_i^K.
Decision Space
–u_t = (u_t(1), u_t(2), …, u_t(N)) is N-dimensional; each u_t(n) may take M_u possible values, so the total number of eligible decisions is M_u^N.
Information Space
–w_t = (w_t(1), w_t(2), …, w_t(L)) is L-dimensional; each w_t(n) takes one of M_w possible values, so the size of the information space is M_w^L.

2. Curse of Dimensionality
Three curses of dimensionality: the state space, the information space, and the decision space all grow exponentially in their dimension.
Computational demand per stage is the product M_i^K × M_w^L × M_u^N.
In the case that K = 10, L = 5, N = 5, and M_i = M_w = M_u = 10, the total computational demand is 10^10 × 10^5 × 10^5 = 10^20.
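To make the arithmetic concrete, a quick back-of-envelope check using the sizes and exponents from the slide:

```python
# Sizes from the slide: K = 10, L = 5, N = 5, M_i = M_w = M_u = 10.
M_i = M_w = M_u = 10
K, L, N = 10, 5, 5

states = M_i ** K        # size of the state space: 10^10
information = M_w ** L   # size of the information space: 10^5
decisions = M_u ** N     # size of the decision space: 10^5

# Exhaustive DP must touch every (state, information, decision)
# combination at each stage.
per_stage = states * information * decisions
print(f"{per_stage:.1e} evaluations per stage")  # 1.0e+20
```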

3. Approximate Dynamic Programming

What does it do?
–Reduce computational demand
How does it do it?
–Model approximation: models describe the system dynamics; a complex system is hard to model and may be partially observable.
–Policy approximation: parameterisation that captures the relationship between the control policy and the state variables.
–Function approximation: parameterisation of the value function.

3. Approximate Dynamic Programming
Approximation equation: parameterisation of the value function. The exact J_t(i_t) is replaced by an approximation J~_t(i_t; r) with tunable parameters r, typically a weighted sum of basis functions, J~_t(i_t; r) = Σ_k r_k φ_k(i_t).

3. Approximate Dynamic Programming
Progressive update of the approximation function: as new observations arrive, the parameters r are adjusted step by step so that J~_t(i_t; r) moves towards each newly observed value, for instance by a stochastic-gradient step of size α on the squared observation error.
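A minimal sketch of both ideas, a linear parameterisation and its progressive update. The features φ_k(i) = (1, i, i²), the step size α, and the sample observations are assumptions for illustration, not choices from the presentation:

```python
import numpy as np

# Linear value-function approximation J~(i; r) = sum_k r_k * phi_k(i),
# refined progressively from observed values of the cost-to-go.

def phi(i):
    """Hypothetical basis functions (features) of the state."""
    return np.array([1.0, i, i ** 2])

r = np.zeros(3)   # tunable parameters of the approximation
alpha = 0.05      # step size for the progressive update

def J_tilde(i):
    return phi(i) @ r

def update(i, observed_value):
    """Move J~(i) towards a newly observed sample of the value."""
    global r
    error = observed_value - J_tilde(i)  # observation error
    r += alpha * error * phi(i)          # gradient step on error^2

# Example: refine the approximation from simulated observations.
for i, v in [(1, 3.0), (2, 7.5), (3, 14.0)]:
    update(i, v)
print(J_tilde(2.0))
```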

4. ADP in Adaptive Traffic Signal Control

4. Adaptive Traffic Signals
–Adaptive traffic signal control is a complex problem.
–Real-time dynamic decision-making reduces vehicle delays and stops substantially.

4. Adaptive Traffic Signals
[Diagram: closed control loop between the real world, sensing, and control: sensors observe the real world, the controller decides, and control actions feed back into the real world]
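A minimal sketch of this closed loop for a hypothetical two-approach junction. The queue dynamics, discharge rate, and one-step value proxy are illustrative assumptions, not the ADP controller described in the presentation:

```python
import random

# Sensing -> Control -> Real world, repeated each decision interval.

SATURATION = 4  # assumed vehicles discharged per green interval

def sense(world):
    """Sensing: observe current queue lengths (the state i_t)."""
    return tuple(world["queues"])

def control(state):
    """Control: choose the green phase u_t with the best estimated value."""
    # One-step proxy for the value: queues left after discharging phase u.
    def value(u):
        return -(sum(state) - min(state[u], SATURATION))
    return max(range(2), key=value)

def actuate(world, phase):
    """Real world: random arrivals join; the green approach discharges."""
    for k in range(2):
        world["queues"][k] += random.randint(0, 2)
    world["queues"][phase] = max(0, world["queues"][phase] - SATURATION)

world = {"queues": [3, 5]}  # hypothetical initial queues
for _ in range(20):         # run the closed loop
    actuate(world, control(sense(world)))
print(world["queues"])
```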

4. Adaptive Traffic Signals
Numerical example

Link                           L1     L2        L6     L7
Flow rate (vehicles per hour)  –      –         –      –
Downstream                     L3     L5, L8    L4     L8
Turning ratio                  100%   25%, 75%  100%   25%, 75%

4. Adaptive Traffic Signals
Signal sequences
[Diagram: junction layout pairing links with signals: Link 1 with Signal 1, Link 2 with Signal 2, Link 3 with Signal 3, Link 6 with Signal 4, Link 7 with Signal 5, Link 8 with Signal 6]

4. Adaptive Traffic Signals
–Up to 60% reduction in vehicle delays in comparison with optimised fixed-time plans
–Fully adaptive and applicable to distributed network control
–Computational demand manageable by real-time systems

5. Conclusion
–Dynamic programming is the only exact solution method for sequential decision-making in discrete systems.
–DP is difficult to use for real-time control because of its computational demand.
–Approximating DP reduces dimensionality and therefore makes problem-solving tractable.
–ADP is a general framework in which various approximation architectures and machine learning techniques can be used.
–An adaptive traffic signal controller using ADP demonstrated promising results in reducing vehicle delays.

From imagination to impact