Projection Methods (Symbolic tools we have used to do…)
Ron Parr, Duke University
Joint work with: Carlos Guestrin (Stanford), Daphne Koller (Stanford)

Overview
Why?
– MDPs need value functions
– Value function approximation
– "Good" approximate value functions
How
– Approximation architecture
– Dot products in large state spaces
– Expectation in large state spaces
– Orthogonal projection
– MAX in large state spaces

Why You Need Value Functions
Given the current configuration:
– Expected value of all widgets produced by the factory
– Expected number of steps before failure

DBN-MDPs
[Figure: a dynamic Bayesian network with state variables X, Y, Z at times t and t+1, plus an action node.]

Adding Rewards
[Figure: the DBN with reward nodes R1 and R2 added at time t+1.]
Reward nodes have small sets of parent variables too.
The total reward adds the sub-rewards: R = R1 + R2

Computing Values
Value function + symbolic transition model (DBN)
Q: Does V have a convenient, compact form?
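For reference, the quantities behind this question, written in standard notation (the slide's own formula images are not in the transcript, so this is the usual textbook definition rather than the talk's exact notation): the value of a fixed policy and its Bellman fixed-point form.

V^{\pi}(s) \;=\; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t) \,\middle|\, s_0 = s\right],
\qquad
V^{\pi} \;=\; R + \gamma P^{\pi} V^{\pi}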

Compact Models = Compact V?
[Figure: the DBN over X, Y, Z unrolled across time steps t, t+1, t+2, t+3, with a reward R = +1.]

Enter Value Function Approximation
Not enough structure for exact, symbolic methods in many domains
Our approach:
– Combine symbolic methods with VFA
– Define a restricted class of value functions
– Find the "best" in that class
– Bound error

Linearly Decomposable Value Functions
Approximate high-dimensional functions with a combination of lower-dimensional functions
Motivation: multi-attribute utility theory (Keeney & Raiffa)
Note: overlapping domains are allowed!

Decomposable Value Functions
Each hi has a domain over a small set of variables
Each hi is a feature of a complex system
– status of a machine
– inventory of a store
Also: think of each hi as a basis function
Linear combination of functions: V(s) = w1 h1(s) + w2 h2(s) + ... + wk hk(s)

Matrix Form
A = the |states| x k matrix with one row per state and one column per basis function: A[i, j] = hj(si)
V = Aw assigns a value to every state
Note for linear algebra fans: V is a linear function in the column space of h1 ... hk
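A minimal sketch of this construction in Python (the variable names, basis functions, and numbers are illustrative assumptions, not from the talk; the whole point of the symbolic approach is that A is never materialized for large state spaces):

import itertools
import numpy as np

# Hypothetical factored state: three binary variables X, Y, Z.
variables = ["X", "Y", "Z"]

# Restricted-domain basis functions: each depends on only a few variables.
# h1 depends on X alone, h2 on (Y, Z); a constant basis is also included.
bases = [
    (["X"],      lambda a: 1.0 if a["X"] == 1 else 0.0),
    (["Y", "Z"], lambda a: float(a["Y"] + a["Z"])),
    ([],         lambda a: 1.0),
]

# Enumerate states only for this tiny example; A has one row per state,
# one column per basis function: A[i, j] = h_j(s_i).
states = [dict(zip(variables, bits))
          for bits in itertools.product([0, 1], repeat=len(variables))]
A = np.array([[h(s) for _, h in bases] for s in states])

w = np.array([0.5, 1.0, -0.2])   # example weights
V = A @ w                        # assigns a value to every state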

Defining a Fixed Point
Standard fixed point equation: V = R + γPV
With approximation: Aw = Π(R + γPAw), where Π is a projection operator
We use orthogonal projection to force V to have the desired form.
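Written out in standard notation, assuming Π is least-squares (orthogonal) projection onto the column space of A, the fixed point reduces to a k x k linear system in w:

A w \;=\; \Pi\,(R + \gamma P A w),
\qquad
\Pi \;=\; A\,(A^{\top}A)^{-1}A^{\top}
\;\;\Longrightarrow\;\;
\left(A^{\top}A - \gamma\,A^{\top}P A\right) w \;=\; A^{\top}R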

Solving for the Fixed Point
Theorem: w has a solution for all but finitely many discount factors [Koller & Parr 00]
Note: the existence of a solution is a weaker condition than the contraction property required for iterative, value-iteration-based methods.
LSTD [Bradtke & Barto 96]: O(k^2 n)
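A minimal explicit-state sketch of the direct solve (illustrative Python with made-up matrices; the symbolic method computes the same k x k quantities via the operations on the next slides instead of enumerating states):

import numpy as np

def projected_fixed_point(A, P, R, gamma):
    """Solve (A^T A - gamma * A^T P A) w = A^T R for the weight vector w.

    A: |S| x k basis matrix, P: |S| x |S| transition matrix for a fixed
    policy, R: length-|S| reward vector.  A solution exists for all but
    finitely many discount factors, matching the theorem on this slide.
    """
    M = A.T @ A - gamma * A.T @ (P @ A)   # k x k matrix
    b = A.T @ R                           # k-vector
    return np.linalg.solve(M, b)

# Tiny illustrative instance (hypothetical numbers): 8 states, 3 bases.
rng = np.random.default_rng(0)
A = rng.random((8, 3))
P = rng.random((8, 8))
P /= P.sum(axis=1, keepdims=True)
R = rng.random(8)
w = projected_fixed_point(A, P, R, gamma=0.95)
V_approx = A @ w                          # approximate value of every state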

Key Operations
– Backprojection of a basis function (one-step expectation)
– Dot product of two restricted-domain basis functions
If these two operations can be done efficiently, the solution cost for k basis functions is building a k x k matrix and a k x 1 vector, plus a k x k matrix inversion.

Backprojection = 1-step Expectation
Important: a single step of lookahead only, no more.
[Figure: the two-slice DBN over state variables X, Y, Z used for one step of backprojection.]
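A minimal sketch of one-step backprojection through a factored model, with hypothetical parent sets and CPT numbers; it illustrates why the result is again a restricted-domain function, depending only on the parents of the basis function's variables:

import itertools

# Hypothetical one-step DBN: each next-state variable has a small parent set
# and a conditional probability P(var' = 1 | parents) given as a function.
parents = {"X": ["X"], "Y": ["X", "Y"], "Z": ["Y", "Z"]}
cpt = {  # illustrative numbers only
    "X": lambda a: 0.9 if a["X"] == 1 else 0.1,
    "Y": lambda a: 0.8 if a["X"] == 1 and a["Y"] == 1 else 0.3,
    "Z": lambda a: 0.7 if a["Y"] == 1 else 0.2 + 0.1 * a["Z"],
}

def backproject(h_vars, h, state):
    """(P h)(state) = sum over next values of h's variables of
    h(next) * product of their CPTs: a single step of lookahead only."""
    total = 0.0
    for bits in itertools.product([0, 1], repeat=len(h_vars)):
        nxt = dict(zip(h_vars, bits))
        prob = 1.0
        for v in h_vars:
            p1 = cpt[v]({p: state[p] for p in parents[v]})
            prob *= p1 if nxt[v] == 1 else 1.0 - p1
        total += prob * h(nxt)
    return total

# The result depends only on the parents of h's variables, so it is again a
# restricted-domain function; e.g. a basis over Z backprojects onto (Y, Z).
h_z = lambda a: float(a["Z"])
value = backproject(["Z"], h_z, {"X": 1, "Y": 0, "Z": 1})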

Efficient Dot Product
Need to compute the dot product over the state space: <h1, h2> = Σ_s h1(s) h2(s)
e.g.: h1 = f(x), h2 = f(y)
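A minimal sketch of why this dot product is cheap when both functions have small domains (binary state variables assumed; the feature functions are made up): enumerate only the union of the two domains and multiply by the number of free assignments.

import itertools

def restricted_dot(n_vars, d1, h1, d2, h2):
    """<h1, h2> = sum over all 2^n_vars states of h1(s) * h2(s),
    computed by enumerating only the union of the two small domains;
    every variable outside the union contributes a factor of 2."""
    union = sorted(set(d1) | set(d2))
    free = n_vars - len(union)
    total = 0.0
    for bits in itertools.product([0, 1], repeat=len(union)):
        a = dict(zip(union, bits))
        total += h1(a) * h2(a)
    return total * (2 ** free)

# e.g. h1 = f(x), h2 = f(y) over, say, 30 binary state variables:
h1 = lambda a: float(a["X"])                     # hypothetical features
h2 = lambda a: 2.0 * a["Y"] + 1.0
dot = restricted_dot(30, ["X"], h1, ["Y"], h2)   # never touches 2^30 states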

Symbolic Linear VFA
Incurs only 1 step's worth of representation blowup
– Solve directly for the fixed point
– Contrast with bisimulation/structured DP: exact but iterative, and the representation grows with each step
No a priori quality guarantees, but a posteriori quality guarantees

Error Bounds
How are we doing? Measure how far Aw is from its one-step lookahead expected value R + γPAw, i.e. the max one-step error over all states.
Claim: computing this max is equivalent to maximizing a sum of restricted-domain functions.
Use a cost network (Dechter 99).
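Hedging on notation (the slide's formulas are images), the usual contraction argument turns this max one-step error into an a posteriori bound on the approximation:

\varepsilon \;=\; \big\| A w - (R + \gamma P A w) \big\|_{\infty}
\quad\Longrightarrow\quad
\big\| V^{\pi} - A w \big\|_{\infty} \;\le\; \frac{\varepsilon}{1-\gamma}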

Cost Networks
Can use variable elimination to maximize over the state space [Bertele & Brioschi ’72]
[Figure: a small cost network over variables A, B, C, D.]
As in Bayes nets, maximization is exponential in the size of the largest factor; NP-hard in general.
Here we need only 16, instead of 64, sum operations.
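A minimal sketch of variable elimination for maximizing a sum of small factors, on a toy cost network over binary variables A, B, C, D (the factors and elimination order are made up for illustration):

import itertools

def eliminate_max(factors, order):
    """Maximize a sum of small factors over all assignments by variable
    elimination.  Each factor is (scope, func) where func takes a dict over
    its scope.  Cost is exponential only in the largest intermediate scope."""
    factors = list(factors)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        scope = sorted({v for s, _ in touching for v in s if v != var})

        def make_msg(touching=touching, var=var, scope=scope):
            def msg(a):
                # Maximize over the eliminated variable's two values.
                return max(sum(f({**a, var: val}) for _, f in touching)
                           for val in (0, 1))
            return scope, msg

        factors = rest + [make_msg()]
    # All variables eliminated: the remaining factors have empty scopes.
    return sum(f({}) for _, f in factors)

# Example cost network over binary A, B, C, D (illustrative factors):
f1 = (["A", "B"], lambda a: 2.0 * a["A"] - a["B"])
f2 = (["B", "C"], lambda a: float(a["B"] ^ a["C"]))
f3 = (["C", "D"], lambda a: a["C"] + 0.5 * a["D"])
best = eliminate_max([f1, f2, f3], order=["A", "B", "C", "D"])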

Checkpoint
Starting with:
– Factored model (DBN)
– Restricted value function space (restricted-domain basis functions)
Find the fixed point in the restricted space
Bound solution quality a posteriori
But: the fixed point may not have the lowest max-norm error

Max-norm Error Minimization
– General max-norm error minimization
– Symbolic operation over large state spaces

General Max-norm Error Minimization
Algorithm for finding: w* = argmin_w ||Aw − b||∞
Solve by linear programming [Cheney ’82].
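A minimal explicit-state sketch of that LP, assuming the standard Chebyshev-approximation form and using scipy (hypothetical data; the symbolic version on the next slides avoids writing one pair of constraints per state):

import numpy as np
from scipy.optimize import linprog

def linf_regression(A, b):
    """min_w ||A w - b||_inf via the classic LP: introduce a scalar phi,
    minimize phi subject to  A w - phi <= b  and  -A w - phi <= -b."""
    n, k = A.shape
    c = np.zeros(k + 1)
    c[-1] = 1.0                                 # objective: phi only
    ones = np.ones((n, 1))
    A_ub = np.vstack([np.hstack([A, -ones]),    #   (A w - b)_s <= phi
                      np.hstack([-A, -ones])])  #  -(A w - b)_s <= phi
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * k + [(0, None)]   # w free, phi >= 0
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:k], res.x[-1]                 # weights, achieved max error

# Tiny explicit instance (illustrative only): 8 states, 3 basis functions.
rng = np.random.default_rng(0)
A = rng.random((8, 3))
b = rng.random(8)
w, err = linf_regression(A, b)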

Symbolic Max-norm Minimization
For fixed weights w, compute the max-norm error: max_s |Σ_i w_i h_i(s) − b(s)|
However, if the basis and target are functions of only a few variables, we can do it efficiently!
Cost networks can maximize over large state spaces efficiently when the function is factored.

Representing the Constraints
Explicit representation of the LP constraints is exponential (|S| = 2^n).
If the basis and target are factored, we can use cost networks to represent the constraints compactly.

Conclusions
Value function approximation with error bounds
Symbolic operations (no sampling!)
Methods over large state spaces:
– Orthogonal projection
– Max-norm error minimization
Tools over large state spaces:
– Expectation
– Dot product
– Max