Propagating Uncertainty In POMDP Value Iteration with Gaussian Process

Presentation transcript:

Propagating Uncertainty in POMDP Value Iteration with Gaussian Processes. Written by Eric Tuttle and Zoubin Ghahramani. Presented by Hui Li, May 20, 2005.

Outline: Framework of POMDP Framework of Gaussian Process Gaussian Process Value Iteration Results Conclusions

Framework of POMDP A POMDP is defined by the tuple <S, A, T, R, Ω, O>. S is a finite set of states of the world. A is a finite set of actions. T: S × A → Π(S) is the state-transition function, giving the probability T(s, a, s') that performing action a moves the world from state s to state s'. R: S × A → ℝ is the reward function, giving the reward R(s, a) the agent receives for performing action a in state s. Ω is a finite set of observations. O: S × A → Π(Ω) is the observation function, giving the probability O(s', a, o) of making observation o after performing action a and landing in state s'.

A POMDP agent can be decomposed into two parts: a state estimator (SE), which turns each observation o from the world into an updated belief b, and a policy (π), which maps the belief b to the next action a applied to the world.

The goal of a POMDP agent is to select actions that maximize its expected discounted sum of future rewards, E[Σ_t γ^t r_t]. Two functions are most often used in reinforcement learning algorithms: the value function (a function of state), V^π(b) = E[Σ_t γ^t r_t | b_0 = b, π], with optimal value function V*(b) = max_π V^π(b); and the Q function (a function of state and action), Q^π(b, a), the value of taking action a in belief b and following π thereafter, with optimal Q function Q*(b, a) = max_π Q^π(b, a).

The key assumption of a POMDP is that the state is not directly known: it is only partially observable. We therefore rely on the concept of a belief state, denoted b, a probability distribution over states. The belief is a sufficient statistic for the history of actions and observations: b_t(s) = Pr(s_t = s | o_t, a_{t-1}, o_{t-1}, …, a_0, b_0).

After taking an action a and seeing an observation o, the agent updates its belief state using Bayes' rule: b^{a,o}(s') = O(s', a, o) Σ_s T(s, a, s') b(s) / Pr(o | b, a), where the normalizer is Pr(o | b, a) = Σ_{s'} O(s', a, o) Σ_s T(s, a, s') b(s).
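For a finite state space this update is a few lines of NumPy. The array layout (T[a, s, s'], O[a, s', o]) and the function name are illustrative assumptions, not from the paper:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes-rule belief update for a discrete POMDP.

    b: (|S|,) current belief over states
    T: (|A|, |S|, |S|) transition probabilities T[a, s, s']
    O: (|A|, |S|, |Obs|) observation probabilities O[a, s', o]
    Returns the posterior belief after taking action a and observing o.
    """
    unnorm = O[a, :, o] * (T[a].T @ b)   # O(s',a,o) * sum_s T(s,a,s') b(s)
    return unnorm / unnorm.sum()         # divide by Pr(o | b, a)
```

Note that the normalizer `unnorm.sum()` is exactly Pr(o | b, a), the same quantity that appears in the Bellman backup.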

Bellman's equations for the POMDP are as follows. For the value function: V_t(b) = max_a [ r(b, a) + γ Σ_o Pr(o | b, a) V_{t-1}(b^{a,o}) ], where r(b, a) = Σ_s b(s) R(s, a). For the Q function: Q_t(b, a) = r(b, a) + γ Σ_o Pr(o | b, a) max_{a'} Q_{t-1}(b^{a,o}, a').
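As a rough illustration of the backup (not the paper's GP method), the sketch below runs one sweep of Q-function value iteration over a fixed grid of belief points, approximating each successor belief by its nearest grid point; the grid-plus-nearest-neighbour scheme is an assumption made only for this sketch:

```python
import numpy as np

def q_backup(beliefs, Q_prev, R, T, O, gamma=0.95):
    """One sweep of Q-value iteration on a fixed grid of belief points.

    beliefs: (N, |S|) belief points     Q_prev: (N, |A|) previous Q values
    R: (|S|, |A|) rewards R(s, a)       T: (|A|, |S|, |S|) transitions
    O: (|A|, |S|, |Obs|) observation probabilities O[a, s', o]
    Computes Q_t(b,a) = r(b,a) + gamma * sum_o Pr(o|b,a) max_a' Q_{t-1}(b',a'),
    with each successor belief b' replaced by its nearest grid point.
    """
    n_obs = O.shape[2]
    V_prev = Q_prev.max(axis=1)              # max_a' Q_{t-1} at each grid point
    Q = np.zeros_like(Q_prev)
    for i, b in enumerate(beliefs):
        for a in range(T.shape[0]):
            backup = 0.0
            for o in range(n_obs):
                unnorm = O[a, :, o] * (T[a].T @ b)
                p_o = unnorm.sum()           # Pr(o | b, a)
                if p_o < 1e-12:
                    continue
                b_next = unnorm / p_o
                j = np.argmin(np.abs(beliefs - b_next).sum(axis=1))
                backup += p_o * V_prev[j]
            Q[i, a] = b @ R[:, a] + gamma * backup
    return Q
```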

Framework of Gaussian Process regression A Gaussian process regressor defines a distribution over the possible functions that could fit the data. In particular, a function y(x) is a Gaussian process if, for any finite set of points {x_1, …, x_N}, the joint probability density p(y(x_1), y(x_2), …, y(x_N)) is a multivariate Gaussian.

Assume we have a Gaussian process with mean 0 and covariance function K(x_i, x_j), and suppose we have observed a set of training points and target function values D = {(x_n, t_n), n = 1, …, N}, where t_n = y(x_n) + ε_n with Gaussian noise ε ~ N(0, σ²). The covariance of the targets is then C = σ²I + K. For a new input x', the predictive distribution is Gaussian with mean k'ᵀ C⁻¹ t and variance K(x', x') − k'ᵀ C⁻¹ k', where k' is the vector of covariances K(x', x_n).

One general choice of covariance function is K(x_i, x_j) = v exp(−½ (x_i − x_j)ᵀ W (x_i − x_j)) + b, with W a diagonal matrix, v the expected amplitude of the function, and b a bias term that accommodates non-zero-mean functions. The parameters of the covariance function can be tuned using maximum-likelihood or MAP methods.
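A minimal sketch of GP regression with this covariance, assuming the hyperparameters v, W, and b are simply given rather than tuned; for clarity it solves against the dense matrix C, which costs O(N³):

```python
import numpy as np

def ard_kernel(X1, X2, v=1.0, w=None, bias=0.1):
    """K(xi,xj) = v * exp(-0.5 (xi-xj)^T W (xi-xj)) + bias, with W = diag(w)."""
    if w is None:
        w = np.ones(X1.shape[1])
    d = X1[:, None, :] - X2[None, :, :]              # pairwise differences
    return v * np.exp(-0.5 * np.einsum('ijk,k,ijk->ij', d, w, d)) + bias

def gp_predict(X, t, X_star, noise_var=0.01, **kern):
    """Posterior mean and variance at test points X_star.

    mean = k_*^T C^{-1} t,  var = K(x*,x*) - k_*^T C^{-1} k_*,
    with C = K(X,X) + noise_var * I.
    """
    C = ard_kernel(X, X, **kern) + noise_var * np.eye(len(X))
    k_star = ard_kernel(X, X_star, **kern)           # shape (N, M)
    mean = k_star.T @ np.linalg.solve(C, t)
    var = np.diag(ard_kernel(X_star, X_star, **kern)) - np.einsum(
        'nm,nm->m', k_star, np.linalg.solve(C, k_star))
    return mean, var
```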

Gaussian Process Value Iteration Model each of the action-value functions Q(·, a) as a Gaussian process. By the definition of a Gaussian process, the vector of values Q_{t-1}(b^{a,o}, a) at the successor beliefs is multivariate normal, with mean μ_{a,b_o} and covariance Σ_{a,b_o}. The major problem in computing the distribution of Q_t(b, a) is the max operator in the Bellman backup.

Two approximate ways of dealing with the max operator: 1. Approximate the max operator as simply passing through the random variable with the highest mean: max_a Q(b, a) ≈ Q(b, a*), where a* = argmax_a E[Q(b, a)].
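Under this first approximation, the max of the jointly Gaussian action values is just the single component with the largest posterior mean. A hypothetical helper, assuming the means mu and covariance Sigma come from the GP posterior over actions:

```python
import numpy as np

def max_by_highest_mean(mu, Sigma):
    """Pass-through approximation of max over Gaussian action values.

    mu: (|A|,) posterior means; Sigma: (|A|,|A|) posterior covariance.
    Returns the mean and variance of the component with the largest mean,
    i.e. the max operator simply 'passes through' that random variable.
    """
    a_star = int(np.argmax(mu))
    return mu[a_star], Sigma[a_star, a_star]
```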

2. Take the effects of the max operator into account, but ignore correlations among the function values. If q_1 and q_2 are independent with distributions q_1 ~ N(μ_1, σ_1²) and q_2 ~ N(μ_2, σ_2²), then the first two moments of the variable q = max(q_1, q_2) are given by E[q] = μ_1 Φ(α) + μ_2 Φ(−α) + θ φ(α) and E[q²] = (μ_1² + σ_1²) Φ(α) + (μ_2² + σ_2²) Φ(−α) + (μ_1 + μ_2) θ φ(α), where θ² = σ_1² + σ_2², α = (μ_1 − μ_2)/θ, Φ is the cdf, and φ is the pdf of the zero-mean, unit-variance normal. q can then be approximated by a Gaussian with these moments.
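These moment formulas can be implemented directly with the standard normal pdf and cdf; the sketch below assumes σ_1² + σ_2² > 0 so that θ is nonzero:

```python
import math

def phi(x):
    """Standard normal pdf."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Standard normal cdf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def max_moments(m1, v1, m2, v2):
    """Mean and variance of q = max(q1, q2) for independent Gaussians.

    q1 ~ N(m1, v1), q2 ~ N(m2, v2); requires v1 + v2 > 0.
    """
    theta = math.sqrt(v1 + v2)
    alpha = (m1 - m2) / theta
    mean = m1 * Phi(alpha) + m2 * Phi(-alpha) + theta * phi(alpha)
    second = ((m1 ** 2 + v1) * Phi(alpha) + (m2 ** 2 + v2) * Phi(-alpha)
              + (m1 + m2) * theta * phi(alpha))
    return mean, second - mean ** 2      # moments of the Gaussian fit
```

As a sanity check, for two independent standard normals the mean of the max is 1/√π ≈ 0.564, and when one mean dominates the other the formulas collapse to that component's moments.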

On this basis, we can use a Gaussian distribution to approximate max_a Q_{t-1}(·, a): both methods produce a Gaussian approximation for the max of a set of normally distributed vectors. And since Q_t(·, a) is related to Q*_{t-1} by a linear transformation (the Bellman backup is linear in the successor values), Q_t(·, a) is again Gaussian, with mean and covariance obtained by passing the approximated mean and covariance through that transformation.

Results

Conclusions In this paper, the authors presented an algorithm that uses Gaussian processes for approximate value iteration in POMDPs. The results using GPs are comparable to those of the classical methods.