An Optimal Learning Approach to Finding an Outbreak of a Disease
Warren Scott and Warren Powell

Introduction

We describe an optimal learning policy for sequentially deciding which locations in a city to test for an outbreak of a particular disease. We use Gaussian process regression to model the level of the disease throughout the city, and then use the correlated knowledge gradient, which implicitly balances exploration against exploitation, to choose where to test next. The correlated knowledge gradient policy is a general framework for finding the maximum of an expensive function from noisy observations.

[Figure: map of Manhattan, the area in which we search for an outbreak.]

Modeling the Disease Level

We consider the following optimization problem. Let x be a p-dimensional decision vector representing a physical location, and let θ: R^p → R be the function we wish to maximize, which gives the level of disease at each location. Let ŷ^(n+1) be the sample observation of the sampling decision x^n, and assume ŷ^(n+1) ~ Normal(θ(x^n), λ(x^n)). The goal is to sequentially choose x^n for n = 0, 1, 2, … in order to find the maximum of θ as quickly as possible.

Adopting a Bayesian framework, we start with some information about the truth: we treat θ as a realization of a random variable and assign it a Gaussian process prior. The multivariate normal distribution is a conjugate family when the observations come from a normal distribution with known variance. We define Σ^n(u_1, u_2) = Cov[θ(u_1), θ(u_2) | F^n] and μ^n(u) = E[θ(u) | F^n], and we use the squared exponential covariance function for the Gaussian process prior. The standard Gaussian process regression equations then give the updated mean and variance of the disease level at a location x after n observations.

Approximate Knowledge Gradient Policy

We next summarize how we sequentially choose where to test the level of the disease in order to find high levels of the disease quickly. The knowledge gradient policy, as described in Frazier et al. (2007) for a discrete decision space X, chooses the next sampling decision by maximizing the expected incremental value of a measurement. We define an approximation of the knowledge gradient that can easily be computed even when X is continuous, and at each iteration we choose the sampling decision that maximizes this approximate knowledge gradient.

Simulation of Policy

We generate the true level of the disease across the city. We start by knowing nothing and then sequentially choose locations to test for the disease; when we sample a location we receive a noisy observation of the true disease level.

[Figure: unknown true level of the disease across Manhattan.]

The next plots show the estimate of the disease level and the approximate knowledge gradient after 6 iterations of the approximate knowledge gradient policy; locations with a larger approximate knowledge gradient are more valuable to measure. Further plots show the estimate of the disease level and the approximate knowledge gradient after 10 iterations; after 10 observations our estimate of the disease level, and of the location of the outbreak, is quite good.

[Figures: estimated disease level and approximate knowledge gradient after 6 and after 10 iterations.]

Summary

The approximate knowledge gradient policy can be used to efficiently and sequentially choose where to sample an expensive, noisy function, such as the disease level across a city.
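Reconstructed Equations and Example

The displayed equations referenced in the Modeling the Disease Level and Approximate Knowledge Gradient Policy sections did not survive the transcript. The LaTeX below is a reconstruction based on the surrounding text and on standard Gaussian process regression and the knowledge gradient of Frazier et al. (2007); the exact parameterization used on the poster (in particular the form of the squared exponential covariance and the notation for the policy) is an assumption rather than a quotation.

```latex
% Sampling model and objective: noisy observation of the disease level theta at location x^n
\[ \hat{y}^{\,n+1} \sim \mathcal{N}\!\bigl(\theta(x^n),\ \lambda(x^n)\bigr), \qquad \text{find } \max_{x} \theta(x). \]

% Squared exponential covariance for the Gaussian process prior (one common parameterization)
\[ \Sigma^{0}(u_1, u_2) = \beta \exp\!\Bigl( -\textstyle\sum_{i=1}^{p} \alpha_i \,(u_{1,i} - u_{2,i})^{2} \Bigr). \]

% Gaussian process regression: updated mean and covariance after n noisy observations
% y = (\hat{y}^1, \dots, \hat{y}^n) taken at the locations collected in X
\[ \mu^{n}(x) = \mu^{0}(x) + \Sigma^{0}(x, X)\bigl[\Sigma^{0}(X, X) + \operatorname{diag}(\lambda)\bigr]^{-1}\bigl(y - \mu^{0}(X)\bigr), \]
\[ \Sigma^{n}(x, x') = \Sigma^{0}(x, x') - \Sigma^{0}(x, X)\bigl[\Sigma^{0}(X, X) + \operatorname{diag}(\lambda)\bigr]^{-1}\Sigma^{0}(X, x'). \]

% Knowledge gradient policy (Frazier et al. 2007): expected incremental value of a measurement
\[ X^{KG}(S^{n}) = \arg\max_{x \in \mathcal{X}} \ \mathbb{E}\Bigl[ \max_{x'} \mu^{n+1}(x') - \max_{x'} \mu^{n}(x') \,\Big|\, \mathcal{F}^{n},\ x^{n} = x \Bigr]. \]
```

The following Python sketch shows how these pieces fit together: a Gaussian process posterior over a grid of candidate locations, a correlated knowledge gradient evaluated at every candidate, and a loop that always measures where the knowledge gradient is largest. It is a minimal illustration, not the authors' implementation: the continuous decision space is replaced by a discrete grid, the expectation inside the knowledge gradient is approximated by Monte Carlo rather than an exact computation, and the kernel hyperparameters, noise variance, and one-dimensional "city" are made-up values.

```python
import numpy as np


def sq_exp_cov(U, V, beta=1.0, alpha=10.0):
    """Squared exponential covariance between location sets U (m x p) and V (k x p)."""
    d2 = ((U[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
    return beta * np.exp(-alpha * d2)


def gp_posterior(X_obs, y_obs, lam, X_grid, beta=1.0, alpha=10.0, mu0=0.0):
    """Gaussian process regression: posterior mean vector and covariance matrix of the
    disease level on X_grid, given noisy observations y_obs ~ Normal(theta(X_obs), lam)."""
    K_oo = sq_exp_cov(X_obs, X_obs, beta, alpha) + lam * np.eye(len(X_obs))
    K_go = sq_exp_cov(X_grid, X_obs, beta, alpha)
    K_gg = sq_exp_cov(X_grid, X_grid, beta, alpha)
    sol = np.linalg.solve(K_oo, y_obs - mu0)
    mu_n = mu0 + K_go @ sol
    Sigma_n = K_gg - K_go @ np.linalg.solve(K_oo, K_go.T)
    return mu_n, Sigma_n


def knowledge_gradient(mu_n, Sigma_n, lam, n_z=2000, seed=0):
    """Correlated knowledge gradient nu(x) = E[max_x' mu^{n+1}(x')] - max_x' mu^n(x')
    for every candidate on the grid; the expectation over the scalar standard normal Z
    is approximated by Monte Carlo."""
    Z = np.random.default_rng(seed).standard_normal(n_z)
    nu = np.empty(len(mu_n))
    for j in range(len(mu_n)):
        # sigma_tilde: how the whole posterior mean vector shifts, per unit of Z,
        # if the next measurement is taken at grid point j
        sigma_tilde = Sigma_n[:, j] / np.sqrt(Sigma_n[j, j] + lam)
        post_max = (mu_n[None, :] + np.outer(Z, sigma_tilde)).max(axis=1)
        nu[j] = post_max.mean() - mu_n.max()
    return nu


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    grid = np.linspace(0.0, 1.0, 50).reshape(-1, 1)      # candidate test locations ("city")
    theta = np.exp(-30.0 * (grid[:, 0] - 0.7) ** 2)      # hypothetical true disease level
    lam = 0.01                                           # observation noise variance

    X_obs = grid[[5]]                                    # arbitrary first measurement
    y_obs = np.array([theta[5] + np.sqrt(lam) * rng.standard_normal()])

    for n in range(10):
        mu_n, Sigma_n = gp_posterior(X_obs, y_obs, lam, grid)
        nu = knowledge_gradient(mu_n, Sigma_n, lam)
        j = int(np.argmax(nu))                           # test where a measurement is most valuable
        y_new = theta[j] + np.sqrt(lam) * rng.standard_normal()
        X_obs = np.vstack([X_obs, grid[[j]]])
        y_obs = np.append(y_obs, y_new)

    mu_n, _ = gp_posterior(X_obs, y_obs, lam, grid)
    print("estimated location of the outbreak:", grid[int(np.argmax(mu_n)), 0])
```

Measuring where the knowledge gradient is largest automatically trades off sampling near the current best estimate against reducing uncertainty at unexplored locations, which is the exploration-versus-exploitation behavior described in the Introduction.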