Dynamic Batch Bayesian Optimization
Javad Azimi, Ali Jalali, Xiaoli Fern
Oregon State University / University of Texas at Austin
NIPS 2011 Workshop on Bayesian Optimization, Experimental Design and Bandits: Theory and Applications

Bayesian Optimization (BO)
- Goal: find the maximizer of an unknown function by requesting a small set of function evaluations (experiments); experiments are costly.
- BO assumes a prior over the unknown function and selects the next experiment based on the posterior.
- [Figure: the BO loop — Current Experiments → Gaussian Process Surface → Select Single/Multiple Experiment → Run Experiment(s)]
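As a rough illustration of the loop in the figure, here is a minimal sketch in Python; `fit_gp` and `select_next` are hypothetical placeholders for a concrete GP model and acquisition policy, not functions from the paper:

```python
import numpy as np

def bayesian_optimization(f, candidates, n_iter, fit_gp, select_next, n_init=3):
    """Generic BO loop: fit a GP to the data observed so far, pick the next
    experiment with a selection criterion, run it, and repeat.
    `fit_gp` and `select_next` are placeholders for a concrete GP model and
    acquisition policy (e.g. Expected Improvement)."""
    # Start from a few random experiments (the "current experiments").
    idx = np.random.choice(len(candidates), size=n_init, replace=False)
    X = [candidates[i] for i in idx]
    y = [f(x) for x in X]                           # running an experiment is costly
    for _ in range(n_iter):
        gp = fit_gp(np.array(X), np.array(y))       # posterior over the unknown function
        x_next = select_next(gp, candidates, max(y))  # single (or batch) selection
        X.append(x_next)
        y.append(f(x_next))                         # run the selected experiment
    best = int(np.argmax(y))
    return X[best], y[best]
```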

Traditional Approaches
- Sequential: only one experiment is selected at each iteration.
  - Pros: performance is optimized.
  - Cons: can be very costly when running one experiment takes a long time.
- Batch: k experiments are selected at each iteration.
  - Pros: k-times speed-up compared to sequential approaches.
  - Cons: cannot perform as well as sequential algorithms.

Batch Performance (Azimi et al., NIPS 2010)
- Given a sequential policy, it chooses a batch of samples that are likely to be selected by the sequential policy.
- [Figure: batch vs. sequential performance for batch sizes k = 5 and k = 10]

Motivation
- Given a sequential policy, is it possible to select a batch of experiments simultaneously while approximately preserving the sequential policy's performance?
- The size of the batch can change at each time step: a dynamic batch size.

Proposed Idea: Big Picture
- Based on a given prior (blue circles) and an objective function G, the first experiment x1 is selected.
- To select the next experiment x2, we need the outcome at x1, which is not yet available.
- The statistics of the samples inside the red circle are expected to change after observing the outcome at x1.
- Set the G values for all samples inside the red circle to their upper-bound value.
- If the next selected experiment falls outside the red circle, we claim it is independent of x1.
- [Figure: candidate points x1, x2, x3, with the red circle drawn around the selected point x1]

Problems
- Which samples' statistics change after selecting an experiment (or a set of experiments)?
- How can we upper bound the objective function G?

Gaussian Process (GP)
- A GP is used to model the posterior over the unobserved samples in BO.
- It gives a statistical prediction at each point, as a normal random variable, rather than a deterministic prediction.
- The posterior variance is independent of the observed outputs; it depends only on the input locations.
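To make the last point concrete, here is a small sketch of the standard GP predictive equations with an assumed squared exponential kernel (the kernel choice and hyperparameters are illustrative, not the paper's settings); note that the variance line never touches the observed outputs y:

```python
import numpy as np

def se_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared exponential kernel k(a, b) = s^2 * exp(-||a - b||^2 / (2 l^2))."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X, y, X_star, noise_var=1e-6):
    """Standard GP predictive mean and variance at the test points X_star."""
    K = se_kernel(X, X) + noise_var * np.eye(len(X))
    K_s = se_kernel(X, X_star)
    K_ss = se_kernel(X_star, X_star)
    K_inv = np.linalg.inv(K)
    mean = K_s.T @ K_inv @ y            # depends on the observed outputs y
    cov = K_ss - K_s.T @ K_inv @ K_s    # depends only on the inputs X, X_star
    return mean, np.diag(cov)
```

Because the predictive variance depends only on the input locations, the effect of a pending experiment on nearby points can be reasoned about before its outcome is observed, which is what the upper-bounding argument in the following slides exploits.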

Definition
- [Slide defines the notation: the unobserved set of points, their corresponding outputs, and the GP prediction at any point.]

GP Theorems

Expected Improvement (EI)
- Our algorithm takes as input a sequential policy to compete with. We choose Expected Improvement (EI) as the criterion; our approach extends to other policies.
- EI simply computes the expected improvement over the best observed value from sampling at each point.
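For reference, the standard closed-form EI under a Gaussian posterior, written as a small helper (mu and sigma would come from a GP posterior such as the sketch above, and y_best is the best observed value so far):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, y_best):
    """EI(x) = E[max(f(x) - y_best, 0)]
             = sigma * (z * Phi(z) + phi(z)), with z = (mu - y_best) / sigma."""
    sigma = np.maximum(sigma, 1e-12)   # avoid division by zero
    z = (mu - y_best) / sigma
    return sigma * (z * norm.cdf(z) + norm.pdf(z))
```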

Dynamic Batch
- Multiple samples can be asked at each iteration if the selected samples are independent of each other.
- The first selected sample is the same as in the sequential policy.
- The choice of the second point depends on the (unavailable) outcome of the first.
- Setting that outcome to its maximum possible value M, the EI of the next step is upper bounded.
- The next sample is selected in the same batch if it is not inside the red circle, i.e. not significantly affected by the first sample.

Dynamic Batch: Algorithm
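The algorithm box itself is not reproduced in this transcript. The following is only a rough sketch of how the rule described on the previous slide might be implemented, reusing the `gp_posterior` and `expected_improvement` helpers above; the ε-test on EI here is a stand-in for the paper's exact "red circle" criterion, so treat it as an illustration of the idea rather than the authors' procedure:

```python
import numpy as np

def select_dynamic_batch(gp_posterior_fn, X_obs, y_obs, candidates, M,
                         n_b=5, epsilon=0.02):
    """Greedily grow the batch while each new point's EI, computed with the
    pending points' outputs optimistically set to M (an upper bound on y),
    stays within epsilon of its EI under the truly observed data; stop as
    soon as that fails, giving a dynamic (variable-size) batch."""
    batch = []
    X_aug, y_aug = X_obs.copy(), y_obs.copy()
    y_best = np.max(y_obs)
    for _ in range(n_b):
        mu, var = gp_posterior_fn(X_aug, y_aug, candidates)
        ei = expected_improvement(mu, np.sqrt(np.maximum(var, 0.0)), y_best)
        idx = int(np.argmax(ei))
        x_next = candidates[idx]
        if batch:
            # EI of this point using only the truly observed data.
            mu0, var0 = gp_posterior_fn(X_obs, y_obs, x_next[None, :])
            ei0 = expected_improvement(mu0, np.sqrt(np.maximum(var0, 0.0)), y_best)[0]
            # If the pending (optimistic) points changed its EI too much,
            # wait for their real outcomes instead of growing the batch.
            if abs(ei[idx] - ei0) > epsilon:
                break
        batch.append(x_next)
        # Pretend the pending point returned the optimistic value M and
        # remove it from the candidate pool.
        X_aug = np.vstack([X_aug, x_next[None, :]])
        y_aug = np.append(y_aug, M)
        candidates = np.delete(candidates, idx, axis=0)
    return batch
```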

Experimental Results: Setting
- A GP with a squared exponential kernel is used as the model.
- We set n_l = 20 (total number of experiments) and n_b = 5 (maximum batch size).
- The average regret over 100 independent runs is reported, where the regret is the gap between the global optimum of the function and the best value observed so far.
- The speedup of each framework is reported, which is the percentage of experiments asked in batch mode.
- ε = 0.02 for the 2–3 dimensional benchmarks and 0.2 for the higher-dimensional ones.
- An alternative, more realistic approach is to set M = (1 + α) y_m, which corresponds to a (100·α)% improvement at each iteration.
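For reference, the two reported quantities can be computed along these lines (a sketch that assumes "experiments asked in batch mode" means experiments issued in iterations whose batch size is larger than one; the paper's exact bookkeeping may differ):

```python
import numpy as np

def regret(y_star, y_observed):
    """Gap between the global optimum y* and the best experiment found so far."""
    return y_star - np.max(y_observed)

def speedup(batch_sizes):
    """Percentage of experiments asked in batch mode, i.e. issued in
    iterations where more than one experiment was selected at once."""
    sizes = np.asarray(batch_sizes)
    return 100.0 * sizes[sizes > 1].sum() / sizes.sum()
```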

Experimental Results: Results

Experimental Results: Speedup vs. Budget

Conclusion and Future Work
Conclusion:
- The proposed dynamic batch approach selects a variable number of experiments at each step.
- The selected experiments are approximately independent of each other.
- The proposed approach approximately preserves the sequential performance.
Future work:
- Theoretical analysis of the distance between samples selected by the batch and sequential approaches.
- Analysis of how the choice of ε affects performance.