Published by Edwin Mabbitt. Modified over 2 years ago.

1
**Multi-armed Bandit Problem and Bayesian Optimization in Reinforcement Learning**

From the Cognitive Science and Machine Learning Summer School 2010

Loris Bazzani

2
**Outline of the Summer School**

4
**Outline of the Presentation**

- What are Machine Learning and Cognitive Science?
- How are they related to each other?
- Reinforcement Learning
  - Background
  - Discrete case
  - Continuous case

6
**What is Machine Learning (ML)?**

Endow computers with the ability to “learn” from “data”:

- Present data from sensors, the internet, experiments
- Expect the computer to make decisions

Traditionally categorized as:

- Supervised learning: classification, regression
- Unsupervised learning: dimensionality reduction, clustering
- Reinforcement learning: learning from feedback, planning

From N. Lawrence's slides.

7
**What is Cognitive Science (CogSci)?**

- How does the mind get so much out of so little?
  - Rich models of the world
  - Strong generalizations
- A process of reverse engineering the brain
  - Create computational models of the brain
- Much of cognition involves induction: finding patterns in data

From N. Chater's slides.

9
**Link between CogSci and ML**

- ML takes inspiration from psychology, CogSci, and computer science: Rosenblatt's Perceptron, neural networks, …
- CogSci uses ML as an engineering toolkit: Bayesian inference in generative models, hierarchical probabilistic models, approximate methods of learning and inference

10
**Human Learning and Machine Learning**

| Human learning | Machine learning |
|---|---|
| Categorization | Density estimation |
| Causal learning | Graphical models |
| Function learning | Regression |
| Representations | Nonparametric Bayes |
| Language | Probabilistic grammars |
| Experiment design | Inference algorithms |
| … | … |

12
… Behavioral psychology

15
**Multi-armed Bandit Problem [Auer et al. '95]**

I wanna win a lot of cash!

16
**Multi-armed Bandit Problem [Auer et al. '95]**

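- Trade-off between exploration and exploitation
- An adversary controls the payoffs
- No statistical assumptions on the reward distribution
- Performance measure: Regret = Best Reward − Player Reward, i.e. $G_{\max} - \mathbb{E}[G_A]$, where $G_{\max}$ is the total reward of the best single action and $G_A$ is the total reward of player $A$
- Result: an upper bound on the expected regret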
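
The regret can be made concrete for one fixed sequence of payoffs. A minimal sketch, assuming the adversary's rewards are given as a table indexed by trial and action (the names `regret`, `rewards`, and `choices` are hypothetical): it computes the realized regret of one run, whereas the bound on the slide concerns the expected regret, because the player randomizes.

```python
from typing import Sequence


def regret(rewards: Sequence[Sequence[float]], choices: Sequence[int]) -> float:
    """Best single action's total reward minus the player's total reward.

    rewards[t][i] is the payoff the adversary assigned to action i at trial t;
    choices[t] is the index of the action the player actually played.
    """
    num_actions = len(rewards[0])
    # Total reward each fixed action would have collected over all trials.
    totals = [sum(row[i] for row in rewards) for i in range(num_actions)]
    player_reward = sum(row[a] for row, a in zip(rewards, choices))
    return max(totals) - player_reward


# Two actions, three trials: action 1 is best in hindsight with total 2.0.
rewards = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
print(regret(rewards, [0, 0, 0]))  # player collects 1.0, so regret is 1.0
print(regret(rewards, [0, 1, 1]))  # player collects 3.0, so regret is -1.0
```

A single run can even have negative regret, since the comparison is against the best *fixed* action, not the best switching sequence.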

17
**Multi-armed Bandit Problem [Auer et al. '95]**

*Figure: actions, reward(s), and the sequence of trials.*

Goal: at each trial, define a probability distribution over the actions.

18
**The Full Information Game [Freund & Schapire '95]**

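Regret bound: $O\left(\sqrt{T \ln K}\right)$ for $K$ actions, $T$ trials, and rewards in $[0, 1]$.

Problem: the algorithm needs the reward of every action at each trial, not only of the action that was played!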
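
A minimal sketch of the full-information game, assuming an exponential-weights update written for rewards in $[0, 1]$ with a learning rate `eta`. Freund & Schapire state Hedge in terms of losses with a parameter $\beta$; the reward form, the `eta` tuning, and the names `hedge` and `reward_fn` are assumptions of this sketch.

```python
import math
import random
from typing import Callable, List


def hedge(reward_fn: Callable[[int], List[float]], num_actions: int,
          num_trials: int, eta: float, seed: int = 0) -> float:
    """Full-information exponential weights for rewards in [0, 1].

    reward_fn(t) returns the rewards of *all* actions at trial t, which is
    exactly the information the bandit setting no longer provides.
    """
    rng = random.Random(seed)
    weights = [1.0] * num_actions
    total = 0.0
    for t in range(num_trials):
        norm = sum(weights)
        probs = [w / norm for w in weights]
        action = rng.choices(range(num_actions), weights=probs)[0]
        rewards = reward_fn(t)
        total += rewards[action]
        # Every weight is updated, since every action's reward is observed.
        weights = [w * math.exp(eta * r) for w, r in zip(weights, rewards)]
        top = max(weights)
        weights = [w / top for w in weights]  # rescale to avoid overflow
    return total


K, T = 3, 1000
eta = math.sqrt(8 * math.log(K) / T)  # assumed tuning of order sqrt(ln K / T)
print(hedge(lambda t: [0.0, 0.0, 1.0], K, T, eta))
```

The line `weights = [...]` touching all actions is the step that breaks down when only the played action's reward is revealed.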

19
**The Partial Information Game**

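Exp3 = Exponential-weight algorithm for Exploration and Exploitation

- Bound: for suitable values of $\gamma$ and of $g \ge G_{\max}$, an upper bound on the total reward $G_{\max}$ of the best action, namely $\gamma = \min\left\{1, \sqrt{\frac{K \ln K}{(e-1)g}}\right\}$: $\;G_{\max} - \mathbb{E}[G_{\text{Exp3}}] \le 2\sqrt{e-1}\,\sqrt{gK\ln K} \le 2.63\sqrt{gK\ln K}$
- Tries out all the possible actions: with probability $\gamma$ it explores uniformly
- Updates only the weight of the selected action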
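
A minimal Exp3 sketch following the description on the slide: exponential weights mixed with uniform exploration, and an update of the selected action only. The interface `pull(t, i)`, the helper `exp3_gamma`, and the weight-rescaling step are assumptions of this sketch; the $\gamma$ formula is the tuning under which the bound stated on the slide holds.

```python
import math
import random
from typing import Callable


def exp3_gamma(num_actions: int, g: float) -> float:
    """gamma = min(1, sqrt(K ln K / ((e - 1) g))) for a known bound g >= G_max."""
    k = num_actions
    return min(1.0, math.sqrt(k * math.log(k) / ((math.e - 1) * g)))


def exp3(pull: Callable[[int, int], float], num_actions: int,
         num_trials: int, gamma: float, seed: int = 0) -> float:
    """Exp3 sketch: pull(t, i) reveals only the reward in [0, 1] of action i."""
    rng = random.Random(seed)
    k = num_actions
    weights = [1.0] * k
    total = 0.0
    for t in range(num_trials):
        norm = sum(weights)
        # Exponential weights mixed with uniform exploration: p_i >= gamma / K.
        probs = [(1 - gamma) * w / norm + gamma / k for w in weights]
        action = rng.choices(range(k), weights=probs)[0]
        reward = pull(t, action)
        total += reward
        # Importance weighting: reward / p is an unbiased estimate of the reward.
        estimate = reward / probs[action]
        # Only the selected action's weight changes.
        weights[action] *= math.exp(gamma * estimate / k)
        top = max(weights)
        weights = [w / top for w in weights]  # rescale to avoid overflow
    return total


K, T = 3, 2000
gamma = exp3_gamma(K, g=T)  # rewards lie in [0, 1], so G_max <= T
print(exp3(lambda t, i: 1.0 if i == 2 else 0.0, K, T, gamma))
```

Dividing by `probs[action]` compensates for the actions that are rarely tried, which is what lets the algorithm work without ever seeing the other rewards.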

20
**The Partial Information Game**

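Exp3.1 = Exp3 run in rounds, where a round consists of a sequence of trials

- Each round guesses a bound for the total reward $G_{\max}$ of the best action and restarts Exp3 with a matching $\gamma$
- Bound, without knowing $G_{\max}$ in advance: $G_{\max} - \mathbb{E}[G_{\text{Exp3.1}}] = O\left(\sqrt{G_{\max} K \ln K} + K \ln K\right)$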
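
A sketch of the round structure, assuming the guess for round $r$ is $g_r = \frac{K \ln K}{e-1} 4^r$, so that $\gamma_r = \min(1, 2^{-r})$, and that a round ends once the largest estimated total reward exceeds $g_r - K/\gamma_r$. These constants are assumptions of the sketch and should be checked against Auer et al.; the function name `exp3_1` is hypothetical, and it requires $K \ge 2$.

```python
import math
import random
from typing import Callable


def exp3_1(pull: Callable[[int, int], float], num_actions: int,
           num_trials: int, seed: int = 0) -> float:
    """Exp3 restarted in rounds; round r guesses g_r for the best total reward."""
    rng = random.Random(seed)
    k = num_actions  # assumes k >= 2, so that ln k > 0
    base = k * math.log(k) / (math.e - 1)
    est_totals = [0.0] * k  # importance-weighted reward estimates, kept across rounds
    total, t, r = 0.0, 0, 0
    while t < num_trials:
        g_r = base * 4 ** r                      # the guess grows 4x per round
        gamma = min(1.0, math.sqrt(base / g_r))  # equals min(1, 2 ** -r)
        weights = [1.0] * k                      # restart Exp3
        while t < num_trials and max(est_totals) <= g_r - k / gamma:
            norm = sum(weights)
            probs = [(1 - gamma) * w / norm + gamma / k for w in weights]
            a = rng.choices(range(k), weights=probs)[0]
            x = pull(t, a)
            total += x
            est = x / probs[a]
            est_totals[a] += est
            weights[a] *= math.exp(gamma * est / k)
            top = max(weights)
            weights = [w / top for w in weights]  # rescale to avoid overflow
            t += 1
        r += 1  # the guess was exceeded (or never usable): move to the next round
    return total


print(exp3_1(lambda t, i: 1.0 if i == 2 else 0.0, num_actions=3, num_trials=3000))
```

Growing the guess geometrically is what removes the need to know $G_{\max}$ up front; each restart costs some relearning, which appears in the additive terms of the bound.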

21
**The Partial Information Game with experts**

- The player has a set of strategies for choosing the best action
- External advice is given by “experts”
- Goal: combine them in such a way that the player's total reward is close to that of the best expert
- An expert can be seen as a “meta-action” in a higher-level bandit problem

22
**Multi-armed Bandit Problem with experts [Auer et al. '95]**

*Figure: actions, reward(s), and the sequence of trials, with expert advice.*

23
**The Partial Information Game with experts**

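Exp4 = Exponential-weight algorithm for Exploration and Exploitation using Expert advice

- Acts on the distribution over experts instead of over actions
- Otherwise the same as Exp3: its regret bound has the same form, with $\ln K$ replaced by $\ln N$ for $N$ experts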
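
A minimal Exp4 sketch, assuming each expert's advice at trial $t$ is a probability vector over the $K$ actions. The weights now live on the experts; each expert is credited with the importance-weighted reward of its own advice. The interfaces `advice(t)` and `pull(t, i)` and the function name are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence


def exp4(advice: Callable[[int], Sequence[Sequence[float]]],
         pull: Callable[[int, int], float], num_actions: int,
         num_experts: int, num_trials: int, gamma: float,
         seed: int = 0) -> List[float]:
    """Exp4 sketch. advice(t)[j] is expert j's probability vector over actions.

    Returns the final normalized weights over the experts.
    """
    rng = random.Random(seed)
    k = num_actions
    weights = [1.0] * num_experts
    for t in range(num_trials):
        xi = advice(t)
        norm = sum(weights)
        # Weighted mixture of the experts' advice plus uniform exploration.
        probs = [(1 - gamma) * sum(w * e[i] for w, e in zip(weights, xi)) / norm
                 + gamma / k for i in range(k)]
        action = rng.choices(range(k), weights=probs)[0]
        # Importance-weighted reward estimate for the pulled action only.
        x_hat = pull(t, action) / probs[action]
        # Expert j is credited with e[action] * x_hat, its advice's estimated reward.
        weights = [w * math.exp(gamma * e[action] * x_hat / k)
                   for w, e in zip(weights, xi)]
        top = max(weights)
        weights = [w / top for w in weights]  # rescale to avoid overflow
    total = sum(weights)
    return [w / total for w in weights]


# Expert 0 always recommends action 0, expert 1 action 2, expert 2 is uniform.
ADVICE = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [1 / 3, 1 / 3, 1 / 3]]
final = exp4(lambda t: ADVICE, lambda t, i: 1.0 if i == 2 else 0.0,
             num_actions=3, num_experts=3, num_trials=2000, gamma=0.1)
print(final)
```

With a single expert per action that always recommends it, this reduces to the Exp3 update, which is the sense in which Exp4 is "the same as Exp3" one level up.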

24
**Applications [Hedge]**

25
**Applications [Hedge] [Bazzani et al. '10]**

27
**Bayesian Optimization [Brochu et al. '10]**

Optimize a nonlinear function over a set $\mathcal{A}$, $\max_{\mathbf{x} \in \mathcal{A}} f(\mathbf{x})$, where $f$ is the function that gives the reward of each action $\mathbf{x}$.

| Classic optimization tools | Bayesian optimization tools |
|---|---|
| Known mathematical representation | No closed-form expression |
| Convex | Not convex |
| Evaluation of the function at all points | Evaluation of the function at only one point at a time, which returns a noisy response |

Bayesian optimization is a powerful strategy for finding the extrema of objective functions that are expensive to evaluate. It is applicable in situations where one does not have a closed-form expression for the objective function, but where one can obtain observations (possibly noisy) of this function at sampled values. It is particularly useful when these evaluations are costly, when one does not have access to derivatives, or when the problem at hand is non-convex.
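
To make the one-evaluation-at-a-time loop concrete, here is a minimal 1-D sketch, assuming a zero-mean GP prior with a squared exponential kernel, expected improvement as the acquisition function, and a dense grid instead of DIRECT to maximize it. The objective, the length scale $\theta = 0.2$, the grid, and all function names are assumptions for illustration.

```python
import math

import numpy as np


def se_kernel(a: np.ndarray, b: np.ndarray, theta: float = 0.2) -> np.ndarray:
    """Squared exponential kernel exp(-(a - b)^2 / (2 theta^2)) for 1-D inputs."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * theta ** 2))


_norm_cdf = np.vectorize(lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0))))


def gp_posterior(x_obs, y_obs, x_query, noise_var):
    """Zero-mean GP posterior mean and standard deviation at x_query."""
    K = se_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
    k = se_kernel(x_obs, x_query)
    mu = k.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(k * np.linalg.solve(K, k), axis=0)
    return mu, np.sqrt(np.maximum(var, 1e-12))


def expected_improvement(mu, sigma, best, xi=0.01):
    z = (mu - best - xi) / sigma
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2 * math.pi)
    return (mu - best - xi) * _norm_cdf(z) + sigma * pdf


def bayes_opt(f, num_iters=15, noise_std=0.0, seed=0):
    """Maximize f on [0, 1] with one (possibly noisy) evaluation per iteration."""
    rng = np.random.default_rng(seed)
    noise_var = max(noise_std ** 2, 1e-6)  # small jitter keeps K invertible
    grid = np.linspace(0.0, 1.0, 501)      # dense grid in place of DIRECT
    x_obs = np.array([0.0, 1.0])
    y_obs = np.array([f(x) + noise_std * rng.standard_normal() for x in x_obs])
    for _ in range(num_iters):
        mu, sigma = gp_posterior(x_obs, y_obs, grid, noise_var)
        x_next = grid[np.argmax(expected_improvement(mu, sigma, y_obs.max()))]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, f(x_next) + noise_std * rng.standard_normal())
    best = int(np.argmax(y_obs))
    return x_obs[best], y_obs[best]


best_x, best_y = bayes_opt(lambda x: -(x - 0.3) ** 2)
print(best_x, best_y)
```

Each iteration spends one evaluation of `f`; all the remaining work (posterior and acquisition) is cheap, which is the trade the method is designed for.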

28
**Bayesian Optimization [Brochu et al. '10]**

Uses Bayes' theorem:

$$P(f \mid \mathcal{D}_{1:t}) \propto P(\mathcal{D}_{1:t} \mid f)\, P(f)$$

where $\mathcal{D}_{1:t} = \{(\mathbf{x}_i, y_i)\}_{i=1}^{t}$ are the observations collected so far, and:

- Posterior $P(f \mid \mathcal{D}_{1:t})$: our updated beliefs about the unknown objective function
- Prior $P(f)$: our beliefs about the space of possible objective functions
- Likelihood $P(\mathcal{D}_{1:t} \mid f)$: given what we think we know about the prior, how likely is the data we have seen?

Bayes' theorem states (simplifying somewhat) that the posterior probability of a model (or theory, or hypothesis) M given evidence (or data, or observations) E is proportional to the likelihood of E given M multiplied by the prior probability of M. Although the cost function is unknown, it is reasonable to assume that there is prior knowledge about some of its properties, such as smoothness, and this makes some possible objective functions more plausible than others.

Goal: maximize the posterior at each step, so that each new evaluation decreases the distance between the true global maximum and the expected maximum given the model.
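
As a toy illustration of the update, assume a finite set of three hypothetical candidate objective functions, a uniform prior, and Gaussian observation noise. In actual Bayesian optimization a GP prior replaces this finite set, but the posterior-proportional-to-likelihood-times-prior step is the same.

```python
import math
from typing import Callable, List, Sequence, Tuple


def posterior(candidates: Sequence[Callable[[float], float]],
              prior: Sequence[float],
              data: Sequence[Tuple[float, float]],
              noise_std: float) -> List[float]:
    """P(f | D) proportional to P(D | f) P(f), over a finite set of candidate functions."""
    log_post = []
    for f, p in zip(candidates, prior):
        # Gaussian log-likelihood of y = f(x) + noise, up to a constant shared by all f.
        log_lik = sum(-0.5 * ((y - f(x)) / noise_std) ** 2 for x, y in data)
        log_post.append(log_lik + math.log(p))
    top = max(log_post)  # subtract the max before exponentiating, for stability
    unnorm = [math.exp(lp - top) for lp in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]


candidates = [lambda x: x, lambda x: -x, lambda x: x * x]
prior = [1 / 3, 1 / 3, 1 / 3]
data = [(1.0, 1.1), (2.0, 2.0)]
print(posterior(candidates, prior, data, noise_std=0.5))
```

Two noisy observations are already enough to make $f(x) = x$ dominate; a non-uniform `prior` (for instance, favoring smoother candidates) would shift the result exactly as the slide describes.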

29
**Bayesian Optimization [Brochu et al. '10]**

30
**Priors over Functions**

Convergence conditions of BO:

- The acquisition function is continuous and approximately minimizes the risk
- The conditional variance converges to zero
- The objective is continuous
- The prior is homogeneous
- The optimization is independent of the m-th differences

These conditions are guaranteed by Gaussian processes (GPs).

- Risk: the expected deviation from the global minimum at a fixed point $\mathbf{x}$
- Conditional variance: the variance of $f$ given $\mathcal{D}_{1:t}$
- Homogeneous: $S(t\mathbf{x}) = t^m S(\mathbf{x})$ for all $t > 0$
- m-th differences: the optimization is independent of the time $t$ for the same $m$

Many models could be used for this prior (e.g., the Wiener process), but Gaussian processes satisfy additional conditions that lead to stronger convergence.

31
**Priors over Functions**

- A GP is an extension of the multivariate Gaussian distribution to an infinite-dimensional stochastic process
- Any finite linear combination of samples is normally distributed
- It is defined by its mean function and covariance function: $f(\mathbf{x}) \sim \mathcal{GP}\left(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')\right)$
- Focus on defining the covariance function

Stochastic process: instead of dealing with only one possible reality of how the process might evolve over time (as is the case, for example, for solutions of an ordinary differential equation), in a stochastic or random process there is some indeterminacy in its future evolution, described by probability distributions.
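
Restricted to finitely many inputs, a GP is just a multivariate Gaussian, so drawing "random functions" from the prior reduces to one call to a multivariate normal sampler. A minimal sketch, assuming a zero mean, a squared exponential covariance with length scale 0.2, and a small diagonal jitter for numerical stability (all choices, and the name `sample_gp_prior`, are illustrative).

```python
import numpy as np


def sample_gp_prior(x, mean_fn, cov_fn, num_samples=3, jitter=1e-8, seed=0):
    """Draw f(x) ~ GP(m, k) on a finite set of inputs x.

    On finitely many inputs, the GP is a multivariate Gaussian with mean
    vector m(x) and covariance matrix K[i, j] = k(x_i, x_j).
    """
    mean = mean_fn(x)
    cov = cov_fn(x[:, None], x[None, :]) + jitter * np.eye(len(x))
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=num_samples)


x = np.linspace(0.0, 1.0, 50)
samples = sample_gp_prior(
    x,
    mean_fn=np.zeros_like,
    cov_fn=lambda a, b: np.exp(-((a - b) ** 2) / (2 * 0.2 ** 2)),
)
print(samples.shape)  # (3, 50): three random functions evaluated at 50 inputs
```

Swapping `cov_fn` changes the character of the sampled functions (smoothness, length scale), which is why the slide says to focus on the covariance function.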

32
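**Why use GPs?**

Assume a zero-mean GP. The function values are drawn according to

$$\mathbf{f}_{1:t} \sim \mathcal{N}(\mathbf{0}, \mathbf{K}), \quad \text{where } \mathbf{K} = \begin{bmatrix} k(\mathbf{x}_1, \mathbf{x}_1) & \cdots & k(\mathbf{x}_1, \mathbf{x}_t) \\ \vdots & \ddots & \vdots \\ k(\mathbf{x}_t, \mathbf{x}_1) & \cdots & k(\mathbf{x}_t, \mathbf{x}_t) \end{bmatrix}.$$

When a new observation $\mathbf{x}_{t+1}$ comes:

$$\begin{bmatrix} \mathbf{f}_{1:t} \\ f_{t+1} \end{bmatrix} \sim \mathcal{N}\left(\mathbf{0}, \begin{bmatrix} \mathbf{K} & \mathbf{k} \\ \mathbf{k}^\top & k(\mathbf{x}_{t+1}, \mathbf{x}_{t+1}) \end{bmatrix}\right), \quad \mathbf{k} = \begin{bmatrix} k(\mathbf{x}_{t+1}, \mathbf{x}_1) & \cdots & k(\mathbf{x}_{t+1}, \mathbf{x}_t) \end{bmatrix}^\top.$$

Using the Sherman-Morrison-Woodbury formula, given the observations $\mathcal{D}_{1:t}$:

$$P(f_{t+1} \mid \mathcal{D}_{1:t}, \mathbf{x}_{t+1}) = \mathcal{N}\left(\mu_t(\mathbf{x}_{t+1}), \sigma_t^2(\mathbf{x}_{t+1})\right),$$

$$\mu_t(\mathbf{x}_{t+1}) = \mathbf{k}^\top \mathbf{K}^{-1} \mathbf{f}_{1:t}, \qquad \sigma_t^2(\mathbf{x}_{t+1}) = k(\mathbf{x}_{t+1}, \mathbf{x}_{t+1}) - \mathbf{k}^\top \mathbf{K}^{-1} \mathbf{k}.$$

Message: GP predictions are easy to compute.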
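
The predictive equations for $\mu_t$ and $\sigma_t^2$ translate almost line by line into code. A minimal sketch, assuming 1-D inputs, a squared exponential kernel (so $k(\mathbf{x}, \mathbf{x}) = 1$), and a tiny jitter on the diagonal; it solves with a Cholesky factor of $\mathbf{K}$ rather than performing the incremental Sherman-Morrison-Woodbury update, a substitution made for brevity. The names `se_kernel` and `gp_predict` are hypothetical.

```python
import numpy as np


def se_kernel(a, b, theta=0.2):
    """Squared exponential kernel between 1-D input arrays a and b."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2 * theta ** 2))


def gp_predict(x_obs, f_obs, x_new, theta=0.2, jitter=1e-10):
    """Noise-free zero-mean GP: predictive mean and variance at each x_new."""
    K = se_kernel(x_obs, x_obs, theta) + jitter * np.eye(len(x_obs))
    k = se_kernel(x_obs, x_new, theta)           # column j is the vector k for x_new[j]
    L = np.linalg.cholesky(K)                    # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, f_obs))  # K^{-1} f_{1:t}
    v = np.linalg.solve(L, k)                    # so that v^T v = k^T K^{-1} k
    mu = k.T @ alpha                             # k^T K^{-1} f_{1:t}
    var = 1.0 - np.sum(v ** 2, axis=0)           # k(x, x) - k^T K^{-1} k
    return mu, var


x_obs = np.array([0.1, 0.5, 0.9])
f_obs = np.sin(2 * np.pi * x_obs)
mu, var = gp_predict(x_obs, f_obs, np.array([0.1, 0.3, 3.0]))
print(mu, var)
```

At an observed input the mean reproduces the observation and the variance collapses to (almost) zero; far from all data the prediction falls back to the prior, mean 0 and variance 1.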

33
**Choice of Covariance Functions**

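Isotropic models with hyperparameter $\theta$.

Squared exponential kernel:

$$k(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{1}{2\theta^2}\|\mathbf{x}_i - \mathbf{x}_j\|^2\right)$$

Matérn kernel:

$$k(\mathbf{x}_i, \mathbf{x}_j) = \frac{1}{2^{\varsigma-1}\Gamma(\varsigma)}\left(2\sqrt{\varsigma}\,\|\mathbf{x}_i - \mathbf{x}_j\|\right)^{\varsigma} H_{\varsigma}\left(2\sqrt{\varsigma}\,\|\mathbf{x}_i - \mathbf{x}_j\|\right)$$

where $\Gamma(\cdot)$ is the Gamma function and $H_\varsigma(\cdot)$ is the (modified, second-kind) Bessel function of order $\varsigma$.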
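
Both kernels can be written as functions of the distance $r = \|\mathbf{x}_i - \mathbf{x}_j\|$. A minimal sketch, assuming SciPy's `scipy.special.kv` (modified Bessel function of the second kind) for $H_\varsigma$ and `scipy.special.gamma` for $\Gamma$; the parameter `nu` plays the role of $\varsigma$, and the default `nu=2.5` is an illustrative choice.

```python
import numpy as np
from scipy.special import gamma, kv


def squared_exponential(r, theta=1.0):
    """exp(-r^2 / (2 theta^2)), with r = ||x_i - x_j||."""
    return np.exp(-(np.asarray(r, dtype=float) ** 2) / (2 * theta ** 2))


def matern(r, nu=2.5):
    """Matérn kernel in the parameterization above, with r = ||x_i - x_j||."""
    r = np.asarray(r, dtype=float)
    z = 2.0 * np.sqrt(nu) * r
    with np.errstate(invalid="ignore"):
        val = (z ** nu) * kv(nu, z) / (2 ** (nu - 1) * gamma(nu))
    # At r = 0 the formula is 0 * inf; its limit is 1, the kernel's maximum.
    return np.where(r == 0.0, 1.0, val)


print(matern(np.array([0.0, 0.5, 1.0])))
print(squared_exponential(np.array([0.0, 0.5, 1.0])))
```

As a sanity check, for $\varsigma = 1/2$ the Bessel term has a closed form and the kernel reduces to $\exp(-\sqrt{2}\,r)$, a rough (non-differentiable) process; larger $\varsigma$ gives smoother functions, approaching the squared exponential.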

34
**Acquisition Functions**

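The role of the acquisition function is to guide the search for the optimum.

- Assumption: optimizing the acquisition function is simple and cheap
- Goal: high acquisition corresponds to potentially high values of the objective function, because the prediction is high, the uncertainty is great, or both
- Exploration and exploitation trade-off
- DIRECT (DIviding RECTangles, which divides the feasible space into finer rectangles) is used to optimize the acquisition function

Maximizing the probability of improvement:

$$\mathrm{PI}(\mathbf{x}) = P\left(f(\mathbf{x}) \ge f(\mathbf{x}^+) + \xi\right) = \Phi\left(\frac{\mu(\mathbf{x}) - f(\mathbf{x}^+) - \xi}{\sigma(\mathbf{x})}\right)$$

where $\mathbf{x}^+ = \arg\max_{\mathbf{x}_i \in \mathbf{x}_{1:t}} f(\mathbf{x}_i)$ is the best point observed so far, $\xi \ge 0$ trades off exploration and exploitation, and $\Phi$ is the CDF of the standard normal distribution.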
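
Given the GP posterior mean and standard deviation at a candidate point, the probability of improvement is one normal-CDF evaluation. A minimal sketch using only the standard library; the names are hypothetical, `xi` defaults to 0 as an illustrative choice, and returning 0 or 1 when $\sigma = 0$ is an assumption for the degenerate case.

```python
import math


def norm_cdf(z: float) -> float:
    """CDF of the standard normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mu: float, sigma: float, f_best: float,
                               xi: float = 0.0) -> float:
    """PI(x) = Phi((mu(x) - f(x+) - xi) / sigma(x)) for a GP posterior at x."""
    if sigma <= 0.0:
        # No uncertainty left: improvement is either certain or impossible.
        return 1.0 if mu > f_best + xi else 0.0
    return norm_cdf((mu - f_best - xi) / sigma)


# Same mean as the incumbent: PI is 0.5; a larger xi demands a bigger improvement.
print(probability_of_improvement(1.0, 1.0, f_best=1.0))  # 0.5
print(probability_of_improvement(1.0, 1.0, f_best=1.0, xi=0.5))
```

PI only counts *whether* a point improves, not by how much, so with $\xi = 0$ it tends to favor tiny, near-certain improvements close to $\mathbf{x}^+$; raising $\xi$ pushes it toward exploration.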

35
**Acquisition Functions**

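Expected improvement:

$$\mathrm{EI}(\mathbf{x}) = \begin{cases} \left(\mu(\mathbf{x}) - f(\mathbf{x}^+) - \xi\right)\Phi(Z) + \sigma(\mathbf{x})\,\phi(Z) & \text{if } \sigma(\mathbf{x}) > 0 \\ 0 & \text{if } \sigma(\mathbf{x}) = 0 \end{cases}, \qquad Z = \frac{\mu(\mathbf{x}) - f(\mathbf{x}^+) - \xi}{\sigma(\mathbf{x})}$$

Confidence bound criterion:

$$\mathrm{UCB}(\mathbf{x}) = \mu(\mathbf{x}) + \kappa\,\sigma(\mathbf{x}), \qquad \kappa \ge 0$$

where $\Phi(\cdot)$ and $\phi(\cdot)$ are the CDF and PDF of the standard normal distribution, $\mathbf{x}^+$ is the best point observed so far, and $\xi \ge 0$ and $\kappa$ control the exploration and exploitation trade-off.

With several different parameterized acquisition functions in the literature, it is often unclear which one to use.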
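
Both criteria are closed-form in the posterior mean and standard deviation. A minimal standard-library sketch; the function names are hypothetical, and the defaults $\xi = 0$ and $\kappa = 2$ are illustrative choices, not values from the slides.

```python
import math


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu: float, sigma: float, f_best: float,
                         xi: float = 0.0) -> float:
    """EI(x) = (mu - f(x+) - xi) Phi(Z) + sigma phi(Z), and 0 when sigma = 0."""
    if sigma <= 0.0:
        return 0.0
    improvement = mu - f_best - xi
    z = improvement / sigma
    return improvement * norm_cdf(z) + sigma * norm_pdf(z)


def upper_confidence_bound(mu: float, sigma: float, kappa: float = 2.0) -> float:
    """UCB(x) = mu(x) + kappa * sigma(x); a larger kappa favors exploration."""
    return mu + kappa * sigma


# Two candidates with the same mean: EI and UCB both prefer the more uncertain one.
print(expected_improvement(0.0, 0.1, f_best=0.0) < expected_improvement(0.0, 1.0, f_best=0.0))  # True
print(upper_confidence_bound(0.0, 0.1), upper_confidence_bound(0.0, 1.0))  # 0.2 2.0
```

Unlike PI, EI weighs *how much* a point might improve, so the $\sigma\,\phi(Z)$ term rewards uncertainty even when the mean is no better than the incumbent.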

36
**Applications [BO]**

- Learn a set of robot gait parameters that maximize the velocity of a Sony AIBO ERS-7 robot
- Find a policy for robot path planning that minimizes uncertainty about the robot's location and heading
- Select the locations of a set of sensors (e.g., cameras) in a dynamic system

37
**Take-home Message**

- ML and CogSci are connected
- Reinforcement learning is useful for optimization when dealing with temporal information
  - Discrete case: multi-armed bandit problem
  - Continuous case: Bayesian optimization
- We can employ these techniques for computer vision and system control problems

38
**[Abbeel et al. 2007] http://heli.stanford.edu/**

The aerobatic maneuvers are known (given by the user). The goal is for the helicopter to learn to set the right velocity and acceleration parameters to perform them.

39
**Some References**

- P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. Gambling in a rigged casino: The adversarial multi-armed bandit problem. FOCS '95.
- Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. EuroCOLT '95.
- E. Brochu, V. Cora, and N. de Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. Technical Report, UBC.
- L. Bazzani, N. de Freitas, and J.-A. Ting. Learning attentional mechanisms for simultaneous object tracking and recognition with deep networks. NIPS 2010 Deep Learning and Unsupervised Feature Learning Workshop.
- C. E. Rasmussen and C. K. I. Williams. Gaussian Processes for Machine Learning. The MIT Press.
- P. Abbeel, A. Coates, M. Quigley, and A. Y. Ng. An application of reinforcement learning to aerobatic helicopter flight. NIPS 2007.
