
1 Bayesian Learning & Estimation Theory

2 Maximum likelihood estimation. Likelihood of the data: L = P(D|θ) = ∏_n P(x_n|θ); the ML estimate is θ_ML = argmax_θ L. Example: for a Gaussian likelihood P(x|θ) = N(x | μ, σ²), the ML estimates of μ and σ² are the sample mean and the sample variance. Objective of regression: minimize the error E(w) = ½ Σ_n ( t_n − y(x_n, w) )²
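To make the slide concrete, here is a minimal NumPy sketch (not from the slides) of ML estimation in the Gaussian case; the synthetic data and seed are assumptions for illustration, and the closed-form answers are just the sample mean and the (biased) sample variance.

```python
import numpy as np

# Minimal sketch of ML estimation for a Gaussian (illustration only).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=1000)   # synthetic data, an assumption

# Maximizing L = sum_n ln N(x_n | mu, sigma^2) gives closed-form estimates:
mu_ml = x.mean()                      # ML estimate of the mean
var_ml = ((x - mu_ml) ** 2).mean()    # ML (biased) estimate of the variance

print(mu_ml, var_ml)
```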

3 A probabilistic view of linear regression. Model each target as Gaussian noise around the regression function: p(t_n | x_n, w) = N( t_n | y(x_n, w), β⁻¹ ), where the precision β = 1/σ². Compare to the error function E(w) = ½ Σ_n ( t_n − y(x_n, w) )². Since argmin_w E(w) = argmax_w p(t|x, w), regression is equivalent to ML estimation of w
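A short sketch of this equivalence on synthetic data (the cubic model, noise level, and data are assumptions): the least-squares solution is exactly the w that maximizes the Gaussian log-likelihood, since the precision β only rescales and shifts the objective.

```python
import numpy as np

# Sketch: least-squares regression as Gaussian ML (synthetic data).
rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)

# Design matrix for a cubic polynomial y(x, w) = w0 + w1 x + w2 x^2 + w3 x^3
Phi = np.vander(x, 4, increasing=True)

# argmin_w E(w) = argmin_w 0.5 * sum_n (t_n - y(x_n, w))^2  (least squares)
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)

# The same w maximizes the Gaussian log-likelihood with precision beta = 1/sigma^2;
# beta only rescales/shifts the objective, so it does not change the argmax over w.
residuals = t - Phi @ w_ml
E = 0.5 * np.sum(residuals ** 2)
print(w_ml, E)
```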

4 Bayesian learning. View the data D and the parameter θ as random variables (for regression, D = (x, t) and θ = w). The data induces a distribution over the parameter: P(θ|D) = P(D, θ) / P(D) ∝ P(D, θ). Substituting P(D, θ) = P(D|θ) P(θ), we obtain Bayes' theorem: P(θ|D) ∝ P(D|θ) P(θ), i.e. Posterior ∝ Likelihood × Prior
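A minimal sketch of this relation for a one-dimensional parameter θ (the mean of a Gaussian with known unit variance): the posterior is evaluated on a grid as likelihood times prior, then normalized by P(D). The data, grid, and prior here are assumptions for illustration.

```python
import numpy as np

# Sketch: grid posterior over theta, illustrating P(theta|D) ∝ P(D|theta) P(theta).
rng = np.random.default_rng(2)
D = rng.normal(loc=1.5, scale=1.0, size=20)           # observed data (assumed)

theta = np.linspace(-3, 5, 400)                       # grid over the parameter
log_prior = -0.5 * theta ** 2                         # N(0, 1) prior, up to a constant
log_lik = np.array([-0.5 * np.sum((D - th) ** 2) for th in theta])  # known sigma = 1

log_post = log_lik + log_prior
post = np.exp(log_post - log_post.max())
post /= post.sum() * (theta[1] - theta[0])            # normalise: divide by P(D)
print(theta[np.argmax(post)])                         # MAP estimate on the grid
```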

5 Bayesian prediction. Predictions (e.g., predict t from x using data D) are mediated through the parameter: P(prediction|D) = ∫ P(prediction|θ) P(θ|D) dθ. Maximum a posteriori (MAP) estimation: θ_MAP = argmax_θ P(θ|D), giving the plug-in approximation P(prediction|D) ≈ P(prediction|θ_MAP) –Accurate when P(θ|D) is concentrated on θ_MAP
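Continuing the grid example (data, grid, and prior are again illustrative assumptions), the sketch below compares the full predictive integral with the MAP plug-in approximation.

```python
import numpy as np

# Sketch: full Bayesian prediction vs. MAP plug-in, on a grid posterior over theta
# (the mean of a Gaussian with known unit variance).
rng = np.random.default_rng(9)
D = rng.normal(loc=1.5, scale=1.0, size=20)

theta = np.linspace(-3, 5, 400)
dtheta = theta[1] - theta[0]
log_post = -0.5 * theta ** 2 + np.array([-0.5 * np.sum((D - th) ** 2) for th in theta])
post = np.exp(log_post - log_post.max())
post /= post.sum() * dtheta

theta_map = theta[np.argmax(post)]
t_new = 2.0                                            # point at which to evaluate the predictive density

# Full Bayesian prediction: integrate the likelihood over the posterior.
pred_bayes = np.sum(np.exp(-0.5 * (t_new - theta) ** 2) / np.sqrt(2 * np.pi) * post) * dtheta
# MAP plug-in approximation: evaluate the likelihood at theta_MAP only.
pred_map = np.exp(-0.5 * (t_new - theta_map) ** 2) / np.sqrt(2 * np.pi)
print(pred_bayes, pred_map)
```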

6 A probabilistic view of regularized regression. E(w) = ½ Σ_n ( t_n − y(x_n, w) )² + (λ/2) Σ_m w_m². Prior: the w_m are IID Gaussian, p(w) = ∏_m N( w_m | 0, λ⁻¹ ) ∝ ∏_m exp{ −λ w_m² / 2 }. Since argmin_w E(w) = argmax_w p(t|x, w) p(w), regularized regression is equivalent to MAP estimation of w (the two terms of E(w) correspond to −ln p(t|x, w) and −ln p(w), up to constants)
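A sketch of the closed-form MAP/ridge solution on synthetic data; the degree M, the regularization weight λ (lam), and the data are assumptions. With the noise precision folded into λ, the minimizer of E(w) is w_MAP = (ΦᵀΦ + λI)⁻¹ Φᵀ t.

```python
import numpy as np

# Sketch: ridge regression as MAP estimation with an IID Gaussian prior on w.
rng = np.random.default_rng(3)
x = np.linspace(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)

M = 9
Phi = np.vander(x, M + 1, increasing=True)            # polynomial features
lam = 1e-3                                            # regularisation weight (assumed)

# argmin_w [ 0.5 ||t - Phi w||^2 + (lam/2) ||w||^2 ]  has the closed form
# w_map = (Phi^T Phi + lam I)^{-1} Phi^T t, i.e. the MAP estimate under the prior
# p(w) = prod_m N(w_m | 0, lam^{-1}) (with the noise precision folded into lam).
w_map = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M + 1), Phi.T @ t)
print(w_map)
```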

7 Bayesian linear regression. Likelihood: p(t|x, w, β) = ∏_n N( t_n | y(x_n, w), β⁻¹ ) –β specifies the precision of the data noise. Prior: p(w) = ∏_{m=0..M} N( w_m | 0, α⁻¹ ) –α specifies the precision of the weights. Posterior: p(w|x, t) ∝ p(t|x, w) p(w) –This is an (M+1)-dimensional Gaussian density. Prediction: p(t|x, D) = ∫ p(t|x, w) p(w|D) dw –Computed using linear algebra (see textbook)
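A minimal sketch of that linear-algebra computation, using the standard conjugate-Gaussian formulas for the posterior mean and covariance and for the predictive mean and variance; the helper names posterior and predict, the polynomial features, and the values α = 5 × 10⁻³ and β = 11.1 are assumptions for illustration.

```python
import numpy as np

# Sketch: Bayesian linear regression with known precisions (alpha for the weights,
# beta for the noise), using the standard conjugate-Gaussian formulas.
def posterior(Phi, t, alpha, beta):
    """Return mean and covariance of the Gaussian posterior over w."""
    S_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
    S = np.linalg.inv(S_inv)
    m = beta * S @ Phi.T @ t
    return m, S

def predict(phi_new, m, S, beta):
    """Mean and variance of the Gaussian predictive distribution at new inputs."""
    mean = phi_new @ m
    var = 1.0 / beta + np.sum(phi_new @ S * phi_new, axis=1)   # diag(Phi S Phi^T)
    return mean, var

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 25)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)
Phi = np.vander(x, 10, increasing=True)               # M = 9 polynomial

m, S = posterior(Phi, t, alpha=5e-3, beta=11.1)
x_new = np.linspace(0, 1, 5)
mean, var = predict(np.vander(x_new, 10, increasing=True), m, S, beta=11.1)
print(mean, np.sqrt(var))                             # predictive mean and std dev
```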

8 Example: y(x) = w_0 + w_1 x. [Figure: prior, likelihood, and posterior over (w_0, w_1), together with curves y(x) sampled from the posterior, after no data, the 1st point, the 2nd point, …, the 20th point]

9 Example: y(x) = w_0 + w_1 x + … + w_M x^M, with M = 9. α = 5 × 10⁻³: gives a reasonable range of functions. β = 11.1: known precision of the noise. [Figure: mean and one standard deviation of the predictive distribution]

10 Example: y(x) = w_0 + w_1 φ_1(x) + … + w_M φ_M(x), with Gaussian basis functions φ_j(x) = exp{ −(x − μ_j)² / 2s² }. [Figure: the basis functions and the resulting fit on the interval from 0 to 1]
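A sketch of building a design matrix of Gaussian basis functions; the helper name gaussian_design_matrix and the choice of centres and width are assumptions for illustration.

```python
import numpy as np

# Sketch: design matrix of Gaussian basis functions
# phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2)), with centres spread over the input range.
def gaussian_design_matrix(x, centres, width):
    # Column 0 is the constant basis function for w_0; the rest are Gaussian bumps.
    bumps = np.exp(-0.5 * ((x[:, None] - centres[None, :]) / width) ** 2)
    return np.hstack([np.ones((x.shape[0], 1)), bumps])

x = np.linspace(0, 1, 100)
centres = np.linspace(0, 1, 9)          # mu_1 ... mu_M (assumed placement)
Phi = gaussian_design_matrix(x, centres, width=0.1)
print(Phi.shape)                        # (100, 10): one constant + M Gaussian bases
```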

11 How are we doing on the pass sequence? Least squares regression of the hand-labeled horizontal coordinate, t: –The red line doesn't reveal different levels of uncertainty in the predictions –Cross-validation reduced the training data, so the red line isn't as accurate as it should be –Choosing a particular M and w seems wrong – we should hedge our bets

12 How are we doing on the pass sequence? The same concerns as before: –The red line doesn't reveal different levels of uncertainty in the predictions –Cross-validation reduced the training data, so the red line isn't as accurate as it should be –Choosing a particular M and w seems wrong – we should hedge our bets [Figure: Bayesian regression of the hand-labeled horizontal coordinate, t]

13 Estimation theory. Provided with a predictive distribution p(t|x), how do we estimate a single value for t? –Example: in the pass sequence, Cupid must aim at and hit the man in the white shirt, without hitting the man in the striped shirt. Define L(t, t*) as the loss incurred by estimating t* when the true value is t. Assuming p(t|x) is correct, the expected loss is E[L] = ∫ L(t, t*) p(t|x) dt. The minimum-loss estimate is found by minimizing E[L] w.r.t. t*
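A sketch of minimizing the expected loss numerically for an arbitrary loss, approximating the integral by a Monte Carlo average over samples that stand in for p(t|x); the sample distribution and the grid of candidate estimates are assumptions.

```python
import numpy as np

# Sketch: numerical minimum-expected-loss estimation for a generic loss function.
rng = np.random.default_rng(5)
t_samples = rng.normal(loc=3.0, scale=1.0, size=10000)   # stand-in for p(t|x)

def expected_loss(t_star, loss, samples):
    # Monte Carlo estimate of E[L] = integral of L(t, t*) p(t|x) dt
    return loss(samples, t_star).mean()

candidates = np.linspace(0, 6, 601)
sq = np.array([expected_loss(c, lambda t, c_: (t - c_) ** 2, t_samples)
               for c in candidates])
print(candidates[np.argmin(sq)])        # ~ the mean of p(t|x) for squared loss
```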

14 Squared loss. A common choice: L(t, t*) = ( t − t* )², so E[L] = ∫ ( t − t* )² p(t|x) dt –Not appropriate for Cupid's problem. To minimize E[L], set its derivative to zero: dE[L]/dt* = −2 ∫ ( t − t* ) p(t|x) dt = 0, i.e. −∫ t p(t|x) dt + t* = 0. Minimum mean squared error (MMSE) estimate: t* = E[t|x] = ∫ t p(t|x) dt. For regression: t* = y(x, w)
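A quick numerical check (on an assumed, skewed stand-in for p(t|x)) that the sample mean minimizes the expected squared loss among nearby candidates.

```python
import numpy as np

# Sketch: for squared loss the minimum-expected-loss estimate is the mean E[t|x].
rng = np.random.default_rng(6)
t_samples = rng.lognormal(mean=0.0, sigma=0.75, size=100000)  # skewed p(t|x) (assumed)

mmse = t_samples.mean()                  # t* = E[t|x]
# The mean beats nearby candidates under squared loss.
for cand in (mmse - 0.1, mmse, mmse + 0.1):
    print(cand, ((t_samples - cand) ** 2).mean())
```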

15 Other loss functions: squared loss and absolute loss. [Figure comparing the two loss functions]

16 L = |t* − t_1| + |t* − t_2| + |t* − t_3| + |t* − t_4| + |t* − t_5| + |t* − t_6| + |t* − t_7|. Consider moving t* to the left by ε: –L decreases by 6ε and increases by ε (six points lie to the left of t*, one to its right) –Changes in L are balanced when t* = t_4. The median of t under p(t|x) minimizes the absolute loss. Important: the median is invariant to monotonic transformations of t. [Figure: points t_1, …, t_7 on the t axis with the estimate t*; mean versus median of a skewed distribution]
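A quick numerical check of both claims on an assumed, skewed stand-in for p(t|x): the median minimizes the mean absolute deviation, and it commutes with a monotonic transformation (the log) while the mean does not.

```python
import numpy as np

# Sketch: the median minimises expected absolute loss and is invariant to
# monotonic transformations of t (here, the log), unlike the mean.
rng = np.random.default_rng(7)
t_samples = rng.lognormal(mean=0.0, sigma=0.75, size=100000)

med = np.median(t_samples)
for cand in (med - 0.1, med, med + 0.1):
    print(cand, np.abs(t_samples - cand).mean())      # median gives the smallest value

# Invariance: the median of log(t) equals the log of the median of t
# (up to sampling noise); the mean does not commute with the log this way.
print(np.median(np.log(t_samples)), np.log(med))
print(np.log(t_samples).mean(), np.log(t_samples.mean()))
```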

17 D-dimensional estimation. Suppose t is D-dimensional, t = (t_1, …, t_D) –Example: 2-dimensional tracking. Approach 1: minimum marginal loss estimation –For each dimension d, find the t_d* that minimizes ∫ L(t_d, t_d*) p(t_d|x) dt_d. Approach 2: minimum joint loss estimation –Define a joint loss L(t, t*) –Find the t* that minimizes ∫ L(t, t*) p(t|x) dt
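A sketch contrasting the two approaches on an assumed bimodal 2-D predictive distribution: per-dimension medians for the marginal approach, and a crude grid search over a joint Euclidean loss for the joint approach.

```python
import numpy as np

# Sketch: marginal vs. joint loss in 2-D estimation. Marginal absolute loss gives
# the per-dimension medians; a joint Euclidean loss L(t, t*) = ||t - t*|| is
# minimised here by a crude grid search (illustration only).
rng = np.random.default_rng(8)
# A bimodal 2-D predictive distribution (e.g. two plausible tracks), assumed data.
samples = np.vstack([rng.normal([0, 0], 0.3, size=(500, 2)),
                     rng.normal([4, 1], 0.3, size=(500, 2))])

marginal_est = np.median(samples, axis=0)              # Approach 1: per dimension

grid = np.stack(np.meshgrid(np.linspace(-1, 5, 61),
                            np.linspace(-1, 2, 31)), axis=-1).reshape(-1, 2)
joint_loss = np.array([np.linalg.norm(samples - g, axis=1).mean() for g in grid])
joint_est = grid[np.argmin(joint_loss)]                 # Approach 2: joint

print(marginal_est, joint_est)
```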

18 Questions?

19 How are we doing on the pass sequence? Bayesian regression and estimation enable us to track the man in the striped shirt based on labeled data. Can we track the man in the white shirt? The man in the white shirt is occluded. [Figure: feature x (fraction of pixels in a column with intensity > 0.9, summarized by its 1st moment, x = 224) against the hand-labeled horizontal coordinate t, the man's horizontal location]

20 How are we doing on the pass sequence? Bayesian regression and estimation enable us to track the man in the striped shirt based on labeled data. Can we track the man in the white shirt? Not very well: regression fails to identify that there really are two classes of solution. [Figure: feature x against the hand-labeled horizontal coordinate t]

21

