
1 Bayesian Learning Part 3+/- σ

2 Administrivia
- Final project/proposal: hand-out and brief discussion today
- Proposal due: Mar 27
- Midterm exam: Thurs, Mar 22 (the Thurs after Spring Break)

3 Whence now, and whither?
Last time:
- Modeling and parameter assumptions
- Gaussian distributions
- The Gaussian optimal decision surface
Today:
- PDF learning: maximum likelihood estimation
- IID data and MLE

4 Exercise
For the 1-d Gaussian:
- Given two classes, with means μ₁ and μ₂ and std devs σ₁ and σ₂
- Find a description of the decision point if the std devs are the same, but the means differ (a sketch of this case follows below)
- And if the means are the same, but the std devs differ
For the d-dim Gaussian:
- What shapes are the isopotentials? Why?
- Repeat the above exercise for the d-dim Gaussian
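A sketch of the first case, assuming equal priors (an assumption the slide doesn't state): with σ₁ = σ₂ = σ, the decision point x* is where the two class densities are equal:

(1/(√(2π) σ)) e^(−(x*−μ₁)²/(2σ²)) = (1/(√(2π) σ)) e^(−(x*−μ₂)²/(2σ²))
⇒ (x*−μ₁)² = (x*−μ₂)²  ⇒  x* = (μ₁ + μ₂)/2

i.e., the midpoint between the two means.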

5 What’s the point?
Why do we care about Gaussians?
- Good models of (certain classes of) data distributions
- Very commonly encountered (sometimes too often!)
- More important: a good example of how to work w/ probability distributions for Bayesian learning
- The general process is the same for many other distributions...

6 Statistical model learning
Now have a decision rule: given probability distributions for the classes of data, Pr[x|Cᵢ], pick the class with maximum posterior likelihood, Pr[Cᵢ|x]
Problem 1: Still need a learning rule: given the data for class Cᵢ, find the best probability model (from some family), f(x; Θᵢ), for that data
Problem 2: Learning will give us Pr[x|Cᵢ]; how do we get Pr[Cᵢ|x]?
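A minimal sketch of this decision rule in Python, assuming 1-d Gaussian class-conditional models and known priors (all names and numbers here are illustrative, not from the slides):

import math

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density f(x; mu, sigma)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def classify(x, params, priors):
    """Pick the class c maximizing the unnormalized posterior Pr[x|c] * Pr[c]."""
    return max(priors, key=lambda c: gaussian_pdf(x, *params[c]) * priors[c])

# Made-up two-class setup: (mu, sigma) per class, plus class priors.
params = {"c1": (0.0, 1.0), "c2": (3.0, 1.0)}
priors = {"c1": 0.5, "c2": 0.5}
print(classify(1.0, params, priors))  # "c1": x = 1.0 is closer to mu = 0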

7 Statistical model learning
Goal: given a data set, X, drawn from a class, find the “best” model, f(x; Θ), out of some family of probability distributions
- The family of probability distributions through which you search is your hypothesis space
- “Best” is defined by your loss (objective) function
- The method for picking the best is your search strategy
We’ll start with maximum likelihood

8 Parameterizing PDFs
Given training data, [X, Y], w/ discrete labels Y
Break the data out into per-class sets: X₁ = {x ∈ X : y = c₁}, X₂ = {x ∈ X : y = c₂}, etc.
Want to come up with models f₁(x; Θ₁), f₂(x; Θ₂), etc.
Suppose the individual f()s are Gaussian, so we need the params μ and σ
How do you get the params? Use the sample statistics: μ̂₁ = mean(X₁), σ̂₁ = std(X₁), etc.

9 Parameterizing PDFs
Given training data, [X, Y], w/ discrete labels Y
Break the data out into per-class sets: X₁, X₂, etc.
Want to come up with models f₁(x; Θ₁), f₂(x; Θ₂), etc.
Suppose the individual f()s are Gaussian: need the params μ and σ. How do you get the params?
Now, what if the f()s are something really funky you’ve never seen before in your life, with parameters θ₁, θ₂, etc.?
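A minimal sketch of the per-class Gaussian fit in Python (the data and labels are made up for illustration):

import numpy as np

# Made-up labeled training set: X holds 1-d points, Y holds class labels.
X = np.array([0.2, -0.5, 0.9, 3.1, 2.7, 3.6])
Y = np.array(["c1", "c1", "c1", "c2", "c2", "c2"])

# Break the data out by class and fit (mu, sigma) to each subset.
params = {}
for c in np.unique(Y):
    Xc = X[Y == c]                     # the subset X_c = {x in X : y = c}
    params[c] = (Xc.mean(), Xc.std())  # sample mean and std dev as the params
print(params)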

10 Principle of Maximum Likelihood: Pick the parameters that make the data as probable (or, in general “likely”) as possible

11 Maximum likelihood
Regard the probability function as a function of two variables, data and parameters: L(Θ; X) = f(X; Θ)
The function L is the “likelihood function”
Want to pick the Θ that maximizes L

12 Example
Consider the exponential PDF: f(x; λ) = λe^(−λx), for x ≥ 0
Can think of this as either a function of x or of λ

13 Exponential as fn of x [figure: plot of f(x; λ) vs. x for a fixed λ]

14 Exponential as a fn of λ [figure: plot of f(x; λ) vs. λ for a fixed x]
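The two lost plots can be regenerated with a short matplotlib sketch (the fixed values λ = 1 and x = 1 and the axis ranges are guesses, not from the slides):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 200)
lam = np.linspace(0.01, 5, 200)

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.plot(x, 1.0 * np.exp(-1.0 * x))      # f(x; lambda=1) as a function of x
ax1.set(title="Exponential as fn of x", xlabel="x")
ax2.plot(lam, lam * np.exp(-lam * 1.0))  # L(lambda; x=1) as a function of lambda
ax2.set(title="Exponential as fn of lambda", xlabel="lambda")
plt.show()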

15 Max likelihood params
So, for a fixed set of data, X, want the parameter Θ that maximizes L
Hold X constant, optimize over Θ. How?
More important: f() is usually a function of a single data point (possibly a vector), but L is a function of a set of data
How do you extend f() to a set of data?

16 5 minutes of math...
Joint probabilities
Given d different random vars, X₁, X₂, …, X_d
The “joint” probability of them taking on the simultaneous values given by x₁, x₂, …, x_d: Pr[X₁ = x₁, X₂ = x₂, …, X_d = x_d]
Or, for shorthand, Pr[x₁, x₂, …, x_d]
Closely related to the “joint PDF”, f(x₁, x₂, …, x_d)

17 5 minutes of math...
Independence: Two random variables are statistically independent iff: f(x₁, x₂) = f(x₁) f(x₂)
Or, equivalently (usually for discrete RVs): Pr[x₁, x₂] = Pr[x₁] Pr[x₂]
For multivariate RVs: f(x₁, …, x_d) = f(x₁) ⋯ f(x_d)
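A toy numeric check of these definitions in Python (the joint probabilities are made up):

# Joint probabilities Pr[x1, x2] for two binary RVs (numbers are made up).
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

# Marginals: sum the joint over the other variable.
p1 = {v: sum(p for (x1, _), p in joint.items() if x1 == v) for v in (0, 1)}
p2 = {v: sum(p for (_, x2), p in joint.items() if x2 == v) for v in (0, 1)}

# Independence would require Pr[x1, x2] = Pr[x1] * Pr[x2] for every pair.
independent = all(abs(joint[a, b] - p1[a] * p2[b]) < 1e-12 for a in (0, 1) for b in (0, 1))
print(independent)  # False: Pr[1,1] = 0.4, but Pr[x1=1] * Pr[x2=1] = 0.5 * 0.6 = 0.3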

18 Exercise
Suppose you’re given the PDF: [formula lost in transcript], where z is a normalizing constant
What must z be to make this a legitimate PDF?
Are x₁ and x₂ independent? Why or why not?

19 Exercise
Suppose you’re given the PDF: [formula lost in transcript], where z is a normalizing constant
What must z be to make this a legitimate PDF?
Are x₁ and x₂ independent? Why or why not?
What about the PDF: [second formula lost in transcript]?

20 IID Samples
In supervised learning, we usually assume that data points are sampled independently and from the same distribution
IID assumption: data are independent and identically distributed ⇒ the joint distribution over all the data factors into a product of marginal distributions over the individual data points

21 IID Samples
Joint PDF can be written as the product of the individual (marginal) PDFs:
f(x₁, …, x_N; Θ) = Π_{i=1}^{N} f(xᵢ; Θ)
(the left-hand side is over all N data points; each factor on the right is the PDF evaluated at a single datum)

22 The max likelihood recipe
- Start with IID data
- Assume a model for the individual data point, f(x; Θ)
- Construct the joint likelihood function (PDF): L(Θ; X) = Π_{i=1}^{N} f(xᵢ; Θ)
- Find the params Θ that maximize L (a worked sketch follows below)
- (If you’re lucky): differentiate L w.r.t. Θ, set it equal to 0, and solve
- Repeat for each class
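A minimal sketch of the recipe for the exponential model from slide 12, where the log-likelihood log L(λ; X) = N log λ − λ Σxᵢ gives the closed form λ̂ = N / Σxᵢ = 1/mean(X) (the sample data here are made up):

import numpy as np

def exp_log_likelihood(lam, X):
    """log L(lambda; X) = N log(lambda) - lambda * sum(X) for IID exponential data."""
    return len(X) * np.log(lam) - lam * X.sum()

# Made-up IID sample from an exponential with true lambda = 2.
rng = np.random.default_rng(0)
X = rng.exponential(scale=1 / 2, size=1000)

# Closed form from setting dL/dlambda = 0: lambda_hat = N / sum(x_i) = 1 / mean(x).
lam_hat = 1 / X.mean()
print(lam_hat)  # should be near 2

# Sanity check: the closed form beats a nearby value of lambda.
assert exp_log_likelihood(lam_hat, X) >= exp_log_likelihood(lam_hat * 1.1, X)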

23 Exercise
Find the maximum likelihood estimator of μ for the univariate Gaussian: f(x; μ, σ) = (1/(√(2π) σ)) e^(−(x−μ)²/(2σ²))
Find the maximum likelihood estimator of β for the degenerate gamma distribution: [formula lost in transcript]
Hint: consider the log of the likelihood fns in both cases
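A sketch of the Gaussian half of the exercise, following the hint (this assumes the standard univariate density above):

log L(μ; X) = Σᵢ [ −log(√(2π) σ) − (xᵢ − μ)²/(2σ²) ]
∂/∂μ log L = Σᵢ (xᵢ − μ)/σ² = 0  ⇒  μ̂ = (1/N) Σᵢ xᵢ

i.e., the MLE for μ is just the sample mean.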


Download ppt "Bayesian Learning Part 3+/- σ. Administrivia Final project/proposal Hand-out/brief discussion today Proposal due: Mar 27 Midterm exam: Thurs, Mar 22 (Thurs."

Similar presentations


Ads by Google