Bayesian Learning, cont’d
Administrivia: Homework 1 returned today (details in a second). Reading 2 assigned today: S. Thrun, "Learning occupancy grids with forward sensor models," Autonomous Robots. Due: Oct 26. Much crunchier than the first! Don’t slack. Work with your group to sort out the math. Questions to the mailing list and me. Midterm exam: Oct 21.
Homework 1 results: mean = 30.3; std = 6.9
IID Samples: In supervised learning, we usually assume that data points are sampled independently and from the same distribution. IID assumption: data are independent and identically distributed ⇒ the joint PDF can be written as the product of the individual (marginal) PDFs: $f(X_1, X_2, \ldots, X_N; \Theta) = \prod_{i=1}^{N} f(X_i; \Theta)$
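A small numerical sketch of the IID factorization, assuming a univariate Gaussian as the per-point model (the helper names `gaussian_pdf`, `joint_pdf_iid`, and `joint_log_pdf_iid` are hypothetical). It also shows why the log of the product, i.e. a sum of logs, is preferred in practice: the raw product underflows for moderately large $N$.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density f(x; mu, sigma)."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def joint_pdf_iid(xs, mu, sigma):
    """Under the IID assumption the joint density is the product of the marginals."""
    return math.prod(gaussian_pdf(x, mu, sigma) for x in xs)

def joint_log_pdf_iid(xs, mu, sigma):
    """Log of the product = sum of the logs (numerically safer for large N)."""
    return sum(math.log(gaussian_pdf(x, mu, sigma)) for x in xs)

data = [1.2, 0.7, 2.1, 1.5]
print(joint_pdf_iid(data, 1.0, 1.0))
print(math.exp(joint_log_pdf_iid(data, 1.0, 1.0)))  # same value, computed via logs

# With 2000 points each density is about 0.35, so the product underflows to 0.0,
# while the log-likelihood stays a finite (large negative) number.
many = [1.5] * 2000
print(joint_pdf_iid(many, 1.0, 1.0), joint_log_pdf_iid(many, 1.0, 1.0))
```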
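The max likelihood recipe:
1. Start with IID data.
2. Assume a model for an individual data point, $f(X; \Theta)$.
3. Construct the joint likelihood function (PDF): $L(\Theta) = \prod_{i=1}^{N} f(X_i; \Theta)$.
4. Find the params $\Theta$ that maximize $L$: $\hat{\Theta} = \arg\max_{\Theta} L(\Theta)$. (If you’re lucky): differentiate $L$ w.r.t. $\Theta$, set it $= 0$, and solve.
5. Repeat for each class.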
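When differentiating and solving is not convenient, step 4 can be approximated numerically. The sketch below is an illustration, not the course's method: it assumes a Gaussian model with known $\sigma = 1$, maximizes the log-likelihood (same maximizer as $L$) by brute-force grid search, and uses hypothetical helper names and an arbitrary grid.

```python
import math

def gaussian_log_likelihood(mu, xs, sigma=1.0):
    # log L(mu) = sum_i log f(x_i; mu, sigma) under the IID assumption
    n = len(xs)
    return (-n * math.log(math.sqrt(2 * math.pi) * sigma)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

def grid_search_mle(log_likelihood, candidates):
    # Brute-force "find the Theta that maximizes L" over a finite set of candidates
    return max(candidates, key=log_likelihood)

xs = [2.0, 3.5, 2.5, 4.0]
grid = [i / 100 for i in range(0, 601)]  # candidate mu values 0.00 .. 6.00
mu_hat = grid_search_mle(lambda mu: gaussian_log_likelihood(mu, xs), grid)
print(mu_hat)  # 3.0, which is also the sample mean of xs
```

To "repeat for each class", call `grid_search_mle` once on each class's own data. A grid search only scales to one or two parameters; the closed-form route is far cheaper when it exists.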
Exercise: Find the maximum likelihood estimator of μ for the univariate Gaussian: $f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$. Find the maximum likelihood estimator of β for the degenerate gamma distribution. Hint: consider the log of the likelihood functions in both cases.
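Solutions: PDF for one data point: $f(x_i; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)$. Joint likelihood of $N$ data points: $L(\mu) = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x_i-\mu)^2}{2\sigma^2}\right)$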
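Solutions: Log-likelihood: $\log L(\mu) = -N \log\left(\sqrt{2\pi}\,\sigma\right) - \frac{1}{2\sigma^2} \sum_{i=1}^{N} (x_i - \mu)^2$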
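Solutions: Differentiate the log-likelihood w.r.t. μ: $\frac{\partial \log L}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{N} (x_i - \mu)$. Setting this to 0: $\sum_{i=1}^{N} x_i - N\mu = 0 \;\Rightarrow\; \hat{\mu} = \frac{1}{N} \sum_{i=1}^{N} x_i$, i.e. the sample mean.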
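A quick numerical check of the derivation, under arbitrary example data and an assumed known $\sigma$ (function names are hypothetical): the analytic derivative $\frac{1}{\sigma^2}\sum_i (x_i - \mu)$ should match a finite-difference derivative of $\log L$, and should vanish at the sample mean.

```python
import math
import statistics

def log_likelihood(mu, xs, sigma):
    # log L(mu) = -N log(sqrt(2 pi) sigma) - (1 / (2 sigma^2)) sum_i (x_i - mu)^2
    n = len(xs)
    return (-n * math.log(math.sqrt(2 * math.pi) * sigma)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

def dlog_likelihood_dmu(mu, xs, sigma):
    # Analytic derivative: (1 / sigma^2) * sum_i (x_i - mu)
    return sum(x - mu for x in xs) / sigma ** 2

xs = [4.1, 5.3, 3.8, 6.0, 4.9]
sigma = 1.5
mu_hat = statistics.fmean(xs)  # closed-form MLE: the sample mean

# The derivative vanishes (up to rounding) at the sample mean
print(dlog_likelihood_dmu(mu_hat, xs, sigma))

# Central finite difference at an arbitrary mu agrees with the analytic formula
h = 1e-5
mu = 3.0
fd = (log_likelihood(mu + h, xs, sigma) - log_likelihood(mu - h, xs, sigma)) / (2 * h)
print(fd, dlog_likelihood_dmu(mu, xs, sigma))
```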
Solutions What about for the gamma PDF?
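The exact "degenerate gamma" PDF from the slide is not recoverable here, so this sketch makes a loudly labeled assumption: the standard shape-scale gamma $f(x; \alpha, \beta) = \frac{x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}$ with the shape $\alpha$ held fixed and known, so only $\beta$ is estimated. Under that assumption the same log-then-differentiate recipe gives $\frac{\partial \log L}{\partial \beta} = \frac{\sum_i x_i}{\beta^2} - \frac{N\alpha}{\beta} = 0 \Rightarrow \hat{\beta} = \frac{\bar{x}}{\alpha}$. Names and data are hypothetical.

```python
import math
import statistics

def gamma_log_likelihood(beta, xs, alpha):
    # ASSUMED model: shape-scale gamma with alpha fixed and known, only beta free.
    # log L = (alpha-1) sum log x_i - sum x_i / beta - N log Gamma(alpha) - N alpha log beta
    n = len(xs)
    return ((alpha - 1) * sum(math.log(x) for x in xs)
            - sum(xs) / beta
            - n * math.lgamma(alpha)
            - n * alpha * math.log(beta))

def gamma_beta_mle(xs, alpha):
    # Root of d/dbeta log L = sum(x)/beta^2 - N*alpha/beta, i.e. beta_hat = mean(x) / alpha
    return statistics.fmean(xs) / alpha

xs = [1.8, 4.2, 2.9, 3.3, 5.1, 2.2]
alpha = 2.0
beta_hat = gamma_beta_mle(xs, alpha)
print(beta_hat)

# Sanity check: moving beta away from beta_hat lowers the log-likelihood
for b in (beta_hat * 0.9, beta_hat * 1.1):
    assert gamma_log_likelihood(b, xs, alpha) < gamma_log_likelihood(beta_hat, xs, alpha)
```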
Putting the parts together: [X, Y] = complete training data
Putting the parts together: Assumed distribution family (hypothesis space) with parameters $\Theta$: $f(X; \Theta)$. Parameters for class $a$: $\Theta_a$. Specific PDF for class $a$: $f(X; \Theta_a)$
Putting the parts together
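A minimal sketch of going from complete training data $[X, Y]$ to one specific PDF per class, assuming 1-D inputs and a univariate Gaussian family so that $\Theta_a = (\mu_a, \sigma_a^2)$. The helper names, labels, and data are hypothetical; `statistics.pvariance` divides by $N$, matching the maximum-likelihood variance.

```python
import math
from collections import defaultdict
from statistics import fmean, pvariance

def fit_class_gaussians(X, Y):
    # Group the training inputs by class label ...
    by_class = defaultdict(list)
    for x, y in zip(X, Y):
        by_class[y].append(x)
    # ... then fit Theta_a = (mu_a, var_a) by maximum likelihood, separately per class
    return {a: (fmean(xs), pvariance(xs)) for a, xs in by_class.items()}

def class_pdf(x, theta):
    # Specific PDF for one class: f(x; Theta_a)
    mu, var = theta
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

X = [1.0, 1.4, 0.8, 1.2, 3.9, 4.3, 4.1, 3.7]
Y = ["a", "a", "a", "a", "b", "b", "b", "b"]
params = fit_class_gaussians(X, Y)
print(params)  # class a: mean 1.1, var 0.05; class b: mean 4.0, var 0.05
print({a: class_pdf(2.0, th) for a, th in params.items()})
```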
Gaussian Distributions
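5 minutes of math... Recall your friend the Gaussian PDF: $f(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$. I asserted that the $d$-dimensional form is: $f(\mathbf{x}; \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})\right)$. Let’s look at the parts...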
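One way to evaluate the $d$-dimensional density with NumPy, sketched here with a hypothetical `mvn_pdf` helper. It solves $\Sigma \mathbf{z} = \mathbf{x} - \boldsymbol{\mu}$ rather than forming $\Sigma^{-1}$ explicitly (a common numerical choice, not something the slide prescribes), and checks that $d = 1$ reduces to the univariate formula.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """d-dimensional Gaussian density f(x; mu, Sigma)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    d = mu.shape[0]
    diff = x - mu
    # (x - mu)^T Sigma^{-1} (x - mu), computed via a linear solve
    mahal_sq = diff @ np.linalg.solve(Sigma, diff)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * mahal_sq) / norm)

# With d = 1 and Sigma = [[sigma^2]] this matches the univariate formula
sigma = 2.0
one_d = mvn_pdf([1.0], [0.0], [[sigma ** 2]])
univariate = np.exp(-(1.0 - 0.0) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
print(one_d, univariate)

# At the mean of a 2-D standard Gaussian the density is 1 / (2 pi)
print(mvn_pdf([0.0, 0.0], [0.0, 0.0], np.eye(2)))
```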
Ok, but what do the parts mean? Mean vector, $\boldsymbol{\mu}$: the mean of the data along each dimension, $\mu_i = \frac{1}{N} \sum_{k=1}^{N} x_{k,i}$
5 minutes of math... Covariance matrix, $\Sigma$: like variance, but describes the spread of the data across all $d$ dimensions at once
5 minutes of math... Note: the covariances on the diagonal of $\Sigma$ are the same as the standard variances along each dimension of the data, $\Sigma_{ii} = \sigma_i^2$. But what about skewed data?
5 minutes of math... Off-diagonal covariances ($\Sigma_{ij}$, $i \neq j$) describe the pairwise variance: how much $x_i$ changes as $x_j$ changes (on average)
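To see what the diagonal and off-diagonal entries report, the sketch below generates a correlated ("tilted") 2-D cloud in which $x_2$ tends to rise with $x_1$. The slope 0.8, noise level, sample size, and seed are arbitrary illustration choices. `np.cov(..., bias=True)` divides by $N$, so its diagonal matches `np.var` with its default `ddof=0`.

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, size=5000)
# x2 tends to increase with x1, plus independent noise: a tilted point cloud
x2 = 0.8 * x1 + rng.normal(0.0, 0.5, size=5000)
data = np.column_stack([x1, x2])  # N x d, one row per data point

Sigma = np.cov(data, rowvar=False, bias=True)  # rows are observations; divide by N
print(Sigma)

# Diagonal entries are the ordinary per-dimension variances
print(np.diag(Sigma), np.var(data, axis=0))

# Off-diagonal entries are symmetric and positive here: x2 rises with x1 on average
print(Sigma[0, 1], Sigma[1, 0])
```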
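5 minutes of math... Calculating $\Sigma$ from data: in practice, you want to measure the covariance between every pair of random variables (dimensions) $i$ and $j$: $\Sigma_{ij} = \frac{1}{N} \sum_{k=1}^{N} (x_{k,i} - \mu_i)(x_{k,j} - \mu_j)$. Or, in linear algebra, with $X$ the $N \times d$ data matrix and $\mathbf{1}$ a column of $N$ ones: $\Sigma = \frac{1}{N} (X - \mathbf{1}\boldsymbol{\mu}^{T})^{T} (X - \mathbf{1}\boldsymbol{\mu}^{T})$ (the maximum-likelihood estimate; the unbiased estimate divides by $N - 1$ instead).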
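Both ways of computing $\Sigma$ can be sketched side by side and compared against NumPy. The function names and the small data matrix are hypothetical; the $1/N$ normalization is assumed, which corresponds to `np.cov(..., bias=True)` (NumPy's default divides by $N - 1$).

```python
import numpy as np

def covariance_pairwise(X):
    # X: N x d data matrix, one row per data point
    N, d = X.shape
    mu = X.mean(axis=0)  # mean vector, one entry per dimension
    Sigma = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            # Sigma_ij = (1/N) * sum_k (x_ki - mu_i)(x_kj - mu_j)
            Sigma[i, j] = np.sum((X[:, i] - mu[i]) * (X[:, j] - mu[j])) / N
    return Sigma

def covariance_matrix_form(X):
    # Sigma = (1/N) (X - 1 mu^T)^T (X - 1 mu^T)
    Xc = X - X.mean(axis=0)  # broadcasting subtracts mu from every row
    return Xc.T @ Xc / X.shape[0]

X = np.array([[2.0, 1.0], [4.0, 3.0], [6.0, 4.0], [8.0, 8.0]])
print(covariance_pairwise(X))               # [[5.0, 5.5], [5.5, 6.5]]
print(covariance_matrix_form(X))            # same matrix
print(np.cov(X, rowvar=False, bias=True))   # same matrix
```

The double loop mirrors the pairwise definition directly; the matrix form does the same arithmetic in one product and is what one would normally use.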