
Bayesian Learning, cont’d

Administrivia
Homework 1 returned today (details in a second)
Reading 2 assigned today: S. Thrun, "Learning occupancy grids with forward sensor models," Autonomous Robots. Due: Oct 26. Much crunchier than the first! Don't slack. Work with your group to sort out the math. Questions to the mailing list and me.
Midterm exam: Oct 21

Homework 1 results Mean=30.3; std=6.9

IID Samples
In supervised learning, we usually assume that data points are sampled independently and from the same distribution. IID assumption: data are independent and identically distributed ⇒ the joint PDF can be written as the product of the individual (marginal) PDFs:
p(x_1, ..., x_N) = ∏_{i=1}^{N} p(x_i)
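As a quick numerical illustration (a minimal NumPy/SciPy sketch; the standard-normal model and the five sample values are made up for the example), the product of the marginal densities equals the joint likelihood, and working with summed log-densities gives the same answer more stably:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical IID sample of 5 points (values chosen arbitrarily)
x = np.array([0.5, -1.2, 0.3, 1.7, -0.4])

# Under the IID assumption, the joint density factors into a product
joint = np.prod(norm.pdf(x, loc=0.0, scale=1.0))

# Equivalently (and more numerically stable), sum the log-densities
log_joint = np.sum(norm.logpdf(x, loc=0.0, scale=1.0))

assert np.isclose(np.log(joint), log_joint)
```

For large N the raw product underflows to zero in floating point, which is why the log form is used in practice.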

The max likelihood recipe
Start with IID data
Assume a model for an individual data point, f(x; Θ)
Construct the joint likelihood function (PDF): L(Θ) = ∏_{i=1}^{N} f(x_i; Θ)
Find the params Θ that maximize L
(If you're lucky): differentiate log L w.r.t. Θ, set = 0, and solve
Repeat for each class
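When the "lucky" closed-form route isn't available, the same recipe can be run numerically. A minimal sketch (the data, the true parameters, and the known σ = 2 are assumptions for the example): minimize the negative log-likelihood of a Gaussian in μ and check that the optimizer recovers the closed-form answer, the sample mean.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=1000)  # IID sample; sigma treated as known

def neg_log_likelihood(mu, sigma=2.0):
    # -log L(mu) for a Gaussian with known sigma (additive constants dropped)
    return np.sum((data - mu) ** 2) / (2 * sigma ** 2)

res = minimize_scalar(neg_log_likelihood)

# Numerical maximizer agrees with the analytic MLE: the sample mean
assert np.isclose(res.x, data.mean(), atol=1e-4)
```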

Exercise
Find the maximum likelihood estimator of μ for the univariate Gaussian:
f(x; μ) = (1/(σ√(2π))) exp(−(x − μ)² / (2σ²))
Find the maximum likelihood estimator of β for the degenerate gamma distribution:
Hint: consider the log of the likelihood functions in both cases

Solutions
PDF for one data point:
f(x_i; μ) = (1/(σ√(2π))) exp(−(x_i − μ)² / (2σ²))
Joint likelihood of N data points:
L(μ) = ∏_{i=1}^{N} f(x_i; μ)

Solutions
Log-likelihood:
log L(μ) = −N log(σ√(2π)) − (1/(2σ²)) ∑_{i=1}^{N} (x_i − μ)²

Solutions
Differentiate w.r.t. μ and set to zero:
∂ log L/∂μ = (1/σ²) ∑_{i=1}^{N} (x_i − μ) = 0 ⇒ μ̂ = (1/N) ∑_{i=1}^{N} x_i (the sample mean)

Solutions What about for the gamma PDF?

Putting the parts together
[X, Y]: complete training data

Putting the parts together
Assumed distribution family (hypothesis space) with parameters Θ
Parameters for class a: Θ_a
Specific PDF for class a: f(x; Θ_a)
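"Repeat for each class" can be sketched concretely. A minimal example (the two-class, one-feature data set and its Gaussian parameters are invented for illustration): run the max-likelihood recipe once per class to get each class-conditional PDF's parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical labeled training data [X, Y]: two classes, one feature
X = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(4.0, 1.5, 100)])
y = np.array([0] * 100 + [1] * 100)

# Repeat the ML recipe per class: fit a Gaussian to each class's data
params = {}
for c in np.unique(y):
    xc = X[y == c]
    params[c] = (xc.mean(), xc.std())  # ML estimates of (mu, sigma)

# params[c] now specifies the class-conditional PDF f(x; Theta_c)
print(params)
```

Note that `np.std` with its default divisor N is exactly the ML estimate of σ (the unbiased estimate would divide by N − 1).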

Putting the parts together

Gaussian Distributions

5 minutes of math...
Recall your friend the Gaussian PDF:
f(x; μ, σ²) = (1/(σ√(2π))) exp(−(x − μ)² / (2σ²))
I asserted that the d-dimensional form is:
f(x; μ, Σ) = (2π)^(−d/2) |Σ|^(−1/2) exp(−(1/2)(x − μ)ᵀ Σ⁻¹ (x − μ))
Let's look at the parts...
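A small sanity check of the d-dimensional formula (the 2-D μ, Σ, and query point are made-up values for the example): evaluate the density directly from the expression above and compare against SciPy's implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative 2-D parameters (arbitrary choices)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.0, 0.0])

# Evaluate the d-dimensional Gaussian PDF straight from the formula
d = len(mu)
diff = x - mu
density = (np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))
           / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma)))

# Cross-check against SciPy
assert np.isclose(density, multivariate_normal(mu, Sigma).pdf(x))
```

Using `np.linalg.solve` instead of explicitly inverting Σ is the usual numerically preferable choice.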


Ok, but what do the parts mean?
Mean vector, μ: the mean of the data along each dimension

5 minutes of math...
Covariance matrix, Σ: like variance, but describes the spread of the data

5 minutes of math...
Note: the covariances on the diagonal of Σ are the same as the standard variances along each dimension of the data. But what about skewed data?

5 minutes of math...
Off-diagonal covariances (Σ_ij, i ≠ j) describe the pairwise covariance: how much x_i changes as x_j changes (on average)

5 minutes of math...
Calculating Σ from data: in practice, you want to measure the covariance between every pair of random variables (dimensions):
Σ_ij = (1/N) ∑_{k=1}^{N} (x_{k,i} − μ_i)(x_{k,j} − μ_j)
Or, in linear algebra:
Σ = (1/N) ∑_{k=1}^{N} (x_k − μ)(x_k − μ)ᵀ
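Both forms of the covariance estimate can be checked against each other (a minimal sketch on randomly generated data; N = 500 and d = 3 are arbitrary): the pairwise loop and the matrix expression give the same Σ, which also matches NumPy's built-in estimator.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))  # N=500 data points in d=3 dimensions

mu = X.mean(axis=0)
centered = X - mu
N, d = X.shape

# Entry-wise: Sigma_ij = (1/N) * sum_k (x_ki - mu_i)(x_kj - mu_j)
Sigma_loop = np.zeros((d, d))
for i in range(d):
    for j in range(d):
        Sigma_loop[i, j] = np.mean(centered[:, i] * centered[:, j])

# Linear-algebra form: Sigma = (1/N) * X_c^T X_c on the centered data
Sigma_mat = centered.T @ centered / N

assert np.allclose(Sigma_loop, Sigma_mat)
# bias=True makes np.cov divide by N (the ML estimate) rather than N-1
assert np.allclose(Sigma_mat, np.cov(X, rowvar=False, bias=True))
```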