Bayesian Learning, Cont’d

Administrivia
- Various homework bugs:
  - Due: Oct 12 (Tues), not Oct 9 (Sat)
  - Problem 3 should read: (duh)
  - (some) info on naive Bayes in Sec. 4.3 of text

Administrivia
- Another bug in last time's lecture: the multivariate Gaussian should look like:
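The corrected equation was an image that didn't survive the transcript; for reference, the standard d-dimensional Gaussian density is:

    f(\mathbf{x}; \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)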

5 minutes of math...
- Joint probabilities
- Given d different random vars, the "joint" probability of them taking on the simultaneous values is given by:
- Or, for shorthand:
- Closely related to the "joint PDF"
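The slide's equations are missing from the transcript; in standard notation, for random variables X_1, ..., X_d:

    P(X_1 = x_1,\, X_2 = x_2,\, \ldots,\, X_d = x_d) \qquad \text{shorthand: } P(x_1, x_2, \ldots, x_d)

with the corresponding joint PDF written f(x_1, \ldots, x_d).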

5 minutes of math...
- Independence: two random variables are statistically independent iff:
- Or, equivalently (usually for discrete RVs):
- For multivariate RVs:
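Again the slide's equations didn't survive; the standard conditions, in the order the slide lists them:

    f(x, y) = f_X(x)\, f_Y(y)
    P(X = x,\, Y = y) = P(X = x)\, P(Y = y)
    f(x_1, \ldots, x_d) = \prod_{i=1}^{d} f_i(x_i)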

Exercise
- Suppose you're given a PDF (not preserved in the transcript), where z is a normalizing constant
- What must z be to make this a legitimate PDF?
- Are the two variables independent? Why or why not?
- What about the second PDF?

Parameterizing PDFs
- Given training data, [X, Y], w/ discrete labels Y
- Break data out into per-class sets, etc.
- Want to come up with models: one class-conditional density f() per class
- Suppose the individual f()s are Gaussian; need the params μ and σ
- How do you get the params? (see the sketch below)
- Now, what if the f()s are something really funky you've never seen before in your life, with their own parameters, etc.?
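A minimal Python sketch of the Gaussian case (function and variable names are hypothetical; assumes scalar features):

    import numpy as np

    def fit_class_gaussians(X, Y):
        """Fit a univariate Gaussian to each class's data by maximum likelihood."""
        params = {}
        for label in np.unique(Y):
            x = X[Y == label]        # break the data out by class
            mu = x.mean()            # ML estimate of the mean
            sigma = x.std()          # ML estimate of the std (divides by n)
            params[label] = (mu, sigma)
        return params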

Maximum likelihood
- Principle of maximum likelihood: pick the parameters that make the data as probable (or, in general, "likely") as possible
- Regard the probability function as a function of two variables, data and parameters:
- Function L is the "likelihood function"
- Want to pick the Θ that maximizes L
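In symbols (the slide's equation is missing), the likelihood treats the model PDF as a function of the parameters, with the data held fixed:

    L(\Theta; X) = f(X; \Theta)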

Example
- Consider the exponential PDF:
- Can think of this as either a function of x or of τ
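The PDF itself didn't survive the transcript; one common parameterization with scale parameter τ (assumed here) is:

    f(x; \tau) = \frac{1}{\tau}\, e^{-x/\tau}, \qquad x \ge 0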

Exponential as a function of x (plot not preserved in transcript)

Exponential as a function of τ (plot not preserved in transcript)

Max likelihood params
- So, for a fixed set of data X, want the parameter Θ that maximizes L
- Hold X constant, optimize over Θ. How?
- More important: f() is usually a function of a single data point (possibly a vector), but L is a function of a set of data
- How do you extend f() to a set of data?

IID Samples
- In supervised learning, we usually assume that data points are sampled independently and from the same distribution
- IID assumption: data are independent and identically distributed
- ⇒ joint PDF can be written as a product of individual (marginal) PDFs:
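The product form the slide refers to, in standard notation:

    f(x_1, \ldots, x_n; \Theta) = \prod_{i=1}^{n} f(x_i; \Theta)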

The max likelihood recipe
- Start with IID data
- Assume a model for an individual data point, f(X; Θ)
- Construct the joint likelihood function (PDF):
- Find the params Θ that maximize L
- (If you're lucky): differentiate L w.r.t. Θ, set = 0, and solve (see the sketch below)
- Repeat for each class
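A minimal Python sketch of the recipe for the exponential model above (parameterization assumed as before; differentiating the log-likelihood and setting it to zero gives the closed form τ̂ = mean of the data):

    import numpy as np

    def exp_log_likelihood(tau, x):
        """Log-likelihood of IID data x under f(x; tau) = (1/tau) exp(-x/tau)."""
        n = len(x)
        return -n * np.log(tau) - x.sum() / tau

    # d/d tau [-n ln(tau) - sum(x)/tau] = 0  =>  tau_hat = mean(x)
    x = np.random.default_rng(0).exponential(scale=2.0, size=1000)
    tau_hat = x.mean()   # should be close to the true scale, 2.0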

Exercise
- Find the maximum likelihood estimator of μ for the univariate Gaussian:
- Find the maximum likelihood estimator of β for the degenerate gamma distribution:
- Hint: consider the log of the likelihood functions in both cases
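The two PDFs were images and are missing here; for the Gaussian part, the hint plays out as the standard derivation:

    \ln L(\mu; X) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2
    \quad\Rightarrow\quad \hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} x_i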

Putting the parts together
- [X, Y]: complete training data (diagram not preserved in transcript)

5 minutes of math...
- Marginal probabilities
- If you have a joint PDF... and want to know about the probability of just one RV (regardless of what happens to the others)
- Marginal PDF of X or Y:
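The slide's integrals are missing; the standard definitions:

    f_X(x) = \int f(x, y)\, dy, \qquad f_Y(y) = \int f(x, y)\, dx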

5 minutes of math...
- Conditional probabilities
- Suppose you have a joint PDF, f(H, W)
- Now you get to see one of the values, e.g., H = "183cm"
- What's your probability estimate of W, given this new knowledge?
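In standard notation (the slide's equation is missing), conditioning the joint on the observed value:

    f(w \mid H = h) = \frac{f(h, w)}{f_H(h)}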

Everything's random...
- Basic Bayesian viewpoint: treat (almost) everything as a random variable
- Data/independent var: X vector
- Class/dependent var: Y
- Parameters: Θ (e.g., mean, variance, correlations, multinomial params, etc.)
- Use Bayes' Rule to assess probabilities of classes
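Bayes' Rule in this setting (standard form, supplied here since the slide shows no equation):

    P(Y = y \mid X = x) = \frac{f(x \mid Y = y)\, P(Y = y)}{\sum_{y'} f(x \mid Y = y')\, P(Y = y')}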