Maximum likelihood estimators

Example: random data $X_i$ drawn from a Poisson distribution with unknown mean $\lambda$. We want to determine $\lambda$. For any assumed value of $\lambda$, the probability of observing $X = X_i$ is

$$P(X_i \mid \lambda) = \frac{\lambda^{X_i} e^{-\lambda}}{X_i!}$$

The likelihood of the full set of measurements for any given $\lambda$ is

$$L(\lambda) = \prod_{i=1}^{N} \frac{\lambda^{X_i} e^{-\lambda}}{X_i!}$$

The maximum likelihood estimator $\hat{\lambda}$ of $\lambda$ is then given by

$$\left.\frac{\partial L}{\partial \lambda}\right|_{\hat{\lambda}} = 0$$

[Figure: the measured counts $X_i$ plotted against index $i$.]

Take logs and maximize the likelihood:

$$\ln L(\lambda) = \sum_{i=1}^{N} \left( X_i \ln \lambda - \lambda - \ln X_i! \right)$$

$$\frac{\partial \ln L}{\partial \lambda} = \sum_{i=1}^{N} \left( \frac{X_i}{\lambda} - 1 \right) = 0
\quad\Longrightarrow\quad
\hat{\lambda} = \frac{1}{N} \sum_{i=1}^{N} X_i$$

The maximum likelihood estimate of $\lambda$ is just the sample mean of the $X_i$. Note that the result is unbiased, since

$$E[\hat{\lambda}] = \frac{1}{N} \sum_{i=1}^{N} E[X_i] = \lambda$$

[Figure: the data $X_i$ with the fitted mean $\hat{\lambda}$ overplotted.]
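As a quick numerical check (an added illustration, not part of the original slides; the true rate 3.7 and the sample size are arbitrary), the sketch below simulates Poisson counts, scans the log-likelihood over a grid of $\lambda$, and confirms that the maximum lands on the sample mean:

```python
# Minimal sketch: the Poisson log-likelihood peaks at the sample mean.
import numpy as np
from scipy.special import gammaln  # ln(X!) = gammaln(X + 1)

rng = np.random.default_rng(42)
X = rng.poisson(lam=3.7, size=200)      # simulated counts X_i (true rate arbitrary)

def log_likelihood(lam, X):
    # ln L(lam) = sum_i [ X_i ln(lam) - lam - ln(X_i!) ]
    return np.sum(X * np.log(lam) - lam - gammaln(X + 1))

lam_grid = np.linspace(0.5, 8.0, 2000)
lnL = np.array([log_likelihood(lam, X) for lam in lam_grid])

print("grid maximum:", lam_grid[np.argmax(lnL)])
print("sample mean :", X.mean())        # the two agree to grid resolution
```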

Variance of the ML estimate

The algebra of random variables gives

$$\mathrm{Var}(\hat{\lambda}) = \mathrm{Var}\!\left( \frac{1}{N} \sum_{i=1}^{N} X_i \right) = \frac{1}{N^2} \sum_{i=1}^{N} \mathrm{Var}(X_i) = \frac{\lambda}{N}$$

(using $\mathrm{Var}(X_i) = \lambda$ for a Poisson distribution). This is a minimum-variance estimate.

Important note: the error bars on the $X_i$ are derived from the model, not from the data!
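A Monte Carlo sanity check of the $\mathrm{Var}(\hat{\lambda}) = \lambda/N$ result (an added illustration; $\lambda$, $N$, and the trial count are arbitrary choices):

```python
# Sketch: empirical variance of the ML estimate vs the predicted lam / N.
import numpy as np

rng = np.random.default_rng(0)
lam, N, trials = 3.7, 200, 20000
lam_hat = rng.poisson(lam, size=(trials, N)).mean(axis=1)  # one ML estimate per trial

print("empirical Var(lam_hat):", lam_hat.var())
print("predicted lam / N     :", lam / N)
```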

Error bars attach to the model, not to the data!

Example: Poisson data $X_i$. How can you attach an error bar to the data points?

The right way: $\sigma(X_i) = \sqrt{\hat{\lambda}}$, where $\hat{\lambda}$ is the mean count rate predicted by the model.

The wrong way: if you assign $\sigma(X_i) = \sqrt{X_i}$, then when $X_i = 0$ you get $\sigma(0) = 0$, giving that point infinite weight in a weighted fit. Assigning $\sigma(X_i) = \sqrt{X_i}$ gives a downward bias, because points lower than average get smaller error bars, and hence more weight than they deserve.

[Figure: the data $X_i$ against index $i$, with model-derived error bars.]
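To see the bias concretely, here is a small simulation (an added sketch, not from the slides; the floor of 1 on the variance is an ad hoc workaround for the $X_i = 0$ problem noted above). Weighting by data-derived error bars pulls the weighted mean well below the true rate, while the model-derived error bars, being equal for all points, reduce the estimate to the unbiased plain mean:

```python
# Sketch: data-derived error bars sigma_i = sqrt(X_i) bias a weighted mean low.
import numpy as np

rng = np.random.default_rng(1)
lam, N, trials = 5.0, 50, 5000
wrong, right = [], []
for _ in range(trials):
    X = rng.poisson(lam, size=N).astype(float)
    var_i = np.maximum(X, 1.0)        # sigma_i^2 = X_i, floored at 1 to avoid 1/0
    wrong.append(np.sum(X / var_i) / np.sum(1.0 / var_i))  # weighted mean, w = 1/sigma^2
    right.append(X.mean())            # equal model errors: weighted mean = plain mean

print("true rate               :", lam)
print("data-derived error bars :", np.mean(wrong))   # strongly biased low
print("model-derived error bars:", np.mean(right))   # unbiased
```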

Confidence interval on a single parameter $\lambda$

The $1\sigma$ confidence interval on $\lambda$ includes 68% of the area under the likelihood function $L(\lambda)$.

[Figure: $L(\lambda)$ plotted against $\lambda$, with the central 68% of the area around $\hat{\lambda}$ shaded and the corresponding $\chi^2$ levels marked.]
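A numerical sketch of reading off such an interval (an added illustration; it uses the standard rule of thumb that a 68% interval corresponds to $\ln L$ dropping by $1/2$ from its peak, i.e. $\Delta\chi^2 = 1$):

```python
# Sketch: 68% interval on lam where ln L >= max(ln L) - 1/2  (Delta chi^2 <= 1).
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(2)
X = rng.poisson(lam=3.7, size=200)

lam_grid = np.linspace(2.5, 5.0, 4000)
lnL = np.array([np.sum(X * np.log(lv) - lv - gammaln(X + 1)) for lv in lam_grid])
inside = lnL >= lnL.max() - 0.5      # points within Delta ln L = 1/2 of the peak

print("lam_hat:", lam_grid[np.argmax(lnL)])
print(f"68% interval: [{lam_grid[inside][0]:.3f}, {lam_grid[inside][-1]:.3f}]")
```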

Fitting a line to data – 1

Fit a line $y = ax + b$ to a single data point:
- blue lines have $\chi^2 = 0$
- red lines have $\chi^2 = 1$

The $\chi^2$ contours in the $(a, b)$ plane look like this:

[Figure: the single data point with candidate lines, and the $\chi^2 = 0$ and $\chi^2 = 1$ contours in the $(a, b)$ plane.]

The solution is not unique, since 2 parameters are constrained by only 1 data point. Bayes: a prior $P(a, b)$ will determine the value of $a$.

Fitting a line to data – 2a

Fitting a line $y = ax + b$ to 2 data points:
- red lines give $\chi^2 = 2$
- blue line gives $\chi^2 = 0$

Note that $a$ and $b$ are not independent.

[Figure: the two data points with candidate lines, and the $(a, b)$ plane showing the $\chi^2 = 0$ solution and the tilted ellipse around it. All solutions $(a, b)$ lying on the red ellipse give $\chi^2 = 2$.]

Independent vs. correlated parameters

$a$ and $b$ are not independent in this example. To find the optimal $(a, b)$ we must:
- minimize $\chi^2$ with respect to $a$ at a sequence of fixed $b$ values,
- then minimize the resulting $\chi^2$ values with respect to $b$ (a sketch of this procedure follows below).

If $a$ and $b$ were independent, then all slices through the $\chi^2$ surface at each fixed $b$ would have the same shape, and similarly for $a$. We could then optimize them independently, saving a lot of calculation.

How can we make $a$ and $b$ independent of each other?
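Before answering that question, here is a minimal sketch of the profiling procedure just described, applied to two hypothetical data points (the data values, error bars, and grids are made up for illustration):

```python
# Sketch: minimise chi^2 over a at each fixed b, then over b.
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([3.1, 5.2])          # hypothetical measurements
sig = np.array([0.3, 0.3])        # hypothetical error bars

def chi2(a, b):
    return np.sum(((y - (a * x + b)) / sig) ** 2)

a_grid = np.linspace(0.0, 4.0, 401)   # 0.01 spacing
b_grid = np.linspace(-2.0, 4.0, 601)
profile = [min(chi2(a, b) for a in a_grid) for b in b_grid]  # best a at each fixed b
b_best = b_grid[int(np.argmin(profile))]
a_best = a_grid[int(np.argmin([chi2(a, b_best) for a in a_grid]))]

print("a, b =", a_best, b_best)   # the exact line through both points: a=2.1, b=1.0
```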

Fitting a line to data – 2b

Fitting a line $y = a(x - \bar{x}) + b$ to 2 data points:
- red lines give $\chi^2 = 2$
- blue line gives $\chi^2 = 0$

Note that $a$ and $b$ are now independent.

[Figure: the two data points with candidate lines, and the $(a, b)$ plane: the ellipse now has its axes aligned with $a$ and $b$. All solutions $(a, b)$ lying on the red ellipse give $\chi^2 = 2$.]
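A quick demonstration of the decorrelation (an added sketch; the x range, true line, and noise level are arbitrary): fit many noisy realisations in both parameterisations and compare the correlation between the fitted slope and intercept.

```python
# Sketch: centring x on its mean decorrelates the fitted slope and intercept.
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(5.0, 10.0, 20)        # x values far from 0 -> strong a-b correlation
raw, centred = [], []
for _ in range(2000):
    y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, x.size)
    raw.append(np.polyfit(x, y, 1))                 # fit y = a x + b
    centred.append(np.polyfit(x - x.mean(), y, 1))  # fit y = a (x - xbar) + b
raw, centred = np.array(raw), np.array(centred)

print("corr(a, b), raw x    :", np.corrcoef(raw[:, 0], raw[:, 1])[0, 1])          # near -1
print("corr(a, b), centred x:", np.corrcoef(centred[:, 0], centred[:, 1])[0, 1])  # near 0
```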

Intercept and slope for independent $a$, $b$

With weights $w_i = 1/\sigma_i^2$ and $\bar{x} = \sum_i w_i x_i / \sum_i w_i$, the centred fit $y = a(x - \bar{x}) + b$ gives:

Intercept:
$$\hat{b} = \frac{\sum_i w_i y_i}{\sum_i w_i}, \qquad \sigma_b^2 = \frac{1}{\sum_i w_i}$$

Slope:
$$\hat{a} = \frac{\sum_i w_i (x_i - \bar{x})\, y_i}{\sum_i w_i (x_i - \bar{x})^2}, \qquad \sigma_a^2 = \frac{1}{\sum_i w_i (x_i - \bar{x})^2}$$

[Figure: $\chi^2$ slices along $b$ and along $a$, each a parabola of the same shape at any fixed value of the other parameter.]
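These closed-form expressions are easy to verify numerically (an added sketch, using equal weights $w_i = 1$ for simplicity):

```python
# Sketch: closed-form centred-fit estimates match numpy's least squares.
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0.0, 9.0, 10)
y = 2.0 * x + 1.0 + rng.normal(0.0, 0.5, x.size)

xbar = x.mean()                    # equal weights: xbar is the plain mean
b_hat = y.mean()                   # intercept = (weighted) mean of y
a_hat = np.sum((x - xbar) * y) / np.sum((x - xbar) ** 2)  # slope

a_np, b_np = np.polyfit(x - xbar, y, 1)
print("closed form:", a_hat, b_hat)
print("numpy      :", a_np, b_np)
```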

Choosing orthogonal parameters

Good practice: the results for any one parameter should not depend on the values of the other parameters.

Example: fitting a Gaussian profile. The parameters to be fitted are:
- the width, $w$
- the area, $A$, or the peak value, $P$

Which is best? The area is independent of the width – good. The peak value depends on the width – bad.

[Figure: Gaussian profiles of different widths, labelled with peak value $P$ and area $A$.]
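The link between the two parameterisations is $A = P\, w \sqrt{2\pi}$, so at fixed area the peak must change whenever the width does, which is why (peak, width) make an awkward pair. A tiny added sketch of that relation:

```python
# Sketch: for a Gaussian of fixed area, the peak value varies with the width.
import numpy as np

A = 1.0                              # fixed area under the profile
for w in (0.5, 1.0, 2.0):
    P = A / (w * np.sqrt(2.0 * np.pi))   # peak implied by A = P * w * sqrt(2*pi)
    print(f"w = {w}: area = {A}, peak = {P:.3f}")
```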