Maximum Likelihood Estimation Navneet Goyal BITS, Pilani

Maximum Likelihood Estimation (MLE)  Parameter estimation is a fundamental problem in Data Analytics  MLE finds the most likely value for the parameter based on the data set collected  Other methods:  least squares  method of moments  Bayesian estimation  Applications of MLE  Relationship with other methods

Maximum Likelihood Estimation (MLE)  The method of maximum likelihood provides estimators that have both a reasonable intuitive basis and many desirable statistical properties.  The method is very broadly applicable and is simple to apply  A disadvantage of the method is that it frequently requires strong assumptions about the structure of the data © 2010 by John Fox York SPIDA

Maximum Likelihood Estimation (MLE)  A coin is flipped  What is the probability that it will fall Heads up?  What will you do?  Toss the coin a few times  3 H & 2 T  Prob. is 3/5  Why?  Because… Reference: Aarti Singh slides for ML Course, ML CMU

Maximum Likelihood Estimation (MLE)  Data D = {Xi} i=1,2,…N & Xi  {H,T}  P (H)= , P(T)=1-   Flips are iid Reference: Aarti Singh slides for ML Course, ML CMU

[Figure: population of seal counts — blue = 0 seals, red = 1 seal, purple = 2 seals]

Likelihood is not probability!

MLE Example  We want to estimate the probability π of getting a head upon flipping a particular coin.  We flip the coin ‘independently’ 10 times (i.e., we sample n = 10 flips), obtaining the following result: HHTHHHTTHH  The probability of obtaining this sequence — in advance of collecting the data — is a function of the unknown parameter π :  Pr(data|parameter) = Pr(HHTHHHTTHH| π ) = π 7 (1 − π ) 3  But the data for our particular sample are fixed: We have already collected them.  The parameter π also has a fixed value, but this value is unknown and so we can let it vary in our imagination between 0 and 1, treating the probability of the observed data as a function of π. © 2010 by John Fox York SPIDA

MLE Example  This function is called the likelihood function:  L(parameter|data) = Pr( π | HHTHHHTTHH) = π 7 (1 − π ) 3  The probability function and the likelihood function are given by the same equation, but the probability function is a function of the data with the value of the parameter fixed, while the likelihood function is a function of the parameter with the data fixed © 2010 by John Fox York SPIDA

MLE Example  Here are some representative values of the likelihood for different values of π : π L( π |data) = π 7 (1 − π ) © 2010 by John Fox York SPIDA

MLE Example
[Figure: plot of the likelihood function L(π | data) = π^7 (1 − π)^3 against π, peaking at π = 0.7]
© 2010 by John Fox, York SPIDA

MLE Example
- Although each value of L(π | data) is a notional probability, the function L(π | data) is not a probability or density function: it does not enclose an area of 1.
- The probability of obtaining the sample of data that we have in hand, HHTHHHTTHH, is small regardless of the true value of π. This is usually the case: any specific sample result, including the one that is realized, will have low probability.
- Nevertheless, the likelihood contains useful information about the unknown parameter π. For example, π cannot be 0 or 1, and is 'unlikely' to be close to 0 or 1.
- Reversing this reasoning, the value of π that is most supported by the data is the one for which the likelihood is largest.
- This value is the maximum-likelihood estimate (MLE), denoted π̂. Here, π̂ = 0.7, which is the sample proportion of heads, 7/10.
© 2010 by John Fox, York SPIDA
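The location of the maximum can be checked numerically on a fine grid; a small sketch (assuming NumPy is available):

```python
import numpy as np

# Evaluate the likelihood on a fine grid and locate its maximum;
# the argmax should match the sample proportion of heads, 7/10.
pi = np.linspace(0.0, 1.0, 1001)
L = pi ** 7 * (1 - pi) ** 3
print("MLE =", pi[np.argmax(L)])   # -> 0.7
```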

Likelihood Function (LF) vs. PDF
- The LF is a function of the parameter with the data fixed; the PDF is a function of the data with the parameter fixed.
- The LF is "unnormalized" probability: although each value of L(π | data) is a notional probability, the function L(π | data) is not a probability or density function, since it does not enclose an area of 1.
- The two functions are defined on different axes and are therefore not directly comparable: the PDF is defined on the data scale, while the LF is defined on the parameter scale.
© 2010 by John Fox, York SPIDA

Generalization
- Consider n independent flips of a coin, yielding a particular sequence with x heads and n − x tails.
- L(π | data) = Pr(data | π) = ? By the same reasoning as before, this is π^x (1 − π)^(n − x).
- We want the value of π that maximizes L(π | data).
- It is simpler, and equivalent, to find the value of π that maximizes the log of the likelihood: log L(π | data) = x log π + (n − x) log(1 − π)
- Setting the derivative x/π − (n − x)/(1 − π) to zero gives π̂ = x/n, the sample proportion of heads.
© 2010 by John Fox, York SPIDA
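The same maximization can be done numerically; a sketch (assuming NumPy and SciPy are available) that minimizes the negative log-likelihood and compares the result with the closed form x/n:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Maximize log L(pi) = x*log(pi) + (n - x)*log(1 - pi) by minimizing
# its negative; the closed-form answer is pi_hat = x / n.
x, n = 7, 10

def neg_log_likelihood(pi):
    return -(x * np.log(pi) + (n - x) * np.log(1 - pi))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-9, 1 - 1e-9), method="bounded")
print("numeric MLE:", res.x, "   closed form:", x / n)
```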