Maximum Likelihood Estimation

Maximum Likelihood Estimation
(Probabilistic Graphical Models, Learning: Parameter Estimation)

Biased Coin Example
The tosses are sampled IID from a Bernoulli distribution P with parameter θ: P(X = 1) = θ, P(X = 0) = 1 − θ.
Tosses are independent of each other.
Tosses are sampled from the same distribution (identically distributed).

IID as a PGM
[Figure: plate model in which the parameter node θ is a parent of each toss X[m]; the nodes X[1], ..., X[M] sit inside a plate indexed by m, labeled "Data".]

Maximum Likelihood Estimation
Goal: find θ ∈ [0, 1] that predicts D well.
Prediction quality = likelihood of D given θ: L(D : θ) = P(D | θ).
[Figure: the likelihood L(D : θ) plotted as a function of θ over the interval [0, 1].]

Maximum Likelihood Estimator
Observations: M_H heads and M_T tails.
Find θ maximizing the likelihood L(D : θ) = θ^(M_H) (1 − θ)^(M_T).
Equivalent to maximizing the log-likelihood: log L(D : θ) = M_H log θ + M_T log(1 − θ).
Differentiating the log-likelihood and solving for θ gives the MLE: θ̂ = M_H / (M_H + M_T).
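The closed form can be checked numerically against the log-likelihood. A minimal Python sketch (the counts and function name are illustrative, not from the slides):

```python
import numpy as np

def bernoulli_log_likelihood(theta, m_heads, m_tails):
    # log L(D : theta) = M_H log(theta) + M_T log(1 - theta)
    return m_heads * np.log(theta) + m_tails * np.log(1 - theta)

# Illustrative data: 7 heads, 3 tails
m_heads, m_tails = 7, 3

# Closed-form MLE from the slide: theta_hat = M_H / (M_H + M_T)
theta_hat = m_heads / (m_heads + m_tails)

# Numerical check: no theta on a fine grid beats the closed form
grid = np.linspace(0.01, 0.99, 99)
best_on_grid = grid[np.argmax(bernoulli_log_likelihood(grid, m_heads, m_tails))]
print(theta_hat, best_on_grid)  # both close to 0.7
```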

Sufficient Statistics
For computing θ̂ in the coin-toss example, we only needed M_H and M_T, since θ̂ = M_H / (M_H + M_T).
M_H and M_T are sufficient statistics.

Sufficient Statistics
A function s(D) from instances to a vector in ℝ^k is a sufficient statistic if, for any two datasets D and D' and any θ:
s(D) = s(D') implies L(D : θ) = L(D' : θ).
[Figure: many datasets mapping to the same point in the space of statistics.]

Sufficient Statistic for Multinomial
For a dataset D over a variable X with k values, the sufficient statistics are the counts <M_1, ..., M_k>, where M_i is the number of times that X[m] = x_i in D.
The per-instance sufficient statistic s(x) is a tuple of dimension k: s(x_i) = (0, ..., 0, 1, 0, ..., 0), with the 1 in position i.
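To make the per-instance view concrete: the dataset-level counts <M_1, ..., M_k> are just the sum of the one-hot vectors s(x[m]). A minimal sketch (the sample data are illustrative):

```python
import numpy as np

k = 3  # number of values of X
data = [0, 2, 1, 0, 0, 2]  # illustrative dataset: each X[m] as a value index

def s(x, k):
    # Per-instance sufficient statistic: one-hot vector with a 1 in position x
    one_hot = np.zeros(k, dtype=int)
    one_hot[x] = 1
    return one_hot

# Dataset statistic = sum of per-instance statistics = counts <M_1, ..., M_k>
counts = sum(s(x, k) for x in data)
print(counts)  # [3 1 2]
```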

Sufficient Statistic for Gaussian
Gaussian distribution: p(x) = (1 / (√(2π) σ)) exp(−(x − μ)² / (2σ²)).
Rewrite as p(x) ∝ exp(−x² / (2σ²) + μx / σ² − μ² / (2σ²)), an exponential of a linear function of 1, x, and x².
Sufficient statistics for the Gaussian: s(x) = <1, x, x²>.

Maximum Likelihood Estimation
MLE Principle: choose θ to maximize L(D : θ).
Multinomial MLE: θ̂_i = M_i / (M_1 + ... + M_k).
Gaussian MLE: μ̂ = (1/M) Σ_m x[m], σ̂² = (1/M) Σ_m (x[m] − μ̂)².
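Both closed forms can be computed directly from the sufficient statistics. A minimal sketch (counts and sample values are illustrative; note that σ̂² = (1/M) Σ x² − μ̂², so the Gaussian MLE needs only <M, Σx, Σx²>):

```python
import numpy as np

# Multinomial MLE from counts <M_1, ..., M_k>: theta_i = M_i / sum_j M_j
counts = np.array([3, 1, 2])
theta_hat = counts / counts.sum()

# Gaussian MLE from sufficient statistics <sum 1, sum x, sum x^2>
x = np.array([1.2, 0.7, 2.3, 1.9, 1.4])  # illustrative sample
n, s1, s2 = len(x), x.sum(), (x ** 2).sum()
mu_hat = s1 / n
sigma_hat = np.sqrt(s2 / n - mu_hat ** 2)  # equals the mean of (x - mu_hat)^2

print(theta_hat, mu_hat, sigma_hat)
```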

Summary
Maximum likelihood estimation is a simple principle for selecting parameters given data D.
The likelihood function is uniquely determined by sufficient statistics that summarize D.
The MLE has a closed-form solution for many parametric distributions.
