Maximum Likelihood Estimation


Maximum Likelihood Estimation
Berlin Chen
Department of Computer Science & Information Engineering, National Taiwan Normal University
Reference: Ethem Alpaydin, Introduction to Machine Learning, Chapter 4, MIT Press, 2004

Sample Statistics and Population Parameters: A Schematic Depiction
[Schematic: a sample is drawn from the population (sample space); statistics computed from the sample are used to make inferences about the population parameters.]

Introduction
- Statistic: any value (or function) that is calculated from a given sample
- Statistical inference: make a decision using the information provided by a sample (a set of examples/instances)
- Parametric methods
  - Assume that examples are drawn from some distribution that obeys a known model
  - Advantage: the model is well defined up to a small number of parameters
    - E.g., mean and variance are sufficient statistics for the Gaussian distribution
  - Model parameters are typically estimated by either maximum likelihood estimation (MLE) or Bayesian (MAP) estimation

Maximum Likelihood Estimation (MLE) (1/2)
- Assume the instances $\mathcal{X} = \{x^t\}_{t=1}^{N}$ are independent and identically distributed (iid), and drawn from some known probability distribution $p(x \mid \theta)$
  - $\theta$: model parameters (assumed to be fixed but unknown here)
- MLE attempts to find the $\theta$ that makes $\mathcal{X}$ the most likely to be drawn, namely, to maximize the likelihood of the instances
$$L(\theta \mid \mathcal{X}) \equiv p(\mathcal{X} \mid \theta) = \prod_{t=1}^{N} p(x^t \mid \theta)$$

MLE (2/2)
- Because the logarithm does not change where $L(\theta \mid \mathcal{X})$ takes its maximum (the logarithm is monotonically increasing), finding the $\theta$ that maximizes the likelihood of the instances is equivalent to finding the $\theta$ that maximizes the log likelihood of the sample instances (the $\theta$ that best fits the instances)
$$\mathcal{L}(\theta \mid \mathcal{X}) \equiv \log L(\theta \mid \mathcal{X}) = \sum_{t=1}^{N} \log p(x^t \mid \theta)$$
- As we shall see, the logarithmic operation further simplifies the computation when estimating the parameters of distributions that have exponents
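A minimal Python sketch of this idea (mine, not the slides'): the log likelihood of an iid sample is a sum of per-instance log probabilities, and the MLE is the parameter that maximizes that sum. The grid search over a Bernoulli parameter p is purely illustrative; the next slides derive the closed-form estimate instead.

```python
import numpy as np

def log_likelihood(sample, p):
    """Log likelihood of iid Bernoulli (0/1) instances under parameter p."""
    x = np.asarray(sample, dtype=float)
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# Toy sample: 7 ones and 3 zeros.
x = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]

# Grid search over candidate parameters; the MLE is the argmax.
grid = np.linspace(0.01, 0.99, 99)
scores = [log_likelihood(x, p) for p in grid]
p_hat = grid[int(np.argmax(scores))]
print(p_hat)  # ~0.7, matching the sample average
```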

MLE: Bernoulli Distribution (1/3)
- A random variable $x$ takes either the value 1 (with probability $p$) or the value 0 (with probability $1-p$)
  - Can be thought of as being generated from two distinct states
- The associated probability distribution
$$P(x) = p^{x}(1-p)^{1-x}, \quad x \in \{0, 1\}$$
- The log likelihood for a set of iid instances $\mathcal{X} = \{x^t\}_{t=1}^{N}$ drawn from a Bernoulli distribution
$$\mathcal{L}(p \mid \mathcal{X}) = \sum_{t=1}^{N} \left[ x^t \log p + (1 - x^t) \log(1 - p) \right]$$

MLE: Bernoulli Distribution (2/3)
- MLE of the distribution parameter $p$
$$\hat{p} = \frac{\sum_{t=1}^{N} x^t}{N}$$
  - The estimate for $p$ is the ratio of the number of occurrences of the event ($x^t = 1$) to the number of experiments
- The expected value of $x$: $E[x] = p$
- The variance of $x$: $\operatorname{Var}(x) = p(1-p)$
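A hedged Python sketch of the closed-form Bernoulli estimate (the function and variable names are my own): p̂ is simply the fraction of ones in the sample.

```python
import numpy as np

def bernoulli_mle(sample):
    """MLE of the Bernoulli parameter: fraction of successes in the sample."""
    return np.asarray(sample, dtype=float).mean()

x = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]
p_hat = bernoulli_mle(x)
print(p_hat)                # 0.7
print(p_hat * (1 - p_hat))  # plug-in variance estimate p(1-p) = 0.21
```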

MLE: Bernoulli Distribution (3/3)
- Appendix A: derivation showing that the maximum likelihood estimate of the mean is the sample average (reconstructed below)
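The algebra of Appendix A did not survive the transcript; the following is a standard reconstruction: set the derivative of the Bernoulli log likelihood to zero and solve for p.

$$\frac{\partial \mathcal{L}(p \mid \mathcal{X})}{\partial p} = \sum_{t=1}^{N} \left[ \frac{x^t}{p} - \frac{1 - x^t}{1 - p} \right] = 0 \;\Longrightarrow\; (1 - p)\sum_{t} x^t = p \left( N - \sum_{t} x^t \right) \;\Longrightarrow\; \hat{p} = \frac{\sum_{t=1}^{N} x^t}{N}$$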

MLE: Multinomial Distribution (1/4)
- A generalization of the Bernoulli distribution
  - The value of a random variable can be one of $K$ mutually exclusive and exhaustive states, with probabilities $p_1, p_2, \ldots, p_K$ respectively ($\sum_{i=1}^{K} p_i = 1$)
- The associated probability distribution, with indicator variables $x_i \in \{0, 1\}$ and $\sum_{i=1}^{K} x_i = 1$:
$$P(x_1, \ldots, x_K) = \prod_{i=1}^{K} p_i^{x_i}$$
- The log likelihood for a set of iid instances $\mathcal{X} = \{\mathbf{x}^t\}_{t=1}^{N}$ drawn from a multinomial distribution
$$\mathcal{L}(p_1, \ldots, p_K \mid \mathcal{X}) = \sum_{t=1}^{N} \sum_{i=1}^{K} x_i^t \log p_i$$

MLE: Multinomial Distribution (2/4)
- MLE of the distribution parameters $p_i$
$$\hat{p}_i = \frac{\sum_{t=1}^{N} x_i^t}{N}$$
  - The estimate for $p_i$ is the ratio of the number of experiments with outcome of state $i$ ($x_i^t = 1$) to the number of experiments
- Running example: drawing balls from an urn (worked out on slide 4/4)

MLE: Multinomial Distribution (3/4)
- Appendix B: the estimates $\hat{p}_i$ follow from maximizing the log likelihood subject to the constraint $\sum_{i=1}^{K} p_i = 1$, handled with a Lagrange multiplier (derivation reconstructed below)
- Lagrange multiplier tutorial: http://www.slimy.com/~steuard/teaching/tutorials/Lagrange.html
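The slide's derivation is lost in the transcript; a standard reconstruction of the Lagrange-multiplier argument follows. Adjoin the constraint with multiplier $\lambda$, set the partial derivatives to zero, and use $\sum_i x_i^t = 1$ to identify $\lambda = N$.

$$J(p_1, \ldots, p_K, \lambda) = \sum_{t=1}^{N} \sum_{i=1}^{K} x_i^t \log p_i - \lambda \left( \sum_{i=1}^{K} p_i - 1 \right)$$
$$\frac{\partial J}{\partial p_i} = \frac{\sum_t x_i^t}{p_i} - \lambda = 0 \;\Longrightarrow\; p_i = \frac{\sum_t x_i^t}{\lambda}$$
$$\sum_{i} p_i = 1 \;\Longrightarrow\; \lambda = \sum_{i}\sum_{t} x_i^t = N \;\Longrightarrow\; \hat{p}_i = \frac{\sum_{t=1}^{N} x_i^t}{N}$$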

MLE: Multinomial Distribution (4/4)
- Urn example: 10 draws yield 3 black, 4 white, and 3 red balls, so the maximum likelihood estimates are
  - P(B) = 3/10, P(W) = 4/10, P(R) = 3/10
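A small Python sketch of the multinomial estimate applied to the urn example (the particular draw sequence below is invented, chosen only to be consistent with the slide's counts):

```python
from collections import Counter

def multinomial_mle(draws):
    """MLE of multinomial parameters: relative frequency of each state."""
    counts = Counter(draws)
    n = len(draws)
    return {state: count / n for state, count in counts.items()}

# 10 draws: 3 black (B), 4 white (W), 3 red (R).
draws = ["B", "W", "R", "W", "B", "R", "W", "B", "W", "R"]
print(multinomial_mle(draws))  # {'B': 0.3, 'W': 0.4, 'R': 0.3}
```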

MLE: Gaussian Distribution (1/3)
- Also called the normal distribution, characterized by the mean $\mu$ and the variance $\sigma^2$
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right]$$
  - Recall that the mean and variance are sufficient statistics for the Gaussian
- The log likelihood for a set of iid instances drawn from a Gaussian distribution
$$\mathcal{L}(\mu, \sigma \mid \mathcal{X}) = -\frac{N}{2}\log(2\pi) - N\log\sigma - \frac{\sum_{t=1}^{N}(x^t - \mu)^2}{2\sigma^2}$$

MLE: Gaussian Distribution (2/3)
- MLE of the distribution parameters $\mu$ and $\sigma^2$ (remember that $\mu$ and $\sigma^2$ are still fixed but unknown)
  - Sample average: $m = \dfrac{\sum_{t=1}^{N} x^t}{N}$
  - Sample variance: $s^2 = \dfrac{\sum_{t=1}^{N} (x^t - m)^2}{N}$
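A brief Python sketch (my own, not the slide's): the Gaussian MLEs are the sample average and the biased, divide-by-N sample variance.

```python
import numpy as np

def gaussian_mle(sample):
    """MLE of Gaussian parameters: sample average and divide-by-N variance."""
    x = np.asarray(sample, dtype=float)
    m = x.mean()
    s2 = np.mean((x - m) ** 2)  # equivalent to x.var(ddof=0)
    return m, s2

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=1000)
m, s2 = gaussian_mle(x)
print(m, s2)  # close to mu = 5 and sigma^2 = 4
```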

MLE: Gaussian Distribution (3/3)
- Appendix C: derivation of the Gaussian MLEs by setting the partial derivatives of the log likelihood to zero (reconstructed below)
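As with the other appendices, the algebra is missing from the transcript; a standard reconstruction:

$$\frac{\partial \mathcal{L}}{\partial \mu} = \frac{\sum_t (x^t - \mu)}{\sigma^2} = 0 \;\Longrightarrow\; m = \frac{\sum_{t=1}^{N} x^t}{N}$$
$$\frac{\partial \mathcal{L}}{\partial \sigma} = -\frac{N}{\sigma} + \frac{\sum_t (x^t - \mu)^2}{\sigma^3} = 0 \;\Longrightarrow\; s^2 = \frac{\sum_{t=1}^{N} (x^t - m)^2}{N}$$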

Evaluating an Estimator: Bias and Variance (1/6)
- Let $d = d(\mathcal{X})$ be an estimator of the unknown parameter $\theta$; note that $\theta$ is a constant and $E[d]$ is also a constant
- The mean square error of the estimator can be decomposed into two parts, composed of variance and bias respectively:
$$E\!\left[(d - \theta)^2\right] = \underbrace{E\!\left[(d - E[d])^2\right]}_{\text{variance}} + \underbrace{\left(E[d] - \theta\right)^2}_{\text{bias}^2}$$
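The intermediate algebra behind the decomposition is not in the transcript; a standard expansion, using the slide's hints that $\theta$ and $E[d]$ are constants, so the cross term vanishes:

$$E\!\left[(d - \theta)^2\right] = E\!\left[\big((d - E[d]) + (E[d] - \theta)\big)^2\right] = E\!\left[(d - E[d])^2\right] + 2\,(E[d] - \theta)\,\underbrace{E\big[d - E[d]\big]}_{=\,0} + (E[d] - \theta)^2$$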

Evaluating an Estimator: Bias and Variance (2/6)
- Bias of an estimator: $b_\theta(d) = E[d] - \theta$; if $b_\theta(d) = 0$ for all $\theta$, then $d$ is an unbiased estimator of $\theta$
- Variance of an estimator: $E\!\left[(d - E[d])^2\right]$, which measures how much the estimate varies from sample to sample

Evaluating an Estimator: Bias and Variance (3/6)
- Example 1: sample average and sample variance
- Assume the samples $x^t$ are independent and identically distributed (iid), drawn from some known probability distribution with mean $\mu$ and variance $\sigma^2$
  - Mean: $\mu = E[x]$
  - Variance: $\sigma^2 = E[(x - \mu)^2]$ (for discrete random variables, the expectations are sums weighted by the probability mass)
- Sample average (mean) of the observed samples: $m = \dfrac{1}{N}\sum_{t=1}^{N} x^t$
- Sample variance of the observed samples: $s^2 = \dfrac{1}{N}\sum_{t=1}^{N} (x^t - m)^2$
- Are $m$ and $s^2$ good estimators of $\mu$ and $\sigma^2$? (an estimator is itself a random variable; a value computed from a particular sample is an estimate)

Evaluating an Estimator: Bias and Variance (4/6)
- Example 1 (cont.)
- The sample average $m$ is an unbiased estimator of the mean $\mu$:
$$E[m] = E\!\left[\frac{1}{N}\sum_{t} x^t\right] = \frac{1}{N}\sum_{t} E[x^t] = \mu$$
- $m$ is also a consistent estimator: $\operatorname{Var}(m) = \dfrac{\sigma^2}{N} \to 0$ as $N \to \infty$
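A quick Monte Carlo check in Python (entirely my own sketch, not part of the slides): over many repeated samples, the average of m stays near μ while its variance shrinks like σ²/N.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, trials = 5.0, 2.0, 10_000

for n in (10, 100, 1000):
    # Draw `trials` independent samples of size n; compute m for each.
    m = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
    print(n, m.mean(), m.var(), sigma**2 / n)
    # m.mean() ~ mu (unbiased); m.var() ~ sigma^2 / n (consistent)
```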

Evaluating an Estimator: Bias and Variance (5/6)
- Example 1 (cont.)
- The sample variance $s^2$ is an asymptotically unbiased estimator of the variance $\sigma^2$:
$$E[s^2] = \frac{N-1}{N}\,\sigma^2 \ne \sigma^2, \quad \text{but} \quad \frac{N-1}{N}\,\sigma^2 \to \sigma^2 \text{ as } N \to \infty$$

Evaluating an Estimator: Bias and Variance (6/6)
- Example 1 (cont.)
$$E[s^2] = \frac{N-1}{N}\,\sigma^2, \quad \text{where } N \text{ is the size of the observed sample set}$$
- As $N$ grows, the factor $\frac{N-1}{N} \to 1$, so $s^2$ is asymptotically unbiased; dividing by $N-1$ instead of $N$ gives an exactly unbiased estimator
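Another hedged simulation sketch (mine): averaging s² over many samples reproduces the (N−1)/N shrinkage, and dividing by N−1 removes it.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, trials, n = 4.0, 100_000, 5

samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
s2 = samples.var(axis=1, ddof=0)            # divide-by-N sample variance
print(s2.mean(), (n - 1) / n * sigma2)      # both ~ 3.2: biased low
print(samples.var(axis=1, ddof=1).mean())   # ~ 4.0: the unbiased version
```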

Bias and Variance: Example 2
[Figure: estimates computed from different samples drawn from an unknown population scatter around the true parameter; the spread reflects the variance and the systematic offset reflects the bias (error of measurement).]

Big Data => Deep Learning
- Big data has driven deep learning (deep neural networks, etc.) and other complicated or sophisticated modeling structures
- Simple is elegant? Occam's razor favors simple solutions over complex ones