Basics of Statistical Estimation

Learning Probabilities: Classical Approach
Simplest case: flipping a thumbtack, which lands either heads or tails. The true probability θ of heads is unknown. Given iid data, estimate θ using an estimator with good properties: low bias, low variance, consistent (e.g., the maximum likelihood estimate).

Maximum Likelihood Principle
Choose the parameter values that maximize the probability of the observed data.

Maximum Likelihood Estimation
The number of heads is binomially distributed, so the likelihood of a data set D with #h heads and #t tails is p(D | θ) = θ^#h (1 - θ)^#t (up to the binomial coefficient, which does not depend on θ).

Computing the ML Estimate
Use the log-likelihood: log p(D | θ) = #h log θ + #t log(1 - θ).
Differentiate with respect to the parameter(s), equate to zero, and solve.
Solution: θ_ML = #h / (#h + #t).
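As a quick illustration of the recipe above, here is a minimal Python sketch (not part of the original slides; the helper names ml_estimate and log_likelihood are made up for this example) that computes θ_ML and the log-likelihood from a list of flips.

```python
# Illustrative sketch, not from the slides: maximum-likelihood estimation of
# the heads probability theta from a sequence of thumbtack flips.
import math

def ml_estimate(flips):
    """Return theta_ML = #h / (#h + #t) for a list of 'H'/'T' outcomes."""
    n_heads = sum(1 for f in flips if f == 'H')
    n_tails = len(flips) - n_heads
    return n_heads / (n_heads + n_tails)

def log_likelihood(theta, n_heads, n_tails):
    """log p(D | theta) = #h * log(theta) + #t * log(1 - theta)."""
    return n_heads * math.log(theta) + n_tails * math.log(1.0 - theta)

flips = ['H', 'T', 'H', 'H', 'T']      # example data: 3 heads, 2 tails
theta_ml = ml_estimate(flips)          # 0.6
print(theta_ml, log_likelihood(theta_ml, 3, 2))
```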

Sufficient Statistics
The counts (#h, #t) are sufficient statistics: they carry all the information in the data needed to evaluate the likelihood.

Bayesian Estimation
The true probability θ of heads is unknown. Represent the uncertainty with a Bayesian probability density p(θ) for θ ∈ [0, 1].

Use of Bayes' Theorem
p(θ | D) = p(D | θ) p(θ) / p(D), i.e., posterior ∝ likelihood × prior.

Example: Application to Observation of a Single "Heads"
With prior p(θ) and likelihood p(heads | θ) = θ, the posterior is p(θ | heads) ∝ θ p(θ).

Probability of Heads on Next Toss
p(heads on next toss | D) = ∫ θ p(θ | D) dθ = E[θ | D], the expected value of θ under the posterior.
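In the conjugate setting developed in the slides below this expectation has a closed form; as a sanity check, the following sketch (illustrative, not from the slides, assuming numpy is available) approximates the posterior and the predictive probability numerically on a grid.

```python
# Illustrative sketch, not from the slides: grid approximation of the
# posterior p(theta | D) and of p(heads on next toss | D) = E[theta | D].
import numpy as np

def posterior_and_predictive(n_heads, n_tails, prior=None, grid_size=1001):
    theta = np.linspace(0.0, 1.0, grid_size)
    prior_vals = np.ones_like(theta) if prior is None else prior(theta)
    likelihood = theta**n_heads * (1.0 - theta)**n_tails
    unnorm = prior_vals * likelihood
    posterior = unnorm / unnorm.sum()                # normalize over the grid
    p_heads_next = float((theta * posterior).sum())  # E[theta | D]
    return theta, posterior, p_heads_next

# Single observed "heads" with a uniform prior: posterior is proportional
# to theta, and the predictive probability of heads is 2/3.
_, _, p_next = posterior_and_predictive(n_heads=1, n_tails=0)
print(round(p_next, 3))   # approximately 0.667
```

With a uniform (Beta(1, 1)) prior and one observed heads, this matches the closed-form posterior prediction (α_h + #h) / (α_h + α_t + #h + #t) = 2/3 given later in the deck.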

MAP Estimation
Approximation:
– Instead of averaging over all parameter values
– Consider only the most probable value (i.e., the value with the highest posterior probability)
Usually a very good approximation, and much simpler.
MAP value ≠ expected value.
MAP → ML for infinite data (as long as the prior ≠ 0 everywhere).

Prior Distributions for θ
Direct assessment
Parametric distributions:
– Conjugate distributions (for convenience)
– Mixtures of conjugate distributions

Conjugate Family of Distributions
Beta distribution: p(θ) = Beta(θ | α_h, α_t) ∝ θ^(α_h - 1) (1 - θ)^(α_t - 1).
Resulting posterior distribution: p(θ | D) = Beta(θ | α_h + #h, α_t + #t) ∝ θ^(α_h + #h - 1) (1 - θ)^(α_t + #t - 1).
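A minimal sketch of this conjugate update (illustrative, not from the slides; it assumes scipy is available, and beta_posterior is a made-up helper name):

```python
# Illustrative sketch, not from the slides: conjugate Beta prior -> Beta
# posterior update, where alpha_h and alpha_t are the imaginary counts.
from scipy.stats import beta

def beta_posterior(alpha_h, alpha_t, n_heads, n_tails):
    """Beta(alpha_h, alpha_t) prior plus (#h, #t) data gives
    Beta(alpha_h + #h, alpha_t + #t)."""
    return beta(alpha_h + n_heads, alpha_t + n_tails)

post = beta_posterior(alpha_h=2, alpha_t=2, n_heads=3, n_tails=1)
print(post.mean())                   # (2 + 3) / (2 + 2 + 3 + 1) = 0.625
print(post.pdf(0.5), post.pdf(0.9))  # posterior density at two values of theta
```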

Estimates Compared
Prior prediction: p(heads) = α_h / (α_h + α_t)
Posterior prediction: p(heads | D) = (α_h + #h) / (α_h + α_t + #h + #t)
MAP estimate: θ_MAP = (α_h + #h - 1) / (α_h + α_t + #h + #t - 2)
ML estimate: θ_ML = #h / (#h + #t)
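These four quantities are easy to compare numerically; the sketch below (illustrative, not from the slides; the helper estimates is made up for this example) evaluates them for one choice of prior and data.

```python
# Illustrative sketch, not from the slides: the four estimates above, computed
# from hyperparameters (alpha_h, alpha_t) and observed counts (#h, #t).
def estimates(alpha_h, alpha_t, n_h, n_t):
    prior_pred = alpha_h / (alpha_h + alpha_t)
    post_pred  = (alpha_h + n_h) / (alpha_h + alpha_t + n_h + n_t)
    map_est    = (alpha_h + n_h - 1) / (alpha_h + alpha_t + n_h + n_t - 2)
    ml_est     = n_h / (n_h + n_t)
    return prior_pred, post_pred, map_est, ml_est

# Beta(3, 2) prior with 6 heads and 4 tails observed:
print(estimates(alpha_h=3, alpha_t=2, n_h=6, n_t=4))
# prior prediction 0.6, posterior prediction 0.6, MAP approx. 0.615, ML 0.6
```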

Intuition
The hyperparameters α_h and α_t can be thought of as imaginary counts from our prior experience, starting from "pure ignorance."
Equivalent sample size = α_h + α_t.
The larger the equivalent sample size, the more confident we are about the true probability.

Beta Distributions
Example density plots: Beta(3, 2), Beta(1, 1), Beta(19, 39), Beta(0.5, 0.5).
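To get a feel for these shapes without the plots, the densities can be evaluated on a coarse grid; a small sketch, assuming scipy is available (illustrative, not part of the original slides):

```python
# Illustrative sketch, not from the slides: density values of the four example
# Beta distributions, evaluated on a coarse grid over (0, 1).
import numpy as np
from scipy.stats import beta

params = [(3, 2), (1, 1), (19, 39), (0.5, 0.5)]
theta = np.linspace(0.05, 0.95, 7)
for a, b in params:
    print(f"Beta({a}, {b}):", np.round(beta.pdf(theta, a, b), 2))
```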

Assessment of a Beta Distribution
Method 1: Equivalent sample
– assess α_h and α_t, or
– assess α_h + α_t and α_h / (α_h + α_t)
Method 2: Imagined future samples
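A small sketch of the second variant of Method 1 (illustrative, not from the slides; beta_from_assessment is a made-up helper): turning an assessed mean and equivalent sample size into the hyperparameters.

```python
# Illustrative sketch, not from the slides: convert an assessed mean
# p = alpha_h / (alpha_h + alpha_t) and equivalent sample size
# n_eq = alpha_h + alpha_t into Beta hyperparameters.
def beta_from_assessment(mean_heads, equivalent_sample_size):
    alpha_h = mean_heads * equivalent_sample_size
    alpha_t = equivalent_sample_size - alpha_h
    return alpha_h, alpha_t

print(beta_from_assessment(0.3, 10))   # (3.0, 7.0), i.e. a Beta(3, 7) prior
```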

Generalization to m Outcomes (Multinomial Distribution)
Dirichlet distribution: p(θ_1, ..., θ_m) = Dir(θ | α_1, ..., α_m) ∝ ∏_k θ_k^(α_k - 1), with θ_k ≥ 0 and Σ_k θ_k = 1.
Properties: the posterior after observing counts N_1, ..., N_m is Dir(θ | α_1 + N_1, ..., α_m + N_m), and the predictive probability of outcome k is (α_k + N_k) / Σ_j (α_j + N_j).
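A minimal sketch of the Dirichlet-multinomial update (illustrative, not from the slides), assuming numpy is available:

```python
# Illustrative sketch, not from the slides: the Dirichlet is conjugate to the
# multinomial, so the hyperparameters alpha_k update by adding the counts N_k.
import numpy as np

alpha = np.array([1.0, 1.0, 1.0])   # Dirichlet prior over m = 3 outcomes
counts = np.array([5, 2, 3])        # observed counts N_k
posterior_alpha = alpha + counts    # posterior is Dirichlet(alpha_k + N_k)

# Predictive probability of outcome k on the next trial:
# p(k | D) = (alpha_k + N_k) / sum_j (alpha_j + N_j)
print(posterior_alpha / posterior_alpha.sum())   # [6/13, 3/13, 4/13]
```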

Other Distributions
Likelihoods from the exponential family:
– Binomial
– Multinomial
– Poisson
– Gamma
– Normal