Bayesian Estimation and Confidence Intervals Lecture XXII.


Bayesian Estimation Implicitly, in our previous discussions of estimation we adopted a classical viewpoint. –We had some process generating random observations. –This random process was a function of fixed, but unknown, parameters. –We then designed procedures to estimate these unknown parameters based on observed data.

Specifically, suppose that a random process, such as the admission of students to the University of Florida, generates heights, and that this height process can be characterized by a normal distribution. –We can estimate the parameters of this distribution using maximum likelihood.

–The likelihood of a particular sample x₁, …, xₙ can be expressed as L(μ, σ² | x₁, …, xₙ) = ∏ᵢ (2πσ²)^(−1/2) exp[−(xᵢ − μ)²/(2σ²)]. –Our estimates of μ and σ² are then the values of each parameter that maximize the likelihood of drawing that sample.
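As a minimal sketch (not from the original slides), the maximum likelihood estimates for a normal sample have closed forms: the sample mean and the uncorrected sample variance. The snippet below, assuming a small simulated height sample, computes them and evaluates the log-likelihood at those values.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
heights = rng.normal(loc=70.0, scale=3.0, size=200)  # hypothetical height sample (inches)

# Closed-form maximum likelihood estimates for a normal sample
mu_hat = heights.mean()              # ML estimate of the mean
sigma2_hat = heights.var(ddof=0)     # ML estimate of the variance (divide by n, not n-1)

# Log-likelihood of the sample evaluated at the ML estimates
loglik = norm.logpdf(heights, loc=mu_hat, scale=np.sqrt(sigma2_hat)).sum()
print(mu_hat, sigma2_hat, loglik)
```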

Turning this process around slightly, Bayesian analysis assumes that we can make some kind of probability statement about parameters before we start. The sample is then used to update our prior distribution.

–First, assume that our prior beliefs about the parameter can be expressed as a probability density function π(θ), where θ is the parameter we are interested in estimating. –Based on a sample (through the likelihood function L(X|θ)) we can update our knowledge of the distribution using Bayes' rule: π(θ|X) = L(X|θ) π(θ) / ∫ L(X|θ) π(θ) dθ.
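A minimal grid-based sketch of this updating step (the flat prior and the small Bernoulli sample here are illustrative assumptions, not from the lecture): the posterior is proportional to likelihood times prior, renormalized so it integrates to one.

```python
import numpy as np

theta = np.linspace(0.001, 0.999, 999)   # grid of candidate parameter values
prior = np.ones_like(theta)              # assumed flat prior pi(theta)
x = np.array([1, 0, 1, 1, 0])            # hypothetical Bernoulli observations

# Likelihood of the sample at each grid point
likelihood = theta**x.sum() * (1 - theta)**(len(x) - x.sum())

# Bayes' rule: posterior proportional to likelihood x prior, normalized on the grid
posterior = likelihood * prior
posterior /= np.trapz(posterior, theta)

print(np.trapz(theta * posterior, theta))   # posterior mean of theta
```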

Departing from the book’s example, assume that we want a prior for the parameter of a Bernoulli distribution. Our prior is that P, the success probability in the Bernoulli distribution, is distributed B(α, β), a beta distribution.

Assume that we are interested in forming the posterior distribution after a single draw X: π(P|X) = [P^X (1−P)^(1−X) · P^(α−1) (1−P)^(β−1) / B(α, β)] / ∫₀¹ P^X (1−P)^(1−X) · P^(α−1) (1−P)^(β−1) / B(α, β) dP.

Following the original specification of the beta function, B(α, β) = ∫₀¹ P^(α−1) (1−P)^(β−1) dP = Γ(α)Γ(β)/Γ(α+β), so the denominator integral above equals B(α+X, β−X+1)/B(α, β).

The posterior distribution, the distribution of P after the observation, is then π(P|X) = P^(α+X−1) (1−P)^(β−X) / B(α+X, β−X+1), i.e., a B(α+X, β−X+1) distribution.
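Because the beta prior is conjugate to the Bernoulli likelihood, this update can be checked directly with scipy; the prior parameters below are arbitrary illustrative values, not those used in the lecture.

```python
from scipy.stats import beta

alpha0, beta0 = 1.5, 1.5     # assumed prior parameters (illustrative only)
x = 1                        # a single Bernoulli draw: 1 = head, 0 = tail

# Conjugate update: Beta prior + Bernoulli likelihood -> Beta posterior
posterior = beta(alpha0 + x, beta0 + 1 - x)
print(posterior.mean())      # (alpha0 + x) / (alpha0 + beta0 + 1)
```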

The Bayesian estimate of P is then the value that minimizes the expected posterior loss. Several loss functions can be used, but we will focus on the quadratic loss function, consistent with mean squared error; under quadratic loss the Bayesian point estimate is the posterior mean.
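The link between quadratic loss and the posterior mean can be checked numerically: over a grid of candidate point estimates a, the expected posterior loss E[(P − a)²] is smallest at a = E[P]. A sketch under an assumed Beta posterior (parameters chosen arbitrarily):

```python
import numpy as np
from scipy.stats import beta

posterior = beta(2.5, 1.5)                 # an assumed Beta posterior for P
candidates = np.linspace(0.0, 1.0, 1001)   # candidate point estimates a

# Expected quadratic loss: E[(P - a)^2] = Var(P) + (E[P] - a)^2
expected_loss = posterior.var() + (posterior.mean() - candidates)**2

best = candidates[np.argmin(expected_loss)]
print(best, posterior.mean())              # minimizer matches the posterior mean (0.625)
```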

Taking the expectation of the posterior distribution yields E[P|X] = ∫₀¹ P · P^(α+X−1) (1−P)^(β−X) / B(α+X, β−X+1) dP.

As before, we solve the integral by creating α* = α+X+1 and β* = β−X+1. The integral then becomes E[P|X] = (1/B(α+X, β−X+1)) ∫₀¹ P^(α*−1) (1−P)^(β*−1) dP = B(α*, β*) / B(α+X, β−X+1).

–This can be simplified using the facts that B(α, β) = Γ(α)Γ(β)/Γ(α+β) and Γ(α+1) = αΓ(α). –Therefore, E[P|X] = (α+X)/(α+β+1).
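As a sanity check on this algebra (with arbitrary, illustrative prior parameters), the expectation integral can be evaluated numerically and compared against the closed form (α+X)/(α+β+1):

```python
from scipy.integrate import quad
from scipy.special import beta as beta_fn

alpha0, beta0, x = 2.0, 3.0, 1     # illustrative values, not the lecture's

# Posterior density of P after one draw: Beta(alpha0 + x, beta0 - x + 1)
post_pdf = lambda p: p**(alpha0 + x - 1) * (1 - p)**(beta0 - x) / beta_fn(alpha0 + x, beta0 - x + 1)

numeric_mean, _ = quad(lambda p: p * post_pdf(p), 0.0, 1.0)
closed_form = (alpha0 + x) / (alpha0 + beta0 + 1)
print(numeric_mean, closed_form)   # the two values should agree
```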

To make this estimation process operational, assume that we have a prior distribution with parameters α = β (a symmetric prior), which yields a beta distribution with a mean for P of 0.5 and a prior variance of αβ/[(α+β)²(α+β+1)].

Next assume that we flip a coin and it comes up heads (X=1). The new estimate of P becomes E[P|X=1] = (α+1)/(α+β+1). If, on the other hand, the outcome is a tail (X=0), the new estimate of P is E[P|X=0] = α/(α+β+1).
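A short sketch of these two updates; the symmetric prior value α = β = 1.5 is assumed here purely for illustration (the lecture's exact prior parameters are not legible in the transcript).

```python
alpha0 = beta0 = 1.5          # assumed symmetric prior parameters (illustrative)

def update_after_one_draw(x, a=alpha0, b=beta0):
    """Posterior mean of P after a single Bernoulli draw x (1 = head, 0 = tail)."""
    return (a + x) / (a + b + 1)

print(update_after_one_draw(1))   # head: (1.5 + 1) / 4 = 0.625
print(update_after_one_draw(0))   # tail: 1.5 / 4       = 0.375
```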

Extending the results to n Bernoulli trials yields the posterior π(P|X₁,…,Xₙ) = P^(α+Y−1) (1−P)^(β+n−Y−1) / B(α+Y, β+n−Y),

where Y is the sum of the individual Xs, or the number of heads in the sample. The estimated value of P then becomes E[P|Y] = (α+Y)/(α+β+n).
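A minimal helper implementing this n-trial update (the function name and the example numbers are mine, not from the slides):

```python
def bayes_estimate(y, n, a, b):
    """Posterior mean of P after y successes in n Bernoulli trials
    with a Beta(a, b) prior: (a + y) / (a + b + n)."""
    return (a + y) / (a + b + n)

# Example: 3 heads in 5 tosses with an assumed Beta(1.5, 1.5) prior
print(bayes_estimate(3, 5, 1.5, 1.5))   # 4.5 / 8 = 0.5625
```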

Going back to the example in the last lecture, in the first draw Y=15 and n=50. This yields an estimated value of P of (α+15)/(α+β+50). This value compares with the maximum likelihood estimate of Y/n = 15/50 = 0.30. Since the maximum likelihood estimator in this case is unbiased, the results imply that the Bayesian estimator is biased.
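The bias claim can be illustrated by simulation: for a fixed true P, the ML estimator Y/n averages to P across repeated samples, while the Bayesian posterior mean is pulled toward the prior mean of 0.5. This is only a sketch under assumed prior parameters (again α = β = 1.5 for illustration).

```python
import numpy as np

rng = np.random.default_rng(1)
true_p, n, a, b = 0.30, 50, 1.5, 1.5        # fixed true probability and assumed prior

y = rng.binomial(n, true_p, size=100_000)   # many replications of the 50-toss experiment
mle = y / n
bayes = (a + y) / (a + b + n)

print(mle.mean())    # ~0.30: the ML estimator is unbiased
print(bayes.mean())  # shifted toward the prior mean 0.5, i.e., biased for fixed P
```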

Bayesian Confidence Intervals Apart from providing an alternative procedure for estimation, the Bayesian approach provides a direct procedure for formulating parameter confidence intervals. Returning to the simple case of a single coin toss, the probability density function of the estimator becomes π(P|X) = P^(α+X−1) (1−P)^(β−X) / B(α+X, β−X+1).

As previously discussed, we know that given the symmetric prior α = β above and a head, the Bayesian estimator of P is 0.6252.

However, using the posterior distribution function, we can also compute the probability that the value of P is less than 0.5 given a head: Pr(P < 0.5 | X=1) = ∫₀^0.5 P^α (1−P)^(β−1) / B(α+1, β) dP. Hence, we have a very formal statement of confidence intervals.
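Under the same assumed prior (α = β = 1.5, illustrative only), this posterior probability and an equal-tailed 95% credible interval for P can be read directly from the Beta posterior:

```python
from scipy.stats import beta

a0, b0, x = 1.5, 1.5, 1               # assumed prior parameters, one head observed
posterior = beta(a0 + x, b0 + 1 - x)  # Beta(2.5, 1.5) posterior for P

print(posterior.cdf(0.5))             # Pr(P < 0.5 | one head)
print(posterior.interval(0.95))       # equal-tailed 95% credible interval for P
```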