Bayesian Inference. Presenting: Assaf Tzabari

Agenda Basic concepts Conjugate priors Generalized Bayes rules Empirical Bayes Admissibility Asymptotic efficiency

Basic concepts. θ - unknown parameter with prior density π(θ); x - random vector with density f(x|θ). Joint density of x and θ, marginal density of x, and posterior density of θ: see below.
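These quantities have the standard definitions; in LaTeX, with the notation above:

h(x, \theta) = f(x \mid \theta)\,\pi(\theta), \qquad
m(x) = \int_{\Theta} f(x \mid \theta)\,\pi(\theta)\, d\theta, \qquad
\pi(\theta \mid x) = \frac{f(x \mid \theta)\,\pi(\theta)}{m(x)}.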

Basic concepts (cont.) Elements of a decision problem: A - the set of all possible decisions; L(θ,a) - loss function defined for all θ ∈ Θ, a ∈ A; δ(x) - decision rule. Risk function and Bayes risk function: see below.
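In the same notation, the two risk quantities are the standard ones:

R(\theta, \delta) = E_{x \mid \theta}\bigl[ L(\theta, \delta(x)) \bigr] = \int L(\theta, \delta(x))\, f(x \mid \theta)\, dx, \qquad
r(\pi, \delta) = E^{\pi}\bigl[ R(\theta, \delta) \bigr] = \int_{\Theta} R(\theta, \delta)\, \pi(\theta)\, d\theta.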

Basic concepts (cont.) A Bayes rule is a decision rule which minimizes the Bayes risk r(π,δ). A Bayes rule can be found by choosing, for each x, an action which minimizes the posterior expected loss, or, equivalently, which minimizes the same integral taken against f(x|θ)π(θ) instead of π(θ|x) (the two differ only by the factor m(x), which does not depend on a); see below.
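In symbols, a Bayes rule therefore satisfies

\delta^{\pi}(x) = \arg\min_{a \in A} \int_{\Theta} L(\theta, a)\, \pi(\theta \mid x)\, d\theta
               = \arg\min_{a \in A} \int_{\Theta} L(\theta, a)\, f(x \mid \theta)\, \pi(\theta)\, d\theta.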

Basic concepts (cont.) Example: Bayesian estimation under MSE
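The standard result behind this example: under squared-error loss the posterior expected loss is minimized by the posterior mean, so

L(\theta, a) = (\theta - a)^2 \;\Longrightarrow\; \delta^{\pi}(x) = E[\theta \mid x] = \int_{\Theta} \theta\, \pi(\theta \mid x)\, d\theta.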

Conjugate priors. Definition: a class P of prior distributions is a conjugate family for a class F of sample densities if the posterior π(θ|x) belongs to P for every f(x|θ) in F and every prior in P. Example: the class of normal priors is a conjugate family for the class of normal sample densities (see below).
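For the normal example, assuming a known sampling variance σ², the conjugate update is

x \mid \theta \sim N(\theta, \sigma^2),\ \theta \sim N(\mu, \tau^2)
\;\Longrightarrow\;
\theta \mid x \sim N\!\left( \frac{\tau^2 x + \sigma^2 \mu}{\tau^2 + \sigma^2},\ \frac{\sigma^2 \tau^2}{\sigma^2 + \tau^2} \right).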

Using conjugate priors. Step 1: find a conjugate prior - choose a class with the same functional form as the likelihood functions. Step 2: calculate the posterior - gather the factors involving θ in f(x|θ)π(θ) and identify the resulting distribution.

Using conjugate priors (cont.) Example: finding a conjugate prior for the Poisson distribution. x=(x1,…,xn) where the xi~P(θ) are iid; the factors involving θ fit a gamma distribution of θ, so the gamma family is conjugate (see the sketch below).
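A sketch of the algebra, writing the Gamma(a, b) prior in shape-rate form (the slide's parameterization may differ):

f(x \mid \theta)\,\pi(\theta) \;\propto\; \theta^{\sum_i x_i} e^{-n\theta} \cdot \theta^{a-1} e^{-b\theta}
= \theta^{a + \sum_i x_i - 1}\, e^{-(b + n)\theta}
\;\Longrightarrow\; \theta \mid x \sim \mathrm{Gamma}\!\left(a + \textstyle\sum_i x_i,\ b + n\right).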

Using conjugate priors (cont.) Example (cont.): finding a conjugate prior for the Poisson distribution. The Bayes estimator under MSE is then the mean of the gamma posterior, while the ML estimator is the sample mean. [Figure: Gamma prior densities π(θ) for (a,b) = (1,2), (2,2), (3,2), (10,0.5).]
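A minimal numerical sketch of this example; the data, the prior parameters, and the shape-rate parameterization are illustrative assumptions, not values from the slides.

import numpy as np

rng = np.random.default_rng(0)
theta_true = 3.0
x = rng.poisson(theta_true, size=20)          # iid Poisson(theta) sample

a, b = 2.0, 2.0                               # Gamma(a, b) prior in shape-rate form
a_post, b_post = a + x.sum(), b + len(x)      # conjugacy: posterior is Gamma(a + sum(x), b + n)

bayes_mse = a_post / b_post                   # posterior mean = Bayes estimator under MSE
ml = x.mean()                                 # ML estimator: the sample mean
print(bayes_mse, ml)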

Using conjugate priors (cont.) More conjugate priors for common statistical distributions: Binomial x~b(p,n) and a Beta prior. [Figure: Beta(a,b) prior densities for (a,b) = (2,2), (0.5,0.5), (2,5), (5,2).]
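The corresponding update and posterior mean (standard Beta-Binomial conjugacy):

x \sim \mathrm{b}(p, n),\ p \sim \mathrm{Beta}(a, b)
\;\Longrightarrow\;
p \mid x \sim \mathrm{Beta}(a + x,\ b + n - x),
\qquad E[p \mid x] = \frac{a + x}{a + b + n}.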

Using conjugate priors (cont.) Uniform: iid x=(x1,…,xn) with xi~U(0,θ), and a Pareto prior. [Figure: Pareto prior densities π(θ) for a = 1, 2, 3.]
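The corresponding update, assuming a Pareto(x_0, a) prior with density proportional to θ^{-(a+1)} on θ ≥ x_0:

x_i \stackrel{iid}{\sim} U(0, \theta),\ \theta \sim \mathrm{Pareto}(x_0, a)
\;\Longrightarrow\;
\theta \mid x \sim \mathrm{Pareto}\!\bigl( \max\{x_0, x_{(n)}\},\ a + n \bigr),
\qquad x_{(n)} = \max_i x_i.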

Conjugate priors (cont.) Advantages Easy to calculate Intuitive Useful for sequential estimation Can a conjugate prior be a reasonable approximation to the true prior? Not always!

Conjugate priors (cont.) Example: estimating θ under MSE based on x~N(θ,1). Step 1: subjectively determine α-fractiles (a point z(α) is an α-fractile if P(θ ≤ z(α)) = α). Step 2: look for a matching prior and find the Bayes estimator. Only π1 is a conjugate prior, but which one gives a better estimator?

Improper priors. An improper prior is a prior with infinite mass; the Bayes risk then has no meaning, but the posterior usually still exists. Improper priors are useful in the following cases: prior information is not available (noninformative priors are usually improper); the parameter space is restricted.

Generalized Bayes rules. Definition: if π(θ) is an improper prior, a generalized Bayes rule, for given x, is an action which minimizes ∫ L(θ,a) f(x|θ) π(θ) dθ or, if the marginal m(x) is finite, which minimizes the posterior expected loss. Example: estimating θ > 0 under MSE from a normal observation (see the figure and the sketch below).

Generalized Bayes rules (cont.) [Figure: generalized Bayes estimators for s = 2, 1, and 1/2, versus the ML estimator, plotted as functions of x.]
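A sketch of what the curves in the figure plausibly show, under the assumption (not explicit in the transcript) that x ~ N(θ, s²) with the improper flat prior π(θ) = 1 on (0, ∞); the posterior is then a normal truncated to the positive axis, and its mean is the generalized Bayes estimate under MSE.

import numpy as np
from scipy.stats import norm

def gen_bayes_positive(x, s):
    """Posterior mean of theta given x ~ N(theta, s^2) and a flat prior on theta > 0.
    The posterior is N(x, s^2) truncated to (0, inf), whose mean is x + s*phi(x/s)/Phi(x/s)."""
    z = x / s
    return x + s * norm.pdf(z) / norm.cdf(z)

for s in (2.0, 1.0, 0.5):
    print(s, [round(gen_bayes_positive(x, s), 3) for x in (-5.0, -1.0, 0.0, 3.0)])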

Generalized Bayes rules (cont.) Generalized Bayes rules are useful in solving problems which don't include prior information. Example: location parameter estimation under a loss of the form L(a-θ). f(x|θ) is a location density with location parameter θ if f(x|θ) = f(x-θ). Using π(θ) = 1, the posterior is proportional to f(x-θ).

Generalized Bayes rules (cont.) Example (cont.): location parameter estimation under L(a-θ). The generalized Bayes rule takes the form of an invariant rule; this is the class of invariant (translation-equivariant) rules, and the best invariant rule is the generalized Bayes rule with the prior π(θ) = 1.

Generalized Bayes rules (cont.) Example (cont.): location parameter estimation under L(a-θ). Under MSE, δ(x) is the posterior mean; for x=(x1,…,xn), Pitman's estimator is derived (see below).
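The standard form of Pitman's estimator, i.e. the posterior mean under the flat prior π(θ) = 1:

\delta(x) = \frac{\displaystyle\int_{-\infty}^{\infty} \theta \prod_{i=1}^{n} f(x_i - \theta)\, d\theta}
                 {\displaystyle\int_{-\infty}^{\infty} \prod_{i=1}^{n} f(x_i - \theta)\, d\theta}.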

Empirical Bayes. Development of Bayes rules using auxiliary empirical (past or current) data. Methods: using past data in constructing the prior; using past data in estimating the marginal distribution; dealing simultaneously with several decision problems. x_{n+1} - sample information with density f(x|θ_{n+1}); x1,…,xn - past observations with densities f(xi|θi).

Determination of the prior from past data. Assumption: θ1,…,θn,θn+1 are parameters from a common prior π(θ). μf(θ), σf²(θ) - conditional mean and variance of xi given θi; μm, σm² - marginal mean and variance of xi. Lemma 1 relates the marginal moments to the conditional ones, and Result 1 gives the prior moments μπ, σπ² in terms of them (see below).
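A reconstruction consistent with the notation above; Lemma 1 is the usual iterated-expectation / total-variance identity, and Result 1 follows from it:

\text{Lemma 1:}\quad \mu_m = E^{\pi}[\mu_f(\theta)], \qquad
\sigma_m^2 = E^{\pi}[\sigma_f^2(\theta)] + \mathrm{Var}^{\pi}(\mu_f(\theta)).

\text{Result 1 (for } \mu_f(\theta) = \theta \text{ and constant } \sigma_f^2\text{):}\quad
\mu_{\pi} = \mu_m, \qquad \sigma_{\pi}^2 = \sigma_m^2 - \sigma_f^2.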

Determination of the prior from past data (cont.) Step 1: assume a certain functional form for π (a conjugate family of priors is convenient). Step 2: estimate μπ, σπ² based on x1,…,xn (xn+1 can be included too). If μf(θ) = θ and σf² is constant, then: Step 2a: estimate μm, σm² from the data, e.g. by their sample analogues; Step 2b: use Result 1 to calculate μπ, σπ².

Determination of the prior from past data (cont.) Example: π(θ) is assumed to be normal (a conjugate prior); estimation of μπ and σπ² is needed for determining the prior (see the sketch below).
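A minimal sketch of this step, assuming the natural reading x_i | θ_i ~ N(θ_i, σ_f²) with σ_f known (the sampling model is not spelled out on the slide); all numbers are illustrative.

import numpy as np

rng = np.random.default_rng(1)
sigma_f = 1.0                                   # assumed known sampling standard deviation
theta = rng.normal(2.0, 1.5, size=200)          # latent parameters (unknown in practice)
x = rng.normal(theta, sigma_f)                  # observed past data x_1, ..., x_n

# Step 2a: marginal moments from the data
mu_m, s2_m = x.mean(), x.var(ddof=1)

# Step 2b (Result 1): mu_pi = mu_m, sigma_pi^2 = sigma_m^2 - sigma_f^2
mu_pi = mu_m
s2_pi = max(s2_m - sigma_f**2, 0.0)             # truncate at 0 to keep a valid variance

# Bayes estimate of theta_{n+1} from a new observation (normal-normal posterior mean)
x_new = 3.2
w = s2_pi / (s2_pi + sigma_f**2)
print(mu_pi, s2_pi, w * x_new + (1 - w) * mu_pi)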

Estimation of the marginal distribution from past data. Assumption: the Bayes rule can be represented in terms of m(x). Advantage: no need to estimate the prior. Step 1: estimate m(x); x1,…,xn,xn+1 are a sample from the distribution with density m(x), so in the discrete case, for example, m(x) can be estimated by empirical frequencies. Step 2: estimate the Bayes rule using the estimated m(x).

Estimation of the marginal distribution from past data (cont.) Example: Bayes estimation of θ_{n+1} under MSE (see the sketch below).
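A classic instance of this idea, used here only as an illustrative assumption: if x_i | θ_i ~ Poisson(θ_i), the Bayes rule under MSE can be written in terms of the marginal alone, E[θ | x] = (x + 1) m(x + 1) / m(x), and m can be replaced by its empirical estimate (Robbins' estimator).

import numpy as np
from collections import Counter

def robbins_estimate(past, x_new):
    # Empirical-Bayes estimate of theta_{n+1} under MSE, assuming x_i | theta_i ~ Poisson(theta_i):
    # E[theta | x] = (x + 1) * m(x + 1) / m(x), with m(x) estimated by empirical frequencies.
    counts, n = Counter(past), len(past)
    m = lambda k: counts.get(k, 0) / n
    if m(x_new) == 0:
        raise ValueError("x_new was never observed, so the empirical estimate of m(x_new) is zero")
    return (x_new + 1) * m(x_new + 1) / m(x_new)

rng = np.random.default_rng(2)
theta = rng.gamma(3.0, 1.0, size=5000)          # latent parameters, used only to simulate data
past = rng.poisson(theta).tolist()              # observed past counts x_1, ..., x_n
print(robbins_estimate(past, 2))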

Compound decision problems. Independent x1,…,xn are observed, where the θi are from a common prior π(θ). Goal: simultaneously make decisions involving θ1,…,θn; the loss is L(θ1,…,θn,a). Solution: determine the prior from x1,…,xn using empirical Bayes methods.

Admissibility of Bayes rules. Bayes rules with finite Bayes risk are typically admissible: if a Bayes rule δπ is unique, then it is admissible (e.g. under MSE the Bayes rule is unique; proof sketch: any rule R-better than δπ must be a Bayes rule itself). For discrete θ, assuming that π is positive, δπ is admissible. For continuous θ, if R(θ,δ) is continuous in θ for every δ, then δπ is admissible.

Admissibility of Bayes rules (cont.) Generalized Bayes rules can be inadmissible, and the verification of their admissibility can be difficult. Example: the generalized Bayes estimator of θ versus the James-Stein estimator.

Admissibility of Bayes rules (cont.) Example (cont.): the generalized Bayes estimator of θ versus the James-Stein estimator (see the simulation sketch below).
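A small simulation of the standard version of this comparison; the setting x ~ N_p(θ, I_p) with p ≥ 3 and the flat prior (under which the generalized Bayes rule is δ(x) = x) is an assumption here, since the slide's model is not shown.

import numpy as np

rng = np.random.default_rng(3)
p, reps = 10, 20000
theta = np.full(p, 1.0)                                      # arbitrary true parameter
x = rng.normal(theta, 1.0, size=(reps, p))                   # x ~ N_p(theta, I_p), repeated draws

gb = x                                                       # generalized Bayes rule under the flat prior
js = (1 - (p - 2) / (x**2).sum(axis=1, keepdims=True)) * x   # James-Stein estimator

risk_gb = ((gb - theta)**2).sum(axis=1).mean()               # Monte Carlo risk, squared-error loss
risk_js = ((js - theta)**2).sum(axis=1).mean()
print(risk_gb, risk_js)                                      # James-Stein has smaller risk for p >= 3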

Admissibility of Bayes rules (cont.) Theorem: if x is continuous with a p-dimensional exponential density and Θ is closed, then any admissible estimator is a generalized Bayes rule. f(x|θ) is a p-dimensional exponential density if it has the form below; e.g. the normal distribution.
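The standard p-dimensional (natural) exponential-family form assumed in this theorem is

f(x \mid \theta) = h(x)\, c(\theta)\, \exp\!\left( \sum_{j=1}^{p} \theta_j\, x_j \right),

of which the normal distribution is a special case.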

Asymptotic efficiency of Bayes estimators. x1,…,xn are iid samples with density f(xi|θ). Definitions: an estimator δn(x1,…,xn) of θ is asymptotically unbiased, and an asymptotically unbiased estimator is asymptotically efficient, under the conditions written out below. v(θ) - asymptotic variance; I(θ) - Fisher information in a single sample.
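In the usual formulation, consistent with the notation v(θ) and I(θ) above:

\text{asymptotically unbiased:}\quad \sqrt{n}\,(\delta_n - \theta) \xrightarrow{d} N\bigl(0,\ v(\theta)\bigr);
\qquad
\text{asymptotically efficient:}\quad v(\theta) = \frac{1}{I(\theta)},\quad
I(\theta) = E_{\theta}\!\left[ \left( \frac{\partial}{\partial \theta} \log f(x \mid \theta) \right)^{\!2} \right].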

Asymptotic efficiency of Bayes estimators (cont.) Assumptions for the next theorems: the posterior is a proper, continuous and positive density (the prior itself can be improper!), and the likelihood function l(θ) = f(x|θ) satisfies regularity conditions.

Asymptotic efficiency of Bayes estimators (cont.) Theorem: for large values of n, the posterior distribution is approximately normal (see below). Conclusions: Bayes estimators such as the posterior mean are asymptotically unbiased, and the effect of the prior declines as n increases.
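The usual form of this approximation, with θ̂_n the maximum-likelihood estimate:

\theta \mid x_1, \ldots, x_n \;\approx\; N\!\left( \hat{\theta}_n,\ \frac{1}{n\, I(\hat{\theta}_n)} \right).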

Asymptotic efficiency of Bayes estimators (cont.) Theorem: if δn is the Bayes estimator under MSE, then the limit below holds. Conclusion: the Bayes estimator δn under MSE is asymptotically efficient.
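Written out, the limit behind this conclusion is

\sqrt{n}\,(\delta_n - \theta) \xrightarrow{d} N\!\left( 0,\ \frac{1}{I(\theta)} \right).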

Asymptotic efficiency of Bayes estimators (cont.) Example: estimator of p based on binomial sample x~b(p,n) under MSE
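With the Beta(a, b) prior used on the following slides, the two estimators compared there are (standard results, written out here):

\delta_n^{\pi}(x) = E[p \mid x] = \frac{x + a}{n + a + b}, \qquad
\hat{p}_{ML} = \frac{x}{n}, \qquad
\sqrt{n}\,\bigl(\delta_n^{\pi} - p\bigr) \xrightarrow{d} N\bigl(0,\ p(1 - p)\bigr) = N\!\left(0,\ \frac{1}{I(p)}\right).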

Asymptotic efficiency of Bayes estimators (cont.) If the prior is concentrated, it determines the estimator: “Don’t confuse me with the facts!” [Figure: ML estimator versus the Bayes estimator δπ(x), as functions of x, for Beta priors with a=b=2 and a=b=2000.]

Asymptotic efficiency of Bayes estimators (cont.) For large samples, the Bayes estimator tends to become independent of the prior. [Figure: ML estimator versus the Bayes estimator δπ(x) with a=b=2, for n=10 and n=1000.]

Asymptotic efficiency of Bayes estimators (cont.) More examples of asymptotically efficient Bayes estimators. Location distributions: if the likelihood function l(θ) = f(x-θ) satisfies the regularity conditions, then the Pitman estimator after one observation is asymptotically efficient. Exponential-family distributions: if the density has the exponential form seen earlier, then it satisfies the regularity conditions, and the asymptotic efficiency depends on the prior.

Conclusions. Bayes rules are designed for problems with prior information, but are useful in other cases as well. Determining the prior is a crucial step, which affects both admissibility and computational complexity. Bayes estimators under MSE perform well in large samples.