Bayesian Statistics on a Shoestring
Stat 391 – Lecture 12
Assaf Oron, May 2008

Bayes’ Rule – and “Bayesians”
Bayes lived and proved his rule a long time ago.
The rule, and the updating principle associated with it, belong to all branches of statistics.
The term “Bayesian statistics” is modern. Depending upon whom you ask, it may represent:
- A perspective and toolset, which are useful for many tasks;
- The only way to do statistics intelligently;
- …an irrational cult! (it’s somewhat of a generational gap right now)
I will try to present Bayesian statistics via the first description above: as a useful perspective and toolset.

The Basic Principle
Recall the trick we did a few weeks ago: calling the density the “likelihood” and viewing it as a function of the (fixed, unknown) parameters rather than of the data.
Recall also, more recently, the awkward jargon used to describe confidence intervals.
These somewhat inelegant fixes can be traced back to an asymmetry:
- The data are modeled as following some probability distribution;
- The parameters are modeled as fixed, though usually unknown.
What if we decided that the parameters are random, too?...

The Basic Principle (2)
Let’s view the data as an r.v. called X; the parameters are, of course, θ.
Write down Bayes’ rule, using densities:

  f(θ | x) = f(x | θ) f(θ) / f(x)

- f(x | θ) is the ‘regular’ (“frequentist”) likelihood of the data given fixed parameter values.
- f(θ) is the ‘prior’ density of the parameters (based on previous knowledge, usually unrelated to the current data).
- f(x) = ∫ f(x | θ) f(θ) dθ, the marginal probability of the data over all possible parameter configurations, is not a function of θ and is irrelevant for estimation.

The Basic Principle (3)
…the Bayesian way of writing Bayes’ rule is usually this:

  f(θ | x) ∝ f(x | θ) f(θ)

- f(θ | x) is the posterior distribution of the parameters, based on the data.
- f(θ) is the prior distribution of the parameters, before the data.
(Since we omitted the marginal probability of the data, the equation becomes a proportionality; we don’t care, since we know the LHS is a density, so we can “find” the missing factor automatically by normalizing its integral to 1.)
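To make “normalize at the end” concrete, here is a minimal numerical sketch (my own illustration, not from the slides); the Normal data model, the Normal prior, and the observed values are all invented. We evaluate likelihood × prior on a grid of θ values and then divide by the grid integral so the result is a proper density.

```python
import numpy as np
from scipy.stats import norm

# Invented example: X_i ~ Normal(theta, 1), with an assumed Normal(0, 2) prior on theta.
data = np.array([1.2, 0.7, 2.1, 1.5])

theta = np.linspace(-5, 5, 2001)                         # grid over the parameter
log_lik = norm.logpdf(data[:, None], loc=theta, scale=1.0).sum(axis=0)
log_prior = norm.logpdf(theta, loc=0.0, scale=2.0)

unnorm = np.exp(log_lik + log_prior)                     # likelihood * prior, un-normalized
posterior = unnorm / np.trapz(unnorm, theta)             # normalize so it integrates to 1

print(np.trapz(posterior, theta))                        # ~1.0: now a proper density
print(theta[np.argmax(posterior)])                       # posterior mode on the grid
```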

Bayesian Estimation
Bayesian estimation is based primarily on probability calculations from the posterior.
The most common Bayesian point estimates are the posterior mean (i.e., E[θ|x]), the posterior median, or the posterior mode.
These can be framed as solutions to different loss-minimization problems.
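For illustration (my own example, not from the slides), suppose the posterior happened to be a Beta(8, 5) distribution; the three point estimates, and the losses they minimize, then look like this:

```python
from scipy.stats import beta

a, b = 8, 5                            # assumed Beta(8, 5) posterior, purely for illustration
post = beta(a, b)

post_mean = post.mean()                # minimizes expected squared-error loss
post_median = post.ppf(0.5)            # minimizes expected absolute-error loss
post_mode = (a - 1) / (a + b - 2)      # MAP estimate; limiting case of 0-1 loss (valid for a, b > 1)

print(post_mean, post_median, post_mode)
```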

A Brief History of Bayesianism
The Bayesian idea has been around for a while, but sat mostly on the shelf for practical reasons:
- If you take two arbitrary distributions for the data and the prior, you will usually end up with an intractably complicated posterior.
- (For each “common” data distribution, there exists at least one type of prior that fits it well; it is known as the “conjugate prior”.)
With the advent of computing, a statistical-simulation technology known as MCMC (“Markov Chain Monte Carlo”) has made (nearly) any combination of distributions computable, sometimes almost instantly. (A rough sketch of one MCMC algorithm follows below.)
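As a rough idea of what MCMC does, here is a minimal random-walk Metropolis sketch (my own illustration, not course material; the data, prior, and tuning constants are all invented). The key point is that the algorithm only needs the un-normalized posterior, likelihood × prior, evaluated pointwise.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Invented target: X_i ~ Normal(theta, 1) data with a Normal(0, 2) prior on theta.
data = np.array([1.2, 0.7, 2.1, 1.5])

def log_post(theta):
    """Un-normalized log posterior: log likelihood + log prior."""
    return norm.logpdf(data, loc=theta, scale=1.0).sum() + norm.logpdf(theta, loc=0.0, scale=2.0)

theta, draws = 0.0, []
for _ in range(5000):
    proposal = theta + rng.normal(scale=0.5)              # symmetric random-walk proposal
    if np.log(rng.uniform()) < log_post(proposal) - log_post(theta):
        theta = proposal                                   # accept; otherwise keep the current value
    draws.append(theta)

print(np.mean(draws[1000:]))                               # posterior mean estimate, after burn-in
```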

Conjugate Prior Hands-on
The conjugate prior for the Binomial is the Beta.
That is: X ~ Binomial(n, p) and p ~ Beta(α, β) should match nicely.
Write out the kernel of the posterior (i.e., its essential form: keep only the factors involving p, and drop constants such as the binomial coefficient).
Simplify this a bit further; can you recognize the form of the posterior? (A numerical check follows below.)
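If you want to verify your algebra afterwards, here is a small numerical check I’ve added (not part of the original exercise; the counts and prior parameters are made up). Warning: the comment in the code gives away the answer.

```python
import numpy as np
from scipy.stats import binom, beta

n, x = 20, 13          # made-up Binomial data
a, b = 2.0, 3.0        # made-up Beta(alpha, beta) prior on p

p = np.linspace(0.001, 0.999, 999)
unnorm = binom.pmf(x, n, p) * beta.pdf(p, a, b)           # posterior kernel: likelihood * prior
grid_post = unnorm / np.trapz(unnorm, p)                  # normalize on the grid

# Conjugacy: the kernel simplifies to p^(x+a-1) * (1-p)^(n-x+b-1),
# which is again a Beta, namely Beta(a + x, b + n - x).
exact_post = beta.pdf(p, a + x, b + n - x)

print(np.max(np.abs(grid_post - exact_post)))             # essentially zero
```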

Advantages of Bayesian Methods
- A symmetry (between data and parameters) that is conceptually attractive.
- Can incorporate prior information (from scientists and other subject-matter experts) that should play a role in evaluating the data.
- Hypothesis tests, model selection, and confidence intervals become easier.
- The risk of using the wrong model (“model misspecification”) can be reduced.
- More complete information about the parameters.

Advantages of Bayesian Methods (2)
- Avoids some of the counter-intuitive side effects of MLE calculations.
- Ability to fit complicated models, estimate complicated parameters, and accommodate errors in “fixed” values.
- In many cases, a random interpretation fits the parameters better than a fixed one:
  - Opinion polls and human behavior;
  - Ecology, demographics (come to think of it, natural populations are never really fixed).

Drawbacks of Bayesian Methods
- Symmetry? Not really. “It’s tortoises all the way down”: the prior needs parameters too… and those had better be fixed, or else; which is exactly the original problem.
- The prior affects our estimation, whether or not it is really based on expert knowledge.
- A workaround, known as “flat” or “improper” priors, has made things worse in many ways: if you use them, you may find yourself not having a posterior distribution at all (see the small example below).
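To see how an improper prior can fail to yield a posterior, here is a made-up numerical illustration (mine, not from the slides): the Haldane prior for a Binomial proportion, prior(p) ∝ 1/(p(1−p)), which is “flat” on the log-odds scale, combined with data that happen to contain zero successes. The un-normalized posterior then has infinite mass, so no proper posterior exists.

```python
import numpy as np

# Made-up Binomial data with zero successes; improper Haldane prior: prior(p) ~ 1 / (p * (1 - p)).
# Un-normalized posterior kernel: p^(x-1) * (1-p)^(n-x-1); with x = 0 its integral diverges at p = 0.
n, x = 10, 0
for eps in [1e-2, 1e-4, 1e-6, 1e-8]:
    p = np.geomspace(eps, 1 - 1e-6, 200_000)              # log-spaced grid down to the cutoff eps
    mass = np.trapz(p**(x - 1) * (1 - p)**(n - x - 1), p)
    print(f"lower cutoff {eps:.0e}: integral ~ {mass:6.2f}")   # keeps growing as eps shrinks
```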

Drawbacks of Bayesian Methods (2)
- The choice of prior form and details adds yet another arbitrary element to the already tenuous connection between model and reality.
- MCMC simulations have a lot of “moving parts” and are not trivial to diagnose for problems.
- Socially, the approach carries “hype” and dogmatic “group-think” overtones that are not helpful.
- In many cases, a random interpretation of the parameters is simply not appropriate.