St5219: Bayesian hierarchical modelling lecture 2.1.



 Priors: how to choose them, different types
 The normal distribution in Bayesianism
 Tutorial 1: over to you
 Computing posteriors:
    Monte Carlo
    Importance Sampling
    Markov chain Monte Carlo

FREQUENTISM
 Something with a long-run frequency distribution
 E.g. coin tosses
 Patients in a clinical trial
 “Measurement” errors?
BAYESIANISM
 Everything
 What you don’t know is random
 Unobserved data, parameters, unknown states, hypotheses
 Observed data still arise from a probability model
Knock-on effects on how to estimate things and assess hypotheses

CHOOSING A PRIOR
 Very misunderstood
 “How did you choose your priors?”
 Please never answer “Oh, I just made them up”
 For data analysis, you need a strong rationale for your choice of prior
DOING COMPUTATIONS
 (later)

 Following infection: the body creates antibodies
 These target the pathogen and remain in the blood
 Antibodies can provide data on historic disease exposure

Cook, Chen, Lim (2010) Emerg Inf Dis DOI: /EID

Longitudinal Singapore study: Chen et al (2010) J Am Med Assoc 303:

 Observation lies in (x_ij, 2x_ij) for individual i, observation j
 Define “seroconversion” to be a “four-fold” rise in antibody levels, i.e. y_i = 1 if x_i2 ≥ 4x_i1 and 0 otherwise
 Out of 727 participants with follow-up, we have 98 seroconversions
Q: what proportion were infected?

 The seroconversion “test” is not perfect: sensitivity only about 80%
 The infection rate should therefore be higher than the seroconversion rate
Board work
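A back-of-envelope version of the board work, assuming seroconversion acts as a pure sensitivity problem; the 80% figure and the simple division are the lecture's rough approximation, not a full model:

```python
# Rough correction of the seroconversion rate for imperfect test
# sensitivity. The 80% value and the simple division are illustrative.
y, n = 98, 727               # seroconversions among followed-up participants
sensitivity = 0.80           # approximate P(seroconvert | infected)

serocon_rate = y / n                          # naive estimate
infection_rate = serocon_rate / sensitivity   # corrected estimate

print(f"seroconversion rate ~ {serocon_rate:.3f}")
print(f"implied infection rate ~ {infection_rate:.3f}")
```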

 Need some priors
 Last time: “U(0,1) good way to represent lack of knowledge of a probability”
 Before we collected the JAMA data, we didn’t know what p would be, and a prior p ~ U(0,1) makes sense
 But there are data out there on σ!

Zambon et al (2001) Arch Intern Med 161:

 m = 791
 y = 629
This can give you a prior!!! σ ~ Be(630,163)
Board work
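The Be(630,163) falls out of the standard conjugate update; a quick check:

```python
# Conjugate update: a uniform prior U(0,1) = Be(1,1) combined with
# binomial data y = 629 successes out of m = 791 (Zambon et al)
# gives the Be(630, 163) prior quoted on the slide.
a0, b0 = 1, 1          # Be(1,1) uniform prior
y, m = 629, 791
a, b = a0 + y, b0 + m - y
print(f"sigma ~ Be({a}, {b})")
```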

NON-INFORMATIVE
 p ~ U(0,1)
 σ² ~ U(0,∞)
 μ ~ U(-∞,∞)
 β ~ N(0,1000²)
 Should give you no information about that parameter except what is in the data
INFORMATIVE
 σ ~ Be(630,163)
 μ ~ N(15.2,6.8²)
 Lets you supplement the natural information content of the data when there is not enough information on that aspect
 Can give information on other parameters indirectly

Scenario 1. You are trying to reach an optimal decision in the presence of uncertainty: use whatever information you can, even if subjective, via informative priors.
Scenario 2. You are trying to estimate parameters for a scientific data analysis (you cannot or don’t want to use external data): use non-informative priors.
Scenario 3. You are trying to estimate parameters for a scientific data analysis (you have good external data): use non-informative priors for those bits you have no data for or in which you want your own data to speak for themselves; use informative priors elsewhere.

Step 1: uniform prior for σ
Step 2: fit the model to the Zambon data
Step 3: the posterior from that fit becomes the prior for the main analysis
Board work

 The beta distribution is conjugate to the binomial model: if you start with a beta prior for p and combine it with binomial data x, you end up with a beta posterior of known form
 I.e. if p ~ Be(a,b) and x ~ Bin(n,p), then p|x ~ Be(a+x, b+n−x)
 Other conjugate priors exist for simple models, e.g....
Board work
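The conjugacy claim can be checked by brute force: a grid posterior built from the prior times the likelihood should match the closed form (the prior and data values here are arbitrary illustrations, not the lecture's):

```python
# Grid check of Beta-Binomial conjugacy with illustrative values:
# prior Be(2,3), data x = 7 successes in n = 10 trials.
# Closed form: posterior = Be(2+7, 3+10-7) = Be(9, 6), mean 9/15 = 0.6.
a, b, n, x = 2, 3, 10, 7
grid = [(i + 0.5) / 10000 for i in range(10000)]

# unnormalised posterior ∝ prior × likelihood (constant factors cancel)
post = [p**(a - 1) * (1 - p)**(b - 1) * p**x * (1 - p)**(n - x) for p in grid]
Z = sum(post)
grid_mean = sum(p * w for p, w in zip(grid, post)) / Z

closed_mean = (a + x) / (a + b + n)
print(grid_mean, closed_mean)
```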

 It’s the incremental nature of accumulated knowledge
 E.g. Zambon study:

Stage   Prior     Data (y,m)   Posterior
0       Be(1,1)   (0,0)        Be(1,1)
1                 (1,1)        Be(2,1)
2                 (1,2)        Be(2,2)
3                 (1,3)        Be(2,3)
4                 (2,4)        Be(3,4)
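The incremental idea can be sketched in code (with illustrative Bernoulli data, not the Zambon counts): updating one observation at a time, with each posterior becoming the next stage's prior, lands in exactly the same place as a single batch update.

```python
# Sequential vs batch conjugate updating, illustrative Bernoulli data.
data = [1, 0, 0, 1, 1]
a, b = 1, 1                    # start from the Be(1,1) uniform prior
for d in data:
    a, b = a + d, b + (1 - d)  # each posterior becomes the next prior

# batch check: y = 3 successes out of m = 5 gives Be(1+3, 1+2) = Be(4, 3)
print(a, b)
```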

 You can think of the parameters of the Be(a,b) as representing:
 a best guess of the proportion, a/(a+b)
 a “sample size” that the prior is equivalent to, (a+b)
 This is an easy way to transform published results into beta priors: take the point estimate (the MLE, say) and the sample size, and transform to get a and b
 (So a uniform prior is like adding one positive and one negative value to your data set: is this fair???)
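A sketch of that transformation, using one convention among several (some authors add 1 to each parameter; doing so with the Zambon numbers reproduces the Be(630,163) from the earlier slide):

```python
# Turn a published point estimate p_hat and sample size n into a Be(a,b)
# prior via a = p_hat*n, b = (1-p_hat)*n. One convention among several.
def beta_from_estimate(p_hat, n):
    return p_hat * n, (1 - p_hat) * n

# Zambon-style numbers: 629 out of 791
a, b = beta_from_estimate(629 / 791, 791)
print(a, b)   # ~Be(629, 162); adding the uniform's 1+1 gives Be(630, 163)
```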

 Take a point estimate and CI and convert them to 2 parameters to represent your prior
 E.g. the infectious period is a popular parameter in infectious disease epidemiology: the average time from infection to recovery
 For no good reason, it is often assumed to be exponential with mean λ, say
 Fraser et al (2009) Science 324: suggest an estimate of the generation period of 1.91 with 95% CI (1.3, 2.71)
Board work
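One hedged way to do the board work: moment-match a Gamma prior to the published mean and CI, treating sd ≈ (upper − lower)/(2 × 1.96) as a normal-approximation assumption, then sanity-check the implied 95% interval (uses scipy; the Gamma choice itself is an assumption, not from the slide).

```python
from scipy import stats

# Moment-match a Gamma prior to the Fraser et al estimate:
# mean 1.91, 95% CI (1.30, 2.71). The sd formula assumes approximate
# normality of the published interval.
mean, lo, hi = 1.91, 1.30, 2.71
sd = (hi - lo) / (2 * 1.96)

shape = (mean / sd) ** 2   # Gamma: mean = shape/rate, var = shape/rate^2
rate = shape / mean

# sanity check: the fitted Gamma's implied 95% interval
q_lo, q_hi = stats.gamma.ppf([0.025, 0.975], a=shape, scale=1 / rate)
print(shape, rate, q_lo, q_hi)
```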

 I mentioned U(-∞,∞) as a non-informative prior. What’s the density function for U(-∞,∞)?
Board work

 A prior such as U(-∞,∞) is called an improper prior, as it does not have a proper density function
 Improper priors sometimes still give proper posteriors: it depends on whether the integral of the likelihood is finite
 A prior that is not improper is called a proper prior
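A numerical illustration of the "sometimes" point, in a case known to work: with a flat improper prior on μ and one observation x ~ N(μ,1), the unnormalised posterior exp(−(x−μ)²/2) integrates to the finite constant √(2π), so the posterior is a proper N(x,1).

```python
from math import exp, pi, sqrt

# Flat improper prior on mu, one observation x ~ N(mu, 1):
# the unnormalised posterior exp(-(x-mu)^2 / 2) integrates (over mu)
# to sqrt(2*pi), a finite constant, so the posterior is proper.
x = 3.0
n_grid, lo_mu, hi_mu = 40000, -20.0, 20.0   # tails beyond are negligible
width = (hi_mu - lo_mu) / n_grid
mus = [lo_mu + (i + 0.5) * width for i in range(n_grid)]

Z = sum(exp(-(x - mu) ** 2 / 2) for mu in mus) * width
print(Z, sqrt(2 * pi))
```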

 Just because a prior is flat in one representation does not mean it is flat in another
 E.g. for an exponential model (for survival analysis, say)
Board work
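A Monte Carlo version of this point (the range on λ is a hypothetical choice for illustration): drawing the exponential rate λ uniformly and transforming to the mean μ = 1/λ gives, by the change-of-variables formula, a density proportional to 1/μ², which is far from flat.

```python
import random

# A prior flat in the rate lambda is NOT flat in the mean mu = 1/lambda:
# the implied density on mu is proportional to 1/mu^2.
random.seed(1)
lams = [random.uniform(0.5, 2.0) for _ in range(100_000)]  # flat in lambda
mus = [1 / lam for lam in lams]                            # mu = 1/lambda

# two equal-width bins of mu: mass piles up at small mu
low = sum(0.5 <= mu < 1.0 for mu in mus)    # expect ~2/3 of draws
high = sum(1.5 <= mu < 2.0 for mu in mus)   # expect ~1/9 of draws
print(low, high)
```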