Introduction to the bayes Prefix in Stata 15

Slides:



Advertisements
Similar presentations
Bayes rule, priors and maximum a posteriori
Advertisements

Introduction to Monte Carlo Markov chain (MCMC) methods
MCMC estimation in MlwiN
Lecture #4: Bayesian analysis of mapped data Spatial statistics in practice Center for Tropical Ecology and Biodiversity, Tunghai University & Fushan Botanical.
Estimation, Variation and Uncertainty Simon French
Bayesian Estimation in MARK
.. . Parameter Estimation using likelihood functions Tutorial #1 This class has been cut and slightly edited from Nir Friedman’s full course of 12 lectures.
Part 24: Bayesian Estimation 24-1/35 Econometrics I Professor William Greene Stern School of Business Department of Economics.
Markov-Chain Monte Carlo
Parameter Estimation using likelihood functions Tutorial #1
Bayesian statistics – MCMC techniques
Bayesian inference Gil McVean, Department of Statistics Monday 17 th November 2008.
1 Point and Interval Estimates Examples with z and t distributions Single sample; two samples Result: Sums (and differences) of normally distributed RV.
Computing the Posterior Probability The posterior probability distribution contains the complete information concerning the parameters, but need often.
Machine Learning CMPT 726 Simon Fraser University CHAPTER 1: INTRODUCTION.
This presentation has been cut and slightly edited from Nir Friedman’s full course of 12 lectures which is available at Changes.
Descriptive statistics Experiment  Data  Sample Statistics Experiment  Data  Sample Statistics Sample mean Sample mean Sample variance Sample variance.
Course overview Tuesday lecture –Those not presenting turn in short review of a paper using the method being discussed Thursday computer lab –Turn in short.
Using ranking and DCE data to value health states on the QALY scale using conventional and Bayesian methods Theresa Cain.
Thanks to Nir Friedman, HU
Bayes Factor Based on Han and Carlin (2001, JASA).
1 Bayesian methods for parameter estimation and data assimilation with crop models Part 2: Likelihood function and prior distribution David Makowski and.
Modeling Menstrual Cycle Length in Pre- and Peri-Menopausal Women Michael Elliott Xiaobi Huang Sioban Harlow University of Michigan School of Public Health.
Overview G. Jogesh Babu. Probability theory Probability is all about flip of a coin Conditional probability & Bayes theorem (Bayesian analysis) Expectation,
Additional Slides on Bayesian Statistics for STA 101 Prof. Jerry Reiter Fall 2008.
A quick intro to Bayesian thinking 104 Frequentist Approach 10/14 Probability of 1 head next: = X Probability of 2 heads next: = 0.51.
Introduction to MCMC and BUGS. Computational problems More parameters -> even more parameter combinations Exact computation and grid approximation become.
Bayesian Inference Ekaterina Lomakina TNU seminar: Bayesian inference 1 March 2013.
Exam I review Understanding the meaning of the terminology we use. Quick calculations that indicate understanding of the basis of methods. Many of the.
CSC321: 2011 Introduction to Neural Networks and Machine Learning Lecture 11: Bayesian learning continued Geoffrey Hinton.
Likelihood and Bayes Brian O’Meara
Markov Random Fields Probabilistic Models for Images
- 1 - Bayesian inference of binomial problem Estimating a probability from binomial data –Objective is to estimate unknown proportion (or probability of.
G. Cowan RHUL Physics Bayesian Higgs combination page 1 Bayesian Higgs combination based on event counts (follow-up from 11 May 07) ATLAS Statistics Forum.
1 Bayesian Essentials Slides by Peter Rossi and David Madigan.
Markov Chain Monte Carlo for LDA C. Andrieu, N. D. Freitas, and A. Doucet, An Introduction to MCMC for Machine Learning, R. M. Neal, Probabilistic.
Reducing MCMC Computational Cost With a Two Layered Bayesian Approach
1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.
The Unscented Particle Filter 2000/09/29 이 시은. Introduction Filtering –estimate the states(parameters or hidden variable) as a set of observations becomes.
Bayesian Modelling Harry R. Erwin, PhD School of Computing and Technology University of Sunderland.
Statistical Methods. 2 Concepts and Notations Sample unit – the basic landscape unit at which we wish to establish the presence/absence of the species.
Parameter Estimation. Statistics Probability specified inferred Steam engine pump “prediction” “estimation”
Density Estimation in R Ha Le and Nikolaos Sarafianos COSC 7362 – Advanced Machine Learning Professor: Dr. Christoph F. Eick 1.
Outline Historical note about Bayes’ rule Bayesian updating for probability density functions –Salary offer estimate Coin trials example Reading material:
SIR method continued. SIR: sample-importance resampling Find maximum likelihood (best likelihood × prior), Y Randomly sample pairs of r and N 1973 For.
Overview G. Jogesh Babu. R Programming environment Introduction to R programming language R is an integrated suite of software facilities for data manipulation,
Bayesian Estimation and Confidence Intervals Lecture XXII.
Generalization Performance of Exchange Monte Carlo Method for Normal Mixture Models Kenji Nagata, Sumio Watanabe Tokyo Institute of Technology.
Markov Chain Monte Carlo in R
Bayesian Estimation and Confidence Intervals
MCMC Output & Metropolis-Hastings Algorithm Part I
Advanced Statistical Computing Fall 2016
Bayesian data analysis
ERGM conditional form Much easier to calculate delta (change statistics)
Markov Chain Monte Carlo
Predictive distributions
More about Posterior Distributions
(Very Brief) Introduction to Bayesian Statistics
Ch13 Empirical Methods.
Bayesian Multilevel/Longitudinal Models Using Stata
Econometrics Chengyuan Yin School of Mathematics.
Bayes for Beginners Luca Chech and Jolanda Malamud
CS 594: Empirical Methods in HCC Introduction to Bayesian Analysis
Pattern Recognition and Machine Learning Chapter 2: Probability Distributions July chonbuk national university.
CS639: Data Management for Data Science
Markov Networks.
Mathematical Foundations of BME Reza Shadmehr
Computing and Statistical Data Analysis / Stat 10
Maximum Likelihood Estimation (MLE)
Classical regression review
Presentation transcript:

Introduction to the bayes Prefix in Stata 15 Chuck Huber StataCorp chuber@stata.com 2017 French Stata Users Group Meeting Paris, France July 6, 2017

Outline Introduction to Bayesian Analysis Coin Toss Example Priors, Likelihoods, and Posteriors Markov Chain Monte Carlo (MCMC) Bayesian Linear Regression Advantages and Disadvantages of Bayes

The bayesmh Command bayesmh sbp age sex bmi, /// likelihood(normal({sigma2})) /// prior({sbp: _cons}, normal(0,100)) /// prior({sbp: age}, normal(0,100)) /// prior({sbp: sex}, normal(0,100)) /// prior({sbp: bmi}, normal(0,100)) /// prior({sigma2}, igamma(1,1))

The bayes Prefix regress sbp age sex bayes: regress sbp age sex logistic highbp age sex bayes: logistic highbp age sex

Two Paradigms Frequentist Statistics Bayesian Statistics Model parameters are considered to be unknown but fixed constants and the observed data are viewed as a repeatable random sample. Bayesian Statistics Model parameters are random quantities which have a posterior distribution formed by combining prior knowledge about parameters with the evidence from the observed data sample.

Reverend Thomas Bayes 1701 – born in London Presbyterian Minister Amateur Mathematician Published one paper on theology and one on mathematics 1761 – died in Kent 1763 - “Bayes Theorem” paper published by friend Richard Price https://bayesian.org/bayes

What is the probability of heads (θ)? Coin Toss Example What is the probability of heads (θ)?

Prior Distribution Prior distributions are probability distributions of model parameters based on some a priori knowledge about the parameters. Prior distributions are independent of the observed data.

Beta Prior for θ 𝑃 𝜃 =𝐵𝑒𝑡𝑎 𝛼,𝛽 = Γ 𝛼+𝛽 Γ 𝛼 Γ 𝛽 𝜃 (𝛼−1) (1−𝜃) (𝛽−1)

Uninformative Prior

Different Priors

Informative Prior

Coin Toss Experiment

Likelihood Function for the Data 𝑃 𝑦|𝜃 =𝐵𝑖𝑛𝑜𝑚𝑖𝑎𝑙 𝑛,𝜃 = 𝑛 𝑦 𝜃 𝑦 (1−𝜃) (𝑛−𝑦)

Prior and Likelihood

Posterior Distribution 𝑃𝑜𝑠𝑡𝑒𝑟𝑖𝑜𝑟=𝑃𝑟𝑖𝑜𝑟×𝐿𝑖𝑘𝑒𝑙𝑖ℎ𝑜𝑜𝑑 𝑃 𝜃|𝑦 =𝑃 𝜃 𝑃 𝑦|𝜃 𝑃 𝜃|𝑦 =𝐵𝑒𝑡𝑎 𝛼,𝛽 ×𝐵𝑖𝑛𝑜𝑚𝑖𝑎𝑙 𝑛,𝜃 =𝐵𝑒𝑡𝑎 𝑦+𝛼,𝑛−𝑦+𝛽

Posterior Distribution

Effect of Uninformative Prior

Effect of Informative Prior

Markov Chain Monte Carlo Often the posterior distribution does not have a simple form. We can use Markov Chain Monte Carlo (MCMC) with the Metropolis-Hastings algorithm to generate a sample from the posterior distribution.

MCMC and Metropolis-Hastings Monte Carlo Markov Chains Metropolis-Hastings

Monte Carlo ←Proposal Distribution

Monte Carlo

Markov Chain Monte Carlo

Markov Chain Monte Carlo

Markov Chain Monte Carlo

MCMC with Metropolis-Hastings 𝑟( 𝜃 𝑛𝑒𝑤 , 𝜃 𝑡−1 )= 𝑃𝑜𝑠𝑡𝑒𝑟𝑖𝑜𝑟 𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑜𝑓 𝜃 𝑛𝑒𝑤 𝑃𝑜𝑠𝑡𝑒𝑟𝑖𝑜𝑟 𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑜𝑓 𝜃 𝑡−1 = 𝐵𝑒𝑡𝑎 1,1, 𝜃 𝑛𝑒𝑤 × 𝐵𝑖𝑛𝑜𝑚𝑖𝑎𝑙(10,4, 𝜃 𝑛𝑒𝑤 ) 𝐵𝑒𝑡𝑎 1,1, 𝜃 𝑡−1 × 𝐵𝑖𝑛𝑜𝑚𝑖𝑎𝑙(10,4, 𝜃 𝑡−1 )

MCMC with Metropolis-Hastings 𝑎𝑐𝑐𝑒𝑝𝑡𝑎𝑛𝑐𝑒 𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦=𝛼( 𝜃 𝑛𝑒𝑤 , 𝜃 𝑡−1 ) =𝑚𝑖𝑛 𝑟( 𝜃 𝑛𝑒𝑤 , 𝜃 𝑡−1 ) , 1

MCMC with Metropolis-Hastings 𝐷𝑟𝑎𝑤 𝑢 ~ 𝑈𝑛𝑖𝑓𝑜𝑟𝑚(0,1) 𝐼𝑓 𝑢<𝛼 𝜃 𝑛𝑒𝑤 , 𝜃 𝑡−1 𝑇ℎ𝑒𝑛 𝜃 𝑡 = 𝜃 𝑛𝑒𝑤 𝑂𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 𝜃 𝑡 = 𝜃 𝑡−1

MCMC with Metropolis-Hastings

MCMC with Metropolis-Hastings

MCMC with Metropolis-Hastings

MCMC with Metropolis-Hastings

MCMC with Metropolis-Hastings

MCMC with Gibbs Sampling

The bayesmh command bayesmh heads, /// likelihood(dbernoulli({theta})) /// prior({theta}, beta(1,1))

Diagnostic Plots bayesgraph diagnostics {theta}

Outline Introduction to Bayesian Analysis Coin Toss Example Priors, Likelihoods, and Posteriors Markov Chain Monte Carlo (MCMC) Bayesian Linear Regression Advantages and Disadvantages of Bayes

Bayesian Linear Regression We will ignore the sample weights to keep things simple.

𝑠𝑏𝑝 𝑖 = 𝛽 0 + 𝛽 1 𝑎𝑔𝑒 𝑖 + 𝛽 2 𝑠𝑒𝑥 𝑖 + 𝑒 𝑖

bayes options

bayes options

bayes options

bayes options

bayes options

Checking “Convergence” of the Chain Effective Sample Size Trace Plots Histograms Correlegrams Scatterplot Matrices

Checking “Convergence” of the Chain

Checking “Convergence” of the Chain bayesgraph diagnostics {sigma2}

Checking “Convergence” of the Chain bayesgraph trace {sbp: _cons age sex} {sigma2}, byparm

Checking “Convergence” of the Chain bayesgraph ac {sbp: _cons age sex} {sigma2}, byparm

Checking “Convergence” of the Chain bayesgraph histogram {sbp: _cons age sex} {sigma2}, byparm

Checking “Convergence” of the Chain bayesgraph matrix _all

Bayesian Model Selection quietly { bayes, rseed(15): regress sbp age estimates store age bayes, rseed(15): regress sbp sex estimates store sex bayes, rseed(15): regress sbp age sex estimates store full }

Bayesian Model Selection

Bayesian Model Selection

Tests

Predictions Frequentist Bayesian

Predictions

Predictions

Predictions

Convert the Matrix to a Dataset matrix pred = r(summary) clear svmat pred /* convert matrix to dataset */ rename pred1 mean rename pred2 stddev rename pred3 mcse rename pred4 median rename pred5 lower rename pred6 upper gen sex = _n<6 label define sex 0 "Female" 1 "Male" label values sex sex gen age = (_n+1)*10 if _n<6 replace age = (_n-4)*10 if _n>5

Predictions

The bayes Prefix

The bayes Prefix

The bayes Prefix

The bayes Prefix

Outline Introduction to Bayesian Analysis Coin Toss Example Priors, Likelihoods, and Posteriors Markov Chain Monte Carlo (MCMC) Bayesian Linear Regression Advantages and Disadvantages of Bayes

Advantages of Bayesian Statistics Formally incorporate prior information into studies Works when maximum likelihood estimation (MLE) fails or is not identified Does not rely on asymptotic normality like MLE Works with small sample sizes Intuitive interpretation of results such as credible intervals

US Food and Drug Administration (FDA) Quote from page 22 https://www.fda.gov/downloads/MedicalDevices/DeviceRegulationandGuidance/GuidanceDocuments/ucm071121.pdf

Disadvantages of Bayesian Statistics Subjectivity in the selection of prior distributions Computational complexity

Outline Introduction to Bayesian Analysis Coin Toss Example Priors, Likelihoods, and Posteriors Markov Chain Monte Carlo (MCMC) Bayesian Linear Regression Advantages and Disadvantages of Bayes

Thank you! Questions? chuber@stata.com