ABC: Bayesian Computation Without Likelihoods
David Balding, Centre for Biostatistics, Imperial College London (www.icbiostatistics.org.uk)


Bayesian inference via rejection from prior I
Generate a posterior random sample for a parameter of interest θ by a mechanical version of Bayes' Theorem:
1. simulate θ from its prior;
2. accept/reject, with P(accept) ∝ likelihood;
3. if not enough acceptances yet, go to 1.
Problem: if the likelihood involves integration over many nuisance parameters, it is hard/slow to compute.
Solution: use simulation to approximate the likelihood.
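To make the mechanism concrete, here is a minimal Python sketch of rejection sampling from the prior on a made-up toy model (a single Normal observation with a Normal prior); the observed value, prior and bound M are illustrative assumptions, not part of the slides.

```python
# Minimal sketch: rejection sampling from the prior, accept with P proportional to the likelihood.
# Toy model (assumed for illustration): y ~ N(theta, 1), theta ~ N(0, 2^2).
import numpy as np

rng = np.random.default_rng(1)
y_obs = 1.3                        # made-up observed datum

def prior_sample():
    return rng.normal(0.0, 2.0)    # 1. simulate theta from its prior

def likelihood(theta):
    return np.exp(-0.5 * (y_obs - theta) ** 2)   # N(theta, 1) density, up to a constant

M = 1.0                            # bound on the unnormalised likelihood, so P(accept) <= 1
posterior = []
while len(posterior) < 5000:
    theta = prior_sample()
    if rng.uniform() < likelihood(theta) / M:    # 2. accept with probability proportional to the likelihood
        posterior.append(theta)                  # 3. otherwise go back to step 1
print(np.mean(posterior), np.std(posterior))
```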

Bayesian inference via rejection from prior II
Generate an approximate posterior random sample:
1. simulate parameter vector θ from its prior;
2. simulate data X given the value of θ from 1;
2a. if X matches the observed data, accept θ;
3. if not enough acceptances yet, go to 1.
Problem: the simulated X hardly ever matches the observed data.
Solution: relax 2a so that θ is accepted when X is "close to" the observed data; "close to" is usually measured in terms of a vector of summary statistics, S.
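The likelihood-free variant of the same loop, again as a sketch on a made-up toy model (n Normal observations summarised by their mean); the prior, tolerance eps and choice of summary are illustrative assumptions.

```python
# Minimal sketch: likelihood-free rejection (rejection ABC) with a summary statistic and tolerance.
import numpy as np

rng = np.random.default_rng(2)
n = 50
data_obs = rng.normal(1.0, 1.0, size=n)   # pretend these are the observed data
S_obs = data_obs.mean()                    # summary statistic S of the observed data
eps = 0.05                                 # tolerance defining "close to"

accepted = []
while len(accepted) < 2000:
    theta = rng.normal(0.0, 2.0)           # 1. simulate theta from its prior
    x = rng.normal(theta, 1.0, size=n)     # 2. simulate data X given theta
    if abs(x.mean() - S_obs) < eps:        # 2a. accept theta if the summaries are close
        accepted.append(theta)             # 3. otherwise return to step 1
print(np.mean(accepted))
```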

[Figure: joint density of the summary statistic S and the parameter θ, with the prior p(θ), the marginal likelihood p(S), the likelihood p(S | θ) and the posterior density p(θ | S) marked.]
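For reference, the quantities labelled in the figure are tied together by Bayes' theorem:

\[
p(\theta \mid S) \;=\; \frac{p(S \mid \theta)\, p(\theta)}{p(S)} \;\propto\; p(S \mid \theta)\, p(\theta).
\]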

Approximate Bayesian Computation (ABC)
We simulate to approximate (1) the joint parameter/data density, then (2) a slice through it at the observed data. Few if any simulated points will lie exactly on this slice, so we need to assume smoothness: the required posterior is approximately the same for datasets close to that observed.
Note: (1) we get approximate likelihood inferences, but we never calculated the likelihood; (2) different definitions of "close" can be tried for the same set of simulations; (3) the simulations can even be retained and reused for different observed datasets.

The θ values of these points are treated as a random sample from the posterior.

When to use ABC?
When the likelihood is hard to compute because of the need for integration over many nuisance parameters, BUT the data are easy to simulate:
– Population genetics: nuisance parameters are the branching times and topology of the genealogical tree underlying the observed DNA sequences/genes.
– Epidemic models: nuisance parameters are infection times and infectious periods.
ABC involves 3 approximations: 1. a finite number of simulations; 2. non-sufficiency of S; 3. the simulated S* need not match the observed S exactly.

Population genetics example
Parameters: N = effective population size; μ = mutation rate per generation; G = genealogical tree (topology + branch lengths) – nuisance.
Summary statistics: S1 = number of distinct alleles/sequences; S2 = number of polymorphic/segregating sites.
Algorithm:
1. simulate N and μ from their joint prior;
2. simulate G from the standard coalescent model;
3. simulate mutations on G and calculate S*;
4. accept (N, μ, G) if S* ≈ S.
This generates a sample from the joint posterior of (N, μ, G). To make inference about θ = 2Nμ, simply ignore G.
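A rough, self-contained Python sketch of this algorithm is given below. The coalescent simulator is deliberately naive (haploid time scaling so that θ = 2Nμ, infinite-sites mutations), and the priors, sample size, observed summaries and tolerances are illustrative assumptions rather than values taken from the slides.

```python
# Sketch: rejection ABC for (N, mu) using a naive Kingman coalescent simulator.
import numpy as np

rng = np.random.default_rng(3)

def simulate_summaries(N, mu, n=30):
    """Simulate a coalescent genealogy G for n haploid samples (population size N),
    drop infinite-sites mutations at rate mu per lineage per generation, and return
    S* = (number of distinct haplotypes, number of segregating sites)."""
    lineages = [frozenset([i]) for i in range(n)]     # each lineage = set of descendant samples
    haplotypes = [set() for _ in range(n)]            # mutation ids carried by each sample
    next_mut = 0
    while len(lineages) > 1:
        k = len(lineages)
        # waiting time back to the next coalescence among k lineages (haploid scaling)
        t = rng.exponential(N / (k * (k - 1) / 2.0))
        for lin in lineages:                          # mutations on each branch during this interval
            for _ in range(rng.poisson(mu * t)):
                for sample in lin:
                    haplotypes[sample].add(next_mut)
                next_mut += 1
        i, j = rng.choice(k, size=2, replace=False)   # pick two lineages to coalesce
        merged = lineages[i] | lineages[j]
        lineages = [l for idx, l in enumerate(lineages) if idx not in (i, j)] + [merged]
    n_haplotypes = len({frozenset(h) for h in haplotypes})
    return n_haplotypes, next_mut                     # every infinite-sites mutation segregates

S_obs = (12, 25)                   # hypothetical observed summaries (S1, S2)
accepted = []
for _ in range(20000):
    N = rng.uniform(100, 10000)    # 1. illustrative joint prior on N and mu
    mu = rng.uniform(1e-4, 1e-3)
    S1, S2 = simulate_summaries(N, mu)            # 2-3. simulate G, mutations and S*
    if abs(S1 - S_obs[0]) <= 1 and abs(S2 - S_obs[1]) <= 2:   # 4. accept if S* is close to S
        accepted.append(2 * N * mu)               # keep theta = 2 N mu; G is simply ignored
print(len(accepted), np.mean(accepted) if accepted else None)
```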

Model comparison via ABC
ABC can also be used for model comparison, as well as for parameter estimation within models. The ratio of acceptances across models approximates the Bayes factor. Better: fit a (weighted) multinomial regression to predict the model from the observed data. Beaumont (2006) used this to infer the topology of a tree representing the history of 3 Californian fox populations.
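A minimal sketch of model choice by the ratio of acceptances, on a made-up example (Poisson vs. geometric counts, with the sample mean and variance as summaries); all modelling choices here are illustrative, not the fox-population analysis cited above.

```python
# Sketch: ABC model comparison by acceptance counts under equal prior model probabilities.
import numpy as np

rng = np.random.default_rng(4)
x_obs = rng.poisson(3.0, size=100)                 # stand-in "observed" count data
S_obs = np.array([x_obs.mean(), x_obs.var()])      # summaries: mean and variance
eps = 0.5

counts = {"poisson": 0, "geometric": 0}
for _ in range(100000):
    model = rng.choice(["poisson", "geometric"])   # prior model probability 1/2 each
    m = rng.exponential(3.0)                       # illustrative prior on the model's mean
    if model == "poisson":
        x = rng.poisson(m, size=100)
    else:
        x = rng.geometric(1.0 / (1.0 + m), size=100) - 1   # geometric on {0, 1, 2, ...} with mean m
    S_star = np.array([x.mean(), x.var()])
    if np.linalg.norm(S_star - S_obs) < eps:       # accept if the summaries are close
        counts[model] += 1

# With equal prior model probabilities, the ratio of acceptance counts
# approximates the Bayes factor in favour of the Poisson model.
print(counts, counts["poisson"] / max(counts["geometric"], 1))
```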

Problems/limitations
Rejection-ABC is very inefficient: most simulated datasets are far from the observed data and must be rejected. There is no learning.
How to find/assess good summary statistics?
– Too many summary statistics can make matters worse (see later).
How to choose a metric for the (high-dimensional) S?

Beaumont, Zhang, and DJB, Approximate Bayesian Computation in Population Genetics. Genetics 162: 2025–2035, 2002.
Use local-linear regression to adjust for the distance between the observed and simulated datasets. Use a smooth (Epanechnikov) weighting according to distance.
Can now weaken the "close" criterion (i.e. increase the tolerance) and utilize many more points.
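A sketch of the local-linear regression adjustment, assuming `thetas` and `summaries` hold the parameter values and summary statistics from a set of simulations (e.g. the rejection sampler above) and `S_obs` is the observed summary vector; here the tolerance is expressed as the fraction of simulated points retained.

```python
# Sketch: local-linear regression adjustment with Epanechnikov weights (scalar theta).
import numpy as np

def regression_adjust(thetas, summaries, S_obs, tolerance):
    """Return regression-adjusted parameter values and their Epanechnikov weights."""
    thetas = np.asarray(thetas, dtype=float)
    D = np.asarray(summaries, dtype=float) - np.asarray(S_obs, dtype=float)  # S* - S_obs
    dist = np.linalg.norm(D, axis=1)
    delta = np.quantile(dist, tolerance)            # bandwidth: keep this fraction of points
    keep = dist <= delta
    w = 1.0 - (dist[keep] / delta) ** 2             # Epanechnikov weights, up to a constant
    X = np.column_stack([np.ones(keep.sum()), D[keep]])   # intercept + (S* - S_obs)
    XtW = X.T * w                                    # weighted least squares
    beta = np.linalg.solve(XtW @ X, XtW @ thetas[keep])
    # adjusted values: theta* minus the fitted linear trend in (S* - S_obs)
    adjusted = thetas[keep] - D[keep] @ beta[1:]
    return adjusted, w
```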

[Figure: scatter of simulated points (summary statistic vs. parameter), each given a weight between 0 and 1 according to the distance of its summary statistic from the observed value.]


Estimation of scaled mutation rate θ = 2Nμ
[Figure: relative mean square error vs. tolerance for MCMC, standard rejection, and rejection with regression adjustment.]
Summary statistics: mean variance in length, mean heterozygosity, number of haplotypes, i.e. 3 numbers.
Full data: 445 Y chromosomes, each typed at 8 microsatellite loci, i.e. 445 × 8 = 3560 numbers.

Population growth
Population of constant size N_A until t generations ago, then exponential growth at rate r per generation to the current size N_C. 4 model parameters, but only 3 identifiable. We choose:
Data same as above, except a smaller sample size n = 200 (because of the time taken for MCMC to converge).

ABC applications in population genetics:
Standard rejection method:
– Estoup et al. (2002, Genetics) – Demographic history of the invasion of islands by cane toads. 10 microsatellite loci, 22 allozyme loci. 4/3 summary statistics, 6 demographic parameters.
– Estoup and Clegg (2003, Molecular Ecology) – Demographic history of the colonisation of islands by silvereyes.
With regression adjustment:
– Tallmon et al. (2004, Genetics) – Estimating effective population size by the temporal method. One main parameter of interest (Ne), 4 summary statistics.
– Estoup et al. (2004, Evolution) – Demographic history of the invasion of Australia by cane toads. 75/63 summary statistics, model comparison, up to 5 demographic parameters.

More sophisticated regressions?
Although global linear regression usually gives a poor fit to the joint θ/S density, Calabrese (USC, unpublished) uses projection pursuit regression to fit a large feature set of summary statistics, iterating to improve the fit in the vicinity of S. Application: estimating human recombination hotspots.
Could also consider quantile regression to adapt the adjustment to different parts of the distribution.

Do ABC within MCMC
Marjoram et al. (2003). Two accept/reject steps:
1. Simulate a dataset at the current parameter values; if it isn't "close" to the observed data, start again.
2. If it is close, accept or reject according to the prior ratio times the Hastings ratio (no likelihood ratio).
Note: now "close" must be defined in advance; also, simulations cannot be reused for different observed datasets. Regression adjustment can be applied to the MCMC output.
Problems:
1. proposals in tree space;
2. few acceptances in the tail of the target distribution – "stickiness".
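A minimal sketch of this kind of likelihood-free MCMC, reusing the Normal-mean toy model from earlier; the proposal scale, tolerance and prior are illustrative choices.

```python
# Sketch: MCMC with the likelihood ratio replaced by an indicator that the simulated
# summary lies within eps of the observed one (in the spirit of Marjoram et al. 2003).
import numpy as np

rng = np.random.default_rng(5)
n, eps = 50, 0.05
S_obs = 1.0                                     # made-up observed summary (sample mean)

def log_prior(theta):                           # theta ~ N(0, 2^2), up to a constant
    return -0.5 * (theta / 2.0) ** 2

theta, chain = 0.0, []
for _ in range(50000):
    prop = theta + rng.normal(0.0, 0.5)         # symmetric random-walk proposal
    x = rng.normal(prop, 1.0, size=n)           # 1. simulate a dataset at the proposed value
    if abs(x.mean() - S_obs) < eps:             #    reject outright unless it is "close"
        # 2. accept with the prior ratio times the Hastings ratio (symmetric proposal, so ratio 1)
        if np.log(rng.uniform()) < log_prior(prop) - log_prior(theta):
            theta = prop
    chain.append(theta)
print(np.mean(chain[10000:]))
```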

Importance sampling within MCMC
In fact, the Marjoram et al. MCMC approach can be viewed as a special case of a more general approach developed by Beaumont (2003). Instead of simulating a new dataset forwards in time, Beaumont used a backwards-in-time importance sampling (IS) approach to approximate the likelihood. His proof of the validity of the algorithm is readily extended to forwards-in-time approaches based on one or multiple datasets (cf. O'Neill et al. 2000). A regression adjustment could also be used.

ABC within Sequential MC
Sisson et al. at UNSW, Sydney.
– Sample an initial generation of θ particles from the prior.
– Sample θ from the previous generation, propose a new value and generate a dataset; calculate S*. Repeat until S* ≈ S – BUT the tolerance is reduced each generation.
– Calculate the prior ratio times the Hastings ratio; use this as the weight W for sampling the next generation.
– If the variance of the W is large, resample with replacement according to W and set all W = 1/N.
Application: estimating the parameters of TB infection.
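A loose sketch of such a sequential sampler on the same toy model; the perturbation kernel, tolerance schedule and resampling rule are illustrative choices, not necessarily those of Sisson et al.

```python
# Sketch: ABC-SMC with a shrinking tolerance, following the steps listed above.
import numpy as np

rng = np.random.default_rng(6)
n_particles, n_obs, S_obs = 1000, 50, 1.0
tolerances = [1.0, 0.5, 0.25, 0.1, 0.05]            # tolerance shrinks each generation

def prior_pdf(theta):
    return np.exp(-0.5 * (theta / 2.0) ** 2)        # N(0, 2^2) prior, up to a constant

particles = rng.normal(0.0, 2.0, size=n_particles)  # initial generation from the prior
weights = np.full(n_particles, 1.0 / n_particles)

for eps in tolerances:
    new_particles, new_weights = np.empty(n_particles), np.empty(n_particles)
    for i in range(n_particles):
        while True:
            old = rng.choice(particles, p=weights)   # sample from the previous generation
            prop = old + rng.normal(0.0, 0.3)        # propose a perturbed value
            x = rng.normal(prop, 1.0, size=n_obs)    # generate a dataset
            if abs(x.mean() - S_obs) < eps:          # repeat until S* is close enough
                break
        new_particles[i] = prop
        # prior ratio times Hastings ratio (symmetric kernel, so the Hastings ratio is 1)
        new_weights[i] = prior_pdf(prop) / prior_pdf(old)
    new_weights /= new_weights.sum()
    # resample if the weights are too variable (small effective sample size), then reset W = 1/N
    if 1.0 / np.sum(new_weights ** 2) < n_particles / 2:
        idx = rng.choice(n_particles, size=n_particles, p=new_weights)
        new_particles = new_particles[idx]
        new_weights = np.full(n_particles, 1.0 / n_particles)
    particles, weights = new_particles, new_weights

print(particles.mean())
```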

Adaptive simulation algorithm (Molitor and Welch, in progress)
– Simulate N values of θ from the prior.
– Calculate the corresponding datasets and use the similarity of S* with S to generate a density.
– Resample from this density, replacing the value with the lowest similarity between S* and S.
– Use the final density as importance-sampling weights for a conventional ABC.
The idea is to use a preliminary pseudo-posterior, based on the weights, to choose something better than the prior as the basis for ABC.

"number of data generation steps for rejection ABC" [1] 35064[2] "number of data generation steps for SMC ABC" [1] 14730[2] "number of data generation steps for Johns ABC" [1] 10314[2] 6130

ABC to rescue poor estimators (inspired by DJ Wilson, Lancaster)
– Evaluate an estimator based on a simplistic model at many datasets simulated under a more sophisticated model.
– For the observed dataset, use as the estimator the regression prediction at the value of the simplistic estimator for the observed data.
For example, many population genetics estimators assume no recombination and an infinite-sites mutation model: use such an estimator together with simulations to correct for recombination and finite-sites mutation.
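A minimal sketch of the calibration idea on a made-up example: a scale parameter is estimated by a sample standard deviation that would be adequate for Normal data but is biased under the heavier-tailed "sophisticated" model, and a regression on simulations corrects it. Everything here is an illustrative stand-in for the population-genetics setting described above.

```python
# Sketch: calibrate a simplistic estimator against simulations from a richer model.
import numpy as np

rng = np.random.default_rng(7)

def rich_model_simulate(theta, n=200):
    # "sophisticated" model: Student-t_5 noise scaled by theta (heavy tails)
    return theta * rng.standard_t(5, size=n)

def simplistic_estimator(x):
    return x.std()            # fine for Normal data, biased upwards under the t_5 model

thetas = rng.uniform(0.5, 5.0, size=5000)             # parameter values for the simulations
crude = np.array([simplistic_estimator(rich_model_simulate(t)) for t in thetas])

# regress the true parameter on the crude estimate (global linear fit for simplicity;
# a local fit around the observed value would follow the same pattern)
A = np.column_stack([np.ones_like(crude), crude])
coef, *_ = np.linalg.lstsq(A, thetas, rcond=None)

x_obs = rich_model_simulate(2.0)                      # pretend this is the observed dataset
crude_obs = simplistic_estimator(x_obs)
corrected = coef[0] + coef[1] * crude_obs             # regression prediction at the observed value
print(crude_obs, corrected)
```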

Acknowledgments
David Welch and John Molitor, both of Imperial College. David has just started on an EPSRC grant to develop ABC ideas further and apply them, particularly in population genomics.