ABC The method: practical overview

Presentation transcript:

ABC The method: practical overview

Index
1. Applications of ABC in population genetics
2. Motivation for the application of ABC
3. ABC approach
   3.1 Characteristics of an ABC methodology
   3.2 Algorithm of an ABC inference
   3.3 Limitations of the ABC approach
   3.4 Typical ABC run
4. Present work
   4.1 Compare the ABC algorithm with MCMC
   4.2 Study the use of different summary statistics
   4.3 Study the use of ABC in more complex scenario
   4.4 “State of art” of the software
5. Future developments


1. Applications of ABC in population genetics
[Figure: diagram of populations labelled Pop anc, Pop 1, Pop 2, Pop 3 and Pop 4.]


2. Motivation for the application of ABC

- Two processes are usually considered important in determining population structure: gene flow and population splitting.
- Most often these processes are modelled and inferred separately.
- Recent advances by Nielsen and Wakeley (2001) and Hey and Nielsen (2004) use Markov chain Monte Carlo (MCMC) to study both processes at the same time in a two-population scenario.
- An Approximate Bayesian Computation (ABC) method developed by Beaumont (2006) deals with the same problem in a three-population scenario. The idea is to avoid problems associated with MCMC, such as poor mixing and long convergence times, but the method relies on a couple of approximations.
- The aim of this study is to see how good these approximations are.

2. Motivation for the application of ABC

Background using MCMC:
- Wakeley, Hey (1997, Genetics): developed an algorithm to estimate historical demographic parameters.
- Nielsen, Wakeley (2001, Genetics): developed an MCMC algorithm to infer demographic parameters in an “Isolation with Migration” model.
- Hey, Nielsen (2004, Genetics): present the IM program (software that uses the MCMC algorithm previously developed).
- Hey et al. (2004, Mol. Ecol.): introduce changes in the IM software (HapSTR data can be used).
- Won, Hey (2005, Mol. Biol. Evol.): present a case study in 3 populations of chimpanzees.
- Hey (2005, PLoS Biol.): the peopling of the Americas. Introduces changes in the IM software (founder population size can be inferred).

2. Motivation for the application of ABC

Background using ABC:
- Tavaré et al. (1997, Genetics): presented a simulation-based algorithm to infer specific demographic parameters.
- Pritchard et al. (1999, MBE): introduced the first ABC approach, with a rejection step, to estimate demographic parameters.
- Beaumont et al. (2002, Genetics): introduced a regression method within an ABC framework to estimate demographic parameters.
- Marjoram et al. (2003, PNAS): use MCMC without likelihoods within an ABC framework.
- Beaumont (2006, “Simulation, Genetics, and Human Prehistory”): uses regression-based ABC to estimate demographic parameters within an “Isolation with Migration” model for microsatellites in three populations.
- Hickerson et al. (2006, in press): compare ABC with IM in two-population studies for sequence data.

2. Motivation for the application of ABC

Examples of ABC use after (Beaumont, 2002):
- Estoup and Clegg (2003, Mol. Evol.)
- Plagnol and Tavaré (2003, “Monte Carlo and Quasi-Monte Carlo Methods 2002”)
- Estoup et al. (2004, Evolution)
- Tallmon et al. (2004, Genetics)
- Excoffier et al. (2005, Genetics): introduced the study of admixture events
- Hamilton et al. (2005, Genetics): introduced WED (weighted Euclidean distance)
- Tanaka et al. (2006, Genetics): applied to disease transmission
- Sisson et al. (2006, under submission): introduced a sequential ABC approach (SABC)


3.1 Characteristics of an ABC methodology

Get the posterior distribution by sampling values from it:
1. Simulate samples $(\theta_i, D_i)$ from the joint density $p(\theta, D)$:
   a. First sample from the prior: $\theta_i \sim p(\theta)$
   b. Then simulate the data, given $\theta_i$: $D_i \sim p(D \mid \theta_i)$
2. The posterior distribution, $p(\theta \mid D) = p(D, \theta) / p(D)$, for any given $D$, can be estimated by the proportion of all simulated points that correspond to that particular $D$ and $\theta$, divided by the proportion of points corresponding to $D$ (ignoring $\theta$).

Replace the data with summary statistics:
- Summarize a large amount of data into a few representative values.
- By replacing the data with summary statistics, it is easier to decide how ‘similar’ data sets are to each other.
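The sampling scheme on this slide can be sketched on a toy problem. The example below is not from the presentation: it assumes a success probability theta with a Uniform(0, 1) prior and data D equal to the number of successes in 10 trials, so that "points corresponding to that particular D" can be selected by exact equality. The function names are hypothetical.

```python
import random

def simulate_joint(n_sims, n_trials, rng):
    """Draw (theta_i, D_i) pairs from the joint density p(theta, D)."""
    pairs = []
    for _ in range(n_sims):
        theta = rng.random()  # assumed prior: theta ~ Uniform(0, 1)
        # assumed model: D | theta is the number of successes in n_trials
        data = sum(rng.random() < theta for _ in range(n_trials))
        pairs.append((theta, data))
    return pairs

def posterior_sample(pairs, observed):
    """Keep the theta values whose simulated data equal the observed data."""
    return [theta for theta, data in pairs if data == observed]

rng = random.Random(1)
pairs = simulate_joint(50_000, 10, rng)
kept = posterior_sample(pairs, observed=7)
posterior_mean = sum(kept) / len(kept)
print(len(kept), round(posterior_mean, 3))
```

For this toy model the exact posterior is Beta(8, 4), with mean 2/3, so the retained sample can be checked against a known answer. With genetic data an exact match is essentially never observed, which is why the data are replaced with summary statistics.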


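3.2 Algorithm of an ABC inference
[Figure: a set of priors on the parameter θ and the summary statistics (S) computed from simulations give the joint distribution of (S, θ), plotted as parameter θ against summary statistics S; the summary statistics of the genetic data obtained are marked s'. The figure cites (Nordborg, 2001).]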

3.2 Algorithm of an ABC inference
By extracting the points near the real data set we obtain the posterior:
[Figure: joint distribution of (S, θ) with the observed value s' marked, and the resulting posterior distribution p(θ | S = s').]

3.2 Algorithm of an ABC inference
Point estimate of parameter θ1:
[Figure: posterior distribution p(θ | S = s') with the mode, the mean and the credible interval marked.]
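The point estimates named on this slide can be computed from any set of retained parameter values. The sketch below is an illustration only: the histogram-based mode, the equal-tailed interval and the function name `summarize_posterior` are choices made here, not details taken from the presentation.

```python
import random
import statistics

def summarize_posterior(sample, level=0.95, bins=20):
    """Return (mean, mode, (lower, upper)) for a posterior sample."""
    ordered = sorted(sample)
    n = len(ordered)
    mean = statistics.fmean(ordered)

    # Mode: midpoint of the fullest histogram bin (a crude density estimate).
    low, high = ordered[0], ordered[-1]
    width = (high - low) / bins or 1.0
    counts = [0] * bins
    for value in ordered:
        index = min(int((value - low) / width), bins - 1)
        counts[index] += 1
    mode = low + (counts.index(max(counts)) + 0.5) * width

    # Equal-tailed credible interval read off the sorted sample.
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (n - 1))]
    upper = ordered[int((1.0 - tail) * (n - 1))]
    return mean, mode, (lower, upper)

rng = random.Random(7)
fake_posterior = [rng.gauss(5.0, 1.0) for _ in range(10_000)]  # stand-in sample
mean, mode, (lower, upper) = summarize_posterior(fake_posterior)
print(round(mean, 2), round(mode, 2), round(lower, 2), round(upper, 2))
```

A kernel density estimate would give a smoother mode than the histogram used here; the histogram keeps the sketch dependency-free.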


3.3 Limitations of the ABC approach
- Natural limitation due to lack of information in data sets
- Limitation on the number of summary statistics used
- Limitation on the calculation of summary statistics (time consuming)
- Limitation on the time consumption of the simulation step


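3.3 Limitations of the ABC approach
Limitation on the number of summary statistics used
[Figure: two panels. With one summary statistic, points (θ, S = s') are selected around s' on a single S axis. With two summary statistics, points (θ, S1 = s'1, S2 = s'2) must be close to both s'1 and s'2 on the S1 and S2 axes.]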
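One way to see why the number of summary statistics is limiting: at a fixed tolerance, the share of simulations that land close to the observed values shrinks quickly as statistics are added. The sketch below assumes, purely for illustration, that each simulated summary statistic is an independent Uniform(0, 1) draw and that the observed value is 0.5 on every axis; neither assumption comes from the presentation.

```python
import math
import random

def acceptance_rate(n_stats, tolerance, n_sims, rng):
    """Fraction of simulated points within `tolerance` of the target."""
    target = [0.5] * n_stats  # assumed observed summary statistics
    accepted = 0
    for _ in range(n_sims):
        point = [rng.random() for _ in range(n_stats)]  # assumed simulator
        if math.dist(point, target) <= tolerance:
            accepted += 1
    return accepted / n_sims

rng = random.Random(0)
rates = {k: acceptance_rate(k, 0.2, 20_000, rng) for k in (1, 2, 4, 6)}
for k, rate in rates.items():
    print(k, rate)
```

Keeping the same number of accepted points with more statistics therefore means either running more simulations or widening the tolerance.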



3.4 Typical ABC run

[Flowchart, in three steps:]
- Step 1, simulation: compute the distance between the “real” data and the simulated data.
- Step 2, getting the posterior distribution: retain the simulated data closest to the “real” data.
- Step 3, estimation: estimate parameters from the posterior distributions obtained from the retained simulated data.

Choices to be made:
a) Choosing the priors
b) Choosing the summary statistics
c) Choosing a “rejection” method for the simulated data

3.4 Typical ABC run
Using posterior distribution information in the prior distributions:
[Figure: prior distribution p(θ), with a posterior distribution obtained using a kernel estimator.]
- Randomly choose points from the posterior distribution to use as simulation parameters, θ1 = θ1'.
- Give each point the weight: weight = Prior(θ1 = θ1') / Posterior(θ1 = θ1')
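The weighting on this slide can be sketched as follows, reading the weight as the prior density divided by the kernel estimate of the earlier posterior. The Gaussian kernel, the bandwidth of 0.2, the Uniform(0, 10) prior and all function names are assumptions made for this illustration.

```python
import math
import random

def kde_density(x, sample, bandwidth):
    """Gaussian kernel estimate of the density of `sample` at x."""
    norm = len(sample) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in sample) / norm

def uniform_prior_density(x, low, high):
    """Density of the assumed Uniform(low, high) prior."""
    return 1.0 / (high - low) if low <= x <= high else 0.0

def propose_with_weight(sample, bandwidth, low, high, rng):
    """Draw a simulation parameter from the kernel estimate and weight it."""
    theta = rng.choice(sample) + rng.gauss(0.0, bandwidth)
    weight = uniform_prior_density(theta, low, high) / kde_density(theta, sample, bandwidth)
    return theta, weight

rng = random.Random(3)
previous_posterior = [rng.gauss(4.0, 0.5) for _ in range(500)]  # stand-in sample
draws = [propose_with_weight(previous_posterior, 0.2, 0.0, 10.0, rng) for _ in range(200)]
print(draws[:3])
```

Values drawn where the earlier posterior is dense receive small weights, so the weighted simulations still represent the original prior while concentrating effort in the plausible region.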

3.4 Typical ABC run
Rejection method (Pritchard et al., 1999):
[Figure: joint distribution of summary statistics S and parameter θ; points within the tolerance of s' (the “real” data) are retained and give the posterior distribution p(θ | S).]
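A rejection step of this kind can be sketched on a toy model. Everything specific below is an assumption for illustration: the parameter is a standard deviation with a Uniform(0.5, 5) prior, the data are 50 normal draws, the summary statistic is their standard deviation, and the tolerance is read as the proportion of simulations retained (one common reading of a setting such as tol = 0.02).

```python
import random
import statistics

def simulate_summary(theta, n_obs, rng):
    """Simulate a data set given theta and reduce it to one summary statistic."""
    data = [rng.gauss(0.0, theta) for _ in range(n_obs)]
    return statistics.pstdev(data)

def rejection_abc(s_obs, n_sims, tolerance, rng):
    """Retain the fraction `tolerance` of simulations closest to s_obs."""
    simulated = []
    for _ in range(n_sims):
        theta = rng.uniform(0.5, 5.0)  # assumed prior
        s = simulate_summary(theta, 50, rng)
        simulated.append((abs(s - s_obs), theta))
    simulated.sort()  # closest simulations first
    n_keep = int(tolerance * n_sims)
    return [theta for _, theta in simulated[:n_keep]]

rng = random.Random(11)
kept = rejection_abc(s_obs=2.0, n_sims=5_000, tolerance=0.02, rng=rng)
print(len(kept), round(statistics.fmean(kept), 2))
```

The retained values are treated as an approximate sample from p(θ | S = s'); a smaller tolerance gives a closer approximation but leaves fewer points.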

3.4 Typical ABC run
Local linear multiple regression adjustment and weighting (Beaumont et al., 2002):
[Figure: joint distribution of summary statistics S and parameter θ, showing the weighting of points around s' (the “real” data), the fitted regression, and the resulting posterior distribution p(θ | S).]

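3.4 Typical ABC run

Linear multiple regression:
$$\theta_i = \alpha + (s_i - s')^{T}\beta + \epsilon_i$$
where $\alpha$ is $E[\theta \mid S = s']$, $\beta$ is the vector of regression coefficients, $s_i$ is the vector of standardized summary statistics and $\epsilon_i$ is the least-squares error.

We want to minimize
$$\sum_{i=1}^{m} \left\{\theta_i - \alpha - (s_i - s')^{T}\beta\right\}^{2} K_\delta\left(\lVert s_i - s' \rVert\right)$$
where the local weighting uses the Epanechnikov kernel
$$K_\delta(t) = \begin{cases} c\,\delta^{-1}\left(1 - (t/\delta)^{2}\right), & t \le \delta \\ 0, & t > \delta \end{cases}$$
so that only points inside the spherical acceptance region $\lVert s_i - s' \rVert \le \delta$ receive a non-zero weight.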

3.4 Typical ABC run
Least squares gives an estimate of the posterior mean, $\hat{\alpha}$.
To obtain samples from the posterior distribution we adjust the parameter values as
$$\theta_i^{*} = \theta_i - (s_i - s')^{T}\hat{\beta}$$
That is, we are assuming that the conditional mean of the parameter is a linear function of the summary statistics, but that all other moments remain the same.
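The regression adjustment can be sketched for one parameter and one summary statistic, where the weighted least-squares fit has a closed form. The toy model is an assumption made for this illustration: θ has a Uniform(0, 10) prior, the data are 10 normal draws with mean θ, and the summary statistic is their mean. The kernel constant is dropped because it cancels in the fit.

```python
import random
import statistics

def epanechnikov(distance, delta):
    """Kernel weight up to a constant: 1 - (t/delta)^2 inside, 0 outside."""
    return 1.0 - (distance / delta) ** 2 if distance <= delta else 0.0

def weighted_linear_fit(x, y, w):
    """Weighted least squares for y = alpha + beta * x; returns (alpha, beta)."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    beta = sxy / sxx
    return y_bar - beta * x_bar, beta

rng = random.Random(42)
s_obs, delta = 5.0, 1.0  # assumed observed statistic and tolerance
thetas, diffs, weights = [], [], []
for _ in range(20_000):
    theta = rng.uniform(0.0, 10.0)  # assumed prior
    s = statistics.fmean(rng.gauss(theta, 1.0) for _ in range(10))
    w = epanechnikov(abs(s - s_obs), delta)
    if w > 0.0:  # inside the acceptance region
        thetas.append(theta)
        diffs.append(s - s_obs)  # centred on s_obs, so alpha estimates E[theta | s_obs]
        weights.append(w)

alpha, beta = weighted_linear_fit(diffs, thetas, weights)
adjusted = [t - beta * d for t, d in zip(thetas, diffs)]
print(round(alpha, 2), round(beta, 2))
print(round(statistics.pvariance(thetas), 3), round(statistics.pvariance(adjusted), 3))
```

The adjusted values are noticeably less spread out than the raw accepted values, because the part of the spread explained by the distance between each simulated statistic and s' has been removed.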


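4. Present Work
One simple case:
[Figure: two-population model over time t. An ancestral population (Pop anc, size Ne anc) and two populations (Pop 1, size Ne 1; Pop 2, size Ne 2), with migration rates m1 and m2 and splitting time tev 1.]
6 parameters to be estimated, plus the mutation rate.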

4. Present Work
Summary statistics used (sequence data):
1. Mean of pairwise differences
   a) in each population
   b) both populations joined together
2. Number of segregating sites
   a) in each population
   b) both populations joined together
3. Number of haplotypes
   a) in each population
   b) both populations joined together
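The three statistics listed above are simple to compute from aligned sequences. The sketch below assumes each population is a list of equal-length strings; the function names and the toy sequences are made up for illustration and are not from the presentation or its software.

```python
from itertools import combinations

def mean_pairwise_differences(seqs):
    """Average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    return total / len(pairs)

def segregating_sites(seqs):
    """Number of alignment columns that contain more than one state."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))

def haplotype_count(seqs):
    """Number of distinct sequences."""
    return len(set(seqs))

def summary_statistics(pop1, pop2):
    """Each statistic in each population and in both populations joined."""
    stats = {}
    for label, seqs in (("pop1", pop1), ("pop2", pop2), ("joined", pop1 + pop2)):
        stats[label] = (
            mean_pairwise_differences(seqs),
            segregating_sites(seqs),
            haplotype_count(seqs),
        )
    return stats

pop1 = ["ACGT", "ACGA", "ACGT"]  # toy data
pop2 = ["TCGA", "TCGA", "ACGA"]  # toy data
stats = summary_statistics(pop1, pop2)
for label, values in stats.items():
    print(label, values)
```

The nine values produced for two populations form the summary-statistic vector that is compared between simulated and “real” data.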

4. Present Work
Simulated “real” data and prior information
[Figure: panels for Ne 1, Ne 2, Ne anc, Tev, Mig 1 and Mig 2. Legend: “real” data, prior distribution, ABC method, MCMC method.]

4. Present Work
[Figures: results for Ne 1, Ne 2, Ne anc and Tev 1 with no migration, one panel per simulated data set (sim1 to sim10).]

4. Present Work
ABC vs MCMC:
- Data 1 (no migration); Simulation 7: [Figure: panels for Ne 1, Ne 2, Ne anc and Tev.]
- Data 2 (migration = 0.01); Simulation 9: [Figure: panels for Ne 1, Ne 2, Ne anc, Tev, Mig 1 and Mig 2.]

4. Present Work
ABC vs MCMC ( iter, tol=0.02):
[Tables: MISE with no migration and MISE with migration. Columns: Ne1, Ne2, Neanc, Mig1, Mig2, Tev. Rows: ABC, MCMC, Priors.]


4. Present Work
Summary statistics used (sequence data):
1. Mean of pairwise differences
   a) in each population
   b) both populations joined together
2. Number of segregating sites
   a) in each population
   b) both populations joined together
3. Number of haplotypes
   a) in each population
   b) both populations joined together
4. Variance of pairwise differences
   a) in each population
   b) both populations joined together
5. Shannon’s index
   a) in each population
   b) both populations joined together
6. Number of singletons
   a) in each population
   b) both populations joined together

4. Present Work
Simulated “real” data and prior information
[Figure: panels for Ne 1, Ne 2, Ne anc, Tev, Mig 1 and Mig 2. Legend: “real” data, prior distribution, standard, previous + Shannon’s, previous + variance of pairwise differences, previous + singletons, MCMC-based method.]

4. Present Work
Summary statistics ( iter, tol=0.02):
- Data 1 (no migration); Simulation 7: [Figures: panels for Ne 1, Ne 2, Ne anc and Tev.]
- Data 2 (migration = 0.01); Simulation 9: [Figures: panels for Ne 1, Ne 2, Ne anc, Tev, Mig 1 and Mig 2.]

4. Present Work
Summary statistics ( iter, tol=0.02):
[Tables: MISE with no migration and MISE with migration = 0.01. Columns: Ne1, Ne2, Neanc, Mig1, Mig2, Tev. Rows: ABC I, ABC II, ABC III, ABC IV, MCMC.]

4. Present Work
Summary statistics ( iter, tol=0.02):
[Tables: adjusted R² with no migration and adjusted R² with migration = 0.01. Columns: Ne1, Ne2, Neanc, Mig1, Mig2, Tev. Rows: ABC I, ABC II, ABC III, ABC IV.]


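4. Present Work
Three populations model
[Figure: three-population model with populations Pop 1, Pop 2 and Pop 3 and ancestral populations Pop anc1 and Pop anc2. Labelled parameters: Ne 1, Ne 2, Ne 3, Ne anc1, Ne anc2, m1, m2, m3, m anc, tev 1 and tev 2.]
11 parameters to be estimated, plus the topology and the mutation rate.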

4. Present Work
Simulated “real” data and prior information
[Figure: panels for Ne 1, Ne 2, Ne 3, Ne anc1, Ne anc2, Mig 1, Mig 2, Mig, Mig anc, Tev and Tev 2. Legend: free topology, fixed topology.]

4. Present Work
Three populations model (no migration):
Data 1 (no migration); Simulation 7. Topology: ((2,3),1)
[Figure: panels for Ne 1, Ne 2, Ne 3, Ne anc1, Ne anc2, Tev 1 and Tev 2.]

4. Present Work
Three populations model (migration = 0.01):
Data 2 (migration = 0.01); Simulation 6. Topology: ((1,2),3)
[Figure: panels for Ne 1, Ne 2, Ne 3, Ne anc1, Ne anc2, Mig 1, Mig 2, Mig 3, Mig anc, Tev 1 and Tev 2.]

4. Present Work
Three populations model ( iter, tol=0.02):
[Tables: MISE with no migration and MISE with migration = 0.01. Columns: Ne, Ne*, Neanc2, Neanc1, Mig, Mig*, Miganc, Tev2, Tev1. Rows: Free, Fixed. A separate Topology table has rows Free and Prior; the only surviving entry is 0.33 in a Prior row.]

Conclusions:
- ABC is up to 2 orders of magnitude faster for a single locus.
- ABC modes are similar to MCMC, but overall precision is lower.
- No substantial improvement with more summary statistics.
- No substantial improvement with more iterations.
- ABC is able to consider more complex scenarios, but the ability to infer parameters is reduced when considering migration.


4. Present Work
The user-friendly version of the program (initial stage)

Features of the program:
- Use of heredity scalars for each locus
- Use of different types of DNA data at the same time (microsatellite and DNA sequence)
- Use of an unlimited number of populations within an IM model
- Use of different combinations of 7 different summary statistics for each DNA data type

Freeware and source code available (soon).


5. Future Developments

Current goals:
- Currently applying the method to a published data set (Won & Hey, 2005)
- Continue to improve the accuracy of ABC (e.g. identify better summary statistics)
- Obtain better estimates of MISE (e.g. using more simulated ‘real’ data)

Future goals:
- Add recombination
- Create a user-friendly interface
- Use a variable migration rate through time
- Improve ABC:
  - sequential method
  - non-linear regression

Acknowledgements I would like to acknowledge David Balding for helpful discussion on the methods used. And also a special thanks to Mark Beaumont for advice and comments on the work. Support for this work was provided by EPSRC.