Published by Rosalyn Singleton. Modified over 8 years ago.
ABC The method: practical overview
Index
1. Applications of ABC in population genetics
2. Motivation for the application of ABC
3. ABC approach
   1. Characteristics of an ABC methodology
   2. Algorithm of an ABC inference
   3. Limitations of the ABC approach
   4. Typical ABC run
4. Present work
   1. Compare the ABC algorithm with MCMC
   2. Study the use of different summary statistics
   3. Study the use of ABC in a more complex scenario
   4. “State of the art” of the software
5. Future developments
1. Applications of ABC in population genetics
[Figure: diagram of an ancestral population (Pop anc) and four populations, Pop 1 to Pop 4]
2. Motivation for the application of ABC
Two processes are usually considered important in determining population structure:
- Gene flow;
- Population splitting.
Most often these processes are modelled and inferred separately.
Recent advances by Nielsen and Wakeley (2001) and Hey and Nielsen (2004) make it possible to study both processes at the same time in a two-population scenario, using Markov Chain Monte Carlo (MCMC).
An Approximate Bayesian Computation (ABC) method developed by Beaumont (2006) deals with the same problem in a three-population scenario. The idea is to avoid problems associated with MCMC, such as poor mixing and long convergence times, but the method relies on a couple of approximations.
The aim of this study is to see how good these approximations are.
2. Motivation for the application of ABC
Background using MCMC:
- Wakeley, Hey (1997, Genetics): developed an algorithm to estimate historical demographic parameters.
- Nielsen, Wakeley (2001, Genetics): developed an MCMC algorithm to infer demographic parameters in an “Isolation with Migration” model.
- Hey, Nielsen (2004, Genetics): present the IM program (software that uses the MCMC algorithm previously developed).
- Hey et al. (2004, Mol. Ecol.): introduce changes in the IM software (HapSTR data can be used).
- Won, Hey (2005, Mol. Biol. Evol.): present a case study in 3 populations of chimpanzees.
- Hey (2005, PLoS Biol.): the peopling of the Americas. Introduces changes in the IM software (founder population size can be inferred).
2. Motivation for the application of ABC
Background using ABC:
- Tavaré et al. (1997, Genetics): presented a simulation-based algorithm to infer specific demographic parameters.
- Pritchard et al. (1999, MBE): introduced the first ABC approach, with a rejection step, to estimate demographic parameters.
- Beaumont et al. (2002, Genetics): introduced a regression method within an ABC framework to estimate demographic parameters.
- Marjoram et al. (2003, PNAS): use MCMC without likelihoods within an ABC framework.
- Beaumont (2006, “Simulation, Genetics, and Human Prehistory”): uses regression-based ABC to estimate demographic parameters within an “Isolation with Migration” model for microsatellites in three populations.
- Hickerson et al. (2006, in press): compare ABC with IM in two-population studies for sequence data.
2. Motivation for the application of ABC
Examples of ABC use after Beaumont et al. (2002):
- Estoup and Clegg (2003, Mol. Evol.)
- Plagnol and Tavaré (2003, “Monte Carlo and Quasi-Monte Carlo Methods 2002”)
- Estoup et al. (2004, Evolution)
- Tallmon et al. (2004, Genetics)
- Excoffier et al. (2005, Genetics): introduced the study of admixture events
- Hamilton et al. (2005, Genetics): introduced WED (Weighted Euclidean Distance)
- Tanaka et al. (2006, Genetics): applied ABC to disease transmission
- Sisson et al. (2006, under submission): introduced a Sequential ABC approach (SABC)
3. ABC approach: Characteristics of an ABC methodology
Get the posterior distribution by sampling values from it:
1. Simulate samples $(\theta_i, D_i)$ from the joint density $p(\theta, D)$:
   1. First sample from the prior: $\theta_i \sim p(\theta)$
   2. Then simulate the data, given $\theta_i$: $D_i \sim p(D \mid \theta_i)$
2. The posterior distribution, $p(\theta \mid D) = p(D, \theta) / p(D)$, for any given $D$, can be estimated by the proportion of all simulated points that correspond to that particular $D$ and $\theta$, divided by the proportion of points corresponding to $D$ (ignoring $\theta$).
Replace the data with summary statistics:
- Summarize a large amount of data into a few representative values.
- By replacing the data with summary statistics, it is easier to decide how ‘similar’ data sets are to each other.
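The two sampling steps and the reduction to summary statistics can be sketched as follows. This is only a toy illustration, not the population genetic model of the talk: the uniform prior, the Poisson simulator, the choice of the sample mean as summary statistic and all function names are assumptions made for the example.

```python
import numpy as np

def sample_prior(rng, n):
    # Hypothetical prior: theta ~ Uniform(0, 10)
    return rng.uniform(0.0, 10.0, size=n)

def simulate_data(rng, theta, n_obs=20):
    # Hypothetical simulator: each data set is n_obs Poisson counts with mean theta
    return rng.poisson(theta[:, None], size=(theta.size, n_obs))

def summarize(data):
    # Replace each simulated data set with one summary statistic (its sample mean)
    return data.mean(axis=1)

rng = np.random.default_rng(0)
theta = sample_prior(rng, 10_000)         # theta_i ~ p(theta)
data = simulate_data(rng, theta)          # D_i ~ p(D | theta_i)
stats = summarize(data)                   # S_i = S(D_i)
joint = np.column_stack([theta, stats])   # draws from the joint density of (theta, S)
```

Each row of `joint` is one simulated point; the later steps of an ABC run only ever look at these pairs, never at a likelihood.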
3. ABC approach: Algorithm of an ABC inference
[Figure: joint distribution of (S, θ), plotted as parameter θ against summary statistics S. Labels: set of priors (θ); obtained genetic data; get summary statistics (S); observed value s’. Slide note: in (Nordborg, 2001).]
3. ABC approach: Algorithm of an ABC inference
By extracting the points near the real data set we obtain the posterior:
[Figure: joint distribution of (S, θ) with the observed summary statistics s’ marked; the points near s’ give the posterior distribution p(θ | S = s’).]
3. ABC approach: Algorithm of an ABC inference
Point estimate of parameter θ1
[Figure: posterior distribution p(θ | S = s’) with the mode, the mean and the credible interval marked.]
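The quantities marked on the posterior (mean, mode, credible interval) can be computed from a set of retained parameter values, for instance as below. This is a sketch under assumptions: the mode is approximated by the fullest histogram bin, the interval is equal-tailed, and the gamma-distributed draws merely stand in for a real ABC posterior sample.

```python
import numpy as np

def summarize_posterior(samples, level=0.95, bins=50):
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean()
    # Mode approximated by the midpoint of the fullest histogram bin
    counts, edges = np.histogram(samples, bins=bins)
    k = int(np.argmax(counts))
    mode = 0.5 * (edges[k] + edges[k + 1])
    # Equal-tailed credible interval from the sample quantiles
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(samples, [tail, 1.0 - tail])
    return {"mean": mean, "mode": mode, "interval": (lower, upper)}

rng = np.random.default_rng(1)
# Stand-in for the parameter values retained by an ABC run
posterior_draws = rng.gamma(shape=3.0, scale=2.0, size=5000)
summary = summarize_posterior(posterior_draws)
```

For a skewed posterior such as this one the mode and the mean differ, which is why the choice of point estimate matters.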
3. ABC approach: Limitations
- Natural limitation due to lack of information in data sets
- Limitation on the number of summary statistics used
- Limitation on the calculation of summary statistics (time consuming)
- Limitation on the time consumption of the simulation step
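3. ABC approach: Limitations
Limitation on the number of summary statistics used
[Figure: two panels. Summary statistics = 1: joint distribution (θ, S = s’) with the observed value s’ on the S axis. Summary statistics = 2: joint distribution (θ, S1 = s’1, S2 = s’2) with the observed values s’1 and s’2 on the S1 and S2 axes.]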
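One way to see why the number of summary statistics is limiting: with a fixed acceptance radius, the share of simulations that land near the observed values shrinks quickly as statistics are added. The sketch below illustrates this with independent standard normal statistics and an observed point at the origin, both of which are assumptions chosen purely for the demonstration.

```python
import numpy as np

def acceptance_fraction(n_stats, radius=0.5, n_sims=100_000, seed=0):
    rng = np.random.default_rng(seed)
    # Hypothetical standardized summary statistics: independent N(0, 1)
    sims = rng.standard_normal((n_sims, n_stats))
    target = np.zeros(n_stats)  # the "real" data sits at the centre
    dist = np.linalg.norm(sims - target, axis=1)
    # Share of simulations falling inside the acceptance region
    return float(np.mean(dist <= radius))

fractions = {k: acceptance_fraction(k) for k in (1, 2, 4, 8)}
for k, frac in fractions.items():
    print(f"{k} summary statistics: fraction accepted = {frac:.5f}")
```

Keeping the accepted share constant instead forces the radius to grow, so the retained simulations are less similar to the observed data.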
3. ABC approach: Typical ABC run
Step 1 - simulation
Step 2 - getting the posterior distribution
- Compute the distance between “real” data and simulated data
- Retain the simulated data closest to the “real” data
Step 3 - estimation
- Estimate parameters from the posterior distributions obtained from the retained simulated data
Choices to be made:
a) Choosing the priors
b) Choosing the summary statistics
c) Choosing a “rejection” method for the simulated data
[Flowchart on the slide, with yes/no decision branches]
3. ABC approach: Typical ABC run
Using posterior distribution information in prior distributions:
- Start from the prior distribution p(θ).
- Approximate the posterior distribution using a kernel estimator.
- Randomly choose points θ1’ from the posterior distribution and use them as simulation parameters.
- Weight each chosen point: weight = Prior(θ1 = θ1’) / Posterior(θ1 = θ1’).
[Figure: prior distribution p(θ) and kernel estimate of the posterior, with a chosen point θ1’ marked]
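A possible reading of this slide is importance sampling: new simulation parameters are drawn from a kernel estimate of an earlier posterior, and each draw is weighted by the ratio of prior to posterior density. The sketch below follows that reading. The Gaussian kernel, the rule-of-thumb bandwidth, the uniform prior bounds and the function names are all assumptions, not details taken from the talk.

```python
import numpy as np

def gaussian_kde_pdf(x, centres, bandwidth):
    # Density at x of a Gaussian kernel estimate built on `centres`
    z = (x[:, None] - centres[None, :]) / bandwidth
    norm = centres.size * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z**2).sum(axis=1) / norm

def propose_from_posterior(rng, posterior_draws, n, prior_low, prior_high):
    # Rule-of-thumb bandwidth (an assumption of this sketch)
    bandwidth = 1.06 * posterior_draws.std(ddof=1) * posterior_draws.size ** (-0.2)
    # Randomly choose points from the earlier posterior and jitter them with the kernel
    picks = rng.choice(posterior_draws, size=n, replace=True)
    theta = picks + bandwidth * rng.standard_normal(n)
    # weight = prior density / kernel-estimated posterior density at the drawn value
    inside = (theta >= prior_low) & (theta <= prior_high)
    prior_pdf = np.where(inside, 1.0 / (prior_high - prior_low), 0.0)
    weights = prior_pdf / gaussian_kde_pdf(theta, posterior_draws, bandwidth)
    return theta, weights

rng = np.random.default_rng(2)
previous_posterior = rng.normal(1000.0, 150.0, size=500)  # stand-in for an earlier ABC posterior
theta_new, weights = propose_from_posterior(rng, previous_posterior, 2000, 0.0, 10_000.0)
```

The weights undo the bias introduced by simulating from the earlier posterior rather than from the prior, so the weighted simulations still target the original posterior.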
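3. ABC approach: Typical ABC run
Rejection method (Pritchard et al., 1999):
[Figure: joint distribution of summary statistics S and parameter θ; simulations whose summary statistics fall within the tolerance of s’ (the “real” data) are retained and give the posterior distribution p(θ | S).]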
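A minimal sketch of the rejection step is given below. It treats the tolerance as the proportion of simulations to retain, which is one common convention and an assumption here; the toy Poisson model and the function name `abc_rejection` are likewise illustrative only.

```python
import numpy as np

def abc_rejection(theta, stats, observed, tolerance=0.02):
    theta = np.asarray(theta, dtype=float)
    stats = np.asarray(stats, dtype=float).reshape(theta.shape[0], -1)
    observed = np.atleast_1d(np.asarray(observed, dtype=float))
    # Standardize each summary statistic so that no single one dominates the distance
    scale = stats.std(axis=0)
    scale[scale == 0] = 1.0
    dist = np.linalg.norm((stats - observed) / scale, axis=1)
    # Retain the proportion `tolerance` of simulations closest to the "real" data
    n_keep = max(1, int(round(tolerance * theta.shape[0])))
    keep = np.argsort(dist)[:n_keep]
    return theta[keep], dist[keep]

rng = np.random.default_rng(3)
theta = rng.uniform(0.0, 10.0, size=50_000)                               # draws from the prior
stats = rng.poisson(theta[:, None], size=(theta.size, 20)).mean(axis=1)   # simulated summaries
accepted, distances = abc_rejection(theta, stats, observed=4.0)
```

The array `accepted` is the approximate posterior sample; everything else is discarded, which is why the method needs many simulations.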
3. ABC approach: Typical ABC run
Local linear multiple regression adjustment and weighting (Beaumont et al., 2002):
[Figure: joint distribution of S and θ; simulations near s’ (the “real” data) are weighted and adjusted by regression to give the posterior distribution p(θ | S). Labels: Weighting, Regression.]
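3. ABC approach: Typical ABC run
Linear multiple regression with local weighting inside a spherical acceptance region.
[Equations not preserved in the extracted slide. Surviving labels: “We want to minimize” the least square error; “where” the weights come from the Epanechnikov kernel; vector of standardized summary statistics; correlation coefficients vector; E[P(θ | S = s)].]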
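For orientation, the weighted least-squares criterion and the Epanechnikov kernel of the cited method (Beaumont et al., 2002) are usually written as below. The notation is an assumption and may differ from the slide: $\phi_i$ is the $i$-th simulated parameter value, $s_i$ its vector of standardized summary statistics, $s$ the observed vector, $\delta$ the radius of the acceptance region and $c$ a normalizing constant.

```latex
\min_{\alpha,\,\beta}\;
\sum_{i=1}^{m} \left\{ \phi_i - \alpha - (s_i - s)^{T}\beta \right\}^{2}
K_\delta\!\left( \lVert s_i - s \rVert \right),
\qquad
K_\delta(t) =
\begin{cases}
c\,\delta^{-1}\left( 1 - (t/\delta)^{2} \right), & t \le \delta, \\
0, & t > \delta .
\end{cases}
```

Simulations outside the sphere of radius $\delta$ receive zero weight, and those inside count more the closer they are to the observed summary statistics.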
3. ABC approach: Typical ABC run
To obtain samples from the posterior distribution we adjust the parameter values as
[Equation not preserved in the extracted slide]
That is, we are assuming that the conditional mean of the parameter is a linear function of the summary statistics, but that all other moments remain the same.
Least squares gives an estimate of the posterior mean.
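The adjustment can be sketched in code as follows, using the form given in Beaumont et al. (2002), where each retained parameter value is shifted to $\theta_i^{*} = \theta_i - (s_i - s')^{T}\hat{\beta}$. The toy Gaussian simulator, the quantile-based choice of radius and the function name are assumptions made for the example.

```python
import numpy as np

def abc_regression_adjust(theta, stats, observed, tolerance=0.02):
    theta = np.asarray(theta, dtype=float)
    stats = np.asarray(stats, dtype=float).reshape(theta.shape[0], -1)
    observed = np.atleast_1d(np.asarray(observed, dtype=float))
    scale = stats.std(axis=0)
    scale[scale == 0] = 1.0
    diff = (stats - observed) / scale        # standardized s_i - s'
    dist = np.linalg.norm(diff, axis=1)
    delta = np.quantile(dist, tolerance)     # radius of the spherical acceptance region
    keep = dist <= delta
    # Epanechnikov weights (the constant factor is omitted; it does not affect the fit)
    w = 1.0 - (dist[keep] / delta) ** 2
    X = np.column_stack([np.ones(int(keep.sum())), diff[keep]])
    # Weighted least squares: minimise sum_i w_i (theta_i - alpha - diff_i . beta)^2
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], theta[keep] * sw, rcond=None)
    alpha, beta = coef[0], coef[1:]
    # Shift each retained value to where it would lie if its statistics equalled s'
    adjusted = theta[keep] - diff[keep] @ beta
    return adjusted, w, alpha

rng = np.random.default_rng(4)
theta = rng.uniform(0.0, 10.0, size=50_000)             # draws from a hypothetical prior
stats = theta + rng.normal(0.0, 0.5, size=theta.size)   # hypothetical noisy summary statistic
adjusted, weights, alpha = abc_regression_adjust(theta, stats, observed=4.0)
```

Here `alpha` is the least-squares estimate of the posterior mean at the observed statistics, and `adjusted` together with `weights` forms the weighted posterior sample.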
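4. Present work
One simple case:
[Figure: two-population model along a time axis t. An ancestral population (Pop anc, size Ne anc) splits at time tev 1 into Pop 1 (size Ne 1) and Pop 2 (size Ne 2), which exchange migrants at rates m1 and m2.]
6 parameters to be estimated + (mutation rate)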
4. Present work
Summary statistics used (sequence data):
1. Mean of pairwise differences
   a) in each population
   b) both populations joined together
2. Number of segregating sites
   a) in each population
   b) both populations joined together
3. Number of haplotypes
   a) in each population
   b) both populations joined together
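These three statistics can be computed from aligned sequences as sketched below. The sketch assumes each population is a 2-D array with one row per sequence and one column per site, with 0/1 state coding; that data layout and the function names are illustrative assumptions, not the format used by the program described in the talk.

```python
from itertools import combinations
import numpy as np

def mean_pairwise_differences(seqs):
    # Average number of differing sites over all pairs of sequences
    pairs = list(combinations(range(seqs.shape[0]), 2))
    return sum(int(np.sum(seqs[i] != seqs[j])) for i, j in pairs) / len(pairs)

def segregating_sites(seqs):
    # Number of sites at which not all sequences carry the same state
    return int(np.sum(np.any(seqs != seqs[0], axis=0)))

def number_of_haplotypes(seqs):
    # Number of distinct sequences in the sample
    return len({tuple(row.tolist()) for row in seqs})

def standard_summaries(pop1, pop2):
    # Each statistic for population 1, population 2 and both joined together
    pooled = np.vstack([pop1, pop2])
    funcs = {
        "mean_pairwise_differences": mean_pairwise_differences,
        "segregating_sites": segregating_sites,
        "number_of_haplotypes": number_of_haplotypes,
    }
    return {name: (f(pop1), f(pop2), f(pooled)) for name, f in funcs.items()}

pop1 = np.array([[0, 0, 1, 0], [0, 0, 1, 0], [0, 1, 1, 0]])
pop2 = np.array([[1, 0, 0, 0], [1, 0, 0, 1]])
summaries = standard_summaries(pop1, pop2)
```

With two populations this gives nine numbers per simulated data set, which together form the vector S compared with the observed s’.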
4. Present work
Simulated “real” data and prior information
[Figure: one panel per parameter (Ne 1, Ne 2, Ne anc, Tev, Mig 1, Mig 2). Legend: “real” data, prior distribution, ABC method, MCMC method. Values shown on the axes: 0, 0.01, 0.05, 500, 1000, 5000, 10000.]
4. Present work
[Figures: results for Ne 1, Ne 2, Ne anc and Te 1 with no migration, one slide per parameter, each with one panel per simulated data set (sim1 to sim10).]
4. Present work
ABC vs MCMC:
[Figures: Data 1 (no migration), Simulation 7, with panels for Ne 1, Ne 2, Ne anc and Tev. Data 2 (migration = 0.01), Simulation 9, with panels for Ne 1, Ne 2, Ne anc, Tev, Mig 1 and Mig 2.]
4. Present work
ABC vs MCMC (500 000 iter, tol=0.02):
Each cell shows the two values given on the slide; “-” marks an empty cell.

MISE: No migration
        Ne1           Ne2           Neanc         Mig1  Mig2  Tev
ABC     3.857 0.899   2.529 0.653   3.956 0.532   -     -     3.532 0.695
MCMC    1.153 0.505   0.724 0.295   3.594 0.602   -     -     1.567 0.429
Priors row as extracted: “24.33- - ----- -” (column assignment not recoverable)

MISE: Migration = 0.01
        Ne1           Ne2           Neanc         Mig1          Mig2          Tev
ABC     8.242 2.194   10.41 2.240   19.15 0.604   3.977 0.316   3.986 0.259   27.17 0.904
MCMC    4.196 1.132   5.693 1.839   18.85 0.602   2.760 0.391   3.031 0.483   26.54 1.510
Priors row as extracted: “24.33- - -4.33- -24.33-” (column assignment not recoverable)
4. Present work
Summary statistics used (sequence data):
1. Mean of pairwise differences
   a) in each population
   b) both populations joined together
2. Number of segregating sites
   a) in each population
   b) both populations joined together
3. Number of haplotypes
   a) in each population
   b) both populations joined together
4. Variance of pairwise differences
   a) in each population
   b) both populations joined together
5. Shannon’s index
   a) in each population
   b) both populations joined together
6. Number of singletons
   a) in each population
   b) both populations joined together
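The three additional statistics (items 4 to 6) could be computed along the following lines. The definitions used are assumptions, since the slide does not spell them out: the variance is taken over all pairwise differences with divisor n, Shannon’s index is computed on haplotype frequencies with natural logarithms, and a singleton is a site where the rarer of two states (0/1 coding) is carried by exactly one sequence.

```python
from collections import Counter
from itertools import combinations
import math
import numpy as np

def variance_pairwise_differences(seqs):
    # Variance of the number of differing sites across all pairs of sequences
    diffs = [int(np.sum(seqs[i] != seqs[j]))
             for i, j in combinations(range(seqs.shape[0]), 2)]
    return float(np.var(diffs))

def shannon_index(seqs):
    # H = -sum_k p_k ln p_k over haplotype frequencies p_k
    counts = Counter(tuple(row.tolist()) for row in seqs)
    n = seqs.shape[0]
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def number_of_singletons(seqs):
    # Sites where the rarer state is carried by exactly one sequence
    ones = seqs.sum(axis=0)
    minor = np.minimum(ones, seqs.shape[0] - ones)
    return int(np.sum(minor == 1))

# Hypothetical sample: both populations joined together, one row per sequence
pooled = np.array([
    [0, 0, 1, 0],
    [0, 0, 1, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 1],
])
extra = {
    "variance_pairwise_differences": variance_pairwise_differences(pooled),
    "shannon_index": shannon_index(pooled),
    "number_of_singletons": number_of_singletons(pooled),
}
```

As with the standard set, each statistic would be evaluated per population and on the joined sample before being appended to the vector of summary statistics.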
4. Present work
Simulated “real” data and prior information
[Figure: one panel per parameter (Ne 1, Ne 2, Ne anc, Tev, Mig 1, Mig 2). Legend: “real” data; prior distribution; standard; previous + Shannon’s; previous + var pairwise dif; previous + singletons; MCMC based method. Values shown on the axes: 0, 0.01, 0.05, 500, 1000, 5000, 10000.]
4. Present work
Summary statistics (500 000 iter, tol=0.02):
[Figures: Data 1 (no migration), Simulation 7, with panels for Ne 1, Ne 2, Ne anc and Tev. Data 2 (migration = 0.01), Simulation 9, with panels for Ne 1, Ne 2, Ne anc, Tev, Mig 1 and Mig 2.]
Summary statistics (7 000 000 iter, tol=0.02):
[Figures: the same panels for Data 1, Simulation 7, and Data 2, Simulation 9.]
4. Present work
Summary statistics (7 000 000 iter, tol=0.02):
Each cell shows the two values given on the slide; “-” marks an empty cell.

MISE: No migration
         Ne1           Ne2           Neanc         Mig1  Mig2  Tev
ABC I    3.861 0.903   2.548 0.654   3.992 0.525   -     -     3.548 0.702
ABC II   3.538 0.857   2.353 0.614   4.007 0.552   -     -     3.324 0.615
ABC III  2.160 0.869   1.818 0.577   4.241 0.700   -     -     4.266 0.949
ABC IV   2.205 0.721   1.606 0.548   4.536 0.700   -     -     4.698 0.989
MCMC     1.153 0.505   0.724 0.295   3.594 0.602   -     -     1.567 0.429

MISE: Migration = 0.01
         Ne1           Ne2           Neanc         Mig1          Mig2          Tev
ABC I    8.216 2.170   10.31 2.204   19.03 0.617   3.925 0.318   4.000 0.276   27.05 0.907
ABC II   7.021 2.182   9.664 2.371   19.40 0.540   3.600 0.270   3.755 0.322   28.42 0.951
ABC III  6.285 1.765   7.425 1.415   19.69 0.612   3.435 0.312   3.308 0.349   29.67 1.056
ABC IV   6.585 2.026   6.564 1.218   19.38 0.587   3.410 0.313   3.329 0.334   28.74 0.845
MCMC     4.196 1.132   5.693 1.839   18.85 0.602   2.760 0.391   3.031 0.483   26.54 1.510
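4. Present work
Summary statistics (7 000 000 iter, tol=0.02):
“-” marks an empty cell. A value marked * appears once on the slide for two adjacent columns (Ne1 and Ne2, or Mig1 and Mig2); it seems to apply to both.

Adjusted R²: No migration
         Ne1    Ne2    Neanc  Mig1   Mig2   Tev
ABC I    0.49   0.50   0.27   -      -      0.65
ABC II   0.51   0.52   0.27   -      -      0.67
ABC III  0.60   0.59   0.30   -      -      0.67
ABC IV   0.55*         0.27   -      -      0.63

Adjusted R²: Migration = 0.01
         Ne1    Ne2    Neanc  Mig1   Mig2   Tev
ABC I    0.23*         0.01   0.08*         0.02
ABC II   0.25   0.24   0.01   0.09   0.10   0.02
ABC III  0.30*         0.01   0.11*         0.01
ABC IV   0.26*         0.01   0.11*         0.01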
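4. Present work: Three populations model
[Figure: three-population model with Pop 1, Pop 2, Pop 3 and two ancestral populations (Pop anc1, Pop anc2). Labels: sizes Ne 1, Ne 2, Ne 3, Ne anc1, Ne anc2; migration rates m1, m2, m3, m anc; splitting times tev 1, tev 2.]
11 parameters to be estimated + topology + (mutation rate)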
4. Present work
Simulated “real” data and prior information
[Figure: one panel per parameter (Ne 1, Ne 2, Ne 3, Ne anc1, Ne anc2, Mig 1, Mig 2, Mig 3, Mig anc, Tev 1, Tev 2). Legend: free top, fixed top. Values shown on the axes: 0, 0.01, 0.05, 500, 1000, 1500, 5000, 10000.]
4. Present work
Three populations model (no migration):
[Figure: Data 1 (no migration), Simulation 7. Topology: ((2,3),1). Panels for Ne 1, Ne 2, Ne 3, Ne anc1, Ne anc2, Tev 1 and Tev 2.]
Three populations model (migration = 0.01):
[Figure: Data 2 (migration = 0.01), Simulation 6. Topology: ((1,2),3). Panels for Ne 1, Ne 2, Ne 3, Ne anc1, Ne anc2, Mig 1, Mig 2, Mig 3, Mig anc, Tev 1 and Tev 2.]
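4. Present work
Three populations model (500 000 iter, tol=0.02): MISE
[Flattened table; column boundaries were lost in extraction. Column labels as extracted: Ne, Ne*, Neanc2, Neanc1, Mig, Mig*, Miganc, Tev2, Tev1. Values are listed in their original order; “-” marks an empty cell, and the split between the last two values of each row is uncertain, so they are left joined.]

No migration:
Free    5.700  5.438  3.739  4.781  0.886  -  -  -  -  0.4418.39
Fixed   5.467  5.282  3.815  4.511  0.264  -  -  -  -  0.559.59
Topology: Free 0.76 0.05; Prior 0.33 -

Migration = 0.01:
Free    5.415  5.521  4.339  4.864  0.837  4.18  4.03  4.11  4.32  0.5123.32
Fixed   5.382  5.456  4.327  5.007  0.831  4.28  4.18  4.12  4.34  0.5423.60
Topology: Free 0.41 0.02; Prior 0.33 -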
Conclusions:
- ABC is up to 2 orders of magnitude faster for a single locus
- ABC modes are similar to MCMC, but overall precision is lower
- No substantial improvement with more summary statistics
- No substantial improvement with more iterations
- ABC is able to consider more complex scenarios, but the ability to infer parameters is reduced when considering migration
4. Present work
The user-friendly version of the program (initial stage)
Features of the program:
- Use of heredity scalars for each locus
- Use of different types of DNA data at the same time (microsatellite and DNA sequence)
- Use of an unlimited number of populations within an IM model
- Use of different combinations of 7 different summary statistics for each DNA data type
- Freeware and source code available (soon)
5. Future developments
Current goals:
- Currently applying the method to a published data set (Won & Hey, 2005)
- Continue to improve the accuracy of ABC (e.g. identify better summary statistics)
- Obtain better estimates of MISE (e.g. using more simulated ‘real’ data)
Future goals:
- Add recombination
- Create a user-friendly interface
- Use a variable migration rate through time
- Improve ABC: sequential method, non-linear regression
Acknowledgements
I would like to acknowledge David Balding for helpful discussion on the methods used, and to give special thanks to Mark Beaumont for advice and comments on the work. Support for this work was provided by EPSRC.
joao.lopes@rdg.ac.uk http://www.rdg.ac.uk/~sar05sal