
Presentation transcript: "ABC The method: practical overview"

1 ABC The method: practical overview

2 Index
1. Applications of ABC in population genetics
2. Motivation for the application of ABC
3. ABC approach
   1. Characteristics of an ABC methodology
   2. Algorithm of an ABC inference
   3. Limitations of the ABC approach
   4. Typical ABC run
4. Present work
   1. Compare the ABC algorithm with MCMC
   2. Study the use of different summary statistics
   3. Study the use of ABC in a more complex scenario
   4. "State of the art" of the software
5. Future developments

4 1. Applications of ABC in population genetics
[Figure: population diagram with the labels Pop anc, Pop 1, Pop 2, Pop 3 and Pop 4]

6 2. Motivation for the application of ABC
- Two processes are usually considered important in determining population structure:
  - Gene flow;
  - Population splitting.
- Most often these processes are modelled and inferred separately.
- Recent advances by Nielsen and Wakeley (2001) and Hey and Nielsen (2004), using Markov chain Monte Carlo (MCMC) in a two-population scenario, make it possible to study both processes at the same time.
- An Approximate Bayesian Computation (ABC) method developed by Beaumont (2006) deals with the same problem in a three-population scenario. The idea is to avoid problems associated with MCMC, such as poor mixing and long convergence times, but it relies on a couple of approximations. The aim of this study is to see how good these approximations are.

7 2. Motivation for the application of ABC
Background using MCMC:
- Wakeley, Hey (1997, Genetics): developed an algorithm to estimate historical demographic parameters.
- Nielsen, Wakeley (2001, Genetics): developed an MCMC algorithm to infer demographic parameters in an "Isolation with Migration" model.
- Hey, Nielsen (2004, Genetics): present the IM program (software that uses the MCMC algorithm previously developed).
- Hey et al. (2004, Mol. Ecol.): introduce changes in the IM software (HapSTR data can be used).
- Won, Hey (2005, Mol. Biol. Evol.): present a case study in 3 populations of chimpanzees.
- Hey (2005, PLoS Biol.): the peopling of the Americas. Introduces changes in the IM software (founder population size can be inferred).

8 2. Motivation for the application of ABC
Background using ABC:
- Tavaré et al. (1997, Genetics): presented a simulation-based algorithm to infer specific demographic parameters.
- Pritchard et al. (1999, MBE): introduced the first ABC approach, with a rejection step, to estimate demographic parameters.
- Beaumont et al. (2002, Genetics): introduced a regression method within an ABC framework to estimate demographic parameters.
- Marjoram et al. (2003, PNAS): use MCMC without likelihoods within an ABC framework.
- Beaumont (2006, "Simulation, Genetics, and Human Prehistory"): uses regression-based ABC to estimate demographic parameters within an "Isolation with Migration" model for microsatellites in three populations.
- Hickerson et al. (2006, in press): compare ABC with IM in two-population studies for sequence data.

9 2. Motivation for the application of ABC
Examples of ABC use after Beaumont et al. (2002):
- Estoup and Clegg (2003, Mol. Evol.)
- Plagnol and Tavaré (2003, "Monte Carlo and Quasi-Monte Carlo Methods 2002")
- Estoup et al. (2004, Evolution)
- Tallmon et al. (2004, Genetics)
- Excoffier et al. (2005, Genetics): introduced the study of admixture events
- Hamilton et al. (2005, Genetics): introduced WED (Weighted Euclidean Distance)
- Tanaka et al. (2006, Genetics): applied ABC to disease transmission
- Sisson et al. (2006, under submission): introduced a sequential ABC approach (SABC)

11 3. ABC approach: 1. Characteristics of an ABC methodology
Get the posterior distribution by sampling values from it:
1. Simulate samples (θ_i, D_i) from the joint density p(θ, D):
   1. First sample from the prior: θ_i ~ p(θ)
   2. Then simulate the data, given θ_i: D_i ~ p(D | θ_i)
2. The posterior distribution, p(θ | D) = p(D, θ) / p(D), for any given D, can be estimated by the proportion of all simulated points that correspond to that particular D and θ, divided by the proportion of points corresponding to D (ignoring θ).
Replace the data with summary statistics:
- Summarize a large amount of data into a few representative values.
- By replacing the data with summary statistics, it is easier to decide how 'similar' data sets are to each other.
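
The sampling scheme on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's software: the uniform prior on (0, 10), the normal data model and the use of the sample mean as the single summary statistic are all assumptions chosen to keep the example small.

```python
import random
import statistics


def sample_prior(rng):
    # Assumed prior: theta ~ Uniform(0, 10).
    return rng.uniform(0.0, 10.0)


def simulate_data(theta, rng, n=20):
    # Assumed data model: n independent draws from Normal(theta, 1).
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summarize(data):
    # Replace the full data set with one summary statistic (the sample mean).
    return statistics.fmean(data)


def simulate_joint(n_sims, seed=0):
    """Draw (theta_i, S_i) pairs from the joint density of parameter and summary."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sims):
        theta = sample_prior(rng)           # theta_i ~ p(theta)
        data = simulate_data(theta, rng)    # D_i ~ p(D | theta_i)
        draws.append((theta, summarize(data)))
    return draws


joint = simulate_joint(1000)
```

Each pair in `joint` is one point of the joint distribution of parameter and summary statistic; conditioning on the observed summary is what turns this cloud into a posterior sample.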

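13 3. ABC approach: 2. Algorithm of an ABC inference
[Figure: a set of priors p(θ) is used to simulate the joint distribution (S, θ), plotted on the axes "SummStats, S" and "Parameter, θ"; summary statistics (S) are computed from the obtained genetic data, giving the observed value s'. Illustration credited "in (Nordborg, 2001)".]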

14 3. ABC approach: 2. Algorithm of an ABC inference
By extracting the points near the real data set we obtain the posterior:
[Figure: joint distribution (S, θ) on the axes "SummStats, S" and "Parameter, θ", with the observed value s' marked; the points near s' give the posterior distribution p(θ | S = s').]

15 3. ABC approach: 2. Algorithm of an ABC inference
[Figure: posterior distribution p(θ | S = s') of parameter θ1, annotated with the point estimates (mode, mean) and the credible interval.]
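
As a small sketch of how the quantities named on this slide can be read off a posterior sample: the code below computes the mean, a histogram-based mode and an equal-tailed credible interval. The helper names, the 20-bin histogram and the synthetic Normal(5, 1) "posterior sample" are assumptions for illustration only; the slides do not say which estimators were used.

```python
import random
import statistics


def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval taken from the sorted sample."""
    ordered = sorted(samples)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (n - 1))]
    upper = ordered[int(round((1.0 - tail) * (n - 1)))]
    return lower, upper


def histogram_mode(samples, bins=20):
    """Crude mode estimate: midpoint of the fullest histogram bin."""
    low, high = min(samples), max(samples)
    if low == high:
        return low
    width = (high - low) / bins
    counts = [0] * bins
    for x in samples:
        idx = min(int((x - low) / width), bins - 1)
        counts[idx] += 1
    best = counts.index(max(counts))
    return low + (best + 0.5) * width


# Stand-in for the retained simulations (assumed, for illustration only).
rng = random.Random(1)
posterior_sample = [rng.gauss(5.0, 1.0) for _ in range(5000)]

post_mean = statistics.fmean(posterior_sample)
post_mode = histogram_mode(posterior_sample)
ci_low, ci_high = credible_interval(posterior_sample)
```

A kernel density estimate would give a smoother mode than the histogram used here; the histogram keeps the sketch dependency-free.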

17 3. ABC approach: 3. Limitations
- Natural limitation due to lack of information in data sets
- Limitation on the number of summary statistics used
- Limitation on the calculation of summary statistics (time consuming)
- Limitation on the time consumption of the simulation step

19 3. ABC approach: 3. Limitations
Limitation on the number of summary statistics used
[Figure: two panels. Summary statistics = 1: points (θ, S = s') on the axes θ and S. Summary statistics = 2: points (θ, S1 = s'1, S2 = s'2) on the axes θ, S1 and S2.]
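
One way to see why the number of summary statistics is a limitation: with a fixed acceptance distance, the share of simulations that land near the observed values shrinks quickly as statistics are added. The sketch below is a toy illustration under stated assumptions (independent standard-normal statistics, observed value at the origin, Euclidean distance, tolerance 0.5); it is not taken from the slides.

```python
import random


def acceptance_fraction(n_stats, delta=0.5, n_sims=20000, seed=1):
    """Fraction of simulations whose n_stats statistics all fall within
    Euclidean distance delta of the observed point (assumed at the origin)."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(n_sims):
        dist_sq = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n_stats))
        if dist_sq <= delta * delta:
            accepted += 1
    return accepted / n_sims


fractions = {k: acceptance_fraction(k) for k in (1, 2, 4)}
for k, frac in fractions.items():
    print(f"{k} summary statistic(s): {frac:.4f} of simulations accepted")
```

Keeping the acceptance rate constant instead forces a wider tolerance, so the retained points are a poorer approximation to the posterior.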

22 3. ABC approach: 4. Typical ABC run
Steps:
- Step 1: simulation
- Step 2: getting the posterior distribution
- Step 3: estimation
Flowchart boxes:
- Compute distance between "real" data and simulated data
- Retain simulated data closest to "real" data (yes / no)
- Estimate parameters from the posterior distributions obtained from the retained simulated data
Choices to make:
a) Choosing the priors
b) Choosing the summary statistics
c) Choosing a "rejection" method for the simulated data

23 3. ABC approach: 4. Typical ABC run
Using posterior distribution information in prior distributions:
[Figure: prior distribution p(θ)]

24 3. ABC approach: 4. Typical ABC run
Using posterior distribution information in prior distributions:
[Figure: prior distribution p(θ), using a kernel estimator]

25 3. ABC approach: 4. Typical ABC run
Using posterior distribution information in prior distributions:
Randomly choose points from the posterior distribution.
[Figure: prior distribution p(θ) with a simulation parameter value θ1' marked; the weight is labelled with Prior(θ1 = θ1') and Posterior(θ1 = θ1').]
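
The slide's labels suggest, though the extracted text does not state it outright, that new simulation parameters are drawn from a kernel estimate of an earlier posterior and then weighted by the ratio prior / posterior estimate. The sketch below follows that reading. The Gaussian kernel, the bandwidth of 0.3, the uniform prior on (0, 10) and all function names are assumptions.

```python
import math
import random


def kde_pdf(x, points, bandwidth):
    """Gaussian kernel density estimate evaluated at x."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)


def sample_kde(points, bandwidth, rng):
    # Drawing from a Gaussian KDE: pick a point, then add kernel noise.
    return rng.choice(points) + rng.gauss(0.0, bandwidth)


def prior_pdf(theta, low=0.0, high=10.0):
    # Assumed original prior: Uniform(low, high).
    return 1.0 / (high - low) if low <= theta <= high else 0.0


def propose_with_weights(previous_posterior, n_draws, bandwidth=0.3, seed=0):
    """Draw parameters from the earlier posterior and attach the weight
    prior(theta') / posterior_estimate(theta') to each draw."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        theta = sample_kde(previous_posterior, bandwidth, rng)
        weight = prior_pdf(theta) / kde_pdf(theta, previous_posterior, bandwidth)
        draws.append((theta, weight))
    return draws


# Stand-in for a posterior sample from an earlier ABC run (assumed).
rng = random.Random(42)
previous_posterior = [rng.gauss(4.0, 0.5) for _ in range(200)]
draws = propose_with_weights(previous_posterior, n_draws=500)
```

The weights undo the extra concentration of the proposals, so that the weighted simulations still correspond to the original prior.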

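26 3. ABC approach: 4. Typical ABC run
Rejection method (Pritchard et al., 1999):
[Figure: joint distribution on the axes "SummStats, S" and "Parameter, θ"; the simulations inside the tolerance region around s' (the "real" data) give the posterior distribution p(θ | S).]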
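
A minimal sketch of the rejection step, under the same kind of toy assumptions as a textbook example (uniform prior on (0, 10), normal data, sample mean as the only summary statistic). Here `tol` is treated as the fraction of simulations retained, which is one common convention; whether the tol=0.02 quoted on later slides was used in exactly this sense is an assumption.

```python
import random
import statistics


def abc_rejection(observed_stat, n_sims=20000, tol=0.02, seed=0):
    """Keep the parameter values whose simulated summary statistic is
    closest to the observed one (the closest tol fraction of simulations)."""
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        theta = rng.uniform(0.0, 10.0)                      # draw from the prior
        data = [rng.gauss(theta, 1.0) for _ in range(20)]   # simulate data
        stat = statistics.fmean(data)                       # summary statistic
        sims.append((abs(stat - observed_stat), theta))     # distance to "real" data
    sims.sort()
    n_keep = max(1, int(tol * n_sims))
    return [theta for _, theta in sims[:n_keep]]


posterior = abc_rejection(observed_stat=4.0)
print(len(posterior), statistics.fmean(posterior))
```

The retained values in `posterior` are treated as an approximate sample from p(θ | S = s'); a smaller `tol` gives a closer approximation but fewer points.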

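27 3. ABC approach: 4. Typical ABC run
Local linear multiple regression adjustment and weighting (Beaumont et al., 2002):
[Figure: joint distribution on the axes "SummStats, S" and "Parameter, θ", with s' (the "real" data) marked; the weighting and regression steps are indicated, leading to the posterior distribution p(θ | S).]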

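28 3. ABC approach: 4. Typical ABC run
- Spherical acceptance region
- Local weighting, using the Epanechnikov kernel
- Linear multiple regression. Labelled terms: E[P(θ | S = s)], correlation coefficients vector, vector of standardized summary statistics, least squares error
- "We want to minimize" the weighted least squares error
[Equations not preserved in this transcript]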

29 3. ABC approach: 4. Typical ABC run
To obtain samples from the posterior distribution we adjust the parameter values [equation not preserved in this transcript], i.e. we are assuming that the conditional mean of the parameter is a linear function of the summary statistics, but all other moments remain the same. Least squares gives an estimate of the posterior mean.
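
The regression adjustment described here can be sketched for the simplest case of one parameter and one summary statistic. Following the usual statement of the method in Beaumont et al. (2002), the code minimizes the weighted sum of squares Σ [θ_i − α − β(s_i − s')]² K(|s_i − s'|) with Epanechnikov weights K, takes α as the estimate of the posterior mean, and adjusts each retained value to θ_i* = θ_i − β(s_i − s'). The toy model (uniform prior, normal noise), the tolerance and the function names are assumptions; the multi-statistic case needs a matrix solve that is omitted here.

```python
import random
import statistics


def epanechnikov(dist, delta):
    # Kernel weight; the normalizing constant is dropped because it
    # cancels in the weighted least squares fit.
    return 1.0 - (dist / delta) ** 2 if dist < delta else 0.0


def regression_adjust(thetas, stats, observed, delta):
    """Local-linear adjustment of accepted parameters (single statistic)."""
    kept = [(t, s, epanechnikov(abs(s - observed), delta))
            for t, s in zip(thetas, stats)]
    kept = [(t, s, w) for t, s, w in kept if w > 0.0]
    w_sum = sum(w for _, _, w in kept)
    s_bar = sum(w * s for _, s, w in kept) / w_sum
    t_bar = sum(w * t for t, _, w in kept) / w_sum
    s_xx = sum(w * (s - s_bar) ** 2 for _, s, w in kept)
    s_xy = sum(w * (s - s_bar) * (t - t_bar) for t, s, w in kept)
    beta = s_xy / s_xx                        # weighted least squares slope
    alpha = t_bar + beta * (observed - s_bar)  # fitted mean at S = s'
    return {
        "alpha": alpha,
        "beta": beta,
        "accepted": [t for t, _, _ in kept],
        "adjusted": [t - beta * (s - observed) for t, s, _ in kept],
        "weights": [w for _, _, w in kept],
    }


# Toy joint sample (assumed): theta ~ Uniform(0, 10), S = theta + noise.
rng = random.Random(3)
thetas = [rng.uniform(0.0, 10.0) for _ in range(5000)]
stats = [t + rng.gauss(0.0, 0.5) for t in thetas]
result = regression_adjust(thetas, stats, observed=5.0, delta=1.0)
```

In this toy case the adjusted values are visibly less spread out than the plain accepted values, which is the point of the correction: it removes the part of the spread that comes only from accepting simulations with s_i slightly different from s'.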

31 4. Present Work
One simple case:
[Figure: two-population model diagram with the labels Pop anc, Pop 1, Pop 2, Ne 1, Ne 2, Ne anc, m1, m2, tev 1 and a time axis t.]
6 parameters to be estimated + μ (mutation rate)

32 4. Present Work
Summary statistics used (sequence data):
1. Mean of pairwise differences
   a) in each population
   b) both populations joined together
2. Number of segregating sites
   a) in each population
   b) both populations joined together
3. Number of haplotypes
   a) in each population
   b) both populations joined together
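
The three statistics listed above are easy to compute from aligned sequences. The sketch below uses their standard definitions; the function names and the tiny example alignment are made up for illustration and are not from the presentation.

```python
from itertools import combinations


def pairwise_differences(seqs):
    """Number of differing sites for every pair of aligned sequences."""
    return [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]


def mean_pairwise_differences(seqs):
    diffs = pairwise_differences(seqs)
    return sum(diffs) / len(diffs)


def segregating_sites(seqs):
    """Number of alignment columns showing more than one state."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def n_haplotypes(seqs):
    """Number of distinct sequences in the sample."""
    return len(set(seqs))


# Hypothetical aligned samples from two populations.
pop1 = ["ACGTAC", "ACGTAC", "ACGAAC"]
pop2 = ["ACCTAC", "TCCTAC"]
pooled = pop1 + pop2

for name, sample in (("pop1", pop1), ("pop2", pop2), ("pooled", pooled)):
    print(name,
          mean_pairwise_differences(sample),
          segregating_sites(sample),
          n_haplotypes(sample))
```

Computing each statistic per population and on the pooled sample gives the nine values implied by the a) and b) entries of the list.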

33 4. Present Work
Simulated "real" data and prior information
[Figure: panels for Ne 1, Ne 2, Ne anc, Tev, Mig 1 and Mig 2 showing the "real" data value and the prior distribution; legend: ABC method, MCMC method. Axis values as extracted: 0 10000 1000 500 0 10000 0 0.05 0 5000 0.01.]

34 4. Present Work
Ne 1, no migration: [Figure: panels sim1 to sim10]

35 4. Present Work
Ne 2, no migration: [Figure: panels sim1 to sim10]

36 4. Present Work
Ne anc, no migration: [Figure: panels sim1 to sim10]

37 4. Present Work
Te 1, no migration: [Figure: panels sim1 to sim10]

38 4. Present Work
ABC vs MCMC:
[Figure: Data 1 (no migration), Simulation 7: panels Ne 1, Ne 2, Ne anc, Tev. Data 2 (migration = 0.01), Simulation 9: panels Ne 1, Ne 2, Ne anc, Tev, Mig 1, Mig 2.]

39 4. Present Work
ABC vs MCMC (500 000 iter, tol=0.02):

MISE: No migration
        Ne1           Ne2           Neanc         Mig1  Mig2  Tev
ABC     3.857 0.899   2.529 0.653   3.956 0.532   - -   - -   3.532 0.695
MCMC    1.153 0.505   0.724 0.295   3.594 0.602   - -   - -   1.567 0.429
Priors row as extracted: 24.33- - ----- -

MISE: Migration = 0.01
        Ne1           Ne2           Neanc         Mig1          Mig2          Tev
ABC     8.242 2.194   10.41 2.240   19.15 0.604   3.977 0.316   3.986 0.259   27.17 0.904
MCMC    4.196 1.132   5.693 1.839   18.85 0.602   2.760 0.391   3.031 0.483   26.54 1.510
Priors row as extracted: 24.33- - -4.33- -24.33-

41 4. Present Work
Summary statistics used (sequence data):
1. Mean of pairwise differences
   a) in each population
   b) both populations joined together
2. Number of segregating sites
   a) in each population
   b) both populations joined together
3. Number of haplotypes
   a) in each population
   b) both populations joined together
4. Variance of pairwise differences
   a) in each population
   b) both populations joined together
5. Shannon's index
   a) in each population
   b) both populations joined together
6. Number of singletons
   a) in each population
   b) both populations joined together
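
A sketch of the three additional statistics, under stated assumptions: the variance is taken as the population variance of the pairwise differences, Shannon's index is computed on haplotype frequencies with natural logarithms, and a singleton site is a column in which some state occurs in exactly one sequence. The slides do not spell out which variants were used, and the example alignment is hypothetical.

```python
import math
import statistics
from collections import Counter
from itertools import combinations


def variance_pairwise_differences(seqs):
    """Population variance of the number of differences between sequence pairs."""
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)]
    return statistics.pvariance(diffs)


def shannon_index(seqs):
    """Shannon's index H = -sum(p * ln p) over haplotype frequencies p."""
    n = len(seqs)
    return -sum((c / n) * math.log(c / n) for c in Counter(seqs).values())


def n_singletons(seqs):
    """Number of variable sites where some state is carried by one sequence only."""
    total = 0
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) > 1 and min(counts.values()) == 1:
            total += 1
    return total


# Hypothetical pooled sample of aligned sequences.
sample = ["ACGTAC", "ACGTAC", "ACGAAC", "ACCTAC", "TCCTAC"]
print(variance_pairwise_differences(sample),
      shannon_index(sample),
      n_singletons(sample))
```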

42 4. Present Work
Simulated "real" data and prior information
[Figure: panels for Ne 1, Ne 2, Ne anc, Tev, Mig 1 and Mig 2 showing the "real" data value and the prior distribution; legend: standard, previous + Shannon's, previous + var pairwise dif, previous + singletons, MCMC based method. Axis values as extracted: 0 10000 1000 500 0 10000 0 0.05 0 5000 0.01.]

43 4. Present Work
Summary Statistics (500 000 iter, tol=0.02):
[Figure: Data 1 (no migration), Simulation 7: panels Ne 1, Ne 2, Ne anc, Tev. Data 2 (migration = 0.01), Simulation 9: panels Ne 1, Ne 2, Ne anc, Tev, Mig 1, Mig 2.]

44 4. Present Work
Summary Statistics (7 000 000 iter, tol=0.02):
[Figure: Data 1 (no migration), Simulation 7: panels Ne 1, Ne 2, Ne anc, Tev. Data 2 (migration = 0.01), Simulation 9: panels Ne 1, Ne 2, Ne anc, Tev, Mig 1, Mig 2.]

45 4. Present Work
Summary Statistics (7 000 000 iter, tol=0.02):

MISE: No migration
          Ne1           Ne2           Neanc         Mig1  Mig2  Tev
ABC I     3.861 0.903   2.548 0.654   3.992 0.525   - -   - -   3.548 0.702
ABC II    3.538 0.857   2.353 0.614   4.007 0.552   - -   - -   3.324 0.615
ABC III   2.160 0.869   1.818 0.577   4.241 0.700   - -   - -   4.266 0.949
ABC IV    2.205 0.721   1.606 0.548   4.536 0.700   - -   - -   4.698 0.989
MCMC      1.153 0.505   0.724 0.295   3.594 0.602   - -   - -   1.567 0.429

MISE: Migration = 0.01
          Ne1           Ne2           Neanc         Mig1          Mig2          Tev
ABC I     8.216 2.170   10.31 2.204   19.03 0.617   3.925 0.318   4.000 0.276   27.05 0.907
ABC II    7.021 2.182   9.664 2.371   19.40 0.540   3.600 0.270   3.755 0.322   28.42 0.951
ABC III   6.285 1.765   7.425 1.415   19.69 0.612   3.435 0.312   3.308 0.349   29.67 1.056
ABC IV    6.585 2.026   6.564 1.218   19.38 0.587   3.410 0.313   3.329 0.334   28.74 0.845
MCMC      4.196 1.132   5.693 1.839   18.85 0.602   2.760 0.391   3.031 0.483   26.54 1.510

46 4. Present Work
Summary Statistics (7 000 000 iter, tol=0.02):

Adjusted R²: No migration
          Ne1    Ne2    Neanc  Mig1  Mig2  Tev
ABC I     0.49   0.50   0.27   -     -     0.65
ABC II    0.51   0.52   0.27   -     -     0.67
ABC III   0.60   0.59   0.30   -     -     0.67
ABC IV    0.55   ..     0.27   -     -     0.63

Adjusted R²: Migration = 0.01
          Ne1    Ne2    Neanc  Mig1  Mig2  Tev
ABC I     0.23   ..     0.01   0.08  ..    0.02
ABC II    0.25   0.24   0.01   0.09  0.10  0.02
ABC III   0.30   ..     0.01   0.11  ..    0.01
ABC IV    0.26   ..     0.01   0.11  ..    0.01

(".." marks a cell for which the extracted text gives no separate value.)

48 4. Present Work: Three populations model
[Figure: three-population model diagram with the labels Pop 1, Pop 2, Pop 3, Pop anc1, Pop anc2; Ne 1, Ne 2, Ne 3, Ne anc1, Ne anc2; m1, m2, m3, m anc; tev 1, tev 2.]
11 parameters to be estimated + topology + μ (mutation rate)

49 4. Present Work
Simulated "real" data and prior information
[Figure: panels for Ne 1, Ne 2, Ne 3, Ne anc1, Ne anc2, Mig 1, Mig 2, Mig 3, Mig anc, Tev 1 and Tev 2; legend: free top, fixed top. Axis values as extracted: 0 10000 1000 0 10000 0 0.05 0.01 500 0 0.05 0 5000 0.01 1500 0 5000 0 10000 1000 0 10000.]

50 4. Present Work
Three Populations model (no migration):
Data 1 (no migration); Simulation 7. Topology: ((2,3),1)
[Figure: panels Ne 1, Ne 2, Ne 3, Ne anc1, Ne anc2, Tev 1, Tev 2]

51 4. Present Work
Three Populations model (migration = 0.01):
Data 2 (migration = 0.01); Simulation 6. Topology: ((1,2),3)
[Figure: panels Ne 1, Ne 2, Ne 3, Ne anc1, Ne anc2, Mig 1, Mig 2, Mig 3, Mig anc, Tev 1, Tev 2]

52 4. Present Work
Three Populations model (500 000 iter, tol=0.02):
MISE. Column header as extracted: Ne Ne* Neanc2 Neanc1 Mig Mig* Miganc Tev2 Tev1

No migration:
Free    5.700  5.438  3.739  4.781  0.886  -     -     -     -     0.44  18.39
Fixed   5.467  5.282  3.815  4.511  0.264  -     -     -     -     0.55   9.59
Topology: Free 0.76 0.05; Prior 0.33 -

Migration = 0.01:
Free    5.415  5.521  4.339  4.864  0.837  4.18  4.03  4.11  4.32  0.51  23.32
Fixed   5.382  5.456  4.327  5.007  0.831  4.28  4.18  4.12  4.34  0.54  23.60
Topology: Free 0.41 0.02; Prior 0.33 -

53 Conclusions:
- ABC is up to 2 orders of magnitude faster for a single locus
- ABC modes are similar to MCMC, but overall precision is lower
- No substantial improvement with more summary statistics
- No substantial improvement with more iterations
- ABC is able to consider more complex scenarios, but the ability to infer parameters is reduced when considering migration

55 4. Present Work
The user-friendly version of the program (initial stage)
Features of the program:
- Use of heredity scalars for each locus
- Use of different types of DNA data at the same time (microsatellite and DNA sequence)
- Use of an unlimited number of populations within an IM model
- Use of different combinations of 7 different summary statistics for each DNA data type
Freeware and source code available (soon)

57 5. Future Developments
Current goals:
- Currently applying the method to a published data set (Won & Hey, 2005)
- Continue to improve the accuracy of ABC (e.g. identify better summary statistics)
- Obtain better estimates of MISE (e.g. using more simulated 'real' data)
Future goals:
- Add recombination
- Create a user-friendly interface
- Use a variable migration rate through time
- Improve ABC:
  - sequential method
  - non-linear regression

58 Acknowledgements
I would like to thank David Balding for helpful discussion on the methods used, and to give special thanks to Mark Beaumont for advice and comments on the work. Support for this work was provided by EPSRC.

59 joao.lopes@rdg.ac.uk http://www.rdg.ac.uk/~sar05sal

