The use of short-read next generation sequences to recover the evolutionary histories in multi-individual samples Systematic biology presentation Yuantong.

1 The use of short-read next-generation sequences to recover the evolutionary histories in multi-individual samples. Systematic biology presentation, Yuantong Ding, Dec. 6

2 Outline: Background; Workflow; Sequence comparison; Tree comparison; Summary & future work

3 Can short reads successfully recover phylogeny? Next-generation sequencing (NGS) is low-cost and high-throughput, but it produces short reads. The question is whether short reads obtained from a multi-individual sample can be used to reconstruct the individual sequences and, from those reconstructed sequences, the sample's phylogeny.

4 Simulation process. The original genealogy and the original haplotypes were simulated by Serial SimCoal under a coalescent model. Short reads were simulated from the haplotypes by MetaSim with its 454 error model and mapped against a consensus sequence, with the alignment built by SHRiMP and SSAHA. Haplotypes were then reconstructed from the alignment by ShoRAH, and neighbor-joining (NJ) trees were built by PAUP*. Finally, the original and reconstructed results were compared in two ways: tree topology, and the number and similarity of haplotypes.
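The read-simulation step can be illustrated with a toy sampler. This is a minimal sketch under simplifying assumptions, not MetaSim: reads are drawn uniformly from equal-length haplotypes, errors are plain substitutions at a fixed rate (real 454 errors are dominated by homopolymer indels), and the function name `sample_reads` is hypothetical.

```python
import random


def sample_reads(haplotypes, n_reads, read_len, error_rate, seed=0):
    """Draw short reads from a pool of equal-length haplotypes.

    Hypothetical toy stand-in for a read simulator (NOT MetaSim's 454
    error model): each read comes from a uniformly chosen haplotype and
    start position, and each base is substituted with probability
    `error_rate`. The true start is returned with the read, so the reads
    are effectively pre-mapped to the shared coordinate system.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        hap = rng.choice(haplotypes)
        start = rng.randrange(len(hap) - read_len + 1)
        bases = list(hap[start:start + read_len])
        for i, base in enumerate(bases):
            if rng.random() < error_rate:
                # Substitution error: replace with a different nucleotide.
                bases[i] = rng.choice([b for b in "ACGT" if b != base])
        reads.append((start, "".join(bases)))
    return reads


haps = ["ACGTACGTAC", "ACGTTCGTAC", "TCGTACGAAC"]
reads = sample_reads(haps, n_reads=5, read_len=4, error_rate=0.0)
# With error_rate=0 every read is an exact substring of some haplotype.
print(all(any(h[s:s + 4] == r for h in haps) for s, r in reads))
```

Because reads are sampled from a mixture of haplotypes, a single read carries no label saying which individual it came from; that is exactly the information a haplotype reconstruction tool such as ShoRAH must infer.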

5 Six parameters used
Parameter                    Symbol   Values
Effective population size    N        3000, 5000, 10000
Sample size                  n        10, 20, 40
Mutation rate                μ        5.00E-05, 1.00E-05, 5.00E-06
Sequence length              l        1200, 2000, 5000
Number of short reads        Sr_N     5000, 10000, 30000
Length of short reads        Sr_l     200, 400
All 486 combinations of these parameter values (3^5 × 2) were simulated.
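The size of the simulation design can be checked by enumerating the full grid. A minimal Python sketch, with the values transcribed from the table above; the expected per-site read depth Sr_N × Sr_l / l is a derived quantity that the slides do not report, and the helper names are hypothetical.

```python
from itertools import product

# Parameter values transcribed from the slide 5 table.
PARAMS = {
    "N": [3000, 5000, 10000],      # effective population size
    "n": [10, 20, 40],             # sample size
    "mu": [5e-5, 1e-5, 5e-6],      # mutation rate
    "l": [1200, 2000, 5000],       # sequence length
    "Sr_N": [5000, 10000, 30000],  # number of short reads
    "Sr_l": [200, 400],            # length of short reads
}


def parameter_grid(params):
    """Yield every combination of parameter values as a dict."""
    names = list(params)
    for values in product(*(params[k] for k in names)):
        yield dict(zip(names, values))


def mean_depth(combo):
    """Expected per-site read depth (assumption: total bases / length)."""
    return combo["Sr_N"] * combo["Sr_l"] / combo["l"]


grid = list(parameter_grid(PARAMS))
depths = [mean_depth(c) for c in grid]
print(len(grid))                 # 486 = 3**5 * 2
print(min(depths), max(depths))  # 200.0 10000.0
```

Even the sparsest setting gives roughly 200-fold average depth, so in this design the limiting factor is presumably not raw coverage.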

6 Different numbers of haplotypes

7 Similar sequences

8 Can reconstructed haplotypes still capture some phylogenetic information? Because the number of reconstructed haplotypes differs from the true number, it is impossible to recover the true phylogenetic trees directly. Assuming instead that the true number of haplotypes in the sample is known, two approaches were tested:
(1) Select the reconstructed sequences most similar to the original haplotypes, build a phylogenetic tree from them, and calculate its symmetric difference from the original tree.
(2) Cluster the reconstructed haplotypes into n groups by k-means, build a tree from the consensus sequence of each group, and calculate tree balance statistics.
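Both approaches can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's code: reconstructed haplotypes are taken to be aligned and of equal length, "most similar" is taken to mean smallest Hamming distance to an original haplotype, and k-means is run on one-hot-encoded sequences with a deterministic farthest-first start. All function names are hypothetical.

```python
BASES = "ACGT"


def hamming(a, b):
    """Number of mismatched sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def most_similar(originals, reconstructed):
    """Approach (1): for each original haplotype, the closest reconstructed one.

    A reconstructed haplotype may be picked for more than one original.
    """
    return [min(reconstructed, key=lambda r: hamming(o, r)) for o in originals]


def one_hot(seq):
    """Encode a sequence as a flat 0/1 vector (4 entries per site)."""
    return [1.0 if s == b else 0.0 for s in seq for b in BASES]


def consensus(seqs):
    """Majority base at each site; ties go to the earlier base in ACGT."""
    return "".join(max(BASES, key=col.count) for col in zip(*seqs))


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_consensus(seqs, k, max_iter=100):
    """Approach (2): k-means on one-hot sequences, then one consensus per cluster."""
    # Farthest-first seeding is an assumption; the slides do not say how
    # the clusters were initialised.
    seeds = [seqs[0]]
    while len(seeds) < k:
        seeds.append(max(seqs, key=lambda s: min(hamming(s, c) for c in seeds)))
    vecs = [one_hot(s) for s in seqs]
    centroids = [one_hot(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vecs]
        if new == labels:
            break
        labels = new
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vecs, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    groups = [[s for s, lab in zip(seqs, labels) if lab == j] for j in range(k)]
    return [consensus(g) for g in groups if g]


print(most_similar(["AAAA", "CCCC"], ["AAAT", "CCGG", "TTTT"]))  # ['AAAT', 'CCGG']
print(kmeans_consensus(["AAAA", "AAAT", "AAAA", "CCCC", "CCCG", "CCCC"], 2))  # ['AAAA', 'CCCC']
```

For one-hot vectors, the majority base at each site is the same as the largest coordinate of the cluster centroid, so the consensus sequence is a natural way to turn each k-means cluster back into a single sequence for tree building.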

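9 Method for tree comparison. Symmetric difference (for rooted, labeled trees): count the clusters found in one tree but not in the other. For example, the tree (A,(B,C)) has clusters (BC) and (ABC), while (B,(A,C)) has clusters (AC) and (ABC); (BC) and (AC) each occur in only one tree, so the symmetric difference = 2. Tree balance statistic (for rooted, unlabeled trees): $\bar{N} = \frac{1}{n}\sum_{i=1}^{n} N_i$, where $N_i$ is the number of internal nodes between tip $i$ and the root. In the slide's five-tip example, for $i = A$, $N_A = 2$, and $\bar{N} = (2+2+2+3+3)/5 = 2.4$.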
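Both statistics are simple to compute on rooted trees written as nested tuples. The sketch below reproduces the slide's two worked numbers. Two assumptions: the five-tip tree (("A","B"),("C",("D","E"))) is a hypothetical topology chosen only because its tip depths are (2, 2, 2, 3, 3), and $N_i$ is counted including the root. The function names are hypothetical.

```python
def clusters(tree):
    """Set of clusters (frozensets of tip labels) of a rooted tree.

    Trees are nested tuples, e.g. ("A", ("B", "C")); tips are strings.
    Single tips are not counted as clusters.
    """
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        tips = frozenset().union(*(walk(child) for child in node))
        found.add(tips)
        return tips

    walk(tree)
    return found


def symmetric_difference(t1, t2):
    """Clusters present in exactly one of the two trees (rooted Robinson-Foulds count)."""
    return len(clusters(t1) ^ clusters(t2))


def mean_depth(tree):
    """N-bar: mean number of internal nodes from each tip to the root (root included)."""
    depths = []

    def walk(node, d):
        if isinstance(node, str):
            depths.append(d)
        else:
            for child in node:
                walk(child, d + 1)

    walk(tree, 0)
    return sum(depths) / len(depths)


print(symmetric_difference(("A", ("B", "C")), ("B", ("A", "C"))))  # 2
print(mean_depth((("A", "B"), ("C", ("D", "E")))))  # 2.4
```

The symmetric difference needs tip labels to match clusters between trees, whereas $\bar{N}$ depends only on tree shape; this is why the slides use the former for the most-similar-sequence trees and the latter for the k-means cluster trees, whose consensus tips have no one-to-one correspondence with the original haplotypes.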

10 Different topology of the most-similar-sequence tree

11 Different balance statistics of the k-means cluster trees
          N̄ (org)   N̄ (rec)   P         I_c (org)   I_c (rec)   P
n = 10    4.8       4.7       0.002     0.74        0.67        0.0004
n = 20    7.5       6.9       9.2e-09   0.57        0.47        1.52e-10
n = 40    10.6      9.6       1.2e-08   0.40        0.33        1.94e-09
org = original trees; rec = reconstructed (k-means cluster) trees.

12 Summary & future work. Haplotype reconstruction typically failed to estimate the correct number of haplotypes; consequently, it was not possible to recover the true phylogenetic trees. Even assuming the true number of haplotypes is known, the chance of recovering the true tree topology is still small. Future work: other reconstruction methods, using multiple reference sequences when mapping…

13 References
Anderson, C.N.K., Ramakrishnan, U. et al., 2005. Serial SimCoal: A population genetic model for data from multiple populations and points in time. Bioinformatics 21, 1733-1734.
David, M., Dzamba, M. et al., 2011. SHRiMP2: Sensitive yet Practical Short Read Mapping. Bioinformatics 27, 1011-1012.
Johnson, P.L., Slatkin, M., 2006. Inference of population genetic parameters in metagenomics: a clean look at messy data. Genome Res 16, 1320-1327.
Ning, Z., Cox, A.J., Mullikin, J.C., 2001. SSAHA: a fast search method for large DNA databases. Genome Res 11, 1725-1729.
Richter, D.C., Ott, F. et al., 2008. MetaSim: A Sequencing Simulator for Genomics and Metagenomics. PLoS ONE 3, e3373.
Suzuki, S., Ono, N., Furusawa, C., Ying, B.-W., Yomo, T., 2011. Comparison of Sequence Reads Obtained from Three Next-Generation Sequencing Platforms. PLoS ONE 6, e19534.
Zagordi, O., Bhattacharya, A. et al., 2011. ShoRAH: estimating the genetic diversity of a mixed sample from next-generation sequencing data. BMC Bioinformatics 12, 119.

