High-throughput comparative genomics 24th October 2013 Joe Parker, Queen Mary University London.

1 High-throughput comparative genomics 24th October 2013 Joe Parker, Queen Mary University London

2 Topics 1. Introduction 2. Background: why phylogenomics? 3. Examples 4. Practice 5. Case study 6. On the horizon 7. Over the horizon

3 Aims Context of phylogenomics: Next-generation sequencing (NGS) Why phylogenomics? Practical analyses Future developments

4 1. Our Research

5 Lab Interests Ecology and evolution of traits Echolocation, sociality NGS data for population genetics and phylogenomics

6 Activities Phylogeny estimation/comparison Molecular correlates of evolution: –site substitutions, dN/dS, composition Simulation Dataset limitations (R-L): Joe Parker; Georgia Tsagkogeorga; Kalina Davies; Steve Rossiter; Xiuguang Mao; Seb Bailey

7 2. Background

8 Next-generation sequencing

9 Why phylogenomics, not -genetics? Causes of discordant signal –Incomplete lineage sorting –Lateral transfer –Recombination –Introgression

10 Quantitative biology Multiple configurations Hyperparameters empirically investigated Determine sensitivity of results
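One way to read "multiple configurations" and "determine sensitivity of results" is as a sweep over analysis settings, rerunning the same analysis under each combination and checking how far the result moves. The sketch below is only an illustration of that idea: `sweep`, `result_range`, the toy `count_loci_passing` analysis, its parameter names and its locus values are all hypothetical and do not come from the talk.

```python
from itertools import product


def sweep(analysis, grid):
    """Run `analysis` once per combination of settings in `grid`.

    grid maps a parameter name to the list of values to try.
    Returns a list of (settings, result) pairs.
    """
    names = sorted(grid)
    runs = []
    for combo in product(*(grid[name] for name in names)):
        settings = dict(zip(names, combo))
        runs.append((settings, analysis(**settings)))
    return runs


def result_range(runs):
    """Spread of the result across all configurations (max minus min)."""
    results = [result for _, result in runs]
    return max(results) - min(results)


def count_loci_passing(min_length, max_missing):
    """Toy stand-in for a real analysis: count loci that survive filtering."""
    # (alignment length, fraction of missing data); made-up values
    loci = [(300, 0.10), (1200, 0.02), (90, 0.40), (650, 0.25)]
    return sum(
        1
        for length, missing in loci
        if length >= min_length and missing <= max_missing
    )


runs = sweep(
    count_loci_passing,
    {"min_length": [100, 500], "max_missing": [0.05, 0.3]},
)
for settings, result in runs:
    print(settings, result)
print("range across configurations:", result_range(runs))
```

A large range relative to the effect being reported suggests the conclusion depends on the chosen settings rather than on the data.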

11 Distributions Genome-scale data provides context Identify outliers Genes / taxa / trees Compare values across biological systems

12 Integration with ‘Omics Multiple databases Functional data Bibliographic information

13 3. Example studies

14 Tsagkogeorga et al. (in press)

15 Salichos & Rokas (2013)

16 Backström et al. (2013)

17 Lindblad-Toh et al. (2011)

18 4. Practice

19 Source material Samples Storage Purification Library prep

20 Sequencing Genome –Sanger –Illumina –Pyro / 454 –SOLiD –PacBio Transcriptome / RNA-seq –MyBAITS HiSeq / MiSeq IonTorrent

21 Infrastructure Desktop machines Computing clusters Grid systems Cloud-based computation

22 Assembly, Annotation Assembly –To reference (mapping) –De novo Annotation –By homology –De novo SOAPdenovo MAKER Velvet Bowtie / Cufflinks / TopHat Trinity

23 Alignment PRANK MUSCLE MAFFT Clustal

24 Phylogeny inference MrBayes RAxML BEAST MP-EST STAR

25 Phylogenetic analysis BEAST HYPHY PAML Pipelines LRT
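The LRT (likelihood-ratio test) on this slide compares two nested models fitted to the same data, for example a null model against an alternative with one extra free parameter. A minimal sketch, assuming the two maximised log-likelihoods have already been obtained from a model-fitting program; the function name is hypothetical, and the chi-squared tail probability is implemented only for 1 or 2 degrees of freedom so that the example needs nothing beyond the standard library.

```python
import math


def lrt_p_value(lnl_null, lnl_alt, df=1):
    """Likelihood-ratio test for two nested models.

    lnl_null, lnl_alt: maximised log-likelihoods of the null and the
    alternative model (hypothetical inputs taken from another program).
    df: difference in the number of free parameters (1 or 2 only).
    Returns (test statistic, p-value).
    """
    # The statistic is twice the log-likelihood gain; clamp at zero in
    # case optimisation noise makes the alternative fit slightly worse.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        # Chi-squared survival function with 1 d.f.: erfc(sqrt(x / 2))
        p_value = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        # Chi-squared survival function with 2 d.f.: exp(-x / 2)
        p_value = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch only handles df = 1 or df = 2")
    return stat, p_value
```

For other degrees of freedom a statistics library would normally supply the chi-squared tail; when the extra parameter sits on a boundary of its range, the plain chi-squared approximation is conservative.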

26 5. Case study

27 Parker et al. (2013) De novo genomes: –four taxa –2,321 protein-coding loci –801,301 codons Published: –18 genomes ~69,000 simulated datasets ~3,500 cluster cores

28 Our pipeline for detecting genome-wide convergence








36 mean = 0.05

37 mean = 0.05; mean = -0.01; mean = -0.08

38 Development cycle Design Wireframe & specify tests Implement Alignment loadSequences() getSubstitutions() Phylogeny trimTaxa() getMRCA() DataSeries calculateECDF() randomise() Regression getResiduals() predictInterval() Review, refine & refactor
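To make the "wireframe & specify tests, then implement" step concrete, here is a minimal Python sketch of the DataSeries component named on this slide. Only the class and method names come from the slide (rendered here in Python's snake_case); the method bodies, the choice of Python and the example values are assumptions, not the pipeline's actual implementation.

```python
import bisect
import random


class DataSeries:
    """A one-dimensional collection of observations."""

    def __init__(self, values):
        self.values = sorted(float(v) for v in values)
        if not self.values:
            raise ValueError("DataSeries needs at least one value")

    def calculate_ecdf(self, x):
        """Empirical CDF: the fraction of observations <= x."""
        return bisect.bisect_right(self.values, x) / len(self.values)

    def randomise(self, seed=None):
        """Return the observations in shuffled order, e.g. as one
        permutation for a randomisation test. The series is unchanged."""
        rng = random.Random(seed)
        shuffled = list(self.values)
        rng.shuffle(shuffled)
        return shuffled


# Tests written from the specification, before trusting the code
series = DataSeries([3, 1, 2, 4])
assert series.calculate_ecdf(0) == 0.0
assert series.calculate_ecdf(2) == 0.5
assert series.calculate_ecdf(4) == 1.0
assert sorted(series.randomise(seed=1)) == series.values
```

Writing the assertions first fixes the expected behaviour of each method, so later refactoring can be checked against the same tests.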

39 Parker et al. (2013)


41 6. On the horizon

42 Environmental metagenomics

43 Models of computation Cloud resources: Unlimited flexibility, finite time Development trade-off –Off-the-shelf –Bespoke Exploratory work –Real-time genomic transects? Essential fundamental data missing from nearly every system: –Diversity; structure; substitution rates; dN/dS; recombination; dispersal; lateral transfer

44 Serialisation Process data remotely Freeze-dry objects, download to desktop Implement new methods directly on previously analysed data
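The freeze-dry workflow on this slide can be sketched with Python's standard `pickle` module: results computed remotely are written to a file, copied to the desktop, restored, and then passed to a method that did not exist when the original analysis ran. The helper names `freeze`, `thaw` and `mean_support`, and the per-site values, are hypothetical; the talk does not say which serialisation mechanism its own pipeline uses.

```python
import pickle
import tempfile
from pathlib import Path


def freeze(obj, path):
    """Serialise an analysed object to disk (run on the cluster)."""
    with open(path, "wb") as handle:
        pickle.dump(obj, handle, protocol=pickle.HIGHEST_PROTOCOL)


def thaw(path):
    """Restore a previously frozen object (run on the desktop).
    Only unpickle files from a source you trust."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def mean_support(results):
    """A 'new method' written after the fact: mean per-site value per locus."""
    return {locus: sum(sites) / len(sites) for locus, sites in results.items()}


# Made-up per-site values standing in for results computed remotely
cluster_results = {"locus_A": [0.25, -0.5, 1.0], "locus_B": [0.0, 0.5]}

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "results.pkl"
    freeze(cluster_results, archive)   # on the cluster
    restored = thaw(archive)           # after download to the desktop

summary = mean_support(restored)
print(summary)
```

Instances of custom classes can be frozen the same way, provided the class definition is importable on the machine that thaws them.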

45 7. Over the horizon Real-time phylogenetics Field phylogenetics Alignment-free analyses

46 Conclusions Why phylogenomics? Practice Comparative approach Statistical context

47 Thanks Steve Rossiter (1), James Cotton (2), Elia Stupka (3) & Georgia Tsagkogeorga (1); (1) School of Biological and Chemical Sciences, Queen Mary, University of London; (2) Wellcome Trust Sanger Institute; (3) Center for Translational Genomics and Bioinformatics, San Raffaele Institute, Milan. Chris Walker & Dan Traynor, Queen Mary GridPP High-throughput Cluster. Chaz Mein & Anna Terry, Barts and The London Genome Centre. Mahesh Pancholi, School of Biological and Chemical Sciences. BBSRC (UK); Queen Mary, University of London

48 Resources My email: Joe Parker (Queen Mary University of London):
Parker, J., Tsagkogeorga, G., Cotton, J.A., Liu, Y., Provero, P., Stupka, E. & Rossiter, S.J. (2013) Genome-wide signatures of convergent evolution in echolocating mammals. Nature 502(7470):228-231. doi:10.1038/nature12511
Tsagkogeorga, G., Parker, J., Stupka, E., Cotton, J.A. & Rossiter, S.J. (2013) Phylogenomic analyses elucidate evolutionary relationships of the bats (Chiroptera). Curr. Biol. in the press.
Salichos, L. & Rokas, A. (2013) Inferring ancient divergences requires genes with strong phylogenetic signals. Nature 497:327-331. doi:10.1038/nature12130
Backström, N., Zhang, Q. & Edwards, S.V. (2013) Evidence from a House Finch (Haemorhous mexicanus) Spleen Transcriptome for Adaptive Evolution and Biased Gene Conversion in Passerine Birds. MBE 30(5):1046-50. doi:10.1093/molbev/mst033
Lindblad-Toh, K., Garber, M., Zuk, O., Lin, M.F., Parker, B.J., et al. (2011) A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478:476–482. doi:10.1038/nature10530
Degnan, J.H. & Rosenberg, N.A. (2009) Gene tree discordance, phylogenetic inference and the multispecies coalescent. TREE 24(6):332-340. doi:10.1016/j.tree.2009.01.009
The Tree Of Life: RNA-seq For Everyone: Evo-Phylo: OpenHelix: Our blogs: (lab) and (Joe)
