The iPlant Tree of Life Project and Toolkit: Building a Cyberinfrastructure for Plant Science Research Naim Matasci The iPlant Collaborative Evolution.


1 The iPlant Tree of Life Project and Toolkit: Building a Cyberinfrastructure for Plant Science Research Naim Matasci The iPlant Collaborative Evolution 2011 Jun 17-21, 2011

2 What is iPlant?

3 Discovery Environment NEW RELEASE COMING SOON! http://www.iplantcollaborative.org/discovery-environment-preview-access

4

5 Physical Infrastructure Computation – 63K-core cluster – 20K-core cluster – 1 TB RAM Storage – 2 PB – 20 PB archive

6 Cloud Storage Store, access and share large datasets Multiple points of entry: web interface, mounted FS, API Free and secure AVAILABLE NOW! http://www.iplantcollaborative.org/about/policies/data-set-hosting

7 Cloud Computing Virtual Machines – Up to 4 cores, 32 GB RAM, 100 GB dedicated disk – Run any x86-compatible OS (even Windows) – Persistent or on-demand – Log in via SSH or secure VNC Use Cases – Internet-enabled Servers – Database management appliances – Virtual desktops – … The sky is the limit! AVAILABLE NOW! http://www.iplantcollaborative.org/atmosphere-preview

8 Consumer Applications iPlant's CI

9 iPlant Tree of Life Grand Challenge – Large phylogenetic inference: Building a tree of life for up to 500,000 green plants – Tree Visualization: Scalable visualization for small to large trees – Data Assembly and Integration: Acquiring, organizing and processing the data – Taxonomic Intelligence: Sorting out different names for the same species – Tree Reconciliation: Resolving discordant gene and species trees – Trait Evolution: Using trees to understand how traits evolved

10 BIG TREES To optimize existing methods for constructing phylogenetic trees on the order of 500K taxa.

11 Big Trees NINJA/WINDJAMMER (Travis Wheeler) – Neighbor-Joining implementation that can analyze > 200K species – Six-day run time reduced 32-fold to 4.5 hours for a 220K-species data set – Two- to three-day run time reduced 1,800-fold to 2 minutes for distance matrix calculation on the 220K set RAxML-Light (Alexandros Stamatakis) – Large-scale Maximum Likelihood implementation – 55K tree published (Stephen A. Smith et al., “Understanding angiosperm diversification using small and large phylogenetic trees,” American Journal of Botany 98, no. 3 (2011): 404-414) AVAILABLE NOW!
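
For orientation, the textbook neighbor-joining procedure that tools such as NINJA accelerate can be sketched as below. This is the plain cubic-time algorithm on a toy matrix, not NINJA's optimized implementation; the function name, the nested-tuple tree encoding, and the four-taxon distances are assumptions made for illustration.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Build an unrooted tree topology with classic O(n^3) neighbor joining.

    names: list of taxon labels.
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    Returns a tuple of the last three clusters (a trifurcation);
    joined clusters are nested 2-tuples. Branch lengths are omitted.
    """
    nodes = list(names)
    d = dict(dist)
    while len(nodes) > 3:
        n = len(nodes)
        # Row sums of the current distance matrix.
        r = {a: sum(d[frozenset((a, b))] for b in nodes if b != a) for a in nodes}
        # Join the pair minimizing Q(a, b) = (n - 2) * d(a, b) - r(a) - r(b).
        a, b = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        new = (a, b)
        # Distance from the new internal node to every remaining cluster.
        for c in nodes:
            if c not in (a, b):
                d[frozenset((new, c))] = 0.5 * (
                    d[frozenset((a, c))] + d[frozenset((b, c))] - d[frozenset((a, b))]
                )
        nodes = [c for c in nodes if c not in (a, b)] + [new]
    return tuple(nodes)


# Toy additive distances for the unrooted tree ((A, B), (C, D)); hypothetical data.
toy_names = ["A", "B", "C", "D"]
toy_dist = {
    frozenset(("A", "B")): 3, frozenset(("A", "C")): 5, frozenset(("A", "D")): 6,
    frozenset(("B", "C")): 6, frozenset(("B", "D")): 7, frozenset(("C", "D")): 3,
}
toy_tree = neighbor_joining(toy_names, toy_dist)
print(toy_tree)
```

Every iteration scans all remaining pairs, which is what makes the unoptimized method impractical at the 200K-species scale mentioned on the slide.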

12 TREE VISUALIZATION To develop an application for viewing, analyzing and exploring large phylogenetic trees.

13 Tree Visualization – > 500K taxa – Fast – Web-based, platform-independent – Semantic zooming – Metadata-driven display of information

14 iPlant Tree Viewer Prototype AVAILABLE NOW! http://portnoy.iplantcollaborative.org/

15 1KP Collaboration (1KP) – To support the data analysis of the Thousand Plant Transcriptomes Project

16 1KP [Figure: N(genes) plotted against N(species). Completed genomes: dozens of species. PCR: dozens of genes in 10^4 species. 1KP: unexplored territory.]

17 Broad phylogenetic coverage: algae, non-flowering, flowering (angiosperm); on role of polyploidy in Darwin’s “abominable mystery” Phylogenomics of 1000 species across plant taxa

18 TREE RECONCILIATION To reconcile the evolutionary history of genes and species.

19 Gene family data courtesy John Bowers Tree Reconciliation

20

21 TAXONOMIC NAME RESOLUTION Collaboration (BIEN) - To unify and resolve synonymous, erroneous, or other conflicting taxonomic names.

22 Taxonomic uncertainty 1. Non-existent names – Misspellings – Contamination – Annotations – Morphospecies – Digitization issues (frame shifts, character encoding) – Lexical variants (digitization conventions) 2. Synonymy – Nomenclatural synonyms – Taxonomic synonyms / concepts 3. Misidentifications, incomplete identifications

23

24 AS SEEN IN NATURE! AVAILABLE NOW!

25

26 Taxonomic Name Resolution Service – Computer-assisted standardization of plant names – Corrects spelling errors and alternative spellings against a standard list of names – Converts out-of-date names to currently accepted names
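
The two steps on this slide (spelling correction against a standard list, then synonym conversion) can be sketched roughly as follows. This is a toy illustration built on Python's standard `difflib`, not the service's actual matching algorithm or API; the function name `resolve_name`, the `cutoff` value, and the small reference lists are assumptions.

```python
import difflib

# Hypothetical reference data, for illustration only.
ACCEPTED = ["Solanum lycopersicum", "Zea mays", "Arabidopsis thaliana"]
SYNONYMS = {"Lycopersicon esculentum": "Solanum lycopersicum"}


def resolve_name(raw, cutoff=0.85):
    """Return the accepted name for `raw`, or None if nothing is close enough."""
    # Normalize whitespace and capitalization (genus capitalized, rest lowercase).
    name = " ".join(raw.split()).capitalize()
    # Step 1: fuzzy-match against every known name, accepted or synonym.
    candidates = ACCEPTED + list(SYNONYMS)
    matches = difflib.get_close_matches(name, candidates, n=1, cutoff=cutoff)
    if not matches:
        return None
    # Step 2: convert an out-of-date name to its accepted equivalent.
    return SYNONYMS.get(matches[0], matches[0])


print(resolve_name("Arabidopsis thalina"))      # misspelling
print(resolve_name("lycopersicon esculentum"))  # out-of-date synonym
print(resolve_name("Xyzzy"))                    # no plausible match
```

Returning `None` for an unmatched name, rather than the nearest guess, keeps misidentified or non-existent names visible for manual review.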

27 TRAIT EVOLUTION To develop an infrastructure for downstream analysis of large trees.

28 Trait Evolution Toolkit to study the evolution of traits of interest on very large phylogenies – Diversification – Biogeographic patterns – Adaptation – Co-evolution – …

29 Current analyses (Proof of concept) – Phylogenetically Independent Contrasts (Felsenstein 1985) – Continuous Ancestral Character Estimation (Schluter et al. 1997, Paradis 2004) – Discrete Ancestral Character Estimation (Pagel 1994, Paradis 2004)
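
As a rough illustration of the first analysis listed, phylogenetically independent contrasts (Felsenstein 1985) can be computed with a single post-order traversal. The sketch assumes a fully bifurcating tree with positive branch lengths; the tuple encoding of the tree, the function name `pic`, and the trait values are hypothetical and do not reflect the toolkit's own code.

```python
import math


def pic(node, traits, contrasts):
    """Post-order pass computing standardized independent contrasts.

    node is ("leaf", name, branch_length) or
            ("node", left, right, branch_length).
    Appends one contrast per internal node to `contrasts` and returns
    (estimated trait value at node, adjusted branch length above node).
    """
    if node[0] == "leaf":
        _, name, length = node
        return traits[name], length
    _, left, right, length = node
    x1, v1 = pic(left, traits, contrasts)
    x2, v2 = pic(right, traits, contrasts)
    # Standardized contrast: difference scaled by its expected standard deviation.
    contrasts.append((x1 - x2) / math.sqrt(v1 + v2))
    # Node value is the inverse-variance weighted mean of the two children.
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen the branch above to account for uncertainty in the node value.
    return value, length + v1 * v2 / (v1 + v2)


# Hypothetical tree ((A:1, B:1):1, (C:1, D:1):1) with made-up trait values.
example_tree = (
    "node",
    ("node", ("leaf", "A", 1.0), ("leaf", "B", 1.0), 1.0),
    ("node", ("leaf", "C", 1.0), ("leaf", "D", 1.0), 1.0),
    0.0,
)
example_traits = {"A": 1.0, "B": 3.0, "C": 6.0, "D": 10.0}
example_contrasts = []
root_value, _ = pic(example_tree, example_traits, example_contrasts)
print(example_contrasts, root_value)
```

A tree with n tips yields n - 1 contrasts, which can then be analyzed as independent data points.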

30 Community Integrated (2½-Day Workshop) EUtils Lopper RAxML Ninja Phyml Muscle PHYLIP VCF to GFF script LRmaqqtl FASTX quality stats FASTX quality boxplot FASTX nucleotide distribution Cuffcompare ERMINEJ progressiveMauve iPlantBorda (mlpy) iPlantCanberra (mlpy) vbay MECPM OUCH Picante Ontologize BOWTIE BWA TopHat SHRiMP Cuffdiff GNU Core Text utilities GeneMania SRA import PARS PL DTT BBC biclustering

31 MY-PLANT.ORG To easily share information and research, collaborate, and stay on top of the latest news in the field.

32 Collaborative Tool AVAILABLE NOW! NEW AND IMPROVED! http://my-plant.org/

33

34 http://www.iplantcollaborative.org

