Presentation on theme: "Kayo Arima California Institute for Telecommunications and Information Technology (Calit2)-University of California, San Diego Division Cyber Metagenomics;"— Presentation transcript:
Kayo Arima, California Institute for Telecommunications and Information Technology (Calit2), University of California, San Diego Division. Cyber Metagenomics: Challenge to See the Unseen Majority in the Ocean
Looking Back Nearly 4 Billion Years in the Evolution of Microbe Genomics. Source: Falkowski and Vargas, Science 304 (5667): 58. Eukaryotes have nuclei. Prokaryotes have genes but no nuclear membrane.
Evolution Is the Principle of Biological Systems: Most of Evolutionary Time Was in the Microbial World. Figure labels: "You Are Here"; "Much of Genome Work Has Occurred in Animals". Source: Carl Woese et al.
Two completely different approaches to obtaining microbial genomic information. Source: Karin Remington, J. Craig Venter Institute
Microbial whole genomics: Environmental sample -> Culture (grow) in lab -> Isolate the colony -> Culture the isolated colony -> DNA extraction -> Enzymatic digestion -> Shotgun sequencing -> Gene assembly
Metagenomics: Environmental sample -> DNA extraction -> Enzymatic digestion -> Shotgun sequencing -> Scaffold assembly
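The shotgun sequencing and assembly steps shared by both workflows can be illustrated with a toy example. This is a minimal sketch, not the pipeline used by the projects cited here: the sequence `GENOME`, the function names, and the fixed-step read layout are all assumptions made for illustration. Real reads start at random positions, contain errors, and come from both strands, and real assemblers must handle repeats and build scaffolds, none of which is modelled.

```python
# Toy "genome" (hypothetical). Every 4-base substring in it is unique, so any
# overlap of 4 or more bases between two reads is a true overlap.
GENOME = "ATGGCGTACCTGAAGTCTTGCACGATAG"


def shotgun_reads(genome, read_len, step):
    """Cut overlapping fragments at a fixed step (assumes len(genome) >= read_len)."""
    last_start = len(genome) - read_len
    reads = [genome[i:i + read_len] for i in range(0, last_start + 1, step)]
    if last_start % step:
        reads.append(genome[-read_len:])  # make sure the tail is covered
    return reads


def overlap(a, b, min_len):
    """Length of the longest suffix of a that is a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_overlap):
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads
    while True:
        best = (0, None, None)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_overlap)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            return contigs  # nothing left to merge
        merged = contigs[i] + contigs[j][n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)]
        contigs.append(merged)


reads = shotgun_reads(GENOME, read_len=10, step=5)
# Reverse the reads so the assembler cannot rely on their original order.
contigs = greedy_assemble(reads[::-1], min_overlap=4)
```

In the whole-genome workflow all reads come from one cultured isolate, so a single contig is the goal. In a metagenomic sample the reads come from many organisms at uneven coverage, which is one reason assemblies often stay fragmentary.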
Downsides of Metagenomics: often fragmentary; often highly divergent; rarely any known activity; no chromosomal placement; no organism of origin; ab initio ORF predictions; huge data
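To make the "ab initio ORF predictions" item concrete, here is a minimal sketch of an open reading frame scan. It is not any particular gene finder: the function name `find_orfs` and the `min_codons` threshold are assumptions, it uses the standard stop codons only, and it scans the forward strand only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def find_orfs(seq, min_codons=3):
    """Return (start, end, frame) for each ATG..stop ORF on the forward strand.

    start is the index of the A in ATG; end is one past the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                # Count the coding codons, excluding the stop codon itself.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


orfs = find_orfs("CCATGAAACCCGGGTAACC")
```

A scan like this only reports ORFs that have both a start and a stop inside the fragment. On short metagenomic fragments many genes run off one or both ends, and with no known organism of origin there is little prior information with which to check a prediction.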
Genomic Data Is Growing Rapidly, but Metagenomics Will Vastly Increase the Scale… Figure labels: GenBank ("Billion Bases!"); Protein Data Bank (35,000 structures); total data < 1 TB.
Full Genome Sequencing Is Exploding: Most Sequenced Genomes Are Bacterial. Figure (genomes per year; labels "First Genome" and "2000"): total 1665 ongoing genomes; total 422 completed genomes; 90 metagenomes.
Marine Metagenomics. Microbes account for more than 90% of ocean biomass, mediate all biochemical cycles in the oceans, and are responsible for 98% of primary production in the sea. Metagenomics is a breakthrough sequencing approach for examining microbial species in the open environment without isolating and cultivating individual species in the lab.
PI: Larry Smarr. Executive Director: Paul Gilna.
Marine Genome Sequencing Project: Measuring the Genetic Diversity of Ocean Microbes (Sorcerer II). Data from this area have already reached 10% of GenBank. The entire data set will double the number of proteins in GenBank!
Sample Metadata from GOS
Site Metadata
–Location (lat/long, water depth)
–Site characterization (finite list of types plus other)
–Site description (free text)
–Country
Sampling Metadata
–Sample collection date/time
–Sampling depth
–Conditions at time of sampling (e.g., stormy, surface temperature)
–Sample physical/chemical measurements (T (°C), S (ppt), chl a (mg m⁻³), etc.)
–Author
Experimental Parameters
–Filter size
–Insert size
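The three metadata groups on this slide map naturally onto a small record type. The following is a sketch under stated assumptions, not the GOS schema: every class and field name is hypothetical, depths are assumed to be in metres, and the example values are made up for illustration.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime
from typing import Dict, List, Optional


@dataclass
class SiteMetadata:
    latitude: float              # decimal degrees
    longitude: float             # decimal degrees
    water_depth_m: float         # metres (unit is an assumption)
    site_type: str               # one of a finite list of types, or "other"
    description: str = ""        # free text
    country: Optional[str] = None


@dataclass
class SamplingMetadata:
    collected_at: datetime
    sampling_depth_m: float      # metres (unit is an assumption)
    conditions: str = ""         # e.g. "stormy"
    measurements: Dict[str, float] = field(default_factory=dict)
    author: Optional[str] = None


@dataclass
class ExperimentalParameters:
    filter_size: str             # kept as text; the slide gives no unit
    insert_size: str             # kept as text; the slide gives no unit


@dataclass
class SampleRecord:
    site: SiteMetadata
    sampling: SamplingMetadata
    experiment: ExperimentalParameters

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means no problem was found."""
        problems = []
        if not -90.0 <= self.site.latitude <= 90.0:
            problems.append("latitude out of range")
        if not -180.0 <= self.site.longitude <= 180.0:
            problems.append("longitude out of range")
        if self.sampling.sampling_depth_m < 0:
            problems.append("sampling depth is negative")
        if self.sampling.sampling_depth_m > self.site.water_depth_m:
            problems.append("sampling depth exceeds water depth")
        return problems


# Made-up values, for illustration only.
example = SampleRecord(
    site=SiteMetadata(31.5, -64.0, 4200.0, "open ocean", "illustrative site"),
    sampling=SamplingMetadata(
        collected_at=datetime(2004, 1, 1, 12, 0),
        sampling_depth_m=5.0,
        conditions="calm",
        measurements={"temperature_c": 20.0, "salinity_ppt": 36.0,
                      "chl_a_mg_m3": 0.1},
    ),
    experiment=ExperimentalParameters(filter_size="illustrative",
                                      insert_size="illustrative"),
)
```

Calling `example.validate()` returns an empty list for this record, and `asdict(example)` yields a nested dictionary that could be serialized or loaded into a database. Keeping measurements in a dictionary keyed by name and unit leaves room for the "etc." on the slide without changing the record type.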
Calit2's Direct Access Core Architecture Will Create Next Generation Metagenomics Server. Source: Phil Papadopoulos, SDSC, Calit2
Diagram components: Flat File Server Farm; Database Farm; 10 GigE Fabric; Dedicated Compute Farm (1000 CPUs); TeraGrid: Cyberinfrastructure Backplane (scheduled activities, e.g. all-by-all comparison) (10000s of CPUs); Web Portal + Web Services; Web (other service)
Access paths: Traditional User (request/response through the web portal); Local Environment and Local Cluster (direct access lambda connections)
Data sets: Sargasso Sea Data; Sorcerer II Expedition (GOS); JGI Community Sequencing Project; Moore Marine Microbial Project; NASA Goddard Satellite Data; Community Microbial Metagenomics Data
Marine Metagenomics: Who is there? Drug discovery; environmental survey; microbial genetic survey; microbial genomic survey; symbiosis; organism discovery; marine conservation; evolution study; bioenergy discovery; endosymbiosis; biogeochemistry mapping; metabolic pathway discovery