1. Cyber Metagenomics: Challenge to See the Unseen Majority in the Ocean. Kayo Arima, California Institute for Telecommunications and Information Technology (Calit2), University of California, San Diego Division
2. Looking Back Nearly 4 Billion Years in the Evolution of Microbe Genomics. Eukaryotes have nuclei; prokaryotes have genes but no nuclear membrane. Source: Falkowski and Vargas, Science 304 (5667): 58.
3. Evolution Is the Principle of Biological Systems: Most of Evolutionary Time Was in the Microbial World. Much of genome work has occurred in animals. [Figure label: "You Are Here"] Source: Carl Woese, et al.
4. Two Completely Different Approaches to Obtaining Microbial Genomic Information
Microbial whole genomics: environmental sample → culture (grow) in lab → isolate the colony → culture the isolated colony → DNA extraction → enzymatic digestion → shotgun sequencing → gene assembly.
Metagenomics: environmental sample → DNA extraction → enzymatic digestion → shotgun sequencing → scaffold assembly.
Source: Karin Remington, J. Craig Venter Institute
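Both workflows end with shotgun sequencing followed by assembly. As a toy illustration (not the assembler used at the Venter Institute, which the slide does not name), the sketch below samples fixed-length reads at random positions and then merges them with a greedy suffix-prefix overlap heuristic. The function names (`shotgun_reads`, `overlap`, `greedy_assemble`) and the `min_overlap` threshold are hypothetical; real assemblers must also handle sequencing errors, repeats, and reverse-complement reads, which this sketch ignores.

```python
import random


def shotgun_reads(genome: str, read_len: int, n_reads: int, seed: int = 0) -> list[str]:
    """Toy shotgun sequencing: copy fixed-length reads from random start positions."""
    rng = random.Random(seed)
    starts = [rng.randrange(0, len(genome) - read_len + 1) for _ in range(n_reads)]
    return [genome[s:s + read_len] for s in starts]


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0 if shorter than min_len."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def _drop_contained(contigs: list[str]) -> list[str]:
    """Remove duplicates and any contig that is a substring of another contig."""
    unique = sorted(set(contigs))
    return [c for c in unique if not any(c != o and c in o for o in unique)]


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap until none remains."""
    contigs = _drop_contained(reads)
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_overlap)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        rest = [c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)]
        contigs = _drop_contained(rest + [merged])
```

With error-free reads from a sequence that has no repeated substrings of length `min_overlap`, greedy merging recovers the sequence as a single contig; with mixed reads from many species and uneven coverage, the same procedure tends to stop at many short contigs.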
5. Downsides of Metagenomics
Often fragmentary; often highly divergent; rarely any known activity; no chromosomal placement; no organism of origin; ab initio ORF predictions; huge data.
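Because a metagenomic fragment has no organism of origin to borrow annotations from, its genes have to be predicted ab initio. The simplest version of that idea is a six-frame open reading frame (ORF) scan: find an ATG start codon and extend to the first in-frame stop codon, on both strands. This is a minimal sketch under that assumption; practical gene finders use statistical models of coding sequence and are not reproduced here. `find_orfs` and its `min_codons` cutoff are illustrative names and values.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[str, int, int]]:
    """Naive six-frame scan for ATG...stop ORFs.

    Returns (strand, start, end) tuples; coordinates are 0-based and
    end-exclusive on the scanned strand (the reverse complement for "-").
    The stop codon is included in the ORF.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG opens a candidate ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # close the ORF and look for the next ATG
    return orfs
```

This scan only reports ORFs whose start and stop codons both fall inside the read. On short, fragmentary metagenomic reads many genes run off one end, so a more useful predictor would also report partial ORFs.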
6. Genomic Data Is Growing Rapidly, but Metagenomics Will Vastly Increase the Scale
GenBank: 100 billion bases. Protein Data Bank: 35,000 structures. Total data < 1 TB.
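For a sense of scale, the slide's GenBank figure can be turned into a raw storage estimate. The sketch below counts sequence only (no annotation or file-format overhead) and compares one byte per base (plain text) with a hypothetical 2-bit packing of A/C/G/T. `storage_bytes`, `pack_2bit`, and `unpack_2bit` are illustrative helpers, and ambiguity codes such as N are not handled.

```python
BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}  # A=0, C=1, G=2, T=3


def storage_bytes(n_bases: int, bits_per_base: int) -> int:
    """Raw bytes needed for n_bases at a fixed number of bits per base (rounded up)."""
    return (n_bases * bits_per_base + 7) // 8


def pack_2bit(seq: str) -> bytes:
    """Pack an A/C/G/T string at 2 bits per base, 4 bases per byte; the last byte is zero-padded."""
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for j, base in enumerate(seq[i:i + 4]):
            byte |= CODE[base] << (2 * j)
        out.append(byte)
    return bytes(out)


def unpack_2bit(data: bytes, n_bases: int) -> str:
    """Inverse of pack_2bit; n_bases is needed because the padding is ambiguous."""
    return "".join(BASES[(data[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(n_bases))
```

At one byte per base, 100 billion bases come to 100 GB of raw sequence, consistent with the slide's "< 1 TB" total; 2-bit packing would reduce that to 25 GB.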
7. Full Genome Sequencing Is Exploding: Most Sequenced Genomes Are Bacterial
First genome: 1995. 6 genomes/year: 2000.
[Chart: ongoing genomes, completed genomes, and 90 metagenomes; totals shown: 422 and 1665.]
8. Marine Metagenomics
Microbes account for more than 90% of ocean biomass, mediate all biogeochemical cycles in the oceans, and are responsible for 98% of primary production in the sea. Metagenomics is a breakthrough sequencing approach for examining microbial species in the open environment without isolating and cultivating individual species in the lab.
10. Marine Genome Sequencing Project: Measuring the Genetic Diversity of Ocean Microbes
Sorcerer II data from this area already amount to 10% of GenBank. The entire dataset will double the number of proteins in GenBank!
11. Sample Metadata from GOS
Site metadata: location (lat/long, water depth); site characterization (finite list of types plus "other"); site description (free text); country.
Sampling metadata: sample collection date/time; sampling depth; conditions at time of sampling (e.g., stormy, surface temperature); sample physical/chemical measurements (T (°C), S (ppt), chl a (mg m⁻³), etc.); "author".
Experimental parameters: filter size; insert size.
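The three metadata groups on this slide map naturally onto a nested record type. The sketch below is one possible Python representation, assuming units the slide does not state (meters for depths, micrometers for filter size, base pairs for insert size). The class names, field names, and `validate` checks are hypothetical, and because the slide does not give the actual controlled list of site types, the caller passes it in.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class SiteMetadata:
    latitude: float          # decimal degrees
    longitude: float         # decimal degrees
    water_depth_m: float     # unit assumed: meters
    site_type: str           # from a finite controlled list, or "other"
    description: str         # free text
    country: str


@dataclass
class SamplingMetadata:
    collected_at: datetime
    sampling_depth_m: float  # unit assumed: meters
    conditions: str          # e.g. "stormy"
    # Physical/chemical measurements under hypothetical keys, e.g.
    # "temperature_C", "salinity_ppt", "chl_a_mg_m3".
    measurements: dict[str, float] = field(default_factory=dict)
    author: str = ""


@dataclass
class ExperimentalParameters:
    filter_size_um: float    # unit assumed: micrometers
    insert_size_bp: int      # unit assumed: base pairs


@dataclass
class SampleRecord:
    site: SiteMetadata
    sampling: SamplingMetadata
    experiment: ExperimentalParameters

    def validate(self, allowed_site_types: set[str]) -> list[str]:
        """Return a list of problems; an empty list means these basic checks passed."""
        problems = []
        if not -90.0 <= self.site.latitude <= 90.0:
            problems.append("latitude out of range")
        if not -180.0 <= self.site.longitude <= 180.0:
            problems.append("longitude out of range")
        if self.site.site_type not in allowed_site_types | {"other"}:
            problems.append(f"unknown site type: {self.site.site_type!r}")
        return problems
```

Passing the site-type vocabulary as a parameter mirrors the slide's "finite list of types plus 'other'" without guessing what that list contains.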
12. Calit2's Direct Access Core Architecture Will Create Next Generation Metagenomics Server
[Diagram components: web portal (+ web services) handling traditional user requests and responses; flat file server farm; database; dedicated compute farm (1000 CPUs); TeraGrid cyberinfrastructure backplane for scheduled activities, e.g., all-by-all comparison (10,000s of CPUs); 10 GigE fabric; direct-access lambda connections; local cluster environment; web (other services).]
Data sources: Sargasso Sea data; Sorcerer II Expedition (GOS); JGI Community Sequencing Project; Moore Marine Microbial Project; NASA Goddard satellite data; community microbial metagenomics data.
Source: Phil Papadopoulos, SDSC, Calit2
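The architecture reserves the TeraGrid for scheduled bulk work such as all-by-all comparison. The sketch below shows the basic shape of that job: enumerate every unordered pair of sequences, split the pairs into batches that could each be submitted as one job, and score each pair. The scoring here is k-mer Jaccard similarity, a stand-in chosen for brevity. The slide does not say which comparison method is used, and alignment tools such as BLAST are the usual choice. All function names and the batch size are hypothetical.

```python
from itertools import combinations


def kmer_set(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def make_batches(n_items: int, batch_size: int) -> list[list[tuple[int, int]]]:
    """Split all n*(n-1)/2 unordered index pairs into batches, one per scheduled job."""
    pairs = list(combinations(range(n_items), 2))
    return [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]


def run_batch(kmer_sets: list[set[str]], batch: list[tuple[int, int]]) -> dict[tuple[int, int], float]:
    """The work a single compute job would do: score every pair in its batch."""
    return {(i, j): jaccard(kmer_sets[i], kmer_sets[j]) for i, j in batch}


if __name__ == "__main__":
    seqs = ["ACGTACGT", "ACGTTGCA", "TTTTACGT", "GGGGCCCC"]
    sets = [kmer_set(s, 4) for s in seqs]
    scores: dict[tuple[int, int], float] = {}
    # Here the batches run one after another; on a grid each would be a separate job.
    for batch in make_batches(len(seqs), batch_size=2):
        scores.update(run_batch(sets, batch))
    print(scores)
```

The number of pairs grows as n(n-1)/2, which is presumably why this step is scheduled on the TeraGrid rather than run per user request. For large n, a real scheduler would generate batches lazily instead of materializing the full pair list as this sketch does.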
13. Who Is There?
Marine metagenomics applications: metabolic pathway discovery; drug discovery; microbial genetic survey; environmental survey; symbiosis; evolution study; endosymbiosis; organism discovery; microbial genomic survey; bioenergy discovery; biogeochemistry mapping; marine conservation.