1 Cyber Metagenomics; Challenge to See The Unseen Majority in The Ocean
Kayo Arima, California Institute for Telecommunications and Information Technology (Calit2), University of California, San Diego Division
2 Looking Back Nearly 4 Billion Years in the Evolution of Microbial Genomics
Eukaryotes have nuclei. Prokaryotes have genes but no nuclear membrane.
Source: Falkowski and Vargas, Science 304 (5667): 58
3 Evolution Is the Principle of Biological Systems: Most of Evolutionary Time Was in the Microbial World
Figure marker: "You Are Here." Much of the genome work has occurred in animals.
Source: Carl Woese et al.
4 Two Completely Different Approaches to Obtaining Microbial Genomic Information
Microbial whole genomics: environmental sample → culture (grow) in lab → isolate the colony → culture the isolated colony → DNA extraction → enzymatic digestion → shotgun sequencing → gene assembly
Metagenomics: environmental sample → DNA extraction → enzymatic digestion → shotgun sequencing → scaffold assembly
Source: Karin Remington, J. Craig Venter Institute
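Both pipelines above end in an assembly step that stitches overlapping shotgun reads back into longer sequences. A minimal sketch of the idea, assuming error-free reads taken from a single strand, is a greedy overlap merge: repeatedly join the pair of sequences with the longest suffix/prefix overlap. This is a textbook simplification, not the assembler used by the projects cited here, and the function names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (at least min_len), else 0."""
    start = 0
    while True:
        # Look for b's first min_len bases inside a; each hit is a candidate overlap.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start  # earliest hit gives the longest overlap
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge reads into contigs (assumes no sequencing errors, one strand)."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read; they add no new sequence.
    contigs = [r for r in reads if not any(r != o and r in o for o in reads)]
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no remaining overlaps: these are the final contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "CGTTAGC"]))
# prints ['ATGCGTACGTTAGC']
```

Real shotgun data adds sequencing errors, reverse-complement reads, and repeats, which is why production assemblers use far more elaborate methods; in metagenomics, mixed genomes from many organisms make this step harder still.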
5 Downsides of Metagenomics
Often fragmentary; often highly divergent; rarely any known activity; no chromosomal placement; no organism of origin; ab initio ORF predictions; huge data volumes
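Because metagenomic fragments have no known organism of origin, genes must be predicted ab initio from the sequence itself. The simplest form of this is a six-frame open reading frame (ORF) scan, sketched below under stated assumptions: an ORF is taken to be ATG followed by in-frame codons up to the first stop (TAA, TAG, TGA), and `min_codons` is an arbitrary, hypothetical cutoff. Real gene callers use statistical models of coding sequence; this sketch only illustrates the basic scan.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (non-ACGT characters are left as-is)."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[str, int, str]]:
    """Return (strand, frame, orf) for ATG...stop ORFs of at least min_codons codons."""
    orfs = []
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            # Step codon by codon through this reading frame.
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:  # length includes the stop codon
                        orfs.append((strand, frame, s[start:i + 3]))
                    start = None
            # An ORF still open at the end of a fragment is not reported here.
    return orfs


print(find_orfs("CCATGAAATTTTAGCC", min_codons=4))
# prints [('+', 2, 'ATGAAATTTTAG')]
```

Note that the sketch silently drops genes that run off either end of a read, which is exactly the problem the "often fragmentary" point describes.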
6 Genomic Data Is Growing Rapidly, but Metagenomics Will Vastly Increase the Scale
GenBank: 100 billion bases. Protein Data Bank: 35,000 structures. Total data < 1 TB.
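A quick back-of-the-envelope check shows why 100 billion bases still fits comfortably under 1 TB. This sketch counts raw sequence only, assuming either one byte per base (plain text) or 2-bit packing of A/C/G/T; it ignores annotations, ambiguity codes, and indexes, so real storage is larger. The helper name `raw_sequence_gb` is hypothetical.

```python
# Figure quoted on the slide: GenBank held about 100 billion bases.
GENBANK_BASES = 100_000_000_000


def raw_sequence_gb(n_bases: int, bits_per_base: int) -> float:
    """Raw storage for n_bases at a fixed encoding width, in decimal gigabytes."""
    return n_bases * bits_per_base / 8 / 1e9


ascii_gb = raw_sequence_gb(GENBANK_BASES, 8)   # one ASCII character per base
packed_gb = raw_sequence_gb(GENBANK_BASES, 2)  # 2-bit packing of A/C/G/T

print(f"ASCII: {ascii_gb:.0f} GB, 2-bit packed: {packed_gb:.0f} GB")
# prints "ASCII: 100 GB, 2-bit packed: 25 GB"
```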
7 Full Genome Sequencing Is Exploding: Most Sequenced Genomes Are Bacterial
Timeline: first genome, 1995; 6 genomes/year, 2000.
Chart panels: Ongoing Genomes, Completed Genomes, Metagenomes (90); totals shown: 422 and 1665.
8 Marine Metagenomics
Microbes account for more than 90% of ocean biomass, mediate all biogeochemical cycles in the oceans, and are responsible for 98% of primary production in the sea. Metagenomics is a breakthrough sequencing approach for examining microbial species in the open environment, without the need to isolate and cultivate individual species in the lab.
10 Marine Genome Sequencing Project: Measuring the Genetic Diversity of Ocean Microbes (Sorcerer II)
Data from this area already amounts to 10% of GenBank. The entire dataset will double the number of proteins in GenBank!
11 Sample Metadata from GOS
Site metadata: location (lat/long, water depth); site characterization (finite list of types plus "other"); site description (free text); country
Sampling metadata: sample collection date/time; sampling depth; conditions at time of sampling (e.g., stormy, surface temperature); sample physical/chemical measurements (T (°C), S (ppt), chl a (mg m⁻³), etc.); "author"
Experimental parameters: filter size; insert size
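The metadata categories above map naturally onto a typed record. The sketch below is one possible layout, not the actual GOS schema: all class and field names are hypothetical, the units for depth (metres), filter size (micrometres), and insert size (base pairs) are assumptions the slide does not state, and `SITE_TYPES` is a made-up placeholder for the slide's "finite list of types plus 'other'".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# HYPOTHETICAL placeholder: the slide only says "finite list of types plus 'other'".
SITE_TYPES = {"open ocean", "coastal", "estuary", "other"}


@dataclass
class SiteMetadata:
    latitude: float
    longitude: float
    water_depth_m: float          # unit assumed: metres
    site_type: str                # must be one of SITE_TYPES
    description: str = ""         # free text
    country: str = ""

    def __post_init__(self) -> None:
        # Enforce the "finite list plus 'other'" rule and valid coordinates.
        if self.site_type not in SITE_TYPES:
            raise ValueError(f"unknown site type {self.site_type!r}; use 'other'")
        if not (-90 <= self.latitude <= 90 and -180 <= self.longitude <= 180):
            raise ValueError("latitude/longitude out of range")


@dataclass
class SamplingMetadata:
    collected_at: datetime
    sampling_depth_m: float                  # unit assumed: metres
    conditions: str = ""                     # e.g. "stormy"
    temperature_c: Optional[float] = None    # T (deg C)
    salinity_ppt: Optional[float] = None     # S (ppt)
    chl_a_mg_m3: Optional[float] = None      # chl a (mg m^-3)
    author: str = ""


@dataclass
class ExperimentalParameters:
    filter_size_um: float         # unit assumed: micrometres
    insert_size_bp: int           # unit assumed: base pairs


@dataclass
class GOSSample:
    site: SiteMetadata
    sampling: SamplingMetadata
    experiment: ExperimentalParameters


# Usage with made-up values (not real GOS data):
sample = GOSSample(
    site=SiteMetadata(10.0, -60.0, 1000.0, "open ocean", "example site", "N/A"),
    sampling=SamplingMetadata(datetime(2000, 1, 1, 12, 0), 5.0, "calm", 25.0, 36.0, 0.1, "example"),
    experiment=ExperimentalParameters(0.1, 3000),
)
```

Validating the controlled-vocabulary field at construction time keeps free text confined to the description field, which makes the site type usable for filtering and comparison across samples.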
12 Calit2's Direct Access Core Architecture Will Create the Next-Generation Metagenomics Server
Diagram components: web portal (+ web services) handling requests and responses from traditional users; flat-file server farm; database; dedicated compute farm (1000 CPUs); 10 GigE fabric; TeraGrid cyberinfrastructure backplane (10000s of CPUs) for scheduled activities, e.g., all-by-all comparison; direct-access lambda connections to local cluster environments; web (other services).
Data sources: Sargasso Sea data; Sorcerer II Expedition (GOS); JGI Community Sequencing Project; Moore Marine Microbial Project; NASA Goddard satellite data; community microbial metagenomics data.
Source: Phil Papadopoulos, SDSC, Calit2
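The all-by-all comparison mentioned above is scheduled on the TeraGrid because its cost grows quadratically: n sequences yield n(n-1)/2 pairs, so 1,000 sequences already give 499,500 comparisons. The sketch below shows only that pairwise structure. Sequence-similarity search tools such as BLAST are the usual choice for this job; here a much cheaper k-mer Jaccard similarity is swapped in purely for illustration, and the names `kmers`, `jaccard`, and `all_by_all` and the default `k=4` are hypothetical.

```python
from itertools import combinations


def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """|A intersect B| / |A union B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def all_by_all(seqs: dict[str, str], k: int = 4) -> dict[tuple[str, str], float]:
    """Score every unordered pair of sequences: n*(n-1)/2 comparisons."""
    profiles = {name: kmers(s, k) for name, s in seqs.items()}  # compute each profile once
    return {
        (x, y): jaccard(profiles[x], profiles[y])
        for x, y in combinations(profiles, 2)
    }


print(all_by_all({"a": "ACGTACGT", "b": "ACGTACGA", "c": "TTTTTTTT"}))
# prints {('a', 'b'): 0.8, ('a', 'c'): 0.0, ('b', 'c'): 0.0}
```

Because each pair is independent, the work splits cleanly into batches of pairs, which is what makes this kind of job suited to a large scheduled compute backplane.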
13 Marine Metagenomics: Who Is There?
Metabolic pathway discovery; drug discovery; microbial genetic survey; environmental survey; symbiosis; evolution study; endosymbiosis; organism discovery; microbial genomic survey; bioenergy discovery; biogeochemistry mapping; marine conservation