Microbial Metagenomics Drives a New Cyberinfrastructure Invited Talk School of Biological Sciences University of California, Irvine March 3, 2006 Dr. Larry.

Presentation transcript:

Microbial Metagenomics Drives a New Cyberinfrastructure Invited Talk School of Biological Sciences University of California, Irvine March 3, 2006 Dr. Larry Smarr Director, California Institute for Telecommunications and Information Technologies Harry E. Gruber Professor, Dept. of Computer Science and Engineering Jacobs School of Engineering, UCSD

Abstract Calit2, in partnership with J. Craig Venter Institute in Rockville, MD, and UCSD's Center for Earth Observations and Applications at Scripps Institution of Oceanography, will build a state-of-the-art computational resource and develop software tools to decipher the genetic code of communities of microbial life in the world's oceans. The Gordon and Betty Moore Foundation has awarded $24.5 million over seven years to create the Community Cyberinfrastructure for Advanced Marine Microbial Ecology Research and Analysis (CAMERA). Scientists will use CAMERA for metagenomics research -- analyzing microbial genomic sequence data in the context of other microbial species, as well as in comparison to a variety of other "metadata" such as the chemical and physical conditions in which microbes are sampled. The CAMERA project will contain the results of the Venter Institute's Sorcerer II Expedition, which carried out the first large-scale genomic survey of microbial life in the world's oceans to produce the largest gene catalogue ever assembled. Sorcerer II is expected to more than double the number of protein sequences currently available in the National Institutes of Health's GenBank. In addition to Sorcerer II's ecological genomic data, the CAMERA database will be augmented by the full genomes of more than 150 critical marine microbes enabling new comparative genomics studies.

Calit2 Brings Computer Scientists and Engineers Together with Biomedical Researchers
Some Areas of Concentration:
–Metagenomics
–Genomic Analysis of Organisms
–Evolution of Genomes
–Cancer Genomics
–Human Genomic Variation and Disease
–Mitochondrial Evolution
–Proteomics
–Computational Biology
–Information Theory and Biological Systems
UC San Diego and UC Irvine: 1200 Researchers in Two Buildings

Evolution is the Principle of Biological Systems: Most of Evolutionary Time Was in the Microbial World You Are Here Source: Carl Woese, et al Much of Genome Work Has Occurred in Animals

David A. Hinds, Laura L. Stuve, Geoffrey B. Nilsen, Eran Halperin, Eleazar Eskin, Dennis G. Ballinger, Kelly A. Frazer, David R. Cox. Whole-Genome Patterns of Common DNA Variation in Three Human Populations Science 18 February, 2005: 307(5712): Calit2 Researcher Eskin Collaborates with Perlegen Sciences on Map of Human Genetic Variation Across Populations We have characterized whole-genome patterns of common human DNA variation by genotyping 1,586,383 single-nucleotide polymorphisms (SNPs) in 71 Americans of European, African, and Asian ancestry. Although knowledge of a single genetic risk factor can seldom be used to predict the treatment outcome of a common disease, knowledge of a large fraction of all the major genetic risk factors contributing to a treatment response or common disease could have immediate utility, allowing existing treatment options to be matched to individual patients without requiring additional knowledge of the mechanisms by which the genetic differences lead to different outcomes. More detailed haplotype analysis results are available at

For Mitochondrial Diseases It Has Been More Productive to Classify Patients by Genetic Defect Rather than by Clinical Manifestation
"Over the past 10 years, mitochondrial defects have been implicated in a wide variety of degenerative diseases, aging, and cancer… The same mtDNA mutation can produce quite different phenotypes, and different mutations can produce similar phenotypes. …The essential role of mitochondrial oxidative phosphorylation in cellular energy production, the generation of reactive oxygen species, and the initiation of apoptosis has suggested a number of novel mechanisms for mitochondrial pathology."
--Douglas Wallace, Science, Vol. 283, 5 March 1999

Comparative Genomics Can Reveal Biological Facts That Are Not Visible Within a Species After sequencing these three genomes, it is clear that substantial rearrangements in the human genome happen only once in a million years, while the rate of rearrangements in the rat and mouse is much faster. --Glenn Tesler, UCSD Dept. of Mathematics Co-Authors Pavel Pevzner and Glenn Tesler, UCSD April 1, 2004 December 05, 2002 December 9, 2004

Advanced Algorithmic Techniques Reveal Unexpected Results
"Many of the chicken–human aligned, non-coding sequences occur far from genes, frequently in clusters that seem to be under selection for functions that are not yet understood."
Nature 432 (09 December 2004)

Microbial Metagenomics is a Rapidly Emerging Field of Research
"Despite their ubiquity, relatively little is known about the majority of environmental microorganisms, largely because of their resistance to culture under standard laboratory conditions. The application of high-throughput shotgun sequencing to environmental samples has recently provided global views of those communities not obtainable from 16S rRNA or BAC clone–sequencing surveys."
Comparative Metagenomics of Microbial Communities. Susannah Green Tringe, Christian von Mering, Arthur Kobayashi, Asaf A. Salamov, Kevin Chen, Hwai W. Chang, Mircea Podar, Jay M. Short, Eric J. Mathur, John C. Detter, Peer Bork, Philip Hugenholtz, Edward M. Rubin. Science, 22 April 2005

Looking Back Nearly 4 Billion Years In the Evolution of Microbe Genomics Science Falkowski and Vargas 304 (5667): 58

The Sargasso Sea Experiment: The Power of Environmental Metagenomics
–Yielded a Total of Over 1 Billion Base Pairs of Non-Redundant Sequence
–Displayed the Gene Content, Diversity, & Relative Abundance of the Organisms
–Sequences from at Least 1800 Genomic Species, Including 148 Previously Unknown
–Identified over 1.2 Million Unknown Genes
Image: MODIS-Aqua satellite image of ocean chlorophyll in the Sargasso Sea grid about the BATS site from 22 February 2003
Source: J. Craig Venter, et al. Science, 2 April 2004

PI Larry Smarr

Marine Genome Sequencing Project Measuring the Genetic Diversity of Ocean Microbes CAMERA will include All Sorcerer II Metagenomic Data

Moore Foundation Funded the Venter Institute to Provide the Full Genome Sequence of 150 Marine Microbes CAMERA will include All Moore Marine Microbial Genomes

Moore Microbial Genome Sequencing Project: Cyanobacteria Being Sequenced by Venter Institute

Moore Microbial Genome Sequencing Project: Selected Microbes Throughout the World's Oceans

Calit2 is Discussing Including Other Metagenomic Data Sets A majority of the bacterial sequences corresponded to uncultivated species and novel microorganisms. We discovered significant intersubject variability. Characterization of this immensely diverse ecosystem is the first step in elucidating its role in health and disease. Diversity of the Human Intestinal Microbial Flora Paul B. Eckburg, et al Science (10 June 2005) 395 Phylotypes

Genomic Data Is Growing Rapidly, But Metagenomics Will Vastly Increase The Scale… GenBank Protein Data Bank Billion Bases! Total Data < 1TB 35,000 Structures

Metagenomics Will Couple to Earth Observations Which Add Several TBs/Day Source: Glenn Iona, EOSDIS Element Evolution Technical Working Group January 6-7, 2005
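
To put "several TBs/day" in network terms, the sketch below converts a daily data volume into the average rate a link would have to sustain. The 1, 3, and 5 TB/day values and the function name are illustrative assumptions, not figures from the talk; decimal units are used (1 TB = 8,000,000 megabits).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sustained_mbps(terabytes_per_day: float) -> float:
    """Average rate in Mbps needed to move a daily volume (1 TB = 8e6 megabits)."""
    return terabytes_per_day * 8_000_000 / SECONDS_PER_DAY


# Hypothetical daily volumes standing in for "several TBs/day".
for tb in (1, 3, 5):
    print(f"{tb} TB/day needs about {sustained_mbps(tb):.0f} Mbps sustained")
```

Under these assumptions, even a single terabyte per day requires a sustained rate of roughly 93 Mbps.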

Challenge: Average Throughput of NASA Data Products to End User is < 50 Mbps Tested October Internet2 Backbone is 10,000 Mbps! Throughput is < 0.5% to End User
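
The percentage on this slide is simple arithmetic: 50 Mbps delivered over a 10,000 Mbps backbone. The sketch below reproduces it and, purely as an illustration, estimates how long a 1 TB data set would take at each rate. The 1 TB size and the function names are assumptions, not values from the talk.

```python
def throughput_fraction(end_user_mbps: float, backbone_mbps: float) -> float:
    """Fraction of the backbone rate that actually reaches the end user."""
    return end_user_mbps / backbone_mbps


def transfer_hours(size_terabytes: float, rate_mbps: float) -> float:
    """Hours to move a data set at a sustained rate (1 TB = 8e6 megabits)."""
    megabits = size_terabytes * 8_000_000
    return megabits / rate_mbps / 3600


# Figures from the slide: 50 Mbps to the end user, 10,000 Mbps backbone.
print(f"{throughput_fraction(50, 10_000):.1%}")  # 0.5%

# Hypothetical 1 TB data set at the end-user rate and at the full backbone rate.
print(f"{transfer_hours(1, 50):.1f} hours at 50 Mbps")
print(f"{transfer_hours(1, 10_000) * 60:.1f} minutes at 10,000 Mbps")
```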

National Lambda Rail (NLR) and TeraGrid Provide Cyberinfrastructure Backbone for U.S. Researchers
Map labels: San Francisco, Pittsburgh, Cleveland, San Diego, Los Angeles, Portland, Seattle, Pensacola, Baton Rouge, Houston, San Antonio, Las Cruces / El Paso, Phoenix, New York City, Washington, DC, Raleigh, Jacksonville, Dallas, Tulsa, Atlanta, Kansas City, Denver, Ogden / Salt Lake City, Boise, Albuquerque, UC-TeraGrid, UIC/NW-Starlight, Chicago, International Collaborators
–NLR 4 x 10Gb Lambdas Initially
–Capable of 40 x 10Gb Wavelengths at Buildout
–NSF's TeraGrid Has 4 x 10Gb Lambda Backbone
–Links Two Dozen State and Regional Optical Networks
–DOE, NSF, & NASA Using NLR

The OptIPuter Project – Creating a LambdaGrid Web for Gigabyte Data Objects
NSF Large Information Technology Research Proposal
–Calit2 (UCSD, UCI) and UIC Lead Campuses; Larry Smarr PI
–Partnering Campuses: USC, SDSU, NW, TA&M, UvA, SARA, NASA
Industrial Partners
–IBM, Sun, Telcordia, Chiaro, Calient, Glimmerglass, Lucent
$13.5 Million Over Five Years
Linking Global Scale Science Projects to Users' Linux Clusters
–NIH Biomedical Informatics Research Network
–NSF EarthScope and ORION

Using the OptIPuter to Couple Data Assimilation Models to Remote Data Sources Including Biology Regional Ocean Modeling System (ROMS) NASA MODIS Mean Primary Productivity for April 2001 in California Current System

Calit2 Intends to Jump Beyond Traditional Web-Accessible Databases
Diagram labels: Data Backend (DB, Files); Web Portal (pre-filtered, queries metadata); Request; Response
Examples: BIRN, PDB, NCBI GenBank + many others
Source: Phil Papadopoulos, SDSC, Calit2

Calit2's Direct Access Core Architecture Will Create Next Generation Metagenomics Server
Diagram labels: Traditional User; Request; Response; Web Portal; + Web Services; Flat File Server Farm; Database Farm; 10 GigE Fabric; Dedicated Compute Farm (100s of CPUs); TeraGrid: Cyberinfrastructure Backplane (scheduled activities, e.g. all by all comparison) (10000s of CPUs); Web (other service); Local Cluster; Local Environment; Direct Access Lambda Cnxns
Data sources: Sargasso Sea Data; Sorcerer II Expedition (GOS); JGI Community Sequencing Project; Moore Marine Microbial Project; NASA Goddard Satellite Data; Community Microbial Metagenomics Data
Source: Phil Papadopoulos, SDSC, Calit2

First Implementation of the CAMERA Complex Compute Database & Storage

Analysis Data Sets, Data Services, Tools, and Workflows
Assemblies of Metagenomic Data
–e.g., GOS, JGI CSP
Annotations
–Genomic and Metagenomic Data
All-against-all Alignments of ORFs
–Updated Periodically
Gene Clusters and Associated Data
–Profiles, Multiple-Sequence Alignments,
–HMMs, Phylogenies, Peptide Sequences
Data Services
–Raw and Specialized Analysis Data
–Rich Query Facilities
Tools and Workflows
–Navigate and Sift Raw and Analysis Data
–Publish Workflows and Develop New Ones
–Prioritize Features via Dialogue with Community
Source: Saul Kravitz, Director of Software Engineering, J. Craig Venter Institute
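
An all-against-all comparison grows quadratically with the number of sequences, which is one reason it appears here as a precomputed, periodically updated data set. The sketch below counts the pairs and runs a toy comparison over three made-up peptide strings. The `identity` score is a deliberately naive stand-in (position-by-position matching, no gaps) and is not the alignment method used by CAMERA; a real pipeline would call an alignment tool such as BLAST. All names and sequences are hypothetical.

```python
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered pairs among n sequences: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def identity(a: str, b: str) -> float:
    """Toy score: fraction of matching positions over the shorter sequence."""
    length = min(len(a), len(b))
    if length == 0:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return matches / length


def all_against_all(orfs: dict[str, str]) -> dict[tuple[str, str], float]:
    """Score every unordered pair of ORFs exactly once."""
    return {
        (p, q): identity(orfs[p], orfs[q])
        for p, q in combinations(sorted(orfs), 2)
    }


# Hypothetical peptide sequences, for illustration only.
orfs = {"orf1": "MKTAYIAK", "orf2": "MKTAYLAK", "orf3": "MSTNPKPQ"}
for pair, score in all_against_all(orfs).items():
    print(pair, score)

# Pair count for 1.2 million genes, the figure quoted for the Sargasso Sea data.
print(pair_count(1_200_000))
```

At 1.2 million genes the pair count is already in the hundreds of billions, which is the scale that motivates scheduling such comparisons on large compute resources.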

CAMERA Timeline
Release 1: Mid-2006
–Majority of GOS + Moore Microbe Genome Data
–6 Gbp Has Been Assembled
–Initial Versions of Core Tools
–BLAST, Reference Alignment Viewer
Release 2: Early-2007
–Additional Data
–Additional/Improved Tools
–Improved Usability
Subsequent
–Move Towards Semantic DB, Direct Access
–Additional Tools & Data Based on Community Feedback

Announcing Tuesday January 17, 2006

The Bioinformatics Core of the Joint Center for Structural Genomics will be Housed in the Building
Determining the Protein Structures of the Thermotoga maritima Genome
–Extremely Thermostable -- Useful for Many Industrial Processes (e.g. Chemical and Food)
–173 Structures (122 from JCSG)
–122 T.M. Structures Solved by JCSG (75 Unique In The PDB)
–Direct Structural Coverage of 25% of the Expressed Soluble Proteins
–Probably Represents the Highest Structural Coverage of Any Organism
Source: John Wooley, UCSD

UCI's IGB Develops a Suite of Programs and Servers for Protein Structure and Structural Feature Prediction Source: Pierre Baldi, UCI Sixty Affiliated IGB Labs at UCI e.g.:

CAMERA Builds on Cyberinfrastructure Grid, Workflow, and Portal Projects in a Service Oriented Architecture Cyberinfrastructure: Raw Resources, Middleware & Execution Environment NBCR Rocks Clusters Virtual Organizations Web Services KEPLER Workflow Management Vision Telescience Portal National Biomedical Computation Resource an NIH supported resource center Located in Building

Calit2 is Collaborating with Douglas Wallace-- Planning to Bring MITOMAP into Calit2 Domain The Human mtDNA Map, Showing the Location of Selected Pathogenic Mutations Within the 16,569-Base Pair Genome MITOMAP: A Human Mitochondrial Genome Database March 1999

Displaying Images from Electron Microscope Zeiss Scanning Electron Microscope in UCI

Zooming In

Metagenomics Extreme Assembly Requires Large Amount of Pixel Real Estate
Image labels: Prochlorococcus, Microbacterium, Burkholderia, Rhodobacter, SAR-86, unknown
Source: Karin Remington, J. Craig Venter Institute

Metagenomics Requires a Global View of Data and the Ability to Zoom Into Detail Interactively Overlay of Metagenomics Data onto Sequenced Reference Genomes (This Image: Prochlorococcus marinus MED4) Source: Karin Remington, J. Craig Venter Institute
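
One simple way to think about such an overlay is as per-base coverage: count how many metagenomic reads land on each position of the reference, then average over bins to get a zoomed-out view. The sketch below is an illustrative assumption about how that could be computed, not the method behind the image; the function names, the half-open `[start, end)` hit coordinates, and the sample hits are all hypothetical.

```python
def coverage(reference_length: int, hits: list[tuple[int, int]]) -> list[int]:
    """Per-base depth from half-open [start, end) hits, via a difference array."""
    delta = [0] * (reference_length + 1)
    for start, end in hits:
        start = max(0, start)
        end = min(reference_length, end)  # clip hits that run past the reference
        if start < end:
            delta[start] += 1
            delta[end] -= 1
    depth, running = [], 0
    for change in delta[:reference_length]:
        running += change
        depth.append(running)
    return depth


def zoom_out(depth: list[int], bin_size: int) -> list[float]:
    """Mean depth per bin, for a coarse global view of the same data."""
    bins = []
    for i in range(0, len(depth), bin_size):
        window = depth[i:i + bin_size]
        bins.append(sum(window) / len(window))
    return bins


# Hypothetical read hits on a 10-base reference.
depth = coverage(10, [(0, 4), (2, 6), (8, 12)])
print(depth)               # [1, 1, 2, 2, 1, 1, 0, 0, 1, 1]
print(zoom_out(depth, 5))  # [1.4, 0.6]
```

Keeping the per-base array and deriving binned views from it mirrors the slide's point: the same data supports both the global picture and interactive zooming into detail.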

OptIPuter Scalable Adaptive Graphics Environment (SAGE) Allows Integration of HD Streams Source: David Lee, NCMIR, UCSD

Calit2 and the Venter Institute Will Combine Telepresence with Remote Interactive Analysis OptIPuter Visualized Data HDTV Over Lambda Live Demonstration of 21st Century National-Scale Team Science 25 Miles Venter Institute

Created by Garrett Hildebrand Modified by Jessica Yu Calit2 Building UCInet 10 GE HIPerWall Los Angeles SPDS Catalyst 3750 in CSI ONS WDM at UCI campus MPOE (CPL) 1 GE DWDM Network Line Tustin CENIC Calren POP UCSD Optiputer Network 10 GE DWDM Network Line Engineering Gateway Building, Catalyst 3750 in 3rd floor IDF MDF Catalyst 6500 w/ firewall, 1st floor closet Wave-2: layer-2 GE. UCSD address space /28 Floor 2 Catalyst 6500 Floor 3 Catalyst 6500 Floor 4 Catalyst 6500 Wave-1: UCSD address space NACS-reserved for testing ESMF Catalyst 3750 in NACS Machine Room (Optiputer) Viz Lab Wave 1 1GE Wave 2 1GE is Up and Working

Calit2/SDSC Proposal to Create a UC Cyberinfrastructure of On-Ramps to National LambdaRail Resources OptIPuter + CalREN-XD + TeraGrid = OptiGrid Source: Fran Berman, SDSC, Larry Smarr, Calit2 Creating a Critical Mass of End Users on a Secure LambdaGrid UC San Francisco UC San Diego UC Riverside UC Irvine UC Davis UC Berkeley UC Santa Cruz UC Santa Barbara UC Los Angeles UC Merced