GTL Facilities: Computing Infrastructure for 21st Century Systems Biology. Ed Uberbacher (ORNL) & Mike Colvin (LLNL)

1 GTL Facilities: Computing Infrastructure for 21st Century Systems Biology
Ed Uberbacher (ORNL) & Mike Colvin (LLNL)

2 Ultimate Goal is to Provide Predictive Models of Microbes
This goal drives data collection and computing strategy.
Experimental:
- Complete datasets
- Quantitative measurements
- Comprehensive physical characterization: protein expression and interactions, spatial distributions, process kinetics
Computational:
- Automated data analysis and validation
- Automated integration of diverse data sets
- Human- and computer-accessible databases
- Molecular, pathway, and cell-level simulations
The goals require a new synergy between computing and biology.

3 GTL Biology Paradigm: Integrated Large-Scale Experiment-Computing Cycles
Cycle diagram labels: Experiment; Large-Scale Data Sets; Real-Time Analysis; Design or Revise Models; Simulate and Generate Hypotheses

4 Facility I: Production and Characterization of Proteins
Estimating Microbial Genome Capability
Computational Analysis:
- Genome analysis of genes, proteins, and operons
- Metabolic pathways analysis from reference data
- Protein machines estimate from PM reference data
Knowledge Captured:
- Initial annotation of genome
- Initial perceptions of pathways and processes
- Recognized machines, function, and homology
- Novel proteins/machines (including prioritization)
- Production conditions and experience

5 Facility II: Whole Proteome Analysis
Modeling Proteome Expression, Regulation, and Pathways
Analysis and Modeling:
- Mass spectrometry expression analysis
- Metabolic and regulatory pathway/network analysis and modeling
Knowledge Captured:
- Expression data and conditions
- Novel pathways and processes
- Functional inferences about novel proteins/machines
- Genome super annotation: regulation, function, and processes (deep knowledge about cellular subsystems)

6 Skeletogenic Regulatory Gene Network Model for Endomesoderm Specification Eric Davidson

7 Facility III: Characterization and Imaging of Molecular Machines
Exploring Molecular Machine Geometry and Dynamics
Computational Analysis, Modeling and Simulation:
- Image analysis/cryoelectron microscopy
- Protein interaction analysis/mass spec
- Machine geometry and docking modeling
- Machine biophysical dynamic simulation
Knowledge Captured:
- Machine composition, organization, geometry, assembly and disassembly
- Component docking and dynamic simulations of machines

8 Example of Combined Experiment and Modeling to Understand a Multiprotein Complex: DNA Clamps & Clamp-Loading Mechanisms
- Classical molecular dynamics: Jeruzalmi et al. Cell 106:417 (2001)
- Mechanistic model based on physical and biochemical data: Jeruzalmi et al. Cell 106:429 (2001)
- Electron microscopy: Mayanagi et al. J. Struct. Bio. 134:35 (2001)
- Homology modeling: Venclovas et al. Prot. Sci. 11:2403 (2002)
- Atomic force microscopy: Shiomi et al. PNAS 97:14127 (2002)

9 Facility IV: Analysis and Modeling of Cellular Systems
Simulating Cell and Community Dynamics
Analysis, Modeling and Simulation:
- Couple knowledge of pathways, networks, and machines to generate an understanding of cellular and multi-cellular systems
- Metabolism, regulation, and machine simulation
- Cell and multicell modeling and flux visualization
Knowledge Captured:
- Cell and community measurement data sets
- Protein machine assembly time-course data sets
- Dynamic models and simulations of cell processes

10 Centrally Planned Analysis and Modeling Tools Libraries
Facility 1: genome annotation; regulatory element and operon identification; metabolic pathway analysis
Facility 2: mass spec data analysis; expression analysis and clustering; metabolic and regulatory network modeling
Facility 3: image analysis; mass spec analysis; protein/machine modeling; docking and molecular dynamics
Facility 4: metabolic simulation; regulatory simulation; cell modeling and simulations
Collect and manage software:
- Maintain current versions
- Ensure hardware compatibility
- User interfaces
- Documentation

11 GTL Facilities Will Require High-Performance Computing for Both Capacity and Capability
Capacity: e.g., high-throughput protein structure predictions
Figure labels: input sequences (ATCGTAGCAATCGACCGT..., CGGCTATAGCCGTTACCG..., TTATGCTATCCATAATCGA..., GGCTTAATCGCATACGAC...); thread onto templates; best match
Capability: e.g., large-scale biophysical simulations
- Large size and timescale classical simulations
- Highly accurate quantum mechanical simulations
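
The capacity case on this slide (many independent predictions, each threaded onto templates to pick a best match) might be sketched roughly as below. This is a toy illustration under stated assumptions, not the GTL pipeline: `TEMPLATES`, `toy_score`, `best_match`, and `run_capacity_batch` are hypothetical names, the strings are borrowed from the slide purely as sample input, and position-wise identity stands in for a real threading score.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor

# Hypothetical template library (toy data): name -> representative sequence.
TEMPLATES = {
    "template_A": "ATCGTAGCAATCGACCGT",
    "template_B": "CGGCTATAGCCGTTACCG",
}


def toy_score(query: str, template: str) -> int:
    """Count identical positions; a placeholder for a real threading score."""
    return sum(q == t for q, t in zip(query, template))


def best_match(query: str) -> tuple[str, int]:
    """Score one query against every template and return the best (name, score)."""
    return max(
        ((name, toy_score(query, seq)) for name, seq in TEMPLATES.items()),
        key=lambda pair: pair[1],
    )


def run_capacity_batch(queries: list[str], workers: int = 4) -> list[tuple[str, int]]:
    """Queries are independent of each other, so they map onto a worker pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input queries.
        return list(pool.map(best_match, queries))
```

Because no query depends on another, a batch like this can be spread over as many workers as are available. A capability problem, such as a single large biophysical simulation, does not split into independent tasks in the same way.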

12 GTL High-Performance Computing Roadmap
Chart: computing power (1 TF*, 10 TF, 100 TF, 1000 TF) versus Biological Complexity, with "Current U.S. Computing" marked.
Chart labels: Comparative Genomics; Constraint-Based Flexible Docking; Constrained rigid docking; Genome-scale protein threading; Community metabolic, regulatory, signaling simulations; Molecular machine classical simulation; Protein machine interactions; Cell, pathway, and network simulation; Molecule-based cell simulation
*Teraflops
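
To give a feel for the 1 TF to 1000 TF range on this roadmap, the arithmetic can be sketched as below. The workload size is a made-up number for illustration only, `wall_clock_seconds` is a hypothetical helper, and the calculation assumes perfectly sustained performance, which real applications rarely reach.

```python
TERAFLOP = 1e12  # floating-point operations per second in one teraflop


def wall_clock_seconds(total_operations: float, sustained_teraflops: float) -> float:
    """Idealized run time: total operations divided by sustained rate."""
    if sustained_teraflops <= 0:
        raise ValueError("sustained_teraflops must be positive")
    return total_operations / (sustained_teraflops * TERAFLOP)


# Hypothetical workload size, chosen only to make the scaling visible.
HYPOTHETICAL_WORKLOAD = 1e18

for tf in (1, 10, 100, 1000):
    seconds = wall_clock_seconds(HYPOTHETICAL_WORKLOAD, tf)
    print(f"{tf:>5} TF: {seconds / 86400:.3f} days")
```

Under these assumptions each tenfold step on the roadmap cuts the idealized run time by a factor of ten.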

13 Swimming in Data: Exploding Need to Capture and Manipulate Data
● From Acquisition, Refinement, Reduction and Deposition
● Across Scales of Space and Time - Petabytes

14 Central Database Planning
Data Repositories:
- Genomes, annotation and community ‘genomes’
- Expression data and proteome composition
- Metabolite and flux data
- Metabolic pathways and kinetic parameters
- Protein interactions
- Protein machines repository: machine composition, function, homology, models
- Image data repository
- Regulatory network data and models
- Cell models repository
Integrated or integrable. Requires development of cross-facilities approach.
Diagram labels: phylogeny; microbial genomes; protein domains; pathways; regulatory elements; community genomes; literature; Metabolic models; Expression proteomics; protein machines; regulatory networks; protein structure

15 The GTL Knowledge Base: Integration of Large Datasets is a Precursor to Predictive Modeling
- Simulation of even a “simple” metabolic pathway depends on a large volume of data
- The GTL knowledge base will change how information about microbes reaches the community
- Models and simulations will be online
- We will know more and more about systems in each consecutive microbe
Figure labels: Annotated data sets; Raw data sets
