Pathway/Genome Databases and Software Tools Peter D. Karp, Ph.D. Bioinformatics Research Group SRI International



SRI International Bioinformatics
Overview
- Overview of bioinformatics
- Motivations for the EcoCyc project
- EcoCyc demo
- Description of EcoCyc database and Pathway Tools software
- Underlying technologies
  - Ocelot object database
  - GKB Editor
  - X-windows to WWW translator

Definition of Bioinformatics
- Computational techniques for management and analysis of biological data and knowledge
  - Methods for disseminating, archiving, interpreting, and mining scientific information

Motivations for Bioinformatics
- Growth in molecular-biology knowledge
- Industrialization of biological experimentation
- High-throughput biology
  - Genome sequences
  - Gene and protein expression data
  - Protein-protein interaction data
  - Protein 3-D structures
  - ….


Motivations for EcoCyc: E. coli Encyclopedia
- Integrate E. coli information dispersed in the literature
- New paradigm of scientific publishing
- Model the full metabolic network of an organism
- Integrate genomic data with functional data
- Develop algorithms for computing with function
- Provide a challenging domain for computer-science research

Definitions
- A chemical reaction interconverts chemical compounds
  - Example: A + B = C + D
- An enzyme is a protein that accelerates chemical reactions
- A pathway is a linked set of reactions
  - A conceptual unit of cell's biochemical machine
[Figure: pathway diagram with compounds A, C, E]
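The definitions above can be sketched as a small data structure. This is only an illustration, not the EcoCyc schema: the class names, the `is_linked` check, the enzyme labels, and the second reaction (C to E) are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Reaction:
    """A reaction interconverts compounds; an enzyme accelerates it."""
    substrates: frozenset
    products: frozenset
    enzyme: str = ""  # hypothetical label for the catalyzing protein


@dataclass
class Pathway:
    """A pathway is a linked set of reactions (kept here as an ordered list)."""
    name: str
    reactions: list = field(default_factory=list)

    def is_linked(self) -> bool:
        # Linked: each reaction consumes at least one product of the previous one.
        return all(
            prev.products & nxt.substrates
            for prev, nxt in zip(self.reactions, self.reactions[1:])
        )


# A + B = C + D, followed by a made-up second step that consumes C.
r1 = Reaction(frozenset({"A", "B"}), frozenset({"C", "D"}), enzyme="enzyme-1")
r2 = Reaction(frozenset({"C"}), frozenset({"E"}), enzyme="enzyme-2")
toy_pathway = Pathway("toy-pathway", [r1, r2])
print(toy_pathway.is_linked())
```

Reversing the two reactions breaks the chain, because E is not a substrate of the first reaction.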

Organism-Specific Pathway/Genome Databases
- Layer functional information above the genome
- Rich ontology to encode biological information with high fidelity
  - Chromosomes, genes, operons, gene products, reactions, pathways
- Curated by experts for that organism
  - Integrate literature and computational predictions

Pathway Tools Software
- Pathway/Genome Navigator
  - WWW publishing of PGDBs
  - Graphic depictions of pathways, chromosomes, operons
  - Pathway visualization of gene-expression data
- Pathway/Genome Editors
  - Distributed curation of genome annotations
  - Distributed object database system
  - Interactive editing tools
- PathoLogic
  - Prediction of metabolic network from genome

EcoCyc = E. coli Dataset + Pathway/Genome Navigator
- Genes: 4,393
- Gene Products: 4,393
- Reactions: 1,117
- Pathways: 158
- Compounds: 1,887
- Operons: 375
[Diagram label: Metabolic Network]

EcoCyc
- Collaborative development via internet
  - Karp: Bioinformatics architect
  - Riley: Metabolic pathways, signal transduction
  - Saier and Paulsen: Transport
  - Collado: Regulation of gene expression
- Ontology of 1000 biological classes
- 14,000 instances
- Over 2,600 registered users

Pathway Tools Software
[Diagram components: Pathway/Genome Databases, Pathway/Genome Navigator, PathoLogic Pathway Predictor, Pathway/Genome Editors]

Pathway/Genome Navigator
- Algorithmic visualization of pathway and genome data
- Predefined queries for each object type
- Hypertext navigation
- X-windows and WWW
- PathoLogic and Pathway Editors are X-windows only

Creation of the Overview Graph
- Run layout algorithms on individual pathway graphs
  - Automatically determine topology of pathway graph
  - Apply associated layout algorithm (linear, circular, tidy tree)
- Use superpathways to create hierarchical layouts
  - Treat each individual pathway as a single node
  - Pathway connections are edges
  - Run appropriate layout algorithm
- Manually position the resulting pathway clusters
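The first two steps above (determine the topology, then pick a layout) and the superpathway collapse can be sketched as follows. This is a minimal illustration, not the Pathway Tools implementation: the function names, the degree-based classification rule, and the treatment of edges as undirected with no self-loops are all assumptions.

```python
from collections import defaultdict, deque


def classify_topology(edges):
    """Classify a pathway graph as linear, circular, or tree.

    Edges are (compound, compound) pairs, treated as undirected.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    nodes = list(adj)
    if not nodes:
        return "empty"
    # Breadth-first search to confirm the graph is one connected piece.
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node] - seen:
            seen.add(neighbor)
            queue.append(neighbor)
    if len(seen) != len(nodes):
        return "disconnected"
    n_edges = sum(len(neighbors) for neighbors in adj.values()) // 2
    degrees = [len(adj[node]) for node in nodes]
    if n_edges == len(nodes) and all(d == 2 for d in degrees):
        return "circular"  # a single closed cycle
    if n_edges == len(nodes) - 1:  # connected and acyclic
        return "linear" if max(degrees) <= 2 else "tree"
    return "general"


def collapse_to_superpathway(edges, pathway_of):
    """Treat each pathway as a single node; keep only edges that cross pathways."""
    return {
        (pathway_of[a], pathway_of[b])
        for a, b in edges
        if pathway_of[a] != pathway_of[b]
    }


print(classify_topology([("A", "B"), ("B", "C")]))
print(classify_topology([("A", "B"), ("B", "C"), ("C", "A")]))
```

The collapsed graph can be passed back to `classify_topology`, which mirrors the hierarchical step: the same layout choice is made one level up, with pathways as nodes.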

Inference of Metabolic Pathways
[Diagram labels]
- Annotated genome (structured ASCII text file): DNA sequence, list of genes/ORFs, list of gene products
- PathoLogic
- MetaCyc
- Pathway/Genome Database: genomic map, genes, gene products, reactions, pathway, compounds, metabolic network
- Reports

Summary of H. pylori Analysis
- For 121 E. coli pathways, what is the evidence that each pathway occurs in H. pylori?
  - Strong evidence: 41
  - Medium evidence: 29
  - Little or no evidence: 51
- 31 reactions catalyzed by H. pylori but not by E. coli
- H. pylori has partial abilities to synthesize cofactors and amino acids, extremely limited carbohydrate catabolism, some amino acid utilization, and a reductive citric-acid pathway

Microbial Pathway/Genome DBs
- Literature-based Datasets:
  - MetaCyc
  - Escherichia coli
- PathoLogic-based Datasets:
  - Bacillus subtilis
  - Mycobacterium tuberculosis
  - Helicobacter pylori
  - Haemophilus influenzae
  - Mycoplasma pneumoniae
  - Treponema pallidum
  - Chlamydia trachomatis
  - Saccharomyces cerevisiae

Pathway Tools Software Architecture
- Implemented in Common Lisp
- WWW server runs as a single Unix process with a separate thread to service each query
- Grasper-CL graph manager
- Ocelot object database
- GKB Editor schema-driven editor

EcoCyc WWW Server

Pathway Tools Architecture: Development Configuration
[Diagram components: Ocelot DBMS, GFP API, Pathway/Genome Navigator, WWW Server, X-Windows Graphics, Object Editor, Pathway Editor, Reaction Editor, Oracle]

Ocelot Database System
- Object Database Manager
- Persistence via filesystem or relational DBMS
- Demand and background faulting of objects from RDBMS
- Two-level object caching
- Extensive bioinformatics schema
- Stored transaction history
  - Inspect object history

Ocelot Knowledge Server Architecture
- Frame data model
- Persistent storage via
  - Disk files
  - Oracle DBMS
- Optimistic concurrency-control protocol
- Schema evolution
- Logging facility
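The slide names an optimistic concurrency-control protocol but does not describe it. As a rough illustration of the general technique only (not Ocelot's actual protocol), the sketch below lets editors work without locks and rejects a commit if the frame changed after it was read. `VersionedStore`, `ConflictError`, and the per-frame version counter are assumptions made for the example.

```python
class ConflictError(Exception):
    """Raised when a frame was changed by someone else since it was read."""


class VersionedStore:
    """In-memory stand-in for a shared KB with per-frame version numbers."""

    def __init__(self):
        self._data = {}  # frame id -> (version, value)

    def read(self, key):
        # Returns (version, value); unknown frames start at version 0.
        return self._data.get(key, (0, None))

    def commit(self, key, read_version, new_value):
        current_version, _ = self._data.get(key, (0, None))
        if current_version != read_version:
            # Another editor committed first: reject instead of overwriting.
            raise ConflictError(f"{key} is at version {current_version}")
        self._data[key] = (current_version + 1, new_value)
        return current_version + 1


store = VersionedStore()
version_a, _ = store.read("gene-1")   # curator A reads
version_b, _ = store.read("gene-1")   # curator B reads the same version
store.commit("gene-1", version_a, "edited by A")
try:
    store.commit("gene-1", version_b, "edited by B")
except ConflictError as err:
    print("conflict:", err)
```

The rejected editor would normally re-read the frame and reapply the change, which suits curation workloads where simultaneous edits to the same frame are rare.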

The Frame Data Model
- Frames are of two types: classes, instances
- Frames have slots that define their properties, attributes, relationships
- A slot has one or more values
- Each value can be any Lisp datatype
- Slotunits define metadata about slots:
  - Domain, range, inverse
  - Collection type, number of values, value constraints
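A frame with slots, and a slotunit that constrains them, might look roughly like the sketch below. It is written in Python for illustration rather than in Ocelot's Common Lisp, and every name in it (`Frame`, `SlotUnit`, `put_slot_value`, the frame `gene-1`, and the value 12.5) is an assumption, not part of the Ocelot API or the EcoCyc data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SlotUnit:
    """Metadata about a slot: domain, range, inverse, number of values."""
    name: str
    domain: str                       # class whose frames may carry the slot
    value_type: type                  # range of allowed values
    inverse: Optional[str] = None     # name of the inverse slot, if any
    max_values: Optional[int] = None  # None means unlimited


@dataclass
class Frame:
    name: str
    kind: str                         # "class" or "instance"
    parent: Optional[str] = None      # class this frame belongs to
    slots: dict = field(default_factory=dict)  # slot name -> list of values


def put_slot_value(frame, slotunit, value):
    """Add a value to a slot after checking it against the slotunit."""
    # Simplification: only the direct parent is checked, not the full class chain.
    if frame.parent != slotunit.domain:
        raise TypeError(f"{slotunit.name} does not apply to {frame.name}")
    if not isinstance(value, slotunit.value_type):
        raise TypeError(f"{value!r} is outside the range of {slotunit.name}")
    values = frame.slots.setdefault(slotunit.name, [])
    if slotunit.max_values is not None and len(values) >= slotunit.max_values:
        raise ValueError(f"{slotunit.name} already holds {len(values)} value(s)")
    values.append(value)


map_position = SlotUnit("map-position", domain="Genes", value_type=float, max_values=1)
gene = Frame("gene-1", kind="instance", parent="Genes")
put_slot_value(gene, map_position, 12.5)  # made-up value
print(gene.slots)
```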

Inference Capabilities
- Inheritance of defaults
- Slot values computed via attached procedures
- Maintenance of inverse relationships
- Constraint system
  - Deferred evaluation
  - Tolerant of nonconformant data
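Two of the capabilities listed above, inheritance of defaults and maintenance of inverse relationships, can be illustrated with a toy knowledge base. This is a sketch under assumptions, not Ocelot's implementation: the `ToyKB` class, its method names, and the slot names `product`, `gene`, and `status` are invented for the example.

```python
class ToyKB:
    """Toy frame store with default inheritance and inverse-slot upkeep."""

    def __init__(self):
        self.parent = {}   # frame -> parent class (None at the root)
        self.slots = {}    # frame -> {slot: [values]}
        self.inverse = {}  # slot -> its inverse slot

    def add_frame(self, name, parent=None):
        self.parent[name] = parent
        self.slots[name] = {}

    def declare_inverse(self, slot, inverse_slot):
        self.inverse[slot] = inverse_slot
        self.inverse[inverse_slot] = slot

    def put(self, frame, slot, value):
        self._add(frame, slot, value)
        inverse_slot = self.inverse.get(slot)
        if inverse_slot is not None:
            # Keep the other side in step: the value must itself be a frame.
            self._add(value, inverse_slot, frame)

    def _add(self, frame, slot, value):
        values = self.slots[frame].setdefault(slot, [])
        if value not in values:
            values.append(value)

    def get(self, frame, slot):
        # Local values win; otherwise walk up the class chain for a default.
        while frame is not None:
            values = self.slots[frame].get(slot)
            if values:
                return values
            frame = self.parent[frame]
        return []


kb = ToyKB()
kb.add_frame("Genes")
kb.add_frame("Proteins")
kb.add_frame("gene-1", parent="Genes")
kb.add_frame("protein-1", parent="Proteins")
kb.declare_inverse("product", "gene")
kb.put("gene-1", "product", "protein-1")
kb.put("Genes", "status", "uncurated")  # class-level default
print(kb.get("protein-1", "gene"), kb.get("gene-1", "status"))
```

Writing only the `product` slot on the gene is enough for the protein's `gene` slot to be filled in, and `gene-1` reports the class default until it is given a `status` of its own.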

Storage System Architecture
[Diagram labels: Oracle, KBs]
- DBMS is submerged within FRS
- Relational schema is domain independent, supports multiple KBs simultaneously
- Frames transferred from DBMS to Ocelot
  - On demand
  - By background prefetcher
  - Memory cache
  - Persistent disk cache to speed performance via Internet

Frame Faulting
(get-slot-value gene 'map-position)
1. Gene present in in-memory object cache?
2. Gene present in cache on local disk?
3. Query Oracle DBMS
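The lookup order on this slide can be sketched as a three-level fetch. The sketch is in Python and uses plain dictionaries as stand-ins for the local disk cache and the Oracle DBMS; `FrameStore`, `fetch_frame`, the `trace` list, and the sample frame are assumptions for illustration, not Ocelot's API.

```python
class FrameStore:
    """Looks for a frame in memory, then the local disk cache, then the DBMS."""

    def __init__(self, disk_cache, dbms):
        self.memory = {}        # in-memory object cache
        self.disk = disk_cache  # dict standing in for the persistent disk cache
        self.dbms = dbms        # dict standing in for the Oracle DBMS
        self.trace = []         # records which level served each request

    def fetch_frame(self, name):
        if name in self.memory:
            self.trace.append("memory")
            return self.memory[name]
        if name in self.disk:
            self.trace.append("disk")
            frame = self.disk[name]
        else:
            self.trace.append("dbms")
            frame = self.dbms[name]  # KeyError if the frame does not exist
            self.disk[name] = frame  # fill the disk cache on the way back
        self.memory[name] = frame    # fill the memory cache
        return frame

    def get_slot_value(self, name, slot):
        return self.fetch_frame(name)[slot]


dbms = {"gene-1": {"map-position": 12.5}}  # made-up frame and value
disk = {}
session = FrameStore(disk, dbms)
session.get_slot_value("gene-1", "map-position")  # faults the frame in from the DBMS
session.get_slot_value("gene-1", "map-position")  # served from memory
print(session.trace)
```

A later session that shares the same disk cache starts with an empty memory cache but is served from disk without touching the DBMS, which is the case the persistent cache is meant to speed up for remote users.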

Logging
- Oracle DBMS stores:
  - The latest version of each frame
  - A history of all OKBC operations applied to KB
- Reconstruct earlier versions of KB
- View history of changes to an object
- Update replicates
- Concurrency control
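Storing every operation makes it possible to rebuild an earlier KB state by replaying a prefix of the log, and to list the changes made to one object. The sketch below shows that idea only; the tuple-based log format, the two operation kinds, and the function names are assumptions, and real OKBC operations are considerably richer.

```python
def apply_op(kb, op):
    """Apply one logged operation, given as (kind, frame, slot, value)."""
    kind, frame, slot, value = op
    if kind == "put":
        kb.setdefault(frame, {})[slot] = value
    elif kind == "remove":
        kb.get(frame, {}).pop(slot, None)
    else:
        raise ValueError(f"unknown operation: {kind}")


def reconstruct(log, upto):
    """Rebuild the KB as it stood after the first `upto` operations."""
    kb = {}
    for op in log[:upto]:
        apply_op(kb, op)
    return kb


def history(log, frame):
    """All logged operations that touched one frame, in order."""
    return [op for op in log if op[1] == frame]


log = [
    ("put", "gene-1", "name", "draft name"),
    ("put", "gene-2", "name", "other gene"),
    ("put", "gene-1", "name", "final name"),
    ("remove", "gene-2", "name", None),
]
print(reconstruct(log, 2))
print(history(log, "gene-1"))
```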

Schema Management
- FRSs store and process class and instance information similarly
- Applications can query schema information as easily as they can query instances

GKB Editor
- Browser and editor for KBs and ontologies
- Four editing tools
- GKB Editor reusable with multiple FRSs
  - All database queries via OKBC/GFP API
  - Interoperability achieved with Ocelot, LOOM, Ontolingua
- All operations are schema driven

Editors
- Taxonomy editor
- Frame editor
- Relationships editor
- Spreadsheet editor

Results
- Ocelot in use in the EcoCyc project for 5 years
- Supports collaborative development of EcoCyc by four groups in North America
  - Distributed architecture
  - GKB Editor in active use
- Supports development of 8 Pathway/Genome Databases

Summary
- Pathway/Genome Databases
- Pathway Tools software
  - Extract pathways from genomes
  - Distributed curation tools
  - Query, visualization, WWW publishing
  - Analysis algorithms

Computer Science Results
- Extend scalability and multiuser access for knowledge representation systems
- Reusable, schema-driven KB editor
- Hierarchical graph layout algorithms
- Dynamic translation from X-windows to HTML+GIF
- Importance of ontologies and of content: Discovery = Algorithm + Database

Problem Solving Depends on Algorithms and Content
[Diagram labels: Database Size and Quality, Solution Quality, Algorithm Quality, Compute Time]

Bioinformatics Results: Content
- The EcoCyc database describes the full metabolic map of an organism
- The MetaCyc database describes over 300 metabolic pathways
- Ontology spans genome to pathway information

Bioinformatics Results: Algorithms
- Software environment for genome and pathway information
  - Query and visualization
  - Distributed database development
- PathoLogic algorithm predicts the metabolic network of an organism from its genome
- Algorithms under development for qualitative modeling of the cell

Acknowledgements
- Funding sources:
  - NIH National Center for Research Resources
- Collaborators:
  - Monica Riley, Marine Biological Laboratory
  - Milton Saier, UC San Diego
  - Julio Collado, UNAM
  - Christos Ouzounis, European Bioinformatics Institute
Peter D. Karp, Ph.D.