Presentation is loading. Please wait.

Presentation is loading. Please wait.

Overview of the Pathway Tools Software and Pathway/Genome Databases.

Similar presentations


Presentation on theme: "Overview of the Pathway Tools Software and Pathway/Genome Databases."— Presentation transcript:

1 Overview of the Pathway Tools Software and Pathway/Genome Databases

2 SRI International Bioinformatics Introductions BRG Staff l Peter Karp l Pallavi Kaipa l Mario Latendresse l Suzanne Paley l Markus Krummenacker l Ingrid Keseler l Ron Caspi l Alex Shearer l Carol Fulcher Attendees l Where from, what genome? l What do you hope to get out of the tutorial?

3 SRI International Bioinformatics SRI International Private nonprofit research institute No permanent funding sources 1300 staff in Menlo Park – Founded in 1946 as Stanford Research Institute – Separated from Stanford University in 1970 – Name changed to SRI International in 1977 – David Sarnoff Research Center acquired in 1987

4 SRI International Bioinformatics SRI Organization Information and Computing Sciences Engineering Systems And Sciences Physical Sciences Biopharmaceuticals And Pharmaceutical Discovery Education and Policy Bioinformatics Research Group

5 SRI International Bioinformatics Research in the SRI Bioinformatics Research Group EcoCyc MetaCyc Pathway Tools Pathway Holes BioWarehouse Enzyme Genomics

6 SRI International Bioinformatics Outline for Tutorial Monday l Introduction l Pathway/Genome Navigator l PathoLogic tutorial and demo l PathoLogic lab session – Make genome input files parsable Tuesday l PathoLogic tutorial l PathoLogic lab session – Build initial version of PGDB Wednesday l Pathway hole filler, operon predictor, transport inference parser Thursday l Editors l Feedback session

7 SRI International Bioinformatics Outline for Tutorial Wednesday l Introduction l Pathway/Genome Navigator l PathoLogic tutorial and demo l PathoLogic lab session -- build initial version of PGDB l PathoLogic lab session and Pathway Tools Schema Thursday l Editing tools –tutorial/lab sessions l Curation strategy Friday l Tutorial writing queries to PGDBs l Lisp, Java, and Perl APIs l How to send us a bug reports, auto-patch l Feedback session

8 SRI International Bioinformatics Tutorial Goals General familiarity with Pathway Tools goals and functionality Ability to create, edit, and navigate a new PGDB Create new PGDB for genome(s) you brought with you Familiarity with information resources available about Pathway Tools to continue your work

9 SRI International Bioinformatics SRI’s Support for Pathway Tools NIH grant finances software development and user support Additional grants finance other software development Email us bug reports, suggestions, questions Comprehensive bug reports are required for us to fix the problem you reported Keep us posted regarding your progress

10 SRI International Bioinformatics Administrative Details Please wear badges at all times Escort required outside this room/hallway Let us know when you are leaving Use E-Bldg Entrance Phone numbers to call from entrance Meals Wednesday outing possible Restrooms

11 SRI International Bioinformatics Tutorial Format Questions welcome during presentations Lab sessions will take different amounts of time for different people l Refine your PGDB l Read Pathway Tools manuals Buddy system for some computers Computer logins Internet connectivity

12 SRI International Bioinformatics Pathway/Genome Database Integrating Genomic and Biochemical Data Chromosomes, Plasmids Genes Proteins Reactions Pathways Compounds CELL Operons, Promoters, DNA Binding Sites

13 SRI International Bioinformatics Terminology Model Organism Database (MOD) – DB describing genome and other information about an organism Pathway/Genome Database (PGDB) – MOD that combines information about l Pathways, reactions, substrates l Enzymes, transporters l Genes, replicons l Transcription factors, promoters, operons, DNA binding sites BioCyc – Collection of 205 PGDBs at BioCyc.org l EcoCyc, AgroCyc, HumanCyc

14 SRI International Bioinformatics BioCyc Collection of Pathway/Genome Databases Pathway/Genome Database (PGDB) – combines information about l Pathways, reactions, substrates l Enzymes, transporters l Genes, replicons l Transcription factors/sites, promoters, operons Tier 1: Literature-Derived PGDBs l MetaCyc l EcoCyc -- Escherichia coli K-12 l BioCyc Open Chemical Database Tier 2: Computationally-derived DBs, Some Curation -- 12 PGDBs l HumanCyc l Mycobacterium tuberculosis Tier 3: Computationally-derived DBs, No Curation -- 191 DBs

15 SRI International Bioinformatics Terminology – Pathway Tools Software PathoLogic l Predicts operons, metabolic network, pathway hole fillers, from genome l Computational creation of new Pathway/Genome Databases Pathway/Genome Editors l Distributed curation of PGDBs l Distributed object database system, interactive editing tools Pathway/Genome Navigator l WWW publishing of PGDBs l Querying, visualization of pathways, chromosomes, operons l Analysis operations u Pathway visualization of gene-expression data u Global comparisons of metabolic networks Bioinformatics 18:S225 2002

16 SRI International Bioinformatics Pathway/Genome DBs Created by External Users 600+ licensees -- 50 groups applying software to 100+ organisms Software freely available to academics; Each PGDB owned by its creator Saccharomyces cerevisiae, SGD project, Stanford University l pathway.yeastgenome.org/biocyc / TAIR, Carnegie Institution of Washington Arabidopsis.org:1555 dictyBase, Northwestern University GrameneDB, Cold Spring Harbor Laboratory Planned: l CGD (Candida albicans), Stanford University l MGD (Mouse), Jackson Laboratory l RGD (Rat), Medical College of Wisconsin l WormBase ( C. elegans ), Caltech Large scale users: l C. Medigue, Genoscope, 67 PGDBs l G. Burger, U Montreal, 20 PGDBs DOE GTL contractors: l G. Church, Harvard, Prochlorococcus marinus MED4 l Larimer/Uberbacher, ORNL, Shewanella onedensis l J. Keasling, UC Berkeley, Desulfovibrio vulgaris Fiona Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa

17 SRI International Bioinformatics Terminology “Database” = “DB” = “Knowledge Base” = “KB” = “Pathway/Genome Database” = “PGDB”

18 SRI International Bioinformatics Why Create PGDBs? Extract more information from your genome Create an up-to-date computable information repository about an organism Perform analyses on the genome and pathway complement of the organism, e.g., analyses of omics data Perform comparative analyses with other organisms Generate a genome poster and metabolic wall chart

19 SRI International Bioinformatics Sequence Project Workflow Raw Sequence Phred Phrap BLAST, BLOCKS GeneMark/Glimmer PathoLogic P/G Navigator P/G Editors WWW PublishingAnalyses Pathway Tools

20 SRI International Bioinformatics MetaCyc: Metabolic Encyclopedia Nonredundant metabolic pathway database Describe a representative sample of every experimentally determined metabolic pathway Literature-based DB with extensive references and commentary Pathways, reactions, enzymes, substrates Jointly developed by SRI and Carnegie Institution Nucleic Acids Research 34:D511-D516 2006

21 SRI International Bioinformatics MetaCyc Curation DB updates by 5 staff curators l Information gathered from biomedical literature l Emphasis on microbial and plant pathways l More prevalent pathways given higher priority Review-level database Four releases per year Quality assurance of data and software: l Evaluate database consistency constraints l Perform element balancing of reactions l Run other checking programs l Display every DB object

22 SRI International Bioinformatics MetaCyc Curation Ontologies guide querying l Pathways (recently revised), compounds, enzymatic reactions l Example: Coenzyme M biosynthesis Extensive citations and commentary Evidence codes l Controlled vocabulary of evidence types l Attach to pathways and enzymes: u Code : Citation : Curator : date Release notes explain recent updates l http://biocyc.org/metacyc/release-notes.shtml http://biocyc.org/metacyc/release-notes.shtml

23 SRI International Bioinformatics MetaCyc Data

24 SRI International Bioinformatics MetaCyc Pathway Variants Pathways that accomplish similar biochemical functions using different biochemical routes l Alanine biosynthesis I – E. coli l Alanine biosynthesis II – H. sapiens Pathways that accomplish similar biochemical functions using similar sets of reactions l Several variants of TCA Cycle

25 SRI International Bioinformatics MetaCyc Super-Pathways Groups of pathways linked by common substrates Example: Super-pathway containing l Chorismate biosynthesis l Tryptophan biosynthesis l Phenylalanine biosynthesis l Tyrosine biosynthesis Super-pathways defined by listing their component pathways Multiple levels of super-pathways can be defined Pathway layout algorithms accommodate super-pathways

26 SRI International Bioinformatics Family of Pathway/Genome Databases MetaCyc EcoCyc CauloCyc AraCyc MtbRvCyc HumanCyc

27 SRI International Bioinformatics Comparison of BioCyc to KEGG KEGG approach: Static collection of pathway diagrams that are color-coded to produce organism-specific views KEGG vs MetaCyc: Resource on literature-derived pathways l KEGG pathways maps are composites of pathways in many organisms -- do not identify what specific pathways elucidated in what organisms l KEGG has no literature citations, no comments, less enzyme detail KEGG vs BioCyc organism-specific PGDBs l KEGG re-annotates entire genome for each organism l KEGG does not curate or customize pathway networks for each organism Software tools l KEGG has no algorithmic visualization tools l KEGG has no queryable metabolic-map overview diagram l KEGG has no interactive editing tools

28 SRI International Bioinformatics Omics Viewer Import gene expression, proteomics, metabolomics data Obtain pathway based visualizations of omics data l Numerical spectrum of expression values mapped to a color spectrum l Steps of overview painted with color corresponding to expression level(s) of genes that encode enzyme(s) for that step

29 SRI International Bioinformatics Environment for Computational Exploration of Genomes Powerful ontology opens many facets of the biology to computational exploration Global characterization of metabolic network Analysis of interface between transport and metabolism Nutrient analysis of metabolic network

30 SRI International Bioinformatics Pathway Tools Implementation Details Allegro Common Lisp Sun, Linux, Windows platforms Ocelot object database 300,000+ lines of code Lisp-based WWW server at BioCyc.org l Manages 205 PGDBs

31 SRI International Bioinformatics Pathway Tools Architecture Object DBMS GFP API Pathway Genome Navigator WWW Server X-Windows Graphics Object Editor Pathway Editor Reaction Editor Oracle

32 SRI International Bioinformatics Ocelot Knowledge Server Architecture Frame data model l Classes, instances, inheritance l Frames have slots that define their properties, attributes, relationships l A slot has one or more values u Datatypes include numbers, strings, etc. Transaction logging facility Slot units define metadata about slots: l Domain, range, inverse l Collection type, number of values, value constraints

33 SRI International Bioinformatics Ocelot Storage System Architecture Persistent storage via disk files, Oracle DBMS l Concurrent development: Oracle l Single-user development: disk files l Read-only delivery: bundle data into binary program Oracle storage l Oracle is submerged within Ocelot, invisible to users l Frames transferred from DBMS to Ocelot u On demand u By background prefetcher u Memory cache u Persistent disk cache to speed performance via Internet

34 SRI International Bioinformatics The Common Lisp Programming Environment Gatt studied Lisp and Java implementation of 16 programs by 14 programmers (Intelligence 11:21 2000)

35 SRI International Bioinformatics Peter Norvig’s Solution “I wrote my version in Lisp. It took me about 2 hours (compared to a range of 2-8.5 hours for the other Lisp programmers in the study, 3-25 for C/C++ and 4-63 for Java) and I ended up with 45 non-comment non-blank lines (compared with a range of 51-182 for Lisp, and 107-614 for the other languages). (That means that some Java programmer was spending 13 lines and 84 minutes to provide the functionality of each line of my Lisp program.)” http://www.norvig.com/java-lisp.html

36 SRI International Bioinformatics Survey Please complete survey at end of each day

37 SRI International Bioinformatics PGDB(s) That You Build Before you leave l Tar up your PGDB directory and FTP it home, email it home, or copy it to flash disk l We will create a backup copy of your PGDB directory if the directory is still there at the end of the tutorial l Delete the PGDB directory if you don’t want us to back it up l We will not give the backed up data to anyone else

38 SRI International Bioinformatics Summary Pathway Tools and Pathway/Genome Databases l Not just for pathways! l Computational inferences u Operons, metabolic pathways, pathway hole fillers l Editing tools l Analysis tools: Omics data on pathways l Web publishing of PGDBs Main classes of users: l Develop PGDB to extract more information from genome for genome paper l Develop a model-organism DB for the organism that is updated regularly and published on the web

39 SRI International Bioinformatics Information Sources Pathway Tools User’s Guide l /root/aic-export/ecocyc/genopath/released/doc/manuals/userguide1.pdf u Pathway/Genome Navigator u Appendix A: Guide to the Pathway Tools Schema l /toot/aic-export/ecocyc/genopath/released/doc/manuals/userguide2.pdf u PathoLogic, Editing Tools l NOTE: Location of the aic-export directory can vary across different computers Pathway Tools Web Site l http://bioinformatics.ai.sri.com/ptools/ http://bioinformatics.ai.sri.com/ptools/ l Publications, programming examples, etc. Slides from this tutorial l http://bioinformatics.ai.sri.com/ptools/tutorial/ http://bioinformatics.ai.sri.com/ptools/tutorial/


Download ppt "Overview of the Pathway Tools Software and Pathway/Genome Databases."

Similar presentations


Ads by Google