Overview of the Pathway Tools Software and Pathway/Genome Databases.

1 Overview of the Pathway Tools Software and Pathway/Genome Databases

2 SRI International Bioinformatics Introductions
- BRG Staff
  - Peter Karp
  - Tomer Altman
  - Joe Dale
  - Fred Gilham
  - John Myers
  - Suzanne Paley
  - Markus Krummenacker
  - Ingrid Keseler
  - Ron Caspi
  - Alex Shearer
  - Carol Fulcher
- Attendees
  - Where from, what genome?
  - What do you hope to get out of the tutorial?

3 SRI International
- Private nonprofit research institute
- No permanent funding sources
- 1300 staff in Menlo Park
  - Founded in 1946 as Stanford Research Institute
  - Separated from Stanford University in 1970
  - Name changed to SRI International in 1977

4 SRI Organization
- Information and Computing Sciences
- Engineering Systems And Sciences
- Physical Sciences
- Biopharmaceuticals And Pharmaceutical Discovery
- Education and Policy
- Bioinformatics Research Group

5 Research in the SRI Bioinformatics Research Group
- BioCyc Database Collection
  - EcoCyc
  - MetaCyc
- Pathway Tools
- BioWarehouse

6 Outline for Tutorial
- Monday
  - Introduction
  - Pathway/Genome Navigator
  - Introduction to Pathway/Genome Editors
- Tuesday
  - PathoLogic tutorial
  - PathoLogic lab session
    - Build initial version of PGDB
  - Pathway hole filler lecture+lab
- Wednesday
  - PathoLogic: Creating protein complexes, operon predictor, transport inference parser
  - Pathway Tools Schema
  - Model organism database projects
- Thursday
  - Advanced Pathway/Genome Editors
- Friday
  - Overviews and Omics Viewers
  - Comparative analysis
  - Structured Advanced Query Form
  - Metabolite Tracing
  - Regulation

7 Outline for Tutorial
- Monday
  - Introduction
  - Pathway/Genome Navigator
  - Metabolite Tracing
  - Omics Viewers
- Tuesday
  - PathoLogic tutorial
  - PathoLogic lab session
    - Build initial version of PGDB
  - Pathway hole filler (run overnight)
- Wednesday
  - PathoLogic: Creating protein complexes, operon predictor, transport inference parser
  - Pathway Tools Schema
  - Model organism database projects
- Thursday
  - Editors
  - Feedback session
- Friday
  - Writing programs to access and modify PGDBs

8 Tutorial Goals
- General familiarity with Pathway Tools goals and functionality
- Ability to create, edit, and navigate a new PGDB
- Create new PGDB for genome(s) you brought with you
- Familiarity with information resources available about Pathway Tools to continue your work

9 SRI's Support for Pathway Tools
- NIH grant finances software development and user support
- Additional grants finance other software development
- Email us bug reports, suggestions, questions
- Comprehensive bug reports are required for us to fix the problem you reported
- Keep us posted regarding your progress

10 Administrative Details
- Please wear badge at all times
- Escort required outside this room/hallway
- Let us know when you are leaving
- Use E-Bldg Entrance
- Phone numbers to call from entrance
- Meals
- Restrooms

11 Tutorial Format
- Questions welcome during presentations
- Lab sessions will take different amounts of time for different people
  - Refine your PGDB
  - Read Pathway Tools manuals
- Computer logins
- Internet connectivity

12 Pathway/Genome Database
Diagram labels: Chromosomes, Plasmids, Genes, Proteins, RNAs, Reactions, Pathways, Compounds, CELL, Operons, Promoters, DNA Binding Sites, Regulatory Interactions, Sequence Features

13 BioCyc Collection of Pathway/Genome Databases
- Pathway/Genome Database (PGDB) – combines information about
  - Pathways, reactions, substrates
  - Enzymes, transporters
  - Genes, replicons
  - Transcription factors/sites, promoters, operons
- Tier 1: Literature-Derived PGDBs
  - MetaCyc
  - EcoCyc -- Escherichia coli K-12
- Tier 2: Computationally-derived DBs, Some Curation PGDBs
  - HumanCyc
  - Mycobacterium tuberculosis
- Tier 3: Computationally-derived DBs, No Curation DBs

14 Terminology – Pathway Tools Software
- PathoLogic
  - Predicts operons, metabolic network, pathway hole fillers, from genome
  - Computational creation of new Pathway/Genome Databases
- Pathway/Genome Editors
  - Distributed curation of PGDBs
  - Distributed object database system, interactive editing tools
- Pathway/Genome Navigator
  - WWW publishing of PGDBs
  - Querying, visualization of pathways, chromosomes, operons
  - Analysis operations
    - Pathway visualization of gene-expression data
    - Global comparisons of metabolic networks
Bioinformatics 18:S

15 Pathway Tools Software: PGDBs Created Outside SRI
- licensees: 75+ groups applying software to 150+ organisms
- Saccharomyces cerevisiae, SGD project, Stanford University
- Mouse, MGD, Jackson Laboratory
- dictyBase, Northwestern University
- Under development:
  - CGD (Candida albicans), Stanford University
  - Drosophila, P. Ebert in collaboration with FlyBase
  - C. elegans, P. Ebert in collaboration with WormBase
- Planned:
  - RGD (Rat), Medical College of Wisconsin
- Arabidopsis thaliana, TAIR, Carnegie Institution of Washington
- Tomato and Potato, Cornell University
- GrameneDB, Cold Spring Harbor Laboratory
- Medicago truncatula, Samuel Roberts Noble Foundation

16 Pathway Tools Software: PGDBs Created Outside SRI
- NIAID BRCs: BioHealthBase (M. tuberculosis, F. tularensis), PATRIC, ApiDB (Cryptosporidium)
- F. Brinkman, Simon Fraser Univ, Pseudomonas aeruginosa
- V. Schachter, Genoscope, Acinetobacter
- M. Bibb, John Innes Centre, Streptomyces coelicolor
- G. Church, Harvard, Prochlorococcus marinus, multiple strains
- E. Uberbacher, ORNL and G. Serres, MBL, Shewanella oneidensis
- R.J.S. Baerends, University of Groningen, Lactococcus lactis IL1403, Lactococcus lactis MG1363, Streptococcus pneumoniae TIGR4, Bacillus subtilis 168, Bacillus cereus ATCC14579
- Matthew Berriman, Sanger Centre, Trypanosoma brucei, Leishmania major
- Herbert Chiang, Washington University, Bacteroides thetaiotaomicron
- Sergio Encarnacion, UNAM, Sinorhizobium meliloti
- Gregory Fournier, MIT, Mesoplasma florum
- Mark van der Giezen, University of London, Entamoeba histolytica, Giardia intestinalis
- Michael Gottfert, Technische Universität Dresden, Bradyrhizobium japonicum
- Artiva Maria Goudel, Universidade Federal de Santa Catarina, Brazil, Chromobacterium violaceum ATCC
- Kenneth J. Kauffman, University of California, Riverside, Desulfovibrio vulgaris

17 Pathway Tools Software: PGDBs Created Outside SRI
- Mike McLeod, University of British Columbia, Rhodococcus sp. RHA1
- Robert S. Munson, Children's Research Institute, Ohio, Haemophilus ducreyi, Haemophilus influenzae NP
- John Nash, Canadian NRC, Campylobacter jejuni
- Christopher S. Reigstad, Washington University, Escherichia coli UTI89
- Haluk Resat, Pacific Northwest Lab, Rhodobacter sphaeroides
- Gary Xie, Los Alamos Lab, Bacillus cereus
- Large scale users:
  - C. Medigue, Genoscope, 107 PGDBs
  - G. Burger, U Montreal, 48 PGDBs
  - Bart Weimer, Utah State University, Lactococcus lactis, Brevibacterium linens, Lactobacillus acidophilus, Lactobacillus plantarum, Lactobacillus johnsonii, Listeria monocytogenes
- Partial listing of outside PGDBs at

18 Terminology
- Database = DB = Knowledge Base = KB = Pathway/Genome Database = PGDB

19 Why Create PGDBs?
- Extract more information from your genome
- Create an up-to-date computable information repository about an organism
- Perform analyses on the genome and pathway complement of the organism
  - Analyses of omics data
  - Analyses of cellular systems (dead-end metabolites)
  - Reports generated by Pathway Tools
- Perform comparative analyses with other organisms
- Generate a genome poster and metabolic wall chart
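
The slide mentions dead-end metabolite analysis without describing it. As a rough illustration only, and not the algorithm Pathway Tools uses, the sketch below treats a metabolite as a dead end if the network only produces it or only consumes it. The toy reaction list and the function name `dead_end_metabolites` are hypothetical, and a real analysis would also account for transport and reversible reactions.

```python
def dead_end_metabolites(reactions):
    """Find metabolites that are only produced or only consumed.

    `reactions` is a list of (substrates, products) pairs, each a set of
    metabolite names, with every reaction read left to right.
    """
    produced, consumed = set(), set()
    for substrates, products in reactions:
        consumed.update(substrates)
        produced.update(products)
    return {
        "only_produced": produced - consumed,  # nothing downstream uses these
        "only_consumed": consumed - produced,  # nothing upstream makes these
    }


# Hypothetical toy network: A -> B, B -> C, D -> B
toy_network = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"D"}, {"B"})]
result = dead_end_metabolites(toy_network)
print(result)
```

In this toy network C is produced but never consumed, while A and D are consumed but never produced; in a real organism the latter might simply be nutrients imported by transporters.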

20 Sequence Project Workflow
Diagram labels: Raw Sequence, Phred, Phrap, BLAST, BLOCKS, GeneMark/Glimmer, PathoLogic, P/G Navigator, P/G Editors, WWW Publishing, Analyses, Pathway Tools

21 EcoCyc = E.coli Dataset + Pathway/Genome Navigator
- Genes: 4,516
- Proteins: 4,316
- RNAs: 277
- Reactions: 4,956
- Metabolic: 993
- Transport: 235
- Pathways: 205
- Compounds: 1,187
- URL:
- Gene Regulation:
  - Operons: 3133
  - Trans Factors: 172
  - Promoters: 1649
  - TF Binding Sites: 1770
- Citations: 15,880

22 EcoCyc Project
- E. coli Encyclopedia
  - Review-level Model-Organism Database for E. coli
  - Tracks evolving annotation of the E. coli genome and cellular networks
  - The two paradigms of EcoCyc
- Multi-dimensional annotation of the E. coli K-12 genome
  - Positions of genes; functions of gene products – 76% / 66% exp
  - Gene Ontology terms; MultiFun terms
  - Gene product summaries and literature citations
  - Evidence codes
  - Multimeric complexes
  - Metabolic pathways
  - Regulation of transcription initiation
Nuc. Acids Res. 35: ASM News 70: Science 293:2040
Karp, Gunsalus, Collado-Vides, Paulsen

23 Paradigm 1: EcoCyc as Textual Review Article
- All gene products for which experimental literature exists are curated with a minireview summary
  - Found on protein and RNA pages, not gene pages!
  - 3257 gene products contain summaries
- Summaries cover function, interactions, mutant phenotypes, crystal structures, regulation, and more
- Additional summaries found in pages for operons, pathways
- EcoCyc cites 15,880 publications

24 Summaries in Gene Products

25 Paradigm 2: EcoCyc as Computational Symbolic Theory
- Highly structured, high-fidelity knowledge representation provides computable information
- Each molecular species defined as a DB object
  - Genes, proteins, small molecules
- Each molecular interaction defined as a DB object
  - Metabolic reactions
  - Transport reactions
  - Transcriptional regulation of gene expression
- 220 database fields capture extensive properties and relationships

26 EcoCyc Procedures
- DB updates performed by 5 staff curators
  - Information gathered from biomedical literature
    - Enter data into structured database fields
    - Author extensive summaries
    - Update evidence codes
  - Corrections submitted by E. coli researchers
- Four releases per year
- Quality assurance of data and software
  - Evaluate database consistency constraints
  - Perform element balancing of reactions
  - Run other checking programs
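
To make the "element balancing of reactions" check concrete, here is a minimal sketch of the idea: count the atoms of each element on both sides of a reaction and compare. It is an illustration under simplifying assumptions (plain formulas such as `C6H12O6`, no parentheses, charges, or generic groups), not the checker that the EcoCyc curators run; the function names are hypothetical.

```python
import re
from collections import Counter


def parse_formula(formula):
    """Count atoms in a simple formula such as 'C6H12O6' (no parentheses or charges)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_totals(side):
    """Sum atom counts over (coefficient, formula) pairs on one side of a reaction."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in parse_formula(formula).items():
            total[element] += coefficient * n
    return total


def is_balanced(reactants, products):
    """True when every element occurs equally often on both sides."""
    return side_totals(reactants) == side_totals(products)


# C6H12O6 + 6 O2 -> 6 CO2 + 6 H2O
balanced = is_balanced([(1, "C6H12O6"), (6, "O2")], [(6, "CO2"), (6, "H2O")])
# Same reaction with the O2 coefficient dropped to 1
unbalanced = is_balanced([(1, "C6H12O6"), (1, "O2")], [(6, "CO2"), (6, "H2O")])
print(balanced, unbalanced)
```

A reaction that fails such a check usually points to a missing substrate (often water or a proton) or a mistyped formula, which is why it is useful as a routine quality-assurance step.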

27 MetaCyc: Metabolic Encyclopedia
- Describe a representative sample of every experimentally determined metabolic pathway
- Describe properties of metabolic enzymes
- Literature-based DB with extensive references and commentary
- Pathways, reactions, enzymes, substrates
- Jointly developed by
  - P. Karp, R. Caspi, C. Fulcher, SRI International
  - L. Mueller, A. Pujar, Cornell Univ
  - S. Rhee, P. Zhang, Carnegie Institution
Nucleic Acids Research 2008

28 MetaCyc Data -- Version 11.6
- Pathways: 1010
- Reactions: 6,576
- Enzymes: 4,582
- Small Molecules: 6,561
- Organisms: 1,077
- Citations: 15,875

29 Taxonomic Distribution of MetaCyc Pathways
- Bacteria: 517
- Green Plants: 372
- Mammals: 90
- Fungi: 89
- Archaea: 65

30 Family of Pathway/Genome Databases
- MetaCyc
- EcoCyc
- CauloCyc
- AraCyc
- MtbRvCyc
- HumanCyc

31 Comparison of BioCyc to KEGG: The Data
- KEGG approach: Static collection of pathway diagrams that are color-coded to produce organism-specific views
- KEGG vs MetaCyc: Resource on literature-derived pathways
  - KEGG pathway maps are composites of pathways in many organisms -- they do not identify which specific pathways were elucidated in which organisms
  - KEGG pathway maps encompass multiple biological pathways; are 2-4 times the size of MetaCyc pathways
  - KEGG has no literature citations, no summaries, less enzyme detail
- KEGG vs BioCyc organism-specific PGDBs
  - KEGG re-annotates entire genome for each organism
  - KEGG does not curate or customize pathway networks for each organism

32 Comparison of Pathway Tools to KEGG: The Software
- KEGG has no pathway hole filler or transport inference parser or operon predictor
- KEGG has no interactive editing tools – you cannot refine a KEGG pathway DB
- KEGG has no algorithmic visualization tools – pathway diagrams are pre-drawn
  - May become out of date
  - Cannot show pathways at multiple detail levels
- KEGG genome browser has very limited functionality
- KEGG has one overview diagram with limited functionality
- KEGG has no metabolite tracing tool
- KEGG has no Structured Advanced Query Tool

33 Overviews and Omics Viewers
- Genome-scale Visualizations
  - Metabolic map
  - Transcriptional regulatory network
  - Genome map
- Overlay gene expression, proteomics, metabolomics data
- Obtain pathway based visualizations of omics data
  - Numerical spectrum of expression values mapped to a color spectrum
  - Steps of overview painted with color corresponding to expression level(s) of genes that encode enzyme(s) for that step
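
The last two bullets describe a two-stage mapping: expression value to color, then color to reaction step via the genes encoding that step's enzymes. The sketch below illustrates that idea only; the blue-to-red linear ramp, the clamping to a fixed range, and all gene and step names are assumptions made for the example, not the Omics Viewer's actual color scheme or data model.

```python
def expression_to_color(value, vmin, vmax):
    """Map a value linearly onto a blue (low) to red (high) ramp; returns '#rrggbb'."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    clamped = min(max(value, vmin), vmax)      # out-of-range values saturate
    t = (clamped - vmin) / (vmax - vmin)       # 0.0 at vmin, 1.0 at vmax
    red = round(255 * t)
    blue = round(255 * (1 - t))
    return f"#{red:02x}00{blue:02x}"


# Hypothetical log-ratio expression values and step-to-gene assignments.
expression = {"geneA": -2.0, "geneB": 2.0, "geneC": 5.0}
step_genes = {"step1": ["geneA"], "step2": ["geneB", "geneC"]}

# One color per gene, so a step with several genes carries several colors.
step_colors = {
    step: [expression_to_color(expression[g], -2.0, 2.0) for g in genes]
    for step, genes in step_genes.items()
}
print(step_colors)
```

Keeping one color per gene, rather than averaging, mirrors the slide's wording "expression level(s) of genes" and avoids hiding disagreement between isozymes.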

34 Environment for Computational Exploration of Genomes
- Powerful ontology opens many facets of the biology to computational exploration
- Global characterization of metabolic network
- Analysis of interface between transport and metabolism
- Nutrient analysis of metabolic network

35 Pathway Tools Implementation Details
- Allegro Common Lisp
- Sun, Linux, Windows, Macintosh platforms
- Ocelot object database
- 370,000+ lines of code
- Lisp-based WWW server at
  - Manages 370+ PGDBs

36 The Common Lisp Programming Environment
- Gat studied Lisp and Java implementations of 16 programs by 14 programmers (Intelligence 11: )

37 Peter Norvig's Solution
"I wrote my version in Lisp. It took me about 2 hours (compared to a range of hours for the other Lisp programmers in the study, 3-25 for C/C++ and 4-63 for Java) and I ended up with 45 non-comment non-blank lines (compared with a range of for Lisp, and for the other languages). (That means that some Java programmer was spending 13 lines and 84 minutes to provide the functionality of each line of my Lisp program.)"
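
The "84 minutes" figure in the quote can be reproduced from the numbers that survive in the transcript: the slowest Java programmer took 63 hours, and Norvig's program has 45 lines. The short calculation below checks that arithmetic. The line-count ranges are missing from the transcript, so the 585-line figure is only what the quoted ratio of 13 implies, not a number taken from the study.

```python
java_max_hours = 63   # upper end of the "4-63 for Java" range in the quote
norvig_lines = 45     # non-comment, non-blank lines in Norvig's Lisp program

# Minutes the slowest Java programmer spent per line of Norvig's program.
minutes_per_line = java_max_hours * 60 / norvig_lines

# The quote says 13 Java lines per Lisp line; this is the line count that
# ratio implies (an inference, since the actual range is missing above).
lines_ratio = 13
implied_java_lines = lines_ratio * norvig_lines

print(minutes_per_line, implied_java_lines)
```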

38 Survey
- Please complete survey at end of each day

39 PGDB(s) That You Build
- Before you leave
  - Tar up your PGDB directory and FTP it home, email it home, or copy it to flash disk
  - We will create a backup copy of your PGDB directory if the directory is still there at the end of the tutorial
  - Delete the PGDB directory if you don't want us to back it up
  - We will not give the backed up data to anyone else
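
For attendees who prefer a script to the `tar` command, the "tar up your PGDB directory" step can be sketched with Python's standard `tarfile` module. The directory name `mycyc` and the helper `archive_pgdb` are hypothetical; the demo runs on a throwaway directory rather than a real PGDB, whose location depends on the local Pathway Tools installation.

```python
import tarfile
import tempfile
from pathlib import Path


def archive_pgdb(pgdb_dir, archive_path):
    """Write a gzip-compressed tar archive of pgdb_dir and return the archive path."""
    pgdb_dir = Path(pgdb_dir)
    archive_path = Path(archive_path)
    with tarfile.open(str(archive_path), "w:gz") as tar:
        # arcname stores only the directory name, not the full absolute path
        tar.add(str(pgdb_dir), arcname=pgdb_dir.name)
    return archive_path


# Demo on a temporary directory standing in for a real PGDB directory.
with tempfile.TemporaryDirectory() as scratch:
    fake_pgdb = Path(scratch) / "mycyc"   # hypothetical PGDB directory name
    fake_pgdb.mkdir()
    (fake_pgdb / "notes.txt").write_text("placeholder file\n")
    archive = archive_pgdb(fake_pgdb, Path(scratch) / "mycyc.tar.gz")
    with tarfile.open(str(archive), "r:gz") as tar:
        names = sorted(tar.getnames())

print(names)
```

Listing the archive contents before deleting the original directory is a cheap way to confirm that the copy you take home is complete.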

40 Information Sources
- Pathway Tools User's Guide
  - /root/aic-export/pathway-tools/ptools/11.5/doc/manuals/userguide.pdf
  - NOTE: Location of the aic-export directory can vary across different computers
- Pathway Tools Web Site
  - Publications, FAQ, programming examples, etc.
  - BioCyc Publications Page
  - MetaCyc Guide
  - Slides from this tutorial
  - BioCyc Webinars

41 Reporting Pathway Tools Problems
- Tell us:
  - What platform you are running on
  - What version of Pathway Tools you are running
  - The error message
  - Result of [1] EC(2) :zoom :count :all
  - What operation were you performing when the error occurred?
- New patches automatically downloaded and loaded when PTools starts up
- Auto-Patch
  - Tools -> Instant Patch -> Download and Activate All Patches

42 Summary
- Pathway Tools and Pathway/Genome Databases
  - Not just for pathways!
  - Computational inferences
    - Operons, metabolic pathways, pathway hole fillers
  - Editing tools
  - Analysis tools: Omics data on pathways
  - Web publishing of PGDBs
- Main classes of users:
  - Develop PGDB to extract more information from genome for genome paper
  - Develop a model-organism DB for the organism that is updated regularly and published on the web
