Computational Exploration of Metabolic Networks with Pathway Tools Part 1: Overview & Representations Suzanne Paley Bioinformatics Research Group SRI International.


1 Computational Exploration of Metabolic Networks with Pathway Tools Part 1: Overview & Representations Suzanne Paley Bioinformatics Research Group SRI International paley@ai.sri.com http://BioCyc.org/

2 SRI International Bioinformatics
Motivation: Theories of Cellular Function Too Large for One Mind to Grasp
- Example: E. coli metabolic network
  - 160 pathways involving 744 reactions and 791 substrates
- Example: E. coli genetic network
  - Control by 97 transcription factors of 1174 genes in 630 transcription units
- Past solutions:
  - Partition theories across multiple minds
  - Encode theories in natural-language text
- We cannot compute with theories in those forms
  - Evaluate theories for consistency with new data: microarrays
  - Refine theories with respect to new data
  - Compare theories describing different organisms

3 Solution: Biological Knowledge Bases
- Store biological knowledge and theories in computers in a declarative form
  - Amenable to computational analysis and generative user interfaces
- Establish ongoing efforts to curate (maintain, refine, embellish) these knowledge bases
- A high-quality, comprehensive knowledge base enables us to ask and answer important new questions

4 Terminology
- Model Organism Database (MOD): DB describing the genome and other information about an organism
- Pathway/Genome Database (PGDB): MOD that combines information about
  - Pathways, reactions, substrates
  - Enzymes, transporters
  - Genes, replicons
  - Transcription factors, promoters, operons, DNA binding sites
- BioCyc: collection of 15 PGDBs at BioCyc.org
  - EcoCyc, AgroCyc, HumanCyc

5 Pathway Tools Software
- PathoLogic
  - Prediction of metabolic network from genome
  - Computational creation of new Pathway/Genome Databases
- Pathway/Genome Editors
  - Distributed curation of genome annotations
  - Distributed object database system
  - Interactive editing tools
- Pathway/Genome Navigator
  - WWW publishing of PGDBs
  - Graphic depictions of pathways, chromosomes, operons
  - Analysis operations
    - Pathway visualization of gene-expression data
    - Global comparisons of metabolic networks

6 Pathway Tools Software
[Diagram with components: Pathway/Genome Databases; Pathway/Genome Navigator; PathoLogic Pathway Predictor; Pathway/Genome Editors]

7 Pathway/Genome Database
[Diagram of a cell, labeled: Chromosomes, Plasmids; Genes; Proteins; Reactions; Pathways; Compounds; Operons, Promoters, DNA Binding Sites]

8 Pathway Tools Algorithms
Visualization and editing tools for the following datatypes:
- Full Metabolic Map
  - Paint gene expression data on metabolic network; compare metabolic networks
- Pathways
  - Pathway prediction
- Reactions
  - Balance checker
- Compounds
  - Chemical substructure comparison
- Enzymes, Transporters, Transcription Factors
- Genes
- Chromosomes
- Operons
  - Operon prediction; visualize genetic network

9 Definitions
- Chemical reactions interconvert chemical compounds
- An enzyme is a protein that accelerates chemical reactions
- A pathway is a linked set of reactions
  - Often regulated as a unit
  - A conceptual unit of the cell's biochemical machine
[Diagrams: a reaction, A + B → C + D; a pathway, A → C → E]

20 Operations of the Metabolic Overview
- Find pathways, compounds
- Find reactions
  - By enzyme name, EC number, substrates, modulation
  - All with isozymes
  - All occurring in multiple pathways
  - By EC class, pathway class
- Find genes
  - By name, gene class
  - All regulated by a transcriptional regulator protein

21 Metabolic Overview Queries
- Species comparison
  - Highlight reactions that are
    - Shared/not-shared with
    - Any-one/All-of
    - A specified set of species
- Overlay expression data
  - Colors reflect expression level and are user-configurable
  - Can show a single experiment or an animated time series
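The species-comparison query can be read as set operations over per-organism reaction sets. The following is a minimal sketch under that reading, not the Pathway Tools implementation: the function name `highlight`, its parameters, and the reaction identifiers are all hypothetical. It assumes "not-shared with any-one" means "absent from at least one of the chosen species".

```python
def highlight(reference, others, shared=True, quantifier="any"):
    """Pick the reactions of `reference` to highlight on the overview.

    shared=True keeps reactions present in the other species,
    shared=False keeps reactions absent from them; `quantifier`
    says whether that must hold for any one or for all of `others`.
    """
    if quantifier not in ("any", "all"):
        raise ValueError("quantifier must be 'any' or 'all'")
    combine = any if quantifier == "any" else all
    return {
        rxn
        for rxn in reference
        if combine((rxn in other) == shared for other in others)
    }


# Hypothetical reaction sets, one per organism.
ecoli = {"RXN-1", "RXN-2", "RXN-3", "RXN-4"}
hpylori = {"RXN-1", "RXN-2"}
vcholerae = {"RXN-2", "RXN-3"}

shared_with_all = highlight(ecoli, [hpylori, vcholerae], shared=True, quantifier="all")
missing_from_all = highlight(ecoli, [hpylori, vcholerae], shared=False, quantifier="all")
```

Here `shared_with_all` holds the reactions common to all three organisms, and `missing_from_all` holds the reactions found only in the reference organism.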

22 EcoCyc Project
- E. coli Encyclopedia
  - Model-Organism Database for E. coli
  - Began in 1992 as a collaboration between Karp and Riley
  - Over 3500 literature citations
- Collaborative development via Internet
  - Karp (SRI) -- Bioinformatics architect
  - John Ingraham -- Advisor
  - (SRI) Metabolic pathways
  - Saier (UCSD) and Paulsen (TIGR) -- Transport
  - Collado (UNAM) -- Regulation of gene expression
- Ontology: 1000 biological classes
- Database content: 17,700 instances

23 EcoCyc = E. coli Dataset + Pathway/Genome Navigator
http://BioCyc.org/
- Genes: 4,393
- Proteins: 4,273
- Reactions: 2,760
- Pathways: 165
- Compounds: 774
- Transcription Units: 724
- Factors: 110
- Enzymes: 914
- Transporters: 162
- Promoters: 812
- TransFac Sites: 956
- Citations: 3,508

24 MetaCyc: Metabolic Encyclopedia
- Nonredundant metabolic pathway database
- Describes a representative sample of every experimentally determined metabolic pathway
- Literature-based DB with extensive references and commentary
- Pathways, reactions, enzymes, substrates
- 460 pathways, 1267 enzymes, 4294 reactions
  - 172 E. coli pathways, 2735 citations
- Nucleic Acids Research 30:59-61 2002.
- Jointly developed by SRI and Carnegie Institution
  - New focus on plant pathways

25 MetaCyc Data
- MetaCyc contains one DB object for each distinct pathway
  - Distinct in terms of reaction steps
  - Each pathway labeled with the species it occurs in
- MetaCyc pathways are experimentally determined
- 4218 reactions in MetaCyc
  - 401 lack EC numbers

26 MetaCyc Enzyme Data
- Reaction(s) catalyzed
- Alternative substrates
- Cofactors / prosthetic groups
- Activators and inhibitors
- Subunit structure
- Molecular weight, pI
- Comment, literature citations
- Species

27 MetaCyc Frequent Organisms
- Escherichia coli: 156
- Arabidopsis thaliana: 47
- Homo sapiens: 30
- Pseudomonas: 21
- Bacillus subtilis: 20
- Salmonella typhimurium: 20
- Sulfolobus solfataricus: 18
- Pseudomonas putida: 14
- Saccharomyces cerevisiae: 14
- Haemophilus influenzae: 13
- Glycine max: 11
- Deinococcus radiodurans: 10

28 EcoCyc and MetaCyc
- Review-level databases
- Data derived primarily from biomedical literature
  - Manual entry by staff curators
  - Updates by staff curators only
- Data validation
  - Consistency constraints
  - Lisp programs that verify other semantic relationships
    - Unbalanced chemical reactions
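A check for unbalanced reactions can be sketched as an element count on each side of the equation. The slides say the real validators are Lisp programs; the Python below is only an illustration, and the function names and the dictionary-based formula encoding are assumptions. The formulas are the neutral forms of the compounds in the succinate dehydrogenase reaction used later in the deck.

```python
from collections import Counter


def element_totals(side):
    """Sum element counts over one side: a list of (coefficient, formula)."""
    totals = Counter()
    for coefficient, formula in side:
        for element, count in formula.items():
            totals[element] += coefficient * count
    return totals


def imbalance(left, right):
    """Return {element: left_count - right_count} for unbalanced elements."""
    left_totals = element_totals(left)
    right_totals = element_totals(right)
    elements = set(left_totals) | set(right_totals)
    return {
        e: left_totals[e] - right_totals[e]
        for e in elements
        if left_totals[e] != right_totals[e]
    }


# Neutral-form formulas, encoded as element -> atom count.
SUCCINATE = {"C": 4, "H": 6, "O": 4}
FUMARATE = {"C": 4, "H": 4, "O": 4}
FAD = {"C": 27, "H": 33, "N": 9, "O": 15, "P": 2}
FADH2 = {"C": 27, "H": 35, "N": 9, "O": 15, "P": 2}

# succinate + FAD = fumarate + FADH2
balanced = imbalance([(1, SUCCINATE), (1, FAD)], [(1, FUMARATE), (1, FADH2)])
# Deliberately wrong right-hand side (FAD instead of FADH2).
unbalanced = imbalance([(1, SUCCINATE), (1, FAD)], [(1, FUMARATE), (1, FAD)])
```

An empty result means the reaction is balanced; a non-empty one names the elements a curator would need to look at. This sketch ignores charge, which a fuller checker would likely also track.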

29 Computationally-Derived PGDBs
[Diagram: PathoLogic Software integrates genome and pathway data to identify putative metabolic networks]
- Annotated Genomic Sequence: Genes/ORFs, Gene Products, DNA Sequences
- Multi-organism Pathway Database (MetaCyc): Reactions, Pathways, Compounds
- Pathway/Genome Database: Genomic Map, Genes, Gene Products, Reactions, Pathways, Compounds

30 PathoLogic Input/Output
- Inputs:
  - File listing genetic elements
    - http://bioinformatics.ai.sri.com/ptools/genetic-elements.dat
  - Files containing DNA sequence for each genetic element
  - Files containing annotation for each genetic element
  - MetaCyc database
- Output:
  - Pathway/genome database for the subject organism
  - Directory tree for the subject organism
  - Reports that summarize:
    - Evidence contained in the input genome for the presence of reference pathways
    - Reactions missing from inferred pathways
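The two reports can be pictured with a small sketch. It assumes the evidence for a reference pathway is simply which of its reactions have an enzyme in the annotated genome; PathoLogic's actual inference rules are not described in these slides, so this is a stand-in rather than its algorithm. The function name, the report keys, and the pathway and reaction identifiers are hypothetical.

```python
def pathway_evidence(reference_pathways, annotated_reactions):
    """Summarize, per reference pathway, which reactions the genome supports.

    reference_pathways: {pathway name: [reaction ids]}
    annotated_reactions: set of reaction ids with an enzyme in the genome
    """
    report = {}
    for pathway, reactions in reference_pathways.items():
        present = [r for r in reactions if r in annotated_reactions]
        missing = [r for r in reactions if r not in annotated_reactions]
        fraction = len(present) / len(reactions) if reactions else 0.0
        report[pathway] = {
            "present": present,
            "missing": missing,
            "fraction_present": fraction,
        }
    return report


# Hypothetical reference pathways and genome annotation.
reference = {
    "pathway-A": ["R1", "R2", "R3", "R4"],
    "pathway-B": ["R5", "R6"],
}
annotated = {"R1", "R2", "R4", "R9"}

report = pathway_evidence(reference, annotated)
```

In this toy input, `pathway-A` has strong support with one missing reaction (`R3`), while `pathway-B` has none, which is the kind of distinction the evidence and missing-reaction reports are meant to expose.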

31 PathoLogic Functionality
- Initialize schema for new PGDB
- Transform existing genome to PGDB form
- Infer metabolic pathways and store in PGDB
- Infer operons and store in PGDB
- Assist user with manual tasks
  - Assign enzymes to reactions they catalyze
  - Identify false-positive pathway predictions
  - Build protein complexes from monomers
  - Assemble Overview diagram

32 BioCyc Collection of Pathway/Genome DBs
http://BioCyc.org/
- Literature-based datasets:
  - Escherichia coli (EcoCyc)
  - MetaCyc
- PGDBs at other sites:
  - Arabidopsis thaliana (TAIR)
  - Methanococcus jannaschii (EBI)
  - Saccharomyces cerevisiae (SGD)
  - Synechocystis PCC6803
- Computationally-derived datasets:
  - Agrobacterium tumefaciens
  - Caulobacter crescentus
  - Chlamydia trachomatis
  - Bacillus subtilis
  - Helicobacter pylori
  - Haemophilus influenzae
  - Homo sapiens
  - Mycobacterium tuberculosis H37Rv
  - Mycobacterium tuberculosis CDC1551
  - Mycoplasma pneumoniae
  - Pseudomonas aeruginosa
  - Treponema pallidum
  - Vibrio cholerae
[Legend: Yellow = Open Database; the color highlighting is not preserved in this transcript]

33 HumanCyc: Human Metabolic Pathway Database
- PGDB of human metabolic pathways built using PathoLogic
- Contains information on 28,700 genes, their products, and the metabolic reactions and pathways they catalyze (no signalling pathways)
- Chromosome and contigs from Ensembl
- Human genetic loci from LocusLink
- Mitochondrion data from GenBank
- Ensembl and LocusLink gene entries were merged to eliminate redundancies where possible
- Contains links to human genome web sites
- Plan to hire one curator to refine and curate with respect to the literature over a 2-year period
  - Remove false-positive predictions
  - Insert known pathways missed by PathoLogic
  - Add comments and citations from pathways and enzymes to the literature
  - Add enzyme activators, inhibitors, cofactors, tissue information
- Funded by commercial consortium

34 BioCyc and Pathway Tools Availability
- WWW BioCyc freely available to all
  - BioCyc.org
  - Six BioCyc DBs openly available to all
- BioCyc DBs freely available to non-profits
  - Flatfiles downloadable from BioCyc.org
  - Binary executable:
    - Sun UltraSparc-170 w/ 64MB memory
    - PC, 400MHz CPU, 64MB memory, Windows-98 or newer
  - PerlCyc API
- Pathway Tools freely available to non-profits

35 Information Sources
- Pathway Tools User's Guide
  - aic-export/ecocyc/genopath/released/doc/userguide1.pdf
    - Pathway/Genome Navigator
    - Appendix A: Guide to the Pathway Tools Schema
  - aic-export/ecocyc/genopath/released/doc/userguide2.pdf
    - PathoLogic, Editing Tools
- Pathway Tools Web Site
  - http://bioinformatics.ai.sri.com/ptools/
  - Publications, programming examples, etc.
- Pathway Tools Tutorial
  - http://bioinformatics.ai.sri.com/ptools/tutorial/

36 Pathway Tools Implementation Details
- Allegro Common Lisp
- Sun and PC platforms
- Ocelot object database
- 250,000 lines of code
- Lisp-based WWW server at BioCyc.org
  - Manages 15 PGDBs

37 Frame Data Model
- Frame Data Model: organizational structure for a PGDB
- Knowledge base (KB, Database, DB)
- Frames
- Slots

38 Knowledge Base
- Collection of frames and their associated slots, values, facets, and annotations
- AKA: Database, PGDB
- Can be stored within
  - An Oracle DB
  - A disk file
  - A Pathway Tools binary program

39 Frames
- Entities with which facts are associated
- Kinds of frames:
  - Classes: Genes, Pathways, Biosynthetic Pathways
  - Instances (objects): trpA, TCA cycle
- Classes:
  - Superclass(es)
  - Subclass(es)
  - Instance(s)
- A symbolic frame name (id, key) uniquely identifies each frame

40 Slots
- Encode attributes/properties of a frame
  - Integer, real number, string
- Represent relationships between frames
  - The value of a slot is the identifier of another frame
- Every slot is described by a "slot frame" in a KB that defines meta-information about that slot

41 Properties of Slots
- Number of values
  - Single valued
  - Multivalued: sets, bags
- Slot values
  - Any Lisp object: integer, real, string, symbol (frame name)
- Slotunits define properties of slots: datatypes, classes, constraints
- Two slots are inverses if they encode opposite relationships
  - Slot Product in class Genes
  - Slot Gene in class Polypeptides
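The frame data model described on the preceding slides (frames, class/instance links, multivalued slots, inverse slots) can be illustrated with a toy in-memory store. This is a sketch for intuition only: Pathway Tools is implemented in Lisp on the Ocelot object database, and the class `KnowledgeBase`, its methods, and the frame name `TrpA-polypeptide` are hypothetical.

```python
class KnowledgeBase:
    """Toy frame store: frames hold slots, slots hold lists of values."""

    def __init__(self):
        self.frames = {}    # frame name -> {"parents": [...], "slots": {...}}
        self.inverses = {}  # slot name -> name of its inverse slot

    def add_frame(self, name, parents=()):
        # One namespace: every frame name must be unique.
        if name in self.frames:
            raise ValueError(f"frame name already in use: {name}")
        self.frames[name] = {"parents": list(parents), "slots": {}}

    def declare_inverse(self, slot, inverse):
        self.inverses[slot] = inverse
        self.inverses[inverse] = slot

    def _append(self, frame, slot, value):
        values = self.frames[frame]["slots"].setdefault(slot, [])
        if value not in values:  # set semantics for multivalued slots
            values.append(value)

    def put(self, frame, slot, value):
        """Add a slot value and, for frame-valued slots, the inverse link."""
        self._append(frame, slot, value)
        inverse = self.inverses.get(slot)
        if inverse is not None and isinstance(value, str) and value in self.frames:
            self._append(value, inverse, frame)

    def get(self, frame, slot):
        return list(self.frames[frame]["slots"].get(slot, []))

    def is_a(self, frame, cls):
        """True if `cls` is `frame` itself or one of its ancestors."""
        if frame == cls:
            return True
        return any(self.is_a(p, cls) for p in self.frames[frame]["parents"])


kb = KnowledgeBase()
kb.add_frame("Genes")
kb.add_frame("Proteins")
kb.add_frame("Polypeptides", parents=["Proteins"])
kb.add_frame("trpA", parents=["Genes"])
kb.add_frame("TrpA-polypeptide", parents=["Polypeptides"])  # hypothetical frame name

kb.declare_inverse("Product", "Gene")
kb.put("trpA", "Product", "TrpA-polypeptide")
kb.put("trpA", "Common-Name", "trpA")
```

Because `Product` and `Gene` are declared as inverses, storing the gene's product also records the polypeptide's gene, so either frame can be used as the starting point of a query.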

42 Pathway Tools Ontology
- 1064 classes
  - Main classes such as:
    - Pathways, Reactions, Compounds, Macromolecules, Proteins, Replicons, DNA-Segments (Genes, Operons, Promoters)
  - Taxonomies for Pathways, Reactions, Compounds
- 205 slots
  - Meta-data: Creator, Creation-Date
  - Comment, Citations, Common-Name, Synonyms
  - Attributes: Molecular-Weight, DNA-Footprint-Size
  - Relationships: Catalyzes, Component-Of, Product
- Classes, instances, slots all stored side by side in DBMS, share a single namespace

43 Slot Links from Gene to Pathway Frame
[Diagram of frames connected by slot links]
- Frames: Chrom; genes sdhA, sdhB, sdhC, sdhD; polypeptides Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; Succinate dehydrogenase; Enzymatic-reaction; reaction succinate + FAD = fumarate + FADH2; TCA Cycle; compounds succinate, FAD, fumarate, FADH2
- Slot links: product, component-of, catalyzes, reaction, in-pathway, left, right
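The chain of slot links on this slide can be walked programmatically. The sketch below assumes the links run gene → product → component-of → catalyzes → reaction → in-pathway, which is how the diagram labels read; the dictionary encoding and the helper `follow` are hypothetical, and only two of the four genes are included to keep it short.

```python
# Frames as plain dictionaries: frame name -> {slot name: [values]}.
FRAMES = {
    "sdhA": {"product": ["Sdh-flavo"]},
    "sdhB": {"product": ["Sdh-Fe-S"]},
    "Sdh-flavo": {"component-of": ["Succinate dehydrogenase"]},
    "Sdh-Fe-S": {"component-of": ["Succinate dehydrogenase"]},
    "Succinate dehydrogenase": {"catalyzes": ["Enzymatic-reaction"]},
    "Enzymatic-reaction": {"reaction": ["succinate + FAD = fumarate + FADH2"]},
    "succinate + FAD = fumarate + FADH2": {
        "in-pathway": ["TCA Cycle"],
        "left": ["succinate", "FAD"],
        "right": ["fumarate", "FADH2"],
    },
}

GENE_TO_PATHWAY = ["product", "component-of", "catalyzes", "reaction", "in-pathway"]


def follow(frames, start, slot_path):
    """Follow a sequence of slots from `start`, returning the frames reached."""
    current = [start]
    for slot in slot_path:
        reached = []
        for frame in current:
            for value in frames.get(frame, {}).get(slot, []):
                if value not in reached:
                    reached.append(value)
        current = reached
    return current


pathways_of_sdhA = follow(FRAMES, "sdhA", GENE_TO_PATHWAY)
```

Stopping the slot path early answers intermediate questions, for example `follow(FRAMES, "sdhB", ["product", "component-of"])` for the complex a gene product belongs to.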

44 Enzymatic-reaction frame stores properties of pairing between enzyme and reaction
[Diagram: the gene-to-pathway chain for succinate dehydrogenase (genes sdhA, sdhB, sdhC, sdhD; polypeptides Sdh-flavo, Sdh-Fe-S, Sdh-membrane-1, Sdh-membrane-2; Succinate dehydrogenase; Enzymatic-reaction; Succinate + FAD = fumarate + FADH2; TCA Cycle), annotated with example properties: EC#, Keq, Cofactors, Inhibitors, Molecular wt, pI, Left-end-position]

45 Monofunctional Monomer
[Diagram: Gene, Monomer, Enzymatic-reaction, Reaction, Pathway]

46 Bifunctional Monomer
[Diagram: Gene, Monomer, two Enzymatic-reaction frames, two Reaction frames, Pathway]

47 Monofunctional Multimer
[Diagram: Gene, Monomer, Multimer, Enzymatic-reaction, Reaction, Pathway]
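One way to see why the enzyme-reaction pairing gets its own frame: a bifunctional enzyme catalyzes two reactions, and a property such as an inhibitor may apply to only one of them. The sketch below is a hypothetical Python rendering of that idea; the class, the helper functions, and all identifiers (`enzyme-X`, `reaction-1`, `compound-I`) are illustrative and not taken from any PGDB.

```python
from dataclasses import dataclass, field


@dataclass
class EnzymaticReaction:
    """Properties that belong to one enzyme/reaction pairing."""
    enzyme: str
    reaction: str
    inhibitors: list = field(default_factory=list)
    cofactors: list = field(default_factory=list)


def reactions_catalyzed(enzyme, pairings):
    """All reactions an enzyme catalyzes, one per pairing."""
    return [p.reaction for p in pairings if p.enzyme == enzyme]


def inhibitors_for(enzyme, reaction, pairings):
    """Inhibitors recorded for one specific enzyme/reaction pairing."""
    found = []
    for p in pairings:
        if p.enzyme == enzyme and p.reaction == reaction:
            found.extend(p.inhibitors)
    return found


# A hypothetical bifunctional monomer: one enzyme, two pairings.
pairings = [
    EnzymaticReaction("enzyme-X", "reaction-1", inhibitors=["compound-I"]),
    EnzymaticReaction("enzyme-X", "reaction-2"),
]

catalyzed = reactions_catalyzed("enzyme-X", pairings)
```

Storing `inhibitors` on the enzyme frame alone would wrongly imply that `compound-I` inhibits both activities; storing it on the reaction alone would wrongly imply it inhibits every enzyme that catalyzes that reaction.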

48 Pathway and Substrates
[Diagram: Pathway; two Reaction frames; Reactant-1, Reactant-2; Product-1, Product-2; slot links in-pathway, left, right]

49 Genetic Network Representation
- Describe biological entities involved in control of transcription initiation
  - Promoters, operators, transcription factors, operons, terminators
- Describe molecular interactions among these entities
  - Modulation of transcription factor activity
  - Binding of transcription factors to DNA binding sites
  - Effects on transcription initiation

50 Ontology for Transcriptional Regulation
- One DB object defined for each biological entity and for each molecular interaction
- [Diagram labels: site001; pro001; trpL, trpE, trpD, trpC, trpB, trpA; trpLEDCBA; Int001; Int002; RpoSig70; TrpR*trp; trp; apoTrpR; Complexation reaction]
- Int001 (binding of TrpR*trp to site001) inhibits Int002 (binding of RNA Polymerase to promoter) and consequently prevents transcription of genes in the transcription unit.
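Treating each molecular interaction as its own object makes the repression logic on this slide computable. The sketch below is a deliberately simplified reading: an interaction occurs when its binding molecule is available and no other candidate interaction inhibits it. The class, the function names, and the assumption that Int002 binds RpoSig70 at pro001 (inferred from the diagram labels) are not confirmed by the slide text.

```python
from dataclasses import dataclass, field


@dataclass
class Interaction:
    """One binding interaction, stored as its own object."""
    name: str
    binder: str   # molecule that binds
    site: str     # DNA site it binds to
    inhibits: list = field(default_factory=list)  # names of inhibited interactions


def occurring(interactions, available):
    """Names of interactions that occur given the available molecules.

    Single-pass simplification: inhibition is applied once, with no
    chains of inhibitors inhibiting other inhibitors.
    """
    candidates = [i for i in interactions if i.binder in available]
    blocked = {target for i in candidates for target in i.inhibits}
    return {i.name for i in candidates if i.name not in blocked}


INTERACTIONS = [
    Interaction("Int001", binder="TrpR*trp", site="site001", inhibits=["Int002"]),
    Interaction("Int002", binder="RpoSig70", site="pro001"),  # site is an assumption
]


def transcribed(available):
    """The transcription unit is transcribed only if Int002 occurs."""
    return "Int002" in occurring(INTERACTIONS, available)
```

With only `RpoSig70` available the unit is transcribed; adding the `TrpR*trp` complex triggers Int001, which blocks Int002 and switches transcription off.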

51 Principal Classes
- Class names are capitalized, plural
- Genetic-Elements, with subclasses:
  - Chromosomes
  - Plasmids
- Genes
- Transcription-Units
- RNAs
- Proteins, with subclasses:
  - Polypeptides
  - Protein-Complexes

52 Principal Classes
- Reactions, with subclasses:
  - Transport-Reactions
- Enzymatic-Reactions
- Pathways
- Compounds-And-Elements

53 Slots in Multiple Classes
- Common-Name
- Synonyms
- Names (computed as union of Common-Name, Synonyms)
- Comment
- Citations
- DB-Links

54 Genes Slots
- Chromosome
- Left-End-Position
- Right-End-Position
- Centisome-Position
- Transcription-Direction
- Product

55 Proteins Slots
- Molecular-Weight-Seq
- Molecular-Weight-Exp
- pI
- Locations
- Modified-Form
- Unmodified-Form
- Component-Of

56 Polypeptides Slots
- Gene

57 Protein-Complexes Slots
- Components

58 Reactions Slots
- EC-Number
- Left, Right
- Substrates (computed as union of Left, Right)
- Enzymatic-Reaction
- DeltaG0
- Spontaneous?
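A computed slot such as Substrates is derived from stored slots rather than entered by hand. A minimal sketch of the "union of Left, Right" rule, assuming a reaction is represented as a dictionary keyed by slot name (the representation and the function name are hypothetical):

```python
def substrates(reaction):
    """Union of the Left and Right slot values, in first-seen order."""
    seen = []
    for compound in reaction.get("Left", []) + reaction.get("Right", []):
        if compound not in seen:
            seen.append(compound)
    return seen


sdh_reaction = {
    "Left": ["succinate", "FAD"],
    "Right": ["fumarate", "FADH2"],
}

all_substrates = substrates(sdh_reaction)
```

The same pattern would cover the Names slot, computed from Common-Name and Synonyms.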

59 Enzymatic-Reactions Slots
- Enzyme
- Reaction
- Activators
- Inhibitors
- Physiologically-Relevant
- Cofactors
- Prosthetic-Groups
- Alternative-Substrates
- Alternative-Cofactors
- Reaction-direction

60 Pathways Slots
- Reaction-List
- Predecessors
- Primaries

