Integration of E. Coli Data (E. coli Pathway and Genomic Data from BioCyc) Jesse Walsh.


1 Integration of E. Coli Data (E. coli Pathway and Genomic Data from BioCyc) Jesse Walsh

2 Outline
Description of BioCyc data
– Format
– Key Classes
How I am retrieving and storing the data
– SPDB schema
– Key tables
Recent Developments

3 BioCyc Data Format
Frames are made of slots
– Slots are made of facets
– Slot values can have annotations
[Diagram: an example frame, Reaction X, with slots Common Name, EC #, and Reactants; annotations Coefficient and Compartment; facets :VALUE-TYPE and :DOCUMENTATION]
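
The frame, slot, facet, and annotation nesting described on this slide can be sketched in Python. This is only an illustration of the nesting, not the Pathway Tools or JavaCyc API: the class names `Frame`, `Slot`, and `SlotValue` are hypothetical, and the compound name, compartment, and facet text are placeholder values.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class SlotValue:
    """One value in a slot, with optional per-value annotations."""
    value: Any
    annotations: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Slot:
    """A named slot: facets describe the slot itself, values hold the data."""
    name: str
    facets: Dict[str, Any] = field(default_factory=dict)
    values: List[SlotValue] = field(default_factory=list)


@dataclass
class Frame:
    """A frame (for example, one reaction) is a collection of named slots."""
    frame_id: str
    slots: Dict[str, Slot] = field(default_factory=dict)

    def add(self, slot: Slot) -> None:
        self.slots[slot.name] = slot


# Hypothetical example frame; all identifiers and values are placeholders.
rxn = Frame("Reaction X")
rxn.add(Slot(
    name="Reactants",
    facets={":VALUE-TYPE": "Compound", ":DOCUMENTATION": "Left-hand side"},
    values=[SlotValue("compound-a", {"Coefficient": 2, "Compartment": "cytosol"})],
))
```

Keeping annotations on the individual value, rather than on the slot, mirrors the slide's point that a coefficient belongs to one reactant, not to the Reactants slot as a whole.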

4 BioCyc Class Hierarchy…. Complicated

5 Key Classes in BioCyc
Genes
Proteins
Polypeptides (a subclass of Proteins)
Protein-Complexes (a subclass of Proteins)
Pathways
Reactions
Compounds-And-Elements
Enzymatic-Reactions
Transcription-Units
Promoters
http://bioinformatics.ai.sri.com/ptools/classes.html

6 Why not just use BioCyc?
Advantages:
– Fast access to individual objects
– Logic based assertions
Disadvantages:
– Hard to query
– Difficult to understand the structures
– Difficult to know all of what is in the database
– Difficult to integrate other types of data
Solution:
– Create a relational database

7 SPDB Schema Simple Pathway DataBase

8 Pathway
“Central” table
Allows organization of major pathways
Easy to retrieve a pathway, or all reactions that share a pathway with a specified reaction
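
The "all reactions that share a pathway with a specified reaction" lookup might look like the following self-join. This is a sketch under assumptions: the table and column names (`pathway`, `reaction`, `pathway_reaction`) are hypothetical and not taken from the actual SPDB schema, the rows are placeholders, and an in-memory SQLite database is used only so the example is self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pathway (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE reaction (id INTEGER PRIMARY KEY, name TEXT);
-- Hypothetical link table: which reactions belong to which pathway.
CREATE TABLE pathway_reaction (
    pathway_id INTEGER REFERENCES pathway(id),
    reaction_id INTEGER REFERENCES reaction(id)
);
INSERT INTO pathway VALUES (1, 'pathway-A'), (2, 'pathway-B');
INSERT INTO reaction VALUES (1, 'rxn-1'), (2, 'rxn-2'), (3, 'rxn-3');
INSERT INTO pathway_reaction VALUES (1, 1), (1, 2), (2, 3);
""")


def reactions_sharing_pathway(conn, reaction_name):
    """Return names of other reactions in any pathway containing reaction_name."""
    rows = conn.execute(
        """
        SELECT DISTINCT r2.name
        FROM reaction r1
        JOIN pathway_reaction pr1 ON pr1.reaction_id = r1.id
        JOIN pathway_reaction pr2 ON pr2.pathway_id = pr1.pathway_id
        JOIN reaction r2 ON r2.id = pr2.reaction_id
        WHERE r1.name = ? AND r2.id != r1.id
        ORDER BY r2.name
        """,
        (reaction_name,),
    ).fetchall()
    return [name for (name,) in rows]
```

Calling `reactions_sharing_pathway(conn, "rxn-1")` walks from the reaction to its pathways and back out to the other member reactions, which is the kind of query the slide says is hard to express against the frame database directly.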

9 Reaction
Reaction types include:
– Catalysis, Spontaneous, Transcription, Translation, Promoter, Transcription Factor
Transcription, Translation, Promoter, and TF reactions are all inferred reactions
Reactions are the “nodes” of networks in SPDB

10 Entity
Entities include:
– Compound, Protein (Complex/Monomer), Gene, Transcription Unit, Promoter
Entities with multiple types are represented by the most specific type in their hierarchy
– (e.g., a protein that is also a complex will be listed as “Complex”, not “Protein”)
– “Enzyme” status is stored as a participation type
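
The "most specific type" rule can be sketched as picking the deepest type in a parent map. Only the Protein to Complex/Monomer relationship comes from this slide; the `PARENT` table, the flat treatment of the other types, and the function names are assumptions made for illustration.

```python
# Hypothetical parent links. Only Protein -> Complex/Monomer is from the slides.
PARENT = {
    "Complex": "Protein",
    "Monomer": "Protein",
    "Protein": None,
    "Compound": None,
    "Gene": None,
    "Transcription Unit": None,
    "Promoter": None,
}


def depth(type_name):
    """Number of ancestors above type_name in the hierarchy."""
    d = 0
    while PARENT[type_name] is not None:
        type_name = PARENT[type_name]
        d += 1
    return d


def most_specific(types):
    """Pick the deepest (most specific) of an entity's types."""
    return max(types, key=depth)
```

With this map, an entity typed as both "Protein" and "Complex" is stored as "Complex", matching the example on the slide.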

11 Participation in Reactions
Entities participate in reactions
Information includes Km data
Unsure if condition data exists, and unsure how to access evidence data
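
A participation record linking an entity to a reaction could be modeled roughly as below. The field names (`entity`, `reaction`, `role`, `km`), the helper `enzymes_of`, and every row are hypothetical; the real SPDB columns are not shown in these slides, and the Km number is invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Participation:
    """One entity's role in one reaction (all field names are hypothetical)."""
    entity: str
    reaction: str
    role: str                    # e.g. "Reactant", "Product", "Enzyme"
    km: Optional[float] = None   # Km value when one is recorded, else None


def enzymes_of(reaction: str, rows: List[Participation]) -> List[str]:
    """Entities whose participation type in the reaction is "Enzyme"."""
    return [p.entity for p in rows if p.reaction == reaction and p.role == "Enzyme"]


# Placeholder rows; identifiers and the Km number are invented for illustration.
rows = [
    Participation("compound-a", "rxn-1", "Reactant", km=0.5),
    Participation("compound-b", "rxn-1", "Product"),
    Participation("complex-1", "rxn-1", "Enzyme"),
    Participation("complex-1", "rxn-2", "Reactant"),
]
```

Storing the role on the participation, not on the entity, lets the same complex be an enzyme in one reaction and a reactant in another.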

12 Data Links in BioCyc
[Diagram of linked data types: Pathway, Reaction, Reactants/Products, Enzymes/Cofactors, Genes, Transcriptional Unit, Promoter, Transcription Factor, Sigma Factor, Translation Reaction, Transcription Reaction, Promoter Relation, Activation/Repression, Specificity Relation]

13 Data Retrieval Strategy
[Diagram of linked data types, with retrieval steps labeled 1, 2, 3: Pathway, Reaction, Reactants/Products, Enzymes/Cofactors, Genes, Transcriptional Unit, Promoter, Transcription Factor, Sigma Factor, Translation Reaction, Transcription Reaction, Promoter Relation, Activation/Repression, Specificity Relation]

14 Improvements to SPDB
Explicitly organize pathway networks and reaction networks
Allow recursive tracing of pathway elements

15 Old Organization of Reaction Data
[Diagram: Pathway, Rxn]

16 Better Way
Explicitly link reactions in the context of individual pathways
[Diagram: Rxn, Pathway]
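
One way to read "link reactions in the context of individual pathways" is to key each reaction-to-reaction link by the pathway it belongs to. The sketch below assumes that reading; the structure `links`, the helpers `link` and `successors`, and all identifiers are hypothetical.

```python
from collections import defaultdict

# links[pathway][reaction] -> set of reactions that follow it in that pathway
links = defaultdict(lambda: defaultdict(set))


def link(pathway, upstream, downstream):
    """Record that downstream follows upstream within one pathway."""
    links[pathway][upstream].add(downstream)


def successors(pathway, reaction):
    """Reactions that follow the given reaction in the given pathway only."""
    return sorted(links.get(pathway, {}).get(reaction, set()))


# Placeholder data: rxn-1 appears in two pathways with different successors.
link("pathway-A", "rxn-1", "rxn-2")
link("pathway-B", "rxn-1", "rxn-3")
```

Because the pathway is part of the key, a reaction shared by two pathways keeps a separate set of neighbors in each one.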

17 Recursively Tracing the Data
[Diagram of linked data types, extended with Genes of TFs: Pathway, Reaction, Reactants/Products, Enzymes/Cofactors, Genes, Transcriptional Unit, Promoter, Transcription Factor, Sigma Factor, Translation Reaction, Transcription Reaction, Promoter Relation, Activation/Repression, Specificity Relation, Genes of TFs]
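
Recursive tracing from a pathway down to the genes of its transcription factors can be sketched as a depth-first walk with a visited set. The visited set matters because following a TF back to its own gene can loop. The `links` map and every identifier in it are hypothetical placeholders, not BioCyc data.

```python
def trace(start, links, seen=None):
    """Depth-first collection of every element reachable from start."""
    if seen is None:
        seen = set()
    if start in seen:
        return seen          # already visited: stops infinite loops on cycles
    seen.add(start)
    for nxt in links.get(start, ()):
        trace(nxt, links, seen)
    return seen


# Hypothetical element ids following the chain of data types on the slide.
links = {
    "pathway-A": ["rxn-1"],
    "rxn-1": ["enzyme-1"],
    "enzyme-1": ["gene-1"],
    "gene-1": ["tu-1"],
    "tu-1": ["promoter-1"],
    "promoter-1": ["tf-1"],
    "tf-1": ["gene-tf-1"],
    "gene-tf-1": ["tu-1"],   # cycle: the TF's gene sits in a TU already visited
}
```

Starting from `"pathway-A"`, the walk reaches the TF's gene and then stops when it meets the transcription unit it has already seen.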

18 Coefficient Data for Reactions
6 ATP + 3 L-serine + 3 2,3-dihydroxybenzoate → 6 diphosphate + 6 AMP + enterobactin + 9 H+
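
Coefficient data for the reaction on this slide could be held as one number per participant. The sketch below assumes a plain mapping from compound name to coefficient; the variable and function names are hypothetical, and the ASCII arrow `->` is used only for rendering.

```python
# Coefficients as listed on the slide; enterobactin has an implicit coefficient of 1.
reactants = {"ATP": 6, "L-serine": 3, "2,3-dihydroxybenzoate": 3}
products = {"diphosphate": 6, "AMP": 6, "enterobactin": 1, "H+": 9}


def side(coeffs):
    """Render one side of the equation, omitting coefficients equal to 1."""
    return " + ".join(
        name if n == 1 else f"{n} {name}" for name, n in coeffs.items()
    )


def render(reactants, products):
    """Render the whole reaction as text."""
    return f"{side(reactants)} -> {side(products)}"
```

Keeping the coefficient as a separate number, instead of leaving it inside a text equation, is what makes it usable in later queries or balance checks.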

19 To Do
MIAME experimental conditions
Explore other data in BioCyc

20 Flow of Data (The Big Picture)
Data is imported from BioCyc (EcoCyc + MetaCyc)
Changes can be made to BioCyc via Cell Designer, which will then be propagated to SPDB
BioMart is one option to directly view data in SPDB
[Diagram labels: BioCyc PGDB, SPDB, JavaCycConnection, BioCycImporter, Lisp Based DB, MySQL, Object Oriented DB, API based on JavaCyc, Cell Designer, BioMart, Researcher]

21 Data in BioCyc
                          13.1                            13.0      SPDB
Pathways                  242 (Excludes Superpathways)    237       290
Reactions                 1784                            1751      10714 (1751 not inferred, 4373 ‘orphaned’)
Enzymes                   1415                            1409
Transporters              244                             243       --
Gene product summaries    3599                            --
Genes                     4496                            4477      4523
Transcription Units       3356                            3375      3337
Citations                 18,469                          17,842    --

22 SPDB Networks

23 BioCyc Updates
2006                 2007                2008                2009
March 13, 2006       January 10, 2007    April 1, 2008       March 9, 2009
May 19, 2006         March 16, 2007      June 27, 2008       June 19, 2009
September 8, 2006    May 25, 2007        October 15, 2008
                     August 15, 2007
                     December 5, 2007
Update history shows from 1 to 5 updates per year (~3 times a year on avg)
Will have to manually check for updates and import new data into our database
“Actual curation of the data occurs within BioCyc, and the information is periodically propagated to RegulonDB.”
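
The average update rate can be checked by counting the dates listed on this slide per year. This is a small sketch; the variable names are hypothetical, and 2009 is counted as a full year even though the list may stop partway through it.

```python
from collections import Counter

# Update dates exactly as listed on the slide.
update_dates = [
    "March 13, 2006", "May 19, 2006", "September 8, 2006",
    "January 10, 2007", "March 16, 2007", "May 25, 2007",
    "August 15, 2007", "December 5, 2007",
    "April 1, 2008", "June 27, 2008", "October 15, 2008",
    "March 9, 2009", "June 19, 2009",
]

# The year is the text after the final ", " in each date string.
per_year = Counter(int(d.rsplit(", ", 1)[1]) for d in update_dates)
mean_per_year = sum(per_year.values()) / len(per_year)
```

The listed dates give 13 updates over four calendar years, which is in line with the slide's estimate of roughly three per year.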

24 SPDB Schema: Simple Pathway DataBase
[Schema diagram labels: Compound, Complex, Gene, TranscriptionUnit, Promoter, Monomer; Frame; Reactant, Product, Modifier, Cofactor, Activator, Repressor, Promoter; Catalysis, Spontaneous, Transcription, Translation, Promoter, Transcription Factor]

