SRI International Bioinformatics 4 Reactions Represents information about a reaction that is independent of enzymes that catalyze the reaction Connected to enzyme(s) via enzymatic reaction frames Classified with EC system when possible Example: 126.96.36.199 – DNA-directed DNA polymerization l Carried out by five enzymes in E. coli
SRI International Bioinformatics 5 Enzymatic Reactions (DnaE and 188.8.131.52) A necessary bridge between enzymes and “generic” versions of reactions Carries information specific to an enzyme/reaction combination: l Cofactors and prosthetic groups l Alternative substrates l Links to regulatory interactions l Kinetic data (Km, Vmax, etc.) Frame is generated when protein is associated with reaction (via protein or reaction editor)
SRI International Bioinformatics 6 Reaction Ontology
SRI International Bioinformatics 7 Where is 184.108.40.206 in the Ontology?
SRI International Bioinformatics 8 Reaction Direction Left/Right reflect direction of reaction as written by Enzyme Commission l Reflects systematic direction for different reaction classes Left/Right do not necessarily correspond to physiological direction of a reaction Get-rxn-direction(rxn) l Returns :L2R or :R2L or :BOTH or NIL l Integrates all available info about direction of this reaction u Direction explicitly curated for reaction u Direction(s) it occurs in all pathways in the PGDB u Direction(s) as specified in Enzymatic-Reactions
SRI International Bioinformatics 9 Slots of Reaction Frames Balance-state EC-number Enzymatic-reaction l Generated in protein or reaction editor In-pathway l Generated in pathway editor Left and Right Reaction-Direction l :left-to-right, :right-to-left, :reversible Spontaneous?
SRI International Bioinformatics 10 Slots of Enzymatic-Reaction Frames Enzyme Reaction Regulated-By Cofactors Kinetic slots: l Vmax, Km, Kcat, Specific-Activity, Temperature-Opt, pH-Opt Reaction-Direction l Reaction may be reversible, but this enzyme effectively only catalyzes it in one direction.
SRI International Bioinformatics 12 Reaction relationships
SRI International Bioinformatics 13 Semantic Inference Layer Genes-of-reaction (rxn) Substrates-of-reaction (rxn) Enzymes-of-reaction (rxn) Lacking-ec-number (organism) l Returns list of rxns with no ec numbers in that database Get-reaction-direction-in-pathway (pwy rxn) Reaction-type(rxn) u Indicates types of Rxn as: Small molecule rxn, transport rxn, protein-small-molecule rxn (one substrate is protein and one is a small molecule), protein rxn (all substrates are proteins), etc. All-rxns(type) l Specify the type of reaction (see above for type) Obtain-rxn-stats l Returns six values u Length of : all-rxns, transport, non-transport, etc…
SRI International Bioinformatics 14 Exercises Find all small-molecule reactions that have no enzyme but are not spontaneous (“orphan” reactions) Find all reactions that consume a given compound
SRI International Bioinformatics 15 Solutions to Exercises Find all small-molecule reactions that have no enzyme but are not spontaneous (“orphan” reactions): l (loop for r in (all-rxns :small-molecule) when (and (not (slot-has-value-p r 'enzymatic-reaction) (not (get-slot-value r 'spontaneous?))) collect r)
SRI International Bioinformatics 16 Solutions to Exercises, cont. Find all reactions that consume a given compound: (defun rxns-consuming-cpd (cpd) (append (loop for r in (get-slot-values cpd ‘appears-in-left-side-of) for dir = (get-rxn-direction r) if (or (eq dir :l2r) (eq dir :both)) collect r) (loop for r in (get-slot-values cpd ‘appears-in-right-side-of) for dir = (get-rxn-direction r) if (or (eq dir :r2l) (eq dir :both)) collect r)))
SRI International Bioinformatics 18 RNAs PGDBs only represent RNAs that are “terminal gene products” l tRNAs l rRNAs l Regulatory RNAs l Miscellaneous small RNAs Slots similar to proteins tRNAs can have an anticodon
SRI International Bioinformatics 20 The RNA Ontology
SRI International Bioinformatics 21 Pathway Tools Schema and Semantic Inference Layer: Pathways
SRI International Bioinformatics 22 What is a Pathway? An ordered set of interconnected, directed biochemical reactions Reactions form a coherent unit, e.g. l Regulated as a single unit l Evolutionarily conserved across organisms as a single unit l When combined, perform a single cellular function l Historically grouped together as a unit Includes metabolic pathways and signaling pathways Evidence for all reactions in a single organism Pathways can be linear, cyclical, branched, or some combination
SRI International Bioinformatics 23 Internal Representation of Metabolic Pathways REACTION-LIST: unordered list of reactions that comprise the pathway PREDECESSORS: list of reaction pairs that define ordering relationships between reactions. E.g. R1 R2 C A B R3 D (R2 R1) : Predecessor of R2 is R1 (R3 R1) : Predecessor of R3 is R1 (R1) : R1 has no predecessor (can be omitted)
SRI International Bioinformatics 24 What is missing from Pathway Representation? Reaction directions l Some reactions are unidirectional, but many are reversible – how do we know in which direction to draw the reaction? Main vs. side substrates A B C D E F l Main compounds form the backbone of the pathway u substrates shared between connecting reactions u major inputs and outputs. l Side compounds omitted from pathway diagrams at low detail levels l Individual reactions do not necessarily have main and side compounds – a particular substrate may be either a main or a side depending on the pathway context.
SRI International Bioinformatics 25 Computing Directionality and Mains/Sides Our philosophy: Enable curator to specify as little as possible. Compute as much as possible. This reduces redundancy and potential for inconsistencies. Example: Reactions R1: A + B C + D R2: B E Predecessors: (R2 R1) Only substrate overlap is B B must be a main substrate A must be a side substrate, R1 must proceed from right to left R2 must proceed from left to right C + D B E A
SRI International Bioinformatics 26 Unfortunately, mains, sides and reaction directions are sometimes ambiguous: At beginnings and ends of pathways l Use heuristics to determine main/side substrates at beginnings, ends of pathways l Not always what the curator wants Substrate overlap with both sides of a reaction, e.g. A + B C + D C + B E Solution: Additional slot PRIMARIES, should only be populated when necessary: PRIMARIES: (R (A B) (C)) says that for reaction R, A and B are both main reactants, and C is a main product. But…
SRI International Bioinformatics 27 More Complications… ENZYMES-NOT-USED: a reaction may be catalyzed by multiple enzymes, but not all the enzymes necessarily participate in a given pathway l Not present in the same compartment with rest of pathway enzymes l Down-regulated or not expressed under conditions in which pathway is active l ENZYMES-NOT-USED slot lists enzymes that are not involved in the pathway even though they catalyze one of its reactions. LAYOUT-ADVICE: helps software draw pathway correctly, e.g. in a cyclical pathway, tells which substrate should be at the top. HYPOTHETICAL-REACTIONS: list of reactions in the pathway that are considered hypothetical (i.e. no direct experimental evidence)
SRI International Bioinformatics 28 Polymerization Pathways … X [n] X [n+1] X  POLYMERIZATION-LINKS: specifies reactions that should be connected by a polymerization link (X R1 R1) --- REACTANT-NAME-SLOT: N-NAME --- PRODUCT-NAME-SLOT: N+1-NAME CLASS-INSTANCE-LINKS: specifies when a link should be drawn between a substrate class and some instance of it (necessary only if instance is not a member of some reaction, so no predecessor relationship can be defined) R1 --- PRODUCT-INSTANCES: X 
SRI International Bioinformatics 29 Pathway Links Can be used as an alternative or in addition to defining super-pathways Link must be to or from some main substrate in the pathway Other end of link can be a pathway, a reaction, or an arbitrary text string Software automatically computes direction of link, but curator can override it
SRI International Bioinformatics 30 Super-Pathways Collection of pathways that connect to each other via common substrates or reactions, or as part of some larger logical unit Can contain both sub-pathways and additional connecting reactions Can be nested arbitrarily REACTION-LIST: a pathway ID instead of a reaction ID in this slot means include all reactions from the specified pathway PREDECESSORS: a pathway ID instead of a tuple in this slot means include all predecessor tuples from the specified pathway
SRI International Bioinformatics 31 Signaling Pathways Signaling pathways have different layout conventions than metabolic pathways Layout is done manually by curator using specialized editor l Curator has more control l Lack of automatic layout algorithm means that diagram won’t update automatically when data changes.
SRI International Bioinformatics 33 Querying Pathways Programmatically See http://bioinformatics.ai.sri.com/ptools/ptools-resources.html (all-pathways) (base-pathways) l Returns list of all pathways that are not super-pathways (genes-of-pathway pwy) (unique-genes-of-pathway pwy) l Returns list of all genes of a pathway that are not also part of other pathways (enzymes-of-pathway pwy) (compounds-of-pathway pwy) (get-reaction-list pwy) (variants-of-pathway pwy) l Returns all pathways in the same variant class as a pathway (get-predecessors rxn pwy), (get-successors rxn pwy) (get-rxn-direction-in-pathway pwy rxn) (pathway-inputs pwy), (pathway-outputs pwy) l Returns all compounds consumed (produced) but not produced (consumed) by pathway (ignores stoichiometry)
SRI International Bioinformatics 34 Exercises Find all genes involved in metabolic pathways Find all compounds that are unique to a single pathway Find the reactions of a pathway that have multiple isozymes
SRI International Bioinformatics 35 Solutions to Exercises Find all genes involved in metabolic pathways: l (remove-duplicates (loop for p in (all-pathways) append (genes-of-pathway p))) Find all compounds that are unique to a single pathway: l (loop for p in (base-pathways) append (loop for c in (compounds-of-pathway p) when (null (remove p (pathways-of-compound c))) collect (list c p)))
SRI International Bioinformatics 36 Solutions to Exercises, cont. Find the reactions of a pathway that have multiple isozymes: l (defun rxns-w-multiple-isozymes (pwy) (loop for rxn in (get-reaction-list pwy) for enzymes = (enzymes-of-reaction rxn) when (> (length enzymes) 1) collect rxn))
SRI International Bioinformatics 37 Regulation Class Regulation with subclasses that describe different biochemical mechanisms of regulation Slots: l Regulator l Regulated-Entity l Mode l Mechanism
SRI International Bioinformatics 38 Regulation Class Taxonomy
SRI International Bioinformatics 39 Regulation of Enzyme Activity Class Regulation-of-Enzyme-Activity Each instance of the class describes one regulatory interaction Slots: l Regulator -- usually a small molecule l Regulated-Entity -- an Enzymatic-Reaction l Mechanism -- One of: u Competitive, Uncompetitive, Noncompetitive, Irreversible, Allosteric, Unkmech, Other l Mode -- One of: +, - l Physiologically-Relevant?
SRI International Bioinformatics 40 Transcription-Units, Promoters, Terminators, Binding-Sites Transcription-Unit l One or more genes, transcribed as a unit u Any gene, promoter, etc. can belong to multiple transcription-units l One or zero promoters l Zero or more terminators, binding-sites l Transcription-Direction: +, - Promoter l Absolute-plus-1-pos = transcription start site l Binds-Sigma-Factor Terminator l Rho-Dependent or Rho-Independent l Left-End-Position, Right-End-Position Binding-Site l DNA-Binding-Site or mRNA-Binding-Site l Left-End-Position, Right-End-Position l Involved-in-Regulation
SRI International Bioinformatics 41 Transcription Initiation Class Transcription-Factor-Binding Slots: l Regulator -- instance of Proteins or Complexes (a transcription-factor) l Regulated-Entity -- instance of Promoters or Transcription- Units or Genes l Mode -- One of: +, - l Associated-Binding-Site
SRI International Bioinformatics 42 Attenuation Class Transcriptional-Attenuation Several subclasses depending on type of attenuation Slots common to all: l Regulator -- Depends on subtype of attenuation l Regulated-Entity -- instance of Terminators or Genes or Transcription-Units l Mode -- One of: +, - Slots particular to one or more subclasses: l Associated-Binding-Site l Antiterminator-Start-Pos, Antiterminator-End-Pos l Pause-Start-Pos, Pause-End-Pos
SRI International Bioinformatics 43 Attenuation Subtypes Protein-Mediated-Attenuation RNA-Mediated-Attenuation Small-Molecule-Mediated-Attenuation l Regulator = A protein/RNA/small molecule l Leader transcript binds protein/RNA/small molecule and determines formation of terminator or antiterminator Ribosome-Mediated-Attenuation l Regulator = charged tRNA l Ribosome pauses, determining whether terminator or antiterminator forms RNA-Polymerase-Modification l Regulator = instance of Proteins or Complexes l Regulatory protein binds to site in transcription unit and interacts with RNA polymerase to determine termination Rho-Blocking-Antitermination
SRI International Bioinformatics 44 Regulation of Translation Class Regulation-of-Translation Several subclasses depending on type of attenuation l Compound-Mediated (i.e. riboswitch) l RNA-Mediated l Protein-Mediated Slots: l Regulator -- Depends on subtype of attenuation l Regulated-Entity -- Transcription-Unit or Gene l Mode -- One of: +, - l Mechanism – Translation-Blocking, mRNA-Degradation, and/or Translation-Attenuation l Associated-Binding-Site
SRI International Bioinformatics 45 Transcription-Unit API functions (transcription-unit-genes tu) (transcription-unit-promoter tu) (transcription-unit-binding-sites tu) (transcription-unit-mrna-binding-sites tu) (transcription-unit-terminators) (transcription-unit-transcription-factors) (terminators-affecting-gene gene) (containing-tus frame) (binding-site-transcription-factors bs)
SRI International Bioinformatics 46 Regulation API Functions (genes-regulating-gene gene) (genes-regulated-by-gene gene) l Includes all transcriptional or translational regulation (direct-activators frame) (direct-inhibitors frame) l Frame will be an enzymatic-reaction, transcription-unit, promoter, etc. – this is a low-level function (transcription-unit-activators tu) (transcription-unit-inhibitors tu) l Includes transcriptional and translational regulators, and regulators of promoter as well as direct regulators of tu
SRI International Bioinformatics 47 Exercises Find all DNA-binding-sites that regulate a gene Find only those substrate-level regulators of an enzyme that are considered physiologically relevant Find all inhibitors of an enzyme, including transcriptional, translational and substrate-level inhibition.
SRI International Bioinformatics 48 Solution to Exercises Find all DNA-binding-sites that regulate a gene: (defun gene-binding-sites (gene) (remove-duplicates (loop for tu in (containing-tus gene) append (transcription-unit-binding-sites tu)))) Find only those substrate-level regulators of an enzyme that are considered physiologically relevant: (defun relevant-regulators (enz) (loop for enzrxn in (get-slot-values enz ‘catalyzes) append (loop for regframe in (get-slot-values enzrxn ‘regulated-by) when (get-slot-value regframe ‘physiologically-relevant?) collect (get-slot-value regframe ‘regulator) )))
SRI International Bioinformatics 49 Solution to Exercises, cont. Find all inhibitors of an enzyme, including transcriptional, translational and substrate-level inhibition: (defun enzyme-inhibitors (enz) (let* ((genes (genes-of-enzyme enz)) (tus (remove-duplicates (loop for g in genes append (containing-tus g)))) (terminators (remove-duplicates (loop for g in genes append (terminators-affecting-gene g)))) (enzrxns (get-slot-values enz 'catalyzes)) ) (append (loop for enzrxn in enzrxns append (direct-inhibitors enzrxn)) (loop for tu in tus append (transcription-unit-inhibitors tu)) (loop for term in terminators append (direct-inhibitors term)) )))