Presentation is loading. Please wait.

Presentation is loading. Please wait.

V18 EcoCyc Analysis of E.coli Metabolism

Similar presentations


Presentation on theme: "V18 EcoCyc Analysis of E.coli Metabolism"— Presentation transcript:

1 V18 EcoCyc Analysis of E.coli Metabolism
E.coli genome contains 4391 predicted genes, of which 4288 code for proteins. 676 of these genes form 607 enzymes of E.coli small-molecule metabolism. Of those enzymes, 311 are protein complexes, 296 are monomers. Organization of protein complexes. Distribution of subunit counts for all EcoCyc protein complexes. The predominance of monomers, dimers, and tetramers is obvious Ouzonis, Karp, Genome Res. 10, 568 (2000) 18. Lecture WS 2004/05 Bioinformatics III

2 Ouzonis, Karp, Genome Res. 10, 568 (2000)
Reactions EcoCyc describes 905 metabolic reactions that are catalyzed by E. coli. Of these reactions, 161 are not involved in small-molecule metabolism, e.g. they participate in macromolecule metabolism such as DNA replication and tRNA charging. Of the remaining 744 reactions, 569 have been assigned to at least one pathway. Ouzonis, Karp, Genome Res. 10, 568 (2000) 18. Lecture WS 2004/05 Bioinformatics III

3 Ouzonis, Karp, Genome Res. 10, 568 (2000)
Reactions The number of reactions (744) and the number of enzymes (607) differ ... WHY?? (1) there is no one-to-one mapping between enzymes and reactions – some enzymes catalyze multiple reactions, and some reactions are catalyzed by multiple enzymes. (2) for some reactions known to be catalyzed by E.coli, the enzyme has not yet been identified. Ouzonis, Karp, Genome Res. 10, 568 (2000) 18. Lecture WS 2004/05 Bioinformatics III

4 Ouzonis, Karp, Genome Res. 10, 568 (2000)
Compounds The 744 reactions of E.coli small-molecule metabolism involve a total of 791 different substrates. On average, each reaction contains 4.0 substrates. Number of reactions containing varying numbers of substrates (reactants plus products). Fill out the plot for the data in EcoCyc (# of reactions vs. substrates) Ouzonis, Karp, Genome Res. 10, 568 (2000) 18. Lecture WS 2004/05 Bioinformatics III

5 Ouzonis, Karp, Genome Res. 10, 568 (2000)
Pathways EcoCyc describes 131 pathways: energy metabolism nucleotide and amino acid biosynthesis secondary metabolism Pathways vary in length from a single reaction step to 16 steps with an average of 5.4 steps. Length distribution of EcoCyc pathways Q: Gilt diese Verteilung (für die Länge von Schritten eines biochemischen Pfades) auch für die elementaren Moden von E.coli? A: Quatsch. Ouzonis, Karp, Genome Res. 10, 568 (2000) 18. Lecture WS 2004/05 Bioinformatics III

6 Reactions Catalyzed by More Than one Enzyme
Diagram showing the number of reactions that are catalyzed by one or more enzymes. Most reactions are catalyzed by one enzyme, some by two, and very few by more than two enzymes. For 84 reactions, the corresponding enzyme is not yet encoded in EcoCyc. What may be the reasons for isozyme redundancy? (1) the enzymes that catalyze the same reaction are homologs and have duplicated (or were obtained by horizontal gene transfer), acquiring some specificity but retaining the same mechanism (divergence) (2) the reaction is easily „invented“; therefore, there is more than one protein family that is independently able to perform the catalysis (convergence). Ouzonis, Karp, Genome Res. 10, 568 (2000) 18. Lecture WS 2004/05 Bioinformatics III

7 Enzymes that catalyze more than one reaction
Genome predictions usually assign a single enzymatic function. However, E.coli is known to contain many multifunctional enzymes. Of the 607 E.coli enzymes, 100 are multifunctional, either having the same active site and different substrate specificities or different active sites. Number of enzymes that catalyze one or more reactions. Most enzymes catalyze one reaction; some are multifunctional. The enzymes that catalyze 7 and 9 reactions are purine nucleoside phosphorylase and nucleoside diphosphate kinase. Take-home message: The high proportion of multifunctional enzymes implies that the genome projects significantly underpredict multifunctional enzymes! Ouzonis, Karp, Genome Res. 10, 568 (2000) 18. Lecture WS 2004/05 Bioinformatics III

8 Reactions participating in more than one pathway
The 99 reactions belonging to multiple pathways appear to be the intersection points in the complex network of chemical processes in the cell. E.g. the reaction present in 6 pathways corresponds to the reaction catalyzed by malate dehydrogenase, a central enzyme in cellular metabolism. Ouzonis, Karp, Genome Res. 10, 568 (2000) 18. Lecture WS 2004/05 Bioinformatics III

9 Connectivity distributions P(k) for substrates
consider in and out-degrees separately because many biochemical reactions are not reversible. a, Archaeoglobus fulgidus (archae); b, E. coli (bacterium); c, Caenorhabditis elegans (eukaryote), shown on a log–log plot, counting separately the incoming (In) and outgoing links (Out) for each substrate. kin (kout) corresponds to the number of reactions in which a substrate participates as a product (educt). d, The connectivity distribution averaged over 43 organisms. Warum ist es sinnvoll, in- und out-degree getrennt zu betrachten? Jeong et al. Nature 407, 651 (2000) 18. Lecture WS 2004/05 Bioinformatics III

10 Properties of metabolic networks
a, The histogram of the biochemical pathway lengths, l, in E. coli. b, The average path length (diameter) for each of the 43 organisms. c, d, Average number of incoming links (c) or outgoing links (d) per node for each organism. e, The effect of substrate removal on the metabolic network diameter (average shortest biochemical pathway between 2 substrates) of E. coli. In the top curve (red) the most connected substrates are removed first. In the bottom curve (green) nodes are removed randomly. M  = 60 corresponds to 8% of the total number of substrates in found in E. coli. The horizontal axis in b– d denotes the number of nodes in each organism. b–d, Archaea (magenta), bacteria (green) and eukaryotes (blue) are shown. Zeichnen Sie in (e) den erwarteten Kurvenverlauf für den Durchmeser für Entfernung eines Hub-Proteins bzw. eines zufälligen Proteins ein. 18. Lecture WS 2004/05 Bioinformatics III

11 Stoichiometric matrix
A matrix with reaction stochio-metries as columns and metabolite participations as rows. The stochiometric matrix is an important part of the in silico model. With the matrix, the methods of extreme pathway and elementary mode analyses can be used to generate a unique set of pathways P1, P2, and P3 (see future lecture). Hierzu kommt definitv eine Aufgabe a la: stellen Sie für das folgende Netzwerk die stöchiometrische Matrix auf. Papin et al. TIBS 28, 250 (2003) 18. Lecture WS 2004/05 Bioinformatics III

12 Flux balancing Any chemical reaction requires mass conservation.
Therefore one may analyze metabolic systems by requiring mass conservation. Only required: knowledge about stoichiometry of metabolic pathways and metabolic demands For each metabolite: Under steady-state conditions, the mass balance constraints in a metabolic network can be represented mathematically by the matrix equation: S · v = 0 where the matrix S is the m  n stoichiometric matrix, m = the number of metabolites and n = the number of reactions in the network. The vector v represents all fluxes in the metabolic network, including the internal fluxes, transport fluxes and the growth flux. Was ist die Grundannahme für die Anwendung von Flux Balance Analysis, d.h. Lösen der Gleichung S  v = 0 ? 18. Lecture WS 2004/05 Bioinformatics III

13 Flux balance analysis Since the number of metabolites is generally smaller than the number of reactions (m < n) the flux-balance equation is typically underdetermined. Therefore there are generally multiple feasible flux distributions that satisfy the mass balance constraints. The set of solutions are confined to the nullspace of matrix S. To find the „true“ biological flux in cells ( e.g. Heinzle, Huber, UdS) one needs additional (experimental) information, or one may impose constraints on the magnitude of each individual metabolic flux. The intersection of the nullspace and the region defined by those linear inequalities defines a region in flux space = the feasible set of fluxes. 18. Lecture WS 2004/05 Bioinformatics III

14 Feasible solution set for a metabolic reaction network
(A) The steady-state operation of the metabolic network is restricted to the region within a cone, defined as the feasible set. The feasible set contains all flux vectors that satisfy the physicochemical constrains. Thus, the feasible set defines the capabilities of the metabolic network. All feasible metabolic flux distributions lie within the feasible set, and (B) in the limiting case, where all constraints on the metabolic network are known, such as the enzyme kinetics and gene regulation, the feasible set may be reduced to a single point. This single point must lie within the feasible set. Edwards & Palsson PNAS 97, 5528 (2000) 18. Lecture WS 2004/05 Bioinformatics III

15 Summary FBA analysis constructs the optimal network utilization simply using stoichiometry of metabolic reactions and capacity constraints. For E.coli the in silico results are consistent with experimental data. FBA shows that in the E.coli metabolic network there are relatively few critical gene products in central metabolism. However, the ability to adjust to different environments (growth conditions) may be dimished by gene deletions. FBA identifies „the best“ the cell can do, not how the cell actually behaves under a given set of conditions. Here, survival was equated with growth. FBA does not directly consider regulation or regulatory constraints on the metabolic network. This can be treated separately (see future lecture). Edwards & Palsson PNAS 97, 5528 (2000) 18. Lecture WS 2004/05 Bioinformatics III

16 V19 Extreme Pathways introduced into metabolic analysis by the lab of Bernard Palsson (Dept. of Bioengineering, UC San Diego). The publications of this lab are available at The extreme pathway technique is based on the stoichiometric matrix representation of metabolic networks. All external fluxes are defined as pointing outwards. Schilling, Letscher, Palsson, J. theor. Biol. 203, 229 (2000) 18. Lecture WS 2004/05 Bioinformatics III

17 Extreme Pathways – theorem
Theorem. A convex flux cone has a set of systemically independent generating vectors. Furthermore, these generating vectors (extremal rays) are unique up to a multiplication by a positive scalar. These generating vectors will be called „extreme pathways“. (1) The existence of a systemically independent generating set for a cone is provided by an algorithm to construct extreme pathways (see below). (2) uniqueness? Let {p1, ..., pk} be a systemically independent generating set for a cone. Then follows that if pj = c´+ c´´ both c´and c´´ are positive multiples of pj. Schilling, Letscher, Palsson, J. theor. Biol. 203, 229 (2000) 18. Lecture WS 2004/05 Bioinformatics III

18 Extreme Pathways – algorithm - setup
The algorithm to determine the set of extreme pathways for a reaction network follows the pinciples of algorithms for finding the extremal rays/ generating vectors of convex polyhedral cones. Combine n  n identity matrix (I) with the transpose of the stoichiometric matrix ST. I serves for bookkeeping. Schilling, Letscher, Palsson, J. theor. Biol. 203, 229 (2000) S I ST 18. Lecture WS 2004/05 Bioinformatics III

19 separate internal and external fluxes
Examine constraints on each of the exchange fluxes as given by j  bj  j If the exchange flux is constrained to be positive  do nothing. If the exchange flux is constrained to be negative  multiply the corresponding row of the initial matrix by -1. If the exchange flux is unconstrained  move the entire row to a temporary matrix T(E). This completes the first tableau T(0). T(0) and T(E) for the example reaction system are shown on the previous slide. Each element of this matrices will be designated Tij. Starting with x = 1 and T(0) = T(x-1) the next tableau is generated in the following way: Schilling, Letscher, Palsson, J. theor. Biol. 203, 229 (2000) 18. Lecture WS 2004/05 Bioinformatics III

20 idea of algorithm (1) Identify all metabolites that do not have an unconstrained exchange flux associated with them. The total number of such metabolites is denoted by . For the example, this is only the case for metabolite C ( = 1). What is the main idea? - We want to find balanced extreme pathways that don‘t change the concentrations of metabolites when flux flows through (input fluxes are channelled to products not to accumulation of intermediates). - The stochiometrix matrix describes the coupling of each reaction to the concentration of metabolites X. - Now we need to balance combinations of reactions that leave concentrations unchanged. Pathways applied to metabolites should not change their concentrations  the matrix entries need to be brought to 0. Schilling, Letscher, Palsson, J. theor. Biol. 203, 229 (2000) 18. Lecture WS 2004/05 Bioinformatics III

21 keep pathways that do not change concentrations of internal metabolites
(2) Begin forming the new matrix T(x) by copying all rows from T(x – 1) which contain a zero in the column of ST that corresponds to the first metabolite identified in step 1, denoted by index c. (Here 3rd column of ST.) Schilling, Letscher, Palsson, J. theor. Biol. 203, 229 (2000) 1 -1 Hierzu kommt ebenfalls eine Rechenaufgabe. Wir werden das Endresultat angeben, sodass Sie Ihre Lösung überprüfen können. T(0) = T(1) = 1 -1 + 18. Lecture WS 2004/05 Bioinformatics III

22 balance combinations of other pathways
(3) Of the remaining rows in T(x-1) add together all possible combinations of rows which contain values of the opposite sign in column c, such that the addition produces a zero in this column. Schilling, et al. JTB 203, 229 1 -1 T(0) = 1 -1 T(1) = 18. Lecture WS 2004/05 Bioinformatics III

23 remove “non-orthogonal” pathways
(4) For all of the rows added to T(x) in steps 2 and 3 check to make sure that no row exists that is a non-negative combination of any other sets of rows in T(x) . One method used is as follows: let A(i) = set of column indices j for with the elements of row i = 0. For the example above Then check to determine if there exists A(1) = {2,3,4,5,6,9,10,11} another row (h) for which A(i) is a A(2) = {1,4,5,6,7,8,9,10,11} subset of A(h). A(3) = {1,3,5,6,7,9,11} A(4) = {1,3,4,5,7,9,10} If A(i)  A(h), i  h A(5) = {1,2,3,6,7,8,9,10,11} where A(6) = {1,2,3,4,7,8,9} A(i) = { j : Ti,j = 0, 1  j  (n+m) } then row i must be eliminated from T(x) Schilling et al. JTB 203, 229 Q: was ist der Sinn hiervon? A: Elementarität bzw. Minimalität. in V24 heisst A(i) nun Z(i), also der zero set. 18. Lecture WS 2004/05 Bioinformatics III

24 repeat steps for all internal metabolites
(5) With the formation of T(x) complete steps 2 – 4 for all of the metabolites that do not have an unconstrained exchange flux operating on the metabolite, incrementing x by one up to . The final tableau will be T(). Note that the number of rows in T () will be equal to k, the number of extreme pathways. Schilling et al. JTB 203, 229 18. Lecture WS 2004/05 Bioinformatics III

25 balance external fluxes
(6) Next we append T(E) to the bottom of T(). (In the example here  = 1.) This results in the following tableau: Schilling et al. JTB 203, 229 1 -1 T(1/E) = 18. Lecture WS 2004/05 Bioinformatics III

26 balance external fluxes
(7) Starting in the n+1 column (or the first non-zero column on the right side), if Ti,(n+1)  0 then add the corresponding non-zero row from T(E) to row i so as to produce 0 in the n+1-th column. This is done by simply multiplying the corresponding row in T(E) by Ti,(n+1) and adding this row to row i . Repeat this procedure for each of the rows in the upper portion of the tableau so as to create zeros in the entire upper portion of the (n+1) column. When finished, remove the row in T(E) corresponding to the exchange flux for the metabolite just balanced. Schilling et al. JTB 203, 229 18. Lecture WS 2004/05 Bioinformatics III

27 balance external fluxes
(8) Follow the same procedure as in step (7) for each of the columns on the right side of the tableau containing non-zero entries. (In this example we need to perform step (7) for every column except the middle column of the right side which corresponds to metabolite C.) The final tableau T(final) will contain the transpose of the matrix P containing the extreme pathways in place of the original identity matrix. Schilling et al. JTB 203, 229 18. Lecture WS 2004/05 Bioinformatics III

28 pathway matrix T(final) = PT = p1 p7 p3 p2 p4 p6 p5
-1 T(final) = PT = Schilling et al. JTB 203, 229 v1 v2 v3 v4 v5 v6 b1 b2 b3 b4 p1 p7 p3 p2 p4 p6 p5 1 -1 18. Lecture WS 2004/05 Bioinformatics III

29 Extreme Pathways for model system
2 pathways p6 and p7 are not shown (right below) because all exchange fluxes with the exterior are 0. Such pathways have no net overall effect on the functional capabilities of the network. They belong to the cycling of reactions v4/v5 and v2/v3. Schilling et al. JTB 203, 229 v1 v2 v3 v4 v5 v6 b1 b2 b3 b4 p1 p7 p3 p2 p4 p6 p5 1 -1 18. Lecture WS 2004/05 Bioinformatics III

30 How reactions appear in pathway matrix
In the matrix P of extreme pathways, each column is an EP and each row corresponds to a reaction in the network. The numerical value of the i,j-th element corresponds to the relative flux level through the i-th reaction in the j-th EP. Papin, Price, Palsson, Genome Res. 12, 1889 (2002) 18. Lecture WS 2004/05 Bioinformatics III

31 Properties of pathway matrix
A symmetric Pathway Length Matrix PLM can be calculated: where the values along the diagonal correspond to the length of the EPs. The off-diagonal terms of PLM are the number of reactions that a pair of extreme pathways have in common. Wie können Sie aus der Pathway Matrix P die Länge der beteiligten Pfade berechnen? Papin, Price, Palsson, Genome Res. 12, 1889 (2002) 18. Lecture WS 2004/05 Bioinformatics III

32 Properties of pathway matrix
One can also compute a reaction participation matrix PPM from P: where the diagonal correspond to the number of pathways in which the given reaction participates. Wie können Sie aus der Pathway Matrix P ermitteln, an wievielen Pfaden eine Reaktion beteiligt ist? Papin, Price, Palsson, Genome Res. 12, 1889 (2002) 18. Lecture WS 2004/05 Bioinformatics III

33 Summary (extreme pathways)
Extreme pathway analysis provides a mathematically rigorous way to dissect complex biochemical networks. The matrix products PT  P and PT  P are useful ways to interpret pathway lengths and reaction participation. However, the number of computed vectors may range in the 1000sands. Therefore, meta-methods (e.g. singular value decomposition) are required that reduce the dimensionality to a useful number that can be inspected by humans. Single value decomposition may be one useful method ... and there are more to come. Single value decomposition kommt nicht dran. Price et al. Biophys J 84, 794 (2003) 18. Lecture WS 2004/05 Bioinformatics III

34 Computational metabolomics: modelling constraints
Surviving (expressed) phenotypes must satisfy constraints imposed on the molecular functions of a cell, e.g. conservation of mass and energy. Fundamental approach to understand biological systems: identify and formulate constraints. Important constraints of cellular function: physico-chemical constraints Topological constraints Environmental constraints Regulatory constraints Price et al. Nature Rev Microbiol 2, 886 (2004) 18. Lecture WS 2004/05 Bioinformatics III

35 Physico-chemical constraints
These are „hard“ constraints: Conservation of mass, energy and momentum. Contents of a cell are densely packed  viscosity can be 100 – 1000 times higher than that of water Therefore, diffusion rates of macromolecules in cells are slower than in water. Many molecules are confined inside the semi-permeable membrane  high osmolarity. Need to deal with osmotic pressure (e.g. Na+K+ pumps) Reaction rates are determined by local concentrations inside cells Enzyme-turnover numbers are generally less than 104 s-1. Maximal rates are equal to the turnover-number multiplied by the enzyme concentration. Biochemical reactions are driven by negative free-energy change in forward direction. Price et al. Nature Rev Microbiol 2, 886 (2004) 18. Lecture WS 2004/05 Bioinformatics III

36 Topological constraints
The crowding of molecules inside cells leads to topological (3D)-constraints that affect both the form and the function of biological systems. Price et al. Nature Rev Microbiol 2, 886 (2004) 18. Lecture WS 2004/05 Bioinformatics III

37 Environmental constraints
Environmental constraints on cells are time and condition dependent: Nutrient availability, pH, temperature, osmolarity, availability of electron acceptors. E.g. Heliobacter pylori lives in the human stomach at pH=1  needs to produce NH3 at a rate that will maintain its immediate surrounding at a pH that is sufficiently high to allow survival. Ammonia is made from elementary nitrogen  H. pylori has adapted by using amino acids instead of carbohydrates as its primary carbon source. Q: welche Methode ist besser geeignet, den Effekt von Zwangsbedingungen wie die Einschränkung der Reaktionsrate eines Enzym auf das Verhalten eines Metabolismus zu beschreiben? FBA oder Extreme Pathways? A: FBA. Modellierung als Einschränkung des Flusses v1 Price et al. Nature Rev Microbiol 2, 886 (2004) 18. Lecture WS 2004/05 Bioinformatics III

38 Regulatory constraints
Regulatory constraints are self-imposed by the organism and are subject to evolutionary change  they are no „hard“ constraints. Regulatory constraints allow the cell to eliminate suboptimal phenotypic states and to confine itself to behaviors of increased fitness. Nennen Sie 5 Zwangsbedingungen, die für die Modellierung zellulärer Netzwerke wichtig sein können. Price et al. Nature Rev Microbiol 2, 886 (2004) 18. Lecture WS 2004/05 Bioinformatics III

39 Mathematical formation of constraints
There are two fundamental types of constraints: balances and bounds. Balances are constraints that are associated with conserved quantities as energy, mass, redox potential, momentum or with phenomena such as solvent capacity, electroneutrality and osmotic pressure. Bounds are constraints that limit numerical ranges of individual variables and parameters such as concentrations, fluxes or kinetic constants. Both bound and balance constraints limit the allowable functional states of reconstructed cellular metabolic networks. Price et al. Nature Rev Microbiol 2, 886 (2004) 18. Lecture WS 2004/05 Bioinformatics III

40 Tools for analyzing network states
The two steps that are used to form a solution space — reconstruction and the imposition of governing constraints — are illustrated in the centre of the figure. Several methods are being developed at various laboratories to analyse the solution space. Ci and Cj concentrations of compounds i and j; EP, extreme pathway; vi and vj fluxes through reactions i and j; v1 –v3 flux through reactions 1-3; vnet, net flux through loop. Price et al. Nature Rev Microbiol 2, 886 (2004) 18. Lecture WS 2004/05 Bioinformatics III

41 Determining optimal states
Price et al. Nature Rev Microbiol 2, 886 (2004) 18. Lecture WS 2004/05 Bioinformatics III

42 Characterizing the whole solution space
Price et al. Nature Rev Microbiol 2, 886 (2004) 18. Lecture WS 2004/05 Bioinformatics III

43 V20 Applications of Flux Balance Analysis
Many aspects of metabolism are currently being studied to understand its hierarchical and modular organization: Topology (Jeong et al. V18) Reaction fluxes (Almaas et al. - today) Epistasis (Segrè et al. - today) Coupling to gene expression (V22) 18. Lecture WS 2004/05 Bioinformatics III

44 E.coli in silico Stochiometric matrix Sij for E.coli strain MG1655 containing 537 metabolites Ai and 739 reactions j. Apply flux balance analysis: in a steady state the concentrations of all the metabolites are time independent vj is the flux of reaction j and Sij is the stoichiometric coefficient of reaction j. We denote the mass carried by reaction j producing (consuming) metabolite i by Fluxes vary widely: e.g. dimensionless flux of succinyl coenzyme A synthetase reaction is 0.185, whereas the flux of the aspartate oxidase reaction is times smaller, 2.2  10-5. Wie können Sie die Beteiligung einer Reaktion j an der Produktion eines Metaboliten i charakterisieren? 18. Lecture WS 2004/05 Bioinformatics III

45 Overall flux organization of E.coli metabolic network
a, Flux distribution for optimized biomass production on succinate (black) and glutamate (red) substrates. The solid line corresponds to the power-law fit that a reaction has flux v P(v)  (v + v0)- , with v0 = and  = 1.5. d, The distribution of experimentally determined fluxes from the central metabolism of E. coli shows power-law behaviour as well, with a best fit to P(v) v- with  = 1. Both computed and experimental flux distribution show wide spectrum of fluxes. Almaar et al., Nature 427, 839 (2004) 18. Lecture WS 2004/05 Bioinformatics III

46 Overall flux organization of E.coli metabolic network
The observed flux distribution is compatible with two different potential local flux structures: (a) a homogenous local organization would imply that all reactions producing (consuming) a given metabolite have comparable fluxes (b) a more delocalized „high-flux backbone (HFB)“ is expected if the local flux organisation is heterogenous such that each metabolite has a dominant source (consuming) reaction. Schematic illustration of the hypothetical scenario in which (a) all fluxes have comparable activity, in which case we expect kY(k)  1 and (b) the majority of the flux is carried by a single incoming or outgoing reaction, for which we should have kY(k)  k . Almaar et al., Nature 427, 839 (2004) 18. Lecture WS 2004/05 Bioinformatics III

47 Overall flux organization of E.coli metabolic network
To distinguish between these 2 schemes for each metabolite i produced (consumed) by k reactions, define where vij is the mass carried by reaction j which produces (consumes) metabolite i. If all reactions producing (consuming) metabolite i have comparable vij values, Y(k,i) scales as 1/k. If, however, the activity of a single reaction dominates we expect Y(k,i) 1 (independent of k). Beschreiben Sie die Strategie von Barabasi und Mitarbeitern, zwischen den Szenarien (a) und (b) zu unterscheiden. Almaar et al., Nature 427, 839 (2004) 18. Lecture WS 2004/05 Bioinformatics III

48 Characterizing the local inhomogeneity of the flux net
a, Measured kY(k) shown as a function of k for incoming and outgoing reactions, averaged over all metabolites, indicates that Y(k)  k-0.27, as the solid line has the slope  = 0.73. Inset shows non-zero mass flows, v^ij, producing (consuming) FAD on a glutamate-rich substrate.  an intermediate behavior is found between the two extreme cases.  the large-scale inhomogeneity observed in the overall flux distribution is also increasingly valid at the level of the individual metabolites. The more reactions that consume (produce) a given metabolite, the more likely it is that a single reaction carries most of the flux, see FAD. Almaar et al., Nature 427, 839 (2004) 18. Lecture WS 2004/05 Bioinformatics III

49 Clean up metabolic network
Simple algorithm removes for each metabolite systematically all reactions but the one providing the largest incoming (outgoing) flux distribution. The algorithm uncovers the „high-flux-backbone“ of the metabolism, a distinct structure of linked reactions that form a giant component with a star-like topology. Almaar et al., Nature 427, 839 (2004) 18. Lecture WS 2004/05 Bioinformatics III

50 Maximal flow networks glutamate rich succinate rich substrates
Directed links: Two metabolites (e.g. A and B) are connected with a directed link pointing from A to B only if the reaction with maximal flux consuming A is the reaction with maximal flux producing B. Shown are all metabolites that have at least one neighbour after completing this procedure. The background colours denote different known biochemical pathways. Almaar et al., Nature 427, 839 (2004) 18. Lecture WS 2004/05 Bioinformatics III

51 FBA-optimized network on glutamate-rich substrate
High-flux backbone for FBA-optimized metabolic network of E. coli on a glutamate-rich substrate. Metabolites (vertices) coloured blue have at least one neighbour in common in glutamate- and succinate-rich substrates, and those coloured red have none. Reactions (lines) are coloured blue if they are identical in glutamate- and succinate-rich substrates, green if a different reaction connects the same neighbour pair, and red if this is a new neighbour pair. Black dotted lines indicate where the disconnected pathways, for example, folate biosynthesis, would connect to the cluster through a link that is not part of the HFB. Thus, the red nodes and links highlight the predicted changes in the HFB when shifting E. coli from glutamate- to succinate-rich media. Dashed lines indicate links to the biomass growth reaction. (1) Pentose Phospate (11) Respiration (2) Purine Biosynthesis (12) Glutamate Biosynthesis (20) Histidine Biosynthesis (3) Aromatic Amino Acids (13) NAD Biosynthesis (21) Pyrimidine Biosynthesis (4) Folate Biosynthesis (14) Threonine, Lysine and Methionine Biosynthesis (5) Serine Biosynthesis (15) Branched Chain Amino Acid Biosynthesis (6) Cysteine Biosynthesis (16) Spermidine Biosynthesis (22) Membrane Lipid Biosynthesis (7) Riboflavin Biosynthesis (17) Salvage Pathways (23) Arginine Biosynthesis (8) Vitamin B6 Biosynthesis (18) Murein Biosynthesis (24) Pyruvate Metabolism (9) Coenzyme A Biosynthesis (19) Cell Envelope Biosynthesis (25) Glycolysis (10) TCA Cycle Almaar et al., Nature 427, 839 (2004) 18. Lecture WS 2004/05 Bioinformatics III

52 Interpretation Only a few pathways appear disconnected indicating that although these pathways are part of the HFB, their end product is only the second-most important source for another HFB metabolite. Groups of individual HFB reactions largely overlap with traditional biochemical partitioning of cellular metabolism. Almaar et al., Nature 427, 839 (2004) 18. Lecture WS 2004/05 Bioinformatics III

53 How sensitive is the HFB to changes in the environment?
b, Fluxes of individual reactions for glutamate-rich and succinate-rich conditions. Reactions with negligible flux changes follow the diagonal (solid line). Some reactions are turned off in only one of the conditions (shown close to the coordinate axes). Reactions belonging to the HFB are indicated by black squares, the rest are indicated by blue dots. Reactions in which the direction of the flux is reversed are coloured green. Only the reactions in the high-flux territory undergo noticeable differences! Type I: reactions turned on in one conditions and off in the other (symbols). Type II: reactions remain active but show an orders-in-magnitude shift in flux under the two different growth conditions. Almaar et al., Nature 427, 839 (2004) 18. Lecture WS 2004/05 Bioinformatics III

54 Flux distributions for individual reactions
Shown is the flux distribution for four selected E. coli reactions in a 50% random environment. a Triosphosphate isomerase; b carbon dioxide transport; c NAD kinase; d guanosine kinase. Reactions on the   v curve (small fluxes) have unimodal/gaussian distributions (a and c). Shifts in growth-conditions only lead to small changes of their flux values. Reactions off this curve have multimodal distributions (b and d), showing several discrete flux values under diverse conditions. Under different growth conditions they show several discrete and distinct flux values. Almaar et al., Nature 427, 839 (2004) 18. Lecture WS 2004/05 Bioinformatics III

55 Summary Metabolic network use is highly uneven (power-law distribution) at the global level and at the level of the individual metabolites. Whereas most metabolic reactions have low fluxes, the overall activity of the metabolism is dominated by several reactions with very high fluxes. E. coli responds to changes in growth conditions by reorganizing the rates of selected fluxes predominantly within this high-flux backbone. Apart from minor changes, the use of the other pathways remains unaltered. These reorganizations result in large, discrete changes in the fluxes of the HFB reactions.  may represent a universal feature of metabolic activity in all cells. Implications for metabolic engineering? 18. Lecture WS 2004/05 Bioinformatics III

56 A global view of epistasis
The relationship between genotype and phenotype is expected to be nonlinear for most common human diseases. Part of this complexity can be attributed to epistasis – the effects of a given gene on a trait can be dependent on one or more other genes. Moore, Nature Gen 37, 13 (2005) 18. Lecture WS 2004/05 Bioinformatics III

57 Why is there epistasis? Exact origin is unknown.
Suspect: canalization stabilizes phenotypes through natural selection. Hypothesis: phenotypes should be stable in the presence of mutations  they must have an underlying genetic architecture that is comprised of networks of genes that are redundant and robust. Therefore, substantial effects on the phenotype are observed only when there are multiple mutational hits to the gene network.  creates dependencies among the genes in the network. Approach: systematic study epistatis in well-understood model organisms. Here: single and double knockouts of 890 metabolic genes of yeast. Estimate growth phenotypes of all knockouts by metabolic flux analysis. Moore, Nature Gen 37, 13 (2005) 18. Lecture WS 2004/05 Bioinformatics III

58 How does one measure epistasis?
WX, WY: fitness of individual mutants WXY: fitness of double mutant Here: compute maximal rate of biomass Scaled espsilon can distinguish between ^production Vgrowth of all networks. WX = 0.7, WY = 0.7, WXY = 0.54 For deletion of gene X, define fitness as WX = 0.54, WY = 0.91, WXY = 0.54 In both cases WX  WY = 0.49 and the non-scaled epsilon cannot distinguish both cases. Moore, Nature Gen 37, 13 (2005) erschwerend/risikoerhöhend 18. Lecture WS 2004/05 Bioinformatics III

59 Examples of gene deletion interactions
full dots: essential biomass components, need to be synthesized for the cell to be able to grow. a idealized case: all biomass components are derived from independent nutrient sources. WX = SX, WY = SY, WXY = min(WX,WY) b more realistic case, parallel and mutually required pathways demand optimal allocation of a common nutrient. c simplified example of synthetic lethality: a single biomass component can be produced through 2 alternative routes. Single mutants can grow, double mutant is lethal. d complete buffering WXY = min(WX,WY) effect of gene deletion X : reduced „effective stochiometry“ SX. Moore, Nature Gen 37, 13 (2005) 18. Lecture WS 2004/05 Bioinformatics III

60 Fitness of double mutants
Fitness values of all possible double mutants relative to the expected no-epistasis values are calculated with FBA over all pairs of enzyme deletions (excluding essential genes and gene-deletions with no phenotypic consequence). The trimodal distribution is uncovered by transforming the (a) nonscaled epistasis level = WXY - WXWY into (b) the new scale defined in Table 1. The new espilon values are used to classify the interactions into buffering (green) at  > +; aggravating (red), including synthetic lethal at  = -1 and strong synthetic sick at  < -; and no epistasis (black). Here we used (-, +) = (-0.25, 0.95). Relatively few interaction pairs (gray) fall in a nondecisive area. Although the  = 1 point is the outermost value in the FBA model, in experimental measurements compensatory interactions could exceed this buffering case. Segrè et al., Nature Genet. 37, 77 (2005) 18. Lecture WS 2004/05 Bioinformatics III

61 Classification of gene interactions
(c) The classification of gene interactions is also evident in a scatter plot showing / versus  normalized to the effect of the double mutation, 1 - WXY. The ratio between the x and y axes is equal to the scaled epistasis level . (d) A schematic metabolic network showing simple examples of buffering, aggravating and multiplicative interactions (green, red and black arcs, respectively) between gene deletions (X). The synthesis of biomass (full square) from biomass components (full dots) requires an optimal allocation of a common nutrient (empty square) through intermediate metabolites (empty dots). Additional reactions (dotted arrow) may account for more subtle buffering interactions in the complete network. (e,f) Distribution of epistasis in experimental data of fitness measurements of double and single mutants in RNA viruses. The unimodal distribution of  (e) diverges into a trimodal distribution when  is used (f). While these data support the FBA-derived trimodal distribution in the [-1,1] range of  , the presence of pairs with  > 1 stresses the relevance of the additional class of such compensatory interactions (31 pairs, not shown). In viewing these results, one should keep in mind that the data are based on a heterogeneous collection of diverse experiments and may not represent a truly random set of mutations. Welche Effekte erwarten Sie für die gleichzeitige Entfernung der beiden roten/ grünen/schwarzen Proteine? Segrè et al., Nature Genet. 37, 77 (2005) 18. Lecture WS 2004/05 Bioinformatics III

62 Epistatic interactions between genes
Epistatic interactions between genes classified by functional annotation groups tend to be of a single sign (i.e. monochromatic). (a) Representation of the number of buffering and aggravating interactions within and between groups of genes defined by common preassigned annotation from the FBA model. The radii of the pies represent the total number of interactions (ranging logarithmically from 1 in the smallest pies to 35 in the largest). The red and green pie slices reflect the numbers of aggravating and buffering interactions, respectively. Monochromatic interactions, represented by whole green or red pies, are much more common than would be expected by chance. (b) Sensitivity analysis of the prevalence of monochromaticity with respect to changes in the growth conditions In each matrix, an input parameter was modified with respect to the nominal analysis: O−, oxygen concentration; C−, carbon concentration; AC, acetate (instead of glucose) supplied as carbon source. The color of the matrix element indicates the kind of interactions observed between the genes in different annotation groups: red for pure aggravating, green for pure buffering and yellow for mixed links. Segrè et al., Nature Genet. 37, 77 (2005) 18. Lecture WS 2004/05 Bioinformatics III

63 Segrè et al., Nature Genet. 37, 77 (2005)
Prism algorithm (a) The algorithm arranges a network of aggravating (red) and buffering (green) interactions into modules whose genes interact with one another in a strictly monochromatic way. This classification allows a system-level description of buffering and aggravating interactions between functional modules. Two networks with the same topology, but different permutations of link colors, can have different properties of monochromatic clusterability: permuting links 3−4 with 2−4 transforms a 'clusterable' graph (b) into a 'nonclusterable' one (c). Segrè et al., Nature Genet. 37, 77 (2005) 18. Lecture WS 2004/05 Bioinformatics III

64 Prism algorithm Prism stands for „pairwise reduction into subgraphs monochromatically“ Examples of monochromatic clustering of 3 toy networks using Prism algorithm. Prism performs agglomerative clustering with the additional feature of avoiding, when possible, the generation of clusters that interact with each other with both aggravating and buffering epistatic links. (a) And (b) show examples of networks which are clustered with no monochromatic violations, i.e. with Qmodule = 0. (c) provides an example for which a monochromatic solution does not exist. In this network, any two pairs of genes clustered in the first step will cause a monochromatic violation. The Prism algorithm would find the solution shown, which have a total of Qmodule = 1 violations. 18. Lecture WS 2004/05 Bioinformatics III

65 Agglomerative clustering
Start: assign each gene to a distinct cluster. Iteration: at each sequential clustering step, pairs of clusters are combined depending on the biological proximity, or affinity Ax,y between cluster x (size nx) and cluster y (size ny). Ax,y is computed as the linear combination of a direct affinity and an associative affinity The epistasis network, EX,Y is defined as a discretization of the X,Y values based on the cut-off parameters . EX,Y = -1 if X,Y < - and EX,Y if X,Y > +. EX,Y = 0 otherwise. End: when the whole network is covered. 18. Lecture WS 2004/05 Bioinformatics III

66 Agglomerative clustering
At every step, each cluster (x,y) is also assigned an integer Cx,y counting how many non-monochromatic connections would be formed if clusters x and y were joined (i.e. the number of clusters z that have buffering links with x and aggravating links with y, or vice versa). The algorithm hence identifies the set  of (x,y) pairs for which Cx,y = Cm, where The set  contains all the candidate pairs that, if joined at the next step, would cause the minimal possible number of monochromatic conflicts. The pair with highest Ax,y in  is then chosen as the pair of clusters to be combined. At a given step, monochromaticity is preserved if Cm = 0. The final clustering solution is assigned a total module-module monochromaticity violation number, Qmodule =  Cm, where the sum is over all the clustering steps. Q: Beschreiben Sie die Grundidee des PRISM-Algorithmus, agglomeratives Clustering (welches Abstandsmass wird benutzt?) und welches zusätzliche Kriterium (zähle die Anzahl an nicht-monochromatischen Verbindungen). 18. Lecture WS 2004/05 Bioinformatics III

67 Segrè et al., Nature Genet. 37, 77 (2005)
Example for Prism Schematic demonstration of monochromatic classification. A network of two types of connections, such as buffering (green) and aggravating (red) epistasis, is sorted into module of genes that interact with one another in a purely monochromatic way. Segrè et al., Nature Genet. 37, 77 (2005) 18. Lecture WS 2004/05 Bioinformatics III

68 Prism  monochromatic gene interaction network
Buffering (green) and aggravating (red) gene interaction network. Genes (black nodes) are grouped into monochromatically interacting modules (enclosing boxes). Gene annotations (white letters inside nodes) correlate well with the unsupervised classification. For directional buffering links, arrows point from the deletion with the larger effect to the deletion with the smaller effect (i.e., to the mutation whose fitness effect is buffered by the presence of the other). Gene names are indicated on the side of the nodes. Names consisting of the letter U followed by a number correspond to enzymatic or transport reactions with unassigned genes. Prism parameter = 0.3 was used. Segrè et al., Nature Genet. 37, 77 (2005) 18. Lecture WS 2004/05 Bioinformatics III

69 System-level view of interactions between functional modules
Notable predictions of module-module interactions include the aggravating link between LYSbs and TRPcat and the buffering one between PRObs and ATPs. 'Buffering chains', such as PENT  ATPs  PRObs  IDP, can be observed owing to the coherent directionality of the buffering links in a. Such chains are not necessarily transitive; e.g., although PENT buffers ATPs, which buffers PRObs, there is no direct buffering from PENT to PRObs. The interacting functional modules are shown at their approximate locations on a schematic metabolic chart. Functional modules are named to reflect the main common metabolic processes of the genes involved: ACAL, acetaldehyde and acetate metabolism ATPs, ATP synthase COA, pantothenate and coenzyme-A biosynthesis ETHxt, ethanol transport GLUCN, gluconeogenesis GLYC, glycolysis IDP, isocitrate dehydrogenase LYSbs, lysine biosynthesis PENT, pentose phosphate pathway PRObs, proline biosynthesis PROcat, proline catabolism PYR, pyruvate metabolism RESPIR, respiratory chain STEROL, sterol biosynthesis TCA, TCA cycle TRPcat, tryptophan catabolism URA, uracil biosynthesis. Segrè et al., Nature Genet. 37, 77 (2005) 18. Lecture WS 2004/05 Bioinformatics III

70 Segrè et al., Nature Genet. 37, 77 (2005)
Conclusion Epistatic interactions, manifested in the effects of mutations on the phenotypes caused by other mutations, may help uncover the functional organization of complex biological networks. Here, system-level epistatic interactions were studied by computing growth phenotypes of all single and double knockouts of 890 metabolic genes in S. cerevisiae, using framework of FBA. A new scale for epistasis identified a distinctive trimodal distribution of these epistatic effects, allowing gene pairs to be classified as buffering, aggravating or noninteracting. The epistatic interaction network could be organized hierarchically into function-enriched modules that interact with each other 'monochromatically' (i.e., with purely aggravating or purely buffering epistatic links). This property extends the concept of epistasis from single genes to functional units and provides a new definition of biological modularity, which emphasizes interactions between, rather than within, functional modules. Our approach can be used to infer functional gene modules from purely phenotypic epistasis measurements. Segrè et al., Nature Genet. 37, 77 (2005) 18. Lecture WS 2004/05 Bioinformatics III

71 V21 Metabolic Pathway Analysis (MPA)
Metabolic Pathway Analysis searches for meaningful structural and functional units in metabolic networks. The most promising, very similar approaches are based on convex analysis and use the sets of elementary flux modes (Schuster et al. 1999, 2000) and extreme pathways (Schilling et al. 2000). Both sets span the space of feasible steady-state flux distributions by non-decomposable routes, i.e. no subset of reactions involved in an EFM or EP can hold the network balanced using non-trivial fluxes. MPA can be used to study e.g. - routing + flexibility/redundancy of networks - functionality of networks - idenfication of futile cycles - gives all (sub)optimal pathways with respect to product/biomass yield - can be useful for calculability studies in MFA Klamt et al. Bioinformatics 19, 261 (2003) 18. Lecture WS 2004/05 Bioinformatics III

72 Metabolic Pathway Analysis: Elementary Flux Modes
The technique of Elementary Flux Modes (EFM) was developed prior to extreme pathways (EP) by Stephan Schuster, Thomas Dandekar and co-workers: Pfeiffer et al. Bioinformatics, 15, 251 (1999) Schuster et al. Nature Biotech. 18, 326 (2000) The method is very similar to the „extreme pathway“ method to construct a basis for metabolic flux states based on methods from convex algebra. Extreme pathways are a subset of elementary modes, and for many systems, both methods coincide. Are the subtle differences important? 18. Lecture WS 2004/05 Bioinformatics III

73 Two approaches for Metabolic Pathway Analysis?
The pathway P(v) is an elementary flux mode if it fulfills conditions C1 – C3. (C1) Pseudo steady-state. S  e = 0. This ensures that none of the metabolites is consumed or produced in the overall stoichiometry. (C2) Feasibility: rate ei  0 if reaction is irreversible. This demands that only thermodynamically realizable fluxes are contained in e. (C3) Non-decomposability: there is no vector v (unequal to the zero vector and to e) fulfilling C1 and C2 and that P(v) is a proper subset of P(e). This is the core characteristics for EFMs and EPs and supplies the decomposition of the network into smallest units (able to hold the network in steady state). C3 is often called „genetic independence“ because it implies that the enzymes in one EFM or EP are not a subset of the enzymes from another EFM or EP. Klamt & Stelling Trends Biotech 21, 64 (2003) 18. Lecture WS 2004/05 Bioinformatics III

74 Two approaches for Metabolic Pathway Analysis?
The pathway P(e) is an extreme pathway if it fulfills conditions C1 – C3 AND conditions C4 – C5. (C4) Network reconfiguration: Each reaction must be classified either as exchange flux or as internal reaction. All reversible internal reactions must be split up into two separate, irreversible reactions (forward and backward reaction). (C5) Systemic independence: the set of EPs in a network is the minimal set of EFMs that can describe all feasible steady-state flux distributions. Klamt & Stelling Trends Biotech 21, 64 (2003) Q: Nennen Sie die 2 Hauptunterschiede zwischen den Methoden Extreme Pathways und Elementarmodenanalyse. A: siehe Zusatzbedingungen C4 und C5. 18. Lecture WS 2004/05 Bioinformatics III

75 Two approaches for Metabolic Pathway Analysis?
A(ext) B(ext) C(ext) R1 R2 R3 R4 B R8 R7 R5 A C P R9 R6 D Klamt & Stelling Trends Biotech 21, 64 (2003) 18. Lecture WS 2004/05 Bioinformatics III

76 Reconfigured Network B A C P D
A(ext) B(ext) C(ext) R1 R2 R3 R4 B R8 R7f R7b A C P R5 R9 R6 D 3 EFMs are not systemically independent: EFM1 = EP4 + EP5 EFM2 = EP3 + EP5 EFM4 = EP2 + EP3 Klamt & Stelling Trends Biotech 21, 64 (2003) 18. Lecture WS 2004/05 Bioinformatics III

77 Property 1 of EFMs The only difference in the set of EFMs emerging upon reconfiguration consists in the two-cycles that result from splitting up reversible reactions. However, two-cycles are not considered as meaningful pathways. Valid for any network: Property 1 Reconfiguring a network by splitting up reversible reactions leads to the same set of meaningful EFMs. Klamt & Stelling Trends Biotech 21, 64 (2003) 18. Lecture WS 2004/05 Bioinformatics III

78 Software: FluxAnalyzer
What is the consequence of when all exchange fluxes (and hence all reactions in the network) are irreversible? EFMs and EPs always co-incide! Klamt & Stelling Trends Biotech 21, 64 (2003) 18. Lecture WS 2004/05 Bioinformatics III

79 Property 2 of EFMs Property 2
If all exchange reactions in a network are irreversible then the sets of meaningful EFMs (both in the original and in the reconfigured network) and EPs coincide. Klamt & Stelling Trends Biotech 21, 64 (2003) 18. Lecture WS 2004/05 Bioinformatics III

80 Reconfigured Network B A C P D
A(ext) B(ext) C(ext) R1 R2 R3 R4 B R8 R7f R7b A C P R5 R9 R6 D 3 EFMs are not systemically independent: EFM1 = EP4 + EP5 EFM2 = EP3 + EP5 EFM4 = EP2 + EP3 Klamt & Stelling Trends Biotech 21, 64 (2003) 18. Lecture WS 2004/05 Bioinformatics III

81 Comparison of EFMs and EPs
Problem EFM (network N1) EP (network N2) Recognition of 4 genetically indepen- Set of EPs does not contain operational modes: dent routes all genetically independent routes for converting (EFM1-EFM4) routes. Searching for EPs exclusively A to P. leading from A to P via B, no pathway would be found. Klamt & Stelling Trends Biotech 21, 64 (2003) 18. Lecture WS 2004/05 Bioinformatics III

82 Comparison of EFMs and EPs
Problem EFM (network N1) EP (network N2) Finding all the EFM1 and EFM2 are One would only find the optimal routes: optimal because they suboptimal EP1, not the optimal pathways for yield one mole P per optimal routes EFM1 and synthesizing P during mole substrate A EFM2. growth on A alone. (i.e. R3/R1 = 1), whereas EFM3 and EFM4 are only sub- optimal (R3/R1 = 0.5). Klamt & Stelling Trends Biotech 21, 64 (2003) 18. Lecture WS 2004/05 Bioinformatics III

83 Comparison of EFMs and EPs
Problem Analysis of network flexibility (structural robustness, redundancy): relative robustness of exclusive growth on A or B. EFM (network N1) 4 pathways convert A to P (EFM1-EFM4), whereas for B only one route (EFM8) exists. When one of the internal reactions (R4-R9) fails, for production of P from A 2 pathways will always „survive“. By contrast, removing reaction R8 already stops the production of P from B alone. EP (network N2) Only 1 EP exists for producing P by substrate A alone, and 1 EP for synthesizing P by (only) substrate B. One might suggest that both substrates possess the same redundancy of pathways, but as shown by EFM analysis, growth on substrate A is much more flexible than on B. Klamt & Stelling Trends Biotech 21, 64 (2003) 18. Lecture WS 2004/05 Bioinformatics III

84 Comparison of EFMs and EPs
Problem Relative importance of single reactions: relative importance of reaction R8. EFM (network N1) R8 is essential for producing P by substrate B, whereas for A there is no structurally „favored“ reaction (R4-R9 all occur twice in EFM1-EFM4). However, considering the optimal modes EFM1, EFM2, one recognizes the importance of R8 also for growth on A. EP (network N2) Consider again biosynthesis of P from substrate A (EP1 only). Because R8 is not involved in EP1 one might think that this reaction is not important for synthesizing P from A. However, without this reaction, it is impossible to obtain optimal yields (1 P per A; EFM1 and EFM2). Klamt & Stelling Trends Biotech 21, 64 (2003) 18. Lecture WS 2004/05 Bioinformatics III

85 Comparison of EFMs and EPs
Problem Enzyme subsets and excluding reaction pairs: suggest regulatory structures or rules. EFM (network N1) R6 and R9 are an enzyme subset. By contrast, R6 and R9 never occur together with R8 in an EFM. Thus (R6,R8) and (R8,R9) are excluding reaction pairs. (In an arbitrary composable steady-state flux distribution they might occur together.) EP (network N2) The EPs pretend R4 and R8 to be an excluding reaction pair – but they are not (EFM2). The enzyme subsets would be correctly identified. However, one can construct simple examples where the EPs would also pretend wrong enzyme subsets (not shown). Klamt & Stelling Trends Biotech 21, 64 (2003) 18. Lecture WS 2004/05 Bioinformatics III

86 Comparison of EFMs and EPs
Problem Pathway length: shortest/longest pathway for production of P from A. EFM (network N1) The shortest pathway from A to P needs 2 internal reactions (EFM2), the longest 4 (EFM4). EP (network N2) Both the shortest (EFM2) and the longest (EFM4) pathway from A to P are not contained in the set of EPs. Klamt & Stelling Trends Biotech 21, 64 (2003) 18. Lecture WS 2004/05 Bioinformatics III

87 Comparison of EFMs and EPs
Problem Removing a reaction and mutation studies: effect of deleting R7. EFM (network N1) All EFMs not involving the specific reactions build up the complete set of EFMs in the new (smaller) sub-network. If R7 is deleted, EFMs 2,3,6,8 „survive“. Hence the mutant is viable. EP (network N2) Analyzing a subnetwork implies that the EPs must be newly computed. E.g. when deleting R2, EFM2 would become an EP. For this reason, mutation studies cannot be performed easily. Klamt & Stelling Trends Biotech 21, 64 (2003) 18. Lecture WS 2004/05 Bioinformatics III

88 Integrated Analysis of Metabolic and Regulatory Networks
Sofar, studies of large-scale cellular networks have focused on their connectivities. The emerging picture shows a densely-woven web where almost everything is connected to everything. In the cell‘s metabolic network, hundreds of substrates are interconnected through biochemical reactions. Although this could in principle lead to the simultaneous flow of substrates in numerous directions, in practice metabolic fluxes pass through specific pathways ( high flux backbone, V20). Topological studies sofar did not consider how the modulation of this connectivity might also determine network properties. Therefore it is important to correlate the network topology (picture derived from EFMs and EPs) with the expression of enzymes in the cell. 18. Lecture WS 2004/05 Bioinformatics III

89 Analyze transcriptional control in metabolic networks
Regulatory and metabolic functions of cells are mediated by networks of interacting biochemical components. Metabolic flux is optimized to maximize metabolic efficiency under different conditions. Control of metabolic flow: - allosteric interactions - covalent modifications involving enzymatic activity - transcription (revealed by genome-wide expression studies) Here: N. Barkai and colleagues analyzed published experimental expression data of Saccharomyces cerevisae. Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

90 Recurrence signature algorithm
Availability of DNA microarray data  study transcriptional response of a complete genome to different experimental conditions. An essential task in studying the global structure of transcriptional networks is the gene classification. Commonly used clustering algorithms classify genes successfully when applied to relatively small data sets, but their application to large-scale expression data is limited by 2 well-recognized drawbacks: - commonly used algorithms assign each gene to a single cluster, whereas in fact genes may participate in several functions and should thus be included in several clusters - these algorithms classify genes on the basis of their expression under all experimental conditions, whereas cellular processes are generally affected only by a small subset of these conditions. Ihmels et al. Nat Genetics 31, 370 (2002) 18. Lecture WS 2004/05 Bioinformatics III

91 Recurrence signature algorithm
Aim: identify transcription „modules“ (TMs).  a set of randomly selected genes is unlikely to be identical to the genes of any TM. Yet many such sets do have some overlap with a specific TM. In particular, sets of genes that are compiled according to existing knowledge of their functional (or regulatory) sequence similarity may have a significant overlap with a transcription module. Algorithm receives a gene set that partially overlaps a TM and then provides the complete module as output. Therefore this algorithm is referred to as „signature algorithm“. Ihmels et al. Nat Genetics 31, 370 (2002) 18. Lecture WS 2004/05 Bioinformatics III

92 Recurrence signature algorithm
normalization of data identify modules classify genes into modules a, The signature algorithm. b , Recurrence as a reliability measure. The signature algorithm is applied to distinct input sets containing different subsets of the postulated transcription module. If the different input sets give rise to the same module, it is considered reliable. c, General application of the recurrent signature method. Ihmels et al. Nat Genetics 31, 370 (2002) 18. Lecture WS 2004/05 Bioinformatics III

93 Correlation between genes of the same metabolic pathway
Distribution of the average correlation between genes assigned to the same metabolic pathway in the KEGG database. The distribution corresponding to random assignment of genes to metabolic pathways of the same size is shown for comparison. Importantly, only genes coding for enzymes were used in the random control. Interpretation: pairs of genes associated with the same metabolic pathway show a similar expression pattern. However, typically only a set of the genes assigned to a given pathway are coregulated. Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

94 Correlation between genes of the same metabolic pathway
Genes of the glycolysis pathway (according KEGG) were clustered and ordered based on the correlation in their expression profiles. Shown here is the matrix of their pair-wise correlations. The cluster of highly correlated genes (orange frame) corresponds to genes that encode the central glycolysis enzymes. The linear arrangement of these genes along the pathway is shown at right. Of the 46 genes assigned to the glycolysis pathway in the KEGG database, only 24 show a correlated expression pattern. In general, the coregulated genes belong to the central pieces of pathways. Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

95 Coexpressed enzymes often catalyze linear chain of reactions
Coregulation between enzymes associated with central metabolic pathways. Each branch corresponds to several enzymes. In the cases shown, only one of the branches downstream of the junction point is coregulated with upstream genes. Interpretation: coexpressed enzymes are often arranged in a linear order, corresponding to a metabolic flow that is directed in a particular direction. Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

96 Co-regulation at branch points
To examine more systematically whether coregulation enhances the linearity of metabolic flow, analyze the coregulation of enzymes at metabolic branch-points. Search KEGG for metabolic compounds that are involved in exactly 3 reactions. Only consider reactions that exist in S.cerevisae. 3-junctions can integrate metabolic flow (convergent junction) or allow the flow to diverge in 2 directions (divergent junction). In the cases where several reactions are catalyzed by the same enzymes, choose one representative so that all junctions considered are composed of precisely 3 reactions catalyzed by distinct enzymes. Each 3-junction is categorized according to the correlation pattern found between enzymes catalyzing its branches. Correlation coefficients > 0.25 are considered significant. Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

97 Coregulation pattern in three-point junctions
All junctions corresponding to metabolites that participate in exactly 3 reactions (according to KEGG) were identified and the correlations between the genes associated with each such junction were calculated. The junctions were grouped according to the directionality of the reactions, as shown. Divergent junctions, which allow the flow of metabolites in two alternative directions, predominantly show a linear coregulation pattern, where one of the emanating reaction is correlated with the incoming reaction (linear regulatory pattern) or the two alternative outgoing reactions are correlated in a context-dependent manner with a distinct isozyme catalyzing the incoming reaction (linear switch). By contrast, the linear regulatory pattern is significantly less abundant in convergent junctions, where the outgoing flow follows a unique direction, and in conflicting junctions that do not support metabolic flow. Most of the reversible junctions comply with linear regulatory patterns. Indeed, similar to divergent junctions, reversible junctions allow metabolites to flow in two alternative directions. Reactions were counted as coexpressed if at least two of the associated genes were significantly correlated (correlation coefficient >0.25). As a random control, we randomized the identity of all metabolic genes and repeated the analysis. In the majority of divergent junctions, only one of the emanating branches is significantly coregulated with the incoming reaction that synthesizes the metabolite. Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

98 Co-regulation at branch points: conclusions
The observed co-regulation patterns correspond to a linear metabolic flow, whose directionality can be switched in a condition-specific manner. When analyzing junctions that allow metabolic flow in a larger number of directions, there also only a few important branches are coregulated with the incoming branch. Therefore: transcription regulation is used to enhance the linearity of metabolic flow, by biasing the flow toward only a few of the possible routes. Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

99 Connectivity of metabolites
The connectivity of a given metabolite is defined as the number of reactions connecting it to other metabolites. Shown are the distributions of connectivity between metabolites in an unrestricted network () and in a network where only correlated reactions are considered (). In accordance with previous results (Jeong et al. 2000) , the connectivity distribution between metabolites follows a power law (log-log plot). In contrast, when coexpression is used as a criterion to distinguish functional links, the connectivity distribution becomes exponential (log-linear plot). Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

100 Differential regulation of isozymes
Observe that isozymes at junction points are often preferentially coexpressed with alternative reactions.  investigate their role in the metabolic network more systematically. Two possible functions of isozymes associated with the same metabolic reaction. An isozyme pair could provide redundancy which may be needed for buffering genetic mutations or for amplifying metabolite production. Redundant isozymes are expected to be coregulated. Alternatively, distinct isozymes could be dedicated to separate biochemical pathways using the associated reaction. Such isozymes are expected to be differentially expressed with the two alternative processes. Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

101 Differential regulation of isozymes in central metabolic PW
Arrows represent metabolic pathways composed of a sequence of enzymes. Coregulation is indicated with the same color (e.g., the isozyme represented by the green arrow is coregulated with the metabolic pathway represented by the green arrow).  Most members of isozyme pairs are separately coregulated with alternative processes. Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

102 Differential regulation of isozymes
Regulatory pattern of all gene pairs associated with a common metabolic reaction (according to KEGG). All such pairs were classified into several classes: (1) parallel, where each gene is correlated with a distinct connected reaction (a reaction that shares a metabolite with the reaction catalyzed by the respective gene pair); (2) selective, where only one of the enzymes shows a significant correlation with a connected reaction; and (3) converging, where both enzymes were correlated with the same reaction. Correlations coefficients >0.25 were considered significant. To be counted as parallel, rather than converging, we demanded that the correlation with the alternative reaction be <80% of the correlation with the preferred reaction. Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

103 Differential regulation of isozymes: interpretation
The primary role of isozyme multiplicity is to allow for differential regulation of reactions that are shared by separated processes. Dedicating a specific enzyme to each pathway may offer a way of independently controlling the associated reaction in response to pathway-specific requirements, at both the transcriptional and the post-transcriptional levels. Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

104 Genes coexpressed with metabolic pathways
Identify the coregulated subparts of each metabolic pathway and identify relevant experimental conditions that induce or repress the expression of the pathway genes. Also associate additional genes showing similar expression profiles with each pathway using the signature algorithm. Input: set of genes, some of which are expected to be coregulated. Output: coregulated part of the input and additional coregulated genes together with the set of conditions where the coregulation is realized. Numerous genes were found that are not directly involved in enzymatic steps: - transporters - transcription factors Q: Von welchen Proteinklassen erwarten Sie, dass sie mit den Proteinen eines biochemischen Pfades co-exprimiert sind? A: Feeder-Pathways, Transporter, Transkriptions- faktoren. 18. Lecture WS 2004/05 Bioinformatics III

105 Co-expression of transporters
Transporter genes are co-expressed with the relevant metabolic pathways providing the pathways with its metabolites. Co-expression is marked in green. Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

106 Co-regulation of transcription factors
Transcription factors are often co-regulated with their regulated pathways. Shown here are transcription factors which were found to be co-regulated in the analysis. Co-regulation is shown by color-coding such that the transcription factor and the associated pathways are of the same color. Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

107 Hierarchical modularity in the metabolic network
Sofar: co-expression analysis revealed a strong tendency toward coordinated regulation of genes involved in individual metabolic pathways. Does transcription regulation also define a higher-order metabolic organization, by coordinated expression of distinct metabolic pathways? Based on observation that feeder pathways (which synthesize metabolites) are frequently coexpressed with pathways using the synthesized metabolites. Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

108 Feeder-pathways/enzymes
Feeder pathways or genes co-expressed with the pathways they fuel. The feeder pathways (light blue) provide the main pathway (dark blue) with metabolites in order to assist the main pathway, indicating that co-expression extends beyond the level of individual pathways. These results can be interpreted in the following way: the organism will produce those enzymes that are needed. Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

109 Hierarchical modularity in the metabolic network
Derive hierarchy by applying an iterative signature algorithm to the metabolic pathways, and decreasing the resolution parameter (coregulation stringency) in small steps. Each box contains a group of coregulated genes (transcription module). Strongly associated genes (left) can be associated with a specific function, whereas moderately correlated modules (right) are larger and their function is less coherent. The merging of 2 branches indicates that the associated modules are induced by similar conditions. All pathways converge to one of 3 low-resolution modules: amino acid biosynthesis, protein synthesis, and stress. Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

110 Hierarchical modularity in the metabolic network
Although amino acids serve as building blocks for proteins, the expression of genes mediating these 2 processes is clearly uncoupled! This may reflect the association of rapid cell growth (which triggers enhanced protein synthesis) with rich growth conditions, where amino acids are readily available and do not need to be synthesized. Amino acid biosynthesis genes are only required when external amino acids are scarce. In support of this view, a group of amino acid transporters converged to the protein synthesis module, together with other pathways required for rapid cell growth (glucose fermentation, nucleotide synthesis and fatty acid synthesis). Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

111 Global network properties
Jeong et al. showed that the structural connectivity between metabolites imposes a hierarchical organization of the metabolic network. That analysis was based on connectivity between substrates, considering all potential connections. Here, analysis is based on coexpression of enzymes. In both approaches, related metabolic pathways were clustered together! There are, however, some differences in the particular groupings (not discussed here), and importantly, when including expression data the connectivity pattern of metabolites changes from a power-law dependence to an exponential one corresponding to a network structure with a defined scale of connectivity. This reflects the reduction in the complexity of the network. Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

112 Summary Transcription regulation is prominently involved in shaping the metabolic network of S. cerevisae. Transcription leads the metabolic flow toward linearity. Individual isozymes are often separately coregulated with distinct processes, providing a means of reducing crosstalk between pathways using a common reaction. Transcription regulation entails a higher-order structure of the metabolic network. It exists a hierarchical organization of metabolic pathways into groups of decreasing expression coherence. Ihmels, Levy, Barkai, Nat. Biotech 22, 86 (2004) 18. Lecture WS 2004/05 Bioinformatics III

113 V24 Framework for computation of elementary modes
18. Lecture WS 2004/05 Bioinformatics III

114 Definition and benefits of Elementary Modes
Consider a metabolic network with m metabolites and q reactions. Reactions may involve further metabolites that are not considered as proper members of the system of study. These are considered to be buffered, and are called external metabolites in opposition to the m metabolites within the boundary of the system, called internal metabolites. The stoichiometry matrix N is an m × q matrix. Its element nij is the signed stoichiometric coefficient of metabolite i in reaction j with the following sign convention: negative for educts, positive for products. 18. Lecture WS 2004/05 Bioinformatics III

115 Definition and benefits of Elementary Modes
Some reactions, called irreversible reactions, are thermodynamically feasible in only one direction under the normal conditions of the system. Therefore, the reaction indices are split into two sets: Irrev (the set of irreversible reaction indices) and Rev (the set of reversible reaction indices). A flux vector (flux distribution), denoted v, is a q-vector of the reaction space q, in which each element vi describes the net rate of the ith reaction. Sometimes we are interested only in the relative proportions of fluxes in a flux vector. In this sense, two flux vectors v and v' can be seen to be equivalent, denoted by v ≃ v', if and only if there is some α > 0 such that v = α · v'. 18. Lecture WS 2004/05 Bioinformatics III

116 Definition and benefits of Elementary Modes
Metabolism involves fast reactions and high turnover of substances compared to events of gene regulation. Therefore, it is often assumed that metabolite concentrations and reaction rates are equilibrated, thus constant, in the timescale of study. The metabolic system is then considered to be in quasi steady state. This assumption implies Nv = 0. Thermodynamics impose the rate of each irreversible reaction to be nonnegative. Consequently the set of feasible flux vectors is restricted to P = {v  q : Nv = 0 and vi ≥ 0, i  Irrev}     (1) P is a set of q-vectors that obey a finite set of homogeneous linear equalities and inequalities, namely the |Irrev| inequalities defined by vi ≥ 0, i  Irrev and the m equalities defined by Nv = 0. P is therefore – by definition – a convex polyhedral cone. Q: Welche Bedingungen müssen gültige Flüsse in einem metabolischen Netz erfüllen? 18. Lecture WS 2004/05 Bioinformatics III

117 Elementary Flux Modes Metabolic pathway analysis serves to describe the (infinite) set P of feasible states by providing a (finite) set of vectors that allow the generation of any vectors of P and are of fundamental importance for the overall capabilities of the metabolic system. One of this set is the so-called set of elementary (flux) modes (EMs). For a given flux vector v, we note R(v) = {i : vi ≠ 0} the set of indices of the reactions participating in v. R(v) can be seen as the underlying pathway of v. 18. Lecture WS 2004/05 Bioinformatics III

118 Elementary Flux Modes By definition, a flux vector e is an elementary mode (EM) if and only if it fulfills the following three conditions: In other words, e is an EM if and only if it works at quasi steady state, is thermodynamically feasible and there is no other non-null flux vector (up to a scaling) that both satisfies these constraints and involves a proper subset of its participating reactions. With this convention, reversible modes are here considered as two vectors of opposite directions. Q: welche Eigenschaften müssen Elementarmoden erfüllen? 18. Lecture WS 2004/05 Bioinformatics III

119 Applications of elementary modes in metabolic studies
(1) Identification of pathways: The set of EMs comprises all admissible routes through the network and thus of "pathways" in the classical sense, i.e. of routes that convert some educts into some products. (2) Network flexibility: The number of EMs is at least a rough measure of the network's flexibility (redundancy, structural robustness) to perform a certain function. (3) Identification of all pathways with optimal yield: Consider the linear optimization problem, where all flux vectors with optimal product yield are to be identified, i.e. where the moles of products generated per mole of educts is maximal. Then, one or several of the EMs reach this optimum and any optimal flux vector is a convex combination of these optimal EMs. (4) Importance of reactions: The importance or relevance of a reaction can be assessed by its participation frequency or/and flux values in the EMs. (4a) Inference of viability of mutants: If a reaction is involved in all growth-related EMs its deletion can be predicted to be lethal, since all EMs would disappear. 18. Lecture WS 2004/05 Bioinformatics III

120 Applications of elementary modes in metabolic studies
(5) Reaction correlations: EMs can be used to analyze structural couplings between reactions, which might give hints for underlying regulatory circuits. An extreme case is an enzyme (or reaction) subset (set of reactions which can operate only together) or a pair of mutually excluding reactions (two reactions never occurring together in any EM). (6) Detection of thermodynamically infeasible cycles: EMs representing internal cycles (without participation of external material or energy sources) are infeasible by laws of thermodynamics and thus reflect structural inconsistencies. Q: wieso sind interne geschlossene Zyklen nicht zulässig? Q: wie kann man durch EM-Analyse lethale Mutanten in einem metabolischen Netz identifizieren? 18. Lecture WS 2004/05 Bioinformatics III

121 A unified framework - Elementary modes as extreme rays in networks of irreversible reactions
In the particular case of a metabolic system with only irreversible reactions, the set of admissible reactions reads: Compared with (1) P is in this case a particular, namely a pointed polyhedral cone. A pointed polyhedral cone. Dashed lines represent virtual cuts of unbounded areas 18. Lecture WS 2004/05 Bioinformatics III

122 Pointed polyhedral cones
This geometry can be intuitively understood, noting that there are certainly 'enough' intersecting half-spaces (given by the inequalities v ≥ 0) to have this 'pointed' effect in 0: P contains no real line (otherwise there coexist x and -x not null in P, a contradiction with the constraint v ≥ 0). The figure even suggests that a pointed polyhedral cone can be either defined in an implicit way, by the set of constraints as we did until now, or in an explicit or generative way, by its 'edges', the so-called extreme rays (or generating vectors) that unambiguously define its boundaries. In the following, we show that elementary modes always correspond to extreme rays of a particular pointed cone as defined in (3) and that their computation therefore matches to the so-called extreme ray enumeration problem, i.e. the problem of enumerating all extreme rays of a pointed polyhedral cone defined by its constraints. ! Q: wann kann eine Linie zu dem Lösungs- raum der gültigen Flüssen gehören? 18. Lecture WS 2004/05 Bioinformatics III

123 Pointed polyhedral cone – more precise
Definition P is a pointed polyhedral cone of d if and only if P is defined by a full rank h × d matrix A (rank(A) = d) such that, Insert: the rank of a matrix is the dimension of the range of the matrix, corresponding to the number of linearly independent rows or columns of the matrix. The h rows of the matrix A represent h linear inequalities, whereas the full rank mention imposes the "pointed" effect in 0. Note that a pointed polyhedral cone is, in general, not restricted to be located completely in the positive orthant as in (3). For example, the cone considered in extreme-pathway analysis may have negative parts (namely for exchange reactions), however, by using a particular configuration it is ensured that the spanned cone is pointed. 18. Lecture WS 2004/05 Bioinformatics III

124 Extreme rays A vector r is said to be a ray of P(A) if r ≠ 0 and for all α > 0, α · r  P(A). We identify two rays r and r' if there is some α > 0 such that r = α · r' and we denote r ≃ r', analogous as introduced above for flux vectors. For any vector x in P(A), the zero set or active set Z(x) is the set of inequality indices satisfied by x with equality. Noting Ai• the ith row of A, Z(x) = {i : Ai•x = 0}. Zero sets can be used to characterize extreme rays. Q: Welche Beziehung herrscht zwischen Z(x), dem Zero-Set, und R(x), dem Reaktionssatz? A: Z(x) ist das Komplement von R(x). 18. Lecture WS 2004/05 Bioinformatics III

125 Extreme rays - definition
Definition 1: Extreme ray Let r be a ray of the pointed polyhedral cone P(A). The following statements are equivalent: (a) r is an extreme ray of P(A) (b) if r' is a ray of P(A) with Z(r)  Z(r') then r' ≃ r Since A is full rank, 0 is the unique vector that solves all constraints with equality. The extreme rays are those rays of P(A) that solve a maximum but not all constraints with equalities. This is expressed in (b) by requiring that no other ray in P(A) solves the same constraints plus additional ones with equalities. Note that in (b) Z(r) = Z(r') consequently holds. An important property of the extreme rays is that they form a finite set of generating vectors of the pointed cone: any vector of P(A) can be expressed as a non-negative linear combination of extreme rays, and the converse is true: all non-negative combinations of extreme rays lie in P(A). The set of extreme rays is the unique minimal set of generating vectors of a pointed cone P(A) (up to positive scalings). 18. Lecture WS 2004/05 Bioinformatics III

126 Elementary modes Lemma 1: EMs in networks of irreversible reactions
In a metabolic system where all reactions are irreversible, the EMs are exactly the extreme rays of P = {v  q : Nv = 0 and v ≥ 0}. 18. Lecture WS 2004/05 Bioinformatics III

127 Notations We denote the original reaction network by S and the reconfigured network (with all reversible reactions split up) by S'. The reactions of S are indexed from 1 to q. Irrev denotes the set of irreversible reaction indices and Rev the reversible ones. An irreversible reaction indexed i gives rise to a reaction of S' indexed i. A reversible reaction indexed i gives rise to two opposite reactions of S' indexed by the pairs (i,+1) and (i,-1) for the forward and the backward respectively. The reconfiguration of a flux vector v  q of S is a flux vector v'   Irrev  Rev × {-1;+1} of S' such that 18. Lecture WS 2004/05 Bioinformatics III

128 Notations Let N' be the stoichiometry matrix of S'. N' can be written as N' = [N - NRev] where NRev consists of all columns of N corresponding to reversible reactions. Note that if v is a flux vector of S and v' is its reconfiguration then Nv = N'v'. If possible, i.e. if v'  Irrev  Rev × {-1;+1} is such that for any reversible reaction index i  Rev at least one of the two coefficients v'(i,+1) or v'(i,-1) equals zero, then we define the reverse operation, called back-configuration that maps v' back to a flux vector v such that: 18. Lecture WS 2004/05 Bioinformatics III

129 Theorem 1: EMs in original and in reconfigured networks
Let S be a metabolic system and S' its reconfiguration by splitting up reversible reactions. Then the set of EMs of S' is the union of a) the set of reconfigured EMs of S b) the set of two-cycles made of a forward and a backward reaction of S' derived from the same reversible reaction of S 18. Lecture WS 2004/05 Bioinformatics III

130 V25 Framework for computation of elementary modes II
18. Lecture WS 2004/05 Bioinformatics III

131 Elementary modes Lemma 1: EMs in networks of irreversible reactions
In a metabolic system where all reactions are irreversible, the EMs are exactly the extreme rays of P = {v  q : Nv = 0 and v ≥ 0}. Proof: P is the solution set of the linear inequalities defined by where I is the q × q identity matrix. Since it contains I, A is full rank and therefore P is a pointed polyhedral cone. All v  P obey Nv = 0, thus the 2m first inequalities defined by A hold with equality for all vectors in P and the inclusion condition of Definition 1 can be restricted to the last q inequalities, i.e. the inequalities corresponding to the reactions. Inclusion over the zero set can be equivalently seen as containment over the set of non-zeros in v, i.e. R(v). Consequently, e  P is an extreme ray of P if and only if: for all e'  P : R(e')  R(e)  e' = 0 or e' ≃ e, i.e. if and only if e is elementary. Thus, all three conditions in (2) are fulfilled. 18. Lecture WS 2004/05 Bioinformatics III

132 Strategy of the Double Description Method
Iteratively build a minimal DD pair (Ak, Rk) from a minimal DD pair (Ak - 1, Rk - 1), where Ak is a submatrix of A made of k rows of A. At each step the columns of Rk are the extreme rays of P(Ak), the convex polyhedron defined by the linear inequalities Ak. The incremental step introduces a constraint of A that is not yet satisfied by all computed extreme rays. Some extreme rays are kept, some are discarded and new ones are generated. The generation of new extreme rays relies on the notion of adjacent extreme rays. Definition 2: Adjacent extreme rays Let r and r' be distinct rays of the pointed polyhedral cone P(A). Then the following statements are equivalent: (a) r and r' are adjacent extreme rays (b) if r" is a ray of P(A) with Z(r) ∩ Z(r')  Z(r") then either r" ≃ r or r" ≃ r' 18. Lecture WS 2004/05 Bioinformatics III

133 Initialization The initialization of the double description method must be done with a minimal DD pair. One possibility is the following. Since P is pointed, A has full rank and contains a nonsingular submatrix of order d denoted by Ad. Insert: a square matrix A has an inverse A-1 (so that A  A-1 = 1) if its determinant |A|  0 Hence, Ad-1 can be constructed and (Ad, Ad-1) is a minimal DD pair which works as initialization and leads directly to step k = d. Note: there is some freedom in choosing a submatrix Ad or some alternative starting minimal DD pair. 18. Lecture WS 2004/05 Bioinformatics III

134 Incremental step Assume (Ak-1, Rk–1) is a minimal DD pair and consider a kth constraint defined by a not yet extracted row of A, denoted Ai•. Let J be the set of column indices of Rk - 1 and rj, j  J, its column vectors, i.e. the extreme rays of P(Ak – 1), the polyhedral cone of the previous iteration. Ai• splits J in three parts (see Figure) whether rj satisfies the constraint with strict inequality (positive ray), with equality (zero ray) or does not satisfy it (negative ray): J+ = {j  J : Ai• rj > 0} J0 = {j  J : Ai• rj = 0}     (6) J- = {j  J : Ai• rj < 0} Double description incremental step. The scene is best visualized with a polytope; consider the cube pictured here as a 3 projection of a 4 polyhedral cone. Extreme rays from the previous iteration are {a,b,c,d,e,f,g,h} whose adjacencies are represented by edges. For the considered constraint, whose null space is the hyperplane depicted by the bold black border lines, b and f are positive rays, a and c are zero rays, d, e, g and h are negative rays. b, f, a and c satisfy the constraint and are kept for the next iteration. {f,e} and {f,g} are the only two pairs of adjacent positive/negative rays and only they give rise to new rays: i and j at the intersection of the hyperplane and the respective edges. The new polytope is then defined by its extreme rays: {a,b,c,f,i,j}. 18. Lecture WS 2004/05 Bioinformatics III

135 Minimality of Rk Minimality of Rk is ensured in considering all positive rays, all zero rays and new rays obtained as combination of a positive and a negative ray that are adjacent to each other. For convenience, we denote by Adj the index set of the newly generated rays in which every new ray is expressed by a pair of indices corresponding to the two adjacent rays combined. Hence, Rk is defined as the set of column vectors rj, j  J' with The incremental step is repeated until k = h i.e. having treated all rows of the matrix A. The columns of the final matrix Rm are the extreme rays of P(A) 18. Lecture WS 2004/05 Bioinformatics III

136 Computing EMs The Double Description Method together with Theorem 1 offers a framework for computing EMs. The only steps to include are a reconfiguration step that splits reversible reactions and builds the matrix A, and a post-processing step that gets rid of futile two-cycles and computes the back-configuration. The dimension of the space is given by the number of reactions in the reconfigured network: q' = q + |Rev|. This results in the following algorithmic scheme: 18. Lecture WS 2004/05 Bioinformatics III


Download ppt "V18 EcoCyc Analysis of E.coli Metabolism"

Similar presentations


Ads by Google