Insights from Boolean Modeling of Genetic Regulatory Networks, Ilya Shmulevich

1 Insights from Boolean Modeling of Genetic Regulatory Networks, Ilya Shmulevich

2 Part I
1. Discover and understand the underlying gene regulatory mechanisms by inferring them from data.
2. Use the inferred model to make useful predictions through mathematical analysis and computer simulations.

3 Genetic networks
Complex regulatory networks among genes and their products control cell behaviors such as:
– cell cycle
– apoptosis
– cell differentiation
– communication between cells in tissues
A paramount problem is to understand the dynamical interactions among these genes, transcription factors, and signaling cascades, which govern the integrated behavior of the cell.
Analogy: circuit diagram

4 Clinical Impact
Model-based and computational analysis can
– open up a window on the physiology of an organism and disease progression;
– translate into accurate diagnosis, target identification, drug development, and treatment.

5 What class of models should be chosen?
The selection should be made in view of
– data requirements
– goals of modeling and analysis.
[Diagram labels: Data, Model, Goals]

6 Classical tradeoff
A fine model with many parameters
– may be able to capture detailed low-level phenomena (protein concentrations, reaction kinetics);
– requires large amounts of data for inference.
A coarse model with low complexity
– may succeed in capturing only high-level phenomena (e.g. which genes are ON/OFF);
– requires smaller amounts of data.

7 Ockham's Razor
Underlies all scientific theory building.
Model complexity should never be made higher than what is necessary to faithfully explain the data.
What kind of data do we have and how much?
William of Ockham (1280-1349)

8 Boolean Networks
1. To what extent do such models represent reality?
2. Do we have the right type of data to infer these models?
3. What do we hope to learn from them?

9 Basic Structure of Boolean Networks
[Diagram: genes A and B feed into gene X through a Boolean function]
A B | X
0 0 | 1
0 1 | 1
1 0 | 0
1 1 | 1
1 means active/expressed; 0 means inactive/unexpressed.
In this example, two genes (A and B) regulate gene X. In principle, any number of input genes is possible. Positive/negative feedback is also common (and necessary for homeostasis).
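The truth table on this slide can be written out as a lookup table. The sketch below is an illustration only: the names `TRUTH_TABLE`, `f_x`, and `f_x_rule` are assumptions made for the example and do not come from the talk. It also shows that the table is equivalent to the rule X = (NOT A) OR B.

```python
# Truth table from the slide: current values of (A, B) -> next value of X.
TRUTH_TABLE = {
    (0, 0): 1,
    (0, 1): 1,
    (1, 0): 0,
    (1, 1): 1,
}


def f_x(a, b):
    """Look up the next value of gene X from the current values of A and B."""
    return TRUTH_TABLE[(a, b)]


def f_x_rule(a, b):
    """The same function as a logical rule: X is OFF only when A is ON and B is OFF."""
    return int((not a) or b)


# Print the table row by row, in the same order as on the slide.
for (a, b), x in TRUTH_TABLE.items():
    print(a, b, x)
```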

10 Dynamics of Boolean Networks [Figure: states of genes A to F (ABCDEF) at successive time steps, for example 011001]

11 State Space of Boolean Networks
Picture generated using the program DDLab.
Cellular states (or fates) are equated with attractors.
Attractor states are stable under small perturbations:
– most perturbations cause the network to flow back to the attractor;
– some genes are more important, and changing their activation can cause the system to transition to a different attractor.
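For a small network, the attractors can be found by following every state until it repeats. The sketch below assumes synchronous updating and uses a hypothetical three-gene network (A, B, X) invented for illustration; only the rule for X is taken from the truth table on slide 9.

```python
from itertools import product


def step(state, rules):
    """Synchronously update every gene: all rules read the same current state."""
    return tuple(rule(state) for rule in rules)


def find_attractors(rules):
    """Follow each initial state until a state repeats, then record the cycle."""
    attractors = set()
    for start in product((0, 1), repeat=len(rules)):
        seen = []
        state = start
        while state not in seen:
            seen.append(state)
            state = step(state, rules)
        cycle = seen[seen.index(state):]
        # Rotate the cycle so it starts at its smallest state; duplicates then merge.
        k = cycle.index(min(cycle))
        attractors.add(tuple(cycle[k:] + cycle[:k]))
    return attractors


# Hypothetical wiring, for illustration only. State order is (A, B, X).
RULES = (
    lambda s: s[1],                     # A copies B
    lambda s: s[0],                     # B copies A
    lambda s: int((not s[0]) or s[1]),  # X = (NOT A) OR B, as on slide 9
)

for attractor in sorted(find_attractors(RULES)):
    print(attractor)
```

This toy network has two fixed points and one cycle of length two; exhaustive enumeration like this is only practical for a handful of genes, since the state space has 2^n states.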

12 Boolean model of the yeast filamentation network (Taylor, Galitski) [Figure: network with nodes Environmental Input, Mpt5, Cdc42, Dig1/2, Kss1, Ras2, Ste11, Ste20, Ste7, Tec1-Ste12; outcomes Non-Filamentous and Filamentous]

13 But can we extract meaningful biological information from gene expression data entirely in the binary domain? We reasoned that if genes, when quantized to only two levels (1 or 0), were not informative in separating known subclasses of tumors, then there would be little hope for Boolean inference of real genetic networks.

14 Gene expression analysis in the binary domain
Using binary gene expression data with Hamming distance as the similarity metric, multidimensional scaling shows an evident separation between different subtypes of gliomas.
Shmulevich, I. and Zhang, W. (2002) Bioinformatics 18(4), 555-565.
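The pairwise Hamming distances that feed multidimensional scaling can be computed as sketched below. The sample profiles are made up for illustration and are not data from the cited study; the function names are likewise assumptions.

```python
def hamming(u, v):
    """Number of positions at which two equal-length binary profiles differ."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    return sum(a != b for a, b in zip(u, v))


def distance_matrix(profiles):
    """Symmetric matrix of pairwise Hamming distances between samples."""
    return [[hamming(p, q) for q in profiles] for p in profiles]


# Hypothetical binarized expression profiles (one row per tumor sample).
samples = [
    [1, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
]

for row in distance_matrix(samples):
    print(row)
```

The resulting matrix is the kind of input a multidimensional scaling routine expects; the scaling step itself is not shown here.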

15 Boolean Framework
Limited amounts of data and the noisy nature of the measurements can make useful quantitative inferences problematic, and a coarse-scale qualitative modeling approach seems to be justified.
Boolean idealization enormously simplifies the modeling task.
We wish to study the collective regulatory behavior without specific quantitative details.
Boolean networks qualitatively capture typical genetic behavior.
Albert, R. & Othmer, H.G. (2003) J. Theor. Biol. 223, 1-18.
Mendoza, L., Thieffry, D. & Alvarez-Buylla, R.E. (1999) Bioinformatics 15, 593-606.
Huang, S. & Ingber, D.E. (2000) Exp. Cell Res. 261, 91-103.
Li, F., Long, T., Lu, Y., Ouyang, Q. & Tang, C. (2004) PNAS 101(14), 4781-6.

17 Probabilistic Boolean Networks (PBN)
Share the appealing rule-based properties of Boolean networks.
Robust in the face of uncertainty.
Dynamic behavior can be studied in the context of Markov chains.
– Boolean networks are just special cases.
Close relationship to (dynamic) Bayesian networks
– Explicitly represent probabilistic relationships between genes. (Lähdesmäki et al. (2006) Sig. Proc., 86(4):814-834)
– Can represent the same joint probability distribution.
Allow quantification of influence of genes on other genes (stay tuned for examples).
Shmulevich et al. (2002) Proceedings of the IEEE, 90(11), 1778-1792.

18 Basic structure of PBNs
If we have several good competing predictors (functions) for a given gene and each one has determinative power, don't put all our faith in one of them!
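One way to see how several competing predictors per gene lead to a Markov chain is to build the state transition matrix of a tiny PBN. The sketch below assumes that each gene selects its predictor independently at every step; the two-gene network and its selection probabilities are invented for illustration.

```python
from itertools import product


def transition_matrix(predictors, n):
    """Markov transition probabilities of a PBN with independent predictor selection.

    predictors[i] is a list of (function, probability) pairs for gene i.
    Returns a nested dict: matrix[x][y] = P(next state is y | current state is x).
    """
    states = list(product((0, 1), repeat=n))
    matrix = {}
    for x in states:
        row = {}
        for y in states:
            prob = 1.0
            for i in range(n):
                # Probability that the predictor chosen for gene i outputs y[i].
                prob *= sum(c for f, c in predictors[i] if f(x) == y[i])
            row[y] = prob
        matrix[x] = row
    return matrix


# Hypothetical PBN: gene 0 has two competing predictors, gene 1 has one.
PREDICTORS = [
    [(lambda s: s[0] & s[1], 0.7), (lambda s: s[0] | s[1], 0.3)],
    [(lambda s: 1 - s[0], 1.0)],
]

P = transition_matrix(PREDICTORS, 2)
for x, row in P.items():
    print(x, row)
```

When every gene has a single predictor with probability 1, each row contains a single 1 and the model reduces to an ordinary Boolean network, which matches the remark that Boolean networks are special cases.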

19 Model Inference from Gene Expression Data
Two approaches:
– Coefficient of Determination (Dougherty et al. 2000)
– Best-Fit Extensions (Lähdesmäki et al. (2003) Machine Learning, 52, 147-167)

20 Coefficient of Determination (COD)
COD is used to discover associations between variables.
It measures the degree to which the expression levels of an observed gene set can be used to improve the prediction of the expression of a target gene relative to the best possible prediction in the absence of observations.
Using the COD, one can find sets of genes related multivariately to a given target gene.

21 COD Definition
[Diagram: observed genes feed an optimal predictor f of the target gene x_i]
ε_i is the error of the best (constant) estimate of x_i in the absence of any conditional variables.
ε_opt is the optimal error achieved by f.
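The formula itself did not survive in this transcript. With the two errors defined on the slide, the COD is commonly written as θ = (ε_i − ε_opt) / ε_i, and that form is assumed here. The sketch below estimates both errors directly from binary data by counting (a plain resubstitution estimate, chosen for brevity); the function name and the toy data are illustrative assumptions.

```python
from collections import Counter, defaultdict


def cod(inputs, target):
    """Estimate theta = (eps_i - eps_opt) / eps_i from binary samples.

    inputs: one tuple of observed-gene values per sample.
    target: the target-gene value (0 or 1) for each sample.
    """
    n = len(target)
    ones = sum(target)
    # Error of the best constant predictor: always guess the majority value.
    eps_i = min(ones, n - ones) / n
    if eps_i == 0:
        raise ValueError("target is constant, so the COD is undefined")
    # Error of the best predictor given the inputs: majority vote per input pattern.
    table = defaultdict(Counter)
    for row, t in zip(inputs, target):
        table[tuple(row)][t] += 1
    eps_opt = sum(min(c[0], c[1]) for c in table.values()) / n
    return (eps_i - eps_opt) / eps_i


# Toy data: the target is the XOR of two observed genes, so the COD is 1.
xor_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
xor_target = [0, 1, 1, 0]
print(cod(xor_inputs, xor_target))
```

A COD near 1 means the observed genes predict the target almost perfectly; a COD near 0 means they add nothing over the constant guess. Counting errors on the same data used to fit the predictor is optimistic for small samples.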

22 Constraints During Inference
Constraining the class of predictors can have advantages:
– lessening the data requirements for reliable estimation;
– incorporating prior knowledge of the class of functions representing genetic interactions;
– certain classes of functions are more plausible from the point of view of evolution, noise resilience, network dynamics, etc.

23 Example of Constraint: Post Classes
The class is sufficiently large (this is important for inference).
An abundance of functions from this class will tend to prevent chaotic behavior in networks.
Eukaryotic cells are not chaotic! (Shmulevich et al. (2005) PNAS 102(38), 13439-13444.)
Functions from this class have a natural way to ensure robustness against noise and uncertainty.
Emil Post (1897-1954)
Shmulevich et al. (2003) PNAS 100(19), 10734-10739.

24 Post Class Constraints During Inference
We compared the Post classes to the class of all Boolean functions (i.e. no constraint) by estimating the corresponding prediction error for a set of target genes, using available gene expression data.
We found that the optimal error of Post functions compares favorably with optimal error without constraint.
A hypothesis testing-based study gives no statistically significant evidence against the use of constrained function classes (i.e. cost of constraint).
Thus, Post classes are also plausible in light of experimental data.

25 Subnetworks: Theory and Examples
Aim: discover relatively small subnetworks
– whose genes interact significantly and
– whose genes are not strongly influenced by genes outside the subnetwork.
Principle of Autonomy
Start with a seed gene set and iteratively adjoin new genes so as to enhance subnetwork autonomy.

26 Growing Algorithm
To achieve network autonomy, both of these strengths of connections should be high.
The sensitivity of Y from the outside should be small.
Various stopping criteria can be used.
Hashimoto et al. (2004) Bioinformatics 20(8): 1241-1247.

27 Cancer tissues need nutrients. Gliomas are highly angiogenic. Expression of VEGF is often elevated.

28 VEGF is elevated in advanced-stage gliomas. Confirmation and localization by tissue microarray.

29 VEGF protein is secreted outside the cells and binds to its receptor on the endothelial cells to promote their growth.

30 [Figure: subnetwork around VEGF with GRB2, FGF7 (member of fibroblast growth factor family), FSHR (follicle-stimulating hormone receptor), and PTK7 (tyrosine kinase receptor)]
The protein products of all four genes are part of signal transduction pathways that involve surface tyrosine kinase receptors. These receptors, when activated, recruit a number of adaptor proteins to relay the signal to downstream molecules. GRB2 is one of the most crucial adaptors that have been identified. GRB2 is also a target for cancer intervention because of its link to multiple growth factor signal transduction pathways.

31 [Figure labels: GRB2, GNB2, MAP kinase 1, c-rel]
Molecular studies have demonstrated that activation of the protein tyrosine kinase receptor-GRB2 complex activates the ras-MAP kinase-NF-κB pathway to complete the signal relay from outside the cells to the nucleus. GNB2 is a ras family member. GNB2 influences MAP kinase 1, which in turn influences c-rel, an NF-κB component.

32 Such relationships should also be validated experimentally. The networks built from our models provide valuable theoretical guidance for further experiments.

33 IGFBP2 is overexpressed in high-grade gliomas. IGFBP2 contributes to increased cell invasion.

34 IGFBP2 is elevated in advanced-stage gliomas. Confirmation and localization by tissue microarray.

35 IGFBP2 promotes glioma cell invasion in vitro. [Figure panels: Vector; Low IGFBP2 clone; High IGFBP2 clone 1; High IGFBP2 clone 2]

36 A. Niemistö, L. Hu, O. Yli-Harja, W. Zhang, I. Shmulevich, "Quantification of in vitro cell invasion through image analysis," International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS'04), San Francisco, California, USA, Sep. 1-5, 2004.

37 [Figure: IGFBP2 promoter, positions -561 to +1, labeled c-Myc, AP2, and NF-κB]
A review of the literature showed that Cazals et al. (1999) indeed demonstrated that NF-κB activated the IGFBP2 promoter in lung alveolar epithelial cells.

38 Higher NF-κB activity in IGFBP2-overexpressing cells was also found. Transient transfection of an IGFBP2-expressing vector together with an NF-κB promoter reporter gene construct did not lead to increased NF-κB activity, suggesting an indirect effect of IGFBP2 on NF-κB.
[Figure labels: NF-κB, IGFBP2, TNFR2, ILK]
Our real-time PCR data showed that in stable IGFBP2-overexpressing cell lines, IGFBP2 indeed enhances ILK expression. In addition, IGFBP2 contains an RGD domain, implying its interaction with integrin molecules. ILK is in the integrin signal transduction pathway. Studies also showed that IGFBP2 affects cell apoptosis, and TNFR2 is a known regulator of apoptosis.

39 PBN web page: http://personal.systemsbiology.net/ilya/PBN/PBN.htm
Reprints; Software (BN/PBN MATLAB Toolbox); Posters/Presentations; Workshops; Links; PBN People

40 PBN Collaborators
Wei Zhang, Harri Lähdesmäki, Olli Yli-Harja, Jaakko Astola, Edward Dougherty, Ronaldo Hashimoto, Marcel Brun, Seungchan Kim, Edward Suh, Huai Li, Michael Bittner
Support: NIH/NIGMS R21 GM070600-01, NIH/NIGMS R01 GM072855-01

41 Part II

42 Joint work with

43 Order/Chaos
A broad body of work over the past 35 years has shown that a variety of model genetic regulatory networks behave in two broad regimes, ordered and chaotic, with an analytically and numerically demonstrated phase transition between the two.

44 Edge of chaos
The boundary between order and chaos is called the complex regime or the critical phase.
– The system can undergo a kind of phase transition.
– Networks are most evolvable at the edge of chaos.
Living system in a variable environment:
– Strike a balance: malleability vs. stability
– Must be stable, but not so stable that it remains forever static.
– Must be malleable, but not so malleable that it is fragile in the face of perturbations.

45 Plausible and long-standing hypothesis: Real cells lie in the ordered regime or are critical.
Life at the edge of chaos
There has been no experimental data supporting this hypothesis.

46 Ordered networks
Homeostasis
A modest number of small recurrent patterns of gene activity (attractors)
– plausible models of the diverse cell types (or cell fates) of an organism
– the phenotypic traits of the organism are encoded in the dynamical attractors of its underlying genetic regulatory network
Confined avalanches of gene activity changes following transient perturbations in the activity of single genes
– i.e. confined damage spreading

47 Chaotic networks
Nearby states lie on trajectories that diverge
– hence, fail to exhibit a natural basis for homeostasis
Have enormous attractors whose sizes scale exponentially with the number of genes
Exhibit vast avalanches of gene activity alterations following transient perturbations to single gene activities

48 The model class
Random Boolean Networks (RBNs): Kauffman (1969) ensemble approach
– One of the most intensively studied models of discrete dynamical systems.
– Sustained interest from biology and physics communities.
– Considered for many years as prototypes of nonlinear dynamical systems.
RBNs are:
– Structurally simple yet capable of remarkably rich complex behavior!

49 Connectivity (e.g. scale-free): mean number of input variables

50 Bias
The bias p of a random function is the probability that it takes on the value 1.
If p = 0.5, then the function is unbiased.
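The two ensemble parameters, connectivity K and bias p, are enough to generate a random Boolean network. The sketch below is one minimal way to do so, assuming every gene has exactly K distinct inputs chosen uniformly at random and synchronous updating; the function names are illustrative and do not come from the talk or from any toolbox.

```python
import random
from itertools import product


def random_function(k, p, rng):
    """Random truth table on k inputs; each entry is 1 with probability p (the bias)."""
    return {inputs: int(rng.random() < p) for inputs in product((0, 1), repeat=k)}


def random_network(n, k, p, rng):
    """Give each of n genes k distinct random input genes and a random function."""
    wiring = [rng.sample(range(n), k) for _ in range(n)]
    functions = [random_function(k, p, rng) for _ in range(n)]
    return wiring, functions


def simulate(wiring, functions, state, steps):
    """Return the trajectory (a list of states) under synchronous updating."""
    trajectory = [tuple(state)]
    for _ in range(steps):
        current = trajectory[-1]
        trajectory.append(tuple(
            f[tuple(current[j] for j in inputs)]
            for inputs, f in zip(wiring, functions)
        ))
    return trajectory


rng = random.Random(0)  # fixed seed so the run is repeatable
wiring, functions = random_network(n=10, k=2, p=0.5, rng=rng)
for state in simulate(wiring, functions, [0] * 10, steps=5):
    print("".join(map(str, state)))
```

Reading one column of the printed trajectory gives the binary time series of a single gene, which is the kind of mock data compared with the HeLa profiles later in the talk.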

51 Connectivity, bias, and the phase transition [Figure labels: Critical Phase, Average Network Sensitivity] Shmulevich & Kauffman (2004) Physical Review Letters, 93(4): 048701

52 Phase transition
RBNs can be tuned to undergo a phase transition by
– tuning the connectivity K
– tuning the bias p
– tuning the scale-free exponent γ (Aldana & Cluzel (2003) PNAS, 100(15):8710-4)
– tuning abundance of functional classes (Shmulevich et al. (2003) PNAS 100(19):10734-9)

53 Our approach
Measure and compare the complexity of time series data of HeLa cells with that of mock data generated by RBNs operating in the ordered, critical, and chaotic regimes.
We use the Lempel-Ziv (LZ) measure of complexity.
Dataset: Whitfield et al. (2002) Mol. Biol. Cell. 13, 1977-2000.
– synchronized HeLa cells; 48 time points at 1-hour time intervals; 29,621 distinct genes

54 Lempel-Ziv Complexity
The algorithm parses the sequence into shortest words that have not occurred previously, and the complexity is defined as the number of such words. Words are unique, except possibly the last one.
01100101101100100110: LZ Complexity = 7
01010101010101010101: LZ Complexity = 3

55 Lempel-Ziv Complexity Example
0*1*10*010*1101*100100*110
LZ Complexity = 7
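The parsing on slides 54 and 55 can be reproduced with a short sketch. It assumes the variant in which a candidate word may overlap its own earlier occurrence (the word is searched for in everything up to, but excluding, its own last symbol); this is the reading that yields the word boundaries shown on slide 55. The function names are illustrative.

```python
def lz_words(sequence):
    """Parse a string into the shortest words that have not occurred previously."""
    n = len(sequence)
    words = []
    i = 0
    while i < n:
        length = 1
        # Grow the word while it already occurs in the text before its last symbol.
        while i + length <= n and sequence[i:i + length] in sequence[:i + length - 1]:
            length += 1
        end = min(i + length, n)  # the final word may be cut off by the end of the data
        words.append(sequence[i:end])
        i = end
    return words


def lz_complexity(sequence):
    """LZ complexity: the number of words in the parsing."""
    return len(lz_words(sequence))


print("*".join(lz_words("01100101101100100110")))
print(lz_complexity("01010101010101010101"))
```

The periodic sequence parses into only three words because, after 0 and 1, the rest of the sequence keeps matching a shifted copy of itself and is consumed as a single final word.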

56 Lempel-Ziv Complexity: some remarks
Universal complexity measure
Basis of powerful lossless compression schemes (ZIP, GIF, etc.)
– by replacing words with a pointer to a previous occurrence of the same word
Optimal: compression rate approaches the entropy of the random sequence
Asymptotically Gaussian: can be used for statistical test of randomness.

57 Intuition
Genes in ordered networks have low LZ complexities.
Genes in chaotic networks have high LZ complexities.

58 Binarization
We used the well-known k-means algorithm with two groups, corresponding to the two binary values (0,1).
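For one-dimensional data, k-means with two groups reduces to a few lines. The sketch below is a minimal illustration applied to a single list of values; the slide does not say how the values were grouped in the original analysis (for instance per gene or over the whole data set), and the centroid initialization at the minimum and maximum is an assumption of this example.

```python
def binarize(values, iterations=100):
    """Two-group k-means on a list of numbers; returns 0 for the low group, 1 for the high."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0] * len(values)  # no spread, so everything falls in one group
    labels = []
    for _ in range(iterations):
        # Assignment step: 1 if the value is strictly closer to the high centroid.
        labels = [int(abs(v - hi) < abs(v - lo)) for v in values]
        low = [v for v, b in zip(values, labels) if b == 0]
        high = [v for v, b in zip(values, labels) if b == 1]
        # Update step: move each centroid to the mean of its group.
        new_lo, new_hi = sum(low) / len(low), sum(high) / len(high)
        if (new_lo, new_hi) == (lo, hi):
            break  # converged
        lo, hi = new_lo, new_hi
    return labels


# Hypothetical expression values for one gene over six time points.
print(binarize([0.1, 0.2, 0.15, 2.0, 2.2, 1.9]))
```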

59 Lempel-Ziv complexity distributions of binarized HeLa data vs. random binary data

60 [Flow diagram: HeLa time-series data (29,621 genes by 48 time points) are binarized (e.g. 01101001101001101011, 10011001100100110110); LZ complexities are computed for the HeLa data and for RBNs in the ordered, critical, and chaotic regimes; the distance between the distributions is computed and the minimum is found]

61 Distance between LZ distributions
– Kullback-Leibler (KL) distance
– Euclidean distance
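The formulas on this slide did not survive in the transcript. The sketch below uses the textbook definitions of the two distances between discrete distributions; the direction of the KL distance, the handling of empty bins, and the function names are assumptions of this example and may differ from what was used in the study.

```python
import math
from collections import Counter


def distribution(complexities, support):
    """Relative frequency of each complexity value listed in `support`."""
    counts = Counter(complexities)
    n = len(complexities)
    return [counts[c] / n for c in support]


def euclidean(p, q):
    """Euclidean distance between two distributions on the same support."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kullback_leibler(p, q):
    """KL distance D(p || q) in nats; infinite if q is 0 where p is positive."""
    total = 0.0
    for a, b in zip(p, q):
        if a == 0:
            continue  # terms with p = 0 contribute nothing
        if b == 0:
            return math.inf
        total += a * math.log(a / b)
    return total


# Hypothetical LZ complexities of a few genes from two sources.
support = [3, 4, 5, 6]
p = distribution([3, 4, 4, 5], support)
q = distribution([4, 5, 5, 6], support)
print(euclidean(p, q), kullback_leibler(p, q))
```

The KL distance is not symmetric and becomes infinite whenever the second distribution has an empty bin where the first does not, which is one reason to report the Euclidean distance alongside it.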

62 Three techniques to tune ordered, critical, and chaotic regimes
1. Fix p = 0.5, let K = 1, 2, 3, 4.
2. Fix K = 4, let p = 0.93301, 0.85355, 0.75, 0.5.
3. Scale-free topology with connectivity K(γ). Vary scale-free exponent γ such that average network sensitivity is equal to the cases above. (Aldana & Cluzel (2003) PNAS, 100(15):8710-4)
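The parameter values in techniques 1 and 2 line up once the average network sensitivity is computed. The sketch below assumes the expression s = 2Kp(1 − p) for random functions with K inputs and bias p, with s < 1 ordered, s = 1 critical, and s > 1 chaotic; the formula is not shown in this transcript and is taken here as the standard result from the work cited on slide 51. The tolerance only absorbs the rounding of the listed p values.

```python
def average_sensitivity(k, p):
    """Average sensitivity of a random Boolean function with k inputs and bias p."""
    return 2 * k * p * (1 - p)


def regime(s, tol=1e-3):
    """Classify a network by its average sensitivity s."""
    if abs(s - 1) <= tol:
        return "critical"
    return "ordered" if s < 1 else "chaotic"


# Technique 1: fix p = 0.5 and vary K.
for k in (1, 2, 3, 4):
    s = average_sensitivity(k, 0.5)
    print(f"K={k}, p=0.5: s={s:.3f} ({regime(s)})")

# Technique 2: fix K = 4 and vary p.
for p in (0.93301, 0.85355, 0.75, 0.5):
    s = average_sensitivity(4, p)
    print(f"K=4, p={p}: s={s:.3f} ({regime(s)})")
```

Both sweeps pass through the same four sensitivities (about 0.5, 1, 1.5, and 2), so each technique produces one ordered, one critical, and two chaotic settings.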

63 But what about noise?
Wouldn't noise make things look more chaotic?
There are two issues:
– In the binary domain, the compound effect of noise amounts to a certain percentage of values in the time series data being flipped from zero to one or vice versa.
– Many genes are expressed at levels that are below those corresponding to pure noise.
Fortunately, using the HeLa data, it is possible to estimate both the binary noise probability and the global noise floor level as follows.

64 Estimate the noise floor
There are 963 empty spots on the HeLa microarrays.
As a conservative estimate, for each of the 48 microarrays, we used the 95th percentile of the values of the empty spots as the noise floor level for that array.
Only those genes whose values exceed this global threshold at all time points are included for further analysis.
– Hence our criteria are very stringent.
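The filtering step can be sketched as follows. This is an illustration under stated assumptions: the percentile uses the nearest-rank definition, each array is compared against its own floor, and the data layout (a dict of per-gene value lists plus a list of empty-spot readings per array) is invented for the example.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile: the smallest value with at least q% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def genes_above_floor(expression, empty_spots, q=95):
    """Keep the genes whose value exceeds the noise floor on every array.

    expression: {gene: [value on array 0, value on array 1, ...]}
    empty_spots: [[empty-spot readings on array 0], [readings on array 1], ...]
    """
    floors = [percentile(spots, q) for spots in empty_spots]
    return [
        gene for gene, values in expression.items()
        if all(v > floor for v, floor in zip(values, floors))
    ]


# Hypothetical data: two arrays, each with 100 empty-spot readings from 1 to 100.
empty = [list(range(1, 101)), list(range(1, 101))]
genes = {"g1": [96, 100], "g2": [96, 95], "g3": [200, 300]}
print(genes_above_floor(genes, empty))
```

Requiring a gene to clear the floor at every time point discards any profile with even one low reading, which is what makes the criterion stringent.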

65 Estimate the noise probability q
We made use of the replicated probes available on the arrays.
– 2001 duplicate gene profiles of 48 time points.
Keeping only those that exceeded the global threshold, we binarized each of the duplicate profiles and computed the normalized Hamming distance.
The resulting estimate of q has a 95% bootstrap confidence interval of [0.32, 0.38].

66 Euclidean (fix p = 0.5, tune K). Shmulevich et al. (2005) PNAS 102(38):13439.

67 Kullback-Leibler (fix p = 0.5, tune K). Shmulevich et al. (2005) PNAS 102(38):13439.

68 Euclidean (fix K = 4, tune p). Shmulevich et al. (2005) PNAS 102(38):13439.

69 Kullback-Leibler (fix K = 4, tune p). Shmulevich et al. (2005) PNAS 102(38):13439.

70 Euclidean, scale-free (tune γ). Shmulevich et al. (2005) PNAS 102(38):13439.

71 Kullback-Leibler, scale-free (tune γ). Shmulevich et al. (2005) PNAS 102(38):13439.

72 Concluding remarks
The results strongly suggest that HeLa cells are in the ordered regime or are critical, but not chaotic.
We cannot statistically distinguish between ordered and critical with these data.
Critical networks appear to predict the distribution of genes whose activities are altered in several hundred knock-out mutants of yeast. (Serra et al. (2004) J. Theor. Biol. 227, 149-157)
It will be important to use more realistic ensembles of model genetic networks to test whether our conclusions hold.

