Predicting Gene Expression using Logic Modeling and Optimization
Abhimanyu Krishna
New Challenges in the European Area: Young Scientist’s 1st International Baku Forum
Introduction: What is gene expression? -> Regulation? -> Gene regulatory network?
Gene regulatory network reconstruction [Figure: network of genes A, B and C responding to input stimuli]
Objective: how to contextualize a literature-based gene regulatory network to our experimental conditions.
[Figure: literature-based gene regulatory network plus experimental expression data; missing expression values shown in grey]
Introduction: Networks of interactions
Biological processes can be represented as transitions in a landscape between stable states, passing through unstable transient states.
“Predicting missing expression values in gene regulatory networks using a discrete logic modeling optimization guided by network stable states”
Why are these predictions not trivial? The network reconstruction process is noisy.
Problem: inconsistency between the network and the experimental expression data. Solution: contextualize the network using the experimental expression data.
Why is this an optimization problem? Local consistency -> edge removal -> global consistency
Which property are we going to use in the optimization? Network stability.
[Figure: stable state vs. unstable transient state]
Iterative network pruning
Objective function: the score S uses the normalized Hamming distance (h) to compare N Boolean gene expression values (σ) between all calculated steady states (α) of a pruned network and the two known phenotypes (φ1 and φ2) defined by the expression data, in order to identify the two best-matching phenotype/steady-state couples (φα1 and φα2).
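The comparison step can be sketched as below, assuming the usual normalized Hamming distance $h(\varphi, \alpha) = \frac{1}{N}\sum_{i=1}^{N} |\sigma_i^{\varphi} - \sigma_i^{\alpha}|$, i.e. the fraction of the N genes whose Boolean values differ. How the two distances are combined into S is not stated here, so the mean used below is a hypothetical choice, as is requiring the two phenotypes to match two distinct steady states. The function names are illustrative only.

```python
from itertools import permutations


def hamming_norm(a, b):
    """Normalized Hamming distance: fraction of positions where two Boolean vectors differ."""
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def best_couples(steady_states, phi1, phi2):
    """Match phi1 and phi2 to two distinct steady states, minimizing the distance.

    Returns (score, index matched to phi1, index matched to phi2).
    The score is the mean of the two distances (a hypothetical aggregation).
    """
    if len(steady_states) < 2:
        raise ValueError("need at least two steady states")
    best = None
    for i, j in permutations(range(len(steady_states)), 2):
        d = (hamming_norm(steady_states[i], phi1) + hamming_norm(steady_states[j], phi2)) / 2
        if best is None or d < best[0]:
            best = (d, i, j)
    return best


states = [(1, 1, 0, 0), (0, 0, 1, 1), (1, 0, 1, 0)]
phi1, phi2 = (1, 1, 0, 0), (0, 0, 1, 0)
print(best_couples(states, phi1, phi2))  # → (0.125, 0, 1)
```

Lower scores mean a pruned network whose stable states better reproduce the two observed phenotypes; exhaustive pairing is fine for the handful of steady states a small network has.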
But the contributions of individual interactions to network stability are not independent of one another: the evaluation of one specific link depends strongly on the links already removed, in other words, on the order of removal. We therefore capture the interdependencies between variables by considering, sequentially, both the probability distribution of positive circuits and that of separate edges.
[Figure: positive circuit, negative circuit]
Positive circuits are a necessary condition for having several fixed points (Thomas, Thieffry and Kaufman, 1995).
Thomas R, Thieffry D, Kaufman M: Dynamical behavior of biological regulatory networks. I. Biological role of feedback loops and practical use of the concept of the loop-characteristic state. Bulletin of Mathematical Biology 1995, 57:
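Classifying circuits by sign can be sketched as follows: a circuit's sign is the product of its edge signs, so it is positive when it contains an even number of inhibitions. The graph in the example (mutual activation between hypothetical genes A and B, plus an activation B→C with feedback inhibition C⊣B) is invented for illustration, and enumerating every simple cycle by depth-first search is only practical for small networks.

```python
def simple_cycles(edges):
    """Enumerate the simple directed cycles of a signed graph.

    edges maps (source, target) -> sign (+1 activation, -1 inhibition).
    Each cycle is listed once, starting from its smallest node.
    """
    succ = {}
    for u, v in edges:
        succ.setdefault(u, []).append(v)
    cycles = []

    def extend(start, node, path, on_path):
        for nxt in succ.get(node, []):
            if nxt == start:
                cycles.append(list(path))  # closed a cycle back to its start node
            elif nxt > start and nxt not in on_path:
                # Only visit nodes larger than start, so each cycle is found once.
                path.append(nxt)
                on_path.add(nxt)
                extend(start, nxt, path, on_path)
                path.pop()
                on_path.remove(nxt)

    for start in sorted({n for e in edges for n in e}):
        extend(start, start, [start], {start})
    return cycles


def circuit_sign(cycle, edges):
    """Product of edge signs: +1 (positive circuit) iff the number of inhibitions is even."""
    sign = 1
    for u, v in zip(cycle, cycle[1:] + cycle[:1]):
        sign *= edges[(u, v)]
    return sign


edges = {("A", "B"): 1, ("B", "A"): 1, ("B", "C"): 1, ("C", "B"): -1}
for c in simple_cycles(edges):
    print(c, "positive" if circuit_sign(c, edges) > 0 else "negative")
```

Here the A/B loop is a positive circuit and the B/C loop a negative one.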
Iterative network pruning, applied in turn to Positive Circuit 1, Positive Circuit 2 and Positive Circuit 3.
Biological scope targeted by this approach: transitions between long-term expression patterns, or stable states.
Example: the epithelial-mesenchymal transition. [Figure: epithelial and mesenchymal states]
Computing attractors in a discrete (Boolean) dynamical system
Based on logic functions and the assumption of only two possible gene states: active (ON, or 1) and inactive (OFF, or 0).
Logic functions: the state of node $x_i$ at time t+1 depends on the state of its regulators at time t.
Updating scheme: synchronous.
Types of attractors: fixed points and limit cycles. [Figure: fixed point, limit cycle]
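A minimal sketch of attractor computation under the synchronous scheme, by exhaustive simulation from all 2^n initial states. This brute-force approach is an assumption (it is only practical for small networks) and not necessarily how the authors compute attractors. The two-gene mutual-inhibition network is a hypothetical example: it contains one positive circuit, and under synchronous updating it has two fixed points plus a limit cycle.

```python
from itertools import product


def step(state, rules):
    """Synchronous update: every node's value at t+1 is computed from the state at t."""
    return tuple(int(bool(f(state))) for f in rules)


def attractors(rules):
    """Enumerate all attractors by simulating from every one of the 2^n initial states."""
    n = len(rules)
    found = set()
    for s in product((0, 1), repeat=n):
        seen = {}  # state -> time step of its first visit
        t = 0
        while s not in seen:
            seen[s] = t
            s = step(s, rules)
            t += 1
        # s is the first revisited state, so the attractor starts at time seen[s].
        cycle = [st for st, k in sorted(seen.items(), key=lambda kv: kv[1]) if k >= seen[s]]
        i = cycle.index(min(cycle))  # rotate to a canonical starting state
        found.add(tuple(cycle[i:] + cycle[:i]))
    # Fixed points (length 1) first, then limit cycles.
    return sorted(found, key=lambda c: (len(c), c))


# Hypothetical network: A(t+1) = NOT B(t), B(t+1) = NOT A(t); state = (A, B).
rules = [lambda s: not s[1], lambda s: not s[0]]
print(attractors(rules))  # → [((0, 1),), ((1, 0),), ((0, 0), (1, 1))]
```

Attractors of length 1 are fixed points; longer ones are limit cycles. The (0, 0) ↔ (1, 1) oscillation is an artifact of synchronous updating that would not appear under an asynchronous scheme.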
Consistency between expression data and network stable states
Iterative network pruning: the network topology is optimized using an Estimation of Distribution Algorithm (EDA).
Toy example: optimization of the objective function $h(x) = X_1 + X_2 + X_3 + X_4 + X_5 + X_6$, with $X_i \in \{0, 1\}$.
EDA: toy example. Starting from the initial population, the top 10 solutions are selected and used to generate the next population; this cycle is repeated until the stop criteria are met.
[Figure: initial population, top 10 solutions, next population]
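The EDA loop on the toy objective can be sketched as below. This is a hedged illustration assuming a univariate marginal distribution algorithm (UMDA): each bit $X_i$ gets an independent probability of being 1, re-estimated from the top 10 solutions of each population. The population size, generation limit and stop criteria are assumptions. UMDA deliberately ignores the interdependencies between variables that the method captures through positive circuits, so this is not the authors' algorithm.

```python
import random


def h(x):
    """Toy objective: number of ones in the bit string (to be maximized)."""
    return sum(x)


def umda(n_bits=6, pop_size=50, n_select=10, max_gens=50, seed=0):
    """Univariate marginal distribution algorithm (a simple EDA).

    Returns (best solution found, final marginal probabilities, generations run).
    """
    rng = random.Random(seed)
    p = [0.5] * n_bits  # P(X_i = 1), initially uniform
    best, gens = None, 0
    for gens in range(1, max_gens + 1):
        # Sample a population from the current product distribution.
        pop = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(pop_size)]
        # Select the top n_select solutions by objective value.
        top = sorted(pop, key=h, reverse=True)[:n_select]
        if best is None or h(top[0]) > h(best):
            best = top[0]
        # Re-estimate each marginal probability from the selected solutions.
        p = [sum(ind[i] for ind in top) / n_select for i in range(n_bits)]
        # Assumed stop criteria: optimum found, or every marginal fixed at 0 or 1.
        if h(best) == n_bits or all(pi in (0.0, 1.0) for pi in p):
            break
    return best, p, gens


best, p, gens = umda()
print(best, h(best), gens)
```

Once a marginal reaches 0 or 1 it can no longer change, which is why plain UMDA can converge prematurely; practical EDAs often smooth the estimates to avoid this.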
Algorithm:
Predictions are based on the consensus across the family of alternative solutions.
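A consensus over alternative solutions could look like the sketch below. It assumes, hypothetically, that each alternative solution yields a 0/1 prediction per gene and that a gene is called only when a strict majority of the solutions agree (otherwise it is left undetermined); the threshold, the function name and the data are illustrative.

```python
from collections import Counter


def consensus(solutions, min_agreement=0.5):
    """Combine per-gene 0/1 predictions from several alternative solutions.

    solutions: list of dicts mapping gene -> 0/1.
    Returns gene -> 0/1, or None when the most common value does not
    exceed the min_agreement fraction of the votes for that gene.
    """
    genes = sorted({g for s in solutions for g in s})
    result = {}
    for g in genes:
        votes = Counter(s[g] for s in solutions if g in s)
        total = sum(votes.values())
        value, count = votes.most_common(1)[0]
        result[g] = value if count / total > min_agreement else None
    return result


sols = [{"A": 1, "B": 0}, {"A": 1, "B": 1}, {"A": 1, "B": 0}, {"A": 0, "B": 1}]
print(consensus(sols))  # → {'A': 1, 'B': None}
```

Leaving ties undetermined keeps low-agreement genes out of the prediction set rather than forcing an arbitrary call.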
Availability: Software; Paper (012/08/30/nar.gks785.full)
Isaac Crespo, Computational Biology Unit (LCSB); Abhimanyu Krishna, Bioinformatics Core (LCSB); Antony Le Béchec; Antonio del Sol, Head of Computational Biology Unit (LCSB); Life Sciences Research Unit (LSRU); Vital-IT (SIB)
Thank you! Questions?