Reverse engineering gene regulatory networks Dirk Husmeier Adriano Werhli Marco Grzegorczyk.


1 Reverse engineering gene regulatory networks Dirk Husmeier Adriano Werhli Marco Grzegorczyk

2 Systems biology Learning signalling pathways and regulatory networks from postgenomic data

3

4 unknown

5 high-throughput experiments, postgenomic data

6 unknown data machine learning statistical methods

7 [Figure: true network and extracted network] Does the extracted network provide a good prediction of the true interactions?

8 Reverse Engineering of Regulatory Networks
Can we learn the network structure from postgenomic data themselves?
Statistical methods to distinguish between
– Direct interactions
– Indirect interactions
Challenge: Distinguish between
– Correlations
– Causal interactions
Breaking symmetries with active interventions:
– Gene knockouts (VIGS, RNAi)

9 direct interaction common regulator indirect interaction co-regulation

10

11 Relevance networks Graphical Gaussian models Bayesian networks

12 Relevance networks Graphical Gaussian models Bayesian networks

13

14 Relevance networks (Butte and Kohane, 2000)
1. Choose a measure of association A(.,.)
2. Define a threshold value t_A
3. For all pairs of domain variables (X,Y), compute their association A(X,Y)
4. Connect by an undirected edge those variables (X,Y) whose association A(X,Y) exceeds the predefined threshold value t_A
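The four steps above can be sketched in a few lines of Python. This is only an illustration: the choice of the absolute Pearson correlation as the association measure A, the function names, the toy data and the threshold of 0.8 are all assumptions made for the example, not taken from the slides.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def relevance_network(data, threshold):
    """Return the undirected edges whose association exceeds the threshold.

    data: dict mapping a variable name to its list of measurements.
    Association measure (an assumption of this sketch): |Pearson correlation|.
    """
    edges = []
    for a, b in combinations(sorted(data), 2):
        if abs(pearson(data[a], data[b])) > threshold:
            edges.append((a, b))
    return edges


# Toy data, made up for illustration: Y follows X closely, Z does not.
toy = {
    "X": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Y": [1.1, 1.9, 3.2, 3.9, 5.1],
    "Z": [2.0, -1.0, 0.5, 3.0, -2.0],
}
edges = relevance_network(toy, threshold=0.8)
print(edges)
```

Because every pair is scored in isolation, the result depends only on the two variables involved, which is exactly the limitation the later slides point out.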

15 Association scores

16 [Figure: three configurations of nodes 1, 2 and X: ‘direct interaction’, ‘common regulator’, ‘indirect interaction’; strong correlation σ_12]

17 Pairwise associations without taking the context of the system into consideration

18 Relevance networks Graphical Gaussian models Bayesian networks

19 Graphical Gaussian Models [Figure: direct interaction between nodes 1 and 2] Partial correlation, i.e. correlation conditional on all other domain variables: Corr(X_1, X_2 | X_3, …, X_n). Strong partial correlation π_12.

20 direct interaction common regulator indirect interaction co-regulation Distinguish between direct and indirect interactions A and B have a low partial correlation
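The common-regulator case can be illustrated with a small simulation. The sketch below assumes a hypothetical regulator C that drives both A and B with Gaussian noise (the coefficients, sample size and seed are arbitrary choices for the example). With only three variables, conditioning on "all other variables" reduces to the first-order partial correlation formula used here.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation of A and B given C."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Simulated common regulator (assumed model): C drives both A and B.
rng = random.Random(0)
n = 2000
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + 0.5 * rng.gauss(0, 1) for ci in c]
b = [ci + 0.5 * rng.gauss(0, 1) for ci in c]

r_ab = pearson(a, b)  # plain correlation: large, although A and B do not interact
p_ab = partial_corr(r_ab, pearson(a, c), pearson(b, c))  # close to zero
print(round(r_ab, 2), round(p_ab, 2))
```

A relevance network would connect A and B here, whereas a threshold on the partial correlation would not.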

21 Graphical Gaussian Models [Figure: direct interaction between nodes 1 and 2] Partial correlation, i.e. correlation conditional on all other domain variables: Corr(X_1, X_2 | X_3, …, X_n). Strong partial correlation π_12. Problem: #observations < #variables

22

23 Shrinkage estimation and the lemma of Ledoit-Wolf
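Why shrinkage helps when there are fewer observations than variables can be seen in a tiny example. The sketch below is not the Ledoit-Wolf procedure itself: it assumes a diagonal shrinkage target and a hand-picked intensity `lam`, whereas the Ledoit-Wolf lemma is concerned with choosing that intensity analytically. All names and numbers are illustrative assumptions.

```python
def covariance(rows):
    """Unbiased sample covariance matrix; rows is a list of observations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]


def shrink_to_diagonal(s, lam):
    """Shrunk estimate (1 - lam) * S + lam * diag(S): off-diagonals are damped."""
    p = len(s)
    return [
        [s[i][j] if i == j else (1 - lam) * s[i][j] for j in range(p)]
        for i in range(p)
    ]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


# Two observations of three variables: #observations < #variables.
rows = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5]]
s = covariance(rows)
s_star = shrink_to_diagonal(s, lam=0.2)  # lam chosen by hand for this sketch

print(det3(s))       # zero: the sample covariance cannot be inverted
print(det3(s_star))  # positive: the shrunk estimate can be inverted
```

Partial correlations require the inverse covariance matrix, so a singular sample estimate blocks the Graphical Gaussian model entirely, while the shrunk estimate does not.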

24

25 Graphical Gaussian Models [Figure: direct interaction, common regulator, indirect interaction] P(A,B) = P(A)·P(B), but: P(A,B|C) ≠ P(A|C)·P(B|C)

26 Undirected versus directed edges. Relevance networks and Graphical Gaussian models can only extract undirected edges. Bayesian networks can extract directed edges. But can we trust these edge directions? It may be better to learn undirected edges than to learn directed edges with false orientations.

27 Relevance networks Graphical Gaussian models Bayesian networks

28 [Figure: graph with nodes A, B, C, D, E, F connected by edges] Marriage between graph theory and probability theory. Directed acyclic graph (DAG) representing conditional independence relations. It is possible to score a network in light of the data: P(D|M), D: data, M: network structure. We can infer how well a particular network explains the observed data.

29

30 Bayesian networks versus causal networks Bayesian networks represent conditional (in)dependence relations - not necessarily causal interactions.

31 Bayesian networks versus causal networks [Figure: two graphs over nodes A, B, C: the true causal graph, and the case where node A is unknown]

32 Bayesian networks versus causal networks. Equivalence classes: networks with the same score P(D|M). Equivalent networks cannot be distinguished in light of the data. [Figure: graphs over nodes A, B, C]

33 Equivalence classes of BNs [Figure: DAGs over nodes A, B, C and their completed partially directed graphs (CPDAGs)] v-structure (A → C ← B): P(A,B) = P(A)·P(B), but P(A,B|C) ≠ P(A|C)·P(B|C). Other structures: P(A,B) ≠ P(A)·P(B), but P(A,B|C) = P(A|C)·P(B|C).
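The independence pattern of a v-structure can be checked numerically. The sketch below assumes a linear Gaussian collider in which A and B are independent causes of C; the coefficients, sample size and seed are arbitrary, and correlation is used as a stand-in for (in)dependence, which is only justified under the Gaussian assumption.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation of A and B given C."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Assumed v-structure: A and B are independent, C depends on both.
rng = random.Random(1)
n = 2000
a = [rng.gauss(0, 1) for _ in range(n)]
b = [rng.gauss(0, 1) for _ in range(n)]
c = [ai + bi + 0.5 * rng.gauss(0, 1) for ai, bi in zip(a, b)]

r_ab = pearson(a, b)  # near zero: A and B are marginally independent
p_ab = partial_corr(r_ab, pearson(a, c), pearson(b, c))  # clearly negative
print(round(r_ab, 2), round(p_ab, 2))
```

This is the mirror image of the common-regulator case: conditioning on C creates a dependence between A and B instead of removing one, which is why a v-structure can be told apart from the other three-node structures.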

34 Symmetry breaking: interventions, prior knowledge [Figure: graphs over nodes A, B, C]

35 Symmetry breaking: interventions, prior knowledge [Figure: graphs over nodes A, B, C]

36 Interventional data [Figure: candidate graphs over nodes A and B] A and B are correlated. Inhibition of A leads either to down-regulation of B or to no effect on B.

37 Learning Bayesian networks from data. P(M|D) = P(D|M) P(M) / Z. M: network structure. D: data.
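For a handful of candidate structures, the posterior can be computed directly by normalising in log space. In the sketch below the three log marginal likelihoods are invented numbers and the uniform prior is an assumption; the normalisation constant Z is the sum of P(D|M) P(M) over the listed candidates only.

```python
import math


def posterior(log_likelihoods, log_priors):
    """Posterior P(M|D) over a finite list of candidate structures.

    Works in log space and subtracts the maximum before exponentiating,
    so that very small likelihoods do not underflow.
    """
    log_joint = [ll + lp for ll, lp in zip(log_likelihoods, log_priors)]
    m = max(log_joint)
    log_z = m + math.log(sum(math.exp(v - m) for v in log_joint))
    return [math.exp(v - log_z) for v in log_joint]


# Hypothetical values of log P(D|M) for three candidate structures.
log_lik = [-105.2, -103.9, -110.4]
log_prior = [math.log(1.0 / 3.0)] * 3  # uniform prior (assumption)

post = posterior(log_lik, log_prior)
print([round(p, 3) for p in post])
```

Enumerating candidates like this only works for very small networks, because the number of possible DAGs grows super-exponentially with the number of nodes.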

38

39

40 Learning Bayesian networks from data. P(M|D) = P(D|M) P(M) / Z. M: network structure. D: data.

41

42

43

44

45 Evaluation: on real experimental data, using the gold-standard network from the literature; on synthetic data simulated from the gold-standard network

46 Evaluation: on real experimental data, using the gold-standard network from the literature; on synthetic data simulated from the gold-standard network

47 From Sachs et al., Science 2005

48 Evaluation: Raf signalling pathway. Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune system cells. Deregulation → carcinogenesis. Extensively studied in the literature → gold-standard network.

49 Raf regulatory network. From Sachs et al., Science 2005

50 Flow cytometry data. Intracellular multicolour flow cytometry experiments: concentrations of 11 proteins. 5400 cells have been measured under 9 different cellular conditions (cues). Downsampling to 100 instances (5 separate subsets): indicative of microarray experiments.

51

52 Two types of experiments

53

54 Evaluation: on real experimental data, using the gold-standard network from the literature; on synthetic data simulated from the gold-standard network

55 Comparison with simulated data 1

56 Raf pathway

57 Comparison with simulated data 2

58 Steady-state approximation

59

60 Real versus simulated data Real biological data: full complexity of biological systems. The “gold-standard” only represents our current state of knowledge; it is not guaranteed to represent the true network. Simulated data: Simplifications that might be biologically unrealistic. We know the true network.

61 How can we evaluate the reconstruction accuracy?

62 Evaluation of learning performance [Figure: the extracted network is compared with the true network or with biological knowledge (gold-standard network)]

63

64

65

66 Performance evaluation: ROC curves

67 Performance evaluation: ROC curves. We use the Area Under the Receiver Operating Characteristic Curve (AUC). [Figure: ROC curves with AUC = 0.5, 0.5 < AUC < 1, and AUC = 1]
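The AUC can be computed without drawing the curve, because it equals the probability that a randomly chosen true edge receives a higher score than a randomly chosen non-edge. The sketch below uses that pairwise formulation; the edge scores and gold-standard labels are invented for illustration, and ties are counted as half a win.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: edge scores produced by a network reconstruction method.
    labels: 1 if the edge is in the gold-standard network, else 0.
    """
    pos = [s for s, l in zip(scores, labels) if l]
    neg = [s for s, l in zip(scores, labels) if not l]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical edge scores and gold-standard labels.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(auc(scores, labels))
```

A value of 1 means every true edge outranks every spurious one, and 0.5 is what a random ranking is expected to give.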

68 Alternative performance evaluation: True positive (TP) scores. We set the threshold such that we obtain 5 spurious edges (5 FPs) and count the corresponding number of true edges (TP count).
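A minimal sketch of this count follows. It walks down the ranked edge list and stops just before a sixth false positive would be admitted, which is one reasonable reading of "the threshold such that we obtain 5 spurious edges"; that reading, the function name and the example ranking are assumptions, and tied scores are not treated specially.

```python
def tp_at_fp(scores, labels, max_fp=5):
    """Count true edges above the threshold that admits exactly max_fp false edges.

    Edges are visited from highest to lowest score; counting stops just
    before the (max_fp + 1)-th false positive would be included.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    for _, is_true in ranked:
        if is_true:
            tp += 1
        else:
            if fp == max_fp:
                break
            fp += 1
    return tp


# Hypothetical ranking of 12 candidate edges (1 = edge is in the gold standard).
labels = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1]
scores = [1.0 - 0.05 * i for i in range(len(labels))]
print(tp_at_fp(scores, labels, max_fp=5))
```

Unlike the AUC, this score looks only at the top of the ranking, where the predictions a biologist would actually follow up are found.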

69 Alternative performance evaluation: True positive (TP) scores [Figure: TP counts at 5 FPs for BN, GGM and RN]

70 Directed graph evaluation (DGE) [Figure: edge scores learned from the data are thresholded into concrete network predictions and compared with the true regulatory network; two thresholds (low, high) give TP: 1/2, FP: 0/4 and TP: 2/2, FP: 1/4]

71 Undirected graph evaluation (UGE) [Figure: undirected edge scores learned from the data are thresholded into concrete network (skeleton) predictions and compared with the skeleton of the true regulatory network; two thresholds (high, low) give TP: 1/2, FP: 0/1 and TP: 2/2, FP: 1/1]

72

73 Synthetic data, observations

74 Synthetic data, interventions

75 Cytometry data, interventions

76 How can we explain the difference between synthetic and real data?

77 Simulated data are “simpler”. No mismatch between models used for data generation and inference.

78 Complications with real data Can we trust our gold-standard network?

79 Raf regulatory network. From Sachs et al., Science 2005

80 Disputed structure of the gold-standard network. Dougherty et al., Regulation of Raf-1 by Direct Feedback Phosphorylation. Molecular Cell, Vol. 17, 2005

81 Complications with real data. Interventions might not be “ideal” owing to negative feedback loops. [Figure: inhibition; stabilisation through negative feedback loops]

82 Conclusions 1 BNs and GGMs outperform RNs, most notably on Gaussian data. No significant difference between BNs and GGMs on observational data. For interventional data, BNs clearly outperform GGMs and RNs, especially when taking the edge direction (DGE score) rather than just the skeleton (UGE score) into account.

83 Conclusions 2. Performance on synthetic data better than on real data. Real data: more complex. Real interventions are not ideal. Errors in the gold-standard network.

84 How do we model feedback loops?

85 Unfolding in time

