Inferring gene regulatory networks from transcriptomic profiles Dirk Husmeier Biomathematics & Statistics Scotland.

1 Inferring gene regulatory networks from transcriptomic profiles Dirk Husmeier Biomathematics & Statistics Scotland

2 Overview: Introduction; Methodology; Circadian regulation in Arabidopsis; Application to synthetic biology; DREAM

3 Network reconstruction from postgenomic data

4 Accuracy versus computational complexity: methods based on correlation and mutual information; conditional independence graphs; mechanistic models; Bayesian networks

5 Accuracy versus computational complexity: methods based on correlation and mutual information; conditional independence graphs; mechanistic models; Bayesian networks

6

7 Shortcomings: pairwise associations do not take the context of the system into consideration. [Figure labels: direct interaction; common regulator; indirect interaction; co-regulation]

8 Accuracy versus computational complexity: methods based on correlation and mutual information; conditional independence graphs; mechanistic models; Bayesian networks

9 Conditional independence graphs (CIGs). Direct interaction between nodes 1 and 2: strong partial correlation $\pi_{12}$. Partial correlation, i.e. correlation conditional on all other domain variables: $\mathrm{Corr}(X_1, X_2 \mid X_3, \ldots, X_n)$. Computed from the inverse of the covariance matrix.

10 Correlation versus partial correlation [figure labels: high, high, high, high, low]

11 Conditional independence graphs (CIGs). Direct interaction between nodes 1 and 2: strong partial correlation $\pi_{12}$. Partial correlation, i.e. correlation conditional on all other domain variables: $\mathrm{Corr}(X_1, X_2 \mid X_3, \ldots, X_n)$. Computed from the inverse of the covariance matrix. Problem: #observations < #variables → covariance matrix is singular.
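The relation between the inverse covariance matrix and partial correlation can be sketched in a few lines. This is a minimal illustration, not the speaker's code: the simulated chain X1 → X2 → X3, the noise level, and all function names are assumptions made for the example. It uses the standard identity that the partial correlation equals minus the off-diagonal entry of the inverse covariance matrix, divided by the square root of the product of the corresponding diagonal entries.

```python
import math
import random


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            # Numerically zero pivot: the matrix cannot be inverted.
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def covariance(columns):
    """Sample covariance matrix; each element of `columns` is one variable."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    return [[sum((a - ma) * (b - mb) for a, b in zip(ca, cb)) / (n - 1)
             for cb, mb in zip(columns, means)]
            for ca, ma in zip(columns, means)]


def correlation_and_partial(columns):
    """Return (correlation matrix, partial correlation matrix)."""
    cov = covariance(columns)
    prec = invert(cov)  # inverse of the covariance matrix
    p = len(columns)
    corr = [[cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(p)]
            for i in range(p)]
    pcor = [[1.0 if i == j else -prec[i][j] / math.sqrt(prec[i][i] * prec[j][j])
             for j in range(p)] for i in range(p)]
    return corr, pcor


# Hypothetical chain X1 -> X2 -> X3: X1 and X3 interact only indirectly.
random.seed(0)
n_obs = 5000
x1 = [random.gauss(0, 1) for _ in range(n_obs)]
x2 = [a + 0.5 * random.gauss(0, 1) for a in x1]
x3 = [b + 0.5 * random.gauss(0, 1) for b in x2]

corr, pcor = correlation_and_partial([x1, x2, x3])
print("corr(X1, X3)         =", round(corr[0][2], 2))
print("partial corr(X1, X3) =", round(pcor[0][2], 2))
```

In this simulated chain the plain correlation between X1 and X3 is high, while their partial correlation is close to zero. With fewer observations than variables the sample covariance matrix has no inverse, which is the problem named on the slide.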

12 Accuracy versus computational complexity: methods based on correlation and mutual information; conditional independence graphs; mechanistic models; Bayesian networks

13 Regulatory network

14 Description with differential equations: rates, concentrations, kinetic parameters θ

15 Model, parameters θ. Probability theory → likelihood

16 1) Practical problem: numerical optimization over θ. 2) Conceptual problem: overfitting. The maximum likelihood increases with increasing network complexity.

17 Overfitting problem. [Figure labels: true pathway; poorer fit to the data; equal or better fit to the data]

18 Regularization, e.g. the Bayesian information criterion (BIC). [Equation labels: maximum likelihood parameters; number of parameters; number of data points; data misfit term; regularization term]

19 [Figure: likelihood and BIC plotted against model complexity]
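The equation on this slide did not survive extraction. For orientation, one common textbook form of the BIC is given below, with its terms matched to the labels on the slide. The symbols $D$ (data), $M$ (model), $\hat{\theta}$ (maximum likelihood parameters), $d$ (number of parameters) and $N$ (number of data points) are notation assumed here, and the sign and scaling convention on the original slide may differ.

```latex
\mathrm{BIC}(M)
  = \underbrace{-2 \log p(D \mid \hat{\theta}, M)}_{\text{data misfit term}}
  + \underbrace{d \, \log N}_{\text{regularization term}}
```

Under this convention a lower BIC is better: adding parameters reduces the misfit term but increases the penalty.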

20 Model selection: find the best pathway. Select the model with the highest posterior probability. This requires an integration over the whole parameter space.

21 MCMC-based schemes over θ. Problem: excessive computational costs
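The two formulas on this slide were lost in extraction. In standard Bayesian model selection they take the following form, written with assumed notation: $M$ for the model (pathway), $D$ for the data, and $\theta$ for the parameters.

```latex
\begin{align}
P(M \mid D) &\propto P(D \mid M)\, P(M), \\
P(D \mid M) &= \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta .
\end{align}
```

The second line is the integration over the whole parameter space: the likelihood is averaged over the prior on $\theta$ rather than maximized.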

22 Accuracy versus computational complexity: methods based on correlation and mutual information; conditional independence graphs; mechanistic models; Bayesian networks

23 Marriage between graph theory and probability theory. Friedman et al. (2000), J. Comp. Biol. 7, 601-620

24 Bayes net ODE model

25 Model, parameters θ. Bayesian networks: integral analytically tractable!

26 UAI 1994

27 Example: 2 genes → 16 different network structures. Compute

28 Identify the best network structure. Ideal scenario: large data sets, low noise

29 Uncertainty about the best network structure. Limited number of experimental replications, high noise
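The count of 16 can be reproduced by brute-force enumeration. The sketch below assumes a dynamic Bayesian network in which every edge points from time t-1 to time t, so self-loops are allowed and no acyclicity check is needed. That reading is an assumption, since the slide does not state which structures are counted. The function names are hypothetical.

```python
from itertools import product


def enumerate_structures(n_genes):
    """Yield every edge set over n_genes nodes, self-loops included."""
    edges = [(src, dst) for src in range(n_genes) for dst in range(n_genes)]
    for mask in product([0, 1], repeat=len(edges)):
        yield frozenset(edge for edge, on in zip(edges, mask) if on)


def count_structures(n_genes):
    """Closed-form count under the same assumption: 2^(n^2)."""
    return 2 ** (n_genes * n_genes)


structures = list(enumerate_structures(2))
print(len(structures))      # 16
print(count_structures(4))  # 65536
```

The closed-form count grows so fast with the number of genes that exhaustive enumeration is only feasible for very small networks.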

30 Sample of high-scoring networks

31 Feature extraction, e.g. marginal posterior probabilities of the edges

32 Sample of high-scoring networks. Feature extraction, e.g. marginal posterior probabilities of the edges. [Figure labels: high-confidence edge; high-confidence non-edge; uncertainty about edges]

33 Sampling with MCMC. [Figure: number of structures versus number of nodes]
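Given a sample of networks, the marginal posterior probability of an edge can be estimated as the fraction of sampled networks that contain it. The following sketch assumes the sample is a list of edge sets; the gene names and the four sampled networks are made up for illustration.

```python
from collections import Counter


def edge_posteriors(sampled_networks):
    """Estimate marginal edge posteriors as relative frequencies in the sample."""
    counts = Counter(edge for net in sampled_networks for edge in set(net))
    total = len(sampled_networks)
    return {edge: count / total for edge, count in counts.items()}


# Hypothetical sample of four high-scoring networks over genes A, B, C.
samples = [
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("A", "B"), ("A", "C")},
    {("A", "B"), ("B", "C")},
]

posteriors = edge_posteriors(samples)
for edge, prob in sorted(posteriors.items()):
    print(edge, prob)
```

Here A → B appears in every sample (a high-confidence edge), B → C in half of them (an uncertain edge), and edges that never appear, such as C → A, get probability 0 (high-confidence non-edges).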

34 UAI 1994

35 Model, parameters θ. Bayesian networks: integral analytically tractable!

36 Linearity assumption: $[A] = w_1[P_1] + w_2[P_2] + w_3[P_3] + w_4[P_4] + \text{noise}$. [Figure: node A with parents P1 to P4 and edge weights w1 to w4]

37 Homogeneity assumption: parameters don't change with time

38 Homogeneity assumption: parameters don't change with time

39 Limitations of the homogeneity assumption

40 Overview: Introduction; Methodology; Circadian regulation in Arabidopsis; Application to synthetic biology; DREAM

41 Accuracy versus computational complexity: methods based on correlation and mutual information; conditional independence graphs; mechanistic models; Bayesian networks

42 Example: 4 genes, 10 time points

        t1     t2     t3     t4     t5     t6     t7     t8     t9     t10
X(1)    X1,1   X1,2   X1,3   X1,4   X1,5   X1,6   X1,7   X1,8   X1,9   X1,10
X(2)    X2,1   X2,2   X2,3   X2,4   X2,5   X2,6   X2,7   X2,8   X2,9   X2,10
X(3)    X3,1   X3,2   X3,3   X3,4   X3,5   X3,6   X3,7   X3,8   X3,9   X3,10
X(4)    X4,1   X4,2   X4,3   X4,4   X4,5   X4,6   X4,7   X4,8   X4,9   X4,10

43 Standard dynamic Bayesian network: homogeneous model. [Data matrix: 4 genes X(1) to X(4) × 10 time points t1 to t10]

44 Limitations of the homogeneity assumption

45 Our new model: heterogeneous dynamic Bayesian network. Here: 2 components. [Data matrix: 4 genes X(1) to X(4) × 10 time points t1 to t10]

46 Our new model: heterogeneous dynamic Bayesian network. Here: 3 components. [Data matrix: 4 genes X(1) to X(4) × 10 time points t1 to t10]

47 Extension of the model (parameters θ)

48 θ, k, h. k: number of components (here: 3); h: allocation vector

49 Analytically integrate out the parameters θ. k: number of components (here: 3); h: allocation vector
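One way to picture the allocation vector is as a list that assigns each time point to a component. The sketch below assumes a changepoint process, where components are contiguous segments. The changepoint positions 3 and 7 and the function name are hypothetical, chosen only to give 3 components over 10 time points.

```python
def allocation_vector(n_points, changepoints):
    """Map time indices 0..n_points-1 to component labels.

    `changepoints` holds the indices at which a new component starts.
    """
    cps = sorted(changepoints)
    labels = []
    component = 0
    for t in range(n_points):
        # Advance to the next component once its start index is reached.
        while component < len(cps) and t >= cps[component]:
            component += 1
        labels.append(component)
    return labels


h = allocation_vector(10, [3, 7])
k = len(set(h))
print(h)  # [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
print(k)  # 3
```

A free-allocation model would drop the contiguity restriction and let any time point take any label, at a higher computational cost.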

50 Non-homogeneous model → non-linear model

51 BGe: linear model: $[A] = w_1[P_1] + w_2[P_2] + w_3[P_3] + w_4[P_4] + \text{noise}$. [Figure: node A with parents P1 to P4 and edge weights w1 to w4]

52 Can we get an approximate nonlinear model without data discretization? [Figure: y versus x]

53 Idea: piecewise linear model. [Figure: y versus x]

54 Inhomogeneous dynamic Bayesian network with common changepoints. [Data matrix: 4 genes X(1) to X(4) × 10 time points t1 to t10]

55 Inhomogeneous dynamic Bayesian network with node-specific changepoints. [Data matrix: 4 genes X(1) to X(4) × 10 time points t1 to t10]
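The piecewise linear idea can be illustrated with ordinary least squares fitted separately on each segment. This is a toy sketch under stated assumptions: the target function y = |x|, the grid of x values, and the known breakpoint at x = 0 are all made up, and the Bayesian treatment in the talk (which infers the changepoints) is not reproduced.

```python
def fit_line(xs, ys):
    """Least-squares line; returns (slope, intercept, sum of squared errors)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


# Hypothetical nonlinear relationship: y = |x| on a grid from -5 to 5.
xs = [i / 2 for i in range(-10, 11)]
ys = [abs(x) for x in xs]

# A single linear model over the whole range.
_, _, sse_global = fit_line(xs, ys)

# Two linear models, split at an assumed breakpoint x = 0.
left = [(x, y) for x, y in zip(xs, ys) if x < 0]
right = [(x, y) for x, y in zip(xs, ys) if x >= 0]
sse_piecewise = fit_line(*zip(*left))[2] + fit_line(*zip(*right))[2]

print("single line SSE:", round(sse_global, 2))
print("piecewise SSE:  ", round(sse_piecewise, 2))
```

The single line cannot follow the bend, while two linear pieces fit it essentially exactly, with no discretization of the data.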

56

57 Overview: Introduction; Methodology; Circadian regulation in Arabidopsis; Application to synthetic biology; DREAM

58 Circadian regulation in Arabidopsis thaliana

59 Circadian rhythms in Arabidopsis thaliana
Collaboration with the Institute of Molecular Plant Sciences at Edinburgh University (Andrew Millar’s group)
- Focus on 9 circadian genes: LHY, CCA1, TOC1, ELF4, ELF3, GI, PRR9, PRR5, and PRR3
- Four time series measured under constant light condition at 13 time points: 0h, 2h, …, 24h, 26h
- Seedlings entrained with different light:dark cycles between 10h:10h (T20) and 14h:14h (T28)

60 Posterior probability of changepoints

61 Sample of high-scoring networks

62 Marginal posterior probabilities of the edges [figure labels: P=1, P=0, P=0.5]. Predict an interaction if the marginal posterior probability > 0.5

63 Plant clockwork from the literature. Review: Rob McClung, Plant Cell 2006. Two major gene classes: morning genes (e.g. LHY, CCA1) repress evening genes (e.g. TOC1, ELF3, ELF4, GI, LUX), which activate LHY and CCA1

64 Which interactions from the literature are found? [Network figure over CCA1, LHY, PRR9, GI, ELF3, TOC1, ELF4, PRR5, PRR3; legend: true positive, false negative; blue: activations, red: inhibitions]

65 Which interactions from the literature are found? [Network figure over CCA1, LHY, PRR9, GI, ELF3, TOC1, ELF4, PRR5, PRR3; legend: true positive, false negative; blue: activations, red: inhibitions] True positives (TP) = 8; false negatives (FN) = 5; recall = 8/13 = 62%

66 Which proportion of predicted interactions are confirmed by the literature? [Network figure; legend: true positive, false positives; blue: activations, red: inhibitions]

67 Which proportion of predicted interactions are confirmed by the literature? [Network figure; legend: true positive, false positives; blue: activations, red: inhibitions] True positives (TP) = 8; false positives (FP) = 13; precision = 8/21 = 38%

68 Precision = 38%, recall = 62%. [Network figure over CCA1, LHY, PRR9, GI, ELF3, TOC1, ELF4, PRR5, PRR3]

69 True positives (TP) = 8; false positives (FP) = 13; false negatives (FN) = 5; true negatives (TN) = 9² - 8 - 13 - 5 = 55. Sensitivity (recall) = TP/[TP+FN] = 62%. Specificity (proportion of avoided non-interactions) = TN/[TN+FP] = 81%
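The figures on slides 65 to 69 can be checked with a few lines of arithmetic. The function name is hypothetical; the counts (TP = 8, FP = 13, FN = 5, and 9² possible interactions among 9 genes) are taken from the slides.

```python
def confusion_metrics(tp, fp, fn, n_possible):
    """Derive TN and the usual rates from TP, FP, FN and the number of possible edges."""
    tn = n_possible - tp - fp - fn
    return {
        "TN": tn,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),       # also called sensitivity
        "specificity": tn / (tn + fp),
    }


metrics = confusion_metrics(tp=8, fp=13, fn=5, n_possible=9 ** 2)
print(metrics["TN"])                            # 55
print(round(100 * metrics["precision"]), "%")   # 38 %
print(round(100 * metrics["recall"]), "%")      # 62 %
print(round(100 * metrics["specificity"]), "%") # 81 %
```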

70 Core plant clock model (Locke et al., Mol. Syst. Biol. 2006). [Figure nodes: X, LHY/CCA1, TOC1, Y (GI), PRR9/PRR7; labels: morning, evening]

71 Core plant clock model (Locke et al., Mol. Syst. Biol. 2006). [Figure nodes: X, LHY/CCA1, TOC1, Y (GI), PRR9/PRR7; labels: morning, evening, yes]

72 Non-stationarity in the regulatory process

73 Non-stationarity in the network structure

74 Flexible network structure.

75 Flexible network structure with regularization

76

77

78

79

80 ICML 2010

81 Morphogenesis in Drosophila melanogaster Gene expression measurements over 66 time steps of 4028 genes (Arbeitman et al., Science, 2002). Selection of 11 genes involved in muscle development. Zhao et al. (2006), Bioinformatics 22

82 Transition probabilities: flexible structure with regularization. Morphogenetic transitions: embryo → larva, larva → pupa, pupa → adult

83

84

85 Overview: Introduction; Methodology; Circadian regulation in Arabidopsis; Application to synthetic biology; DREAM

86

87

88

89 Can we learn the switch galactose → glucose? Can we learn the network structure?

90 NIPS 2010

91 Hierarchical Bayesian model. [Figure labels: node 1, node i, node p; segment H]

92 Exponential versus binomial prior distribution. Exploration of various information sharing options

93 Task 1: Changepoint detection. Switch of the carbon source: galactose → glucose

94

95 Task 2: Network reconstruction. Precision: proportion of identified interactions that are correct. Recall: proportion of true interactions that we successfully recovered

96 BANJO: conventional homogeneous DBN. TSNI: method based on differential equations. Inference: optimization, “best” network

97

98 Sample of high-scoring networks

99 Marginal posterior probabilities of the edges [figure labels: P=1, P=0, P=0.5]

100 Keep interactions with a posterior probability > 0.5. Better evaluation: consider all possible thresholds → precision-recall curves

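101 True network versus edge posterior probabilities [figure labels: P=1, P=0, P=0.5]. Precision = TP/(TP+FP), Recall = TP/(TP+FN). Table rows to fill in: Thresh, TP, FP, FN, Prec, Recall

102 True network versus edge posterior probabilities [figure labels: P=1, P=0, P=0.5]. Precision = TP/(TP+FP), Recall = TP/(TP+FN)

Thresh   0.9
TP       1
FP       0
FN       1
Prec     1
Recall   1/2

103 True network versus edge posterior probabilities [figure labels: P=1, P=0, P=0.5]. Precision = TP/(TP+FP), Recall = TP/(TP+FN)

Thresh   0.9   0.4
TP       1     2
FP       0     1
FN       1     0
Prec     1     2/3
Recall   1/2   1

104 True network versus edge posterior probabilities [figure labels: P=1, P=0, P=0.5]. Precision = TP/(TP+FP), Recall = TP/(TP+FN)

Thresh   0.9   0.4   -0.01
TP       1     2     2
FP       0     1     2
FN       1     0     0
Prec     1     2/3   1/2
Recall   1/2   1     1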
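The threshold sweep on slides 101 to 104 can be sketched as follows. The edge names and the assignment of posterior probabilities to edges are assumptions: the slide figure is not recoverable, so the values below (two true edges with posteriors 1 and 0.5, two non-edges with posteriors 0.5 and 0) were chosen only because they are consistent with the precision and recall values shown on the slides.

```python
from fractions import Fraction


def pr_at_threshold(posteriors, true_edges, threshold):
    """Return (TP, FP, FN, precision, recall) when keeping edges with posterior > threshold."""
    predicted = {edge for edge, p in posteriors.items() if p > threshold}
    tp = len(predicted & true_edges)
    fp = len(predicted - true_edges)
    fn = len(true_edges - predicted)
    precision = Fraction(tp, tp + fp) if tp + fp else None
    recall = Fraction(tp, tp + fn) if tp + fn else None
    return tp, fp, fn, precision, recall


# Hypothetical posteriors for four candidate edges; e1 and e2 are in the true network.
posteriors = {"e1": 1.0, "e2": 0.5, "e3": 0.5, "e4": 0.0}
true_edges = {"e1", "e2"}

for threshold in (0.9, 0.4, -0.01):
    tp, fp, fn, prec, rec = pr_at_threshold(posteriors, true_edges, threshold)
    print(f"thresh={threshold}: TP={tp} FP={fp} FN={fn} prec={prec} recall={rec}")
```

Plotting precision against recall over all thresholds gives the precision-recall curve, and the area under it summarizes performance without committing to a single cut-off such as 0.5.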

105 Galactose

106 Glucose

107

108 Average performance over both phases: galactose and glucose

Prior         Coupling   Average AUC
None                     0.70
Exponential   Hard       0.77
Binomial      Hard       0.75
Binomial      Soft       0.75

109 How are we getting from here …

110 … to there ?!

111 Overview: Introduction; Methodology; Circadian regulation in Arabidopsis; Application to synthetic biology; DREAM

112 DREAM: Dialogue for Reverse Engineering Assessments and Methods. International network reconstruction competition: June-Sept 2010

Network                 # Transcription factors   # Genes   # Chips
Network 1 (in silico)   195                       1643      805
Network 2               99                        2810      160
Network 3               334                       4511      805
Network 4               333                       5950      536

113 Our team: Marco Grzegorczyk (University of Dortmund, Germany); Frank Dondelinger (BioSS / University of Edinburgh, United Kingdom); Sophie Lèbre (Université de Strasbourg, France); Andrej Aderhold (BioSS / University of St Andrews, United Kingdom)

114 Our model: developed for time series. Data: different experimental conditions, perturbations (e.g. ligand injection), interventions (e.g. gene knock-out, overexpression), time points

115 Change-point process versus free allocation

116 Our model: developed for time series. Data: different experimental conditions, perturbations (e.g. ligand injection), interventions (e.g. gene knock-out, overexpression), time points. To limit computational complexity: stick to a changepoint process. How do we get an ordering of the genes?

117 PCA

118 SOM

119 No time series → use 1-dim SOM to get a chip order

120 Ordering of chips → changepoint model

121 Slow MCMC convergence

Network                 # Transcription factors   # Genes   # Chips
Network 1 (in silico)   195                       1643      805
Network 2               99                        2810      160
Network 3               334                       4511      805
Network 4               333                       5950      536

122 Problems with MCMC convergence (PNAS 2009)

Network                 # Transcription factors   # Genes   # Chips
Network 1 (in silico)   195                       1643      805
Network 2               99                        2810      160
Network 3               334                       4511      805
Network 4               333                       5950      536

123 Problems with MCMC convergence (PNAS 2009)

Network                 # Transcription factors   # Genes   # Chips
Network 1 (in silico)   195                       1643      805
Network 2               99                        2810      160
Network 3               334                       4511      805
Network 4               333                       5950      536

124 Methods competing in the competition. [Figure: area under the precision-recall curve]

125

126

127 Room for improvement: higher-dimensional changepoint process (perturbations, experimental conditions)

128 Acknowledgements: Marco Grzegorczyk (University of Dortmund, Germany); Frank Dondelinger (BioSS / University of Edinburgh, United Kingdom); Sophie Lèbre (Université de Strasbourg, France); Andrej Aderhold (BioSS / University of St Andrews, United Kingdom)

