# Probabilistic modelling in computational biology

Dirk Husmeier, Biomathematics & Statistics Scotland



James Watson & Francis Crick, 1953

Frederick Sanger, 1980

Network reconstruction from postgenomic data

Model Parameters θ

Friedman et al. (2000), J. Comp. Biol. 7, 601-620. Marriage between graph theory and probability theory.

Bayes net ODE model

Model Parameters θ. Probability theory → Likelihood

Model Parameters θ. Bayesian networks: integral analytically tractable!
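The integral mentioned on this slide is presumably the marginal likelihood of a network structure, obtained by integrating the likelihood over the model parameters (written θ here). In standard Bayesian notation, where the symbols $M$ for the structure and $D$ for the data are assumptions not taken from the slide:

$$
P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta,
\qquad
P(M \mid D) \propto P(D \mid M)\, P(M)
$$

With a conjugate prior $P(\theta \mid M)$ the integral has a closed form, which is what makes scoring and sampling network structures feasible.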

UAI 1994

Identify the best network structure. Ideal scenario: large data sets, low noise.

Uncertainty about the best network structure. Limited number of experimental replications, high noise.

Sample of high-scoring networks

Feature extraction, e.g. marginal posterior probabilities of the edges: high-confidence edge, high-confidence non-edge, uncertainty about edges.
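As a rough illustration of this feature extraction step, the marginal posterior probability of an edge can be estimated as the fraction of sampled networks that contain it. The sketch below assumes the sample is given as a list of binary adjacency matrices; the function name `edge_marginals` and the four toy networks are hypothetical and not taken from the talk.

```python
import numpy as np

def edge_marginals(sampled_graphs):
    """Estimate marginal edge posteriors by averaging adjacency matrices.

    sampled_graphs: iterable of equally sized 0/1 adjacency matrices, where
    entry [i, j] = 1 means the sampled network contains the edge i -> j.
    """
    graphs = np.asarray(sampled_graphs, dtype=float)
    return graphs.mean(axis=0)

# Hypothetical sample of four high-scoring networks over three nodes.
sample = [
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 1, 1], [0, 0, 1], [0, 0, 0]],
    [[0, 1, 0], [0, 0, 0], [0, 0, 0]],
]
probs = edge_marginals(sample)
print(probs)
```

In this toy sample the edge 0 -> 1 appears in every network (a high-confidence edge), 2 -> 0 never appears (a high-confidence non-edge), and 1 -> 2 appears in half of the networks (an uncertain edge).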

Sampling with MCMC. [Plot: number of structures versus number of nodes]
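To give a feel for why exhaustive enumeration is replaced by sampling, the sketch below counts labelled directed acyclic graphs with Robinson's recurrence. It assumes that the structures counted on the slide are DAGs, which the transcript does not state; the function name `count_dags` is illustrative.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labelled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )

for n in range(1, 11):
    print(n, count_dags(n))
```

The count grows super-exponentially with the number of nodes, so even for a handful of genes the structure space is far too large to enumerate.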

Madigan & York (1995), Giudici & Castelo (2003)

Overview: Introduction, Limitations, Methodology, Application to morphogenesis, Application to synthetic biology

Homogeneity assumption: interactions don’t change with time

Limitations of the homogeneity assumption

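Example: 4 genes, 10 time points

| | $t_1$ | $t_2$ | $t_3$ | $t_4$ | $t_5$ | $t_6$ | $t_7$ | $t_8$ | $t_9$ | $t_{10}$ |
|---|---|---|---|---|---|---|---|---|---|---|
| $X^{(1)}$ | $X_{1,1}$ | $X_{1,2}$ | $X_{1,3}$ | $X_{1,4}$ | $X_{1,5}$ | $X_{1,6}$ | $X_{1,7}$ | $X_{1,8}$ | $X_{1,9}$ | $X_{1,10}$ |
| $X^{(2)}$ | $X_{2,1}$ | $X_{2,2}$ | $X_{2,3}$ | $X_{2,4}$ | $X_{2,5}$ | $X_{2,6}$ | $X_{2,7}$ | $X_{2,8}$ | $X_{2,9}$ | $X_{2,10}$ |
| $X^{(3)}$ | $X_{3,1}$ | $X_{3,2}$ | $X_{3,3}$ | $X_{3,4}$ | $X_{3,5}$ | $X_{3,6}$ | $X_{3,7}$ | $X_{3,8}$ | $X_{3,9}$ | $X_{3,10}$ |
| $X^{(4)}$ | $X_{4,1}$ | $X_{4,2}$ | $X_{4,3}$ | $X_{4,4}$ | $X_{4,5}$ | $X_{4,6}$ | $X_{4,7}$ | $X_{4,8}$ | $X_{4,9}$ | $X_{4,10}$ |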

Supervised learning. Here: 2 components. [Figure: the 4 genes × 10 time points data matrix $X_{i,t}$, with the time points $t_1, \dots, t_{10}$ allocated to 2 components]

Changepoint model: parameters can change with time


Unsupervised learning. Here: 3 components. [Figure: the 4 genes × 10 time points data matrix $X_{i,t}$, with the time points $t_1, \dots, t_{10}$ allocated to 3 components]

Extension of the model θ

θ

θ, k, h. Number of components (here: 3). Allocation vector.

Analytically integrate out the parameters θ. k, h. Number of components (here: 3). Allocation vector.
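One way to picture an allocation vector for a changepoint model: each time point is assigned the index of the segment (component) it falls into. The sketch below is only an illustration under assumed conventions; the function name `allocation_vector`, the 0-based changepoint indices, and the example changepoints are hypothetical and not taken from the talk.

```python
import numpy as np

def allocation_vector(n_timepoints, changepoints):
    """Map changepoints to a component label for every time point.

    changepoints: 0-based indices of the first time point of each new
    segment (an assumed convention). Labels start at 1.
    """
    cps = np.sort(np.asarray(changepoints, dtype=int))
    # searchsorted counts how many changepoints lie at or before each index.
    return np.searchsorted(cps, np.arange(n_timepoints), side="right") + 1

# 10 time points, hypothetical changepoints before the 4th and 8th point.
alloc = allocation_vector(10, [3, 7])
n_components = int(alloc.max())
print(alloc, n_components)
```

Two changepoints split the ten time points into three components, matching the "here: 3" case on the slide.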

RJMCMC within Gibbs: alternate between P(network structure | changepoints, data) and P(changepoints | network structure, data). Birth, death, and relocation moves.

Dynamic programming, complexity N²
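The transcript gives no detail of the dynamic programming scheme, so the following is not the talk's algorithm. It is a generic sketch of why changepoint problems over N time points lend themselves to dynamic programming with quadratic cost: the best segmentation of the first j points is built from the best segmentations of shorter prefixes. The squared-error segment cost, the `penalty` parameter, and the function name are all assumptions made for illustration.

```python
import numpy as np

def optimal_changepoints(x, penalty):
    """Penalised least-squares segmentation of a 1-D series in O(N^2)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Cumulative sums give each segment cost in constant time.
    s1 = np.concatenate(([0.0], np.cumsum(x)))
    s2 = np.concatenate(([0.0], np.cumsum(x ** 2)))

    def cost(i, j):
        # Sum of squared deviations from the mean over x[i:j].
        return s2[j] - s2[i] - (s1[j] - s1[i]) ** 2 / (j - i)

    best = np.full(n + 1, np.inf)
    best[0] = -penalty  # so the first segment carries no penalty
    prev = np.zeros(n + 1, dtype=int)
    for j in range(1, n + 1):      # end of the last segment
        for i in range(j):         # start of the last segment
            c = best[i] + cost(i, j) + penalty
            if c < best[j]:
                best[j], prev[j] = c, i

    # Backtrack through the stored segment starts.
    cps, j = [], n
    while j > 0:
        i = int(prev[j])
        if i > 0:
            cps.append(i)
        j = i
    return sorted(cps)

series = [0.0] * 5 + [10.0] * 5
found = optimal_changepoints(series, penalty=1.0)
print(found)
```

The two nested loops over segment end and segment start are what give the N² complexity.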

Circadian rhythms in Arabidopsis thaliana. Collaboration with the Institute of Molecular Plant Sciences at Edinburgh University (Andrew Millar’s group).
- Focus on 9 circadian genes: LHY, CCA1, TOC1, ELF4, ELF3, GI, PRR9, PRR5, and PRR3
- Transcriptional profiles at 4 × 13 time points in 2 h intervals under constant light for 4 experimental conditions

Comparison with the literature
- Precision: proportion of identified interactions that are correct
- Recall = Sensitivity: proportion of true interactions that we successfully recovered
- Specificity: proportion of non-interactions that are successfully avoided

Which interactions from the literature are found? [Network figure over CCA1, LHY, PRR9, GI, ELF3, TOC1, ELF4, PRR5, PRR3, marking true positives and false negatives. Blue: activations, red: inhibitions.] True positives (TP) = 8, false negatives (FN) = 5, Recall = 8/13 = 62%

Which proportion of predicted interactions are confirmed by the literature? [Network figure marking true positives and false positives. Blue: activations, red: inhibitions.] True positives (TP) = 8, false positives (FP) = 13, Precision = 8/21 = 38%

Precision = 38%, Recall = 62%. [Network figure over CCA1, LHY, PRR9, GI, ELF3, TOC1, ELF4, PRR5, PRR3]

True positives (TP) = 8, false positives (FP) = 13, false negatives (FN) = 5, true negatives (TN) = 9² - 8 - 13 - 5 = 55
- Sensitivity (recall) = TP/[TP+FN] = 62%
- Specificity (proportion of avoided non-interactions) = TN/[TN+FP] = 81%
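The arithmetic on these slides can be checked with a few lines of code. The sketch below follows the slide's convention of counting all 9² ordered gene pairs as candidate interactions; the function name `confusion_metrics` is illustrative.

```python
def confusion_metrics(tp, fp, fn, n_nodes):
    """Precision, recall and specificity for a reconstructed network.

    All n_nodes**2 ordered pairs count as candidate interactions,
    as in the slide's computation of the true negatives.
    """
    tn = n_nodes ** 2 - tp - fp - fn
    return {
        "TN": tn,
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),       # = sensitivity
        "specificity": tn / (tn + fp),
    }

metrics = confusion_metrics(tp=8, fp=13, fn=5, n_nodes=9)
for name, value in metrics.items():
    print(name, value)
```

Rounded to whole percentages, this reproduces the precision of 38%, recall of 62% and specificity of 81% quoted above.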

Model extension. So far: non-stationarity in the regulatory process

Non-stationarity in the network structure

Flexible network structure.

Model Parameters θ

Use prior knowledge!

Flexible network structure.

Flexible network structure with regularization. [Equation labels: hyperparameter, normalization factor]

Flexible network structure with regularization: exponential prior versus binomial prior with conjugate beta hyperprior

NIPS 2010

Overview: Introduction, Limitations, Methodology, Application to morphogenesis, Application to synthetic biology

Morphogenesis in Drosophila melanogaster. Gene expression measurements at 66 time points during the life cycle of Drosophila (Arbeitman et al., Science, 2002). Selection of 11 genes involved in muscle development. Zhao et al. (2006), Bioinformatics 22

Can we learn the morphogenetic transitions: embryo → larva, larva → pupa, pupa → adult?

Average posterior probabilities of transitions. Morphogenetic transitions: embryo → larva, larva → pupa, pupa → adult

Can we learn changes in the regulatory network structure?

Overview: Introduction, Limitations, Methodology, Application to morphogenesis, Application to synthetic biology

Can we learn the switch Galactose → Glucose? Can we learn the network structure?

Task 1: Changepoint detection. Switch of the carbon source: Galactose → Glucose

Task 2: Network reconstruction
- Precision: proportion of identified interactions that are correct
- Recall: proportion of true interactions that we successfully recovered

- BANJO: conventional homogeneous DBN
- TSNI: method based on differential equations
- Inference: optimization, “best” network

Sample of high-scoring networks

Marginal posterior probabilities of the edges: P=1, P=0, P=0.5

[Figure: edges with P=1 compared against the true network; precision-recall plot]

| Threshold | 0.9 |
|---|---|
| Precision | 1 |
| Recall | 1/2 |

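[Figure: edges with P=1 and P=0.5 compared against the true network; precision-recall plot]

| Threshold | 0.9 | 0.4 |
|---|---|---|
| Precision | 1 | 2/3 |
| Recall | 1/2 | 1 |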

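[Figure: edges with P=1, P=0.5 and P=0 compared against the true network; precision-recall plot]

| Threshold | 0.9 | 0.4 | -0.01 |
|---|---|---|---|
| Precision | 1 | 2/3 | 1/2 |
| Recall | 1/2 | 1 | 1 |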
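The thresholding procedure behind these numbers can be sketched as follows: an edge is predicted whenever its marginal posterior probability exceeds the threshold, and precision and recall are computed against the true network. The four-edge toy example is a hypothetical configuration chosen to be consistent with the values on the slides; the original figure is not recoverable from the transcript.

```python
def precision_recall(posteriors, truth, threshold):
    """Precision and recall of the edges whose posterior exceeds threshold."""
    predicted = [p > threshold for p in posteriors]
    tp = sum(1 for pred, true in zip(predicted, truth) if pred and true)
    n_predicted = sum(predicted)
    n_true = sum(truth)
    # Precision is undefined when no edge is predicted.
    precision = tp / n_predicted if n_predicted else float("nan")
    recall = tp / n_true
    return precision, recall

# Hypothetical candidate edges: posterior probabilities and ground truth.
posteriors = [1.0, 0.5, 0.5, 0.0]
truth = [True, True, False, False]

curve = {t: precision_recall(posteriors, truth, t) for t in (0.9, 0.4, -0.01)}
for t, (prec, rec) in curve.items():
    print(t, prec, rec)
```

Lowering the threshold trades precision for recall, and sweeping it over all values traces out the precision-recall curve.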

Future work

How are we getting from here …

… to there ?!

Input: Learn: MCMC Prior knowledge
