Reverse engineering gene regulatory networks Dirk Husmeier Adriano Werhli Marco Grzegorczyk.

Presentation transcript:

Reverse engineering gene regulatory networks Dirk Husmeier Adriano Werhli Marco Grzegorczyk

Systems biology: learning signalling pathways and regulatory networks from postgenomic data

[Diagram: unknown network → high-throughput experiments → postgenomic data → machine learning and statistical methods]

[Diagram: true network versus extracted network]
Does the extracted network provide a good prediction of the true interactions?

Reverse Engineering of Regulatory Networks
Can we learn the network structure from the postgenomic data themselves?
Statistical methods to distinguish between
- Direct interactions
- Indirect interactions
Challenge: distinguish between
- Correlations
- Causal interactions
Breaking symmetries with active interventions:
- Gene knockouts (VIGS, RNAi)

[Figure: four scenarios: direct interaction, common regulator, indirect interaction, co-regulation]

Relevance networks, Graphical Gaussian models, Bayesian networks

Relevance networks (Butte and Kohane, 2000)
1. Choose a measure of association A(.,.)
2. Define a threshold value t_A
3. For all pairs of domain variables (X,Y), compute their association A(X,Y)
4. Connect those variables (X,Y) by an undirected edge whose association A(X,Y) exceeds the predefined threshold value t_A
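The four steps above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the absolute Pearson correlation as the association measure A, and the function name `relevance_network`, the toy data, and the threshold 0.5 are hypothetical choices.

```python
import numpy as np

def relevance_network(data, threshold):
    """Return undirected edges (i, j) whose association exceeds the threshold.

    data: array of shape (n_observations, n_variables).
    Assumption: association A = absolute Pearson correlation.
    """
    corr = np.corrcoef(data, rowvar=False)      # step 3: all pairwise associations
    n_vars = corr.shape[0]
    edges = []
    for i in range(n_vars):
        for j in range(i + 1, n_vars):          # each unordered pair once
            if abs(corr[i, j]) > threshold:     # step 4: keep pairs above t_A
                edges.append((i, j))
    return edges

# Toy data (hypothetical): x1 depends on x0, x2 is independent of both.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = x0 + 0.1 * rng.normal(size=500)
x2 = rng.normal(size=500)
toy_data = np.column_stack([x0, x1, x2])
toy_edges = relevance_network(toy_data, threshold=0.5)
```

Any other symmetric association score (for example mutual information) could be substituted for the correlation without changing the thresholding logic.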

Association scores

[Figure: nodes 1 and 2 show a strong correlation σ_12 under ‘direct interaction’, ‘common regulator’, and ‘indirect interaction’ alike]

Shortcoming: pairwise associations are computed without taking the context of the system into consideration.

Relevance networks, Graphical Gaussian models, Bayesian networks

Graphical Gaussian Models
Partial correlation, i.e. correlation conditional on all other domain variables: Corr(X_1, X_2 | X_3, …, X_n)
[Figure: direct interaction between nodes 1 and 2, strong partial correlation π_12]
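As a hedged illustration of the partial correlation, the sketch below uses the standard identity that partial correlations are the negated, rescaled off-diagonal entries of the inverse covariance matrix. The function name `partial_correlations` and the simulated chain X0 → X1 → X2 are assumptions made for the example; it also assumes more observations than variables, so that the sample covariance is invertible.

```python
import numpy as np

def partial_correlations(data):
    """Partial correlation of each variable pair given all other variables.

    data: array of shape (n_observations, n_variables), with
    n_observations > n_variables so the sample covariance is invertible.
    """
    cov = np.cov(data, rowvar=False)
    precision = np.linalg.inv(cov)              # inverse covariance matrix
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)         # pi_ij = -w_ij / sqrt(w_ii * w_jj)
    np.fill_diagonal(pcorr, 1.0)
    return pcorr

# Hypothetical chain X0 -> X1 -> X2: X0 and X2 interact only indirectly.
rng = np.random.default_rng(1)
x0 = rng.normal(size=2000)
x1 = x0 + 0.5 * rng.normal(size=2000)
x2 = x1 + 0.5 * rng.normal(size=2000)
chain = np.column_stack([x0, x1, x2])

marginal = np.corrcoef(chain, rowvar=False)    # ordinary correlations
pc = partial_correlations(chain)               # correlations given the third variable
```

In this toy chain, `marginal[0, 2]` is large because X0 influences X2 through X1, while `pc[0, 2]` is close to zero, which is the behaviour that lets a graphical Gaussian model drop the indirect edge.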

[Figure: direct interaction, common regulator, indirect interaction, co-regulation]
Distinguish between direct and indirect interactions: A and B have a low partial correlation.

Graphical Gaussian Models
Partial correlation, i.e. correlation conditional on all other domain variables: Corr(X_1, X_2 | X_3, …, X_n)
Problem: #observations < #variables. The sample covariance matrix is then singular and cannot be inverted to obtain the partial correlations.

Shrinkage estimation and the lemma of Ledoit-Wolf
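The idea of shrinkage can be sketched as follows. This is an illustration under stated assumptions, not the estimator used in the talk: the sample covariance S is blended with a diagonal target T as (1 − λ)·S + λ·T, and the shrinkage intensity λ is fixed by hand here, whereas the Ledoit-Wolf lemma supplies an analytic estimate of it. The diagonal target, the function name `shrink_covariance`, and the value 0.2 are hypothetical choices.

```python
import numpy as np

def shrink_covariance(data, intensity):
    """Blend the sample covariance with its own diagonal.

    intensity: shrinkage weight in [0, 1]; fixed by hand in this sketch
    (an assumption), not estimated via the Ledoit-Wolf lemma.
    """
    sample_cov = np.cov(data, rowvar=False)
    target = np.diag(np.diag(sample_cov))       # assumed target: variances only
    return (1.0 - intensity) * sample_cov + intensity * target

# Fewer observations than variables: 5 observations of 20 variables.
rng = np.random.default_rng(2)
few_obs = rng.normal(size=(5, 20))
sample_cov = np.cov(few_obs, rowvar=False)
shrunk_cov = shrink_covariance(few_obs, intensity=0.2)

rank_sample = np.linalg.matrix_rank(sample_cov)   # rank-deficient, not invertible
rank_shrunk = np.linalg.matrix_rank(shrunk_cov)   # full rank, invertible
```

Because the shrunk matrix has full rank, it can be inverted to obtain partial correlations even when the number of observations is smaller than the number of variables.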

Graphical Gaussian Models
[Figure: direct interaction, common regulator, indirect interaction]
P(A,B) = P(A)·P(B)
But: P(A,B|C) ≠ P(A|C)·P(B|C)
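A small simulation can make this pair of statements concrete. The sketch assumes a hypothetical linear Gaussian setting in which A and B are independent and both feed into C; none of the variable names or noise levels come from the slides.

```python
import numpy as np

# Hypothetical setting: A and B are independent causes of C.
rng = np.random.default_rng(3)
n = 5000
a = rng.normal(size=n)
b = rng.normal(size=n)
c = a + b + 0.5 * rng.normal(size=n)
abc = np.column_stack([a, b, c])

# Marginally, A and B are (nearly) uncorrelated.
marginal_ab = np.corrcoef(a, b)[0, 1]

# Conditional on C, A and B become strongly (negatively) correlated.
precision = np.linalg.inv(np.cov(abc, rowvar=False))
partial_ab = -precision[0, 1] / np.sqrt(precision[0, 0] * precision[1, 1])
```

Here `marginal_ab` is close to zero while `partial_ab` is strongly negative, so a method based purely on partial correlations would place an edge between A and B even though neither regulates the other.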

Undirected versus directed edges
Relevance networks and Graphical Gaussian models can only extract undirected edges. Bayesian networks can extract directed edges. But can we trust these edge directions? It may be better to learn undirected edges than to learn directed edges with false orientations.

Relevance networks, Graphical Gaussian models, Bayesian networks

[Figure: example graph with nodes A, B, C, D, E, F connected by edges]
- Marriage between graph theory and probability theory.
- Directed acyclic graph (DAG) representing conditional independence relations.
- It is possible to score a network in light of the data: P(D|M), where D is the data and M is the network structure.
- We can infer how well a particular network explains the observed data.

Bayesian networks versus causal networks Bayesian networks represent conditional (in)dependence relations - not necessarily causal interactions.

Bayesian networks versus causal networks
[Figure: true causal graph over A, B, C, and the same system when node A is unknown]

Bayesian networks versus causal networks
Equivalence classes: networks with the same score P(D|M). Equivalent networks cannot be distinguished in light of the data.
[Figure: equivalent networks over A, B, C]

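Equivalence classes of BNs
[Figure: three-node networks over A, B, C grouped into equivalence classes, each represented by a completed partially directed graph (CPDAG)]
- v-structure: P(A,B) = P(A)·P(B), but P(A,B|C) ≠ P(A|C)·P(B|C)
- other networks linking A and B through C: P(A,B) ≠ P(A)·P(B), but P(A,B|C) = P(A|C)·P(B|C)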

Symmetry breaking
[Figure: equivalent networks over A, B, C]
- Interventions
- Prior knowledge

Interventional data
A and B are correlated. Inhibition of A then has one of two outcomes:
- down-regulation of B (consistent with A → B)
- no effect on B (consistent with B → A)

Learning Bayesian networks from data
P(M|D) = P(D|M) P(M) / Z
M: network structure. D: data.
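Written out, the posterior over network structures takes the following form. This is the standard Bayesian reading of the formula above rather than something stated on the slide; the parameter symbol θ and the summation index M' are notation introduced here for illustration.

```latex
P(M \mid D) = \frac{P(D \mid M)\, P(M)}{Z},
\qquad
Z = \sum_{M'} P(D \mid M')\, P(M') = P(D),
\qquad
P(D \mid M) = \int P(D \mid \theta, M)\, P(\theta \mid M)\, d\theta
```

The normalisation constant Z sums over all candidate network structures, which is why it is usually not computed directly for anything but very small networks.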

Evaluation
- On real experimental data, using the gold-standard network from the literature
- On synthetic data simulated from the gold-standard network

From Sachs et al., Science 2005

Evaluation: Raf signalling pathway
- Cellular signalling network of 11 phosphorylated proteins and phospholipids in human immune system cells
- Deregulation → carcinogenesis
- Extensively studied in the literature → gold-standard network

Raf regulatory network (from Sachs et al., Science 2005)

Flow cytometry data
- Intracellular multicolour flow cytometry experiments: concentrations of 11 proteins
- 5400 cells have been measured under 9 different cellular conditions (cues)
- Downsampling to 100 instances (5 separate subsets): indicative of microarray experiments

Two types of experiments

Evaluation
- On real experimental data, using the gold-standard network from the literature
- On synthetic data simulated from the gold-standard network

Comparison with simulated data 1

Raf pathway

Comparison with simulated data 2

Steady-state approximation

Real versus simulated data
- Real biological data: full complexity of biological systems. The “gold-standard” only represents our current state of knowledge; it is not guaranteed to represent the true network.
- Simulated data: simplifications that might be biologically unrealistic. We know the true network.

How can we evaluate the reconstruction accuracy?

Evaluation of learning performance
[Diagram: the extracted network is compared with the true network, represented by biological knowledge (gold-standard network)]

Performance evaluation: ROC curves

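Performance evaluation: ROC curves
We use the area under the receiver operating characteristic curve (AUC). AUC = 1 corresponds to perfect prediction and AUC = 0.5 to random expectation.
[Figure: example ROC curves with AUC = 1, 0.5 < AUC < 1, and AUC = 0.5]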
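One way to compute the AUC from a list of edge scores is sketched below. It relies on the known equivalence between the area under the ROC curve and the fraction of (true edge, non-edge) pairs that the scores rank correctly; the function name `roc_auc` and the example scores are hypothetical.

```python
import numpy as np

def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise ranking.

    scores: one score per candidate edge (higher = more confident).
    labels: True where the edge is in the gold-standard network.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]                         # scores of true edges
    neg = scores[~labels]                        # scores of non-edges
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()  # ties count as half
    return (greater + 0.5 * ties) / (pos.size * neg.size)

# Hypothetical edge scores and gold-standard membership.
example_scores = [0.9, 0.8, 0.7, 0.3, 0.2]
example_labels = [True, True, False, True, False]
example_auc = roc_auc(example_scores, example_labels)
```

Of the six (true edge, non-edge) pairs in this example, five are ranked correctly, so the AUC is 5/6.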

Alternative performance evaluation: True positive (TP) scores
We set the threshold such that we obtain 5 spurious edges (5 FPs) and count the corresponding number of true edges (TP count).
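The TP count at a fixed number of false positives can be sketched as a walk down the ranked edge list. This is one possible reading of the procedure, assuming no tied scores: true edges are counted until the fifth spurious edge is reached. The function name `tp_at_fixed_fp` and the example ranking are hypothetical.

```python
import numpy as np

def tp_at_fixed_fp(scores, labels, n_fp=5):
    """Count true edges ranked at or above the n_fp-th false positive.

    Assumption: scores are untied, so the ranking is unambiguous.
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    ranked = np.asarray(labels, dtype=bool)[order]   # labels, best score first
    tp = 0
    fp = 0
    for is_true_edge in ranked:
        if is_true_edge:
            tp += 1
        else:
            fp += 1
            if fp == n_fp:                           # threshold reached
                break
    return tp

# Hypothetical ranking of 10 candidate edges, scores 10 (best) down to 1.
ranked_scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
ranked_labels = [True, True, False, True, False, False, True, False, False, True]
tp_count = tp_at_fixed_fp(ranked_scores, ranked_labels, n_fp=5)
```

Unlike the AUC, this score looks only at the top of the ranking, which is the region a practitioner would actually follow up experimentally.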

[Figure: TP counts at 5 FPs for BN, GGM, and RN]

Directed graph evaluation (DGE)
[Diagram: edge scores learned from the data are thresholded to give concrete network predictions, which are compared with the true regulatory network. High threshold: TP 1/2, FP 0/4. Low threshold: TP 2/2, FP 1/4.]

Undirected graph evaluation (UGE)
[Diagram: undirected edge scores learned from the data are thresholded to give concrete network (skeleton) predictions, which are compared with the skeleton of the true regulatory network. High threshold: TP 1/2, FP 0/1. Low threshold: TP 2/2, FP 1/1.]

Synthetic data, observations

Synthetic data, interventions

Cytometry data, interventions

How can we explain the difference between synthetic and real data?

Simulated data are “simpler”. No mismatch between models used for data generation and inference.

Complications with real data: can we trust our gold-standard network?

Raf regulatory network (from Sachs et al., Science 2005)

Disputed structure of the gold-standard network: Dougherty et al., “Regulation of Raf-1 by Direct Feedback Phosphorylation”, Molecular Cell, Vol. 17, 2005.

Complications with real data
Interventions might not be “ideal” owing to negative feedback loops.
[Figure: inhibition counteracted by stabilisation through negative feedback loops]

Conclusions 1
- BNs and GGMs outperform RNs, most notably on Gaussian data.
- No significant difference between BNs and GGMs on observational data.
- For interventional data, BNs clearly outperform GGMs and RNs, especially when taking the edge direction (DGE score) rather than just the skeleton (UGE score) into account.

Conclusions 2
Performance on synthetic data is better than on real data. Possible reasons:
- Real data are more complex
- Real interventions are not ideal
- Errors in the gold-standard network

How do we model feedback loops?

Unfolding in time