Computational methods for inferring cellular networks II
Stat 877, Apr 17th, 2014
Sushmita Roy

RECAP from last time
A regulatory network has structure and parameters
Network reconstruction
– Identify structure and parameters from data
Classes of methods for network reconstruction
– Per-gene vs. per-module
– Sparse candidate is an example of a per-gene method
Key idea: restrict the parent set to a skeleton defined by “good” candidates
Good candidates: high mutual information OR high predictive power
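The "high mutual information" criterion for good candidates can be sketched as follows. This is a minimal illustration, not the Sparse Candidate algorithm itself: it assumes expression values are already discretized, and the function names `mutual_information` and `top_candidates` are hypothetical.

```python
import numpy as np

def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete vectors."""
    x = np.asarray(x)
    y = np.asarray(y)
    mi = 0.0
    for xv in np.unique(x):
        px = np.mean(x == xv)
        for yv in np.unique(y):
            py = np.mean(y == yv)
            pxy = np.mean((x == xv) & (y == yv))
            if pxy > 0:  # skip empty cells, whose contribution is zero
                mi += pxy * np.log(pxy / (px * py))
    return mi

def top_candidates(data, target, k):
    """Keep the k variables with the highest MI with `target`.

    data: dict mapping variable name -> discretized expression vector
    (an assumed input layout for this sketch).
    """
    scores = {name: mutual_information(values, data[target])
              for name, values in data.items() if name != target}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

The parent search for each gene would then be restricted to the returned candidate list rather than to all variables.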

Goals for today
Per-module methods
– Module network
Incorporating priors in graph structure learning
– Combining per-gene and per-module methods
Assessing confidence in networks

Module Networks
Motivation:
– Most complex systems have too many variables
– Not enough data to robustly learn dependencies among them
– Large networks are hard to interpret
Key idea: Group similarly behaving variables into “modules” and learn parameters for each module
Relevance to gene regulatory networks
– Genes that are co-expressed are likely regulated in similar ways
Segal et al., 2005

An expression module
Set of genes that behave similarly across conditions
[Figure: genes grouped into modules; Gasch & Eisen, 2002]

Modeling questions in Module Networks
What is the mathematical definition of a module?
– All variables in a module have the same conditional probability distribution (CPD)
How to model the CPD between parent and children?
– Regression tree
How to learn module networks?

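Defining a Module Network
A module network is specified by three components:
– Structure: specifies the parents of each module
– Assignment: maps each variable $X_i$ to a module $k$
– Parameters $\Theta$: parameterize the CPD $P(M_j \mid \mathrm{Pa}_{M_j})$, where $\mathrm{Pa}_{M_j}$ are the parents of module $M_j$
Each variable $X_i$ in $M_j$ has the same conditional distribution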

Bayesian network vs Module network Each variable takes three values: UP, DOWN, SAME

Bayesian network vs Module network
Bayesian network
– CPD per random variable
– Learning only requires a search for parents
Module network
– CPD per module
– Learning requires parent search and module membership assignment

Learning a Module Network
Given
– Training dataset $D = \{x_1, \dots, x_N\}$
– Number of modules
Learn
– Module assignment of each $X_i$ to a module
– CPDs $\Theta$
– The parents of each module

Score of a Module network
The score of a module network given the data is built from one likelihood term per module $j$
– $K$: number of modules
– $X^j$: $j$th module
– $\mathrm{Pa}_{M_j}$: parents of module $M_j$

Module network learning algorithm

Module initialization as clustering of variables for module network

Module re-assignment
Two requirements
– Must preserve the acyclic structure
– Must improve the score
Perform a sequential update:
– Compute the delta score of moving one variable from its module to another while keeping all other variables fixed
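A minimal sketch of the sequential update, under loud assumptions: the toy `module_score` below (negative squared deviation from the module's mean profile) stands in for the real module-network score, which depends on each module's parents and CPD, and `creates_cycle` is a hypothetical callback for the acyclicity check that defaults to "never".

```python
import numpy as np

def module_score(rows):
    """Toy stand-in score: tighter modules score higher (assumption)."""
    if len(rows) == 0:
        return 0.0
    rows = np.asarray(rows, dtype=float)
    return -np.sum((rows - rows.mean(axis=0)) ** 2)

def total_score(data, assignment, n_modules):
    """Sum of per-module scores; data is genes x conditions."""
    return sum(module_score(data[assignment == k]) for k in range(n_modules))

def sequential_update(data, assignment, n_modules,
                      creates_cycle=lambda i, k: False):
    """Move one variable at a time to the module with the best positive delta."""
    assignment = np.array(assignment)  # work on a copy
    improved = True
    while improved:
        improved = False
        for i in range(data.shape[0]):
            current = assignment[i]
            base = total_score(data, assignment, n_modules)
            best_k, best_delta = current, 0.0
            for k in range(n_modules):
                if k == current or creates_cycle(i, k):
                    continue  # skip no-op moves and moves that break acyclicity
                assignment[i] = k  # tentatively move variable i
                delta = total_score(data, assignment, n_modules) - base
                if delta > best_delta + 1e-12:
                    best_k, best_delta = k, delta
            assignment[i] = best_k  # commit the best move (or restore)
            if best_k != current:
                improved = True
    return assignment
```

Every accepted move strictly increases the total score, so the loop terminates.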

Module re-assignment via sequential update

Regression tree to capture CPD
Each path captures a mode of regulation of $X_3$ by $X_1$ and $X_2$
Expression of the target is modeled using a Gaussian at each leaf node
[Figure: regression tree for $X_3$ with internal tests $X_1 > e_1$ and $X_2 > e_2$ and YES/NO branches]
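A small sketch of a regression-tree CPD for $X_3$ given $X_1$ and $X_2$. The tree shape, the thresholds standing in for $e_1$ and $e_2$, and the leaf Gaussians are made-up values for illustration; the slide's actual tree is not recoverable from the text.

```python
import math

# Hypothetical tree: all numbers below are illustrative assumptions.
TREE = {
    "regulator": "X1", "threshold": 0.5,      # test: X1 > e1 ?
    "no": {"mean": -1.0, "std": 0.5},         # leaf Gaussian
    "yes": {
        "regulator": "X2", "threshold": 0.0,  # test: X2 > e2 ?
        "no": {"mean": 0.0, "std": 0.5},
        "yes": {"mean": 2.0, "std": 0.5},
    },
}

def find_leaf(tree, regulators):
    """Follow YES/NO branches until a leaf (a node without a test) is reached."""
    node = tree
    while "regulator" in node:
        passed = regulators[node["regulator"]] > node["threshold"]
        node = node["yes" if passed else "no"]
    return node

def log_density(tree, regulators, x3):
    """Log of the leaf Gaussian density of the target value x3."""
    leaf = find_leaf(tree, regulators)
    z = (x3 - leaf["mean"]) / leaf["std"]
    return -0.5 * z * z - math.log(leaf["std"] * math.sqrt(2 * math.pi))
```

In a module network, every gene in the module would share this one tree, so the module's log-likelihood sums `log_density` over all member genes and conditions.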

Assessing the value of using Module Networks
Generate data $D$ from a known module network $M_{true}$
– $M_{true}$ was in turn learned from real data
– 10 modules, 500 variables
Learn a module network $M$ from $D$
Assess the quality of $M$ using:
– Test data likelihood (higher is better)
– Agreement in parent-child relationships between $M$ and $M_{true}$

Test data likelihood Each line type represents size of training data

Recovery of graph structure

Module networks perform better than simple Bayesian networks
[Figure: gain in test data likelihood over a Bayesian network]

Application of Module networks to yeast expression data Segal, Regev, Pe’er, Gasch, Nature Genetics 2005

The Respiration and Carbon Module Regulation tree

Global View of Modules
Modules for common processes often share common
– regulators
– binding site motifs

Per-gene vs per-module
Per-gene methods
– Precise regulatory programs per gene
– No modular organization revealed/captured
Per-module methods
– Modular organization gives a simpler representation
– Gene-specific regulatory information is lost

Can we combine the strengths of both approaches?
[Figure: three network schematics over regulators $X_1$ to $X_4$ and targets $Y_1$, $Y_2$. Panels: Per gene; Per module; MERLIN: per gene, module-constrained]

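Bayesian formulation of network inference
The graph $G$ is an unknown random variable
Optimize the posterior distribution of the graph given the data:
$P(G \mid D) \propto P(D \mid G)\, P(G)$
where $P(D \mid G)$ is the data likelihood and $P(G)$ is the graph prior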

A prior to combine per-gene and per-module methods
Let the graph prior distribute independently over edges, with one set of factors for present edges and one for absent edges
Define the prior probability of edge presence in terms of:
– Graph structure complexity
– Module prior strength
– Module support for an edge

Behavior of graph structure prior Probability of edge
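To get a feel for how such a prior behaves, here is a sketch that assumes a logistic form for the edge probability. The logistic form and the parameter names `sparsity` and `strength` are assumptions made for this illustration; the surviving slide text only names the three ingredients (graph structure complexity, module prior strength, module support).

```python
import math

def edge_prior(module_support, sparsity, strength):
    """Assumed logistic edge prior.

    sparsity: baseline term; more negative values favour sparser graphs.
    strength: how strongly module support raises the edge probability.
    """
    return 1.0 / (1.0 + math.exp(-(sparsity + strength * module_support)))

# Edge probability rises with module support; the rise is steeper
# when the (assumed) strength parameter is larger.
for support in (0.0, 0.5, 1.0):
    print(support, round(edge_prior(support, sparsity=-5.0, strength=4.0), 4))
```

Under this assumed form, an edge with no module support is governed by the sparsity term alone.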

Quantifying module support
For each candidate $X_j$ for $X_i$'s regulator set

MERLIN: Learning upstream regulators of regulatory modules
[Figure: MERLIN workflow. Inputs: measurements from multiple conditions for targets and candidate regulators (transcription factors such as ATF1, RAP1; signaling proteins such as MCK1, HOG1). Expression clustering gives the initial modules. Loop: update regulators using new modules; revisit modules using expression and regulatory programs. Output: final reconstructed network]
Roy et al., PLoS Comput Biol, 2013

MERLIN correctly infers edges on simulated data
The inferred network is compared against the true network
Methods compared: GENIE3, MERLIN, MODNET, LINEAR-REGRESSION
$\text{Precision} = \dfrac{\text{\# of correct edges}}{\text{\# of predicted edges}}$
$\text{Recall} = \dfrac{\text{\# of correct edges}}{\text{\# of true edges}}$
[Figure: precision and recall of each method]
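The two edge-level measures can be computed directly from edge sets. A minimal sketch, assuming edges are represented as (regulator, target) tuples; the example networks are made up.

```python
def precision_recall(predicted_edges, true_edges):
    """Precision and recall of predicted directed edges against the true edges."""
    predicted = set(predicted_edges)
    truth = set(true_edges)
    correct = len(predicted & truth)  # edges present in both networks
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(truth) if truth else 0.0
    return precision, recall

# Made-up example: 2 of 3 predicted edges are correct, 2 of 4 true edges found.
true_net = {("X1", "Y1"), ("X2", "Y1"), ("X3", "Y2"), ("X4", "Y2")}
inferred = {("X1", "Y1"), ("X2", "Y1"), ("X2", "Y2")}
precision, recall = precision_recall(inferred, true_net)
```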

Assessing confidence in the learned network
Typically the number of training samples is not sufficient to reliably determine the “right” network
One can, however, estimate the confidence of specific features of the network
– Graph features $f(G)$
Examples of $f(G)$
– An edge between two random variables
– Order relations: is $X$ an ancestor of $Y$?

How to assess confidence in graph features?
What we want is $P(f(G) \mid D)$, which is
$P(f(G) \mid D) = \sum_{G} f(G)\, P(G \mid D)$
But it is not feasible to compute this sum over all possible graphs
Instead we will use a “bootstrap” procedure

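Bootstrap to assess graph feature confidence
For $i = 1$ to $m$
– Construct dataset $D_i$ by sampling, with replacement, $N$ samples from dataset $D$, where $N$ is the size of the original $D$
– Learn a network $B_i$ from $D_i$
For each feature of interest $f$, calculate the confidence as the fraction of bootstrap networks that contain it:
$\mathrm{Conf}(f) = \dfrac{1}{m} \sum_{i=1}^{m} f(B_i)$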
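The bootstrap loop can be sketched as below. The learner is a deliberately simple placeholder (an assumption, not a Bayesian network learner): it reports an undirected edge whenever two variables have a high absolute Pearson correlation. Any structure learner that returns a set of features could be swapped in. The function names are hypothetical.

```python
import numpy as np

def learn_edges(data, threshold=0.8):
    """Placeholder learner: edge (i, j) if |correlation| exceeds threshold.

    data: samples x variables.
    """
    corr = np.corrcoef(data, rowvar=False)
    n_vars = corr.shape[0]
    return {(i, j) for i in range(n_vars) for j in range(i + 1, n_vars)
            if abs(corr[i, j]) > threshold}

def bootstrap_confidence(data, m=100, learner=learn_edges, seed=0):
    """Fraction of m bootstrap networks in which each edge appears."""
    rng = np.random.default_rng(seed)
    n_samples = data.shape[0]
    counts = {}
    for _ in range(m):
        # Sample N rows with replacement, N = size of the original dataset.
        idx = rng.integers(0, n_samples, size=n_samples)
        for edge in learner(data[idx]):
            counts[edge] = counts.get(edge, 0) + 1
    return {edge: count / m for edge, count in counts.items()}
```

Edges that never appear in any bootstrap network are simply absent from the returned dictionary, which corresponds to a confidence of 0.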

Does the bootstrap confidence represent real relationships?
Compare the confidence distribution to that obtained from randomized data
– Shuffle the columns of each row (gene) separately
– Repeat the bootstrap procedure
[Figure: expression matrix with genes as rows and experimental conditions as columns; each row is randomized independently]
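The randomization step can be sketched with NumPy, assuming the expression matrix has genes as rows and experimental conditions as columns. The function name `randomize_rows` is hypothetical.

```python
import numpy as np

def randomize_rows(expr, seed=0):
    """Permute each gene's values across conditions, independently per row."""
    rng = np.random.default_rng(seed)
    shuffled = expr.copy()  # leave the original matrix untouched
    for row in shuffled:
        rng.shuffle(row)  # in-place permutation of this gene's values
    return shuffled
```

Each gene keeps its own distribution of values, while any dependence between genes across conditions is destroyed, which is what makes the result a useful null dataset for the bootstrap comparison.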

Bootstrap-based confidence differs between real and randomized data
[Figure: distributions of feature confidence for randomized and real data]

Example of a high confidence sub-network
[Figure: one learned Bayesian network next to the bootstrapped-confidence Bayesian network]
Highlights a subnetwork associated with yeast mating

Summary
Biological systems are complex with many components
Learning networks from global expression data is challenging
We have seen three strategies to learn these networks
– Sparse candidate
– Module networks
– Strategies to assess network structure confidence

Other problems in regulatory network inference
Combining different types of datasets to improve network structure
– E.g. motif and ChIP binding
Modeling dynamics in networks
Incorporating perturbations on regulatory nodes
Integrating upstream signaling networks with transcriptional networks
Learning context-specific networks
– Differential wiring