Computational methods for inferring cellular networks II. Stat 877, Apr 17th, 2014. Sushmita Roy.


1 Computational methods for inferring cellular networks II. Stat 877, Apr 17th, 2014. Sushmita Roy

2 RECAP from last time
A regulatory network has structure and parameters.
Network reconstruction: identify the structure and parameters from data.
Classes of methods for network reconstruction:
– Per-gene vs. per-module
– Sparse candidates is an example of a per-gene method
Key idea: restrict the parent set to a skeleton defined by "good" candidates.
Good candidates have high mutual information with the target OR high predictive power.
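The mutual-information criterion for ranking candidate parents can be sketched in a few lines. This is a minimal illustration over discrete (e.g., UP/DOWN/SAME-style) expression levels; the toy sequences below are hypothetical:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), with counts turned into frequencies
        mi += (c / n) * math.log((c * n) / (px[x] * py[y]))
    return mi

# Toy check: a variable is maximally informative about an identical copy,
# and carries no information about an independent one.
a = [0, 0, 1, 1] * 25
b = a[:]            # identical to a: MI = log 2
c = [0, 1] * 50     # independent of a by construction: MI = 0
```

A sparse-candidates pass would compute this score for every (candidate regulator, target) pair and keep only the top-scoring candidates as the allowed parent skeleton.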

3 Goals for today
Per-module methods
– Module networks
Incorporating priors in graph structure learning
– Combining per-gene and per-module methods
Assessing confidence in networks

4 Module Networks
Motivation:
– Most complex systems have too many variables
– There is not enough data to robustly learn dependencies among them
– Large networks are hard to interpret
Key idea: group similarly behaving variables into "modules" and learn parameters for each module.
Relevance to gene regulatory networks:
– Genes that are co-expressed are likely regulated in similar ways
Segal et al., 2005

5 An expression module
A set of genes that behave similarly across conditions.
[Figure: genes-by-conditions expression matrix with genes grouped into modules; Gasch & Eisen, 2002]

6 Modeling questions in Module Networks
What is the mathematical definition of a module?
– All variables in a module have the same conditional probability distribution
How do we model the CPD between parents and children?
– With a regression tree
How do we learn module networks?

7 Defining a Module Network
A module network is defined by three components:
– a structure, specifying the parents of each module
– an assignment of each variable X_i to a module k
– parameters for the CPDs P(M_j | Pa_{M_j}), where Pa_{M_j} are the parents of module M_j
Each variable X_i in M_j has the same conditional distribution.

8 Bayesian network vs Module network Each variable takes three values: UP, DOWN, SAME

9 Bayesian network vs. Module network
Bayesian network:
– one CPD per random variable
– learning only requires searching for parents
Module network:
– one CPD per module
– learning requires both a parent search and module membership assignment

10 Learning a Module Network
Given:
– a training dataset D = {x^1, …, x^N}
– the number of modules
Learn:
– the module assignment of each X_i
– the CPD parameters Θ
– the parents of each module

11 Score of a Module network
The score of a module network given the data decomposes into a likelihood term per module, where K is the number of modules, X^j is the set of variables assigned to the j-th module, and Pa_{M_j} are the parents of module M_j.
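The per-module likelihood can be made concrete with a small sketch. For illustration only, this assumes Gaussian CPDs with fixed parameters; the key structural point is that every gene in a module shares one CPD, so per-gene log-likelihoods simply add within a module:

```python
import math

def gaussian_loglik(values, mu, sigma):
    """Log-likelihood of a sequence of observations under Normal(mu, sigma)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (v - mu) ** 2 / (2 * sigma ** 2) for v in values)

def module_loglik(module_rows, mu, sigma):
    """Every gene in a module shares one CPD, so per-gene log-likelihoods add."""
    return sum(gaussian_loglik(row, mu, sigma) for row in module_rows)

# Two hypothetical genes assigned to the same module, observed in three conditions each.
module = [[0.1, -0.2, 0.0], [0.2, 0.1, -0.1]]
score = module_loglik(module, mu=0.0, sigma=1.0)
```

The full network score would sum such terms over all K modules, with each module's CPD conditioned on its parents' values rather than fixed parameters.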

12 Module network learning algorithm

13 Module initialization: clustering of the variables of the module network
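The initialization can be any clustering of the expression profiles. A minimal k-means sketch in plain Python (the profiles below are hypothetical) that seeds module assignments:

```python
import random

def kmeans(profiles, k, iters=20, seed=0):
    """Toy k-means over gene expression profiles to seed module assignments."""
    rng = random.Random(seed)
    centers = rng.sample(profiles, k)
    assign = [0] * len(profiles)
    for _ in range(iters):
        # Assignment step: each gene goes to the nearest center (squared distance).
        for i, p in enumerate(profiles):
            assign[i] = min(range(k),
                            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
        # Update step: each center becomes the mean of its assigned profiles.
        for j in range(k):
            members = [profiles[i] for i in range(len(profiles)) if assign[i] == j]
            if members:
                centers[j] = [sum(vals) / len(members) for vals in zip(*members)]
    return assign

# Two clearly separated groups of "genes" measured in two conditions.
genes = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
labels = kmeans(genes, 2)
```

The resulting labels serve only as a starting point; module membership is then refined by the score-driven re-assignment step described on the following slides.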

14 Module re-assignment
Two requirements:
– must preserve the acyclic structure
– must improve the score
Perform a sequential update:
– compute the delta score of moving a variable from one module to another while keeping the other variables fixed

15 Module re-assignment via sequential update

16 Regression tree to capture the CPD
Internal nodes test regulator expression against a threshold (first X_1 > e_1, then X_2 > e_2 along the YES branch).
Each path captures a mode of regulation of X_3 by X_1 and X_2.
The expression of the target is modeled using a Gaussian at each leaf node.
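A minimal sketch of the slide's two-split tree (the thresholds e1, e2, the leaf names, and the sample data are all hypothetical): each sample is routed by its regulator values, and a Gaussian is fit to the pooled target values X3 at each leaf:

```python
def leaf_context(x1, x2, e1=0.0, e2=0.0):
    """Route a sample down the tree: test X1 > e1, then X2 > e2 on the YES branch."""
    if x1 > e1:
        return "x1_high_x2_high" if x2 > e2 else "x1_high_x2_low"
    return "x1_low"

def fit_leaf_gaussians(samples, e1=0.0, e2=0.0):
    """Pool target values X3 per leaf and fit a Gaussian (mean, variance) at each."""
    leaves = {}
    for x1, x2, x3 in samples:
        leaves.setdefault(leaf_context(x1, x2, e1, e2), []).append(x3)
    params = {}
    for name, vals in leaves.items():
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / len(vals)
        params[name] = (mu, var)
    return params

# Hypothetical (X1, X2, X3) samples: X3 is high only when both regulators are high.
data = [(1.0, 1.0, 5.0), (1.0, 1.0, 5.2), (1.0, -1.0, 0.1), (-1.0, 0.5, -0.1)]
params = fit_leaf_gaussians(data)
```

Each leaf's (mean, variance) pair is the Gaussian for one regulatory context, matching the slide's "mode of regulation" per path.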

17 Assessing the value of using Module Networks
Generate data D from a known module network M_true:
– M_true was in turn learned from real data
– 10 modules, 500 variables
Learn a module network M from D.
Assess M's quality using:
– test-data likelihood (higher is better)
– agreement in parent-child relationships between M and M_true

18 Test data likelihood
Each line type represents a different size of training data.

19 Recovery of graph structure

20 Module networks have better performance than a simple Bayesian network
Gain in test-data likelihood over the Bayesian network.

21 Application of Module networks to yeast expression data Segal, Regev, Pe’er, Gasch, Nature Genetics 2005

22 The Respiration and Carbon Module: regulation tree

23 Global view of modules
Modules for common processes often share common
– regulators
– binding-site motifs

24 Goals for today
Per-module methods
– Module networks
Incorporating priors in graph structure learning
– Combining per-gene and per-module methods
Assessing confidence in networks

25 Per-gene vs. per-module
Per-gene methods:
– precise regulatory programs per gene
– no modular organization revealed/captured
Per-module methods:
– modular organization → simpler representation
– gene-specific regulatory information is lost

26 Can we combine the strengths of both approaches?
[Figure: example networks over regulators X_1 to X_4 and targets Y_1, Y_2: per gene, per module, and MERLIN's module-constrained per-gene approach.]

27 Bayesian formulation of network inference
The graph G is an unknown random variable.
Optimize the posterior distribution of the graph given the data, P(G | D) ∝ P(D | G) P(G), which combines a data likelihood with a graph prior.

28 A prior to combine per-gene and per-module methods
Let the prior P(G) distribute independently over edges, and define a prior probability of each edge's presence.
The prior is a product over present and absent edges; it involves a prior-strength parameter, a penalty on graph structure complexity, and the module support for each edge.
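One way to realize such an edge-wise prior is sketched below. The logistic form, the parameter name beta, and the toy support values are illustrative assumptions, not the exact MERLIN formula: each edge's prior probability grows with its module support, scaled by a prior-strength parameter, and the graph prior multiplies (here, sums in log space) over present and absent edges:

```python
import math

def edge_prior(module_support, beta=1.0):
    """Hypothetical per-edge prior probability: logistic in the module support,
    scaled by a prior-strength parameter beta (an illustrative choice)."""
    return 1.0 / (1.0 + math.exp(-beta * module_support))

def log_graph_prior(present_edges, support, beta=1.0):
    """Independent-edge prior: log p for each present edge plus log(1 - p)
    for each absent edge, over all candidate edges in `support`."""
    total = 0.0
    for edge, s in support.items():
        p = edge_prior(s, beta)
        total += math.log(p) if edge in present_edges else math.log(1.0 - p)
    return total

# Hypothetical module support: X1 -> Y1 is supported by the modules, X2 -> Y1 is not.
support = {("X1", "Y1"): 2.0, ("X2", "Y1"): -2.0}
```

With zero support an edge is equally likely present or absent, so a graph that includes the module-supported edge scores a higher prior than one that includes the unsupported edge.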

29 Behavior of the graph structure prior
[Figure: probability of an edge under the prior]

30 Quantifying module support
For each candidate X_j for X_i's regulator set, quantify the module support for the edge.

31 MERLIN: Learning upstream regulators of regulatory modules
[Figure: starting from measurements from multiple conditions, expression clustering produces initial modules over the targets; candidate regulators (transcription factors such as ATF1 and RAP1, and signaling proteins such as MCK1 and HOG1) are assigned; modules are revisited using expression and regulatory programs, and regulators are updated using the new modules, yielding the final reconstructed network.]
Roy et al., PLoS Comp Bio, 2013

32 MERLIN correctly infers edges on simulated data
[Figure: precision-recall comparison of GENIE3, MERLIN, MODNET, and linear regression, matching each inferred network against the true network.]
Precision = (# of correct edges) / (# of predicted edges)
Recall = (# of correct edges) / (# of true edges)
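The two metrics on the slide are straightforward to compute over edge sets; the toy true and predicted networks below are hypothetical:

```python
def precision_recall(true_edges, predicted_edges):
    """Edge-level precision and recall between a true and an inferred network."""
    true_edges, predicted_edges = set(true_edges), set(predicted_edges)
    correct = true_edges & predicted_edges
    precision = len(correct) / len(predicted_edges) if predicted_edges else 0.0
    recall = len(correct) / len(true_edges) if true_edges else 0.0
    return precision, recall

# Hypothetical (regulator, target) edge sets.
true_net = {("X1", "Y1"), ("X2", "Y1"), ("X3", "Y2")}
pred_net = {("X1", "Y1"), ("X3", "Y2"), ("X4", "Y2"), ("X2", "Y3")}
p, r = precision_recall(true_net, pred_net)
```

Sweeping a confidence threshold on the predicted edges and recomputing these two numbers at each threshold traces out the precision-recall curve used in the comparison.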

33 Goals for today
Per-module methods
– Module networks
Incorporating priors in graph structure learning
– Combining per-gene and per-module methods
Assessing confidence in networks

34 Assessing confidence in the learned network
Typically the number of training samples is not sufficient to reliably determine the "right" network.
One can, however, estimate the confidence of specific features of the network:
– graph features f(G)
Examples of f(G):
– an edge between two random variables
– order relations: is X an ancestor of Y?

35 How to assess confidence in graph features?
What we want is P(f(G) | D), which is a sum over all possible graphs: P(f(G) | D) = Σ_G f(G) P(G | D).
But it is not feasible to compute this sum; instead we will use a "bootstrap" procedure.

36 Bootstrap to assess graph feature confidence
For i = 1 to m:
– construct dataset D_i by sampling N samples with replacement from dataset D, where N is the size of the original D
– learn a network B_i
For each feature of interest f, calculate the confidence as the fraction of the m networks B_i in which f holds.
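The bootstrap loop above can be sketched directly. The "learner" below is a deliberately trivial stand-in (a correlation-sign rule on hypothetical two-variable data), since any real structure-learning routine can be plugged into the same slot:

```python
import random

def bootstrap_confidence(data, learn_network, feature, m=50, seed=0):
    """Estimate feature confidence: resample the data with replacement m times,
    relearn a network each time, and return the fraction of networks with the feature."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(m):
        resample = [rng.choice(data) for _ in data]  # N draws with replacement
        if feature(learn_network(resample)):
            hits += 1
    return hits / m

# Toy "learner" (illustrative only): declare an X -> Y edge when the
# resampled cross-product sum is positive.
def toy_learner(samples):
    return {("X", "Y")} if sum(x * y for x, y in samples) > 0 else set()

data = [(1.0, 1.0), (2.0, 2.1), (-1.0, -0.9), (0.5, 0.6)]
conf = bootstrap_confidence(data, toy_learner, lambda net: ("X", "Y") in net)
```

Because every sample here supports the edge, the bootstrap confidence comes out at 1.0; on noisier data the same procedure yields fractional confidences that can be thresholded.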

37 Does the bootstrap confidence represent real relationships?
Compare the confidence distribution to that obtained from randomized data:
– shuffle the columns of each row (gene) of the genes-by-conditions matrix independently
– repeat the bootstrap procedure on the randomized data
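The randomization step can be sketched directly: permuting each gene's row independently preserves every gene's marginal distribution of values while destroying any cross-gene structure, which is exactly what makes it a null for the bootstrap comparison. The matrix below is hypothetical:

```python
import random

def shuffle_rows(matrix, seed=0):
    """Permute the columns of each row (gene) independently, keeping each
    gene's value distribution but destroying gene-gene relationships."""
    rng = random.Random(seed)
    shuffled = []
    for row in matrix:
        row = list(row)      # copy so the original matrix is untouched
        rng.shuffle(row)
        shuffled.append(row)
    return shuffled

# Toy genes-by-conditions expression matrix.
expr = [[1, 2, 3, 4], [4, 3, 2, 1]]
rand_expr = shuffle_rows(expr)
```

Running the full bootstrap on `rand_expr` gives the null confidence distribution against which confidences from the real data are compared.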

38 Bootstrap-based confidence differs between real and randomized data
[Figure: distributions of feature confidence f for random vs. real data]

39 Example of a high-confidence sub-network
[Figure: one learned Bayesian network vs. the bootstrapped-confidence Bayesian network]
Highlights a subnetwork associated with yeast mating.

40 Summary
Biological systems are complex, with many components.
Learning networks from global expression data is challenging.
We have seen three strategies relevant to learning these networks:
– sparse candidates
– module networks
– strategies to assess confidence in the network structure

41 Other problems in regulatory network inference
Combining different types of datasets to improve network structure
– e.g., motif and ChIP binding data
Modeling dynamics in networks
Incorporating perturbations on regulatory nodes
Integrating upstream signaling networks with transcriptional networks
Learning context-specific networks
– differential wiring

