Computational methods for inferring cellular networks II. Stat 877, Apr 17th, 2014. Sushmita Roy.


1 Computational methods for inferring cellular networks II. Stat 877, Apr 17th, 2014. Sushmita Roy

2 RECAP from last time
A regulatory network has structure and parameters.
Network reconstruction: identify the structure and parameters from data.
Classes of methods for network reconstruction:
– Per-gene vs. per-module
– Sparse candidates is an example of a per-gene method
Key idea: restrict the parent set to a skeleton defined by "good" candidates.
Good candidates have high mutual information with the target OR high predictive power.
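The mutual-information criterion for ranking candidate parents can be sketched in a few lines. This is a minimal illustration over discrete (e.g., UP/DOWN/SAME-style) expression levels; the toy sequences below are hypothetical:

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), with counts turned into frequencies
        mi += (c / n) * math.log((c * n) / (px[x] * py[y]))
    return mi

# Toy check: a variable is maximally informative about an identical copy,
# and carries no information about an independent one.
a = [0, 0, 1, 1] * 25
b = a[:]            # identical to a: MI = log 2
c = [0, 1] * 50     # independent of a by construction: MI = 0
```

A sparse-candidates pass would compute this score for every (candidate regulator, target) pair and keep only the top-scoring candidates as the allowed parent skeleton.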

3 Goals for today
Per-module methods
– Module networks
Incorporating priors in graph structure learning
– Combining per-gene and per-module methods
Assessing confidence in networks

4 Module Networks
Motivation:
– Most complex systems have too many variables
– There is not enough data to robustly learn dependencies among them
– Large networks are hard to interpret
Key idea: group similarly behaving variables into "modules" and learn parameters for each module.
Relevance to gene regulatory networks:
– Genes that are co-expressed are likely regulated in similar ways
Segal et al., 2005

5 An expression module
A set of genes that behave similarly across conditions.
[Figure: genes-by-conditions expression matrix with genes grouped into modules; Gasch & Eisen, 2002]

6 Modeling questions in Module Networks
What is the mathematical definition of a module?
– All variables in a module have the same conditional probability distribution
How do we model the CPD between parents and children?
– With a regression tree
How do we learn module networks?

7 Defining a Module Network
A module network is defined by three components:
– a structure, specifying the parents of each module
– an assignment of each variable X_i to a module k
– parameters for the CPDs P(M_j | Pa_{M_j}), where Pa_{M_j} are the parents of module M_j
Each variable X_i in M_j has the same conditional distribution.

8 Bayesian network vs Module network Each variable takes three values: UP, DOWN, SAME

9 Bayesian network vs. Module network
Bayesian network:
– one CPD per random variable
– learning only requires searching for parents
Module network:
– one CPD per module
– learning requires both a parent search and module membership assignment

10 Learning a Module Network
Given:
– a training dataset D = {x^1, …, x^N}
– the number of modules
Learn:
– the module assignment of each X_i
– the CPD parameters Θ
– the parents of each module

11 Score of a Module network
The score of a module network given the data decomposes into a likelihood term per module, where K is the number of modules, X^j is the set of variables assigned to the j-th module, and Pa_{M_j} are the parents of module M_j.
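The per-module likelihood can be made concrete with a small sketch. For illustration only, this assumes Gaussian CPDs with fixed parameters; the key structural point is that every gene in a module shares one CPD, so per-gene log-likelihoods simply add within a module:

```python
import math

def gaussian_loglik(values, mu, sigma):
    """Log-likelihood of a sequence of observations under Normal(mu, sigma)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (v - mu) ** 2 / (2 * sigma ** 2) for v in values)

def module_loglik(module_rows, mu, sigma):
    """Every gene in a module shares one CPD, so per-gene log-likelihoods add."""
    return sum(gaussian_loglik(row, mu, sigma) for row in module_rows)

# Two hypothetical genes assigned to the same module, observed in three conditions each.
module = [[0.1, -0.2, 0.0], [0.2, 0.1, -0.1]]
score = module_loglik(module, mu=0.0, sigma=1.0)
```

The full network score would sum such terms over all K modules, with each module's CPD conditioned on its parents' values rather than fixed parameters.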

12 Module network learning algorithm

13 Module initialization: clustering of the variables of the module network
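The initialization can be any clustering of the expression profiles. A minimal k-means sketch in plain Python (the profiles below are hypothetical) that seeds module assignments:

```python
import random

def kmeans(profiles, k, iters=20, seed=0):
    """Toy k-means over gene expression profiles to seed module assignments."""
    rng = random.Random(seed)
    centers = rng.sample(profiles, k)
    assign = [0] * len(profiles)
    for _ in range(iters):
        # Assignment step: each gene goes to the nearest center (squared distance).
        for i, p in enumerate(profiles):
            assign[i] = min(range(k),
                            key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
        # Update step: each center becomes the mean of its assigned profiles.
        for j in range(k):
            members = [profiles[i] for i in range(len(profiles)) if assign[i] == j]
            if members:
                centers[j] = [sum(vals) / len(members) for vals in zip(*members)]
    return assign

# Two clearly separated groups of "genes" measured in two conditions.
genes = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
labels = kmeans(genes, 2)
```

The resulting labels serve only as a starting point; module membership is then refined by the score-driven re-assignment step described on the following slides.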

14 Module re-assignment
Two requirements:
– must preserve the acyclic structure
– must improve the score
Perform a sequential update:
– compute the delta score of moving a variable from one module to another while keeping the other variables fixed

15 Module re-assignment via sequential update

16 Regression tree to capture the CPD
Internal nodes test regulator expression against a threshold (first X_1 > e_1, then X_2 > e_2 along the YES branch).
Each path captures a mode of regulation of X_3 by X_1 and X_2.
The expression of the target is modeled using a Gaussian at each leaf node.
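A minimal sketch of the slide's two-split tree (the thresholds e1, e2, the leaf names, and the sample data are all hypothetical): each sample is routed by its regulator values, and a Gaussian is fit to the pooled target values X3 at each leaf:

```python
def leaf_context(x1, x2, e1=0.0, e2=0.0):
    """Route a sample down the tree: test X1 > e1, then X2 > e2 on the YES branch."""
    if x1 > e1:
        return "x1_high_x2_high" if x2 > e2 else "x1_high_x2_low"
    return "x1_low"

def fit_leaf_gaussians(samples, e1=0.0, e2=0.0):
    """Pool target values X3 per leaf and fit a Gaussian (mean, variance) at each."""
    leaves = {}
    for x1, x2, x3 in samples:
        leaves.setdefault(leaf_context(x1, x2, e1, e2), []).append(x3)
    params = {}
    for name, vals in leaves.items():
        mu = sum(vals) / len(vals)
        var = sum((v - mu) ** 2 for v in vals) / len(vals)
        params[name] = (mu, var)
    return params

# Hypothetical (X1, X2, X3) samples: X3 is high only when both regulators are high.
data = [(1.0, 1.0, 5.0), (1.0, 1.0, 5.2), (1.0, -1.0, 0.1), (-1.0, 0.5, -0.1)]
params = fit_leaf_gaussians(data)
```

Each leaf's (mean, variance) pair is the Gaussian for one regulatory context, matching the slide's "mode of regulation" per path.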

17 Assessing the value of using Module Networks
Generate data D from a known module network M_true:
– M_true was in turn learned from real data
– 10 modules, 500 variables
Learn a module network M from D.
Assess M's quality using:
– test-data likelihood (higher is better)
– agreement in parent-child relationships between M and M_true

18 Test data likelihood
Each line type represents a different size of training data.

19 Recovery of graph structure

20 Module networks have better performance than a simple Bayesian network
Gain in test-data likelihood over the Bayesian network.

21 Application of Module networks to yeast expression data Segal, Regev, Pe’er, Gasch, Nature Genetics 2005

22 The Respiration and Carbon Module: regulation tree

23 Global view of modules
Modules for common processes often share common
– regulators
– binding-site motifs

24 Goals for today
Per-module methods
– Module networks
Incorporating priors in graph structure learning
– Combining per-gene and per-module methods
Assessing confidence in networks

25 Per-gene vs. per-module
Per-gene methods:
– precise regulatory programs per gene
– no modular organization revealed/captured
Per-module methods:
– modular organization → simpler representation
– gene-specific regulatory information is lost

26 Can we combine the strengths of both approaches?
[Figure: example networks over regulators X_1 to X_4 and targets Y_1, Y_2: per gene, per module, and MERLIN's module-constrained per-gene approach.]

27 Bayesian formulation of network inference
The graph G is an unknown random variable.
Optimize the posterior distribution of the graph given the data, P(G | D) ∝ P(D | G) P(G), which combines a data likelihood with a graph prior.

28 A prior to combine per-gene and per-module methods
Let the prior P(G) distribute independently over edges, and define a prior probability of each edge's presence.
The prior is a product over present and absent edges; it involves a prior-strength parameter, a penalty on graph structure complexity, and the module support for each edge.
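One way to realize such an edge-wise prior is sketched below. The logistic form, the parameter name beta, and the toy support values are illustrative assumptions, not the exact MERLIN formula: each edge's prior probability grows with its module support, scaled by a prior-strength parameter, and the graph prior multiplies (here, sums in log space) over present and absent edges:

```python
import math

def edge_prior(module_support, beta=1.0):
    """Hypothetical per-edge prior probability: logistic in the module support,
    scaled by a prior-strength parameter beta (an illustrative choice)."""
    return 1.0 / (1.0 + math.exp(-beta * module_support))

def log_graph_prior(present_edges, support, beta=1.0):
    """Independent-edge prior: log p for each present edge plus log(1 - p)
    for each absent edge, over all candidate edges in `support`."""
    total = 0.0
    for edge, s in support.items():
        p = edge_prior(s, beta)
        total += math.log(p) if edge in present_edges else math.log(1.0 - p)
    return total

# Hypothetical module support: X1 -> Y1 is supported by the modules, X2 -> Y1 is not.
support = {("X1", "Y1"): 2.0, ("X2", "Y1"): -2.0}
```

With zero support an edge is equally likely present or absent, so a graph that includes the module-supported edge scores a higher prior than one that includes the unsupported edge.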

29 Behavior of the graph structure prior
[Figure: probability of an edge under the prior]

30 Quantifying module support
For each candidate X_j for X_i's regulator set, quantify the module support for the edge.

31 MERLIN: Learning upstream regulators of regulatory modules
[Figure: starting from measurements from multiple conditions, expression clustering produces initial modules over the targets; candidate regulators (transcription factors such as ATF1 and RAP1, and signaling proteins such as MCK1 and HOG1) are assigned; modules are revisited using expression and regulatory programs, and regulators are updated using the new modules, yielding the final reconstructed network.]
Roy et al., PLoS Comp Bio, 2013

32 MERLIN correctly infers edges on simulated data
[Figure: precision-recall comparison of GENIE3, MERLIN, MODNET, and linear regression, matching each inferred network against the true network.]
Precision = (# of correct edges) / (# of predicted edges)
Recall = (# of correct edges) / (# of true edges)
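The two metrics on the slide are straightforward to compute over edge sets; the toy true and predicted networks below are hypothetical:

```python
def precision_recall(true_edges, predicted_edges):
    """Edge-level precision and recall between a true and an inferred network."""
    true_edges, predicted_edges = set(true_edges), set(predicted_edges)
    correct = true_edges & predicted_edges
    precision = len(correct) / len(predicted_edges) if predicted_edges else 0.0
    recall = len(correct) / len(true_edges) if true_edges else 0.0
    return precision, recall

# Hypothetical (regulator, target) edge sets.
true_net = {("X1", "Y1"), ("X2", "Y1"), ("X3", "Y2")}
pred_net = {("X1", "Y1"), ("X3", "Y2"), ("X4", "Y2"), ("X2", "Y3")}
p, r = precision_recall(true_net, pred_net)
```

Sweeping a confidence threshold on the predicted edges and recomputing these two numbers at each threshold traces out the precision-recall curve used in the comparison.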

33 Goals for today
Per-module methods
– Module networks
Incorporating priors in graph structure learning
– Combining per-gene and per-module methods
Assessing confidence in networks

34 Assessing confidence in the learned network
Typically the number of training samples is not sufficient to reliably determine the "right" network.
One can, however, estimate the confidence of specific features of the network:
– graph features f(G)
Examples of f(G):
– an edge between two random variables
– order relations: is X an ancestor of Y?

35 How to assess confidence in graph features?
What we want is P(f(G) | D), which is a sum over all possible graphs: P(f(G) | D) = Σ_G f(G) P(G | D).
But it is not feasible to compute this sum; instead we will use a "bootstrap" procedure.

36 Bootstrap to assess graph feature confidence
For i = 1 to m:
– construct dataset D_i by sampling N samples with replacement from dataset D, where N is the size of the original D
– learn a network B_i
For each feature of interest f, calculate the confidence as the fraction of the m networks B_i in which f holds.
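The bootstrap loop above can be sketched directly. The "learner" below is a deliberately trivial stand-in (a correlation-sign rule on hypothetical two-variable data), since any real structure-learning routine can be plugged into the same slot:

```python
import random

def bootstrap_confidence(data, learn_network, feature, m=50, seed=0):
    """Estimate feature confidence: resample the data with replacement m times,
    relearn a network each time, and return the fraction of networks with the feature."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(m):
        resample = [rng.choice(data) for _ in data]  # N draws with replacement
        if feature(learn_network(resample)):
            hits += 1
    return hits / m

# Toy "learner" (illustrative only): declare an X -> Y edge when the
# resampled cross-product sum is positive.
def toy_learner(samples):
    return {("X", "Y")} if sum(x * y for x, y in samples) > 0 else set()

data = [(1.0, 1.0), (2.0, 2.1), (-1.0, -0.9), (0.5, 0.6)]
conf = bootstrap_confidence(data, toy_learner, lambda net: ("X", "Y") in net)
```

Because every sample here supports the edge, the bootstrap confidence comes out at 1.0; on noisier data the same procedure yields fractional confidences that can be thresholded.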

37 Does the bootstrap confidence represent real relationships?
Compare the confidence distribution to that obtained from randomized data:
– shuffle the columns of each row (gene) of the genes-by-conditions matrix independently
– repeat the bootstrap procedure on the randomized data
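The randomization step can be sketched directly: permuting each gene's row independently preserves every gene's marginal distribution of values while destroying any cross-gene structure, which is exactly what makes it a null for the bootstrap comparison. The matrix below is hypothetical:

```python
import random

def shuffle_rows(matrix, seed=0):
    """Permute the columns of each row (gene) independently, keeping each
    gene's value distribution but destroying gene-gene relationships."""
    rng = random.Random(seed)
    shuffled = []
    for row in matrix:
        row = list(row)      # copy so the original matrix is untouched
        rng.shuffle(row)
        shuffled.append(row)
    return shuffled

# Toy genes-by-conditions expression matrix.
expr = [[1, 2, 3, 4], [4, 3, 2, 1]]
rand_expr = shuffle_rows(expr)
```

Running the full bootstrap on `rand_expr` gives the null confidence distribution against which confidences from the real data are compared.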

38 Bootstrap-based confidence differs between real and randomized data
[Figure: distributions of feature confidence f for random vs. real data]

39 Example of a high-confidence sub-network
[Figure: one learned Bayesian network vs. the bootstrapped-confidence Bayesian network]
Highlights a subnetwork associated with yeast mating.

40 Summary
Biological systems are complex, with many components.
Learning networks from global expression data is challenging.
We have seen three strategies relevant to learning these networks:
– sparse candidates
– module networks
– strategies to assess confidence in the network structure

41 Other problems in regulatory network inference
Combining different types of datasets to improve network structure
– e.g., motif and ChIP binding data
Modeling dynamics in networks
Incorporating perturbations on regulatory nodes
Integrating upstream signaling networks with transcriptional networks
Learning context-specific networks
– differential wiring

