BAYESIAN INFERENCE OF SIGNALING NETWORK TOPOLOGY IN A CANCER CELL LINE Steven M. Hill, Yiling Lu, Jennifer Molina, Laura M. Heiser, Paul T. Spellman, Terence.


1 BAYESIAN INFERENCE OF SIGNALING NETWORK TOPOLOGY IN A CANCER CELL LINE
Steven M. Hill, Yiling Lu, Jennifer Molina, Laura M. Heiser, Paul T. Spellman, Terence P. Speed, Joe W. Gray, Gordon B. Mills and Sach Mukherjee
Discussion Leader: Ashwin
Scribe: Matthew
Computational Network Biology, BMI 826/Computer Sciences 838
https://compnetbiocourse.discovery.wisc.edu

2 Problem Overview
Data-driven characterization of signaling networks specific to a context of interest (such as a cell line or tissue type).
This requires the ability to probe post-translational modification states in multiple proteins through time and across samples.
Large-scale proteomic analyses are challenging.
The data contain both intrinsic and experimental noise.
Inference must find a trade-off between “fit-to-data” and “model parsimony”.

3 Approach and its Motivation
Learn network structure through a Dynamic Bayesian Network (DBN).
What?
Nodes represent proteins at different time points.
Edges represent probabilistic relationships between the nodes.
DBNs can model feedback loops, which are not allowed in static BNs.
Why?
Due to computational constraints, statistical inference approaches are limited to exploring only a few hypothesized networks, whereas this approach can explore a large number of candidate networks.

4 Fig. 1: Data-driven characterization of signaling networks. Reverse-phase protein arrays interrogate signaling dynamics in samples of interest. Network structure is inferred using DBNs, with primary phospho-proteomic data integrated with existing biology, using informative priors objectively weighted by an empirical Bayes approach. Edge probabilities then allow the generation and prioritization of hypotheses for experimental validation.

5 Method: Preview
Biological signaling information is incorporated by assigning prior distributions to links.
Priors are assigned based on the frequency of links appearing in high-scoring topologies.
Bayesian model averaging is carried out.
Maximal marginal likelihood analysis.
A closed-form score for network topologies is obtained using the ‘g-prior’.

6 Assumptions
Specific context: breast cancer cell line MDA-MB-468.
First-order Markov and stationarity assumptions.
Dependence may be sparse, with each node at time t depending only on a subset of nodes at time t-1.
Edges are directed forward in time, hence no cycle-checking is required.

7 Joint Probability Distribution in a Bayesian Network
The joint probability distribution can be factorized into a product of conditional distributions (the equation itself is not preserved in this transcript), where:
$X = (X^1, \dots, X^T)$ is all data
$\pi_G(i) \subseteq \{1, \dots, p\}$ is an index set for the parents of protein $i$ according to graph $G$
$X^t_{\pi_G(i)} = \{ X^t_j \mid j \in \pi_G(i) \}$ is data for the parents of protein $i$ at time $t$
$\theta_i$ are parameters for the conditional distribution of $X^t_i$
$\psi_i$ are parameters for $X^1_i$
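Using only the symbols defined on this slide, together with the first-order Markov and stationarity assumptions from the previous slide, the factorization would plausibly take the form below. This is a reconstruction rather than the slide's own equation; conditioning on the parents at time $t-1$ is taken from the later slides' use of $X^{t-1}_j$.

```latex
p\left(X \mid G, \{\theta_i\}, \{\psi_i\}\right)
  = \prod_{i=1}^{p} p\left(X^{1}_{i} \mid \psi_i\right)
    \prod_{t=2}^{T} p\left(X^{t}_{i} \mid X^{t-1}_{\pi_G(i)}, \theta_i\right)
```

The first product covers the initial time point, which has no parents in the network; the second covers every later time point, where each protein depends only on its parents one step earlier.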

8 Regression model
The conditionals are taken to be Gaussian and describe the dependence of child nodes on their parents; they can be thought of as regression models.
For ease of notation, $X^{+}_i = (X^2_i, X^3_i, \dots, X^T_i)$ and $X^{-}_i = (X^1_i, X^2_i, \dots, X^{T-1}_i)$.
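A Gaussian linear regression of this kind is usually written as follows. This is the standard textbook form, offered as a sketch since the slide's own equation is missing; $B_i$ (a design matrix built from the parents' values in $X^{-}$), $\beta_i$ and $\sigma_i^2$ are the quantities introduced on the Parameters slide, and $n$ denotes the number of rows of $B_i$.

```latex
X^{+}_i = B_i \beta_i + \varepsilon_i,
\qquad
\varepsilon_i \sim \mathcal{N}\left(0, \sigma_i^2 I_n\right)
```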

9 Parameters
The mean of a node depends on products of its parents as well as on the parents themselves. For example, if $\pi_G(i) = \{1, 2, 3\}$ then the mean for variable $X^t_i$ is a linear combination of the three parents $X^{t-1}_j$, the three possible pairwise products of parents $X^{t-1}_j X^{t-1}_k$ and the product of all parents $X^{t-1}_1 X^{t-1}_2 X^{t-1}_3$.
For each protein $i$, let $B_i$ denote an $n \times (2^{|\pi_G(i)|} - 1)$ design matrix, with columns corresponding to all parents, all pairwise products of parents, …, and the product of all parents.
The $2^{|\pi_G(i)|} - 1$ regression coefficients, forming a vector $\beta_i$, together with the variance $\sigma_i^2$, constitute the parameters $\theta_i$.
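The construction of the design matrix can be sketched in a few lines of Python. The function name `interaction_design_matrix` and the toy array are illustrative assumptions, not part of the original slides; the code simply emits one column per non-empty subset of the parent set, which gives the $2^{|\pi_G(i)|} - 1$ columns described above.

```python
from itertools import combinations

import numpy as np


def interaction_design_matrix(X_prev, parents):
    """Build a design matrix with one column per non-empty subset of parents.

    X_prev  : (n, p) array of protein levels at the previous time point.
    parents : iterable of column indices forming the parent set.
    Returns an (n, 2**len(parents) - 1) array whose columns are the parents,
    then all pairwise products, and so on up to the product of all parents.
    """
    parents = list(parents)
    columns = []
    for size in range(1, len(parents) + 1):
        for subset in combinations(parents, size):
            # Element-wise product of the selected parent columns.
            columns.append(np.prod(X_prev[:, list(subset)], axis=1))
    if not columns:
        # Empty parent set: no predictors at all.
        return np.empty((X_prev.shape[0], 0))
    return np.column_stack(columns)


# Toy data (hypothetical): 2 observations of 4 proteins.
X_prev = np.array([[1.0, 2.0, 3.0, 4.0],
                   [0.5, 1.0, 2.0, 8.0]])
B = interaction_design_matrix(X_prev, [0, 1, 2])
print(B.shape)  # (2, 7)
```

With three parents the matrix has 3 + 3 + 1 = 7 columns, which shows how quickly the column count outgrows the number of observations as the parent set grows.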

10 Marginal Likelihood
Integrating out the parameters under their prior gives a closed-form expression for the marginal likelihood (the expression itself is not preserved in this transcript).
It requires the inverse of $B_i^T B_i$, which may be ill-conditioned or singular, especially when $n \ll 2^{|\pi_G(i)|} - 1$.
Thus the in-degree is assumed to satisfy $|\pi_G(i)| < d_{max}$.
Ridge regularization is applied for nodes with large $|\pi_G(i)|$ by adding a ‘$+ \alpha I$’ term to $B_i^T B_i$.
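One common way to arrive at a closed-form score of this kind is Zellner's g-prior with $g = n$ combined with the improper prior $p(\sigma^2) \propto 1/\sigma^2$. Both choices are assumptions made here for illustration, since the slide's own expression is missing. Under them, the log marginal likelihood for one node is, up to an additive constant, $-\tfrac{k}{2}\log(1+n) - \tfrac{n}{2}\log\left(y^T y - \tfrac{n}{n+1}\, y^T B (B^T B + \alpha I)^{-1} B^T y\right)$, where $k$ is the number of columns of $B$. The function name and the `alpha` argument are hypothetical.

```python
import numpy as np


def log_marginal_likelihood(y, B, alpha=0.0):
    """Log marginal likelihood of y given design B, up to an additive constant.

    Assumes a g-prior with g = n on the coefficients and p(sigma^2) ~ 1/sigma^2.
    alpha > 0 adds a ridge term alpha * I to B^T B to guard against
    ill-conditioning (a sketch of the regularization mentioned on the slide).
    """
    n, k = B.shape
    yty = float(y @ y)
    if k == 0:
        # No parents: only the residual term remains.
        return -0.5 * n * np.log(yty)
    gram = B.T @ B + alpha * np.eye(k)
    # Solve the linear system instead of forming an explicit inverse.
    fitted = float(y @ B @ np.linalg.solve(gram, B.T @ y))
    shrink = n / (n + 1.0)
    return -0.5 * k * np.log(1.0 + n) - 0.5 * n * np.log(yty - shrink * fitted)


# Toy check (hypothetical data): y depends on column 0 but not on column 1.
rng = np.random.default_rng(0)
x = rng.normal(size=(30, 2))
y = 2.0 * x[:, 0] + 0.1 * rng.normal(size=30)
score_true = log_marginal_likelihood(y, x[:, [0]])
score_wrong = log_marginal_likelihood(y, x[:, [1]])
print(score_true > score_wrong)

# Duplicated columns make B^T B singular; the ridge term keeps the solve stable.
B_singular = np.column_stack([x[:, 0], x[:, 0]])
score_ridge = log_marginal_likelihood(y, B_singular, alpha=1e-3)
```

The $(1+n)^{-k/2}$ factor penalizes larger parent sets, which is how a score of this form trades fit against parsimony.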

11 Network Priors
Priors are assumed to take the form $P(G) \propto \exp(\lambda f(G))$, where $f(G)$ scores each network according to existing biological knowledge from the literature and $\lambda$ is a strength parameter for $f(G)$.
$\lambda$ is chosen by empirically maximizing the marginal likelihood $P(\text{data} \mid \lambda)$.
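The empirical Bayes choice of $\lambda$ can be illustrated on a toy problem in which the candidate graphs are few enough to enumerate. Everything below is a sketch under that assumption: the function names, the grid of $\lambda$ values, and the per-graph log-likelihoods and prior scores are made up for illustration, and a real analysis would not enumerate whole graphs.

```python
import numpy as np


def log_sum_exp(values):
    """Numerically stable log(sum(exp(values)))."""
    values = np.asarray(values, dtype=float)
    m = values.max()
    return float(m + np.log(np.exp(values - m).sum()))


def log_evidence(lam, log_liks, f_scores):
    """log P(data | lambda) = log sum_G P(data | G) P(G | lambda).

    P(G | lambda) is exp(lambda * f(G)) normalized over the candidate graphs.
    """
    log_liks = np.asarray(log_liks, dtype=float)
    f_scores = np.asarray(f_scores, dtype=float)
    log_prior = lam * f_scores - log_sum_exp(lam * f_scores)
    return log_sum_exp(log_liks + log_prior)


def choose_lambda(log_liks, f_scores, grid):
    """Pick the grid value of lambda that maximizes log P(data | lambda)."""
    return max(grid, key=lambda lam: log_evidence(lam, log_liks, f_scores))


# Hypothetical scores for three candidate graphs.
log_liks = [-10.0, -12.0, -15.0]          # log P(data | G)
grid = [0.0, 1.0, 2.0, 5.0]

# Prior knowledge agrees with the data: a strong prior is selected.
lam_agree = choose_lambda(log_liks, [0.0, -1.0, -2.0], grid)
# Prior knowledge contradicts the data: the prior is switched off.
lam_clash = choose_lambda(log_liks, [-2.0, -1.0, 0.0], grid)
print(lam_agree, lam_clash)  # 5.0 0.0
```

The two calls show the intended behavior: the data themselves decide how much weight the literature-derived prior receives.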

12 Posterior Probabilities of Edges
The posterior probability of an edge $e$ is given by $P(e \mid X) = \sum_{G : e \in G} P(G \mid X)$, where $P(G \mid X)$ is a posterior distribution over graphs.
Instead of averaging over full graphs, score subsets of potential parents for each node and average over them.
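Averaging over parent subsets rather than whole graphs can be sketched as follows. The sketch assumes that the score decomposes over nodes, so each child can be handled independently, and that `log_score(child, parents)` returns the log marginal likelihood plus log prior for that parent set. The names `edge_posteriors`, `toy_log_score` and `truth` are hypothetical, and the toy scorer stands in for a real marginal likelihood.

```python
from itertools import combinations

import numpy as np


def edge_posteriors(p, d_max, log_score):
    """Exact posterior edge probabilities by enumerating parent sets.

    probs[j, i] is the posterior probability of the edge from node j at
    time t-1 to node i at time t, using parent sets of size <= d_max.
    """
    probs = np.zeros((p, p))
    parent_sets = [s for d in range(d_max + 1)
                   for s in combinations(range(p), d)]
    for child in range(p):
        logs = np.array([log_score(child, s) for s in parent_sets])
        # Normalize in a numerically stable way to get posterior weights.
        weights = np.exp(logs - logs.max())
        weights /= weights.sum()
        for subset, w in zip(parent_sets, weights):
            for parent in subset:
                probs[parent, child] += w
    return probs


# Toy scorer (hypothetical): the true parent set scores 5 log-units higher.
truth = {0: (1,), 1: (), 2: (0, 1)}


def toy_log_score(child, parents):
    return 0.0 if parents == truth[child] else -5.0


probs = edge_posteriors(p=3, d_max=2, log_score=toy_log_score)
print(np.round(probs, 3))
```

Because each child is processed independently, the outer loop is the part that could be distributed across workers.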

13 Results
Simulation study. Average ROC curves. True-positive rate (for network edges) plotted against false-positive rate across a range of edge probability thresholds. Simulated data were generated from known graph structures by ancestral sampling. Graph structures were created to be in only partial agreement with the network prior (Supplementary Fig. S3). Results shown are averages obtained from 25 iterations. See text for full details of simulation and for description of methods shown. (For ‘Lasso’, curve produced by thresholding absolute regression coefficients, while marker ‘X’ is single graph obtained by taking non-zero coefficients to be edges.)

14 Results
Table 1. Synthetic yeast network study. Inference methods assessed on time-series gene expression data generated from a synthetically constructed gene regulatory network in yeast (Cantone et al., 2009). Results shown are area under the ROC curve (AUC). See text for description of methods shown. [The regimes using a network prior are mean AUC ± SD over 25 prior network structures. Prior networks were generated to be in partial agreement with the true, underlying network structure (see text for details).]
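The ROC curves and AUC values reported in these results come from thresholding edge scores against a known graph. A minimal sketch of that evaluation is shown below; the function names and the 2 x 2 toy matrices are illustrative assumptions, and the code expects at least one true edge and one true non-edge.

```python
import numpy as np


def roc_points(edge_probs, true_edges):
    """Return (fpr, tpr) arrays obtained by sweeping a threshold over scores."""
    scores = np.asarray(edge_probs, dtype=float).ravel()
    labels = np.asarray(true_edges, dtype=bool).ravel()
    thresholds = np.sort(np.unique(scores))[::-1]  # high to low
    fpr, tpr = [0.0], [0.0]
    for th in thresholds:
        called = scores >= th
        tpr.append((called & labels).sum() / labels.sum())
        fpr.append((called & ~labels).sum() / (~labels).sum())
    return np.array(fpr), np.array(tpr)


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))


# Toy example (hypothetical): true edges receive the highest scores.
edge_probs = [[0.9, 0.1],
              [0.2, 0.8]]
true_edges = [[1, 0],
              [0, 1]]
fpr, tpr = roc_points(edge_probs, true_edges)
print(auc(fpr, tpr))  # 1.0
```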

15 Results

16 Fig. 4. Validation of predictions by targeted inhibition in breast cancer cell line MDA-MB-468. (a) MAPK-STAT3 crosstalk. Network inference (Fig. 3a) predicted an unexpected link between phospho-MAPK (MAPKp) and STAT3p(S727) in the breast cancer cell line MDA-MB-468. The hypothesis of MAPK-STAT3 crosstalk was tested by MEK inhibition: this successfully reduced MAPK phosphorylation and resulted in a corresponding decrease in STAT3p(S727). (b) AKTp → p70S6Kp, AKT-MAPK crosstalk and AKT-JNK/JUN crosstalk. AKTp is linked to p70S6Kp, MEKp and cJUNp. In line with these model predictions, use of an AKT inhibitor reduced both p70S6K and MEK phosphorylation and increased JNK phosphorylation. (RPPA data; MEK inhibitor GSK1120212 and AKT inhibitor GSK690693B at 0 µM, 0.625 µM, 2.5 µM and 10 µM; measurements taken 0, 5, 15, 30, 60, 90, 120 and 180 min after EGF stimulation; average values over 3 replicates shown, error bars indicate SEM.)

17 Discussion
Advantages:
Efficient and fast for moderate-sized datasets.
Takes advantage of existing information in the form of priors.
Exact calculation of posterior edge probabilities.
Variable selection approach factorizes the problem and enables computations to be parallelized.
Limitations:
The parameter prior used (g-prior) can suffer from matrix ill-conditioning.
Assumes homogeneity of parameters and network structure through time, which can be unrealistic.
Does not take into account ‘latent variables’.

