Probabilistic Graphical Models for Molecular Networks
Sushmita Roy
BMI/CS 576
www.biostat.wisc.edu/bmi576
sroy@biostat.wisc.edu
Nov 11th, 2014

RECAP
– Many different types of molecular networks: networks are defined by the semantics of their vertices and edges
– Computational problems in networks:
  – Network reconstruction: infer the structure and parameters of networks. We will examine this problem in the context of "expression-based network inference".
  – Network applications: properties of networks, interpretation of gene sets, using networks to infer the function of a gene

Plan for next lectures
– Representing networks as probabilistic graphical models:
  – Bayesian networks (today)
  – Module networks
  – Dependency networks
– Other methods for expression-based network inference:
  – Classes of methods
  – Strengths and weaknesses of different methods

Readings
– Learning Bayesian Network Structure from Massive Datasets: The "Sparse Candidate" Algorithm. Proceedings of the 15th Annual Conference on Uncertainty in Artificial Intelligence, 1999.
– Using Bayesian networks to analyze expression data. Journal of Computational Biology 7(3-4):601-620, 2000.
– Inferring cellular networks: a review. BMC Bioinformatics, 2007.

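Modeling a regulatory network
– Structure: who are the regulators? Example: Hot1 and Sko1 regulate HSP12 (Hot1 regulates HSP12; HSP12 is a target of Hot1).
– Function: how do the regulators determine expression levels? The target's expression $X_3$ is a function $\psi(X_1, X_2)$ of its regulators' expression, which can be modeled as Boolean, linear, differential equations, probabilistic, ….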

Mathematical representations of networks
A function $f$ maps the input expression of a node's neighbors ($X_1$, $X_2$) to the output expression of the node ($X_3$). Models differ in the function that maps the input system state to the output state:
– Boolean networks: truth tables
– Differential equations: rate equations
– Probabilistic graphical models: probability distributions
Example Boolean function (input X1, X2; output X3):
X1 X2 | X3
 0  0 |  0
 0  1 |  1
 1  0 |  1
 1  1 |  1
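As a small illustration of the Boolean case, the truth table above can be read as the update rule X3 = X1 OR X2. The function name `x3_update` is hypothetical and only encodes this one table.

```python
from itertools import product

def x3_update(x1: int, x2: int) -> int:
    """Boolean update rule for X3, read off the truth table as X1 OR X2."""
    return int(x1 or x2)

# Enumerate every input state (X1, X2) to reproduce the truth table.
truth_table = {(x1, x2): x3_update(x1, x2) for x1, x2 in product((0, 1), repeat=2)}
print(truth_table)  # {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 1}
```

A probabilistic model would instead replace the 0/1 output with a distribution over X3 for each input state.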

Network reconstruction
Given:
– A set of attributes associated with network nodes (typically mRNA levels)
Do:
– Infer which nodes interact with each other
Algorithms for network reconstruction vary in what they mean by an interaction:
– Statistical dependence (mutual information, correlation)
– Predictive ability

Computational methods to infer networks
We will focus on transcriptional regulatory networks, which are inferred from gene expression data. Among the many methods for network inference, we will focus on probabilistic graphical models.

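Notation
Assume we have N genes.
– $X_i$: random variable encoding the expression level of the $i$th gene
– $\mathbf{X} = \{X_1, \dots, X_N\}$: set of N random variables, one for each gene
– $\mathbf{x}^d$: joint assignment to all N random variables in the $d$th data sample
– $D = \{\mathbf{x}^1, \dots, \mathbf{x}^{|D|}\}$: dataset
– $G$: graph
– $\Theta$: parameters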

Bayesian networks (BN)
A special type of probabilistic graphical model with two parts:
– A directed acyclic graph (DAG)
– A set of conditional probability distributions (CPDs)
In the DAG:
– The nodes denote random variables $X_1, \dots, X_N$
– The edges encode statistical dependencies between the random variables and establish parent-child relationships
Each node $X_i$ has a CPD representing $P(X_i \mid \mathrm{Pa}(X_i))$. Together, the graph and the CPDs provide a tractable way to represent large joint distributions.

An example Bayesian network
Adapted from Kevin Murphy: Intro to Graphical models and Bayes networks: http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
Cloudy (C) is the parent of Sprinkler (S) and Rain (R); S and R are the parents of WetGrass (W).
P(C):        P(C=F) = 0.5, P(C=T) = 0.5
P(S | C):    C | P(S=F) P(S=T)
             F |  0.5    0.5
             T |  0.9    0.1
P(R | C):    C | P(R=F) P(R=T)
             F |  0.8    0.2
             T |  0.2    0.8
P(W | S, R): S R | P(W=F) P(W=T)
             F F |  1.0    0.0
             T F |  0.1    0.9
             F T |  0.1    0.9
             T T |  0.01   0.99
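To make the factorization concrete, here is a minimal sketch that multiplies the four CPDs above to get the joint and then sums out variables by brute-force enumeration. Enumeration is only practical for tiny networks like this one; the helper names are illustrative.

```python
from itertools import product

# CPDs of the sprinkler network, stored as P(child = True | parents).
P_C = 0.5
P_S_given_C = {False: 0.5, True: 0.1}
P_R_given_C = {False: 0.2, True: 0.8}
P_W_given_SR = {(False, False): 0.0, (True, False): 0.9,
                (False, True): 0.9, (True, True): 0.99}

def bern(p_true: float, value: bool) -> float:
    """Probability of a Boolean value given P(value = True)."""
    return p_true if value else 1.0 - p_true

def joint(c: bool, s: bool, r: bool, w: bool) -> float:
    """P(C, S, R, W) = P(C) P(S | C) P(R | C) P(W | S, R)."""
    return (bern(P_C, c) * bern(P_S_given_C[c], s)
            * bern(P_R_given_C[c], r) * bern(P_W_given_SR[(s, r)], w))

vals = (False, True)
# Marginal P(W = T): sum the joint over C, S and R.
p_w = sum(joint(c, s, r, True) for c, s, r in product(vals, repeat=3))
# Posterior P(R = T | W = T) = P(R = T, W = T) / P(W = T).
p_r_and_w = sum(joint(c, s, True, True) for c, s in product(vals, repeat=2))
print(round(p_w, 4), round(p_r_and_w / p_w, 3))  # 0.6471 0.708
```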

Bayesian network representation of a transcriptional network
Random variables encode expression levels: the genes Hot1, Sko1 and HSP12 map to the random variables $X_1$, $X_2$ and $X_3$.
– Regulators (parents): $X_1$ (Hot1) and $X_2$ (Sko1), with CPDs $P(X_1)$ and $P(X_2)$
– Target (child): $X_3$ (HSP12), with CPD $P(X_3 \mid X_1, X_2)$
Assume HSP12's expression depends on Hot1 and Sko1 binding to HSP12's promoter.
[Figure: HSP12 ON or HSP12 OFF depending on which regulators are bound to its promoter]

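Bayesian networks compactly represent joint distributions
The joint distribution factorizes into a product of CPDs, one per variable:
$P(X_1, \dots, X_N) = \prod_{i=1}^{N} P(X_i \mid \mathrm{Pa}(X_i))$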

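Example Bayesian network of 5 variables
Edges: $X_1 \to X_3$, $X_2 \to X_3$, $X_3 \to X_5$, $X_4 \to X_5$, with CPDs $P(X_1)$, $P(X_2)$, $P(X_4)$, $P(X_3 \mid X_1, X_2)$ and $P(X_5 \mid X_3, X_4)$:
$P(X_1, X_2, X_3, X_4, X_5) = P(X_1)\,P(X_2)\,P(X_4)\,P(X_3 \mid X_1, X_2)\,P(X_5 \mid X_3, X_4)$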
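One way to see why the factorization is compact is to count free parameters. This sketch assumes all five variables are binary (the slide does not fix their domains); each CPD then needs one number per parent configuration.

```python
def n_params_full_joint(n_vars: int) -> int:
    """Free parameters of an unrestricted joint over n binary variables."""
    return 2 ** n_vars - 1

def n_params_bn(parents: dict) -> int:
    """Free parameters of a BN over binary variables: one per parent configuration."""
    return sum(2 ** len(pa) for pa in parents.values())

# The 5-variable example network: child -> list of parents.
parents = {"X1": [], "X2": [], "X3": ["X1", "X2"], "X4": [], "X5": ["X3", "X4"]}
print(n_params_full_joint(5), n_params_bn(parents))  # 31 11
```

The gap grows quickly: the full joint is exponential in N, while the BN is exponential only in the largest parent set.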

CPDs in Bayesian networks
Conditional probability distributions (CPDs) are central to Bayesian networks. There is one CPD for each random variable in the graph, describing the distribution of that child variable given the state of its parents. The same structure can be parameterized in different ways; for example, for discrete variables a CPD can be represented as a table or as a tree.

Representing CPDs as tables
Consider Boolean variables $X_1, X_2, X_3, X_4$, where $X_1$, $X_2$ and $X_3$ are the parents of $X_4$. $P(X_4 \mid X_1, X_2, X_3)$ as a table:
X1 X2 X3 | P(X4=t) P(X4=f)
 t  t  t |   0.9     0.1
 t  t  f |   0.9     0.1
 t  f  t |   0.9     0.1
 t  f  f |   0.9     0.1
 f  t  t |   0.8     0.2
 f  t  f |   0.5     0.5
 f  f  t |   0.5     0.5
 f  f  f |   0.5     0.5

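A tree representation of a CPD
$P(X_4 \mid X_1, X_2, X_3)$ as a tree:
– $X_1 = t$: $P(X_4 = t) = 0.9$
– $X_1 = f$, $X_2 = f$: $P(X_4 = t) = 0.5$
– $X_1 = f$, $X_2 = t$, $X_3 = f$: $P(X_4 = t) = 0.5$
– $X_1 = f$, $X_2 = t$, $X_3 = t$: $P(X_4 = t) = 0.8$
Trees allow a more compact representation of CPDs: parent configurations that share a distribution collapse into a single leaf, so here 4 leaves replace the 8 table rows.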
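A quick way to check that the two representations encode the same CPD is to evaluate the tree on every parent configuration and compare it with the table. This is an illustrative sketch with hypothetical names `table` and `tree`.

```python
from itertools import product

# Table CPD: P(X4 = t | X1, X2, X3), one entry per parent configuration (8 rows).
table = {
    (True, True, True): 0.9, (True, True, False): 0.9,
    (True, False, True): 0.9, (True, False, False): 0.9,
    (False, True, True): 0.8, (False, True, False): 0.5,
    (False, False, True): 0.5, (False, False, False): 0.5,
}

def tree(x1: bool, x2: bool, x3: bool) -> float:
    """Tree CPD: the same distribution with only 4 leaves."""
    if x1:
        return 0.9              # X2 and X3 are irrelevant once X1 = t
    if not x2:
        return 0.5              # X3 is irrelevant when X1 = f and X2 = f
    return 0.8 if x3 else 0.5

# Both representations give the same probability for all 8 configurations.
assert all(tree(*cfg) == table[cfg] for cfg in product((True, False), repeat=3))
print(len(table), "table rows vs 4 tree leaves")
```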

The learning problems
– Parameter learning on a known structure: given training data, estimate the parameters of the CPDs.
– Structure learning: given training data, find the statistical dependency structure and the parameters that best describe the data.
Structure learning subsumes parameter learning: for every candidate graph, we need to estimate the parameters.

Example of estimating a CPD table from data
Consider the four random variables $X_1, X_2, X_3, X_4$, and assume we observe the following samples of assignments to these variables:
X1 X2 X3 X4
T  F  T  T
T  T  F  T
T  T  F  T
T  F  T  T
T  F  T  F
T  F  T  F
F  F  T  F
To estimate $P(X_4 \mid X_1, X_2, X_3)$, we need to consider all configurations of $X_1, X_2, X_3$ and estimate the probability of $X_4$ being T or F. For example, for $X_1 = T, X_2 = F, X_3 = T$ there are 4 matching samples, 2 of which have $X_4 = T$:
$P(X_4 = T \mid X_1 = T, X_2 = F, X_3 = T) = 2/4$
$P(X_4 = F \mid X_1 = T, X_2 = F, X_3 = T) = 2/4$
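The counting above is the maximum-likelihood estimate. A minimal sketch over the seven samples, with each sample written as a string over X1..X4 (a representation chosen here for brevity):

```python
from collections import Counter

# Samples (X1, X2, X3, X4) from the table above, one string per sample.
samples = ["TFTT", "TTFT", "TTFT", "TFTT", "TFTF", "TFTF", "FFTF"]

def mle_cpd(samples, parent_cfg):
    """Maximum-likelihood P(X4 | X1 X2 X3 = parent_cfg) by counting matching samples."""
    counts = Counter(s[3] for s in samples if s[:3] == parent_cfg)
    total = sum(counts.values())
    # Configurations never observed in the data have no estimate at all.
    return {v: counts[v] / total for v in "TF"} if total else None

print(mle_cpd(samples, "TFT"))  # {'T': 0.5, 'F': 0.5}
print(mle_cpd(samples, "FTT"))  # None
```

The `None` case shows the problem with small datasets: many parent configurations are never observed, so their CPD entries cannot be estimated by counting alone.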

Structure learning using score-based search
[Figure: a search over Bayesian networks, where each candidate network is fit with maximum likelihood parameters estimated from the data and then scored]

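Scoring a Bayesian network
The score of a Bayesian network (BN) is determined by how well the BN describes the data, which in turn is a function of the data likelihood. Given data $D = \{\mathbf{x}^1, \dots, \mathbf{x}^{|D|}\}$, the score of a BN with maximum likelihood parameters $\Theta$ is therefore
$\mathrm{Score}(G; D) = \log P(D \mid G, \Theta) = \sum_{d=1}^{|D|} \sum_{i=1}^{N} \log P(x_i^d \mid \mathbf{x}^d_{\mathrm{Pa}(X_i)})$
where $\mathrm{Pa}(X_i)$ denotes the parents of $X_i$ and $\mathbf{x}^d_{\mathrm{Pa}(X_i)}$ is the assignment to the parents of $X_i$ in the $d$th sample.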

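Scoring a Bayesian network (continued)
The score of a graph $G$ decomposes over individual variables. Rearranging the sum so that the outer sum runs over variables:
$\mathrm{Score}(G; D) = \sum_{i=1}^{N} \sum_{d=1}^{|D|} \log P(x_i^d \mid \mathbf{x}^d_{\mathrm{Pa}(X_i)}) = \sum_{i=1}^{N} \mathrm{Score}(X_i ; \mathrm{Pa}(X_i), D)$
This lets us efficiently compute the effect of local changes on the score, that is, changes to the parent set of an individual random variable: only that variable's term needs to be recomputed.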
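A minimal sketch of the decomposable log-likelihood score, reusing the four-variable samples from the CPD-estimation example (columns 0..3 hold X1..X4). It assumes plain maximum-likelihood parameters; note that this score never decreases when parents are added, which is why practical scores such as BIC add a complexity penalty.

```python
import math
from collections import Counter

def family_score(data, child, parents):
    """sum_d log P_ML(x_child^d | x_parents^d) for a single family."""
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    # ML parameters are P(x | u) = N(u, x) / N(u); each (u, x) occurs N(u, x) times.
    return sum(n * math.log(n / parent_counts[u]) for (u, _), n in joint.items())

def score(data, graph):
    """Score(G; D) as an outer sum over variables; graph maps child -> parents."""
    return sum(family_score(data, child, pa) for child, pa in graph.items())

# Samples from the CPD-estimation example; columns 0..3 hold X1..X4.
data = ["TFTT", "TTFT", "TTFT", "TFTT", "TFTF", "TFTF", "FFTF"]
g_old = {0: [], 1: [], 2: [], 3: [0]}         # X1 -> X4
g_new = {0: [], 1: [], 2: [], 3: [0, 2]}      # local change: also add X3 -> X4

# Only X4's family changed, so the score change needs just that one term.
delta = family_score(data, 3, [0, 2]) - family_score(data, 3, [0])
print(round(delta, 3))  # 1.046
assert math.isclose(score(data, g_new) - score(data, g_old), delta)
```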

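Learning network structure is computationally expensive
For N variables there are at least $2^{N(N-1)/2}$ possible networks (this counts only the DAGs consistent with one fixed ordering of the variables), so the set of possible networks grows super-exponentially:
N | Number of networks
3 | 8
4 | 64
5 | 1024
6 | 32768
We need approximate methods to search the space of networks.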
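The table values are the lower bound $2^{N(N-1)/2}$. The exact number of labeled DAGs, given by Robinson's recurrence (a standard combinatorial result, not from the slides), grows even faster. A short sketch comparing the two:

```python
from math import comb

def num_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes, via Robinson's recurrence."""
    a = [1]  # a[0] = 1: the empty graph
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]

for n in range(3, 7):
    print(n, 2 ** (n * (n - 1) // 2), num_dags(n))
# 3 8 25
# 4 64 543
# 5 1024 29281
# 6 32768 3781503
```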

Heuristic search of Bayesian network structures
Make local operations to the graph structure:
– Add an edge
– Delete an edge
– Reverse an edge
Evaluate the score and select the network configuration with the best score. Adding or reversing an edge can create a directed cycle, so we need to check that the resulting graph is still acyclic. Working with gene expression data requires additional considerations.

Structure search operators
[Figure: starting from the current network over A, B, C, D, one operator adds an edge and another deletes an edge; the result is checked for cycles]

Bayesian network search: hill-climbing
given: data set D, initial network B_0
i = 0
B_best ← B_0
while stopping criteria not met {
    for each possible operator application a {
        B_new ← apply(a, B_i)
        if score(B_new) > score(B_best)
            B_best ← B_new
    }
    ++i
    B_i ← B_best
}
return B_best
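A runnable sketch of the pseudocode above, under several assumptions not in the slides: discrete data given as tuples, a BIC score (log-likelihood minus a parameter penalty), the empty graph as B_0, and "no operator improved the score" as the stopping criterion. For clarity it rescores whole graphs; a real implementation would exploit decomposability and rescore only the changed families.

```python
import math
from collections import Counter
from itertools import permutations

def family_bic(data, child, parents):
    """BIC term for one family: ML log-likelihood minus a complexity penalty."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    pa_counts = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(n * math.log(n / pa_counts[u]) for (u, _), n in joint.items())
    arity = {j: len({r[j] for r in data}) for j in (child, *parents)}
    n_params = (arity[child] - 1) * math.prod(arity[p] for p in parents)
    return loglik - 0.5 * math.log(len(data)) * n_params

def score(data, graph):
    """Decomposable score; graph maps each child index to its set of parent indices."""
    return sum(family_bic(data, c, pa) for c, pa in graph.items())

def has_path(graph, src, dst):
    """True if a directed path src -> ... -> dst exists (searching back from dst)."""
    stack, seen = [dst], set()
    while stack:
        node = stack.pop()
        if node == src:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False

def neighbors(graph):
    """Every acyclic graph reachable by one add, delete or reverse operation."""
    for u, v in permutations(graph, 2):
        g = {c: set(pa) for c, pa in graph.items()}
        if u in graph[v]:                  # edge u -> v exists
            g[v].discard(u)
            yield g                        # delete u -> v
            if not has_path(g, u, v):      # reverse: adding v -> u stays acyclic
                rev = {c: set(pa) for c, pa in g.items()}
                rev[u].add(v)
                yield rev
        elif not has_path(graph, v, u):    # add u -> v only if it creates no cycle
            g[v].add(u)
            yield g

def hill_climb(data, n_vars, max_iter=100):
    """Greedy search from the empty graph, mirroring the pseudocode above."""
    best = {i: set() for i in range(n_vars)}
    best_score = score(data, best)
    for _ in range(max_iter):
        current = best                     # B_i
        for g in neighbors(current):
            s = score(data, g)
            if s > best_score:
                best, best_score = g, s
        if best is current:                # stop: no operator improved the score
            break
    return best, best_score

# Toy data: column 1 copies column 0; column 2 is independent of both.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)] * 10
graph, s = hill_climb(rows, 3)
print({c: sorted(pa) for c, pa in graph.items()})  # {0: [], 1: [0], 2: []}
```

Because 0 → 1 and 1 → 0 score identically, the direction found depends on the order in which operators are tried; score-equivalent structures cannot be told apart from data alone.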

Network inference from expression data is difficult
There are many variables but not enough measurements to observe the different variable configurations. Good heuristics to prune the search space are therefore highly desirable.

Extensions to Bayesian networks to handle a large number of random variables
– Sparse candidate algorithm
– Bootstrap-based ideas to score high-confidence networks
– Module networks (subsequent lecture)

The Sparse candidate algorithm for structure learning in Bayesian networks (Friedman, 1999)
Key idea: prune the potential parents of each node.
– Identify k promising "candidate" parents for each node based on measures of statistical dependence, with k ≪ N (N: number of random variables).
– Restrict the search to networks in which each node's parents are a subset of its candidate set.
Possible pitfall: early choices might exclude other good parents. This is resolved with an iterative algorithm.

Sparse candidate algorithm notation
– $B_n$: Bayesian network at iteration $n$
– $C_i^n$: candidate parent set for node $X_i$ at iteration $n$
– $\mathrm{Pa}_n(X_i)$: parents of $X_i$ in $B_n$

Sparse candidate algorithm
Input:
– A data set $D$
– An initial network $B_0$
– A parameter $k$: the maximum number of candidate parents per variable
Output:
– A network $B$
Loop until convergence:
– Restrict: based on $D$ and $B_{n-1}$, select candidate parents $C_i^n$ ($|C_i^n| \le k$) for each variable $X_i$. This defines a possibly cyclic directed network $H_n = (\mathbf{X}, E)$ whose edges are $E = \{X_j \to X_i : X_j \in C_i^n\}$.
– Maximize: find the network $B_n$ that maximizes $\mathrm{Score}(B_n; D)$ among networks satisfying $\mathrm{Pa}_n(X_i) \subseteq C_i^n$ for every $X_i$.
Termination: return $B_n$.

The Restrict step: measures of relevance

Information theoretic concepts
– Kullback-Leibler (KL) divergence: dissimilarity between two distributions
– Mutual information: measures the statistical dependence between two random variables X and Y; it is also the KL divergence between $P(X,Y)$ and $P(X)P(Y)$
– Conditional mutual information: measures the information between two variables given a third

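KL divergence
For two probability distributions $P(X)$ and $Q(X)$ over $X$:
$D_{KL}(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$
It is non-negative, equals zero only when $P = Q$, and is not symmetric in general.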
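A minimal sketch of the discrete KL divergence, with distributions given as dicts from outcome to probability (a representation assumed here). It uses the natural log, so the result is in nats, and the example shows the asymmetry.

```python
import math

def kl_divergence(p: dict, q: dict) -> float:
    """D_KL(P || Q) = sum_x P(x) log(P(x) / Q(x)) for discrete distributions.

    Terms with P(x) = 0 contribute 0; the divergence is infinite if Q(x) = 0 < P(x).
    """
    total = 0.0
    for x, px in p.items():
        if px == 0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0:
            return math.inf
        total += px * math.log(px / qx)
    return total

p = {"a": 0.5, "b": 0.5}
q = {"a": 0.9, "b": 0.1}
print(round(kl_divergence(p, q), 4), round(kl_divergence(q, p), 4))  # 0.5108 0.3681
```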

Mutual information
A measure of statistical dependence between two random variables X and Y. It is also the KL divergence between the joint distribution and the product of the marginals:
$I(X;Y) = D_{KL}(P(X,Y) \| P(X)P(Y)) = \sum_x \sum_y P(x,y) \log \frac{P(x,y)}{P(x)P(y)}$

Conditional mutual information
Measures the mutual information between X and Y given Z:
$I(X;Y \mid Z) = \sum_{x,y,z} P(x,y,z) \log \frac{P(x,y \mid z)}{P(x \mid z)\,P(y \mid z)}$
If Z already captures everything that Y could tell us about X, then knowing Y gives no more information about X, and the conditional mutual information of X and Y given Z is zero.
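An empirical estimate of $I(X;Y \mid Z)$ from paired discrete samples can be sketched by plugging counts into the formula above. Z may itself be a tuple, for example an assignment to a whole parent set, which is how it would be used to measure shielding.

```python
import math
from collections import Counter

def conditional_mutual_information(xs, ys, zs):
    """Empirical I(X;Y|Z) in nats; each z may be a tuple, e.g. a parent-set assignment."""
    n = len(xs)
    pxyz = Counter(zip(xs, ys, zs))
    pxz, pyz, pz = Counter(zip(xs, zs)), Counter(zip(ys, zs)), Counter(zs)
    # P(x,y|z) / (P(x|z) P(y|z)) = N(x,y,z) N(z) / (N(x,z) N(y,z))
    return sum((c / n) * math.log(c * pz[z] / (pxz[(x, z)] * pyz[(y, z)]))
               for (x, y, z), c in pxyz.items())

# X and Y are both copies of Z: dependent marginally, but Z shields them.
z = [0, 1, 0, 1]
print(conditional_mutual_information(z, z, z))  # 0.0

# Y copies X and Z is unrelated: conditioning on Z removes nothing (ln 2 nats).
x, z2 = [0, 1, 0, 1], [0, 0, 1, 1]
print(round(conditional_mutual_information(x, x, z2), 4))  # 0.6931
```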

Measuring relevance of candidate parents in the Restrict step
A good parent for node $X_i$ is one with a strong statistical dependence on $X_i$, and mutual information $I(X_i; X_j)$ provides a good measure of statistical dependence. However, mutual information should be used only as a first approximation: candidate parents need to be refined iteratively to avoid missing important dependences.
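A sketch of a first-iteration Restrict step that ranks every other variable by empirical mutual information and keeps the top k. The function names are hypothetical, ties are broken by variable index (an arbitrary choice), and later iterations would switch to the refined measures discussed next.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical I(X;Y) in nats from paired discrete samples."""
    n = len(xs)
    pxy, px, py = Counter(zip(xs, ys)), Counter(xs), Counter(ys)
    # P(x,y) / (P(x) P(y)) = c * n / (N(x) N(y)) with c = N(x, y)
    return sum((c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())

def mi_candidates(data, k):
    """Top-k candidate parents of each X_i by I(X_i; X_j); columns index variables."""
    n_vars = len(data[0])
    cols = [[row[j] for row in data] for j in range(n_vars)]
    candidates = {}
    for i in range(n_vars):
        others = [j for j in range(n_vars) if j != i]
        # Stable sort: ties keep increasing index order.
        others.sort(key=lambda j: mutual_information(cols[i], cols[j]), reverse=True)
        candidates[i] = others[:k]
    return candidates

# Column 1 copies column 0; column 2 is independent of both.
rows = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(mi_candidates(rows, k=1))  # {0: [1], 1: [0], 2: [0]}
```

For X2 every candidate has zero mutual information, so its "candidate" is just the tie-break winner, which is exactly the weakness the iterative refinement is meant to address.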

Mutual information can miss some parents
[Figure: the true network over A, B, C, D]
If $I(A;C) > I(A;D) > I(A;B)$ and we select two candidate parents, B will never be selected as a parent. How do we get B as a candidate parent? If we used mutual information alone to select candidates, we might be stuck with C and D.

Sparse candidate restrict step
Three strategies handle the effect of greedy choices made early on:
– Estimate the discrepancy between the (in)dependencies in the BN and those in the data: the KL divergence between $P(A,D)$ in the data and $P_B(A,D)$ in the network B.
– Measure how much the current parent set shields A from D: the conditional mutual information between A and D given the current parent set of A.
– Measure how much the score improves on adding D.

Measuring relevance of $X_i$ to $X_j$
– $M_{Disc}(X_i, X_j) = D_{KL}(P(X_i, X_j) \| P_B(X_i, X_j))$: discrepancy between the joint distribution $P(X_i, X_j)$ represented in the training data and $P_B(X_i, X_j)$ represented by the BN B
– $M_{Shield}(X_i, X_j) = I(X_i; X_j \mid \mathrm{Pa}(X_i))$: based on conditional mutual information
– $M_{Score}(X_i, X_j) = \mathrm{Score}(X_i ; X_j, \mathrm{Pa}(X_i), D)$: score when $X_j$ is added to $X_i$'s current parent set $\mathrm{Pa}(X_i)$

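Performance of Sparse candidate over simple hill-climbing
[Figure: network score over the course of the search for Sparse candidate variants and simple hill-climbing on Dataset 1 (100 variables) and Dataset 2 (200 variables)]
The "Score 15" variant ($M_{Score}$ with k = 15) seems to perform the best.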

Summary
– The Sparse candidate algorithm was developed to handle structure learning of Bayesian networks with a large number of variables.
– The main heuristic is to discard parents that are not likely to be good.
– Parents can be ranked using different measures of statistical dependence:
  – Mutual information
  – Conditional mutual information
  – Increase in score when adding a new parent

Assessing confidence in the learned network
Given the large number of variables and small datasets, the data are not sufficient to reliably determine the "best" network. We can, however, estimate the confidence in specific properties of the network, i.e., graph features $f(G)$. Examples of $f(G)$:
– An edge between two random variables
– Order relations: is X an ancestor of Y?
– Is X in the Markov blanket of Y? The Markov blanket of Y is the set of variables that render Y independent of the rest of the network; it includes Y's parents, Y's children, and the other parents of Y's children.

Markov blanket
If MB(X) is the Markov blanket of X, then $P(X \mid MB(X), Y) = P(X \mid MB(X))$ for any variable Y outside $MB(X) \cup \{X\}$.
[Figure: X's Markov blanket in a network over X, A, B, C, D, E, F]
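The definition translates directly into a graph computation. A minimal sketch, assuming the network is stored as a dict from each node to its parent set, applied to the sprinkler network from earlier:

```python
def markov_blanket(parents: dict, x) -> set:
    """MB(x): parents of x, children of x, and the children's other parents.

    parents maps every node to the set of its parents.
    """
    children = {c for c, pa in parents.items() if x in pa}
    co_parents = {p for c in children for p in parents[c]} - {x}
    return set(parents[x]) | children | co_parents

# Sprinkler network: C -> S, C -> R, S -> W, R -> W.
sprinkler = {"C": set(), "S": {"C"}, "R": {"C"}, "W": {"S", "R"}}
print(sorted(markov_blanket(sprinkler, "S")))  # ['C', 'R', 'W']
```

R enters MB(S) only as a co-parent of W: once W is observed, S and R become dependent even though neither is a parent of the other.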

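How to assess confidence in graph features?
What we want is
$P(f(G) \mid D) = \sum_{G} f(G)\, P(G \mid D)$
where $f(G) = 1$ if G has the feature and 0 otherwise. It is not feasible to compute this sum over the super-exponentially many graphs, so instead we will use a "bootstrap" procedure.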

Bootstrap to assess graph feature confidence
For i = 1 to m:
– Construct dataset $D_i$ by sampling $|D|$ samples with replacement from D, where $|D|$ is the size of the original dataset
– Learn a network $G_i$ from $D_i$
For each feature of interest f, calculate its confidence:
$\mathrm{conf}(f) = \frac{1}{m} \sum_{i=1}^{m} f(G_i)$
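A minimal sketch of the bootstrap loop with edges as the graph features. `learn_network` is a hypothetical stand-in for any structure learner (such as Sparse candidate) that returns a set of (parent, child) edges; the toy learner below exists only to make the example runnable.

```python
import random
from collections import Counter

def bootstrap_confidence(data, learn_network, m=200, seed=0):
    """conf(f) = (1/m) * sum_i f(G_i), with each edge treated as a feature f."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(m):
        # D_i: |D| samples drawn with replacement from D
        d_i = [rng.choice(data) for _ in range(len(data))]
        counts.update(set(learn_network(d_i)))   # count each edge once per G_i
    return {edge: c / m for edge, c in counts.items()}

def toy_learner(samples):
    """Hypothetical learner: edge X1 -> X2 if the two values agree in most samples."""
    agree = sum(x1 == x2 for x1, x2 in samples)
    return {("X1", "X2")} if agree > len(samples) / 2 else set()

data = [(0, 0), (1, 1), (0, 0), (1, 1), (0, 1)]
print(bootstrap_confidence(data, toy_learner, m=100))
```

Edges missing from the result never appeared in any bootstrap network, i.e. their confidence is 0.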

Does the confidence estimated by the bootstrap procedure represent real relationships?
Compare the confidence distribution to the one obtained from randomized data:
– Shuffle the values of each row (gene) across the columns (conditions), independently for each row
– Repeat the bootstrap procedure on the randomized data
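The randomization can be sketched as follows, assuming the expression matrix is a list of rows (genes) by columns (conditions). Each gene keeps its own values, so its marginal distribution is unchanged, while any dependence between genes is broken.

```python
import random

def shuffle_each_row(expr, seed=0):
    """Null data: permute each gene's values across conditions, independently per gene."""
    rng = random.Random(seed)
    return [rng.sample(row, len(row)) for row in expr]

expr = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0],
        [9.0, 10.0, 11.0, 12.0]]
randomized = shuffle_each_row(expr)
```

Running the same bootstrap procedure on `randomized` gives the confidence distribution expected when there are no real relationships between genes.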

Application of Bayesian networks to yeast expression data
– 76 experiments (microarrays)
– 800 genes
– Bootstrap procedure on 200 subsampled datasets
– Sparse candidate as the Bayesian network learning algorithm

Bootstrap-based confidence differs between original and randomized data
[Figure: distribution of bootstrap confidence values for the original data vs. the randomized data]

Example of a high-confidence sub-network
[Figure: one learned Bayesian network vs. the network of bootstrapped confidences]
The bootstrapped-confidence network highlights a subnetwork associated with yeast mating.

Summary
– Network inference from expression data is a promising approach to identify cellular networks.
– Bayesian networks are one representation of networks, with both a probabilistic and a graphical component.
– Network inference naturally translates to learning problems in Bayesian networks, and it is computationally challenging.
– Successful application of Bayesian network learning algorithms to expression data requires additional considerations:
  – Reducing potential parents, statistically or using biological knowledge
  – Bootstrap-based confidence estimation
  – Permutation-based assessment of confidence
