Using Bayesian Networks to Analyze Expression Data


Using Bayesian Networks to Analyze Expression Data – Shelly Bar-Nahor

Today 1. Introduction to Bayesian Networks. 2. Describe a method for recovering gene interactions from microarray data, using tools for learning Bayesian Networks. 3. Apply the method to the S. cerevisiae cell-cycle measurements.

Bayesian Networks – A compact representation of probability distributions via conditional independence. The representation consists of two components: 1. G – a directed acyclic graph (DAG), whose nodes are random variables (X1, …, Xn) and whose edges denote direct influence. 2. Θ – a set of conditional probability distributions. Together, these two components specify a unique distribution over (X1, …, Xn). Example CPT for P(C|A): P(c0|a0) = 0.95, P(c1|a0) = 0.05, P(c0|a1) = 0.1, P(c1|a1) = 0.9.

The graph G encodes the Markov assumption: each variable Xi is independent of its non-descendants given its parents in G. Applying the chain rule of probabilities then gives the factorization P(X1, …, Xn) = ∏i=1..n P(Xi | PaG(Xi)), where P(Xi | PaG(Xi)) is the conditional distribution of variable Xi given its parents. We denote the parameters that specify these distributions by Θ.
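To make the factorization concrete, here is a minimal sketch (not part of the original slides) that encodes a small chain A → C → B in Python with explicit conditional probability tables and computes a joint probability by the chain rule; P(C|A) uses the numbers from the example table above, while P(A) and P(B|C) are made-up values.

    # Hypothetical three-variable network A -> C -> B.
    # Markov assumption + chain rule: P(A, B, C) = P(A) * P(C | A) * P(B | C).
    P_A = {0: 0.6, 1: 0.4}                      # P(A): assumed values
    P_C_given_A = {0: {0: 0.95, 1: 0.05},       # P(C | A): values from the example CPT
                   1: {0: 0.10, 1: 0.90}}
    P_B_given_C = {0: {0: 0.7, 1: 0.3},         # P(B | C): assumed values
                   1: {0: 0.2, 1: 0.8}}

    def joint(a, b, c):
        # Factorized joint probability P(A=a, B=b, C=c)
        return P_A[a] * P_C_given_A[a][c] * P_B_given_C[c][b]

    print(joint(1, 0, 1))  # 0.4 * 0.9 * 0.2 = 0.072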

The Learning Problem: let m be the number of samples and n the number of variables. Given a training set D = (x[1], …, x[m]), where each x[i] = (xi1, …, xin), find a network B = <G, Θ> that best matches D. Let us first assume the graph structure is known; we then want to estimate the parameters Θ given the data D. To do so, we use the likelihood function: L(Θ : D) = P(D | Θ) = P(X1 = x[1], …, Xm = x[m] | Θ) = ∏i=1..m P(x[i] | Θ).

Learning Parameters – Likelihood Function. Example network over the variables E, B, A, C (E and B are parents of A, and A is the parent of C); the data D consist of m joint assignments (E[1], B[1], A[1], C[1]), …, (E[m], B[m], A[m], C[m]). Assuming i.i.d. samples, the likelihood function is L(Θ : D) = ∏m P(E[m], B[m], A[m], C[m] | Θ).

Learning Parameters – Likelihood Function (cont.). For this network the likelihood decomposes: L(Θ : D) = ∏m P(E[m], B[m], A[m], C[m] | Θ) = ∏m P(E[m] : Θ) · P(B[m] : Θ) · P(A[m] | B[m], E[m] : Θ) · P(C[m] | A[m] : Θ) = [∏m P(E[m] : Θ)] · [∏m P(B[m] : Θ)] · [∏m P(A[m] | B[m], E[m] : Θ)] · [∏m P(C[m] | A[m] : Θ)].

Learning Parameters – Likelihood Function. In general Bayesian networks: L(Θ : D) = P(X1 = x[1], …, Xm = x[m] | Θ) = ∏i ∏m P(Xi[m] | Pai[m] : Θi) = ∏i Li(Θi : D). Decomposition: the estimation splits into independent estimation problems, one per variable.
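The decomposition means that, for multinomial CPDs, maximum-likelihood estimation reduces to counting per family. A rough sketch of this idea (my own illustration; the data format and function name are assumptions, not the paper's code):

    from collections import Counter

    # data: list of samples, each a dict {variable: value}
    # parents: dict mapping each variable to its list of parents
    def mle_cpds(data, parents):
        # Maximum-likelihood CPDs: normalized counts per parent configuration.
        cpds = {}
        for var, pa in parents.items():
            counts, pa_counts = Counter(), Counter()
            for sample in data:
                pa_config = tuple(sample[p] for p in pa)
                counts[(pa_config, sample[var])] += 1
                pa_counts[pa_config] += 1
            cpds[var] = {key: n / pa_counts[key[0]] for key, n in counts.items()}
        return cpds

    # Hypothetical usage with the E, B, A, C example network
    data = [{"E": 0, "B": 1, "A": 1, "C": 1}, {"E": 0, "B": 0, "A": 0, "C": 0}]
    parents = {"E": [], "B": [], "A": ["B", "E"], "C": ["A"]}
    print(mle_cpds(data, parents)["C"])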

Learning Parameters. Two approaches: MLE – the maximum likelihood estimator; and Bayesian inference – learning using Bayes' rule. In the Bayesian approach we represent uncertainty about the parameters with a Bayesian network of its own: a node Θ with children X[1], X[2], …, X[m]. The values of the X[i] are independent given Θ, with P(X[m] | Θ) = Θ (for a binary variable, the probability of a value is given directly by the parameter). Bayesian prediction is inference in this network.

Bayesian prediction is inference in this network: the observed data are x[1], …, x[m] and the query is X[m+1] (figure: Θ with children X[1], …, X[m], X[m+1]). P(x[m+1] | x[1], …, x[m]) = ∫ P(x[m+1] | Θ, x[1], …, x[m]) P(Θ | x[1], …, x[m]) dΘ = ∫ P(x[m+1] | Θ) P(Θ | x[1], …, x[m]) dΘ. The posterior follows from Bayes' rule: P(Θ | x[1], …, x[m]) = P(x[1], …, x[m] | Θ) P(Θ) / P(x[1], …, x[m]), i.e. likelihood times prior, divided by the probability of the data.
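To make the MLE-versus-Bayesian contrast concrete, here is a small sketch (my own illustration) for a single binary variable with a Beta prior, where the prediction integral above has the familiar closed form:

    # Single binary variable X with parameter theta = P(X = 1).
    # MLE: theta_hat = (#ones) / m.
    # Bayesian prediction with a Beta(alpha1, alpha0) prior:
    #   P(x[m+1] = 1 | x[1..m]) = (alpha1 + #ones) / (alpha1 + alpha0 + m),
    # the posterior mean of theta, i.e. the integral on the slide in closed form.

    def mle(samples):
        return sum(samples) / len(samples)

    def bayes_predict_one(samples, alpha1=1.0, alpha0=1.0):
        ones = sum(samples)
        return (alpha1 + ones) / (alpha1 + alpha0 + len(samples))

    samples = [1, 1, 1, 0, 1]              # hypothetical data
    print(mle(samples))                    # 0.8
    print(bayes_predict_one(samples))      # (1 + 4) / (2 + 5) = 0.714...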

Equivalence Classes of Bayesian Networks. Problem: the joint probability represented by one graph can equally be represented by another. For the chain A → C → B, P(x) = P(A)P(C|A)P(B|C); for A ← C → B, P(x) = P(C)P(A|C)P(B|C); and since P(C|A)P(A) = P(A|C)P(C), the two graphs represent the same distributions. In the same way, let Ind(G) be the set of independence statements that hold in all distributions satisfying the Markov assumption on G. G and G' are equivalent if Ind(G) = Ind(G').

Equivalence Classes of Bayesian Networks. Two DAGs are equivalent if and only if they have the same underlying undirected graph (skeleton) and the same v-structures. We represent an equivalence class of network structures by a partially directed acyclic graph (PDAG): a directed edge means all members of the class contain that directed arc, and an undirected edge means the class contains members with the arc in either direction.
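This characterization can be checked directly; the sketch below (my own helper code, assuming each DAG is given as a set of directed edge pairs) tests whether two DAGs share the same skeleton and the same v-structures:

    def skeleton(edges):
        # Undirected version of the edge set.
        return {frozenset(e) for e in edges}

    def v_structures(edges):
        # All triples (a, c, b) with a -> c <- b and no edge between a and b.
        parents = {}
        for x, y in edges:
            parents.setdefault(y, set()).add(x)
        skel = skeleton(edges)
        vs = set()
        for c, pa in parents.items():
            for a in pa:
                for b in pa:
                    if a < b and frozenset((a, b)) not in skel:
                        vs.add((a, c, b))
        return vs

    def equivalent(g, h):
        return skeleton(g) == skeleton(h) and v_structures(g) == v_structures(h)

    # A -> C -> B and A <- C -> B are equivalent; A -> C <- B is not.
    print(equivalent({("A", "C"), ("C", "B")}, {("C", "A"), ("C", "B")}))  # True
    print(equivalent({("A", "C"), ("C", "B")}, {("A", "C"), ("B", "C")}))  # False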

Learning Causal Patterns. We want to model the mechanism that generates the dependencies (e.g., gene transcription). A causal network is a model of such causal processes. Representation: a DAG where each node represents a random variable with a local probability model, and the parents of a variable are its immediate causes.

Learning Causal Patterns. Observation – a passive measurement of our domain (i.e., a sample from X). Intervention – setting the values of some variables by forces outside the causal model. A causal network models not only the distribution of observations, but also the effects of interventions.

Learning Causal Patterns. If X causes Y, then manipulating the value of X affects the value of Y. If Y causes X, then manipulating the value of X will not affect Y. Yet X → Y and Y → X are equivalent Bayesian networks. Causal Markov Assumption: given the values of a variable's immediate causes, it is independent of its earlier causes.

Learning Causal Patterns. Under the causal Markov assumption, a causal network can be interpreted as a Bayesian network. From observations alone we cannot distinguish between causal networks that belong to the same equivalence class, but from a directed edge in the PDAG we can infer a causal direction.

So Far… The likelihood function; parameter estimation and the decomposition principle; equivalence classes of Bayesian networks; learning causal patterns.

Analyzing Expression Data: 1. Present modeling assumptions. 2. Find high-scoring networks. 3. Characterize features. 4. Estimate statistical confidence in features. 5. Present local probability models.

Modeling assumptions. We consider probability distributions over all possible states of the system; a state is described using random variables. Random variables include: the expression level of each individual gene, experimental conditions, temporal indicators (the time/stage at which the sample was taken), and background variables (e.g., which clinical procedure was used to take the sample).

Analyzing Expression Data – next step: find high-scoring networks.

The Learning Problem: given a training set D = (x[1], …, x[N]) of independent instances, find an equivalence class of networks B = <G, Θ> that best matches D. We use a Bayesian scoring metric: S(G : D) = log P(G | D) = log P(D | G) + log P(G) + C, where P(G) is the prior over structures and P(D | G) = ∫ P(D | G, Θ) P(Θ | G) dΘ is the marginal likelihood, which averages the likelihood over the prior on parameters.
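As a small illustration of the marginal likelihood ∫ P(D | G, Θ) P(Θ | G) dΘ, the integral has a closed form for a multinomial variable with a Dirichlet prior; the sketch below (my own, with a uniform Dirichlet prior assumed) computes it for a single parentless variable:

    from math import lgamma

    def log_marginal_likelihood(counts, alpha=1.0):
        # log P(D) = log G(A) - log G(A + N) + sum_k [log G(a_k + N_k) - log G(a_k)]
        # for a multinomial variable with a Dirichlet(alpha, ..., alpha) prior,
        # where G is the gamma function, A = sum of prior counts, N = sum of data counts.
        A = alpha * len(counts)
        N = sum(counts)
        result = lgamma(A) - lgamma(A + N)
        for n_k in counts:
            result += lgamma(alpha + n_k) - lgamma(alpha)
        return result

    # Hypothetical: a 3-state expression variable observed 10 times, counts for (-1, 0, 1)
    print(log_marginal_likelihood([2, 5, 3]))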

The learning problem – scoring, cont. Properties of the selected priors: the score is structure equivalent and decomposable, S(G : D) = ∑i ScoreContribution(Xi, PaG(Xi) : D). Learning now amounts to finding the structure G that maximizes the score. This problem is NP-hard, so we resort to heuristic search.

Local Search Strategy. Using decomposition, we can change one arc at a time and evaluate the gain made by that change (figure: an initial structure G over A, B, C and neighboring structures G' obtained by adding, deleting, or reversing a single arc). If an arc into Xi is added or deleted, only score(Xi, Pa(Xi)) needs to be re-evaluated. If an arc is reversed, only score(Xi, Pa(Xi)) and score(Xj, Pa(Xj)) need to be re-evaluated.
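A schematic of the resulting hill-climbing loop is sketched below (my own simplification: real implementations cache family scores and re-evaluate only the changed families; here family_score(var, parents, data) is an assumed stand-in for the decomposable score contribution):

    import itertools

    def is_acyclic(nodes, edges):
        # Kahn-style check: repeatedly remove nodes with no incoming edges.
        remaining = set(nodes)
        while remaining:
            sources = [n for n in remaining
                       if not any((p, n) in edges for p in remaining)]
            if not sources:
                return False   # every remaining node has an incoming edge -> cycle
            remaining -= set(sources)
        return True

    def total_score(nodes, edges, data, family_score):
        # Decomposable score: sum of per-family contributions.
        parents = {n: sorted(p for p, c in edges if c == n) for n in nodes}
        return sum(family_score(n, parents[n], data) for n in nodes)

    def greedy_search(nodes, data, family_score):
        # Hill climbing over single-arc additions, deletions and reversals.
        edges = set()
        best = total_score(nodes, edges, data, family_score)
        improved = True
        while improved:
            improved = False
            neighbors = []
            for a, b in itertools.permutations(nodes, 2):
                if (a, b) in edges:
                    neighbors.append(edges - {(a, b)})               # delete a -> b
                    neighbors.append(edges - {(a, b)} | {(b, a)})    # reverse a -> b
                else:
                    neighbors.append(edges | {(a, b)})               # add a -> b
            for cand in neighbors:
                if is_acyclic(nodes, cand):
                    s = total_score(nodes, cand, data, family_score)
                    if s > best:
                        best, edges, improved = s, cand, True
        return edges, best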

Find High-Scoring Networks Problem: Small data sets are not sufficiently informative to determine that a single model is the “right” model (or equivalence class of models). Solution: analyze a set of high-scoring networks. Attempt to characterize features that are common to most of these networks and focus on learning them.

Analyzing Expression Data – next step: characterize features.

Features. We use two classes of features involving pairs of variables. 1. Markov relations – is Y in the Markov blanket of X? Y is in X's Markov blanket if and only if there is either an edge between them, or both are parents of another variable. A Markov relation indicates that the two genes are related in some joint biological interaction or process.

Features. 2. Order relations – is X an ancestor of Y in all the networks of a given equivalence class, i.e., does the PDAG contain a path from X to Y in which all the edges are directed? This is an indication that X might be a cause of Y.
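Both features can be read off a learned network; the sketch below (my own helper functions, applied here to a single DAG given as a set of directed edges rather than to a PDAG) tests the Markov relation and the order relation for a pair of variables:

    def markov_relation(x, y, edges):
        # True if Y is in X's Markov blanket: an edge between them, or a common child.
        if (x, y) in edges or (y, x) in edges:
            return True
        children_x = {c for p, c in edges if p == x}
        children_y = {c for p, c in edges if p == y}
        return bool(children_x & children_y)

    def order_relation(x, y, edges):
        # True if X is an ancestor of Y, i.e. there is a directed path from X to Y.
        frontier, seen = [x], set()
        while frontier:
            node = frontier.pop()
            for p, c in edges:
                if p == node and c not in seen:
                    if c == y:
                        return True
                    seen.add(c)
                    frontier.append(c)
        return False

    edges = {("CLN2", "RNR3"), ("RNR3", "SVS1")}     # hypothetical toy network
    print(markov_relation("CLN2", "RNR3", edges))    # True (direct edge)
    print(order_relation("CLN2", "SVS1", edges))     # True (directed path)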

Analyzing Expression Data – next step: estimate statistical confidence in features.

Estimate Statistical Confidence in Features. We want to estimate to what extent the data support a given feature. We use the bootstrap method: generate "perturbed" versions of the original data and learn from them. We should be more confident in features that would still be induced from the perturbed data.

Estimate Statistical Confidence in Features. We use the bootstrap as follows. For i = 1 … m (in our experiments m = 200): re-sample, with replacement, N instances from D and denote the resulting dataset Di; apply the learning procedure to Di to induce a network structure Gi. For each feature f of interest calculate conf(f) = (1/m) ∑i f(Gi), where f(G) is 1 if f is a feature of G and 0 otherwise.
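The loop above is straightforward to implement; a condensed sketch (my own, where learn_structure and has_feature are assumed stand-ins for the search procedure and the feature test):

    import random

    def bootstrap_confidence(data, features, learn_structure, has_feature, m=200):
        # conf(f) = (1/m) * sum_i f(G_i), where G_i is learned from a resampled dataset.
        counts = {f: 0 for f in features}
        for _ in range(m):
            resample = [random.choice(data) for _ in range(len(data))]  # N instances with replacement
            g = learn_structure(resample)
            for f in features:
                if has_feature(g, f):
                    counts[f] += 1
        return {f: counts[f] / m for f in features}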

Estimate Statistical Confidence in Features. Features induced with high confidence are rarely false positives. The bootstrap procedure is especially robust for the Markov and order features. Conclusions based on high-confidence features are reliable even when the data set is small relative to the model being induced.

Analyzing Expression Data – next step: present local probability models.

Local Probability Models. To specify a Bayesian network model we need to choose the type of local probability model to use. The choice of representation depends on the type of variables: discrete variables can be represented with a table, while for continuous variables there is no representation that can capture all possible densities.

Local Probability Models. We consider two approaches: 1. Multinomial model – treat each variable as discrete and learn a multinomial distribution that describes the probability of each possible state of a child variable given the state of its parents. We discretize by thresholding the ratio between the measured expression and control: values lower than 2^(-0.5) are under-expressed (-1) and values higher than 2^(0.5) are over-expressed (+1).
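A small sketch of this thresholding rule (my own; the plus/minus 0.5 log2 thresholds are the ones stated above, and mapping the middle range to a "normal" category 0 is an assumption):

    import math

    def discretize(expression, control, threshold=0.5):
        # Map an expression/control ratio to -1 (under), 0 (normal) or +1 (over),
        # using a log2 threshold of +/- 0.5 (ratios below 2**-0.5 or above 2**0.5).
        log_ratio = math.log2(expression / control)
        if log_ratio < -threshold:
            return -1
        if log_ratio > threshold:
            return 1
        return 0   # assumed "normal" middle category

    print([discretize(e, 1.0) for e in (0.5, 1.0, 2.0)])   # [-1, 0, 1]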

Local Probability Models. 2. Linear Gaussian model – learn a linear regression model for a child variable given its parents. If U1, …, Uk are the parents of variable X, then P(X | u1, …, uk) ~ N(a0 + ∑i ai·ui, σ²). That is, X is normally distributed around a mean that depends linearly on the values of its parents.
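Fitting such a model for one child variable amounts to linear regression; a minimal sketch using NumPy (my own illustration with made-up data; a full implementation would do this per family inside the scoring procedure):

    import numpy as np

    def fit_linear_gaussian(child, parents):
        # Fit P(X | u1..uk) = N(a0 + sum_i a_i * u_i, sigma^2) by least squares.
        # child: (m,) array; parents: (m, k) array. Returns (a0, a, sigma2).
        m = len(child)
        design = np.hstack([np.ones((m, 1)), parents])      # prepend intercept column
        coeffs, *_ = np.linalg.lstsq(design, child, rcond=None)
        residuals = child - design @ coeffs
        sigma2 = float(np.mean(residuals ** 2))
        return float(coeffs[0]), coeffs[1:], sigma2

    # Hypothetical usage: one child gene regressed on two parent genes
    rng = np.random.default_rng(0)
    u = rng.normal(size=(100, 2))
    x = 0.5 + 1.0 * u[:, 0] - 2.0 * u[:, 1] + rng.normal(scale=0.1, size=100)
    print(fit_linear_gaussian(x, u))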

Local Probability Models. In the multinomial model, discretizing the measured expression levels loses information. The linear Gaussian model can only detect dependencies that are close to linear; in particular, it is not likely to discover combinatorial effects (e.g., a gene being over-expressed only if several other genes are jointly over-expressed).

Application to Cell Cycle Expression Patterns. Data from Spellman et al., "Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization", Molecular Biology of the Cell. The data contain 76 gene expression measurements of the mRNA levels of yeast genes. Spellman et al. identified 800 genes whose expression varied over the different cell-cycle stages.

Application to Cell Cycle Expression Patterns. We treat each measurement as an independent sample and do not take into account the temporal aspect of the measurements. To compensate, a variable denoting the cell-cycle phase is added as the root of all learned networks. Two experiments were performed: one with the multinomial distribution and the other with the linear Gaussian distribution. http://www.cs.huji.ac.il/~nirf/GeneExpression/top800/

Robustness Analysis – credibility of confidence assessment (figure: linear Gaussian model, order relations; number of features with confidence equal to or higher than the x-value, plotted against the confidence threshold).

Robustness Analysis – credibility of confidence assessment (figure: multinomial model, order relations; number of features with confidence equal to or higher than the x-value, plotted against the confidence threshold).

Robustness Analysis – adding more genes (figure: multinomial model; x-axis: confidence with 250 genes, y-axis: confidence with 800 genes).

Robustness Analysis – discretization. The discretization method penalizes genes whose natural range of variation is small, since a fixed threshold is used. The problem can be avoided by normalizing the expression of genes in the data. The top 20 Markov relations highlighted by this method were somewhat different, while the order relations were more robust, possibly because order relations depend on the global network structure rather than on local features.

Robustness Analysis – comparison between the linear Gaussian and multinomial experiments (figure: x-axis: confidence in the multinomial experiment; y-axis: confidence in the linear Gaussian experiment).

Biological Analysis – Order Relations. We found the existence of dominant genes: out of all 800 genes, only a few seem to dominate the order relations. Among them are genes directly involved in initiation of the cell cycle and its control, components of pre-replication complexes, and genes involved in DNA repair that are associated with transcription initiation.

Biological Analysis – Markov Relations. Among the top-scoring relations, all those involving two known genes make biological sense. Several of the unknown pairs are physically adjacent on the chromosome and presumably regulated by the same mechanism. Some relations go beyond the limitations of clustering.

Example: CLN2, RNR3, SVS1, SRO4 and RAD51 all appear in the same cluster in Spellman et al. In our network CLN2 is a parent of the other four, while no links were found between them. This fits biological knowledge: CLN2 is a central and early cell-cycle control, while there is no clear biological relationship between the others.

Discussion. The approach is capable of handling noise and estimating the confidence in the different features of the network. It managed to extract many biologically plausible conclusions, and it can learn rich structures from the data, such as causal relationships and interactions between genes other than positive correlation.

Discussion. Ideas for future work: learn models over "clustered" genes; recover all relationships in one analysis; improve the confidence estimation; incorporate biological knowledge as prior knowledge in the analysis; learn causal patterns by adding interventional data.

The End