Regulatory Network (Part II) 11/05/07


Methods
Linear
–PCA (Raychaudhuri et al. 2000)
–NIR (Gardner et al. 2003)
Nonlinear
–Bayesian network (Friedman et al. 2000; Friedman 2004)

Cell-cycle network
Data (Spellman et al. 1998)
–76 arrays
–7 time points
–6177 yeast genes
–800 cell-cycle related genes identified

PCA Raychaudhuri et al. 2000

The PCA components identify the dominant modes of variation.
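As a rough illustration of what "dominant modes of variation" means, here is a minimal sketch using NumPy's SVD. It assumes the expression data is a genes × conditions matrix; the function name `pca` and the layout of its return values are illustrative choices, not taken from Raychaudhuri et al.

```python
import numpy as np

def pca(expr, n_components):
    """PCA of a (genes x conditions) matrix via SVD.

    Returns the principal axes (n_components x conditions), the projection
    of each gene onto those axes, and the fraction of variance explained.
    """
    # Center each condition (column) so components describe variation, not the mean.
    centered = expr - expr.mean(axis=0, keepdims=True)
    # Rows of vt are orthonormal directions ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    axes = vt[:n_components]
    scores = centered @ axes.T
    return axes, scores, explained[:n_components]

# Toy data: every gene follows the same time profile with a different amplitude,
# so a single component should carry essentially all of the variance.
profile = np.array([1.0, 2.0, 0.5, -1.0, -2.0, 0.0, 1.5])
amplitude = np.array([0.2, 1.0, -0.7, 3.0, 1.4])
toy = np.outer(amplitude, profile)
axes, scores, explained = pca(toy, 2)
```

Each gene's profile is then approximated by a linear combination of a few axes, which is also why PCA by itself does not single out regulators.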

Limitations of PCA
Does not directly associate regulators with their target genes.
Alternatively, it can be interpreted as a fully connected network: the expression of each gene is regulated by a linear combination of all other genes.

NIR
Idea: The dynamics of gene activities can be approximated by a linear model, dx/dt = Ax + u, where x is the vector of gene expression levels and u is the perturbation.
After a perturbation, gene expression levels approximately reach steady state, so Ax = -u.

NIR
Solve for A, given N = #genes and M = #perturbations.
This is unidentifiable since M << N.
Add the constraint that there are at most k connections for any given gene (k < M).
For each row of A, use multiple regression to find a linear combination of k genes so that the least-squares error is minimal.
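The row-by-row regression can be sketched as follows. This is a minimal illustration, not the published NIR implementation: it assumes the steady-state linear model AX = -U of Gardner et al. (2003), with X the N × M matrix of measured expression changes and U the N × M matrix of applied perturbations, and it searches all subsets of exactly k regulators exhaustively, which is only feasible for small N. The names `nir_row` and `nir` are hypothetical.

```python
import itertools
import numpy as np

def nir_row(X, u_i, k):
    """Fit one row of A using exactly k regulators.

    X   : (N genes x M perturbations) steady-state expression changes
    u_i : (M,) perturbation applied to gene i in each experiment
    Solves X[S, :].T @ a_S = -u_i in the least-squares sense for every
    subset S of k genes and keeps the subset with the smallest error.
    """
    n_genes = X.shape[0]
    best_sse, best_subset, best_coef = np.inf, None, None
    for subset in itertools.combinations(range(n_genes), k):
        design = X[list(subset), :].T            # M x k regression matrix
        coef, *_ = np.linalg.lstsq(design, -u_i, rcond=None)
        sse = float(np.sum((design @ coef + u_i) ** 2))
        if sse < best_sse:
            best_sse, best_subset, best_coef = sse, subset, coef
    row = np.zeros(n_genes)
    row[list(best_subset)] = best_coef
    return row

def nir(X, U, k):
    """Estimate the full N x N connectivity matrix, one row at a time."""
    return np.vstack([nir_row(X, U[i], k) for i in range(X.shape[0])])

# Noise-free toy problem: 5 genes, 4 perturbation experiments, 2 inputs per gene.
A_true = np.array([[-1.0, 0.5, 0.0, 0.0, 0.0],
                   [0.0, -1.0, 0.8, 0.0, 0.0],
                   [0.0, 0.0, -1.0, -0.6, 0.0],
                   [0.0, 0.0, 0.0, -1.0, 0.4],
                   [0.7, 0.0, 0.0, 0.0, -1.0]])
rng = np.random.default_rng(0)
U = rng.normal(size=(5, 4))
X = -np.linalg.solve(A_true, U)                  # steady state: A X = -U
A_hat = nir(X, U, k=2)
```

With noisy data the selected subset depends on k, which is the ad hoc choice the slides list as a limitation.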

Application of NIR
[Figure: known E. coli SOS pathway, with edges marked as repression or activation]

Application of NIR
[Figure: regression coefficients]

Limitations of NIR
True dynamics are nonlinear.
The choice of k is ad hoc.
The steady-state approximation does not apply to oscillatory genes.

Bayesian network
Directed acyclic graph (DAG)
Nodes: random variables
Edges: direct effect (conditional dependency)
Friedman 2004

An example
[Figure: network over the nodes Earthquake, Burglary, Radio, Alarm, Call]

This is not a Bayesian network
[Figure: a graph over nodes A, B, C that is not a DAG]

Tree: a special kind of DAG
Each node has at most one parent node.
[Figure: tree over nodes A, B, C, D, E]
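Both structural conditions are easy to check mechanically. The sketch below uses Kahn's topological-sort algorithm for the DAG test; the edge lists are assumptions for illustration (the transcript preserves only the node names), and the function names are hypothetical.

```python
def is_dag(nodes, edges):
    """Return True if the directed graph has no directed cycle (Kahn's algorithm)."""
    indegree = {v: 0 for v in nodes}
    children = {v: [] for v in nodes}
    for parent, child in edges:
        indegree[child] += 1
        children[parent].append(child)
    queue = [v for v in nodes if indegree[v] == 0]
    visited = 0
    while queue:
        v = queue.pop()
        visited += 1
        for c in children[v]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    # Nodes on a directed cycle never reach indegree 0, so they are never visited.
    return visited == len(nodes)

def at_most_one_parent(nodes, edges):
    """Return True if every node has zero or one parent."""
    n_parents = {v: 0 for v in nodes}
    for _, child in edges:
        n_parents[child] += 1
    return all(n <= 1 for n in n_parents.values())

# Assumed edges: a directed 3-cycle, the textbook alarm network, and a small tree.
cycle = [("A", "B"), ("B", "C"), ("C", "A")]
alarm = [("Earthquake", "Radio"), ("Earthquake", "Alarm"),
         ("Burglary", "Alarm"), ("Alarm", "Call")]
tree = [("A", "B"), ("A", "C"), ("B", "D"), ("B", "E")]
alarm_nodes = ["Earthquake", "Burglary", "Radio", "Alarm", "Call"]
```

Under these assumed edges, the cycle fails the DAG test, the alarm network is a DAG but not a tree (Alarm has two parents), and the tree passes both checks.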

Advantages
Intuitive (popular among biologists)
Graph structure is easy to interpret
Well-established probabilistic tools for DAG models
Supports all the features for probabilistic learning
–Model selection criteria
–Handling of missing data

Known Structure, Complete Data
Network structure is specified
–Inducer needs to estimate parameters
Data does not contain missing values
[Figure: data on E, B, A is passed to a learner; the entries of the table P(A | E,B) for the network E, B, A go from unknown (?) to estimated values such as .9 and .1] (Nir Friedman)

Unknown Structure, Complete Data
Network structure is not specified
–Inducer needs to select arcs & estimate parameters
Data does not contain missing values
[Figure: data on E, B, A is passed to a learner, which outputs both the arcs among E, B, A and the table P(A | E,B), with estimated values such as .9 and .1] (Nir Friedman)

Learning parameters (network over E, B, A, C)
Training data has the form:

Likelihood Function (network over E, B, A, C)
Assume i.i.d. samples
Likelihood function is

Likelihood Function (network over E, B, A, C)
By definition of network, we get

Likelihood Function (network over E, B, A, C)
Rewriting terms, we get
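The formulas on these slides did not survive in the transcript. A reconstruction under stated assumptions: the network structure is taken to be E → A ← B and A → C (the arcs are an assumption, since only the node names are preserved), and the M training samples are indexed by m.

```latex
\begin{align}
D &= \bigl\{\, (E[m],\, B[m],\, A[m],\, C[m]) : m = 1, \dots, M \,\bigr\} \\
L(\Theta : D) &= \prod_{m=1}^{M} P\bigl(E[m], B[m], A[m], C[m] : \Theta\bigr) \\
 &= \prod_{m=1}^{M} P(E[m] : \Theta)\, P(B[m] : \Theta)\,
    P(A[m] \mid B[m], E[m] : \Theta)\, P(C[m] \mid A[m] : \Theta) \\
 &= \prod_{m} P(E[m] : \Theta) \cdot \prod_{m} P(B[m] : \Theta) \cdot
    \prod_{m} P(A[m] \mid B[m], E[m] : \Theta) \cdot \prod_{m} P(C[m] \mid A[m] : \Theta)
\end{align}
```

The second line uses the i.i.d. assumption, the third applies the factorization defined by the network, and the fourth only regroups the product so that each factor involves one variable and its parents.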

General Bayesian Networks Generalization for any Bayesian network: Parameters can be estimated independently!
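Because the likelihood splits into one factor per variable and its parents, each conditional probability table can be fit from its own counts. A minimal sketch for discrete data, assuming samples are stored as dictionaries and the structure is given as a map from each node to its parent list (both representations, the toy samples, and the function names are hypothetical):

```python
from collections import Counter

def fit_cpt(data, child, parents):
    """Maximum-likelihood estimate of P(child | parents) from complete data.

    Returns a dict keyed by (parent_values, child_value).
    """
    joint = Counter()
    marginal = Counter()
    for row in data:
        pa = tuple(row[p] for p in parents)
        joint[(pa, row[child])] += 1
        marginal[pa] += 1
    # Relative frequency of each child value within each parent configuration.
    return {(pa, x): n / marginal[pa] for (pa, x), n in joint.items()}

def fit_network(data, structure):
    """Fit every family independently; structure maps node -> list of parents."""
    return {node: fit_cpt(data, node, parents) for node, parents in structure.items()}

# Assumed structure E -> A <- B with four made-up samples.
structure = {"E": [], "B": [], "A": ["E", "B"]}
samples = [
    {"E": 0, "B": 1, "A": 1},
    {"E": 0, "B": 1, "A": 0},
    {"E": 0, "B": 0, "A": 0},
    {"E": 1, "B": 0, "A": 1},
]
params = fit_network(samples, structure)
```

Parent configurations that never occur in the data get no entry at all, which is one practical motivation for placing a prior on the parameters.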

Bayesian Inference
Represent uncertainty about parameters using a probability distribution over parameters and data
Using Bayes rule: P(θ | D) = P(D | θ) P(θ) / P(D)
Common prior distributions:
–Dirichlet (discrete)
–Normal (continuous)
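For a discrete variable with a Dirichlet prior, the Bayes-rule update has a closed form: the posterior is again Dirichlet, with the observed counts added to the prior pseudo-counts. A small sketch (the function names and the numbers are illustrative only):

```python
def dirichlet_posterior(prior_counts, observed_counts):
    """Posterior Dirichlet hyperparameters: prior pseudo-counts plus data counts."""
    return {value: alpha + observed_counts.get(value, 0)
            for value, alpha in prior_counts.items()}

def posterior_mean(alpha):
    """Expected probability of each value under a Dirichlet(alpha) distribution."""
    total = sum(alpha.values())
    return {value: a / total for value, a in alpha.items()}

# Uniform prior over a binary alarm variable, then three observations of "on".
prior = {"on": 1, "off": 1}
posterior = dirichlet_posterior(prior, {"on": 3})
estimate = posterior_mean(posterior)
```

Unlike the maximum-likelihood estimate, the posterior mean never assigns probability zero to a value that happens to be absent from a small sample.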

Why Struggle for Accurate Structure?
[Figure: three networks over Earthquake, Alarm Set, Sound, Burglary: the true structure, one with an added arc, and one with a missing arc]
Adding an arc
–Increases the number of parameters to be estimated
–Wrong assumptions about domain structure
Missing an arc
–Cannot be compensated for by fitting parameters
–Wrong assumptions about domain structure

Score-based Learning
Define scoring function that evaluates how well a structure matches the data
Search for a structure that maximizes the score
[Figure: data on E, B, A and three candidate structures over E, B, A with scores S(G1) = 10, S(G2) = 1.5, S(G3) = 0.01]

Structure Score
Likelihood score:
–Evaluated at the max likelihood params
Bayesian score:
–Average over all possible parameter values
–Marginal likelihood = likelihood averaged under the prior over parameters
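The two scores are usually written as below. This is the standard textbook form, given here as a reconstruction since the slide's own formulas are not in the transcript; the symbols G (structure), D (data), θ (parameters) and θ̂_G (maximum-likelihood parameters for G) are notation chosen for this sketch.

```latex
\begin{align}
\text{Likelihood score:} \quad
  \mathrm{Score}_L(G : D) &= \log P\bigl(D \mid G, \hat{\theta}_G\bigr) \\
\text{Bayesian score:} \quad
  P(D \mid G) &= \int
    \underbrace{P(D \mid G, \theta)}_{\text{likelihood}} \;
    \underbrace{P(\theta \mid G)}_{\text{prior over parameters}} \, d\theta
\end{align}
```

The left-hand side of the second equation is the marginal likelihood. Because it averages over parameters instead of maximizing, it does not automatically favor the structure with the most arcs.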

Search for Optimal Network Structure
Start with a given network
–empty network
–best tree
–a random network
At each iteration
–Evaluate all possible changes
–Apply change based on score
Stop when no modification improves score

Search for Optimal Network Structure
Typical operations:
–Add C → D
–Delete C → E
–Reverse C → E
[Figure: a network over S, C, E, D and the three networks obtained by applying each operation]
Δscore = S({C,E} → D) - S({E} → D)
At each iteration, only the site that is being updated needs to be scored!
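The search loop can be sketched as greedy hill climbing over add, delete, and reverse moves. This is an illustrative sketch, not the procedure used in the cited papers: it swaps in a BIC-style decomposable score (log-likelihood minus a penalty per free parameter, counted over observed parent configurations), starts from the empty network, and uses hypothetical names throughout. The cache reflects the point that only the families touched by a move need rescoring.

```python
import itertools
import math
from collections import Counter

def family_score(data, child, parents):
    """BIC-style score of one node given its parents (discrete, complete data)."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in data)
    marg = Counter(tuple(r[p] for p in parents) for r in data)
    loglik = sum(n * math.log(n / marg[pa]) for (pa, _), n in joint.items())
    n_values = len({r[child] for r in data})
    n_free = (n_values - 1) * len(marg)      # simplification: observed configs only
    return loglik - 0.5 * math.log(len(data)) * n_free

def is_acyclic(parents):
    """Repeatedly strip parentless nodes; a cycle leaves nodes that cannot be stripped."""
    remaining = {v: set(ps) for v, ps in parents.items()}
    while remaining:
        roots = [v for v, ps in remaining.items() if not ps]
        if not roots:
            return False
        for v in roots:
            del remaining[v]
        for ps in remaining.values():
            ps.difference_update(roots)
    return True

def neighbors(parents):
    """Yield every structure one add, delete, or reverse away from the current one."""
    for x, y in itertools.permutations(parents, 2):
        if x in parents[y]:
            yield {**parents, y: parents[y] - {x}}                          # delete x -> y
            yield {**parents, y: parents[y] - {x}, x: parents[x] | {y}}     # reverse x -> y
        else:
            yield {**parents, y: parents[y] | {x}}                          # add x -> y

def hill_climb(data, nodes):
    parents = {v: frozenset() for v in nodes}    # start from the empty network
    cache = {}

    def fam(v, ps):
        if (v, ps) not in cache:
            cache[(v, ps)] = family_score(data, v, sorted(ps))
        return cache[(v, ps)]

    while True:
        best_delta, best = 0.0, None
        for cand in neighbors(parents):
            if not is_acyclic(cand):
                continue
            # Only families whose parent set changed contribute to the score change.
            delta = sum(fam(v, cand[v]) - fam(v, parents[v])
                        for v in nodes if cand[v] != parents[v])
            if delta > best_delta + 1e-9:
                best_delta, best = delta, cand
        if best is None:
            return parents                        # no move improves the score
        parents = best

# Toy data: C mostly copies S, while D is independent of both.
data = [{"S": i % 2,
         "C": 1 if (i % 2 == 1 or i % 10 == 0) else 0,
         "D": (i // 2) % 2} for i in range(200)]
graph = hill_climb(data, ["S", "C", "D"])
```

On this toy data the search links S and C and leaves D isolated. The direction of the S, C arc is arbitrary because both orientations score the same.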

Structure Discovery
Task: Discover structural properties
–Is there a direct connection between X & Y?
–Does X separate between two “subsystems”?
–Does X causally affect Y?
Example: scientific data mining
–Disease properties and symptoms
–Interactions between the expression of genes

Discovering Structure
–There may be many high scoring models
–Answer should not be based on any single model
–Want to average over many models
[Figure: five candidate networks over E, R, B, A, C, weighted by the posterior P(G|D)]

Cell-cycle network
Friedman et al. 2000

Limitations of Bayesian networks
Computationally costly
–Identifying the globally optimal network structure is an NP-hard problem
Heuristic approaches may be trapped in local maxima.
Choosing a prior distribution over DAGs is tricky.
In practice, the approach has failed to recover network structures more difficult than the cell-cycle network.
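Averaging over models amounts to weighting each structure's answer by its posterior probability P(G|D). A minimal sketch for one kind of question, "is there an arc X → Y?", assuming a short list of candidate structures with made-up weights (the structures, weights, and function name are all hypothetical):

```python
def edge_posterior(models, edge):
    """Posterior probability of a directed edge, averaged over weighted structures.

    models: list of (set_of_edges, weight) pairs; weights need not be normalized.
    """
    total = sum(weight for _, weight in models)
    support = sum(weight for edges, weight in models if edge in edges)
    return support / total

# Three made-up structures over E, B, A with illustrative posterior weights.
models = [
    ({("E", "A"), ("B", "A")}, 0.5),
    ({("E", "A")}, 0.3),
    ({("B", "A")}, 0.2),
]
p_e_to_a = edge_posterior(models, ("E", "A"))
p_a_to_e = edge_posterior(models, ("A", "E"))
```

In practice the set of all DAGs is far too large to enumerate, so the sum is approximated over a sample of high-scoring structures.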

Equivalence of graphs
When two DAGs can represent the same set of conditional independence assertions, we say that these DAGs are equivalent.
[Figure: two DAGs over Y and Z]
Are these graphs equivalent?

[Figure: two DAGs over X, Y, Z]

Therefore, the exact graph is unidentifiable!
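Equivalence can be tested without enumerating independence statements. The sketch below relies on a known graphical criterion that the slides do not state: two DAGs are equivalent exactly when they have the same skeleton (undirected edges) and the same v-structures (X → Z ← Y with X and Y not adjacent). The example edge lists are assumptions, since the slide figures are not preserved.

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of the graph: each edge as an unordered pair."""
    return {frozenset(edge) for edge in edges}

def v_structures(edges):
    """All colliders p -> child <- q where p and q are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, ps in parents.items():
        for p, q in combinations(sorted(ps), 2):
            if frozenset((p, q)) not in skel:
                found.add((p, child, q))
    return found

def markov_equivalent(edges_a, edges_b):
    """Same skeleton and same v-structures imply the same independence assertions."""
    return (skeleton(edges_a) == skeleton(edges_b)
            and v_structures(edges_a) == v_structures(edges_b))

two_node = markov_equivalent([("Y", "Z")], [("Z", "Y")])
chain_vs_fork = markov_equivalent([("X", "Y"), ("Y", "Z")], [("Y", "X"), ("Y", "Z")])
chain_vs_collider = markov_equivalent([("X", "Z"), ("Z", "Y")], [("X", "Z"), ("Y", "Z")])
```

The first two pairs are equivalent, so data alone cannot orient their arcs. The collider is distinguishable from the chain.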

Reading List
Raychaudhuri et al. 2000
–Applied PCA to analyze gene expression
Gardner et al. 2003
–Developed NIR to find regulatory networks
Friedman et al. 2000
–Applied Bayesian networks to analyze the cell-cycle network
Friedman 2004
–Review of probabilistic graphical models

Acknowledgement Some of the slides are obtained from Nir Friedman