Using Bayesian Networks to Analyze Expression Data. N. Friedman, M. Linial, I. Nachman, D. Pe'er, Hebrew University.

What I will cover
– Domain background
– Overview of their work
– Causal networks vs. Bayesian networks
– Application
– Results

BACKGROUND INFORMATION

What is gene expression?
– The process by which the information in a gene is used to synthesize a functional gene product (protein or RNA).
Think of it as the menu for a holiday dinner.
– You need certain ingredients to pull it off right.
– Too much or too little of something can lead to odd results.

Advances in technology led to DNA microarrays.
– A snapshot of the internals of a cell at a given moment in time.
– No more having to look at one gene at a time for comparison.
Most computational analysis has focused on clustering algorithms.
– Cluster like genes with like genes.
– Useful for finding co-regulated genes, but not for revealing the structure of the regulation process.

OVERVIEW

Overview
Goal: discover key relations in cellular systems from large amounts of microarray data.
The authors propose a Bayesian network framework for discovering gene interactions from microarray data.
– Builds a model of the statistical dependencies between genes.
– Infers interactions from multiple expression measurements.

Overview
Want to uncover properties of the network by examining dependence and conditional independence in the gene data.
– How does one gene interact with another, etc.
– Can use this information to suggest causal influence.

BAYES NETS

Bayesian Network

Bayesian networks are useful for a few reasons
– Well suited to describing locally interacting entities.
– Backed by a well-understood set of algorithms and successful use in many areas.
– Can be used to infer a causal network, even though they are not mathematically defined as causal models.
– Handle noise fairly well.

Causal Network
Very similar to a typical Bayesian network, but with the strict requirement that every relationship is causal.
– An edge from X to Y means X causes something about Y.
If many learned networks contain the same directed path from X to Y, this may indicate a causal relationship between X and Y.

Bayes vs Causal
A Bayesian network generally deals with statistical dependence; a causal network deals with strict causal relationships.
Bayesian networks can have equivalent networks.
– X → Y is equivalent to Y → X: both encode the same dependence between X and Y.
Causal networks
– The above cannot hold, by the definition of a causal network: X → Y and Y → X make different causal claims.
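The equivalence of X → Y and Y → X as Bayesian networks can be checked numerically. The sketch below is only an illustration: the joint distribution is made up, and the variable names are not from the slides. It factorizes the same joint both ways and confirms that the two factorizations agree.

```python
# Hypothetical joint distribution P(X, Y) over two binary variables.
joint = {(0, 0): 0.30, (0, 1): 0.10, (1, 0): 0.20, (1, 1): 0.40}

# Marginals P(X) and P(Y).
p_x = {x: sum(p for (xi, _), p in joint.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yi), p in joint.items() if yi == y) for y in (0, 1)}

# Conditional tables for each edge direction.
p_y_given_x = {(x, y): joint[(x, y)] / p_x[x] for (x, y) in joint}
p_x_given_y = {(x, y): joint[(x, y)] / p_y[y] for (x, y) in joint}

# Network X -> Y encodes P(X) * P(Y | X); network Y -> X encodes P(Y) * P(X | Y).
from_x_to_y = {k: p_x[k[0]] * p_y_given_x[k] for k in joint}
from_y_to_x = {k: p_y[k[1]] * p_x_given_y[k] for k in joint}

# Largest disagreement between the two factorizations (should be ~0).
max_gap = max(abs(from_x_to_y[k] - from_y_to_x[k]) for k in joint)
print(max_gap < 1e-12)
```

Because both directions reproduce the same joint, observational data alone cannot tell them apart; a causal reading needs something more, such as interventions.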

Learning Causal Patterns
Need to determine a causal interpretation of the network.
Observation
– Passive measurement of the domain.
Intervention
– Setting variable values using outside forces.
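A small numeric sketch of why observation and intervention differ, assuming a true causal network X → Y and the standard "cut the incoming edges" reading of an intervention (an assumption here, not something spelled out on the slide). All probabilities are invented for illustration.

```python
# Hypothetical causal network X -> Y with binary variables.
p_x = {0: 0.7, 1: 0.3}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}  # p_y_given_x[x][y]

# Observation: we passively see Y = 1 and update our belief about X (Bayes' rule).
p_y1 = sum(p_x[x] * p_y_given_x[x][1] for x in (0, 1))
p_x1_given_y1 = p_x[1] * p_y_given_x[1][1] / p_y1

# Intervention: an outside force sets Y = 1. The edge into Y is cut,
# so forcing Y says nothing about its cause X, which keeps its prior.
p_x1_do_y1 = p_x[1]

print(p_x1_given_y1 > p_x1_do_y1)
```

Seeing Y = 1 raises the probability that X = 1, while forcing Y = 1 leaves it unchanged. That asymmetry is what makes interventions informative about edge direction.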

Causal Markov assumption
Given the values of a variable's immediate causes, it is independent of its earlier causes.
– Once we know the state of a gene's parents, its more distant ancestors tell us nothing more about that gene.
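The assumption can be illustrated on a three-gene chain A → B → C. The sketch below uses invented conditional probability tables: it builds the joint from the chain and checks that, once the parent B is known, the grandparent A does not change the distribution of C.

```python
from itertools import product

# Hypothetical CPTs for a chain A -> B -> C with binary states.
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}    # p_b_given_a[a][b]
p_c_given_b = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.25, 1: 0.75}}  # p_c_given_b[b][c]

# Joint distribution implied by the chain: P(A) * P(B | A) * P(C | B).
joint = {
    (a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
    for a, b, c in product((0, 1), repeat=3)
}

def p_c_given_a_b(c, a, b):
    """P(C = c | A = a, B = b), computed from the joint."""
    return joint[(a, b, c)] / sum(joint[(a, b, ci)] for ci in (0, 1))

# Compare P(C | A, B) with P(C | B) for every configuration.
max_gap = max(
    abs(p_c_given_a_b(c, a, b) - p_c_given_b[b][c])
    for a, b, c in product((0, 1), repeat=3)
)
print(max_gap < 1e-12)
```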

Analyzing Expression Data
Consider probability distributions over all possible states of the system (these can include environmental states, etc.).
The state of the system is described by a set of random variables.
– Each random variable denotes the expression level of one gene.
Take all of these variables and build the joint distribution.

Learning from expression data is difficult because it involves transcript levels of thousands of genes. However, these gene networks are sparse, so Bayesian networks are still well suited.

Learning the model
Markov relations are a feature that indicates whether two genes are related in a joint biological process.
Order relations are a feature that captures a global property of the network.
– Used as an indication of possible causality between X and Y; it is not certain, though.

Confidence of features
Produce m different networks G_1, …, G_m and, for each feature f of interest, calculate its confidence as the fraction of networks that contain it:
conf(f) = (1/m) Σ_{i=1..m} f(G_i)
where f(G) is 1 if f is a feature of G, and 0 otherwise.
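A minimal sketch of this confidence estimate, assuming the m networks come from bootstrap resampling of the experiments. The function `learn_features` is a hypothetical stand-in (a simple correlation threshold), not the structure learner used in the paper, and the data are synthetic.

```python
import random

def pearson(u, v):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0

def learn_features(samples, threshold=0.5):
    """Toy stand-in for structure learning: gene pairs with |correlation| > threshold."""
    n_genes = len(samples[0])
    cols = [[row[g] for row in samples] for g in range(n_genes)]
    return {
        (i, j)
        for i in range(n_genes)
        for j in range(i + 1, n_genes)
        if abs(pearson(cols[i], cols[j])) > threshold
    }

def bootstrap_confidence(samples, m=200, seed=0):
    """Fraction of the m bootstrap 'networks' in which each feature appears."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(m):
        # Resample experiments with replacement, then relearn the features.
        resample = [rng.choice(samples) for _ in samples]
        for feature in learn_features(resample):
            counts[feature] = counts.get(feature, 0) + 1
    return {feature: count / m for feature, count in counts.items()}

# Synthetic data: gene 1 tracks gene 0, gene 2 is independent noise.
data_rng = random.Random(1)
data = []
for _ in range(76):
    g0 = data_rng.gauss(0, 1)
    data.append((g0, g0 + data_rng.gauss(0, 0.3), data_rng.gauss(0, 1)))

conf = bootstrap_confidence(data)
print(conf.get((0, 1), 0.0), conf.get((0, 2), 0.0))
```

Features that reflect a real dependence should appear in nearly every resampled network, while spurious ones should appear rarely or not at all.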

Learning the network structure
Issues
– Extremely large search space (super-exponential in the number of variables).
Need to identify potential parents for each gene using simple statistics before building the network.
– Reduces the search space to networks in which the parents of each variable X_i come only from its candidate parents.
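To get a feel for "super-exponential", the number of possible directed acyclic graphs on n labeled nodes can be computed with Robinson's recurrence. This counting formula is a standard result brought in here for illustration; it does not appear in the slides.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n):
    """Number of DAGs on n labeled nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )

for n in range(1, 11):
    print(n, count_dags(n))
```

The count is 25 for three nodes and 543 for four, and it already exceeds 10^18 for ten nodes, which is why exhaustive search over structures for hundreds of genes is not feasible and the candidate-parent restriction is needed.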

Different local probability models
Multinomial model
– Treat each variable as discrete and learn a multinomial distribution that describes the possible states of each child given the state of its parents.
Linear Gaussian model
– Linear regression model for the child given its parents.
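The two local models can be sketched for the simplest case of a child with a single parent. Both functions below are illustrative maximum-likelihood fits with hypothetical names and toy data; the paper's actual estimation (for example, its priors and handling of multiple parents) is not reproduced here.

```python
from collections import Counter, defaultdict

def fit_multinomial(parent_states, child_states):
    """Estimate P(child | parent) from co-occurrence counts of discrete states."""
    counts = defaultdict(Counter)
    for p, c in zip(parent_states, child_states):
        counts[p][c] += 1
    return {
        p: {c: n / sum(child_counts.values()) for c, n in child_counts.items()}
        for p, child_counts in counts.items()
    }

def fit_linear_gaussian(parent, child):
    """Least-squares fit of child = intercept + slope * parent + Gaussian noise."""
    n = len(parent)
    mean_p, mean_c = sum(parent) / n, sum(child) / n
    sxx = sum((x - mean_p) ** 2 for x in parent)
    sxy = sum((x - mean_p) * (y - mean_c) for x, y in zip(parent, child))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_p
    # Noise variance estimated from the residuals.
    variance = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(parent, child)) / n
    return intercept, slope, variance

# Toy discrete data: the child mostly copies the parent's state.
cpt = fit_multinomial([0, 0, 0, 1, 1, 1, 1], [0, 0, 1, 1, 1, 1, 0])
# Toy continuous data lying exactly on the line child = 1 + 2 * parent.
intercept, slope, variance = fit_linear_gaussian([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(cpt, intercept, slope, variance)
```

The multinomial model can capture arbitrary dependencies but requires discretizing expression levels; the linear Gaussian model keeps continuous values but can only represent linear dependence.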

Results
Applied to cell cycle expression patterns.
76 gene expression measurements.
Each measurement is treated as an independent sample.
Ran the bootstrap procedure together with the sparse search algorithm to extract learned features.
– Performed on only 250 genes.

Test robustness
Tested the confidence assessment on a randomized data set, created by randomly permuting the order of experiments independently for each gene.
– The random data performed poorly: it contains no real relationships between genes for the method to find.
– This tells us that the learned features are not artifacts of the bootstrap estimation.
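The randomization step can be sketched as follows. The function name and the toy two-gene matrix are assumptions for illustration; the point is that shuffling each gene's row independently keeps every gene's own values but breaks the alignment between genes.

```python
import random

def permute_within_genes(expression, seed=0):
    """Shuffle the experiment order independently for each gene (each row)."""
    rng = random.Random(seed)
    permuted = []
    for gene_row in expression:
        shuffled = list(gene_row)
        rng.shuffle(shuffled)
        permuted.append(shuffled)
    return permuted

def pearson(u, v):
    """Pearson correlation of two equal-length lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

# Two made-up genes that move together across 20 experiments.
gene_a = [float(e) for e in range(20)]
gene_b = [2.0 * e + 1.0 for e in range(20)]
original = [gene_a, gene_b]
randomized = permute_within_genes(original)

corr_before = pearson(original[0], original[1])
corr_after = pearson(randomized[0], randomized[1])
print(corr_before, corr_after)
```

Each gene keeps exactly the same set of values, so any feature that still scores high confidence on the randomized data would point to an artifact of the procedure rather than a real relationship.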

Managed to extract plausible biological knowledge without using prior biological knowledge.
The framework builds a much "richer" structure from the data than clustering techniques do.
Capable of discovering causal relationships between genes from expression data.