Bayesian network for gene regulatory network construction


Jin Chen, CSE891-001, Fall 2012

Outline
- Bayesian network learning
- Scalability and precision
- Large-scale learning algorithms
- Integrative approaches

Bayesian network - concept
A Bayesian network X is a probabilistic graphical model that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG) G
- Nodes represent variables; edges represent conditional dependencies
- Nodes that are not connected represent variables that are conditionally independent of each other
- Each node is associated with a probability function that takes as input a set of values for the node's parents and gives the probability of the variable represented by the node
X is a Bayesian network with respect to G if its joint probability density function can be written as a product of local conditional distributions:

P(X1, ..., Xn) = ∏i P(Xi | parents(Xi))

(adopted from Wikipedia)
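The factorization above is easy to sketch in code: each node stores a conditional probability table (CPT) keyed by its parents' values, and the joint is the product of the local entries. The two-node DAG and its CPT numbers below are a made-up illustration, not from the slides:

```python
# Minimal sketch of a Bayesian network joint probability, assuming
# binary variables and hand-written CPTs (hypothetical toy example).

# DAG: A -> B, with CPTs P(A) and P(B | A)
cpt = {
    "A": {(): {0: 0.7, 1: 0.3}},       # P(A); A has no parents
    "B": {(0,): {0: 0.9, 1: 0.1},      # P(B | A=0)
          (1,): {0: 0.2, 1: 0.8}},     # P(B | A=1)
}
parents = {"A": (), "B": ("A",)}

def joint(assignment):
    """P(assignment) = product over nodes of P(node | its parents)."""
    p = 1.0
    for node, table in cpt.items():
        pa_vals = tuple(assignment[pa] for pa in parents[node])
        p *= table[pa_vals][assignment[node]]
    return p

print(joint({"A": 1, "B": 1}))  # P(A=1) * P(B=1 | A=1) = 0.3 * 0.8 = 0.24
```

Because the factors are proper conditional distributions, the joint sums to 1 over all assignments.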

Bayesian network - example
Structure: there are two events that could cause the grass to be wet: either the sprinkler is on, or it is raining. Rain also has a direct effect on the use of the sprinkler. The conditional probability tables (CPTs) are learned from historical data. The joint probability then factorizes as

P(G, S, R) = P(G | S, R) P(S | R) P(R)

(adopted from Wikipedia)

Bayesian network - example
What is the probability that it is raining, given that the grass is wet?
(adopted from Wikipedia)
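This query can be answered by enumeration: sum the joint over the hidden sprinkler variable in both the numerator and the denominator of P(R | G = wet). A minimal sketch, assuming the CPT values commonly quoted for the Wikipedia sprinkler example (the slide's own tables are not reproduced in this transcript):

```python
# Answer P(Rain=T | Grass wet=T) by enumerating over the hidden
# Sprinkler variable. CPT values are the commonly cited ones for this
# example (assumption; the slide's tables are not shown here).
P_R = {True: 0.2, False: 0.8}              # P(Rain)
P_S_given_R = {True: 0.01, False: 0.4}     # P(Sprinkler=T | Rain)
P_G_given_SR = {                           # P(Grass wet=T | Sprinkler, Rain)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}

def p_rain_given_wet():
    num = 0.0   # sum over S of P(G=T | S, R=T) P(S | R=T) P(R=T)
    den = 0.0   # sum over S and R of P(G=T | S, R) P(S | R) P(R)
    for r in (True, False):
        for s in (True, False):
            ps = P_S_given_R[r] if s else 1 - P_S_given_R[r]
            term = P_G_given_SR[(s, r)] * ps * P_R[r]
            den += term
            if r:
                num += term
    return num / den

print(round(p_rain_given_wet(), 4))  # 0.3577
```

With these numbers, wet grass raises the probability of rain from the prior 0.2 to about 0.36.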

Bayesian network – structure learning
- In the simplest case, the network structure is specified by an expert and is then used to perform inference
- When defining the network structure is too complex for humans, both the structure and the parameters of the local distributions must be learned from data
- Automatically learning the structure of a Bayesian network is a challenge pursued within machine learning
- Structure learning methods usually use optimization-based search, which requires a scoring function and a search strategy
- A common scoring function is the posterior probability of the structure given the training data
- An exhaustive search returning a structure that maximizes the score is super-exponential in the number of variables
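The super-exponential growth can be made concrete: the number of DAGs on n labeled nodes is given by Robinson's recurrence, and it already reaches tens of thousands at n = 5, which is why exhaustive search over structures is hopeless. A small sketch:

```python
from math import comb

def num_dags(n):
    """Count labeled DAGs on n nodes via Robinson's recurrence:
    a(n) = sum_{k=1..n} (-1)^(k+1) C(n,k) 2^(k(n-k)) a(n-k), a(0) = 1."""
    a = [1]
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            total += (-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
        a.append(total)
    return a[n]

for n in range(1, 6):
    print(n, num_dags(n))  # 1, 3, 25, 543, 29281 -- ~29k DAGs at just n=5
```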

Bayesian network learning for gene regulatory networks
Bayesian networks are well suited to modeling relationships between genes because:
- A BN uses a directed acyclic graph to denote the relationships between the variables of interest (genes), and can therefore naturally model causal relationships between genes
- A BN has a solid theoretical foundation and offers a probabilistic approach that accommodates the variation typically observed in microarray experiments
- A BN can accommodate missing data and incorporate prior knowledge through prior distributions over the parameters

Gene regulatory network construction
Various GRN structure learning approaches exist:
- Pairwise comparison
- Differential equation estimation
- Bayesian network learning
A common limitation is that only a relatively small number of genes can be included in the network. Recent studies therefore aim to derive large-scale or even complete networks, using heterogeneous functional genomics data as well as gene expression data.

Gene regulatory network construction
The approach combines a scoring function with the K2 algorithm to maximize the computational efficiency of network inference:
- Step 1. Construct an undirected network based on mutual information (MI). This allows the best DAG to be searched for in a reduced space
- Step 2. Assign directions to the edges. The undirected network is split into sub-networks; given the node ordering information, the sub-networks are trained with the K2 algorithm sequentially. For each sub-network, the directions of the edges are identified based on the BDe score
The degree of dependency between random variables gives an approximate estimate of how the variables in the network are related.

Constructing undirected networks
Construct undirected networks based on mutual information (MI). The MI between two variables X and Y, denoted I(X; Y), is the amount of information shared between the two variables, and is used to detect general dependencies in the data:

I(X; Y) = Σx Σy p(x, y) log [ p(x, y) / (p(x) p(y)) ]

where p(x, y) is the joint distribution of X and Y, and p(x), p(y) are the marginal distributions.

Constructing undirected networks
- MI measures the dependency between two random variables
- The greater the MI value I(X; Y), the more closely the two variables are related
- If there is a direct edge between X and Y in the GRN, there is a strong dependency between X and Y
- This allows the best DAG to be searched for in a reduced space
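MI is straightforward to estimate from paired discrete samples by plugging empirical frequencies into the definition above. A minimal sketch (the toy vectors are made up):

```python
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Estimate I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))           # empirical joint counts
    px, py = Counter(xs), Counter(ys)    # empirical marginal counts
    mi = 0.0
    for (x, y), c in pxy.items():
        mi += (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)), 2)
    return mi

# Perfectly correlated binary variables share 1 bit of information;
# independent ones share (close to) 0 bits.
x = [0, 0, 1, 1] * 25
print(mutual_information(x, x))            # 1.0
print(mutual_information(x, [0, 1] * 50))  # 0.0
```

In the GRN setting, gene pairs whose estimated MI exceeds a threshold receive an undirected edge, which defines the reduced search space for the directed structure.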

Graph splitting
- Every node together with all its neighbors forms a sub-network
- For each sub-network, the K2 algorithm is used to find the edge orientations that maximize the BDe score (Bayesian Dirichlet equivalence)
- This is reasonable because, to maximize the BDe score of the whole network, one only needs to find the sub-networks with the best BDe scores
Cooper, G.F. and Herskovits, E. (1992) A Bayesian method for the induction of probabilistic networks from data. Mach. Learn., 9, 309–347.

Decide the order of sub-networks
- Within each sub-network, the K2 algorithm is run to obtain the best directed sub-network structure
- The K2 result of one sub-network may affect the topology of other sub-networks, so the order in which sub-networks are passed to the K2 algorithm must be decided
- Ordering: for each node in the whole undirected network, the number of edges connected to it is counted; nodes are then sorted in descending order of degree
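The splitting and ordering steps above are simple graph operations. A minimal sketch, assuming the undirected MI network is given as an adjacency dict (the toy graph is hypothetical):

```python
# Sketch of the splitting-and-ordering step, assuming the undirected
# MI network is an adjacency dict (hypothetical toy graph).
undirected = {
    "g1": {"g2", "g3"},
    "g2": {"g1"},
    "g3": {"g1", "g4"},
    "g4": {"g3"},
}

# Every node plus all of its neighbors forms one sub-network.
subnetworks = {v: {v} | nbrs for v, nbrs in undirected.items()}

# Process sub-networks in descending order of the center node's degree.
order = sorted(undirected, key=lambda v: len(undirected[v]), reverse=True)
print(order)              # highest-degree nodes first
print(subnetworks["g1"])  # g1 together with its neighbors
```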

K2 algorithm http://web.cs.wpi.edu/~cs539/s05/Projects/k2_algorithm.pdf

Scoring function

http://web.cs.wpi.edu/~cs539/s05/Projects/k2_algorithm.pdf
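The K2 idea can be sketched compactly: given a node ordering, each node greedily adds the preceding node that most improves its (log) Cooper-Herskovits score, stopping when no candidate helps. This is an illustrative reimplementation on made-up toy data, not the code behind the slides, and it uses the original K2 score rather than BDe:

```python
from collections import defaultdict
from math import lgamma

def k2_log_score(child, parent_set, data, arity):
    """Log of the K2 (Cooper-Herskovits) score for one node."""
    counts = defaultdict(lambda: defaultdict(int))   # N_ijk
    for row in data:
        j = tuple(row[p] for p in parent_set)
        counts[j][row[child]] += 1
    r = arity[child]
    score = 0.0
    for ks in counts.values():
        n_ij = sum(ks.values())
        score += lgamma(r) - lgamma(n_ij + r)        # (r-1)! / (N_ij + r - 1)!
        for n_ijk in ks.values():
            score += lgamma(n_ijk + 1)               # N_ijk!
    return score

def k2(order, data, arity, max_parents=2):
    """Greedy K2 search: a node's parents must precede it in `order`."""
    parents = {v: [] for v in order}
    for i, v in enumerate(order):
        current = k2_log_score(v, parents[v], data, arity)
        while len(parents[v]) < max_parents:
            candidates = [c for c in order[:i] if c not in parents[v]]
            scored = [(k2_log_score(v, parents[v] + [c], data, arity), c)
                      for c in candidates]
            if not scored:
                break
            best_score, best = max(scored)
            if best_score <= current:
                break
            parents[v].append(best)
            current = best_score
    return parents

# Toy data: Y copies X, Z is constant -- K2 should link X -> Y only.
data = [{"X": x, "Y": x, "Z": 0} for x in (0, 1) for _ in range(50)]
print(k2(["X", "Y", "Z"], data, {"X": 2, "Y": 2, "Z": 2}))
# {'X': [], 'Y': ['X'], 'Z': []}
```

Note how the node ordering does the heavy lifting: K2 never has to check for cycles, which is exactly why the slides' pipeline derives an ordering from the undirected MI network first.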

Performance
Evaluation metrics: correct edges, missed edges, wrong orientation, wrong connection (results shown as figures in the slides)

Small network

Large network

Further improvement
- Many structure learning approaches were developed for binary or discrete input data, so continuous expression data must often be discretized first
- Discretizing continuous values can lose information, and different discretizations can substantially change the input values and hence the inferred network
- Ko et al. therefore developed a new Bayesian network in which Gaussian mixture models are used to describe continuous gene expression data and learn gene pathways
Ko et al. Inference of Gene Pathways Using Gaussian Mixture Models. IEEE International Conference on Bioinformatics and Biomedicine, pp. 362–367, 2007.
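The sensitivity to discretization is easy to demonstrate: two different thresholds on the same continuous expression profile produce different binary inputs, so the downstream structure learner sees different data. A toy sketch (the numbers are made up):

```python
# Toy illustration (made-up numbers): the same continuous expression
# profile discretized with two different thresholds yields different
# binary inputs for the structure learner.
expr = [0.1, 0.4, 0.45, 0.55, 0.6, 0.9]

def discretize(values, threshold):
    return [int(v > threshold) for v in values]

print(discretize(expr, 0.5))    # [0, 0, 0, 1, 1, 1]
print(discretize(expr, 0.42))   # [0, 0, 1, 1, 1, 1] -- one sample flips
```

Modeling the continuous values directly with a Gaussian mixture, as in Ko et al., avoids committing to any one threshold.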

Integrative approaches Tamada et al. Bioinformatics Vol. 19 Suppl. 2 2003, pages ii227–ii236

Dynamic approaches Reconstruct gene regulatory networks from expression data using dynamic Bayesian network (DBN) Zou M, Conzen SD: A new dynamic Bayesian network (DBN) approach for identifying gene regulatory networks from time course microarray data. Bioinformatics 2005, 21(1):71-79.