Module networks Sushmita Roy BMI/CS 576 Nov 18 th & 20th, 2014.

Slides:



Advertisements
Similar presentations
Yinyin Yuan and Chang-Tsun Li Computer Science Department
Advertisements

Unsupervised Learning Clustering K-Means. Recall: Key Components of Intelligent Agents Representation Language: Graph, Bayes Nets, Linear functions Inference.
. Inferring Subnetworks from Perturbed Expression Profiles D. Pe’er A. Regev G. Elidan N. Friedman.
EE462 MLCV Lecture Introduction of Graphical Models Markov Random Fields Segmentation Tae-Kyun Kim 1.
Learning Module Networks Eran Segal Stanford University Joint work with: Dana Pe’er (Hebrew U.) Daphne Koller (Stanford) Aviv Regev (Harvard) Nir Friedman.
Clustering approaches for high- throughput data Sushmita Roy BMI/CS 576 Nov 12 th, 2013.
Ai in game programming it university of copenhagen Statistical Learning Methods Marco Loog.
Overview Full Bayesian Learning MAP learning
Review: Bayesian learning and inference
Regulatory Network (Part II) 11/05/07. Methods Linear –PCA (Raychaudhuri et al. 2000) –NIR (Gardner et al. 2003) Nonlinear –Bayesian network (Friedman.
Goal: Reconstruct Cellular Networks Biocarta. Conditions Genes.
. PGM: Tirgul 10 Parameter Learning and Priors. 2 Why learning? Knowledge acquisition bottleneck u Knowledge acquisition is an expensive process u Often.
Module Networks Discovering Regulatory Modules and their Condition Specific Regulators from Gene Expression Data Cohen Jony.
Probabilistic Graphical models for molecular networks Sushmita Roy BMI/CS 576 Nov 11 th, 2014.
Introduction to molecular networks Sushmita Roy BMI/CS 576 Nov 6 th, 2014.
Network analysis and applications Sushmita Roy BMI/CS 576 Dec 2 nd, 2014.
Learning HMM parameters Sushmita Roy BMI/CS 576 Oct 21 st, 2014.
Probabilistic methods for phylogenetic trees (Part 2)
1 Bayesian Networks Chapter ; 14.4 CS 63 Adapted from slides by Tim Finin and Marie desJardins. Some material borrowed from Lise Getoor.
Cristina Manfredotti D.I.S.Co. Università di Milano - Bicocca An Introduction to the Use of Bayesian Network to Analyze Gene Expression Data Cristina Manfredotti.
Bayesian networks More commonly called graphical models A way to depict conditional independence relationships between random variables A compact specification.
Ensemble Learning (2), Tree and Forest
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
Inferring Cellular Networks Using Probabilistic Graphical Models Jianlin Cheng, PhD University of Missouri 2009.
Dependency networks Sushmita Roy BMI/CS 576 Nov 26 th, 2013.
Alignment Statistics and Substitution Matrices BMI/CS 576 Colin Dewey Fall 2010.
Using Bayesian Networks to Analyze Expression Data N. Friedman, M. Linial, I. Nachman, D. Hebrew University.
Genetic Regulatory Network Inference Russell Schwartz Department of Biological Sciences Carnegie Mellon University.
Outline Who regulates whom and when? Model Learning algorithm Evaluation Wet lab experiments Perspective: why does it work? Reg. ACGTGC.
Using Bayesian Networks to Analyze Whole-Genome Expression Data Nir Friedman Iftach Nachman Dana Pe’er Institute of Computer Science, The Hebrew University.
Unsupervised Learning: Clustering Some material adapted from slides by Andrew Moore, CMU. Visit for
Lectures 2 – Oct 3, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall.
Class review Sushmita Roy BMI/CS 576 Dec 11 th, 2014.
Today Ensemble Methods. Recap of the course. Classifier Fusion
Inference Complexity As Learning Bias Daniel Lowd Dept. of Computer and Information Science University of Oregon Joint work with Pedro Domingos.
Learning Linear Causal Models Oksana Kohutyuk ComS 673 Spring 2005 Department of Computer Science Iowa State University.
Lectures 9 – Oct 26, 2011 CSE 527 Computational Biology, Fall 2011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 12:00-1:20 Johnson Hall.
Ch 8. Graphical Models Pattern Recognition and Machine Learning, C. M. Bishop, Revised by M.-O. Heo Summarized by J.W. Nam Biointelligence Laboratory,
Chapter 11 Statistical Techniques. Data Warehouse and Data Mining Chapter 11 2 Chapter Objectives  Understand when linear regression is an appropriate.
Copyright © 2010 SAS Institute Inc. All rights reserved. Decision Trees Using SAS Sylvain Tremblay SAS Canada – Education SAS Halifax Regional User Group.
Conditional Probability Distributions Eran Segal Weizmann Institute.
Decision Trees Binary output – easily extendible to multiple output classes. Takes a set of attributes for a given situation or object and outputs a yes/no.
Review: Bayesian inference  A general scenario:  Query variables: X  Evidence (observed) variables and their values: E = e  Unobserved variables: Y.
Dependency networks Sushmita Roy BMI/CS 576 Nov 25 th, 2014.
Introduction to biological molecular networks
Cluster validation Integration ICES Bioinformatics.
Bayesian networks and their application in circuit reliability estimation Erin Taylor.
Flat clustering approaches
Module Networks BMI/CS 576 Mark Craven December 2007.
Learning disjunctions in Geronimo’s regression trees Felix Sanchez Garcia supervised by Prof. Dana Pe’er.
1 CMSC 671 Fall 2001 Class #20 – Thursday, November 8.
The statistics of pairwise alignment BMI/CS 576 Colin Dewey Fall 2015.
1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.
Sporadic model building for efficiency enhancement of the hierarchical BOA Genetic Programming and Evolvable Machines (2008) 9: Martin Pelikan, Kumara.
Probabilistic Approaches to Phylogenies BMI/CS 576 Sushmita Roy Oct 2 nd, 2014.
Hierarchical clustering approaches for high-throughput data Colin Dewey BMI/CS 576 Fall 2015.
Computational methods for inferring cellular networks II Stat 877 Apr 17 th, 2014 Sushmita Roy.
Network applications Sushmita Roy BMI/CS 576 Dec 9 th, 2014.
Tree and Forest Classification and Regression Tree Bagging of trees Boosting trees Random Forest.
Dependency Networks for Inference, Collaborative filtering, and Data Visualization Heckerman et al. Microsoft Research J. of Machine Learning Research.
Inferring Regulatory Networks from Gene Expression Data BMI/CS 776 Mark Craven April 2002.
Hierarchical Agglomerative Clustering on graphs
Multi-task learning approaches to modeling context-specific networks
Inferring Models of cis-Regulatory Modules using Information Theory
Hierarchical clustering approaches for high-throughput data
Finding modules on graphs
Clustering.
Evaluation of inferred networks
Class #19 – Tuesday, November 3
Class #16 – Tuesday, October 26
Presentation transcript:

Module networks Sushmita Roy BMI/CS Nov 18 th & 20th, 2014

RECAP Probabilistic graphical models (PGMs) provide a natural representation of molecular networks Learning PGMs from data requires us to solve two problems – Parameter estimation given a graph structure – Structure learning which requires us to learn both structure and parameters Bayesian networks are a type of PGMs popular for representing molecular networks Learning Bayesian networks from gene expression data requires additional considerations – Sparse candidate algorithm: heuristics to reduce the number of candidate parents – Bootstrap to assess model confidence – Module networks

What you should know Motivation of module networks What is a module network? Compare and contrast module network learning algorithm to other network inference algorithms – E.g. Sparse candidate

Module Networks Motivation: – Most complex systems have too many variables – Not enough data to robustly learn networks – Large networks are hard to interpret Key idea: Group similarly behaving variables into “modules” and learn the same parents and parameters for each module Relevance to gene regulatory networks – Genes that are co-expressed are likely regulated in similar ways Segal et al 2005

Definition of a module Statistical definition (specific to module networks by Segal 2005) – A set of random variables that share a statistical model Biological definition of a module – Set of genes that are co-expressed and co- regulated

An expression module Set of genes that behave similarly across conditions Modules Gasch & Eisen, 2002 Genes

Key questions of Module Networks How to represent the Conditional Probability Distributions (CPD) for children? – Regression Tree How to learn module networks?

Defining a Module Network A probabilistic graphical model over N random variables Set of module variables M 1.. M K Module assignments A that specifies the module (1-to-K) for each X i CPD per module P(M j |Pa Mj ), Pa Mj are parents of module M j – Each variable X i in M j has the same conditional distribution

Bayesian network vs Module network Each variable takes three values: UP, DOWN, SAME

Bayesian network vs Module network Bayesian network – Different CPD per random variable – Learning only requires to search for parents Module network – CPD per module Same CPD for all random variables in the same module – Learning requires parent search and module membership assignment

Learning a Module Network Given training dataset, fixed number of modules ( K ) Learn – Module assignments A of each variable to a module – The parents of each module

Score of a Module network Module network Data K : number of modules, X j : j th module Pa Mj Parents of module M j Likelihood of module j

Module network learning algorithm

Module assignment search Happens in two places Module initialization – Interpret as clustering of the random variables Module re-assignment

Module initialization as clustering of variables for module network

Module re-assignment Must preserve the acyclic graph structure Must improve score Module re-assignment happens using a sequential update procedure: – Update only one variable at a time – The change in score of moving a variable from one module to another while keeping the other variables fixed

Module re-assignment via sequential update

Representing the Conditional probability distribution X i are continuous variables How to represent the distribution of X i given the state of its parents? How to capture context-specific dependencies? Module networks use a regression tree

Modeling the relationship between regulators and targets suppose we have a set of (8) genes that all have in their upstream regions the same activator/repressor binding sites

A regression tree A rooted binary tree T Each node in the tree is either an interior node or a leaf node Interior nodes are labeled with a binary test X i <u, u is a real number observed in the data Leaf nodes are associated with univariate distributions of the child

A regression tree to capture a CPD X 1 > e 1 X 2 > e 2 YES NO YES Leaf Interior node X3X3 X1X1 X2X2 Expression of gene represented by X 3 modeled using Gaussians at each leaf node e 1, e 2 are values seen in the data

An example regression tree for a Module network

A very simple regression tree X2X2 X3X3 e1e1 e2e2 X 2 > e 1 X 2 > e 2 YES NO YES X3X3

Algorithm for growing a regression tree Input: dataset D, child variable X i, candidate parents C i of X i Output: Tree T Initialize T to a leaf node, estimated from all samples of X i While not converged – For every leaf node l in T Find with the best split at l If split improves score – add two leaf nodes, i and j below l – Update samples and parameters associated with, i and j

Learning a regression tree Assume we are searching for the parents of a variable X 3 and it already has two parents X 1 and X 2 X 4 will be considered using “split” operations of existing leaf nodes X 1 > e 1 X 2 > e 2 YES NO YES X 4 > e 3 NOYES X 1 > e 1 X 2 > e 2 YES NO YES N1N1 N2N2 N3N3 N l: Gaussian associated with leaf l N2N2 N3N3 N4N4 N5N5

Convergence in regression tree Depth of tree Improvement in score Maximum number of parents Minimum number of samples per leaf node

Assessing the value of using Module Networks Using simulated data – Generate data from a known module network – Known module network was in turn learned from real data 10 modules, 500 variables – Evaluate using Test data likelihood Recovery of true parent-child relationships are recovered in learned module network Using gene expression data – External validation of modules (Gene ontology, motif enrichment) – Cross-check with literature

Test data likelihood Each line type represents size of training data 10 Modules is the best for almost all training data set sizes

Recovery of graph structure

Application of Module networks to yeast expression data Segal et al, Regev, Pe’er, Gasch 2005

Module networks has better performance than simple Bayesian network Gain in test data likelihood over Bayesian network using expression data

The Respiration and Carbon Module Regulation tree

Global View of Modules modules for common processes often share common – regulators – binding site motifs

Summary Module networks – A type of Bayesian network – Identifies modules (sets of similarly behaving random variables) and learns parents for each module – Conditional probability distributions capture “rules” of regulatory relationships – Learning requires inferring parent->module relationships and module assignments In practice give more realistic networks compared to Bayesian networks