BENG 203: Genomics, Proteomics & Network Biology Trey Ideker Vineet Bafna Inferring gene regulatory networks.

Reading assignment:
Gardner, di Bernardo, Lorenz, and Collins. Inferring Genetic Networks and Identifying Compound Mode of Action via Expression Profiling. Science 301 (2003).
Novershtern N, Subramanian A, Lawton LN, Mak RH, Haining WN, McConkey ME, Habib N, Yosef N, Chang CY, Shay T, Frampton GM, Drake AC, Leskov I, Nilsson B, Preffer F, Dombkowski D, Evans JW, Liefeld T, Smutko JS, Chen J, Friedman N, Young RA, Golub TR, Regev A, Ebert BL. Densely interconnected transcriptional circuits control cell states in human hematopoiesis. Cell, Jan 21; 144(2).

Early efforts for network inference: OUTLINE
Boolean networks
 – Gene expression state space
 – Discrete dynamical systems
Reverse engineering of networks
 – Entropy
 – Mutual information
 – REVEAL algorithm
Other methods
 – Linear and non-linear regression
 – Bayesian inference methods

Figure from the tutorial by D’Haeseleer, Liang, and Somogyi, PSB (2001).

Where should state 4 appear?

State space and attractor basins
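To make the state-space picture concrete, here is a minimal Python sketch that enumerates every state of a small Boolean network, follows each state until it falls into a cycle, and records each attractor with the size of its basin. It assumes synchronous updating (all genes switch at once) and borrows the three-gene rules A′ = B, B′ = A or C, C′ = (A and B) or (B and C) or (A and C) from the REVEAL example later in these slides; the function names are illustrative.

```python
from itertools import product

def step(state):
    """One synchronous update of the illustrative three-gene network; state = (A, B, C) in {0, 1}."""
    a, b, c = state
    return (
        b,                                          # A' = B
        int(a or c),                                # B' = A or C
        int((a and b) or (b and c) or (a and c)),   # C' = (A and B) or (B and C) or (A and C)
    )

def attractors(step_fn, n_genes):
    """Map each attractor (a cycle of states) to the number of states in its basin."""
    basins = {}
    for start in product((0, 1), repeat=n_genes):
        seen = []
        state = start
        # Iterate until a state repeats; the repeating segment is the attractor.
        while state not in seen:
            seen.append(state)
            state = step_fn(state)
        cycle = seen[seen.index(state):]
        # Rotate the cycle to start at its smallest state so each attractor has one key.
        i = cycle.index(min(cycle))
        key = tuple(cycle[i:] + cycle[:i])
        basins[key] = basins.get(key, 0) + 1
    return basins

for cycle, size in attractors(step, 3).items():
    print(f"attractor {cycle} with basin size {size}")
```

For these rules the eight states split into three basins: a fixed point at 000 (1 state), a two-state cycle 010 ↔ 100 (3 states), and a fixed point at 111 (4 states).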

What are some biological interpretations of basins and attractors?

Entropy “H” measures the amount of information in a signal. High information content = high disorder.
$$H(x) = -p_1\log_2 p_1 - p_2\log_2 p_2 - \dots - p_m\log_2 p_m = -\sum_{i=1}^{m} p_i \log_2 p_i$$
The arguments p_1, p_2, …, p_m are the probabilities (frequencies) of the m possible values of a signal x (the probabilities must sum to 1). Maximum entropy is obtained when all values are equally likely; entropy approaches 0 when one value dominates. What is the maximum entropy for a binary signal?

[Figure: entropy H plotted against p(x) for a binary signal, m = 2]

What is the maximum entropy possible in this case?
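The entropy formula translates directly into Python. This sketch estimates each p_i from the observed frequencies in a sampled signal (an assumption; the slides allow either probabilities or frequencies) and illustrates the binary case, where the maximum, reached when both values are equally likely, is log2 2 = 1 bit.

```python
import math
from collections import Counter

def entropy(signal):
    """Shannon entropy H(x) in bits, with each p_i estimated from observed frequencies."""
    n = len(signal)
    # H(x) = -sum_i p_i log2 p_i, written as sum_i p_i log2(1/p_i) with p_i = count_i / n
    return sum((c / n) * math.log2(n / c) for c in Counter(signal).values())

print(entropy([0, 1, 0, 1]))  # equally likely binary values: 1.0 bit, the maximum for m = 2
print(entropy([0, 0, 0, 1]))  # one value dominates: about 0.811 bits
print(entropy([1, 1, 1, 1]))  # a single value: 0.0 bits
```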

Mutual information indicates the ability to predict the value of one variable given the value of the other. A low value means low predictive ability (a value of zero means the variables are independent); a high value means high predictive ability.

[Figure: Venn diagram representation of H(X), H(Y), H(X|Y), H(Y|X), M(X,Y), and H(X,Y)]
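Reading the Venn diagram as set arithmetic gives the standard identity M(X,Y) = H(X) + H(Y) − H(X,Y): the overlap is what the two circles share. A minimal Python sketch, assuming discrete observations paired sample by sample (function names are illustrative):

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy in bits of a sequence of hashable observations."""
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())

def mutual_information(x, y):
    """M(X,Y) = H(X) + H(Y) - H(X,Y), in bits."""
    joint = list(zip(x, y))  # paired observations give the joint distribution
    return entropy(x) + entropy(y) - entropy(joint)

x = [0, 0, 1, 1, 0, 0, 1, 1]
print(mutual_information(x, x))                          # Y copies X: 1 bit
print(mutual_information(x, [1 - v for v in x]))         # Y = NOT X: still fully predictable, 1 bit
print(mutual_information(x, [0, 1, 0, 1, 0, 1, 0, 1]))  # every (x, y) pair equally often: 0 bits
```

Note that predictability, not correlation sign, is what counts: an inverted copy carries as much information as an exact copy.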

[Figure: wiring diagram of three genes A, B, C and their next-state outputs A′, B′, C′]
A′ = B
B′ = A or C
C′ = (A and B) or (B and C) or (A and C)

REVEAL algorithm
To determine the gene input(s) for gene output Y′, identify any gene (or set of genes) X for which H(X,Y′) = H(X). That is, the entropy of the joint output/input is no greater than that of the input alone: the output is completely determined by the input. An alternative view in terms of mutual information is M(X,Y′) / H(Y′) = 1 (since M(X,Y′) = H(X) + H(Y′) − H(X,Y′), the two conditions are equivalent).
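The REVEAL test can be sketched in Python for the three-gene example. The sketch assumes the complete state transition table is available (all 2^3 input states, generated from the rules A′ = B, B′ = A or C, C′ = (A and B) or (B and C) or (A and C)) and, for each output, tries input sets of size k = 1, 2, 3 in turn, keeping the smallest sets that satisfy H(X,Y′) = H(X) up to a floating-point tolerance. The function name `reveal` and the tolerance are illustrative choices, not details of the original paper.

```python
import math
from collections import Counter
from itertools import combinations, product

def entropy(rows):
    """Shannon entropy in bits of a list of hashable observations."""
    n = len(rows)
    return sum((c / n) * math.log2(n / c) for c in Counter(rows).values())

# Hypothetical training data: every input state (A, B, C) paired with its next state.
genes = ["A", "B", "C"]
inputs = list(product((0, 1), repeat=3))
outputs = [(b, a | c, int(a + b + c >= 2)) for a, b, c in inputs]  # C' = majority of A, B, C

def reveal(inputs, outputs, genes, max_k=3, tol=1e-9):
    """For each output gene, return the smallest input sets X with H(X, Y') = H(X)."""
    found = {}
    for j, name in enumerate(genes):
        y = [out[j] for out in outputs]
        for k in range(1, max_k + 1):
            hits = []
            for idx in combinations(range(len(genes)), k):
                x = [tuple(state[i] for i in idx) for state in inputs]
                # Y' adds no uncertainty to X exactly when Y' is determined by X.
                if abs(entropy(list(zip(x, y))) - entropy(x)) < tol:
                    hits.append(tuple(genes[i] for i in idx))
            if hits:
                found[name + "'"] = hits
                break
    return found

print(reveal(inputs, outputs, genes))
```

For these rules the search returns B for A′, the pair (A, C) for B′, and all three genes for C′, matching the wiring above.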

Example: A′ is completely predicted by B. What about the other gene outputs?

REVEAL (continued)

Results with REVEAL
Liang, Fuhrman and Somogyi (1998). REVEAL: A general reverse engineering algorithm for inference of genetic network architectures. Pacific Symposium on Biocomputing.
REVEAL was shown to correctly infer small (simulated) networks when given a sufficient number of examples. Data requirements grow exponentially with the number of inputs per gene, but the method can still provide likely results with limited data. Correlation Metric Construction (Adam Arkin and John Ross) is a related method based on correlation instead of entropy.

Modeling expression with differential equations
Assume that network behavior can be modeled as a system of linear differential equations of the form
$$\frac{dx}{dt} = Ax + u$$
 – x is a vector of the continuous-valued levels (concentrations) of each network component.
 – A is the network model: an N × N matrix of coefficients describing how each x_i is controlled by upstream genes x_j, x_k, etc.
 – u is a vector representing an external additive perturbation to the system.

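An example: from discrete- to continuous-valued networks
[Figure: three-gene network diagram, x1, x2, x3]
Three genes x_1, x_2, x_3: x_1 activates x_2; x_2 activates x_1 and x_3; x_3 inhibits x_1. Writing out dx/dt = Ax + u for each gene with no external perturbation (u = 0), where the minus sign encodes the inhibition of x_1 by x_3:
$$\frac{dx_1}{dt} = a_{12}x_2 - a_{13}x_3, \qquad \frac{dx_2}{dt} = a_{21}x_1, \qquad \frac{dx_3}{dt} = a_{32}x_2$$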
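One way to see what dx/dt = Ax + u implies is to integrate it numerically. This Python sketch uses forward Euler on a hypothetical three-gene matrix A with the same sign pattern as the example above plus self-decay terms on the diagonal (an added assumption, so that the system settles to a steady state); all numbers are invented for illustration.

```python
import numpy as np

# Hypothetical network: off-diagonal a_ij = effect of gene j on gene i's rate of change
# (same signs as the example); the -1.0 diagonal entries are assumed first-order decay.
A = np.array([[-1.0,  0.5, -0.4],
              [ 0.8, -1.0,  0.0],
              [ 0.0,  0.6, -1.0]])
u = np.array([1.0, 0.0, 0.0])  # constant external perturbation applied to gene 1

def simulate(A, u, x0, dt=0.01, steps=2000):
    """Forward-Euler integration of dx/dt = A x + u; returns the final state."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * (A @ x + u)
    return x

x_end = simulate(A, u, np.zeros(3))
print(x_end)                   # approaches the state where A x + u = 0
print(np.linalg.solve(A, -u))  # that steady state computed directly
```

Each Euler step leaves x unchanged exactly when Ax + u = 0, so the iteration settles on the same steady state that the next slide builds on.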

The steady-state assumption
Near a steady-state point, expression levels do not change over time. Under the steady-state assumption, the model reduces to
$$0 = Ax + u \quad\Longrightarrow\quad Ax = -u$$
A straightforward way to infer A would be to apply N perturbations u to the network, in each case measuring the steady-state expression levels x; N independent experiments give N equations per gene, enough to determine the N unknown coefficients in each row of A. However, in larger networks it may be impractical to apply so many perturbations. As a simplifying assumption, consider that each gene has at most k non-zero regulatory inputs.
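With N independent perturbations, the steady-state relation Ax = −u can be stacked column by column into A X = −U and solved for A exactly. This Python sketch generates noise-free steady states from a hypothetical 3-gene matrix `A_true` (invented numbers), applying one unit perturbation per gene, and recovers it. Real measurements are noisy, so in practice a least-squares solve (for example `np.linalg.lstsq`) would replace the exact one.

```python
import numpy as np

# Hypothetical "true" network, used only to generate synthetic steady-state data.
A_true = np.array([[-1.0,  0.5, -0.4],
                   [ 0.8, -1.0,  0.0],
                   [ 0.0,  0.6, -1.0]])
N = A_true.shape[0]

U = np.eye(N)                    # N perturbations: column m perturbs gene m only
X = np.linalg.solve(A_true, -U)  # column m = steady state reached under perturbation m

# A X = -U  <=>  X^T A^T = -U^T, a square linear system when there are N experiments.
A_hat = np.linalg.solve(X.T, -U.T).T
print(np.allclose(A_hat, A_true))  # True: noise-free data pin down every coefficient
```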

The inference procedure: Ax = −u
 – Infer the inputs to each gene separately.
 – For the given gene, consider all possible combinations of k regulatory inputs.
 – For each combination, use multiple linear regression to determine the optimal values of the k coefficients.
 – Choose the combination that fits the observed data with the least error.

Multiple regression: u = −Ax, where A is the fit
[Figure: perturbation u regressed on expression levels x_1 and x_2]
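The per-gene search can be sketched in Python as an exhaustive best-subset regression. For gene i, each experiment m contributes one equation Σ_j a_ij x_j^(m) = −u_i^(m), so the regressors are the steady-state levels of the candidate inputs and the response is −u_i. The function name `infer_row` and the synthetic 5-gene network are hypothetical; the example uses noise-free data with at most k = 2 non-zero inputs per gene (counting self-decay).

```python
import numpy as np
from itertools import combinations

def infer_row(X, u_row, k):
    """Fit one gene's row of A from sum_j a_ij x_j = -u_i across experiments.

    X: N x M steady-state levels (one column per experiment).
    u_row: length-M perturbations applied to gene i.
    Every set of k candidate regulators is tried; the least-squares fit with the lowest error wins.
    """
    design = X.T      # M experiments x N candidate regulators
    target = -u_row
    best = (np.inf, None, None)
    for idx in combinations(range(X.shape[0]), k):
        cols = list(idx)
        coef, *_ = np.linalg.lstsq(design[:, cols], target, rcond=None)
        sse = float(np.sum((design[:, cols] @ coef - target) ** 2))
        if sse < best[0]:
            best = (sse, cols, coef)
    sse, cols, coef = best
    row = np.zeros(X.shape[0])
    row[cols] = coef
    return row, sse

# Hypothetical sparse network (at most k = 2 non-zero inputs per gene) used to simulate data.
A_true = np.array([[-1.0,  0.0,  0.7,  0.0,  0.0],
                   [ 0.0, -1.0,  0.0,  0.0, -0.5],
                   [ 0.6,  0.0, -1.0,  0.0,  0.0],
                   [ 0.0,  0.9,  0.0, -1.0,  0.0],
                   [ 0.0,  0.0,  0.0,  0.4, -1.0]])
U = np.eye(5)                    # five experiments, each perturbing a single gene
X = np.linalg.solve(A_true, -U)  # steady states satisfy A X = -U

A_hat = np.vstack([infer_row(X, U[i], k=2)[0] for i in range(5)])
print(np.allclose(A_hat, A_true))
```

With noisy data the winning subset is simply the one with the smallest residual error; with the noise-free data here the true inputs fit exactly, so `A_hat` matches `A_true`.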

Review of the paper by Gardner et al.: Gardner, di Bernardo, Lorenz, and Collins. Inferring Genetic Networks and Identifying Compound Mode of Action via Expression Profiling. Science 301 (2003).

Overview
 – Systems study of the Escherichia coli DNA damage (SOS) response, a well-studied pathway.
 – Systematic transcriptional perturbations (overexpression) of nine SOS genes, each characterized by the steady-state gene expression response it produces.
 – Multiple linear regression is used to determine a network of causal relations (connections) among these genes.
 – A recent example from a larger body of work using systems of linear equations to infer regulatory networks.

Application to the E. coli SOS pathway
 – The SOS pathway regulates cell survival and repair after exposure to DNA damage.
 – The known pathway involves three tiers of transcription factors (TFs):
   1) lexA and recA
   2) ssb, recF, dinI, umuDC
   3) rpoD, rpoH, rpoS (so-called ‘sigma’ factors)
   and more than 30 downstream regulated genes.
 – The known network involving these nine core genes was chosen as a proof of principle for the linear regression inference approach.

Diagram of SOS pathway interactions

Experimental perturbations: pBADX53 plasmid
In each perturbation, a different gene (one of the nine) was overexpressed from an arabinose-controlled expression plasmid. RBS = ribosome binding site.

Experimental perturbations

Experimental measurements
For each perturbation and for each of the nine transcripts, steady-state expression levels were measured with quantitative real-time polymerase chain reaction (qPCR). The ratio of these perturbed levels to the unperturbed levels was computed. The mean and standard error were computed over 16 replicate measurements.
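The ratio-and-replicate summary takes only a few lines of Python. This sketch assumes the perturbed and unperturbed measurements are paired replicate by replicate (the slides do not say how replicates were matched) and reports the standard error as the sample standard deviation divided by √n; the function name `summarize_ratios` and the synthetic values are hypothetical.

```python
import numpy as np

def summarize_ratios(perturbed, unperturbed):
    """Mean and standard error of per-replicate expression ratios (perturbed / unperturbed)."""
    ratios = np.asarray(perturbed, dtype=float) / np.asarray(unperturbed, dtype=float)
    mean = ratios.mean()
    sem = ratios.std(ddof=1) / np.sqrt(ratios.size)  # sample std dev / sqrt(number of replicates)
    return mean, sem

# Synthetic 16-replicate example (invented numbers): a transcript roughly doubled by the perturbation.
rng = np.random.default_rng(0)
baseline = rng.normal(100.0, 5.0, size=16)
treated = 2.0 * baseline * rng.normal(1.0, 0.05, size=16)
mean, sem = summarize_ratios(treated, baseline)
print(f"ratio = {mean:.2f} +/- {sem:.2f} (mean +/- SEM, n = 16)")
```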

Model inference
These data were used as a training set to solve for the coefficients in the matrix A, i.e., the regulatory interaction model. The assumed number of inputs per gene was k = 5.
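With k fixed, the size of the exhaustive search over input sets can be computed directly. This Python snippet assumes that every one of the nine genes, including the gene being modeled, is a candidate input; the slides do not state this explicitly.

```python
from math import comb

N, k = 9, 5                    # nine SOS genes, k = 5 inputs per gene (from the slide)
per_gene = comb(N, k)          # candidate input sets fitted for one gene
print(per_gene, N * per_gene)  # 126 per gene, 1134 regressions for the whole network
```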

Diagram of SOS pathway interactions

Actual method performance
The inferred network was compared to the known ‘test’ network. Performance was evaluated as the number of connections in the test network that were resolved in the inferred network. Here, “resolved” means that there was a path between the two genes in the inferred network and that the overall sign (+ or −, i.e., activation or inhibition) was also correct. Wait a minute: what are the implications of this?
Coverage = identified connections / total true connections
False positive rate = incorrectly identified connections / total identified connections
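The path-based scoring can be sketched in Python. Signed networks are stored as dictionaries mapping (source, target) to +1 or −1, and a true connection counts as resolved if some inferred path from source to target has the same overall sign (the product of edge signs). The slides do not say how an inferred connection is judged incorrect, so this sketch assumes it is wrong when it is not a true edge with the same sign; note that the “false positive rate” as defined on the slide is what is often called the false discovery rate. Gene names g1 to g4 and both networks are hypothetical.

```python
from collections import deque

def resolved(inferred, src, dst, sign):
    """True if the inferred signed graph has a path src -> dst whose overall sign matches."""
    # Breadth-first search over (gene, sign so far) states, so both parities are tracked.
    start = (src, +1)
    seen = {start}
    queue = deque([start])
    while queue:
        node, s = queue.popleft()
        for (a, b), edge_sign in inferred.items():
            if a != node:
                continue
            state = (b, s * edge_sign)
            if state == (dst, sign):
                return True
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False

def score(true_edges, inferred):
    """Coverage and false positive rate as defined on the slide."""
    hits = sum(resolved(inferred, a, b, s) for (a, b), s in true_edges.items())
    coverage = hits / len(true_edges)
    # Assumption: an inferred edge is incorrect unless it is a true edge with the same sign.
    wrong = sum(true_edges.get(e) != s for e, s in inferred.items())
    return coverage, wrong / len(inferred)

true_edges = {("g1", "g2"): +1, ("g2", "g3"): -1, ("g1", "g3"): -1}
inferred = {("g1", "g2"): +1, ("g2", "g3"): -1, ("g3", "g4"): +1}
print(score(true_edges, inferred))  # coverage 1.0, false positive rate 1/3
```

In this toy example the direct true edge g1 → g3 counts as resolved through the inferred path g1 → g2 → g3, which is one implication of the path-based definition.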

[Figure: simulated algorithm performance for 9 perturbations and for a 7-perturbation subset, compared with actual experimental data; noise = S_x / x̄]

[Figure: simulated algorithm performance for 9 perturbations and for a 7-perturbation subset, compared with actual experimental data]

Using the model predictively
To what extent can the model predict expression changes that fall outside the training set used to build it? Along these lines, Gardner et al. use the model to distinguish genes that are directly targeted by a drug (the mode of action, or MOA) from genes that change as secondary effects. The direct targets represent the minimal set of genes that produce the observed expression pattern when externally perturbed.

Procedure for identifying drug MOA
Measure the expression changes x_p resulting from treatment with the drug. The drug effect is an unknown external perturbation u_p that produces these changes:
$$u_p = -A x_p$$
As a proof of concept, the following experiments were performed:
1) lexA/recA double perturbation
2) Mitomycin C (MMC) perturbation, known to activate recA through DNA damage
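Given a fitted A, the drug's unknown perturbation follows from one matrix-vector product. In this Python sketch the matrix A and the “drug” are hypothetical: the expression changes x_p are simulated from a known perturbation of gene 2 (index 1), and u_p = −A x_p is then ranked by magnitude to nominate direct targets. The function name `predict_targets` is illustrative.

```python
import numpy as np

def predict_targets(A, x_p, top=1):
    """Estimate the unknown perturbation u_p = -A x_p and rank genes by its magnitude."""
    u_p = -A @ x_p
    order = np.argsort(-np.abs(u_p))  # largest |u_p| first: candidate direct targets
    return u_p, order[:top]

# Hypothetical fitted network and a simulated drug that directly perturbs gene 2 only.
A = np.array([[-1.0,  0.5, -0.4],
              [ 0.8, -1.0,  0.0],
              [ 0.0,  0.6, -1.0]])
u_drug = np.array([0.0, 1.0, 0.0])
x_p = np.linalg.solve(A, -u_drug)  # expression changes spread to all three genes

u_p, targets = predict_targets(A, x_p)
print(x_p)      # every gene changes
print(u_p)      # but the inferred perturbation is concentrated on gene 2
print(targets)  # index of the predicted direct target
```

With a perfect model u_p recovers the original perturbation exactly; with a fitted A and noisy x_p, the ranking of |u_p| is what one would inspect.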

Identifying compound mode of action
[Figure: predicted perturbations for the recA/lexA and MMC experiments]
The model is much more predictive than the expression data alone…