Inferring gene regulatory networks from multiple microarray datasets (Wang 2006) Tiffany Ko ELE571 Spring 2009.

OUTLINE Introduction: Gene Regulatory Networks, DNA Microarrays, Objectives. Methods: Approach (SVD), GNR Algorithm, Confidence Evaluation. Results: Simulated Data, Experimental Data. Discussion: Limitations, Conclusions.

INTRO

GENE REGULATORY NETWORKS

DNA MICROARRAYS Data form an M × N matrix S: the y-direction lists the M genes, the x-direction the N experiments (data points). Expression (color intensity) reflects the amount of labeled complementary DNA from the sample that has hybridized to each spot's probes. Typically there are many genes but few samples or data points.

OBJECTIVES Purpose: Construct a novel gene network reconstruction (GNR) method that can combine multiple microarray datasets from different experiments to infer the most consistent gene network (GN), while enforcing sparse connectivity. Motivation: Multiple datasets address data scarcity and the “dimensionality problem” (far more genes than data points). Improve the reliability of the inferred gene network. Derive gene networks with biologically more plausible sparsity.

METHODS

APPROACH 1. Express gene networks (GNs) as differential equations. 2. Derive a solution for a single time-course dataset using singular value decomposition (SVD). 3. Find the network structure most consistent with all datasets; the optimal solution is sparse, with as few connections (edges) as possible.

APPROACH 1. Express gene networks (GNs) as differential equations. Gene regulation dynamics are typically nonlinear; however, linear equations capture the main features of the network.
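For reference, a minimal linear form of the model, assuming no external input and no noise term (the slide does not show the equation): with $x_i(t)$ the expression of gene $i$ among $n$ genes and $J$ the $n \times n$ connectivity matrix,

```latex
\frac{dx_i(t)}{dt} = \sum_{j=1}^{n} J_{ij}\, x_j(t), \quad i = 1, \dots, n
\qquad\Longrightarrow\qquad
X' = J X, \quad X = \bigl[x(t_1), \dots, x(t_m)\bigr] \in \mathbb{R}^{n \times m}
```

Here $J_{ij} \neq 0$ means gene $j$ influences gene $i$, and $X'$ holds the time derivatives at the same $m$ time points. This gives $nm$ equations for $n^2$ unknowns, so a single short time course ($m < n$) leaves $J$ underdetermined.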

APPROACH 2. Derive a solution for a single time-course dataset using singular value decomposition (SVD).

APPROACH 2. Derive a solution for a single time-course dataset using singular value decomposition (SVD). SVD: the singular values $e_k$ are ordered with the nonzero ones listed last, i.e., $e_1 = \dots = e_l = 0$ and $e_{l+1}, \dots, e_n \neq 0$. This yields the particular solution $\hat{J}$, the connectivity matrix with the smallest L2 norm.
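A minimal NumPy sketch of the single-dataset step, assuming the linear model $X' = JX$ with $X$ the $n \times m$ expression matrix. Following the construction of Yeung et al. (2002), factor $X^T = U E V^T$; the particular solution is $\hat{J} = X' U E^{-1} V^T$ (with $1/e_k$ replaced by 0 where $e_k = 0$), and the general solution is $J = \hat{J} + Y V^T$, where $Y$ may be nonzero only in the columns whose $e_j = 0$. In the code, `Z @ V_null.T` plays the role of $Y V^T$ restricted to those free columns. The function name `svd_general_solution` is hypothetical, and NumPy lists singular values in descending order, so the zero ones come last rather than first; the ordering does not change the result.

```python
import numpy as np


def svd_general_solution(X, X_dot, tol=1e-10):
    """Minimum-norm solution of J @ X = X_dot plus a basis of its free directions.

    X, X_dot: n x m arrays (n genes, m time points). Returns (J_hat, V_null);
    every J_hat + Z @ V_null.T, with Z of shape (n, n - rank), also fits the data.
    """
    n, m = X.shape
    # SVD of X^T (m x n): X^T = U diag(s) V^T. NumPy sorts s in descending
    # order, so zero singular values come last (the slide lists them first).
    U, s, Vt = np.linalg.svd(X.T, full_matrices=True)
    V = Vt.T
    rank = int(np.sum(s > tol * s.max()))
    # Pseudo-inverse of the n x m diagonal factor: 1/s_k where s_k != 0, else 0.
    S_inv = np.zeros((m, n))
    S_inv[:rank, :rank] = np.diag(1.0 / s[:rank])
    J_hat = X_dot @ U @ S_inv @ V.T   # particular solution, smallest L2 norm
    V_null = V[:, rank:]              # columns of V with zero singular value
    return J_hat, V_null


rng = np.random.default_rng(0)
n, m = 5, 3                                  # more genes than time points
J_true = rng.normal(size=(n, n))
X = rng.normal(size=(n, m))
X_dot = J_true @ X
J_hat, V_null = svd_general_solution(X, X_dot)
Z = rng.normal(size=(n, V_null.shape[1]))    # arbitrary free coefficients
J_other = J_hat + Z @ V_null.T               # another member of the solution family
print(np.allclose(J_hat @ X, X_dot), np.allclose(J_other @ X, X_dot))  # True True
print(np.linalg.norm(J_hat) <= np.linalg.norm(J_other))               # True
```

Every member of the family reproduces the data exactly, which is why a single short dataset cannot identify the network; $\hat{J}$ is only the smallest-norm choice among them.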

APPROACH 3. Find the network structure most consistent with all datasets. Multiple (N) microarray datasets exist for one organism; each dataset k yields its own general solution $J_k$. Each $J_k$ is already normalized in time by the definition of $X'$. The most consistent network is found by solving a linear programming (LP) problem.

APPROACH 3. Find the network structure most consistent with all datasets. Matching term: matches the most consistent solution to each dataset k's solution, weighted by that dataset's reliability. Sparsity term: forces sparsity by minimizing the L1 norm. A trade-off parameter balances the relative importance of the matching term and the sparsity term.
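One plausible way to write the objective, assuming (the slide does not show it) entrywise L1 norms, reliability weights $w_k > 0$ for the $N$ datasets, and a trade-off parameter written here as $\lambda > 0$; $\hat{J}_k + Y_k V_k^T$ is dataset $k$'s general solution in the form of Yeung et al. (2002), with $V_k$ from the SVD of dataset $k$:

```latex
\min_{J,\,Y_1,\dots,Y_N}\;
\underbrace{\sum_{k=1}^{N} w_k \,\bigl\lVert J - (\hat{J}_k + Y_k V_k^{T}) \bigr\rVert_1}_{\text{matching term}}
\;+\;
\underbrace{\lambda\, \lVert J \rVert_1}_{\text{sparsity term}},
\qquad
\lVert A \rVert_1 = \sum_{i,j} \lvert A_{ij} \rvert
```

Splitting each absolute value into auxiliary nonnegative variables turns this into an LP. With $J$ fixed, the terms for different $k$ share no variables, which is why the problem separates into $N$ independent subproblems.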

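GNR ALGORITHM When J is fixed, the problem divides into N independent subproblems, one per dataset, each solving for that dataset's free coefficients $Y_k$. J is then updated from the resulting $Y_k$, and the two updates alternate until convergence. STEP 0: Initialize; set the iteration index q = 1. STEP 1: Fix $J^{(q-1)}$ and solve for each $Y_k^{(q)}$. STEP 2: Fix all $Y_k^{(q)}$ and solve for $J^{(q)}$. STEP 3: Check for convergence; if not converged, return to STEP 1.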

ALGORITHM: STEP 0 Initialize: using SVD, solve for the particular solution $\hat{J}_k$ of each dataset. Set initial values, ensuring that the given parameters are positive. Set the iteration index q = 1.

ALGORITHM: STEP 1 Update $Y_k$: at iteration q, with $J^{(q-1)}$ fixed, solve an LP for each dataset k.
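A sketch of STEP 1 for a single dataset k, under stated assumptions: the matching term is taken to be the entrywise L1 distance between the fixed $J^{(q-1)}$ and the dataset's general solution $\hat{J}_k + Y_k V_k^T$, and because only the columns of $Y_k$ paired with zero singular values are free, $Y_k V_k^T$ is written as `Z @ V_null.T`. Each row of `Z` is then a least-absolute-deviations fit, solved as a small LP with SciPy's `scipy.optimize.linprog`. The function `update_Y_k` and its arguments are hypothetical names, and the reliability weight $w_k$ is omitted because it only rescales this dataset's separate subproblem and does not change its minimizer.

```python
import numpy as np
from scipy.optimize import linprog


def update_Y_k(J, J_hat_k, V_null_k):
    """STEP 1 (sketch): with J fixed, minimize sum |J - J_hat_k - Z @ V_null_k.T| over Z.

    Solved row by row as the LP  min sum(t)  s.t.  -t <= r - N z <= t,
    where r is a row of J - J_hat_k and N = V_null_k (n x p). Returns Z (n x p).
    """
    R = J - J_hat_k
    n, p = V_null_k.shape
    Z = np.zeros((R.shape[0], p))
    if p == 0:
        return Z  # no free directions: this dataset's solution is unique
    N = V_null_k
    I = np.eye(n)
    # Decision vector is [z (p entries, free), t (n entries, >= 0)].
    c = np.concatenate([np.zeros(p), np.ones(n)])
    A_ub = np.block([[N, -I], [-N, -I]])   # N z - t <= r  and  -N z - t <= -r
    bounds = [(None, None)] * p + [(0, None)] * n
    for i in range(R.shape[0]):
        r = R[i]
        res = linprog(c, A_ub=A_ub, b_ub=np.concatenate([r, -r]), bounds=bounds)
        if not res.success:
            raise RuntimeError(res.message)
        Z[i] = res.x[:p]
    return Z


rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
V_null = Q[:, 3:]                   # stand-in for 2 null-space directions of dataset k
J_hat_k = rng.normal(size=(5, 5))
Z_true = rng.normal(size=(5, 2))
J = J_hat_k + Z_true @ V_null.T     # J lies exactly in dataset k's solution family
Z = update_Y_k(J, J_hat_k, V_null)
print(np.allclose(J_hat_k + Z @ V_null.T, J, atol=1e-6))
```

Because the rows of `Z` do not interact, the same loop could be parallelized across rows and across datasets, mirroring the N independent subproblems.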

ALGORITHM: STEPS 2 & 3 STEP 2: Having solved for the $Y_k^{(q)}$, fix all of them and solve for $J^{(q)}$. STEP 3: Check for convergence. Is the convergence criterion satisfied? Yes: terminate the computation. No: set q = q + 1 and return to STEP 1.
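A sketch of STEPS 2 and 3 in NumPy, assuming an entrywise-L1 objective with reliability weights $w_k$ and a trade-off weight called `lam` here (the symbol is not shown on the slides). With every $Y_k$ fixed, the objective separates by entry, and each entry's minimizer of $\sum_k w_k |J_{ij} - T_{k,ij}| + \text{lam}\,|J_{ij}|$, where $T_k = \hat{J}_k + Y_k V_k^T$, is a weighted median of the $T_{k,ij}$ and 0. This gives a minimizer of the same piecewise-linear objective without calling an LP solver. The convergence rule (largest entry change below a tolerance) and the names `update_J` and `has_converged` are hypothetical.

```python
import numpy as np


def update_J(targets, weights, lam):
    """STEP 2 (sketch): with every Y_k fixed, minimize over J
        sum_k w_k * |J - targets[k]|_1 + lam * |J|_1   (entrywise L1).

    targets: array (N, n, n) holding J_hat_k + Y_k V_k^T for each dataset.
    Each entry's minimizer is the weighted median of
    (targets[0][i, j], ..., targets[N-1][i, j], 0) with weights (w_1, ..., w_N, lam).
    """
    targets = np.asarray(targets, dtype=float)
    vals = np.concatenate([targets, np.zeros((1,) + targets.shape[1:])], axis=0)
    w = np.append(np.asarray(weights, dtype=float), lam)
    w_full = np.broadcast_to(w[:, None, None], vals.shape)
    order = np.argsort(vals, axis=0)
    v_sorted = np.take_along_axis(vals, order, axis=0)
    w_sorted = np.take_along_axis(w_full, order, axis=0)
    cum = np.cumsum(w_sorted, axis=0)
    # First sorted position whose cumulative weight reaches half the total weight.
    k_star = np.argmax(cum >= cum[-1] / 2.0, axis=0)
    return np.take_along_axis(v_sorted, k_star[None], axis=0)[0]


def has_converged(J_new, J_old, tol=1e-6):
    """STEP 3 (sketch): stop when no entry of J changed by more than tol."""
    return float(np.max(np.abs(J_new - J_old))) < tol


targets = np.array([[[1.0]], [[1.2]], [[0.9]]])   # N = 3 datasets, a single entry
for lam in (0.0, 2.0, 5.0):
    print(lam, update_J(targets, np.ones(3), lam)[0, 0])
# 0.0 1.0
# 2.0 0.9
# 5.0 0.0
```

Raising `lam` pulls the entry toward 0 and eventually sets it exactly to 0, which is the sparsity versus interaction-strength trade-off described on the results slides.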

GNR ALGORITHM OVERVIEW

CONFIDENCE EVALUATION Given the optimal solution J, compute for each element $J_{ij}$ its variance and its deviation, and then the overall average deviation.
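The slide does not reproduce the formulas, so this is only an illustrative sketch under an explicit assumption: the variance of entry $(i, j)$ is taken as the weighted mean squared difference between each dataset's solution $J_{k,ij}$ and the optimal $J_{ij}$, the deviation is its square root, and the overall average deviation is the mean deviation over all entries. The function name `confidence_stats` is hypothetical, and the paper's exact definitions may differ.

```python
import numpy as np


def confidence_stats(J, solutions, weights=None):
    """Per-entry spread of dataset solutions around the optimal J (assumed definitions).

    solutions: array (N, n, n) of per-dataset solutions J_hat_k + Y_k V_k^T.
    Returns (variance, deviation, average_deviation).
    """
    solutions = np.asarray(solutions, dtype=float)
    if weights is None:
        weights = np.ones(solutions.shape[0])      # assumption: equal reliability
    w = np.asarray(weights, dtype=float)[:, None, None]
    variance = np.sum(w * (solutions - J) ** 2, axis=0) / w.sum()
    deviation = np.sqrt(variance)                  # same units as J_ij
    average_deviation = float(deviation.mean())    # one summary number per network
    return variance, deviation, average_deviation


J = np.array([[1.0, 0.0], [0.0, 2.0]])
solutions = [J + 0.1, J - 0.1]                     # two datasets, symmetric spread
variance, deviation, avg = confidence_stats(J, solutions)
print(np.round(deviation, 6), round(avg, 6))
```

Entries whose dataset-level solutions disagree strongly get a large deviation, flagging edges that the combined datasets support only weakly.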

RESULTS

SIMULATED DATA Constructed a small simulated network of five genes with a time-dependent noise term. Randomly chose 3 initial conditions, producing 3 datasets with 4, 4, and 3 time points, respectively.
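A sketch of how such simulated datasets could be generated, assuming the linear model $dx/dt = Jx + \text{noise}(t)$ integrated with forward Euler steps. The 5-gene network, the step size `dt`, the Gaussian noise model, and the name `simulate_dataset` are all hypothetical choices for illustration, not the parameters used in the paper. The derivative matrix $X'$ is formed by forward differences.

```python
import numpy as np


def simulate_dataset(J, x0, n_points, dt=0.1, noise_sd=0.0, rng=None):
    """Euler-integrate dx/dt = J x + noise(t) and return (X, X_dot), each n x m.

    X_dot is the forward difference (x(t + dt) - x(t)) / dt, so one extra
    state beyond the m returned time points is simulated.
    """
    rng = np.random.default_rng() if rng is None else rng
    states = [np.asarray(x0, dtype=float)]
    for _ in range(n_points):
        x = states[-1]
        noise = noise_sd * rng.normal(size=x.shape)   # hypothetical Gaussian noise term
        states.append(x + dt * (J @ x + noise))
    S = np.column_stack(states)                        # n x (m + 1)
    X = S[:, :-1]
    X_dot = (S[:, 1:] - S[:, :-1]) / dt
    return X, X_dot


rng = np.random.default_rng(42)
n = 5
# Hypothetical sparse 5-gene network: self-decay plus a few regulatory edges.
J_true = -np.eye(n)
J_true[1, 0], J_true[2, 1], J_true[4, 3], J_true[0, 4] = 0.8, -0.6, 0.5, -0.3
lengths = [4, 4, 3]                                    # time points per dataset
datasets = [simulate_dataset(J_true, rng.normal(size=n), m, rng=rng) for m in lengths]
for X, X_dot in datasets:
    # rank(X) <= m < 5, so J X = X_dot cannot pin down J from one dataset alone
    print(X.shape, np.linalg.matrix_rank(X))
```

Passing `noise_sd > 0` adds the noise term; with the default of 0, each `X_dot` equals `J_true @ X` up to rounding, which makes the example easy to check.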

SIMULATED DATA Assessed the ability to recover the network (Yeung 2002 criterion) and the accuracy of GNR.
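The slides do not reproduce the Yeung (2002) criterion or the E0, E1, and E2 error measures, so the functions below are illustrative stand-ins only, not the paper's definitions: an edge-disagreement count for topology accuracy, and mean absolute and root-mean-square errors for interaction-strength accuracy. The names `topology_errors` and `strength_errors` and the edge `threshold` are hypothetical.

```python
import numpy as np


def topology_errors(J_est, J_true, threshold=1e-3):
    """Count entries where the estimate and the truth disagree on edge presence."""
    est_edges = np.abs(J_est) > threshold
    true_edges = np.abs(J_true) > threshold
    return int(np.sum(est_edges != true_edges))


def strength_errors(J_est, J_true):
    """Mean absolute and root-mean-square error of the interaction strengths."""
    diff = np.asarray(J_est, dtype=float) - np.asarray(J_true, dtype=float)
    return float(np.mean(np.abs(diff))), float(np.sqrt(np.mean(diff ** 2)))


J_true = np.array([[-1.0, 0.0], [0.8, -1.0]])
J_est = np.array([[-0.9, 0.2], [0.0, -1.0]])
print(topology_errors(J_est, J_true))   # 2: one spurious edge, one missed edge
print(strength_errors(J_est, J_true))
```

Keeping the two kinds of error separate makes the trade-off visible: a sparsity term can fix the topology count while worsening the strength errors.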

SIMULATED DATA [Figure: true network alongside the networks inferred from 1, 2, and 3 datasets, with no sparsity or noise factor. Varied: number of datasets.]

SIMULATED DATA [Figure: true network alongside the networks inferred from 1, 2, and 3 datasets under Gaussian noise. Varied: number of datasets.]

SIMULATED DATA Adding datasets improves both the accuracy and the confidence of network reconstruction. GNR must balance topology reconstruction accuracy against interaction-strength accuracy; the trade-off parameter controls the balance between the E0 and E1 (or E2) errors.

SIMULATED DATA GNR accurately infers the GN for a highly underdetermined problem, given datasets with few time points and differing initial conditions. Under high noise, the network topology can still be inferred correctly by including the sparsity constraint, at the expense of interaction-strength accuracy. Larger simulated networks were tested with similarly effective results.

EXPERIMENTAL DATA Heat-shock response data for yeast: 10 transcription factors; 4 microarray datasets (Stanford Microarray Database) with 7, 5, 5, and 4 time points. GNR correctly inferred 4 edges with documented, known regulation and 1 edge with documented potential regulation.

EXPERIMENTAL DATA Cell-cycle data for yeast: 140 differentially expressed genes; 4 datasets with differing experimental conditions. Constructed a sub-GN involving several genes with proven functions in cell wall organization. (Circles of the same color indicate the same biological function.)

EXPERIMENTAL DATA Stress-response data for Arabidopsis: root experiments, 226 genes; shoot experiments, 246 genes. 9 datasets with 6+ time points for each of root and shoot (

DISCUSSION

LIMITATIONS Assumes the regulatory network stays the same across differing environmental conditions. Requires high-resolution, high-quality time-course datasets. Noise intrinsic to microarray gene expression data is a major source of error. Hidden regulatory factors may cause implicit description errors: because of hidden variables, inferred GN models predict direct and indirect regulation indiscriminately, so a model edge reflects a net effect. A predicted regulatory relationship does not necessarily imply regulation by a transcription factor.

CONCLUSIONS Created a novel method that derives GN substructure directly from multiple microarray datasets, rather than by aligning multiple separately inferred networks. The model can capture regulatory mechanisms at the protein and metabolite levels, which are not measured in microarray data. By adjusting the trade-off parameter, it can derive either a more global structure with dense connections or more local substructures with sparse connections. The model is most effective when used together with other information sources. FUTURE WORK: Extend GNR to identify conserved network patterns or motifs across datasets from different species.

THE END Thank you for listening!