A Geometric Interpretation of Gene Co-Expression Network Analysis
Steve Horvath, Jun Dong

Outline
Network and network concepts
Approximately factorizable networks
Gene co-expression network
  – Eigengene factorizability, eigengene conformity
  – Eigengene-based network concepts
What can we learn from the geometric interpretation?

Network = Adjacency Matrix
A network can be represented by an adjacency matrix, $A = [a_{ij}]$, that encodes whether and how strongly each pair of nodes is connected.
– A is a symmetric matrix with entries in [0, 1].
– For unweighted networks, entries are 1 or 0 depending on whether or not two nodes are adjacent (connected).
– For weighted networks, the adjacency matrix reports the connection strength between node pairs.
– Our convention: the diagonal elements of A are all 1.

Motivational example I: Pairwise relationships between genes across different mouse tissues and genders
Challenge: Develop simple descriptive measures that describe the patterns.
Solution: The following network concepts are useful: density, centralization, clustering coefficient, and heterogeneity.

Motivational example (continued)
Challenge: Find a simple measure that describes the relationship between gene significance and connectivity.
Solution: The network concept called hub gene significance.

Background
Network concepts are also known as network statistics or network indices.
– Examples: connectivity (degree), clustering coefficient, topological overlap, etc.
Network concepts underlie network language and systems biological modeling. Dozens of potentially useful network concepts are known from graph theory.

Review of some fundamental network concepts which are defined for all networks (not just co-expression networks)

Connectivity
Node connectivity = row sum of the adjacency matrix, excluding the diagonal element
– For unweighted networks: the number of direct neighbors
– For weighted networks: the sum of connection strengths to the other nodes
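A minimal NumPy sketch of connectivity under the convention above (diagonal of A equal to 1). Because the diagonal is 1, it is subtracted from each row sum, giving $k_i = \sum_{j \ne i} a_{ij}$ as in Dong and Horvath (2007). The function name `connectivity` and the toy matrices are illustrative assumptions.

```python
import numpy as np

def connectivity(A: np.ndarray) -> np.ndarray:
    """Node connectivity k_i = sum_{j != i} a_ij.

    Assumes A is symmetric with entries in [0, 1] and a diagonal of 1,
    so the diagonal is subtracted from each row sum.
    """
    return A.sum(axis=1) - np.diag(A)

# Unweighted example: a path 0-1-2 (diagonal set to 1 by convention).
A_unweighted = np.array([[1, 1, 0],
                         [1, 1, 1],
                         [0, 1, 1]], dtype=float)
# Weighted example: connection strengths instead of 0/1 entries.
A_weighted = np.array([[1.0, 0.8, 0.1],
                       [0.8, 1.0, 0.5],
                       [0.1, 0.5, 1.0]])

print(connectivity(A_unweighted))  # number of direct neighbors: [1. 2. 1.]
print(connectivity(A_weighted))    # summed strengths: [0.9 1.3 0.6]
```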

Density
Density = mean adjacency (the average of the off-diagonal entries of A). It is closely related to the mean connectivity.
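A sketch, assuming density is the mean of the off-diagonal adjacencies (the diagonal is 1 by convention and is excluded). The identity Density = mean(k) / (n − 1) makes the link to mean connectivity explicit. The toy matrix is illustrative.

```python
import numpy as np

def density(A: np.ndarray) -> float:
    """Mean off-diagonal adjacency: sum_{i != j} a_ij / (n * (n - 1))."""
    n = A.shape[0]
    return float((A.sum() - np.trace(A)) / (n * (n - 1)))

A = np.array([[1.0, 0.8, 0.1],
              [0.8, 1.0, 0.5],
              [0.1, 0.5, 1.0]])
n = A.shape[0]
k = A.sum(axis=1) - np.diag(A)          # connectivity

print(density(A))          # about 0.467
print(k.mean() / (n - 1))  # same value: Density = mean(k) / (n - 1)
```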

Centralization
Centralization = 1 if the network has a star topology (figure example: a star network).
Centralization = 0 if all nodes have the same connectivity (figure example: every node has connectivity 2).
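A sketch of centralization, assuming the definition $\text{Centralization} = \frac{n}{n-2}\left(\frac{\max(k)}{n-1} - \text{Density}\right)$ used by Dong and Horvath (2007). It reproduces the two examples above: a star network gives 1, and a ring in which every node has connectivity 2 gives 0. The helpers `star` and `ring` are hypothetical test networks.

```python
import numpy as np

def centralization(A: np.ndarray) -> float:
    """n/(n-2) * (max(k)/(n-1) - Density), with the diagonal of A excluded."""
    n = A.shape[0]
    k = A.sum(axis=1) - np.diag(A)
    dens = k.sum() / (n * (n - 1))
    return float(n / (n - 2) * (k.max() / (n - 1) - dens))

def star(n):
    A = np.eye(n)
    A[0, 1:] = A[1:, 0] = 1.0   # node 0 is the hub
    return A

def ring(n):
    A = np.eye(n)
    for i in range(n):
        j = (i + 1) % n
        A[i, j] = A[j, i] = 1.0  # every node has connectivity 2
    return A

print(centralization(star(6)))   # 1.0 (up to floating-point rounding)
print(centralization(ring(6)))   # 0.0
```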

Heterogeneity
Heterogeneity = coefficient of variation of the connectivity. Highly heterogeneous networks exhibit hubs.
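A sketch of heterogeneity as the coefficient of variation of connectivity. It assumes the population standard deviation, which makes it equal to $\sqrt{n \sum_i k_i^2 / (\sum_i k_i)^2 - 1}$. A star network (one hub) is highly heterogeneous, while a ring (no hubs) has heterogeneity 0. The test networks are illustrative.

```python
import numpy as np

def heterogeneity(A: np.ndarray) -> float:
    """Coefficient of variation of connectivity, sd(k) / mean(k).

    Uses the population standard deviation (ddof=0), which matches
    sqrt(n * sum(k**2) / sum(k)**2 - 1).
    """
    k = A.sum(axis=1) - np.diag(A)     # connectivity, diagonal excluded
    return float(k.std() / k.mean())

def star(n):
    A = np.eye(n)
    A[0, 1:] = A[1:, 0] = 1.0          # node 0 is a hub
    return A

def ring(n):
    A = np.eye(n)
    for i in range(n):
        j = (i + 1) % n
        A[i, j] = A[j, i] = 1.0        # every node has connectivity 2
    return A

print(heterogeneity(star(6)))   # about 0.894: the hub makes k highly variable
print(heterogeneity(ring(6)))   # 0.0: no hubs
```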

Clustering Coefficient
Measures the cliquishness of a particular node: « A node is cliquish if its neighbors know each other. »
Figure examples: clustering coefficient of the black node = 0; clustering coefficient = 1.
This generalizes directly to weighted networks (Zhang and Horvath 2005).
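A sketch of the clustering coefficient, assuming the weighted generalization takes the form $C_i = \sum_{j \ne i}\sum_{l \ne i,j} a_{ij} a_{jl} a_{li} \,/\, \big((\sum_{l \ne i} a_{il})^2 - \sum_{l \ne i} a_{il}^2\big)$. For 0/1 adjacencies this reduces to 2 × (number of triangles through node i) / ($k_i(k_i - 1)$). Assigning 0 to nodes with a zero denominator is a choice made for this sketch.

```python
import numpy as np

def clustering_coefficient(A: np.ndarray) -> np.ndarray:
    """Weighted clustering coefficient of every node.

    C_i = sum_{j != i} sum_{l != i, j} a_ij a_jl a_li
          / ((sum_{l != i} a_il)^2 - sum_{l != i} a_il^2)
    Nodes with a zero denominator (fewer than two neighbors) get 0 here.
    """
    B = A - np.diag(np.diag(A))            # drop the diagonal (convention: 1)
    numerator = np.diag(B @ B @ B)         # weighted "triangles" through node i
    k = B.sum(axis=1)
    denominator = k**2 - (B**2).sum(axis=1)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denominator > 0, numerator / denominator, 0.0)

# Center of a star: its neighbors are not connected to each other.
star = np.eye(5)
star[0, 1:] = star[1:, 0] = 1.0
# Every node of a complete graph: all neighbors know each other.
complete = np.ones((4, 4))

print(clustering_coefficient(star)[0])   # 0.0
print(clustering_coefficient(complete))  # [1. 1. 1. 1.]
```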

The topological overlap dissimilarity is used as input to hierarchical clustering. Topological overlap was generalized to weighted networks in Zhang and Horvath (2005), to multiple nodes in Li and Horvath (2006), and to higher-order interactions in Yip and Horvath (2007).
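A sketch of the weighted topological overlap, assuming the form $\omega_{ij} = (l_{ij} + a_{ij}) / (\min(k_i, k_j) + 1 - a_{ij})$ with $l_{ij} = \sum_{u \ne i,j} a_{iu} a_{uj}$ (Zhang and Horvath 2005) and $\omega_{ii} = 1$. The dissimilarity fed to hierarchical clustering is $1 - \omega_{ij}$. The path and complete-graph examples are illustrative.

```python
import numpy as np

def topological_overlap(A: np.ndarray) -> np.ndarray:
    """omega_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), omega_ii = 1."""
    B = A - np.diag(np.diag(A))       # off-diagonal adjacencies
    k = B.sum(axis=1)                 # connectivity
    shared = B @ B                    # l_ij; the zero diagonal excludes u = i, j
    tom = (shared + B) / (np.minimum.outer(k, k) + 1.0 - B)
    np.fill_diagonal(tom, 1.0)
    return tom

def tom_dissimilarity(A: np.ndarray) -> np.ndarray:
    """Input for hierarchical clustering: 1 - topological overlap."""
    return 1.0 - topological_overlap(A)

# Path 0-1-2: nodes 0 and 2 are not adjacent but share neighbor 1.
path = np.array([[1, 1, 0],
                 [1, 1, 1],
                 [0, 1, 1]], dtype=float)

print(topological_overlap(path)[0, 2])   # 0.5
print(tom_dissimilarity(np.ones((4, 4))))  # complete graph: all zeros
```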

Network Significance
Defined as the average gene significance. We often refer to the network significance of a module network as module significance.

Hub Gene Significance = slope of the regression line of gene significance on scaled connectivity (intercept = 0)
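A sketch of network (module) significance and hub gene significance. Following Dong and Horvath (2007), it assumes scaled connectivity $K_i = k_i / \max(k)$, so the no-intercept least-squares slope of GS on K is $\sum_i GS_i K_i / \sum_i K_i^2$. The connectivity and gene significance values are made up for illustration.

```python
import numpy as np

k = np.array([4.0, 2.0, 1.0])      # hypothetical intramodular connectivities
gs = np.array([0.8, 0.4, 0.2])     # hypothetical gene significance values

def module_significance(gs):
    """Network (module) significance = average gene significance."""
    return float(np.mean(gs))

def hub_gene_significance(gs, k):
    """Slope of the regression of GS on scaled connectivity, intercept 0."""
    K = k / k.max()
    return float(gs @ K / (K @ K))

# Cross-check against a generic least-squares fit without an intercept.
K = k / k.max()
slope = np.linalg.lstsq(K[:, None], gs, rcond=None)[0][0]

print(module_significance(gs))        # about 0.467
print(hub_gene_significance(gs, k))   # 0.8, matching the lstsq slope
```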

Q: What do all of these fundamental network concepts have in common?
A: They are functions of the adjacency matrix A and/or a gene significance measure GS.

CHALLENGE
Find relationships between these and other seemingly disparate network concepts. For general networks, this is a difficult problem, but a solution exists for a special subclass of networks: approximately factorizable networks.

Definition of an approximately factorizable network
Why is this relevant? Answer: because modules are often approximately factorizable.

Algorithmic definition of the conformity and a measure of factorizability
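One way to make these definitions concrete, assuming the formulation of Dong and Horvath (2007): the conformity vector CF minimizes $\sum_i \sum_{j \ne i} (a_{ij} - CF_i\,CF_j)^2$, and the factorizability is $F(A) = 1 - \sum_i\sum_{j\ne i}(a_{ij} - CF_i CF_j)^2 / \sum_i\sum_{j\ne i} a_{ij}^2$. The sketch minimizes the objective by exact coordinate-wise updates, $CF_i \leftarrow \sum_{j\ne i} a_{ij} CF_j / \sum_{j\ne i} CF_j^2$, started from $k_i/\sqrt{\sum_j k_j}$. This particular iteration is an assumption of the sketch, not necessarily the algorithm on the slide.

```python
import numpy as np

def conformity(A, max_sweeps=500, tol=1e-12):
    """Minimize sum_{i != j} (a_ij - CF_i CF_j)^2 by coordinate-wise updates."""
    B = A - np.diag(np.diag(A))              # ignore the diagonal
    n = B.shape[0]
    k = B.sum(axis=1)
    cf = k / np.sqrt(k.sum())                # starting value
    for _ in range(max_sweeps):
        previous = cf.copy()
        for i in range(n):
            others = np.arange(n) != i
            # exact minimizer over CF_i with the other entries held fixed
            cf[i] = B[i, others] @ cf[others] / (cf[others] @ cf[others])
        if np.max(np.abs(cf - previous)) < tol:
            break
    return cf

def factorizability(A, cf):
    """1 - off-diagonal residual sum of squares / off-diagonal sum of squares."""
    B = A - np.diag(np.diag(A))
    residual = B - np.outer(cf, cf)
    np.fill_diagonal(residual, 0.0)
    return float(1.0 - (residual**2).sum() / (B**2).sum())

# Exactly factorizable network with a known (hypothetical) conformity vector.
cf_true = np.linspace(0.3, 0.9, 20)
A_fact = np.outer(cf_true, cf_true)
np.fill_diagonal(A_fact, 1.0)

# Two disconnected cliques: far from factorizable.
A_blocks = np.kron(np.eye(2), np.ones((5, 5)))

cf_hat = conformity(A_fact)
print(np.max(np.abs(cf_hat - cf_true)))                  # close to 0
print(factorizability(A_fact, cf_hat))                    # close to 1
print(factorizability(A_blocks, conformity(A_blocks)))    # much lower
```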

Empirical Observation 1
Subnetworks composed of module genes tend to be approximately factorizable, i.e. $a_{ij} \approx CF_i\,CF_j$ for $i \neq j$, where $CF_i$ is the conformity of node $i$. This observation implies the following Observation 2. Empirical evidence is provided in: Dong J, Horvath S (2007) Understanding Network Concepts in Modules. BMC Systems Biology 2007, 1:24.

Observation 2: Approximate relationships among network concepts in approximately factorizable networks
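As a numerical illustration of one such relationship: substituting $a_{ij} = CF_i\,CF_j$ into the definitions of density, heterogeneity, and the clustering coefficient gives, up to lower-order terms, $\text{ClusterCoef}_i \approx (1 + \text{Heterogeneity}^2)^2 \times \text{Density}$ for every node. The sketch checks this on an exactly factorizable network with a hypothetical conformity vector; it does not reproduce the Drosophila results.

```python
import numpy as np

# Hypothetical conformity vector; off-diagonal adjacency is exactly CF_i * CF_j.
cf = np.linspace(0.3, 0.9, 200)
A = np.outer(cf, cf)
np.fill_diagonal(A, 1.0)

B = A - np.diag(np.diag(A))                 # off-diagonal part
n = A.shape[0]
k = B.sum(axis=1)                           # connectivity
density = k.sum() / (n * (n - 1))
heterogeneity = k.std() / k.mean()          # coefficient of variation of k
cluster_coef = np.diag(B @ B @ B) / (k**2 - (B**2).sum(axis=1))

predicted = (1 + heterogeneity**2) ** 2 * density
print(cluster_coef.min(), cluster_coef.max())  # nearly constant across nodes
print(cluster_coef.mean(), predicted)          # close to each other
```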

Drosophila PPI module networks: the relationship between fundamental network concepts.

What if we focus on gene co-expression networks?

Weighted Gene Co-expression Network

Module Eigengene = measure of over-expression = average redness
Figure: heat map with rows = genes and columns = microarray samples, and the brown module eigengene across samples.

Recall that the module eigengene is defined by the singular value decomposition of X, where X = gene expression data of a module.
Aside: gene expressions (rows) have been standardized across samples (columns).
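A minimal sketch of the eigengene computation: standardize each gene across samples, take the SVD $X = U D V^{T}$, and use the first right singular vector (the first column of V) as the eigengene. Because the SVD sign is arbitrary, the sketch flips E to correlate positively with the average standardized expression; that sign convention and the simulated one-factor module are assumptions, not taken from the slides.

```python
import numpy as np

def module_eigengene(X: np.ndarray) -> np.ndarray:
    """First right singular vector of the standardized module data (rows = genes)."""
    # Standardize each gene (row) across samples (columns).
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    # Z = U D V^T; Vt[0] is the first column of V, one value per sample.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    E = Vt[0]
    # Align the arbitrary sign with the average standardized expression.
    if np.corrcoef(E, Z.mean(axis=0))[0, 1] < 0:
        E = -E
    return E

# Hypothetical module: 15 genes driven by one latent signal across 50 samples.
rng = np.random.default_rng(0)
n_genes, n_samples = 15, 50
latent = rng.standard_normal(n_samples)
loadings = np.linspace(0.6, 0.9, n_genes)
noise = rng.standard_normal((n_genes, n_samples))
X = loadings[:, None] * latent + np.sqrt(1 - loadings**2)[:, None] * noise

E = module_eigengene(X)
print(np.corrcoef(E, latent)[0, 1])   # close to 1
```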

Question: When are co-expression modules factorizable?

Question: Characterize gene expression data X that lead to an approximately factorizable correlation matrix

Note that a factorizable correlation matrix implies a factorizable weighted co-expression network. We refer to $|\mathrm{cor}(x_i, E)|^{\beta}$ as the weighted eigengene conformity of gene $i$.
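A numerical illustration under an assumed one-factor model (each gene is a noisy copy of a single latent signal; the loadings, sample size, and β = 6 are hypothetical). If $\mathrm{cor}(x_i, x_j) \approx \mathrm{cor}(x_i, E)\,\mathrm{cor}(x_j, E)$, then the power adjacency satisfies $a_{ij} = |\mathrm{cor}(x_i,x_j)|^{\beta} \approx |\mathrm{cor}(x_i,E)|^{\beta}\,|\mathrm{cor}(x_j,E)|^{\beta}$, so the weighted network is approximately factorizable with conformity $|\mathrm{cor}(x_i,E)|^{\beta}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n_genes, n_samples, beta = 30, 2000, 6
loadings = np.linspace(0.5, 0.9, n_genes)              # hypothetical one-factor module
latent = rng.standard_normal(n_samples)
noise = rng.standard_normal((n_genes, n_samples))
X = loadings[:, None] * latent + np.sqrt(1 - loadings**2)[:, None] * noise

# Module eigengene: first right singular vector of the standardized data.
Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
E = np.linalg.svd(Z, full_matrices=False)[2][0]

cor = np.corrcoef(X)                                    # gene-gene correlations
cor_E = np.array([np.corrcoef(x, E)[0, 1] for x in X])  # cor(x_i, E)
off_diag = ~np.eye(n_genes, dtype=bool)

# Factorizable correlation matrix: cor(x_i, x_j) ~ cor(x_i, E) * cor(x_j, E).
err_cor = np.abs(cor - np.outer(cor_E, cor_E))[off_diag].mean()

# The power adjacency inherits the factorization, conformity |cor(x_i, E)|^beta.
A = np.abs(cor) ** beta
cf_E = np.abs(cor_E) ** beta
err_adj = np.abs(A - np.outer(cf_E, cf_E))[off_diag].mean()

print(err_cor, err_adj)   # both small compared with typical entries
```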

If

Theoretical relationships in co-expression modules with high eigengene factorizability

What can network theorists learn from the geometric interpretation? Some examples…

Problem: Show that genes that lie intermediate between two distinct co-expression modules cannot be hub genes in these modules.

Geometric solution (figure; labels: gene 1, gene 2, k(2), "intermediate", "hub in module 1", eigengene E1, eigengene E2).
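The geometric argument can be checked numerically. Correlations are cosines between centered sample vectors, so if two module eigengenes E1 and E2 are uncorrelated (orthogonal after centering), Bessel's inequality gives $\mathrm{cor}(x,E_1)^2 + \mathrm{cor}(x,E_2)^2 \le 1$ for every gene x. A gene with equal correlations to both eigengenes therefore has $|\mathrm{cor}(x,E_q)| \le 1/\sqrt{2}$, whereas a hub gene has a correlation near 1 with its eigengene. The sketch assumes uncorrelated eigengenes and random genes; β = 6 is a hypothetical choice that shows how the power adjacency shrinks $1/\sqrt{2}$ to $2^{-3} = 0.125$.

```python
import numpy as np

rng = np.random.default_rng(2)
m, beta = 100, 6                              # samples; hypothetical soft threshold

def center_unit(v):
    """Center a sample vector and scale it to unit length."""
    v = v - v.mean()
    return v / np.linalg.norm(v)

def cor(a, b):
    return float(np.corrcoef(a, b)[0, 1])

# Two uncorrelated module eigengenes (orthogonal after centering).
E1 = center_unit(rng.standard_normal(m))
E2 = rng.standard_normal(m)
E2 = center_unit(E2 - (E2 @ E1) * E1)         # Gram-Schmidt step

# For arbitrary genes, the squared correlations with E1 and E2 sum to at most 1.
genes = rng.standard_normal((1000, m))
bound = max(cor(x, E1) ** 2 + cor(x, E2) ** 2 for x in genes)

# A gene exactly "between" the modules has cor = 1/sqrt(2) with each eigengene,
# so its power adjacency to each eigengene is 2**(-beta / 2).
intermediate = E1 + E2
c_inter = cor(intermediate, E1)

print(bound)                   # never exceeds 1
print(c_inter, c_inter**beta)  # about 0.707 and 0.125
```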

Problem setting: a co-expression network and a trait-based gene significance measure GS(i) = |cor(x(i), T)|.
– Describe a situation in which the sample trait T1 leads to a trait-based gene significance measure with low hub gene significance.
– Describe a situation in which the sample trait T2 leads to a trait-based gene significance measure with high hub gene significance.

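Figure: intramodular connectivity k versus gene significance, for GS2(x) = |cor(x, T2)| and GS1(x) = |cor(x, T1)|.
Another way of stating the problem: find T2 and T1 such that GS2 has high hub gene significance and GS1 has low hub gene significance.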

Solution (figure; labels: eigengene E, gene 1, gene 2, sample traits T1 and T2, cor(E, T2), k(1), k(2), GS1(1)).
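A simulation sketch of the two situations, under assumptions not taken from the slide: genes follow a one-factor model driven by a latent signal (a stand-in for the module eigengene E), T2 is that signal plus noise so that cor(E, T2) is high, and T1 is independent noise. Hub gene significance is computed as the no-intercept regression slope of GS on scaled connectivity. Loadings, sample size, noise level, and β = 6 are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n_genes, m, beta = 25, 500, 6
loadings = np.linspace(0.4, 0.95, n_genes)              # hypothetical module
signal = rng.standard_normal(m)                          # stand-in for eigengene E
noise = rng.standard_normal((n_genes, m))
X = loadings[:, None] * signal + np.sqrt(1 - loadings**2)[:, None] * noise

A = np.abs(np.corrcoef(X)) ** beta                       # weighted module network
k = A.sum(axis=1) - np.diag(A)                           # intramodular connectivity
K = k / k.max()                                          # scaled connectivity

def gene_significance(X, trait):
    """GS(i) = |cor(x(i), T)| for every gene (row) of X."""
    return np.array([abs(np.corrcoef(x, trait)[0, 1]) for x in X])

def hub_gene_significance(gs, K):
    """No-intercept regression slope of GS on scaled connectivity K."""
    return float(gs @ K / (K @ K))

T2 = signal + 0.3 * rng.standard_normal(m)   # cor(E, T2) is high
T1 = rng.standard_normal(m)                  # unrelated to the module

hub2 = hub_gene_significance(gene_significance(X, T2), K)
hub1 = hub_gene_significance(gene_significance(X, T1), K)
print(hub2, hub1)   # hub2 is far larger than hub1
```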

What can a microarray data analyst learn from the geometric interpretation?

Some insights
– Intramodular hub gene = a gene that is highly correlated with the module eigengene, i.e. it is a good representative of the module.
– Gene screening strategies that use intramodular connectivity amount to pathway-based gene screening methods.
– Intramodular connectivity is a highly reproducible “fuzzy” measure of module membership.
– Network concepts are useful for describing pairwise interaction patterns.

The module eigengene is highly correlated with the most highly connected hub gene.
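A simulation sketch of these insights under an assumed one-factor module (hypothetical loadings, sample size, and β = 6): the gene with the highest intramodular connectivity is also the gene most correlated with the module eigengene, and connectivity closely tracks $|\mathrm{cor}(x_i, E)|^{\beta}$. The variable name `kME` for the eigengene-based ("fuzzy") module membership is a label chosen for this sketch.

```python
import numpy as np

rng = np.random.default_rng(4)
n_genes, m, beta = 20, 1000, 6
loadings = np.linspace(0.3, 0.95, n_genes)               # hypothetical module
latent = rng.standard_normal(m)
noise = rng.standard_normal((n_genes, m))
X = loadings[:, None] * latent + np.sqrt(1 - loadings**2)[:, None] * noise

# Module eigengene from the SVD of the standardized module data.
Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
E = np.linalg.svd(Z, full_matrices=False)[2][0]

# Eigengene-based ("fuzzy") module membership of every gene.
kME = np.array([np.corrcoef(x, E)[0, 1] for x in X])

A = np.abs(np.corrcoef(X)) ** beta
k = A.sum(axis=1) - np.diag(A)                            # intramodular connectivity

hub = int(np.argmax(k))
print(hub, abs(kME[hub]))                                 # top hub and its |cor| with E
print(np.corrcoef(k, np.abs(kME) ** beta)[0, 1])          # close to 1
```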

Dictionary for translating between general network terms and the eigengene-based counterparts.

If also

Summary
– The unification of co-expression network methods with traditional data mining methods can inform the application and development of systems biologic methods.
– We study network concepts in special types of networks, which we refer to as approximately factorizable networks.
– We find that modules are often approximately factorizable.
– We characterize co-expression modules that are approximately factorizable.
– We provide a dictionary for relating fundamental network concepts to eigengene-based concepts.
– We characterize co-expression networks where hub genes are significant with respect to a microarray sample trait.
– We show that intramodular connectivity can be interpreted as a fuzzy measure of module membership.

Summary (cont’d)
– We provide a geometric interpretation of important network concepts (e.g. hub gene significance, module significance).
– These theoretical results have important applications for describing pathways of interacting genes.
– They also inform novel module detection procedures and gene selection procedures.

Acknowledgement
Biostatistics/Bioinformatics: Tova Fuller, Peter Langfelder, Ai Li, Wen Lin, Mike Mason, Angela Presson, Lin Wang, Andy Yip, Wei Zhao
Brain Cancer/Yeast: Paul Mischel, Stan Nelson, Marc Carlson
Comparison Human-Chimp: Dan Geschwind, Mike Oldham, Giovanni
Mouse Data: Jake Lusis, Tom Drake, Anatole Ghazalpour, Atila Van Nas

APPENDIX (back up slides)

Steps for constructing a co-expression network
A) Microarray gene expression data
B) Measure concordance of gene expression with a Pearson correlation
C) The Pearson correlation matrix is either dichotomized to arrive at an adjacency matrix → unweighted network, or transformed continuously with the power adjacency function → weighted network
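A minimal sketch of steps A to C, assuming NumPy and an unsigned network (absolute correlations). The hard threshold tau = 0.5 and the power beta = 6 are hypothetical values chosen for illustration, not recommendations from the slides.

```python
import numpy as np

def unweighted_adjacency(X, tau):
    """Dichotomize |Pearson correlation| at a hard threshold tau."""
    A = (np.abs(np.corrcoef(X)) >= tau).astype(float)
    np.fill_diagonal(A, 1.0)            # convention: diagonal elements are 1
    return A

def weighted_adjacency(X, beta):
    """Power adjacency function a_ij = |cor(x_i, x_j)|^beta."""
    A = np.abs(np.corrcoef(X)) ** beta
    np.fill_diagonal(A, 1.0)
    return A

# A) Hypothetical expression data: 8 genes (rows) by 30 samples (columns).
rng = np.random.default_rng(5)
X = rng.standard_normal((8, 30))

# B) Pearson correlations (inside the helpers), C) unweighted or weighted network.
A_unweighted = unweighted_adjacency(X, tau=0.5)
A_weighted = weighted_adjacency(X, beta=6)
print(np.unique(A_unweighted))          # only 0 and 1
print(A_weighted.round(3))              # continuous values in [0, 1]
```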

Definition of module (cluster)
Module = cluster of highly connected nodes
– Any clustering method that results in such sets is suitable.
We define modules as branches of a hierarchical clustering tree using the topological overlap matrix.
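A sketch of module detection along these lines: average-linkage hierarchical clustering on the topological overlap dissimilarity, followed by a fixed-height cut of the tree using SciPy's `linkage`, `fcluster`, and `squareform`. The fixed cut height (0.95), the average linkage, and the simulated two-module data are assumptions of this sketch; it is not necessarily how branches were cut in the original analysis.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def tom_dissimilarity(A):
    """1 - weighted topological overlap (diagonal of A assumed to be 1)."""
    B = A - np.diag(np.diag(A))
    k = B.sum(axis=1)
    tom = (B @ B + B) / (np.minimum.outer(k, k) + 1.0 - B)
    np.fill_diagonal(tom, 1.0)
    return 1.0 - tom

def detect_modules(A, cut_height):
    """Branches of an average-linkage tree cut at a fixed height."""
    D = tom_dissimilarity(A)
    D = (D + D.T) / 2.0                       # guard against round-off asymmetry
    tree = linkage(squareform(D, checks=False), method="average")
    return fcluster(tree, t=cut_height, criterion="distance")

# Hypothetical data: two independent modules of 10 genes each, 500 samples.
rng = np.random.default_rng(6)
m = 500
loadings = np.linspace(0.85, 0.95, 10)
blocks = []
for _ in range(2):
    signal = rng.standard_normal(m)
    noise = rng.standard_normal((10, m))
    blocks.append(loadings[:, None] * signal + np.sqrt(1 - loadings**2)[:, None] * noise)
X = np.vstack(blocks)

A = np.abs(np.corrcoef(X)) ** 6             # weighted network, beta = 6
labels = detect_modules(A, cut_height=0.95)
print(labels)                                # genes 0-9 and 10-19 form two modules
```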

Relationship between Module significance and hub gene significance

Application: Brain Cancer Data