Mining Coherent Dense Subgraphs across Multiple Biological Networks Vahid Mirjalili CSE 891.



Motivation
Finding patterns across multiple networks to identify biological modules and predict function
Current algorithms are too costly
Developed a novel algorithm: CODENSE
– Scalable in the number and size of the networks
– Adjustable for exact or approximate pattern mining

Clustering can detect meaningful biological modules
– e.g. a dense protein interaction sub-network may correspond to a protein complex
– A dense co-expression sub-network may represent a co-expression cluster
Biological modules are expected to be active across multiple conditions
One idea: aggregate all the networks and identify dense sub-graphs in the aggregated network
– Risk of false positive detection

Aggregated graph: false positives in the aggregated graph
Adding six graphs together and deleting the edges that occur fewer than 3 times → resulting summary graph

Solution to false positives in the summary graph: frequent sub-graphs
Mine the dense sub-graphs directly in each original network
A sub-graph is frequent if it occurs multiple times in a set of graphs
In biological networks, each gene occurs only once in a graph → no isomorphism problem

Frequent dense sub-graph
A frequent dense sub-graph does not necessarily convey accurate information
– Some edges in the frequent sub-graph shown on the slide do not occur in the original set
– It is more meaningful to divide it into two sub-graphs

Coherent Dense Sub-graphs
All edges in a coherent sub-graph should have correlated occurrences in the original graph set
CODENSE reduces the networks to 2 meta-graphs and performs clustering on these two graphs only (instead of on the individual networks)
– CODENSE can distinguish the two modules
– Good scalability
– Discovery of overlapping clusters

Overlapping Sub-graphs
Partition-based clustering algorithms fail to identify overlapping sub-graphs
Mining Overlapping Dense Sub-graphs (MODES)

Application
Identify frequent co-expression clusters across multiple microarray datasets
Microarray dataset:
– Un-weighted, undirected graph
– Each node represents a gene
– Two genes are connected by an edge if they show high expression correlation
A densely connected sub-graph → tight co-expression cluster
Clusters from a single microarray dataset include spurious links, and may not be homogeneous in function and regulation

Problem Formulation
A relation graph dataset contains n simple graphs, $D = \{G_1, G_2, \dots, G_n\}$ with $G_i = (V, E_i)$
– A common vertex set V is shared by the graphs
Support(G): the number of graphs in the relation graph dataset (D) in which G occurs
A graph is frequent if support(G) > threshold
Summary graph: an un-weighted graph extracted from D, where an edge exists only if it occurs in more than k graphs in D
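The support and summary-graph definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the presenter's code: the helper names (`edge`, `edge_support`, `summary_graph`) and the toy dataset are assumptions, and each graph is represented simply as a set of undirected edges over the shared vertex set.

```python
from collections import Counter

def edge(u, v):
    """Undirected edge as an order-independent key (hypothetical helper)."""
    return frozenset((u, v))

def edge_support(graphs):
    """Count, for every edge, how many graphs in the dataset contain it."""
    counts = Counter()
    for edges in graphs:
        counts.update(set(edges))
    return counts

def summary_graph(graphs, k):
    """Keep an edge only if it occurs in more than k graphs."""
    return {e for e, c in edge_support(graphs).items() if c > k}

# Toy relation graph dataset (assumed data): three graphs over vertices a-d.
D = [
    {edge("a", "b"), edge("b", "c")},
    {edge("a", "b"), edge("c", "d")},
    {edge("a", "b"), edge("b", "c")},
]
print(summary_graph(D, k=1))
```

Raising k makes the summary graph sparser; with the toy data, only the edge present in all three graphs survives k=2.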

Problem Formulation
Edge Support Vector: $w(e) = [w_{G_1}(e), w_{G_2}(e), \dots, w_{G_n}(e)]$, where $w_{G_i}(e)$ is the weight of edge e in graph i (for an un-weighted graph it would be 0 or 1)

Second-Order Graph: S, where each node represents an edge from the relation graph dataset (D), and an edge between nodes u and v exists if w(u) and w(v) are highly correlated
For efficiency, only construct the S graph for a sub-graph of the summary graph
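As a rough sketch of the second-order graph for the un-weighted case, the snippet below builds 0/1 edge support vectors and links two first-order edges when their vectors are close. The use of Euclidean distance with a `max_distance` cutoff, the function names, and the toy data are all assumptions made for illustration; the slides only require that w(u) and w(v) be "highly correlated".

```python
from itertools import combinations
from math import dist

def support_vector(e, graphs):
    """w(e): 1 where edge e is present in graph i, else 0 (un-weighted case)."""
    return [1 if e in g else 0 for g in graphs]

def second_order_graph(edges, graphs, max_distance):
    """Link two first-order edges whose support vectors are within max_distance."""
    vectors = {e: support_vector(e, graphs) for e in edges}
    links = set()
    for u, v in combinations(sorted(edges), 2):
        if dist(vectors[u], vectors[v]) <= max_distance:
            links.add((u, v))
    return links

# Assumed toy data: edges are sorted tuples of vertex names.
graphs = [
    {("a", "b"), ("b", "c")},
    {("a", "b"), ("b", "c")},
    {("c", "d")},
    {("a", "b"), ("b", "c"), ("c", "d")},
]
edges = {("a", "b"), ("b", "c"), ("c", "d")}
print(second_order_graph(edges, graphs, max_distance=1.0))
```

Here edges a-b and b-c always appear together, so they become neighbors in S, while c-d has a different occurrence pattern and stays isolated.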

Coherent Graph: a sub-graph extracted from the summary graph is coherent if
– All its edges have support > k
– Its second-order graph is dense
Graph Density: $\text{density} = \frac{2m}{n(n-1)}$
m: number of edges
n: number of nodes
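For a concrete feel for the density measure, here is a small sketch assuming the usual definition for a simple undirected graph, density = 2m / (n(n-1)), that is, the fraction of possible node pairs that are actually connected. The function name is a placeholder.

```python
def density(num_nodes, num_edges):
    """Fraction of possible undirected edges present: 2m / (n(n-1))."""
    if num_nodes < 2:
        return 0.0  # a graph with fewer than two nodes has no possible edges
    return 2 * num_edges / (num_nodes * (num_nodes - 1))

print(density(3, 3))  # triangle: every pair connected
print(density(4, 3))  # path on four nodes: half of the six possible pairs
```

A clique has density 1, so a density cutoff close to 1 asks for near-cliques.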

Two facts:
If a frequent sub-graph is dense, then it must be dense in the summary graph as well, but the converse does not always hold
If a sub-graph is coherent (its edges have high correlation across the dataset), then its second-order sub-graph is dense

Aggregate the graphs into a summary graph
Eliminate infrequent edges

MODES: Mining Overlapping DEnse Subgraphs
Developed based on HCS: Highly Connected Sub-graphs
Can efficiently identify dense sub-graphs
Can mine overlapping sub-graphs
Two approaches:
– Minimum cut
– Normalized cut (Shi, Malik 2000)
Apply the normalized cut in the initial steps of the HCS algorithm; then, if the partitions are small, proceed with the minimum cut


CODENSE analysis
Simplifies the identification of coherent dense sub-graphs across n graphs into mining in two special graphs: summary graph + second-order graph
Can mine network modules
Can mine both exact and approximate patterns (by modifying the similarity threshold)
Can be extended to weighted graphs (using Pearson correlation instead of Euclidean distance)

Experimental Study: co-expression network
39 yeast microarray datasets
6661 genes
Calculate the Pearson correlation between the expression levels (r) → construct the relation graph (connectivity of two genes determined by the Pearson correlation)
n: number of measurements
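The step from expression profiles to one co-expression graph might look roughly like the sketch below. It is only an illustration: the gene names and values are made up, and connecting two genes when r meets a fixed `r_cutoff` is an assumed rule, since the transcript does not preserve the exact criterion used in the study.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation r between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / sqrt(var_x * var_y)

def coexpression_graph(expression, r_cutoff):
    """Connect two genes when their correlation reaches r_cutoff (assumed rule)."""
    edges = set()
    for g1, g2 in combinations(sorted(expression), 2):
        if pearson(expression[g1], expression[g2]) >= r_cutoff:
            edges.add((g1, g2))
    return edges

# Hypothetical dataset: three genes measured under four conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
print(coexpression_graph(expression, r_cutoff=0.9))
```

Running this once per microarray dataset would yield the n graphs of the relation graph dataset, one per dataset, over a shared gene set.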

Create the summary graph, removing edges that occur fewer than 6 times across the 39 graphs
Apply MODES to identify dense sub-graphs: sub( ) with cutoff density d1
For each sub( ), construct the second-order graph S
Apply MODES to S to identify sub-graphs with density > d2
Transform the edges → vertices, and apply MODES again to identify the dense sub-graphs with density > d3

Functional Module Discovery: MODES vs CODENSE
A cluster is considered functionally homogeneous if:
1. The functional homogeneity, modeled by the hypergeometric distribution, is significant at a set level α
2. At least 40% of its member genes belong to a specific G.O. functional category
MODES identified 366 clusters, but only 151 were functionally homogeneous (42%)
CODENSE identified 770 clusters, 76% of which were homogeneous
The improvement is due to the second-order graph, which eliminates edges that do not show co-occurrence across all networks
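The two homogeneity criteria can be illustrated with a small sketch. It assumes the hypergeometric test is the usual upper-tail enrichment test (the probability of seeing at least the observed number of category members in a cluster drawn at random from the genome). The function names are placeholders, and `alpha` is left as a required argument because the significance level is not preserved in the transcript.

```python
from math import comb

def hypergeom_pvalue(population, annotated, cluster_size, hits):
    """P(X >= hits) when cluster_size genes are drawn from a population
    in which `annotated` genes belong to the category."""
    total = comb(population, cluster_size)
    upper = min(annotated, cluster_size)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, cluster_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total

def is_homogeneous(population, annotated, cluster_size, hits, alpha,
                   min_fraction=0.4):
    """Both criteria: significant enrichment and at least 40% membership."""
    enriched = hypergeom_pvalue(population, annotated, cluster_size, hits) <= alpha
    return enriched and hits / cluster_size >= min_fraction

# Small made-up numbers: 10 genes overall, 4 in the category,
# a cluster of 3 genes of which 2 are in the category.
print(hypergeom_pvalue(10, 4, 3, 2))
```

With realistic inputs the population would be the annotated genome and the test would be repeated for each functional category.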

Example of a MODES false positive:
MODES identified 5 genes (MSF1, PHB1, CBP4, NDI1, SCO2) that are not functionally homogeneous
Protein biosynthesis, replicative cell aging, mitochondrial electron transfer

Functional prediction:
CODENSE identified this 6-node sub-graph
5 genes belong to the “protein biosynthesis” category
Prediction: ASC1 is involved in protein biosynthesis as well
Test with 448 known genes: 50% accuracy
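The prediction idea (assign an unannotated gene the dominant function of its cluster) can be sketched as a simple majority rule. This is an assumed simplification: the function name, the reuse of a 40% cutoff, the one-category-per-gene representation, and the placeholder gene names g1 to g5 are hypothetical; only ASC1 and the "protein biosynthesis" category come from the slide.

```python
from collections import Counter

def predict_function(cluster, annotations, min_fraction=0.4):
    """Assign the cluster's dominant category to its unannotated genes."""
    known = [annotations[g] for g in cluster if g in annotations]
    if not known:
        return {}
    category, count = Counter(known).most_common(1)[0]
    if count / len(cluster) < min_fraction:
        return {}  # no category is dominant enough to transfer
    return {g: category for g in cluster if g not in annotations}

# Six-gene cluster: five annotated placeholder genes plus ASC1.
cluster = ["ASC1", "g1", "g2", "g3", "g4", "g5"]
annotations = {g: "protein biosynthesis" for g in cluster if g != "ASC1"}
print(predict_function(cluster, annotations))
```

Accuracy could then be estimated by hiding the annotation of each known gene in turn and checking whether the rule recovers it.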