Sharon Bruckner, Bastian Kayser, Tim Conrad Freie Uni. Berlin Finding Modules in Networks with Non-modular Regions.

Presentation transcript:

Sharon Bruckner, Bastian Kayser, Tim Conrad Freie Uni. Berlin Finding Modules in Networks with Non-modular Regions

What are networks with non-modular regions?
[Figure: example network with a modular region and a transition region labeled]

Where do NCC networks occur?
1. The network is actually fully partitionable, but contains noise.
2. The network structure is not strictly modular. It contains:
   1. Overlaps
   2. Outliers
   3. Paths and nodes connecting modules
3. Example: A protein-protein interaction network

Formalizing this notion of modularity
Goal: A score that quantifies how modular an NCC network is, analogous to Newman's Modularity.
Then: What's wrong with simply taking Newman Modularity?
Answer: Trees and very sparse graphs have high Newman Modularity (the slide's example is labeled 0.75).
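The point about trees can be checked with a small sketch. It assumes the standard definition of Newman Modularity for an unweighted graph, Q = sum over communities c of (e_c / m) - (d_c / 2m)^2, where m is the number of edges, e_c the number of edges inside c, and d_c the total degree of c. The path graph, the block partition, and the function name `newman_modularity` are illustrative choices, not taken from the slides.

```python
def newman_modularity(edges, community_of):
    """Newman Modularity of a partition of an unweighted, undirected graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to a community label.
    """
    m = len(edges)
    internal = {}    # community -> number of edges with both ends inside it
    degree_sum = {}  # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# A path on 64 nodes is a tree with no module structure at all.
n, block = 64, 8
path_edges = [(i, i + 1) for i in range(n - 1)]
# Cut the path into 8 contiguous blocks of 8 nodes each.
blocks = {i: i // block for i in range(n)}

q_path = newman_modularity(path_edges, blocks)
print(round(q_path, 3))
```

Even though a path has no dense regions, this arbitrary block partition already scores above 0.75, which illustrates why a high Newman Modularity alone does not certify modular structure.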

Transition matrices and gaps
In a discrete random walk on a network, the random walker moves at each step to a randomly chosen neighbor. This is encoded in the transition matrix. Clustering algorithms often rely on the gap in the eigenvalue spectrum of the transition matrix: the presence of eigenvalues close to 1, followed by a gap, indicates the presence of modules. For NCC networks, this is not enough. Therefore we introduce a transition matrix Pt that comes from a continuous random walk.
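A minimal sketch of these objects, using NumPy. The toy graph (two 5-node cliques joined by one edge) and all function names are illustrative. The continuous-time part is an assumption: it uses the common construction with generator L = P - I and Pt = exp(tL), and the slides do not say which generator the authors use.

```python
import numpy as np


def two_cliques_with_bridge(size):
    """Adjacency matrix of two cliques of `size` nodes joined by one edge."""
    n = 2 * size
    A = np.zeros((n, n))
    A[:size, :size] = 1.0
    A[size:, size:] = 1.0
    np.fill_diagonal(A, 0.0)
    A[size - 1, size] = A[size, size - 1] = 1.0  # the bridge
    return A


def transition_matrix(A):
    """Row-stochastic matrix P with P[i, j] = A[i, j] / degree(i)."""
    return A / A.sum(axis=1, keepdims=True)


def walk_spectrum(A):
    """Eigenvalues of P, largest first.

    P = D^-1 A has the same eigenvalues as the symmetric matrix
    D^-1/2 A D^-1/2, which can be handled by the symmetric solver.
    """
    d = A.sum(axis=1)
    S = A / np.sqrt(np.outer(d, d))
    return np.sort(np.linalg.eigvalsh(S))[::-1]


A = two_cliques_with_bridge(5)
P = transition_matrix(A)
eigenvalues = walk_spectrum(A)

# Assumed continuous-time variant: with generator L = P - I, the matrix
# P_t = exp(t L) has eigenvalues exp(t * (lambda - 1)) for each lambda of P.
t = 2.0
eigenvalues_t = np.exp(t * (eigenvalues - 1.0))

print(np.round(eigenvalues[:3], 3))
```

For this toy graph the two largest eigenvalues are 1 and about 0.93, and the third is slightly negative, so the spectrum shows two values close to 1 followed by a clear gap, matching the two cliques.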


First try: the gap score Exploit the connection between the number of eigenvalues close to 1 of the transition matrix and the number of modules.
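The connection between eigenvalues close to 1 and the number of modules can be illustrated with the generic eigengap heuristic. This is not the authors' gap score, whose formula is not in the transcript. It is only a sketch that counts the eigenvalues before the largest gap in the spectrum. The ring-of-cliques test graph and the function names are hypothetical.

```python
import numpy as np


def ring_of_cliques(num_cliques, size):
    """Cliques of `size` nodes; the last node of each clique is linked
    to the first node of the next clique, closing into a ring."""
    n = num_cliques * size
    A = np.zeros((n, n))
    for c in range(num_cliques):
        lo = c * size
        A[lo:lo + size, lo:lo + size] = 1.0
        nxt = ((c + 1) % num_cliques) * size
        A[lo + size - 1, nxt] = A[nxt, lo + size - 1] = 1.0
    np.fill_diagonal(A, 0.0)
    return A


def walk_spectrum(A):
    """Eigenvalues of the random-walk transition matrix, largest first."""
    d = A.sum(axis=1)
    S = A / np.sqrt(np.outer(d, d))  # symmetric, same spectrum as D^-1 A
    return np.sort(np.linalg.eigvalsh(S))[::-1]


def estimate_num_modules(eigenvalues):
    """Number of eigenvalues that precede the largest gap in the spectrum."""
    gaps = eigenvalues[:-1] - eigenvalues[1:]
    return int(np.argmax(gaps)) + 1


A = ring_of_cliques(num_cliques=3, size=5)
estimated = estimate_num_modules(walk_spectrum(A))
print(estimated)
```

On this clean three-module graph the heuristic recovers three modules. On a network with a large transition region the gap is typically less pronounced, which is the situation the slides address.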

Gap score: Analysis and results Main Advantage: This is a global score, not dependent on a partition

Second try: metastability score

Metastability score: Analysis Main Advantage: Explicitly takes into account the transition region, more suited for NCC networks.

Experiment results on the two scores For networks with two modules of size 100 with increasing density and a transition region of 1000 nodes.

Intermediate Conclusion
Both scores have some major disadvantages.
Currently developing an improved score that:
- takes the transition region into account,
- is provably "good", at least on some well-defined network classes,
- is dependent on the partition (so, not global), but the partition can be optimized.

Algorithms for identifying modules in NCC networks
We compare the behavior of 3 algorithms on a benchmark dataset of NCC networks:
MSM: The Markov State Model clustering algorithm first identifies and removes the transition region, and then deterministically clusters the remaining nodes.
SCAN: Clusters nodes together based on neighborhood similarity and reachability. Assigns nodes the roles of hubs and outliers.
MCL: The Markov Clustering algorithm simulates random walks on the network and identifies modules as regions where the random walker stays for a long time. Returns a full partition.
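Of the three algorithms, MCL is the simplest to sketch. The code below follows the textbook expansion/inflation loop of the Markov Clustering algorithm. It is not the implementation benchmarked in the talk, and the parameter values (`inflation=2.0`, the tolerances, the self-loops) are assumptions chosen for a toy graph.

```python
import numpy as np


def mcl(A, inflation=2.0, max_iter=100, tol=1e-6):
    """Textbook Markov Clustering on an adjacency matrix A.

    Returns a full partition as a sorted list of sorted node lists.
    """
    M = A + np.eye(len(A))                # add self-loops
    M = M / M.sum(axis=0, keepdims=True)  # make columns stochastic
    for _ in range(max_iter):
        expanded = M @ M                  # expansion: two random-walk steps
        inflated = expanded ** inflation  # inflation: favor strong flows
        new_M = inflated / inflated.sum(axis=0, keepdims=True)
        converged = np.allclose(new_M, M, rtol=0.0, atol=1e-12)
        M = new_M
        if converged:
            break

    # Rows with surviving mass act as attractors. Two nodes belong to the
    # same cluster if their columns share at least one attractor row.
    support = (M > tol).astype(int)
    share = (support.T @ support) > 0
    clusters, seen = [], set()
    for start in range(len(A)):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            u = stack.pop()
            if u in component:
                continue
            component.add(u)
            stack.extend(np.flatnonzero(share[u]).tolist())
        seen |= component
        clusters.append(sorted(component))
    return sorted(clusters)


# Toy graph: two 5-node cliques joined by a single edge.
size = 5
A = np.zeros((2 * size, 2 * size))
A[:size, :size] = 1.0
A[size:, size:] = 1.0
np.fill_diagonal(A, 0.0)
A[size - 1, size] = A[size, size - 1] = 1.0

clusters = mcl(A)
print(clusters)
```

Every node ends up in exactly one cluster, so on its own this procedure cannot leave nodes unassigned.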

Adjusting the algorithms
For SCAN, outliers and hubs are additionally assigned to the transition region.
Main Adjustment: Nodes in modules smaller than a threshold of 1% of all nodes are assigned to the transition region.
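The main adjustment can be sketched as a post-processing step on any partition. The label `TRANSITION`, the function name, and the choice of a strict "smaller than 1% of all nodes" comparison are assumptions for illustration. The slides give only the 1% threshold.

```python
from collections import Counter

TRANSITION = -1  # hypothetical label for nodes in the transition region


def move_small_modules_to_transition(labels, min_fraction=0.01):
    """Reassign nodes of modules smaller than `min_fraction` of all nodes.

    labels: dict mapping node -> module label (or TRANSITION).
    Returns a new dict; the input is left unchanged.
    """
    n = len(labels)
    sizes = Counter(lab for lab in labels.values() if lab != TRANSITION)
    min_size = min_fraction * n
    return {
        node: lab if lab != TRANSITION and sizes[lab] >= min_size else TRANSITION
        for node, lab in labels.items()
    }


# 1200 nodes: two large modules and one tiny module of 10 nodes.
labels = {}
labels.update({i: 0 for i in range(0, 590)})
labels.update({i: 1 for i in range(590, 1190)})
labels.update({i: 2 for i in range(1190, 1200)})

adjusted = move_small_modules_to_transition(labels)
num_transition = sum(1 for lab in adjusted.values() if lab == TRANSITION)
print(num_transition)
```

Here the threshold is 12 nodes (1% of 1200), so only the 10-node module is moved to the transition region. This turns the full partition returned by an algorithm such as MCL into a partial one.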

How can we evaluate the results?

Experiment 1: varying sizes
MSM performs best. The metastability score behaves similarly to the algorithm score.

Experiment 2: varying densities of modules and transition region

Experiment 3: A PPI network We identified modules in the yeast FYI network. Our modules correspond to known protein complexes from the CYC2008 database. We are working on assigning roles to the nodes in the transition region.

Summary and Outlook Clustering networks such that not all nodes are assigned to modules is useful. We presented two scores to quantify how modular a network is, and showed that there is room for improvement. We compared the performance of 3 algorithms on the task of identifying modules in NCC networks.