Melbourne, Australia, Oct., 2015 gSparsify: Graph Motif Based Sparsification for Graph Clustering Peixiang Zhao Department of Computer Science Florida.

Presentation transcript:

Melbourne, Australia, Oct., 2015 gSparsify: Graph Motif Based Sparsification for Graph Clustering Peixiang Zhao Department of Computer Science Florida State University

Synopsis
Introduction
gSparsify: Graph motif based sparsification
  – Cluster significance
  – Path-based indexing and computation
Experiments
Conclusions

Introduction
Graphs:
  – A generic model and ubiquitous abstraction for correlated/inter-connected data
  – Examples: social networks, bioinformatics, business intelligence, scientific computation, and the Web
Graph clustering:
  – Partition the vertices of a graph into a series of clusters with the objective of optimizing intra-cluster density and inter-cluster sparsity
  – Applications: community detection, visualization, ranking, and search

Challenges and Graph Sparsification Solutions
Existing challenges
1. Real-world graphs are massive in scale
  – Many graph clustering solutions scale poorly to large graphs
2. Real-world graphs are "dirty"
  – Many extremely tangled, noisy edges easily obfuscate the intrinsic cluster properties of graphs
Graph sparsification
  – Simplify (reduce) the input graph G(V, E) into another graph G'(V, E') where |E'| << |E|
  – Noisy edges are eliminated while the crucial structures of the graph are well preserved
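The reduction from G(V, E) to G'(V, E') can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `sparsify`, the `keep_ratio` parameter, and the top-fraction selection rule are hypothetical, and the common-neighbour score is a simple stand-in rather than the cluster significance measure gSparsify uses.

```python
import math
from collections import defaultdict


def sparsify(edges, score, keep_ratio):
    """Return E': the highest-scoring fraction of the edge list E.

    Assumption: edges are ranked by `score` and the top `keep_ratio`
    fraction is kept. The slides do not state the selection rule.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    ranked = sorted(edges, key=score, reverse=True)
    keep = math.ceil(keep_ratio * len(ranked))
    return ranked[:keep]


# Toy graph: a triangle {0, 1, 2} with one pendant edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]

neighbours = defaultdict(set)
for a, b in edges:
    neighbours[a].add(b)
    neighbours[b].add(a)


def common_neighbours(edge):
    """Placeholder edge score: number of neighbours shared by both endpoints."""
    a, b = edge
    return len(neighbours[a] & neighbours[b])


# The vertex set is unchanged; only the edge set shrinks.
sparse_edges = sparsify(edges, common_neighbours, keep_ratio=0.75)
```

On this toy graph the pendant edge scores lowest and is the one dropped, so the triangle survives intact.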

Sparsification Based Graph Clustering
[Figure: pipeline diagram. Labels: Graph Sparsification; Graph Clustering Algorithm A; Graph Clusters C; Graph Clusters C'; Verification; More Efficient!]

Wait. Technical Questions Arise Here
Graph sparsification for graph clustering
1. How can we differentiate "significant" edges from "insignificant" ones?
2. How can we quantify and compute such "edge importance" efficiently?
3. How do we sparsify the graph?
4. Does the resulting sparsified graph G' still preserve the clustering properties of the original graph G, and to what extent?

gSparsify
Goal
  – Sparsify G so that cluster-significant edges are retained, while edges with little or no clustering insight are filtered out
Ideas
  – Structure-aware, graph motif based cluster significance
  – Path-based indexing for short-length cycle motif enumeration
Results
  – An effective preprocessing step for existing graph clustering techniques
  – Significant speedup with no compromise in clustering quality

A Motivating Example
[Figure: gSparsify applied to an example graph]
  – G with the hair-ball structure: |V| = 34, |E| = 127
  – Sparsified G' with four core clusters revealed: |V| = 34, |E'| = 48

Graph Motifs: What and Why
Graph motifs
  – Small, connected graphs encoding local graph structures
  – Elementary features representing key structure-aware functionalities of graphs

Graph Motifs: What and Why
Evidence: clusters are often dense subgraphs involving many small graph motifs such as cycles
1. An intra-cluster edge is more likely than an inter-cluster edge to lie within closed motifs (cycles)
2. Cycles are the simplest position-insensitive motifs, and are thus easier to enumerate and quantify
3. Many complex motifs are composed of cycles
We use cycle motifs to quantify the "significance" of edges in terms of graph clustering

Cluster Significance
We quantify the cluster significance of an edge e in terms of basic cycle motifs
1. Count-based significance
2. (Normalized) ratio-based significance
[Formulas not reproduced in the transcript; their annotations read:]
  – The number of cycles of length l encompassing e
  – The number of paths of length l penetrating e
For l ≤ l_0, we aggregate the cluster significance scores of e in order to quantify how often e is involved in a series of cycle motifs
The higher the cluster significance scores of e, the more likely e is an intra-cluster edge!
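As a rough illustration of the count-based idea, the Python sketch below counts the simple cycles of a given length that contain an edge (u, v) by brute-force depth-first search. It rests on several assumptions: the count-based score is read here as the number of cycles of length l encompassing e, the aggregation over l ≤ l_0 is taken to be a plain sum (the slides do not give the aggregation function), and the function names are hypothetical. It is not the path-based indexing scheme the talk describes.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Adjacency sets for an undirected simple graph given as vertex pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def count_cycles_through_edge(adj, u, v, length):
    """Count simple cycles with `length` edges that contain the edge (u, v).

    Such a cycle is the edge (u, v) plus a simple path from v back to u
    with length - 1 edges, so each cycle is counted exactly once.
    """
    if length < 3 or v not in adj.get(u, ()):
        return 0

    def extend(node, remaining, visited):
        if remaining == 1:
            # The last step must close the cycle at u.
            return 1 if u in adj[node] else 0
        total = 0
        for nxt in adj[node]:
            if nxt not in visited:
                visited.add(nxt)
                total += extend(nxt, remaining - 1, visited)
                visited.remove(nxt)
        return total

    return extend(v, length - 1, {u, v})


def cluster_significance(adj, u, v, max_length):
    """Assumed aggregation: sum of cycle counts for lengths 3..max_length."""
    return sum(count_cycles_through_edge(adj, u, v, l)
               for l in range(3, max_length + 1))


# Complete graph on {0, 1, 2, 3} plus a pendant edge (3, 4).
k4_plus_tail = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (3, 4)]
adj = build_adjacency(k4_plus_tail)
dense_score = cluster_significance(adj, 0, 1, max_length=4)
tail_score = cluster_significance(adj, 3, 4, max_length=4)
```

The edge inside the dense part lies on two triangles and two 4-cycles, while the pendant edge lies on no cycle at all, which matches the intuition that intra-cluster edges score higher.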

Cluster Significance: An Example

Cluster Significance: How to Compute

Cluster Significance: How to Compute
[Figure annotations:]
  – Three cycles of length 4 encompassing (u, v)
  – Seven cycles of length 5 encompassing (u, v)

gSparsify: The Algorithm

Experiments
Datasets
  – Yeast PPI network, DBLP, Orkut
Graph clustering methods
  – METIS, Graclus, MCL
Evaluation metrics
1. Sparsification ratio
2. Clustering quality (F-score, graph conductance)
3. Speedup for graph clustering
Compared with L-Spar
  – Satuluri et al., SIGMOD'11 (triangle motif with MinHash)
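Two of the metrics listed above can be sketched in a few lines of Python. The slides do not define them, so both definitions here are assumptions: the sparsification ratio is taken as |E'| / |E|, and conductance uses the common form cut(S, V \ S) / min(vol(S), vol(V \ S)), where vol is the sum of vertex degrees. Function names are illustrative.

```python
def sparsification_ratio(num_edges_original, num_edges_sparsified):
    """Assumed definition: fraction of the original edges that are kept."""
    return num_edges_sparsified / num_edges_original


def conductance(edges, cluster):
    """Conductance of `cluster` in an undirected graph given as vertex pairs."""
    cluster = set(cluster)
    cut = 0
    vol_in = 0   # sum of degrees of vertices inside the cluster
    vol_out = 0  # sum of degrees of vertices outside the cluster
    for a, b in edges:
        a_in, b_in = a in cluster, b in cluster
        vol_in += a_in + b_in
        vol_out += (not a_in) + (not b_in)
        if a_in != b_in:
            cut += 1
    smaller = min(vol_in, vol_out)
    return cut / smaller if smaller else 0.0


# Two triangles joined by a single bridge edge (2, 3).
two_triangles = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
phi = conductance(two_triangles, {0, 1, 2})

# Edge counts from the motivating example: |E| = 127, |E'| = 48.
ratio = sparsification_ratio(127, 48)
```

Lower conductance indicates a better-separated cluster; for the toy graph, one cut edge against a volume of 7 gives 1/7.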

Experimental Results

Experimental Results

Conclusions
Graph sparsification
  – Identify and preferentially retain cluster-significant edges from a graph G in a sparsified graph G'
Graph motif based cluster significance
  – Short-length cycles to quantify structure significance
  – Path-based indexing and join to facilitate the computation
Future directions
1. More efficient graph motif enumeration methods
2. More complicated graph motifs
3. Sparsification for other graph computational tasks

Thank you! Q & A