Capturing Topology in Graph Pattern Matching
Shuai Ma, Yang Cao, Wenfei Fan, Jinpeng Huai, Tianyu Wo
University of Edinburgh

Presentation transcript:

Graphs are everywhere, and quite a few are huge graphs!
–File systems
–Databases
–World Wide Web
–Social networks
Graph searching is a key to social search engines!

Graph Pattern Matching
Given two directed graphs G1 (pattern graph) and G2 (data graph),
–decide whether G1 “matches” G2 (Boolean queries);
–identify “subgraphs” of G2 that match G1.
Applications
–Web mirror detection / Web site classification
–Complex object identification
–Software plagiarism detection
–Social network/biology analyses
–…
Matching models
–Traditional: subgraph isomorphism
–Emerging applications: graph simulation and its extensions, etc.
A variety of emerging real-life applications!

Subgraph Isomorphism
Given pattern graph Q and a subgraph G_s of data graph G, Q matches G_s if there exists a bijective function f: V_Q → V_Gs such that
–for each node u in Q, u and f(u) have the same label;
–(u, u′) is an edge in Q if and only if (f(u), f(u′)) is an edge in G_s.
Goodness:
–Keeps the exact structure topology between Q and G_s
Badness:
–May return exponentially many matched subgraphs
–The decision problem is NP-complete
–In certain scenarios, too restrictive to find matches
These hinder the usability in emerging applications, e.g., social networks.
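
The definition above can be illustrated with a brute-force sketch. It is not an algorithm from the talk: it simply tries every injective assignment of pattern nodes to data nodes, which takes exponential time in general. The function name, the dictionary-based graph encoding and the toy graphs are assumptions made for illustration. Because G_s may be any subgraph of G, the sketch takes G_s to be the image of Q, so it only checks that every pattern edge maps to a data edge.

```python
from itertools import permutations

def subgraph_isomorphisms(q_labels, q_edges, g_labels, g_edges):
    """Yield each injective mapping f (pattern node -> data node) such that
    labels agree and every pattern edge (u, w) maps to a data edge (f(u), f(w)).
    The matched subgraph G_s is the image of Q under f."""
    g_edge_set = set(g_edges)
    q_nodes = list(q_labels)
    # Try every ordered choice of |V_Q| distinct data nodes (exponential in general).
    for image in permutations(list(g_labels), len(q_nodes)):
        f = dict(zip(q_nodes, image))
        if any(q_labels[u] != g_labels[f[u]] for u in q_nodes):
            continue  # label condition violated
        if all((f[u], f[w]) in g_edge_set for u, w in q_edges):
            yield f

# Hypothetical toy graphs: labels are node -> label, edges are directed pairs.
q_labels = {"a": "A", "b": "B"}
q_edges = [("a", "b")]
g_labels = {1: "A", 2: "B", 3: "B"}
g_edges = [(1, 2), (1, 3)]
matches = list(subgraph_isomorphisms(q_labels, q_edges, g_labels, g_edges))
```

Even on this three-node data graph the pattern has two distinct matches, which hints at why the number of matched subgraphs can blow up on large graphs.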

Graph Simulation
Given pattern graph Q(V_q, E_q) and data graph G(V, E), a binary relation R ⊆ V_q × V is said to be a match if
–(1) for each (u, v) ∈ R, u and v have the same label; and
–(2) for each (u, v) ∈ R and each edge (u, u′) ∈ E_q, there exists an edge (v, v′) in E such that (u′, v′) ∈ R.
Graph G matches pattern Q via graph simulation if there exists a total match relation M:
–for each u ∈ V_q, there exists v ∈ V such that (u, v) ∈ M.
Goodness:
–Solvable in quadratic time
–Returns a single, unique matched subgraph
Badness:
–Loses structure topology (how much? open question)
Subgraph isomorphism (NP-complete) vs. graph simulation (O(n^2))!
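
Conditions (1) and (2) can be turned into a small fixpoint computation: start from all label-compatible pairs and repeatedly drop pairs that violate the child condition. The sketch below is a naive refinement loop, not the quadratic-time algorithm referred to on the slide; the function name and the graph encoding are assumptions made for illustration.

```python
def graph_simulation(q_labels, q_edges, g_labels, g_edges):
    """Return the maximum match relation as {pattern node: set of data nodes},
    or None if some pattern node ends up with no match (G does not match Q)."""
    q_succ = {u: set() for u in q_labels}
    for u, w in q_edges:
        q_succ[u].add(w)
    g_succ = {v: set() for v in g_labels}
    for v, w in g_edges:
        g_succ[v].add(w)

    # Condition (1): start from every label-compatible pair.
    sim = {u: {v for v in g_labels if g_labels[v] == q_labels[u]} for u in q_labels}

    # Condition (2): remove v from sim[u] while some pattern child w of u
    # has no match among the children of v.
    changed = True
    while changed:
        changed = False
        for u in q_labels:
            for v in list(sim[u]):
                if any(not (g_succ[v] & sim[w]) for w in q_succ[u]):
                    sim[u].discard(v)
                    changed = True

    if any(not matched for matched in sim.values()):
        return None
    return sim

# Hypothetical toy example: node 4 is an isolated B node with no A parent.
result = graph_simulation(
    {"a": "A", "b": "B"}, [("a", "b")],
    {1: "A", 2: "B", 3: "A", 4: "B"}, [(1, 2)],
)
# result == {"a": {1}, "b": {2, 4}}
```

Node 4 is kept as a match of b even though nothing points to it, because simulation only checks child relationships. This is one way the matched subgraph can end up disconnected.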

Graph Simulation
Subgraph isomorphism is too strict for emerging applications!
Example: set up a team to develop a new software product.
–Graph simulation returns F3, F4 and F5;
–Subgraph isomorphism returns empty!

Graph Simulation Loses Structures
–Connected pattern graphs match disconnected subgraphs
–Cyclic pattern graphs match tree subgraphs
Example (pattern Q, matched subgraph G_s): S(HR) = {HR}, S(SE) = {SE}, S(Bio) = {Bio1, Bio2}
These motivate us to propose a new matching model!

Strong Simulation: A New Model
Strong Simulation = Graph Simulation + Duality + Locality
Striking a balance between expressiveness and complexity!
Duality (dual simulation)
–Both child and parent relationships
–Simulation considers only child relationships
Locality
–Restricting matches within a ball
–When social distance increases, the closeness of relationships decreases and the relationships may become irrelevant
The semantics of strong simulation is well defined
–The matching results are unique

Duality and Locality
Pattern graph Q matches data graph G via dual simulation if there exists a binary match relation S ⊆ V_q × V such that:
–for each (u, v) ∈ S, u and v have the same label; and
–for each u ∈ V_q, there exists v ∈ V such that (u, v) ∈ S; and
–for each (u, v) ∈ S and each edge (u, u1) in E_q, there is an edge (v, v1) in E with (u1, v1) ∈ S (child relationships, as in graph simulation); and
–for each (u, v) ∈ S and each edge (u2, u) in E_q, there is an edge (v2, v) in E with (u2, v2) ∈ S (parent relationships).
Locality: the matched subgraph must be a connected subgraph
–falling into a ball with center v and radius d_Q (the diameter of Q)
–containing the ball center v
Dual simulation: bring duality into graph simulation!
Strong simulation: bring locality into dual simulation!
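
The dual simulation conditions can be sketched in the same fixpoint style as plain simulation, with one extra check per pattern edge for the parent direction. This is a naive illustration under assumed names and an assumed dictionary-based graph encoding, not the authors' implementation.

```python
def dual_simulation(q_labels, q_edges, g_labels, g_edges):
    """Return the maximum dual-simulation relation as {pattern node: set of data
    nodes}, or None if some pattern node has no match."""
    g_succ = {v: set() for v in g_labels}
    g_pred = {v: set() for v in g_labels}
    for v, w in g_edges:
        g_succ[v].add(w)
        g_pred[w].add(v)

    # Start from all label-compatible pairs, then refine to a fixpoint.
    sim = {u: {v for v in g_labels if g_labels[v] == q_labels[u]} for u in q_labels}
    changed = True
    while changed:
        changed = False
        for u, w in q_edges:
            # Child condition: a match of u needs a successor that matches w.
            keep_u = {v for v in sim[u] if g_succ[v] & sim[w]}
            if keep_u != sim[u]:
                sim[u] = keep_u
                changed = True
            # Parent condition: a match of w needs a predecessor that matches u.
            keep_w = {v for v in sim[w] if g_pred[v] & sim[u]}
            if keep_w != sim[w]:
                sim[w] = keep_w
                changed = True

    if any(not matched for matched in sim.values()):
        return None
    return sim

# Hypothetical toy example: node 4 is an isolated B node with no A parent.
result = dual_simulation(
    {"a": "A", "b": "B"}, [("a", "b")],
    {1: "A", 2: "B", 3: "A", 4: "B"}, [(1, 2)],
)
# result == {"a": {1}, "b": {2}}
```

On this input plain graph simulation would also accept node 4 as a match of b; the parent condition removes it because no A node points to it.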

Properties of Strong Simulation
Strong simulation preserves more topology structures!
–Connectivity: connected pattern graphs match disconnected subgraphs (×)
–Cycles: cyclic pattern graphs match tree subgraphs (×)
Example (pattern Q, subgraph G_s): S(HR) = {HR}, S(SE) = {SE}, S(Bio) = {Bio1, Bio2}

Properties of Strong Simulation
Strong simulation preserves more topology structures!
Locality: the diameter of matched subgraphs is bounded by 2*d_Q
–Graph simulation does NOT have this property
–Subgraph isomorphism does have this property
Bounded matches: the number of matched subgraphs is bounded by |V|
–Graph simulation finds at most one matched subgraph
–Subgraph isomorphism may find an exponential number of matched subgraphs
Bounded cycles: the length of cycles in matched subgraphs is bounded by the ones in pattern graphs
–Graph simulation and strong simulation do NOT have this property
–Subgraph isomorphism does have this property!

Properties of Strong Simulation
A balance between expressiveness and complexity!

Properties of Strong Simulation
–If Q matches G via subgraph isomorphism, then Q matches G via strong simulation
–If Q matches G via strong simulation, then Q matches G via dual simulation
–If Q matches G via dual simulation, then Q matches G via graph simulation
Subgraph isomorphism ⇒ strong simulation ⇒ dual simulation ⇒ graph simulation

Algorithms for Strong Simulation
A cubic-time algorithm
–Graph simulation: quadratic time
–Subgraph isomorphism: NP-complete
A distributed algorithm
–Using the data locality property
–Real-life graphs are typically distributed
Connectivity theorem: if Q matches G via dual simulation, then for any connected component G_c of the match graph w.r.t. the maximum match relation of Q and G,
–Q matches G_c, and
–G_c is exactly the match graph w.r.t. the maximum match relation of Q and G_c.
Nontrivial extension of the algorithm for graph simulation!
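
Putting duality, locality and the connectivity theorem together, one possible reading of strong simulation is: for each data node v, take the ball of radius d_Q around v, run dual simulation inside the ball, build the match graph, and keep the connected component that contains v. The sketch below follows that reading. It is a naive illustration, not the cubic-time algorithm from the talk, and it assumes that Q is connected, that distances ignore edge direction, and that the match graph contains exactly the data edges that realise some pattern edge. All function names are hypothetical.

```python
from collections import deque

def undirected_adj(nodes, edges):
    adj = {v: set() for v in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def bfs_dist(adj, src, limit=None):
    """Undirected hop distances from src, optionally cut off at `limit` hops."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        x = queue.popleft()
        if limit is not None and dist[x] == limit:
            continue
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return dist

def dual_simulation(q_labels, q_edges, g_labels, g_edges):
    succ = {v: set() for v in g_labels}
    pred = {v: set() for v in g_labels}
    for v, w in g_edges:
        succ[v].add(w)
        pred[w].add(v)
    sim = {u: {v for v in g_labels if g_labels[v] == q_labels[u]} for u in q_labels}
    changed = True
    while changed:
        changed = False
        for u, w in q_edges:
            keep_u = {v for v in sim[u] if succ[v] & sim[w]}   # child condition
            keep_w = {v for v in sim[w] if pred[v] & keep_u}   # parent condition
            if keep_u != sim[u] or keep_w != sim[w]:
                sim[u] = keep_u
                sim[w] = sim[w] & keep_w   # stays correct for self-loops (u == w)
                changed = True
    return None if any(not m for m in sim.values()) else sim

def strong_simulation(q_labels, q_edges, g_labels, g_edges):
    """Return a list of distinct matched subgraphs as (node set, edge set) pairs."""
    q_adj = undirected_adj(q_labels, q_edges)
    d_q = max(max(bfs_dist(q_adj, u).values()) for u in q_labels)  # diameter of Q
    g_adj = undirected_adj(g_labels, g_edges)
    seen, results = set(), []
    for v in g_labels:
        ball = set(bfs_dist(g_adj, v, d_q))                  # locality
        ball_labels = {x: g_labels[x] for x in ball}
        ball_edges = [(a, b) for a, b in g_edges if a in ball and b in ball]
        sim = dual_simulation(q_labels, q_edges, ball_labels, ball_edges)  # duality
        if sim is None:
            continue
        m_nodes = set().union(*sim.values())
        if v not in m_nodes:
            continue                                         # must contain the center
        m_edges = {(a, b) for a, b in ball_edges
                   for u, w in q_edges if a in sim[u] and b in sim[w]}
        comp = set(bfs_dist(undirected_adj(m_nodes, m_edges), v))  # connectivity
        comp_edges = {e for e in m_edges if e[0] in comp}
        key = (frozenset(comp), frozenset(comp_edges))
        if key not in seen:
            seen.add(key)
            results.append((comp, comp_edges))
    return results

# Hypothetical toy example: a 2-cycle pattern against a cycle {1, 2} and a lone edge 3 -> 4.
matches = strong_simulation(
    {"a": "A", "b": "B"}, [("a", "b"), ("b", "a")],
    {1: "A", 2: "B", 3: "A", 4: "B"}, [(1, 2), (2, 1), (3, 4)],
)
# matches == [({1, 2}, {(1, 2), (2, 1)})]
```

Each ball yields at most one matched subgraph, which is why the number of results stays bounded by the number of data nodes. Duplicates found from different ball centers are reported once.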

Optimization Techniques
Minimizing pattern graphs (Q ≡ Q_m): a quadratic-time algorithm
–Given pattern graph Q, we compute a minimized equivalent pattern graph Q_m such that for any data graph G, G matches Q iff G matches Q_m, via strong simulation.
Dual simulation filtering
–First compute the matched subgraph of dual simulation,
–Then project on each ball of the data graph
Connectivity pruning
–Based on the connectivity theorem

Experimental Study
Real-life datasets:
–Amazon product co-buy network: 548,552 nodes and 1,788,725 edges
–YouTube video network: 155,513 nodes and 3,110,120 edges
Synthetic graph generator (10^7 nodes and 251,188,643 edges), with three parameters:
–1. the number n of nodes;
–2. the number n^α of edges; and
–3. the number l of node labels
Machines: PCs with Intel Core i7 860 CPUs and 16GB memory
Algorithms:
–Strong simulation algorithm Match and its optimized version Match+
–Graph simulation algorithm Sim [HHK, FOCS 95]
–Approximate matching algorithm TALE [TP, ICDE 08]
–Maximum common subgraph algorithm VF2 [CN, 2006]
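
A generator controlled by the three parameters above might look like the sketch below. The talk does not say how edges or labels are distributed, so drawing both uniformly at random is an assumption, as are the function name and the seed parameter. As a sanity check on the quoted sizes, n = 10^7 with α = 1.2 gives n^α = 10^8.4 ≈ 2.51 × 10^8, in line with the 251,188,643 edges listed.

```python
import random

def synthetic_graph(n, alpha, num_labels, seed=0):
    """Generate a directed graph with n nodes, about n**alpha edges and
    num_labels distinct node labels (uniform random choices are an assumption)."""
    rng = random.Random(seed)
    labels = {v: rng.randrange(num_labels) for v in range(n)}
    # Cap the target at the number of possible edges without self-loops.
    target = min(round(n ** alpha), n * (n - 1))
    edges = set()
    while len(edges) < target:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            edges.add((a, b))
    return labels, edges

# Small hypothetical instance: 100 nodes, alpha = 1.2, 5 labels.
labels, edges = synthetic_graph(100, 1.2, 5)
```

Rejection sampling into a set is fine for sparse graphs like these; for targets close to n(n − 1) a different sampling strategy would be needed.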

Experiments - Quality
The results of strong simulation are more realistic!

Experiments - Quality
The results of strong simulation are more compact!

Experiments - Quality
–70%-80% found by subgraph isomorphism are retrieved by strong simulation
–Up to 50% found by subgraph isomorphism are retrieved by graph simulation

Experiments - Quality
Strong simulation effectively reduces the number of match results!

Experiments - Quality
The sizes of matched subgraphs are small!
Graph simulation (returns a single graph)
–Amazon: 103
–YouTube: 177
–Synthetic: 311
Pattern graphs have 10 nodes

Experiments - Efficiency
1. Our algorithms scale well;
2. Optimization techniques are effective (they reduce the running time by about 1/3);
3. The time gap between Sim and Match is tolerable, considering the matching quality that Match improves.

Summary
A new matching model with a balance between complexity and expressiveness
We identify a set of criteria for topology preservation, and show that strong simulation preserves the topology of pattern graphs and data graphs.
–Children, Parent, Connectivity, Cycles, Bounded matches
–Bounded cycles, Bisimilarity
We present the locality property of strong simulation, which allows us to effectively conduct pattern matching on distributed graphs.
We have proposed and investigated strong simulation to rectify the problems of subgraph isomorphism and graph simulation.
–the duality to preserve the parent relationships
–the locality to eliminate excessive matches
We show that strong simulation retains the same complexity as earlier extensions of simulation (a cubic-time algorithm).
–Optimizations: minimization, dual simulation filtering, connectivity pruning