Yinghui Wu, SIGMOD 2012 Query Preserving Graph Compression Wenfei Fan 1,2 Jianzhong Li 2 Xin Wang 1 Yinghui Wu 1,3 1 University of Edinburgh 2 Harbin Institute.

Presentation transcript:

Query Preserving Graph Compression
Yinghui Wu, SIGMOD 2012
Wenfei Fan (1,2), Jianzhong Li (2), Xin Wang (1), Yinghui Wu (1,3)
(1) University of Edinburgh; (2) Harbin Institute of Technology; (3) University of California, Santa Barbara

Querying Real-life Graphs
- Real-life graphs as "Big Data"
- Complexities of several common graph queries:
  - NP-complete for subgraph isomorphism
  - Quadratic time for simulation queries
  - Cubic time for bounded simulation queries
  - O(|V|+|E|) for reachability queries
- Indexing techniques:

Index        Query time      Index time             Index size
TC           O(1)            O(|V||E|)              O(|V|^2)
GRIPP        O(|E|-|V|)      O(|V|+|E|)
Tree Cover   O(log|V|)       O(|V||E|)              O(|V|^2)
2-Hop        O(|E|^(1/2))    O(|V|^3 |TC|)          O(|V||E|^(1/2))
3-Hop        O(log|V| + k)   O(k|V|^2 |Con(G)|)     O(|V|k)

- Querying real-life graphs is prohibitively expensive, and the cost is theoretically hard to reduce!

Graph compression techniques
- General graph compression
  - encoding via node ordering
  - extrinsic information-dependent
  - lossless compression
- Query-friendly compression (e.g., for neighborhood queries)
  - construct compact data structures
  - require decompression and algorithm revision
- These require decompression or revision of evaluation algorithms.
- Compression for a query class?

Querying a recommendation network
[Figure: a recommendation network G with nodes MSA, BSA, FA and C, a pattern query Qp, and the compressed graph Gr]
- Directly querying a compressed graph
- Preserving only the information relevant to queries

Outline
- Query preserving graph compression: compress graphs while preserving query results
- Reachability preserving compression
- Graph pattern preserving compression
- Incremental query preserving compression
- Experimental study
- Conclusion

Query-preserving compression
Compression relative to a class of queries of the users' choice.
A query preserving graph compression is a triple <R, F, P>, where
- R is a compression function,
- F: Lq -> Lq is a query rewriting function, where Lq denotes a class of graph queries (the rewritten query stays in the same class),
- P is a post-processing function,
such that for any graph G, Gr = R(G) satisfies:
- for all Q ∈ Lq, Q(G) = P(Q'(Gr)), where Q' = F(Q); and
- any query evaluation algorithm for Q can be directly used to compute Q'(Gr), without decompressing Gr.
Indexing and optimization techniques can be directly applied to Gr.
The compression is lossy, and Gr is not necessarily a subgraph of G. The aim is to query Gr directly without decompression, rather than to restore the original graph.

Query-preserving compression
[Figure: G is compressed to Gr by R (compression); Q is turned into Q' by query rewriting; Q' is evaluated directly on Gr (direct querying); Q(G) is obtained from Q'(Gr) by P (post-processing)]
- Generic, once-for-all compression

A tale of two queries…
Reachability preserving compression
- QR: reachability queries
- R reduces G by 95% on average, in O(|V||E|) time
- F is in O(1) time
- P: not needed
Graph pattern preserving compression
- QP: graph pattern queries
- R reduces G by 57% on average, in O(|E| log|V|) time
- F: identity mapping
- P: linear time

Reachability preserving compression
Query preserving compression for reachability queries:
- R is in quadratic time
- F is in constant time
- no post-processing P is required
Reachability equivalence relation
- Reachability relation Re: a node pair (u,v) ∈ Re iff u and v have the same set of ancestors and the same set of descendants in G.
- For any graph G, there is a unique maximum Re, i.e., the reachability equivalence relation of G.

Reachability preserving compression
A reachability preserving compression for G:
- R maps each node v in G to its reachability equivalence class [v] in Gr, and each edge to an edge between two equivalence classes (if necessary). Nodes in Gr denote equivalence classes.
- F maps each node in QR to its equivalence class in Gr.
Correctness:
- |Gr| ≤ |G|
- For any query QR(v,w) over G, v can reach w iff R(v) can reach R(w) in Gr.
Reduction: 95% on average for reachability queries.

Reachability preserving compression: algorithm and example
[Figure: the recommendation network G (nodes MSA, BSA, FA, C), a reachability query QR, and the compressed graph Gr]
1. Compute Re and its induced partition
2. Construct a node for each node set in the partition
3. Construct Gr
Overall cost: O(|V||E|)
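The three steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`descendants`, `compress_reachability`, `reaches`) are hypothetical, ancestors and descendants are taken over non-empty paths, and the sketch keeps every edge between classes instead of dropping redundant ones as the slides suggest with "if necessary". Running one traversal per node matches the O(|V||E|) bound quoted on the slide.

```python
from collections import defaultdict


def descendants(adj, nodes):
    """For each node in `nodes`, the set of nodes reachable by a non-empty path."""
    out = {}
    for s in nodes:
        seen, stack = set(), list(adj.get(s, ()))
        while stack:
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(adj.get(x, ()))
        out[s] = frozenset(seen)
    return out


def compress_reachability(nodes, edges):
    """Return (node -> class id, set of edges between class ids)."""
    adj, radj = defaultdict(set), defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        radj[v].add(u)
    desc = descendants(adj, nodes)
    anc = descendants(radj, nodes)  # ancestors = descendants in the reversed graph

    # Step 1: nodes with identical (ancestors, descendants) are equivalent.
    class_ids = {}
    node_class = {}
    for v in nodes:
        key = (anc[v], desc[v])
        node_class[v] = class_ids.setdefault(key, len(class_ids))

    # Steps 2-3: one node per class; map every edge of G onto the classes.
    # Assumption: redundant edges are kept (no transitive reduction).
    gr_edges = {(node_class[u], node_class[v]) for u, v in edges}
    return node_class, gr_edges


def reaches(edges, s, t):
    """Plain traversal-based reachability check, usable on G and on Gr alike."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
    return t in descendants(adj, [s])[s]
```

A query QR(v, w) on G is then answered as `reaches(gr_edges, node_class[v], node_class[w])`, which mirrors the constant-time rewriting F and the absence of a post-processing step.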

Graph Pattern Preserving Compression
Graph pattern preserving compression, in which for any graph G(V,E,L):
- R is in O(|E| log|V|) time,
- F is the identity mapping,
- P is in linear time in the size of the query answer.
Bisimulation relation: a binary relation B over V of G, such that for each node pair (u,v) ∈ B,
- L(u) = L(v),
- for each edge (u,u') ∈ E, there exists (v,v') ∈ E such that (u',v') ∈ B,
- for each edge (v,v') ∈ E, there exists (u,u') ∈ E such that (u',v') ∈ B.
Bisimulation equivalence relation Rb: the unique maximum bisimulation relation. It is an equivalence relation.
[Figure: example graphs G1 and G2 with nodes labeled A, B, C, D]

Compressing graphs via bisimulation
The pattern preserving compression:
- R(G) = Gr, where each node in Gr represents an equivalence class [v] of a node v in G, and there is an edge ([u],[v]) in Gr if (u,v) is an edge in G.
- F(Qp) = Qp, i.e., the identity mapping.
- P: for each (vp, [v]) ∈ Qp(Gr) and each v' ∈ [v], (vp,v') ∈ Qp(G). P makes use of the reverse of R: nodes in Qp(Gr) are expanded to the nodes in their equivalence classes.
Correctness: for any pattern query Qp, Qp(G) = P(Qp(Gr)).
Reduction: 57% on average for graph pattern matching.
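The post-processing function P described above amounts to expanding each matched class back into its member nodes. A minimal Python sketch follows, assuming the match relation on Gr is given as a set of (pattern node, class id) pairs and the classes as a node-to-class-id dictionary; the name `post_process` and both data layouts are illustrative choices, not taken from the slides.

```python
from collections import defaultdict


def post_process(compressed_matches, node_class):
    """Expand matches on Gr, given as (pattern node, class id) pairs,
    into matches on G, given as (pattern node, node of G) pairs."""
    # Invert R: class id -> set of member nodes of G.
    members = defaultdict(set)
    for v, c in node_class.items():
        members[c].add(v)
    # Every node of a matched class is a match of the same pattern node.
    return {(vp, v) for vp, c in compressed_matches for v in members[c]}
```

Apart from building the inverse mapping once, the work is proportional to the number of pairs produced, in line with the slide's statement that P is linear in the size of the query answer.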

Graph Pattern Preserving Compression: algorithm
1. Compute the bisimulation equivalence relation Rb and its induced partition P: initialize P, then refine it w.r.t. Rb until a fixpoint is reached
2. Construct Gr
Overall cost: O(|E| log|V|)
[Figure: the recommendation network G, the pattern query Qp and the compressed graph Gr (directly querying a compressed graph); a second example with nodes A1 … Ak+1 and B1 … Bk]
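The "initialize and refine until fixpoint" step can be sketched in Python as below. This is a naive refinement loop chosen for readability, not the O(|E| log|V|) procedure the slide refers to, and the names `bisimulation_classes` and `compress_pattern` are hypothetical. Nodes start grouped by label and are repeatedly split according to the set of blocks their successors fall into.

```python
def bisimulation_classes(labels, edges):
    """labels: node -> label; edges: iterable of (u, v) pairs.
    Returns node -> block id for the coarsest bisimulation partition."""
    succ = {v: set() for v in labels}
    for u, v in edges:
        succ[u].add(v)

    block = dict(labels)  # initial partition: one block per label
    while True:
        # A node's signature is its current block plus the blocks of its successors.
        ids = {}
        new_block = {}
        for v in labels:
            sig = (block[v], frozenset(block[w] for w in succ[v]))
            new_block[v] = ids.setdefault(sig, len(ids))
        # Refinement only splits blocks, so an unchanged count means a fixpoint.
        if len(ids) == len(set(block.values())):
            return new_block
        block = new_block


def compress_pattern(labels, edges):
    """Build Gr: one labeled node per block, one edge per pair of connected blocks."""
    block = bisimulation_classes(labels, edges)
    gr_labels = {block[v]: labels[v] for v in labels}
    gr_edges = {(block[u], block[v]) for u, v in edges}
    return block, gr_labels, gr_edges
```

For example, with nodes a1 and a2 labeled A, each pointing to a distinct node labeled B, the two A nodes end up in one block and the two B nodes in another, so four nodes and two edges shrink to two nodes and one edge.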

Incremental Graph Compression
- Real-life data are changing and evolving (e.g., 5%/week in Web graphs).
- Incremental graph compression: compute changes ∆Gr to Gr such that Gr ⊕ ∆Gr = R(G ⊕ ∆G), i.e., update Gr without recompressing G ⊕ ∆G.
- Complexity measurement? Affected area: the changes in the input (∆G) and in the output (∆Gr), |AFF| = |∆Gr| + |∆G|.
- Bounded and unbounded problems: is the cost expressible as a function f(|AFF|)?
- Compressed once and incrementally maintained.
[Figure: G and ∆G, Gr and ∆Gr, with Gr ⊕ ∆Gr = R(G ⊕ ∆G)]

Incremental Reachability Preserving Compression
- Incremental reachability preserving compression (RCM) is unbounded even for unit updates, i.e., a single edge insertion or deletion (by reduction from the single-source reachability problem).
- RCM is solvable in O(|AFF||Gr|) time without decompressing Gr.
Algorithm:
1. Update the topological ranking and initialize AFF
2. Iteratively split/merge nodes and update Gr
[Figure: a graph G with nodes FA1, FA2, C1, C2 and its compressed graphs Gr, Gr' and Gr'']

Incremental Graph Pattern Preserving Compression
- Incremental pattern preserving compression (PCM) is unbounded even for unit updates.
- PCM is solvable in O(|AFF|^2 + |Gr|) time without the need to access the original graph G.
Algorithm:
1. Update the node ranking and initialize AFF
2. Iteratively split/merge nodes in Gr and update AFF
Incremental compression without recomputation.
[Figure: the recommendation network G and its compressed graph, with the affected area marked]

Experimental Evaluation
Experimental setting:
- Real-life datasets: Facebook, Amazon, YouTube, wikiVote, wikiTalk, socEpinions; NotreDame, P2P, Internet; citHepTh, Citation
- Synthetic data, with randomly generated updates
- Pattern generator, controlled by the number of nodes, edges, predicates and bounds on edges

Problem                               Batch                Incremental
Reachability preserving compression   Compression R        IncRCM
Transitive compression                AHO
Pattern preserving compression        Compression B        IncPCM
Query evaluation                      BFS, BiBFS; Match    IncBMatch

Measured: compression ratio, memory reduction, query time, and incremental maintenance.

Experimental Results I: compression ratio
Reachability preserving compression
- compression ratio: 5% on average
- reduces SCC graphs by 81% on average
- performs best on social networks, due to their high connectivity
Graph pattern preserving compression
- compression ratio: 43% on average
- performs best on Internet

Experimental Results I: compression ratio
[Figure: reachability preserving compression ratio w.r.t. edge increment]
[Figure: pattern preserving compression ratio w.r.t. edge increment]

Experimental Results I: compression ratio
[Figure: memory cost with 2-hop as the index]
Reduction: 92% of the memory of G on average.

Experimental Results II: query evaluation
[Figures: query time under reachability preserving compression and under pattern preserving compression]
Reduction: 70% of the querying time over G on average.

Experimental Results III: incremental compression
[Figure: incremental reachability preserving compression w.r.t. edge insertions]
[Figure: incremental graph pattern preserving compression w.r.t. batch updates]
- The compressed graphs can be efficiently maintained.
- Changes up to 22%.

Conclusion
- Query preserving graph compression: directly query compressed graphs without decompression
  - Reachability preserving compression
  - Graph pattern preserving compression
- Incremental query preserving compression: incrementally update compressed graphs without decompression
- Future work
  - Query-preserving compression for other queries
  - Testing the compression techniques over more real-life datasets
  - Optimizations for incremental compression techniques
  - Extending the techniques to distributed graph querying
Query preserving compression: a promising approach to coping with Big Data.

Thank you!

Subgraph isomorphism and graph simulation
- Node label equivalence
- Edge-to-edge function/relation
- Identical label matching with edge-to-edge functions/relations: capable enough?
[Figure: a pattern P and a data graph G with nodes labeled A, B, D, E]