Performance Guarantees for Distributed Reachability Queries Wenfei Fan 1,2 Xin Wang 1 Yinghui Wu 1,3 1 University of Edinburgh 2 Harbin Institute of Technology.

Presentation transcript:

Performance Guarantees for Distributed Reachability Queries
Wenfei Fan (1,2), Xin Wang (1), Yinghui Wu (1,3)
1 University of Edinburgh; 2 Harbin Institute of Technology; 3 University of California, Santa Barbara

Outline
- Querying distributed real-life graphs (real-life graphs are often fragmented/distributed)
- Distributed reachability queries
- Distributed bounded reachability queries
- Distributed regular reachability queries
- Distributed reachability with MapReduce
- Experimental study
- Conclusion
Goal: distributed query evaluation with performance guarantees. Approach: partial evaluation.
Yinghui Wu, VLDB 2012

Distributed real-life graphs
Real-life graphs are purposely or naturally distributed:
- Geo-distributed, e.g., data centers
- Decentralized, e.g., social networks
- Distributed entity and personal information

Distributed querying methods: federated/centralized graph database
- Collect and link the graph fragments
- Query the centralized graph
Drawbacks: construction and maintenance cost; centralized querying.
[Figure: fragments are collected into one graph G, and Q is evaluated as Q(G).]

Distributed querying methods: graph exploration strategy
- Master node and slave nodes
- Predefined graph partition and query execution plan
Drawback: no bounds on the number of visits or on data shipment.
[Figure: master node (Q, Q(G)), query plan, intermediate results, slave nodes.]

Querying a distributed social network
Query Q: a path from Ann ("CTO") to Mark ("FA") matching (DB* ∪ HR*).
[Figure: a social network spread over data centers DC 1, DC 2 and DC 3, with nodes Ann "CTO", Mark "FA", Fred "HR", Walt "HR", Dan "DB", Bill "DB", Pat "SE", Tom "AI", Ross "HR", Jack "MK", Ben "MK", Emmy "HR", Mat "HR".]
Centralized method? Graph exploration? Use partial evaluation to obtain performance guarantees.

Partial evaluation
Partial evaluation (a.k.a. program specialization):
- given a function f(s, d) and a part of its input, e.g., s, specialize f(s, d) w.r.t. s
- conduct only the part of f's computation that depends on s
- generate a residual function f'
Partial evaluation generates a partial answer: f(s, d), with s known, becomes f'(d).
For graph queries? Q(Fi, G), with fragment Fi known, becomes Q'(G).
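
The idea of specializing f(s, d) with respect to s can be illustrated outside the graph setting. The following is a minimal sketch, not taken from the slides: the function name `specialize_power` and the choice of exponentiation as f are assumptions made purely for illustration. The work that depends only on s (the exponent) is done once, and the returned closure plays the role of the residual function f'(d).

```python
def specialize_power(exponent):
    """Partially evaluate f(s, d) = d ** s with respect to s = exponent.

    Hypothetical example: everything that depends only on the exponent
    (its binary decomposition) is computed here, once.
    """
    bits = []
    e = exponent
    while e > 0:
        bits.append(e & 1)  # least significant bit first
        e >>= 1

    def residual(base):
        """The residual function f'(d): only needs the remaining input."""
        result, square = 1, base
        for bit in bits:
            if bit:
                result *= square
            square *= square
        return result

    return residual


if __name__ == "__main__":
    cube = specialize_power(3)
    print(cube(2), cube(10))
```

In the distributed setting of the talk, the known part of the input is the local fragment Fi, and the residual is the partial answer sent back to the coordinator.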

Distributed graphs and graph queries
Distributed graph:
- graph fragmentation F = (F, G_f)
- fragment graph G_f
Reachability queries:
- reachability query Q_r(s, t)
- bounded reachability query Q_br(s, t, l)
- regular reachability (path) query Q_rr(s, t, R), where R ::= ε | a | RR | R ∪ R | R*
Examples: Q_r(Ann, Mark); Q_br(Ann, Mark, 5); Q_rr(Ann, Mark, (DB* ∪ HR*)).
[Figure: fragments F_1, F_2, F_3 and the fragment graph G_f, marking a virtual node of F_1, an in-node of F_1 and a cross edge.]

Distributed graph querying framework
Applying partial evaluation to graph querying, with a coordinating site Sc and a set of graph fragments F1, …, Fn:
- Distributing at Sc: post Q to the fragments
- Local evaluation: partially evaluate Q on each fragment, giving Q(Fi)
- Assembling at Sc: combine the partial answers into Q(G)

Distributed reachability queries
Performance guarantees: over a fragmentation F = (F, G_f) of a graph G, reachability queries can be evaluated
(a) in O(|V_f||F_m|) time,
(b) by visiting each site only once, and
(c) with the total network traffic bounded by O(|V_f|^2),
where G_f = (V_f, E_f) and F_m is the largest fragment in F.
A distributed reachability evaluation algorithm, DisReach:
- Coordinator Sc posts qr(s, t) to each fragment site in F
- Each site locally evaluates qr(s, t) in parallel and produces a partial answer as a set of Boolean equations
- Sc collects and assembles the partial results

Distributed reachability: partial evaluation
Partial evaluation by introducing Boolean variables.
Locally evaluate each qr(v, t) on Fi in parallel:
- for each in-node v' in Fi, decide whether v' reaches t
- introduce a Boolean variable X_v' for each v'
Partial answer to qr(v, t): a set of Boolean formulas, each a disjunction of the variables of the nodes v' that v can reach.
X_v' = qr(v', t) = X_v1' or … or X_vn'
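
The local step above can be sketched in a few lines. This is an illustrative reading of the slide, not the authors' code: the function name `local_eval`, the adjacency-dict representation, and the convention that equations are produced for the fragment's in-nodes (plus s, if s is local) over variables of its virtual nodes are all assumptions. Each source either reaches t inside the fragment (equation "true") or gets the set of virtual nodes it can reach (a disjunction of their variables; the empty set means false).

```python
from collections import deque


def local_eval(edges, sources, virtual_nodes, t):
    """Partially evaluate qr(., t) on one fragment.

    edges:         dict node -> list of successors (cross edges point to
                   virtual nodes, which are not expanded locally)
    sources:       in-nodes of the fragment (and s, if it lives here)
    virtual_nodes: set of nodes that belong to other fragments
    Returns a dict v -> True, or v -> frozenset of virtual nodes whose
    variables form the disjunction for X_v.
    """
    equations = {}
    for v in sources:
        seen, queue = {v}, deque([v])
        while queue:
            u = queue.popleft()
            if u in virtual_nodes:
                continue  # stop at the fragment border
            for w in edges.get(u, ()):
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        equations[v] = True if t in seen else frozenset(seen & virtual_nodes)
    return equations


if __name__ == "__main__":
    # Hypothetical fragment: the names come from the slides, the edges do not.
    frag = {"Ann": ["Walt"], "Walt": ["Mat"], "Fred": ["Emmy"]}
    print(local_eval(frag, {"Ann", "Fred"}, {"Mat", "Emmy"}, "Mark"))
```

Because each source is explored with a plain BFS, any local index or reachability structure could be substituted without changing the shape of the partial answer.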

Distributed reachability: assembling
- Collect the Boolean equation set at coordinator Sc
- Solve the Boolean equation system over a dependency graph
- qr(s, t) is true iff X_s = true at Sc
Example: X_s = X_v; X_v = X_v'' or X_v'; X_v'' = false; X_v' = X_t; X_t = 1.
[Figure: the dependency graph of these equations, annotated O(|V_f|).]
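
A system of disjunctive Boolean equations like the one on this slide can be solved by propagating "true" backwards along the dependency graph. The sketch below is one possible way to do it, under assumptions: the name `solve` and the encoding (a variable maps either to `True` or to the set of variables in its disjunction, with the empty set standing for false) are not from the slides.

```python
def solve(equations, s):
    """Return the value of X_s in the least solution of the system.

    equations: dict var -> True, or var -> iterable of vars (a disjunction).
    """
    # Reverse dependency graph: x -> variables whose disjunction mentions x.
    dependents = {}
    for v, rhs in equations.items():
        if rhs is not True:
            for x in rhs:
                dependents.setdefault(x, []).append(v)

    true_vars = {v for v, rhs in equations.items() if rhs is True}
    stack = list(true_vars)
    while stack:
        x = stack.pop()
        for v in dependents.get(x, ()):
            if v not in true_vars:
                true_vars.add(v)
                stack.append(v)
    return s in true_vars


if __name__ == "__main__":
    # The example system from the slide.
    system = {
        "Xs": {"Xv"},
        "Xv": {"Xv''", "Xv'"},
        "Xv''": set(),   # false
        "Xv'": {"Xt"},
        "Xt": True,
    }
    print(solve(system, "Xs"))
```

Each variable is pushed at most once and each dependency is scanned at most once, so the running time is linear in the size of the equation system. Variables caught in a cycle with no path to a true variable stay false.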

Distributed reachability queries: example
1. Dispatch Q to fragments (at Sc)
2. Partial evaluation: generating Boolean equations (at Fi)
3. Assembling: solving the equation system (at Sc)
[Figure: coordinator Sc sends Q to fragments F_1 (Fred "HR", Walt "HR", Bill "DB"), F_2 (Jack "MK", Emmy "HR", Mat "HR") and F_3 (Pat "SE", Tom "AI", Ross "HR"); the query is from Ann to Mark.]

Distributed bounded reachability queries
1. Dispatch Q to fragments (at Sc)
2. Partial evaluation: generating equations (at Fi), with variables denoting numeric values
3. Assembling: solving the equation system (at Sc) over a weighted dependency graph
[Figure: coordinator Sc sends Q to fragments F_1 (Fred "HR", Walt "HR", Bill "DB"), F_2 (Jack "MK", Emmy "HR", Mat "HR") and F_3 (Pat "SE", Tom "AI", Ross "HR"); the query is from Ann to Mark.]

Distributed bounded reachability queries
Performance guarantees: bounded reachability queries can be evaluated with the same performance guarantees as for reachability queries.
- Number of visits: O(|F|); each site is visited once
- Total network traffic: O(|V_f|^2); independent of |G|
- Computational cost: O(|V_f||F_m|); parallel computation
Local evaluation: any strategy (e.g., indexes, graph partitions) can be applied on a fragment.
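
For bounded reachability the slides only state that the variables denote numeric values and that the coordinator works on a weighted dependency graph. One plausible reading, sketched here as an assumption rather than as the authors' algorithm, is that each fragment reports local distances from its in-nodes to its virtual nodes (or to t), and the coordinator runs a shortest-path search over those weighted dependencies and compares the result with the bound l. The name `bounded_reach` and the dict encoding are hypothetical.

```python
import heapq


def bounded_reach(equations, s, t, bound):
    """Decide whether dist(s, t) <= bound on a weighted dependency graph.

    equations: dict v -> {w: local distance from v to w}, as collected
               from the fragments (hypothetical encoding).
    """
    dist = {s: 0}
    heap = [(0, s)]
    while heap:
        d, v = heapq.heappop(heap)
        if v == t:
            return d <= bound
        if d > dist.get(v, float("inf")):
            continue  # stale heap entry
        for w, weight in equations.get(v, {}).items():
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return False


if __name__ == "__main__":
    # Hypothetical local distances; only the node names come from the slides.
    deps = {"Ann": {"Mat": 2}, "Mat": {"Mark": 3, "Tom": 1}, "Tom": {"Mark": 1}}
    print(bounded_reach(deps, "Ann", "Mark", 5))
```

The dependency graph only contains border nodes, so this search runs over a structure whose size depends on |V_f| rather than on |G|.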

Distributed regular reachability queries
Performance guarantees: over a fragmentation F = (F, G_f) of a graph G, regular reachability queries qrr(s, t, R) can be evaluated
(a) in O((|V_f|^2 + |F_m|)|R|^2) time,
(b) by visiting each site only once, and
(c) with the total network traffic bounded by O(|R|^2 |V_f|^2),
where G_f = (V_f, E_f) and F_m is the largest fragment in F.
Automaton representation for queries: the query automaton G_q(R) of R.

Query automaton
A node v is a match of state u_v in G_q(R) iff
(1) they have the same label, and
(2) there is a path ρ from v to t and a path ρ' from u_v to u_t such that ρ and ρ' induce the same label.
Given a graph G, q_rr(s, t, R) over G is true if and only if s is a match of u_s in G_q(R).
[Figure: a graph with nodes Fred "HR", Walt "HR", Mark "FA", Tom "AI", Ross "HR", Emmy "HR", Mat "HR", and the query automaton for Q with states Ann, DB, HR, FA.]
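
The matching condition can be checked on a single, centralized graph by walking the graph and the automaton in lockstep. The sketch below does that for a hand-built automaton for (DB* ∪ HR*); it is not the distributed algorithm of the talk. The function name `regular_reach`, the state names "u_s" and "u_t", the dict encodings, and the example edges are all assumptions made for illustration.

```python
from collections import deque


def regular_reach(graph, labels, automaton, state_label, s, t):
    """Check whether s matches the start state u_s of the query automaton.

    graph:       dict node -> successors
    labels:      dict node -> label
    automaton:   dict state -> successor states ("u_s" start, "u_t" final)
    state_label: dict state -> label required of a matching node
    """
    start = (s, "u_s")
    seen, queue = {start}, deque([start])
    while queue:
        v, u = queue.popleft()
        if (v, u) == (t, "u_t"):
            return True
        for w in graph.get(v, ()):
            for x in automaton.get(u, ()):
                # The final state is matched only by t itself; every other
                # state requires the node to carry the state's label.
                ok = (w == t) if x == "u_t" else labels.get(w) == state_label.get(x)
                if ok and (w, x) not in seen:
                    seen.add((w, x))
                    queue.append((w, x))
    return False


# Hand-built automaton for (DB* ∪ HR*): zero or more DB nodes, or zero or
# more HR nodes, between s and t.
AUTOMATON = {"u_s": ["DB", "HR", "u_t"], "DB": ["DB", "u_t"], "HR": ["HR", "u_t"]}
STATE_LABEL = {"DB": "DB", "HR": "HR"}

if __name__ == "__main__":
    labels = {"Walt": "HR", "Mat": "HR", "Bill": "DB", "Pat": "SE", "Mark": "FA"}
    graph = {"Ann": ["Walt", "Bill"], "Walt": ["Mat"], "Mat": ["Mark"],
             "Bill": ["Pat"], "Pat": ["Mark"]}
    print(regular_reach(graph, labels, AUTOMATON, STATE_LABEL, "Ann", "Mark"))
```

The search visits at most one pair per (node, state) combination, which is where the product of graph size and automaton size in the cost bounds comes from.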

Distributed regular query evaluation: algorithm
- Query distribution: construct the query automaton Gq(R); post Gq(R) to each graph fragment
- Partial evaluation: compute partial answers as a set of Boolean formula vectors
- Assembling: collect the partial answers and construct the dependency graph; assemble the final answer

Distributed regular query evaluation: partial evaluation
Partial evaluation by introducing Boolean variables:
- For each node v in Fi, assign v.rvec, a vector of O(|Vq|) Boolean formulas; each entry v.rvec[u] denotes whether v matches u
- Introduce a Boolean variable X(v', w) for each virtual node v' of Fi and each state w in Vq, denoting whether v' matches w
Partial answer to qrr(s, t): a set of Boolean formulas from each in-node of Fi.
[Figure: in-nodes v_1, v_2 with formula vectors (f_11, f_12, …, f_1k) and (f_21, f_22, …, f_2k), a virtual node v' with entries f_1v', f_2v', …, f_kv', automaton states v_q and w_q, and the variable X(v', w).]

Distributed regular query evaluation: assembling
- Collect the partial results as a set of Boolean formulas
- Construct a dependency graph: a node v_d for each in-node and each entry of its formula vector, labeled with the Boolean formula, and an edge for each dependency
- Check whether v_d(s, u_s) can reach v_d(t, u_t) in the dependency graph
[Figure: dependency graph with nodes v_d(s, u_s), v_d(v_1, v_q), v_d(v_2, v_q), v_d(v', w) and v_d(t, u_t) = true.]

Distributed regular reachability evaluation: example
1. Dispatch Q to fragments (at Sc)
2. Partial evaluation: generating a set of Boolean equations (at Fi), as vectors of Boolean formulas
3. Assembling: solving the equation system (at Sc) by testing reachability in the dependency graph
[Figure: coordinator Sc sends Q to fragments F_1 (Fred "HR", Walt "HR", Bill "DB"), F_2 (Jack "MK", Emmy "HR", Mat "HR") and F_3 (Pat "SE", Tom "AI", Ross "HR").]

Distributed reachability with MapReduce
Partial evaluation fits naturally in the MapReduce framework:
1. The coordinator generates the query automaton Gq and partitions graph G into k fragments, each as a key/value pair keyed by its index i
2. Map function: local evaluation on the pair with key i, generating the pair (1, rvset_i)
3. Reduce function: assembles the collected partial results and writes the answer to the distributed file system
[Figure: coordinator, mappers 1 … k and one reducer; annotations O(F_m), O(|R|^2 |V_f|^2), and processing path O(F_m + |R|^2 |V_f|^2).]
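
The three steps can be simulated in plain Python without a MapReduce runtime. This sketch makes several simplifying assumptions that are not in the slides: it handles plain reachability rather than regular reachability (so the partial answers are simple disjunctions instead of rvsets), the fragment encoding as an `(edges, sources, virtual_nodes)` tuple is invented here, and `map_fragment`, `reduce_answers` and `run` are hypothetical names. Every mapper emits its partial answer under the single key 1, so that one reducer sees all of them.

```python
def map_fragment(frag_id, fragment, t):
    """Map: local evaluation on fragment frag_id, emitting (1, partial answer)."""
    edges, sources, virtual = fragment
    partial = {}
    for v in sources:
        seen, stack = {v}, [v]
        while stack:
            u = stack.pop()
            if u in virtual:
                continue  # do not explore beyond the fragment
            for w in edges.get(u, ()):
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        partial[v] = True if t in seen else sorted(seen & virtual)
    return (1, partial)


def reduce_answers(key, partial_answers, s):
    """Reduce: merge all equations and iterate to the least fixpoint."""
    equations = {}
    for part in partial_answers:
        equations.update(part)
    true_vars, changed = set(), True
    while changed:
        changed = False
        for v, rhs in equations.items():
            if v not in true_vars and (rhs is True or any(x in true_vars for x in rhs)):
                true_vars.add(v)
                changed = True
    return s in true_vars


def run(fragments, s, t):
    """Driver standing in for the framework's shuffle between map and reduce."""
    grouped = {}
    for i, fragment in enumerate(fragments):
        key, value = map_fragment(i, fragment, t)
        grouped.setdefault(key, []).append(value)
    return reduce_answers(1, grouped.get(1, []), s)


if __name__ == "__main__":
    # Hypothetical three-fragment chain Ann -> Walt -> Mat -> Tom -> Mark.
    fragments = [
        ({"Ann": ["Walt"], "Walt": ["Mat"]}, {"Ann"}, {"Mat"}),
        ({"Mat": ["Tom"]}, {"Mat"}, {"Tom"}),
        ({"Tom": ["Mark"]}, {"Tom"}, set()),
    ]
    print(run(fragments, "Ann", "Mark"))
```

The mappers are independent of each other, which matches the "each site is visited once" guarantee; only the reducer needs the combined equations.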

Experimental evaluation
Experimental setting:
- Real-life datasets
- Synthetic data: larger random graphs following the densification law
- Algorithms:
  - disReach, disReachn and disReachm
  - disDist and disDistn
  - disRPQ, disRPQn and disRPQd
  - MRdRPQ

Distributed reachability: efficiency and scalability
disReach outperforms centralized and message-passing approaches.
[Plot annotations: "20% and 6%9% of disReachn"; "three thousand visits over 4 fragments".]

Distributed regular reachability: efficiency and network traffic
disRPQ takes much less time and communication cost.
[Plot annotations: time: 60% of disRPQn; traffic: at most 25% and 3%.]

Distributed regular reachability (cont.): scalability
disRPQ scales well with the number of fragments and takes less time over more fragments.

Performance of the MapReduce implementation: efficiency and scalability
- Scales well with the size of fragments
- Takes more time over more complex queries
- Takes less time with more mappers
Partial evaluation works well in the MapReduce model.

Conclusion
Distributed reachability querying: partial evaluation based distributed query evaluation
- Reachability, bounded reachability and regular reachability queries
- Performance guarantees
- Partial evaluation can be naturally conducted as MapReduce
Future work:
- Distributed evaluation for other queries, e.g., graph pattern matching using simulation
- Combining partial evaluation and incremental computation

Thank you!