
1
Towards Efficient Query Processing on Massive Evolving Graphs (C-Big2012)
Arash Fard, Amir Abdolrashidi, Lakshmish Ramaswamy and John A. Miller, UGA
Presentation by: Charith Wickramaarachchi

2
Time-Evolving Graphs
- A paradigm for modeling dynamic relationships in networks.
- TEG: a series of snapshots of a graph that evolves over time.
- Examples:
  - Web graph
  - Relationship structure of social networks
  - Communication flow networks
  - Evolution history of genome families

3
TEG and Scalability
- Additional dimension: time
- New queries: historical, inverse temporal, continuous
- Data volume
- Indexing
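To make the new query types concrete, here is a minimal Python sketch. It assumes a TEG is held as a plain list of snapshots, each a frozenset of directed edges, with 0-based time indices. It reads a historical query as "evaluate a predicate on a past snapshot", an inverse temporal query as "find the snapshots on which a predicate holds", and a continuous query as "re-evaluate the predicate as each new snapshot arrives". These readings and the function names are illustrative assumptions, not definitions from the slides.

```python
def historical(snapshots, t, pred):
    """Historical query: evaluate pred on the snapshot at time t."""
    return pred(snapshots[t])


def inverse_temporal(snapshots, pred):
    """Inverse temporal query: all times at which pred holds."""
    return [t for t, g in enumerate(snapshots) if pred(g)]


def continuous(stream, pred):
    """Continuous query: re-evaluate pred as each new snapshot arrives."""
    for t, g in enumerate(stream):
        yield t, pred(g)


# Hypothetical TEG with three snapshots; each snapshot is a set of directed edges.
snapshots = [frozenset({(1, 2)}), frozenset({(1, 2), (2, 3)}), frozenset({(2, 3)})]


def has_edge_2_3(g):
    """Example predicate: does the snapshot contain edge 2 -> 3?"""
    return (2, 3) in g
```

With these definitions, `inverse_temporal(snapshots, has_edge_2_3)` returns `[1, 2]`. Every query type touches one or more full snapshots, which is where the data-volume and indexing concerns on this slide come from.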

4
Overview
- Data distribution strategies for TEGs
- Answering reachability queries
- Subgraph queries in large TEGs

5
TEG Distribution
Objectives
- Improve node utilization
- Minimize the communication cost
Strategies
- Random distribution: improves node utilization, but high communication
- Connected sub-graph distribution: low communication, but low node utilization
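The trade-off between the two strategies can be measured on a toy graph. The sketch below rests on stated assumptions. It uses hash partitioning (`v % k`) as a deterministic stand-in for random distribution, and it places whole weakly connected components as a simple form of connected sub-graph distribution. Cut edges serve as a proxy for communication cost, and vertices per worker as a proxy for utilization. All helper names are hypothetical.

```python
from collections import defaultdict


def edge_cut(edges, part):
    """Edges whose endpoints sit on different workers (communication-cost proxy)."""
    return sum(1 for u, v in edges if part[u] != part[v])


def loads(part, k):
    """Number of vertices assigned to each of k workers (utilization proxy)."""
    counts = [0] * k
    for w in part.values():
        counts[w] += 1
    return counts


def hash_partition(vertices, k):
    """Stand-in for random distribution: spread vertices by id."""
    return {v: v % k for v in vertices}


def weak_components(vertices, edges):
    """Weakly connected components, found with an iterative DFS."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for s in vertices:
        if s in seen:
            continue
        seen.add(s)
        stack, comp = [s], []
        while stack:
            x = stack.pop()
            comp.append(x)
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
        comps.append(comp)
    return comps


def component_partition(vertices, edges, k):
    """Connected sub-graph distribution: whole components, largest first, to the least-loaded worker."""
    part, load = {}, [0] * k
    for comp in sorted(weak_components(vertices, edges), key=len, reverse=True):
        w = load.index(min(load))
        for v in comp:
            part[v] = w
        load[w] += len(comp)
    return part
```

Consider a graph made of a 6-vertex cycle and a separate 2-vertex edge, spread over two workers. Hash partitioning balances the load 4/4 but cuts all 7 edges. Component placement cuts none, but it loads the workers 6/2.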

6
Type of Algorithm
- High communication, low computation (PageRank, HCC): min-cut
- Low communication (SSSP): random distribution
Dynamic Nature of the Graph
- Additions and deletions of nodes
- Repartitioning cost: data transfer cost, re-wiring cost
Data Node Configuration
- More partitions than compute nodes (Partition : CC)
- Smaller partitions
- Smaller stragglers
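One way to see why many small partitions can shrink stragglers is to assign partitions to workers greedily: largest partition first, to the least-loaded worker (the classic longest-processing-time heuristic). Both this heuristic and the partition sizes used below are illustrative assumptions. The slides do not say how partitions are placed on compute nodes.

```python
import heapq


def lpt_assign(sizes, k):
    """Greedy largest-first placement of partitions on k workers.

    Returns (assignment: partition id -> worker, makespan = heaviest worker load).
    """
    heap = [(0, w) for w in range(k)]  # (current load, worker id)
    heapq.heapify(heap)
    assignment = {}
    # Stable sort by descending size, so equal sizes keep their original order.
    for pid, size in sorted(enumerate(sizes), key=lambda p: -p[1]):
        load, w = heapq.heappop(heap)  # least-loaded worker
        assignment[pid] = w
        heapq.heappush(heap, (load + size, w))
    return assignment, max(load for load, _ in heap)
```

Take the same 12 units of work on two workers. Split into two pieces of sizes 9 and 3, the most-loaded worker carries 9 units. Split into six pieces of sizes 3, 3, 3, 1, 1, 1, it carries only 6.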

7
Reachability Queries in TEGs
- G = {G_1, G_2, …, G_q, …, G_r}: the snapshots of TEG G
- Diff(G_q, G_{q-1}): the changes between snapshots G_q and G_{q-1}, e.g. vertex addition, edge addition
- Reach(v, w, q): TRUE/FALSE, whether w is reachable from v in snapshot G_q
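The model can be sketched as a base snapshot G_1 plus a chronological list of diffs. This minimal Python sketch uses hypothetical operation names (`add_vertex`, `add_edge`, `del_edge`). Edge deletion is included as an assumption because deletions appear later in the observations. Here `reach` answers Reach(v, w, q) by materializing G_q and running a breadth-first search.

```python
from collections import deque


def apply_diff(vertices, edges, diff):
    """Apply one Diff in place; the operation names are hypothetical."""
    for op, *args in diff:
        if op == "add_vertex":
            vertices.add(args[0])
        elif op == "add_edge":
            vertices.update(args)
            edges.add(tuple(args))
        elif op == "del_edge":
            edges.discard(tuple(args))
        else:
            raise ValueError(f"unknown diff operation: {op}")


def snapshot(base_vertices, base_edges, diffs, q):
    """Materialize G_q (1-based): G_1 is the base, diffs[i] turns G_{i+1} into G_{i+2}."""
    vertices, edges = set(base_vertices), set(base_edges)
    for diff in diffs[: q - 1]:
        apply_diff(vertices, edges, diff)
    return vertices, edges


def reach(base_vertices, base_edges, diffs, v, w, q):
    """Reach(v, w, q): True iff w is reachable from v in snapshot G_q (BFS)."""
    vertices, edges = snapshot(base_vertices, base_edges, diffs, q)
    if v not in vertices:
        return False
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
    seen, queue = {v}, deque([v])
    while queue:
        x = queue.popleft()
        if x == w:
            return True
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False
```

Storing only diffs keeps storage proportional to the amount of change. The cost is that every query against a later snapshot must first replay the diffs.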

8
Reachability Queries in Static Graphs
- On-demand traversal: O(M+N) per query
- Pre-indexing: O(1) per query
  - Pre-computed spanning tree
  - High indexing time
  - Index table
Limitations for TEGs
- High indexing cost: each snapshot needs its own index
- High storage overhead
- Low cost-benefit ratio
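The two static strategies can be contrasted in a few lines. As an assumption made for illustration, the index here is a full transitive-closure table built by one BFS per vertex, not the spanning-tree index named on the slide. It shows the same trade-off: O(N(N+M)) indexing time and up to O(N^2) storage, in exchange for a single constant-expected-time set lookup per query.

```python
from collections import deque


def bfs_reachable(adj, s):
    """All vertices reachable from s (including s); O(N + M)."""
    seen, queue = {s}, deque([s])
    while queue:
        x = queue.popleft()
        for y in adj.get(x, ()):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def on_demand_reach(adj, u, v):
    """On-demand traversal: one BFS per query."""
    return v in bfs_reachable(adj, u)


def build_index(adj, vertices):
    """Pre-indexing: transitive-closure table, one BFS per vertex."""
    return {u: frozenset(bfs_reachable(adj, u)) for u in vertices}


def indexed_reach(index, u, v):
    """Query time is a single set lookup."""
    return v in index[u]
```

For a TEG, this table would have to be rebuilt or stored for every snapshot. That is exactly the indexing and storage overhead listed under the limitations.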

9
Approach: Interval-Based Indexing

10
Approach
Steps (assume the query is Reach(u, v, q), where q > p and G_p is indexed):
1. Answer Reach(u, v, p) from the index.
2. Check whether Diff(G_p, G_q) changes that answer.
- Naïve approach: process Diff(G_p, G_q) in chronological order.
- A better approach: first check whether the changes can impact the reachability at all.
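Here is a minimal sketch of the two steps, using the naïve version of the second step. It assumes G_p is stored as an edge set together with a precomputed reachability table. It also assumes Diff(G_p, G_q) is a chronological list of hypothetical `("add", x, y)` / `("del", x, y)` edge operations. Every change is replayed, whether or not it is related to u and v, and the answer for G_q is recomputed with a BFS.

```python
from collections import defaultdict, deque


def reachable_from(edges, s):
    """Vertices reachable from s (including s) via BFS over directed edges."""
    adj = defaultdict(list)
    for a, b in edges:
        adj[a].append(b)
    seen, queue = {s}, deque([s])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def build_index(edges):
    """Reachability table for an indexed snapshot G_p: vertex -> reachable set."""
    vertices = {x for e in edges for x in e}
    return {x: frozenset(reachable_from(edges, x)) for x in vertices}


def naive_reach(edges_p, index_p, diff, u, v):
    """Return (Reach(u, v, p), Reach(u, v, q)) by replaying every change in order."""
    reach_p = u == v or v in index_p.get(u, ())  # step 1: answer from the index
    edges = set(edges_p)
    for op, x, y in diff:  # step 2, naive: no relevance check at all
        if op == "add":
            edges.add((x, y))
        elif op == "del":
            edges.discard((x, y))
        else:
            raise ValueError(f"unknown operation: {op}")
    return reach_p, v in reachable_from(edges, u)
```

The cost of this version grows with the full diff and a fresh traversal, even when none of the changes touch any path between u and v.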

11
Approach (example)
- Query: Reach(A, H, 3)
- Is Add(E, F) related to the query?
- Consider Add(B, E) & Add(F, G) & Add(E, F) together

12
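Observations (Reach(u, v, p) is known from the index on G_p)
- If Reach(u, v, p) = true: the diffs need to be processed only if the diff stack contains at least one Delete(x, y) where (x, y) is an edge on a path from u to v in G_p.
- If Reach(u, v, p) = false: the diffs need to be processed only if the diff stack contains at least one Add(x, y) where x is reachable from u, and at least one Add(x', y') where v is reachable from y' (possibly the same added edge).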
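The observations translate into a cheap filter that runs before any diff is replayed. This minimal Python sketch assumes the same hypothetical `("add", x, y)` / `("del", x, y)` encoding of Diff(G_p, G_q). It computes, in G_p, the vertices reachable from u and the vertices that reach v. When it returns False, the indexed answer Reach(u, v, p) can be reused; otherwise the diffs still have to be processed. It is a necessary-condition test only, and it ignores vertex deletions.

```python
from collections import defaultdict, deque


def _bfs(adj, s):
    """Vertices reachable from s (including s) in the given adjacency map."""
    seen, queue = {s}, deque([s])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


def diff_may_change(edges_p, reach_p, diff, u, v):
    """False means Reach(u, v, q) == reach_p is guaranteed; True means process the diffs."""
    edge_set = set(edges_p)
    fwd, bwd = defaultdict(list), defaultdict(list)
    for a, b in edge_set:
        fwd[a].append(b)
        bwd[b].append(a)
    from_u = _bfs(fwd, u)  # vertices reachable from u in G_p
    to_v = _bfs(bwd, v)    # vertices that reach v in G_p
    if reach_p:
        # (x, y) lies on a u -> v path in G_p iff it is an edge, u reaches x, and y reaches v.
        return any(op == "del" and (x, y) in edge_set and x in from_u and y in to_v
                   for op, x, y in diff)
    # A new path needs an added edge leaving u's reach and one entering v's co-reach.
    head = any(op == "add" and x in from_u for op, x, y in diff)
    tail = any(op == "add" and y in to_v for op, x, y in diff)
    return head and tail
```

The test cases use a hypothetical G_p with edges A→B and G→H that follows the pattern of the Reach(A,H,3) example. Add(E,F) alone is filtered out, but together with Add(B,E) and Add(F,G) it can connect A to H, so the diffs must be processed.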

13
Graph Pattern Matching
Subgraph isomorphism: a bijective mapping between the query graph Q(V_q, E_q) and a subgraph G'(V', E') of the target graph G.
- There exists a bijection f: V' → V_q such that
- for all v', w' in V': (v', w') in E' ↔ (f(v'), f(w')) in E_q.
Simulation: G(V, E) matches Q(V_q, E_q) if there exists a relation R ⊆ V_q × V such that
- (u, u') in R → u and u' have the same label;
- for all u in V_q there is a u' in V with (u, u') in R;
- for all (u, u') in R and every edge (u, v) in E_q, there is an edge (u', v') in E with (v, v') in R.
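On tiny graphs, subgraph isomorphism can be checked by brute force. This sketch searches for an injective, label-preserving map g: V_q → V that sends every query edge to a data edge. The inverse of g on its image is then the bijection f from the definition, with E' taken to be exactly the mapped edges. The function name and the label dictionaries are illustrative assumptions, and the search is exponential in |V_q|.

```python
from itertools import permutations


def subgraph_isomorphisms(q_nodes, q_edges, q_label, g_nodes, g_edges, g_label):
    """Yield every map g: V_q -> V that is injective, keeps labels, and keeps edges."""
    q_nodes = list(q_nodes)
    g_edge_set = set(g_edges)
    # Each ordered choice of distinct data vertices is one candidate injective map.
    for image in permutations(g_nodes, len(q_nodes)):
        m = dict(zip(q_nodes, image))
        if any(q_label[x] != g_label[m[x]] for x in q_nodes):
            continue
        if all((m[a], m[b]) in g_edge_set for a, b in q_edges):
            yield m
```

Take a query x(a)→y(b) and a data graph in which vertex 1 (label a) points to vertices 2 and 3 (both label b). The generator yields the two matches `{"x": 1, "y": 2}` and `{"x": 1, "y": 3}`.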

14
Vertex-Centric Approach
- Input: graph G(V, E, l) and query Q(V_q, E_q, l_q)
- Output M: the maximum match in G for Q
- Uses GPS features: the master for global operations

15
Vertex-Centric Approach
1st: The master broadcasts the query.
2nd: Each vertex whose label appears in Q gets flagged. S: the set of matched nodes (note that a vertex v in G can be matched to two vertices in Q). Each vertex keeps a set of lists of labels for its possible children. If its number of outgoing edges is smaller than the size of one of these lists, the corresponding candidate is removed. It then sends its id to its children.
3rd: Children reply with their id and label.
4th: If the received child labels are a superset of the matched children's labels in Q, keep the candidate; otherwise remove it and pass the removal report to the parents.
5th: On a removal report, remove that child from the list and check validity in S. If no longer valid, remove yourself from S and report to your parents.
Next: go to the 5th step.
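The rounds above can be emulated sequentially. This is a minimal Python sketch under explicit assumptions. It is not GPS code: each loop iteration stands in for a superstep, a vertex's candidate set plays the role of its membership in S, and removal reports are modeled by re-activating the parents of every vertex whose candidates changed. The label-superset check of the 4th step is applied once up front. For lists of distinct labels, this check also covers the out-degree check of the 2nd step. The result is the maximum simulation relation M, or the empty set when some query vertex has no match.

```python
from collections import defaultdict


def vertex_centric_simulation(q_label, q_edges, g_label, g_edges):
    """Return the maximum simulation relation M as (query vertex, data vertex) pairs."""
    q_children = defaultdict(set)
    for a, b in q_edges:
        q_children[a].add(b)
    g_children, g_parents = defaultdict(set), defaultdict(set)
    for a, b in g_edges:
        g_children[a].add(b)
        g_parents[b].add(a)

    # Steps 1-2: flag each data vertex with every query vertex of the same label.
    cand = {v: {u for u, lab in q_label.items() if lab == g_label[v]} for v in g_label}
    # Steps 3-4: keep u only if v's children carry every label u's query children need.
    for v in cand:
        child_labels = {g_label[w] for w in g_children[v]}
        cand[v] = {u for u in cand[v] if {q_label[c] for c in q_children[u]} <= child_labels}

    # Step 5, repeated: active vertices re-check; changes are reported to parents.
    active = set(g_label)
    while active:
        updates = {}
        for v in active:
            keep = {u for u in cand[v]
                    if all(any(c in cand[w] for w in g_children[v]) for c in q_children[u])}
            if keep != cand[v]:
                updates[v] = keep
        cand.update(updates)  # apply all changes at the end of the superstep
        active = {p for v in updates for p in g_parents[v]}

    M = {(u, v) for v, us in cand.items() for u in us}
    # Every query vertex must be matched for G to match Q.
    return M if {u for u, _ in M} == set(q_label) else set()
```

In the first test below, vertex 3 (label a, no children) is pruned. Vertex 4 still simulates qb, because simulation places no constraint on a vertex's parents. In the second test, the removal at vertex 2 propagates to its parent 1, which leaves qa unmatched.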

16
Conclusion
- TEG processing: an emerging research area with many applications
- Need for new partitioning techniques and graph query techniques
- Do TEG processing applications benefit more from an EDA-based model than from the traditional query processing model?
