Recursive Graph Deduction and Reachability Queries
Yangjun Chen
Dept. Applied Computer Science, University of Winnipeg
515 Portage Ave. Winnipeg, Manitoba, Canada R3B 2E9

Outline
- Motivation
- Graph deduction
  - Basic definitions
  - Critical nodes and critical subgraphs
  - Evaluation of reachability queries
- Recursive graph deduction (RGD)
  - Recursive deduction
  - Evaluation of reachability queries based on RGD
- Conclusion

Motivation
Goal: an efficient method to evaluate graph reachability queries. Given a directed acyclic graph (DAG) G, check whether a node v is reachable from another node u through a path in G.
Applications: XML data processing, gene-regulatory networks, and metabolic networks. XML documents are usually represented as trees. However, an XML document may contain IDREF/ID references that turn it into a directed but sparse graph: a tree structure plus a few reference links. In a metabolic network, reachability models whether two genes interact with each other or whether two proteins participate in a common pathway. Many such graphs are sparse.

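Motivation
A simple method: store the transitive closure as a matrix.
[Figure: a DAG G on the nodes a, b, c, d, e; its transitive closure G*; and the corresponding 5 × 5 matrices M (for G) and M* (for G*), with rows and columns indexed by a, b, c, d, e.]
Space: $O(n^2)$; query time: $O(1)$.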
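Below is a minimal Python sketch of this baseline. It uses a small hypothetical edge list, because the slide's own G is not recoverable from the figure. Warshall's algorithm fills the n × n boolean matrix, and after that every query is a single lookup.

```python
# Minimal sketch: transitive closure stored as an n x n boolean matrix (Warshall's algorithm).
# The five-node edge list is hypothetical; it is not the G drawn on the slide.


def transitive_closure(nodes, edges):
    """Return (index, matrix) where matrix[i][j] is True iff nodes[j] is reachable from nodes[i]."""
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    m = [[False] * n for _ in range(n)]
    for u, v in edges:
        m[index[u]][index[v]] = True
    # Warshall: after round w, paths may use nodes[0..w] as intermediate nodes.
    for w in range(n):
        for i in range(n):
            if m[i][w]:
                for j in range(n):
                    if m[w][j]:
                        m[i][j] = True
    return index, m


NODES = ["a", "b", "c", "d", "e"]
EDGES = [("a", "b"), ("b", "c"), ("a", "d"), ("d", "e")]  # hypothetical example
INDEX, CLOSURE = transitive_closure(NODES, EDGES)


def reachable(u, v):
    """O(1) query: one matrix lookup, paid for with O(n^2) space."""
    return CLOSURE[INDEX[u]][INDEX[v]]
```

Here `reachable("a", "c")` is True and `reachable("c", "a")` is False. The full matrix is exactly the space cost the next question asks how to reduce.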

Motivation
Question: Is it possible to reduce the size of M* while still keeping a constant query time?

Graph deduction: basic definitions
Let G be a sparse graph. We first find a spanning tree T of G, which covers all nodes of G.
[Figure: an example graph G with the nodes a, b, c, d, e, f, g, h, i, j, k, r; the solid arrows form the spanning tree T.]

Graph deduction: edge classification
- Tree edges (E_tree): edges appearing in T.
- Cross edges (E_cross): edges (u, v) such that u and v are not on the same path in T.
- Forward edges (E_forward): edges (u, v) not appearing in T for which there is a path from u to v in T.
- Back edges (E_back): edges (u, v) not appearing in T for which there is a path from v to u in T.
[Figure: the example graph G with its edges classified with respect to T.]
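The four edge classes can be told apart with parent pointers alone. The `PARENT` map is our reading of the example tree, taken from the interval labels used elsewhere in this deck; treat it as an assumption, since the figure is lost. The names `classify` and `is_proper_ancestor` are hypothetical.

```python
# Sketch: classify an edge (u, v) of G against a spanning tree T given as parent pointers.
# PARENT is our reconstruction of the example tree (an assumption).
PARENT = {"b": "a", "r": "a", "h": "a", "c": "b", "d": "b", "k": "c",
          "e": "r", "f": "e", "g": "e", "i": "h", "j": "h"}


def is_proper_ancestor(x, y):
    """True if x lies strictly above y on y's path to the root of T."""
    y = PARENT.get(y)
    while y is not None:
        if y == x:
            return True
        y = PARENT.get(y)
    return False


def classify(u, v):
    if PARENT.get(v) == u:
        return "tree"
    if is_proper_ancestor(u, v):
        return "forward"   # T already contains a path u -> v
    if is_proper_ancestor(v, u):
        return "back"      # T contains a path v -> u (impossible if G is a DAG)
    return "cross"         # u and v lie on different root paths
```

For example, `classify("f", "d")` is `"cross"` and `classify("a", "i")` is `"forward"`.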

Graph deduction: tree encoding
Let G be a DAG, and let T be a spanning tree of G. Each node v in T is assigned an interval [start, end), where start is v's preorder number and end - 1 is the largest preorder number among the nodes in T[v], the subtree rooted at v. Hence another node u labeled [start', end') is a descendant of v (with respect to T) iff $start' \in [start, end)$.
[Figure: the spanning tree T of the example graph with the interval labels a [0, 12), b [1, 5), c [2, 4), k [3, 4), d [4, 5), r [5, 9), e [6, 9), f [7, 8), g [8, 9), h [9, 12), i [10, 11), j [11, 12).]
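The labeling can be sketched as a single preorder traversal. The child order below is our reconstruction of the example tree, an assumption; with it, the sketch reproduces the labels listed in the figure.

```python
# Sketch: preorder interval labels [start, end) for a spanning tree T.
# CHILDREN (including child order) is our reconstruction of the example tree.
CHILDREN = {"a": ["b", "r", "h"], "b": ["c", "d"], "c": ["k"],
            "r": ["e"], "e": ["f", "g"], "h": ["i", "j"]}


def label(root, children):
    """Return {v: (start, end)}: start = preorder number, end - 1 = largest preorder number in T[v]."""
    labels = {}

    def visit(v, counter):
        start = counter
        counter += 1
        for c in children.get(v, []):
            counter = visit(c, counter)
        labels[v] = (start, counter)
        return counter

    visit(root, 0)
    return labels


LABELS = label("a", CHILDREN)


def is_descendant(u, v):
    """True iff u is in T[v] (v counts as its own descendant): start(u) in [start(v), end(v))."""
    start_u = LABELS[u][0]
    start_v, end_v = LABELS[v]
    return start_v <= start_u < end_v
```

The traversal is recursive for readability. A very deep tree would need an explicit stack.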

Graph deduction: tree encoding
Let v and u be two nodes in T, labeled [a, b) and [a', b'), respectively. If $a \in [a', b')$, then v is a descendant of u. In this case we say that [a, b) is subsumed by [a', b'), and we must also have $b \le b'$. Intervals from the same tree are therefore either nested or disjoint. So if v and u are not on the same path in T, we have either $a' \ge b$ or $a \ge b'$. In the former case we say that [a, b) is smaller than [a', b'), denoted $[a, b) < [a', b')$. In the latter case, [a', b') is smaller than [a, b).
[Figure: the interval-labeled spanning tree T.]
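The case analysis above translates directly into a small comparison function. The name `compare` and its return strings are our own.

```python
# Sketch: relate two interval labels produced from the same spanning tree.


def compare(x, y):
    """Return how x = [a, b) relates to y = [a2, b2)."""
    (a, b), (a2, b2) = x, y
    if a2 <= a < b2:
        return "subsumed"   # x's node is a descendant of y's node (so b <= b2)
    if a <= a2 < b:
        return "subsumes"   # y's node is a descendant of x's node
    if b <= a2:
        return "smaller"    # x lies entirely before y
    # Intervals from one tree are nested or disjoint, so b2 <= a must hold here.
    assert b2 <= a, "intervals from one tree never partially overlap"
    return "larger"
```

With the example labels, `compare((4, 5), (5, 9))` is `"smaller"`: d and r are not on the same path, and d's interval comes first.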

Graph deduction: critical nodes and critical subgraph
We denote by E' the set of all cross edges and by V' the set of all end points of the cross edges. That is, $V' = V_{start} \cup V_{end}$, where V_start contains the start nodes and V_end the end nodes of the cross edges. In the example:
V_start = {d, f, g, h}
V_end = {c, k, e, d, g}
[Figure: the example graph G with its cross edges.]
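Given the interval labels, one pass over the edges yields the cross edges together with the sets V_start and V_end. The edge list `EDGES_G` is our reconstruction of the example graph: the tree edges, the five cross edges implied by the chain indexes later in the deck, and one forward edge a → i. Treat it as an assumption.

```python
# Sketch: cross edges and their end-point sets V_start / V_end.
# LABELS are the example's interval labels; EDGES_G is our reconstruction of G (an assumption).
LABELS = {"a": (0, 12), "b": (1, 5), "c": (2, 4), "k": (3, 4), "d": (4, 5), "r": (5, 9),
          "e": (6, 9), "f": (7, 8), "g": (8, 9), "h": (9, 12), "i": (10, 11), "j": (11, 12)}
EDGES_G = [("a", "b"), ("a", "r"), ("a", "h"), ("a", "i"), ("b", "c"), ("b", "d"),
           ("c", "k"), ("r", "e"), ("e", "f"), ("e", "g"), ("h", "i"), ("h", "j"),
           ("d", "k"), ("f", "d"), ("g", "c"), ("h", "e"), ("h", "g")]


def on_same_path(u, v):
    """True iff one of u, v is an ancestor of (or equal to) the other in T."""
    (su, eu), (sv, ev) = LABELS[u], LABELS[v]
    return su <= sv < eu or sv <= su < ev


CROSS = [(u, v) for u, v in EDGES_G if not on_same_path(u, v)]
V_START = {u for u, _ in CROSS}
V_END = {v for _, v in CROSS}
```

The computed sets match the ones on the slide, which is a small consistency check of the reconstruction.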

Graph deduction: critical nodes and critical subgraph
Definition 1 (anti-subsuming subset). A subset $S \subseteq V_{start}$ is called an anti-subsuming subset iff |S| > 1 and no two nodes in S are related by an ancestor-descendant relationship with respect to T.
Anti-subsuming subsets in the example: {d, f}, {d, g}, {d, h}, {f, g}, {f, h}, {g, h}, {d, f, g}, {d, f, h}, {d, g, h}, {f, g, h}, {d, f, g, h}.
[Figure: the example graph G.]
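A brute-force enumeration checks the definition on the example. It is exponential in |V_start|, so it is only for illustration. It uses the interval labels of d, f, g and h from the tree-encoding slide.

```python
from itertools import combinations

# Interval labels of the nodes in V_start = {d, f, g, h}, as on the tree-encoding slide.
LABELS = {"d": (4, 5), "f": (7, 8), "g": (8, 9), "h": (9, 12)}


def related(x, y):
    """Ancestor-descendant test via the interval labels."""
    (a, b), (a2, b2) = LABELS[x], LABELS[y]
    return a2 <= a < b2 or a <= a2 < b


def anti_subsuming_subsets(v_start):
    """All S with |S| > 1 and no two members related. Exponential: illustration only."""
    result = []
    for size in range(2, len(v_start) + 1):
        for s in combinations(sorted(v_start), size):
            if not any(related(x, y) for x, y in combinations(s, 2)):
                result.append(set(s))
    return result


SUBSETS = anti_subsuming_subsets(set(LABELS))
```

The result is the 11 subsets listed above, because d, f, g and h are pairwise unrelated.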

Graph deduction: critical nodes and critical subgraph
Definition 2 (critical node). A node v in a spanning tree T of G is critical if $v \in V_{start}$ or there exists an anti-subsuming subset S = {v_1, v_2, ..., v_k} with $k \ge 2$ such that v is the lowest common ancestor of v_1, v_2, ..., v_k. We denote by V_c the set of all critical nodes.
In the example, node e is the lowest common ancestor of {f, g}, and node a is the lowest common ancestor of {d, f, g, h}, so e and a are critical. In addition, every $v \in V_{start}$ is critical. Hence the critical nodes of G with respect to T are V_c = {d, f, g, h, e, a}.
[Figure: the example graph G.]
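This sketch of Definition 2 checks only pairs of nodes. It relies on an observation of ours: the members of an anti-subsuming set S lie under at least two different children of their lowest common ancestor w, so some pair drawn from S already has w as its LCA. `PARENT` is our reconstruction of the example tree.

```python
from itertools import combinations

# PARENT is our reconstruction of the example spanning tree (an assumption).
PARENT = {"b": "a", "r": "a", "h": "a", "c": "b", "d": "b", "k": "c",
          "e": "r", "f": "e", "g": "e", "i": "h", "j": "h"}


def root_path(v):
    """Return [v, parent(v), ..., root]."""
    path = [v]
    while v in PARENT:
        v = PARENT[v]
        path.append(v)
    return path


def lca(u, v):
    above_u = set(root_path(u))
    return next(w for w in root_path(v) if w in above_u)


def critical_nodes(v_start):
    """V_c per Definition 2, checking pairs only."""
    critical = set(v_start)
    for u, v in combinations(sorted(v_start), 2):
        w = lca(u, v)
        if w not in (u, v):   # u and v are unrelated, so {u, v} is anti-subsuming
            critical.add(w)
    return critical
```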

Graph deduction: critical node recognition
Algorithm critical-node-recognition(T)
1. Mark every node in T that belongs to V_start.
2. Search T bottom-up. Let v be the first marked node encountered, and create the first node for v in G_c.
3. Let u be the currently encountered node in T, and let u' be the node in T for which a node in G_c was created just before u is met. Do (4) or (5), depending on whether u is marked.
4. If u is a marked node, do the following.
(a) If u' is not a child (descendant) of u, create a link from u to u', called a left-sibling link and denoted left-sibling(u) = u'.

Graph deduction: critical node recognition (continued)
(b) If u' is a child (descendant) of u, first create a link from u' to u, called a parent link and denoted parent(u') = u. Then follow the left-sibling chain starting from u' until a node u'' that is not a child (descendant) of u is met. For each encountered node w except u'', set parent(w) ← u. Set left-sibling(u) ← u'', and remove left-sibling(w) for each child w of u.
5. If u is a non-marked node, do the following.
(c) If u' is not a child (descendant) of u, no node is created.
(d) If u' is a child (descendant) of u, follow the left-sibling chain starting from u' until a node u'' that is not a child (descendant) of u is met. If more than one node (not counting u'') is encountered along the chain, create a new node for u in G_c and perform the same operations as in (b). Otherwise, no node is created.
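The same result can be obtained in a single bottom-up pass. The sketch below is our simplification, not a line-by-line port. An unmarked node with two or more critical subtrees becomes critical itself, so each subtree has at most one topmost critical node. The left-sibling chain therefore shrinks to the single value returned by each recursive call. The tree and V_start are the example's, as reconstructed.

```python
# Sketch: find V_c and the parent links between critical nodes in one bottom-up pass.
# A simplification of critical-node-recognition, not a line-by-line port.
CHILDREN = {"a": ["b", "r", "h"], "b": ["c", "d"], "c": ["k"],
            "r": ["e"], "e": ["f", "g"], "h": ["i", "j"]}   # reconstructed example tree
V_START = {"d", "f", "g", "h"}


def recognize_critical(root):
    critical, parent_link = set(), {}

    def visit(v):
        """Return the topmost critical node in T[v], or None if T[v] has none."""
        tops = [t for t in (visit(c) for c in CHILDREN.get(v, [])) if t is not None]
        if v in V_START or len(tops) >= 2:   # marked node, or LCA of unrelated V_start nodes
            critical.add(v)
            for t in tops:
                parent_link[t] = v           # corresponds to parent(w) <- u in step 4(b)
            return v
        # Unmarked with at most one critical subtree: nothing is created (steps 5(c), 5(d)).
        return tops[0] if tops else None

    visit(root)
    return critical, parent_link


CRITICAL, LINKS = recognize_critical("a")
```

The parent links give the tree of critical nodes: d, e and h hang from a, while f and g hang from e.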

Graph deduction: sample trace
[Figure: panels (a) to (f) trace critical-node recognition on the example tree, showing the critical nodes d, f, g, e, h and a being created in turn. A left-sibling link is drawn from u to u'' when u'' is not a child of u.]

Graph deduction: tree deduction
Let T be a spanning tree of G. Denote by T_r the reduction of T obtained by removing every node $v \notin V_c \cup V_{end}$. Deleting a node v means connecting v's parent to each of v's children, so each removal eliminates one tree edge.
Example: T_r is obtained by removing the nodes b, r, i, and j one by one. (None of them belongs to $V_c \cup V_{end}$, where V_c = {a, d, e, f, g, h} and V_end = {c, d, e, g, k}.)
[Figure: T and its reduction T_r, whose nodes are a, c, k, d, h, e, f, g.]
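Removing nodes one by one is equivalent to attaching each kept node to its nearest kept proper ancestor. The following sketch does that directly, using the tree reconstructed from the example labels (an assumption).

```python
# Sketch: compute T_r by splicing out every node outside V_c | V_end.
PARENT = {"b": "a", "r": "a", "h": "a", "c": "b", "d": "b", "k": "c",
          "e": "r", "f": "e", "g": "e", "i": "h", "j": "h"}   # reconstructed example tree
KEEP = {"a", "d", "e", "f", "g", "h"} | {"c", "d", "e", "g", "k"}   # V_c | V_end


def reduce_tree(parent, keep):
    """Parent map of T_r: each kept node hangs from its nearest kept proper ancestor."""
    reduced = {}
    for v in keep:
        p = parent.get(v)
        while p is not None and p not in keep:
            p = parent.get(p)   # skip removed nodes, splicing them out
        if p is not None:
            reduced[v] = p
    return reduced


T_R = reduce_tree(PARENT, KEEP)
```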

Graph deduction: critical subgraph
Definition 4 (critical subgraph). Let G(V, E) be a DAG and T a spanning tree of G. The critical subgraph G_c of G with respect to T is the graph with node set V(T_r) and edge set $E(T_r) \cup E_{cross}$.
[Figure: G_c for the example, with the nodes a, c, k, d, h, e, f, g.]
The reachability between any two nodes of G can then be checked by using T and G_c.
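On the example, G_c should preserve reachability: breadth-first search over G and over G_c should agree for every pair of G_c nodes. All edge lists here are our reconstruction of the example, because the figures did not survive. So this checks that reading of the example, not the paper's proofs.

```python
from collections import deque

# Our reconstruction of the example (an assumption): tree edges of T, cross edges,
# one forward edge, and the edges of T_r.
T_EDGES = [("a", "b"), ("a", "r"), ("a", "h"), ("b", "c"), ("b", "d"), ("c", "k"),
           ("r", "e"), ("e", "f"), ("e", "g"), ("h", "i"), ("h", "j")]
CROSS = [("f", "d"), ("d", "k"), ("g", "c"), ("h", "g"), ("h", "e")]
FORWARD = [("a", "i")]
TR_EDGES = [("a", "c"), ("a", "d"), ("a", "e"), ("a", "h"), ("c", "k"), ("e", "f"), ("e", "g")]


def adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
    return adj


def reaches(adj, u, v):
    """Breadth-first search from u; True if v is found (u reaches itself)."""
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


G = adjacency(T_EDGES + CROSS + FORWARD)
G_C = adjacency(TR_EDGES + CROSS)          # E(T_r) plus E_cross
GC_NODES = {"a", "c", "d", "e", "f", "g", "h", "k"}
PRESERVED = all(reaches(G, u, v) == reaches(G_C, u, v) for u in GC_NODES for v in GC_NODES)
```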

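Graph deduction: example queries
[Figure: the interval-labeled spanning tree T next to G_c, with two queries: is f reachable from r, and is d reachable from r?]
The first query is answered by the tree labels alone. f is labeled [7, 8), and $7 \in [5, 9)$, the interval of r, so f is a descendant of r.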

Graph deduction: evaluation of reachability queries
Definition 5 (anchor nodes). Let G be a DAG and T a spanning tree of G. Let v be a node in T, and denote by C_v the set of critical nodes in T[v]. We associate two anchor nodes with v:
i) A node $u \in C_v$ is called an anchor node (of the first kind) of v if u is the closest to v among the nodes of C_v. It is denoted v*.
ii) A node w is called an anchor node (of the second kind) of v if it is the lowest ancestor of v in T (v itself included) that has an incoming cross edge. It is denoted v**.
Example. r* = e, because e is the critical node closest to r in T[r]. But r** does not exist, since neither r nor its ancestor a has an incoming cross edge. e* = e** = e; that is, both anchor nodes of e are e itself.
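Both kinds of anchor node can be found with short searches in T. The sketch uses the reconstructed example tree, V_c from Definition 2, and V_end as the set of nodes with an incoming cross edge. `first_anchor` searches breadth-first, so it finds the critical node with the fewest tree edges below v. Both function names are hypothetical.

```python
from collections import deque

CHILDREN = {"a": ["b", "r", "h"], "b": ["c", "d"], "c": ["k"],
            "r": ["e"], "e": ["f", "g"], "h": ["i", "j"]}   # reconstructed example tree
PARENT = {c: p for p, kids in CHILDREN.items() for c in kids}
CRITICAL = {"a", "d", "e", "f", "g", "h"}                    # V_c
HAS_CROSS_IN = {"c", "d", "e", "g", "k"}                     # V_end


def first_anchor(v):
    """v*: the critical node of T[v] closest to v, or None (the slides' "-")."""
    queue = deque([v])
    while queue:
        x = queue.popleft()
        if x in CRITICAL:
            return x
        queue.extend(CHILDREN.get(x, []))
    return None


def second_anchor(v):
    """v**: the lowest node on the path from v up to the root with an incoming cross edge, or None."""
    x = v
    while x is not None:
        if x in HAS_CROSS_IN:
            return x
        x = PARENT.get(x)
    return None
```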

Graph deduction: evaluation of reachability queries
[Figure: the example graph G with the anchor nodes marked.]
Similarly, f** = e: f has no incoming cross edge, but its parent e does.

Graph deduction: evaluation of reachability queries
Definition 6 (non-tree labels). Let v be a node in G. The non-tree label of v is a pair (x, y), where
- x = v* if v* exists; otherwise x is the special symbol "-";
- y = v** if v** exists; otherwise y is "-".

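Graph deduction: example
Is d reachable from r? r is labeled [5, 9) and d is labeled [4, 5). Since $4 \notin [5, 9)$, d is not a descendant of r in T. However, r* = e and d** = d, and d is reachable from e through a path in G_c. So d is reachable from r.
[Figure: G with the labels of r and d, next to G_c.]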
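The example suggests the following query procedure, sketched here as our reading of the slides. If v lies in T[u], answer from the tree labels. Otherwise, u reaches v if u* and v** both exist and v** is reachable from u* in G_c. This is sound because u reaches u*, and v** reaches v, along tree paths. The non-tree labels below come from applying Definitions 5 and 6 to our reconstruction of the example, and G_c is searched with plain breadth-first search.

```python
from collections import deque

LABELS = {"a": (0, 12), "b": (1, 5), "c": (2, 4), "k": (3, 4), "d": (4, 5), "r": (5, 9),
          "e": (6, 9), "f": (7, 8), "g": (8, 9), "h": (9, 12), "i": (10, 11), "j": (11, 12)}
# Non-tree labels (x, y) = (v*, v**); None stands for "-".
FIRST = {"a": "a", "b": "d", "c": None, "d": "d", "e": "e", "f": "f", "g": "g",
         "h": "h", "i": None, "j": None, "k": None, "r": "e"}
SECOND = {"a": None, "b": None, "c": "c", "d": "d", "e": "e", "f": "e", "g": "g",
          "h": None, "i": None, "j": None, "k": "k", "r": None}
G_C = {"a": ["c", "d", "e", "h"], "c": ["k"], "d": ["k"], "e": ["f", "g"],
       "f": ["d"], "g": ["c"], "h": ["g", "e"]}              # reconstructed G_c


def gc_reaches(u, v):
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in G_C.get(x, []):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


def reachable(u, v):
    start_u, end_u = LABELS[u]
    if start_u <= LABELS[v][0] < end_u:   # v in T[u]: the tree labels suffice
        return True
    x, y = FIRST[u], SECOND[v]
    if x is None or y is None:            # a "-" in either non-tree label
        return False
    return gc_reaches(x, y)               # u ~> u* ~> v** ~> v
```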

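Graph deduction: evaluation of reachability queries
Reachability checking over G_c: decompose G_c into chains, and give each node v an index Index(v) = (chain number, position on the chain).
[Figure: G_c decomposed into two chains, a → e → f → d → k and h → g → c, with Index(a) = (1, 1), Index(e) = (1, 2), Index(f) = (1, 3), Index(d) = (1, 4), Index(k) = (1, 5), Index(h) = (2, 1), Index(g) = (2, 2), Index(c) = (2, 3).]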
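Here is one way to build chain indexes. It assumes each chain is a path in G_c, meaning consecutive chain nodes are joined by edges. Nodes are processed in reverse topological order, and for every chain the sketch records the smallest position reachable from each node. A query then needs one comparison. The code uses Python's standard `graphlib.TopologicalSorter`. `GC_EDGES` is our reconstruction of G_c, and the chains are those in the figure.

```python
from graphlib import TopologicalSorter

GC_EDGES = [("a", "c"), ("a", "d"), ("a", "e"), ("a", "h"), ("c", "k"), ("e", "f"),
            ("e", "g"), ("d", "k"), ("f", "d"), ("g", "c"), ("h", "e"), ("h", "g")]
CHAINS = [["a", "e", "f", "d", "k"], ["h", "g", "c"]]   # the two chains in the figure


def build_index(edges, chains):
    index = {v: (ci, pos) for ci, chain in enumerate(chains, 1) for pos, v in enumerate(chain, 1)}
    succ = {v: [] for v in index}
    for u, v in edges:
        succ[u].append(v)
    # graphlib expects node -> predecessors; passing successors instead yields an
    # order in which every successor of v is emitted before v.
    order = TopologicalSorter(succ).static_order()
    best = {}
    for v in order:
        ci, pos = index[v]
        row = {ci: pos}                      # v reaches its own chain at its own position
        for w in succ[v]:
            for c, p in best[w].items():
                if p < row.get(c, float("inf")):
                    row[c] = p
        best[v] = row
    return index, best


INDEX, BEST = build_index(GC_EDGES, CHAINS)


def reach(u, v):
    """u reaches v iff u reaches v's chain at or before v's position."""
    c, p = INDEX[v]
    return BEST[u].get(c, float("inf")) <= p
```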

Graph deduction: evaluation of reachability queries
Reachability checking over G: G is decomposed into five chains: 1st chain b, c, k; 2nd chain r, e, f, d; 3rd chain h, g; 4th chain a, i; 5th chain j. Besides Index(v), each node v stores, for every other chain, the position of the first node on that chain reachable from v ("-" if none):
a (4, 1): (1, 1) (2, 1) (3, 1) (5, 1)
b (1, 1): (2, 4) (3, -) (4, -) (5, -)
c (1, 2): (2, -) (3, -) (4, -) (5, -)
d (2, 4): (1, 3) (3, -) (4, -) (5, -)
e (2, 2): (1, 2) (3, 2) (4, -) (5, -)
f (2, 3): (1, 3) (3, -) (4, -) (5, -)
g (3, 2): (1, 2) (2, -) (4, -) (5, -)
h (3, 1): (1, 2) (2, 2) (4, 2) (5, 1)
i (4, 2): (1, -) (2, -) (3, -) (5, -)
j (5, 1): (1, -) (2, -) (3, -) (4, -)
k (1, 3): (2, -) (3, -) (4, -) (5, -)
r (2, 1): (1, 2) (3, 2) (4, -) (5, -)
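With such a table, a query is one lookup and one comparison. The sketch below transcribes the table above, assuming our transcription of the garbled original is right. `FIRST_ON_CHAIN` and `reachable` are hypothetical names.

```python
# The Index(v) table for G, transcribed; "-" entries are simply omitted.
INDEX = {"b": (1, 1), "c": (1, 2), "k": (1, 3), "r": (2, 1), "e": (2, 2), "f": (2, 3),
         "d": (2, 4), "h": (3, 1), "g": (3, 2), "a": (4, 1), "i": (4, 2), "j": (5, 1)}
FIRST_ON_CHAIN = {
    "a": {1: 1, 2: 1, 3: 1, 5: 1}, "b": {2: 4}, "c": {}, "k": {},
    "r": {1: 2, 3: 2}, "e": {1: 2, 3: 2}, "f": {1: 3}, "d": {1: 3},
    "h": {1: 2, 2: 2, 4: 2, 5: 1}, "g": {1: 2}, "i": {}, "j": {},
}


def reachable(u, v):
    cu, pu = INDEX[u]
    cv, pv = INDEX[v]
    if cu == cv:
        return pu <= pv                      # later nodes on the same chain are reachable
    p = FIRST_ON_CHAIN[u].get(cv)
    return p is not None and p <= pv         # u enters v's chain at or before v
```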

Recursive graph decomposition: recursive deduction
From the above discussion, we can see that G_c is smaller than G (8 nodes instead of 12 in the example). However, G_c itself can be reduced further, which reduces the space requirement further. Applying the above method repeatedly, we obtain a series of graph reductions G_0 = G, G_1, ..., G_k ($k \ge 1$), where G_i is the critical subgraph of G_{i-1} (i = 1, ..., k). To construct these critical subgraphs, a series of spanning trees T_0, T_1, ..., T_{k-1} has to be established, where each T_i is a spanning tree of G_i (i = 0, ..., k - 1) used to construct G_{i+1}.
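The whole recursion can be sketched end to end. Several choices here are our assumptions, not the slides'. Each T_i is a depth-first spanning forest that starts from source nodes and tries successors last to first. The recursion stops once a step produces no cross edges or no shrinkage. The choice of spanning tree matters: the slides' own T_1 yields a G_2 with the nodes c and k, while the choice below gives graphs of 12, 8, 4 and 2 nodes on the reconstructed example.

```python
def deduce(nodes, adj):
    """One deduction step: return (nodes, adj) of the critical subgraph of (nodes, adj)."""
    indeg = {v: 0 for v in nodes}
    for u in nodes:
        for v in adj[u]:
            indeg[v] += 1
    label, parent, children = {}, {}, {v: [] for v in nodes}
    counter = 0

    def dfs(v):  # depth-first spanning forest with [start, end) labels
        nonlocal counter
        start = counter
        counter += 1
        label[v] = (start, None)          # mark as visited
        for w in reversed(adj[v]):
            if w not in label:
                parent[w] = v
                children[v].append(w)
                dfs(w)
        label[v] = (start, counter)

    roots = []
    for v in [x for x in nodes if indeg[x] == 0] + list(nodes):
        if v not in label:
            roots.append(v)
            dfs(v)

    def related(u, v):  # on the same root path of the forest?
        (su, eu), (sv, ev) = label[u], label[v]
        return su <= sv < eu or sv <= su < ev

    cross = [(u, v) for u in nodes for v in adj[u] if not related(u, v)]
    v_start = {u for u, _ in cross}
    critical = set()

    def top(v):  # topmost critical node of T[v], or None
        tops = [t for t in (top(c) for c in children[v]) if t is not None]
        if v in v_start or len(tops) >= 2:
            critical.add(v)
            return v
        return tops[0] if tops else None

    for r in roots:
        top(r)
    kept = critical | {v for _, v in cross}            # V_c plus V_end
    keep = sorted(kept, key=lambda v: label[v][0])     # preorder of T_i
    new_adj = {v: [] for v in keep}
    for v in keep:                                     # edges of T_r
        p = parent.get(v)
        while p is not None and p not in kept:
            p = parent.get(p)
        if p is not None:
            new_adj[p].append(v)
    for u, v in cross:                                 # plus the cross edges
        new_adj[u].append(v)
    return keep, new_adj


def recursive_deduction(nodes, adj):
    """Return [(nodes_0, adj_0), (nodes_1, adj_1), ...] for G_0 = G, G_1, ..., G_k."""
    levels = [(list(nodes), adj)]
    while True:
        new_nodes, new_adj = deduce(*levels[-1])
        if not new_nodes or len(new_nodes) >= len(levels[-1][0]):
            return levels                              # our stopping rule (an assumption)
        levels.append((new_nodes, new_adj))


# Reconstructed example G; successor lists are ordered so that the
# last-to-first DFS reproduces the slides' spanning tree T_0.
G = {"a": ["i", "h", "r", "b"], "b": ["d", "c"], "c": ["k"], "d": ["k"],
     "e": ["g", "f"], "f": ["d"], "g": ["c"], "h": ["e", "g", "j", "i"],
     "i": [], "j": [], "k": [], "r": ["e"]}
LEVELS = recursive_deduction(list(G), G)
SIZES = [len(n) for n, _ in LEVELS]
```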

Recursive graph decomposition: recursive deduction
To check reachability efficiently, each node v in G is associated with two sequences, an interval sequence and an anchor node sequence:
1) [start_0(v), end_0(v)), ..., [start_j(v), end_j(v)) ($j \le k - 1$), where each [start_i(v), end_i(v)) is the interval generated for v by labeling T_i;
2) (x_0(v), y_0(v)), ..., (x_j(v), y_j(v)), where each x_i(v) is a pointer to an anchor node of the first kind (a node appearing in G_{i+1}) and each y_i(v) is a pointer to an anchor node of the second kind (also a node in G_{i+1}).

Recursive graph decomposition: recursive deduction
[Figure: nodes u, v, w, z in G_0, G_1, ..., G_j, labeled at each level with their intervals [start_i(·), end_i(·)); arrows marked * and ** point to the anchor nodes of the first and second kind in the next graph.]

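Recursive graph decomposition: recursive deduction
Example: [Figure: G_0 = G with the nodes a, b, c, d, e, f, g, h, i, j, k, r; G_1 with the nodes a, c, d, e, f, g, h, k; and G_2, consisting of the nodes c and k, with Index(c) = (1, 1) and Index(k) = (1, 2).]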

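Recursive graph decomposition: recursive deduction
Example: interval sequences. The level-0 interval comes from T_0 and the level-1 interval from T_1; nodes that do not appear in G_1 have a level-0 interval only.
a: [0, 12), [0, 8)
b: [1, 5)
c: [2, 4), [7, 8)
d: [4, 5), [4, 6)
e: [6, 9), [2, 8)
f: [7, 8), [3, 6)
g: [8, 9), [6, 8)
h: [9, 12), [1, 8)
i: [10, 11)
j: [11, 12)
k: [3, 4), [5, 6)
r: [5, 9)
[Figure: the anchor node sequences of the example nodes.]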

Recursive graph decomposition: evaluation of reachability queries
[Figure: the anchor node sequences of the example nodes, drawn as pointers labeled with a level number and * or **.]
Is k reachable from g? We have [start_0(g), end_0(g)) = [8, 9) and [start_0(k), end_0(k)) = [3, 4), and [start_1(g), end_1(g)) = [6, 8) and [start_1(k), end_1(k)) = [5, 6). Neither pair of intervals shows k to be a descendant of g. Following the anchor node sequences of g and k into G_2 leads to c and k, respectively. In G_2, k is reachable from c, which shows that k is reachable from g.
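The level-by-level query can be sketched as follows; this is our reading of how the two sequences combine. At level i, test the intervals of T_i. If the test fails, replace u by x_i(u) and v by y_i(v) and move to G_{i+1}. In the final graph, search directly. The intervals are the example's. The anchor values are our own application of Definition 5 to the reconstructed G_0 and G_1 with the slides' T_1, since the anchor figure is not recoverable. The final check uses breadth-first search in place of chain indexes.

```python
from collections import deque

# Level 0: intervals of T_0 and the anchors (x_0, y_0); None stands for "-".
INTERVALS_0 = {"a": (0, 12), "b": (1, 5), "c": (2, 4), "k": (3, 4), "d": (4, 5),
               "r": (5, 9), "e": (6, 9), "f": (7, 8), "g": (8, 9), "h": (9, 12),
               "i": (10, 11), "j": (11, 12)}
X_0 = {"a": "a", "b": "d", "c": None, "d": "d", "e": "e", "f": "f", "g": "g",
       "h": "h", "i": None, "j": None, "k": None, "r": "e"}
Y_0 = {"a": None, "b": None, "c": "c", "d": "d", "e": "e", "f": "e", "g": "g",
       "h": None, "i": None, "j": None, "k": "k", "r": None}
# Level 1: intervals of T_1; only c is critical and only k has an incoming cross edge.
INTERVALS_1 = {"a": (0, 8), "h": (1, 8), "e": (2, 8), "f": (3, 6), "d": (4, 6),
               "k": (5, 6), "g": (6, 8), "c": (7, 8)}
X_1 = {"a": "c", "h": "c", "e": "c", "g": "c", "c": "c", "f": None, "d": None, "k": None}
Y_1 = {v: ("k" if v == "k" else None) for v in INTERVALS_1}
LEVELS = [(INTERVALS_0, X_0, Y_0), (INTERVALS_1, X_1, Y_1)]
G_LAST = {"c": ["k"], "k": []}            # G_2


def bfs_reaches(adj, u, v):
    seen, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in adj.get(x, []):
            if y not in seen:
                seen.add(y)
                queue.append(y)
    return False


def reachable(u, v):
    for intervals, x, y in LEVELS:
        start_u, end_u = intervals[u]
        if start_u <= intervals[v][0] < end_u:   # v lies in T_i[u]
            return True
        u, v = x[u], y[v]                         # continue with the anchors in G_{i+1}
        if u is None or v is None:
            return False
    return bfs_reaches(G_LAST, u, v)
```

The loop runs at most once per level. That matches the O(k) query bound when the final check is a constant-time chain-index lookup.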

Summary
Transitive closure compression based on graph deduction
- DAG decomposition: a spanning tree and a subgraph
- Reachability checking: tree labels and reachability of anchor nodes in the subgraph
Transitive closure compression based on recursive graph deduction
- DAG decomposition: a series of spanning trees and a subgraph
- Reachability checking: interval sequences and anchor node sequences

Summary
Computational complexities:
- labeling time: $O(ke + b_k^{1.5} n_k)$
- space overhead: $O(kn + b_k n_k)$
- query time: $O(k)$
where n is the number of nodes of G, e the number of edges of G, n_k the number of nodes of G_k, and b_k the width of G_k.

Thank you.