Online Topological Ordering Siddhartha Sen, COS 518 11/20/2007.


Outline
Problem statement and motivation
Prior work (summary)
Result by Ajwani et al.
– Algorithm
– Correctness
– Running time
– Implementation
Comparison to prior work
– Incremental complexity analysis
– Practical implications
Open problems
Breaking news

Problem statement
Offline or static version (STO)
– Given a DAG G = (V,E) (with n = |V| and m = |E|), find a linear ordering T of its nodes such that for all directed paths from x ∈ V to y ∈ V (x ≠ y), T(x) < T(y), where T: V → [1..n] is a bijective mapping
Online version (DTO)
– Edges of G are not known beforehand, but are revealed one by one
– Each time an edge is added to the graph, T must be updated
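The static version (STO) can be solved in O(n + m) time with a depth-first search. A minimal Python sketch (the function name and interface are mine, not from the slides):

```python
def static_topological_order(n, edges):
    """Offline (STO) version: compute T: V -> [1..n] for a DAG with
    nodes 0..n-1 so that T[x] < T[y] for every edge (x, y)."""
    succ = [[] for _ in range(n)]
    for x, y in edges:
        succ[x].append(y)

    order = []              # nodes in reverse topological order
    state = [0] * n         # 0 = unvisited, 1 = on the DFS stack, 2 = done

    def dfs(x):
        if state[x] == 1:   # back edge: the input is not a DAG
            raise ValueError("cycle detected")
        if state[x] == 2:
            return
        state[x] = 1
        for y in succ[x]:
            dfs(y)
        state[x] = 2
        order.append(x)     # post-order: all successors already emitted

    for x in range(n):
        dfs(x)

    order.reverse()
    return {x: i + 1 for i, x in enumerate(order)}
```

The online problem is harder precisely because re-running a scan like this after every edge insertion costs O(n + m) per edge.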

Problem statement
[Figure: nodes a, b, c, d, u, v laid out in topological order; the inserted edge u → v invalidates the topological order, and the span between its endpoints is the affected region]

Motivation
Traditional applications
– Online cycle detection in pointer analysis
– Incremental evaluation of computational circuits
– Semantic checking by structure-based editors
– Maintaining dependences between modules during compilation
Other applications
– Scheduling jobs in grid computing systems, where dependences arise between the subtasks of a job

Prior work (summary)
Offline problem: O(n + m) for m edges
Alpern et al. (AHRSZ, ’90): O(||K_min|| log ||K_min||) per edge
Marchetti-Spaccamela et al. (MNR, ’96): O(n) per edge (amortized) – O(nm) for m edges
Pearce and Kelly (PK, ’04): O(||δ_uv|| log ||δ_uv||) per edge
Katriel and Bodlaender (KB, ’05): O(m^0.5 log n) per edge (amortized) – O(m^1.5 log n) for m edges
(the per-edge bounds for AHRSZ and PK come from incremental complexity analysis)

Ajwani et al. (AFM)
Contributions
– Solves DTO in O(n^2.75) time, regardless of the number of edges m inserted
– Uses a generic bucket data structure with efficient support for: insert, delete, collect-all
– Analysis based on a tunable parameter t = max number of nodes in each bucket
Criticisms
– Poor discussion of motivating applications
– No insight into how the algorithm works or achieves its running time
– No intuitive comparison with prior algorithms (AHRSZ, MNR, etc.)

Notation
d(u,v) denotes |T(u) – T(v)|
u < v is shorthand for T(u) < T(v)
u → v denotes an edge from u to v
u ⇝ v means v is reachable from u

Algorithm AFM
[Pseudocode for INSERT(u,v) and REORDER(u,v) shown as an image; not preserved in this transcript]
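Since the pseudocode image did not survive the transcript, here is a simplified Python sketch of INSERT/REORDER as reconstructed from the walkthrough and the correctness proofs. It is not the paper's exact code: the bucketed adjacency lists are replaced by plain scans of adjacency sets, and the cycle check is folded into REORDER, so only the recursion structure is faithful.

```python
class OnlineTopo:
    """Sketch of AFM-style online topological ordering.

    T[x] is the position of node x; inv[i] is the node at position i.
    The real algorithm finds the sets A and B via bucketed adjacency
    lists; here we simply scan the adjacency sets."""

    def __init__(self, n):
        self.T = list(range(n))
        self.inv = list(range(n))
        self.succ = [set() for _ in range(n)]
        self.pred = [set() for _ in range(n)]

    def insert(self, u, v):
        """Insert edge u -> v, restoring T(u) < T(v) if necessary."""
        if self.T[u] > self.T[v]:
            self._reorder(u, v)
        self.succ[u].add(v)
        self.pred[v].add(u)

    def _swap(self, u, v):
        tu, tv = self.T[u], self.T[v]
        self.T[u], self.T[v] = tv, tu
        self.inv[tu], self.inv[tv] = v, u

    def _reorder(self, u, v):
        # Called with T(v) < T(u) while the pending edge u -> v
        # requires T(u) < T(v).
        if u == v or u in self.succ[v]:
            raise ValueError("cycle detected")
        if self.T[u] < self.T[v]:        # already fixed by an earlier swap
            return
        # A: successors of v inside the affected region;
        # B: predecessors of u inside the affected region.
        A = sorted((w for w in self.succ[v] if self.T[w] < self.T[u]),
                   key=lambda w: -self.T[w])
        B = sorted((w for w in self.pred[u] if self.T[w] > self.T[v]),
                   key=lambda w: self.T[w])
        if not A and not B:
            self._swap(u, v)             # nothing in between: just exchange
            return
        # Recursively order every crossing pair; the iteration order makes
        # (u, v) itself the last pair, by which point A = B = empty and the
        # recursive call performs the final swap (cf. Lemma 4).
        for vp in A + [v]:
            for up in B + [u]:
                if self.T[up] > self.T[vp]:
                    self._reorder(up, vp)
```

For example, inserting the chain (1,2), (2,3), (3,4) and then (4,0) forces node 0 past the whole chain through a cascade of pairwise swaps, exactly as in the slides' walkthrough.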

Order: a b c d u v | Call: REORDER(u,v), Set A: { v, a }, Set B: { c, u } (u → v invalidates the topological order)

Algorithm AFM
Order: a b c d u v | Call: REORDER(c,a), Set A = Set B = Ø

Algorithm AFM
Order: c b a d u v | Call: REORDER(c,a), Set A = Set B = Ø | Swap!

Algorithm AFM
Order: c b a d u v | Call: REORDER(u,v), Set A: { v, a }, Set B: { c, u }

Algorithm AFM
Order: c b a d u v | Call: REORDER(u,a), Set A: { a, b }, Set B: { u }

Algorithm AFM
Order: c b a d u v | Call: REORDER(u,b), Set A = Set B = Ø

Algorithm AFM
Order: c u a d b v | Call: REORDER(u,b), Set A = Set B = Ø | Swap!

Algorithm AFM
Order: c u a d b v | Call: REORDER(u,a), Set A: { a, b }, Set B: { u }

Algorithm AFM
Order: c u a d b v | Call: REORDER(u,a), Set A = Set B = Ø

Algorithm AFM
Order: c a u d b v | Call: REORDER(u,a), Set A = Set B = Ø | Swap!

Algorithm AFM
Order: c a u d b v | Call: REORDER(u,v), Set A: { v, a }, Set B: { c, u }

Algorithm AFM
Order: c a u d b v | Call: REORDER(c,v), Set A = Set B = Ø

Algorithm AFM
Order: v a u d b c | Call: REORDER(c,v), Set A = Set B = Ø | Swap!

Algorithm AFM
Order: v a u d b c | Call: REORDER(u,v), Set A: { v, a }, Set B: { c, u }

Algorithm AFM
Order: v a u d b c | Call: REORDER(u,v), Set A = Set B = Ø

Algorithm AFM
Order: u a v d b c | Call: REORDER(u,v), Set A = Set B = Ø | Swap!

Algorithm AFM
Order: u a v d b c | Call: REORDER(u,v), Set A = Set B = Ø | Done!

Data structures
Store T and T⁻¹ as arrays
– O(1) lookup for topological order and inverse
Graph stored as an array of vertices, where each vertex has two adjacency lists (for incoming/outgoing edges)
Each adjacency list stored as an array of buckets
– Each bucket contains at most t nodes for a fixed t
– The i-th bucket of node u contains all adjacent nodes v with i·t ≤ d(u,v) < (i + 1)·t

Data structures
A bucket is any data structure with efficient support for the following operations:
– Insert: insert an element into a given bucket
– Delete: given an element and a bucket, delete the element from the bucket (if found; otherwise, return 0)
– Collect-all: copy all elements from a given bucket to some vector
The analysis assumes a generic bucket data structure and counts the number of bucket operations
– Later, we will consider different implementations of the data structure and the corresponding running times/space usage
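A minimal sketch of one node's bucketed adjacency list (class and method names are mine, not the paper's). Backing each bucket with a hash set gives O(1) expected insert/delete and O(bucket size) collect-all, corresponding to the uniform-hashing variant discussed on the implementation slide; the real structure keeps two such lists per vertex.

```python
class BucketList:
    """One node's adjacency list, split into buckets: the i-th bucket
    holds adjacent nodes at order-distance d with i*t <= d < (i+1)*t."""

    def __init__(self, t):
        self.t = t
        self.buckets = {}                  # bucket index -> set of nodes

    def _index(self, d):
        return d // self.t                 # the i with i*t <= d < (i+1)*t

    def insert(self, node, d):
        self.buckets.setdefault(self._index(d), set()).add(node)

    def delete(self, node, d):
        b = self.buckets.get(self._index(d), set())
        if node in b:
            b.remove(node)
            return True
        return False                       # element not found: report 0

    def collect_all(self, i):
        return list(self.buckets.get(i, ()))
```

When two nodes are swapped, distances d(u,v) change and elements migrate between adjacent buckets; bounding those migrations is exactly what Lemma 9 does.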

Correctness
Theorem 1. Algorithm AFM returns a valid topological order after each edge insertion.
Lemma 1. Given a DAG G and a valid topological order, if u ⇝ v and u < v, then all subsequent calls to REORDER will maintain u < v.
Lemma 2. Given a DAG G with v ⇝ y and x ⇝ u, a call of REORDER(u,v) will ensure that x < y.
Theorem 2. The algorithm detects a cycle iff there is a cycle in the given edge sequence.

Correctness
Theorem 1. Algorithm AFM returns a valid topological order after each edge insertion.
Proof: use Lemmas 1 and 2.
– For a graph with no edges, any ordering is a topological ordering
– Need to show that INSERT(u,v) maintains a correct topological order of G′ = G ∪ {(u,v)}
If u < v, this is trivial; otherwise,
show that x < y for all nodes x, y of G′ with x ⇝ y. If there was a path x ⇝ y in G, Lemma 1 gives x < y. Otherwise, x ⇝ y was introduced to G′ by (u,v), and Lemma 2 gives x < y in G′ since there is x ⇝ u → v ⇝ y in G′.

Correctness
Lemma 1. Given a DAG G and a valid topological order, if u ⇝ v and u < v, then all subsequent calls to REORDER will maintain u < v.
Proof: by contradiction
– Consider the first call of REORDER that leads to v < u. Either this led to swapping u and w with w ≥ v, or to swapping w and v with w ≤ u. In the first case:
The call was REORDER(w,u) with A = Ø
However, there exists an x ∈ A with u → x ⇝ v (since v lies between u and w), so A ≠ Ø, leading to a contradiction

Correctness
Lemma 2. Given a DAG G with v ⇝ y and x ⇝ u, a call of REORDER(u,v) will ensure that x < y.
Proof: by induction on the recursion depth of REORDER(u,v)
– For leaf calls, A = B = Ø. If x < y before, Lemma 1 ensures x < y continues to hold; otherwise, x = u and y = v, and swapping gives x < y.
– Assume the lemma holds up to a certain tree level (show this implies higher levels). If A ≠ Ø, there is a v′ such that v → v′ ⇝ y; otherwise v′ = v = y. If B ≠ Ø, there is a u′ such that x ⇝ u′ → u; otherwise u′ = u = x. Hence v′ ⇝ y and x ⇝ u′.
The for loops will call REORDER(u′,v′), which ensures x < y by the inductive hypothesis
Lemma 1 ensures further calls to REORDER maintain x < y

Correctness
Theorem 2. The algorithm detects a cycle iff there is a cycle in the given edge sequence.
Proof (⇒):
– Within a call to INSERT(u,v), there are paths v ⇝ v′ and u′ ⇝ u for each recursive call to REORDER(u′,v′)
Trivial for the first call, and follows by the definition of A and B for subsequent calls
If the algorithm detects a cycle in line 1, then we have v ⇝ v′ = u′ ⇝ u, and adding u → v completes the cycle

Correctness
Theorem 2. The algorithm detects a cycle iff there is a cycle in the given edge sequence.
Proof (⇐): by induction on the number of nodes in the path v ⇝ u
– Consider the edge (u,v) of the cycle v ⇝ u → v inserted last. Since v ⇝ u before inserting this edge, Theorem 1 states that v < u, so REORDER(u,v) will be called.
A call of REORDER(u′,v′) with u′ = v′, or with an edge v′ → u′, clearly reports a cycle
Consider a path v → x ⇝ y → u of length k ≥ 2 and the call to REORDER(u,v). Since v → x ⇝ y → u before the call, x ∈ A and y ∈ B, so REORDER(y,x) will be called. The path x ⇝ y has k – 2 nodes, so the call to REORDER(y,x) will detect the cycle (by the inductive hypothesis).

Algorithm AFM

Running time
Theorem 3. Online topological ordering can be computed using O(n^3.5/t) bucket inserts and deletes, O(n^3/t) bucket collect-all operations collecting O(n^2·t) elements, and O(n^2·t) operations for sorting.
Lemma 4. REORDER is called O(n^2) times.
Lemma 5. The summation of |A| + |B| over all calls of REORDER is O(n^2).
Lemma 6. Calculating the sorted sets A and B over all calls of REORDER can be done by O(n^3/t) bucket collect-all operations touching a total of O(n^2·t) elements, and O(n^2·t) operations for sorting these elements.
Lemma 9. Updating the data structure over all calls of REORDER requires O(n^3.5/t) bucket inserts and deletes.

Running time
Theorem 3. Online topological ordering can be computed using O(n^3.5/t) bucket inserts and deletes, O(n^3/t) bucket collect-all operations collecting O(n^2·t) elements, and O(n^2·t) operations for sorting.
Proof:
– Use Lemmas 4, 6, and 9. Additionally, show that merging the sets A and B (lines 6–7 in the algorithm) takes O(n^2) time
Merging takes O(|A| + |B|), which is O(n^2) over all calls to REORDER by Lemma 5; finding the vertices in B that exceed the chosen v′ takes time proportional to the number of those vertices, which is also the number of recursive calls to REORDER made. Lemma 4 says the latter value is O(n^2).

Running time
Lemma 4. REORDER is called O(n^2) times.
Proof:
– Consider the first time REORDER(u,v) is called. If A = B = Ø, then u and v are swapped. Otherwise, REORDER(u′,v′) is called recursively for all v′ ∈ {v} ∪ A and u′ ∈ B ∪ {u} with v′ < u′.
The order in which the recursive calls are made, and the fact that REORDER is local (it only touches the affected region), ensure that REORDER(u,v) is not called again except as the last recursive call. In this second call to REORDER(u,v), A = B = Ø:
Consider all v′ ∈ A and u′ ∈ B from the first call of REORDER(u,v). REORDER(u,v′) and REORDER(u′,v) must have been called by the for loops before the second call to REORDER(u,v). Therefore, u < v′ and u′ < v for all v′ ∈ A and u′ ∈ B, so u and v are swapped during the second call. REORDER(u,v) will not be called again because u < v.

Running time
Lemma 9. Updating the data structure over all calls of REORDER requires O(n^3.5/t) bucket inserts and deletes.
Proof: uses the LP bound (Lemma 8)
– The data structure requires O(d(u,v)·n/t) bucket inserts and deletes to swap two nodes u and v. We need to update the adjacency lists of u and v and of all w adjacent to u and/or v. If d(u,v) ≥ t, rebuild from scratch in O(n). Otherwise, one can show that at most d(u,v) nodes need to transfer between any pair of consecutive buckets; this yields a bound of O(d(u,v)·n/t).
– Each node pair is swapped at most once (Lemma 7), so summing over all calls of REORDER(u,v) in which u and v are swapped, we need O(Σ d(u,v)·n/t) bucket inserts and deletes. Σ d(u,v) = O(n^2.5) by Lemma 8, so the result follows.

Running time
How do we prove Σ d(u,v) = O(n^2.5)? Use an LP:
– Let T* denote the final topological ordering, and let X(i,j) = d(u,v) if and when REORDER(u,v) leads to a swapping of the nodes at final positions i and j, and X(i,j) = 0 otherwise
– Model some linear constraints on X(i,j):
0 ≤ X(i,j) ≤ n for all i, j ∈ [1..n]
X(i,j) = 0 for all j ≤ i
Σ_{j>i} X(i,j) – Σ_{j<i} X(j,i) ≤ n for all 1 ≤ i ≤ n
– Over the insertion of all edges, a node’s net movement right and left in the topological ordering must be less than n

Running time
This yields an LP (maximize Σ X(i,j) subject to the constraints above) and its dual, together with a feasible solution of the dual. [The LP, its dual, and the feasible solution were shown as images and are not preserved in this transcript; the feasible dual solution has value O(n^2.5), which bounds Σ d(u,v).]
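The primal LP can be reconstructed from the constraints listed above; this is my rendering, not the slide's image, and the dual and its feasible solution are deliberately omitted:

```latex
\begin{aligned}
\text{maximize}\quad & \sum_{1 \le i < j \le n} X(i,j) \\
\text{subject to}\quad & \sum_{j > i} X(i,j) \;-\; \sum_{j < i} X(j,i) \;\le\; n
    \qquad 1 \le i \le n, \\
& 0 \le X(i,j) \le n \qquad\qquad\qquad\;\; 1 \le i < j \le n .
\end{aligned}
```

By weak LP duality, the value of any feasible dual solution upper-bounds the primal optimum, and hence upper-bounds Σ d(u,v).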

Implementation of data structure
Balanced binary search tree gives O(1 + log t) insert and delete and O(1 + output size) collect-all
– Total time is O(n^2·t + n^3.5 log n/t) by Theorem 3. Setting t = n^0.75 (log n)^1/2, we get a total time of O(n^2.75 (log n)^1/2) and O(n^2) space
n-bit array gives O(1) insert and delete, and collect-all costing O(total output size + total # of deletes)
– Total time is O(n^2·t + n^3.5/t). Setting t = n^0.75 gives O(n^2.75) time and O(n^2.25) space for the O(n^2/t) buckets
Uniform hashing is similar to the n-bit array
– O(n^2.75) expected time and O(n^2) space

Empirical comparison
Compared against PK, MNR, and AHRSZ on the following “hard-case” graph: [graph shown as an image; not preserved in this transcript]

Empirical comparison
[Results plot shown as an image; not preserved in this transcript]

Comparison to prior work
No insight provided by Ajwani et al.
Pearce and Kelly compare PK, AHRSZ, and MNR using incremental complexity analysis
– In dynamic problems, typically no fixed input captures the minimal amount of work to be performed
– Use complexity analysis based on input size: measure work in terms of a parameter δ representing the (minimal) change in input and output required
For the DTO problem, the input is the current DAG and topological order; the output after an edge insertion is the updated DAG and (any) valid ordering
– An algorithm is bounded if its time complexity can be expressed only in terms of δ; otherwise, it is unbounded

Comparison to prior work
Runtime comparisons:
– AHRSZ is bounded by ||K_min||, the minimal cover of vertices that are incorrectly ordered after an edge insertion, plus adjacent edges
– PK is bounded by ||δ_uv||, the set of vertices in the affected region which reach u or are reachable from v, plus adjacent edges; PK is worst-case optimal w.r.t. the number of vertices reordered
– MNR takes Θ(||δ_uv^F|| + |AR_uv|) in the incremental complexity model, where AR_uv is the set of vertices in the affected region
– ||K_min|| ≤ ||δ_uv|| ≤ |AR_uv|, so AHRSZ is strictly better than PK, but PK and MNR are more difficult to compare (the former is expected to outperform the latter on sparse graphs)
– KB analyzes a variant of AHRSZ
– AFM appears to improve the bound on the time to insert m edges for AHRSZ

Comparison to prior work
Intuitive comparison
– AHRSZ performs simultaneous forward and backward searches from u and v until the two frontiers meet; nodes with incorrect priorities are placed in a set and corrected using DFSs in this set
– MNR does a similar DFS to discover incorrect priorities, but visits all nodes in the affected region during reassignment
– PK is similar to MNR but reassigns priorities using only positions previously held by members of δ_uv
– KB and AFM appear to be improvements in the runtime analysis of variants of AHRSZ

Comparison to prior work
Practical implications
– PK and MNR use simpler data structures (arrays) than AHRSZ (priority queues and the Dietz and Sleator ordered-list structure)
– PK and MNR use simpler traversal algorithms than AHRSZ
– PK visits fewer nodes during reassignment
Experiments run by Pearce and Kelly
– MNR performs poorly on sparse graphs, but is the most efficient on dense graphs
– PK performs well on very sparse/dense graphs, but not so well in between
– AHRSZ is relatively poor on sparse graphs, but has constant performance otherwise (competitive with the others)

Open problems
The only lower bound for the problem is Ω(n log n) for inserting n – 1 edges, by Ramalingam and Reps; are there better lower bounds?
Reduce the (wide) gap between the best known lower and upper bounds
Answer: does the definition of δ for DTO need to include adjacent edges?
Does the bounded complexity model capture the power of amortization?
Include edge deletions in the analysis of AFM or any of the other algorithms
Perform a theoretical and empirical analysis of a parallel version of AFM or any of the other algorithms

Breaking news
Kavitha and Mathew improve the upper bound to O(min{n^2.5, (m + n log n)·m^0.5})
– There doesn’t appear to be anything wildly unique about their algorithm
– They do a better job of keeping the sizes of the sets δ_uv^F and δ_uv^B close to each other

Thank you