Bo Hong Electrical and Computer Engineering Department Drexel University


1 Bo Hong Electrical and Computer Engineering Department Drexel University bohong@coe.drexel.edu http://www.ece.drexel.edu/faculty/bohong

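2 [Figure: example flow network with source s, sink t, and intermediate vertices a, b, c, d; edges are labeled with integer capacities]
Find: maximum flow from s to t
Subject to:
• edge capacity constraints
• zero net flow for every u ∈ V − {s,t}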

3 • Sequential Algorithms
    Augmenting Path
        ○ Ford-Fulkerson, pseudo-polynomial
        ○ Edmonds and Karp, O(|V|∙|E|²)
        ○ Dinitz, O(|V|²∙|E|)
    Preflow Push
        ○ Karzanov, O(|V|³)
    Push-Relabel
        ○ Goldberg, O(|V|²∙|E|); with dynamic trees, O(|V|∙|E|∙log(|V|²/|E|))
• Parallel Algorithms
    Shiloach et al., O(|V|²∙log|V|) with a |V|-processor PRAM
    Goldberg, O(|V|²∙log|V|) with a |V|-processor PRAM
    Anderson et al., global relabeling
    Bader et al., gap relabeling

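4 [Figure: example network annotated with the excess at each vertex]
Excess flow: the net flow into a vertex, e.g. e(c) = 5
[Figure: example network annotated with the height of each vertex]
Every vertex has an integer-valued height, e.g. h(c) = 2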

5 [Figure: two copies of the example network illustrating the lift and push operations]
Lift: applicable when e(c) > 0 and, for every x with c_f(c,x) > 0, h(x) ≥ h(c)
Actions:
    Lock v
    v = lowest such vertex x
    h(c) = h(v) + 1
    Unlock v
Push: applicable when e(a) > 0 and there exists v with c_f(a,v) > 0 and h(v) = h(a) − 1
Actions:
    Lock a and v
    Check that a → v is still pushable
    d = min( e(a), c_f(a,v) )
    e(a) = e(a) − d
    e(v) = e(v) + d
    c_f(a,v) = c_f(a,v) − d
    c_f(v,a) = c_f(v,a) + d
    Unlock a and v
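The lock-based push above can be sketched in Python as follows. This is a minimal illustration, not the presenter's implementation: the dictionary layout (`cf`, `e`, `h`), the per-vertex `locks` table, and the sorted lock order used to avoid deadlock are all assumptions introduced here.

```python
import threading

def locked_push(a, v, cf, e, h, locks):
    """Push from a to v under per-vertex locks; return the amount pushed."""
    # Assumption: locks are always taken in sorted vertex order so that two
    # threads pushing in opposite directions cannot deadlock.
    first, second = sorted((a, v))
    with locks[first], locks[second]:
        # Re-check applicability after acquiring the locks
        # ("a -> v still pushable?" on the slide).
        if not (e[a] > 0 and cf[a][v] > 0 and h[v] == h[a] - 1):
            return 0
        d = min(e[a], cf[a][v])
        e[a] -= d
        e[v] += d
        cf[a][v] -= d
        cf[v][a] += d
        return d

# Hypothetical two-vertex state for illustration.
locks = {"a": threading.Lock(), "v": threading.Lock()}
e = {"a": 3, "v": 0}
h = {"a": 1, "v": 0}
cf = {"a": {"v": 2}, "v": {"a": 0}}
pushed = locked_push("a", "v", cf, e, h, locks)
```

Every push pays for two lock acquisitions, which is the cost the later slides set out to remove.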

6 Locks protect shared accesses
    P1: Lock; x ← x+1; Unlock
    P2: Lock; x ← x+1; Unlock
[Timeline: P1 and P2 each read x, increase it by 1, and update x]
But locks are expensive
[Figure: lock acquisition time (µs) versus number of processors, ideal and actual curves]
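A small sketch of the lock-protected increment, assuming Python's standard `threading` module; the iteration count and the names `counter` and `worker` are illustrative only.

```python
import threading

counter = 0
lock = threading.Lock()

def worker(iterations):
    """Increment the shared counter, holding the lock for each read-modify-write."""
    global counter
    for _ in range(iterations):
        with lock:          # Lock
            counter += 1    # x <- x + 1
                            # Unlock happens when the with-block exits

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(2)]
for th in threads:
    th.start()
for th in threads:
    th.join()
print(counter)
```

With the lock held around each increment, no update is lost, at the price of one lock acquisition per increment.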

7 • SMP computer with multiple processors sharing the memory
    ○ Multi-processor systems
    ○ Multi-core systems
• Supports an atomic 'fetch-and-add' instruction
• Supports sequential consistency
    P1: x ← x+c_1 … x ← x+c_2
    P2: x ← x+c_3 … x ← x+c_4
    Eventual result: x ← x+c_1+c_2+c_3+c_4, no matter how exactly the instructions were interleaved.
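The order-independence claim can be checked with a short sketch. It assumes each x ← x + c executes as one indivisible fetch-and-add, so an interleaving is just an ordering of the four additions; the increment values and starting value are hypothetical. Enumerating every permutation covers a superset of the orderings that respect each processor's program order.

```python
from itertools import permutations

def apply_in_order(x0, increments):
    """Apply each increment as a single indivisible x <- x + c step."""
    x = x0
    for c in increments:
        x += c
    return x

# Hypothetical values standing in for c_1..c_4.
increments = (3, -1, 4, 2)
results = {apply_in_order(10, order) for order in permutations(increments)}
print(results)  # a single value: every ordering ends at x0 + c_1 + c_2 + c_3 + c_4
```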

8 [Figure: two copies of the example network illustrating the lift and push operations]
Lift: applicable when e(c) > 0 and, for every x with c_f(c,x) > 0, h(x) ≥ h(c)
Actions:
    v = lowest such vertex x
    h(c) = h(v) + 1
Push: applicable when e(a) > 0 and there exists x with c_f(a,x) > 0 and h(x) < h(a)
Actions:
    v = lowest such vertex x
    d = min( e(a), c_f(a,v) )
    e(a) = e(a) − d
    e(v) = e(v) + d
    c_f(a,v) = c_f(a,v) − d
    c_f(v,a) = c_f(v,a) + d
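The two operations above can be sketched as one Python function that applies whichever is applicable at a vertex. This is a single-threaded illustration: the plain `-=` and `+=` updates stand in for the atomic fetch-and-add instructions a multi-threaded version would need, and the dictionary layout and function names are assumptions.

```python
def lowest_residual_neighbor(u, cf, h):
    """Return the neighbor of u with positive residual capacity and smallest height."""
    candidates = [v for v, cap in cf[u].items() if cap > 0]
    return min(candidates, key=lambda v: h[v]) if candidates else None

def push_or_lift(u, cf, e, h):
    """Apply one push or lift at u and return which operation ran."""
    if e[u] <= 0:
        return "none"
    v = lowest_residual_neighbor(u, cf, h)
    if v is None:
        return "none"
    if h[v] < h[u]:
        # Push: move as much excess as the residual edge allows.
        d = min(e[u], cf[u][v])
        e[u] -= d
        e[v] += d
        cf[u][v] -= d
        cf[v][u] += d
        return "push"
    # Lift: every residual neighbor is at least as high as u.
    h[u] = h[v] + 1
    return "lift"

# Hypothetical three-vertex state for illustration.
cf = {"a": {"b": 4, "c": 2}, "b": {"a": 0}, "c": {"a": 0}}
e = {"a": 3, "b": 0, "c": 0}
h = {"a": 0, "b": 0, "c": 1}
first = push_or_lift("a", cf, e, h)   # lowest neighbor b is not below a, so a is lifted
second = push_or_lift("a", cf, e, h)  # a is now above b, so the excess is pushed
```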

9 Initialize h(u), e(u), and f(u,v):
    h(s) = |V|
    h(u) = 0 for u ∈ V − {s}
    f(s,u) = c(s,u)
    e(u) = c(s,u)
    f(u,v) = 0, otherwise
While there exist applicable push or lift operations:
    execute the push or lift operations asynchronously
[Figure: example network]
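A possible Python rendering of the initialization step, expressed in terms of residual capacities rather than flows. The nested-dictionary graph format is an assumption, and so is the bookkeeping choice of storing the negated outflow in `e[s]`, which the slide does not specify.

```python
def initialize(capacity, s):
    """Build residual capacities, excesses, and heights for source s."""
    vertices = set(capacity) | {v for nbrs in capacity.values() for v in nbrs}
    cf = {u: {} for u in vertices}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            cf[u][v] = cf[u].get(v, 0) + c   # forward residual capacity
            cf[v].setdefault(u, 0)           # reverse edge starts empty
    h = {u: 0 for u in vertices}
    e = {u: 0 for u in vertices}
    h[s] = len(vertices)                     # h(s) = |V|
    for v in cf[s]:
        c = cf[s][v]
        cf[s][v] = 0                         # f(s,v) = c(s,v): edge saturated
        cf[v][s] += c
        e[v] += c                            # e(v) = c(s,v)
        e[s] -= c                            # assumption: e[s] records -(flow out of s)
    return cf, e, h

# Hypothetical four-vertex network.
cf, e, h = initialize({"s": {"a": 3, "b": 2}, "a": {"t": 2}, "b": {"t": 3}}, "s")
```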

10 [Figure: threads P1 and P2 each execute the following loop]
while e(u) > 0
    e’ = e(u)
    h’ = ∞
    for each (u,v) s.t. c_f(u,v) > 0
        if h(v) < h’
            h’ = h(v)
            v’ = v
    if h(u) > h’
        d = min( e’, c_f(u,v’) )
        c_f(u,v’) = c_f(u,v’) − d
        c_f(v’,u) = c_f(v’,u) + d
        e(u) = e(u) − d
        e(v’) = e(v’) + d
    else
        h(u) = h’ + 1
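The per-vertex loop can be exercised end to end with a single-threaded Python sketch. It runs the loop on each intermediate vertex in turn until no excess remains, so it shows the logic but not the concurrency; a threaded version would need atomic updates of c_f and e. The graph format, function names, and sweep order are assumptions.

```python
from math import inf

def max_flow(capacity, s, t):
    """Single-threaded simulation of the per-vertex push/lift loop."""
    vertices = set(capacity) | {v for nbrs in capacity.values() for v in nbrs}
    cf = {u: {} for u in vertices}
    for u, nbrs in capacity.items():
        for v, c in nbrs.items():
            cf[u][v] = cf[u].get(v, 0) + c
            cf[v].setdefault(u, 0)
    h = {u: 0 for u in vertices}
    e = {u: 0 for u in vertices}
    h[s] = len(vertices)
    for v in cf[s]:                      # saturate every edge leaving s
        c = cf[s][v]
        cf[s][v] = 0
        cf[v][s] += c
        e[v] += c

    def run_vertex(u):
        while e[u] > 0:
            e_prime = e[u]
            h_prime, v_prime = inf, None
            for v, cap in cf[u].items():  # find the lowest residual neighbor
                if cap > 0 and h[v] < h_prime:
                    h_prime, v_prime = h[v], v
            if h[u] > h_prime:            # push
                d = min(e_prime, cf[u][v_prime])
                cf[u][v_prime] -= d
                cf[v_prime][u] += d
                e[u] -= d
                e[v_prime] += d
            else:                         # lift
                h[u] = h_prime + 1

    inner = sorted(u for u in vertices if u not in (s, t))
    while any(e[u] > 0 for u in inner):
        for u in inner:
            run_vertex(u)
    return e[t]

# Hypothetical example network.
flow = max_flow({"s": {"a": 3, "b": 2}, "a": {"b": 1, "t": 2}, "b": {"t": 3}}, "s", "t")
```

In the second test case below, excess that cannot reach t is pushed back to s once the vertex has been lifted above the source.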

11 [Figure: P1 and P2 executing the same loop side by side along a time axis, illustrating alternative interleavings of their operations]

12 As long as c_f(u,v) and e(u) are updated atomically, we always have h(u) ≤ h(v) + 1 for any c_f(u,v) > 0, no matter how the threads are interleaved.
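The invariant can be written as a predicate over a snapshot of the residual capacities and heights. This is a checking aid under an assumed dictionary layout, not part of the presented algorithm.

```python
def height_invariant_holds(cf, h):
    """Check h(u) <= h(v) + 1 for every edge with positive residual capacity."""
    return all(
        h[u] <= h[v] + 1
        for u, nbrs in cf.items()
        for v, cap in nbrs.items()
        if cap > 0
    )

# Hypothetical snapshots: s sits at height |V| = 3.
h = {"s": 3, "a": 0, "t": 0}
saturated = {"s": {"a": 0}, "a": {"s": 5, "t": 2}, "t": {"a": 0}}
unsaturated = {"s": {"a": 5}, "a": {"s": 0, "t": 2}, "t": {"a": 0}}
ok_after_init = height_invariant_holds(saturated, h)
ok_before_init = height_invariant_holds(unsaturated, h)
```

The second snapshot fails because the residual edge from s to a drops from height 3 to height 0, which is why the initialization saturates every edge leaving s.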

13 • If any e(u) > 0, then the algorithm will not terminate
    ○ Property of the push and lift operations
• If the algorithm terminates, then there is no path from s to t in the residual graph
    ○ Proof by contradiction: if such a path existed, the invariant h(u) ≤ h(v) + 1 would have to be broken, since h(s) = |V|, h(t) = 0, and a simple path has at most |V| − 1 edges
• If the algorithm terminates, it finds a maximum flow
    ○ Termination implies all e(u) = 0, so the flow is feasible. With no path from s to t in the residual graph, the max-flow min-cut theorem implies that it is a maximum flow

14 • For any u s.t. e(u) > 0, there exists a path from u to s in the residual graph
    ○ Property of network flow
• The height of any vertex is less than 2|V| − 1
    ○ The longest path can have at most |V| vertices
• The total number of lift operations is bounded by 2|V|² − |V|
    ○ Bounded by the height of vertices
• The total number of saturated pushes is bounded by (2|V| − 1)∙|E|
    ○ Bounded by the total number of lift operations
• The total number of unsaturated pushes is bounded by 4|V|²∙|E|
    ○ Bounded by the number of lift operations and saturated pushes
• Therefore the algorithm terminates within O(|V|²∙|E|) operations
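To get a feel for the size of these bounds, they can be evaluated for a concrete graph. The function below only restates the three formulas from the slide; its name and the example sizes are illustrative.

```python
def operation_bounds(num_vertices, num_edges):
    """Evaluate the slide's worst-case operation counts for a given graph size."""
    V, E = num_vertices, num_edges
    lifts = 2 * V * V - V          # |V| vertices times a height below 2|V| - 1
    saturated = (2 * V - 1) * E    # saturated pushes
    unsaturated = 4 * V * V * E    # unsaturated pushes, the dominant term
    return {
        "lifts": lifts,
        "saturated_pushes": saturated,
        "unsaturated_pushes": unsaturated,
        "total": lifts + saturated + unsaturated,
    }

# Hypothetical graph with 6 vertices and 9 edges.
bounds = operation_bounds(6, 9)
```

The unsaturated pushes dominate the total, which is where the O(|V|²∙|E|) bound comes from.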

15 • The algorithm terminates when e(u) = 0 for all u ∈ V − {s,t}
    ○ Observing e(u) = 0 at a single thread is insufficient to terminate that thread
• An elegant solution:
    ○ The net flow out of source s decreases monotonically
    ○ The net flow into sink t increases monotonically
    ○ When the two values become equal, we must have e(u) = 0 for all u ∈ V − {s,t}, which gives a necessary and sufficient termination condition
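The termination test can be sketched as a comparison of two numbers. The sketch assumes a bookkeeping convention in which `e[s]` stores the negated net flow out of s, so that all excesses sum to zero; since every intermediate excess is non-negative, the two values match exactly when every intermediate excess is zero. The convention and function names are assumptions.

```python
def flow_out_of_source(e, s):
    """Net flow leaving s, under the assumed convention e[s] = -(outflow)."""
    return -e[s]

def flow_into_sink(e, t):
    """Net flow that has reached t so far."""
    return e[t]

def terminated(e, s, t):
    """True once everything that left s has arrived at t."""
    return flow_out_of_source(e, s) == flow_into_sink(e, t)

# Hypothetical snapshots of one run on a three-vertex network.
after_init = {"s": -5, "a": 5, "t": 0}   # 5 units left s, none reached t
mid_run = {"s": -5, "a": 3, "t": 2}      # 2 units reached t, 3 still at a
finished = {"s": -2, "a": 0, "t": 2}     # 3 units returned to s
states = [terminated(x, "s", "t") for x in (after_init, mid_run, finished)]
```

A thread only needs to read these two values to decide whether to stop, rather than scanning every vertex.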

16 • Execution results on a 2-way SMP with 3.2 GHz Intel Xeon processors
• 4-thread results were obtained with hyper-threading enabled
[Figure: Comparison Against Classical Lock-Based Algorithm]
[Figure: Scalability of the Lock-Free Algorithm]

17 • Developed a lock-free multi-threaded algorithm for the max-flow problem
    ○ Has the same complexity bound as existing parallel algorithms
    ○ Eliminates lock usage, thereby improving thread-level parallelism
    ○ 20% improvement over existing lock-based parallel algorithms
• Results indicate the effectiveness of the algorithmic method in reducing synchronization overheads
• Future work
    ○ Load balancing across the threads: vertex-to-thread assignment, static, dynamic, or hybrid?
    ○ Optimize cache usage
    ○ Reduce the number of operations via global and gap relabeling
    ○ What if edge capacities are floating-point?

