Published by Victoria Trible. Modified about 1 year ago.

1
Bo Hong Electrical and Computer Engineering Department Drexel University

2
Find: maximum flow from s to t
Subject to:
○ edge capacity constraints
○ zero net flow for every u ∈ V − {s,t}
[Figure: example flow network with vertices s, a, b, c, d, t]
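The problem statement above can be written out as an optimization problem. The following is a sketch in LaTeX, assuming the usual notation (not spelled out on the slide) of a capacity function c and a flow function f on a set of directed edges E:

```latex
\begin{align*}
\text{maximize}\quad
  & |f| = \sum_{v:(s,v)\in E} f(s,v) - \sum_{v:(v,s)\in E} f(v,s) \\
\text{subject to}\quad
  & 0 \le f(u,v) \le c(u,v)
  && \text{for all } (u,v) \in E \\
  & \sum_{v:(v,u)\in E} f(v,u) - \sum_{v:(u,v)\in E} f(u,v) = 0
  && \text{for all } u \in V - \{s,t\}
\end{align*}
```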

3
Sequential Algorithms
Augmenting Path
○ Ford-Fulkerson, pseudo-polynomial
○ Edmonds and Karp, O(|V|∙|E|²)
○ Dinitz, O(|V|²∙|E|)
Preflow Push
○ Karzanov, O(|V|³)
Push-Relabel
○ Goldberg, O(|V|²∙|E|), with dynamic trees O(|V|∙|E|∙log(|V|²/|E|))
Parallel Algorithms
○ Shiloach et al., O(|V|²∙log|V|) with |V|-processor PRAM
○ Goldberg, O(|V|²∙log|V|) with |V|-processor PRAM
○ Anderson et al., global relabeling
○ Bader et al., gap relabeling

4
Excess flow: the net flow into a vertex, e.g. e(c) = 5
Height: every vertex has an integer-valued height, e.g. h(c) = 2
[Figures: the example network with vertices s, a, b, c, d, t, annotated with excess flow and with heights]
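To make the excess definition concrete, here is a minimal Python sketch. The edge flows are hypothetical values, chosen only so that e(c) comes out as 5 as in the slide's example; they are not taken from the slide's figure.

```python
def excess(flow, vertex):
    """Net flow into `vertex`: total inflow minus total outflow."""
    inflow = sum(f for (u, v), f in flow.items() if v == vertex)
    outflow = sum(f for (u, v), f in flow.items() if u == vertex)
    return inflow - outflow


# Hypothetical flow on each directed edge (u, v).
flow = {
    ("s", "a"): 4,
    ("s", "b"): 6,
    ("a", "c"): 3,
    ("b", "c"): 4,
    ("c", "t"): 2,
}

e_c = excess(flow, "c")  # 3 + 4 flowing in, 2 flowing out
```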

5
Lift: applicable when e(c) > 0 and every c_f(c,x) > 0 implies h(x) ≥ h(c)
Actions:
○ Lock v
○ v = lowest such vertex x
○ h(c) = h(v) + 1
○ Unlock v
Push: applicable when e(a) > 0 and there exists c_f(a,v) > 0 with h(v) = h(a) − 1
Actions:
○ Lock a and v
○ Is a→v still pushable?
○ d = min(e(a), c_f(a,v))
○ e(a) = e(a) − d
○ e(v) = e(v) + d
○ c_f(a,v) = c_f(a,v) − d
○ c_f(v,a) = c_f(v,a) + d
○ Unlock a and v
[Figures: the example network before and after the operation]
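The lock-based push can be sketched in Python with `threading.Lock`. The `Vertex` class, the dictionary `cf` of residual capacities, and the rule of acquiring the two locks in name order (to avoid deadlock) are assumptions made for this illustration; the slide only says "Lock a and v".

```python
import threading


class Vertex:
    """Hypothetical vertex record with its own lock."""

    def __init__(self, name, height=0, excess=0):
        self.name = name
        self.height = height
        self.excess = excess
        self.lock = threading.Lock()


def locked_push(a, v, cf):
    """Push from a to v while holding both locks; return the amount pushed."""
    # Assumption: a fixed acquisition order keeps two threads from deadlocking.
    first, second = sorted((a, v), key=lambda x: x.name)
    with first.lock, second.lock:
        # "a->v still pushable?": the state may have changed before we locked.
        pushable = (
            a.excess > 0
            and cf[(a.name, v.name)] > 0
            and v.height == a.height - 1
        )
        if not pushable:
            return 0
        d = min(a.excess, cf[(a.name, v.name)])
        a.excess -= d
        v.excess += d
        cf[(a.name, v.name)] -= d
        cf[(v.name, a.name)] += d
        return d


a = Vertex("a", height=1, excess=5)
v = Vertex("v", height=0, excess=0)
cf = {("a", "v"): 3, ("v", "a"): 0}
pushed = locked_push(a, v, cf)        # saturates the edge a->v
pushed_again = locked_push(a, v, cf)  # no residual capacity left
```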

6
Locks protect shared accesses
○ P1: Lock, x ← x+1, Unlock
○ P2: Lock, x ← x+1, Unlock
[Figure: timeline of two threads, each performing "read x, increase by 1, update x"]
But locks are expensive
[Figure: lock acquisition time (μs) versus number of processors, ideal and actual curves]
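The two-thread increment can be reproduced with a short Python sketch. The iteration count is an arbitrary choice for the illustration; the lock makes each read-increase-update sequence indivisible, at the cost of one lock acquisition per increment.

```python
import threading

x = 0
x_lock = threading.Lock()


def worker(iterations):
    """Increment the shared counter, holding the lock for each update."""
    global x
    for _ in range(iterations):
        with x_lock:   # Lock (released automatically when the block exits)
            x = x + 1  # read x, increase by 1, update x


# P1 and P2 from the slide.
threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```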

7
SMP computer with multiple processors sharing the memory
○ Multi-processor systems
○ Multi-core systems
Supports atomic 'fetch-and-add' instruction
Supports sequential consistency
Example:
○ P1 executes x ← x+c_1 … x ← x+c_2
○ P2 executes x ← x+c_3 … x ← x+c_4
○ Eventual result: x ← x+c_1+c_2+c_3+c_4, no matter how exactly the instructions were interleaved.
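The order-independence claim can be checked with a small Python sketch. Python has no fetch-and-add instruction, so each `+=` below merely stands in for one atomic fetch-and-add; the starting value and the increments are arbitrary. Trying every ordering of the four additions covers all interleavings of P1 and P2 (and more).

```python
from itertools import permutations


def apply_in_order(x0, increments):
    """Apply the increments one at a time, as indivisible additions."""
    x = x0
    for c in increments:
        x += c  # stands in for one atomic fetch-and-add
    return x


c1, c2, c3, c4 = 3, -1, 7, 2
# Collect the final value of x for every possible ordering.
results = {apply_in_order(10, order) for order in permutations((c1, c2, c3, c4))}
```

Every ordering yields the same final value, which is why atomic additions need no lock to produce a consistent sum.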

8
Lift: applicable when e(c) > 0 and every c_f(c,x) > 0 implies h(x) ≥ h(c)
Actions:
○ v = lowest such vertex x
○ h(c) = h(v) + 1
Push: applicable when e(a) > 0 and there exists c_f(a,x) > 0 and h(x) [remainder of the slide truncated in the source]
[Figures: the example network before and after the operation]

9
Initialize h(u), e(u), and f(u,v)
○ h(s) = |V|
○ h(u) = 0 for u ∈ V − {s}
○ f(s,u) = c(s,u)
○ e(u) = c(s,u)
○ f(u,v) = 0, otherwise
While there exist applicable push or lift operations
○ execute the push or lift operations asynchronously
[Figure: the example network after initialization]
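A minimal Python sketch of the initialization step follows. It is an illustration under assumptions: the function name `initialize` and the example graph are hypothetical, and the state is stored as residual capacities c_f(u,v) = c(u,v) − f(u,v) rather than as the flow f itself.

```python
def initialize(vertices, capacity, s):
    """Return (h, e, cf) after saturating every edge that leaves s."""
    h = {u: 0 for u in vertices}
    h[s] = len(vertices)              # h(s) = |V|
    e = {u: 0 for u in vertices}
    cf = {}
    for (u, v), c in capacity.items():
        cf[(u, v)] = cf.get((u, v), 0) + c
        cf.setdefault((v, u), 0)      # reverse residual edge starts empty
    for (u, v), c in capacity.items():
        if u == s:                    # f(s,u) = c(s,u) and e(u) = c(s,u)
            e[v] += c
            cf[(u, v)] -= c
            cf[(v, u)] += c
    return h, e, cf


# Hypothetical example network.
vertices = ["s", "a", "b", "t"]
capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
            ("a", "t"): 2, ("b", "t"): 3}
h, e, cf = initialize(vertices, capacity, "s")
```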

10
Each thread (P1, P2) runs the same loop for its vertex u:
while e(u) > 0
    e' = e(u)
    h' = ∞
    for each (u,v) s.t. c_f(u,v) > 0
        if h(v) < h'
            h' = h(v)
            v' = v
    if h(u) > h'
        d = min(e', c_f(u,v'))
        c_f(u,v') = c_f(u,v') − d
        c_f(v',u) = c_f(v',u) + d
        e(u) = e(u) − d
        e(v') = e(v') + d
    else
        h(u) = h' + 1
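The per-vertex loop can be exercised with a single-threaded Python simulation. This is a sketch under stated assumptions, not the author's implementation: a round-robin sweep over the vertices stands in for thread scheduling, plain `+=` and `-=` stand in for atomic fetch-and-add, and the names `lock_free_step` and `max_flow` are hypothetical. A push of d reduces the residual capacity of (u, v') by d and raises that of (v', u) by d, matching the push rule on the lock-based slide.

```python
import math


def lock_free_step(u, h, e, cf, adj):
    """One iteration of the loop body for vertex u (requires e[u] > 0)."""
    e_local = e[u]
    h_low, v_low = math.inf, None
    for v in adj[u]:                       # find the lowest residual neighbor
        if cf[(u, v)] > 0 and h[v] < h_low:
            h_low, v_low = h[v], v
    if h[u] > h_low:                       # push toward the lowest neighbor
        d = min(e_local, cf[(u, v_low)])
        cf[(u, v_low)] -= d                # would be atomic fetch-and-add(-d)
        cf[(v_low, u)] += d                # would be atomic fetch-and-add(+d)
        e[u] -= d
        e[v_low] += d
    else:                                  # lift
        h[u] = h_low + 1


def max_flow(vertices, capacity, s, t):
    h = {u: 0 for u in vertices}
    h[s] = len(vertices)
    e = {u: 0 for u in vertices}
    cf = {}
    adj = {u: [] for u in vertices}
    for (u, v), c in capacity.items():
        cf[(u, v)] = cf.get((u, v), 0) + c
        cf.setdefault((v, u), 0)
        if v not in adj[u]:
            adj[u].append(v)
        if u not in adj[v]:
            adj[v].append(u)
    for v in adj[s]:                       # saturate the edges leaving s
        d = cf[(s, v)]
        cf[(s, v)] -= d
        cf[(v, s)] += d
        e[v] += d
    active = [u for u in vertices if u not in (s, t)]
    while any(e[u] > 0 for u in active):
        for u in active:                   # round-robin replaces real threads
            if e[u] > 0:
                lock_free_step(u, h, e, cf, adj)
    return e[t]                            # net flow into the sink


capacity = {("s", "a"): 3, ("s", "b"): 2, ("a", "b"): 1,
            ("a", "t"): 2, ("b", "t"): 3}
value = max_flow(["s", "a", "b", "t"], capacity, "s", "t")
```

In a real multi-threaded version the updates to c_f and e would have to be atomic; the simulation only illustrates the control flow.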

11
[Figure: threads P1 and P2 each executing the loop from the previous slide, shown along a time axis with two alternative interleavings]

12
As long as c_f(u,v) and e(u) are updated atomically, we always have h(u) ≤ h(v) + 1 for any c_f(u,v) > 0, no matter how the threads are interleaved.
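The invariant is easy to state as a predicate over the shared state. The following Python sketch assumes heights in a dictionary `h` and residual capacities in a dictionary `cf` keyed by directed edge; both names and the sample values are hypothetical.

```python
def height_invariant_holds(h, cf):
    """True if h(u) <= h(v) + 1 for every residual edge (u, v) with cf > 0."""
    return all(h[u] <= h[v] + 1 for (u, v), c in cf.items() if c > 0)


cf = {("s", "a"): 0, ("a", "s"): 5, ("a", "t"): 2, ("t", "a"): 0}
ok = height_invariant_holds({"s": 3, "a": 1, "t": 0}, cf)
# Raising a to height 2 while a->t still has residual capacity breaks it.
broken = height_invariant_holds({"s": 3, "a": 2, "t": 0}, cf)
```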

13
If any e(u) > 0, then the algorithm will not terminate
○ Property of the push and lift operations
If the algorithm terminates, then there is no path from s to t in the residual graph
○ Proof by contradiction: if such a path existed, the invariant property of the height function h would be broken
If the algorithm terminates, it finds a maximum flow
○ Termination implies all e(u) = 0, meaning this is a feasible flow
○ With no path from s to t, the max-flow min-cut theorem implies it is a maximum flow

14
For any u s.t. e(u) > 0, there exists a path from u to s in the residual graph
○ Property of network flow
The height of any vertex is less than 2|V| − 1
○ The longest path can have at most |V| vertices
The total number of lift operations is bounded by 2|V|² − |V|
○ Bounded by the height of vertices
The total number of saturated pushes is bounded by (2|V| − 1)∙|E|
○ Bounded by the total number of lift operations
The total number of unsaturated pushes is bounded by 4|V|²∙|E|
○ Bounded by the number of lifts and saturated pushes
Therefore the algorithm terminates with O(|V|²∙|E|) operations
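The final bound follows by adding the three counts above. A short worked derivation in LaTeX, assuming |V| ≥ 1 and |E| ≥ 1 so that each of the first two terms is at most 2|V|²|E|:

```latex
\begin{align*}
\underbrace{2|V|^2 - |V|}_{\text{lifts}}
+ \underbrace{(2|V| - 1)\,|E|}_{\text{saturated pushes}}
+ \underbrace{4|V|^2\,|E|}_{\text{unsaturated pushes}}
&\le 2|V|^2|E| + 2|V|^2|E| + 4|V|^2|E| \\
&= 8\,|V|^2|E| \\
&= O\!\left(|V|^2 |E|\right)
\end{align*}
```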

15
The algorithm terminates when e(u) = 0 for all u ∈ V − {s,t}
○ e(u) = 0 at a single thread is insufficient to terminate the thread
An elegant solution:
○ The net flow out of source s decreases monotonically
○ The net flow into sink t increases monotonically
○ When the two values become equal, we must have e(u) = 0 for all u ∈ V − {s,t}, a necessary and sufficient termination condition.
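One way to express this test is sketched below in Python. It rests on an assumption the slide does not state: the source's excess e(s) is tracked as the negative of the net flow out of s, and e(t) as the net flow into t. The excess values in the example states are hypothetical.

```python
def is_finished(e, s, t):
    """Termination test: net flow out of s equals net flow into t."""
    net_out_of_s = -e[s]  # assumption: e[s] holds minus the flow sent out
    net_into_t = e[t]
    return net_out_of_s == net_into_t


# 5 units have left s, 4 have reached t, 1 is still held at vertex a.
running = is_finished({"s": -5, "a": 1, "b": 0, "t": 4}, "s", "t")
# All 5 units have arrived at t, so no intermediate vertex holds excess.
done = is_finished({"s": -5, "a": 0, "b": 0, "t": 5}, "s", "t")
```

Because the excesses sum to zero and intermediate vertices never hold negative excess, the two values can only match once every intermediate excess is zero.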

16
Execution results on 2-way SMP with 3.2GHz Intel Xeon Processors
○ 4-thread results obtained when hyper-threading was enabled
[Figure: Comparison Against Classical Lock-Based Algorithm]
[Figure: Scalability of the Lock-Free Algorithm]

17
Developed a lock-free multi-threaded algorithm for the max-flow problem
○ Has the same complexity bound as existing parallel algorithms
○ Eliminates lock usage, thereby improving thread-level parallelism
○ 20% improvement over existing lock-based parallel algorithms
Results indicate the effectiveness of the algorithmic method in reducing synchronization overheads
Future work
○ Load balancing across the threads: vertex-to-thread assignment, static or dynamic or hybrid?
○ Optimize cache usage
○ Reduce the number of operations via global and gap relabeling
○ What if edge capacities are floating-point?
