Download presentation
Presentation is loading. Please wait.
Published byMerry Green Modified over 3 years ago
1
Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads Aleksandar Prokopec Martin Odersky 1
2
Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads Aleksandar Prokopec Martin Odersky Irregular Data-Parallel 2
3
Uniform workload (0 until 10000000) reduce (+) 3
4
Uniform workload (0 until 10000000) reduce (+) sum = sum + x 4
5
Uniform workload (0 until 10000000) reduce (+) sum = sum + x … N cycles 5
6
Baseline workload for (0 until 10000000) {} … N cycles 6
7
Irregular workload 7
8
N cycles 8
9
Irregular workload for { x <- 0 until width y <- 0 until height } image(x, y) = compute(x, y) N cycles 9
10
Irregular workload for { x <- 0 until width y <- 0 until height } image(x, y) = compute(x, y) image(x, y) = compute(x, y) N cycles 10
11
Workload function workload(n) – work spent on element n after the data-parallel operation completed 11
12
Workload function Could be… Runtime value dependent for { x <- 0 until width y <- 0 until height } img(x, y) = compute(x, y) workload(n) – work spent on element n after the data-parallel operation completed 12
13
Workload function Could be… Execution-schedule dependent for (n <- nodes) n.neighbours += new Node workload(n) – work spent on element n after the data-parallel operation completed 13
14
Workload function Could be… Totally random for ((x, y) <- img.indices) img(x, y) = sample( x + random(), y + random() ) workload(n) – work spent on element n after the data-parallel operation completed 14
15
Data-parallel scheduler Assign loop elements to workers without knowledge about the workload function. 15
16
Data-parallel scheduler 1. Linear speedup for the baseline workload Assign loop elements to workers without knowledge about the workload function. 16
17
Data-parallel scheduler 1. Linear speedup for the baseline workload 2. Optimal speedup for irregular workloads Assign loop elements to workers without knowledge about the workload function. 17
18
Static batching Decides on the worker-element assignment before the data-parallel operation begins. N cycles 18
19
Static batching Decides on the worker-element assignment before the data-parallel operation begins. No knowledge → divide uniformly. Not optimal for even mildly irregular workloads. N cycles 19
20
Fixed-size batching Workload-driven – decides during execution. N cycles progress 20
21
Fixed-size batching Workload-driven – decides during execution. N cycles 0 21
22
Fixed-size batching Workload-driven – decides during execution. N cycles 2 T0: CAS T0 22
23
Fixed-size batching Workload-driven – decides during execution. N cycles 4 T1: CAS T0T1 23
24
Fixed-size batching Workload-driven – decides during execution. N cycles 6 T0: CAS T0T1 24
25
Fixed-size batching Workload-driven – decides during execution. N cycles 8 T0: CAS T0T1 25
26
Fixed-size batching Workload-driven – decides during execution. N cycles 10 T0: CAS T0T1 26
27
Fixed-size batching Workload-driven – decides during execution. N cycles 12 T0: CAS T0T1 27
28
Fixed-size batching Workload-driven – decides during execution. N cycles progress Pros: lightweight Cons: minimum batch size, contention 28
29
Fixed-size batching - contention 29
30
Factoring, GSS, TS Batch size varies. N cycles progress Pros: lightweight Cons: contention 30
31
Task-based work-stealing N cycles 0..2 2..44..8 8..16 31
32
Task-based work-stealing N cycles 0..2 2..44..8 8..16 2..4 4..8 8..16 T0 T1 0..2 32
33
Task-based work-stealing N cycles 0..2 2..44..8 8..16 2..4 4..8 8..16 T0 T1 0..2 steal – a rare event 33
34
Task-based work-stealing N cycles 0..2 2..44..8 8..16 2..4 4..8 8..16 T0 T1 10..12 12..16 8..100..2 34
35
Task-based work-stealing Pros: can be adaptive - uses stealing information Cons: heavyweight - minimum batch size much larger N cycles 0..2 2..44..8 8..16 2..4 4..8 8..16 T0 T1 10..12 12..16 0..28..10 35
36
Task-based work-stealing N cycles 0..2 2..44..8 8..16 Cannot be stolen after T0 starts processing it 36
37
Work-stealing tree 00T0N owned 37
38
Work-stealing tree 00T0N050T0N owned T0: CAS 38
39
Work-stealing tree 00T0N050T0N0N N … owned completed T0: CAS What about stealing? 39
40
Work-stealing tree 00T0N050T0N0N N … owned completed 0-51T0N T0: CAS T1: CAS stolen T0: CAS 40
41
Work-stealing tree 050T0N0N N … ownedcompleted 0-51T0N T0: CAS stolen T0: CAS 00T0N owned T1: CAS 41
42
Work-stealing tree 050T0N0N N … ownedcompleted 0-51T0N T0: CAS stolen 0-51T0N expanded 50 T0M MMT1N T0: CAS 00T0N owned M = (50 + N) / 2 42
43
Work-stealing tree 050T0N0N N … ownedcompleted 0-51T0N T0: CAS stolen 0-51T0N expanded 50 T0M MMT1N T0: CAS 00T0N owned M = (50 + N) / 2 T0 or T1: CAS 43
44
Work-stealing tree 050T0N0N N … ownedcompleted 0-51T0N T0: CAS stolen 0-51T0N expanded 50 T0M MMT1N T0 or T1: CAS T0: CAS 00T0N owned M = (50 + N) / 2 44
45
Work-stealing tree - contention 45
46
Work-stealing tree scheduling 1)find either a non-expanded, non-completed node 0-51T0100 50 T0100 75 T1100 46
47
Work-stealing tree scheduling 1)find either a non-expanded, non-completed node 2)if not found, terminate 0100T0100 47
48
Work-stealing tree scheduling 1)find either a non-expanded, non-completed node 2)if not found, terminate 3)if not owned, steal and/or expand, and descend 0-51T0N T1: CAS stolen 0-51T0N expanded 50 T0M MMT1N T1: CAS 48
49
Work-stealing tree scheduling 1)find either a non-expanded, non-completed node 2)if not found, terminate 3)if not owned, steal and/or expand, and descend 4)advance until node is completed or stolen 00T0N050T0N0N N … owned completed T0: CAS 49
50
Work-stealing tree scheduling 1)find either a non-expanded, non-completed node 2)if not found, terminate 3)if not owned, steal and/or expand, and descend 4)advance until node is completed or stolen 5)go to 1) 50
51
Work-stealing tree scheduling 1)find either a non-expanded, non-completed node 2)if not found, terminate 3)if not owned, steal and/or expand, and descend 4)advance until node is completed or stolen 5)go to 1) 1)find either a non-expanded, non-completed node 51
52
Choosing the node to steal Find first, in-order traversal 29 5 3 52
53
Choosing the node to steal Find first, in-order traversal 29 5 3 Catastrophic – a lot of stealing, huge trees 53
54
Choosing the node to steal Find first, in-order traversalFind first, random order traversal 29 5 3 29 5 3 Catastrophic – a lot of stealing, huge trees 54
55
Choosing the node to steal Find first, in-order traversalFind first, random order traversal 29 5 3 29 5 3 Catastrophic – a lot of stealing, huge trees Works reasonably well. 55
56
Choosing the node to steal Find first, in-order traversalFind first, random order traversalFind most elements 29 5 3 29 5 3 29 5 3 Catastrophic – a lot of stealing, huge trees Works reasonably well. Generates least nodes. Seems to be best. 56
57
Comparison with fixed-size batching 57
58
Comparison with fixed-size batching 58
59
Comparison with task work-stealing 59
60
Thank you! Questions? 60
61
Finding work 61
62
Other workloads 62
Similar presentations
© 2018 SlidePlayer.com Inc.
All rights reserved.
Ppt on limitation act alberta Ppt on different types of flowers Ppt on non renewable sources of energy Ppt on art and architecture of delhi sultanate Download ppt on national festivals of india Small ppt on save water Ppt on water conservation free download Ppt on aquatic plants and animals Ppt on junk food and its effects Ppt on electric meter testing equipment