Presentation is loading. Please wait.

Presentation is loading. Please wait.

Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads Aleksandar Prokopec Martin Odersky 1.

Similar presentations


Presentation on theme: "Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads Aleksandar Prokopec Martin Odersky 1."— Presentation transcript:

1 Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads Aleksandar Prokopec Martin Odersky 1

2 Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads Aleksandar Prokopec Martin Odersky Irregular Data-Parallel 2

3 Uniform workload (0 until ) reduce (+) 3

4 Uniform workload (0 until ) reduce (+) sum = sum + x 4

5 Uniform workload (0 until ) reduce (+) sum = sum + x … N cycles 5

6 Baseline workload for (0 until ) {} … N cycles 6

7 Irregular workload 7

8 N cycles 8

9 Irregular workload for { x <- 0 until width y <- 0 until height } image(x, y) = compute(x, y) N cycles 9

10 Irregular workload for { x <- 0 until width y <- 0 until height } image(x, y) = compute(x, y) image(x, y) = compute(x, y) N cycles 10

11 Workload function workload(n) – work spent on element n after the data-parallel operation completed 11

12 Workload function Could be… Runtime value dependent for { x <- 0 until width y <- 0 until height } img(x, y) = compute(x, y) workload(n) – work spent on element n after the data-parallel operation completed 12

13 Workload function Could be… Execution-schedule dependent for (n <- nodes) n.neighbours += new Node workload(n) – work spent on element n after the data-parallel operation completed 13

14 Workload function Could be… Totally random for ((x, y) <- img.indices) img(x, y) = sample( x + random(), y + random() ) workload(n) – work spent on element n after the data-parallel operation completed 14

15 Data-parallel scheduler Assign loop elements to workers without knowledge about the workload function. 15

16 Data-parallel scheduler 1. Linear speedup for the baseline workload Assign loop elements to workers without knowledge about the workload function. 16

17 Data-parallel scheduler 1. Linear speedup for the baseline workload 2. Optimal speedup for irregular workloads Assign loop elements to workers without knowledge about the workload function. 17

18 Static batching Decides on the worker-element assignment before the data-parallel operation begins. N cycles 18

19 Static batching Decides on the worker-element assignment before the data-parallel operation begins. No knowledge → divide uniformly. Not optimal for even mildly irregular workloads. N cycles 19

20 Fixed-size batching Workload-driven – decides during execution. N cycles progress 20

21 Fixed-size batching Workload-driven – decides during execution. N cycles 0 21

22 Fixed-size batching Workload-driven – decides during execution. N cycles 2 T0: CAS T0 22

23 Fixed-size batching Workload-driven – decides during execution. N cycles 4 T1: CAS T0T1 23

24 Fixed-size batching Workload-driven – decides during execution. N cycles 6 T0: CAS T0T1 24

25 Fixed-size batching Workload-driven – decides during execution. N cycles 8 T0: CAS T0T1 25

26 Fixed-size batching Workload-driven – decides during execution. N cycles 10 T0: CAS T0T1 26

27 Fixed-size batching Workload-driven – decides during execution. N cycles 12 T0: CAS T0T1 27

28 Fixed-size batching Workload-driven – decides during execution. N cycles progress Pros: lightweight Cons: minimum batch size, contention 28

29 Fixed-size batching - contention 29

30 Factoring, GSS, TS Batch size varies. N cycles progress Pros: lightweight Cons: contention 30

31 Task-based work-stealing N cycles

32 Task-based work-stealing N cycles T0 T

33 Task-based work-stealing N cycles T0 T steal – a rare event 33

34 Task-based work-stealing N cycles T0 T

35 Task-based work-stealing Pros: can be adaptive - uses stealing information Cons: heavyweight - minimum batch size much larger N cycles T0 T

36 Task-based work-stealing N cycles Cannot be stolen after T0 starts processing it 36

37 Work-stealing tree 00T0N owned 37

38 Work-stealing tree 00T0N050T0N owned T0: CAS 38

39 Work-stealing tree 00T0N050T0N0N N … owned completed T0: CAS What about stealing? 39

40 Work-stealing tree 00T0N050T0N0N N … owned completed 0-51T0N T0: CAS T1: CAS stolen T0: CAS 40

41 Work-stealing tree 050T0N0N N … ownedcompleted 0-51T0N T0: CAS stolen T0: CAS 00T0N owned T1: CAS 41

42 Work-stealing tree 050T0N0N N … ownedcompleted 0-51T0N T0: CAS stolen 0-51T0N expanded 50 T0M MMT1N T0: CAS 00T0N owned M = (50 + N) / 2 42

43 Work-stealing tree 050T0N0N N … ownedcompleted 0-51T0N T0: CAS stolen 0-51T0N expanded 50 T0M MMT1N T0: CAS 00T0N owned M = (50 + N) / 2 T0 or T1: CAS 43

44 Work-stealing tree 050T0N0N N … ownedcompleted 0-51T0N T0: CAS stolen 0-51T0N expanded 50 T0M MMT1N T0 or T1: CAS T0: CAS 00T0N owned M = (50 + N) / 2 44

45 Work-stealing tree - contention 45

46 Work-stealing tree scheduling 1)find either a non-expanded, non-completed node 0-51T T T

47 Work-stealing tree scheduling 1)find either a non-expanded, non-completed node 2)if not found, terminate 0100T

48 Work-stealing tree scheduling 1)find either a non-expanded, non-completed node 2)if not found, terminate 3)if not owned, steal and/or expand, and descend 0-51T0N T1: CAS stolen 0-51T0N expanded 50 T0M MMT1N T1: CAS 48

49 Work-stealing tree scheduling 1)find either a non-expanded, non-completed node 2)if not found, terminate 3)if not owned, steal and/or expand, and descend 4)advance until node is completed or stolen 00T0N050T0N0N N … owned completed T0: CAS 49

50 Work-stealing tree scheduling 1)find either a non-expanded, non-completed node 2)if not found, terminate 3)if not owned, steal and/or expand, and descend 4)advance until node is completed or stolen 5)go to 1) 50

51 Work-stealing tree scheduling 1)find either a non-expanded, non-completed node 2)if not found, terminate 3)if not owned, steal and/or expand, and descend 4)advance until node is completed or stolen 5)go to 1) 1)find either a non-expanded, non-completed node 51

52 Choosing the node to steal Find first, in-order traversal

53 Choosing the node to steal Find first, in-order traversal Catastrophic – a lot of stealing, huge trees 53

54 Choosing the node to steal Find first, in-order traversalFind first, random order traversal Catastrophic – a lot of stealing, huge trees 54

55 Choosing the node to steal Find first, in-order traversalFind first, random order traversal Catastrophic – a lot of stealing, huge trees Works reasonably well. 55

56 Choosing the node to steal Find first, in-order traversalFind first, random order traversalFind most elements Catastrophic – a lot of stealing, huge trees Works reasonably well. Generates least nodes. Seems to be best. 56

57 Comparison with fixed-size batching 57

58 Comparison with fixed-size batching 58

59 Comparison with task work-stealing 59

60 Thank you! Questions? 60

61 Finding work 61

62 Other workloads 62


Download ppt "Near Optimal Work-Stealing Tree for Highly Irregular Data-Parallel Workloads Aleksandar Prokopec Martin Odersky 1."

Similar presentations


Ads by Google