A Dynamic Elimination-Combining Stack Algorithm. Gal Bar-Nissan, Danny Hendler and Adi Suissa, Department of Computer Science, BGU, January 2011.


1 A Dynamic Elimination-Combining Stack Algorithm
Gal Bar-Nissan, Danny Hendler and Adi Suissa
Department of Computer Science, BGU, January 2011
Presented by: Ilya Mirsky, 28.03.2011

2 Outline
• Concurrent programming terms
• Motivation
• Introduction
• DECS: The Algorithm
• DECS Performance evaluation
• NB-DECS
• Summary

3 Concurrent programming terms
• Locks (coarse and fine grained)
• Non-blocking algorithms
• Wait-freedom
• Lock-freedom
• Obstruction-freedom
• Linearizability
• Memory contention
• Latency

5 Motivation
• Concurrent stacks are widely used in parallel applications and operating systems.
• A simple implementation using a coarse-grained locking mechanism causes a “hot spot” at the central stack object and creates a sequential bottleneck.
• There is a need for a scalable concurrent stack that performs well under low, medium and high workloads, regardless of the ratio of operation types (push/pop).
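To make the bottleneck concrete, here is a minimal sketch of the "simple implementation" the slide refers to, assuming a single `std::mutex` around a `std::vector`. The class name `LockedStack` is hypothetical and is not taken from the presentation; it only illustrates why every operation is serialized.

```cpp
#include <mutex>
#include <optional>
#include <utility>
#include <vector>

// Coarse-grained baseline (illustrative only): one mutex guards the whole
// stack, so every push and pop from every thread goes through the same lock.
template <typename T>
class LockedStack {
public:
    void push(T value) {
        std::lock_guard<std::mutex> guard(mutex_);
        items_.push_back(std::move(value));
    }

    // Returns std::nullopt when the stack is empty.
    std::optional<T> pop() {
        std::lock_guard<std::mutex> guard(mutex_);
        if (items_.empty()) return std::nullopt;
        T top = std::move(items_.back());
        items_.pop_back();
        return top;
    }

private:
    std::mutex mutex_;      // the single "hot spot"
    std::vector<T> items_;  // top of the stack is the back of the vector
};
```

Because all threads contend on `mutex_`, adding threads adds waiting rather than throughput, which is the sequential bottleneck described above.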

7 Introduction
• Two key synchronization paradigms for the construction of scalable concurrent data structures are software combining and elimination.
• The most highly scalable concurrent stack algorithm previously known is the lock-free elimination-backoff stack (Hendler, Shavit, Yerushalmi).
• The HSY stack is highly efficient under low contention, as well as under high contention when the workload is symmetric.
• Unfortunately, when workloads are asymmetric, the performance of HSY deteriorates to that of a sequential stack.
• Flat-combining (by Hendler et al.) significantly outperforms HSY at low and medium contention levels, but it does not scale and even deteriorates at high contention levels.

8 Introduction - DECS
• DECS employs both combining and elimination mechanisms.
• Scales well for all workload types, and outperforms other stack implementations.
• Maintains the simplicity and low overhead of the HSY stack.
• Uses a contention-reduction layer, the elimination-combining layer, as a backoff scheme for a central stack.
• A non-blocking implementation, NB-DECS, is also presented: a lock-free variant of DECS in which threads that have waited too long may cancel their “combining contract” and retry their operation on the central stack.

9-16 Introduction - DECS
[Animated figure, not reproduced. Labels: Central Stack, Elimination-combining layer. Captions: "zzz…", "Wake up!"]

18 DECS - The Algorithm
• The data structures
[Figure, not reproduced. Labels: Collision Array, Locations Array, CentralStack, Elimination-combining layer]
MultiOp: int id; int op; int length; int cStatus; Cell cell; MultiOp next; MultiOp last;
Cell: Data data; Cell next;

19 DECS- The Algorithm 19 Central Stack push(data1) push(data2) pop() I wish there was someone in similar situation…

20 DECS- The Algorithm 20 multiOp tInfo = initMultiOp(); multiOp tInfo = initMultiOp(data);
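The `MultiOp` and `Cell` records from the data-structures slide could be written in C++ roughly as follows. This is a sketch under several assumptions: `Data` is taken to be `int`, only the `POP`/`PUSH` and `INIT`/`FINISHED` values that appear on the slides are declared (the real algorithm may use more), the `id` parameter is added for illustration, and `last` is assumed to start out pointing at the record itself.

```cpp
#include <memory>

using Data = int;  // assumption: the slides do not say what Data is

enum Op { POP, PUSH };           // operation types shown on the slides
enum Status { INIT, FINISHED };  // only the status values visible on the slides

struct Cell {
    Data data{};
    Cell* next = nullptr;
};

struct MultiOp {
    int id = 0;         // id of the owning thread
    int op = POP;       // POP or PUSH
    int length = 1;     // number of operations in this record's list
    int cStatus = INIT;
    Cell cell;          // holds the value for a push
    MultiOp* next = nullptr;
    MultiOp* last = nullptr;
};

// Pop variant: no data (the id argument is an assumption of this sketch).
std::unique_ptr<MultiOp> initMultiOp(int id) {
    auto m = std::make_unique<MultiOp>();
    m->id = id;
    m->op = POP;
    m->last = m.get();  // assumption: a fresh list of length 1 ends at itself
    return m;
}

// Push variant: carries the value to be pushed.
std::unique_ptr<MultiOp> initMultiOp(int id, Data data) {
    auto m = initMultiOp(id);
    m->op = PUSH;
    m->cell.data = data;
    return m;
}
```

With these definitions, `initMultiOp(6, value)` yields a record matching the initial state the slides show for thread 6: `op = PUSH`, `length = 1`, `cStatus = INIT`, `next = NULL`.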

21 DECS - The Algorithm
[Figure, not reproduced. Labels: Collision Array, Locations Array, T. 6, T. 2]
MultiOp of T. 2: id = 2, op = POP, length = 1, cStatus = INIT, cell (EMPTY), next = NULL, last
MultiOp of T. 6: id = 6, op = PUSH, length = 1, cStatus = INIT, cell (data1), next = NULL, last
T. 6 (passive collider): "I’ll wait, maybe someone will arrive…"
T. 2 (active collider): "Yay, I can collide with thread 6!"

22 DECS- The Algorithm 22  Central Stack Functions

23 DECS- The Algorithm 23

24 DECS- The Algorithm 24

25 DECS - The Algorithm
[Figure, not reproduced. Labels: Collision Array, Locations Array. T. 6 is asleep ("zzz…").]
MultiOp of T. 2: id = 2, op = POP, length = 1, cStatus = INIT, cell (EMPTY), next = NULL, last
MultiOp of T. 6: id = 6, op = PUSH, length = 1, cStatus = INIT, cell (data1), next = NULL, last
T. 2: "I see that T. 6 got PUSH, and I got POP - we can eliminate!"

26 DECS- The Algorithm 26  Elimination-Combining Layer Functions

27 DECS - The Algorithm
[Figure, not reproduced. T. 6 is asleep ("zzz…"); T. 2: "Working…"]
Initial records: MultiOp id = 2, op = POP, length = 1, cStatus = INIT, cell (EMPTY); MultiOp id = 6, op = PUSH, length = 1, cStatus = INIT, cell (data1)
Updated records: MultiOp id = 6, op = PUSH, length = 0, cStatus = FINISHED; MultiOp id = 2, op = POP, length = 0, cStatus = FINISHED

28 DECS - The Algorithm
[Figure, not reproduced. Same records as slide 27; T. 2: "Working… Done!"]
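The collision outcome walked through on these slides can be sketched in a single-threaded form. The code below is a simplification and not the presentation's code: it models each thread's list of delegated operations as a `Batch` of vectors (a hypothetical stand-in for a `MultiOp` list) and ignores all synchronization. It shows only the two outcomes: operations of the same type are combined under the active collider, and operations of opposite types are eliminated in pairs.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

enum class Op { Push, Pop };

// Hypothetical stand-in for a MultiOp list: the same-type requests that one
// thread is currently responsible for.
struct Batch {
    Op op;
    std::vector<int> thread_ids;   // one entry per pending request
    std::vector<int> push_values;  // one value per request when op == Push
};

// (id of the thread whose pop was served, value it receives)
using Eliminated = std::pair<int, int>;

// The active collider meets the passive one. Returns the eliminated pairs.
std::vector<Eliminated> collide(Batch& active, Batch& passive) {
    std::vector<Eliminated> eliminated;
    if (active.op == passive.op) {
        // Combining: the active thread takes over all of the passive requests.
        active.thread_ids.insert(active.thread_ids.end(),
                                 passive.thread_ids.begin(), passive.thread_ids.end());
        active.push_values.insert(active.push_values.end(),
                                  passive.push_values.begin(), passive.push_values.end());
        passive.thread_ids.clear();
        passive.push_values.clear();
        return eliminated;
    }
    // Elimination: match pushes with pops until one side runs out.
    Batch& pushes = (active.op == Op::Push) ? active : passive;
    Batch& pops = (active.op == Op::Push) ? passive : active;
    std::size_t pairs = std::min(pushes.thread_ids.size(), pops.thread_ids.size());
    for (std::size_t i = 0; i < pairs; ++i) {
        eliminated.emplace_back(pops.thread_ids.back(), pushes.push_values.back());
        pops.thread_ids.pop_back();
        pushes.thread_ids.pop_back();
        pushes.push_values.pop_back();
    }
    return eliminated;
}
```

For the scenario on the slides (T. 2 holds one POP, T. 6 holds one PUSH), `collide` returns a single pair and leaves both batches empty, which corresponds to both records ending with `length = 0`.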

29 DECS- The Algorithm 29

30 DECS- The Algorithm 30 T. 6 T. 2 zzz… Wake up man, I’ve done your job! Thank you T. 2, let’s go have a beer; I’m buying!

31 DECS- The Algorithm 31

32 DECS- The Algorithm 32

34 DECS Performance Evaluation
• Hardware
  • 128-way UltraSparc T2 Plus (T5140) server: a 2-chip system in which each chip contains 8 cores and each core multiplexes 8 hardware threads.
  • Running Solaris 10 OS.
  • The cores in each CPU share the same L2 cache.
  • C++ code compiled with GCC with the -O3 flag.
• Compared against:
  • Treiber stack
  • The HSY elimination-backoff stack
  • Flat-combining stack
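For readers unfamiliar with the first baseline, the Treiber stack is the classic lock-free stack that updates its head pointer with compare-and-swap. The following is a minimal sketch, not the code used in the evaluation. As a deliberate simplification, popped nodes are never freed, which avoids use-after-free and ABA problems at the cost of leaking memory; a real implementation needs a memory-reclamation scheme.

```cpp
#include <atomic>
#include <optional>
#include <utility>

template <typename T>
class TreiberStack {
    struct Node {
        T value;
        Node* next;
    };
    std::atomic<Node*> head_{nullptr};

public:
    void push(T value) {
        Node* node = new Node{std::move(value), head_.load(std::memory_order_relaxed)};
        // On failure, compare_exchange_weak stores the current head into
        // node->next, so the loop simply retries with the fresh value.
        while (!head_.compare_exchange_weak(node->next, node,
                                            std::memory_order_release,
                                            std::memory_order_relaxed)) {
        }
    }

    // Returns std::nullopt when the stack is empty.
    std::optional<T> pop() {
        Node* top = head_.load(std::memory_order_acquire);
        // On failure, top is reloaded with the current head and we retry.
        while (top != nullptr &&
               !head_.compare_exchange_weak(top, top->next,
                                            std::memory_order_acquire,
                                            std::memory_order_acquire)) {
        }
        if (top == nullptr) return std::nullopt;
        // Simplification: the node is intentionally leaked (never deleted).
        return top->value;
    }
};
```

Every operation still targets the single `head_` word, which is why a contention-reduction layer in front of such a stack can help under load.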

35 DECS Performance Evaluation
• Course of experiments
  • Threads repeatedly apply operations on the stack for a fixed duration of 1 sec, and the resulting throughput is measured, varying the level of concurrency from 1 to 128.
  • Throughput is measured on both symmetric and asymmetric workloads.
  • Stacks are pre-populated with enough cells so that pop operations do not operate on an empty stack.
  • Each data point is the average of 3 runs.
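The measurement procedure described above can be sketched as a small harness. This is an illustration and not the authors' benchmark: the function names `run_once` and `average_throughput` are hypothetical, the stack under test is a plain mutex-protected `std::vector`, and each thread alternates push and pop (a symmetric workload) on a pre-populated stack so that pops never see it empty.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Runs num_threads workers for the given duration and returns the total
// number of completed stack operations.
std::uint64_t run_once(int num_threads, std::chrono::milliseconds duration) {
    std::mutex lock;
    std::vector<int> stack(1024, 0);  // pre-populated
    std::atomic<bool> stop{false};
    std::vector<std::uint64_t> counts(static_cast<std::size_t>(num_threads));
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            std::uint64_t ops = 0;
            while (!stop.load(std::memory_order_relaxed)) {
                { std::lock_guard<std::mutex> g(lock); stack.push_back(t); }
                { std::lock_guard<std::mutex> g(lock); stack.pop_back(); }
                ops += 2;
            }
            counts[static_cast<std::size_t>(t)] = ops;  // each thread writes its own slot
        });
    }
    std::this_thread::sleep_for(duration);
    stop.store(true);
    for (auto& w : workers) w.join();
    std::uint64_t total = 0;
    for (std::uint64_t c : counts) total += c;
    return total;
}

// Average throughput in operations per second over several runs.
double average_throughput(int num_threads, std::chrono::milliseconds duration,
                          int runs = 3) {
    const double seconds = std::chrono::duration<double>(duration).count();
    double sum = 0.0;
    for (int i = 0; i < runs; ++i) {
        sum += static_cast<double>(run_once(num_threads, duration)) / seconds;
    }
    return sum / runs;
}
```

Calling `average_throughput(n, std::chrono::milliseconds(1000))` for each thread count `n` from 1 to 128 would mirror the 1-second, 3-run setup on the slide.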

36 DECS Performance Evaluation
• Symmetric workload
[Plot not reproduced. X-axis: number of threads]

37 DECS Performance Evaluation
• Moderately-asymmetric workload
[Plot not reproduced. X-axis: number of threads]

38 DECS Performance Evaluation
• Fully-asymmetric workload
[Plot not reproduced. X-axis: number of threads]

40 NB-DECS
• DECS is blocking.
• For some applications a non-blocking implementation may be preferable because it is more robust to thread failures.
• NB-DECS is a lock-free variant of DECS that allows threads that delegated their operations to another thread, and have waited too long, to cancel their “combining contracts” and retry their operations.
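One way to picture cancelling a "combining contract" is as a race on a single atomic state word. The sketch below is an assumption about how such a handshake could look, not NB-DECS itself: the `Contract` type and its three states are hypothetical. The delegate must claim the contract before executing the delegated operation, and the waiting thread may cancel only while the contract is still unclaimed, so exactly one of them wins.

```cpp
#include <atomic>

enum class ContractState { Waiting, Claimed, Cancelled };

// Hypothetical handshake between a waiting thread and its delegate.
struct Contract {
    std::atomic<ContractState> state{ContractState::Waiting};

    // Called by the delegate before it executes the delegated operation.
    bool try_claim() {
        ContractState expected = ContractState::Waiting;
        return state.compare_exchange_strong(expected, ContractState::Claimed);
    }

    // Called by the waiting thread once it has waited too long. On success
    // the thread may retry its operation on the central stack by itself.
    bool try_cancel() {
        ContractState expected = ContractState::Waiting;
        return state.compare_exchange_strong(expected, ContractState::Cancelled);
    }
};
```

Because both transitions start from `Waiting` and use compare-and-swap, an operation is never both executed by the delegate and retried by its owner.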

42 Summary
• DECS comprises a combining-elimination layer and therefore benefits from collisions between operations of opposite semantics as well as identical semantics.
• Empirical evaluation showed that DECS outperforms the best known stack algorithms for all workloads.
• NB-DECS
• The idea of a combining-elimination layer could be used to efficiently implement other concurrent data structures.

