Wait-Free Queues with Multiple Enqueuers and Dequeuers Alex Kogan Erez Petrank Computer Science, Technion, Israel.


1 Wait-Free Queues with Multiple Enqueuers and Dequeuers Alex Kogan Erez Petrank Computer Science, Technion, Israel

2 Outline  Queue data structure  Progress guarantees  Previous work on concurrent queues  Review of the MS-queue  Our ideas in a nutshell  Review of the KP-queue  Performance results  Performance optimizations  Summary

3 FIFO queues  One of the most fundamental and common data structures  [diagram: enqueue 9 at the tail, dequeue from the head of queue 5 3 2]

4 Concurrent FIFO queues  A concurrent implementation supports "correct" concurrent addition and removal of elements  correct = linearizable  Access to the shared memory must be synchronized  [diagram: concurrent enqueue 9 and dequeue ("empty!") on queue 3 2]

5 Non-blocking synchronization  No thread is blocked waiting for another thread to complete  e.g., no locks / critical sections  Progress guarantees:  Obstruction-freedom  progress is guaranteed only in the eventual absence of interference  Lock-freedom  among all threads trying to apply an operation, one will succeed  Wait-freedom  a thread completes its operation in a bounded number of steps

6 Lock-freedom  Among all threads trying to apply an operation, one will succeed  opportunistic approach  make attempts until succeeding  global progress  all but one thread may starve  Many efficient and scalable lock-free queue implementations
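The opportunistic retry pattern behind lock-freedom can be sketched in Java (the paper's implementation language). This is a minimal Treiber-style stack rather than a queue, chosen only to illustrate the pattern; all names here are illustrative, not from the paper:

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal lock-free (Treiber) stack: each operation retries a CAS until it
// succeeds. Some thread always makes progress (global progress), but any
// individual thread may retry indefinitely under contention.
class LockFreeStack<T> {
    private static class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    void push(T value) {
        Node<T> node = new Node<>(value);
        do {
            node.next = top.get();                          // read current top
        } while (!top.compareAndSet(node.next, node));      // retry on interference
    }

    T pop() {
        Node<T> head;
        do {
            head = top.get();
            if (head == null) return null;                  // empty stack
        } while (!top.compareAndSet(head, head.next));      // retry on interference
        return head.value;
    }
}
```

The unbounded `do/while` loops are exactly why lock-freedom permits starvation of individual threads: a CAS can keep failing as long as other threads keep succeeding.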

7 Wait-freedom  A thread completes its operation in a bounded number of steps  regardless of what other threads are doing  A highly desired property of any concurrent data structure  but commonly regarded as inefficient and too costly to achieve  Particularly important in several domains  real-time systems  systems operating under an SLA  heterogeneous environments

8 Related work: existing wait-free queues  Limited concurrency  one enqueuer and one dequeuer [Lamport'83]  multiple enqueuers, one concurrent dequeuer [David'04]  multiple dequeuers, one concurrent enqueuer [Jayanti&Petrovic'05]  Universal constructions [Herlihy'91]  generic method to transform any (sequential) object into a lock-free/wait-free concurrent object  expensive, impractical implementations  (Almost) no experimental results

9 Related work: lock-free queue [Michael & Scott'96]  One of the most scalable and efficient lock-free implementations  Widely adopted by industry  part of the Java Concurrency package  Relatively simple and intuitive implementation  Based on a singly-linked list of nodes  [diagram: linked list with head and tail pointers]

10 MS-queue brief review: enqueue  [diagram: enqueue 9 links a new node after the last node (4) via CAS]

11 MS-queue brief review: enqueue  [diagram: concurrent enqueue 5; CAS on the tail]

12 MS-queue brief review: dequeue  [diagram: dequeue advances head past node 4 via CAS]
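The MS-queue steps reviewed above can be condensed into a Java sketch. This is a simplified version of the Michael & Scott algorithm (it relies on garbage collection and omits some consistency re-checks of the full paper version); class and method names are illustrative:

```java
import java.util.concurrent.atomic.AtomicReference;

// Simplified Michael & Scott lock-free queue: a singly-linked list with a
// dummy node; enqueue CASes the last node's next pointer, then swings tail.
class MSQueue<T> {
    private static class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>();
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head, tail;

    MSQueue() {
        Node<T> dummy = new Node<>(null);
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    void enqueue(T value) {
        Node<T> node = new Node<>(value);
        while (true) {
            Node<T> last = tail.get();
            Node<T> next = last.next.get();
            if (next == null) {
                if (last.next.compareAndSet(null, node)) {  // link new node
                    tail.compareAndSet(last, node);          // swing tail (may fail harmlessly)
                    return;
                }
            } else {
                tail.compareAndSet(last, next);              // help a lagging enqueue
            }
        }
    }

    T dequeue() {
        while (true) {
            Node<T> first = head.get();
            Node<T> next = first.next.get();
            if (next == null) return null;                   // empty queue
            if (head.compareAndSet(first, next)) return next.value;
        }
    }
}
```

Note the helping step in `enqueue`: when the tail lags behind the real last node, the interfering thread finishes the other operation's second CAS before retrying its own. This is the lock-free helping that the wait-free construction later generalizes.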

13 Our idea (in a nutshell)  Based on the lock-free queue by Michael & Scott  Helping mechanism  each operation is applied in a bounded time  “Wait-free” implementation scheme  each operation is applied exactly once

14 Helping mechanism  Each operation is assigned a dynamic age-based priority  inspired by the Doorway mechanism used in the Bakery mutex  Each thread accessing the queue  chooses a monotonically increasing phase number  writes down its phase and operation info in a special state array  helps all threads with a non-larger phase to apply their operations  state entry per thread: { phase: long, pending: boolean, enqueue: boolean, node: Node }
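The per-thread state entry and the helping scan can be sketched as follows. The `OpDesc` fields follow the slide; the scan logic is schematic and the class names are illustrative, not the paper's exact code:

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Per-thread operation descriptor, with the fields shown on the slide.
class OpDesc {
    final long phase;        // age-based priority
    final boolean pending;   // true until the operation is applied
    final boolean enqueue;   // operation type: enqueue or dequeue
    final Object node;       // node being inserted / removed
    OpDesc(long phase, boolean pending, boolean enqueue, Object node) {
        this.phase = phase; this.pending = pending;
        this.enqueue = enqueue; this.node = node;
    }
}

class HelpScan {
    final AtomicReferenceArray<OpDesc> state;   // one entry per thread

    HelpScan(int nThreads) {
        state = new AtomicReferenceArray<>(nThreads);
        for (int i = 0; i < nThreads; i++)
            state.set(i, new OpDesc(-1, false, true, null));  // no pending op
    }

    // Help every thread whose announced phase is not larger than my phase.
    void help(long myPhase) {
        for (int i = 0; i < state.length(); i++) {
            OpDesc d = state.get(i);
            if (d.pending && d.phase <= myPhase) {
                // apply thread i's enqueue or dequeue on its behalf (omitted)
            }
        }
    }
}
```

Because each operation announces itself in `state` before touching the queue, any faster thread can complete it, which is what bounds the number of steps of every operation.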

15 Helping mechanism in action  [state array, columns phase / pending / enqueue / node: 4 true ref | 9 false true null | 9 true ref | 3 false true ref]

16 Helping mechanism in action  [state array, columns phase / pending / enqueue / node: 4 true ref | 9 false true null | 9 true ref | 10 true ref]  "I need to help!"

17 Helping mechanism in action  [state array, columns phase / pending / enqueue / node: 4 true ref | 9 false true null | 9 true ref | 10 true ref]  "I do not need to help!"

18 Helping mechanism in action  [state array, columns phase / pending / enqueue / node: 4 true ref | 9 false true null | 11 true false null | 10 true ref]  "I do not need to help!"  "I need to help!"

19 Helping mechanism in action  The number of operations that may linearize before any given operation is bounded  hence, wait-freedom  [state array, columns phase / pending / enqueue / node: 4 true ref | 9 false true null | 11 true false null | 10 true ref]

20 Optimized helping  The basic scheme has two drawbacks:  the number of steps executed by each thread on every operation depends on n (the number of threads)  even when there is no contention  it creates scenarios where many threads help the same operations  e.g., when many threads access the queue concurrently  a large amount of redundant work  Optimization: help one thread at a time, in a cyclic manner  faster threads help slower peers in parallel  reduces the amount of redundant work

21 How to choose the phase numbers  Every time ti chooses a phase number, it is greater than the number chosen by any thread that made its choice before ti  defines a logical order on operations and provides wait-freedom  Like in the Bakery mutex:  scan through state  calculate the maximal phase value + 1  requires O(n) steps  Alternative: use an atomic counter  requires only O(1) steps  [diagram: announced phases 4, 3, 5; next phase chosen: 6!]
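Both phase-selection strategies can be sketched side by side; `nextPhaseByScan` is the Bakery-style O(n) scan, while an `AtomicLong` gives the O(1) alternative (names and layout illustrative):

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

class PhaseChoice {
    final AtomicLongArray phases;                   // announced phase per thread
    final AtomicLong counter = new AtomicLong();    // O(1) alternative

    PhaseChoice(int nThreads) { phases = new AtomicLongArray(nThreads); }

    // Bakery-style: scan all entries and take max + 1 -- O(n) steps.
    long nextPhaseByScan() {
        long max = 0;
        for (int i = 0; i < phases.length(); i++)
            max = Math.max(max, phases.get(i));
        return max + 1;
    }

    // Counter-based: one fetch-and-increment -- O(1) steps.
    long nextPhaseByCounter() {
        return counter.incrementAndGet();
    }
}
```

Either way, phases are monotonically increasing across threads, which is all the wait-freedom argument needs.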

22 “Wait-free” design scheme  Break each operation into three atomic steps  can be executed by different threads  cannot be interleaved 1. Initial change of the internal structure  concurrent operations realize that there is an operation-in-progress 2. Updating the state of the operation-in-progress as being performed (linearized) 3. Fixing the internal structure  finalizing the operation-in-progress
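The three-step scheme can be modeled abstractly: each step is a CAS transition, so any number of helpers can execute any step, yet each step happens exactly once. This toy model (all names illustrative, not the paper's code) shows why the steps cannot be applied twice or out of order:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Toy model of the three-step scheme: an operation advances through
// ANNOUNCED -> STRUCTURE_CHANGED -> LINEARIZED -> FINALIZED, each
// transition a CAS, so concurrent helpers apply each step at most once.
class ThreeStepOp {
    static final int ANNOUNCED = 0, STRUCTURE_CHANGED = 1,
                     LINEARIZED = 2, FINALIZED = 3;

    final AtomicInteger step = new AtomicInteger(ANNOUNCED);

    // Any helper repeatedly tries to advance whatever the current step is.
    void helpFinish() {
        step.compareAndSet(ANNOUNCED, STRUCTURE_CHANGED);   // step 1: change structure
        step.compareAndSet(STRUCTURE_CHANGED, LINEARIZED);  // step 2: mark as performed
        step.compareAndSet(LINEARIZED, FINALIZED);          // step 3: fix structure
    }
}
```

In the real queue each CAS targets a different location (a next pointer, the state entry, the tail/head pointer), but the exactly-once property comes from the same CAS-per-step discipline.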

23 Internal structures  [diagram: queue with head and tail; state array with columns phase, pending, enqueue, node]

24 Internal structures  Each node holds enqTid: int  the ID of the thread that performs / has performed the insertion of the node into the queue  [diagram: these elements were enqueued by Thread 0; this element was enqueued by Thread 1]

25 Internal structures  Each node holds deqTid: int  the ID of the thread that performs / has performed the removal of the node from the queue  [diagram: this element was dequeued by Thread 1]

26 enqueue operation  enqueue 6 by thread ID 2: creating a new node  [diagram: queue with head/tail and state array]

27 enqueue operation  Announcing a new operation  [diagram: thread 2's state entry set to phase 10, pending = true]

28 enqueue operation  Step 1: Initial change of the internal structure  [diagram: CAS links the new node after the last node]

29 enqueue operation  Step 2: Updating the state of the operation-in-progress as being performed  [diagram: CAS sets the entry's pending to false]

30 enqueue operation  Step 3: Fixing the internal structure  [diagram: CAS advances tail to the new node]

31 enqueue operation  Step 1: Initial change of the internal structure  [diagram: thread ID 0 arrives with enqueue 3 while thread ID 2's enqueue 6 is in progress]

32 enqueue operation  Creating a new node  Announcing a new operation  [diagram: thread 0's state entry set to phase 11, pending = true]

33 enqueue operation  Step 2: Updating the state of the operation-in-progress as being performed  [diagram: state array before the CAS]

34 enqueue operation  Step 2: Updating the state of the operation-in-progress as being performed  [diagram: CAS sets pending to false]

35 enqueue operation  Step 3: Fixing the internal structure  [diagram: CAS advances tail]

36 enqueue operation  Step 1: Initial change of the internal structure  [diagram: CAS links thread 0's new node]

37 dequeue operation  dequeue by thread ID 2  [diagram: queue with head/tail and state array]

38 dequeue operation  Announcing a new operation  [diagram: thread 2's state entry set to phase 10, pending = true, enqueue = false, node = null]

39 dequeue operation  Updating state to refer to the first node  [diagram: CAS sets the entry's node field to the first node]

40 dequeue operation  Step 1: Initial change of the internal structure  [diagram: CAS on the internal structure]

41 dequeue operation  Step 2: Updating the state of the operation-in-progress as being performed  [diagram: CAS sets pending to false]

42 dequeue operation  Step 3: Fixing the internal structure  [diagram: CAS advances head]

43 Performance evaluation
Architecture: two 2.5 GHz quad-core Xeon E5420 processors / two 1.6 GHz quad-core Xeon E5310 processors
# threads: 8 / 8 / 8
RAM: 16 GB
OS: CentOS 5.5 Server / Ubuntu 8.10 Server / RedHat Enterprise 5.3 Server
Java: Sun's Java SE Runtime, update 22, 64-bit Server VM

44 Benchmarks  Enqueue-Dequeue benchmark  the queue is initially empty  each thread iteratively performs enqueue and then dequeue  1,000,000 iterations per thread  50%-Enqueue benchmark  the queue is initialized with 1000 elements  each thread decides uniformly at random which operation to perform, with equal odds for enqueue and dequeue  1,000,000 operations per thread

45 Tested algorithms Compared implementations:  MS-queue  Base wait-free queue  Optimized wait-free queue  Opt 1: optimized helping (help one thread at a time)  Opt 2: atomic counter-based phase calculation  Measure completion time as a function of # threads

46 Enqueue-Dequeue benchmark  TBD: add figures

47 The impact of optimizations  TBD: add figures

48 Optimizing further: false sharing  Created on accesses to the state array  Resolved by stretching the state entries with dummy pads  TBD: add figures
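The padding idea can be sketched as a stretched state record. The pad-field count below assumes 64-byte cache lines and is illustrative; note that the JVM is free to reorder fields, so manual padding is a common idiom rather than a guarantee (newer JVMs expose an internal @Contended annotation for the same purpose):

```java
// State entry stretched with dummy pads so that entries of different
// threads tend to fall on different cache lines (64-byte lines assumed).
class PaddedOpDesc {
    long phase;
    boolean pending, enqueue;
    Object node;
    // dummy pad fields to stretch the record past one cache line
    long p0, p1, p2, p3, p4, p5, p6;
}
```

Without the pads, adjacent threads' entries share a cache line, so every state update invalidates the line in other cores even when the threads touch logically unrelated entries.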

49 Optimizing further: memory management  Every attempt to update state is preceded by the allocation of a new record  these records can be reused when the attempt fails  (more) validation checks can be performed to reduce the number of failed attempts  When an operation is finished, remove the reference from state to the list node  helps the garbage collector

50 Implementing the queue without GC  Apply the Hazard Pointers technique [Michael'04]  each thread is associated with hazard pointers  single-writer multi-reader registers  used by threads to point to objects they may access later  when an object should be deleted, a thread stores its address in a special stack  once in a while, it scans the stack and recycles objects only if no hazard pointers point to them  In our case, the technique can be applied with a slight modification in the dequeue method
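The hazard-pointer idea can be sketched minimally. This simplified version keeps one hazard pointer per thread and scans on every retirement; real implementations [Michael'04] use several pointers per thread, per-thread retirement lists, and scan only periodically. All names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Minimal hazard-pointer sketch: one pointer per thread, eager scanning.
class HazardPointers {
    final AtomicReferenceArray<Object> hp;          // one protected slot per thread
    final List<Object> retired = new ArrayList<>(); // per-thread list in practice

    HazardPointers(int nThreads) { hp = new AtomicReferenceArray<>(nThreads); }

    void protect(int tid, Object o) { hp.set(tid, o); }   // announce intent to access o
    void clear(int tid)            { hp.set(tid, null); }

    // Defer reclamation: reclaim the object only when no thread protects it.
    void retire(Object o) { retired.add(o); scan(); }

    private void scan() {
        retired.removeIf(o -> {
            for (int i = 0; i < hp.length(); i++)
                if (hp.get(i) == o) return false;   // still protected: keep deferred
            return true;                            // safe; GC-free code would free it here
        });
    }
}
```

The dequeue-side modification mentioned on the slide is needed because a dequeuing thread publishes the node it is about to remove in the state array; a straightforward application of hazard pointers must account for that extra reference.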

51 Summary  First wait-free queue implementation supporting multiple enqueuers and dequeuers  Wait-freedom incurs an inherent trade-off  bounds the completion time of a single operation  has a cost in a "typical" case  The additional cost can be reduced to a tolerable level  Proposed design scheme might be applicable to other wait-free data structures

52 Thank you! Questions?


