Wait-Free Queues with Multiple Enqueuers and Dequeuers

1 Wait-Free Queues with Multiple Enqueuers and Dequeuers
Alex Kogan and Erez Petrank, Computer Science, Technion, Israel

2 Outline
- Queue data structure
- Progress guarantees
- Previous work on concurrent queues
- Review of the MS-queue
- Our ideas in a nutshell
- Review of the KP-queue
- Performance results
- Performance optimizations
- Summary

3 FIFO queues One of the most fundamental and common data structures
[Figure: a queue holding 5, 3, 2, 9; elements are enqueued at the tail and dequeued from the head]

4 Concurrent FIFO queues
A concurrent implementation supports “correct” concurrent adding and removing of elements
- correct = linearizable
- the access to the shared memory should be synchronized
[Figure: several threads enqueue and dequeue the queue 3, 2, 9 concurrently; a dequeue on an empty queue reports “empty!”]

5 Non-blocking synchronization
No thread is blocked waiting for another thread to complete
- e.g., no locks / critical sections
Progress guarantees:
- Obstruction-freedom: progress is guaranteed only in the eventual absence of interference
- Lock-freedom: among all threads trying to apply an operation, one will succeed
- Wait-freedom: a thread completes its operation in a bounded number of steps

6 Lock-freedom
Among all threads trying to apply an operation, one will succeed
- opportunistic approach: make attempts until succeeding
- global progress, but all but one thread may starve
Many efficient and scalable lock-free queue implementations exist

7 Wait-freedom
A thread completes its operation in a bounded number of steps, regardless of what other threads are doing
- a highly desired property of any concurrent data structure
- but commonly regarded as inefficient and too costly to achieve
Particularly important in several domains:
- real-time systems
- systems operating under an SLA
- heterogeneous environments

8 Related work: existing wait-free queues
Limited concurrency:
- one enqueuer and one dequeuer [Lamport’83]
- multiple enqueuers, one concurrent dequeuer [David’04]
- multiple dequeuers, one concurrent enqueuer [Jayanti&Petrovic’05]
Universal constructions [Herlihy’91]:
- a generic method to transform any (sequential) object into a lock-free/wait-free concurrent object
- expensive, impractical implementations
- (almost) no experimental results

9 Related work: lock-free queue [Michael & Scott’96]
- one of the most scalable and efficient lock-free implementations
- widely adopted by industry: part of the Java Concurrency package
- relatively simple and intuitive implementation
- based on a singly-linked list of nodes
[Figure: a list of nodes 12, 4, 17 with head and tail pointers]

10 MS-queue brief review: enqueue
[Figure: enqueue(9): a CAS links the new node after 17, then a second CAS swings tail to it]

11 MS-queue brief review: enqueue
[Figure: two concurrent enqueues (9 and 5): only one CAS on the last node’s next pointer succeeds; the other enqueuer retries]

12 MS-queue brief review: dequeue
[Figure: dequeue: a CAS advances head past the dummy node; the value 12 of the new first node is returned]
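For reference, a minimal Java sketch of the enqueue and dequeue just reviewed (a simplification, not Michael and Scott's exact code: the counted pointers the original uses against ABA are unnecessary under Java's garbage collector and are omitted):

import java.util.concurrent.atomic.AtomicReference;

// Minimal MS-queue sketch: a linked list that always keeps a dummy first node.
class MSQueue<T> {
    static class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>(null);
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head, tail;

    MSQueue() {
        Node<T> dummy = new Node<>(null);
        head = new AtomicReference<>(dummy);
        tail = new AtomicReference<>(dummy);
    }

    void enqueue(T v) {                      // lock-free: retry until our CAS wins
        Node<T> node = new Node<>(v);
        while (true) {
            Node<T> last = tail.get();
            Node<T> next = last.next.get();
            if (last == tail.get()) {
                if (next == null) {
                    if (last.next.compareAndSet(null, node)) { // link the node
                        tail.compareAndSet(last, node);        // swing tail
                        return;
                    }
                } else {
                    tail.compareAndSet(last, next);  // help a lagging enqueuer
                }
            }
        }
    }

    T dequeue() {
        while (true) {
            Node<T> first = head.get();
            Node<T> last = tail.get();
            Node<T> next = first.next.get();
            if (first == head.get()) {
                if (first == last) {
                    if (next == null) return null;   // empty
                    tail.compareAndSet(last, next);  // tail is lagging: help it
                } else {
                    T v = next.value;                // value sits in the successor
                    if (head.compareAndSet(first, next)) return v;
                }
            }
        }
    }
}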

13 Our idea (in a nutshell)
Based on the lock-free queue by Michael & Scott
- helping mechanism: each operation is applied in a bounded time
- “wait-free” implementation scheme: each operation is applied exactly once

14 Helping mechanism
Each operation is assigned a dynamic age-based priority
- inspired by the Doorway mechanism used in the Bakery mutex
Each thread accessing the queue:
- chooses a monotonically increasing phase number
- writes down its phase and operation info in a special state array (one entry per thread)
- helps all threads with a non-larger phase to apply their operations
A state entry holds: phase (long), pending (boolean), enqueue (boolean), node (Node); see the sketch below.
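A minimal Java rendering of a state entry (the class name OpDesc and the surrounding details are our illustration, not code from the paper; slides 24-25 extend the Node class with thread-ID fields):

import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicReferenceArray;

// A bare-bones list node; slides 24-25 extend it with enqTid/deqTid.
class Node<T> {
    final T value;
    final AtomicReference<Node<T>> next = new AtomicReference<>(null);
    Node(T value) { this.value = value; }
}

// One state entry. Entries are immutable and replaced wholesale by CAS,
// so a helper always observes a consistent snapshot of all four fields.
final class OpDesc<T> {
    final long phase;       // age-based priority of the operation
    final boolean pending;  // true until the operation has been applied
    final boolean enqueue;  // operation type: enqueue or dequeue
    final Node<T> node;     // node to insert, or node involved in a dequeue
    OpDesc(long phase, boolean pending, boolean enqueue, Node<T> node) {
        this.phase = phase; this.pending = pending;
        this.enqueue = enqueue; this.node = node;
    }
}

// The state array itself, one slot per thread:
// AtomicReferenceArray<OpDesc<T>> state = new AtomicReferenceArray<>(numThreads);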

15 Helping mechanism in action
[Table: a state array snapshot with columns phase, pending, enqueue, node; four entries with phases 4, 9, 9, 3, some pending and some already applied]

16 Helping mechanism in action
[Table: as above, now with phases 4, 9, 9, 10; the thread that chose phase 10 sees pending entries with non-larger phases: “I need to help!”]

17 Helping mechanism in action
[Table: as above; entries that are no longer pending need no help: “I do not need to help!”]

18 Helping mechanism in action
[Table: as above, with phases 4, 9, 11, 10; the thread with phase 10 must help the pending entries with phases 4 and 9, but not the one with the larger phase 11]

19 Helping mechanism in action
The number of operations that may linearize before any given operation is bounded; hence, wait-freedom.
[Table: the state array snapshot as on the previous slides]

20 Optimized helping
The basic scheme has two drawbacks:
- the number of steps executed by each thread on every operation depends on n (the number of threads), even when there is no contention
- it creates scenarios where many threads help the same operation (e.g., when many threads access the queue concurrently), doing a large amount of redundant work
Optimization: help one thread at a time, in a cyclic manner (sketched below)
- faster threads help slower peers in parallel
- reduces the amount of redundant work
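A sketch of the cyclic policy under our assumptions about the surrounding code (the state array and OpDesc entries from slide 14; helpOp stands in for the basic scheme's helping routine):

import java.util.concurrent.atomic.AtomicReferenceArray;

// Cyclic helping: each thread keeps a private cursor and helps at most
// one peer per own operation, instead of scanning all n entries.
class CyclicHelper {
    final int numThreads;
    final AtomicReferenceArray<OpDesc<?>> state;
    final int[] nextToHelp;   // private per-thread cursor into state

    CyclicHelper(int numThreads, AtomicReferenceArray<OpDesc<?>> state) {
        this.numThreads = numThreads;
        this.state = state;
        this.nextToHelp = new int[numThreads];
    }

    void helpOnePeer(int myTid, long myPhase) {
        int peer = nextToHelp[myTid];
        OpDesc<?> d = state.get(peer);
        if (d != null && d.pending && d.phase <= myPhase) {
            helpOp(peer, d);                          // apply the peer's stalled operation
        }
        nextToHelp[myTid] = (peer + 1) % numThreads;  // advance cyclically
    }

    void helpOp(int tid, OpDesc<?> d) { /* the basic scheme's helping routine */ }
}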

21 How to choose the phase numbers
Every time thread t_i chooses a phase number, it is greater than the number chosen by any thread that made its choice before t_i
- defines a logical order on operations and provides wait-freedom
Like in the Bakery mutex: scan through state and take the maximal phase value + 1
- requires O(n) steps
Alternative: use an atomic counter
- requires O(1) steps
[Figure: a thread scans state entries with phases 4 and 3 and chooses 5]
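Both alternatives as a Java sketch (class and method names are ours), reusing the OpDesc entries from slide 14:

import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

class PhaseChooser {
    final AtomicReferenceArray<OpDesc<?>> state;
    final AtomicLong counter = new AtomicLong();   // for the O(1) alternative

    PhaseChooser(AtomicReferenceArray<OpDesc<?>> state) { this.state = state; }

    // Bakery-style: O(n) scan for the maximal announced phase, plus one.
    long maxPhasePlusOne() {
        long max = -1;
        for (int i = 0; i < state.length(); i++) {
            OpDesc<?> d = state.get(i);
            if (d != null && d.phase > max) max = d.phase;
        }
        return max + 1;
    }

    // Atomic-counter alternative: O(1) per choice.
    long nextFromCounter() {
        return counter.incrementAndGet();
    }
}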

22 “Wait-free” design scheme
Break each operation into three atomic steps, which can be executed by different threads but cannot be interleaved:
1. Initial change of the internal structure
   - concurrent operations realize that there is an operation-in-progress
2. Updating the state of the operation-in-progress as being performed (linearized)
3. Fixing the internal structure
   - finalizing the operation-in-progress

23 Internal structures
[Figure: the queue, a linked list with head and tail pointers, next to the state array with its phase, pending, enqueue, and node columns]

24 Internal structures
Each node carries enqTid (int): the ID of the thread that performs / has performed the insertion of the node into the queue
[Figure: nodes annotated with enqTid values; two elements were enqueued by Thread 0, one by Thread 1]

25 Internal structures
Each node also carries deqTid (int): the ID of the thread that performs / has performed the removal of the node from the queue (-1 while no thread has claimed it)
[Figure: a node annotated with deqTid = 1; this element was dequeued by Thread 1]
A combined node sketch follows.
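The node layout in Java, combining slides 24-25 (the field names are from the slides; everything else is our sketch, replacing the bare-bones Node above):

import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// List node with the two thread-ID fields. enqTid is set once at creation;
// deqTid starts at -1 and is claimed by CAS during a dequeue (slide 40).
class Node<T> {
    final T value;
    final AtomicReference<Node<T>> next = new AtomicReference<>(null);
    final int enqTid;                                   // inserting thread
    final AtomicInteger deqTid = new AtomicInteger(-1); // removing thread; -1 = none

    Node(T value, int enqTid) {
        this.value = value;
        this.enqTid = enqTid;
    }
}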

26 enqueue operation
Creating a new node: Thread 2 allocates a node holding 6, with enqTid = 2
[Figure: queue 12, 4, 17 with head and tail; the new node 6 is not yet linked]

27 enqueue operation
Announcing a new operation: Thread 2 publishes its entry (phase 10, pending = true, enqueue = true, node = ref) in state
[Figure: the queue unchanged; Thread 2’s state entry now refers to the new node 6]

28 enqueue operation
Step 1: Initial change of the internal structure: a CAS links the new node after the last node
[Figure: node 6 is linked after 17; tail still points to 17]

29 enqueue operation
Step 2: Updating the state of the operation-in-progress as being performed: a CAS resets the pending flag in Thread 2’s entry
[Figure: the entry with phase 10 is now pending = false]

30 enqueue operation
Step 3: Fixing the internal structure: a CAS advances tail to the new node
[Figure: tail now points to node 6]

31 enqueue operation
Step 1: Initial change of the internal structure: node 6 is already linked when Thread 0 arrives, wanting to enqueue 3
[Figure: node 6 linked after 17; Thread 2’s entry is still pending]

32 enqueue operation
Creating a new node and announcing a new operation: Thread 0 allocates a node holding 3 and publishes its entry (phase 11, pending = true) in state
[Figure: both Thread 0’s and Thread 2’s operations are announced]

33 enqueue operation
Step 2: Updating the state of the operation-in-progress as being performed: Thread 0 finds Thread 2’s enqueue in progress and helps it
[Figure: node 6 is linked but tail has not been advanced yet]

34 enqueue operation
Step 2 (continued): a CAS resets the pending flag in Thread 2’s entry
[Figure: the CAS is applied to Thread 2’s state entry]

35 enqueue operation
Step 3: Fixing the internal structure: a CAS advances tail to node 6, finalizing Thread 2’s operation
[Figure: tail now points to node 6]

36 enqueue operation
Step 1 (for Thread 0’s operation): a CAS links node 3 after node 6
[Figure: node 3 is linked at the tail of the list]
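Putting the steps together in Java, in the spirit of the walkthrough (a condensed sketch reusing the Node, OpDesc, and state sketches above; the real algorithm adds further phase and consistency checks and handles races that are elided here):

import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Any thread may execute any step on behalf of the announcing thread `tid`.
class WFEnqueueSketch<T> {
    final AtomicReference<Node<T>> tail;
    final AtomicReferenceArray<OpDesc<T>> state;

    WFEnqueueSketch(AtomicReference<Node<T>> tail,
                    AtomicReferenceArray<OpDesc<T>> state) {
        this.tail = tail; this.state = state;
    }

    void helpEnq(int tid, long phase) {
        while (isStillPending(tid, phase)) {
            Node<T> last = tail.get();
            Node<T> next = last.next.get();
            if (last == tail.get()) {
                if (next == null) {
                    Node<T> node = state.get(tid).node;
                    // Step 1: initial change - link the announced node
                    if (last.next.compareAndSet(null, node)) {
                        helpFinishEnq();      // steps 2-3
                        return;
                    }
                } else {
                    helpFinishEnq();          // someone's enqueue is in progress
                }
            }
        }
    }

    void helpFinishEnq() {
        Node<T> last = tail.get();
        Node<T> next = last.next.get();
        if (next != null) {
            int tid = next.enqTid;            // whose operation is in progress?
            OpDesc<T> cur = state.get(tid);
            if (last == tail.get() && cur.node == next) {
                // Step 2: mark the operation as performed (linearized)
                OpDesc<T> done = new OpDesc<>(cur.phase, false, true, next);
                state.compareAndSet(tid, cur, done);
                // Step 3: fix the internal structure by advancing tail
                tail.compareAndSet(last, next);
            }
        }
    }

    boolean isStillPending(int tid, long phase) {
        OpDesc<T> d = state.get(tid);
        return d != null && d.pending && d.phase <= phase;
    }
}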

37 dequeue operation
[Figure: queue 12, 4, 17 with head and tail; Thread 2 is about to dequeue]

38 dequeue operation
Announcing a new operation: Thread 2 publishes its entry (phase 10, pending = true, enqueue = false, node = null) in state
[Figure: the queue unchanged; Thread 2’s dequeue is now visible to all threads]

39 dequeue operation
Updating state to refer to the first node: a CAS installs a reference to the first node in Thread 2’s entry, so all helpers agree on which node is being removed
[Figure: the entry’s node field now points to the first node]

40 dequeue operation
Step 1: Initial change of the internal structure: a CAS writes Thread 2’s ID into the first node’s deqTid
[Figure: the first node (12) now carries deqTid = 2]

41 dequeue operation
Step 2: Updating the state of the operation-in-progress as being performed: a CAS resets the pending flag in Thread 2’s entry
[Figure: the CAS is applied to Thread 2’s state entry]

42 dequeue operation
Step 3: Fixing the internal structure: a CAS advances head
[Figure: head moves past the removed node]
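The matching dequeue-helping sketch in Java (same caveats as for enqueue: phase checks, the empty-queue path, and some races are elided; the dequeuing thread afterwards reads its entry's node.next.value as the result):

import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicReferenceArray;

class WFDequeueSketch<T> {
    final AtomicReference<Node<T>> head;
    final AtomicReferenceArray<OpDesc<T>> state;

    WFDequeueSketch(AtomicReference<Node<T>> head,
                    AtomicReferenceArray<OpDesc<T>> state) {
        this.head = head; this.state = state;
    }

    void helpDeq(int tid, long phase) {
        while (isStillPending(tid, phase)) {
            Node<T> first = head.get();
            Node<T> next = first.next.get();
            if (first != head.get()) continue;       // stale snapshot, retry
            if (next == null) return;                // empty queue (handling elided)
            OpDesc<T> cur = state.get(tid);
            if (!cur.pending) return;
            if (cur.node != first) {
                // record the node about to be removed, so all helpers agree
                OpDesc<T> upd = new OpDesc<>(cur.phase, true, false, first);
                state.compareAndSet(tid, cur, upd);
            } else {
                // Step 1: claim the first node by writing our ID into deqTid
                first.deqTid.compareAndSet(-1, tid);
                helpFinishDeq();                     // steps 2-3
            }
        }
    }

    void helpFinishDeq() {
        Node<T> first = head.get();
        Node<T> next = first.next.get();
        int tid = first.deqTid.get();
        if (tid != -1 && next != null && first == head.get()) {
            OpDesc<T> cur = state.get(tid);
            // Step 2: mark the claiming thread's operation as performed
            OpDesc<T> done = new OpDesc<>(cur.phase, false, false, cur.node);
            state.compareAndSet(tid, cur, done);
            // Step 3: fix the internal structure by advancing head
            head.compareAndSet(first, next);
        }
    }

    boolean isStillPending(int tid, long phase) {
        OpDesc<T> d = state.get(tid);
        return d != null && d.pending && d.phase <= phase;
    }
}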

43 Performance evaluation
Architecture: two 2.5 GHz quad-core Xeon E5420 processors / two 1.6 GHz quad-core Xeon E5310 processors
# threads: 8
RAM: 16 GB
OS: CentOS 5.5 Server / Ubuntu Server / RedHat Enterprise 5.3 Server
Java: Sun’s Java SE Runtime, update 22, 64-bit Server VM

44 Benchmarks
Enqueue-Dequeue benchmark:
- the queue is initially empty
- each thread iteratively performs an enqueue and then a dequeue
- 1,000,000 iterations per thread
50%-Enqueue benchmark:
- the queue is initialized with 1000 elements
- each thread decides uniformly at random which operation to perform, with equal odds for enqueue and dequeue
- 1,000,000 operations per thread
A sketch of both workloads appears below.
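Both workloads in Java (a sketch; the SimpleQueue interface and the harness around it are our assumptions, and completion time is measured across all threads running one of these loops):

import java.util.concurrent.ThreadLocalRandom;

interface SimpleQueue<T> { void enqueue(T v); T dequeue(); }

// Per-thread bodies of the two benchmarks.
class Workloads {
    static void enqueueDequeue(SimpleQueue<Integer> q, int iterations) {
        for (int i = 0; i < iterations; i++) {
            q.enqueue(i);           // enqueue, then immediately dequeue
            q.dequeue();
        }
    }

    static void fiftyPercentEnqueue(SimpleQueue<Integer> q, int operations) {
        ThreadLocalRandom rnd = ThreadLocalRandom.current();
        for (int i = 0; i < operations; i++) {
            if (rnd.nextBoolean()) q.enqueue(i);   // equal odds for each operation
            else q.dequeue();
        }
    }
}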

45 Tested algorithms
Compared implementations:
- MS-queue
- base wait-free queue
- optimized wait-free queue
  - Opt 1: optimized helping (help one thread at a time)
  - Opt 2: atomic counter-based phase calculation
Measured: completion time as a function of the number of threads

46 Enqueue-Dequeue benchmark
TBD: add figures

47 The impact of optimizations
TBD: add figures

48 Optimizing further: false sharing
- created on accesses to the state array
- resolved by stretching the state with dummy pads (sketched below)
TBD: add figures
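One way to stretch the state array in Java (the stride value is our assumption: with 64-byte cache lines and 8-byte references, 8 slots of spacing keep any two live entries on different lines):

import java.util.concurrent.atomic.AtomicReferenceArray;

// Padded state: only every STRIDE-th slot is used, so neighboring threads'
// entries never share a cache line and stop invalidating each other.
class PaddedState {
    static final int STRIDE = 8;
    final AtomicReferenceArray<OpDesc<?>> slots;

    PaddedState(int numThreads) {
        slots = new AtomicReferenceArray<>(numThreads * STRIDE);
    }

    OpDesc<?> get(int tid) { return slots.get(tid * STRIDE); }

    boolean cas(int tid, OpDesc<?> exp, OpDesc<?> upd) {
        return slots.compareAndSet(tid * STRIDE, exp, upd);
    }
}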

49 Optimizing further: memory management
Every attempt to update state is preceded by the allocation of a new record
- these records can be reused when the attempt fails
- (more) validation checks can be performed to reduce the number of failed attempts
When an operation is finished, remove the reference from its state entry to the list node
- helps the garbage collector

50 Implementing the queue without GC
Apply the Hazard Pointers technique [Michael’04]:
- each thread is associated with hazard pointers: single-writer multi-reader registers used by threads to point to objects they may access later
- when an object should be deleted, a thread stores its address in a special stack; once in a while, it scans the stack and recycles an object only if no hazard pointer points to it
In our case, the technique can be applied with a slight modification in the dequeue method; a minimal sketch follows.
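A heavily simplified hazard-pointer sketch in Java (our illustration: one pointer per thread and no retirement stack shown; the real technique uses several pointers per thread, per-thread retirement lists, and explicit deallocation in a non-GC setting):

import java.util.concurrent.atomic.AtomicReference;

class HazardPointers {
    final AtomicReference<Object>[] hp;   // single-writer, multi-reader registers

    @SuppressWarnings("unchecked")
    HazardPointers(int numThreads) {
        hp = new AtomicReference[numThreads];
        for (int i = 0; i < numThreads; i++) hp[i] = new AtomicReference<>();
    }

    // Publish the object a thread may access later, before dereferencing it.
    void protect(int tid, Object o) { hp[tid].set(o); }

    void clear(int tid) { hp[tid].set(null); }

    // A retired object may be recycled only if no hazard pointer targets it.
    boolean safeToRecycle(Object o) {
        for (AtomicReference<Object> h : hp)
            if (h.get() == o) return false;   // still protected by some thread
        return true;
    }
}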

51 Summary
- First wait-free queue implementation supporting multiple enqueuers and dequeuers
- Wait-freedom incurs an inherent trade-off: it bounds the completion time of a single operation, but has a cost in the “typical” case
- The additional cost can be reduced and become tolerable
- The proposed design scheme might be applicable to other wait-free data structures

52 Thank you! Questions?

