
שירן חליבה Concurrent Queues

Outline: some definitions, then three queue implementations: a Bounded Partial Queue, an Unbounded Total Queue, and an Unbounded Lock-Free Queue.

Introduction and some definitions: Pools show up in many places in concurrent systems. For example, in many applications, one or more producer threads produce items to be consumed by one or more consumer threads. To allow consumers to keep up, we can place a buffer between the producers and the consumers. Often, pools act as producer–consumer buffers. A pool allows the same item to appear more than once.

Introduction and some definitions cont. A queue is a special kind of pool with FIFO fairness. It provides an enq(x) method that puts item x at one end of the queue, called the tail, and a deq() method that removes and returns the item at the other end of the queue, called the head.

Bounded vs. Unbounded: A pool can be bounded or unbounded. A bounded pool has a fixed capacity, which is good when resources are an issue; an unbounded pool can hold any number of objects.

Blocking vs. Non-Blocking: The problem cases are removing from an empty pool and adding to a full (bounded) pool. In a blocking method, the caller waits until the state changes; in a non-blocking method, the call throws an exception instead of waiting.

Total vs. Partial: Pool methods may be total or partial. A method is total if calls do not wait for certain conditions to become true. For example, a total get() call that tries to remove an item from an empty pool immediately returns a failure code or throws an exception. A total interface makes sense when the producer (or consumer) thread has something better to do than wait for the method call to take effect.

A method is partial if calls may wait for conditions to hold. For example, a partial get() call that tries to remove an item from an empty pool blocks until an item is available to return. A partial interface makes sense when the producer (or consumer) has nothing better to do than wait for the pool to become nonempty (or nonfull).

Queue: Concurrency. enq(x) and deq() work at different ends of the object: enq(x) adds at the tail, while y = deq() removes from the head.

Concurrency, continued. Challenge: what happens to enq(x) and y = deq() if the queue is empty or full?

A Bounded Partial Queue. The queue has fields head, tail, enqLock, and deqLock, plus a permits counter (initially 8, i.e., permission to enqueue 8 items). enqLock locks out other enq() calls; deqLock locks out other deq() calls. head points to a sentinel node, and the first actual item is the sentinel's successor.

Enqueuer, step 1: acquire enqLock and read permits; it is nonzero, so we may proceed. (Note: no need to lock the tail.)

Enqueuer, step 2: link in the new node and getAndDecrement() the permits counter (8 becomes 7). Why atomic? Because dequeuers increment permits without holding enqLock, so the counter is accessed concurrently.

Enqueuer, step 3: release enqLock. If the queue was previously empty, notify waiting dequeuers.

Unsuccessful Enqueuer: read permits and find 0. Uh-oh: the queue is full, so the enqueuer must wait.

Dequeuer, step 1: acquire deqLock and read the sentinel's next field; it is non-null, so we may proceed (permits is 7).

Dequeuer, step 2: read the value stored in the first actual node.

Dequeuer, step 3: make the first node the new sentinel.

Dequeuer, step 4: release deqLock.

Dequeuer, step 5: increment permits (7 back to 8). Why is no lock needed here, when the enqueuer had to hold one? Answer: we had to hold the lock while enqueuing to prevent lots of enqueuers from proceeding without noticing that the capacity had been exceeded. Dequeuers, by contrast, will notice the queue is empty when they observe that the sentinel's next field is null.

Unsuccessful Dequeuer: read the sentinel's next field and find it null. Uh-oh: the queue is empty, so the dequeuer must wait.

Bounded Queue

public class BoundedQueue<T> {
  ReentrantLock enqLock, deqLock;
  Condition notEmptyCondition, notFullCondition;
  AtomicInteger permits;
  Node head;
  Node tail;
  int capacity;
  public BoundedQueue(int capacity) {
    this.capacity = capacity;
    permits = new AtomicInteger(capacity);
    head = new Node(null);  // sentinel
    tail = head;
    enqLock = new ReentrantLock();
    notFullCondition = enqLock.newCondition();
    deqLock = new ReentrantLock();
    notEmptyCondition = deqLock.newCondition();
  }
}
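The Node class referenced above never appears on the slides; a minimal sketch (field names assumed, not from the original) might look like this:

```java
// Hypothetical minimal Node class for the bounded queue sketch.
public class Node {
    public Object value;       // item stored here (null in the sentinel)
    public volatile Node next; // successor node, or null at the tail

    public Node(Object value) {
        this.value = value;
    }
}
```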

The ReentrantLock is a monitor (the mechanism Java uses to support synchronization). It allows a thread to block on a condition rather than spin. How do we use it?

Lock Conditions

public interface Condition {
  void await() throws InterruptedException;
  boolean await(long time, TimeUnit unit) throws InterruptedException;
  ...
  void signal();
  void signalAll();
}

Await: a call to q.await() releases the lock associated with q, sleeps (gives up the processor), awakens (resumes running), and finally reacquires the lock and returns.

Signal: a call to q.signal() awakens one waiting thread, which will then reacquire the lock.
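Before applying this pattern to the queue, here is a small self-contained illustration of await/signal: a one-slot buffer. The class and field names are mine, not from the slides; this is a sketch of the standard pattern, not the book's code.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// One-slot producer-consumer buffer using a lock and two conditions.
public class OneSlot<T> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull  = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();
    private T item; // null means empty

    public void put(T x) throws InterruptedException {
        lock.lock();
        try {
            while (item != null)   // always re-test the condition in a loop
                notFull.await();   // releases the lock while sleeping
            item = x;
            notEmpty.signalAll();  // wake any waiting takers
        } finally {
            lock.unlock();
        }
    }

    public T take() throws InterruptedException {
        lock.lock();
        try {
            while (item == null)
                notEmpty.await();
            T x = item;
            item = null;
            notFull.signalAll();   // wake any waiting putters
            return x;
        } finally {
            lock.unlock();
        }
    }
}
```

Note the while loop around await(): as the slides explain next, an awakened thread may lose the lock to a contender, so the condition must be re-tested.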

A Monitor Lock: threads call lock() to enter the critical section and unlock() to leave it; threads that cannot enter wait in the "waiting room."

Unsuccessful Deq: a dequeuer acquires the lock, finds the queue empty ("Oh no, empty!"), and calls await(), moving into the waiting room.

Another One: a second dequeuer does the same, also finds the queue empty, and joins the first in the waiting room.

Enqueuer to the Rescue: an enqueuer acquires the lock, enqueues an item, calls signalAll(), and unlocks; the waiting dequeuers (still yawning) begin to wake up.

Monitor Signalling: an awakened thread might still lose the lock to an outside contender, so the condition must be re-tested after waking.

Dequeuers Signalled: one awakened dequeuer reacquires the lock first and finds the item.

Dequeuers Signalled, continued: the other dequeuer reacquires the lock next, but the queue is empty again.

A day late and a dollar short: the slower dequeuer arrived too late and must return to the waiting room.

Why not signal()?

Lost Wake-Up: an enqueuer acquires the lock, enqueues an item, calls signal() (waking just one dequeuer, which yawns), and unlocks.

Lost Wake-Up, continued: a second enqueuer acquires the lock and enqueues another item; since the queue was already nonempty, it does not signal, and unlocks.

Lost Wake-Up, continued: the awakened dequeuer is still yawning; the other dequeuer remains asleep in the waiting room.

Lost Wake-Up, continued: the awakened dequeuer finally runs and finds an item.

What's Wrong Here? One item is still in the queue, but the remaining dequeuer sleeps forever (zzzz...): its wake-up was lost. This is why the enqueuer must use signalAll() rather than signal().

Enq Method

public void enq(T x) throws InterruptedException {
  boolean mustWakeDequeuers = false;
  enqLock.lock();
  try {
    while (permits.get() == 0)
      notFullCondition.await();
    Node e = new Node(x);
    tail.next = e;
    tail = e;
    if (permits.getAndDecrement() == capacity)
      mustWakeDequeuers = true;
  } finally {
    enqLock.unlock();
  }
  ...
}

Enq Method, continued

public void enq(T x) throws InterruptedException {
  ...
  if (mustWakeDequeuers) {
    deqLock.lock();
    try {
      notEmptyCondition.signalAll();
    } finally {
      deqLock.unlock();
    }
  }
}
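The deq() side is never shown on the slides. Putting the pieces together, a complete sketch of the bounded partial queue might look like the following; this is my reconstruction following the described pattern, not the book's verbatim code.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class BoundedQueue<T> {
    static class Node<T> {
        T value;
        volatile Node<T> next;
        Node(T v) { value = v; }
    }
    final ReentrantLock enqLock = new ReentrantLock(), deqLock = new ReentrantLock();
    final Condition notFullCondition = enqLock.newCondition();
    final Condition notEmptyCondition = deqLock.newCondition();
    final AtomicInteger permits; // number of free slots
    final int capacity;
    volatile Node<T> head, tail;

    public BoundedQueue(int capacity) {
        this.capacity = capacity;
        this.permits = new AtomicInteger(capacity);
        head = new Node<>(null); // sentinel
        tail = head;
    }

    public void enq(T x) throws InterruptedException {
        boolean mustWakeDequeuers = false;
        enqLock.lock();
        try {
            while (permits.get() == 0)
                notFullCondition.await();
            Node<T> e = new Node<>(x);
            tail.next = e;
            tail = e;
            if (permits.getAndDecrement() == capacity)
                mustWakeDequeuers = true; // queue was empty
        } finally {
            enqLock.unlock();
        }
        if (mustWakeDequeuers) {
            deqLock.lock();
            try { notEmptyCondition.signalAll(); }
            finally { deqLock.unlock(); }
        }
    }

    public T deq() throws InterruptedException {
        T result;
        boolean mustWakeEnqueuers = false;
        deqLock.lock();
        try {
            while (head.next == null)       // empty: wait for an item
                notEmptyCondition.await();
            result = head.next.value;       // first item follows the sentinel
            head = head.next;               // old first node becomes new sentinel
            if (permits.getAndIncrement() == 0)
                mustWakeEnqueuers = true;   // queue was full
        } finally {
            deqLock.unlock();
        }
        if (mustWakeEnqueuers) {
            enqLock.lock();
            try { notFullCondition.signalAll(); }
            finally { enqLock.unlock(); }
        }
        return result;
    }
}
```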

The Enq() & Deq() Methods: they share no locks (that's good), but they do share an atomic counter, accessed on every method call (that's not so good). What is the problem? The counter is a single contention hotspot touched by every thread. Can we alleviate this bottleneck?

Split the Counter: the enq() method only decrements the counter and cares only whether the value is zero; the deq() method only increments it and cares only whether the value is the capacity.

Split Counter: the enqueuer decrements enqSidePermits and the dequeuer increments deqSidePermits. When the enqueuer runs out, it locks deqLock and transfers the dequeuer-side permits (the dequeuer itself doesn't need permits; it checks head.next instead). Synchronization is thus intermittent, not on every method call, but the transfer needs both locks (careful: acquire them in a consistent order to avoid deadlock).

An Unbounded Total Queue: the queue can hold an unbounded number of items. The enq() method always enqueues its item; deq() throws EmptyException if there is no item to dequeue. There is no deadlock, since each method acquires only one lock. Both enq() and deq() are total: they never wait for the queue to become nonempty or nonfull.

An Unbounded Total Queue
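The code screenshot for this slide did not survive; a sketch matching the description above (with the book's EmptyException replaced by a standard unchecked exception) might look like this:

```java
import java.util.concurrent.locks.ReentrantLock;

// Unbounded total queue: two locks, no permits counter, no waiting.
public class UnboundedQueue<T> {
    static class Node<T> {
        T value;
        volatile Node<T> next;
        Node(T v) { value = v; }
    }
    final ReentrantLock enqLock = new ReentrantLock(), deqLock = new ReentrantLock();
    volatile Node<T> head, tail;

    public UnboundedQueue() {
        head = new Node<>(null); // sentinel
        tail = head;
    }

    public void enq(T x) {       // total: always succeeds
        enqLock.lock();
        try {
            Node<T> e = new Node<>(x);
            tail.next = e;
            tail = e;
        } finally {
            enqLock.unlock();
        }
    }

    public T deq() {             // total: throws rather than waits when empty
        deqLock.lock();
        try {
            if (head.next == null)
                throw new IllegalStateException("empty"); // book uses EmptyException
            T result = head.next.value;
            head = head.next;    // old first node becomes the new sentinel
            return result;
        } finally {
            deqLock.unlock();
        }
    }
}
```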

A Lock-Free Queue: an extension of the unbounded total queue in which quicker threads help slower threads. Each node's next field is an AtomicReference; the queue itself consists of two AtomicReference fields, head and tail, both initially referring to a sentinel node.

Compare and Set (CAS)
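compareAndSet(expect, update) atomically replaces a value only if it currently equals the expected value, and returns whether it succeeded. A quick demonstration with Java's AtomicReference:

```java
import java.util.concurrent.atomic.AtomicReference;

public class CasDemo {
    public static void main(String[] args) {
        AtomicReference<String> ref = new AtomicReference<>("old");
        boolean ok = ref.compareAndSet("old", "new");   // expectation holds: succeeds
        boolean stale = ref.compareAndSet("old", "x");  // expectation stale: fails, no change
        System.out.println(ok + " " + stale + " " + ref.get()); // true false new
    }
}
```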

LockFreeQueue class
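The LockFreeQueue code shown on this slide was lost in transcription. A sketch of the fields, following the description above (constructor details assumed):

```java
import java.util.concurrent.atomic.AtomicReference;

// Fields of the lock-free queue: every mutable reference is an AtomicReference.
public class LockFreeQueue<T> {
    static class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>(null);
        Node(T value) { this.value = value; }
    }
    final AtomicReference<Node<T>> head, tail;

    public LockFreeQueue() {
        Node<T> sentinel = new Node<>(null); // head and tail both start at the sentinel
        head = new AtomicReference<>(sentinel);
        tail = new AtomicReference<>(sentinel);
    }
}
```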

Enqueue: a thread calls enq( ), creating a new node for its value.

Logical Enqueue: CAS the last node's next field from null to the new node.

Physical Enqueue: CAS the queue's tail field to refer to the new node.

Enqueue: these two steps are not atomic, so the tail field refers either to the actual last node (good) or to the penultimate node (not so good). Be prepared! (Penultimate: next to last.)

Enqueue: what do you do if you find a trailing tail? Stop and fix it: if the tail node has a non-null next field, CAS the queue's tail field to tail.next.

When CASs Fail: during the logical enqueue, abandon hope and restart; during the physical enqueue, just ignore the failure. (Why? The CAS can fail only because some other thread already advanced tail for us.)

LockFreeQueue class

Enq(): the thread creates a new node with the value to be enqueued, reads tail, and finds the node that appears to be last. It checks whether that node has a successor. If not, it appends the new node by calling compareAndSet() on the last node's next field. If that compareAndSet() succeeds, the thread uses a second compareAndSet() to advance tail to the new node; if this second compareAndSet() fails, the thread can still return successfully. If the tail node does have a successor, the method tries to "help" other threads by advancing tail to refer directly to that successor before trying again to insert its own node.
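The steps above can be sketched as follows (my reconstruction of the lock-free enqueue, packaged with the fields it needs; not the book's verbatim code):

```java
import java.util.concurrent.atomic.AtomicReference;

public class LockFreeEnqDemo<T> {
    static class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>(null);
        Node(T value) { this.value = value; }
    }
    final AtomicReference<Node<T>> head, tail;

    public LockFreeEnqDemo() {
        Node<T> sentinel = new Node<>(null);
        head = new AtomicReference<>(sentinel);
        tail = new AtomicReference<>(sentinel);
    }

    public void enq(T x) {
        Node<T> node = new Node<>(x);
        while (true) {
            Node<T> last = tail.get();     // node that appears to be last
            Node<T> next = last.next.get();
            if (last == tail.get()) {      // still consistent?
                if (next == null) {
                    // logical enqueue: link the new node after the last one
                    if (last.next.compareAndSet(null, node)) {
                        // physical enqueue: advance tail; OK if this CAS fails
                        tail.compareAndSet(last, node);
                        return;
                    }
                } else {
                    // tail is lagging: help by swinging it forward, then retry
                    tail.compareAndSet(last, next);
                }
            }
        }
    }
}
```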

Dequeuer: read the value in the first actual node (the sentinel's successor).

Dequeuer: CAS head so that the first node becomes the new sentinel.

What is the problem here?

LockFreeQueue class

Deq(): if the queue is nonempty (the next field of the head node is not null), the dequeuer calls compareAndSet() to change head from the sentinel node to its successor. Before advancing head, one must make sure that tail is not left referring to the sentinel node that is about to be removed from the queue. The test: if head equals tail and the (sentinel) node they refer to has a non-null next field, then tail is deemed to be lagging behind. deq() then attempts to help make tail consistent by swinging it to the sentinel node's successor, and only then updates head to remove the sentinel.
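A complete sketch combining both methods, in the style of the Michael & Scott queue the slides describe (my reconstruction; the book's EmptyException is replaced by a standard unchecked exception):

```java
import java.util.concurrent.atomic.AtomicReference;

public class LockFreeQueueSketch<T> {
    static class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>(null);
        Node(T value) { this.value = value; }
    }
    final AtomicReference<Node<T>> head, tail;

    public LockFreeQueueSketch() {
        Node<T> sentinel = new Node<>(null);
        head = new AtomicReference<>(sentinel);
        tail = new AtomicReference<>(sentinel);
    }

    public void enq(T x) {
        Node<T> node = new Node<>(x);
        while (true) {
            Node<T> last = tail.get();
            Node<T> next = last.next.get();
            if (last == tail.get()) {
                if (next == null) {
                    if (last.next.compareAndSet(null, node)) { // logical enqueue
                        tail.compareAndSet(last, node);        // physical enqueue (may fail)
                        return;
                    }
                } else {
                    tail.compareAndSet(last, next);            // help a lagging tail
                }
            }
        }
    }

    public T deq() {
        while (true) {
            Node<T> first = head.get();      // the sentinel
            Node<T> last = tail.get();
            Node<T> next = first.next.get(); // first actual item, if any
            if (first == head.get()) {       // still consistent?
                if (first == last) {
                    if (next == null)
                        throw new IllegalStateException("empty");
                    tail.compareAndSet(last, next); // tail lags: swing it forward first
                } else {
                    T value = next.value;
                    if (head.compareAndSet(first, next)) // next becomes the new sentinel
                        return value;
                }
            }
        }
    }
}
```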

Summary A thread fails to enqueue or dequeue a node only if another thread’s method call succeeds in changing the reference, so some method call always completes. As it turns out, being lock-free substantially enhances the performance of queue implementations, and the lock-free algorithms tend to outperform the most efficient blocking ones.