A Two-Lock Concurrent Queue Algorithm
Maged M. Michael, Michael L. Scott
University of Rochester
Presented by Hussain Tinwala

Two-Lock Algorithm Goals
- Provide a higher degree of concurrency
- Use better blocking techniques
- Improve performance on shared-memory multiprocessor machines

Queue Operations: Enqueue
1) Create a new node, Node (5).
2) Insert the node after the current last node and update Tail to point to it.
Before: Head (1) → Node (2) → Node (3) → Tail (4)
After: Head (1) → Node (2) → Node (3) → Node (4) → Tail (5)
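The enqueue steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' code: following Michael and Scott's paper, Head is assumed to point to a dummy node, so an enqueuer only ever touches Tail. The names `Node`, `TailOnlyQueue`, `enqueue`, and `values` are hypothetical, and only the enqueue path is shown.

```python
import threading


class Node:
    """Hypothetical queue node: a value plus a link to the next node."""

    def __init__(self, value=None):
        self.value = value
        self.next = None


class TailOnlyQueue:
    """Hypothetical sketch of the enqueue path only."""

    def __init__(self):
        dummy = Node()              # Head and Tail start at a shared dummy node
        self.head = dummy
        self.tail = dummy
        self.tail_lock = threading.Lock()

    def enqueue(self, value):
        node = Node(value)          # 1) create the new node before locking
        with self.tail_lock:        # enqueuers need only the Tail lock
            self.tail.next = node   # 2) link it after the current last node...
            self.tail = node        #    ...and swing Tail to the new node

    def values(self):
        """Return the queued values front to back (single-threaded helper)."""
        out, node = [], self.head.next
        while node is not None:
            out.append(node.value)
            node = node.next
        return out


q = TailOnlyQueue()
for v in (1, 2, 3, 4, 5):
    q.enqueue(v)
print(q.values())  # [1, 2, 3, 4, 5]
```

Allocating the node before taking the lock keeps the critical section down to two pointer writes.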

Queue Operations: Dequeue
1) Update Head to point to the second node.
2) Free the old first node.
Before: Head (1) → Node (2) → Node (3) → Node (4) → Tail (5)
After step 1: Node (1) → Head (2) → Node (3) → Node (4) → Tail (5)
After step 2: Head (2) → Node (3) → Node (4) → Tail (5)
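A matching sketch of the dequeue path, under the same assumptions (dummy node at Head; hypothetical names `HeadOnlyQueue` and `dequeue`). In the dummy-node version, step 1 moves Head to the first real node, which becomes the new dummy once its value has been read, and step 2 frees the old dummy. Python's garbage collector stands in for an explicit free, and an empty queue is reported as `None`, so the sketch assumes `None` is never enqueued.

```python
import threading


class Node:
    """Hypothetical queue node: a value plus a link to the next node."""

    def __init__(self, value=None):
        self.value = value
        self.next = None


class HeadOnlyQueue:
    """Hypothetical sketch of the dequeue path only."""

    def __init__(self, values=()):
        self.head = Node()              # dummy node
        last = self.head
        for v in values:                # single-threaded setup of the list
            last.next = Node(v)
            last = last.next
        self.head_lock = threading.Lock()

    def dequeue(self):
        with self.head_lock:            # dequeuers need only the Head lock
            old_head = self.head
            new_head = old_head.next
            if new_head is None:        # only the dummy is left: queue empty
                return None
            value = new_head.value
            self.head = new_head        # 1) update Head; new_head is the new dummy
        # 2) "Free" the old dummy outside the lock. In C this would be
        #    free(old_head); in Python, dropping the last reference suffices.
        del old_head
        return value


q = HeadOnlyQueue([1, 2, 3])
print([q.dequeue() for _ in range(4)])  # [1, 2, 3, None]
```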

What do single-lock algorithms do?
- They lock the entire queue.
- Example: 10 processes {P1, …, P10} want to operate on the queue Head (1) → Node (2) → Node (3) → Tail (4) at the same time. All of them compete for the same lock, so contention is high.
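For contrast, a single-lock queue can be sketched as follows. The class `SingleLockQueue` is hypothetical and is built on `collections.deque`; every operation takes the same lock, so with ten processes P1 to P10 only one makes progress at a time, whether it is enqueuing or dequeuing.

```python
import threading
from collections import deque


class SingleLockQueue:
    """Hypothetical queue where one lock guards every operation."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def enqueue(self, value):
        with self._lock:                 # enqueuers wait on the one lock...
            self._items.append(value)

    def dequeue(self):
        """Return the front value, or None if the queue is empty."""
        with self._lock:                 # ...and so do dequeuers
            return self._items.popleft() if self._items else None


q = SingleLockQueue()
threads = [threading.Thread(target=q.enqueue, args=(i,)) for i in range(1, 11)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(q.dequeue() for _ in range(10)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```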

What do two locks do? (1)
- Lock only the Head node or only the Tail node.
- Enqueue only needs to read/write the Tail node.
- Dequeue only needs to read/write the Head node.

What do two locks do? (2)
- Example: 10 processes, split into enqueuing processes {E1, …, E5} and dequeuing processes {D1, …, D5}.
- Dequeuers contend only for the Head lock and enqueuers only for the Tail lock, so an enqueue and a dequeue can proceed at the same time.
- Result: increased concurrency and decreased contention.
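Putting both paths together gives a sketch of the two-lock queue, exercised with the scenario above: five enqueuing threads (E1 to E5) and five dequeuing threads (D1 to D5). The class name `TwoLockQueue`, the thread counts, and the 100 items per thread are illustrative assumptions, not taken from the paper. Enqueuers contend only for `tail_lock` and dequeuers only for `head_lock`. Because of the dummy node, the two groups share at most one field, the `next` link of the last node; the algorithm relies on reads and writes of that link being atomic, which a single attribute assignment in CPython provides in practice (in C it would need an atomic or properly fenced store).

```python
import threading
import time


class Node:
    def __init__(self, value=None):
        self.value = value
        self.next = None


class TwoLockQueue:
    """Hypothetical sketch of a two-lock queue with a dummy head node."""

    def __init__(self):
        dummy = Node()
        self.head = self.tail = dummy
        self.head_lock = threading.Lock()   # taken only by dequeuers
        self.tail_lock = threading.Lock()   # taken only by enqueuers

    def enqueue(self, value):
        node = Node(value)
        with self.tail_lock:
            self.tail.next = node
            self.tail = node

    def dequeue(self):
        """Return the front value, or None if the queue is empty."""
        with self.head_lock:
            new_head = self.head.next
            if new_head is None:
                return None
            value = new_head.value
            self.head = new_head            # new_head becomes the dummy
        return value


ITEMS = 100                                 # items per thread (illustrative)
q = TwoLockQueue()
results, results_lock = [], threading.Lock()


def enqueuer(k):
    for i in range(ITEMS):
        q.enqueue(k * 1000 + i)             # tag each value with its enqueuer id


def dequeuer():
    taken = 0
    while taken < ITEMS:
        v = q.dequeue()
        if v is None:                       # momentarily empty: yield and retry
            time.sleep(0)
            continue
        with results_lock:
            results.append(v)
        taken += 1


threads = [threading.Thread(target=enqueuer, args=(k,)) for k in range(1, 6)]
threads += [threading.Thread(target=dequeuer) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

expected = sorted(k * 1000 + i for k in range(1, 6) for i in range(ITEMS))
print(len(results), sorted(results) == expected)  # 500 True
```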

Two Critical Mechanisms
There are two kinds of processes:
1. Processes that have acquired a lock
2. Processes that are trying to acquire a lock
Issues:
- Dealing with preemption of processes that hold locks (preemption-safe locking)
- Dealing with processes that keep trying to acquire a lock (bounded exponential backoff)

Preemption-Safe Locking, or Temporary Non-Preemption (processes holding locks)
- While a process holds a lock, it asks the scheduler not to preempt it.
Note: There is a bound on how much extra time the process can take, after which the scheduler forces preemption.
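Temporary non-preemption needs scheduler support, so it cannot be shown as ordinary user code. The sketch below instead models the scheduler's decision on each clock tick; the `Process` fields, the `should_preempt` function, and counting the extra time in ticks are all hypothetical. A process sets `no_preempt` while it holds a lock, and the scheduler honors the request for at most `grace_bound` extra ticks, which is the bound described in the note.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """Hypothetical per-process state visible to the scheduler."""
    no_preempt: bool = False   # set by the process while it holds a lock
    grace_used: int = 0        # extra ticks granted since the quantum expired


def should_preempt(proc: Process, quantum_expired: bool, grace_bound: int) -> bool:
    """Hypothetical scheduler check run on every clock tick."""
    if not quantum_expired:
        return False                       # time slice not over yet
    if proc.no_preempt and proc.grace_used < grace_bound:
        proc.grace_used += 1               # lock holder: grant one more tick
        return False
    proc.grace_used = 0                    # bound reached (or no request): preempt
    return True


holder = Process(no_preempt=True)
print([should_preempt(holder, True, 3) for _ in range(4)])  # [False, False, False, True]
print(should_preempt(Process(), True, 3))                    # True
```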

Bounded Exponential Backoff (processes trying to acquire a lock)
- Bounded exponential backoff uses feedback to multiplicatively decrease the rate at which a process retries.
- After each failed attempt, the delay before the next attempt doubles (1, 2, 4, 8, …) up to a predefined bound.
- The feedback here is the process's failure to acquire the lock.
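A sketch of bounded exponential backoff around a test-and-set style lock. Here `threading.Lock.acquire(blocking=False)` plays the role of a single test-and-set attempt; the `BackoffLock` class, its `base` and `cap` parameters (in seconds), and the helper `backoff_delays` are hypothetical. Each failed attempt is the feedback that doubles the delay, up to the bound.

```python
import threading
import time


def backoff_delays(base, cap, attempts):
    """Delay before each retry: base, 2*base, 4*base, ... capped at cap."""
    return [min(base * 2 ** k, cap) for k in range(attempts)]


class BackoffLock:
    """Hypothetical test-and-set spin lock with bounded exponential backoff."""

    def __init__(self, base=1e-6, cap=1e-3):
        self._flag = threading.Lock()
        self.base = base   # initial delay in seconds (illustrative)
        self.cap = cap     # upper bound on the delay (illustrative)

    def acquire(self):
        delay = self.base
        while not self._flag.acquire(blocking=False):  # one test-and-set attempt
            time.sleep(delay)                          # failed: back off
            delay = min(delay * 2, self.cap)           # double, up to the bound

    def release(self):
        self._flag.release()


print(backoff_delays(1, 16, 7))  # [1, 2, 4, 8, 16, 16, 16]

lock, counter = BackoffLock(), 0


def worker():
    global counter
    for _ in range(1000):
        lock.acquire()
        counter += 1
        lock.release()


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000
```

Many implementations also randomize each delay so that contending processes do not retry in lockstep; that variant is not shown.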

Performance of Backoff
[Figure from: T. E. Anderson, "The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors," IEEE Transactions on Parallel and Distributed Systems.]

Single-Lock v. Two-Lock (1)
- The two techniques, preemption-safe locking and bounded exponential backoff, can also be used with a single-lock algorithm.
- If both algorithms (single-lock and two-lock) use the two techniques, which one wins?
- Performance depends on:
  - Number of processors
  - Number of processes per processor (level of multiprogramming)

Single-Lock v. Two-Lock (2)
- Dedicated multiprocessor: one process per processor
- Crossover points:
  - 2 processes/processor: 5
  - 3 processes/processor: 7

Verdict: It depends.
- Single-lock is better at MP = 1 (one process per processor) with 2 processors, due to cache misses.
- A higher degree of multiprogramming means a higher chance that a process is preempted while holding a lock; therefore, the two-lock algorithm suffers.
- The two-lock algorithm is a good candidate for dedicated multiprocessor machines.

Questions/Comments?