Scalable Reader-Writer Synchronization for Shared-Memory Multiprocessors, by Mellor-Crummey and Scott. Presented by Robert T. Bauer.

Problem: efficient reader/writer synchronization on shared-memory multiprocessors (SMMPs).

Basics
– Readers can “share” a data structure.
– Writers need exclusive access; each write appears to be atomic.
Issues:
– Fairness: fair means every “process” eventually runs.
– Preference: under reader preference a writer can starve; under writer preference a reader can starve.

Organization
– Algorithm 1: simple mutual exclusion (a spin lock)
– Algorithm 2: RW lock with reader preference
– Algorithm 3: a fair RW lock
– Algorithm 4: local-only spinning (fair)
– Algorithm 5: local-only spinning, reader preference
– Algorithm 6: local-only spinning, writer preference
– Conclusions
– The paper’s contribution

Algorithm 1: just a spin lock
– Each processor spins on its own lock record.
– Lock records form a linked list (a queue).
– When the lock is released, the “next” processor waiting on the lock is signaled by passing it the lock.
– By using compare_and_swap when releasing, the algorithm guarantees FIFO order.
– Spinning is “local” by design.

Algorithm 1
Acquire Lock:
    I->next := nil
    pred := fetch_and_store(L, I)
    if pred != nil
        I->locked := true
        pred->next := I
        repeat while I->locked
Release Lock:
    if I->next = nil
        if compare_and_swap(L, I, nil) return
        repeat while I->next = nil
    I->next->locked := false
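The acquire/release pseudocode above can be sketched with C11 atomics. This is a minimal sketch, not the paper's exact code; the names mcs_node, mcs_acquire, and mcs_release are mine:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;
} mcs_node;

typedef _Atomic(mcs_node *) mcs_lock;   /* tail pointer; NULL = free */

void mcs_acquire(mcs_lock *L, mcs_node *I) {
    atomic_store(&I->next, NULL);
    /* fetch_and_store: atomically swap our node in, get the predecessor */
    mcs_node *pred = atomic_exchange(L, I);
    if (pred != NULL) {                     /* queue was non-empty */
        atomic_store(&I->locked, true);
        atomic_store(&pred->next, I);       /* link behind predecessor */
        while (atomic_load(&I->locked))     /* spin on OUR OWN flag */
            ;
    }
}

void mcs_release(mcs_lock *L, mcs_node *I) {
    if (atomic_load(&I->next) == NULL) {
        /* no known successor: try to swing the tail back to empty */
        mcs_node *expected = I;
        if (atomic_compare_exchange_strong(L, &expected, NULL))
            return;                         /* nobody was waiting */
        while (atomic_load(&I->next) == NULL)  /* successor mid-enqueue */
            ;
    }
    mcs_node *succ = atomic_load(&I->next);
    atomic_store(&succ->locked, false);     /* pass the lock along */
}
```

Each waiter spins only on its own node's locked flag, which is the local-spinning property the slide emphasizes.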

Algorithm 2: simple RW lock with reader preference
Lock word layout: bit 0 – writer active?; bits 31:1 – count of interested readers.
start_write: repeat until compare_and_swap(L, 0, 0x1)
start_read: atomic_add(L, 2); repeat until ((L & 0x1) = 0)
end_write: atomic_add(L, -1)
end_read: atomic_add(L, -2)
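The packed-word scheme above maps directly onto C11 atomics; a sketch (names and macros are mine, not the paper's):

```c
#include <stdatomic.h>

/* Bit 0: writer active; bits 31:1: count of interested readers. */
typedef atomic_uint rw_lock;

#define WRITER_BIT 0x1u
#define READER_INC 0x2u

void start_write(rw_lock *L) {
    unsigned expected = 0;
    /* a writer may enter only when the whole word is 0 (no readers,
     * no writer); CAS rewrites expected on failure, so reset it */
    while (!atomic_compare_exchange_weak(L, &expected, WRITER_BIT))
        expected = 0;
}

void end_write(rw_lock *L) { atomic_fetch_sub(L, WRITER_BIT); }

void start_read(rw_lock *L) {
    atomic_fetch_add(L, READER_INC);       /* announce interest first */
    while (atomic_load(L) & WRITER_BIT)    /* then wait out an active writer */
        ;
}

void end_read(rw_lock *L) { atomic_fetch_sub(L, READER_INC); }
```

Because readers bump the count before checking the writer bit, a pending writer's CAS keeps failing while any reader is interested: that is exactly the reader preference (and the writer starvation) the slide describes. Note these spins are on a shared word, not local, which is the problem the later algorithms fix.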

Algorithm 3: fair lock
Two lock words, each packing a writer count and a reader count: requests and completions.
start_write:
    prev := fetch_clear_then_add(L->requests, MASK, 1)   // ++ write requests
    repeat until completions = prev                      // wait for ALL previous readers and writers
end_write:
    clear_then_add(L->completions, MASK, 1)              // ++ write completions
start_read:
    prev_writers := fetch_clear_then_add(L->requests, MASK, 1) & MASK
                                                         // ++ read requests; snapshot the count of prior writers
    repeat until (completions & MASK) = prev_writers     // wait only for prior writers to go first
end_read:
    clear_then_add(L->completions, MASK, 1)              // ++ read completions
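fetch_clear_then_add is an exotic primitive, but the same FIFO fairness can be sketched with plain fetch_add as a ticket-based reader-writer lock. This is a well-known variant, not the paper's Algorithm 3; the name fair_rwlock is mine:

```c
#include <stdatomic.h>

/* Ticket fair RW lock: requests are served strictly in arrival order,
 * like the paper's requests/completions counters. */
typedef struct {
    atomic_uint ticket;  /* next ticket to hand out (requests)       */
    atomic_uint read;    /* ticket currently allowed to read         */
    atomic_uint write;   /* ticket currently allowed to write        */
} fair_rwlock;

void fair_start_write(fair_rwlock *l) {
    unsigned me = atomic_fetch_add(&l->ticket, 1);
    while (atomic_load(&l->write) != me)  /* wait for ALL earlier requests */
        ;
}

void fair_end_write(fair_rwlock *l) {
    atomic_fetch_add(&l->read, 1);   /* admit the next waiter, reader ... */
    atomic_fetch_add(&l->write, 1);  /* ... or writer, whichever is next  */
}

void fair_start_read(fair_rwlock *l) {
    unsigned me = atomic_fetch_add(&l->ticket, 1);
    while (atomic_load(&l->read) != me)   /* wait only for earlier writers */
        ;
    atomic_fetch_add(&l->read, 1);   /* immediately let the next reader in */
}

void fair_end_read(fair_rwlock *l) {
    atomic_fetch_add(&l->write, 1);  /* count this read as completed */
}
```

Consecutive readers overlap (each reader bumps read on entry), while a writer waits for every earlier ticket to finish; no role can starve. Like Algorithm 3, though, everyone spins on shared counters.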

So far so good, but … Algorithms 2 and 3 spin on a shared memory location. What we want is for the algorithms to spin on processor-local variables. Note: results weren’t presented for Algorithms 2 and 3, but we can guess their performance, since we know the general characteristics of contention.

Algorithm 4: fair R/W lock with local-only spinning
Fairness rules:
– a read request is granted when all previous write requests have completed
– a write request is granted when all previous read and write requests have completed

Lock and Local Data Layout
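The layout figure is lost in this transcript, but the paper's description of it can be sketched as C structs. One caveat: in the paper, blocked and successor_class are packed into a single word named state so both can be updated by one compare_and_swap; they are split here for readability:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-processor queue node: each process spins only on its OWN node. */
typedef enum { READING, WRITING } role_t;
typedef enum { NONE, READER, WRITER } succ_t;

typedef struct qnode {
    role_t                  class;           /* reader or writer        */
    _Atomic(struct qnode *) next;            /* successor in the queue  */
    atomic_bool             blocked;         /* local spin happens here */
    _Atomic succ_t          successor_class; /* what is queued behind   */
} qnode;

/* The lock itself. */
typedef struct {
    _Atomic(qnode *) tail;          /* last node in the queue           */
    atomic_uint      reader_count;  /* readers currently holding lock   */
    _Atomic(qnode *) next_writer;   /* writer to wake when count hits 0 */
} rw_qlock;
```

The cases on the following slides are walks through how these fields change as readers and writers enqueue themselves.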

Case 1: Just a Read
– pred == nil; Lock.tail → I
– Upon exit: Lock.tail → I, Lock.reader_count == 1

Case 1: Exit Read
– next == nil; Lock.tail → I, so the compare_and_swap returns true
– Lock.reader_count == 1; Lock.next_writer == nil
– Upon exit: Lock.tail == nil, Lock.reader_count == 0

Case 2: Overlapping Read
After the first read: Lock.tail → I1, Lock.reader_count == 1.
The second reader finds pred not nil, but pred->class == reading and pred->state == [false, none], so it enters too; Lock.reader_count == 2.

Case 2: Overlapping Read
After the 2nd read enters: Lock.tail → I2, I1->next == I2.

Case 2: Overlapping reads
– I1 finishes with next != nil
– I2 finishes and sets Lock.tail = nil
– reader_count goes to zero after I1 and I2 both finish

Case 3: Read Overlaps Write The previous cases weren’t very interesting, but they did get us familiar with the data structures and (some of) the code. Now we need to consider the case where a “write” has started and then a read is requested. The read should block (spin) until the write completes, and we need to “prove” that the spinning occurs on a locally cached memory location.

Case 3: Read Overlaps Write (the write)
pred == nil, so the writer resets blocked to false and runs.
Upon exit: Lock.tail → I, Lock.next_writer = nil, I.class = writing, I.next = nil, I.blocked = false, successor_class = none.

Case 3: Read Overlaps Write (the read)
pred->class == writing, so the reader waits here, spinning, for the write to complete.

Case 3: Read Overlaps Write (the write completes)
The writer follows I.next to the read and unlocks the reader. This works, but is “uncomfortable” because concerns aren’t separated.

Case 3: What if there were more than 1 reader?
The new reader updates its predecessor reader and waits here; that state is changed by the successor, and the unblocked reader in turn unblocks its successor.

Case 4: Write Overlaps Read
– Overlapping reads form a chain.
– The overlapping write “spins”, waiting for the read chain to complete.
– Reads that arrive after the write but before it completes (even while the write is “spinning”) form a chain behind the write, as in Case 3.

Case 4: Write Overlaps Read
The write waits here, spinning on its own flag, until the readers finish.

Algorithm 5: Reader Preference R/W Lock with Local-Only Spinning We’ll look at the Reader-Writer-Reader case and demonstrate that the second Reader completes before the Writer is signaled to start.

1st Reader
++reader_count; WAFLAG == 0, so the test is false and the 1st reader just runs!

Overlapping Write
The write is queued and registers writer interest; the result is not zero, since there is a reader. Because there is a reader, the compare_and_swap fails, and the writer blocks here, waiting for the last reader to set blocked = false.

2nd Reader
Still no active writer, so ++reader_count and the reader runs.

Reader Completes
Only the last reader to complete satisfies the equality test; that reader sets WAFLAG and unblocks the writer.

Algorithm 6: Writer Preference R/W Lock with Local-Only Spinning We’ll look at the Writer-Reader-Writer case and demonstrate that the second Writer completes before the Reader is signaled to start.

1st Writer

“set_next_writer”
For the 1st writer: “writer interested or active” is set; there are no readers, just the writer, so the writer should run.

1st Writer
blocked = false, so the writer starts.

Reader
The reader is put on the queue; it “registers” reader interest and sees that there are writers, so it waits here for the writer to complete.

2nd Writer
This write is queued behind the other write and waits.

Writer Completes
The queued write is started.

Last Writer Completes
Clear the write flags and signal the readers.

Unblock Readers
++reader_count and clear readers-interested; no writers are waiting or active, so the “waiting” reader list is emptied. When this reader continues, it unblocks the “next” reader, which unblocks the “next” reader, and so on; reader_count gets bumped for each.

Results & Conclusion The authors reported results for different algorithms than those presented here. The algorithms they measured against were more costly in a multiprocessor environment, so they claim the algorithms presented here would be “better.”

Timing Results Latency is high because of the number of atomic operations each lock acquisition requires.