Spinlocks and all the rest

Synchronization Overview
- Cache coherency
- Single versus multi-core
- Under- versus oversubscribed
- Atomic operations
- …

Synchronization Overview
- Spinlock:
    acquire_lock(lock) {
        while (TAS(lock) == true)
            ;  /* spin until the old value was false, i.e. the lock was free */
    }
- TAS (test-and-set): puts true in the address, returns the old value
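
The TAS spinlock above maps almost directly onto C11's `atomic_flag`. The following is a minimal sketch, not the presenter's code: the names `tas_lock`, `tas_acquire`, `tas_release` and the counter demo are assumptions made for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* TAS spinlock: the flag is set while the lock is held. */
typedef struct { atomic_flag held; } tas_lock;

static void tas_acquire(tas_lock *l) {
    /* test-and-set stores true and returns the old value;
       an old value of false means we took a free lock. */
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;
}

static void tas_release(tas_lock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Hypothetical demo: four threads bump a shared counter under the lock. */
static tas_lock tas_demo_lock = { ATOMIC_FLAG_INIT };
static long tas_demo_counter;

static void *tas_demo_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        tas_acquire(&tas_demo_lock);
        tas_demo_counter++;            /* critical section */
        tas_release(&tas_demo_lock);
    }
    return NULL;
}

static long tas_demo(void) {
    pthread_t t[4];
    tas_demo_counter = 0;
    for (int i = 0; i < 4; i++) {
        int rc = pthread_create(&t[i], NULL, tas_demo_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return tas_demo_counter;
}
```

Every waiter here hammers the same flag with an atomic write, which is the contention the later slides try to reduce.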

Synchronization
- Mellor-Crummey and Scott, 1991
- Analyzed spinlocks and barriers
- Linear, proportional, and exponential backoff
- Ticket locks: take a ticket, spin until "now serving" matches it
- Proposed the MCS lock, a queue-based lock
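
A ticket lock with proportional backoff can be sketched as follows. This is an illustration under assumptions, not code from the 1991 paper: the names `ticket_lock`, `ticket_acquire`, `ticket_release` and the constant `BACKOFF_UNIT` are made up here, and the delay loop is a crude stand-in for a tuned pause.

```c
#include <assert.h>
#include <stdatomic.h>

enum { BACKOFF_UNIT = 32 };   /* assumed delay per waiter ahead of us */

typedef struct {
    atomic_uint next_ticket;  /* next ticket to hand out */
    atomic_uint now_serving;  /* ticket currently allowed into the lock */
} ticket_lock;

static void ticket_acquire(ticket_lock *l) {
    /* Take a ticket; fetch-and-add gives every thread a distinct one. */
    unsigned my = atomic_fetch_add_explicit(&l->next_ticket, 1,
                                            memory_order_relaxed);
    for (;;) {
        unsigned cur = atomic_load_explicit(&l->now_serving,
                                            memory_order_acquire);
        if (cur == my)
            return;
        /* Proportional backoff: wait longer the further back we are. */
        unsigned ahead = my - cur;
        for (volatile unsigned i = 0; i < ahead * BACKOFF_UNIT; i++)
            ;
    }
}

static void ticket_release(ticket_lock *l) {
    unsigned cur = atomic_load_explicit(&l->now_serving, memory_order_relaxed);
    /* Debug check: releasing a lock nobody holds is a caller bug. */
    assert(atomic_load(&l->next_ticket) != cur);
    /* Only the holder writes now_serving, so a plain store is enough. */
    atomic_store_explicit(&l->now_serving, cur + 1, memory_order_release);
}
```

Unlike the TAS lock, waiters are served in FIFO order, and they only read `now_serving` while waiting.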

Overview
- Synchronization
- Types to be Discussed
- Further Developments
- Implementation Details

Types to be Discussed
- Mutual Exclusion
  - Spinlock
  - Mutex
  - Reader Writer Lock
- Execution Point
  - Barrier
- Queues, etc. (time permitting)

Spinlocks
- Spin until lock is acquired
- Simple implementation
- Contention on lock

Queued Spinlock
- Create a local lock
- Spin on it
- On release, signal next waiter
- Additional operations
- Reduced contention

Mutex
- Wait to acquire
- May use thread scheduler to wait

Reader Writer Lock
- Readers can operate simultaneously with other readers
- Only writers cause problems
- Often spinlock plus count of readers
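
One way to read "spinlock plus count of readers" is sketched below. This is an assumed design, not necessarily the one the presenter had in mind: readers take the spinlock only long enough to bump the count, and a writer keeps the spinlock for its whole critical section after waiting for the count to drain. All names (`rw_spinlock`, `rw_read_lock`, ...) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct {
    atomic_flag guard;    /* spinlock: serializes writers and reader entry */
    atomic_int  readers;  /* number of readers currently inside */
} rw_spinlock;

static void rw_guard_acquire(rw_spinlock *l) {
    while (atomic_flag_test_and_set_explicit(&l->guard, memory_order_acquire))
        ;
}

static void rw_guard_release(rw_spinlock *l) {
    atomic_flag_clear_explicit(&l->guard, memory_order_release);
}

static void rw_read_lock(rw_spinlock *l) {
    rw_guard_acquire(l);               /* blocks while a writer holds the guard */
    atomic_fetch_add_explicit(&l->readers, 1, memory_order_acquire);
    rw_guard_release(l);               /* other readers may enter at once */
}

static void rw_read_unlock(rw_spinlock *l) {
    int prev = atomic_fetch_sub_explicit(&l->readers, 1, memory_order_release);
    assert(prev > 0);                  /* unlock without a matching lock */
    (void)prev;
}

static void rw_write_lock(rw_spinlock *l) {
    rw_guard_acquire(l);               /* keeps new readers and writers out */
    while (atomic_load_explicit(&l->readers, memory_order_acquire) != 0)
        ;                              /* wait for current readers to leave */
}

static void rw_write_unlock(rw_spinlock *l) {
    rw_guard_release(l);
}
```

Every reader still writes to the same two words, which is the scalability limit the "Scalable RW Lock" slides address.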

Barrier
- Keeps a group of threads in sync
- A barrier has to distinguish two events
  - Old barrier: some threads may not be active again yet
  - New barrier: other threads may already have reached it
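
A common way to tell the old barrier episode from the new one is sense reversal, sketched here. The slide does not name a technique, so treat this as one possible reading. The names (`sr_barrier`, `sr_barrier_wait`) and the demo are assumptions, and the `sched_yield()` in the wait loop is added only so the demo behaves when threads outnumber cores.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct {
    atomic_int  count;     /* threads still to arrive in this episode */
    atomic_bool sense;     /* flips once per episode */
    int         nthreads;
} sr_barrier;

static void sr_barrier_wait(sr_barrier *b, bool *local_sense) {
    *local_sense = !*local_sense;      /* the value that means "released" */
    if (atomic_fetch_sub_explicit(&b->count, 1, memory_order_acq_rel) == 1) {
        /* Last arrival: re-arm the count first, then release everyone. */
        atomic_store_explicit(&b->count, b->nthreads, memory_order_relaxed);
        atomic_store_explicit(&b->sense, *local_sense, memory_order_release);
    } else {
        while (atomic_load_explicit(&b->sense, memory_order_acquire)
               != *local_sense)
            sched_yield();             /* demo-friendly stand-in for spinning */
    }
}

/* Hypothetical demo: after phase p, every thread must have arrived p times. */
enum { SR_THREADS = 4, SR_PHASES = 50 };
static sr_barrier sr_demo_barrier = { SR_THREADS, false, SR_THREADS };
static atomic_int sr_arrivals;
static atomic_int sr_violations;

static void *sr_worker(void *arg) {
    bool local_sense = false;
    (void)arg;
    for (int phase = 1; phase <= SR_PHASES; phase++) {
        atomic_fetch_add(&sr_arrivals, 1);
        sr_barrier_wait(&sr_demo_barrier, &local_sense);
        if (atomic_load(&sr_arrivals) < phase * SR_THREADS)
            atomic_fetch_add(&sr_violations, 1);
    }
    return NULL;
}

static int sr_demo(void) {
    pthread_t t[SR_THREADS];
    for (int i = 0; i < SR_THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, sr_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < SR_THREADS; i++)
        pthread_join(t[i], NULL);
    return atomic_load(&sr_violations);
}
```

A fast thread that reaches the next barrier early waits for the opposite sense value, so it cannot be confused with a slow thread still leaving the previous one.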

Further Developments

Scalable RW Lock
- Modification to MCS lock
- Count of readers + writer-waiting flag
- Queue of waiting threads
- Readers unblock readers on acquire
- Writers unblock next thread on release
John M. Mellor-Crummey and Michael L. Scott. Scalable reader-writer synchronization for shared-memory multiprocessors. In Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming (PPOPP '91). ACM, New York, NY, USA.

Scalable RW Lock cont.
- Split up the reader access
- Since readers can hold the lock alongside other readers, give them multiple locks
- Writers, however, need all of the reader locks
Wilson C. Hsieh and William E. Weihl. Scalable Reader-Writer Locks for Parallel Systems. In Proceedings of the 6th International Parallel Processing Symposium, Viktor K. Prasanna and Larry H. Canter (Eds.). IEEE Computer Society, Washington, DC, USA.
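
The split-lock idea can be sketched with one mutex per reader slot. This is a rough illustration of the bullets above, not the algorithm from the cited paper: the names (`drw_lock`, `DRW_SLOTS`, ...) are hypothetical, each reader thread is assumed to own one slot, and a real version would presumably pad each slot to its own cache line.

```c
#include <assert.h>
#include <pthread.h>

enum { DRW_SLOTS = 4 };   /* assumed: one slot per reader thread */

typedef struct { pthread_mutex_t slot[DRW_SLOTS]; } drw_lock;

static void drw_init(drw_lock *l) {
    for (int i = 0; i < DRW_SLOTS; i++)
        pthread_mutex_init(&l->slot[i], NULL);
}

/* A reader touches only its own slot, so readers do not contend. */
static void drw_read_lock(drw_lock *l, int my_slot) {
    assert(my_slot >= 0 && my_slot < DRW_SLOTS);
    pthread_mutex_lock(&l->slot[my_slot]);
}

static void drw_read_unlock(drw_lock *l, int my_slot) {
    assert(my_slot >= 0 && my_slot < DRW_SLOTS);
    pthread_mutex_unlock(&l->slot[my_slot]);
}

/* A writer needs every slot; a fixed order keeps two writers from deadlocking. */
static void drw_write_lock(drw_lock *l) {
    for (int i = 0; i < DRW_SLOTS; i++)
        pthread_mutex_lock(&l->slot[i]);
}

static void drw_write_unlock(drw_lock *l) {
    for (int i = DRW_SLOTS - 1; i >= 0; i--)
        pthread_mutex_unlock(&l->slot[i]);
}
```

The design trades cheap, uncontended reads for writes whose cost grows with the number of slots.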

Scalable RW Lock cont.
- Or use a C-SNZI
  - Closable scalable nonzero indicator
  - Like a semaphore, but can be closed
- What about write upgrade?
Yossi Lev, Victor Luchangco, and Marek Olszewski. Scalable reader-writer locks. In Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures (SPAA '09). ACM, New York, NY, USA.

Biased Locks
- First and second class citizens
- Like readers / writers, but all exclusive
- Secondary locks request the lock
- Primary holder grants them the lock
Nalini Vasudevan, Kedar S. Namjoshi, and Stephen A. Edwards. Simple and fast biased locks. In Proceedings of the 19th international conference on Parallel architectures and compilation techniques (PACT '10). ACM, New York, NY, USA.

MCS Extensions
- Queue-based locks
- What if threads are preempted?
- Add a time component to the lock
- Stale elements are skipped
Michael L. Scott and William N. Scherer. Scalable queue-based spin locks with timeout. In Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming (PPoPP '01). ACM, New York, NY, USA.
B. He, W. N. Scherer III, and M. L. Scott. Preemption Adaptivity in Time-Published Queue-Based Spin Locks, 11th Intl. Conf. on High Performance Computing, Goa, India, Dec

Spinning vs Blocking
- Spinning = busy-waiting
- Blocking = thread scheduling
- What is the trade-off between the two schemes?
- Tested Solaris pthread implementation that does both
Ryan Johnson, Manos Athanassoulis, Radu Stoica, and Anastasia Ailamaki. A new look at the roles of spinning and blocking. In Proceedings of the Fifth International Workshop on Data Management on New Hardware (DaMoN '09). ACM, New York, NY, USA.
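
A lock that "does both" can be approximated on top of a POSIX mutex: spin with `pthread_mutex_trylock` for a bounded number of attempts, then fall back to a blocking `pthread_mutex_lock`. This is a generic sketch, not the Solaris implementation the slide refers to. The function name `spin_then_block_lock` and the spin limit of 100 are assumptions.

```c
#include <assert.h>
#include <pthread.h>

/* Spin briefly in the hope that the holder releases soon, then block. */
static void spin_then_block_lock(pthread_mutex_t *m, int spin_limit) {
    for (int i = 0; i < spin_limit; i++)
        if (pthread_mutex_trylock(m) == 0)
            return;                    /* acquired while spinning */
    pthread_mutex_lock(m);             /* give the CPU back and wait */
}

/* Hypothetical demo: four threads bump a shared counter. */
static pthread_mutex_t stb_mutex = PTHREAD_MUTEX_INITIALIZER;
static long stb_counter;

static void *stb_worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 10000; i++) {
        spin_then_block_lock(&stb_mutex, 100);
        stb_counter++;
        pthread_mutex_unlock(&stb_mutex);
    }
    return NULL;
}

static long stb_demo(void) {
    pthread_t t[4];
    stb_counter = 0;
    for (int i = 0; i < 4; i++) {
        int rc = pthread_create(&t[i], NULL, stb_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return stb_counter;
}
```

Roughly, spinning pays in wasted cycles and blocking pays in context switches; the spin limit decides how much of each a waiter is willing to spend.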

Trees, etc.
- Barriers
  - Lots of threads all signaling a single count
  - Sounds bad
- Signal and wakeup trees, with different degrees

Hardware Supported Barriers
- Introduce dedicated on-chip connections
- Single centralized controller
- Transmission lines
Jungju Oh, Milos Prvulovic, and Alenka Zajic. TLSync: support for multiple fast barriers using on-chip transmission lines. In Proceedings of the 38th annual international symposium on Computer architecture (ISCA '11). ACM, New York, NY, USA.

Implementation Details

Architectural Primitives
- Compare and Swap (mem, old, new)
  - If (*mem == old) *mem = new
  - Return what was in mem
- LL/SC (load-linked / store-conditional)
  - LL: load value
  - SC to the same address succeeds only if the data is unmodified since the LL
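
C11 exposes compare-and-swap as `atomic_compare_exchange_strong`, which returns a success flag rather than the old value. The sketch below wraps it to match the slide's "return what was in mem" convention and uses it in a typical retry loop. The helper names `cas_int` and `atomic_store_max` are made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>

/* CAS with the slide's convention: returns what was in *mem. */
static int cas_int(atomic_int *mem, int old, int new_val) {
    /* On failure C11 writes the observed value into `old`;
       on success `old` already equals the previous contents. */
    atomic_compare_exchange_strong(mem, &old, new_val);
    return old;
}

/* Typical CAS loop: raise *mem to `candidate` unless it is already larger. */
static void atomic_store_max(atomic_int *mem, int candidate) {
    int seen = atomic_load(mem);
    while (seen < candidate) {
        int prev = cas_int(mem, seen, candidate);
        if (prev == seen)
            break;          /* our CAS installed the new maximum */
        seen = prev;        /* somebody else changed it: re-check and retry */
    }
    /* Holds as long as all writers go through atomic_store_max. */
    assert(atomic_load(mem) >= candidate);
}
```

On LL/SC machines the `_weak` variant of the C11 call is allowed to fail spuriously, which is why it is normally used only inside a retry loop like the one above.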

Test and Test-and-Set
- Synchronization instructions are expensive
  - So don't do them until they are likely to succeed
- Test the lock, then test-and-set the lock
- Caveat emptor
  - Can lead to races if used incorrectly
  - Can save time in operations like TryToAcquire rather than release
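
Test-and-test-and-set can be sketched as a read-only spin followed by a single atomic exchange. The names `ttas_lock`, `ttas_acquire`, `ttas_try_acquire` and `ttas_release` are assumptions for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_bool held; } ttas_lock;

static void ttas_acquire(ttas_lock *l) {
    for (;;) {
        /* Test: plain loads spin in the local cache without bus traffic. */
        while (atomic_load_explicit(&l->held, memory_order_relaxed))
            ;
        /* Test-and-set: only attempted once the lock looked free. */
        if (!atomic_exchange_explicit(&l->held, true, memory_order_acquire))
            return;
        /* Someone else won the race between our test and our set: retry. */
    }
}

/* Try-acquire: the cheap test avoids an atomic write when the lock is busy. */
static bool ttas_try_acquire(ttas_lock *l) {
    if (atomic_load_explicit(&l->held, memory_order_relaxed))
        return false;
    return !atomic_exchange_explicit(&l->held, true, memory_order_acquire);
}

static void ttas_release(ttas_lock *l) {
    assert(atomic_load(&l->held));     /* releasing a free lock is a bug */
    atomic_store_explicit(&l->held, false, memory_order_release);
}
```

The race the slide warns about is the window between the test and the set: the test alone never grants the lock, so the result of the exchange must always be checked.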

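Queued Spinlock Details
    void acquire_queued_spinlock(entry** lock, entry* me) {
        me->next = NULL;
        me->state = UNLOCKED;
        entry* prev = atomic_swap(lock, me);   /* enqueue ourselves as the tail */
        if (prev == NULL) return;              /* queue was empty: lock acquired */
        me->state = LOCKED;
        prev->next = me;                       /* link in behind the predecessor */
        while (me->state == LOCKED);           /* spin on our own entry */
    }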

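Queued Spinlock Details cont.
    void release_queued_spinlock(entry** lock, entry* me) {
        while (me->next == NULL) {
            /* no visible successor: if we are still the tail, empty the queue */
            if (me == CAS(lock, me, NULL)) return;
        }
        me->next->state = UNLOCKED;            /* hand the lock to the successor */
    }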
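
The acquire and release routines above can be made compilable with C11 atomics. The sketch below follows the slide's structure with two assumed changes: the entry is marked locked before the swap instead of after it, and the wait loops call `sched_yield()` so the demo still finishes when threads outnumber cores. The names (`mcs_node`, `mcs_lock`, ...) are illustrative.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct mcs_node {
    _Atomic(struct mcs_node *) next;
    atomic_bool locked;
} mcs_node;

typedef struct { _Atomic(mcs_node *) tail; } mcs_lock;

static void mcs_acquire(mcs_lock *lock, mcs_node *me) {
    atomic_store_explicit(&me->next, NULL, memory_order_relaxed);
    atomic_store_explicit(&me->locked, true, memory_order_relaxed);
    /* Swap ourselves in as the new tail of the queue. */
    mcs_node *prev = atomic_exchange_explicit(&lock->tail, me,
                                              memory_order_acq_rel);
    if (prev == NULL)
        return;                                /* queue was empty */
    atomic_store_explicit(&prev->next, me, memory_order_release);
    while (atomic_load_explicit(&me->locked, memory_order_acquire))
        sched_yield();                         /* wait on our own node only */
}

static void mcs_release(mcs_lock *lock, mcs_node *me) {
    mcs_node *succ = atomic_load_explicit(&me->next, memory_order_acquire);
    if (succ == NULL) {
        mcs_node *expected = me;
        /* Still the tail? Then nobody is waiting: empty the queue. */
        if (atomic_compare_exchange_strong_explicit(
                &lock->tail, &expected, NULL,
                memory_order_acq_rel, memory_order_acquire))
            return;
        /* A successor swapped in but has not linked itself yet. */
        while ((succ = atomic_load_explicit(&me->next,
                                            memory_order_acquire)) == NULL)
            sched_yield();
    }
    atomic_store_explicit(&succ->locked, false, memory_order_release);
}

/* Hypothetical demo: each thread queues with a node on its own stack. */
static mcs_lock mcs_demo_lock;                 /* zero-initialized: empty queue */
static long mcs_demo_counter;

static void *mcs_demo_worker(void *arg) {
    mcs_node me;
    (void)arg;
    for (int i = 0; i < 5000; i++) {
        mcs_acquire(&mcs_demo_lock, &me);
        mcs_demo_counter++;
        mcs_release(&mcs_demo_lock, &me);
    }
    return NULL;
}

static long mcs_demo(void) {
    pthread_t t[4];
    mcs_demo_counter = 0;
    for (int i = 0; i < 4; i++) {
        int rc = pthread_create(&t[i], NULL, mcs_demo_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return mcs_demo_counter;
}
```

Because the lock is handed over in strict FIFO order, a preempted waiter stalls everyone behind it, which is the problem the timeout and time-published extensions target.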

Bibliography
Dave Dice, Virendra J. Marathe, and Nir Shavit. Flat-combining NUMA locks. In Proceedings of the 23rd ACM symposium on Parallelism in algorithms and architectures (SPAA '11). ACM, New York, NY, USA.
B. He, W. N. Scherer III, and M. L. Scott. Preemption Adaptivity in Time-Published Queue-Based Spin Locks, 11th Intl. Conf. on High Performance Computing, Goa, India, Dec
Wilson C. Hsieh and William E. Weihl. Scalable Reader-Writer Locks for Parallel Systems. In Proceedings of the 6th International Parallel Processing Symposium, Viktor K. Prasanna and Larry H. Canter (Eds.). IEEE Computer Society, Washington, DC, USA.
Ryan Johnson, Manos Athanassoulis, Radu Stoica, and Anastasia Ailamaki. A new look at the roles of spinning and blocking. In Proceedings of the Fifth International Workshop on Data Management on New Hardware (DaMoN '09). ACM, New York, NY, USA.
Yossi Lev, Victor Luchangco, and Marek Olszewski. Scalable reader-writer locks. In Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures (SPAA '09). ACM, New York, NY, USA.
Peter S. Magnusson, Anders Landin, and Erik Hagersten. Queue Locks on Cache Coherent Multiprocessors. In Proceedings of the 8th International Symposium on Parallel Processing, Howard Jay Siegel (Ed.). IEEE Computer Society, Washington, DC, USA.

Bibliography cont.
John M. Mellor-Crummey and Michael L. Scott. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9, 1 (February 1991).
John M. Mellor-Crummey and Michael L. Scott. Scalable reader-writer synchronization for shared-memory multiprocessors. In Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming (PPOPP '91). ACM, New York, NY, USA.
Jungju Oh, Milos Prvulovic, and Alenka Zajic. TLSync: support for multiple fast barriers using on-chip transmission lines. In Proceedings of the 38th annual international symposium on Computer architecture (ISCA '11). ACM, New York, NY, USA.
Michael L. Scott and William N. Scherer. Scalable queue-based spin locks with timeout. In Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming (PPoPP '01). ACM, New York, NY, USA.
Nalini Vasudevan, Kedar S. Namjoshi, and Stephen A. Edwards. Simple and fast biased locks. In Proceedings of the 19th international conference on Parallel architectures and compilation techniques (PACT '10). ACM, New York, NY, USA.

Lock free list
- Store head pointer
- Atomically update head
    void push(node** head, node* n) {
        node* now = *head;
        node* old;
        do {
            old = now;
            n->next = old;                           /* link in front of the head we saw */
        } while ((now = CAS(head, old, n)) != old);  /* retry if head moved */
    }
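
The push above translates to C11 as a compare-exchange loop on the head pointer. This is a sketch with assumed names (`lf_node`, `lf_list`, `lf_push`) and a hypothetical demo that pushes from several threads and then counts the nodes.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct lf_node { struct lf_node *next; int value; } lf_node;
typedef struct { _Atomic(lf_node *) head; } lf_list;

static void lf_push(lf_list *list, lf_node *n) {
    lf_node *old = atomic_load_explicit(&list->head, memory_order_relaxed);
    do {
        n->next = old;   /* link in front of the head we last observed */
        /* On failure the CAS refreshes `old` with the current head. */
    } while (!atomic_compare_exchange_weak_explicit(
                 &list->head, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Hypothetical demo: four threads push 1000 preallocated nodes each. */
enum { LF_THREADS = 4, LF_PER_THREAD = 1000 };
static lf_list lf_demo_list;                   /* zero-initialized: empty */
static lf_node lf_pool[LF_THREADS][LF_PER_THREAD];

static void *lf_worker(void *arg) {
    lf_node *mine = arg;
    for (int i = 0; i < LF_PER_THREAD; i++) {
        mine[i].value = i;
        lf_push(&lf_demo_list, &mine[i]);
    }
    return NULL;
}

static int lf_demo(void) {
    pthread_t t[LF_THREADS];
    int count = 0;
    for (int i = 0; i < LF_THREADS; i++) {
        int rc = pthread_create(&t[i], NULL, lf_worker, lf_pool[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < LF_THREADS; i++)
        pthread_join(t[i], NULL);
    /* All writers are done, so a plain traversal is safe here. */
    for (lf_node *p = atomic_load(&lf_demo_list.head); p != NULL; p = p->next)
        count++;
    return count;
}
```

Push alone is safe with a bare pointer; the trouble starts once nodes can also be popped and reused.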

ABA Problem
- Push C // pending
- Pop A
- Pop B
- Push A // Does Push C complete successfully now?

ABA Problem cont.
- Pop A // pending
- Pop A
- Pop B
- Push A
- Does Pop A succeed?
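
The pending-pop scenario can be replayed deterministically in one thread by doing the stalled thread's reads first and its CAS last. On this reading, the pending Push C is harmless because it only needs the head to be A again, while the pending Pop A installs a stale next pointer. The sketch below shows the failure with a bare pointer and one common remedy, a version tag packed next to a node index. All names are assumptions, and the tag scheme is not taken from the slides.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* --- Naive stack: the head is a bare pointer --- */
typedef struct aba_node { struct aba_node *next; char name; } aba_node;
static _Atomic(aba_node *) naive_head;

static void naive_push(aba_node *n) {
    aba_node *old = atomic_load(&naive_head);
    do { n->next = old; }
    while (!atomic_compare_exchange_weak(&naive_head, &old, n));
}

static aba_node *naive_pop(void) {
    aba_node *old = atomic_load(&naive_head);
    while (old && !atomic_compare_exchange_weak(&naive_head, &old, old->next))
        ;
    return old;
}

/* Returns true if the stale CAS succeeded, i.e. the ABA bug occurred. */
static bool aba_naive_scenario(void) {
    static aba_node a = { NULL, 'A' }, b = { NULL, 'B' }, c = { NULL, 'C' };
    atomic_store(&naive_head, NULL);
    naive_push(&c); naive_push(&b); naive_push(&a);   /* head -> A -> B -> C */
    aba_node *seen = atomic_load(&naive_head);        /* pending pop reads A */
    aba_node *seen_next = seen->next;                 /* ... and A's next, B */
    assert(seen == &a);
    naive_pop(); naive_pop(); naive_push(&a);         /* head -> A -> C */
    /* The pending pop resumes: head is A again, so its CAS succeeds. */
    bool ok = atomic_compare_exchange_strong(&naive_head, &seen, seen_next);
    return ok && atomic_load(&naive_head) == &b;      /* B was already popped */
}

/* --- Tagged stack: high 32 bits hold a version, low 32 bits a node index --- */
static struct { uint32_t next; char name; } tag_pool[4];  /* index 0 = null */
static _Atomic uint64_t tagged_head;

static uint64_t pack(uint32_t version, uint32_t index) {
    return ((uint64_t)version << 32) | index;
}
static uint32_t index_of(uint64_t h)   { return (uint32_t)h; }
static uint32_t version_of(uint64_t h) { return (uint32_t)(h >> 32); }

static void tagged_push(uint32_t n) {
    uint64_t old = atomic_load(&tagged_head);
    do { tag_pool[n].next = index_of(old); }
    while (!atomic_compare_exchange_weak(&tagged_head, &old,
                                         pack(version_of(old) + 1, n)));
}

static uint32_t tagged_pop(void) {
    uint64_t old = atomic_load(&tagged_head);
    while (index_of(old) != 0 &&
           !atomic_compare_exchange_weak(
               &tagged_head, &old,
               pack(version_of(old) + 1, tag_pool[index_of(old)].next)))
        ;
    return index_of(old);
}

/* Same schedule as above; returns whether the stale CAS succeeded. */
static bool aba_tagged_scenario(void) {
    atomic_store(&tagged_head, pack(0, 0));
    tagged_push(3); tagged_push(2); tagged_push(1);   /* 1 = A, 2 = B, 3 = C */
    uint64_t seen = atomic_load(&tagged_head);
    uint32_t seen_next = tag_pool[index_of(seen)].next;
    tagged_pop(); tagged_pop(); tagged_push(1);
    return atomic_compare_exchange_strong(
        &tagged_head, &seen, pack(version_of(seen) + 1, seen_next));
}
```

With the version tag, the head compares unequal even though it names node A again, so the stale pop fails and has to retry with fresh values.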