Kernel Synchronization II

Slides:



Advertisements
Similar presentations
1 Interprocess Communication 1. Ways of passing information 2. Guarded critical activities (e.g. updating shared data) 3. Proper sequencing in case of.
Advertisements

Global Environment Model. MUTUAL EXCLUSION PROBLEM The operations used by processes to access to common resources (critical sections) must be mutually.
CS492B Analysis of Concurrent Programs Lock Basics Jaehyuk Huh Computer Science, KAIST.
Chapter 6: Process Synchronization
5.1 Silberschatz, Galvin and Gagne ©2009 Operating System Concepts with Java – 8 th Edition Chapter 5: CPU Scheduling.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 6: Process Synchronization.
Process Synchronization. Module 6: Process Synchronization Background The Critical-Section Problem Peterson’s Solution Synchronization Hardware Semaphores.
Parallel Processing (CS526) Spring 2012(Week 6).  A parallel algorithm is a group of partitioned tasks that work with each other to solve a large problem.
Synchronization. Shared Memory Thread Synchronization Threads cooperate in multithreaded environments – User threads and kernel threads – Share resources.
PARALLEL PROGRAMMING with TRANSACTIONAL MEMORY Pratibha Kona.
Race Conditions CS550 Operating Systems. Review So far, we have discussed Processes and Threads and talked about multithreading and MPI processes by example.
Instructor: Umar KalimNUST Institute of Information Technology Operating Systems Process Synchronization.
Synchronization CSCI 444/544 Operating Systems Fall 2008.
Operating Systems CSE 411 CPU Management Oct Lecture 13 Instructor: Bhuvan Urgaonkar.
Chapter 6 Concurrency: Deadlock and Starvation Operating Systems: Internals and Design Principles, 6/E William Stallings Dave Bremer Otago Polytechnic,
CS510 Concurrent Systems Introduction to Concurrency.
Cosc 4740 Chapter 6, Part 3 Process Synchronization.
Implementing Synchronization. Synchronization 101 Synchronization constrains the set of possible interleavings: Threads “agree” to stay out of each other’s.
Chapter 28 Locks Chien-Chung Shen CIS, UD
Operating Systems ECE344 Ashvin Goel ECE University of Toronto Mutual Exclusion.
Kernel Locking Techniques by Robert Love presented by Scott Price.
1 Interprocess Communication (IPC) - Outline Problem: Race condition Solution: Mutual exclusion –Disabling interrupts; –Lock variables; –Strict alternation.
Chapter 6: Process Synchronization. 6.2 Silberschatz, Galvin and Gagne ©2005 Operating System Concepts Module 6: Process Synchronization Background The.
CS399 New Beginnings Jonathan Walpole. 2 Concurrent Programming & Synchronization Primitives.
1 Previous Lecture Overview  semaphores provide the first high-level synchronization abstraction that is possible to implement efficiently in OS. This.
Implementing Lock. From the Previous Lecture  The “too much milk” example shows that writing concurrent programs directly with load and store instructions.
CSC 660: Advanced Operating SystemsSlide #1 CSC 660: Advanced OS Synchronization.
1 Critical Section Problem CIS 450 Winter 2003 Professor Jinhua Guo.
CS510 Concurrent Systems Jonathan Walpole. Introduction to Concurrency.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition Chapter 6: Process Synchronization.
Read-Copy-Update Synchronization in the Linux Kernel 1 David Ferry, Chris Gill CSE 522S - Advanced Operating Systems Washington University in St. Louis.
Implementing Mutual Exclusion Andy Wang Operating Systems COP 4610 / CGS 5765.
Mutual Exclusion -- Addendum. Mutual Exclusion in Critical Sections.
December 1, 2006©2006 Craig Zilles1 Threads & Atomic Operations in Hardware  Previously, we introduced multi-core parallelism & cache coherence —Today.
Kernel Synchronization David Ferry, Chris Gill CSE 522S - Advanced Operating Systems Washington University in St. Louis St. Louis, MO
CSE 120 Principles of Operating
CS703 – Advanced Operating Systems
Process Synchronization
Background on the need for Synchronization
Synchronization.
Midterm Review David Ferry, Chris Gill
Atomic Operations in Hardware
Atomic Operations in Hardware
Chapter 5: Process Synchronization
Designing Parallel Algorithms (Synchronization)
Kernel Synchronization II
Semaphore Originally called P() and V() wait (S) { while S <= 0
Module 7a: Classic Synchronization
Lecture 2 Part 2 Process Synchronization
Coordination Lecture 5.
Top Half / Bottom Half Processing
Implementing Mutual Exclusion
Concurrency: Mutual Exclusion and Process Synchronization
Kernel Synchronization I
Implementing Mutual Exclusion
Userspace Synchronization
CSE 451: Operating Systems Autumn 2004 Module 6 Synchronization
CSE 451: Operating Systems Autumn 2003 Lecture 7 Synchronization
CSE 451: Operating Systems Autumn 2005 Lecture 7 Synchronization
CSE 451: Operating Systems Winter 2004 Module 6 Synchronization
CSE 451: Operating Systems Winter 2003 Lecture 7 Synchronization
CSE 153 Design of Operating Systems Winter 19
CSE 153 Design of Operating Systems Winter 2019
CS333 Intro to Operating Systems
Chapter 6: Synchronization Tools
Atomicity, Mutex, and Locks
Process/Thread Synchronization (Part 2)
CSE 542: Operating Systems
CSE 542: Operating Systems
Presentation transcript:

Kernel Synchronization II David Ferry, Chris Gill, Brian Kocoloski CSE 422S - Operating Systems Organization Washington University in St. Louis St. Louis, MO 63143

CSE 422S – Operating Systems Organization Race Conditions Recall the problem from the last lecture, if “X” is a regular integer and two threads want to increment its value; the final value should be X+2 Thread 1: X++ Thread 2: CSE 422S – Operating Systems Organization

CSE 422S – Operating Systems Organization Race Conditions Problem: if “X” is a regular integer and is accessed via Load/Store instructions, we have a ”race condition” Thread 1: load X X = X + 1 store X Thread 2: What are possible values for X after both threads have finished? X + 1 X + 2 CSE 422S – Operating Systems Organization

CSE 422S – Operating Systems Organization Race Conditions E.g., considering the following sequence of operations where threads 1 and 2 execute concurrently on two different cores Thread 1: load X X = X + 1 store X Thread 2: “finish time” CSE 422S – Operating Systems Organization

CSE 422S – Operating Systems Organization Race Conditions A race condition exists if the output of a program is non-deterministic Relies on the sequence or timing of events that are not programmatically controlled Parallel programs that execute deterministically need to be free of race conditions How might we go about resolving the problem in the previous example? CSE 422S – Operating Systems Organization

CSE 422S – Operating Systems Organization Atomic Operations We need to have stronger atomicity guarantees on the execution of multiple semantically separate operations Why the term “atomic” An atom was (originally) thought to be an indivisible particle The term atomic should be thought of as guaranteeing that a specified sequence of operations – which we say is “made atomic” --- either fully executes or does not execute at all CSE 422S – Operating Systems Organization

CSE 422S – Operating Systems Organization Atomicity Thread 1: load X X = X + 1 store X Thread 2: CSE 422S – Operating Systems Organization

CSE 422S – Operating Systems Organization Atomicity Thread 1: start_atomic(){ load X X = X + 1 store X end_atomic() Thread 2: Assume we have constructs start_atomic() and end_atomic() that make the load/increment/store sequence of instructions to be atomic Thus, we assume the operations of loading X, adding 1 to it, and storing its new value is a single indivisible operation CSE 422S – Operating Systems Organization

How to provide atomicity The Linux kernel provides several synchronization mechanisms to achieve atomicity. We will discuss the following, in order from coarsest granularity (i.e., atomizing large sequences of instructions) to finest (atomizing just an increment of an integer) The Big Kernel Lock (BKL) First lock introduced to the kernel Locked the entire kernel – only one thread at a time can execute kernel code Mutexes Spin locks Atomic variables CSE 422S – Operating Systems Organization

The BKL is No Longer Used The Big Kernel Lock (BKL) was the first lock introduced to the kernel Locked the entire kernel Provides correctness, but very slow New uses are prohibited Gradually replaced with finer-grained locking schemes CSE 422S – Operating Systems Organization

CSE 422S – Operating Systems Organization Spinlocks A spinlock is used to provide mutual exclusion to a critical section Conceptually, a spinlock can be thought of as an integer variable that can have only two values: 0: unlocked 1: locked How might a spinlock address our race condition problem? To acquire a spinlock, a thread must wait until its value is 0 (unlocked), then set it to 1 (locked) To release a spinlock, set the value to 0 (unlocked) CSE 422S – Operating Systems Organization

CSE 422S – Operating Systems Organization Spinlocks DEFINE_SPINLOCK(lock); // … Thread 1: spin_lock(&lock) load X X = X + 1 store X spin_unlock(&lock) Thread 2: CSE 422S – Operating Systems Organization

CSE 422S – Operating Systems Organization Spinlocks Semantics of locking a spinlock: void spin_lock(spinlock_t * lock) { while (lock->value == 1); lock->value = 1; } void spin_unlock(spinlock_t * lock) { lock->value = 0; CSE 422S – Operating Systems Organization

CSE 422S – Operating Systems Organization Spinlocks void spin_lock(spinlock_t * lock) { while (lock->value == 1); lock->value = 1; } void spin_unlock(spinlock_t * lock) { lock->value = 0; Sounds great …. or does it? Does anyone see any issues? What happens if multiple threads invoke spin_lock() at the “exact” same time? CSE 422S – Operating Systems Organization

CSE 422S – Operating Systems Organization Atomic Instructions We’re back to the same problem – we’ve just shifted the race condition from the integer X to the integer that is used to implement the spinlock The solution relies on hardware support for atomic operations Because this support is hardware specific, the specific instructions used to provide atomicity vary on different architectures x86: test_and_set(), compare_and_exchange(), etc. ARM: ldrex, strex CSE 422S – Operating Systems Organization

CSE 422S – Operating Systems Organization Test and set Semantics (remember: the whole thing is atomized by hardware) int test_and_set(int * v, int to) { int old = *v; *v = to; return old; } CSE 422S – Operating Systems Organization

CSE 422S – Operating Systems Organization Test and set How to implement spin_lock() with test-and-set? #define LOCKED 1 #define UNLOCKED 0 void spin_lock(spinlock_t * lock) { while (test_and_set(&(lock->value), ?) == ?); } void spin_unlock(spinlock_t * lock) { assert(test_and_set(&(lock->value), ?) == ?); CSE 422S – Operating Systems Organization

CSE 422S – Operating Systems Organization Test and set How to implement spin_lock() with test-and-set? #define LOCKED 1 #define UNLOCKED 0 void spin_lock(spinlock_t * lock) { while (test_and_set(&(lock->value), LOCKED) == LOCKED); } void spin_unlock(spinlock_t * lock) { assert(test_and_set(&(lock->value), ?) == ?); CSE 422S – Operating Systems Organization

CSE 422S – Operating Systems Organization Test and set How to implement spin_lock() with test-and-set? #define LOCKED 1 #define UNLOCKED 0 void spin_lock(spinlock_t * lock) { while (test_and_set(&(lock->value), LOCKED) == LOCKED); } void spin_unlock(spinlock_t * lock) { assert(test_and_set(&(lock->value), UNLOCKED) == LOCKED); CSE 422S – Operating Systems Organization

Compare-and-Exchange Semantics (remember: the whole thing is atomized by hardware) int compare_and_exchange(int * v, int from, int to) { int old = *v; if (old == from) { *v = to; } return old; CSE 422S – Operating Systems Organization

Compare-and-Exchange How to implement spin_lock() with compare-and-exchange? #define LOCKED 1 #define UNLOCKED 0 void spin_lock(spinlock_t * lock) { while (compare_and_exchange(&(lock->value), UNLOCKED, LOCKED) == LOCKED); } void spin_unlock(spinlock_t * lock) { assert(compare_and_exchange(&(lock->value), LOCKED, UNLOCKED) == LOCKED); CSE 422S – Operating Systems Organization

CSE 422S – Operating Systems Organization Atomics on ARM ARM does not provide either of these instructions It provides multiple instructions that are used with a monitor to provide synchronizaton ldrex: load a word from memory strex: conditional store of a word to memory CSE 422S – Operating Systems Organization

Semantics of ldrex/strex LDREX R1, p // (p <- memory address) ADD R1, 1 STREX R2, R1, p i.e., Load a word from memory address p into register R1 Increment register R1 by 1 Conditionally store the value to p Determine whether the value of p was stored by checking the return value stored in register R2 What if the return failed? Go back and retry the whole thing CSE 422S – Operating Systems Organization

Synchronization Constructs in the Linux Kernel Big Kernel Lock Global lock on entire kernel; deprecated today Atomic Variables Very short, hardware-implemented modifications to variables Spinlocks Simple mutually exclusive locks implemented with a simple atomic variable Semaphores (+ mutexes) Sleeping locks CSE 422S – Operating Systems Organization

Spinlocks vs Sleeping Locks Thread 1 Acquires spinlock <enters critical section> Releases spinlock Thread 2 Tries to acquire spinlock < spins (does not sleep) … …> Acquires spinlock <enters critical section> Releases spinlock CSE 422S – Operating Systems Organization

Spinlocks vs Sleeping Locks Thread 1 Acquires mutex <enters critical section> Releases mutex Must also wake up processes waiting to acquire the mutex Thread 2 Tries to acquire mutex < mutex not available, goes to sleep> … < woken up > Acquires mutex <enters critical section> Releases mutex CSE 422S – Operating Systems Organization

Sleeping while holding Locks Sleeping while holding a lock should be avoided (unless absolutely necessary) Delays other processes Deadlock risk: is it guaranteed to wake up (at least eventually) and release the lock? A process cannot sleep holding a spinlock Calling schedule() while holding a spinlock will trigger a kernel oops() Required to release lock, then sleep … or, use a mutex instead of a spin lock CSE 422S – Operating Systems Organization

CSE 422S – Operating Systems Organization Spinlocks vs Mutexes What to use (LKD pp. 197) Requirement Low overhead locking Short lock hold time Long lock hold time Need to lock from interrupt context Need to sleep while holding lock Solution Spin lock Mutex CSE 422S – Operating Systems Organization

When to Use Different Types of Locks Atomics - When shared data fits inside a single word of memory, or a lock-free synchronization protocol can be implemented Spin locks - very short lock durations and/or very low overhead requirements Mutexes - Long lock duration and/or process needs to sleep while holding lock Read-Copy-Update (RCU) - Modern locking approach that may improve performance (link provided on course website) CSE 422S – Operating Systems Organization

Synchronization Design Challenge A common kernel problem: Multiple threads share a data structure. Some are reading Some are writing Reads and writes should not interfere! Shared data CSE 422S – Operating Systems Organization

Synchronization Design Tradeoffs All synchronization methods need to balance the needs of readers and writers: Reads tend to be more common Lots of reads Few writes Balanced Reads and writes Lots of writes Few reads Synchronization can prevent concurrency… Reader/writer locks Mutual exclusion Or it can allow concurrency at the expense of overheads: Lock free / wait free algorithms Transactional memory CSE 422S – Operating Systems Organization

CSE 422S – Operating Systems Organization Reader/Writer Locks All readers can execute concurrently with no locking or synchronization Writers are mutually exclusive with all readers and all other writers Use case? Inserting/deleting(writers) or iterating through (readers) a linked list CSE 422S – Operating Systems Organization