Kernel Synchronization

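Prof. Shi-Wu Lo (羅習五), Graduate Institute of Computer Science and Information Engineering, National Chung Cheng University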

Chapter 9: Kernel Synchronization
A multithreaded & fully pre-emptible kernel
Atomic Operations
Ordering and Barriers
Spin Locks
  Basic spin lock methods
  r/w spin locks
  Sequential spin locks
  RCU (read-copy-update)
Semaphores
  Counting semaphores
  r/w semaphores
Conclusion

Kernel
You could think of the kernel as a server that answers requests; these requests can come either from a process running on a CPU or from an external device issuing an interrupt request.
(Figure labels: top halves, bottom halves.)

Kernel Control Paths
Kernel Control Path (KCP): a sequence of instructions executed by the kernel to handle an interrupt or exception of some kind.
Each kernel request is handled by a different KCP.
Example, a system call request (software interrupt): system_call … ret_from_sys_call

Kernel Requests
Exception:
  A process executing in User Mode causes an exception (e.g., x/0).
  A process executing in Kernel Mode causes a Page Fault exception.
Interrupt:
  An external device sends a signal to a programmable interrupt controller (PIC), and the corresponding interrupt is enabled.
  A running process raises an interprocessor interrupt (IPI).

Causes of Concurrency
Interrupts (top halves)
Softirqs and tasklets (bottom halves, kernel threads)
Kernel preemption
Sleeping and context switches
Symmetrical multiprocessing (multicore)

Synchronization Primitives
Technique                     Description                                            Scope
Atomic operation              Atomic read-modify-write instruction to a counter      All CPUs
Memory barrier                Avoid instruction re-ordering                          Local CPU
Spin lock                     Lock with busy wait                                    All CPUs
Semaphore                     Lock with blocking wait                                All CPUs
Local interrupt disabling     Forbid interrupt handling on a single CPU              Local CPU
Local softirq disabling       Forbid deferrable function handling on a single CPU    Local CPU
Global interrupt disabling    Forbid interrupt and softirq handling on all CPUs      All CPUs

Atomic Operations
Many instructions are not atomic in hardware (MP):
  read-modify-write instructions: inc, test-and-set, swap
  unaligned memory accesses
  rep instructions
The compiler may not generate atomic code: even i++ is not necessarily atomic (i = i + 1).
Linux provides the atomic_ macros:
  atomic_t: atomic counter type (only 24 bits were guaranteed to be usable portably)
  Intel implementation (atomic, for MP): the lock prefix byte 0xf0 locks the memory bus
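
To make the point about i++ concrete, here is a minimal userspace sketch. It uses C11 <stdatomic.h> as a stand-in for the kernel's atomic_t; the names hits, record_hit and current_hits are hypothetical. On x86 a compiler typically turns atomic_fetch_add into a lock-prefixed instruction.

```c
#include <stdatomic.h>

/* Hypothetical shared counter; atomic_int plays the role of atomic_t here.
 * Static storage means it starts at zero. */
static atomic_int hits;

/* Safe to call from several threads at once: the addition is a single
 * indivisible read-modify-write, unlike a plain "hits = hits + 1". */
int record_hit(void)
{
    /* atomic_fetch_add returns the value held before the addition. */
    return atomic_fetch_add(&hits, 1) + 1;
}

int current_hits(void)
{
    return atomic_load(&hits);
}
```

With a plain int, two threads could both read the same old value and one increment would be lost.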

Atomic operations in Linux
Function                      Description
atomic_read(v)                Return *v
atomic_set(v,i)               Set *v to i
atomic_add(i,v)               Add i to *v
atomic_sub(i,v)               Subtract i from *v
atomic_sub_and_test(i, v)     Subtract i from *v and return 1 if the result is zero; 0 otherwise
atomic_inc(v)                 Add 1 to *v
atomic_dec(v)                 Subtract 1 from *v
atomic_dec_and_test(v)        Subtract 1 from *v and return 1 if the result is zero; 0 otherwise
atomic_inc_and_test(v)        Add 1 to *v and return 1 if the result is zero; 0 otherwise
atomic_add_negative(i, v)     Add i to *v and return 1 if the result is negative; 0 otherwise
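
A common use of atomic_inc() and atomic_dec_and_test() is reference counting. The sketch below imitates that pattern in userspace C11; struct obj, obj_get and obj_put are hypothetical names, and the released flag merely stands in for freeing the object.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object. */
struct obj {
    atomic_int refcount;
    bool released;          /* set by the last put; stands in for kfree() */
};

void obj_init(struct obj *o)
{
    atomic_init(&o->refcount, 1);   /* the creator holds the first reference */
    o->released = false;
}

/* Counterpart of atomic_inc(v). */
void obj_get(struct obj *o)
{
    atomic_fetch_add(&o->refcount, 1);
}

/* Counterpart of atomic_dec_and_test(v): true if the count reached zero. */
bool obj_put(struct obj *o)
{
    /* fetch_sub returns the old value, so old == 1 means new == 0. */
    if (atomic_fetch_sub(&o->refcount, 1) == 1) {
        o->released = true;
        return true;
    }
    return false;
}
```

Because the decrement and the zero test are one atomic step, exactly one caller sees the count reach zero.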

Atomic bit handling functions in Linux
Function                          Description
test_bit(nr, addr)                Return the value of the nrth bit of *addr
set_bit(nr, addr)                 Set the nrth bit of *addr
clear_bit(nr, addr)               Clear the nrth bit of *addr
change_bit(nr, addr)              Invert the nrth bit of *addr
test_and_set_bit(nr, addr)        Set the nrth bit of *addr and return its old value
test_and_clear_bit(nr, addr)      Clear the nrth bit of *addr and return its old value
test_and_change_bit(nr, addr)     Invert the nrth bit of *addr and return its old value
atomic_clear_mask(mask, addr)     Clear all bits of addr specified by mask
atomic_set_mask(mask, addr)       Set all bits of addr specified by mask
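
The "test and modify" variants can be imitated in userspace with C11 fetch-or and fetch-and. This is only a sketch on a single word; the my_ prefixed names are hypothetical and are not the kernel functions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for test_and_set_bit(nr, addr) on one word.
 * nr must be smaller than the number of bits in unsigned long. */
bool my_test_and_set_bit(unsigned nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << nr;
    /* fetch_or returns the old word, so the old bit can be tested. */
    return (atomic_fetch_or(addr, mask) & mask) != 0;
}

/* Stand-in for test_and_clear_bit(nr, addr). */
bool my_test_and_clear_bit(unsigned nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << nr;
    return (atomic_fetch_and(addr, ~mask) & mask) != 0;
}

/* Stand-in for test_bit(nr, addr). */
bool my_test_bit(unsigned nr, atomic_ulong *addr)
{
    return ((atomic_load(addr) >> nr) & 1UL) != 0;
}
```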

Ordering and Barriers
Compilers and hardware re-order memory accesses as an optimization; this is true on SMP and even on UP systems!
Memory barrier: an instruction to the hardware/compiler to complete all pending accesses before issuing more.
  read memory barrier: acts on read requests
  write memory barrier: acts on write requests
Linux macros:
  for UP and MP: mb(), rmb(), wmb()
  for MP only: smp_mb(), smp_rmb(), smp_wmb()

Memory barriers in Linux
Macro          Description
mb( )          Memory barrier for MP and UP
rmb( )         Read memory barrier for MP and UP
wmb( )         Write memory barrier for MP and UP
smp_mb( )      Memory barrier for MP only
smp_rmb( )     Read memory barrier for MP only
smp_wmb( )     Write memory barrier for MP only
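
A typical place for a write barrier paired with a read barrier is publishing data behind a flag. The following userspace sketch uses C11 fences as rough analogues of smp_wmb() and smp_rmb(); payload, ready, publish and try_consume are hypothetical names, and the mapping to the kernel macros is an approximation rather than an exact equivalence.

```c
#include <stdatomic.h>

static int payload;          /* plain data */
static atomic_int ready;     /* flag guarding the data; starts at 0 */

void publish(int value)
{
    payload = value;
    /* Roughly the role of smp_wmb(): the payload store is ordered
     * before the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Returns 1 and fills *out if the data has been published, else 0. */
int try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    /* Roughly the role of smp_rmb(): the flag load is ordered before
     * the payload load. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}
```

Without the two fences, a consumer on another CPU could observe ready == 1 and still read a stale payload.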

Example: Peterson’s Solution
Two-process solution.
Assume that the LOAD and STORE instructions are atomic; that is, they cannot be interrupted.
The two processes share two variables:
    int turn;
    Boolean flag[2];
The variable turn indicates whose turn it is to enter the critical section.
The flag array is used to indicate if a process is ready to enter the critical section: flag[i] = true implies that process Pi is ready.

Algorithm for Process Pi
    while (true) {
        flag[i] = TRUE;
        turn = j;
        while (flag[j] && turn == j)
            ;
        /* CRITICAL SECTION */
        flag[i] = FALSE;
        /* REMAINDER SECTION */
    }

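If the two stores in the entry code are re-ordered, mutual exclusion can fail.
State shown in the figure: flag[i] = FALSE, turn = i

Task_i:
    while (true) {
        turn = j;
        flag[i] = TRUE;
        while (flag[j] && turn == j)
            ;
        /* CRITICAL SECTION */
        flag[i] = FALSE;
        /* REMAINDER SECTION */
    }

Task_j:
    while (true) {
        turn = i;
        flag[j] = TRUE;
        while (flag[i] && turn == i)
            ;
        /* CRITICAL SECTION */
        flag[j] = FALSE;
        /* REMAINDER SECTION */
    }

With flag[i] still FALSE and turn == i, Task_j enters its critical section; Task_i then sets flag[i] and, because turn is no longer j, enters as well.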

Peterson’s Solution
    while (true) {
        flag[i] = TRUE;
        mb();
        turn = j;
        while (flag[j] && turn == j)
            ;
        /* CRITICAL SECTION */
        flag[i] = FALSE;
        /* REMAINDER SECTION */
    }
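
As a userspace illustration, Peterson's entry and exit code can be written with C11 sequentially consistent atomics, which forbid both the store/store re-ordering discussed above and re-ordering of the later loads ahead of the stores. This is a sketch for two tasks numbered 0 and 1; peterson_lock and peterson_unlock are hypothetical names, and the default seq_cst ordering replaces the explicit mb() call.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool flag[2];   /* flag[i]: task i wants to enter */
static atomic_int  turn;      /* which task must wait on a tie  */

void peterson_lock(int i)
{
    int j = 1 - i;
    atomic_store(&flag[i], true);   /* seq_cst store */
    atomic_store(&turn, j);         /* seq_cst store, stays after the one above */
    /* The seq_cst loads below cannot move ahead of the two stores. */
    while (atomic_load(&flag[j]) && atomic_load(&turn) == j)
        ;                           /* busy wait */
}

void peterson_unlock(int i)
{
    atomic_store(&flag[i], false);
}
```

Each task brackets its critical section with peterson_lock(i) and peterson_unlock(i).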

Spin Lock
A special kind of lock designed to work in a multiprocessor environment.
  Spin lock
  R/W spin lock
  Sequential lock
Useless in a uniprocessor environment (?)

Spin lock functions
Function                Description
spin_lock_init( )       Set the spin lock to 1 (unlocked)
spin_lock( )            Cycle until spin lock becomes 1 (unlocked), then set it to 0 (locked)
spin_unlock( )          Set the spin lock to 1 (unlocked)
spin_unlock_wait( )     Wait until the spin lock becomes 1 (unlocked)
spin_is_locked( )       Return 0 if the spin lock is set to 1 (unlocked); 1 otherwise
spin_trylock( )         Set the spin lock to 0 (locked), and return 1 if the lock is obtained; 0 otherwise

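Spin lock functions
spin_lock(slp):
1:  lock; decb slp
    jns 3f
2:  cmpb $0,slp
    pause
    jle 2b
    jmp 1b
3:

spin_unlock(slp):
    movb $1, slp

(The lock prefix is not valid on a plain mov, and an aligned byte store is already atomic, so spin_unlock needs no prefix.)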
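
The same protocol (1 means unlocked, 0 or less means locked) can be sketched in portable C11. This is a userspace illustration, not the kernel implementation; my_spinlock_t and the my_spin_ functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

typedef struct { atomic_int slock; } my_spinlock_t;   /* 1 = unlocked */

void my_spin_lock_init(my_spinlock_t *l)
{
    atomic_init(&l->slock, 1);
}

void my_spin_lock(my_spinlock_t *l)
{
    for (;;) {
        /* Like "lock; decb": whoever moves the value from 1 to 0 wins. */
        if (atomic_fetch_sub(&l->slock, 1) == 1)
            return;
        /* Lost the race: spin on plain reads until the lock looks free,
         * then retry the atomic decrement. */
        while (atomic_load(&l->slock) <= 0)
            ;
    }
}

/* Returns true if the lock was obtained. */
bool my_spin_trylock(my_spinlock_t *l)
{
    int expected = 1;
    return atomic_compare_exchange_strong(&l->slock, &expected, 0);
}

void my_spin_unlock(my_spinlock_t *l)
{
    atomic_store(&l->slock, 1);   /* like "movb $1, slp" */
}
```

Spinning on a read rather than on the atomic decrement keeps waiting CPUs from hammering the bus with locked operations.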

Reader-writer spin lock
Reader-writer spin locks provide separate reader and writer variants of the lock. One or more readers can concurrently hold the reader lock. The writer lock, conversely, can be held by at most one writer with no concurrent readers.

Read/Write Spin Locks
Value of the lock word:
  Initial (unlocked):  0x01000000
  One reader:          0x00ffffff
  Two readers:         0x00fffffe
(Figure labels: lock, # of reading, write.)

Read Spin Lock
read_lock(rwlp):
    movl $rwlp,%eax
    lock; subl $1,(%eax)
    jns 1f
    call __read_lock_failed
1:

__read_lock_failed:
    lock; incl (%eax)
1:  cmpl $1,(%eax)
    js 1b
    lock; decl (%eax)
    js __read_lock_failed
    ret

read_unlock(rwlp):
    lock; incl rwlp

Write Spin Lock
write_lock(rwlp):
    movl $rwlp,%eax
    lock; subl $0x01000000,(%eax)
    jz 1f
    call __write_lock_failed
1:

__write_lock_failed:
    lock; addl $0x01000000,(%eax)
1:  cmpl $0x01000000,(%eax)
    jne 1b
    lock; subl $0x01000000,(%eax)
    jnz __write_lock_failed
    ret

write_unlock(rwlp):
    lock; addl $0x01000000,rwlp
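
The counting scheme behind the read and write paths can be shown in a small userspace sketch, assuming the unlocked value 0x01000000 from the slides. Only non-blocking "try" variants are shown; my_rwlock_t, MY_RW_BIAS and the my_ functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define MY_RW_BIAS 0x01000000   /* value of an unlocked lock word */

typedef struct { atomic_int word; } my_rwlock_t;

void my_rwlock_init(my_rwlock_t *rw)
{
    atomic_init(&rw->word, MY_RW_BIAS);
}

/* A reader takes 1 from the word; a negative result means a writer holds it. */
bool my_read_trylock(my_rwlock_t *rw)
{
    if (atomic_fetch_sub(&rw->word, 1) - 1 >= 0)
        return true;
    atomic_fetch_add(&rw->word, 1);            /* undo, as __read_lock_failed does */
    return false;
}

void my_read_unlock(my_rwlock_t *rw)
{
    atomic_fetch_add(&rw->word, 1);
}

/* A writer takes the whole bias; only a result of 0 means nobody else is in. */
bool my_write_trylock(my_rwlock_t *rw)
{
    if (atomic_fetch_sub(&rw->word, MY_RW_BIAS) == MY_RW_BIAS)
        return true;
    atomic_fetch_add(&rw->word, MY_RW_BIAS);   /* undo, as __write_lock_failed does */
    return false;
}

void my_write_unlock(my_rwlock_t *rw)
{
    atomic_fetch_add(&rw->word, MY_RW_BIAS);
}
```

A blocking lock would wrap each try function in a loop that spins until the word looks acquirable, as the assembly slow paths do.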

Seqlock (sequential lock)
A seqlock is a locking mechanism in Linux for supporting fast writes of shared variables.
seqlock := sequence number + lock
The lock supports synchronization between two writers.
The counter indicates consistency to readers.

Seqlock (sequential lock)
The writer increments the sequence number both after acquiring the lock and before releasing the lock.
Readers read the sequence number before and after reading the shared data.
    do {
        while (((old_seq_num = seq_num) % 2) != 0)
            ;
        // READER: critical section
    } while (old_seq_num != seq_num);
Seqlock was first applied to system time counter updating.

Example: jiffies_64 on 32b machines
reader:
    do {
        seq = read_seqbegin(&xtime_lock);
        ret = jiffies_64;
    } while (read_seqretry(&xtime_lock, seq));

writer:
    write_seqlock(&xtime_lock);
    jiffies_64 += 1;
    write_sequnlock(&xtime_lock);

seqlock_t
    typedef struct {
        unsigned sequence;
        spinlock_t lock;
    } seqlock_t;
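
The reader retry loop and the writer's double increment can be tried out in userspace. The sketch below protects a 64-bit value stored as two 32-bit halves, mimicking jiffies_64 on a 32-bit machine. It assumes a single writer, so the spin lock from seqlock_t is left out, and it uses conservative sequentially consistent atomics throughout; struct seq_u64 and its functions are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>

struct seq_u64 {
    atomic_uint sequence;      /* odd while a write is in progress */
    _Atomic uint32_t lo, hi;   /* the two halves of the protected value */
};

void seq_init(struct seq_u64 *s)
{
    atomic_init(&s->sequence, 0);
    atomic_init(&s->lo, 0);
    atomic_init(&s->hi, 0);
}

/* Single writer assumed; with several writers a lock would be taken here. */
void seq_write(struct seq_u64 *s, uint64_t v)
{
    unsigned seq = atomic_load(&s->sequence);
    atomic_store(&s->sequence, seq + 1);        /* now odd: write in progress */
    atomic_store(&s->lo, (uint32_t)v);
    atomic_store(&s->hi, (uint32_t)(v >> 32));
    atomic_store(&s->sequence, seq + 2);        /* even again: write complete */
}

uint64_t seq_read(struct seq_u64 *s)
{
    unsigned start;
    uint32_t lo, hi;
    do {
        /* Wait until no write is in progress (sequence is even). */
        while ((start = atomic_load(&s->sequence)) & 1u)
            ;
        lo = atomic_load(&s->lo);
        hi = atomic_load(&s->hi);
        /* Retry if a writer ran while the halves were being read. */
    } while (atomic_load(&s->sequence) != start);
    return ((uint64_t)hi << 32) | lo;
}
```

Readers never write to shared memory, which is why a writer is never delayed by them.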

seqlock_t.operations

MONITOR & MWAIT (x86, for thread synchronization)
MONITOR defines an address range used to monitor write-back stores. MWAIT is used to indicate that the software thread is waiting for a write-back store to the address range defined by the MONITOR instruction.

Read-copy-update (RCU)
RCU allows extremely low overhead, wait-free reads.
RCU updates can be expensive, because they must leave the old versions of the data structure in place to accommodate pre-existing readers. These old versions are reclaimed after all pre-existing readers finish their accesses.
RCU is a new addition in Linux 2.6; it is used in the networking layer and in the virtual file system (VFS).
Reference: Paul E. McKenney: Read-copy-update (RCU), IPDPS 2006 Best Paper

RCU update, step by step (figure sequence):
1. Readers copy the global pointer PTR into Local_PTR and access the data through it. RCU allows extremely low overhead, wait-free reads.
2. The writer builds a new version of the data via New_PTR: kmalloc + copy + update. RCU updates can be very expensive…
3. The writer publishes the new version with an atomic operation: PTR = New_PTR. Remove pointers to a data structure, so that subsequent readers cannot gain a reference to it.
4. Wait for all previous readers to complete their RCU read-side critical sections.
5. The writer or the “GC” can safely reclaim the data (the old version): kfree(old_ptr).
6. Only the new data remains, referenced by PTR.

(Figure: timeline. The reader runs between "Lock scheduler" and "Unlock scheduler"; the writer calls kfree after a context switch, CTX_SW.)
Lock_scheduler := preempt_count++
Unlock_scheduler := preempt_count--
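
The update sequence described above (allocate, copy, update, publish, wait, free) can be mimicked in userspace. The sketch below is deliberately simplified and is not the kernel's RCU: it assumes a single updater and detects the end of the grace period with a plain reader counter, whereas real RCU readers only disable preemption. All names (struct config, current_cfg, active_readers, read_sum, update_cfg) are hypothetical.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct config { int a, b; };                     /* hypothetical shared data */

static _Atomic(struct config *) current_cfg;     /* the "PTR" of the slides */
static atomic_int active_readers;                /* toy grace-period detector */

int read_sum(void)
{
    atomic_fetch_add(&active_readers, 1);              /* enter read side */
    struct config *local = atomic_load(&current_cfg);  /* Local_PTR = PTR */
    int sum = local ? local->a + local->b : 0;
    atomic_fetch_sub(&active_readers, 1);              /* leave read side */
    return sum;
}

/* Single updater assumed. Returns 0 on success, -1 if allocation fails. */
int update_cfg(int add_a, int add_b)
{
    struct config *old = atomic_load(&current_cfg);
    struct config *fresh = malloc(sizeof *fresh);      /* kmalloc */
    if (fresh == NULL)
        return -1;
    if (old != NULL)
        *fresh = *old;                                  /* copy */
    else
        *fresh = (struct config){0, 0};
    fresh->a += add_a;                                  /* update */
    fresh->b += add_b;
    atomic_store(&current_cfg, fresh);                  /* PTR = New_PTR */
    while (atomic_load(&active_readers) != 0)
        ;                                               /* wait for old readers */
    free(old);                                          /* kfree(old_ptr) */
    return 0;
}
```

The reader counter makes this toy version much slower on the read side than real RCU; it is only meant to show the order of the steps.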

Semaphores
Kernel semaphores
  used by kernel control paths.
  can be acquired only by functions that are allowed to sleep; interrupt handlers and deferrable functions cannot use them.
System V IPC semaphores
  used by User Mode processes

interruptible?
The function down_interruptible() puts the calling process to sleep in the TASK_INTERRUPTIBLE state. Alternatively, the function down() places the task in the TASK_UNINTERRUPTIBLE state when it sleeps. The use of down_interruptible() is much more common (and correct) than down().

Semaphores
up:
    movl $sem,%ecx
    lock; incl (%ecx)
    jg 1f
    pushl %eax
    pushl %edx
    pushl %ecx
    call __up
    popl %ecx
    popl %edx
    popl %eax
1:

down:
    movl $sem,%ecx
    lock; decl (%ecx)
    jns 1f
    pushl %eax
    pushl %edx
    pushl %ecx
    call __down
    popl %ecx
    popl %edx
    popl %eax
1:
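
For comparison with the busy-waiting spin lock, here is a userspace sketch of a counting semaphore whose down operation sleeps. It uses a POSIX mutex and condition variable in place of the kernel's atomic counter and wait queue, so it is not how the kernel implements it; my_sem_t, my_down, my_try_down and my_up are hypothetical names.

```c
#include <pthread.h>
#include <stdbool.h>

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  wait;    /* stands in for the kernel wait queue */
    int count;               /* > 0: number of free slots */
} my_sem_t;

void my_sem_init(my_sem_t *s, int count)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->wait, NULL);
    s->count = count;
}

void my_down(my_sem_t *s)
{
    pthread_mutex_lock(&s->lock);
    while (s->count <= 0)
        pthread_cond_wait(&s->wait, &s->lock);   /* sleep instead of spinning */
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Non-blocking variant: returns true if a slot was acquired. */
bool my_try_down(my_sem_t *s)
{
    bool ok = false;
    pthread_mutex_lock(&s->lock);
    if (s->count > 0) {
        s->count--;
        ok = true;
    }
    pthread_mutex_unlock(&s->lock);
    return ok;
}

void my_up(my_sem_t *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->wait);               /* wake one sleeper, if any */
    pthread_mutex_unlock(&s->lock);
}
```

Because my_down may sleep, code that must not sleep (in the kernel: interrupt handlers and deferrable functions) could only use the non-blocking variant.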

__down
(Figure labels: WaitingQ.ins, WaitingQ.del.)

Read/Write Semaphores
FIFO
complex implementation, similar to regular semaphores
operations: down_read(), down_write(), up_read(), up_write()

Read/Write Semaphores
The first process is always awoken. If it is a writer, the other processes in the wait queue continue to sleep. If it is a reader, any other reader following the first process is also woken up and gets the lock. However, readers that have been queued after a writer continue to sleep.
Example wait queue: R R R W R W R R
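
The wake-up rule can be expressed as a small function. This is an illustrative sketch only: the queue is modelled as a string of 'R' and 'W' characters with the head of the wait queue first, and rwsem_wake_count is a hypothetical name.

```c
#include <stddef.h>

/* Returns how many waiters are woken when the semaphore is released.
 * queue: 'R' for a waiting reader, 'W' for a waiting writer, head first. */
size_t rwsem_wake_count(const char *queue)
{
    if (queue[0] == '\0')
        return 0;                 /* nobody is waiting */
    if (queue[0] == 'W')
        return 1;                 /* a writer at the head is woken alone */
    size_t n = 0;
    while (queue[n] == 'R')       /* wake the leading run of readers only */
        n++;
    return n;
}
```

Readers queued behind the first writer are not counted, which keeps a stream of readers from starving that writer.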

Conclusion
We started with the simplest method of ensuring synchronization, atomic operations. We then looked at spin locks which provide a lightweight lock that busy waits. Next, we discussed semaphores, a sleeping lock. Following mutexes, we studied less common, more specialized locking primitives such as completion variables and seq locks.

Global Interrupt Disabling
A typical scenario consists of a driver that needs to reset the hardware device. Global interrupt disabling significantly lowers the system concurrency level. An interrupt service routine should never execute the cli( ) macro.

__global_cli()
wait for top and bottom halves to complete
disable local interrupts
grab spinlock
disable all interrupts

Disabling Deferrable Functions
disabling interrupts also disables deferred functions
it is possible to disable deferred functions but not all interrupts
ops (macros): local_bh_disable(), local_bh_enable()

Choosing Synch Primitives
avoid synch if possible! (clever instruction ordering)
  example: inserting in linked list (needs barrier still)
  example: task migration
use atomics or rw spinlocks if possible
use semaphores if you need to sleep
complicated structures accessed by deferred functions
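
The linked-list example can be sketched as follows: a single writer inserts a node without any lock, and the only synchronization is the ordering of two stores. This is a userspace C11 illustration with hypothetical names (struct node, insert_after, list_sum); the release store plays the role of the write barrier placed between the two pointer assignments.

```c
#include <stdatomic.h>
#include <stddef.h>

struct node {
    int value;
    _Atomic(struct node *) next;
};

void node_init(struct node *n, int value)
{
    n->value = value;
    atomic_init(&n->next, NULL);
}

/* Single writer assumed; readers may traverse the list concurrently. */
void insert_after(struct node *prev, struct node *new_node)
{
    struct node *succ = atomic_load_explicit(&prev->next, memory_order_relaxed);
    atomic_store_explicit(&new_node->next, succ, memory_order_relaxed);
    /* Release ordering acts as the write barrier: new_node is fully
     * set up before it becomes reachable from prev. */
    atomic_store_explicit(&prev->next, new_node, memory_order_release);
}

int list_sum(struct node *head)
{
    int sum = 0;
    for (struct node *n = head; n != NULL;
         n = atomic_load_explicit(&n->next, memory_order_acquire))
        sum += n->value;
    return sum;
}
```

If the two stores were re-ordered, a concurrent reader could follow prev->next to a node whose own next pointer had not been written yet.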
