Presentation is loading. Please wait.

Presentation is loading. Please wait.

By Michael Greenwald and David Cheriton Presented by Jonathan Walpole

Similar presentations


Presentation on theme: "By Michael Greenwald and David Cheriton Presented by Jonathan Walpole"— Presentation transcript:

1 By Michael Greenwald and David Cheriton Presented by Jonathan Walpole
The Synergy Between Non-blocking Synchronization and Operating System Structure By Michael Greenwald and David Cheriton Presented by Jonathan Walpole CS510 – Concurrent Systems

2 CS510 – Concurrent Systems
The Cache Kernel Yet Another Attempt at a µ-kernel Minimal privileged mode code: Caching model of kernel functionality OS functionality in user-mode class libraries allowing application-customizability of OS Shared memory and signal-based communication are the only communication mechanisms Goal: scalable, robust, flexible operating system design CS510 – Concurrent Systems

3 Synergy of NBS and kernel structure
Claim: NBS and Good Design go together in the Cache Kernel design and implementation NBS allows better OS structuring Good OS structure simplifies NBS CS510 – Concurrent Systems

4 Non-blocking Synchronization
Basic idea of NBS Associate version number with data structure Read version number at start of critical section Atomically check and increment version number at end of critical section at the same time as applying the update(s), using a Double Compare and Swap (DCAS) instruction. Roll back and retry on failure (i.e., if version number changed while you were in the critical section). An optimistic synchronization technique similar to that of the Synthesis Kernel CS510 – Concurrent Systems

5 Example: Deletion from Linked List
do { retry: backoffIfNeeded(); version = list->version; /* Read version # */ for (p = list->head;(p->next != elt); p = p->next) { if (p == NULL) { /* Not found */ if (version != list->version) { goto retry; } /* List changed */ return NULL; /* Really not found */ } } while( !DCAS( &(list->version), &(p->next), version, elt, version+1, elt->next ) CS510 – Concurrent Systems

6 Double Compare and Swap
int DCAS( int *addr1, int *addr2, int old1, int old2, int new1, int new2) { <begin atomic> if ((*addr1 == old1) && (*addr2 == old2)) { *addr1 = new1; *addr2 = new2; return(TRUE); } else { return(FALSE); } <end atomic> } CS510 – Concurrent Systems

7 Hardware Support for DCAS
Could be supported through extension to LL/SC (load-linked/store-conditional) sequence LLP/SCP LLP (load-linked-pipelined) Load and link a second address after an earlier LL LLP is linked to the following SCP SCP (store-conditional-pipelined) Store depends on both previous LLP and LL not having been invalidated in this CPU’s cache If LLP/SCP fails, so does enclosing LL/SC CS510 – Concurrent Systems

8 Software Support for DCAS
Basic idea: make DCAS operations atomic by putting them in a critical section protected by a lock! OS manages the lock (how many locks do you need?) Presumably, DCAS is a system call Must make sure that delayed lock holders release the lock and roll back Ensures that DCAS still has non-blocking properties Problems implementing this efficiently How to support readers? How to avoid lock contention? CS510 – Concurrent Systems

9 Back to the List Example
do { retry: backoffIfNeeded(); version = list->version; /* Read version # */ for (p = list->head;(p->next != elt); p = p->next) { if (p == NULL) { /* Not found */ if (version != list->version) { goto retry; } /* List changed */ return NULL; /* Really not found */ } } while( !DCAS( &(list->version), &(p->next), version, elt, version+1, elt->next ) CS510 – Concurrent Systems

10 CS510 – Concurrent Systems
Problem … What happens if another thread deletes an element while we are traversing the list? We may end up with an invalid pointer in p What if we traverse it into the free pool? What if the memory has been reused? What if we end up in a different data structure? What if the memory is reused for a different type? How can we prevent this kind of reader hijacking? Or at least, how can we prevent it from crashing the reader? CS510 – Concurrent Systems

11 Type-stable memory management (TSM)
Descriptor that is allocated as type T is guaranteed to remain of type T for at least tstable A generalization of existing technique – allocation pools: e.g. process-descriptors are statically allocated at system init, and are type-stable for lifetime of system But is this sufficient to ensure that read-side code will reach the DCAS in the presence of concurrent modification? And how long is tstable ? When is it safe to reuse memory? CS510 – Concurrent Systems

12 Other solutions to reader hijacking
If memory is reclaimed immediately upon updates, readers must protect themselves from hijacking But how? Must ensure that a pointer is valid before dereferencing it Additional checks at the end of the read side critical section are required to ensure no concurrent write access took place CS510 – Concurrent Systems

13 CS510 – Concurrent Systems
Claim: TSM aids NBS Type stability ensures safety of pointer dereferences. Without TSM, delete example is too big to fit on slide And very expensive to execute Need to check for changes on each loop iteration TSM makes NBS code simpler and faster CS510 – Concurrent Systems

14 Other benefits of NBS in Cache Kernel
Signals are only form of IPC in Cache Kernel NBS simplifies synchronization in signal handlers Makes it easier to write efficient signal-safe code CS510 – Concurrent Systems

15 Contention Minimizing Data Structures (CMDS)
Locality-based structuring used in Cache kernel: Replication: Per-processor structures (ie. run queue) Hierarchical data structures with read-mostly high levels (ie. hash tables with per bucket synchronization) Cache block alignment of descriptors to minimize false sharing and improve cache efficiency Well-known techniques used in other systems (Tornado, Synthesis, …) CS510 – Concurrent Systems

16 CS510 – Concurrent Systems
Benefits of CMDS Minimizes logical and physical contention Minimizes (memory) consistency overhead Minimizes false sharing Reduces lock conflicts/convoys in lock-based system Reduces synchronization contention with NBS fewer retries from conflict at point of DCAS CMDS is good for locks, cache consistency and NBS! NBS needs CMDS CS510 – Concurrent Systems

17 Minimizing the Window of Inconsistency
Delaying writes, and grouping them together at the end minimizes the window of inconsistency Advantages of a small window of inconsistency Less probability of failure leaving inconsistency Preemption-safe: easy to back-out of critical section Reduces lock hold time & contention (in lock-based systems) Its good for system design whether you use blocking or non-blocking synchronization! CS510 – Concurrent Systems

18 How Does NBS Affect Contention?
What is the relationship between window of inconsistency and probability of conflict (contention)? NBS approaches have a critical section just like locking approaches NBS critical section is defined by the scope of the version number This is NOT the same as the window of inconsistency! CS510 – Concurrent Systems

19 Priority Inversion Issues
NBS allows synchronization to be subordinate to scheduling It avoids priority inversion problem of locks Also, page fault,I/O, and other blocking operations Does the highest priority process always make progress? Is there a retry-based equivalent to priority inversion? CS510 – Concurrent Systems

20 Why NBS is good for OS structure
Fail-stop safe: OS class libraries tolerant of user threads being terminated. Most Cache Kernel OS functionality implemented in user-mode class libraries Portable: same code on uniprocessor, multiprocessor, and signal handlers Deadlock free (almost) See examples of deletion and insertion without going through reclamation CS510 – Concurrent Systems

21 Implementing non-blocking Synchronization
Basic approach Read version number Rely on TSM for type safety of pointers Increment version number and check with every modification (abort if changed). Straightforward transformation from locking … so long as you have DCAS and can tolerate TSM Replace acquire with read of version number Replace release with DCAS … CS510 – Concurrent Systems

22 Implementing non-blocking Synchronization (cont.)
Variants of the basic approach: N reads, 1 write: No backout 2 reads, 2 writes: No version number In the Cache kernel, every case of synchronization falls into the special cases CS510 – Concurrent Systems

23 Complexity and Correctness
DCAS reduces size of algorithms (lines of code) by 75% compared to CAS, for linked lists, queues and stacks Special case uses of DCAS reduce complexity further Relatively straightforward transformation from locking Similar code size CS510 – Concurrent Systems

24 CS510 – Concurrent Systems
Performance Issues Simulation-based study With non-preemptive scheduling: DCAS-based NBS almost as fast as spin-locks CAS-based NBS slower With preemptive scheduling: DCAS and CAS-based NBS better than spin-locks Note, this was using mid 1990’s CPUs The cost of CAS is much higher on today’s CPUs What are the implications of this? CS510 – Concurrent Systems

25 Hardware-Based Optimizations
A hardware-based optimization using advisory locking Cache based advisory locking can help avoid contention in the form of “useless parallelism” Instead of just checking at the end and forcing retries, check at the beginning and back off if retry is inevitable Uses Cload instruction (conditional load) which succeeds only if the location does not have an advisory lock set, and sets the lock on success Initially load version # with Cload, wait and retry on failure (seems like using TSL in a spin-lock?) CS510 – Concurrent Systems

26 CS510 – Concurrent Systems
Conclusions Claim: “Good OS structure” can support Non-blocking synchronization Type-Stable Mem. Mgmt (TSM) Data Structures that Minimize Contention (CMDS) Minimizing the window of inconsistency Claim: Non-blocking synchronization can support convenient OS structure Avoids deadlock; allows signals as sole IPC mechanism Fault tolerant; kernel functionality can be moved to user space Performance; isolates scheduling from synchronization Claim: Strong synergy between non-blocking synchronization & good OS structure. CS510 – Concurrent Systems

27 Advantages of Non-blocking Synchronization
No deadlock! Especially useful for signal handlers. Portability: same code on uni and multiprocessors and in signal handlers Performance: Minimizes interference between synchronization and process scheduling Recovery: Insulation from process failures So why isn’t it universally deployed? CS510 – Concurrent Systems

28 Obstacles to deployment
Complexity: confusing to design and write efficient algorithms, especially in the absence of a DCAS primitive Correctness: Its hard to be convinced that there are no subtle bugs, especially on weak memory consistency architectures. Is TSM enough to preserve critical section invariants? Performance: Poor performance under contention due to excessive retry high overhead due to expense of CAS on modern CPUs CS510 – Concurrent Systems


Download ppt "By Michael Greenwald and David Cheriton Presented by Jonathan Walpole"

Similar presentations


Ads by Google