CS510 – Advanced Operating Systems: The Synergy Between Non-blocking Synchronization and Operating System Structure, by Michael Greenwald and David Cheriton.

Slide 1: The Synergy Between Non-blocking Synchronization and Operating System Structure
By Michael Greenwald and David Cheriton
Presented by Jonathan Walpole

Slide 2: The Cache Kernel
• Yet another attempt at a µ-kernel
• Minimal privileged-mode code: caching model of kernel functionality
• OS functionality in user-mode class libraries, allowing application customizability of the OS
• Memory and signal-based communication
• Goal: scalable, robust, flexible operating system design

Slide 3: Synergy of NBS and kernel structure
• Claim: NBS and good design go together in the Cache Kernel design and implementation
  o NBS allows better OS structuring
  o Good OS structure simplifies NBS

Slide 4: Non-blocking Synchronization
• Basic idea
  o Associate a version number with each data structure
  o Read the version number at the start of the critical section
  o At the end of the critical section, atomically check and increment the version number at the same time as applying the update(s), using a Double Compare and Swap (DCAS) instruction
  o Try again on failure
• Similar to the Synthesis approach

Slide 5: Example: Deletion from Linked List

    do {
      retry:
        backoffIfNeeded();
        version = list->version;                 /* Read version # */
        /* Find the predecessor of elt; stop at the end of the list */
        for (p = list->head; p != NULL && p->next != elt; p = p->next)
            ;
        if (p == NULL) {                         /* Not found */
            if (version != list->version) { goto retry; }   /* List changed */
            return NULL;                         /* Really not found */
        }
    } while (!DCAS(&(list->version), &(p->next),
                   version, elt,
                   version+1, elt->next));

Slide 6: Double Compare and Swap

    /* The whole body executes atomically */
    int DCAS(int *addr1, int *addr2,
             int old1, int old2,
             int new1, int new2)
    {
        if ((*addr1 == old1) && (*addr2 == old2)) {
            *addr1 = new1;
            *addr2 = new2;
            return(TRUE);
        } else {
            return(FALSE);
        }
    }
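
The deletion example and the DCAS definition can be combined into a small compilable sketch. Hardware DCAS is rare, so this version emulates it with a spin guard built on C11 `atomic_flag`; that emulation is an assumption made for illustration and is itself blocking, so it shows the semantics and the retry structure only. The names `dcas`, `dcas_guard`, `struct node`, `struct list`, and `list_delete` are hypothetical, and the list is assumed to start with a dummy head node. The plain reads of `version` and `next` would need to be atomic loads in a truly concurrent setting.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: a spin guard stands in for the atomicity that a hardware
   DCAS instruction would provide. This emulation is blocking. */
static atomic_flag dcas_guard = ATOMIC_FLAG_INIT;

static int dcas(uintptr_t *addr1, uintptr_t *addr2,
                uintptr_t old1, uintptr_t old2,
                uintptr_t new1, uintptr_t new2)
{
    int ok = 0;
    while (atomic_flag_test_and_set_explicit(&dcas_guard, memory_order_acquire))
        ;                                   /* spin until the guard is free */
    if (*addr1 == old1 && *addr2 == old2) { /* both words unchanged? */
        *addr1 = new1;
        *addr2 = new2;
        ok = 1;
    }
    atomic_flag_clear_explicit(&dcas_guard, memory_order_release);
    return ok;
}

struct node { uintptr_t next; int key; };               /* next holds a struct node * */
struct list { uintptr_t version; struct node head; };   /* head is a dummy node */

/* Unlink elt from the list. Returns elt on success, NULL if it is not there. */
static struct node *list_delete(struct list *l, struct node *elt)
{
    for (;;) {
        uintptr_t version = l->version;     /* read version at start */
        struct node *p = &l->head;
        while (p != NULL && (struct node *)p->next != elt)
            p = (struct node *)p->next;     /* find elt's predecessor */
        if (p == NULL) {
            if (version != l->version)
                continue;                   /* list changed: retry */
            return NULL;                    /* really not found */
        }
        /* Commit: bump the version and unlink elt in one step. */
        if (dcas(&l->version, &p->next, version, (uintptr_t)elt,
                 version + 1, elt->next))
            return elt;
    }
}
```

A successful delete increments the list's version by one, so any concurrent operation that read the old version fails its own DCAS and retries.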

Slide 7: Problem …
• What happens if someone else deletes an element while we are traversing the list?
  o What if we end up traversing into the free pool?
  o What if the memory is reused?
  o What if we end up in a different data structure?
  o What if the memory is reused for a different type?
• How can we avoid these problems?

Slide 8: Solutions to reader hijacking
• If memory is reclaimed immediately upon updates, readers must protect themselves from hijacking
• Possible solution 1?
  o Readers use a load-linked / store-conditional (LL/SC) sequence to detect changed pointers at the end of their critical sections … or at the end of each use of a pointer
• Solution 2:
  o Readers verify that version numbers associated with data structures have not changed (2 memory barriers) at the end of each use of a pointer
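
Solution 2 can be sketched with C11 atomics. This is a minimal illustration, not the Cache Kernel's code: `struct versioned`, `read_begin`, `read_validate`, and `update` are hypothetical names, and `update` stands in for a writer that would, in the scheme described here, change the data and the version together with one DCAS.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct versioned {
    atomic_uint version;   /* incremented by every update */
    atomic_int  value;     /* stands in for the protected data */
};

/* First barrier: read the version before touching the data. */
static unsigned read_begin(struct versioned *v)
{
    return atomic_load_explicit(&v->version, memory_order_acquire);
}

/* Second barrier: after using the data, check the version is unchanged. */
static bool read_validate(struct versioned *v, unsigned start)
{
    atomic_thread_fence(memory_order_acquire);
    return atomic_load_explicit(&v->version, memory_order_relaxed) == start;
}

/* Assumption: models a writer; a real one would use a single DCAS. */
static void update(struct versioned *v, int new_value)
{
    atomic_store_explicit(&v->value, new_value, memory_order_relaxed);
    atomic_fetch_add_explicit(&v->version, 1, memory_order_release);
}
```

A reader calls `read_begin`, uses the data, then discards the result and retries if `read_validate` returns false.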

Slide 9: Type-stable memory management (TSM)
• A descriptor allocated as type T is guaranteed to remain of type T for at least time t_stable
  o I.e. it cannot switch quickly from T1 to T2 by reallocation
  o Restrict or prevent memory reuse???
  o Generalization of an existing technique: e.g. process descriptors are statically allocated at system init, and are type-stable for the lifetime of the system
• Advantage
  o T *ptr remains a pointer to an instance of T even if ptr is changed asynchronously by another process
• Problem
  o Unacceptable restriction on memory reuse for production systems
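
The statically allocated process-descriptor case can be sketched as a fixed pool with its own free list. Freed descriptors go back to the pool and are only ever reused as descriptors of the same type, so a stale pointer still points at a `struct proc_desc`. All names here (`proc_desc`, `pool_init`, `desc_alloc`, `desc_free`, `POOL_SIZE`) are hypothetical, and the free list is deliberately single-threaded to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4   /* assumed pool size, fixed at system init */

struct proc_desc {
    int pid;
    struct proc_desc *next_free;
};

static struct proc_desc pool[POOL_SIZE];   /* never returned to a general heap */
static struct proc_desc *free_list;

/* Thread every descriptor onto the free list. */
static void pool_init(void)
{
    free_list = NULL;
    for (int i = POOL_SIZE - 1; i >= 0; i--) {
        pool[i].next_free = free_list;
        free_list = &pool[i];
    }
}

/* Take a descriptor from the pool, or return NULL if the pool is empty. */
static struct proc_desc *desc_alloc(int pid)
{
    struct proc_desc *d = free_list;
    if (d == NULL)
        return NULL;
    free_list = d->next_free;
    d->pid = pid;
    return d;
}

/* Return a descriptor to the pool; its memory keeps its type. */
static void desc_free(struct proc_desc *d)
{
    d->pid = -1;
    d->next_free = free_list;
    free_list = d;
}
```

The cost is the one the slide names: memory held in the pool cannot be reused for anything else.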

Slide 10: TSM aids Non-Blocking Synchronization
• Type stability ensures safety of pointer dereferences
  o Without TSM, the delete example is too big to fit on a slide, and very expensive to execute
  o Need to check for changes on each loop iteration
  o TSM makes NBS code simpler and faster

Slide 11: Other benefits of NBS in Cache Kernel
• Signals are the only form of IPC in the Cache Kernel
• NBS simplifies synchronization in signal handlers
  o Makes it easier to write efficient signal-safe code

Slide 12: Contention Minimizing Data Structures (CMDS)
• Localization-based structuring used in the Cache Kernel:
  o Replication: per-processor structures (e.g. run queue)
  o Extra level of read-mostly hierarchy (e.g. hash tables with per-bucket synchronization)
  o Cache block alignment of descriptors
• Well-known techniques used in other systems (Tornado, Synthesis, …)
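
Replication and cache block alignment can be shown together in a short sketch. A per-processor counter stands in for the per-processor run queue mentioned above; the counter, the 64-byte line size, the processor count, and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed cache block size in bytes */
#define NCPUS 4         /* assumed number of processors */

/* One slot per processor, each on its own cache block so that updates
   by different processors do not falsely share a line. */
struct percpu_counter {
    alignas(CACHE_LINE) long count;
};

static struct percpu_counter counters[NCPUS];

/* Each processor updates only its own slot. */
static void counter_inc(int cpu)
{
    counters[cpu].count++;
}

/* Readers pay the cost of visiting every slot. */
static long counter_sum(void)
{
    long total = 0;
    for (int i = 0; i < NCPUS; i++)
        total += counters[i].count;
    return total;
}
```

The design trades memory and a slower aggregate read for updates that never contend with other processors.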

Slide 13: Benefits of CMDS
• Minimizes logical and physical contention
  o Minimizes (memory) consistency overhead
  o Minimizes false sharing
• Reduces lock conflicts/convoys in lock-based systems
• Reduces synchronization contention with NBS
  o Fewer retries from conflict at the point of DCAS
• CMDS is good for locks, cache consistency and NBS!
  o NBS needs CMDS

Slide 14: Minimizing the Window of Inconsistency
Another general strategy to improve performance:
1) Delay writes, and group them together at the end
2) Find intermediate consistent states
3) Keep a log to allow undo/backout
• A small window of inconsistency reduces the probability of leaving inconsistent data structures after a failure
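
Strategy 1 can be illustrated by preparing the new state in private memory and publishing it with a single atomic step, so shared data is never seen half-updated. This is one possible reading of the strategy, not the Cache Kernel's code. The names `struct config`, `current`, and `shift_range` are hypothetical, the caller supplies the scratch buffer, and safe reuse of the old buffer is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct config { int low; int high; };   /* invariant: low <= high */

static _Atomic(struct config *) current;   /* the published state */

/* Build the shifted range privately in *scratch, then publish it with one
   compare-and-swap. Returns false if another update won the race. */
static bool shift_range(struct config *scratch, int delta)
{
    struct config *old = atomic_load(&current);
    scratch->low  = old->low + delta;    /* writes go to private memory */
    scratch->high = old->high + delta;
    return atomic_compare_exchange_strong(&current, &old, scratch);
}
```

If the thread is preempted or killed before the final compare-and-swap, the shared state is untouched and nothing needs to be backed out.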

Slide 15: Minimizing the Window of Inconsistency
• Preemption-safe: easy to back out of a critical section
• Reduces lock hold time, and thus contention (in lock-based systems)
• Lower probability of a failure leaving inconsistency
• Good for system design whether you use blocking or non-blocking synchronization!

Slide 16: Does NBS minimize the window of inconsistency?
• What is the relationship between the window of inconsistency and the probability of conflict (contention)?

Slide 17: Priority Inversion Issues
• NBS allows synchronization to be subordinate to scheduling
  o It avoids the priority inversion problem of locks
  o Also covers delays from page faults, I/O, and other blocking
• The highest-priority process always makes progress

Slide 18: Why NBS is good for OS structure
• Fail-stop safe: OS class libraries are tolerant of user threads being terminated
  o Most Cache Kernel OS functionality is implemented in user-mode class libraries
• Portable: same code on uniprocessors, multiprocessors, and in signal handlers
• Deadlock free (almost)
• NBS supports class library implementation of OS functions

Slide 19: Implementing Non-blocking Synchronization
• Basic approach
  o Read the version number
  o Rely on TSM for type safety of pointers
  o Increment the version number and check it with every modification (abort if changed)
• Straightforward transformation from locking
  o … so long as you have DCAS and can tolerate TSM
  o Replace acquire with a read of the version number
  o Replace release with DCAS …

Slide 20: Implementing Non-blocking Synchronization (cont.)
• Variants of the basic approach:
  o N reads, 1 write: no backout
  o 2 reads, 2 writes: no version number
• Optimization:
  o Cache-based advisory locking avoids contention (& “useless parallelism”)
  o Uses the Cload instruction
• In the Cache Kernel, every case of synchronization falls into one of the special cases
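
The "2 reads, 2 writes" variant can be sketched as moving an item between two slots: DCAS checks both words directly, so no version number is involved. This is an illustrative reading of the variant under stated assumptions. `dcas_model` is a sequential stand-in that is NOT atomic and only models what a hardware DCAS would do; `move_item` and `EMPTY` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define EMPTY 0   /* assumed marker for an unoccupied slot */

/* Sequential model of DCAS semantics. NOT atomic: real code needs the
   hardware instruction or an equivalent. */
static int dcas_model(intptr_t *addr1, intptr_t *addr2,
                      intptr_t old1, intptr_t old2,
                      intptr_t new1, intptr_t new2)
{
    if (*addr1 == old1 && *addr2 == old2) {
        *addr1 = new1;
        *addr2 = new2;
        return 1;
    }
    return 0;
}

/* Move the item in *src into *dst if *dst is empty. Both words are
   compared and both are written in one step, with no version number. */
static int move_item(intptr_t *src, intptr_t *dst)
{
    intptr_t item = *src;
    if (item == EMPTY)
        return 0;                       /* nothing to move */
    return dcas_model(src, dst, item, EMPTY, EMPTY, item);
}
```

Because the item is removed from the source and placed in the destination in the same step, no observer sees it in both slots or in neither.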

Slide 21: Complexity and Correctness
• DCAS reduces the size of algorithms (lines of code) by 75% compared to CAS, for linked lists, queues and stacks
  o Special-case uses of DCAS reduce complexity further
• Relatively straightforward transformation from locking
  o Similar code size

Slide 22: Performance Issues
• Simulation-based study
• With non-preemptive scheduling:
  o DCAS-based NBS almost as fast as spin-locks
  o CAS-based NBS slower
• With preemptive scheduling:
  o DCAS- and CAS-based NBS better than spin-locks

Slide 23: Conclusions
• “Good OS structure” can support non-blocking synchronization
  o Type-Stable Mem. Mgmt (TSM)
  o Data structures that minimize contention (CMDS)
  o Minimizing the window of inconsistency
• Non-blocking synchronization can support convenient OS structure
  o Avoids deadlock; allows signals as sole IPC
  o Fault tolerant; enables functionality to be moved from the kernel
  o Performance; isolates scheduling from synch.
• Strong synergy between non-blocking synch. and good OS structure

Slide 24: Advantages of Non-blocking Synchronization
• Non-blocking:
  o Signal handlers don’t deadlock
• Portability:
  o Same code on uniprocessors, multiprocessors, and in signal handlers
• Performance:
  o Minimizes interference between synchronization and process scheduling
• Recovery:
  o Insulation from process failures
• So why isn’t it universally deployed?

Slide 25: Obstacles to deployment
• Complexity:
  o Confusing to design and write efficient algorithms
• Correctness:
  o Hard to be convinced that there are no subtle bugs
• Performance:
  o Increased contention, high overhead
• Unfamiliarity and insufficient system support