CS510 Advanced OS Seminar Class 10 A Methodology for Implementing Highly Concurrent Data Objects by Maurice Herlihy.

Presentation transcript:

CS510 Advanced OS Seminar Class 10 A Methodology for Implementing Highly Concurrent Data Objects by Maurice Herlihy

CS533 - Concepts of Operating Systems

Slide 2: Motivation
* A concurrent object is a data structure shared by concurrent processes
  o It is blocking if a delay in one process can delay the others (all processes might fail to complete)
  o Non-blocking means some process will complete in a finite number of steps
  o Wait-free means free from starvation, i.e., every process will complete in a finite number of steps
* General methodology for constructing non-blocking and wait-free concurrent objects
  o Automatic transformation from a sequential implementation
  o Uses universal primitives such as LL/SC or CAS
    - CAS algorithms are more complex and less efficient than LL/SC ones

Slide 3: Overview
* Methodology
  o Start with sequential objects and operations
  o Apply synchronization and memory management algorithms to transform sequential objects and operations into concurrent objects and operations
  o Simple enough to be applied by a compiler

Slide 4: Small objects
* Slightly different approaches for small and large objects
* Small objects
  o Can be copied efficiently
  o The object occupies a fixed-size contiguous region of memory called a “block”
* Restrictions
  o Sequential operations must be “total”, i.e., well-defined for all states of the object

Slide 5: Small object transformation
* The basic idea
  o Load-linked the pointer to the current version
  o Copy that version to a new (local) block of memory
  o Apply the sequential operation to the copy
  o Store-conditional the pointer so that it points to the new version; on failure, retry from the load-linked
  o Reclaim the memory of the old version
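The steps above can be sketched in C11 roughly as follows. C exposes no load-linked/store-conditional, so this sketch substitutes a compare-and-swap on a 64-bit word that packs a version counter with a block index; the version tag is there because a CAS on a bare pointer could succeed after the block has been recycled. The names `pool`, `current`, `seq_add` and `concurrent_add`, the pool size, and the toy two-word object are assumptions made for illustration and are not taken from the paper. The consistency check discussed on the next slides is left out; the toy operation is harmless even on a torn copy.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NBLOCKS 4   /* assumption: the current block plus one spare per process */
#define NWORDS  2   /* assumption: object size in words */

typedef struct { _Atomic unsigned word[NWORDS]; } block_t;

static block_t pool[NBLOCKS];      /* static storage: every word starts at zero */
static _Atomic uint64_t current;   /* (version << 32) | index of the current block */

/* Hypothetical sequential operation: word[0] is a sum, word[1] counts updates.
 * Unsigned arithmetic keeps it total (well-defined for every state). */
static void seq_add(unsigned *obj, unsigned delta)
{
    obj[0] += delta;
    obj[1] += 1;
}

/* *spare is the index of a block owned by the caller. On success the caller
 * gives that block away and takes ownership of the old version's block. */
unsigned concurrent_add(uint32_t *spare, unsigned delta)
{
    assert(*spare < NBLOCKS);
    for (;;) {
        uint64_t seen = atomic_load(&current);            /* stands in for LL */
        uint32_t old_index = (uint32_t)(seen & 0xffffffffu);
        unsigned copy[NWORDS];

        for (int i = 0; i < NWORDS; i++)                  /* copy the version */
            copy[i] = atomic_load(&pool[old_index].word[i]);
        seq_add(copy, delta);                             /* sequential op on the copy */
        for (int i = 0; i < NWORDS; i++)                  /* fill the spare block */
            atomic_store(&pool[*spare].word[i], copy[i]);

        /* Stands in for SC: succeeds only if no other update was committed. */
        uint64_t next = (((seen >> 32) + 1) << 32) | *spare;
        if (atomic_compare_exchange_strong(&current, &seen, next)) {
            *spare = old_index;                           /* reclaim the old block */
            return copy[0];
        }
    }
}
```

Each process would start with its own spare index (for example process k starts with block k + 1) and keep passing the same variable to every call. The version counter wraps after 2^32 updates, which this sketch ignores.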

Slide 6: Problem!
* What if you are slow and another process reclaims and reuses the memory while you are making your copy?
  o You may generate an inconsistent copy
  o During your modifications you may later dereference a corrupted pointer
  o You may get a divide-by-zero error …
  o All of this happens before you reach the store-conditional that would tell you to fail and retry

Slide 7: Solution
* Check the consistency of the copy before you use it
* But how can you know?
  o Use two checks
    - Updaters increment check 0 before updating and increment check 1 after updating
      The two checks have the same value unless an update is in progress
    - Copiers read check 1 before copying and check 0 after copying
      If they are the same, the copy is consistent; if not, retry the copy
    - This is a classic application for memory barriers
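A minimal C11 sketch of the two-check idea is given here, assuming the check counters live inside the block and that only the block's current owner ever updates it. The type `block_t` and the functions `block_update` and `block_copy` are hypothetical names. Sequentially consistent atomics (the C11 default) stand in for the memory barriers mentioned on the slide; a tuned version would likely use weaker orderings.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NWORDS 4   /* assumption: object size in words */

typedef struct {
    _Atomic unsigned check[2];   /* equal unless an update is in progress */
    _Atomic int word[NWORDS];
} block_t;

/* Updater: called only by the process that currently owns the block. */
void block_update(block_t *b, const int *src)
{
    assert(b && src);
    atomic_fetch_add(&b->check[0], 1);             /* check 0: update started */
    for (int i = 0; i < NWORDS; i++)
        atomic_store(&b->word[i], src[i]);
    atomic_fetch_add(&b->check[1], 1);             /* check 1: update finished */
}

/* Copier: returns true only if dst holds a consistent snapshot. */
bool block_copy(block_t *b, int *dst)
{
    assert(b && dst);
    unsigned first = atomic_load(&b->check[1]);    /* read check 1 before copying */
    for (int i = 0; i < NWORDS; i++)
        dst[i] = atomic_load(&b->word[i]);
    unsigned last = atomic_load(&b->check[0]);     /* read check 0 after copying */
    return first == last;                          /* mismatch: caller retries */
}
```

The copier reads the checks in the opposite order from the one in which the updater writes them, so an update that starts or is still running during the copy always leaves the two values different.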

Slides 8-15: Concurrent priority queue
* Slide 8: load-linked the queue pointer
* Slide 9: read the first part of the consistency check
* Slide 10: copy the object
* Slide 11: read the second part of the consistency check
* Slide 12: if the object is consistent ...
* Slide 13: ... perform the sequential operation on the copy
* Slide 14: commit the changes by overwriting the queue pointer with a pointer to the new version; exit the loop on success, retry on failure
* Slide 15: take ownership of the old queue’s memory to replace the memory given away for the new version
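The code shown on these slides is not part of the transcript, so the following is only a sketch of how the annotated steps might fit together in C11. It assumes a small bounded priority queue kept as a descending array, a version-tagged index with compare-and-swap in place of LL/SC on the queue pointer, and check counters stored in each block. All identifiers (`pqueue_op`, `queue_ptr`, `copy_out`, `copy_in`, and so on) and all sizes are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define CAP     8   /* assumption: queue capacity */
#define NBLOCKS 4   /* assumption: current block plus one spare per process */

typedef enum { ENQUEUE, DEQUEUE_MIN } op_t;
typedef struct { int size; int item[CAP]; } pqueue_t;   /* private copy */
typedef struct {                                        /* shared block */
    _Atomic unsigned check[2];
    _Atomic int size;
    _Atomic int item[CAP];
} block_t;

static block_t pool[NBLOCKS];
static _Atomic uint64_t queue_ptr;   /* (version << 32) | block index */

/* Sequential operations are total: on a bad state they report failure. */
static bool seq_enqueue(pqueue_t *q, int x)
{
    if (q->size < 0 || q->size >= CAP) return false;
    int i = q->size++;
    while (i > 0 && q->item[i - 1] < x) { q->item[i] = q->item[i - 1]; i--; }
    q->item[i] = x;                  /* descending order: minimum is last */
    return true;
}

static bool seq_dequeue_min(pqueue_t *q, int *out)
{
    if (q->size <= 0 || q->size > CAP) return false;
    *out = q->item[--q->size];
    return true;
}

static bool copy_out(block_t *b, pqueue_t *q)
{
    unsigned first = atomic_load(&b->check[1]);          /* first part of check */
    q->size = atomic_load(&b->size);                     /* copy the object */
    for (int i = 0; i < CAP; i++) q->item[i] = atomic_load(&b->item[i]);
    return atomic_load(&b->check[0]) == first;           /* second part of check */
}

static void copy_in(block_t *b, const pqueue_t *q)
{
    atomic_fetch_add(&b->check[0], 1);
    atomic_store(&b->size, q->size);
    for (int i = 0; i < CAP; i++) atomic_store(&b->item[i], q->item[i]);
    atomic_fetch_add(&b->check[1], 1);
}

/* *spare is a block index owned by the caller. */
bool pqueue_op(uint32_t *spare, op_t op, int *x)
{
    assert(*spare < NBLOCKS);
    for (;;) {
        uint64_t seen = atomic_load(&queue_ptr);         /* load the queue pointer */
        uint32_t old = (uint32_t)(seen & 0xffffffffu);
        pqueue_t copy;
        if (!copy_out(&pool[old], &copy)) continue;      /* inconsistent: retry */
        bool ok = (op == ENQUEUE) ? seq_enqueue(&copy, *x)
                                  : seq_dequeue_min(&copy, x);
        copy_in(&pool[*spare], &copy);
        uint64_t next = (((seen >> 32) + 1) << 32) | *spare;
        if (atomic_compare_exchange_strong(&queue_ptr, &seen, next)) {
            *spare = old;                                /* own the old queue's memory */
            return ok;
        }
    }
}
```

A caller might keep `uint32_t spare = 1;` for its lifetime and invoke `pqueue_op(&spare, ENQUEUE, &value)` or `pqueue_op(&spare, DEQUEUE_MIN, &value)`; the return value reports whether the sequential operation itself succeeded (queue not full, or not empty).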

Slide 16: Performance evaluation
* How expensive is the copying overhead?
* What kind of contention characteristics does this approach have?

Slide 17: Benchmark: million enqueue/dequeue pairs

Slide 18: Performance results

Slide 19: Contention and fairness

Slide 20: Problem
* How to reduce contention?
* Introduce delay via exponential back-off
  o As in spin-locks

Slide 21: Concurrent priority queue with back-off
* Busy wait
* Set wait time
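The slide's code is not in the transcript; the sketch below only illustrates the general shape of exponential back-off around a retry loop. A CAS on a plain counter stands in for the store-conditional on the queue pointer, and `MAX_DELAY`, `next_window`, `busy_wait` and `add_with_backoff` are hypothetical names and values.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MAX_DELAY 1024u   /* assumption: cap on the back-off window */

/* Set wait time: double the window, saturating at MAX_DELAY. */
unsigned next_window(unsigned window)
{
    return window >= MAX_DELAY / 2 ? MAX_DELAY : window * 2;
}

/* Busy wait for a random number of iterations in [0, window).
 * rand() is used for brevity; a per-thread generator would be preferable. */
static void busy_wait(unsigned window)
{
    assert(window > 0);
    unsigned spins = (unsigned)rand() % window;
    for (volatile unsigned i = 0; i < spins; i++)
        ;
}

/* Retry loop with back-off: each failed commit waits, then widens the window. */
unsigned add_with_backoff(_Atomic unsigned *counter, unsigned delta)
{
    unsigned window = 1;
    for (;;) {
        unsigned seen = atomic_load(counter);
        if (atomic_compare_exchange_strong(counter, &seen, seen + delta))
            return seen + delta;          /* committed */
        busy_wait(window);                /* lost the race: back off */
        window = next_window(window);
    }
}
```

Randomizing the wait within the window is a common choice because it keeps colliding processes from retrying in lockstep.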

Slide 22: Performance impact of back-off

Slide 23: More problems
* Performance
  o Requires back-off to reduce contention
* Fairness
  o The non-blocking algorithm is open to starvation
    - Especially of enqueues, which are slower (longer) and frequently have to retry
    - If a short, fast operation runs concurrently with a long, slow one, the short one nearly always wins and the long one nearly always has to retry
      It may never complete!
  o How can we make the approach wait-free?

Slide 24: Wait-free algorithm
* Processes register their attempted invocations in an n-element “announce” array
  o One element per process
  o Fields are
  o The toggle field is complemented on each invocation
* Processes register their results in an n-element “responses” array
  o One element per process
  o Fields are
* Processes help each other complete invocations by running the “apply” function before performing their own operation
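The field lists did not survive in this transcript, so the layout below is a guess made for illustration: an operation code, one argument and a toggle bit per announce entry, and a result plus toggle per response entry. Under that assumption, an operation is pending exactly when the two toggles for a process differ. `NPROCS`, `invocation_t`, `response_t`, `announce_op` and `is_pending` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NPROCS 4   /* assumption: the number of processes n */

/* Assumed field layout; the slide's own field list is missing. */
typedef struct {
    _Atomic int op;
    _Atomic int arg;
    _Atomic unsigned toggle;
} invocation_t;

typedef struct {
    int result;
    unsigned toggle;
} response_t;

static invocation_t announce[NPROCS];   /* entry p is written only by process p */

/* Register an attempted invocation for process `self`. */
void announce_op(int self, int op, int arg)
{
    assert(self >= 0 && self < NPROCS);
    atomic_store(&announce[self].op, op);
    atomic_store(&announce[self].arg, arg);
    /* Complement the toggle last, so a helper that sees the new toggle
     * also sees the new op and arg. */
    atomic_store(&announce[self].toggle,
                 atomic_load(&announce[self].toggle) ^ 1u);
}

/* True if process p has announced an operation that this view of the
 * responses array does not yet record as completed. */
bool is_pending(int p, const response_t *responses)
{
    assert(p >= 0 && p < NPROCS);
    return atomic_load(&announce[p].toggle) != responses[p].toggle;
}
```

In this sketch a process would call `announce_op` once per operation and must not announce another until the previous one shows up as completed.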

Slide 25: Apply function – basic idea
* Before performing an operation
  o Look to see if any other processes have operations in progress
  o If so, try to perform those operations for them and place the result somewhere they can pick it up
* Guarantees that whoever finishes first will succeed in performing every process’s operations
  o No process’s operations can be starved
  o Sounds like lots of wasted work and lots of contention!

Slides 26-29: Apply function
* Slide 26: for all processes ...
* Slide 27: are any other operations active concurrently?
* Slide 28: if so, complete the operation on behalf of the other process (in case it’s slower than you and would be forced to retry by the operation you are about to do)
* Slide 29: mark the operation as having completed
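One way the annotated steps might look in C11 is sketched below. It assumes the responses array travels inside the private copy of the object, so that results are committed together with the new version, and it reuses a guessed announce layout (operation code, argument, toggle). The bounded queue, the operation codes and every identifier are hypothetical, and the commit step that would follow is omitted.

```c
#include <assert.h>
#include <stdatomic.h>

#define NPROCS 4   /* assumption: number of processes */
#define CAP    8   /* assumption: queue capacity */

enum { OP_ENQUEUE, OP_DEQUEUE_MIN };      /* hypothetical operation codes */

typedef struct {                          /* assumed announce layout */
    _Atomic int op;
    _Atomic int arg;
    _Atomic unsigned toggle;
} invocation_t;

typedef struct { int result; unsigned toggle; } response_t;

/* Private copy of the object, carrying one response slot per process. */
typedef struct {
    int size;
    int item[CAP];                        /* descending order: minimum is last */
    response_t responses[NPROCS];
} pqueue_t;

invocation_t announce[NPROCS];

/* Sequential operation on the copy. Returns -1 when it cannot be carried
 * out, so this sketch assumes non-negative priorities. */
static int run_sequential(pqueue_t *q, int op, int arg)
{
    assert(q->size >= 0 && q->size <= CAP);
    if (op == OP_ENQUEUE) {
        if (q->size == CAP) return -1;
        int i = q->size++;
        while (i > 0 && q->item[i - 1] < arg) { q->item[i] = q->item[i - 1]; i--; }
        q->item[i] = arg;
        return 0;
    }
    return q->size > 0 ? q->item[--q->size] : -1;
}

void apply(pqueue_t *copy)
{
    for (int p = 0; p < NPROCS; p++) {                   /* for all processes */
        unsigned t = atomic_load(&announce[p].toggle);
        if (t != copy->responses[p].toggle) {            /* announced, not yet applied */
            int op  = atomic_load(&announce[p].op);
            int arg = atomic_load(&announce[p].arg);
            copy->responses[p].result = run_sequential(copy, op, arg);
            copy->responses[p].toggle = t;               /* mark as completed */
        }
    }
}
```

Because the toggle is recorded in the same copy that holds the result, running `apply` a second time on that copy does nothing until some process announces a new operation.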

Slides 30-34: Wait-free concurrent pqueue
* Slide 31: announce your attempted operation
* Slide 32: distinguish it from the last one, which has a result already registered (… is one bit enough??)
* Slide 33: check (twice!) that someone else has not completed the operation for you (… hmm, why does it help to read this twice?)
* Slide 34: apply pending operations (?? … is there code missing here??? It should include apply(announce, new_pqueue);)

Slide 35: Issues with the wait-free approach
* Aside from the bugs in this tech report version of the paper …
* Avoids starvation by having all processes attempt to complete any concurrent activity of all other processes
  o One will succeed and commit all results
  o All others will fail to commit their version, but should notice that their operation was completed by someone else
    - How do they pick up the result?
    - How is concurrency managed for the announce and result data structures?

Slide 36: Large objects
* Copying the entire object is too costly
* Programmers must construct logical versions
  o Memory is shared among versions
  o New memory is allocated for unique parts of new versions
  o Memory of old versions must be freed

Slide 37: Memory management for large objects
* Per-process pool of memory
  o 3 states: committed, allocated and freed
* Operations:
  o set_alloc moves a block from committed (freed?) to allocated and returns its address
  o set_free moves a block to freed
  o set_prepare marks blocks in allocated as consistent
  o set_commit sets committed to the union of freed and committed
  o set_abort sets freed and allocated to the empty set
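The pool operations can be pictured with three bit sets, one bit per block, as in the hedged sketch below. Two points go beyond what the slide says and are assumptions of this sketch: `set_commit` also empties freed and allocated, and `set_abort` returns the allocated blocks to committed so that they are not leaked. `set_prepare` is left out because it only touches the per-block check variables. The `pool_t` layout and the 32-block limit are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per block; a private, per-process pool needs no atomics. */
typedef struct {
    uint32_t committed;   /* blocks this process may allocate from */
    uint32_t allocated;   /* blocks used to build the tentative new version */
    uint32_t freed;       /* blocks of the old version released by this update */
} pool_t;

/* Moves one block from committed to allocated; returns its index, or -1. */
int set_alloc(pool_t *p)
{
    for (int b = 0; b < 32; b++) {
        uint32_t bit = UINT32_C(1) << b;
        if (p->committed & bit) {
            p->committed &= ~bit;
            p->allocated |= bit;
            return b;
        }
    }
    return -1;
}

/* Records that block b of the old version will be unused if we commit. */
void set_free(pool_t *p, int b)
{
    assert(b >= 0 && b < 32);
    p->freed |= UINT32_C(1) << b;
}

/* Update succeeded: freed blocks join the pool. Assumption: the allocated
 * blocks now belong to the shared object, so both working sets are emptied. */
void set_commit(pool_t *p)
{
    p->committed |= p->freed;
    p->freed = 0;
    p->allocated = 0;
}

/* Update failed. Assumption: allocated blocks go back to committed. */
void set_abort(pool_t *p)
{
    p->committed |= p->allocated;
    p->freed = 0;
    p->allocated = 0;
}
```

An update attempt would call `set_alloc` and `set_free` while building the new logical version, then `set_commit` if the store-conditional succeeds or `set_abort` if it fails.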

Slide 38: Memory management for large objects
* May require a global memory pool
* Problem:
  o How to prevent this from becoming a synchronization bottleneck?

Slide 39: ACM version – corrected simple algorithm

Slide 40: ACM version – corrected delay algorithm

Slide 41: ACM version – corrected apply function

Slide 42: ACM version – corrected wait-free algorithm