6.852 Lecture 21 ● Techniques for highly concurrent objects – coarse-grained mutual exclusion – read/write locking – fine-grained locking (mutex and read/write)

Slides:

Advertisements

Similar presentations

5.1 Silberschatz, Galvin and Gagne ©2009 Operating System Concepts with Java – 8 th Edition Chapter 5: CPU Scheduling.

Advertisements

Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 6: Process Synchronization.

Multiprocessor Synchronization Algorithms ( ) Lecturer: Danny Hendler The Mutual Exclusion problem.

Scalable and Lock-Free Concurrent Dictionaries

Linked Lists: Locking, Lock-Free, and Beyond … The Art of Multiprocessor Programming Spring 2007.

Linked Lists: Lazy and Non-Blocking Synchronization Based on the companion slides for The Art of Multiprocessor Programming (Maurice Herlihy & Nir Shavit)

Progress Guarantee for Parallel Programs via Bounded Lock-Freedom Erez Petrank – Technion Madanlal Musuvathi- Microsoft Bjarne Steensgaard - Microsoft.

Lock-free Cuckoo Hashing Nhan Nguyen & Philippas Tsigas ICDCS 2014 Distributed Computing and Systems Chalmers University of Technology Gothenburg, Sweden.

Introduction to Lock-free Data-structures and algorithms Micah J Best May 14/09.

CS510 Advanced OS Seminar Class 10 A Methodology for Implementing Highly Concurrent Data Objects by Maurice Herlihy.

CS510 Concurrent Systems Class 2 A Lock-Free Multiprocessor OS Kernel.

Linked Lists: Locking, Lock- Free, and Beyond … Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Skip Lists.

Simple, Fast and Practical Non- Blocking and Blocking Concurrent Queue Algorithms Maged M. Michael & Michael L. Scott Presented by Ahmed Badran.

1 Lock-Free Linked Lists Using Compare-and-Swap by John Valois Speaker’s Name: Talk Title: Larry Bush.

1 Thread Synchronization: Too Much Milk. 2 Implementing Critical Sections in Software Hard The following example will demonstrate the difficulty of providing.

CS4231 Parallel and Distributed Algorithms AY 2006/2007 Semester 2 Lecture 2 (19/01/2006) Instructor: Haifeng YU.

Simple Wait-Free Snapshots for Real-Time Systems with Sporadic Tasks Håkan Sundell Philippas Tsigas.

Software Transactional Memory for Dynamic-Sized Data Structures Maurice Herlihy, Victor Luchangco, Mark Moir, William Scherer Presented by: Gokul Soundararajan.

CS510 Concurrent Systems Jonathan Walpole. A Lock-Free Multiprocessor OS Kernel.

6.3 Peterson’s Solution The two processes share two variables: Int turn; Boolean flag[2] The variable turn indicates whose turn it is to enter the critical.

4061 Session 23 (4/10). Today Reader/Writer Locks and Semaphores Lock Files.

Linked Lists: Locking, Lock- Free, and Beyond … Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Optimistic Design 1. Guarded Methods Do something based on the fact that one or more objects have particular states  Make a set of purchases assuming.

State Teleportation How Hardware Transactional Memory can Improve Legacy Data Structures Maurice Herlihy and Eli Wald Brown University.

Internet Software Development Controlling Threads Paul J Krause.

Kernel Locking Techniques by Robert Love presented by Scott Price.

Linked Lists: Optimistic, Lock-Free, … Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Modified by Pavol Černý,

Linked Lists: Locking, Lock- Free, and Beyond … Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Modified by.

Concurrent Linked Lists and Linearizability Proofs Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Modified.

DOUBLE INSTANCE LOCKING A concurrency pattern with Lock-Free read operations Pedro Ramalhete Andreia Correia November 2013.

Wait-Free Multi-Word Compare- And-Swap using Greedy Helping and Grabbing Håkan Sundell PDPTA 2009.

Advanced Locking Techniques

Practical concurrent algorithms Mihai Letia Concurrent Algorithms 2012 Distributed Programming Laboratory Slides by Aleksandar Dragojevic.

A Simple Optimistic skip-list Algorithm Maurice Herlihy Brown University & Sun Microsystems Laboratories Yossi Lev Brown University & Sun Microsystems.

Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects Maged M. Michael Presented by Abdulai Sei.

Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Lecture 11: Synchronization (Chapter 6, cont)

Software Transactional Memory Should Not Be Obstruction-Free Robert Ennals Presented by Abdulai Sei.

1 Condition Variables CS 241 Prof. Brighten Godfrey March 16, 2012 University of Illinois.

Gal Milman Based on Chapter 10 (Concurrent Queues and the ABA Problem) in The Art of Multiprocessor Programming by Herlihy and Shavit Seminar 2 (236802)

Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Skip Lists.

Linked Lists: Locking, Lock-Free, and Beyond … Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

SkipLists and Balanced Search The Art Of MultiProcessor Programming Maurice Herlihy & Nir Shavit Chapter 14 Avi Kozokin.

Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects MAGED M. MICHAEL PRESENTED BY NURIT MOSCOVICI ADVANCED TOPICS IN CONCURRENT PROGRAMMING,

Linked Lists: Locking, Lock- Free, and Beyond … Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Linked Lists: Locking, Lock- Free, and Beyond … Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Homework-6 Questions : 2,10,15,22.

Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit Concurrent Skip Lists.

Lecture 9 : Concurrent Data Structures Companion slides for The Art of Multiprocessor Programming by Maurice Herlihy & Nir Shavit.

Maurice Herlihy, Victor Luchangco, Mark Moir, William N. Scherer III

Linked Lists: Locking, Lock-Free, and Beyond …

Lock Free Linked List and Skip List

Lecture 6-1 : Concurrent Data Structures (Concurrent Linked Lists)

Linked Lists: Locking, Lock-Free, and Beyond …

Lock-Free Linked Lists Using Compare-and-Swap

CO890 CONCURRENCY & PARALLELISM

Linked Lists: Locking, Lock-Free, and Beyond …

Concurrent Data Structures Concurrent Algorithms 2016

Concurrent Data Structures Concurrent Algorithms 2017

CSCE 668 DISTRIBUTED ALGORITHMS AND SYSTEMS

Designing Parallel Algorithms (Synchronization)

Linked Lists: Locking, Lock-Free, and Beyond …

Linked Lists: Locking, Lock-Free, and Beyond …

Multicore programming

A Concurrent Lock-Free Priority Queue for Multi-Thread Systems

Kernel Synchronization II

Multicore programming

Lecture 1: Introduction

Linked Lists: Locking, Lock-Free, and Beyond …

Process/Thread Synchronization (Part 2)

Presentation transcript:

6.852 Lecture 21 ● Techniques for highly concurrent objects – coarse-grained mutual exclusion – read/write locking – fine-grained locking (mutex and read/write) – optimistic locking – lock-free/nonblocking algorithms – “lazy” synchronization – illustrate on list-based sets, apply to other data structures ● Reading: – Herlihy-Shavit Chapter 8 (Chapter 9 in draft version)

Shared-memory algorithms ● Object-oriented pseudocode – at most one memory access per atomic step – memory management: allocation and garbage collection ● Synchronization primitives – compare-and-swap (CAS) – load-linked/store-conditional (LL/SC) – assume lock and unlock methods for every object

Shared-memory algorithms ● Atomic (aka linearizable) objects ● Dominant technique: lock-based implementations ● No fault-tolerance (i.e., assume no failures) – not even always guaranteed failure-free termination ● Progress properties – deadlock-freedom, lockout-freedom (aka starvation-freedom) – nonblocking conditions: lock-freedom, wait-freedom ● Performance – worst-case (time bounds) vs. average case (throughput) – no good formal models

List-based sets ● Data type: set of integers (no duplicates) – S.add(x): Boolean: S := S  {x}; return true iff x not already in S – S.remove(x): Boolean: S := S \ {x}; return true iff x in S initially – S.contains(x): Boolean: return true iff x in S (no change to S) ● Simple ordered linked-list-based implementation – illustrate techniques useful for pointer-based data structures ● poor data structure for this specific data type -∞-∞ hea d 149 ∞

Sequential list-based set ∞∞ 49 ∞ hea d add(3 ) 3 ∞∞ 49 ∞ hea d remove(4 ) 11

Sequential list-based set S.add(x) pred := S.head curr := pred.next while (curr.key < x) pred := curr curr := pred.next if curr.key = x then return false else node = new Node(x) node.next = curr pred.next = node return true S.remove(x) pred := S.head curr := pred.next while (curr.key < x) pred := curr curr := pred.next if curr.key = x then pred.next = curr.next return true else return false S.contains(x) curr := S.head while (curr.key < x) curr := curr.next if curr.key = x then return true else return false

Sequential list-based set S.remove(x) pred := S.head curr := pred.next while (curr.key < x) pred := curr curr := pred.next if curr.key = x then pred.next = curr.next return true else return false ∞∞ 49 ∞ hea d 1 pre d cur r remove(4 )

Allowing concurrent access ● Is this algorithm “thread-safe”? ● What can go wrong? ● Can we “fix” it? ● How?

Concurrent operations (bad) S.remove(x) pred := S.head curr := pred.next while (curr.key < x) pred := curr curr := pred.next if curr.key = x then pred.next = curr.next return true else return false ∞∞ 9 ∞ hea d 1 pre d cur r remove(4 ) pre d cur r remove(9 ) 4

Coarse-grained locking S.add(x) S.lock() pred := S.head curr := pred.next while (curr.key < x) pred := curr curr := pred.next if curr.key = x then S.unlock() return false else node = new Node(x) node.next = curr pred.next = node S.unlock() return true S.remove(x) S.lock() pred := S.head curr := pred.next while (curr.key < x) pred := curr curr := pred.next if curr.key = x then pred.next = curr.next S.unlock() return true else S.unlock() return false S.contains(x) S.lock() curr := S.head while (curr.key < x) curr := curr.next S.unlock() if curr.key = x then return true else return false Why does this work? (cf. RMWfromRW algorithm) What progress guarantees do we get? Why can we unlock early here?

Coarse-grained locking ∞∞ 49 ∞ hea d 1 pre d cur r remove(4 )

Coarse-grained locking ● Easy – to write – to prove correct ● No fault-tolerance – but it is deadlock-free! – if we use queue locks, it's lockout-free ● Poor performance when contention is high – essentially no concurrent access – but often good enough for low contention For many applications, this is the best solution! (Don't underrate simplicity.)

Coarse-grained locking ∞∞ 49 ∞ hea d 1 pre d cur r remove(4 ) remove(9 ) add(6 ) contains(4 ) add(3 )

Improving coarse-grained locking ● Reader/writer locks – allow multiple readers to hold lock simultaneously – writers can easily starve ● introduce “waiting” bit to avoid this – contains takes only read lock ● can be big win if contains is the most common operation – what about add or remove that returns false? ● upgrading

Fine-grained locking ● associate locks with smaller pieces of data – methods that work on disjoint pieces can proceed concurrently ● simple to prove atomicity if locking is “two-phase” – first acquire locks, then release (no acquire after any release) ● typically release at the end of operation: strict two-phase locking ● can be expensive to acquire all the locks ● must be careful to avoid deadlock – typically acquire locks in some predetermined order ● naive two-phase application doesn't help (why not?) – it does with reader/writer locks, but tricky to avoid deadlock

Hand-over-hand locking ● Fine-grained locking, but not “two-phase” – atomicity doesn't follow from general rule; a bit tricky to prove ● Hold at most two locks at a time – acquire lock for successor before releasing lock for predecessor ∞∞ 49 ∞ hea d 1 pre d cur r remove(4 )

Hand-over-hand locking ● Must we lock the successor of a node we are trying to add? – we don't need to lock to read the key (why not?) ● Must we lock a node we are trying to remove?

Removing a Node abc d remove(b)

Removing a Node abc d remove(b)

Removing a Node abc d remove(c) remove(b)

Removing a Node abc d remove(b) remove(c)

Removing a Node abc d remove(b) remove(c)

Removing a Node abc d remove(b) remove(c)

Removing a Node abc d remove(b) remove(c)

Removing a Node abc d remove(b) remove(c)

Removing a Node abc d remove(b) remove(c)

Removing a Node abc d remove(b) remove(c)

Uh, Oh ac d remove(b) remove(c) b

Uh, Oh ac d remove(b) remove(c) b

Hand-over-hand locking ● Problems – must acquire O(k) locks, where k = |S| – threads can get stuck behind a slow thread ● can avoid this by using reader/writer locks, but then must do something to avoid deadlock ● Idea: What if we find the nodes first without locking, and then lock only the nodes we need? – must ensure that the node we modify is still in list – optimistic locking

Optimistic locking ● Search down list without locking ● Find and lock appropriate nodes ● Verify that nodes are still adjacent and in list (validation) – we can do this by traversing list again (provided that nodes are not removed from list while they are locked) ● Better than hand-over-hand if – traversing twice without locking is cheaper than once with locking ● traversal is wait-free! (we'll come back to this) – validation typically succeeds

Optimistic locking b d e a add(c) Aha!

Optimistic locking b d e a add(c)

What can go wrong? (part 1) b d e a add(c)

What can go wrong? (part 1) b d e a add(c) remove(b)

What can go wrong? (part 1) b d e a add(c)

What can go wrong? (part 2) b d e a add(c)

What can go wrong? (part 2) b d e a add(c) add(b’) b’

What can go wrong? (part 2) b d e a add(c) b’

Validate (part 1) b d e a add(c) Yes, b still reachable from head

Validate (part 2) b d e a add(c) Yes, b still points to d

Optimistic locking b d e a add(c) c

Lock-freedom ● Even without failures, locks can cause problems: – some operations take 1000x (or more) longer than others, nondeterministically due to page faults, descheduling, etc. – if this happens to anyone in their critical section, everyone else who wants to access that lock must wait ● What about lock-free algorithms? – if any thread executing a method does not fail then some method completes. – weaker than wait-free: starvation is possible – but rules out a delayed thread from blocking other threads indefinitely, and thus, no locks

Lock-free list-based set ● Idea: Use CAS to change next pointer – make sure next pointer hasn't changed since you read it ● assumes nodes aren't reused – possible because operations only change one pointer – but still nontrivial

Adding a Node acd

acdb

acdb CAS

Adding a Node acdb

Removing a Node abc d remove b remove c

Removing a Node abc d remove b remove c CAS

Removing a Node abc d remove b remove c CAS

Bad news Look Familiar? abc d remove b remove c

Lock-free list-based set ● Idea: Add “mark” to a node to indicate whether its key been removed from the set. – set mark before removing node from list ● thus, if mark is not set, node is in the list – setting the mark removes key from the set ● it is the serialization point of a successful remove operation – don't change next pointer of a marked node ● mark and next pointer must be in the same word – “steal” a low-order bit from pointers – Java provides special class: AtomicMarkableReference

Lock-free list-based set ● Traverse the list to find appropriate nodes – what if we encounter marked nodes? ● If nodes are unmarked then operate as follows: – for contains(x) or unsuccessful add/remove(x), return appropriate value based on whether curr.key = x – for successful add(x), CAS pred.next+mark to (node, false) – for successful remove(x), ● CAS curr.next+mark to (curr.next, true) [logical removal] ● CAS pred.next+mark to (curr.next, false) [“physical” removal] – if (first) CAS fails, retry operation

Lock-free list-based set ● What if we encounter marked nodes? – HELP! – if curr is marked, CAS pred.next+mark to (curr.next, false) – if CAS fails, retry operation ● This kind of helping is characteristic of lock-free and wait-free algorithms (not all have it, but most do). – next lecture, we'll see obstruction-freedom, a weaker condition that doesn't typically require helping.

Removing a Node abc d remove c CAS

Removing a Node ab d remove b remove c c CAS failed

Removing a Node ab d remove b remove c c

Removing a Node a d remove b remove c

Next time ● Transactional memory ● Reading: – Herlihy, Luchangco, Moir, Scherer paper – Dice, Shalev, Shavit paper