Practical concurrent algorithms Mihai Letia Concurrent Algorithms 2012 Distributed Programming Laboratory Slides by Aleksandar Dragojevic.

Slides:



Advertisements
Similar presentations
Chapter 22 Implementing lists: linked implementations.
Advertisements

Singly linked lists Doubly linked lists
COMP171 Fall 2005 Lists.
Mutual Exclusion By Shiran Mizrahi. Critical Section class Counter { private int value = 1; //counter starts at one public Counter(int c) { //constructor.
Maged M. Michael, “Hazard Pointers: Safe Memory Reclamation for Lock- Free Objects” Presentation Robert T. Bauer.
Scalable and Lock-Free Concurrent Dictionaries
Wait-Free Reference Counting and Memory Management Håkan Sundell, Ph.D.
Parallel Processing (CS526) Spring 2012(Week 6).  A parallel algorithm is a group of partitioned tasks that work with each other to solve a large problem.
Locality-Conscious Lock-Free Linked Lists Anastasia Braginsky & Erez Petrank 1.
Progress Guarantee for Parallel Programs via Bounded Lock-Freedom Erez Petrank – Technion Madanlal Musuvathi- Microsoft Bjarne Steensgaard - Microsoft.
Data Structures Hash Tables
Lock-free Cuckoo Hashing Nhan Nguyen & Philippas Tsigas ICDCS 2014 Distributed Computing and Systems Chalmers University of Technology Gothenburg, Sweden.
1 Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory.
Introduction to Lock-free Data-structures and algorithms Micah J Best May 14/09.
Computer Laboratory Practical non-blocking data structures Tim Harris Computer Laboratory.
CS510 Advanced OS Seminar Class 10 A Methodology for Implementing Highly Concurrent Data Objects by Maurice Herlihy.
CS510 Concurrent Systems Class 2 A Lock-Free Multiprocessor OS Kernel.
Tirgul 7. Find an efficient implementation of a dynamic collection of elements with unique keys Supported Operations: Insert, Search and Delete. The keys.
SUPPORTING LOCK-FREE COMPOSITION OF CONCURRENT DATA OBJECTS Daniel Cederman and Philippas Tsigas.
Maria-Cristina Marinescu Martin Rinard Laboratory for Computer Science Massachusetts Institute of Technology A Synthesis Algorithm for Modular Design of.
1 Lock-Free Linked Lists Using Compare-and-Swap by John Valois Speaker’s Name: Talk Title: Larry Bush.
Practical and Lock-Free Doubly Linked Lists Håkan Sundell Philippas Tsigas.
CMPSC 16 Problem Solving with Computers I Spring 2014 Instructor: Lucas Bang Lecture 15: Linked data structures.
Software Transactional Memory for Dynamic-Sized Data Structures Maurice Herlihy, Victor Luchangco, Mark Moir, William Scherer Presented by: Gokul Soundararajan.
CS510 Concurrent Systems Jonathan Walpole. A Lock-Free Multiprocessor OS Kernel.
November 15, 2007 A Java Implementation of a Lock- Free Concurrent Priority Queue Bart Verzijlenberg.
A Consistency Framework for Iteration Operations in Concurrent Data Structures Yiannis Nikolakopoulos A. Gidenstam M. Papatriantafilou P. Tsigas Distributed.
Challenges in Non-Blocking Synchronization Håkan Sundell, Ph.D. Guest seminar at Department of Computer Science, University of Tromsö, Norway, 8 Dec 2005.
Non-blocking Data Structures for High- Performance Computing Håkan Sundell, PhD.
Operating Systems ECE344 Ashvin Goel ECE University of Toronto Mutual Exclusion.
1 Chapter 9 Synchronization Algorithms and Concurrent Programming Gadi Taubenfeld © 2014 Synchronization Algorithms and Concurrent Programming Synchronization.
CSC 211 Data Structures Lecture 13
Maged M.Michael Michael L.Scott Department of Computer Science Univeristy of Rochester Presented by: Jun Miao.
Non-Blocking Concurrent Data Objects With Abstract Concurrency By Jack Pribble Based on, “A Methodology for Implementing Highly Concurrent Data Objects,”
Wait-Free Multi-Word Compare- And-Swap using Greedy Helping and Grabbing Håkan Sundell PDPTA 2009.
Tirgul 11 Notes Hash tables –reminder –examples –some new material.
A Simple Optimistic skip-list Algorithm Maurice Herlihy Brown University & Sun Microsystems Laboratories Yossi Lev Brown University & Sun Microsystems.
Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects Maged M. Michael Presented by Abdulai Sei.
Hashtables. An Abstract data type that supports the following operations: –Insert –Find –Remove Search trees can be used for the same operations but require.
CS510 Concurrent Systems Jonathan Walpole. A Methodology for Implementing Highly Concurrent Data Objects.
Software Transactional Memory Should Not Be Obstruction-Free Robert Ennals Presented by Abdulai Sei.
CS510 Concurrent Systems Jonathan Walpole. RCU Usage in Linux.
A Methodology for Implementing Highly Concurrent Data Objects by Maurice Herlihy Slides by Vincent Rayappa.
Chapter 5 Linked List by Before you learn Linked List 3 rd level of Data Structures Intermediate Level of Understanding for C++ Please.
November 27, 2007 Verification of a Concurrent Priority Queue Bart Verzijlenberg.
Distributed Algorithms (22903) Lecturer: Danny Hendler The wait-free hierarchy and the universality of consensus This presentation is based on the book.
2005MEE Software Engineering Lecture 7 –Stacks, Queues.
SkipLists and Balanced Search The Art Of MultiProcessor Programming Maurice Herlihy & Nir Shavit Chapter 14 Avi Kozokin.
Chapter 17: Linked Lists. Objectives In this chapter, you will: – Learn about linked lists – Learn the basic properties of linked lists – Explore insertion.
An algorithm of Lock-free extensible hash table Yi Feng.
Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects MAGED M. MICHAEL PRESENTED BY NURIT MOSCOVICI ADVANCED TOPICS IN CONCURRENT PROGRAMMING,
CS510 Concurrent Systems Tyler Fetters. A Methodology for Implementing Highly Concurrent Data Objects.
1 CSC 211 Data Structures Lecture 11 Dr. Iftikhar Azim Niaz 1.
Scalable lock-free Stack Algorithm Wael Yehia York University February 8, 2010.
1 Linked Lists Assignment What about assignment? –Suppose you have linked lists: List lst1, lst2; lst1.push_front( 35 ); lst1.push_front( 18 ); lst2.push_front(
Distributed Algorithms (22903) Lecturer: Danny Hendler Lock-free stack algorithms.
Hazard Pointers C++ Memory Ordering Issues
Håkan Sundell Philippas Tsigas
Concepts of programming languages
Practical Non-blocking Unordered Lists
Prof. Neary Adapted from slides by Dr. Katherine Gibson
Concurrent Data Structures Concurrent Algorithms 2017
CSE 2331/5331 Topic 8: Hash Tables CSE 2331/5331.
Linked List (Part I) Data structure.
[Chapter 4; Chapter 6, pp ] CSC 143 Linked Lists [Chapter 4; Chapter 6, pp ]
Multicore programming
A Concurrent Lock-Free Priority Queue for Multi-Thread Systems
Multicore programming
Linked Lists.
EECE.3220 Data Structures Instructor: Dr. Michael Geiger Spring 2019
Presentation transcript:

Practical concurrent algorithms Mihai Letia Concurrent Algorithms 2012 Distributed Programming Laboratory Slides by Aleksandar Dragojevic

Practical Course oriented towards theory Today: A glimpse of practice – Use practical operations – Implement practical data structure Linked list Give a feeling of “real-world” concurrent algorithm 2

Outline Compare-and-swap Practical system model Lock-freedom Linked-list 3

Outline Compare-and-swap Practical system model Lock-freedom Linked-list 4

Compare-and-swap (CAS) shared val = BOT; CAS(old, new): ret = val; if val == old: val = new; return ret; 5

Why is it important? We can implement consensus among any number of processes shared CAS cas; Consensus(val): res = cas.CAS(BOT, val); if res != BOT: val = res; return val; 6

Why is it important? (cont) Consensus allows processes to agree on something – Intuitively this is powerful CAS allows any number of processes to agree on something 7

Why is it important also? It is available in hardware – On (probably) all architectures It is a staple of concurrent programming – Linked lists – Queues – Etc. 8

Outline Compare-and-swap Practical system model Lock-freedom Linked-list 9

Practical system model So far: processes operate on shared objects – Object are linearizable – Sequential specification This lecture: processes operate on memory locations – Each location supports multiple instructions – Sequential specification 10

Practical system model (cont) Processes access memory by issuing atomic load, store and CAS instructions Processes can perform local computations on the loaded values Processes are equivalent to OS threads – All processes share the same address space 11

Practical system model (cont) Main memory … load() CAS(v 0, v 1 ) store(v 3 ) 12

CAS in practice w_t CAS(w_t *addr, w_t old, w_t new) { w_t curr = *addr; if(curr == old) *addr = new; return curr; } 13

From practical model to other model It is straightforward to use practical model as the model used in the course – For operations supported by practical model Each location is treated as an object Only restricted operations are allowed on each location – Register: load/store – CAS: CAS 14

From practical model to other model (cont) class CAS { w_t value = BOT; w_t CAS(w_t old, w_t new) { return CASInst(&value, old, new); } 15

CAS example Shared variable initialization We have a shared variable initialized to BOT Multiple processes want to initialize it to a value – Each process proposes its value (e.g. a problem it needs solved) Only one initialization should succeed – Processes need to agree on the initialized value 16

CAS example (cont) w_t *prob = BOT; process() { w_t *prop = init_problem(rnd()); w_t *res = CAS(&prob, BOT, prop); if(res != BOT) uninit_problem(prop); work_on(prob); } 17

Other available operations? Depends on the architecture Available on x86: – Test-and-set – Fetch-and-increment – Fetch-and-add 18

Outline Compare-and-swap Practical system model Lock-freedom Linked-list 19

Linked list A list of nodes that are linked using references Dynamic data structure – Nodes can be allocated and deallocated – No need to know maximum size upfront Base for other data structures – Queue – Hash table – Skip list 20

Linked list API struct Node { int key; int val; Node *next; }; bool Insert(Node *node); Node *Remove(int key); Node *Lookup(int key); 21

Linked list BOT Insert() 15 Remove(20) 22

Our goal today Implement the linked list in our model using load, store and CAS instructions 23

Progress We are not going to implement a wait-free linked list – We assumed wait-free implementations so far in the course We will implement a lock-free linked list – High performance Wait-free linked list is a hard problem – In next lectures 24

Outline Compare-and-swap Practical system model Lock-freedom Linked-list (we will return to this) 25

Lock-freedom If a process performs steps of an algorithm for sufficiently long time, some process will make progress What is difference to wait-freedom? 26

Difference to wait-freedom With wait-free algorithms, when a process makes steps it is guaranteed to make progress With lock-free algorithms, some process is guaranteed to make progress – But not necessarily the process performing the steps 27

Intuition for lock-freedom Lock-freedom guarantees overall progress – No guarantee of overall progress is weaker than lock-freedom (more on this in next lectures) Eliminates live-lock It is usually fine in practice – In practice the “winner” is likely to go do something else and permit the “looser” to perform the operation later 28

Which is better? Wait-freedom is a stronger property – It is harder to achieve in efficient way – General wait-free algorithms in next lectures Lock-freedom is easier to achieve in practice efficiently – It is more often used for that reason If we had equally efficient algorithms A lock-free and A wait-free, we would choose A wait-free 29

Lock-free and Linearizable Having lock-free algorithms doesn’t change the requirement for linearizable algorithms If the operation succeeds, there has to be a linearization point Lock-freedom just means that the operation will not necessarily succeed 30

Lock-free strong counter w_t count = 0; w_t increment() { while(true) { w_t val = count; // implied load if(CAS(&count, val, val + 1) == val) return val; } 31

Lock-free strong counter (cont) Simple algorithm Starvation is possible – A process could be prevented from returning a value by another process that always succeeds Live-lock is not possible – A process is prevented from returning by a successful process In practice, starvation is not probable 32

Lock-free strong counter (cont) count P0P0 P1P1 inc() load←0 cas←ok load←1 inc() cas←ok cas←fail inc() load←2 cas←ok cas←fail … CAS can continue to fail 33

Wait-free counter? General technique exists – More complex and less efficient – Next lecture Intuition – Processes have to help each other – A process that succeeds the CAS needs to help the other processes to finish their operations 34

Outline Compare-and-swap Practical system model Lock-freedom Linked-list 35

Our goal today Implement the lock-free linked list in our model using load, store and CAS instructions 36

Lock-freedom If a process performs steps of an algorithm for sufficiently long, some process will make progress 37

Linked list API struct Node { int key; int val; Node *next; }; bool Insert(Node *node); Node *Remove(int key); Node *Lookup(int key); 38

What should we do? BOT

Overview Fields key and val are not changed after the element has been inserted into the list We only need to change next field concurrently – During insert and remove – We need to “swing” it from one node to another Keep the list sorted 40

Lookup (version 1) BOT Lookup(20) BOT

Insert First lookup an element If the node with the same key exists, return false If not, we found the position for element insertion Locally prepare node’s next field Use CAS to make sure previous node hasn’t changed and to update it to the new node 42

Insert (version 1) BOT Insert(25) 25 Locally! 22 What if another node was inserted? Use CAS to check the previous node is the same and to update it. If CAS fails, redo the search. If it succeeds, the node was inserted. CAS

Remove First lookup an element If node with the key doesn’t exist, return false If it does, we found the node to remove and the previous node Use CAS to make sure previous node hasn’t changed and to remove the found node 44

Remove (version 1) BOT Remove(20) Lookup finds node As before, use CAS to unlink the node from list CAS 25 What if another node was inserted in the meantime after node 20? Then we accidentally removed that node too!

Is Remove (version 1) correct then? It is not linerizable (lost operation) Which means that it is not correct What should we do about it? 46

Why is there a problem? We update the previous node’s next field and make it point to what we saw was the next field of the removed node But that doesn’t mean this is still the next field of the removed node CAS doesn’t ensure the next field of the removed node is unchanged It ensures the next field of the previous node is unchanged (this is also necessary) 47

Why is there a problem? (cont) We need to ensure two locations remain the same while we update one of them CAS operates on a single location We need double-compare-single-swap operation to (simply) solve this problem – This is not available in hardware 48

How to solve the problem? Mark the removed nodes using a spare bit from the next field – There is a spare bit if we assume word-aligned data structures Marking of the node is the linearization point of the operation After the node is marked, try to unlink it too Other operations can help with the unlinking 49

Remove (version 2) BOT Remove(20) Lookup finds node CAS x First mark the node’s next pointer Then, try to unlink the node CAS What if unlink fails? That is ok if lookup ignores marked nodes

Lookup (version 2) Similar to version 1 Search ignores marked nodes It tries to unlink marked nodes 51

Insert (version 2) Similar to version 1 Search ignores marked nodes It tries to unlink marked nodes The rest is the same – No need to change the insert of the node – CAS fails if the node became marked since it’s been read 52

Pseudo code Node *search(int k, Node **left) { Node *left_next, *right; search_again: do { Node *t = head; Node *t_next = head.next; /* 1: Find left_node and right_node */ do { if(!is_marked(t_next)) { (*left) = t; left_next = t_next; } t = unmark(t_next); if(t == NULL) break; t_next = t.next; } while (is_marked(t_next) || (t.key<k)); right = t; /* 2: Check nodes are adjacent */ if(left_next == right) if((right != NULL) && is_marked(right.next)) goto search_again; else return right_node; /* 3: Remove one or more marked nodes */ if(CAS(&(left.next), left_next, right)) if((right != NULL) && is_marked(right.next)) goto search_again; else return right_node; } while (true); } 53

Pseudo code (cont) Node *Lookup(int key) { Node *right, *left; right = search(key, &left); if(right == NULL || right.key != key) return NULL; else return right; } 54

Pseudo code (cont) bool Insert(Node *node) { Node *right, *left; do { right = search(node.key, &left); if((right != NULL) && (right.key == key)) return false; node.next = right; if(CAS(&(left.next), right, node)) return true; } while (true); } 55

Pseudo code (cont) Node *Remove(int key) { Node *right, *right_next, *left; do { right = search(key, &left); if((right == NULL) || (right.key != key)) return NULL; right_next = right.next; if(!is_marked(right_next)) if(CAS(&(right.next), right_next, mark(right_next))) break; } while(true); if(!CAS(&(left.next), right, right_next)) right = search(right.key, &left); return right; } 56

Important assumption Removed nodes are not reused until all operations that are in progress are finished How to ensure this? – Garbage collector – Concurrent memory managers 57

References T. Harris “A Pragmatic Implementation of Non- Blocking Linked-Lists.” DISC M. M. Michael “Hazard Pointers: Safe Memory Reclamation for Lock-Free Objects.” IEEE Transactions on Parallel and Distributed Systems, June M. Herlihy, V. Luchangco, P. Martin, and M. Moir. “Non- blocking memory management support for dynamic-sized data structures.” ACM Transactions on Computer Systems, May