A Methodology for Creating Fast Wait-Free Data Structures
Alex Kogan and Erez Petrank, Computer Science, Technion, Israel

Concurrency & (non-blocking) synchronization
- Concurrent data structures require fast and scalable synchronization
- Non-blocking synchronization: no thread is blocked waiting for another thread to complete
- No locks / critical sections

Lock-free (LF) algorithms
- Among all threads trying to apply operations on the data structure, at least one will succeed
- Opportunistic approach:
  - read some part of the data structure
  - make an attempt to apply an operation
  - on failure, retry
- Many scalable and efficient algorithms
- Global progress, but all threads except one may starve
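The opportunistic read-attempt-retry pattern can be sketched with a classic lock-free (Treiber) stack. This is an illustrative example, not code from the paper: each operation reads part of the structure, prepares its change, and attempts a single CAS, retrying on failure.

```java
import java.util.concurrent.atomic.AtomicReference;

// Illustrative lock-free (Treiber) stack: the read-attempt-retry pattern.
// A failed CAS means some other thread's operation succeeded, which is
// exactly the lock-free (global progress) guarantee described above.
class LockFreeStack<T> {
    private static final class Node<T> {
        final T value; final Node<T> next;
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }
    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    public void push(T v) {
        Node<T> node;
        do {
            node = new Node<>(v, top.get());            // read current top
        } while (!top.compareAndSet(node.next, node));  // attempt; retry on failure
    }

    public T pop() {
        Node<T> cur;
        do {
            cur = top.get();
            if (cur == null) return null;               // empty stack
        } while (!top.compareAndSet(cur, cur.next));
        return cur.value;
    }
}
```

Note that a thread can fail its CAS indefinitely if other threads keep winning; that is the "all threads except one may starve" caveat.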

Wait-free (WF) algorithms
- A thread completes its operation in a bounded # of steps, regardless of what other threads are doing
- A particularly important property in several domains, e.g., real-time systems and operating systems
- Commonly regarded as too inefficient and complicated to design

The overhead of wait-freedom
- Much of the overhead is due to helping:
  - the key mechanism employed by most WF algorithms
  - controls the way threads help each other with their operations
- Can we eliminate the overhead?
- The goal: the average-case efficiency of lock-freedom with the worst-case bound of wait-freedom

Why is helping slow?
- A thread helps others immediately when it starts its operation
- All threads help others in exactly the same order
  - contention
  - redundant work
- Each operation has to be applied exactly once
  - usually results in a higher # of expensive atomic operations

                      Lock-free MS-queue   Wait-free KP-queue
                      (PODC 1996)          (PPoPP 2011)
  # CASs in enqueue          2                    3
  # CASs in dequeue          1                    4

Reducing the overhead of helping
Main observation:
- "Bad" cases happen, but are very rare
- Typically a thread can complete without any help, if only it had a chance to do that
Main ideas:
- Ask for help only when you really need it, i.e., after trying several times to apply the operation
- Help others only after giving them a chance to proceed on their own: delayed helping

Fast-path-slow-path methodology
- Start an operation by running its (customized) lock-free implementation (the fast path)
- Upon several failures, switch to a (customized) wait-free implementation (the slow path):
  - notify others that you need help
  - keep trying
- Once in a while, threads on the fast path check if their help is needed and provide it (delayed helping)

Fast-path-slow-path generic scheme

Start: do I need to help?
- yes: help someone first
- no: continue
Then apply my op using the fast path (at most N times).
- On success: return
- Otherwise: apply my op using the slow path (until success), then return

Different threads may run on the two paths concurrently!
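The control flow above can be sketched on a trivial counter. This is only a sketch of the scheme's shape: the synchronized fallback is a stand-in for a real customized wait-free path, and MAX_FAILURES is an assumed tuning value.

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the fast-path-slow-path control flow on a counter.
// Fast path: a bounded number of CAS attempts (customized lock-free code
// in the real methodology). Slow path: a fallback that always completes
// (here a synchronized block stands in for the customized wait-free code).
class FpSpCounter {
    static final int MAX_FAILURES = 8;   // assumed tuning value
    private final AtomicLong value = new AtomicLong();

    long increment() {
        for (int trials = 0; trials < MAX_FAILURES; trials++) {
            long cur = value.get();
            if (value.compareAndSet(cur, cur + 1)) return cur + 1;  // fast path
        }
        synchronized (this) {            // stand-in for the wait-free slow path
            return value.incrementAndGet();
        }
    }
}
```

Under low contention almost every operation finishes on the fast path, which is why the methodology can match lock-free average-case performance.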

Fast-path-slow-path: queue example
- Fast path: MS-queue
- Slow path: KP-queue

Fast-path-slow-path: queue example
Internal structures

state: an array indexed by thread ID; each entry is a record describing that thread's slow-path operation, with four fields:
- phase: counts # ops on the slow path
- pending: is there a pending operation on the slow path?
- enqueue: what is the pending operation (enqueue or dequeue)?
- node: the node involved in the operation
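The per-thread record in the state array can be sketched as a small immutable class. The field names follow the slide; the nested Node is a hypothetical queue-node type standing in for the real one.

```java
// Sketch of the operation descriptor stored per thread in the state array.
// Immutable so that a new descriptor can be swapped in with a single CAS.
class OpDesc<T> {
    final long phase;        // counts # ops on the slow path
    final boolean pending;   // is there a pending operation on the slow path?
    final boolean enqueue;   // true = enqueue, false = dequeue
    final Node<T> node;      // node involved in the op (null for a dequeue)

    OpDesc(long phase, boolean pending, boolean enqueue, Node<T> node) {
        this.phase = phase; this.pending = pending;
        this.enqueue = enqueue; this.node = node;
    }

    // Hypothetical queue node, standing in for the real queue's node type.
    static final class Node<T> {
        final T value;
        Node(T value) { this.value = value; }
    }
}
```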

Fast-path-slow-path: queue example
Internal structures (cont.)

helpRecords: a per-thread record with three fields:
- curTid: ID of the next thread that I will try to help
- lastPhase: phase # of that thread at the time the record was created
- nextCheck: decremented with every one of my operations; when this counter reaches 0, check if my help is needed (HELPING_DELAY controls the frequency of helping checks)
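The help record can likewise be sketched directly from the three fields named above. HELPING_DELAY is an assumed tuning constant here.

```java
// Sketch of the per-thread help record that implements delayed helping.
// Each thread keeps one of these privately, so no synchronization is needed.
class HelpRecord {
    static final int HELPING_DELAY = 16;  // assumed tuning value

    int curTid;      // ID of the next thread that I will try to help
    long lastPhase;  // that thread's phase when this record was created
    long nextCheck;  // decremented each op; help-check fires when it hits 0

    void reset(int tid, long phase) {
        curTid = tid;
        lastPhase = phase;
        nextCheck = HELPING_DELAY;
    }

    // Called once per operation; true when it is time to check whether
    // the watched thread is still stuck on the same phase and needs help.
    boolean timeToCheck() { return --nextCheck == 0; }
}
```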

Fast-path-slow-path: queue example (fast path)

1. help_if_needed()
2. int trials = 0
   while (trials++ < MAX_FAILURES) {
       apply_op_with_customized_LF_alg()   // finish if succeeded
   }
3. switch to the slow path

- LF algorithm customization is required to synchronize operations running on the two paths
- MAX_FAILURES controls the number of trials on the fast path

Fast-path-slow-path: queue example (slow path)

1. my phase <- a fresh (maximal) phase number
2. announce my operation (in state)
3. apply_op_with_customized_WF_alg()   // until finished

- WF algorithm customization is required to synchronize operations running on the two paths
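The slow-path entry steps above can be sketched as follows. This is a single-threaded sketch of the announce-then-apply shape only: applyWaitFree is a hypothetical stand-in for the customized KP-queue code, and the plain maxPhase counter would need atomic treatment in a real concurrent version.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

// Sketch of the slow-path entry: pick a fresh phase, publish a descriptor
// in the state array, then keep applying until the op is marked not pending.
class SlowPathSketch {
    static final int NUM_THREADS = 4;

    static class OpDesc {
        final long phase; final boolean pending;
        OpDesc(long phase, boolean pending) { this.phase = phase; this.pending = pending; }
    }

    final AtomicReferenceArray<OpDesc> state = new AtomicReferenceArray<>(NUM_THREADS);
    long maxPhase = 0;  // would be maintained atomically in a real version

    void slowPathEnqueue(int tid) {
        long phase = ++maxPhase;                    // 1. fresh (maximal) phase number
        state.set(tid, new OpDesc(phase, true));    // 2. announce the operation
        while (state.get(tid).pending) {            // 3. apply until finished
            applyWaitFree(tid);
        }
    }

    // Stand-in for the customized wait-free algorithm: in the real code this
    // performs the queue operation; here it just marks the op as completed.
    void applyWaitFree(int tid) {
        OpDesc d = state.get(tid);
        state.compareAndSet(tid, d, new OpDesc(d.phase, false));
    }
}
```

The announced descriptor is what fast-path threads inspect during their periodic help checks, so either the owner or a helper completes the operation.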

Performance evaluation
- 32-core Ubuntu server with OpenJDK 1.6
- quad-core AMD 8356 processors
- The queue is initially empty
- Enqueue-Dequeue benchmark: each thread iteratively (100k times) performs an enqueue and then a dequeue
- Measure completion time as a function of # threads
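The Enqueue-Dequeue benchmark described above can be sketched as follows. ConcurrentLinkedQueue is only a stand-in for the queues under test, and the thread/iteration counts are illustrative.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of the enqueue-dequeue benchmark: n threads each perform `iters`
// enqueue-then-dequeue pairs on a shared queue; completion time is measured.
class EnqDeqBenchmark {
    static long run(int nThreads, int iters) {
        Queue<Integer> q = new ConcurrentLinkedQueue<>();  // stand-in queue
        Thread[] ts = new Thread[nThreads];
        long start = System.nanoTime();
        for (int t = 0; t < nThreads; t++) {
            ts[t] = new Thread(() -> {
                for (int i = 0; i < iters; i++) {
                    q.add(i);   // enqueue ...
                    q.poll();   // ... then dequeue
                }
            });
            ts[t].start();
        }
        for (Thread t : ts) {
            try { t.join(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
        return System.nanoTime() - start;  // completion time in nanoseconds
    }
}
```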

Performance evaluation [results chart]

Performance evaluation [results chart; curves for different MAX_FAILURES and HELPING_DELAY settings]

Performance evaluation [results chart]

The impact of configuration parameters [chart: varying MAX_FAILURES and HELPING_DELAY]

The use of the slow path [chart: varying MAX_FAILURES and HELPING_DELAY]

Tuning the performance parameters
- Why not just always use large values for both parameters (MAX_FAILURES, HELPING_DELAY)? That would (almost) always eliminate the slow path.
- Lemma: the number of steps required for a thread to complete an operation on the queue in the worst case is O(MAX_FAILURES + HELPING_DELAY * n²)
- So there is a tradeoff between average-case performance and the worst-case completion-time bound

Summary
- A novel methodology for creating fast wait-free data structures
  - key ideas: two execution paths + delayed helping
  - good performance when the fast path is extensively utilized
  - concurrent operations can proceed on both paths in parallel
- Can be used in other scenarios, e.g., running real-time and non-real-time threads side by side

Thank you! Questions?