Progress with Progress Guarantees Erez Petrank - Technion Based on joint work with Anastasia Braginsky, Alex Kogan, Madanlal Musuvathi, Filip Pizlo, and Bjarne Steensgaard.

Responsive Parallelism Develop responsive parallel systems by developing stronger algorithms and better system support. Main tool: progress guarantees. – Guaranteed responsiveness, good scalability, avoiding deadlocks and livelocks. Particularly important in several domains: – real-time systems – interactive systems (also OSs) – systems operating under a service-level agreement. Always good to have – if performance is not harmed too much.

A Progress Guarantee Intuitively: “No matter which interleaving is scheduled, my program will make progress.” “Progress” is something the developer defines. – Specific lines of code

Progress Guarantees Wait-Freedom: If you schedule enough steps of any thread, it will make progress. Great guarantee, but difficult to achieve. Obstruction-Freedom: If you let any thread run alone for enough steps, it will make progress. Somewhat weak. Lock-Freedom: If you schedule enough steps across all threads, one of them will make progress. The middle way?
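To make the lock-freedom definition concrete, here is a minimal sketch of a CAS retry loop in Java. The class name `CasCounter` is a hypothetical example and does not come from the talk. A thread's CAS fails only if another thread's CAS succeeded in between, so some thread always makes progress (lock-freedom), but a particular thread may retry an unbounded number of times (so the loop is not wait-free).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration class; not taken from the talk.
class CasCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    // Lock-free increment: a failed compareAndSet means some other thread
    // changed the value, i.e. the system as a whole made progress.
    // This thread, however, may have to retry arbitrarily often.
    int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    int get() {
        return value.get();
    }
}
```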

This Talk Advances with progress guarantees. – Some of it is work-in-progress. Attempts for lock-free garbage collection [PLDI 08, ISMM 07] Bounded lock-freedom and lock-free system services [PLDI 09] A lock-free locality-conscious linked-list [ICDCN 11] A Lock-free balanced tree [submitted] A wait-free queue [PPOPP 11] Making wait-freedom fast [submitted]

A Lock-Free Garbage Collector There is no such thing right now. A recent line of work on real-time GC allows moving objects concurrently (while the program accesses them). See Pizlo, Petrank, and Steensgaard [ISMM 2007, PLDI 2008]. What does it mean to “support” lock-freedom?

Services Supporting Lock-Freedom

Consider system services: event counters, memory management, micro-code interpreters, caching, paging, etc. Normally, designs of lock-free algorithms assume that the underlying system operations do not foil lock-freedom.

Can We Trust Services to Support Lock-Freedom? Valois’ lock-free linked-list algorithm has a well-known C++ implementation, which uses the C++ new operation, and new is not guaranteed to be lock-free. A hardware problem: lock-free algorithms typically use CAS or LL/SC, but LL/SC implementations are typically weak: spurious failures are permitted. Background threads increase the chaos (if syncing on shared variables). Conclusion: system support matters.

In the Paper Definition of a lock-free supporting service (including GC). Definition of a lock-free program that uses services. A composition theorem for the two. Also: bounded lock-freedom. Musuvathi, Petrank, and Steensgaard. Progress Guarantee via Bounded Lock-Freedom. [PLDI 2009]

Open Questions for Progress Guarantees Some important lock-free data structures are not known – Can we build a balanced tree? Wait-free data structures are difficult to design. – Can we design some basic ones? Wait-free implementations are slow. – Can we make them fast?

A Locality-Conscious Linked-List [ICDCN’11] A Lock-Free B-Tree [Submitted] Anastasia Braginsky & Erez Petrank

A B-Tree An important instance of balanced trees, suitable for file systems. Typically a large node with many children. Fast access to the leaves in a very short traversal. A node is split or merged when it becomes too dense or sparse.

Lock-Free Locality-Conscious Linked Lists List of constant size "chunks", with minimal and maximal bounds on the number of elements contained. Each chunk consists of a small linked list. When a chunk gets too sparse or dense, it is split or merged with its preceding chunk. Lock-free, locality-conscious, fast access, scalable.

Chunk Split or Merge A chunk that needs splitting or merging is frozen. – No operations are applied on it anymore. – New chunks (with the updated data) replace it.
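The freeze-and-replace idea can be sketched sequentially. The code below is a deliberately simplified, single-threaded illustration and is not the lock-free algorithm from the paper: it uses no CAS and no helping, and the names `ChunkSketch`, `MAX`, and `insert` are assumptions made for this example. It only shows that a full chunk is frozen and that new chunks carrying the updated data take its place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sequential sketch only (hypothetical names); the real structure is concurrent.
class ChunkSketch {
    static final int MAX = 4; // assumed maximal number of entries per chunk

    final TreeMap<Integer, String> entries = new TreeMap<>();
    boolean frozen = false;

    // Returns the chunk(s) that should stand in the list after this insert:
    // either this chunk itself, or two new chunks replacing a frozen one.
    List<ChunkSketch> insert(int key, String data) {
        List<ChunkSketch> result = new ArrayList<>();
        if (entries.size() < MAX || entries.containsKey(key)) {
            entries.put(key, data);
            result.add(this);
            return result;
        }
        // No space left: freeze this chunk; it accepts no further operations.
        frozen = true;
        TreeMap<Integer, String> all = new TreeMap<>(entries);
        all.put(key, data);
        // Distribute the old entries plus the new one over two fresh chunks.
        ChunkSketch low = new ChunkSketch();
        ChunkSketch high = new ChunkSketch();
        int half = (all.size() + 1) / 2;
        int i = 0;
        for (Map.Entry<Integer, String> e : all.entrySet()) {
            ChunkSketch target = (i < half) ? low : high;
            target.entries.put(e.getKey(), e.getValue());
            i++;
        }
        result.add(low);
        result.add(high);
        return result;
    }
}
```

With `MAX = 4`, inserting keys 3, 6, 9, 14 and then 12 freezes the original chunk and yields two replacements holding keys {3, 6, 9} and {12, 14}.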

A List Structure (2 Chunks) [Figure: HEAD → Chunk A → Chunk B → NULL. Each chunk has a NextChunk pointer and an EntriesHead pointer to its entries. Chunk A: (Key 3, Data G), (Key 14, Data K). Chunk B: (Key 25, Data A), (Key 67, Data D), (Key 89, Data M).]

When No More Space for Insertion [Figure: HEAD → Chunk A → Chunk B → NULL. Chunk A: (Key 3, Data G), (Key 6, Data B), (Key 9, Data C), (Key 14, Data K). Chunk B: (Key 25, Data A), (Key 67, Data D), (Key 89, Data M). The new entry (Key 12, Data H) finds no space in Chunk A, so Chunk A is marked Freeze.]

Split [Figure: the frozen Chunk A is replaced by two new chunks. Chunk C: (Key 3, Data G), (Key 6, Data B), (Key 9, Data C). Chunk D: (Key 12, Data H), (Key 14, Data K). Chunk B: (Key 25, Data A), (Key 67, Data D), (Key 89, Data M).]


When a Chunk Gets Sparse [Figure: Chunk C: (Key 3, Data G), (Key 6, Data B), (Key 9, Data C). Chunk D: (Key 14, Data K). Chunk B: (Key 25, Data A), (Key 67, Data D), (Key 89, Data M). Chunk D has become sparse; Chunks C and D are frozen for merging, one marked Freeze master and the other Freeze slave.]

Merge [Figure: the frozen Chunks C and D (Freeze master and Freeze slave) are replaced by a new Chunk E: (Key 3, Data G), (Key 6, Data B), (Key 9, Data C), (Key 14, Data K). Chunk B: (Key 25, Data A), (Key 67, Data D), (Key 89, Data M).]


Extending to a Lock-Free B-Tree Each node holds a chunk, handling splits and merges. Design simplified by a stringent methodology. A monotonic life span of a node (infant, normal, frozen, and reclaimed) reduces the variety of possible states. Care is needed when replacing old nodes with new ones (while data resides in both the old and the new node simultaneously). Care is also needed when searching for a neighboring node to merge with and when coordinating the merge.

Having built a lock-free B-Tree, let’s look at a … Wait-Free Queue Kogan and Petrank PPOPP’11

FIFO queues A fundamental and ubiquitous data structure. [Figure: a FIFO queue; labels: dequeue, 32, enqueue, 9.]

Existing wait-free queues There exist universal constructions – but they are too costly in time & space. There exist constructions with limited concurrency – [Lamport 83] one enqueuer and one dequeuer. – [David 04] multiple enqueuers, one concurrent dequeuer. – [Jayanti & Petrovic 05] multiple dequeuers, one concurrent enqueuer.

A Wait-Free Queue First wait-free queue (which is dynamic, parallel, and based on CASes). We extend the lock-free design of Michael & Scott. – part of Java Concurrency package “Wait-Free Queues With Multiple Enqueuers and Dequeuers”. Kogan & Petrank [PPOPP 2011]
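For orientation, the following is a minimal sketch of a Michael & Scott style lock-free queue, the lock-free baseline that the talk says the wait-free design extends. It is not the wait-free queue from the paper, and the class and method names (`MsQueueSketch`, `enqueue`, `dequeue`) are assumptions chosen for this illustration. Returning `null` from `dequeue` to signal an empty queue is likewise a simplification.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of a Michael & Scott style lock-free queue.
class MsQueueSketch<T> {
    private static final class Node<T> {
        final T value;
        final AtomicReference<Node<T>> next = new AtomicReference<>();
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> head;
    private final AtomicReference<Node<T>> tail;

    MsQueueSketch() {
        Node<T> sentinel = new Node<>(null); // dummy node; head always points to one
        head = new AtomicReference<>(sentinel);
        tail = new AtomicReference<>(sentinel);
    }

    void enqueue(T value) {
        Node<T> node = new Node<>(value);
        while (true) {
            Node<T> last = tail.get();
            Node<T> next = last.next.get();
            if (last != tail.get()) continue;          // tail moved; re-read
            if (next == null) {
                // Try to link the new node after the current last node.
                if (last.next.compareAndSet(null, node)) {
                    tail.compareAndSet(last, node);    // swing tail; failure is fine
                    return;
                }
            } else {
                tail.compareAndSet(last, next);        // help a lagging enqueuer
            }
        }
    }

    // Returns null when the queue is empty.
    T dequeue() {
        while (true) {
            Node<T> first = head.get();
            Node<T> last = tail.get();
            Node<T> next = first.next.get();
            if (first != head.get()) continue;         // head moved; re-read
            if (next == null) return null;             // only the sentinel is left
            if (first == last) {
                tail.compareAndSet(last, next);        // tail lags behind; help it
                continue;
            }
            T value = next.value;
            if (head.compareAndSet(first, next)) return value;
        }
    }
}
```

Both loops are lock-free but not wait-free: a thread whose CAS keeps failing can retry indefinitely, which is the gap a wait-free construction closes.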

Building a Wait-Free Data Structure Universal construction skeleton: – Publish an operation before starting execution. – Next, help delayed threads and then start executing the operation. – Eventually a stuck thread succeeds because all threads help it. Solve for each specific data structure: – The interaction between threads running the same operation. – In particular, apply each operation exactly once and obtain a consistent return code. The first makes it inefficient. The second makes it hard to design.

Results The naïve wait-free queue is 3x slower than the lock-free one. Optimizations can reduce the ratio to 2x. Can we eliminate the overhead for the wait-free guarantee?

Wait-free algorithms are typically slower than lock-free (but guarantee more). Can we eliminate the overhead completely? Kogan and Petrank [submitted]

Reducing the Help Overhead Standard method: Help all previously registered operations before executing own operation. But this goes over all threads. Alternative: help only one thread (cyclically). This is still wait-free! An operation can only be interrupted a limited number of times. Trade-off between efficiency and guarantee. But this is still costly!

Why is Wait-Free Slow? Because it needs to spend a lot of time on helping others. Main idea: typically help is not needed. – Ask for help when you need it; – Provide help infrequently. Teacher: why are you late for school again? Boy: I helped an old lady cross the street. Teacher: why did it take so long? Boy: because she didn’t want to cross!

Fast-Path Slow-Path When executing an operation, start by running the fast lock-free implementation. Upon several failures “switch” to the wait-free implementation. – Ask for help, – Keep trying. Once in a while, threads on the fast path check if help is needed and provide help. [Figure: the ability to switch between modes: fast path and slow path.]

[Flowchart: Start → Do I need to help? If yes: help someone. Then (or if no): apply my op using the fast path (at most n times). Success? If yes: return. If no: apply my op using the slow path, then return.]
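The control flow of the fast-path slow-path scheme can be sketched as follows. This is only a skeleton under stated assumptions: `FastSlowSketch`, `run`, and all parameter names are hypothetical, and the help check, the fast attempt, and the slow path are passed in as placeholders instead of being real lock-free and wait-free implementations.

```java
import java.util.function.BooleanSupplier;

// Hypothetical skeleton of the fast-path slow-path control flow.
class FastSlowSketch {
    // helpNeeded:   placeholder check for whether some thread asked for help
    // helpSomeone:  placeholder for helping one such thread
    // fastAttempt:  one try of the lock-free operation; true on success
    // slowPath:     placeholder for the wait-free operation (asks for help, keeps trying)
    // maxFastTries: the bound "n" on fast-path attempts
    static String run(BooleanSupplier helpNeeded, Runnable helpSomeone,
                      BooleanSupplier fastAttempt, Runnable slowPath,
                      int maxFastTries) {
        if (helpNeeded.getAsBoolean()) {
            helpSomeone.run();
        }
        for (int i = 0; i < maxFastTries; i++) {
            if (fastAttempt.getAsBoolean()) {
                return "fast"; // completed on the lock-free path
            }
        }
        slowPath.run();        // too many failures: switch to the wait-free path
        return "slow";
    }
}
```

In an actual implementation the help check would run only once in a while, as the talk notes, so that the common case pays almost nothing for the wait-free guarantee.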

Results for the Queue There is no observable overhead! Implication: it is possible to avoid starvation at no additional cost. – If you can create a fast path and a slow path and can switch between them.

Conclusion Some advances on lock-free garbage collection. Lock-free services, bounded lock-freedom. A few new important data structures with progress guarantees: – Lock-free chunked linked-list – Lock-free B-Tree – Wait-free Queue We have proposed a methodology to make wait-freedom as fast as lock-freedom.