Copyright 2007 Sun Microsystems, Inc SNZI: Scalable Non-Zero Indicator Yossi Lev (Brown University & Sun Microsystems Laboratories) Joint work with: Faith.

Presentation transcript:

SNZI: Scalable Non-Zero Indicator
Yossi Lev (Brown University & Sun Microsystems Laboratories)
Joint work with: Faith Ellen (University of Toronto), Victor Luchangco and Mark Moir (Sun Microsystems Laboratories)
Copyright 2007 Sun Microsystems, Inc.

Presence Indicator
Threads Arrive at and Depart from a room
Query: Is there anybody in there?

Simple Solution: Counter
Threads Arrive at and Depart from a room
Query: Is there anybody in there?
Problem: Not Scalable

Simple Solution: Counter
Problem: Not Scalable
– Arrive/Depart nonscalable
– Query nonscalable
Observation: Counter semantics too strong
– Answers: how many threads are in the room
– All we asked: are there any threads in the room
Task: Exploit the weaker semantics to develop a Scalable Non-Zero Indicator (SNZI)

SNZI Specification
State:
– Surplus: a nonnegative integer
Operations:
– Arrive: increment surplus
– Depart: decrement surplus
– Query: return whether surplus ≠ 0
Well-Formedness: Surplus ≥ 0
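
The specification above can be written down as a small sequential reference model. This is only a sketch for checking behavior against the spec: the class name `SnziSpec` is hypothetical, and the model is deliberately not concurrent.

```python
class SnziSpec:
    """Sequential reference model of the SNZI specification (not thread-safe)."""

    def __init__(self):
        self.surplus = 0          # state: a nonnegative integer

    def arrive(self):
        self.surplus += 1

    def depart(self):
        # Well-formedness: surplus must stay >= 0, so a Depart needs a prior Arrive.
        if self.surplus == 0:
            raise ValueError("Depart without a matching Arrive")
        self.surplus -= 1

    def query(self):
        # Only reveals whether the surplus is nonzero, never its value.
        return self.surplus != 0


room = SnziSpec()
room.arrive()
room.arrive()
room.depart()
print(room.query())   # True: one thread is still in the room
```

Any concurrent implementation is expected to be linearizable with respect to this model.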

Solution's Requirements
Linearizable
Lock-free
Query reads a 1-bit indicator in a given word
– Update using LL/SC: captures modifications by the outside world (spurious failures)
– Scalability: minimize modifications to the indicator bit

Agenda
Two SNZI solutions
– Base solution: separate indicator and surplus data (Query scalability)
– Hierarchical solution: implement one SNZI using the other (Arrive/Depart scalability)
Applications
Performance

Separate Surplus and Indicator: Naïve Attempt
A simple counter and an indicator bit
Set or unset the bit after updating the counter
– 0 → 1 transition: set the bit
– 1 → 0 transition: unset the bit

Separate Surplus and Indicator: Naïve Attempt
What can go wrong?
– Delay in setting the bit causes unnoticed arrivals
– Delay in unsetting the bit causes obsolete writes
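
Both failure modes can be reproduced deterministically by splitting each operation into its two steps and interleaving them by hand. The sketch below is an illustration only: `NaiveIndicator` and the step methods are hypothetical names, and the "threads" A and B are simulated by the order of calls rather than by real threads.

```python
class NaiveIndicator:
    """Naive design: the counter and the indicator bit are updated separately."""

    def __init__(self):
        self.counter = 0
        self.indicator = False

    # Each operation is split in two so a schedule can pause a thread in between.
    def arrive_step1(self):
        self.counter += 1
        return self.counter == 1      # True if this was the 0 -> 1 transition

    def arrive_step2(self, was_first):
        if was_first:
            self.indicator = True

    def depart_step1(self):
        self.counter -= 1
        return self.counter == 0      # True if this was the 1 -> 0 transition

    def depart_step2(self, was_last):
        if was_last:
            self.indicator = False


def unnoticed_arrival():
    room = NaiveIndicator()
    room.arrive_step1()                       # A: 0 -> 1, then A is delayed
    b_first = room.arrive_step1()             # B: 1 -> 2, not a transition
    room.arrive_step2(b_first)                # B's Arrive completes, bit untouched
    return room.counter, room.indicator       # B is inside, yet the bit is unset


def obsolete_write():
    room = NaiveIndicator()
    room.arrive_step2(room.arrive_step1())    # A arrives completely
    a_last = room.depart_step1()              # A: 1 -> 0, delayed before unsetting
    room.arrive_step2(room.arrive_step1())    # B arrives and sets the bit
    room.depart_step2(a_last)                 # A wakes up and clears the bit
    return room.counter, room.indicator       # B is inside, yet the bit is unset


print(unnoticed_arrival())   # (2, False)
print(obsolete_write())      # (1, False)
```

In both schedules a Query issued at the end would answer "nobody is in the room" although B has finished arriving.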

Separate Surplus and Indicator: Our Solution
1. Add an "Announce" bit to the counter word. It says: "the indicator needs to be set"
– Set announce on the 0 → 1 transition
– Clear announce after setting the indicator
– "Help" set the indicator if the announce bit is set

[Animation frames of the preceding slide: the announce bit goes from false to true and back to false while one thread is stalled ("Zzzz…").]

Separate Surplus and Indicator: Our Solution
2. Prevent obsolete writes: unset the indicator using LL/SC
– Read the counter in between; unset only if it is still 0
Sequence: LL(indicator); read the counter; if still 0, SC(indicator, false)
Callouts: "Indicator wasn't written in between"; "Wasn't written before LL"
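
The two steps of the base solution (announce bit plus an LL/SC-guarded unset) can be sketched as follows. This is a minimal illustrative model, not the authors' code: the names `Cell` and `BaseSnzi` are hypothetical, the counter word layout `(counter, announce, version)` is an assumption, and CAS and LL/SC are emulated with a lock because Python offers neither. The sketch therefore shows the logic, not real lock-freedom.

```python
import threading


class Cell:
    """Shared word; CAS and LL/SC are emulated with a lock (an assumption)."""

    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value
        self._stamp = 0               # bumped on every write; SC checks it

    def read(self):
        with self._lock:
            return self._value

    def write(self, new):
        with self._lock:
            self._value, self._stamp = new, self._stamp + 1

    def cas(self, expected, new):
        with self._lock:
            if self._value != expected:
                return False
            self._value, self._stamp = new, self._stamp + 1
            return True

    def ll(self):
        with self._lock:
            return self._stamp

    def sc(self, stamp, new):
        with self._lock:
            if self._stamp != stamp:  # someone wrote since the matching LL
                return False
            self._value, self._stamp = new, self._stamp + 1
            return True


class BaseSnzi:
    def __init__(self):
        self.x = Cell((0, False, 0))  # (counter, announce bit, version)
        self.indicator = Cell(False)

    def query(self):
        return self.indicator.read()  # Query touches only the indicator

    def arrive(self):
        while True:
            old = self.x.read()
            c, announce, v = old
            # A 0 -> 1 transition sets announce and starts a new version.
            new = (1, True, v + 1) if c == 0 else (c + 1, announce, v)
            if self.x.cas(old, new):
                break
        if new[1]:
            # We made the 0 -> 1 transition, or we help the thread that did.
            self.indicator.write(True)
            self.x.cas(new, (new[0], False, new[2]))  # clear announce if unchanged

    def depart(self):
        while True:
            old = self.x.read()
            c, _, v = old
            if self.x.cas(old, (c - 1, False, v)):
                break
        if c >= 2:
            return                    # not a 1 -> 0 transition
        while True:
            stamp = self.indicator.ll()
            if self.x.read()[2] != v:
                return                # a newer 0 -> 1 happened; leave the bit alone
            if self.indicator.sc(stamp, False):
                return


room = BaseSnzi()
room.arrive()
room.arrive()
room.depart()
print(room.query())   # True: one thread remains
room.depart()
print(room.query())   # False
```

The version number lets a departing thread tell whether the counter it sees is still the one it drove to 0, which is what makes the conditional unset safe against a delayed, obsolete write.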

Hierarchical SNZI
Base SNZI took care of Query scalability
Hierarchical SNZI: Arrive/Depart scalability
– Implement SNZI using a parent SNZI
– Parent surplus > 0 iff a child surplus > 0
Arrange the solution in a tree
Arrive/Depart at the leaves, Query the root
[Diagram labels: Base SNZI, Hierarchical SNZI, Filter]

Hierarchical SNZI: The Basics
Invariant: Parent surplus > 0 iff child surplus > 0
Similar to base SNZI, use a counter
– 0 → 1 transition triggers Arrive at parent
– 1 → 0 transition triggers Depart at parent
– Help arriving at parent during the 0 → 1 transition
Unlike base SNZI
– Use an intermediate value ½: 0 → ½ → 1
– Parent is not a bit: it has a surplus
– Undo extra arrivals

Hierarchical SNZI: Walk-through
Invariant: Parent surplus > 0 iff child surplus > 0
Uncontended Arrive (counter 0, parent surplus 0):
– Increment counter: 0 → ½
– Arrive at parent (parent surplus becomes 1)
– Try the ½ → 1 transition (counter becomes 1)
Arrive with a helper (the first thread has set the counter to ½ and arrived at the parent; parent surplus 1):
– Helper reads the counter as ½ and helps: Arrive at parent (parent surplus becomes 2)
– Both threads try the ½ → 1 transition (counter becomes 1)
– Only one succeeds: single contribution to the parent per 0 → 1
– Helper updates the counter for itself
– The thread whose attempt failed (here, the first thread) undoes its extra arrival: Depart at parent (parent surplus back to 1)

Caveats
Counter must have a version number
– Avoids the ABA problem
– Also in the base algorithm
Undoing Arrives at the parent must happen after the executing Arrive has incremented the counter
– A helper defers undoing an Arrive operation until after incrementing the counter for itself
– Otherwise the indicator might "flicker"
Detailed scenarios are in the paper
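
The hierarchical node described on the preceding slides (intermediate value ½, helping, deferred undo, versioned counter) can be sketched as below. This is an illustrative model under stated assumptions rather than the authors' code: `Cell`, `CounterParent`, and `HierarchicalSnzi` are hypothetical names, CAS is emulated with a lock, and the parent is a trivial locked counter standing in for any object that offers Arrive, Depart, and Query (for example a base SNZI at the root).

```python
import threading

HALF = 0.5   # intermediate counter value between 0 and 1


class Cell:
    """Shared word with CAS emulated by a lock (an assumption of this sketch)."""

    def __init__(self, value):
        self._lock = threading.Lock()
        self._value = value

    def read(self):
        with self._lock:
            return self._value

    def cas(self, expected, new):
        with self._lock:
            if self._value != expected:
                return False
            self._value = new
            return True


class CounterParent:
    """Stand-in parent exposing arrive/depart/query."""

    def __init__(self):
        self._lock = threading.Lock()
        self.surplus = 0

    def arrive(self):
        with self._lock:
            self.surplus += 1

    def depart(self):
        with self._lock:
            self.surplus -= 1

    def query(self):
        with self._lock:
            return self.surplus > 0


class HierarchicalSnzi:
    def __init__(self, parent):
        self.parent = parent
        self.x = Cell((0, 0))             # (counter, version)

    def arrive(self):
        done = False
        undo = 0                          # extra parent arrivals to take back later
        while not done:
            old = self.x.read()
            c, v = old
            if c >= 1:
                done = self.x.cas(old, (c + 1, v))
            elif c == 0 and self.x.cas(old, (HALF, v + 1)):
                done = True               # our own arrival is now recorded as 1/2
                old = (HALF, v + 1)
                c, v = old
            if c == HALF:
                # Complete (or help complete) the 0 -> 1 transition.
                self.parent.arrive()
                if not self.x.cas(old, (1, v)):
                    undo += 1             # someone else finished it first
        # Deferred undo: only after our own arrival is counted.
        for _ in range(undo):
            self.parent.depart()

    def depart(self):
        while True:
            old = self.x.read()
            c, v = old
            if self.x.cas(old, (c - 1, v)):
                if c == 1:
                    self.parent.depart()  # 1 -> 0 transition
                return


root = CounterParent()
leaf = HierarchicalSnzi(root)
leaf.arrive()
leaf.arrive()
print(root.surplus)   # 1: two arrivals at the leaf cost one arrival at the parent
leaf.depart()
leaf.depart()
print(root.query())   # False
```

Only the first arrival and the last departure at a node reach its parent, which is where the Arrive/Depart scalability comes from.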

Applications
Hybrid Transactional Memory (ASPLOS 06)
– HW and SW transactions run concurrently
– HW transactions pay overhead for conflict detection with SW transactions
– Avoid the overhead if no SW transactions are running: are there any SW transactions out there?
– Query performance is important: it is executed by HW transactions

Read Indicators
STM: Read Ownership
Is any transaction reading this location?
In addition:
– Reset operation: all readers logically "disappear"
– New readers can arrive and depart before old readers have departed
– Indicator should work as if old readers are not there
– Needed because a writer invalidates old readers

SNZI-R
Added an Epoch to the indicator
– Indicator set iff someone in the room arrived in the current epoch
Operations:
– Reset starts a new epoch
– Arrive at the current epoch
– Depart at the epoch we arrived at
– Query returns the bit and the current epoch
[Animation: the epoch is 1975; a Reset advances it to 2007.]
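
The intended behavior of SNZI-R can be captured by a small sequential model. This is a sketch of the semantics only, not a concurrent implementation: the class name `SnziRModel`, the epoch token returned by `arrive`, and the choice to advance the epoch by one on `reset` are assumptions made for illustration.

```python
class SnziRModel:
    """Sequential model of SNZI-R semantics (not thread-safe)."""

    def __init__(self, epoch=0):
        self.epoch = epoch
        self.surplus = 0              # arrivals in the current epoch still in the room

    def arrive(self):
        self.surplus += 1
        return self.epoch             # token: the epoch we arrived at

    def depart(self, arrived_epoch):
        # A reader from an older epoch was already removed logically by Reset.
        if arrived_epoch == self.epoch:
            self.surplus -= 1

    def reset(self):
        # All current readers logically disappear; a new epoch starts.
        self.epoch += 1
        self.surplus = 0

    def query(self):
        return self.surplus != 0, self.epoch


readers = SnziRModel(epoch=1975)
old_reader = readers.arrive()
readers.reset()                       # a writer invalidates the old readers
print(readers.query())                # (False, 1976)
new_reader = readers.arrive()
readers.depart(old_reader)            # must not cancel the new reader
print(readers.query())                # (True, 1976)
```

Tagging each Depart with its arrival epoch is what lets new readers arrive and depart before the old ones have left.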

Evaluation
System: 48-processor Sun Fire™ 6800
Experiment:
– Visiting threads: keep arriving and departing
– Query thread: keeps querying the indicator
Various tree depths
Measured:
– Visit (Arrive + Depart) and Query throughput when varying the number of visiting threads
Compared with a simple counter implementation

Performance: Query Scalability

Performance: Visiting Scalability

Performance: SuperSNZI
SuperSNZI: a simple counter with a SNZI indicator
– Arrive by modifying the counter if not contended
– Use SNZI otherwise
– Depart accordingly

Conclusion
Presence indicator
– Can be implemented using a simple counter
– Counter semantics too strong → doesn't scale
SNZI: A Scalable Non-Zero Indicator
– Exploits the weaker semantics we need to provide
– Performs much better than a simple counter
– Useful in practice

Thank You!