Nonblocking Transactions Without Indirection Using Alert-on-Update
Michael Spear, Arrvindh Shriraman, Luke Dalessandro, Sandhya Dwarkadas, Michael Scott
University of Rochester

Software Transactional Memory
Memory transactions
– Code regions identified by the programmer
– Guaranteed to be atomic, consistent, and isolated
– An alternative to locks
Speculative parallelism
Under the hood:
– Rollback/retry mechanism
– Frequent checks ensure consistency of reads

Simple 2-phase locking STM:
Attach version# to every location
To read: remember {location, version#}
To write: store in private buffer
To commit:
1. Lock all write locations
2. Check version#s of reads; abort/retry on conflict
3. Replay writes from private buffer
4. Release locks, update version#s
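The 2-phase locking algorithm above can be sketched in Python as follows. This is a minimal, hypothetical illustration: the names `Location`, `Transaction`, `Abort`, and `atomic` are invented for the sketch, and it uses one `threading.Lock` per location with try-lock acquisition (abort rather than block) instead of any particular runtime's lock words.

```python
import threading


class Abort(Exception):
    """Raised when a transaction detects a conflict and must retry."""


class Location:
    """A shared word with a version number and a per-location lock."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self.lock = threading.Lock()


class Transaction:
    def __init__(self):
        self.reads = {}   # Location -> version# seen at first read
        self.writes = {}  # Location -> speculative value (private redo buffer)

    def read(self, loc):
        if loc in self.writes:                 # read-after-write sees our buffer
            return self.writes[loc]
        ver = loc.version
        if loc.lock.locked():                  # a committer is replaying into loc
            raise Abort()
        val = loc.value
        if loc.lock.locked() or loc.version != ver:
            raise Abort()                      # value may be inconsistent; retry
        if self.reads.setdefault(loc, ver) != ver:
            raise Abort()                      # loc changed since our first read
        return val

    def write(self, loc, value):
        self.writes[loc] = value               # buffered until commit

    def commit(self):
        acquired = []
        try:
            # 1. Lock all write locations (try-lock: abort instead of blocking).
            for loc in self.writes:
                if not loc.lock.acquire(blocking=False):
                    raise Abort()
                acquired.append(loc)
            # 2. Check version#s of reads; abort on conflict.
            for loc, ver in self.reads.items():
                if loc.version != ver or (loc.lock.locked() and loc not in self.writes):
                    raise Abort()
            # 3. Replay writes from the private buffer and bump version#s.
            for loc, value in self.writes.items():
                loc.value = value
                loc.version += 1
        finally:
            # 4. Release locks (version#s were updated while still locked).
            for loc in acquired:
                loc.lock.release()


def atomic(body):
    """Run body(tx) until it commits without conflict."""
    while True:
        tx = Transaction()
        try:
            result = body(tx)
            tx.commit()
            return result
        except Abort:
            continue


a, b = Location(100), Location(0)

def transfer(tx):
    tx.write(a, tx.read(a) - 30)
    tx.write(b, tx.read(b) + 30)

atomic(transfer)
print(a.value, b.value)  # 70 30
```

The per-read re-check of the lock and version is the "frequent checks" cost the slide mentions; commit-time step 2 is what a conflicting writer trips.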

Nonblocking STM
How can we commit speculative writes atomically without locking?
Tx1 will modify O1…O4:
1. Tx1 generates speculative writes
2. Tx1 acquires O1…O4
3. A single atomic operation:
– Changes Tx1 to Committed
– Makes writes permanent
– Releases O1…O4
[Figure: objects O1–O4 (contents AAAAA, BBBBB, CCCCC, DDDDD) with speculative versions O1'–O4'; Tx1's status changes from Active to Committed]
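A minimal single-threaded sketch of step 3, assuming each object records its owner's descriptor next to its old and speculative values (all class names are hypothetical). Python has no user-level compare-and-swap, so a `threading.Lock` stands in for the single atomic operation on the status word. Flipping that one field from Active to Committed changes the logical value of every object the transaction acquired, and a thread trying to abort the same transaction afterwards loses the race.

```python
import threading
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    COMMITTED = "committed"
    ABORTED = "aborted"


class TxDescriptor:
    def __init__(self):
        self.status = Status.ACTIVE
        self._cas_guard = threading.Lock()  # emulates a hardware CAS on the status word

    def cas_status(self, expected, new):
        with self._cas_guard:
            if self.status is expected:
                self.status = new
                return True
            return False


class SharedObject:
    def __init__(self, value):
        self.owner = None   # descriptor of the transaction that last acquired us
        self.old = value    # value before the owner's speculative write
        self.new = None     # owner's speculative value

    def current(self):
        """Logical value: determined solely by the owner's status."""
        if self.owner is not None and self.owner.status is Status.COMMITTED:
            return self.new
        return self.old

    def acquire(self, tx, value):
        """Step 2: install tx as owner with a speculative value."""
        if self.owner not in (None, tx) and self.owner.status is Status.ACTIVE:
            raise RuntimeError("conflict: previous owner still active")
        committed = self.current()  # a committed/aborted owner has "released" us
        self.old, self.new, self.owner = committed, value, tx


objs = [SharedObject(c * 5) for c in "ABCD"]       # O1..O4 = AAAAA..DDDDD
tx1 = TxDescriptor()
for obj, digit in zip(objs, "1234"):
    obj.acquire(tx1, digit * 5)                    # speculative O1'..O4'
print([o.current() for o in objs])  # ['AAAAA', 'BBBBB', 'CCCCC', 'DDDDD']
tx1.cas_status(Status.ACTIVE, Status.COMMITTED)    # one atomic operation
print([o.current() for o in objs])  # ['11111', '22222', '33333', '44444']
```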

Indirection-Based Nonblocking STM
Locator object:
– Lists last version
– Lists next version
– Choice between them depends on the state of the owner
Costs of indirection:
– Increased working set
– More capacity/coherence misses
Existing indirection-free solutions are complex.
[Figure: DSTM-style metadata [Herlihy et al. PODC 03]: a locator with fields Owner (Tx1, Active), Old Version (O1: AAAAA), and New Version (O1': BBBBB)]
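To make the cost of indirection concrete, here is a hypothetical DSTM-style locator in Python. Class and method names are illustrative, and the pointer swap that a real system performs with CAS is a plain assignment in this single-threaded sketch. `open_read` records every object it dereferences: reading an object that has an owner touches the locator, the owner's descriptor, and a version, and those extra objects are what enlarge the working set.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, List, Optional


class Status(Enum):
    ACTIVE = 1
    COMMITTED = 2
    ABORTED = 3


@dataclass(eq=False)
class Tx:
    status: Status = Status.ACTIVE


@dataclass(eq=False)
class Version:
    data: Any


@dataclass(eq=False)
class Locator:
    owner: Optional[Tx]
    old: Version
    new: Optional[Version]


class ObjectHeader:
    """The only word other transactions reach directly; it points to a locator."""

    def __init__(self, data):
        self.locator = Locator(owner=None, old=Version(data), new=None)

    @staticmethod
    def _committed_version(loc: Locator) -> Version:
        if loc.owner is not None and loc.owner.status is Status.COMMITTED:
            return loc.new
        return loc.old

    def open_read(self, touched: List[object]):
        loc = self.locator
        touched.append(loc)                 # indirection 1: the locator
        if loc.owner is not None:
            touched.append(loc.owner)       # indirection 2: the owner's status
        version = self._committed_version(loc)
        touched.append(version)             # finally, the data itself
        return version.data

    def open_write(self, tx: Tx, new_data):
        cur = self.locator
        if cur.owner not in (None, tx) and cur.owner.status is Status.ACTIVE:
            raise RuntimeError("conflict: owner still active")
        # A real STM builds the new locator privately and installs it with one CAS.
        self.locator = Locator(owner=tx, old=self._committed_version(cur),
                               new=Version(new_data))


o1 = ObjectHeader("AAAAA")
tx1 = Tx()
o1.open_write(tx1, "BBBBB")
touched = []
print(o1.open_read(touched), len(touched))  # AAAAA 3
tx1.status = Status.COMMITTED  # stands in for the CAS on tx1's status word
print(o1.open_read([]))                     # BBBBB
```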

Outline
– Background
– Alert-on-Update (AOU)
– AOU for indirection-free STM
– AOU for lightweight validation
– Evaluation
– Future work
– Conclusions

Alert-on-Update
Claim: some cache coherence events are interesting.
Alert-on-Update (AOU):
– A special instruction marks cache lines of interest
– The cache controller notifies the processor when a marked line is evicted
– The processor immediately jumps to a user-mode handler
No O/S involvement or context switching (but can be virtualized across context switches)

AOU Hardware Requirements
Registers:
– Address of handler, PC at time of alert
– Extra status bits for the cause of an alert and for disabling alerts
– Extra entry in the interrupt vector table
Cache:
– One extra bit per cache line
Instructions:
– Set/clear handler
– Mark and load line (aload)
– Un-mark line (arelease)
– Un-mark all lines
– Enable/disable alerts
A lightweight implementation that supports only one AOU line adds one register and removes the need for extra bits in the cache.
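The mechanism and instruction set above can be mimicked with a toy Python model, shown only to illustrate the semantics. `AOUCore` and `CoherenceBus` are hypothetical; the model treats another core's store as an invalidation that evicts the line from every other cache, and, like the current implementation, it does not tell the handler which line caused the alert. Two further assumptions: real hardware transfers control asynchronously (here `evict` simply calls the handler), and an alert raised while alerts are disabled is held pending until they are re-enabled.

```python
from typing import Callable, Dict, Optional, Set


class AOUCore:
    """One core's view: cached lines, some of them marked by aload."""

    def __init__(self, memory: Dict[int, int]):
        self.memory = memory
        self.cached: Set[int] = set()
        self.marked: Set[int] = set()          # the one extra bit per cache line
        self.handler: Optional[Callable[[], None]] = None
        self.alerts_enabled = True
        self.pending = False

    # --- instructions ---
    def set_handler(self, fn: Callable[[], None]) -> None:
        self.handler = fn

    def clear_handler(self) -> None:
        self.handler = None

    def aload(self, addr: int) -> int:
        """Mark and load a line."""
        self.cached.add(addr)
        self.marked.add(addr)
        return self.memory[addr]

    def arelease(self, addr: int) -> None:
        self.marked.discard(addr)

    def arelease_all(self) -> None:
        self.marked.clear()

    def enable_alerts(self, on: bool) -> None:
        self.alerts_enabled = on
        if on and self.pending:                # assumption: deferred, not dropped
            self.pending = False
            self._deliver()

    # --- cache controller ---
    def evict(self, addr: int) -> None:
        self.cached.discard(addr)
        if addr in self.marked:
            self.marked.discard(addr)
            if self.alerts_enabled:
                self._deliver()
            else:
                self.pending = True

    def _deliver(self) -> None:
        if self.handler is not None:
            self.handler()                     # jump to user handler, no O/S involvement


class CoherenceBus:
    def __init__(self):
        self.memory: Dict[int, int] = {}
        self.cores = []

    def add_core(self) -> AOUCore:
        core = AOUCore(self.memory)
        self.cores.append(core)
        return core

    def store(self, writer: AOUCore, addr: int, value: int) -> None:
        for core in self.cores:
            if core is not writer:
                core.evict(addr)               # invalidate remote copies
        writer.cached.add(addr)
        self.memory[addr] = value


bus = CoherenceBus()
bus.memory[0x40] = 0
core0, core1 = bus.add_core(), bus.add_core()
alerts = []
core0.set_handler(lambda: alerts.append("alert"))
core0.aload(0x40)
bus.store(core1, 0x40, 7)   # remote write evicts core0's marked line
print(alerts)               # ['alert']
bus.store(core1, 0x40, 8)   # line no longer marked: no second alert
print(alerts)               # ['alert']
```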

Current Implementation Limitations
Virtualization is the responsibility of user code:
– A context switch clears all alert bits and calls the handler on return; the handler can re-aload lines
– Alerts are deferred on other kernel calls
Limited by size of cache
Limited precision:
– Alerts are masked within the handler
– The location causing the alert is not currently provided

Simple, Nonblocking, Indirection-Free STM
Only one AOU line required per processor
The STM stores speculative writes in per-object buffers
To write (after commit), use AOU revocable locks:
– Lock the object, replay stores, release lock
– Only lock/replay one location/object at a time
[Figure: per-object metadata with labels Version#/Owner/Lock, Data Pointer, Object Contents (Master Copy), Redo Log (In-Progress Modifications), Old Version#]
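A hypothetical Python rendering of one possible per-object layout for this design. The field grouping is an assumption read off the figure labels (the slide packs version#, owner, and lock into one header word; here they are separate fields, and the data pointer is omitted). `write_back` shows why replay can be redone safely: the redo log holds absolute values, so applying it twice leaves the object unchanged, and a version# check lets a late helper see that the replay is already finished.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RedoLog:
    old_version: int                       # header version# when the writer acquired the object
    stores: List[Tuple[int, int]] = field(default_factory=list)  # (word index, new value)


@dataclass
class TxObject:
    data: List[int]                        # master copy, read in place (no locator)
    version: int = 0
    owner: Optional[str] = None
    locked: bool = False
    redo: Optional[RedoLog] = None         # in-progress modifications

    def replay(self) -> None:
        """Copy logged stores into the master copy; idempotent."""
        if self.redo is None:
            return
        for index, value in self.redo.stores:
            self.data[index] = value


def write_back(obj: TxObject) -> bool:
    """After the owner commits: lock, replay, bump version#, unlock.

    Returns False if someone else already finished the replay."""
    if obj.redo is None or obj.version != obj.redo.old_version:
        return False
    obj.locked = True
    obj.replay()
    obj.version += 1
    obj.locked = False
    obj.redo = None
    obj.owner = None
    return True


obj = TxObject(data=[0, 0, 0, 0])
obj.owner = "Tx1"
obj.redo = RedoLog(old_version=0, stores=[(1, 5), (3, 9)])
obj.replay()   # e.g. a revoked helper already replayed the log once
print(write_back(obj), obj.data, obj.version)  # True [0, 5, 0, 9] 1
print(write_back(obj))                         # False
```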

Revocable Locks with AOU
Our lock protects an idempotent operation:
– Anyone can replay the stores; no one may use the object until replay is complete
Use AOU to guard the lock:
– Revocation immediately halts replay in the current thread
– Wait (briefly) before re-acquiring
– Lock release is immediately visible to waiting threads

top:
  try
    set_handler({ throw A })
    aload(lock)
    if (version changed)
      arelease(lock)
      goto bottom
    if (lock->locked)
      wait
    overwrite lock
    replay writes
    release lock (version++)
    arelease(lock)
  catch (A)
    goto top
bottom:
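The revocable-lock pseudocode can be exercised with a single-threaded Python simulation. Everything here is hypothetical: `OneLineAOU` models the one-register AOU variant, the `Alert` exception plays the role of `throw A`, and because Python cannot be interrupted by a cache event, a pending alert is delivered at the next `check()` between replayed stores rather than instantly. The `interfere` hook lets the example have a second thread steal the lock part-way through a replay.

```python
import time


class Alert(Exception):
    """Plays the role of exception A thrown by the AOU handler."""


class LockWord:
    def __init__(self):
        self.version = 0
        self.locked = False


class OneLineAOU:
    """One marked line per processor, as in the lightweight AOU variant."""

    def __init__(self):
        self.marked = None
        self.pending = False

    def aload(self, word):
        self.marked, self.pending = word, False

    def arelease(self):
        self.marked, self.pending = None, False

    def remote_write(self, word):
        # Called by the simulation when another processor stores to `word`.
        if self.marked is word:
            self.marked, self.pending = None, True

    def check(self):
        if self.pending:
            self.pending = False
            raise Alert()


def replay_with_revocable_lock(lock, data, stores, start_version, aou, interfere=None):
    """Returns True if this thread finished the replay, False if someone else did."""
    while True:                                   # top:
        try:
            aou.aload(lock)
            if lock.version != start_version:     # version changed: replay already done
                aou.arelease()
                return False                      # goto bottom
            if lock.locked:
                time.sleep(0.001)                 # wait (briefly), then steal the lock
            lock.locked = True                    # overwrite lock (local write: no alert)
            for step, (index, value) in enumerate(stores):
                if interfere is not None:
                    interfere(step)
                aou.check()                       # revocation halts the replay here
                data[index] = value
            lock.version += 1                     # release lock (version++)
            lock.locked = False
            aou.arelease()
            return True
        except Alert:
            continue                              # catch (A): goto top


lock, data = LockWord(), [0, 0, 0, 0]
stores = [(0, 10), (2, 30), (3, 40)]
aou_a, aou_b = OneLineAOU(), OneLineAOU()
stolen = []

def thread_b_steals(step):
    # After A's first store, B overwrites the lock (alerting A) and finishes the replay.
    if step == 1 and not stolen:
        stolen.append(True)
        aou_a.remote_write(lock)
        replay_with_revocable_lock(lock, data, stores, 0, aou_b)

print(replay_with_revocable_lock(lock, data, stores, 0, aou_a, thread_b_steals))  # False
print(data, lock.version, lock.locked)  # [10, 0, 30, 40] 1 False
```

Because the replay is idempotent, it does not matter that A had already written `data[0]` before being revoked: B rewrites every store, and A's retry sees the new version# and stops.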

AOU for Lightweight Validation
Suppose we can aload many lines. Recall the 2PL STM algorithm:
– On read, don't store {location, version#}; instead, aload(location)
– At commit, don't validate; any conflict would have caused an alert
– On alert, roll back and retry

Modified algorithm:
Attach version# to every location
To read:
– remember {location, version#} (no longer needed)
– aload(location)
To write:
– store in private buffer
To commit:
1. Lock all write locations
2. Check version#s of reads (no longer needed)
3. Replay writes from private buffer
4. Release locks, update version#s
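A hypothetical single-threaded Python sketch of the modified read and commit paths. `AouTx` and `Location` are invented names; each location's `watchers` set stands in for the cache lines other cores have aloaded, an alert just sets a flag that the transaction acts on at its next operation, and write locking (step 1) is omitted because only one thing runs at a time. The point is what disappears: reads log no version numbers and commit performs no read validation.

```python
class Abort(Exception):
    pass


class Location:
    def __init__(self, value):
        self.value = value
        self.version = 0
        self.watchers = set()        # transactions that have this line aloaded


class AouTx:
    def __init__(self):
        self.alerted = False
        self.marked = set()          # lines this transaction has aloaded
        self.writes = {}             # private redo buffer

    def alert(self):                 # the AOU handler: request rollback
        self.alerted = True

    def _poll(self):
        if self.alerted:
            self._release_all()
            raise Abort()

    def _release_all(self):          # arelease every marked line
        for loc in self.marked:
            loc.watchers.discard(self)
        self.marked.clear()

    def read(self, loc):
        self._poll()
        if loc in self.writes:
            return self.writes[loc]
        loc.watchers.add(self)       # aload(location) instead of logging version#
        self.marked.add(loc)
        return loc.value

    def write(self, loc, value):
        self._poll()
        self.writes[loc] = value

    def commit(self):
        self._poll()                 # no read validation: a conflict would have alerted us
        for loc, value in self.writes.items():
            loc.value = value
            loc.version += 1
            for other in list(loc.watchers):
                if other is not self:
                    other.alert()    # our store evicts other readers' marked lines
        self._release_all()


x = Location(1)
reader, writer = AouTx(), AouTx()
reader.read(x)
writer.write(x, 2)
writer.commit()
try:
    reader.read(x)
except Abort:
    print("reader rolled back")  # reader rolled back
```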

AOU for Lightweight Validation
Many TMs revalidate all previously read locations on every load of a new location:
– O(n^2) overhead
AOU eliminates this overhead for n < sizeof(cache):
– Limited by associativity
Fallback to validation only for additional locations
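A back-of-the-envelope model of these bullet points, under simplifying assumptions of our own: each new load revalidates every earlier location not protected by an AOU mark, the first `aou_lines` locations read are aloaded, and associativity conflicts are ignored. With no AOU lines the count is n(n-1)/2, the O(n^2) term; with enough lines it drops to zero, and past capacity only the additional locations are checked.

```python
def validation_checks(n_reads: int, aou_lines: int) -> int:
    """Version-number checks performed by incremental validation.

    Assumptions (illustrative model, not measured behavior):
    - each new load revalidates every earlier location that is not aloaded;
    - the first `aou_lines` locations read are aloaded and never rechecked.
    """
    total = 0
    for earlier in range(n_reads):           # locations read before this load
        total += max(0, earlier - aou_lines)
    return total


for lines in (0, 4, 16):
    print(lines, validation_checks(10, lines))
# 0 45
# 4 15
# 16 0
```

For n > `aou_lines` the sum has the closed form (n - aou_lines)(n - aou_lines - 1)/2, so the quadratic term only reappears for the locations beyond AOU capacity.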

Evaluation
6 runtime systems:
– RSTM (nonblocking, indirection, software only)
– RTM-Lite (RSTM + AOU)
– LOCK_TM (indirection-free, no AOU)
– AOU_1 (indirection-free, 1 AOU line)
– AOU_N (indirection-free, many AOU lines)
– CGL (coarse-grained locks)
Simulator:
– Simics/GEMS
– 16-way CMP (1.2 GHz in-order, single issue)
– Private 64 KB L1 (1-cycle latency)
– Shared 8 MB L2 (20-cycle latency)

Indirection Reduction
Reducing indirection has marginal impact:
– Working set is small
– Fewer cache misses at high thread counts
AOU adds some overhead:
– The in-order processor model exaggerates try/catch cost
[Chart: normalized to RSTM, 1 thread]

Indirection Reduction
Reducing indirection can hurt:
– Additional validation required (could be reduced with compiler support)
Quadratic validation still dominates
[Chart: normalized to RSTM, 1 thread]

Validation Reduction
AOU scales and doesn't admit false positives
Outperforms other validation heuristics
[Chart: normalized to RSTM, 1 thread]

Validation Reduction
The indirection-free runtime performs excess validation:
– Could be reduced by cloning code paths
Still almost 2x speedup, scalable
[Chart: normalized to RSTM, 1 thread]

Future Work
Non-TM uses (may require AOU for local writes):
– Fast user-mode thread wakeup
– Active messages
– Debugging, watchpoints, code security
– Poll-free asynchronous I/O
Additional hardware acceleration for STM:
– Programmable Data Isolation (see our paper at ISCA tomorrow)

Conclusions
Alert-on-update is a simple, promising extension to modern ISAs:
– Enables low-overhead, indirection-free nonblocking STM
– Effectively removes O(n^2) validation overhead
– Potential benefit to many shared-memory algorithms
The effect of indirection on STM is complex:
– Read-only objects are no longer immutable
– Extra validation can be reduced with compiler support
– Effect exaggerated by small objects and the in-order simulator

Additional Performance Charts
– Hash Table
– Red-Black Tree
– Linked List with Early Release
– LFUCache
– Random Graph