Colorama: Architectural Support for Data-Centric Synchronization. Luis Ceze, Pablo Montesinos, Christoph von Praun, and Josep Torrellas, HPCA 2007. Presented by Shimin Chen, LBA Reading Group.


Colorama: Architectural Support for Data-Centric Synchronization Luis Ceze, Pablo Montesinos, Christoph von Praun, and Josep Torrellas, HPCA 2007 Shimin Chen LBA Reading Group Presentation

Motivation
- Synchronization is a challenging step in parallel programming
- Transactional Memory is helpful but still complicated
  - Programmers have to reason non-locally
  - Code-centric approach
- Data-Centric Synchronization (DCS) is desirable
  - Associate synchronization constraints with data structures: which data items should be in the same critical section
  - System automatically inserts sync operations into code
  - Reason locally

What’s New?
- Existing DCS proposals are SW-only (S-DCS)
  - Cannot handle C/C++ pointer aliasing
  - Unrealistic
- New proposal: hardware DCS (H-DCS), Colorama
  - HW primitives to start and exit critical sections
  - Independent of the underlying sync mechanisms

Outline
- Introduction
- Data-Centric Synchronization (DCS)
- Architectures of Colorama
- Programming with Colorama
- Evaluation
- Conclusion

Data-Centric Synchronization (DCS)
- Data consistency domain
  - Two threads cannot access the same domain at the same time
  - For example: X and Y are in the same domain. If a thread is accessing X, no other thread can access X or Y
- System needs to automatically infer entry and exit points of critical sections
  - Entry: access to data in a domain
  - Exit: define a simple, clear exit policy and let programmers write code to conform to this policy
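
A minimal sketch of the domain idea, assuming one lock per data consistency domain. The `Domain` class and `transfer` function are hypothetical names, not part of any DCS system. The `with` block stands in for the entry and exit points that a real DCS system would infer and insert automatically.

```python
import threading


class Domain:
    """Hypothetical data consistency domain: one lock guards every member item."""

    def __init__(self, **items):
        self._lock = threading.Lock()
        self._items = dict(items)

    def __enter__(self):
        # Entry point: the thread starts accessing data in the domain.
        self._lock.acquire()
        return self._items

    def __exit__(self, *exc_info):
        # Exit point: other threads may now access X and Y again.
        self._lock.release()
        return False


def transfer(domain, amount):
    # X and Y are in the same domain, so both updates form one critical section.
    with domain as items:
        items["X"] -= amount
        items["Y"] += amount


domain = Domain(X=100, Y=0)
threads = [threading.Thread(target=transfer, args=(domain, 1)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

with domain as items:
    total = items["X"] + items["Y"]
    final_x = items["X"]
```

Because every access to X or Y goes through the same domain, the sum X + Y is preserved no matter how the threads interleave.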

Software DCS (S-DCS)
- Vaziri et al.'s Atomic Sets
- Compiler and language extensions to Java
- Data consistency domain: atomic set, a subset of the fields of a Java class
- Entry point: compiler analysis
- Exit policy: insert the exit point in the same method as the entry point, right before the method returns

Colorama: Hardware DCS
- Data consistency domain: color
- Data item belongs to a domain: colored
- Entry point: detected by HW
- Exit policy: driven by compiler
- Examples:

Examples Cont’d


Structures Overview
- Every colored data item has an entry in the Palette (details next)
- Per-thread structures: all 3 have the same number of entries
  - Owned colors array: current critical sections
  - CAB, CRB: used for exit policy

Palette
- Based on the Mondrian Memory Protection system (Witchel et al., ASPLOS’02): the white part
- Extended with a color ID: the gray part
- SW managed HW
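
To make the address-to-color mapping concrete, here is a rough sketch of a Palette lookup. It swaps in a flat sorted table of non-overlapping regions rather than the Mondrian Memory Protection table layout, and the `Palette`, `color`, and `lookup` names are assumptions made for illustration only.

```python
import bisect


class Palette:
    """Hypothetical flat Palette: maps address ranges to color IDs."""

    def __init__(self):
        self._starts = []    # region start addresses, kept sorted
        self._regions = []   # (start, end, color_id), same order as _starts

    def color(self, start, size, color_id):
        # Assumes the new region does not overlap an existing one.
        i = bisect.bisect_left(self._starts, start)
        self._starts.insert(i, start)
        self._regions.insert(i, (start, start + size, color_id))

    def lookup(self, addr):
        # Find the last region starting at or before addr, then bounds-check it.
        i = bisect.bisect_right(self._starts, addr) - 1
        if i >= 0:
            start, end, color_id = self._regions[i]
            if start <= addr < end:
                return color_id
        return None  # uncolored address


palette = Palette()
palette.color(0x2000, 16, 2)
palette.color(0x1000, 64, 1)
```

With these two regions, `palette.lookup(0x1008)` yields color 1, while an address outside both regions yields `None`.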

Entry Point
- HW monitors each load and store
  - Check cached Palette for the mem op
  - Check owned colors array
  - Trigger a user-level SW handler if accessing a colored region that is not owned
- Handler for entry point
  - Add color ID to owned colors array
  - Start critical section (e.g. begin transaction)
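
The entry-point check can be modeled in software roughly as follows. This is a simulation sketch, not the hardware design: `ThreadContext`, `access`, and `entry_handler` are hypothetical names, the Palette is reduced to a plain address-to-color dictionary, and "start critical section" is represented by appending to a log.

```python
class ThreadContext:
    """Hypothetical per-thread state for modeling the entry-point check."""

    def __init__(self, palette):
        self.palette = palette        # assumption: dict of address -> color ID
        self.owned_colors = set()     # stands in for the owned colors array
        self.log = []                 # records critical sections that were started

    def entry_handler(self, color_id):
        # User-level handler: record ownership, then start the critical section.
        self.owned_colors.add(color_id)
        self.log.append(("begin", color_id))

    def access(self, addr):
        # Models the check performed on each load and store.
        color_id = self.palette.get(addr)
        if color_id is not None and color_id not in self.owned_colors:
            self.entry_handler(color_id)


ctx = ThreadContext({0x10: 1, 0x18: 1, 0x20: 2})
ctx.access(0x10)   # first touch of color 1: handler runs
ctx.access(0x18)   # color 1 already owned: no handler
ctx.access(0x30)   # uncolored address: no handler
ctx.access(0x20)   # first touch of color 2: handler runs
```

Only the first access to each color triggers the handler, which is why entry overhead depends on how often critical sections start rather than on how often colored data is touched.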

Exit Policy
- Exit a critical section when the thread returns from the subroutine where the critical section was entered

Implementing Exit Policy
- Color acquire bitmap register (CAB) and color release bitmap register (CRB)
- CAB is automatically set by HW at entry points
- Compiler generates the following code:
  - Subroutine prologue: Push CAB; CAB ← 0
  - Subroutine epilogue: CRB ← CAB; Pop CAB
- Upon write to CRB: HW triggers a user-level handler
  - Handler: remove color ID from owned colors array, exit critical section
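
The prologue and epilogue sequence can be traced with a small software model. This is a sketch under assumptions: the class and method names are hypothetical, and bit i of CAB is assumed to refer to slot i of the owned colors array, which is one reading of the statement that all three structures have the same number of entries.

```python
class ColoramaThread:
    """Hypothetical software model of the CAB/CRB exit mechanism."""

    def __init__(self, entries=8):
        self.owned = [None] * entries   # owned colors array
        self.cab = 0                    # color acquire bitmap
        self.saved_cabs = []            # models CAB values pushed on the stack
        self.events = []

    def access(self, color_id):
        # Entry point: acquire the color and set its bit in CAB.
        if color_id in self.owned:
            return
        slot = self.owned.index(None)   # raises ValueError if the array is full
        self.owned[slot] = color_id
        self.cab |= 1 << slot
        self.events.append(("enter", color_id))

    def prologue(self):
        self.saved_cabs.append(self.cab)  # Push CAB
        self.cab = 0                      # CAB <- 0

    def epilogue(self):
        self._write_crb(self.cab)         # CRB <- CAB
        self.cab = self.saved_cabs.pop()  # Pop CAB

    def _write_crb(self, bitmap):
        # Handler triggered by the CRB write: release each color set in the bitmap.
        for slot in range(len(self.owned)):
            if bitmap & (1 << slot):
                self.events.append(("exit", self.owned[slot]))
                self.owned[slot] = None


t = ColoramaThread()
t.prologue()    # outer subroutine begins
t.access(7)     # color 7 acquired in outer
t.prologue()    # inner subroutine begins
t.access(7)     # already owned, nothing happens
t.access(9)     # color 9 acquired in inner
t.epilogue()    # inner returns: only color 9 is released
t.epilogue()    # outer returns: color 7 is released
```

Clearing CAB in the prologue is what limits each epilogue to releasing only the colors acquired in its own subroutine.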

Handling Pointers as Subroutine Arguments
- Perform multiple operations on a structure together
- Propose “colorcheck” instruction

Using Locks as Sync Mechanisms
- Colorama can also use locks
- Two potential problems:
  - Longer critical sections, thus possibly more contention
  - May deadlock
- See evaluations


Correctness
- Critical sections of the same color are serialized
- Correctly colored programs ⇒ data-race free
- Possible programming errors:
  - Failing to color shared data structures
  - Using different colors for data that should be protected together

Compatibility Issues
- Legacy libraries that do not use Colorama
  - OK if they explicitly protect lib data using locks, etc.
  - Colorama protects application data outside of the lib
- Cases that require extensions to Colorama
  - A worker thread executes an infinite loop that processes incoming requests
  - Needs to release lock, wait, acquire lock in the same loop
  - Colorama extensions: getcolorid etc.

Complete API


Setup
- Evaluation is based on analyzing applications with a Pin-based tool

Is the Exit Policy Suitable?
- Matched: lock acquire & release in the same subroutine
- Almost all dynamic and 95% of static critical sections
- Answer: Yes

Critical Section Size Increase

How often are multiple independent critical sections in the same subroutine?
- Potential deadlocks
- 1% dynamic and 4% static
- Detailed analysis shows that the resulting lock order is always the same, thus no deadlocks
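
One way to read "the lock order is always the same" is that no two subroutines acquire the same pair of locks in opposite orders. The sketch below checks that property on recorded acquisition orders. It is an illustration only, not the analysis used in the paper, and `has_lock_order_cycle` is a hypothetical helper that assumes each list gives locks acquired while the earlier ones are still held.

```python
def has_lock_order_cycle(acquisition_orders):
    """Return True if the 'held before' relation between locks contains a cycle."""
    edges = {}
    for order in acquisition_orders:
        for i, held in enumerate(order):
            for later in order[i + 1:]:
                edges.setdefault(held, set()).add(later)

    WHITE, GREY, BLACK = 0, 1, 2
    state = {}

    def visit(node):
        # Depth-first search: reaching a GREY node again means a cycle.
        state[node] = GREY
        for nxt in edges.get(node, ()):
            seen = state.get(nxt, WHITE)
            if seen == GREY:
                return True
            if seen == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state.get(n, WHITE) == WHITE and visit(n) for n in list(edges))


consistent = has_lock_order_cycle([["A", "B"], ["A", "B", "C"]])
conflicting = has_lock_order_cycle([["A", "B"], ["B", "A"]])
```

A cycle-free result means every thread acquires the locks in a compatible order, so the longer critical sections cannot deadlock on these locks.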

Structure Sizes
- # of Palette rows: # of allocated regions + # of static data objects
- # of colors: # of lock addresses
- # of owned colors array entries: max # of active locks held by a thread

Colorama Instruction Overheads
- Per-routine:
  - Prologue & epilogue: 6 insns/routine
  - 1 colorcheck insn per pointer argument
  - Estimate: 7 insns/routine
  - On avg, 1.6 routines per 100 dynamic insns, so ~11% more insns
- Entry and exit handlers: low frequency of critical section entry and exit, so low overhead
- Coloring overheads ~ memory allocation calls
  - # of insns between allocations: firefox, gaim, gftp: 2-4K
  - Memory allocators can keep pools of colored memory (??)
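
The ~11% figure follows from the per-routine estimate, as the short calculation below shows. The variable names are illustrative, and the average of one colorcheck per routine is an assumption implied by the 7 insns/routine estimate.

```python
prologue_epilogue_insns = 6      # from the slide
colorcheck_insns = 1             # assumption: about one pointer argument per routine
insns_per_routine = prologue_epilogue_insns + colorcheck_insns

routines_per_100_insns = 1.6     # from the slide
added_insns_per_100 = insns_per_routine * routines_per_100_insns

# Added instructions per 100 original instructions is already a percentage.
overhead_percent = round(added_insns_per_100, 1)
print(f"{overhead_percent}% extra instructions")  # prints "11.2% extra instructions"
```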

Memory Overhead
- MMP: Mondrian Memory Protection
- Palette adds 1-2.5% more space over app footprint

Conclusions
- Colorama: Hardware Data-Centric Synchronization
  - HW support for entry and exit points
- Evaluation suggests:
  - Exit policy is suitable
  - Low impact on critical section lengths
  - Modest additional overhead over MMP
- This paper does not even do simulation!

Related Work
- Monitors