Aritra Sengupta Man Cao Michael D. Bond and Milind Kulkarni

Presentation transcript:

Legato: Bounded Region Serializability Using Commodity Hardware Transactional Memory. Aritra Sengupta, Man Cao, Michael D. Bond, and Milind Kulkarni

Programming Language Semantics? Data races: C++ gives no guarantee of semantics (“catch-fire” semantics); Java provides weak semantics. This poses a huge problem for programmers.

Weak Semantics. Initially: Data data = null; boolean done = false; Code: data = new Data(); done = true; if (done) data.foo(); This is a Java program.

Weak Semantics. Initially: Data data = null; boolean done = false; T1: data = new Data(); done = true; T2: if (done) data.foo(); Result: Null Pointer Exception. This is a Java program.

Weak Semantics. Initially: Data data = null; boolean done = false; T1: data = new Data(); done = true; T2: if (done) data.foo(); Result: Null Pointer Exception. This is a Java program. No data dependence: reordering effect.
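The slide's two-thread program can be written out as a runnable sketch. The class name `WeakSemanticsSketch`, the empty `Data.foo()`, and the repeat loop in `main` are assumptions added for illustration; only the field declarations and the T1/T2 statements come from the slide. On a typical JVM most runs will likely not throw, but the Java memory model permits the exception because the two writes in T1 have no data dependence and may be reordered.

```java
public class WeakSemanticsSketch {
    // Hypothetical stand-in for the slide's Data class.
    static class Data { void foo() {} }

    // Plain (non-volatile) shared fields, as on the slide: the accesses race.
    static Data data = null;
    static boolean done = false;

    // Runs T1 and T2 once; returns true if T2 hit a NullPointerException.
    static boolean runOnce() {
        data = null;
        done = false;
        final boolean[] sawNpe = { false };

        Thread t1 = new Thread(() -> {
            data = new Data();   // no data dependence between these two writes,
            done = true;         // so compiler or hardware may reorder them
        });
        Thread t2 = new Thread(() -> {
            try {
                if (done) data.foo();   // may see done == true but data == null
            } catch (NullPointerException e) {
                sawNpe[0] = true;
            }
        });

        t1.start();
        t2.start();
        try {
            t1.join();           // join() makes each thread's writes visible here
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return sawNpe[0];
    }

    public static void main(String[] args) {
        int failures = 0;
        for (int i = 0; i < 10_000; i++) {
            if (runOnce()) failures++;
        }
        System.out.println("runs that threw NullPointerException: " + failures);
    }
}
```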

Need for Stronger Memory Models. “The inability to define reasonable semantics for programs with data races is not just a theoretical shortcoming, but a fundamental hole in the foundation of our languages and systems…” (Adve and Boehm, CACM, 2010). Stronger memory models give better semantics to programs with data races. Main problem: strong memory models have high overhead.

End-to-End Memory Models: Run-time cost vs Strength. [Figure: memory models placed by run-time cost and strength: DRF0 (C++/Java), SC, Bounded Region Serializability, and (Unbounded) Synchronization-free Region Serializability.]

End-to-End Memory Models: Run-time cost vs Strength. [Figure: memory models placed by run-time cost and strength: DRF0 (C++/Java), SC, Dynamically Bounded Region Serializability via EnfoRSer [1], and (Unbounded) Synchronization-free Region Serializability.] 1. Sengupta et al. Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability. ASPLOS, 2015.

Dynamically Bounded Region Serializability (DBRS). Regions are bounded statically and dynamically. [Diagram labels: acq(lock), methodCall(), rel(lock).]

Dynamically Bounded Region Serializability: SC + atomicity of bounded regions [1]. Eliminates SC violations. Eliminates some atomicity violations. [Table: observed behavior of hsqldb6, sunflow9, jbb2000, sor, lufact, moldyn, and raytracer under DRF0, SC, and DBRS; entries include Infinite loop, Null pointer exception, Corrupt output, Fails validation, and Correct.] 1. Sengupta et al. Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability. ASPLOS, 2015.

Benchmarks: DaCapo 2006 and 2009, and SPECjbb. Platform: a 14-core Intel Xeon processor with TSX.

End-to-End Memory Models: Run-time cost vs Strength. [Figure: memory models placed by run-time cost and strength: DRF0 (C++/Java), SC, Dynamically Bounded Region Serializability via EnfoRSer, (Unbounded) Synchronization-free Region Serializability, and this work: HTM-based dynamically bounded region serializability.]

Outline. Challenges: naïve implementation with hardware transactional memory; limitations of using HTM for DBRS. Approach: overcoming limitations; our approach to DBRS enforcement (Legato). Evaluation.

Enforcing DBRS with Commodity Hardware Transactional Memory (HTM) Transaction start Transaction End

Enforcing DBRS with Commodity Hardware Transactional Memory (HTM) Transaction Start Transaction End Transaction Start Transaction End

High cost of TSX instructions: Start/End ≈ three atomic operations! [2] 2. Yoo et al. Performance Evaluation of Intel TSX for High-Performance Computing. SC, 2013.

Legato: Key Idea Merge several regions into a single transaction: amortize the cost of starting and stopping a transaction

Legato: Key Idea Transaction Start Transaction End Transaction Start

Legato: Key Idea Transaction Start Transaction Merge

Legato: Key Idea Transaction Start Transaction Merge Transaction Merge

Legato: Key Idea Transaction End Transaction Start Transaction Merge
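The amortization argument can be made concrete with a little arithmetic. The sketch below is an illustration, not the authors' cost model: the class and method names are hypothetical, it assumes no aborts, and it charges each start/end pair a fixed cost, for which the slide's "≈ three atomic operations" figure is a natural input.

```java
public class MergeAmortizationSketch {
    // Number of transaction start/end pairs needed to cover `regions` regions
    // when up to `mergeTarget` consecutive regions share one transaction.
    // Assumes every transaction commits (no aborts).
    static long transactionsNeeded(long regions, int mergeTarget) {
        if (regions < 0 || mergeTarget < 1) {
            throw new IllegalArgumentException("regions >= 0 and mergeTarget >= 1 required");
        }
        return (regions + mergeTarget - 1) / mergeTarget;   // ceiling division
    }

    // Total boundary cost, in whatever unit costPerPair is expressed in
    // (for example, atomic-operation equivalents).
    static long boundaryCost(long regions, int mergeTarget, int costPerPair) {
        return transactionsNeeded(regions, mergeTarget) * costPerPair;
    }

    public static void main(String[] args) {
        long regions = 1_000;
        int costPerPair = 3;   // assumption: the slide's rough per-pair figure
        for (int target : new int[] { 1, 2, 4, 8 }) {
            System.out.println("merge target " + target + ": "
                + transactionsNeeded(regions, target) + " transactions, boundary cost "
                + boundaryCost(regions, target, costPerPair));
        }
    }
}
```

With one region per transaction the boundary cost grows linearly in the number of regions; merging k regions divides it by roughly k, which is the saving that aborts of the larger transactions then eat into.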

Challenges. Conflicts abort transactions: wasted work. Capacity aborts: larger transactions have a larger footprint, unknown a priori. HTM-unfriendly operations: hardware interrupts, page faults, etc.

Per-transaction cost vs Abort cost Transaction Abort Transaction Start Transaction Merge Transaction Merge

Per-transaction cost vs Abort cost. Transient cause (conflicts, hardware interrupts) => retry transaction. Capacity abort: end before culprit region, start new transaction. [Diagram: Transaction Start, Transaction Merge, Transaction Merge, Transaction Abort.]

Per-transaction cost vs Abort cost. Transient cause (conflicts, hardware interrupts) => retry transaction. Capacity abort: end before culprit region, start new transaction. [Diagram: Transaction Start, Transaction Merge, Transaction Merge, Transaction End.]

Per-transaction cost vs Abort cost. Transient cause (conflicts, hardware interrupts) => retry transaction. Capacity abort: end before culprit region, start new transaction. [Diagram: Transaction Start, Transaction End.]

Per-transaction cost vs Abort cost. Transient cause (conflicts, hardware interrupts) => retry transaction. Capacity abort: end before culprit region, start new transaction. Unknown abort location. [Diagram: Transaction Abort.]

Per-transaction cost vs Abort cost. Transient cause (conflicts, hardware interrupts) => retry transaction. Capacity abort: end before culprit region, start new transaction. [Diagram: Transaction End.]

Per-transaction cost vs Abort cost. Transient cause (conflicts, hardware interrupts) => retry transaction. Capacity abort: end before culprit region, start new transaction. Future transaction behavior unknown. Minimal learning. [Diagram: Transaction End.]

Legato: Our Approach. Decide on a merge target using the history of previous transactions. 1. The temporary target changes rapidly: it captures transient effects. 2. The setpoint, or “steady state” target, changes slowly: it captures program phases.

Merging Algorithm Curr Target: 8 Setpoint: 8

Merging Algorithm: Initial Phase Curr Target: 8 Setpoint: 8 Transaction Start

Merging Algorithm: Initial Phase Curr Target: 8 Setpoint: 8 Transaction Merge

Merging Algorithm: Initial Phase Curr Target: 8 Setpoint: 8 Transaction Merge

Merging Algorithm: Recover from Transient Error Curr Target: 8/2 = 4 Setpoint: 8 Rapid action:

Merging Algorithm: Recover from Transient Error Curr Target: 4 Setpoint: 8 Transaction Start

Merging Algorithm: Recover from Transient Error Curr Target: 4 Setpoint: 8 Transaction Merge

Merging Algorithm: Recover from Transient Error Curr Target: 4/2 = 2 Setpoint: 8

Merging Algorithm: Recover from Transient Error Curr Target: 2 Setpoint: 8 Transaction Start

Merging Algorithm: Recover from Transient Error Curr Target: 2 Setpoint: 8 No Action: Skeptical Transaction End

Merging Algorithm: Build up to Setpoint Curr Target: 2*2 = 4 Setpoint: 8 Transaction End

Merging Algorithm: Build up to Setpoint Curr Target: 4*2 = 8 Setpoint: 8 Aggressive Transaction End

Merging Algorithm: Change of Program Phase Curr Target: 9 Setpoint: 8 + 1 = 9 Transaction End

Merging Algorithm: Change of Program Phase Curr Target: 9 Setpoint: 8 + 1 = 9 Transaction End Enter a low-conflict program phase => more aggressive merging

Merging Algorithm: Change of Program Phase Curr Target: 10 Setpoint: 9 + 1 = 10 Transaction End
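The walkthrough on these slides can be replayed with a small state machine. This is a sketch reconstructed only from the numbers shown (halve on abort, one "skeptical" commit with no action, double back up to the setpoint, then raise the setpoint by one per commit); the class name, method names, and the `skeptical` flag are hypothetical, and any rule the slides do not show, such as lowering the setpoint, is deliberately left out.

```java
public class MergeTargetSketch {
    private int currTarget;            // fast-moving target: regions per transaction
    private int setpoint;              // slow-moving "steady state" target
    private boolean skeptical = false; // assumption: ignore first commit after an abort

    public MergeTargetSketch(int initialTarget) {
        if (initialTarget < 1) throw new IllegalArgumentException("target must be >= 1");
        currTarget = initialTarget;
        setpoint = initialTarget;
    }

    // Rapid action on abort: halve the current target (never below 1).
    public void onAbort() {
        currTarget = Math.max(1, currTarget / 2);
        skeptical = true;
    }

    public void onCommit() {
        if (currTarget >= setpoint) {
            // At the setpoint: commits suggest a low-conflict phase,
            // so raise the setpoint slowly and follow it.
            setpoint += 1;
            currTarget = setpoint;
            skeptical = false;
        } else if (skeptical) {
            // First commit after an abort: no action yet.
            skeptical = false;
        } else {
            // Build back up aggressively, capped at the setpoint.
            currTarget = Math.min(setpoint, currTarget * 2);
        }
    }

    public int currTarget() { return currTarget; }
    public int setpoint() { return setpoint; }

    public static void main(String[] args) {
        MergeTargetSketch m = new MergeTargetSketch(8);
        // Outcome sequence from the slides: two aborts, then five commits.
        boolean[] committed = { false, false, true, true, true, true, true };
        for (boolean ok : committed) {
            if (ok) m.onCommit(); else m.onAbort();
            System.out.println((ok ? "commit" : "abort ") + " -> target "
                + m.currTarget() + ", setpoint " + m.setpoint());
        }
    }
}
```

Replaying two aborts followed by five commits from an initial target of 8 steps the current target through 4, 2, 2, 4, 8, 9, 10, with the setpoint moving from 8 to 9 and then 10 only at the end, as on the slides.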

Implementation and Evaluation

Implementation. Developed in Jikes RVM 3.1.3. Code publicly available in the Jikes RVM Research Archive.

Run-time Performance

Run-time Performance More stable overhead

JIT Compilation Time

JIT Compilation Time 1.45

JIT Compilation Time 1.45 Complex compiler transformations Simple instrumentation

Related Work. Checks conflicts in bounded regions: DRFx, Marino et al., PLDI 2010. Checks conflicts in synchronization-free regions: Conflict Exceptions, Lucia et al., ISCA 2010. Enforces atomicity of bounded regions: BulkCompiler, Ahn et al., MICRO 2009; Atom-Aid, Lucia et al., ISCA 2008. Reducing dynamic misspeculations: BlockChop, Mars and Kumar, ISCA 2012.

Memory Models: Run-time cost vs Strength. [Figure: memory models placed by run-time cost and strength: DRF0 (C++/Java), SC, Dynamically Bounded Region Serializability via Legato, Dynamically Bounded Region Serializability via EnfoRSer [1], and (Unbounded) Synchronization-free Region Serializability.] 1. Sengupta et al. Hybrid Static-Dynamic Analysis for Statically Bounded Region Serializability. ASPLOS, 2015.