DRFx: A Simple and Efficient Memory Model for Concurrent Programming Languages. Dan Marino, Abhay Singh, Todd Millstein, Madan Musuvathi, Satish Narayanasamy.

Presentation transcript:

DRFx: A Simple and Efficient Memory Model for Concurrent Programming Languages
Dan Marino (UC Los Angeles), Abhay Singh (University of Michigan), Todd Millstein (UC Los Angeles), Madan Musuvathi (MSR, Redmond), Satish Narayanasamy (University of Michigan)

State of the Art: SC for Data-Race-Free Memory Models
- sequential consistency [Lamport 79]
  - intuitive for programmers
  - limits compiler and hardware optimizations
- DRF0 [Adve & Hill 90] models balance performance and ease of programming
  - SC behavior guaranteed for race-free programs
  - most optimizations allowed
  - e.g. Java and C++0x memory models [Manson et al. 2005] [Boehm et al. 2008]

Program Behavior under DRF0
X* x = null; bool init = false;
// Thread t
A: x = new X();
B: init = true;
// Thread u
C: if(init)
D: x->f++;
- Optimizing compiler and hardware: B doesn't depend on A. It might be faster to reorder them!
- Reordered execution: B: init = true; C: if(init) D: x->f++; A: x = new X(); ⇒ Null Pointer!
- Slide annotation: atomic
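The null-pointer scenario on this slide can be checked by brute force. The following is a minimal sketch, not anything from the talk: it enumerates every interleaving of thread t's two writes with thread u's two steps and reports whether D can run while x is still null. The names `Op`, `bits_set`, and `null_deref_possible` are hypothetical, and the model assumes each statement takes effect atomically in the given order.

```cpp
#include <array>
#include <cassert>

// Thread t's two statements from the slide.
enum class Op { A, B };  // A: x = new X();   B: init = true;

// Counts the set bits in the low four bits of mask.
static int bits_set(int mask) {
    int n = 0;
    for (int i = 0; i < 4; ++i) n += (mask >> i) & 1;
    return n;
}

// Returns true if some interleaving lets thread u run D (x->f++) while x is
// still null. t_order is the order in which thread t's writes take effect.
bool null_deref_possible(const std::array<Op, 2>& t_order) {
    // A schedule has four steps; bit i of mask says step i belongs to thread t.
    for (int mask = 0; mask < 16; ++mask) {
        if (bits_set(mask) != 2) continue;  // each thread takes exactly two steps
        bool x_set = false, init = false, saw_init = false;
        int t_pc = 0, u_pc = 0;
        for (int step = 0; step < 4; ++step) {
            if ((mask >> step) & 1) {
                if (t_order[t_pc++] == Op::A) x_set = true;  // A: x = new X();
                else init = true;                            // B: init = true;
            } else if (u_pc++ == 0) {
                saw_init = init;                             // C: if(init)
            } else if (saw_init && !x_set) {
                return true;                                 // D: x->f++ with x == null
            }
        }
    }
    return false;
}
```

Under this model, program order (A then B) never lets D see a null x, while the reordered version (B then A) does, which is the behavior the slide attributes to the optimizing compiler and hardware.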

Deficiencies of DRF0
- weak or no semantics for racy programs
- unintentional data races easy to introduce
- problematic for:
  - DEBUGGABILITY: programmer must assume non-SC behavior for all programs
  - SAFETY [Boehm et al., PLDI 2008]: optimization + data race = jump to arbitrary code!
  - COMPILER CORRECTNESS: Java must maintain safety at the cost of complexity [Ševčík & Aspinall, ECOOP 2008]

Our Solution: The DRFx Memory Model
- data race: programming error
- Memory Model Exception: fatal runtime error
- DEBUGGABILITY: SC for all executions
- SAFETY: halt program before non-SC behavior is exhibited
- COMPILER CORRECTNESS: most sequentially-valid optimizations permitted

DRFx Allows Relaxed Data Race Detection
- SOURCE PROGRAM is data race free ⇒ OBSERVED BEHAVIOR: SC behavior
- SOURCE PROGRAM has data races ⇒ OBSERVED BEHAVIOR: SC behavior or MM Exception
- precise runtime data race detection is slow in software and complex in hardware [Flanagan & Freund 2009] [Prvulovic & Torrellas 2003]
- simplify detection

Detecting an SC Violation
X* x = null; bool init = false;
// Thread t
A: x = new X();
B: init = true;
// Thread u
C: if(init)
D: x->f++;
- Insight: compiler can communicate to runtime the regions in which reordering may have occurred (regions are delimited by region fences)
- runtime must detect conflicting accesses in regions that execute concurrently ⇒ MM Exception
- Races need not be reported between regions that do not execute concurrently: data race, but no SC violation
- region serializable for compiled ⇒ SC for source
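The rule stated on this slide (report a conflict only when the two regions execute concurrently) can be sketched as follows. This is an illustrative model under stated assumptions, not the runtime described in the talk: the `Region` struct, its logical `begin`/`end` timestamps, and the use of string location names are all hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>

// A dynamic region: the accesses between two region fences on one thread.
struct Region {
    int begin, end;                // hypothetical logical times; live in [begin, end)
    std::set<std::string> reads;   // locations read in the region
    std::set<std::string> writes;  // locations written in the region
};

static bool intersects(const std::set<std::string>& a,
                       const std::set<std::string>& b) {
    for (const auto& loc : a)
        if (b.count(loc)) return true;
    return false;
}

// Two regions execute concurrently if their live intervals overlap.
bool concurrent(const Region& a, const Region& b) {
    return a.begin < b.end && b.begin < a.end;
}

// Two regions conflict if they touch a common location and one side writes it.
bool conflicting(const Region& a, const Region& b) {
    return intersects(a.writes, b.writes) || intersects(a.writes, b.reads) ||
           intersects(a.reads, b.writes);
}

// The condition under which this sketch would raise an MM exception.
bool region_conflict(const Region& a, const Region& b) {
    return concurrent(a, b) && conflicting(a, b);
}
```

With thread t's region writing `x` and `init` and thread u's region reading them, `region_conflict` is true only when the two intervals overlap. If u's region starts after t's region ends, the accesses still race in the source program, but no exception is needed because no reordering inside a region can be observed.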

DRFx Compiler and Runtime Requirements
- DRFx Compiler
  - communicate regions in which optimizations were made by using fence instructions
  - synchronization operations in their own regions
  - no speculative memory accesses
- DRFx Execution Environment
  - trap on conflicting accesses in concurrent regions
  - global order on region fences
  - memory order consistent with fence order

Formalization
- compiler requirements
  - how program is split into regions
  - permitted optimizations: all non-speculative, sequentially valid optimizations
- execution environment requirements
  - when conflict may/must be reported
  - memory orderings allowed w.r.t. fences
- prove
  - no MM exception ⇒ SC behavior for source program
  - MM exception ⇒ data race in source program

Efficient & Simple Conflict Detection
- perform detection in hardware
  - like transactional memory hardware, but simpler
  - no rollback
  - we control region boundaries
- compiler bounds number of memory locations dynamically accessed in a region
  - limits optimization opportunities
  - distinguish "bounding" region fence: hardware can merge regions separated by a bounding fence when resources are available

Compiler Implementation
- built conservative DRFx-compliant compiler
  - LLVM [Lattner & Adve 2004]
  - naïve bounding analysis: bounding fence at all loop back edges
  - disable speculative optimizations
- measured performance
  - PARSEC benchmark suite
  - stock x86 hardware, no architectural simulator

DRFx Overhead on PARSEC Benchmarks
[Chart: slowdown over unmodified, fully optimizing LLVM]

Related Work
- memory models: e.g. [Lamport 1979], [Dubois et al. 1986], [Adve & Hill 1990]
- hardware race detection: [Adve et al. 1991], [Muzahid et al. 2009], [Prvulovic & Torrellas 2003]
- software race detection: e.g. [Yu et al ], [Flanagan & Freund 2009], [Elmas et al. 2007]
- detecting SC violations: [Gharachorloo & Gibbons, SPAA 1991]
- conflict exception [Lucia et al., ISCA 2010]
  - stronger guarantee: serializability of sync-free regions
  - requires unbounded detection scheme
  - focused on hardware

DRFx Conclusion
- Diagram labels: MM Exception, lightweight form of data race detection, regions
- EASY-TO-UNDERSTAND
  - programmer gets understandable behavior for all programs
  - compiler may perform most sequentially valid optimizations within regions
- EFFICIENT
  - straightforward hardware support
  - compiler restrictions ⇒ only 0% - 7% slowdown

Contributions
- defined DRFx in terms of end-to-end programmer guarantees
- established sufficient requirements on compiler and execution environment
- proved that requirements establish guarantees
- implemented a DRFx-compliant compiler and measured its performance
- designed hardware that meets DRFx execution environment requirements

DRFx Guarantees
DRFx provides a more intuitive memory model and introduces a dynamic Memory Model (MM) exception.
- DRF: P is data race free ⇒ all executions of P are SC and MM-exception-free
- Soundness: SC is violated during an execution of P ⇒ an MM exception will eventually be thrown
- Safety: an execution of P makes a system call ⇒ the system call is reachable in an SC execution

DRFx Allows a Range of Implementations
DRF = data race free, SC = sequentially consistent
- Diagram: DRF executions lie within SC executions, which lie within All Executions
  - executions that are not SC: must raise MM exception
  - executions that are SC but not DRF: may or may not raise MM exception
  - DRF executions: must not raise MM exception
- precise runtime data race detection is slow in software and complex in hardware [Flanagan & Freund 2009] [Prvulovic & Torrellas 2003]
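The three cases on this slide amount to a small decision rule, which the sketch below spells out. The enum and function names are hypothetical and chosen for illustration only; the rule itself is read off the slide and the DRF and Soundness guarantees.

```cpp
#include <cassert>

// What a DRFx implementation is allowed to do for one execution.
enum class MMOutcome { MustNotRaise, MayRaise, MustRaise };

// Classifies an execution by the two properties named on the slide.
// By the DRF guarantee a data-race-free execution is always SC, so the
// (drf = true, sc = false) combination should not arise; this sketch still
// answers MustRaise there because the Soundness rule is checked first.
MMOutcome mm_exception_rule(bool execution_is_drf, bool execution_is_sc) {
    if (!execution_is_sc) return MMOutcome::MustRaise;      // SC violated
    if (execution_is_drf) return MMOutcome::MustNotRaise;   // race free
    return MMOutcome::MayRaise;                              // racy but still SC
}
```

The middle case is what gives implementations room: a racy execution that happens to be SC may be allowed to continue, so the detector does not have to be a precise data race detector.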

Detecting an SC Violation
A: x = new X();
B: init = true;
C: if(init)
D: x->f++;
- Timeline labels: A issued, B issued, A retired, B retired
- if hardware performs optimization, we can notice SC violation by detecting races on in-flight instructions [Gharachorloo & Gibbons, SPAA 1991]
- if compiler performs optimization, this strategy is insufficient
- Insight: compiler can communicate to hardware regions in which reordering may have occurred
- Concurrent Region Conflict
- Races need not be reported between regions that do not execute concurrently: data race, but no SC violation

Regions Facilitate DRFx Implementation
DRF = data race free, SC = sequentially consistent, RCF = region conflict free, RS = region serializable
- Diagram: nested sets DRF, RCF, RS, SC within All Executions
- DRF ⇒ RCF: syncs in own region and no compiler speculation
- RCF ⇒ RS: we can order regions consistent with memory ordering
- RS ⇒ SC: compiler optimizes only within regions

Establishing the DRFx Guarantees
Compiler and hardware cooperate to establish the DRFx guarantees.
- DRF: P race free + syncs in own region ⇒ conflicting regions cannot execute concurrently ⇒ MM exception is not thrown on any execution
- Soundness: MM exception not thrown ⇒ no concurrent region conflict ⇒ region-serializable execution of compiled program ⇒ SC execution of source program
- Safety: Soundness + timely exceptions + system calls in own region ⇒ system call reachable through SC execution

DRFx Hardware Design
- Processor with a per-region access buffer; entry fields: Address | Write? | Access Size
- add entry upon issuing memory access
- check for conflicts with active regions
- broadcast access set upon region completion
- bounded regions allow fixed size buffer without overflow
- soft fences allow processor reordering, multiple "in-flight" regions
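A software model may help make the buffer's role concrete. The sketch below is an assumption-laden illustration, not the hardware from the talk: the class `RegionBuffer`, its methods, and the byte-range overlap rule are hypothetical, and only the three entry fields (address, write flag, access size) come from the slide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One buffer entry, with the three fields named on the slide.
struct AccessEntry {
    std::uint64_t address;
    bool is_write;
    std::uint32_t size;  // access size in bytes
};

// Two accesses conflict if their byte ranges overlap and at least one writes.
bool entries_conflict(const AccessEntry& a, const AccessEntry& b) {
    bool overlap = a.address < b.address + b.size &&
                   b.address < a.address + a.size;
    return overlap && (a.is_write || b.is_write);
}

// Hypothetical model of one processor's buffer for its active region.
class RegionBuffer {
public:
    explicit RegionBuffer(std::size_t capacity) : capacity_(capacity) {}

    // Add an entry when a memory access issues. Returns false when the
    // fixed-size buffer is full, which bounded regions are meant to prevent.
    bool record(const AccessEntry& e) {
        if (entries_.size() >= capacity_) return false;
        entries_.push_back(e);
        return true;
    }

    // Check an access set broadcast by another processor against this region.
    bool conflicts_with(const std::vector<AccessEntry>& remote) const {
        for (const auto& mine : entries_)
            for (const auto& theirs : remote)
                if (entries_conflict(mine, theirs)) return true;
        return false;
    }

    // The access set that would be broadcast when the region completes.
    const std::vector<AccessEntry>& access_set() const { return entries_; }

    // Region completion: the buffer is reused for the next region.
    void complete_region() { entries_.clear(); }

private:
    std::size_t capacity_;
    std::vector<AccessEntry> entries_;
};
```

In this model a conflict is found when a completed region's broadcast set overlaps an entry of a still-active region; once a region completes and its buffer is cleared, later regions are no longer compared against it.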

DRFx Overhead on PARSEC Benchmarks
[Chart]