
DRFx: A Simple and Efficient Memory Model for Concurrent Programming Languages. Dan Marino, Abhay Singh, Todd Millstein, Madan Musuvathi, Satish Narayanasamy.


1 DRFx: A Simple and Efficient Memory Model for Concurrent Programming Languages
Dan Marino (UC Los Angeles), Abhay Singh (University of Michigan), Todd Millstein (UC Los Angeles), Madan Musuvathi (MSR, Redmond), Satish Narayanasamy (University of Michigan)

2 State of the Art: SC for Data Race Free Memory Models
sequential consistency [Lamport 79]
  intuitive for programmers
  limits compiler and hardware optimizations
DRF0 [Adve & Hill 90] models balance performance and ease of programming
  SC behavior guaranteed for race-free programs
  most optimizations allowed
  e.g. Java and C++0x memory models [Manson et al. 2005] [Boehm et al. 2008]

3 Program Behavior under DRF0
X* x = null; bool init = false;
// Thread t                // Thread u
A: x = new X();            C: if(init)
B: init = true;            D:   x->f++;
Optimizing compiler and hardware: "B doesn't depend on A. It might be faster to reorder them!"
Possible execution after reordering:
B: init = true;
C: if(init)
D:   x->f++;               // Null pointer!
A: x = new X();
Slide annotation: atomic
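The "atomic" annotation on this slide suggests the usual repair: mark the flag as a synchronization variable so the program becomes data race free. The following is a minimal C++11 sketch of that idea, not code from the talk; the struct `X`, the helper `run_once`, and the use of `std::atomic<bool>` with default sequentially consistent ordering are assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-in for the slide's class X with a counter field f.
struct X { int f = 0; };

// Runs the two-thread example once and returns the final value of x->f.
int run_once() {
    X* x = nullptr;
    std::atomic<bool> init{false};   // assumption: the flag is made atomic

    std::thread t([&] {
        x = new X();                 // A
        init.store(true);            // B: seq_cst store, may not be reordered before A
    });
    std::thread u([&] {
        if (init.load()) {           // C: seq_cst load
            x->f++;                  // D: reached only if A is already visible
        }
    });
    t.join();
    u.join();

    int result = x->f;               // thread t always runs A, so x is non-null here
    delete x;
    return result;
}
```

Depending on scheduling, `run_once()` returns 0 (thread u saw `init == false`) or 1, and D never dereferences a null pointer. With a plain `bool` the same program has a data race, which is the case DRF0 leaves without useful semantics.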

4 Deficiencies of DRF0
weak or no semantics for racy programs
unintentional data races easy to introduce
problematic for
  DEBUGGABILITY: programmer must assume non-SC behavior for all programs
  SAFETY: optimization + data race = jump to arbitrary code! [Boehm et al., PLDI 2008]
  COMPILER CORRECTNESS: Java must maintain safety at the cost of complexity [Ševčík & Aspinall, ECOOP 2008]

5 Our Solution: The DRFx Memory Model
data race (programming error) ⇒ Memory Model Exception (fatal runtime error)
DEBUGGABILITY: SC for all executions
SAFETY: halt program before non-SC behavior exhibited
COMPILER CORRECTNESS: most sequentially-valid optimizations permitted

6 DRFx Allows Relaxed Data Race Detection
SOURCE PROGRAM ⇒ OBSERVED BEHAVIOR
  data race free ⇒ SC behavior
  has data races ⇒ SC behavior or MM exception
precise runtime data race detection is slow in software and complex in hardware [Flanagan & Freund 2009] [Prvulovic & Torrellas 2003]
DRFx relaxes this requirement to simplify detection

7 Detecting an SC Violation
X* x = null; bool init = false;
// Thread t                // Thread u
A: x = new X();            C: if(init)
B: init = true;            D:   x->f++;
Insight: the compiler can communicate to the runtime the regions in which reordering may have occurred. Region fences delimit these regions in the compiled code.
The runtime must detect conflicting accesses in regions that execute concurrently, and raises an MM exception when it does.
Races need not be reported between regions that do not execute concurrently: that is a data race, but no SC violation.
region-serializable execution of the compiled program ⇒ SC execution of the source program
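The detection rule on this slide can be sketched as a small C++ model. This is an illustrative assumption, not the talk's implementation: the types `Access` and `Region`, the logical timestamps for the opening and closing fences, and the function `region_conflict` are all hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One memory access performed inside a region.
struct Access {
    std::uintptr_t addr;
    bool is_write;
};

// A region: the code between two region fences on one thread.
// begin/end are hypothetical logical timestamps of the two fences.
struct Region {
    int thread;
    long begin;
    long end;
    std::vector<Access> accesses;
};

// Two regions execute concurrently if their [begin, end) intervals overlap.
bool concurrent(const Region& a, const Region& b) {
    return a.begin < b.end && b.begin < a.end;
}

// Two regions conflict if they touch the same address and at least one writes.
bool conflicting(const Region& a, const Region& b) {
    for (const Access& x : a.accesses)
        for (const Access& y : b.accesses)
            if (x.addr == y.addr && (x.is_write || y.is_write))
                return true;
    return false;
}

// An MM exception is warranted only for conflicts between concurrent
// regions of different threads.
bool region_conflict(const Region& a, const Region& b) {
    return a.thread != b.thread && concurrent(a, b) && conflicting(a, b);
}
```

In this model, a region on thread t that writes `x` and `init` conflicts with a region on thread u that reads them only when the two intervals overlap. If u's region starts after t's region has ended, the source program still has a data race, but nothing is reported.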

8 DRFx Compiler and Runtime Requirements
DRFx Compiler
  communicate regions in which optimizations were made by using fence instructions
  synchronization operations in their own region
  no speculative memory accesses
DRFx Execution Environment
  trap on conflicting accesses in concurrent regions
  global order on region fences
  memory order consistent with fence order

9 Formalization
compiler requirements
  how program is split into regions
  permitted optimizations: all non-speculative, sequentially valid optimizations
execution environment requirements
  when conflict may/must be reported
  memory orderings allowed w.r.t. fences
prove
  no MM exception ⇒ SC behavior for source program
  MM exception ⇒ data race in source program

10 Efficient & Simple Conflict Detection
perform detection in hardware
like transactional memory hardware, but simpler
  no rollback
  we control region boundaries
compiler bounds number of memory locations dynamically accessed in a region
  limits optimization opportunities
distinguish "bounding" region fence
  hardware can merge regions separated by a bounding fence when resources available

11 Compiler Implementation
built conservative DRFx-compliant compiler
  LLVM [Lattner & Adve 2004]
  naïve bounding analysis: bounding fence at all loop back edges
  disable speculative optimizations
measured performance
  PARSEC benchmark suite
  stock x86 hardware, no architectural simulator
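The effect of placing a bounding fence at every loop back edge can be illustrated with a toy C++ model. This is only a sketch under stated assumptions: `RegionTracker` and `sum_with_back_edge_fences` are hypothetical names, and a real DRFx compiler would emit fence instructions in generated code rather than call a function.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical bookkeeping: counts memory accesses per region and
// remembers the largest region seen so far.
struct RegionTracker {
    std::size_t current = 0;   // accesses in the currently open region
    std::size_t largest = 0;   // maximum accesses observed in any region

    void access() {
        ++current;
        largest = std::max(largest, current);
    }
    void fence() { current = 0; }   // a fence closes the region
};

// A loop instrumented the way the naive bounding analysis would do it:
// one bounding fence at the back edge, so each iteration is its own region.
long sum_with_back_edge_fences(const std::vector<int>& data, RegionTracker& rt) {
    long sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        rt.access();       // models the load of data[i]
        sum += data[i];
        rt.fence();        // bounding fence at the loop back edge
    }
    return sum;
}
```

However long the input, `rt.largest` stays at 1 in this model, which is the property that lets the hardware use a fixed-size buffer. The cost is that the compiler may no longer optimize across iterations.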

12 DRFx Overhead on PARSEC Benchmarks
[Chart: slowdown over unmodified, fully optimizing LLVM]

13 Related Work
memory models: e.g. [Lamport 1979], [Dubois et al. 1986], [Adve & Hill 1990]
hardware race detection: [Adve et al. 1991], [Muzahid et al. 2009], [Prvulovic & Torrellas 2003]
software race detection: e.g. [Yu et al ], [Flanagan & Freund 2009], [Elmas et al. 2007]
detecting SC violations: [Gharachorloo & Gibbons, SPAA 1991]
conflict exception [Lucia et al., ISCA 2010]
  stronger guarantee: serializability of sync-free regions
  requires unbounded detection scheme
  focused on hardware

14 DRFx Conclusion
MM Exception: lightweight form of data race detection
EASY-TO-UNDERSTAND
  programmer gets understandable behavior for all programs
  compiler may perform most sequentially valid optimizations within regions
EFFICIENT
  straightforward hardware support
  compiler restrictions ⇒ only 0% to 7% slowdown

15 Contributions
defined DRFx in terms of end-to-end programmer guarantees
established sufficient requirements on compiler and execution environment
proved that requirements establish guarantees
implemented a DRFx-compliant compiler and measured its performance
designed hardware that meets DRFx execution environment requirements

16 DRFx Guarantees
DRFx provides a more intuitive memory model and introduces a dynamic Memory Model (MM) exception
DRF: P is data race free ⇒ all executions of P are SC and MM-exception-free
Soundness: SC is violated during an execution of P ⇒ an MM exception will eventually be thrown
Safety: an execution of P makes a system call ⇒ the system call is reachable in an SC execution

17 DRFx Allows a Range of Implementations
DRF = data race free, SC = sequentially consistent
All Executions, with SC executions and DRF executions nested inside
  non-SC executions: must raise MM exception
  SC executions with data races: may or may not raise MM exception
  DRF executions: must not raise MM exception
precise runtime data race detection is slow in software and complex in hardware [Flanagan & Freund 2009] [Prvulovic & Torrellas 2003]
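The three cases on this slide can be written down as a small C++ decision function. This is a sketch of the stated contract, not of any implementation; the enum `MMException` and the function `required_behavior` are hypothetical names introduced for illustration.

```cpp
#include <cassert>

// What the DRFx contract demands of the runtime for a given execution.
enum class MMException { MustNotRaise, MayRaise, MustRaise };

// Classifies an execution by whether its source program is data race free
// and whether the execution is sequentially consistent.
MMException required_behavior(bool data_race_free, bool sequentially_consistent) {
    if (!sequentially_consistent) {
        // An SC violation must be reported. Under DRFx this case only
        // arises for racy programs, since race-free programs always run SC.
        return MMException::MustRaise;
    }
    if (data_race_free) {
        return MMException::MustNotRaise;   // no false positives on correct programs
    }
    return MMException::MayRaise;           // racy but SC: detector may be imprecise
}
```

The `MayRaise` case is the slack that lets an implementation avoid precise data race detection.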

18 Detecting an SC Violation
A: x = new X();            C: if(init)
B: init = true;            D:   x->f++;
Timeline: A issued, B issued, A retired, B retired
if hardware performs optimization, we can notice SC violation by detecting races on in-flight instructions, Gharachorloo and Gibbons [SPAA 1991]
if compiler performs optimization, this strategy is insufficient
Insight: compiler can communicate to hardware regions in which reordering may have occurred
Concurrent Region Conflict
Races need not be reported between regions that do not execute concurrently: data race, but no SC violation

19 Regions Facilitate DRFx Implementation
DRF = data race free, SC = sequentially consistent, RCF = region conflict free, RS = region serializable
Sets shown within All Executions: DRF, SC, RCF, RS
syncs in own region and no compiler speculation
we can order regions consistent with memory ordering
compiler optimizes only within regions

20 Establishing the DRFx Guarantees
Compiler and hardware cooperate to establish the DRFx guarantees
DRF: P race free + syncs in own region ⇒ conflicting regions cannot execute concurrently ⇒ MM exception is not thrown on any execution
Soundness: MM exception not thrown ⇒ no concurrent region conflict ⇒ region-serializable execution of compiled program ⇒ SC execution of source program
Safety: Soundness + timely exceptions + system calls in own region ⇒ system call reachable through SC execution

21 DRFx Hardware Design
Per-processor buffer with columns: Address | Write? | Access Size
add entry upon issuing memory access
check for conflicts with active regions
broadcast access set upon region completion
bounded regions allow fixed size buffer without overflow
soft fences allow processor reordering, multiple "in-flight" regions
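The per-processor buffer on this slide can be modeled in software as a fixed-capacity table of (address, write?, access size) entries. The C++ sketch below is an assumption-laden illustration, not the hardware design itself: `Entry`, `AccessBuffer`, and the byte-range overlap test are hypothetical, and the broadcast step is omitted.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One buffer row: the three columns named on the slide.
struct Entry {
    std::uintptr_t addr;
    std::uint8_t size;    // access size in bytes
    bool write;
};

// Fixed-capacity access set for the current region of one processor.
template <std::size_t Capacity>
class AccessBuffer {
public:
    // Records an access when it is issued. Returns false if the buffer is
    // full, which bounded regions are meant to rule out.
    bool add(const Entry& e) {
        if (count_ == Capacity) return false;
        entries_[count_++] = e;
        return true;
    }

    // Checks a remote access against this region's accesses: a conflict
    // needs overlapping byte ranges and at least one write.
    bool conflicts_with(const Entry& other) const {
        for (std::size_t i = 0; i < count_; ++i) {
            const Entry& e = entries_[i];
            bool overlap = e.addr < other.addr + other.size &&
                           other.addr < e.addr + e.size;
            if (overlap && (e.write || other.write)) return true;
        }
        return false;
    }

    // Called when the region completes and its entries are no longer needed.
    void clear() { count_ = 0; }

private:
    std::array<Entry, Capacity> entries_{};
    std::size_t count_ = 0;
};
```

A 4-byte write at 0x100 conflicts with a 2-byte read at 0x102, but not with a write at 0x104, and two reads of the same location never conflict.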

22 DRFx Overhead on PARSEC Benchmarks

