Presentation is loading. Please wait.

Presentation is loading. Please wait.

Memory Models: A Case for Rethinking Parallel Languages and Hardware † Sarita Adve University of Illinois Acks: Mark Hill, Kourosh Gharachorloo,

Similar presentations


Presentation on theme: "Memory Models: A Case for Rethinking Parallel Languages and Hardware † Sarita Adve University of Illinois Acks: Mark Hill, Kourosh Gharachorloo,"— Presentation transcript:

1 Memory Models: A Case for Rethinking Parallel Languages and Hardware † Sarita Adve University of Illinois Acks: Mark Hill, Kourosh Gharachorloo, Jeremy Manson, Bill Pugh, Hans Boehm, Doug Lea, Herb Sutter, Vikram Adve, Rob Bocchino, Marc Snir PODC, SPAA keynote, August 2009 † Also a paper by S. Adve & H. Boehm,

2 Memory Consistency Models Parallelism for the masses! Shared-memory most common Memory model = Legal values for reads

3 Memory Consistency Models Parallelism for the masses! Shared-memory most common Memory model = Legal values for reads

4 Memory Consistency Models Parallelism for the masses! Shared-memory most common Memory model = Legal values for reads

5 Memory Consistency Models Parallelism for the masses! Shared-memory most common Memory model = Legal values for reads

6 Memory Consistency Models Parallelism for the masses! Shared-memory most common Memory model = Legal values for reads

7 20 Years of Memory Models Memory model is at the heart of concurrency semantics – 20 year journey from confusion to convergence at last! – Hard lessons learned – Implications for future Current way to specify concurrency semantics is too hard – Fundamentally broken Must rethink parallel languages and hardware Implications for broader CS disciplines

8 What is a Memory Model? Memory model defines what values a read can return Initially A=B=C=Flag=0 Thread 1 Thread 2 A = 26 while (Flag != 1) {;} B = 90 r1 = B … r2 = A Flag = 1 …

9 Memory Model is Key to Concurrency Semantics Interface between program and transformers of program – Defines what values a read can return C++ program Compiler Dynamic optimizer Hardware Weakest system component exposed to the programmer – Language level model has implications for hardware Interface must last beyond trends Assembly

10 Desirable Properties of a Memory Model 3 Ps – Programmability – Performance – Portability Challenge: hard to satisfy all 3 Ps – Late 1980’s - 90’s: Largely driven by hardware Lots of models, little consensus – 2000 onwards: Largely driven by languages/compilers Consensus model for Java, C++ (C, others ongoing) Had to deal with mismatches in hardware models Path to convergence has lessons for future

11 Programmability – SC [Lamport79] Programmability: Sequential consistency (SC) most intuitive – Operations of a single thread in program order – All operations in a total order or atomic But Performance? – Recent (complex) hardware techniques boost performance with SC – But compiler transformations still inhibited But Portability? – Almost all h/w, compilers violate SC today  SC not practical, but…

12 Next Best Thing – SC Almost Always Parallel programming too hard even with SC – Programmers (want to) write well structured code – Explicit synchronization, no data races Thread 1 Thread 2 Lock(L) Read Data1 Read Data2 Write Data2 Write Data1 … Unlock(L) Unlock(L) – SC for such programs much easier: can reorder data accesses  Data-race-free model [AdveHill90] – SC for data-race-free programs – No guarantees for programs with data races

13 Definition of a Data Race Distinguish between data and non-data (synchronization) accesses Only need to define for SC executions  total order Two memory accesses form a race if – From different threads, to same location, at least one is a write – Occur one after another Thread 1 Thread 2 Write, A, 26 Write, B, 90 Read, Flag, 0 Write, Flag, 1 Read, Flag, 1 Read, B, 90 Read, A, 26 A race with a data access is a data race Data-race-free-program = No data race in any SC execution

14 Data-Race-Free Model Data-race-free model = SC for data-race-free programs – Does not preclude races for wait-free constructs, etc. Requires races be explicitly identified as synchronization E.g., use volatile variables in Java, atomics in C++ – Dekker’s algorithm Initially Flag1 = Flag2 = 0 volatile Flag1, Flag2 Thread1 Thread2 Flag1 = 1 Flag2 = 1 if Flag2 == 0 if Flag1 == 0 //critical section //critical section SC prohibits both loads returning 0

15 Data-Race-Free Approach Programmer’s model: SC for data-race-free programs Programmability – Simplicity of SC, for data-race-free programs Performance – Specifies minimal constraints (for SC-centric view) Portability – Language must provide way to identify races – Hardware must provide way to preserve ordering on races – Compiler must translate correctly

16 1990's in Practice (The Memory Models Mess) Hardware – Implementation/performance-centric view – Different vendors had different models – most non-SC Alpha, Sun, x86, Itanium, IBM, AMD, HP, Cray, … – Various ordering guarantees + fences to impose other orders – Many ambiguities - due to complexity, by design(?), … High-level languages – Most shared-memory programming with Pthreads, OpenMP Incomplete, ambiguous model specs Memory model property of language, not library [Boehm05] – Java – commercially successful language with threads Chapter 17 of Java language spec on memory model But hard to interpret, badly broken LD ST LD Fence

17 2000 – 2004: Java Memory Model ~ 2000: Bill Pugh publicized fatal flaws in Java model Lobbied Sun to form expert group to revise Java model Open process via mailing list – Diverse participants – Took 5 years of intense, spirited debates – Many competing models – Final consensus model approved in 2005 for Java 5.0 [MansonPughAdve POPL 2005]

18 Java Memory Model Highlights Quick agreement that SC for data-race-free was required Missing piece: Semantics for programs with data races – Java cannot have undefined semantics for ANY program – Must ensure safety/security guarantees – Limit damage from data races in untrusted code Goal: Satisfy security/safety, w/ maximum system flexibility – Problem: “safety/security, limited damage” w/ threads very vague

19 Java Memory Model Highlights Initially X=Y=0 Thread 1 Thread 2 r1 = X r2 = Y Y = r1 X = r2 Is r1=r2=42 allowed? Data races produce causality loop! Definition of a causality loop was surprisingly hard Common compiler optimizations seem to violate“causality”

20 Java Memory Model Highlights Final model based on consensus, but complex – Programmers can (must) use “SC for data-race-free” – But system designers must deal with complexity – Correctness tools, racy programs, debuggers, …?? – Recent discovery of bugs [SevcikAspinall08]

21 :C++, Microsoft Prism, Multicore ~ 2005: Hans Boehm initiated C++ concurrency model – Prior status: no threads in C++, most concurrency w/ Pthreads Microsoft concurrently started its own internal effort C++ easier than Java because it is unsafe – Data-race-free is plausible model BUT multicore  New h/w optimizations, more scrutiny – Mismatched h/w, programming views became painfully obvious – Debate that SC for data-race-free inefficient w/ hardware models

22 C++ Challenges 2006: Pressure to change Java/C++ to remove SC baseline To accommodate some hardware vendors But what is alternative? – Must allow some hardware optimizations – But must be teachable to undergrads Showed such an alternative (probably) does not exist

23 C++ Compromise Default C++ model is data-race-free AMD, Intel, … on board But – Some systems need expensive fence for SC – Some programmers really want more flexibility C++ specifies low-level atomics only for experts Complicates spec, but only for experts We are not advertising this part – [BoehmAdve PLDI 2008]

24 Summary of Current Status Convergence to “SC for data-race-free” as baseline For programs with data races – Minimal but complex semantics for safe languages – No semantics for unsafe languages

25 Lessons Learned Specifying semantics for programs with data races is HARD – But “no semantics for data races” also has problems Not an option for safe languages Debugging, correctness checking tools Hardware-software mismatch for some code – “Simple” optimizations have unintended consequences  State-of-the-art is fundamentally broken

26 Lessons Learned Specifying semantics for programs with data races is HARD – But “no semantics for data races” also has problems Not an option for safe languages Debugging, correctness checking tools Hardware-software mismatch for some code – “Simple” optimizations have unintended consequences  State-of-the-art is fundamentally broken Banish shared-memory?

27 Lessons Learned Specifying semantics for programs with data races is HARD – But “no semantics for data races” also has problems Not an option for safe languages Debugging, correctness checking tools Hardware-software mismatch for some code – “Simple” optimizations have unintended consequences  State-of-the-art is fundamentally broken We need – Higher-level disciplined models that enforce discipline – Hardware co-designed with high-level models Banish wild shared-memory!

28 Research Agenda for Languages Disciplined shared-memory models – Simple – Enforceable – Expressive – Performance Key: What discipline? How to enforce it?

29 Data-Race-Free A near-term discipline: Data-race-free Enforcement – Ideally, language prohibits by design – e.g., ownership types [Boyapati+02] – Else, runtime catches as exception e.g., Goldilocks [Elmas+07] But work still needed for expressivity and/or performance But data-race-free still not sufficiently high level

30 Deterministic-by-Default Parallel Programming Even data-race-free parallel programs are too hard – Multiple interleavings due to unordered synchronization (or races) – Makes reasoning and testing hard But many algorithms are deterministic – Fixed input gives fixed output – Standard model for sequential programs – Also holds for many transformative parallel programs Parallelism not part of problem specification, only for performance Why write such an algorithm in non-deterministic style, then struggle to understand and control its behavior?

31 Deterministic-by-Default Model Parallel programs should be deterministic-by-default – Sequential semantics (easier than SC!) If non-determinism is needed – should be explicitly requested, encapsulated – should not interfere with guarantees for the rest of the program Enforcement: – Ideally, language prohibits by design – Else, runtime catches violations as exceptions

32 State-of-the-art Many deterministic languages today – Functional, pure data parallel, some domain-specific, … – Much recent work on runtime, library-based approaches – E.g., Allen09, Divietti09, Olszewski09, … Our work: Language approach for modern O-O methods – Deterministic Parallel Java (DPJ) [V. Adve et al.]

33 Deterministic Parallel Java (DPJ) Object-oriented type and effect system – Aliasing information : partition the heap into “regions” – Effect specifications : regions read or written by each method – Language guarantees determinism through type checking – Side benefit: regions, effects are valuable documentation Implemented as extension to base Java type system – Initial evaluation for expressivity, performance [Bocchino+09] – Semi-automatic tool for region annotations [Vakilian+09] – Recent work on encapsulating frameworks and unchecked code – Ongoing work on integrating non-determinism

34 Implications for Hardware Current hardware not matched even to current model Near term: ISA changes, speculation Long term: Co-design hardware with new software models – Use disciplined software to make more efficient hardware – Use hardware to support disciplined software

35 Illinois DeNovo Project How to design hardware from the ground up to – Exploit disciplined parallelism for better performance, power, … – Support disciplined parallelism for better dependability Working with DPJ to exploit region, effect information – Software-assisted coherence, communication, scheduling – New hardware/software interface – Opportune time as we determine how to scale multicore

36 Conclusions Current way to specify concurrency semantics fundamentally broken – Best we can do is SC for data-race-free But cannot hide from programs with data races – Mismatched hardware-software Simple optimizations give unintended consequences Need – High-level disciplined models that enforce discipline – Hardware co-designed with high-level models E.g., DPJ, DeNovo – Implications for many CS communities


Download ppt "Memory Models: A Case for Rethinking Parallel Languages and Hardware † Sarita Adve University of Illinois Acks: Mark Hill, Kourosh Gharachorloo,"

Similar presentations


Ads by Google