Memory Models: The Case for Rethinking Parallel Languages and Hardware Sarita Adve University of Illinois at Urbana-Champaign

Slides:

Advertisements

Similar presentations

Memory Consistency Models Sarita Adve Department of Computer Science University of Illinois at Urbana-Champaign Ack: Previous tutorials.

Advertisements

Memory Consistency Models Kevin Boos. Two Papers Shared Memory Consistency Models: A Tutorial – Sarita V. Adve & Kourosh Gharachorloo – September 1995.

Multiprocessor Architectures for Speculative Multithreading Josep Torrellas, University of Illinois The Bulk Multicore Architecture for Programmability.

Memory Models (1) Xinyu Feng University of Science and Technology of China.

CS 162 Memory Consistency Models. Memory operations are reordered to improve performance Hardware (e.g., store buffer, reorder buffer) Compiler (e.g.,

“FENDER” AUTOMATIC MEMORY FENCE INFERENCE Presented by Michael Kuperstein, Technion Joint work with Martin Vechev and Eran Yahav, IBM Research 1.

D u k e S y s t e m s Time, clocks, and consistency and the JMM Jeff Chase Duke University.

Memory Models: A Case for Rethinking Parallel Languages and Hardware † Sarita Adve University of Illinois Acks: Mark Hill, Kourosh Gharachorloo,

Rethinking Shared-Memory Languages and Hardware Sarita V. Adve University of Illinois Acks: M. Hill, K. Gharachorloo, H. Boehm, D. Lea,

CS492B Analysis of Concurrent Programs Consistency Jaehyuk Huh Computer Science, KAIST Part of slides are based on CS:App from CMU.

The Path to Multi-core Tools Paul Petersen. Multi-coreToolsThePathTo 2 Outline Motivation Where are we now What is easy to do next What is missing.

DeNovo † : Rethinking Hardware for Disciplined Parallelism Byn Choi, Rakesh Komuravelli, Hyojin Sung, Rob Bocchino, Sarita Adve, Vikram Adve Other collaborators:

Lock-free Cache-friendly Software Queue for Decoupled Software Pipelining Student: Chen Wen-Ren Advisor: Wuu Yang 學生 : 陳韋任指導教授 : 楊武 Abstract Multicore.

“THREADS CANNOT BE IMPLEMENTED AS A LIBRARY” HANS-J. BOEHM, HP LABS Presented by Seema Saijpaul CS-510.

1 Lecture 21: Transactional Memory Topics: consistency model recap, introduction to transactional memory.

By Sarita Adve & Kourosh Gharachorloo Review by Jim Larson Shared Memory Consistency Models: A Tutorial.

University of Kansas Construction & Integration of Distributed Systems Jerry James Oct. 30, 2000.

Computer Architecture II 1 Computer architecture II Lecture 9.

Chapter 2: Impact of Machine Architectures What is the Relationship Between Programs, Programming Languages, and Computers.

1 Sharing Objects – Ch. 3 Visibility What is the source of the issue? Volatile Dekker’s algorithm Publication and Escape Thread Confinement Immutability.

Memory Consistency Models

CS510 Concurrent Systems Class 5 Threads Cannot Be Implemented As a Library.

Shared Memory Consistency Models: A Tutorial By Sarita V Adve and Kourosh Gharachorloo Presenter: Sunita Marathe.

Memory Consistency Models Some material borrowed from Sarita Adve’s (UIUC) tutorial on memory consistency models.

A Behavioral Memory Model for the UPC Language Kathy Yelick Joint work with: Dan Bonachea, Jason Duell, Chuck Wallace.

Lecture 4: Parallel Programming Models. Parallel Programming Models Parallel Programming Models: Data parallelism / Task parallelism Explicit parallelism.

Memory Models: A Case for Rethinking Parallel Languages and Hardware † Sarita V. Adve University of Illinois Acks: Mark Hill, Kourosh.

Evaluation of Memory Consistency Models in Titanium.

ICOM 5995: Performance Instrumentation and Visualization for High Performance Computer Systems Lecture 7 October 16, 2002 Nayda G. Santiago.

Memory Models: A Case for Rethinking Parallel Languages and Hardware COE 502/CSE 661 Fall 2011 Parallel Processing Architectures Professor: Muhamed Mudawar,

Software & the Concurrency Revolution by Sutter & Larus ACM Queue Magazine, Sept For CMPS Halverson 1.

Shared Memory Consistency Models: A Tutorial Sarita V. Adve Kouroush Ghrachorloo Western Research Laboratory September 1995.

Shared Memory Consistency Models. Quiz (1)  Let’s define shared memory.

Foundations of the C++ Concurrency Memory Model Hans-J. Boehm Sarita V. Adve HP Laboratories UIUC.

Memory Consistency Models Alistair Rendell See “Shared Memory Consistency Models: A Tutorial”, S.V. Adve and K. Gharachorloo Chapter 8 pp of Wilkinson.

Rethinking Hardware and Software for Disciplined Parallelism Sarita V. Adve University of Illinois

Writing Systems Software in a Functional Language An Experience Report Iavor Diatchki, Thomas Hallgren, Mark Jones, Rebekah Leslie, Andrew Tolmach.

By Sarita Adve & Kourosh Gharachorloo Slides by Jim Larson Shared Memory Consistency Models: A Tutorial.

Group 3: Architectural Design for Enhancing Programmability Dean Tullsen, Josep Torrellas, Luis Ceze, Mark Hill, Onur Mutlu, Sampath Kannan, Sarita Adve,

Shared Memory Consistency Models. SMP systems support shared memory abstraction: all processors see the whole memory and can perform memory operations.

Memory Consistency Models. Outline Review of multi-threaded program execution on uniprocessor Need for memory consistency models Sequential consistency.

Threads Cannot be Implemented as a Library Hans-J. Boehm.

Multiprocessor Cache Consistency (or, what does volatile mean?) Andrew Whitaker CSE451.

Threads cannot be implemented as a library Hans-J. Boehm (presented by Max W Schwarz)

Parallelism without Concurrency Charles E. Leiserson MIT.

ICFEM 2002, Shanghai Reasoning about Hardware and Software Memory Models Abhik Roychoudhury School of Computing National University of Singapore.

CS533 Concepts of Operating Systems Jonathan Walpole.

What’s Ahead for Embedded Software? (Wed) Gilsoo Kim

Specifying Multithreaded Java semantics for Program Verification Abhik Roychoudhury National University of Singapore (Joint work with Tulika Mitra)

Gauss Students’ Views on Multicore Processors Group members: Yu Yang (presenter), Xiaofang Chen, Subodh Sharma, Sarvani Vakkalanka, Anh Vo, Michael DeLisi,

740: Computer Architecture Memory Consistency Prof. Onur Mutlu Carnegie Mellon University.

Tools and Libraries for Manycore Computing Kathy Yelick U.C. Berkeley and LBNL.

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Lecture 20: Consistency Models, TM

Memory Consistency Models

Memory Consistency Models

Threads and Memory Models Hal Perkins Autumn 2011

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Shared Memory Consistency Models: A Tutorial

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Threads and Memory Models Hal Perkins Autumn 2009

Lecture 22: Consistency Models, TM

Shared Memory Consistency Models: A Tutorial

Memory Consistency Models

Xinyu Feng University of Science and Technology of China

Relaxed Consistency Finale

Tools for the development of parallel applications

Programming with Shared Memory Specifying parallelism

Lecture: Consistency Models, TM

Problems with Locks Andrew Whitaker CSE451.

Presentation transcript:

Memory Models: The Case for Rethinking Parallel Languages and Hardware Sarita Adve University of Illinois at Urbana-Champaign

Memory Consistency Models Memory model is at the heart of concurrency semantics −20 year journey from confusion to convergence at last! −Hard lessons learned −Implications for future Current way of specifying concurrency semantics is too hard Must rethink parallel languages and hardware

Memory model defines what values a read can return Initially A=B=C=Flag=0 Thread 1 Thread 2 A = 26 while (Flag != 1) {;} B = 90 r1 = B … r2 = A Flag = 1 … What is a Memory Model?

What is a Memory Model? Interface between program and transformers of program −Defines what values a read can return C++ program Compiler Dynamic optimizer Hardware Language level model has implications for hardware Weakest system component exposed to programmer Memory model is at the heart of concurrency semantics Assembly

Desirable Properties of a Memory Model 3 Ps −Programmability −Performance −Portability Challenge: hard to satisfy all 3 Ps −Late 1980’s - 90’s: Largely driven by hardware  Lots of models, little consensus −2000 onwards: Largely driven by languages/compilers  Consensus model for Java, C++, Microsoft native code  Most major hardware vendors on board Next: Path to convergence, lessons learned, implications for future

Programmability – SC [Lamport79] Programmability: Sequential consistency (SC) most intuitive −Accesses of a single thread in program order −All accesses in a total order or atomic But Performance? −Recent hardware techniques boost performance with SC −But compiler transformations still inhibited But Portability? −Almost all current hardware, compilers violate SC  SC not practical yet, but…

Next Best Thing – SC for Almost Everyone Parallel programming too hard even with SC −Programmers (want to) write well structured code −Explicit synchronization, no data races Thread 1 Thread 2 Lock(L) Read Data1 Read Data2 Write Data2 Write Data1 Write Data3 Read Data3 … Unlock(L) −SC for such programs much easier: can reorder data accesses  Data-race-free model [AdveHill, Gharachorloo et al. 1990s] –SC for data-race-free programs –No guarantees for programs with data races

Definition of a Data Race Distinguish between data and non-data (synchronization) accesses Only need to define for SC executions  total order Two memory accesses form a race if −From different threads, to same location, at least one is a write −Occur one after another Thread 1 Thread 2 Write, A, 26 Write, B, 90 Read, Flag, 0 Write, Flag, 1 Read, Flag, 1 Read, B, 90 Read, A, 26 A race with a data access is a data race Data-race-free-program = No data race in any SC execution

Data-Race-Free Model Data-race-free model = SC for data-race-free programs −Does not preclude races for wait-free constructs, etc.  Requires races be explicitly identified as synch  E.g., use volatile variables in Java, atomics in C++ −Dekker’s algorithm Initially Flag1 = Flag2 = 0 volatile Flag1, Flag2 Thread1 Thread2 Flag1 = 1 Flag2 = 1 if Flag2 == 0 if Flag1 == 0 //critical section //critical section SC prohibits both loads returning 0

Data-Race-Free Approach Programmer’s model: SC for data-race-free programs Programmability −Simplicity of SC, for data-race-free programs Performance −Specifies minimal constraints (for SC-centric view) Portability −Language must provide way to identify races −Hardware must provide way to preserve ordering on races −Compiler must translate correctly

1990’s in Practice Hardware −Different vendors had different models – most non-SC  Alpha, Sun, x86, Itanium, IBM, AMD, HP, Convex, Cray, … −Various ordering guarantees + fences to impose other orders −Many ambiguities - due to complexity, by design(?), … High-level languages −Most shared-memory programming with Pthreads, OpenMP  Incomplete, ambiguous model specs  Memory model property of language, not library −Java – commercially successful language with threads  Chapter 17 of Java language spec described memory model  But hard to interpret, badly broken

2000 – 2004: Java Memory Model ~ 2000: Bill Pugh publicized fatal flaws in Java memory model Lobbied Sun to form expert group to revise Java model Open process via mailing list with diverse subscribers −Took 5 years of intense, spirited debates −Many competing models −Final consensus model approved in 2005 for Java 5.0 [MansonPughAdve POPL 2005]

Java Memory Model - Highlights Quick agreement that SC for data-race-free was required Missing piece: Semantics for programs with data races –Java cannot have undefined semantics for ANY program –Must ensure safety/security guarantees of language Goal: minimal semantics for races to satisfy security/safety −Problem: safety/security issues w/ threads very vague Final model based on consensus, but complex −Programmers can (must) use “SC for data-race-free” −But system designers must deal with complexity −Correctness checkers, racy programs, debuggers, …??

:C++, Microsoft Prism, Multicore ~ 2005: Hans Boehm started effort for C++ concurrency model −Prior status: no threads in C++, most concurrency w/ Pthreads Microsoft concurrently started its own internal effort C++ easier than Java because it is unsafe −Data-race-free is plausible model BUT −Multicore  New h/w optimizations, h/w vendors cared more −Pthreads has larger set of synchronization techniques −Can we really get away with no semantics for data races?

Hardware Implications of Data-Race-Free Synchronization (volatiles/atomics) must appear SC –Each thread’s synch must appear in program order synch Flag1, Flag2 T1 T2 Flag1 = 1 Flag2 = 1 Fence Fence if Flag2 == 0 if Flag1 == 0 critical section critical section SC  both reads cannot return 0 Requires efficient fences between synch stores/loads –All synchs must appear in a total order (atomic)

Independent reads, independent writes (IRIW): Initially X=Y=0 T1 T2 T3 T4 X = 1 Y = 1 … = Y … = X fence fence … = X … = Y SC  no thread sees new value until old copies invalidated –Shared caches w/ hyperthreading/multicore make this harder –Programmers don’t usually use IRIW –Why pay cost for SC in h/w if not useful to s/w? 0 Implications of Atomic Synch Writes 11 0

Implications of Data-Race-Free for Hardware 2006: Pressure to change Java/C++ to remove SC baseline To accommodate some hardware vendors But what is alternative? −Must allow above optimizations −But must be teachable to undergrads Showed such an alternative (probably) does not exist

C++ Compromise Default C++ model is data-race-free AMD, Intel, … on board But –Some IBM systems need expensive fence for SC –Some programmers really want more flexibility  C++ specifies low-level atomics only for experts  Complicates spec, but only for experts  We won’t be advertising this part [BoehmAdve PLDI 2008]

Summary of Current Status Convergence to “SC for data-race-free programs” as baseline For programs with data races −Minimal but complex semantics for safe languages −No semantics for unsafe languages

Lessons Learned Specifying semantics for programs with data races is HARD −But “no semantics for data races” also has problems  Unsafe language  Correctness checking tools  Debugging  Source-to-source compilers cannot introduce data races “Simple” optimizations can have unintended consequences We need −Higher-level concurrency models that banish data races −Concurrent hardware co-designed with high level models

What Next? What concurrency models should high-level languages support for correctness? Implications for hardware? −Rethink hardware from the ground up for correctness −Key: What concurrency models should hardware support? −Current hardware: lowest common denominator model  “Wild” hardware-coherent shared-memory + fences  Bad for correctness and for scalability −Future hardware concurrency models should be  Driven by correctness  Driven by disciplined high-level programming models

UPCRC/Illinois High-Level Models Determinism – paradigm for transformational programs When concurrency is only for performance, not part of the problem Program enforces fixed order for conflicting memory accesses −Every execution gives identical visible behavior −Sequential semantics: debug/test like a sequential program −Stronger than data-race-free Deterministic language mechanisms: −Explicit parallelism with type checks, data parallelism, … Shared-nothing – paradigm for reactive programs When concurrency is part of the problem specification Others: controlled non-determinism, higher abstraction (domain-specific, …)

Rethinking Hardware w/ Disciplined Parallelism Implications of deterministic high-level models −Known precedences among tasks, (almost?) known data sharing  Explicit synchronization  Explicit communication?  Software controlled shared memory?  Locality-driven task scheduling? But what about atomicity? Transactions are great, but … May also need “wild” shared memory – roll your own (explicit) synch −But limited to experts, encapsulated Need a disciplined sharing specification for hardware −Deterministic + transactions + limited wild shared memory ?? Integrate shared nothing

Rethinking Hardware w/ Disciplined Parallelism More generally, −What concurrency models should hardware be optimized for to foster correct-by-design parallel programs?  Driven by application needs, high-level concurrency models −How to support legacy programs? −Impact on how to express parallelism, communication, synchronization, coherence, memory consistency for hardware −Impact on rest of the system - runtime, correctness tools, languages, compilers C++, Java memory model convergence from same thought process −But this time, let’s co-design language and hardware models