CDP 2012 Based on “C++ Concurrency In Action” by Anthony Williams and The C++11 Memory Model and GCC WikiThe C++11 Memory Model and GCC Created by Eran.

Slides:

Advertisements

Similar presentations

Operating Systems Components of OS

Advertisements

Threads Cannot be Implemented As a Library Andrew Hobbs.

50.003: Elements of Software Construction Week 6 Thread Safety and Synchronization.

Memory Consistency Models Kevin Boos. Two Papers Shared Memory Consistency Models: A Tutorial – Sarita V. Adve & Kourosh Gharachorloo – September 1995.

CS492B Analysis of Concurrent Programs Lock Basics Jaehyuk Huh Computer Science, KAIST.

ECE 454 Computer Systems Programming Parallel Architectures and Performance Implications (II) Ding Yuan ECE Dept., University of Toronto

D u k e S y s t e m s Time, clocks, and consistency and the JMM Jeff Chase Duke University.

Shared Memory – Consistency of Shared Variables The ideal picture of shared memory: CPU0CPU1CPU2CPU3 Shared Memory Read/ Write The actual architecture.

Slides 8d-1 Programming with Shared Memory Specifying parallelism Performance issues ITCS4145/5145, Parallel Programming B. Wilkinson Fall 2010.

“THREADS CANNOT BE IMPLEMENTED AS A LIBRARY” HANS-J. BOEHM, HP LABS Presented by Seema Saijpaul CS-510.

6/10/2015C++ for Java Programmers1 Pointers and References Timothy Budd.

Programming Language Semantics Java Threads and Locks Informal Introduction The Java Specification Language Chapter 17.

Computer Architecture II 1 Computer architecture II Lecture 9.

CS1061 C Programming Lecture 4: Indentifiers and Integers A.O’Riordan, 2004.

1 Sharing Objects – Ch. 3 Visibility What is the source of the issue? Volatile Dekker’s algorithm Publication and Escape Thread Confinement Immutability.

CS510 Concurrent Systems Class 5 Threads Cannot Be Implemented As a Library.

Shared Memory Consistency Models: A Tutorial By Sarita V Adve and Kourosh Gharachorloo Presenter: Meenaktchi Venkatachalam.

Memory Consistency Models Some material borrowed from Sarita Adve’s (UIUC) tutorial on memory consistency models.

Parallel Programming Models Jihad El-Sana These slides are based on the book: Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National.

IT253: Computer Organization Lecture 4: Instruction Set Architecture Tonga Institute of Higher Education.

DEPARTMENT OF COMPUTER SCIENCE & TECHNOLOGY FACULTY OF SCIENCE & TECHNOLOGY UNIVERSITY OF UWA WELLASSA 1 CST 221 OBJECT ORIENTED PROGRAMMING(OOP) ( 2 CREDITS.

CDP 2013 Based on “C++ Concurrency In Action” by Anthony Williams, The C++11 Memory Model and GCCThe C++11 Memory Model and GCC Wiki and Herb Sutter’s.

Foundations of the C++ Concurrency Memory Model Hans-J. Boehm Sarita V. Adve HP Laboratories UIUC.

Memory Consistency Models Alistair Rendell See “Shared Memory Consistency Models: A Tutorial”, S.V. Adve and K. Gharachorloo Chapter 8 pp of Wilkinson.

Practical Reduction for Store Buffers Ernie Cohen, Microsoft Norbert Schirmer, DFKI.

Shared Memory Consistency Models. SMP systems support shared memory abstraction: all processors see the whole memory and can perform memory operations.

Memory Consistency Models. Outline Review of multi-threaded program execution on uniprocessor Need for memory consistency models Sequential consistency.

Threads Cannot be Implemented as a Library Hans-J. Boehm.

Java Thread and Memory Model

Multiprocessor Cache Consistency (or, what does volatile mean?) Andrew Whitaker CSE451.

Threads cannot be implemented as a library Hans-J. Boehm (presented by Max W Schwarz)

CHARLES UNIVERSITY IN PRAGUE faculty of mathematics and physics Advanced.NET Programming I 5 th Lecture Pavel Ježek

Fundamentals of Parallel Computer Architecture - Chapter 71 Chapter 7 Introduction to Shared Memory Multiprocessors Yan Solihin Copyright.

ICFEM 2002, Shanghai Reasoning about Hardware and Software Memory Models Abhik Roychoudhury School of Computing National University of Singapore.

Thread basics. A computer process Every time a program is executed a process is created It is managed via a data structure that keeps all things memory.

CS533 Concepts of Operating Systems Jonathan Walpole.

Memory Management OS Fazal Rehman Shamil. swapping Swapping concept comes in terms of process scheduling. Swapping is basically implemented by Medium.

Specifying Multithreaded Java semantics for Program Verification Abhik Roychoudhury National University of Singapore (Joint work with Tulika Mitra)

Week 9, Class 3: Java’s Happens-Before Memory Model (Slides used and skipped in class) SE-2811 Slide design: Dr. Mark L. Hornick Content: Dr. Hornick Errors:

3/12/2013Computer Engg, IIT(BHU)1 OpenMP-1. OpenMP is a portable, multiprocessing API for shared memory computers OpenMP is not a “language” Instead,

The C++11 Memory Model CDP Based on “C++ Concurrency In Action” by Anthony Williams, The C++11 Memory Model and GCCThe C++11 Memory Model and GCC Wiki.

Today Quiz not yet graded (Sorry!) More on Multithreading Happens-Before in Java Other langauges with Happens-Before Happens-Before in C++ SE-2811 Slide.

1 Written By: Adi Omari (Revised and corrected in 2012 by others) Memory Models CDP

December 1, 2006©2006 Craig Zilles1 Threads & Atomic Operations in Hardware  Previously, we introduced multi-core parallelism & cache coherence —Today.

Symmetric Multiprocessors: Synchronization and Sequential Consistency

Lecture 20: Consistency Models, TM

The C++11 Memory Model CDP

Course Contents KIIT UNIVERSITY Sr # Major and Detailed Coverage Area

Advanced .NET Programming I 11th Lecture

Java Primer 1: Types, Classes and Operators

Java Programming Language

Memory Consistency Models

Threads Cannot Be Implemented As a Library

Memory Consistency Models

Threads and Memory Models Hal Perkins Autumn 2011

The C++ Memory model Implementing synchronization)

Implementing synchronization

Synchronization Issues

COP 4600 Operating Systems Fall 2010

Threads and Memory Models Hal Perkins Autumn 2009

COT 5611 Operating Systems Design Principles Spring 2012

Lecture 22: Consistency Models, TM

Dr. Mustafa Cem Kasapbaşı

Memory Consistency Models

(Computer fundamental Lab)

CSE 153 Design of Operating Systems Winter 19

Compilers, Languages, and Memory Models

Programming with Shared Memory Specifying parallelism

Lecture: Consistency Models, TM

Problems with Locks Andrew Whitaker CSE451.

Presentation transcript:

CDP 2012 Based on “C++ Concurrency In Action” by Anthony Williams and The C++11 Memory Model and GCC WikiThe C++11 Memory Model and GCC Created by Eran Gilad 1

 The guarantees provided by the runtime environment to a multithreaded program, regarding memory operations.  Each level of the environment might have a different memory model – CPU, virtual machine, language.  The correctness of parallel algorithms depends on the memory model. 2

Isn’t the CPU memory model enough?  Threads are now part of the language. Their behavior must be fully defined.  Till now, different code was required for every compiler, OS and CPU combination.  Having a standard guarantees portability and simplifies the programmer’s work. 3

Given the following data race: Thread 1: x = 10; Thread 2: cout << x; Pre-2011 C++ C++11 Huh? What’s a “Thread”? Namely – behavior is unspecified. Undefined Behavior. But now there’s a way to get it right. 4

if (x >= 0 && x < 3) { switch (x) { case 0: do0(); break; case 1: do1(); break; case 2: do2(); break; } Can the compiler convert the switch to a quick jump table, based on x’s value? At this point, on some other thread: x = 8; Switch jumps to unknown location. Your monitor catches fire. Optimizations that are safe on sequential programs might not be safe on parallel ones. So, races are forbidden! 5

Optimizations are crucial for increasing performance, but might be dangerous:  Compiler optimizations: ◦ Reordering to hide load latencies ◦ Using registers to avoid repeating loads and stores  CPU optimizations: ◦ Loose cache coherence protocols ◦ Out Of Order Execution  A thread must appear to execute serially to other threads. Optimizations must not be visible to other threads. 6

 Every variable of a scalar (simple) type occupies one memory location.  Bit fields are different.  One memory location must not be affected by writes to an adjacent memory location! 7 struct s { char c[4]; int i: 3, j: 4; struct in { double d; } id; }; Can’t read and write all 4 bytes when changing just one. Same mem. location Considered one mem. loc. even if hardware has no 64-bit atomic ops.

 Sequenced before : The order imposed by the appearance of (most) statements in the code, for a single thread.  Synchronized with : The order imposed by an atomic read of a value that has been atomically written by some other thread.  Inter-thread happens-before : A combination of the above.  An event A happens-before an event B if: ◦ A inter-thread happens-before B or ◦ A is sequenced-before B. 8

Thread 1 (1.1) read x, 1 Sequenced Before (1.2) write y, 2 9 Thread 2 (2.1) read y, 2 Sequenced Before (2.2) write z, 3 Synchronized With (1.1) inter-thread happens-before (2.2) The full happens-before order: 1.1 < 1.2 < 2.1 < 2.2

 Various classes – thread, mutex, condition_variable (for wait/notify) etc.  Not extremely strong, to allow portability (native_handle() allows lower level ops)  RAII extensively used – no finally clause in C++ exceptions. 10 std::mutex m; int main() { m.lock(); std::thread t(f); // do stuff m.unlock(); t.join(); } void f() { lock_guard lock(m); // do stuff // auto unlock }

 The class to use when writing lock-free code!  A template wrapper around various types, providing access that is: ◦ Atomic – reads and writes are done as a whole. ◦ Ordered with respect to other accesses to the variable (or others).  The language offers specializations for basic types (bool, numbers, pointers). Custom wrappers can be created by the programmer.  Depending on the target platform, operations can be lock-free, or protected by a mutex. 11

 The class offers member functions and operators for atomic loads and stores.  Atomic compare-exchange is also available.  Some specializations offer arithmetic operations such as ++, which are in fact executed atomically.  All operations guarantee sequential consistency by default. However, other options are possible as well.  Using atomic, C++ memory model supports various orderings of memory operations. 12

 SC is the strongest order. It is the most natural to program with, and is therefore the default (can also explicitly specify memory_order_seq_cst if you want).  All processors see the exact same order of atomic writes.  Compiler optimizations are limited – no action can be reordered past an atomic action.  Atomic writes flush non-atomic writes that are sequenced-before them. 13

atomic x; int y; void thread1() { y = 1; x.store(2); } void thread2() { if (x.load() == 2) assert(y == 1); } The assert is guaranteed not to fail. 14

 Happens-before now only exists between actions to the same variables.  Similar to cache coherence, but can be used on systems with incoherent caches.  Imposes fewer limitations on the compiler – atomic actions can now be reordered if they access different memory locations.  Less synchronization is required on the hardware side as well. 15

Both asserts might fail – no order between operations on x and y. #define rlx memory_order_relaxed /* save slide space… */ atomic x, y; void thread1() { y.store(20, rlx); x.store(10, rlx); } void thread2() { if (x.load(rlx) == 10) { assert(y.load(rlx) == 20); y.store(10, rlx); } void thread3() { if (y.load(rlx) == 10) assert(x.load(rlx) == 10); } 16

Let’s look at the trace of a run in which the assert failed (read 0 instead of 10 or 20): T1: W x, 10; W y, 20; T2: R x, 10; R y, 0; W y, 10; T3: R y, 10; R x, 0; T1: W x, 10; W y, 20; T2: R x, 10; R y, 0; W y, 10; T3: R y, 10; R x, 0; T1: W x, 10; W y, 20; T2: R x, 10; R y, 0; W y, 10; T3: R y, 10; R x, 0; 17

 Synchronized-with relation exists only between the releasing thread and the acquiring thread.  Other threads might see updates in a different order.  All writes before a release are visible after an acquire, even if they were relaxed or non- atomic.  Similar to release consistency memory model. 18

#define rls memory_order_release /* save slide space… */ #define acq memory_order_acquire /* save slide space… */ atomic x, y; void thread1() { y.store(20, rls); } void thread2() { x.store(10, rls); } void thread3() { assert(y.load(acq) == 20 && x.load(acq) == 0); } void thread4() { assert(y.load(acq) == 0 && x.load(acq) == 10); } Both asserts might succeed – no order between writes to x and y. 19

 Declaring a variable as volatile prevents the compiler from placing it in a register.  Exists for cases in which devices write to memory behind the compiler’s back.  Wrongly believed to provide consistency.  Useless in C++11 multi-threaded code – has no real consistency semantics, and might still cause races (and thus undefined behavior).  Use std::atomic instead.  In Java, volatile does guarantee atomicity and consistency. But this is C++… 20

 Safest order is SC. It is also the default.  Orders can be mixed, but the result is extremely hard to understand… avoid that.  Some orders were not covered in this tutorial (mostly variants of those that were).  Memory order is a run-time argument. Use constants to let the compiler better optimize.  Lock-free code is hard to write. Unless speed is crucial, better use mutexes.  C++11 is new. Not all features are supported by common compilers. 21