
CDP 2012. Based on "C++ Concurrency In Action" by Anthony Williams and the GCC wiki page "The C++11 Memory Model and GCC". Created by Eran Gilad.




1 CDP 2012. Based on "C++ Concurrency In Action" by Anthony Williams and the GCC wiki page "The C++11 Memory Model and GCC". Created by Eran Gilad.

2
 The guarantees provided by the runtime environment to a multithreaded program, regarding memory operations.
 Each level of the environment might have a different memory model: CPU, virtual machine, language.
 The correctness of parallel algorithms depends on the memory model.

3 Isn't the CPU memory model enough?
 Threads are now part of the language. Their behavior must be fully defined.
 Until now, different code was required for every compiler, OS and CPU combination.
 Having a standard guarantees portability and simplifies the programmer's work.

4 Given the following data race:

    Thread 1: x = 10;
    Thread 2: cout << x;

 Pre-2011 C++: Huh? What's a "thread"? Namely, behavior is unspecified.
 C++11: Undefined behavior. But now there's a way to get it right.

5
    if (x >= 0 && x < 3) {
        switch (x) {
            case 0: do0(); break;
            case 1: do1(); break;
            case 2: do2(); break;
        }
    }

Can the compiler convert the switch to a quick jump table, based on x's value? If, at this point, some other thread executes x = 8, the switch jumps to an unknown location and your monitor catches fire.
Optimizations that are safe on sequential programs might not be safe on parallel ones. So, races are forbidden!

6 Optimizations are crucial for increasing performance, but might be dangerous:
 Compiler optimizations:
◦ Reordering to hide load latencies
◦ Using registers to avoid repeated loads and stores
 CPU optimizations:
◦ Loose cache coherence protocols
◦ Out-of-order execution
 A thread must appear to execute serially to other threads. Optimizations must not be visible to other threads.

7  Every variable of a scalar (simple) type occupies one memory location.
 Bit fields are different.
 One memory location must not be affected by writes to an adjacent memory location!

    struct s {
        char c[4];     // can't read and write all 4 bytes when changing just one
        int i : 3,
            j : 4;     // i and j share the same memory location
        struct in {
            double d;  // considered one memory location, even if the
        } id;          // hardware has no 64-bit atomic operations
    };

8  Sequenced-before: the order imposed by the appearance of (most) statements in the code, for a single thread.
 Synchronized-with: the order imposed by an atomic read of a value that was atomically written by some other thread.
 Inter-thread happens-before: a combination of the above.
 An event A happens-before an event B if:
◦ A inter-thread happens-before B, or
◦ A is sequenced-before B.

9 Thread 1:
    (1.1) read x, 1   -> sequenced before ->   (1.2) write y, 2
Thread 2:
    (2.1) read y, 2   -> sequenced before ->   (2.2) write z, 3
(1.2) is synchronized-with (2.1), therefore (1.1) inter-thread happens-before (2.2).
The full happens-before order: 1.1 < 1.2 < 2.1 < 2.2

10  Various classes: thread, mutex, condition_variable (for wait/notify), etc.
 Not extremely strong, to allow portability (native_handle() allows lower-level operations).
 RAII is used extensively, since C++ exceptions have no finally clause.

    std::mutex m;

    void f() {
        std::lock_guard<std::mutex> lock(m);
        // do stuff
    }   // automatically unlocked

    int main() {
        m.lock();          // manual alternative to RAII
        std::thread t(f);
        // do stuff
        m.unlock();
        t.join();
    }

11  The class to use when writing lock-free code!
 A template wrapper around various types, providing access that is:
◦ Atomic: reads and writes are done as a whole.
◦ Ordered with respect to other accesses to the same variable (or, depending on the memory order, to others).
 The language offers specializations for basic types (bool, numbers, pointers). Custom wrappers can be created by the programmer.
 Depending on the target platform, operations can be lock-free or protected by a mutex.

12  The class offers member functions and operators for atomic loads and stores.
 Atomic compare-exchange is also available.
 Some specializations offer arithmetic operations such as ++, which are executed atomically.
 All operations guarantee sequential consistency by default, but other orderings are possible as well.
 Using std::atomic, the C++ memory model supports various orderings of memory operations.

13  Sequential consistency (SC) is the strongest order. It is the most natural to program with, and is therefore the default (memory_order_seq_cst can also be specified explicitly).
 All processors see the exact same order of atomic writes.
 Compiler optimizations are limited: no action can be reordered past an atomic action.
 Atomic writes flush non-atomic writes that are sequenced-before them.

14
    std::atomic<int> x;
    int y;

    void thread1() {
        y = 1;
        x.store(2);
    }

    void thread2() {
        if (x.load() == 2)
            assert(y == 1);
    }

The assert is guaranteed not to fail.

15  Happens-before now exists only between actions on the same variables.
 Similar to cache coherence, but can be used on systems with incoherent caches.
 Imposes fewer limitations on the compiler: atomic actions can now be reordered if they access different memory locations.
 Less synchronization is required on the hardware side as well.

16 Both asserts might fail: there is no order between the operations on x and y.

    #define rlx std::memory_order_relaxed  /* save slide space... */

    std::atomic<int> x, y;

    void thread1() {
        y.store(20, rlx);
        x.store(10, rlx);
    }

    void thread2() {
        if (x.load(rlx) == 10) {
            assert(y.load(rlx) == 20);
            y.store(10, rlx);
        }
    }

    void thread3() {
        if (y.load(rlx) == 10)
            assert(x.load(rlx) == 10);
    }

17 Let's look at the trace of a run in which the asserts failed (reading 0 instead of 10 or 20):

    T1: W x, 10;  W y, 20;
    T2: R x, 10;  R y, 0;   W y, 10;
    T3: R y, 10;  R x, 0;

18  The synchronized-with relation exists only between the releasing thread and the acquiring thread.
 Other threads might see updates in a different order.
 All writes before a release are visible after an acquire, even if they were relaxed or non-atomic.
 Similar to the release consistency memory model.

19 Both asserts might succeed: there is no order between the writes to x and y.

    #define rls std::memory_order_release  /* save slide space... */
    #define acq std::memory_order_acquire  /* save slide space... */

    std::atomic<int> x, y;

    void thread1() { y.store(20, rls); }
    void thread2() { x.store(10, rls); }

    void thread3() {
        assert(y.load(acq) == 20 && x.load(acq) == 0);
    }

    void thread4() {
        assert(y.load(acq) == 0 && x.load(acq) == 10);
    }

20  Declaring a variable as volatile prevents the compiler from placing it in a register.
 Exists for cases in which devices write to memory behind the compiler's back.
 Wrongly believed to provide consistency.
 Useless in C++11 multi-threaded code: it has no real consistency semantics, and might still cause races (and thus undefined behavior).
 Use std::atomic instead.
 In Java, volatile does guarantee atomicity and consistency. But this is C++...

21  The safest order is SC. It is also the default.
 Orders can be mixed, but the result is extremely hard to understand; avoid that.
 Some orders were not covered in this tutorial (mostly variants of those that were).
 Memory order is a run-time argument. Use constants to let the compiler optimize better.
 Lock-free code is hard to write. Unless speed is crucial, it is better to use mutexes.
 C++11 is new; not all features are supported by common compilers.

