Memory Consistency Zhonghai Lu zhonghai@kth.se

Outline
- Introduction
  - What is a memory consistency model?
  - Who should care?
- Memory consistency models
  - Strict consistency, sequential consistency
- Relaxed consistency models
  - Processor consistency, weak ordering, release consistency
- Summary
December 13, 2015, SoC Architecture

Shared memory architectures

Memory Consistency Model
- Specifies constraints on the order in which memory operations (from any process) can appear to execute with respect to one another
  - What orders are preserved?
  - Given a load, it constrains the possible values the load can return
- Without it, we cannot say much about the execution of a shared-memory program

Example of Orders
/* Assume initial values of A and B are 0 */
P1                 P2
(1a) A = 1;        (2a) print B;
(1b) B = 2;        (2b) print A;
- What's the intuition?
- Cache coherence says nothing about the order between accesses to different variables A and B
- Whatever it is, we need an ordering model with clear semantics across different locations as well, so programmers can reason about what results are possible

Memory Consistency Model
- Implications for both programmer and system designer
  - The programmer uses it to reason about correctness and possible results
  - The system designer uses it to constrain how much accesses can be reordered by the compiler or hardware
- A contract between programmer and system

Many Consistency Models
- Strict consistency (linearizability, or atomic consistency)
- Sequential consistency
- Causal consistency
- Release consistency
- Eventual consistency
- Delta consistency
- PRAM consistency (also known as FIFO consistency)
- Weak consistency
- Vector-field consistency
- Fork consistency
- One-copy serializability
- Entry consistency

Goals of consistency models
- Programmability: enable programmers to reason about the behavior and correctness of programs
- Performance: impose only those ordering constraints that strike a good balance between programming complexity and performance
- Portability: programs should be portable across different machines

Strict Consistency Model
- Strict consistency: any read of a memory location X returns the value stored by the most recent (last) write to X, as ordered by a global clock.
- For uniprocessors, the 'last' write follows program order.
- What is 'last' for multiprocessors?
Assume that all variables initially have a value of 0.
P1: W(x)1
P2:        R(x)1  R(x)1      OK
P1: W(x)1
P2:        R(x)0  R(x)1      NO (OK for sequential consistency)

Sequential Consistency

Sequential Consistency
“A multiprocessor is sequentially consistent if the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.” [Lamport, 1979]

Sequential Consistency
- (As if there were no caches, and a single memory)
- A total order is achieved by interleaving accesses from different processes
- Maintains program order, and memory operations from all processes appear to [issue, execute, complete] atomically with respect to one another
- The programmer's intuition is maintained

SC example
- Program order among operations from a single processor
- Atomic execution of memory operations

SC Example
/* Assume initial values of A and B are 0 */
P1                 P2
(1a) A = 1;        (2a) print B;
(1b) B = 2;        (2b) print A;
- What matters is the order in which the program appears to execute
- Possible outcomes for (A,B): (0,0), (1,0), (1,2); impossible under SC: (0,2)
- We know 1a->1b and 2a->2b by program order
- A = 0 implies 2b->1a, which implies 2a->1b
- B = 2 implies 1b->2a, which leads to a contradiction
- An actual execution 1b->2a->2b->1a is therefore not SC: it prints B = 2 and then A = 0
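The outcome list on this slide can be checked by brute force. The following is a minimal sketch, not part of the original slides: it enumerates every interleaving of P1 and P2 that respects program order and executes each one atomically against a single memory, which is one way to read the SC definition. The helper names `interleavings` and `run` are made up for illustration.

```python
from itertools import combinations

# Each operation: (label, kind, variable, value). The value is unused for prints.
P1 = [("1a", "write", "A", 1), ("1b", "write", "B", 2)]
P2 = [("2a", "print", "B", None), ("2b", "print", "A", None)]

def interleavings(p, q):
    """Yield every merge of p and q that keeps each program's own order."""
    n = len(p) + len(q)
    for slots in combinations(range(n), len(p)):
        slot_set = set(slots)
        pi, qi = iter(p), iter(q)
        yield [next(pi) if i in slot_set else next(qi) for i in range(n)]

def run(schedule):
    """Execute one total order atomically against a single shared memory."""
    memory = {"A": 0, "B": 0}
    printed = {}
    for _label, kind, var, value in schedule:
        if kind == "write":
            memory[var] = value
        else:
            printed[var] = memory[var]
    return (printed["A"], printed["B"])

sc_outcomes = {run(s) for s in interleavings(P1, P2)}
print(sorted(sc_outcomes))  # [(0, 0), (1, 0), (1, 2)]
```

The pair (0,2) never appears, matching the contradiction argument above.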

Discussion on SC
- Sequential consistency model
  - Intuitive semantics for the programmer
  - Easily implementable by satisfying its sufficient conditions
    - Write completion
    - Write atomicity: a write becomes visible to all processes
  - Restricts many performance optimizations in hardware and in compilers

Canonical hardware optimization (without caches)
- Write buffer
- General interconnect with multiple memory modules
- Overlapping write operations
- Non-blocking read operations

Write buffer
- A write transaction is not complete until it is acknowledged.
- On a write, a processor simply inserts the write operation into the write buffer and proceeds without waiting for the write to complete.
- Subsequent reads are allowed to bypass any previous writes in the write buffer for faster completion.
- Purpose: hide the latency of write operations.
- Write buffers are safe to use in a uniprocessor, since bypassing between operations to different locations does not violate uniprocessor data dependences.
- What happens in a multiprocessor?

Write buffer
- If write buffers are used, both reads of the flags can return 0. This violates SC because the Write2Read program order (to different locations) is not preserved.
- The labels t1, t2, t3, t4 indicate the order in which the corresponding read/write operations execute at memory.
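The figure for this slide is not in the transcript, so the following is only a sketch under assumptions: each processor sets its own flag and then reads the other one, and the names `Flag1`, `Flag2` and `BufferedProcessor` are hypothetical. The simulation buffers writes in a FIFO and lets reads bypass them, which is enough to reproduce the outcome where both reads return 0.

```python
class BufferedProcessor:
    """A processor whose writes sit in a FIFO write buffer until drained."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = []  # pending (variable, value) writes

    def write(self, var, value):
        self.buffer.append((var, value))  # proceed without waiting for memory

    def read(self, var):
        # Forward from the own buffer if a pending write to var exists,
        # otherwise bypass the buffer and read memory directly.
        for pending_var, pending_value in reversed(self.buffer):
            if pending_var == var:
                return pending_value
        return self.memory[var]

    def drain(self):
        for var, value in self.buffer:
            self.memory[var] = value
        self.buffer.clear()

memory = {"Flag1": 0, "Flag2": 0}
p1, p2 = BufferedProcessor(memory), BufferedProcessor(memory)

p1.write("Flag1", 1)   # P1: Flag1 = 1 (buffered)
p2.write("Flag2", 1)   # P2: Flag2 = 1 (buffered)
r1 = p1.read("Flag2")  # P1 reads Flag2, bypassing its own buffered write
r2 = p2.read("Flag1")  # P2 reads Flag1, bypassing its own buffered write
p1.drain()
p2.drain()
print(r1, r2, memory)  # 0 0 {'Flag1': 1, 'Flag2': 1}
```

No sequentially consistent interleaving of these four operations lets both reads return 0, so the buffered run is observably non-SC.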

Overlapping writes
- Allowing writes to different locations to be reordered is safe for uniprocessor programs.
- What about multiprocessors? Write completion may be out of program order.
- An example:
  - The interconnection network allows concurrent transactions.
  - There are multiple memory modules.
  - To exploit the concurrency allowed by the network and memory, a write to another location starts before the previous one is complete (acknowledged).

Overlapping writes
- For P2, when Head=1, what is the value of Data?
- There is no guarantee that the write to Data completes before the write to Head, so there is no guarantee that P2 reads Data = 2000. This violates SC because the Write2Write program order (to different locations) is not preserved.
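A small sketch of this effect, under stated assumptions: P1 issues `Data = 2000` and then `Head = 1` back to back, each location lives in its own memory module, and the per-module delays passed in `latency` are invented numbers. The function `run` is hypothetical; it returns what P2 reads from Data right after it first observes Head = 1.

```python
import heapq

def run(latency):
    """Return the value of Data that P2 reads once it observes Head == 1.

    latency maps each location to the (hypothetical) delay of its memory module.
    """
    memory = {"Data": 0, "Head": 0}
    in_flight = []
    # P1 issues both writes in program order, at times 0 and 1, without
    # waiting for the first acknowledgement.
    for issue_time, (var, value) in enumerate([("Data", 2000), ("Head", 1)]):
        heapq.heappush(in_flight, (issue_time + latency[var], var, value))
    # Writes take effect at memory in completion order, not issue order.
    while in_flight:
        _time, var, value = heapq.heappop(in_flight)
        memory[var] = value
        if memory["Head"] == 1:    # P2's spin loop on Head exits here
            return memory["Data"]  # P2 then reads Data

print(run({"Data": 1, "Head": 1}))   # 2000
print(run({"Data": 10, "Head": 1}))  # 0
```

With equal delays the writes complete in program order and P2 reads 2000; with a slow Data module the write to Head overtakes it and P2 reads the stale 0.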

Nonblocking read operations
- Many processors do not stall for the return value of a read operation.
- They can proceed past a read operation by using techniques such as speculative execution and dynamic scheduling.
- Reads (Read2Read to different locations) may then complete out of program order.
- What does this mean for multiprocessors?

Nonblocking read operations
- P2 may read Data before it reads the updated Head. This violates SC because the Read2Read program order (to different locations) is not preserved.

Architectures with caches
- Caches create more opportunities to reorder operations in ways that can violate sequential consistency.
  - E.g., a write-through cache behaves much like a write buffer.
  - To preserve SC, even if a read hits in the cache, the processor cannot read the cached value until its previous operations in program order are complete.
- Additional issues:
  - A cache coherence protocol is needed to propagate (update or invalidate) a newly written value to all cached copies of the modified location.
  - Detecting when a write is complete requires more transactions.
  - It is hard to make propagation to multiple copies atomic, which makes program order more challenging to preserve.

Detect the completion of write operations
- Suppose P1 and P2 have write-through caches, and P2 initially has Data in its cache.
- What if P2 reads Data from its cache after it sees Head=1, but before its copy of Data is updated?
- This can be avoided if P1 waits for P2's cached copy of Data to be updated or invalidated before proceeding with the write to Head.

Maintain the illusion of atomicity for writes
- All processors must see writes to the same location in the same order, which makes writes appear atomic.
- Example: A, B, C are cached.
  - P3 and P4 may see the writes to A by P1 and P2 in different orders.
  - Register1 and register2 may then get 1 and 2, respectively. This violates SC.

Maintain the illusion of atomicity for writes
- The value of a write must not be returned by a read until all invalidates are acknowledged; otherwise SC can be violated.
- Example: A, B, C are cached.
  - P2 sees A=1 and P3 sees B=1, but P3 has not yet seen A=1, so register1 = 0, violating SC.

Compiler optimization
- Compilers re-order memory references in ways similar to hardware-generated re-orderings.
- Register allocation example:
  - If the compiler register-allocates the location Head on P2 (by doing a single read of Head and then reading the value from the register), the while loop may never terminate in some executions (if the single read on P2 returns the old value of Head).
  - This violates SC, because the loop is guaranteed to terminate in every sequentially consistent execution of the code.

Summary of SC
- Sequential consistency requirements:
  - Program order requirement: a processor must ensure that its previous memory operation is complete before proceeding with the next memory operation in program order.
    - A write is complete only after all invalidates (or updates) are acknowledged.
  - Write atomicity requirement (for cached architectures):
    - Writes to the same location must be made visible in the same order to all processors.
    - The value of a write must not be returned by a read until all invalidates are acknowledged.
- These requirements make many hardware and compiler optimizations invalid.
  - Memory reference order must be strictly enforced.
  - Affected optimizations include instruction scheduling, register allocation, etc.

Relax the requirements
To improve performance, we need to:
- Relax the program order requirement
  - Read/write order for different addresses -> Write2Read, Write2Write, and Read2Read/Write
  - Read/write order for the same address must always be enforced.
- Relax the write atomicity requirement
  - Allow a read to return the value of another processor's write before the write is complete (visible to all processors).
- Relaxation related to both program order and write atomicity
  - Allow a read to return the value of its own previous write before the write is complete.

Relaxed consistency models

Relaxed consistency models
- Relaxation: relaxed models that relax all program orders
- Processor consistency (PC) [Goodman]
- Weak consistency (weak ordering, WC or WO) [Dubois et al]
- Release consistency (RC) [Gharachorloo et al]

Processor Consistency
- Processor consistency (PC): writes done by a single processor are received by all other processors in the order in which they were issued, but writes from different processors may be seen in a different order by different processors.
- The basic idea: better reflect the reality of networks in which the latency between different nodes can differ.

Processor Consistency
- Rules: 2 memory access conditions
  1. On a given processor, before a read is allowed to perform, all previous read accesses must be performed.
  2. On a given processor, before a write is allowed to perform, all previous read or write accesses must be performed.
- Example 1:
P1: W(x)1 W(x)2
P2:             R(x)2 R(x)1      NO
- Example 2:
P1          P2                 P3
A = 1;      while (A == 0);    while (B == 0);
            B = 1;             print A;
SC: prints 1
PC: prints 0 or 1
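The three-processor example can be replayed with a toy model of processor consistency. This is an illustrative sketch only: each processor keeps its own view of memory, writes from one sender reach a receiver through a FIFO (so they stay in issue order), and different senders' FIFOs are drained independently. The names `Node`, `send` and `deliver` are hypothetical.

```python
from collections import deque

class Node:
    """One processor with its own view of memory and one FIFO per sender."""

    def __init__(self):
        self.view = {"A": 0, "B": 0}
        self.inbox = {}  # sender name -> FIFO of pending (variable, value)

def send(receiver, sender, var, value):
    receiver.inbox.setdefault(sender, deque()).append((var, value))

def deliver(receiver, sender):
    """Apply the oldest pending write from sender (FIFO keeps issue order)."""
    var, value = receiver.inbox[sender].popleft()
    receiver.view[var] = value

p2, p3 = Node(), Node()

# P1: A = 1, propagated towards P2 and P3.
send(p2, "P1", "A", 1)
send(p3, "P1", "A", 1)
deliver(p2, "P1")        # P2 sees A == 1 and leaves its while loop

# P2: B = 1, propagated towards P3.
p2.view["B"] = 1
send(p3, "P2", "B", 1)
deliver(p3, "P2")        # P3 sees B == 1 before P1's write to A arrives

printed = p3.view["A"]   # P3: print A
deliver(p3, "P1")        # P1's write finally reaches P3
print(printed)           # 0
```

Delivering P1's write to P3 before P2's write would make P3 print 1, so both outcomes listed for PC are reachable in this model, while SC allows only 1.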

Weak Consistency (WC)
- Idea: accesses to shared variables should be done within critical sections; exploit this fact.
- Memory accesses are distinguished as either data or sync operations.
- Rules: 3 memory access conditions
  1. All previous synchronization accesses must be performed before a read or a write access is allowed to perform with respect to any other processor.
  2. All previous read and write accesses must be performed before a synchronization access is performed with respect to any other processor.
  3. Synchronization accesses are sequentially consistent with respect to one another.
- The WO model ensures that writes always appear atomic to the programmer.

Implementing Weak consistency
- Program: identify/label memory accesses as data or sync operations.
  - Program construct(s)
  - Define a special data type
- Compiler: translate the high-level intention to machine language:
  - Associate the type with a particular address (region)
  - Or map the special type to, for example, a SYNC instruction, if the hardware provides such primitives.
- Hardware support:
  - Each processor uses a counter to track outstanding transactions

When should an operation be a sync operation?
- Race: given a sequentially consistent execution, an operation forms a race with another operation if
  - the two operations access the same location;
  - at least one of the operations is a write;
  - there are no other intervening operations between the two operations.
- Example:
  - The operations on Data are data operations, because the write and read of Data will always be separated by the intervening operations of the write and read of Head.
  - The operations on Head are not always separated by other operations. Therefore, they are sync operations.
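The race definition can be turned into a small check over one sequentially consistent execution. This is a sketch, with two assumptions that are not on the slide: "no intervening operations" is read as "adjacent in the total order", and the two operations are required to come from different processors. The function name `racing_locations` and the trace encoding are made up.

```python
def racing_locations(execution):
    """Return the locations on which some adjacent pair of operations races.

    execution is one sequentially consistent total order, given as a list of
    (processor, kind, location) tuples with kind "R" or "W".
    """
    races = set()
    for first, second in zip(execution, execution[1:]):
        (proc1, kind1, loc1), (proc2, kind2, loc2) = first, second
        same_location = loc1 == loc2
        has_write = "W" in (kind1, kind2)
        if proc1 != proc2 and same_location and has_write:
            races.add(loc1)
    return races

execution = [
    ("P1", "W", "Data"),  # Data = 2000
    ("P1", "W", "Head"),  # Head = 1
    ("P2", "R", "Head"),  # while (Head == 0); exits
    ("P2", "R", "Data"),  # ... = Data
]
print(racing_locations(execution))  # {'Head'}
```

A single trace only shows the races in that execution; labelling a location as data requires that no sequentially consistent execution contains a race on it.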

Programmer-centric view on Weak consistency: Sync or Data?
[Figure: decision flow for labelling operations]
- Given a memory location: never races?
  - Yes -> data operation
  - No -> sync operation
  - Don't know -> sync operation

Hardware support for WC
- Each processor uses a counter to track outstanding transactions. The counter is
  - incremented when the processor issues an operation;
  - decremented when a previously issued operation completes.
- Each processor must ensure that
  - a sync operation is not issued until all previous operations are complete, i.e., count = 0;
  - no operations are issued until the previous sync operation completes.
- Note: memory operations between two sync operations may still be reordered and overlapped with respect to one another.
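The counter rule above can be sketched as a tiny issue gate. This is a software model for illustration, not a description of any real hardware; the class name `WeakOrderingIssueUnit` and its methods are hypothetical.

```python
class WeakOrderingIssueUnit:
    """Counts outstanding operations and gates issue around sync operations."""

    def __init__(self):
        self.count = 0             # operations issued but not yet complete
        self.sync_pending = False  # True while a sync operation is in flight

    def can_issue(self, is_sync):
        if self.sync_pending:
            return False            # nothing issues until the sync completes
        if is_sync:
            return self.count == 0  # sync waits for all previous operations
        return True                 # data operations may overlap freely

    def issue(self, is_sync):
        if not self.can_issue(is_sync):
            raise RuntimeError("operation must stall")
        self.count += 1
        self.sync_pending = is_sync

    def complete(self, is_sync):
        self.count -= 1
        if is_sync:
            self.sync_pending = False

unit = WeakOrderingIssueUnit()
unit.issue(is_sync=False)            # two data operations overlap
unit.issue(is_sync=False)
print(unit.can_issue(is_sync=True))  # False
unit.complete(is_sync=False)
unit.complete(is_sync=False)
print(unit.can_issue(is_sync=True))  # True
```

Data operations issued between two syncs never consult the counter, which is what leaves them free to be reordered and overlapped.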

Release Consistency (RC)
- Idea: extends weak consistency by distinguishing lock (acquire) and unlock (release) operations on synchronization variables.
- Rules: 3 memory access conditions
  1. Before a read or write operation on shared data is performed, all previous acquires done by the process must have completed successfully.
  2. Before a release is allowed to be performed, all previous reads and writes by the process must have completed (flush writes).
  3. Accesses to synchronization variables are FIFO consistent (sequential consistency is not required).
- Example: valid ordering for release consistency

Release Consistency
- If all accesses to shared variables are surrounded by acquire and release operations, the results are the same as with sequential consistency.
- Blocks of operations within a critical section are made atomic via the acquire/release operations.

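Weak vs Release Consistency
[Figure: program-order blocks 1, 2, 3 under each model]
- Weak consistency: (1) read/write -> sync -> (2) read/write -> sync -> (3) read/write
- Release consistency: (1) read/write -> Acquire (read) -> (2) read/write -> Release (write) -> (3) read/write
- Orders kept by release consistency: Acquire -> all; all -> Release
- RCsc: special -> special, where special means acquire/release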

Weak vs Release Consistency
- Weak consistency: S (Sync)
  - S waits for earlier writes
  - R/W after S wait for S
- Release consistency: Acq. (Lock), Rel. (Unlock)
  - R/W wait for Acq.
  - Rel. waits for earlier R/W
- Release consistency further relaxes the synchronization constraints by distinguishing between Acquire (Lock) and Release (Unlock) operations.

Summary of consistency models
- Strict Consistency: a read always returns the value of the most recent write to the same memory location.
- Sequential Consistency: the result of any execution appears as an interleaving of the individual programs, each strictly in its sequential program order.
- Processor Consistency: writes issued by each processor are seen in program order, but writes from different processors can be seen out of order.
- Weak Consistency: the programmer uses synchronization operators to enforce sequential consistency.
- Release Consistency: weak consistency with two synchronization operators, acquire and release. Each operator is guaranteed to be processor-consistent.

Summary of consistency models
Relaxation | W2R | W2W | R2RW | Read others' write early | Read own write early | Safety net
SC         |     |     |      |                          | √                    |
PC         | √   |     |      | √                        | √                    | Read-modify-write
WC         | √   | √   | √    |                          | √                    | sync
RC         | √   | √   | √    |                          | √                    | Acquire, release
A consistency model:
- what it is;
- what conditions and primitives it uses to enforce order;
- what order (relaxation) a processor sees;
- how it differs from the others.

Conclusion
- A memory consistency model is a contract between a shared-memory machine and its programs.
  - 3P: programmability, performance and portability
- Different consistency models exist.
  - They have subtle but important differences.
  - They differ in performance, overhead, hardware cost, etc.
- Programmers prefer an intuitive interface, like SC.
