Memory Consistency Models Alistair Rendell See “Shared Memory Consistency Models: A Tutorial”, S.V. Adve and K. Gharachorloo Chapter 8 pp 262-265 of Wilkinson.


Memory Consistency Models
Alistair Rendell
See “Shared Memory Consistency Models: A Tutorial”, S.V. Adve and K. Gharachorloo, and Chapter 8, pp 262-265, of Wilkinson and Allen, Parallel Programming.

Dekker’s Algorithm for Critical Sections
What is Dekker’s algorithm for critical sections?
–Go and look this up!

Initially flag1 = flag2 = 0

    Process 0                 Process 1
    flag1 = 1                 flag2 = 1
    if (flag2 == 0)           if (flag1 == 0)
        critical section          critical section

What fundamental assumption(s) is the above code making?
“To write correct and efficient shared memory programs, programmers need a precise notion of how memory behaves with respect to read and write operations from multiple processors” (Adve and Gharachorloo)

Sequential Consistency
Lamport’s definition: [A multiprocessor system is sequentially consistent if] the result of any execution is the same as if the operations of all the processors were executed in some sequential order, and the operations of each individual processor appear in this sequence in the order specified by its program.
Two aspects:
–Maintaining program order among operations from individual processors
–Maintaining a single sequential order among operations from all processors
The latter aspect makes it appear as if a memory operation executes atomically or instantaneously with respect to other memory operations.
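To make the definition concrete, the sketch below enumerates every sequentially consistent interleaving of the two-flag fragment (each process writes its own flag, then reads the other's). It is a toy model, not a real multiprocessor: the names Memory, step, explore and sc_outcomes are invented for this illustration, and r0 and r1 stand for the values each process reads. Under these assumptions the outcome in which both processes read 0 never appears.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Hypothetical model: each process runs two operations in program order.
//   P0: flag1 = 1; r0 = flag2        P1: flag2 = 1; r1 = flag1
// An SC execution is any interleaving that keeps each process's own order.
struct Memory { int flag1 = 0, flag2 = 0, r0 = -1, r1 = -1; };

// Execute step `pc` (0 = write, 1 = read) of process `p` on shared memory.
inline void step(Memory& m, int p, int pc) {
    if (p == 0) { if (pc == 0) m.flag1 = 1; else m.r0 = m.flag2; }
    else        { if (pc == 0) m.flag2 = 1; else m.r1 = m.flag1; }
}

// Recursively explore every interleaving, collecting the (r0, r1) outcomes.
inline void explore(Memory m, int pc0, int pc1,
                    std::set<std::pair<int, int>>& out) {
    if (pc0 == 2 && pc1 == 2) { out.insert(std::make_pair(m.r0, m.r1)); return; }
    if (pc0 < 2) { Memory n = m; step(n, 0, pc0); explore(n, pc0 + 1, pc1, out); }
    if (pc1 < 2) { Memory n = m; step(n, 1, pc1); explore(n, pc0, pc1 + 1, out); }
}

// All (r0, r1) pairs reachable under sequential consistency.
inline std::set<std::pair<int, int>> sc_outcomes() {
    std::set<std::pair<int, int>> out;
    explore(Memory{}, 0, 0, out);
    return out;
}
```

Calling sc_outcomes() yields the pairs (0,1), (1,0) and (1,1). The pair (0,0), which would let both processes enter the critical section, is absent because in any single sequential order at least one write precedes the other process's read.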

Programmer’s View of Sequential Consistency
Conceptually:
–There is a single global memory and a switch that connects an arbitrary processor to memory at any time step
–Each processor issues memory operations in program order and the switch provides the global serialization among all memory operations
[Figure: processors P1, P2, ..., Pn-1, Pn, each holding a queue of pending loads and stores (for example P1: ld x3; st x1; st x2; ld x1), connected through a single switch to one Memory.]

Processor Consistency
–Before a LOAD is allowed to perform wrt any processor, all previous LOAD accesses must be performed wrt everyone
–Before a STORE is allowed to perform wrt any processor, all previous LOAD and STORE accesses must be performed wrt everyone
[Figure: LOAD and STORE accesses listed in program order (1st to 4th), executing from the bottom. Annotation: the intervening stores are to different addresses, allowing the load to bypass the two stores.]

Breaking Sequential Consistency: Write Buffers
Write buffer (very common)
–Processor inserts writes in the write buffer and proceeds, assuming each write completes in due course
–Subsequent reads bypass previous writes as long as the read address does not overlap with those in the write buffer
[Figure: P1 and P2, each with a write buffer, on a shared bus to Memory holding flag1 : 0 and flag2 : 0. Order of events: t1 P1 reads flag2; t2 P2 reads flag1; t3 P1's write of flag1 reaches memory; t4 P2's write of flag2 reaches memory.]

    P1                        P2
    flag1 = 1                 flag2 = 1
    if (flag2 == 0)           if (flag1 == 0)
        Critical Section          Critical Section
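The t1 to t4 ordering above can be replayed in a small single-threaded simulation. This is only a sketch under stated assumptions: the Processor type, its write, read and drain members, and run_with_write_buffers are hypothetical names, and the buffer is modelled as a map, so ordering among buffered writes is not represented.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model (an assumption for illustration): one shared memory and a
// per-processor write buffer that is drained explicitly.
struct Processor {
    std::map<std::string, int> buffer;  // pending writes, not yet in memory

    // A write only enters the local buffer; the processor carries on.
    void write(const std::string& addr, int v) { buffer[addr] = v; }

    // A read bypasses buffered writes unless the address matches one of them.
    int read(const std::map<std::string, int>& mem,
             const std::string& addr) const {
        auto hit = buffer.find(addr);
        return hit != buffer.end() ? hit->second : mem.at(addr);
    }

    // Drain: the buffered writes finally reach memory.
    void drain(std::map<std::string, int>& mem) {
        for (const auto& w : buffer) mem[w.first] = w.second;
        buffer.clear();
    }
};

struct Outcome { bool p1_enters, p2_enters; };

inline Outcome run_with_write_buffers() {
    std::map<std::string, int> mem{{"flag1", 0}, {"flag2", 0}};
    Processor p1, p2;
    p1.write("flag1", 1);            // buffered; P1 proceeds
    p2.write("flag2", 1);            // buffered; P2 proceeds
    int r1 = p1.read(mem, "flag2");  // t1: bypasses P1's buffered write
    int r2 = p2.read(mem, "flag1");  // t2: bypasses P2's buffered write
    p1.drain(mem);                   // t3: flag1 = 1 reaches memory
    p2.drain(mem);                   // t4: flag2 = 1 reaches memory
    return {r1 == 0, r2 == 0};
}
```

In this model both reads return 0, so both processors enter the critical section, an outcome that no sequentially consistent execution allows.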

Breaking Sequential Consistency: Overlapped Writes
General (non-bus) interconnect with multiple memory modules
–Different memory operations issued by the same processor are serviced by different memory modules
–Writes from P1 are injected into the memory system in program order, but they may complete out of program order
–Digital Alpha processors coalesced writes to the same cache line in a write buffer, which could lead to similar effects
[Figure: P1 and P2 connected by a general interconnect to separate memory modules holding Head : 0 and Data : 0. Order of events: t1 P1's write of Head completes; t2 P2 reads Head; t3 P2 reads Data; t4 P1's write of Data completes.]

    P1                        P2
    Data = 2000               while (Head == 0) {;}
    Head = 1                  ... = Data
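A minimal sketch of the same effect, assuming a toy model in which each memory module applies a write only after its own delivery latency. The Module type, its issue and read members, the latency values and run_overlapped_writes are all invented for this illustration; they do not describe any real interconnect.

```cpp
#include <cassert>

// Toy model (assumption): each memory module applies a write only after its
// own network latency, so writes issued in order can complete out of order.
struct Module {
    int value = 0;
    int pending = 0;      // value currently in flight
    int arrives_at = -1;  // time at which the in-flight write lands (-1: none)

    void issue(int v, int now, int latency) {
        pending = v;
        arrives_at = now + latency;
    }

    // A read first applies any write that has landed by time `now`.
    int read(int now) {
        if (arrives_at >= 0 && now >= arrives_at) {
            value = pending;
            arrives_at = -1;
        }
        return value;
    }
};

struct Observed { int head, data; };

inline Observed run_overlapped_writes() {
    Module data, head;
    data.issue(2000, /*now=*/0, /*latency=*/4);  // P1: Data = 2000, lands at t4
    head.issue(1,    /*now=*/0, /*latency=*/1);  // P1: Head = 1, lands at t1
    int h = head.read(2);  // P2 at t2: sees Head == 1 and leaves the loop
    int d = data.read(3);  // P2 at t3: the Data write has not landed yet
    return {h, d};
}
```

Here P2 observes Head == 1 but still reads the old value 0 for Data, even though P1 issued the two writes in program order.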

Breaking Sequential Consistency: Non-Blocking Reads
Effect of non-blocking reads
–Early RISC machines stalled for the value of a read
–Many current generation machines can proceed past a read using non-blocking caches, speculative execution, and dynamic scheduling
[Figure: P1 and P2 connected by a general interconnect to Memory holding Head : 0 and Data : 0. Order of events: t1 P2 reads Data; t2 P1 writes Data; t3 P1 writes Head; t4 P2 reads Head.]

    P1                        P2
    Data = 2000               while (Head == 0) {;}
    Head = 1                  ... = Data

Cache Coherency and Sequential Consistency
One definition of cache coherency:
–A write is eventually made visible to all processors
–Writes to the same location appear to be seen in the same order by all processors
Whereas sequential consistency requires:
–Writes to ALL memory locations appear to be seen in the same order by all processors
–Operations of a single processor appear to execute in program order

Multiple Caches and Writes
Sequential consistency requires memory operations to appear atomic or instantaneous
–But propagating changes to multiple cache copies is inherently non-atomic!
One option is to prohibit a read from returning a newly written value until all cached copies have acknowledged receipt of the invalidation or update messages
–Easier to ensure for an invalidate protocol

Relaxed Memory Consistency Models
How they relax:
–The program order requirement
–The write atomicity requirement
Relaxing the write to read program order
–This is the effect of a store (write) buffer
Relaxing write to read and write to write
–Writes to different locations can be pipelined or overlapped
Relaxing everything!
–Weak consistency
–Release/Acquire consistency models
Write atomicity
–Allowing a read to return another processor’s write before all cached copies have been updated

Weak Consistency
Relies on the programmer having used critical sections to control access to shared variables
–Within the critical section no other process can rely on that data structure being consistent until the critical section is exited
We need to distinguish the critical points at which the programmer enters or leaves a critical section
–Distinguish standard loads/stores from synchronization accesses

Weak Consistency
–Before an ordinary LOAD/STORE is allowed to perform wrt any processor, all previous SYNCH accesses must be performed wrt everyone
–Before a SYNCH access is allowed to perform wrt any processor, all previous ordinary LOAD/STORE accesses must be performed wrt everyone
–SYNCH accesses are sequentially consistent wrt one another
[Figure: in execution order, blocks of ordinary LOAD/STORE accesses separated by SYNCH accesses; each LOAD/STORE block is labelled "Not Consistent".]

Release Consistency
–Before any ordinary LOAD/STORE is allowed to perform wrt any processor, all previous ACQUIRE accesses must be performed wrt everyone
–Before any RELEASE access is allowed to perform wrt any processor, all previous ordinary LOAD/STORE accesses must be performed wrt everyone
–Acquire/Release accesses are processor consistent wrt one another
[Figure: three blocks of ordinary LOAD/STORE accesses arranged in execution order around an ACQUIRE and a RELEASE, with the label "OK".]
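The acquire and release rules have a close analogue in C++11 atomics, which makes a convenient way to try the idea out. The sketch below reuses the Data/Head example, with data as an ordinary variable and head as the synchronization variable; the function name message_passing is hypothetical, and the mapping from the hardware model to std::memory_order_release and std::memory_order_acquire is an approximation rather than an exact equivalence.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Producer/consumer hand-off using one release store and one acquire load.
inline int message_passing() {
    int data = 0;              // ordinary (non-atomic) shared variable
    std::atomic<int> head{0};  // synchronization variable
    int seen = -1;

    std::thread producer([&] {
        data = 2000;                               // ordinary store
        head.store(1, std::memory_order_release);  // RELEASE: earlier stores
                                                   // become visible first
    });
    std::thread consumer([&] {
        // ACQUIRE: later loads cannot move ahead of this load.
        while (head.load(std::memory_order_acquire) == 0) { }
        seen = data;                               // ordinary load
    });

    producer.join();
    consumer.join();
    return seen;
}
```

Because the acquire load that observes head == 1 synchronizes with the release store, the consumer is guaranteed to read 2000. With relaxed ordering on both accesses the program would contain a data race on data.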

Enforcing Consistency
The hardware provides underlying instructions that are used to enforce consistency
–Fence or memory barrier instructions
Different processors provide different types of fence instructions
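As a portable stand-in for a hardware fence, the sketch below places std::atomic_thread_fence between the write and the read of the two-flag fragment. It assumes a C++11 compiler, which is expected to emit whatever fence instruction the target processor needs; the function name dekker_with_fences and the entered counter are invented for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Returns how many of the two threads entered the critical section.
inline int dekker_with_fences() {
    std::atomic<int> flag1{0}, flag2{0};
    std::atomic<int> entered{0};

    std::thread p0([&] {
        flag1.store(1, std::memory_order_relaxed);
        // Full fence: the store above may not be reordered after the load below.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        if (flag2.load(std::memory_order_relaxed) == 0) entered.fetch_add(1);
    });
    std::thread p1([&] {
        flag2.store(1, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_seq_cst);
        if (flag1.load(std::memory_order_relaxed) == 0) entered.fetch_add(1);
    });

    p0.join();
    p1.join();
    return entered.load();
}
```

With both fences in place, at least one thread must observe the other's flag, so the result is 0 or 1 but never 2. Removing either fence reintroduces the write-buffer behaviour in which both threads may enter.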

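Some Examples
(X marks a relaxation that the model allows; - marks none.)

Model | W→R Order | W→W Order | R→RW Order | Read Others’ Write Early | Read Own Write Early | Safety Net
Sequential | - | - | - | - | X | -
IBM 370 | X | - | - | - | - | Serialisation instructions
Total Store Order | X | - | - | - | X | Read-modify-write (RMW)
Processor (PC) | X | - | - | X | X | RMW
Partial Store Order | X | X | - | - | X | RMW, STBAR
Weak | X | X | X | - | X | Synchronization
Release (SC) | X | X | X | - | X | Release, acquire, nsync, RMW
Release (PC) | X | X | X | X | X | Release, acquire, nsync, RMW
Alpha | X | X | X | - | X | MB, WMB
SPARC v9 | X | X | X | - | X | Various MEMBARs
PowerPC | X | X | X | X | X | SYNC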

Final Thoughts
What consistency model best describes pthreads?
–You need to consider how this applies to other programming models you encounter, such as OpenMP, Threading Building Blocks, Cilk, etc.