Shared Memory – Consistency of Shared Variables

Presentation transcript:

Shared Memory – Consistency of Shared Variables

The ideal picture of shared memory: CPU0, CPU1, CPU2 and CPU3 all read and write one shared memory directly.

The actual architecture of shared memory systems:

Symmetric Multi-Processor (SMP): CPU0 to CPU3 each have a local cache in front of the shared memory. Reads and writes reach the shared memory on cache misses, and cache copies are invalidated.

Distributed Shared Memory (DSM): CPU0 to CPU3 each have a local memory module, and the modules are connected by a network.

The Million $$s Question: How/When Does One Process Read Another Process's Writes?

CPUi writes value x to its local copy of shared variable V: W V,x
CPUj reads V from its local copy: R V,0? or R V,x?

Assumption: the initial value of shared variables is always 0.

Why is this a question? Because temporal order relations like "before/after" do not necessarily hold in a distributed system.

Non-Atomic Writes/Reads (also called stores/loads)

A read by Pi is considered performed with respect to Pk at the point in time when the issuing of a write to the same address by Pk cannot affect the value returned by the read.

A write by Pi is considered performed with respect to Pk at the point in time when an issued read to the same address by Pk returns the value defined by this write (or by a subsequent write to the same location).

An access is performed when it is performed with respect to all processors.

A read is globally performed if it is performed and if the write that is the source of the returned value has been performed.

In what follows we assume atomic reads and writes, but these definitions can be used to generalize.

Why Memory Model?

Initially: a=0, b=0
Thread 1: Print(b); a=1
Thread 2: Print(a); b=1

Printed: 0,0? Printed: 1,0? Printed: 1,1?

A memory model answers the question: "Which writes by a process are seen by which reads of the other processes?"

Memory Consistency Models

A consistency/memory model is an "agreement" between the execution environment (H/W, OS, middleware) and the processes. The runtime guarantees to the application certain properties of the way values written to shared variables become visible to reads. This determines the memory model: what is valid and what is not.

Example program:
Pi: R V; W V,7; R V; R V
Pj: R V; W V,13; R V; R V

Example execution:
Pi: R V,0; W V,7; R V,7; R V,13
Pj: R V,0; W V,13; R V,13; R V,7

Order of writes to V as seen by Pi: (1) W V,7; (2) W V,13
Order of writes to V as seen by Pj: (1) W V,13; (2) W V,7

Memory Model: Coherence

Coherence is the memory model in which (the runtime guarantees to the program that) the writes performed by the processes to each specific variable are viewed by all processes in the same full order.

Example program:
Pi: W V,7; R V
Pj: W V,13; R V

Note: the view of a process consists of the values it "sees" in its reads, and the writes it performs. Thus, if a read R V in process P that comes after a write W V,x in P sees a value different from x, then a later R V cannot see x.

All valid executions under Coherence:
(1) Pi: W V,7; R V,7          Pj: W V,13; R V,13; R V,7
(2) Pi: W V,7; R V,7          Pj: W V,13; R V,7
(3) Pi: W V,7; R V,7; R V,13  Pj: W V,13; R V,13
(4) Pi: W V,7; R V,13         Pj: W V,13; R V,13
(5) Pi: W V,7; R V,7          Pj: W V,13; R V,13

Formal definition of Coherence

Program Order: The order in which instructions appear in each process. This is a partial order on all the instructions in the program.

A serialization: A full order on all the instructions (reads/writes) of all the processes, which is consistent with the program order.

A legal serialization: A serialization in which each read X returns the value written by the latest write X in the full order.

Let P be a program; let P_X be the "sub-program" of P which contains only the read X/write X operations on X.

Coherence: P is said to be coherent if for every variable X there exists a legal serialization of P_X. (Note: a process cannot distinguish one such serialization from another for a given execution.)
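The two definitions above can be checked mechanically for a given candidate order. The following is a minimal sketch, not part of the original slides: the `Op` tuple, the function names and the tiny one-variable program are illustrative assumptions, and shared variables are assumed to start at 0 as stated earlier.

```python
from collections import namedtuple

# Hypothetical encoding: kind is "r" (read) or "w" (write).
Op = namedtuple("Op", "proc kind var val")

def is_serialization(order, program):
    """True if `order` holds exactly the operations of `program` and keeps
    every process's operations in program order."""
    for proc, ops in program.items():
        if [op for op in order if op.proc == proc] != ops:
            return False
    return len(order) == sum(len(ops) for ops in program.values())

def is_legal(order, initial=0):
    """True if every read returns the value of the latest preceding write
    to the same variable (or the initial value if there is none)."""
    memory = {}
    for op in order:
        if op.kind == "w":
            memory[op.var] = op.val
        elif memory.get(op.var, initial) != op.val:
            return False
    return True

# Sub-program P_x: process 1 reads x and sees 1, process 2 writes 1 to x.
program_x = {1: [Op(1, "r", "x", 1)], 2: [Op(2, "w", "x", 1)]}
write_first = [Op(2, "w", "x", 1), Op(1, "r", "x", 1)]
read_first = [Op(1, "r", "x", 1), Op(2, "w", "x", 1)]

print(is_serialization(write_first, program_x), is_legal(write_first))
print(is_serialization(read_first, program_x), is_legal(read_first))
```

Both orders are serializations of P_x, but only the one that places the write before the read is legal, so P_x has a legal serialization.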

Examples

Example 1:
Process 1: write x,1; write x,2
Process 2: read x,2; read x,1
Not Coherent. Cannot be serialized.

Example 2:
Process 1: read x,1; write y,1
Process 2: read y,1; write x,1
Coherent. Serializations:
x: write x,1, read x,1
y: write y,1, read y,1

Example 3:
Process 1: read x,1; write x,2
Process 2: read x,2; write x,1
Not Coherent. Cycle of dependencies. Cannot be serialized.

Sequential Consistency [Lamport 1979]

Sequential Consistency is the memory model in which all reads/writes performed by the processes are viewed by all processes in the same full order.

Example 1:
Process 1: write x,1; write y,1
Process 2: read y,1; read x,0
Coherent. Not Sequentially consistent.

Example 2:
Process 1: read x,1; write y,1
Process 2: read y,1; write x,1
Coherent. Not Sequentially consistent.

Strict (Strong) Memory Models

Initially: a=0, b=0
Thread 1: Print(b); a=1
Thread 2: Print(a); b=1

Sequential Consistency: Given an execution, there exists an order of reads/writes which is consistent with all program orders. Printed: 0,0 or 0,1 or 1,0.

Coherence: For any variable x, there exists an order of read x/write x operations consistent with all program orders (p.o.s). Printed: 1,1 is possible as well.

Formal definition of Sequential Consistency

Let P be a program.

Sequential Consistency: P is said to be sequentially consistent if there exists a legal serialization of all reads/writes in P.

Observation: Every program which is sequentially consistent is also coherent.

Conclusion: Sequential Consistency has stronger requirements, and we thus say that it is stronger than Coherence.

In general: A consistency model A is said to be stronger than B if all executions which are valid under A are also valid under B. It is strictly stronger if, in addition, some execution valid under B is not valid under A.
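For small executions, the existence of a legal serialization can be decided by brute force. The sketch below is an illustration under stated assumptions, not the authors' method: operations are encoded as hypothetical `(kind, variable, value)` tuples, variables start at 0, and the two test executions are the "coherent, not sequentially consistent" examples given earlier.

```python
def has_legal_serialization(processes, initial=0):
    """Search all interleavings that respect program order for one in which
    every read returns the latest written value."""
    def search(positions, memory):
        if all(pos == len(ops) for pos, ops in zip(positions, processes)):
            return True
        for i, ops in enumerate(processes):
            pos = positions[i]
            if pos == len(ops):
                continue
            kind, var, val = ops[pos]
            if kind == "r" and memory.get(var, initial) != val:
                continue  # this read cannot be placed here
            next_memory = {**memory, var: val} if kind == "w" else memory
            next_positions = positions[:i] + (pos + 1,) + positions[i + 1:]
            if search(next_positions, next_memory):
                return True
        return False
    return search((0,) * len(processes), {})

def is_sequentially_consistent(processes):
    # One legal serialization of all operations of the program.
    return has_legal_serialization(processes)

def is_coherent(processes):
    # One legal serialization per variable, over the sub-program on that variable.
    variables = {var for ops in processes for _, var, _ in ops}
    return all(
        has_legal_serialization(
            [[op for op in ops if op[1] == v] for ops in processes])
        for v in variables)

first = [[("w", "x", 1), ("w", "y", 1)], [("r", "y", 1), ("r", "x", 0)]]
second = [[("r", "x", 1), ("w", "y", 1)], [("r", "y", 1), ("w", "x", 1)]]

for execution in (first, second):
    print(is_coherent(execution), is_sequentially_consistent(execution))
```

Both executions come out coherent but not sequentially consistent. The search is exponential in the number of operations, so it is only meant for slide-sized examples.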

The problem of strong consistency models

The runtime system must ensure the existence of a legal serialization, and the same consistent view for all processes. This requires a lot of expensive coordination, which degrades performance!

P1: Print(U); Write V,1
P2: Print(V); Write U,1

SC: The hardware cannot reorder operations locally in each thread, because this would make printing 1,1 possible. The hardware may reorder anyway and postpone the writes, but then why reorder in the first place?
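The claim about 1,1 can be checked by enumerating every interleaving of the two threads. This is a small sketch with assumed names (`interleavings`, `run`); it models only atomic operations on a single shared memory, with U and V initially 0.

```python
def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that keeps each thread's own order."""
    if not t1 or not t2:
        yield list(t1) + list(t2)
        return
    for rest in interleavings(t1[1:], t2):
        yield [t1[0]] + rest
    for rest in interleavings(t1, t2[1:]):
        yield [t2[0]] + rest

def run(schedule):
    """Execute one interleaving; return (value printed for U, value printed for V)."""
    memory = {"U": 0, "V": 0}
    printed = {}
    for kind, var in schedule:
        if kind == "print":
            printed[var] = memory[var]
        else:  # "write": store 1 into the variable
            memory[var] = 1
    return printed["U"], printed["V"]

# Program order as written on the slide.
p1 = [("print", "U"), ("write", "V")]
p2 = [("print", "V"), ("write", "U")]
in_order = {run(s) for s in interleavings(p1, p2)}

# Each thread reordered locally: the write is moved ahead of the print.
reordered = {run(s) for s in interleavings(p1[::-1], p2[::-1])}

print(sorted(in_order))
print(sorted(reordered))
```

With program order preserved, 1,1 never appears among the outcomes; once each thread's two operations are swapped, 1,1 becomes reachable.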

Coherence Forbids Reordering

q.x is aliased to p.x.

Left thread: p.x = 0; p.x = 1
Right thread: a = p.x; b = q.x; assert(a ≤ b)

Once a thread sees an update, it cannot "forget" that it has seen it. Therefore two reads of the same memory location cannot be reordered.

Reordering may perform the assignment to b early (seeing 0) and the assignment to a late (seeing 1). The right thread would then see an order of writes different from that of the left thread.
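The effect of swapping the two reads can be seen by enumerating interleavings. This is a sketch under assumptions: p.x and q.x are modelled as one shared cell `x`, the writer stores 0 and then 1, and the helper names are hypothetical.

```python
def interleavings(t1, t2):
    """Yield every merge of t1 and t2 that keeps each thread's own order."""
    if not t1 or not t2:
        yield list(t1) + list(t2)
        return
    for rest in interleavings(t1[1:], t2):
        yield [t1[0]] + rest
    for rest in interleavings(t1, t2[1:]):
        yield [t2[0]] + rest

def outcomes(writer, reader):
    """Return every (a, b) pair the reader can end up with."""
    results = set()
    for schedule in interleavings(writer, reader):
        x = 0        # the single location that both p.x and q.x name
        local = {}
        for kind, arg in schedule:
            if kind == "write":
                x = arg
            else:    # "read" into the local variable named by arg
                local[arg] = x
        results.add((local["a"], local["b"]))
    return results

writer = [("write", 0), ("write", 1)]
in_order = outcomes(writer, [("read", "a"), ("read", "b")])
swapped = outcomes(writer, [("read", "b"), ("read", "a")])

print(sorted(in_order))
print(sorted(swapped))
```

When the reads execute in program order, every outcome satisfies a ≤ b. With the reads swapped, the pair a=1, b=0 becomes possible and the assertion can fail.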

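Coherence makes reads prevent common compiler optimizations

p and q might point to the same object.

Left thread: p.x = 0; p.x = 1
Right thread: a = p.x; b = q.x; c = p.x; assert(p == q ⇒ a ≤ b ≤ c)

The compiler cannot put c = a in place of c = p.x: reads can make a process see writes by another process. If b = q.x sees 1 after a saw 0, then c = a would give c < b. The read "kills" later reuse of local values.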