Computer Architecture 2011 – coherency & consistency (lec 7) 1 Computer Architecture Memory Coherency & Consistency By Dan Tsafrir, 11/4/2011 Presentation based on slides by David Patterson, Avi Mendelson, Lihu Rappoport, and Adi Yoaz

Computer Architecture 2011 – coherency & consistency (lec 7) 2 Coherency – intro
- When there is only one core, caching does not affect correctness.
- But what happens when 2 or more cores work simultaneously on the same memory location?
  - If both are only reading, there is no problem.
  - Otherwise, one of them might use a stale, out-of-date copy of the data.
  - Such inconsistencies might lead to incorrect execution.
- Terminology: memory coherency, cache coherency.
[Diagram: Processor 1 and Processor 2, each with a private L1 cache, above a shared L2 cache and memory]

Computer Architecture 2011 – coherency & consistency (lec 7) 3 The cache coherency problem for a single memory location

  Time | Event                 | Cache contents | Cache contents | Memory contents
       |                       | for CPU-1      | for CPU-2      | for location X
  -----+-----------------------+----------------+----------------+----------------
   0   |                       |                |                | 1
   1   | CPU-1 reads X         | 1              |                | 1
   2   | CPU-2 reads X         | 1              | 1              | 1
   3   | CPU-1 stores 0 into X | 0              | 1              | 0

(Assuming a write-through cache, so memory is updated at time 3.) After time 3, CPU-2's cache holds a stale value, different from the corresponding memory location and from CPU-1's cache. (The next read by CPU-2 will yield "1".)

Computer Architecture 2011 – coherency & consistency (lec 7) 4 A memory system is coherent if…
- Informally, we could say (or would like to say) that...
- A memory system is coherent if any read of a data item returns the most recently written value of that data item.
- (This definition is intuitive, but overly simplistic.)
- More formally…

Computer Architecture 2011 – coherency & consistency (lec 7) 5 A memory system is coherent if…
1. Processor P writes to location X, and later P reads from X, and no other processor writes to X between that write and that read
   => The read must return the value previously written by P.
2. P1 writes to X; some time T elapses; P2 reads from X
   => For a big enough T, P2 will read the value written by P1.
3. Two writes to the same location by any two processors are serialized
   => They are seen in the same order by all processors (if "1" and then "2" are written, no processor would read "2" and then "1").

Computer Architecture 2011 – coherency & consistency (lec 7) 6 A memory system is coherent if…
1. Processor P writes to location X, and later P reads from X, and no other processor writes to X between that write and that read
   => The read must return the value previously written by P.
   (Simply preserves program order; needed even on a uniprocessor.)
2. P1 writes to X; some time T elapses; P2 reads from X
   => For a big enough T, P2 will read the value written by P1.
   (Defines the notion of what it means to have a coherent view of memory; if X is never updated, regardless of the duration of T, then the memory is not coherent.)
3. Two writes to the same location X by any two processors are serialized
   => They are seen in the same order by all processors (if "1" and then "2" are written, no processor would read "2" and then "1").
   (If P1 writes to X and then P2 writes to X, serialization of writes ensures that every processor will eventually see P2's write; otherwise P1's value might be retained indefinitely.)
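Condition 3 (write serialization) can be phrased as a small litmus test: all observers must agree on the order of writes to the same location. The C sketch below is an illustrative assumption (the pthread harness and the names are not from the lecture); under a coherent memory system, if one reader observes 1 and then 2, no reader may observe 2 and then 1.

```c
/* Litmus-style sketch of write serialization (condition 3).
 * Hypothetical harness: two writers store 1 and 2 to the same
 * location X; two readers each sample X twice. */
#include <pthread.h>
#include <stdio.h>

volatile int X = 0;                       /* the single shared location */

void *writer1(void *arg) { X = 1; return NULL; }
void *writer2(void *arg) { X = 2; return NULL; }

void *reader(void *arg)
{
    int first  = X;                       /* two consecutive samples */
    int second = X;
    printf("reader saw %d then %d\n", first, second);
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    pthread_create(&t[0], NULL, writer1, NULL);
    pthread_create(&t[1], NULL, writer2, NULL);
    pthread_create(&t[2], NULL, reader,  NULL);
    pthread_create(&t[3], NULL, reader,  NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return 0;
}
```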

Computer Architecture 2011 – coherency & consistency (lec 7) 7 Memory Consistency
- The coherency definition alone is not enough to let us write correct programs.
  - It must be supplemented by a consistency model.
  - Critical for program correctness.
- Coherency & consistency are two different, complementary aspects of memory systems:
  - Coherency: what values can be returned by a read; relates to the behavior of reads & writes to the same memory location.
  - Consistency: when a written value will be returned by a subsequent read; relates to the behavior of reads & writes to different memory locations.

Computer Architecture 2011 – coherency & consistency (lec 7) 8 Memory Consistency (cont.)
- "How consistent is the memory system?" is a nontrivial question.
- Assume locations A & B are originally cached by P1 & P2, with initial value = 0:

    Processor P1          Processor P2
    A = 0;                B = 0;
    …                     …
    A = 1;                B = 1;
    if ( B == 0 ) …       if ( A == 0 ) …

- If writes are immediately seen by other processors:
  - It is impossible for both "if" conditions to be true.
  - Reaching the "if" means either A or B must already hold 1.
- But suppose (1) the "write invalidate" can be delayed, and (2) the processor is allowed to keep computing during this delay:
  - Then it is possible that P1 & P2 have not seen the invalidations of B & A until after the reads, so both "if" conditions are true.
- Should this be allowed? That is determined by the consistency model.
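The slide's scenario can be reproduced today with relaxed atomics. The following sketch is an assumption for illustration (the pthread harness and repetition loop are not from the lecture): with memory_order_relaxed, an execution in which both "if" conditions are true is permitted, and on many machines it actually occurs.

```c
/* Store-buffering litmus test corresponding to the slide's A/B example.
 * With relaxed ordering, both threads may observe the other's flag as 0. */
#include <stdatomic.h>
#include <pthread.h>
#include <stdio.h>

atomic_int A, B;                 /* both initially 0 */
int p1_saw_zero, p2_saw_zero;    /* set only by their own thread */

void *p1(void *arg)
{
    atomic_store_explicit(&A, 1, memory_order_relaxed);      /* A = 1          */
    if (atomic_load_explicit(&B, memory_order_relaxed) == 0) /* if (B == 0) …  */
        p1_saw_zero = 1;
    return NULL;
}

void *p2(void *arg)
{
    atomic_store_explicit(&B, 1, memory_order_relaxed);      /* B = 1          */
    if (atomic_load_explicit(&A, memory_order_relaxed) == 0) /* if (A == 0) …  */
        p2_saw_zero = 1;
    return NULL;
}

int main(void)
{
    for (int i = 0; i < 100000; i++) {
        atomic_store(&A, 0);
        atomic_store(&B, 0);
        p1_saw_zero = p2_saw_zero = 0;

        pthread_t t1, t2;
        pthread_create(&t1, NULL, p1, NULL);
        pthread_create(&t2, NULL, p2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);

        if (p1_saw_zero && p2_saw_zero)
            printf("iteration %d: both conditions were true\n", i);
    }
    return 0;
}
```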

Computer Architecture 2011 – coherency & consistency (lec 7) 9 Consistency models
- From most strict to most relaxed:
  - Strict consistency
  - Sequential consistency
  - Weak consistency
  - Release consistency
  - […many…]
- Stricter models are easier to understand, but they are harder to implement, slower, involve more communication, and waste more energy.

Computer Architecture 2011 – coherency & consistency (lec 7) 10 Strict consistency ("linearizability")
- All memory operations are ordered in time.
- Any read to location X returns the value of the most recent write to X.
- This is the intuitive notion of memory consistency.
- But it is too restrictive and thus unused.

Computer Architecture 2011 – coherency & consistency (lec 7) 11 Sequential consistency
- A relaxation of strict consistency (defined by Lamport).
- Requires that the result of any execution be the same as if the memory accesses of all processors were executed in some sequential order, with the accesses of each individual processor appearing in that sequence in program order.
  - Can be a different order upon each run.
- Example (shown on the slide as a left timeline and, on the right, an equivalent sequential ordering of the same operations):

    P1: W(x)1
    P2: R(x)1  R(x)2
    P3: R(x)1  R(x)2
    P4: W(x)2

  This execution is sequentially consistent: it can be ordered as W(x)1, R(x)1, R(x)1, W(x)2, R(x)2, R(x)2.
- Q. What if we flip the order of P2's reads?
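As a point of reference, C11 atomics give sequentially consistent behavior by default (memory_order_seq_cst). The minimal sketch below (an assumption for illustration, reusing the A/B example from the earlier slide) shows that under this model the outcome in which both threads read 0 is impossible.

```c
/* The slide-8 example with sequentially consistent atomics (C11 default).
 * Some interleaving of the four operations must explain the result, so
 * at least one thread sees the other's store. */
#include <stdatomic.h>
#include <pthread.h>
#include <assert.h>

atomic_int A, B;   /* both initially 0 */
int r1, r2;        /* what each "processor" observed */

void *p1(void *arg) { atomic_store(&A, 1); r1 = atomic_load(&B); return NULL; }
void *p2(void *arg) { atomic_store(&B, 1); r2 = atomic_load(&A); return NULL; }

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, p1, NULL);
    pthread_create(&t2, NULL, p2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    assert(r1 == 1 || r2 == 1);   /* r1 == 0 && r2 == 0 is forbidden */
    return 0;
}
```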

Computer Architecture 2011 – coherency & consistency (lec 7) 12 Weak consistency
1. Accesses to "synchronization variables" are sequentially consistent.
2. No access to a synchronization variable is allowed to be performed until all previous writes have completed everywhere.
3. No data access (read or write) is allowed to be performed until all previous accesses to synchronization variables have been performed.
- In other words, the processor doesn't need to broadcast values at all until a synchronization access happens, but then it broadcasts all values to all cores.
- Example (S denotes a synchronization access):

    P1: W(x)1   W(x)2   S
    P2: R(x)0           S   R(x)2
    P3: R(x)1           S   R(x)2

  Before synchronizing, P2 and P3 may still observe old values of x; after the synchronization access, every processor must see the latest value (2).

Computer Architecture 2011 – coherency & consistency (lec 7) 13 Release consistency
- Before accessing a shared variable, an acquire operation must be completed.
- Before a release is allowed, all accesses must be completed.
- Acquire/release calls are sequentially consistent.
- The acquire/release pair serves as a "lock".
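A minimal sketch of acquire/release semantics using C11 atomics; the flag-based handoff (payload, ready, producer/consumer) is an illustrative assumption rather than an example from the lecture. The release store plays the "release" role and the acquire load the "acquire" role.

```c
/* Acquire/release handoff: the consumer may touch the shared data only
 * after the acquire succeeds; the producer's release guarantees all its
 * earlier writes are visible by then. */
#include <stdatomic.h>
#include <pthread.h>
#include <assert.h>

int payload;          /* ordinary (non-atomic) shared data     */
atomic_int ready;     /* synchronization variable, initially 0 */

void *producer(void *arg)
{
    payload = 42;                                            /* plain write */
    /* Release: all accesses before it must be completed first. */
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

void *consumer(void *arg)
{
    /* Acquire: accesses after it cannot be performed before it. */
    while (atomic_load_explicit(&ready, memory_order_acquire) == 0)
        ;                       /* spin until the producer releases */
    assert(payload == 42);      /* guaranteed visible after the acquire */
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```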

Computer Architecture 2011 – coherency & consistency (lec 7) 14 MESI Protocol
- Each cache line can be in one of 4 states:
  - Invalid – the line's data is not valid (as in a simple cache)
  - Shared – the line is valid & not dirty; copies may exist in other caches
  - Exclusive – the line is valid & not dirty; other processors do not have the line in their local caches
  - Modified – the line is valid & dirty; other processors do not have the line in their local caches
- (MESI = Modified, Exclusive, Shared, Invalid)
- Achieves sequential consistency
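To make the state names concrete, here is a simplified, hedged sketch of per-line MESI bookkeeping in C. The function names, the flags, and the transition rules are simplifications assumed for illustration (a real controller distinguishes more bus transactions); it only captures the common transitions.

```c
/* Simplified MESI transitions for one cache line. */
#include <stdio.h>

typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_t;

static const char *name(mesi_t s)
{
    static const char *n[] = { "Invalid", "Shared", "Exclusive", "Modified" };
    return n[s];
}

/* Local read: a miss fills the line Shared if another cache answered
 * the snoop, Exclusive otherwise; hits leave the state unchanged. */
static mesi_t on_local_read(mesi_t s, int others_have_copy)
{
    if (s == INVALID)
        return others_have_copy ? SHARED : EXCLUSIVE;
    return s;
}

/* Local write: from I or S this first issues a request-for-ownership /
 * upgrade that invalidates other copies; the line ends up Modified. */
static mesi_t on_local_write(mesi_t s)
{
    (void)s;
    return MODIFIED;
}

/* Request snooped from another processor. */
static mesi_t on_snoop(mesi_t s, int other_is_write)
{
    if (s == INVALID)
        return INVALID;
    if (other_is_write)
        return INVALID;   /* remote write/RFO invalidates our copy           */
    return SHARED;        /* remote read: E/M downgrade to S (M writes back) */
}

int main(void)
{
    /* A simple two-processor sequence on the same line. */
    mesi_t p1 = INVALID, p2 = INVALID;
    p1 = on_local_read(p1, 0);                      /* P1 reads:  P1 -> E        */
    printf("P1 reads:  P1=%s\n", name(p1));
    p1 = on_local_write(p1);                        /* P1 writes: E -> M         */
    printf("P1 writes: P1=%s\n", name(p1));
    p1 = on_snoop(p1, 0); p2 = on_local_read(p2, 1);/* P2 reads:  P1 M->S, P2 S  */
    printf("P2 reads:  P1=%s P2=%s\n", name(p1), name(p2));
    p1 = on_snoop(p1, 1); p2 = on_local_write(p2);  /* P2 writes: P1 -> I, P2 M  */
    printf("P2 writes: P1=%s P2=%s\n", name(p1), name(p2));
    return 0;
}
```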

Computer Architecture 2011 – coherency & consistency (lec 7) 15 Two classes of protocols to track sharing
- Directory based
  - The status of each memory block is kept in just one location (the directory)
  - Directory-based coherence has bigger overhead
  - But it can scale to bigger core counts
- Snooping
  - Every cache holding a copy of the data also has a copy of its sharing state
  - No centralized state
  - All caches are accessible via broadcast (bus or switch)
  - All cache controllers monitor (or "snoop") the broadcasts to determine whether they have a copy of what's requested
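To illustrate the directory-based approach, here is a minimal sketch of one directory entry and what the directory might do when a core asks to write a shared block. The field names, the three directory states, and the 64-core sharer bitmask are assumptions chosen for illustration, not the lecture's (or any specific machine's) actual format.

```c
/* Sketch of a directory entry: per-block state kept in one place,
 * plus a record of which cores may hold a copy. */
#include <stdint.h>
#include <stdio.h>

typedef enum {
    DIR_UNCACHED,    /* no cache holds the block                     */
    DIR_SHARED,      /* one or more caches hold a clean copy         */
    DIR_MODIFIED     /* exactly one cache holds a dirty copy (owner) */
} dir_state_t;

typedef struct {
    dir_state_t state;
    uint64_t    sharers;     /* bit i set => core i may hold a copy  */
} dir_entry_t;

/* When core `c` wants to write the block: invalidate every other
 * sharer, then record `c` as the sole (modified) owner. */
static void dir_handle_write(dir_entry_t *e, int c)
{
    for (int i = 0; i < 64; i++)
        if (((e->sharers >> i) & 1) && i != c)
            printf("send invalidate to core %d\n", i);  /* stand-in for a message */
    e->state   = DIR_MODIFIED;
    e->sharers = 1ull << c;
}

int main(void)
{
    dir_entry_t e = { DIR_SHARED, (1ull << 0) | (1ull << 3) };  /* cores 0 and 3 share */
    dir_handle_write(&e, 0);     /* core 0 writes: core 3 gets an invalidate */
    printf("state=%d sharers=%llx\n", e.state, (unsigned long long)e.sharers);
    return 0;
}
```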

Computer Architecture 2011 – coherency & consistency (lec 7) 16 Multi-processor System: Example
- P1 reads 1000: the read misses in P1's L1 and in the shared L2; the line [1000]: 5 is brought in from memory and installed in P1's cache in the Exclusive (E) state.
- P1 writes 1000: P1 updates its copy to [1000]: 6; the line's state changes from Exclusive to Modified (M).
[Diagram: Processor 1 and Processor 2 with private L1 caches, a shared L2 cache, and memory holding [1000]: 5]

Computer Architecture 2011 – coherency & consistency (lec 7) 17 Multi-processor System: Example (cont.)
- P1 reads 1000, P1 writes 1000 (as on the previous slide; P1 holds [1000]: 6 in the Modified state).
- P2 reads 1000: the read misses in P2's cache.
- L2 snoops the request for 1000.
- P1 writes back the modified line [1000]: 6.
- P2 gets the line [1000]: 6; both P1's and P2's copies are now in the Shared (S) state.
[Diagram: P1's line transitions from Modified (M) to Shared (S); P2's copy is installed Shared]

Computer Architecture 2011 – coherency & consistency (lec 7) 18 Multi-processor System: Example (cont.)
- (Continuing from the previous slide: both caches hold [1000]: 6 in the Shared state.)
- P2 requests ownership of line 1000 with write intent (a request-for-ownership).
- P1's copy of the line is invalidated (S -> I); P2's copy becomes Exclusive (E), so P2 can now modify the line locally.
[Diagram: P1's L1 entry for [1000] marked Invalid, P2's entry marked Exclusive]
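The ownership transfer in this last step has a visible performance cost when it happens repeatedly. The sketch below is an illustrative assumption (names and the pthread harness are not from the lecture): two threads keep writing the same location, so every store needs a request-for-ownership that invalidates the other core's copy, and the line "ping-pongs" between the two caches.

```c
/* Two threads writing the same cache line: each store triggers an
 * ownership transfer like the one shown on the slide. */
#include <pthread.h>
#include <stdio.h>

#define ITERS 10000000L
volatile long counter;        /* both threads write this same line */

void *worker(void *arg)
{
    for (long i = 0; i < ITERS; i++)
        counter = i;          /* each store needs ownership (RFO) of the
                                 line, invalidating the other core's copy */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("done; counter = %ld\n", counter);
    /* Timing this with one thread and then with two is a simple way to
     * observe the cost of the repeated ownership transfers. */
    return 0;
}
```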

Computer Architecture 2011 – coherency & consistency (lec 7) 19 The alternative: incoherent memory
- As core counts grow, many argue that maintaining coherence:
  - Will slow down the machines
  - Will waste a lot of energy
  - Will not scale
- Intel SCC
  - Single-chip Cloud Computer – for research purposes
  - 48 cores
  - Shared, incoherent memory
  - Software is responsible for correctness
- The Barrelfish operating system
  - By Microsoft & ETH (Zurich)
  - Assumes no coherency as the baseline


Computer Architecture 2011 – coherency & consistency (lec 7) 21 Intel SCC Shared (incoherent) memory