EECE 550 Chapter 5 Part I: Shared Memory Multiprocessors
- Small multiprocessors
  - Typically use an SMP (symmetric multiprocessor) architecture
  - Shared address space directly supported by the hardware
- Common memory hierarchy configurations (Figure 5.2)
  - Shared cache
  - Bus-based SMP: the most common SMP architecture
  - Dancehall: typically uses a MIN (multistage interconnection network)
  - Distributed memory (asymmetric): shared memory supported through directory methods

Cache Coherence
- When a memory location is read, memory should provide the latest value written to that location
- Uniprocessor systems use a memory hierarchy
  - There is no cache coherence problem
- Multiprocessor systems typically have multiple caches
  - Copies of the same data may reside in different caches
  - Potential cache coherence problem

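Example of Cache Coherence Problem
[Figure: processors P1, P2, and P3, each with a private cache, share one memory holding U: 5. The caches of P1 and P3 each hold a copy U: 5. After a write U = 7, what value does a later read return (U = ?)]
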
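
The situation in the figure can be imitated with a small simulation. This is only a sketch under assumptions the slide does not state: the caches are write-back, P3 is the processor that writes U = 7, and the class name `Cache` is made up for illustration.

```python
class Cache:
    """Private write-back cache with NO coherence mechanism (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory   # shared main memory (a dict)
        self.lines = {}        # this processor's cached copies

    def read(self, addr):
        # On a miss, fetch from memory; on a hit, return the local copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Write-back: only the local copy changes; memory is not updated.
        self.lines[addr] = value


memory = {"U": 5}
p1, p2, p3 = Cache(memory), Cache(memory), Cache(memory)

p1.read("U")               # P1 caches U: 5
p3.read("U")               # P3 caches U: 5
p3.write("U", 7)           # assumed writer: P3 sets U = 7 in its own cache

stale_p1 = p1.read("U")    # hit on P1's old copy
stale_p2 = p2.read("U")    # miss, but memory was never updated
print(stale_p1, stale_p2, p3.read("U"))
```

Under these assumptions only P3 observes the new value; P1 reads its stale cached copy and P2 reads the stale value still held in memory.
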
Cache Coherency
- Formal definition (bottom of p. 276)
- Informal definition
  - The memory system should behave as if all processors obtain all of their data from a single memory store
- Properties required for cache coherence
  - Write propagation: writes must become visible to all other processes
  - Write serialization: all writes to a location (by one or more processes) are seen in the same order by ALL processes

Bus Snooping
- Concept shown in Figure 5.4
- Requires continuous monitoring of the bus by each cache's controller
- A snooping protocol requires
  - A set of states associated with memory blocks in local caches
  - A state transition diagram, showing the required state changes for a matching block
  - Actions associated with each state transition

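
The three ingredients listed above (states, a transition diagram, and actions) map naturally onto a lookup table. The sketch below assumes a two-state valid/invalid protocol for a write-through, write-no-allocate cache; the names `State`, `Event`, `TRANSITIONS`, and `step` are hypothetical and chosen only for this illustration.

```python
from enum import Enum


class State(Enum):
    INVALID = "I"
    VALID = "V"


class Event(Enum):
    PR_RD = "processor read"
    PR_WR = "processor write"
    BUS_RD = "read observed on bus"
    BUS_WR = "write observed on bus"


# (current state, event) -> (next state, bus action or None)
TRANSITIONS = {
    (State.INVALID, Event.PR_RD): (State.VALID, "BusRd"),    # read miss: fetch block
    (State.INVALID, Event.PR_WR): (State.INVALID, "BusWr"),  # no-allocate: write memory only
    (State.VALID, Event.PR_RD): (State.VALID, None),         # read hit: no bus traffic
    (State.VALID, Event.PR_WR): (State.VALID, "BusWr"),      # write-through on a hit
    (State.VALID, Event.BUS_WR): (State.INVALID, None),      # another cache wrote: invalidate
    (State.INVALID, Event.BUS_WR): (State.INVALID, None),
    (State.VALID, Event.BUS_RD): (State.VALID, None),
    (State.INVALID, Event.BUS_RD): (State.INVALID, None),
}


def step(state, event):
    """Return (next_state, bus_action) for one block in one cache."""
    return TRANSITIONS[(state, event)]


# Trace one block through a short sequence of events.
state = State.INVALID
actions = []
for event in [Event.PR_RD, Event.PR_WR, Event.BUS_WR, Event.PR_RD]:
    state, action = step(state, event)
    actions.append(action)
print(state, actions)
```

Each cache controller would apply `step` both to its own processor's requests and to every matching transaction it snoops on the bus.
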
Uniprocessor Cache Concepts
- Write-through
  - Information is written to BOTH the cache AND main memory
- Write-back
  - Information is written to the cache only
  - A modified cache block is tagged as dirty and later written to main memory
  - A dirty block is written back when it must be flushed due to block replacement

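
The difference between the two write policies shows up in how often main memory is written. The following sketch makes several simplifying assumptions that are not in the slides: one-word blocks, FIFO replacement, and allocation on every write miss. `SimpleCache` is a hypothetical name.

```python
class SimpleCache:
    """Toy cache: one-word blocks, FIFO replacement, allocate on write miss."""

    def __init__(self, memory, capacity, write_back):
        self.memory = memory
        self.capacity = capacity
        self.write_back = write_back
        self.lines = {}          # addr -> [value, dirty]; dict order gives FIFO
        self.memory_writes = 0   # number of writes that reached main memory

    def _evict_if_full(self):
        if len(self.lines) >= self.capacity:
            victim = next(iter(self.lines))
            value, dirty = self.lines.pop(victim)
            if dirty:
                # A dirty block is flushed to memory only when it is replaced.
                self.memory[victim] = value
                self.memory_writes += 1

    def write(self, addr, value):
        if addr not in self.lines:
            self._evict_if_full()
        if self.write_back:
            self.lines[addr] = [value, True]    # cache only; mark dirty
        else:
            self.lines[addr] = [value, False]   # cache AND memory
            self.memory[addr] = value
            self.memory_writes += 1


def run(write_back):
    memory = {0: 0, 1: 0}
    cache = SimpleCache(memory, capacity=1, write_back=write_back)
    for value in (10, 20, 30):
        cache.write(0, value)    # three writes to the same address
    cache.write(1, 99)           # forces replacement of address 0
    return cache.memory_writes, memory


wt_writes, wt_memory = run(write_back=False)
wb_writes, wb_memory = run(write_back=True)
print(wt_writes, wt_memory)
print(wb_writes, wb_memory)
```

With write-through every store reaches memory; with write-back the three stores to address 0 collapse into a single memory write at replacement time, and address 1 stays dirty in the cache.
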
Possible write miss policies
- Write-allocate
  - Transfer the block to the cache, and then update the value
- Write-no-allocate
  - The block is modified in main memory only
Cache block placement strategies
- Direct-mapped
  - Only one possible location for each memory address
- Fully-associative
  - Data for a given memory address can be stored anywhere in the cache
- Set-associative
  - Data for a given memory address can be stored in a limited set of locations in the cache

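
The three placement strategies can be viewed as one scheme with a different number of ways per set. The sketch below assumes the common modulo mapping from block address to set; the function name `candidate_lines` and the example sizes are made up for illustration.

```python
def candidate_lines(block_addr, num_lines, ways):
    """Return the cache line indices where a block may be placed.

    ways == 1          -> direct-mapped
    ways == num_lines  -> fully associative
    otherwise          -> set-associative
    """
    if num_lines % ways != 0:
        raise ValueError("num_lines must be a multiple of ways")
    num_sets = num_lines // ways
    set_index = block_addr % num_sets      # assumed modulo mapping
    first = set_index * ways
    return list(range(first, first + ways))


direct = candidate_lines(13, num_lines=8, ways=1)
two_way = candidate_lines(13, num_lines=8, ways=2)
fully = candidate_lines(13, num_lines=8, ways=8)
print(direct, two_way, fully)
```

For block 13 in an 8-line cache, direct mapping allows one line, 2-way set-associative allows two, and fully associative allows all eight.
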
Bus Snooping: Write-through cache
- Snooping is simpler since all writes can be seen on the bus
- Problems with scaling
  - All writes generate bus traffic
- Figure 5.5: Bus snooping with write-through, write-no-allocate policy
- Suppose that a write-through, write-allocate policy is used
  - How should Figure 5.5 be modified?

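
A minimal simulation can show why snooping is simple with write-through caches: every write appears on the bus, so every other cache can invalidate its copy. This is a sketch assuming an invalidation-based, write-no-allocate protocol with one-word blocks; `Bus` and `SnoopingCache` are hypothetical names.

```python
class Bus:
    """Shared bus: every write reaches memory and is seen by all other caches."""

    def __init__(self, memory):
        self.memory = memory
        self.caches = []
        self.write_count = 0   # each write is one bus transaction

    def attach(self, cache):
        self.caches.append(cache)

    def read(self, addr):
        return self.memory[addr]

    def write(self, source, addr, value):
        self.write_count += 1
        self.memory[addr] = value
        for cache in self.caches:
            if cache is not source:
                cache.snoop_write(addr)


class SnoopingCache:
    """Write-through, write-no-allocate cache that invalidates on snooped writes."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:
            self.lines[addr] = self.bus.read(addr)
        return self.lines[addr]

    def write(self, addr, value):
        if addr in self.lines:
            self.lines[addr] = value          # update local copy on a hit
        self.bus.write(self, addr, value)     # always write through; no allocate on a miss

    def snoop_write(self, addr):
        self.lines.pop(addr, None)            # invalidate a matching block


memory = {"U": 5}
bus = Bus(memory)
p1, p2, p3 = SnoopingCache(bus), SnoopingCache(bus), SnoopingCache(bus)

p1.read("U")
p3.read("U")
p3.write("U", 7)           # P1's copy is invalidated by the snooped write
seen_by_p1 = p1.read("U")  # miss, refetched from memory
seen_by_p2 = p2.read("U")
print(seen_by_p1, seen_by_p2, bus.write_count)
```

The scaling problem noted above is visible in `write_count`: it grows by one for every store issued by any processor.
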
Partial Order for Cache Coherence
- Total ordering can be based on partial orders
  - Refer to middle of p. 282
- Example: Figure 5.6, partial order with write-through invalidation protocol
- Example 5.3

Memory Consistency
- "A memory consistency model … specifies constraints on the order in which memory operations must appear to be performed … with respect to one another." [Culler et al. 1999, p. 285]
- Event synchronization through flags: Figure 5.7
- Explicit synchronization using barriers: Figure 5.8
- Order among accesses without synchronization: Figure 5.9

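
Event synchronization through a flag can be sketched with two threads. The example below is an assumption-laden illustration, not the code from Figure 5.7: it uses Python's `threading.Event`, which supplies the ordering guarantee itself, whereas with a plain shared flag variable on real hardware the memory consistency model decides whether the consumer is guaranteed to see the new value of A.

```python
import threading

shared = {"A": 0}
flag = threading.Event()   # plays the role of the flag variable
result = []


def producer():
    shared["A"] = 1        # write the data first
    flag.set()             # then signal that the data is ready


def consumer():
    flag.wait()            # wait until the flag is set
    result.append(shared["A"])


t_consumer = threading.Thread(target=consumer)
t_producer = threading.Thread(target=producer)
t_consumer.start()
t_producer.start()
t_consumer.join()
t_producer.join()
print(result)
```

The consumer relies on two orderings: the write to A precedes the flag write, and the flag read precedes the read of A.
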
Sequential Consistency
- Values become visible to a process according to some sequential interleaving of the memory accesses for all processes
- Formal definition: p. 286 (referenced from [Lamport 1979])
- Figure 5.10: Programmer's view of sequential consistency
- Note: inter-process synchronization is still required
- Write atomicity (Example 5.4)
  - All writes (to any location) should appear to all processors to have occurred in the same order

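
One way to make "some sequential interleaving" concrete is to enumerate every interleaving that preserves each process's program order and collect the results. The two small programs below are an assumed litmus test (P1 writes A then B; P2 reads B then A), not taken from the slides, and the helper names are hypothetical.

```python
from itertools import combinations


def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that keeps each list's internal order."""
    total = len(p1) + len(p2)
    for positions in combinations(range(total), len(p1)):
        slots = set(positions)
        order, i, j = [], 0, 0
        for k in range(total):
            if k in slots:
                order.append(p1[i])
                i += 1
            else:
                order.append(p2[j])
                j += 1
        yield order


def run(order):
    """Execute one interleaving against a single memory; return (A, B) as read."""
    memory = {"A": 0, "B": 0}
    observed = {}
    for op, var, value in order:
        if op == "write":
            memory[var] = value
        else:
            observed[var] = memory[var]
    return observed["A"], observed["B"]


P1 = [("write", "A", 1), ("write", "B", 2)]
P2 = [("read", "B", None), ("read", "A", None)]

outcomes = {run(order) for order in interleavings(P1, P2)}
print(sorted(outcomes))
```

Under sequential consistency P2 can never observe B = 2 together with A = 0, because that would require the write to B to be seen before the earlier write to A.
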
Sufficient conditions for preserving sequential consistency (p. 289)
1. Every process issues memory operations in program order
2. After a write is issued, the issuing process waits for the write to complete before issuing its next operation
3. After a read operation is issued
   - If the write whose value is being returned has performed with respect to this processor, then the processor should wait until the write has performed with respect to all processors
Example 5.5: Re-ordering of memory operations (Figure 5.7)
- Creates problems for parallel or multithreaded programs

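
The effect of re-ordering can be sketched by enumerating interleavings of a flag-based handoff. The model below is an assumption for illustration: the writer sets A and then a flag, the reader reads the flag and then A, and only runs in which the reader saw the flag set are kept. The function names are hypothetical.

```python
def interleavings(p1, p2):
    """Yield every merge of p1 and p2 that keeps each list's internal order."""
    if not p1 or not p2:
        yield list(p1) + list(p2)
        return
    for rest in interleavings(p1[1:], p2):
        yield [p1[0]] + rest
    for rest in interleavings(p1, p2[1:]):
        yield [p2[0]] + rest


def values_of_a_after_flag(writer_ops):
    """Collect every value of A the reader can see once it has seen flag == 1."""
    reader_ops = [("read", "flag"), ("read", "A")]
    seen = set()
    for order in interleavings(writer_ops, reader_ops):
        memory = {"A": 0, "flag": 0}
        observed = {}
        for op in order:
            if op[0] == "write":
                memory[op[1]] = op[2]
            else:
                observed[op[1]] = memory[op[1]]
        if observed["flag"] == 1:
            seen.add(observed["A"])
    return seen


# Writes issued in program order: A first, then the flag.
in_order = values_of_a_after_flag([("write", "A", 1), ("write", "flag", 1)])
# Writes re-ordered (for example by hardware or a compiler): flag first, then A.
reordered = values_of_a_after_flag([("write", "flag", 1), ("write", "A", 1)])
print(in_order, reordered)
```

When the writes stay in program order, a reader that sees the flag always sees A = 1; once the two writes are swapped, the stale value A = 0 becomes possible, which is the kind of problem the example refers to.
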