
1 Multiprocessor cache coherence

2 Caching: terms and definitions
cache line, line size, cache size
degree of associativity
–direct-mapped, set and fully associative
placement, replacement, location (tags)
hits and misses
clean vs. dirty entries
write-through vs. write-back
multi-level inclusion

3 Cache line, line size and cache size
caches hold multi-byte lines
bytes are from sequential locations in memory (called a memory block)
number of bytes in a line is usually a multiple of the bus width
lines are identified by “tags”
cache size = # lines * # bytes per line
–tags are not included in cache size

4 Degree of associativity
how many “ways” or places can we store a block from memory in the cache?
–direct-mapped => 1 “way” or place to store a given block
–set-associative => multiple “ways”; a block can be stored in any way of its set
–fully associative => a block from memory can be stored in any line in the cache

5 Placement
when we bring a block in from memory, where do we put it?
–use the address of the first byte of the block
–break it into offset, index and tag
–remove log2(# bytes in block) low-order bits for the offset
–use the middle log2(# lines per way) bits to select the line -> called the “index”
–remaining high-order bits are the tag
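The offset/index/tag split above can be sketched in a few lines of Python (function and parameter names here are illustrative, not from the slides; block size and lines-per-way are assumed to be powers of two):

```python
def split_address(addr, block_bytes, lines_per_way):
    """Split a byte address into (tag, index, offset) for a set-associative cache."""
    # offset: low log2(block_bytes) bits select the byte within the block
    offset_bits = block_bytes.bit_length() - 1
    # index: middle log2(lines_per_way) bits select the line within each way
    index_bits = lines_per_way.bit_length() - 1
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (lines_per_way - 1)
    # tag: the remaining high-order bits
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset
```

For example, with 16-byte lines and 64 lines per way, address 0x1234 splits into offset 4, index 35, tag 4.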

6 Replacement
placement selects the same line number in each way
if one way has an empty line at that location, use it
if all ways have valid lines at that location, one will need to be victimized
use LRU, clock, random, …
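A minimal sketch of that victim-selection logic, assuming each way is a list indexed by line number, with `None` marking an invalid line (all names are illustrative):

```python
def choose_way(ways, index, lru_order):
    """Pick the way to place a block at the given line index.

    ways: list of per-way line arrays; an entry is None (invalid) or a stored tag.
    lru_order: list of way numbers, least recently used first.
    """
    # prefer an empty (invalid) line at this index in some way
    for w, way in enumerate(ways):
        if way[index] is None:
            return w
    # all ways hold valid lines at this index: victimize, e.g., the LRU way
    return lru_order[0]
```

Swapping `lru_order[0]` for a clock hand or a random choice gives the other policies the slide lists.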

7 Locating a block: Tags
how do we know whether a given block is in the cache?
calculate the index and tag from the address
check the tags for that index in each way
–separate memory for the tag array
–circuitry for tag comparisons
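The tag check across ways can be sketched like this (same illustrative representation as above: each way is a list of stored tags, `None` meaning invalid):

```python
def lookup(ways, index, tag):
    """Return the way number holding the block, or None on a miss."""
    # compare the stored tag at this index in every way
    for w, way in enumerate(ways):
        if way[index] is not None and way[index] == tag:
            return w   # hit: the block lives in way w
    return None        # miss: no way holds a matching valid tag
```

In hardware all the per-way comparisons happen in parallel; the loop here is only a software stand-in for that comparator circuitry.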

8 Hit vs. miss
hit == block is in the cache
read hit / miss: process wants to read one or more bytes in the block
write hit / miss similarly
we want high hit rates

9 Clean vs. dirty
has the value in the cache been modified since it was placed in the cache?
one bit per line
similarly, one bit for valid / not valid

10 Write-through vs. write-back
write-through: on a write, update the cache and also write to memory
–more traffic to memory
–no need to stall on replacement
write-back: hold writes in the cache, mark the line dirty
–less memory traffic
–coalesces multiple writes to the line
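The traffic trade-off can be made concrete with a toy count of memory writes caused by a burst of writes to one line (purely illustrative; it ignores stalls, buffers, and everything else):

```python
def memory_writes(policy, writes_to_line):
    """Memory writes caused by writes_to_line stores to a single cached line."""
    if policy == "write-through":
        return writes_to_line              # every store also goes to memory
    elif policy == "write-back":
        return 1 if writes_to_line else 0  # one write-back of the dirty line, on eviction
    raise ValueError("unknown policy")
```

Eight stores to the same line cost eight memory writes under write-through but only one under write-back, which is the coalescing the slide describes.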

11 Multi-level inclusion (MLI)
L1 (the child) holds a subset of L2 (the parent); L2 holds a subset of main memory
affects servicing of misses & invalidations
–see 6.3.1, pp. 232-233 in text
constrains the organizations we can build if we want MLI
–can switch to MLE (multi-level exclusion), though
ensures there will be an allocated line in the parent with the same contents as the child (if clean), or one which can receive a dirty line from the child when it is replaced

12 Uniprocessor MLI
assuming write-back caches
Ap, Bp are the parent’s associativity and line size, respectively; Ac and Bc for the child
we are constrained to: Ap >= (Bp / Bc) Ac
–associativity must at least cover the ratio of the line sizes

13 Multiprocessor MLI
a parent may have k children: Ap >= k (Bp / Bc) Ac
e.g. parent is 32 KB with 16B lines; each child is 1 KB with 4B lines and direct-mapped
–so Ac = 1 and Bp / Bc = 16 / 4 = 4
–if k = 1, Ap >= 4; if k = 4, Ap >= 16
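The inequality is easy to check mechanically; a one-line helper (name is illustrative) reproduces the slide's numbers:

```python
def min_parent_assoc(k, Bp, Bc, Ac):
    """Minimum parent associativity Ap for MLI: Ap >= k * (Bp / Bc) * Ac."""
    return k * (Bp // Bc) * Ac

# slide's example: 16B parent lines, 4B child lines, direct-mapped child (Ac = 1)
one_child = min_parent_assoc(1, 16, 4, 1)   # 4
four_children = min_parent_assoc(4, 16, 4, 1)  # 16
```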

14 Coherence
MLI makes it easy to keep an L1 / L2 pair consistent, or “coherent”
what about multiple caches in a multiprocessor or multi-core system?

15 Shared memory machines

16 What’s the problem?

Time | Event          | Cache A | Cache B | Memory
-----|----------------|---------|---------|-------
0    |                |         |         | X = 10
1    | CPU A reads X  | 10      |         | 10
2    | CPU B reads X  | 10      | 10      | 10
3    | CPU A writes X | 20      | 10      | 10
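Replaying that timeline as a deliberately broken sketch (per-CPU caches modeled as dicts, write-back, and no coherence protocol) shows the stale read directly:

```python
memory = {"X": 10}           # time 0: X = 10 in memory
cache_a, cache_b = {}, {}    # each CPU's private cache

cache_a["X"] = memory["X"]   # time 1: CPU A reads X, caches 10
cache_b["X"] = memory["X"]   # time 2: CPU B reads X, caches 10
cache_a["X"] = 20            # time 3: CPU A writes X; write-back keeps 20 in cache A only

# without a coherence protocol, CPU B still sees the stale value 10
stale_value = cache_b["X"]
```

This is exactly the problem a coherence protocol must solve: A's write at time 3 never reaches B's copy or memory.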

17 Coherence (formally)
determines the value returned by a read
a coherent memory system:
–if P writes to X and then reads X, with no writes to X by other processors in between, the read should return the value written by P
–if P1 writes to X and then P2 reads X, with the read and write “sufficiently” separated in time, the read should return the value written by P1
–writes to the same location are serialized: two writes to the same location by any two processors are seen in the same order by all processors

18 Consistency
determines when a written value will be available to be read
–“sufficiently separated” in the previous slide
various consistency models are possible
–later

19 Think / group / share
how can we ensure coherence in a shared-memory multiprocessor?

20 Reading assignment
section 7.3, pages 281 to 290, in Baer
–covers synchronization
example 1, pages 269 to 270

