
1

2 Introduction, Snooping, Directory, Conclusion

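3 Memory and four private caches (each entry is a block number followed by its value):
Memory: 1A 2B 3C 4D 5E
Cache 1: 1A 2B 3C
Cache 2: 3C 4D 5E
Cache 3: 1A 3C 5E
Cache 4: 1A 2B 4D
After Cache 3 writes F to block 3:
Memory: 1A 2B 3C 4D 5E
Cache 1: 1A 2B 3C
Cache 2: 3C 4D 5E
Cache 3: 1A 3F 5E
Cache 4: 1A 2B 4D
Memory, Cache 1 and Cache 2 still hold the stale value C for block 3.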

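4 Initial state:
Memory: 1A 2B 3C 4D 5E
Cache 1: 1A 2B 3C
Cache 2: 3C 4D 5E
Cache 3: 1A 3C 5E
Cache 4: 1A 2B 4D
After Cache 3 writes F to block 3 and the write also reaches memory:
Memory: 1A 2B 3F 4D 5E
Cache 1: 1A 2B 3C
Cache 2: 3C 4D 5E
Cache 3: 1A 3F 5E
Cache 4: 1A 2B 4D
Cache 1 and Cache 2 still hold the stale value C, so updating memory alone does not restore coherence.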
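The incoherence in slides 3 and 4 can be detected mechanically: a block is incoherent when two holders (memory or any cache) disagree on its value. The Python sketch below uses the slides' data; the function name `incoherent_blocks` and the dictionary layout are illustrative assumptions, not part of any protocol.

```python
from typing import Dict

Contents = Dict[int, str]  # block number -> value held for that block


def incoherent_blocks(memory: Contents, caches: Dict[str, Contents]) -> Dict[int, Dict[str, str]]:
    """Return every block whose copies (in memory or any cache) do not all agree."""
    holders: Dict[int, Dict[str, str]] = {}
    for name, contents in [("Memory", memory), *caches.items()]:
        for block, value in contents.items():
            holders.setdefault(block, {})[name] = value
    return {block: copies for block, copies in holders.items()
            if len(set(copies.values())) > 1}


caches = {
    "Cache 1": {1: "A", 2: "B", 3: "C"},
    "Cache 2": {3: "C", 4: "D", 5: "E"},
    "Cache 3": {1: "A", 3: "F", 5: "E"},  # Cache 3 has written F to block 3
    "Cache 4": {1: "A", 2: "B", 4: "D"},
}
# Slide 3: memory still holds C. Slide 4: the write also reached memory.
memory_slide3 = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E"}
memory_slide4 = {1: "A", 2: "B", 3: "F", 4: "D", 5: "E"}

print(incoherent_blocks(memory_slide3, caches))
print(incoherent_blocks(memory_slide4, caches))
```

Both calls report only block 3: writing the new value through to memory does not help, because Caches 1 and 2 keep their stale copies.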

5 The goal of a coherence protocol is to maintain coherence by enforcing the Single-Writer, Multiple-Read (SWMR) invariant: for any memory location A, at any given time, either exactly one core may write A (and also read it), or some number of cores may only read it.
[Figure: the core sends loads and stores to its cache controller and receives loaded values; the cache controller and the memory controller each issue and receive coherence requests and responses over the interconnection network.]
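The SWMR invariant can be stated as a check over the permissions that all cores hold for one location at one instant. This is a minimal sketch; the permission labels "RW", "R" and "none" are hypothetical names chosen for illustration.

```python
from typing import Mapping

# Hypothetical permission labels for one memory location at one instant:
# "RW" = may read and write, "R" = may only read, "none" = no access.


def swmr_holds(permissions: Mapping[str, str]) -> bool:
    """Check the Single-Writer, Multiple-Read invariant for one location."""
    writers = [core for core, p in permissions.items() if p == "RW"]
    readers = [core for core, p in permissions.items() if p == "R"]
    if writers:
        # A writer must be alone: no second writer and no read-only sharers.
        return len(writers) == 1 and not readers
    return True  # any number of read-only cores is allowed


print(swmr_holds({"C1": "RW", "C2": "none", "C3": "none"}))  # True
print(swmr_holds({"C1": "R", "C2": "R", "C3": "R"}))         # True
print(swmr_holds({"C1": "RW", "C2": "R", "C3": "none"}))     # False
```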

6 [Figure: core, cache controller and cache on one side; memory controller and memory on the other; both attached to the interconnection network.]

7

8

9
- Transient states occur during the transition from one stable state to another.
- XY^Z: the block is transitioning from stable state X to stable state Y, and the transition will not complete until an event of type Z occurs.
- IM^D: the block was in state I and will move to state M when the data (D) is received.

10

11
- To maintain the state of blocks in caches, the most common approach is to add a few extra state bits to each block. For example, MOESI has five states, so it needs 3 bits (2 bits can encode only 4 states).
- To maintain the state of blocks in memory, we can use the same approach. Alternatively, we can derive it with logic gates. For example, a NOR gate whose inputs are the caches' Owned signals outputs 0, which encodes memory state I, whenever any of its inputs is 1.
[Figure: block data and state bits in caches 1 to 3 (data 10011…, state 000 → I; data 11111…, state 001 → O; data 00000…, state 101 → M), combined to give the state of the block in memory.]
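Both ideas on this slide can be sketched in a few lines: the number of state bits is the ceiling of log2 of the number of states, and the memory-side state is a NOR over per-cache Owned signals. This assumes one Owned signal per cache; the label returned when the NOR output is 1 ("memory is owner") is a hypothetical name, since the slide only specifies the output-0 case.

```python
import math
from typing import Sequence

MOESI_STATES = ["M", "O", "E", "S", "I"]
# Smallest number of bits that gives every state a distinct code.
STATE_BITS = math.ceil(math.log2(len(MOESI_STATES)))


def nor(inputs: Sequence[int]) -> int:
    """Logical NOR: 1 only if every input is 0."""
    return int(not any(inputs))


def memory_state(owned_bits: Sequence[int]) -> str:
    """Derive the memory's view of a block from each cache's Owned signal.

    NOR output 0: some cache owns the block, so memory is in state I.
    NOR output 1: no cache owns it (assumed here to mean memory is the owner).
    """
    return "I" if nor(owned_bits) == 0 else "memory is owner"


print(STATE_BITS)               # 3
print(memory_state([0, 1, 0]))  # I
print(memory_state([0, 0, 0]))  # memory is owner
```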

12
- Most protocols have a similar set of transactions, because the basic goals of the coherence controllers are similar.
- Transactions are all initiated by cache controllers that are responding to requests from their associated cores.

| Transaction | Goal |
|---|---|
| GetShared (GetS) | Obtain block in Shared (read-only) state. |
| GetModified (GetM) | Obtain block in Modified (read-write) state. |
| Upgrade (Upg) | Upgrade block state from read-only (Shared or Owned) to read-write (Modified); unlike GetM, Upg does not require data to be sent to the requestor. |
| PutShared (PutS) | Evict block in Shared state. |
| PutExclusive (PutE) | Evict block in Exclusive state. |
| PutOwned (PutO) | Evict block in Owned state. |
| PutModified (PutM) | Evict block in Modified state. |

13
- Events are core requests to their cache controllers.

| Event | Response of cache controller |
|---|---|
| Load | If cache hit, respond with data from cache; else initiate GetS transaction. |
| Store | If cache hit in state E or M, write data into cache; else initiate GetM or Upg transaction. |
| Atomic read-modify-write | If cache hit in state E or M, atomically execute read-modify-write semantics; else initiate GetM or Upg transaction. |
| Instruction fetch | If cache hit (in I-cache), respond with instruction from cache; else initiate GetS transaction. |
| Read-only prefetch | If cache hit, ignore; else may optionally initiate GetS transaction. |
| Read-write prefetch | If cache hit in state M, ignore; else may optionally initiate GetM or Upg transaction. |
| Replacement | Depending on state of block, initiate PutS, PutE, PutO, or PutM transaction. |
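Several rows of the event table can be expressed as one dispatch function over the block's stable MOESI state. This is a sketch covering a subset of the rows; the event names are hypothetical, and choosing Upg whenever a read-only copy (S or O) is present, and GetM otherwise, is an assumption, since the table only says "GetM or Upg".

```python
def controller_response(event: str, state: str) -> str:
    """Cache controller's reaction to a core event (subset of the event table).

    state is the block's stable state in this cache: one of M, O, E, S, I.
    """
    valid = state != "I"
    if event in ("load", "instruction_fetch"):
        return "hit" if valid else "GetS"
    if event == "read_only_prefetch":
        return "ignore" if valid else "GetS (optional)"
    if event in ("store", "atomic_rmw"):
        if state in ("E", "M"):
            return "hit"
        # Assumption: use Upg when a read-only copy (S or O) is present, else GetM.
        return "Upg" if state in ("S", "O") else "GetM"
    if event == "replacement":
        if not valid:
            raise ValueError("no block to evict in state I")
        return "Put" + state  # PutS, PutE, PutO or PutM
    raise ValueError(f"unsupported event: {event!r}")


print(controller_response("load", "I"))         # GetS
print(controller_response("store", "S"))        # Upg
print(controller_response("replacement", "O"))  # PutO
```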

14

15
- Snooping protocols
- Directory protocols
- Hybrid (a combination of snooping and directory protocols)

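16
Requests observed in the same order by every controller:

| Time | C1 | C2 | Memory |
|---|---|---|---|
| 0 | A: I | A: I | A: I, owner |
| 1 | A: GetM from C1 / M, owner | A: GetM from C1 / I | A: GetM from C1 / M |
| 2 | A: GetM from C2 / I | A: GetM from C2 / M, owner | A: GetM from C2 / M |

Requests observed in different orders:

| Time | C1 | C2 | Memory |
|---|---|---|---|
| 0 | A: I | A: I | A: I, owner |
| 1 | A: GetM from C1 / M, owner | A: GetM from C2 / M, owner | A: GetM from C1 / M |
| 2 | A: GetM from C2 / I | A: GetM from C1 / I | A: GetM from C2 / M |

In the second case, C1 and C2 are both in state M at time 1, which violates the SWMR invariant.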
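The two scenarios on this slide can be replayed in a small model: each core starts in I, moves to M on observing its own GetM, and to I on observing another core's GetM. This is a simplified sketch that assumes only GetM requests are in flight; the helper names are hypothetical.

```python
from typing import Dict, List

Observed = Dict[str, List[str]]  # core -> GetM requesters, in the order that core saw them


def snapshots(observed: Observed) -> List[Dict[str, str]]:
    """State of block A in each core after it has seen k GetM requests (k = 1, 2, ...).

    Each core starts in I. Seeing its own GetM moves it to M; seeing another
    core's GetM moves it to I, so only the most recent request matters.
    """
    steps = len(next(iter(observed.values())))
    return [
        {core: ("M" if order[k] == core else "I") for core, order in observed.items()}
        for k in range(steps)
    ]


def violates_swmr(snaps: List[Dict[str, str]]) -> bool:
    """True if any snapshot has more than one core in M (more than one writer)."""
    return any(sum(state == "M" for state in snap.values()) > 1 for snap in snaps)


total_order = {"C1": ["C1", "C2"], "C2": ["C1", "C2"]}
no_total_order = {"C1": ["C1", "C2"], "C2": ["C2", "C1"]}

print(snapshots(total_order))                    # [{'C1': 'M', 'C2': 'I'}, {'C1': 'I', 'C2': 'M'}]
print(violates_swmr(snapshots(no_total_order)))  # True
```

With a total order of requests, at most one core is in M at every step; without it, both cores believe they are the single writer at time 1.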

17 [Figure: multicore processor chip in which each core has a cache controller and a private data (L1) cache; the cores are connected by an interconnection network to the LLC/directory controller and the last-level cache (LLC), backed by main memory.]

18

19 MSI snooping cache controller:

| State | Load | Store | Replacement | Own data response | Other GetS | Other GetM | Other PutM |
|---|---|---|---|---|---|---|---|
| I | issue GetS/IS^D | issue GetM/IM^D | | | | | |
| IS^D | stall | stall | stall | copy data into cache, load hit/S | (A) | (A) | (A) |
| IM^D | stall | stall | stall | copy data into cache, store hit/M | (A) | (A) | (A) |
| S | load hit | issue GetM/SM^D | -/I | | | -/I | |
| SM^D | load hit | stall | stall | copy data into cache, store hit/M | (A) | (A) | (A) |
| M | load hit | store hit | issue PutM, send data to memory/I | | send data to requestor and memory/S | send data to requestor/I | |

(A): cannot occur, because transactions are atomic.
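The cache-controller table maps naturally onto a dictionary keyed by (state, event). The sketch below follows the MSI table for slide 19, including the S-state invalidation on another core's GetM that the SWMR invariant requires; event names such as "Other-GetS" are hypothetical labels, and treating every missing entry as "no action" is a simplification.

```python
from typing import Dict, Tuple

Transition = Tuple[str, str]  # (action, next state)

# "Other-*" events are other cores' requests observed on the bus;
# "Data" is the data response to this cache's own request.
MSI_CACHE: Dict[Tuple[str, str], Transition] = {
    ("I", "Load"): ("issue GetS", "IS^D"),
    ("I", "Store"): ("issue GetM", "IM^D"),
    ("IS^D", "Data"): ("copy data into cache, load hit", "S"),
    ("IM^D", "Data"): ("copy data into cache, store hit", "M"),
    ("S", "Load"): ("load hit", "S"),
    ("S", "Store"): ("issue GetM", "SM^D"),
    ("S", "Replacement"): ("-", "I"),
    ("S", "Other-GetM"): ("-", "I"),
    ("SM^D", "Load"): ("load hit", "SM^D"),
    ("SM^D", "Data"): ("copy data into cache, store hit", "M"),
    ("M", "Load"): ("load hit", "M"),
    ("M", "Store"): ("store hit", "M"),
    ("M", "Replacement"): ("issue PutM, send data to memory", "I"),
    ("M", "Other-GetS"): ("send data to requestor and memory", "S"),
    ("M", "Other-GetM"): ("send data to requestor", "I"),
}
TRANSIENT = {"IS^D", "IM^D", "SM^D"}
CORE_EVENTS = {"Load", "Store", "Replacement"}


def step(state: str, event: str) -> Transition:
    if (state, event) in MSI_CACHE:
        return MSI_CACHE[(state, event)]
    if state in TRANSIENT and event in CORE_EVENTS:
        return ("stall", state)  # retry after the transaction completes
    if state in TRANSIENT:
        raise RuntimeError("(A): impossible, transactions are atomic")
    return ("-", state)  # the event needs no action in this state


state = "I"
for event in ["Store", "Load", "Data", "Other-GetS"]:
    action, state = step(state, event)
    print(f"{event:>10}: {action} -> {state}")
```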

20 MSI snooping memory controller:

| State | GetS | GetM | PutM | Data from owner |
|---|---|---|---|---|
| IorS | send data block to requestor/IorS | send data block to requestor/M | | |
| IorS^D | (A) | (A) | (A) | update data block in memory/IorS |
| M | -/IorS^D | | -/IorS^D | |
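The memory controller for slide 20 can be sketched the same way. This assumes the memory waits for the owner's data (IorS^D) both when another core issues GetS and when the owner evicts with PutM; treating every other missing entry as "no action" is a simplification.

```python
from typing import Dict, Tuple

# "IorS": no cache holds the block in M, so memory owns it and answers requests.
MSI_MEMORY: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("IorS", "GetS"): ("send data block to requestor", "IorS"),
    ("IorS", "GetM"): ("send data block to requestor", "M"),
    ("IorS^D", "Data"): ("update data block in memory", "IorS"),
    ("M", "GetS"): ("-", "IorS^D"),  # owner supplies the data; memory waits for its copy
    ("M", "PutM"): ("-", "IorS^D"),  # assumption: the evicting owner writes the block back
}


def mem_step(state: str, event: str) -> Tuple[str, str]:
    if (state, event) in MSI_MEMORY:
        return MSI_MEMORY[(state, event)]
    if state == "IorS^D":
        raise RuntimeError("(A): impossible, transactions are atomic")
    return ("-", state)  # e.g. GetM while in M: the owning cache responds


state = "IorS"
for event in ["GetM", "GetS", "Data"]:
    action, state = mem_step(state, event)
    print(f"{event}: {action} -> {state}")
```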

21

22 Implements atomic transactions and non-atomic requests.
The Exclusive state is used in almost all commercial coherence protocols because it optimizes a common case: a core first reads a block and then writes it.
- In MSI, on a read miss the core must issue a GetS to obtain read permission and then a separate GetM to obtain write permission.
- In MESI, if no other cache holds the block, the GetS returns it in the Exclusive state. Because no other cache has a copy, a later store silently upgrades the block from E to M, so the core does not need to issue a GetM.
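The benefit of the E state can be counted directly: for a read followed by a write, MSI issues two bus transactions while MESI issues one when no other cache holds the block. This sketch assumes a single core accessing one block with no intervening requests from other cores; the `other_sharers` flag is a hypothetical stand-in for whether another cache already has a copy.

```python
from typing import List


def bus_transactions(protocol: str, trace: List[str], other_sharers: bool = False) -> List[str]:
    """Bus transactions one core issues for a trace of loads/stores to one block."""
    state = "I"
    issued: List[str] = []
    for op in trace:
        if op == "load":
            if state == "I":
                issued.append("GetS")
                # MESI grants E when no other cache has a copy; MSI always gives S.
                state = "E" if protocol == "MESI" and not other_sharers else "S"
        elif op == "store":
            if state == "E":
                state = "M"  # silent upgrade: no bus transaction
            elif state in ("I", "S"):
                issued.append("GetM")
                state = "M"
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return issued


print(bus_transactions("MSI", ["load", "store"]))   # ['GetS', 'GetM']
print(bus_transactions("MESI", ["load", "store"]))  # ['GetS']
```

If another cache already shares the block, MESI falls back to the MSI behaviour and needs the GetM as well.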

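23 MESI snooping cache controller (non-atomic requests):

| State | Load | Store | Replacement | Own GetS | Own GetM | Own PutM | Other GetS | Other GetM | Other PutM | Own data response |
|---|---|---|---|---|---|---|---|---|---|---|
| I | issue GetS/IS^AD | issue GetM/IM^AD | | | | | - | - | - | |
| IS^AD | stall | stall | stall | -/IS^D | | | - | - | - | |
| IS^D | stall | stall | stall | | | | (A) | (A) | (A) | -/S (other sharers) or -/E (no other sharers) |
| IM^AD | stall | stall | stall | | -/IM^D | | - | - | - | |
| IM^D | stall | stall | stall | | | | (A) | (A) | (A) | -/M |
| S | hit | issue GetM/SM^AD | -/I | | | | - | -/I | - | |
| SM^AD | hit | stall | stall | | -/SM^D | | - | -/IM^AD | - | |
| SM^D | hit | stall | stall | | | | (A) | (A) | (A) | -/M |
| E | hit | hit/M | issue PutM/EI^A | | | | send data to requestor and memory/S | send data to requestor/I | - | |
| M | hit | hit | issue PutM/MI^A | | | | send data to requestor and memory/S | send data to requestor/I | - | |
| MI^A | hit | stall | stall | | | send data to memory/I | send data to memory and requestor/II^A | send data to requestor/II^A | - | |
| EI^A | hit | stall | stall | | | -/I | send data to memory and requestor/II^A | send data to requestor/II^A | - | |
| II^A | stall | stall | stall | | | -/I | - | - | - | |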
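With non-atomic requests, another core's GetM can be ordered on the bus between this cache issuing its own GetM and observing it. The sketch below replays a few rows of the slide 23 table to show that race (SM^AD to IM^AD). It assumes the superscript A means "waiting for this cache's own request to appear on the bus", which matches the table's Own GetS/Own GetM columns; only the listed transitions are modelled.

```python
from typing import Dict, List, Tuple

# A few transitions from the MESI cache-controller table (non-atomic requests).
# Assumed reading of the superscripts: A = waiting for own request on the bus,
# D = waiting for the data response.
MESI_SUBSET: Dict[Tuple[str, str], str] = {
    ("S", "Store"): "SM^AD",
    ("SM^AD", "Own-GetM"): "SM^D",
    ("SM^AD", "Other-GetM"): "IM^AD",  # another GetM ordered first: our S copy is invalidated
    ("IM^AD", "Own-GetM"): "IM^D",
    ("SM^D", "Data"): "M",
    ("IM^D", "Data"): "M",
}


def run(start: str, events: List[str]) -> List[str]:
    """Return the states visited; a pair outside the subset raises KeyError."""
    states: List[str] = []
    state = start
    for event in events:
        state = MESI_SUBSET[(state, event)]
        states.append(state)
    return states


print(run("S", ["Store", "Own-GetM", "Data"]))
print(run("S", ["Store", "Other-GetM", "Own-GetM", "Data"]))
```

Both traces end in M, but in the second one the cache must give up its shared copy and then wait for data exactly like a cache that started in I.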

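24 MESI snooping memory controller:

| State | GetS | GetM | PutM | Data | NoData | NoData-E |
|---|---|---|---|---|---|---|
| I | send data to requestor/EorM | send data to requestor/EorM | -/I^D | | | |
| S | send data to requestor/S | send data to requestor/EorM | -/S^D | | | |
| EorM | -/S^D | - | -/EorM^D | | | |
| I^D | (A) | (A) | (A) | write data to memory/I | -/I | |
| S^D | (A) | (A) | (A) | write data to memory/S | -/S | |
| EorM^D | (A) | (A) | (A) | write data to memory/I | -/EorM | -/I |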

25

26
- Uses MOESI.
- Non-atomic requests and transactions.
- Supports up to 64 processors.
- Wired snooping buses consume a lot of energy and therefore do not scale to large numbers of cores. To solve this problem, the E10000 uses point-to-point links instead.
- Uses a separate bus for sending out-of-order data response messages.

27

28
- Benchmark suite: Splash-2
- Simulator: gem5 in SE (syscall emulation) mode
- Hardware: four CPUs, each with a private 32 KB, 4-way set-associative L1 cache. The default cache line size is 64 bytes; we vary it in our experiment.

| L1 block size (bytes) | Write-backs / memory references |
|---|---|
| 16 | 11214 |
| 32 | 12350 |
| 64 | 12672 |
| 128 | 13001 |

[Figures: write-backs vs. L1 cache size (KB); write-backs vs. L1 block size (bytes)]
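The growth in write-backs as the L1 block size increases can be quantified from the table's numbers. This short sketch treats the values as write-back counts, as the plot titles label them, and computes the relative change for each doubling of the block size.

```python
# Write-back counts per L1 block size (bytes), copied from the slide's table.
writebacks = {16: 11214, 32: 12350, 64: 12672, 128: 13001}


def pct_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


sizes = sorted(writebacks)
for smaller, larger in zip(sizes, sizes[1:]):
    change = pct_change(writebacks[smaller], writebacks[larger])
    print(f"{smaller:>3} -> {larger:>3} B: {change:+.1f}%")
print(f"16 -> 128 B overall: {pct_change(writebacks[16], writebacks[128]):+.1f}%")
```

Most of the increase (about 10%) occurs between 16 and 32 bytes; each later doubling adds roughly 2.6%, for about 15.9% overall.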

29

30

31
- Benchmark suite: Splash-2
- Benchmark applications: Barnes-Hut, LU, Ocean, Radiosity, Radix, Raytrace
- Protocols: MESI and MSI
- Hardware: ?

32
- Protocols: MSI, MESI, MOSI and MOESI
[Tables: Hardware; Splash-2 inputs and applications]


34

