MESI Protocol.


Presentation on theme: "MESI Protocol."— Presentation transcript:

1 MESI Protocol

2 Multi-processor System
A memory system is coherent if:
- If P1 writes to address X, and P2 later reads X with no other writes to X in between, then P2's read returns the value written by P1's write
- Writes to the same location are serialized: two writes to location X are seen in the same order by all processors
[Figure: Processor 1 and Processor 2, each with an L1 cache, on top of a shared L2 cache and memory]

3 MESI Protocol
Each cache line can be in one of 4 states:
- Invalid: the line's data is not valid
- Shared: the line is valid and not dirty; copies may exist in other processors
- Exclusive: the line is valid and not dirty; other processors do not have the line in their local caches
- Modified: the line is valid and dirty; other processors do not have the line in their local caches
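The four states can be written down as a small lookup, which makes the valid / dirty / shared properties explicit. This is a minimal sketch; the class and property names are illustrative and not part of the slides. The `writable_without_rfo` property assumes, as the later examples suggest, that an Exclusive line can be upgraded to Modified without any message.

```python
from enum import Enum


class MESI(Enum):
    """The four MESI states of a cache line (names are illustrative)."""
    MODIFIED = "M"
    EXCLUSIVE = "E"
    SHARED = "S"
    INVALID = "I"

    @property
    def valid(self):
        # Every state except Invalid holds usable data.
        return self is not MESI.INVALID

    @property
    def dirty(self):
        # Only a Modified line differs from the copy in the next level.
        return self is MESI.MODIFIED

    @property
    def others_may_have_copy(self):
        # Only a Shared line may also be present in other processors.
        return self is MESI.SHARED

    @property
    def writable_without_rfo(self):
        # Assumption: E and M lines are already owned, so a write needs no RFO.
        return self in (MESI.MODIFIED, MESI.EXCLUSIVE)


for state in MESI:
    print(state.value, state.valid, state.dirty, state.others_may_have_copy)
```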

4 Multi-processor System: Example
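Steps: P1 reads 1000; P1 writes 1000
- P1 reads 1000: miss in P1's L1 and miss in L2. The line is fetched from memory ([1000]: 5) into L2 and P1's L1; P1's L1 holds it in state E and the L2 CVBs go from 00 to 10
- P1 writes 1000: P1's L1 line goes from E to M and now holds [1000]: 6; L2 and memory still hold [1000]: 5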

5 Multi-processor System: Example
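Steps: P1 reads 1000; P1 writes 1000; P2 reads 1000
- P2 reads 1000: miss in P2's L1
- L2 snoops 1000 (the CVBs are 10, so P1 has the line)
- P1 writes back 1000: L2 is updated from [1000]: 5 to [1000]: 6 and P1's line goes from M to S
- P2 gets 1000: P2's L1 holds [1000]: 6 in state S and the CVBs go from 10 to 11
Memory still holds [1000]: 5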

6 Multi-processor System: Example
Steps: P1 reads 1000; P1 writes 1000; P2 reads 1000 (L2 snoops 1000, P1 writes back 1000, P2 gets 1000); P2 requests ownership with write intent
- P2 requests ownership with write intent: P1's line goes from S to I, P2's line goes from S to E, and the CVBs go from 11 to 01
L2 holds [1000]: 6; memory still holds [1000]: 5

7 Core Valid Bits and Inclusion
L2 keeps track of the presence of each line in each core's L1 cache, in order to:
- Determine whether it needs to send a snoop to a processor
- Determine in what state to provide a requested line (S or E)
L2 maintains Core Valid Bits (CVBs) per cache line
The L2 cache must be inclusive of the L1 caches in each core: every line held in an L1 is also present in L2
When L2 evicts a line:
- L2 sends a snoop invalidate to all processors that have it
- If the line is modified in the L1 cache of one of the processors (in which case it exists only in that processor), the processor responds by sending the updated value to L2
- When the line is evicted from L2, the updated value gets written to memory
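The eviction steps above can be sketched as a single function. This is a simplified model under stated assumptions: caches are plain dictionaries, the CVBs are exact, and the names (`evict_from_l2`, `l1s`, `cvbs`) are hypothetical, not taken from the slides.

```python
def evict_from_l2(addr, l2, l1s, cvbs, memory):
    """Evict `addr` from the shared L2 while keeping L2 inclusive of the L1s.

    l2:     dict addr -> (value, dirty)
    l1s:    list of dicts, one per core, addr -> (state, value)
    cvbs:   dict addr -> list of booleans, one Core Valid Bit per core
    memory: dict addr -> value
    Returns the list of cores that received a snoop invalidate.
    """
    value, dirty = l2.pop(addr)
    snooped = []
    for core, present in enumerate(cvbs.pop(addr)):
        if not present:
            continue  # CVB is clear: this core cannot hold the line, no snoop
        snooped.append(core)
        state, l1_value = l1s[core].pop(addr, ("I", None))  # snoop invalidate
        if state == "M":
            # A modified copy is the only up-to-date one: it goes back to L2.
            value, dirty = l1_value, True
    if dirty:
        memory[addr] = value  # the updated value is written to memory
    return snooped


# State after "P1 reads 1000; P1 writes 1000" in the two-core example.
l1s = [{1000: ("M", 6)}, {}]
l2 = {1000: (5, False)}
cvbs = {1000: [True, False]}
memory = {1000: 5}
print(evict_from_l2(1000, l2, l1s, cvbs, memory), memory)
```

Only cores whose CVB is set are snooped, which is the reason L2 tracks the bits in the first place.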

8 MESI Protocol States
State      Valid  Modified  Copies may exist in other processors
Invalid    No     N.A.      N.A.
Shared     Yes    No        Yes
Exclusive  Yes    No        No
Modified   Yes    Yes       No
A modified line must be exclusive; otherwise, another processor that has the line would be using stale data
Therefore, before modifying a line, a processor must request ownership of the line

9 MESI Protocol Example
A four-processor shared-memory system implements the MESI protocol
For the following sequence of memory references, show the state of the line containing the variable X in each processor's cache after each reference is resolved
Each processor starts out with the line containing X invalid in its cache
Reference      P0's state  P1's state  P2's state  P3's state  CVBs
Initial state  I           I           I           I           0000
P0 reads X     E           I           I           I           1000
P1 reads X     S           S           I           I           1100
P2 reads X     S           S           S           I           1110
P3 writes X    I           I           I           M           0001
P0 reads X     S           I           I           S           1001
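The table can be reproduced with a few lines of state tracking. This is a sketch, not a full protocol model: it tracks only the per-core state of one line and the CVBs, and the class name `MesiSystem` is hypothetical. The last reference (P0 reads X) is inferred from the final row of the table, whose CVBs are 1001.

```python
class MesiSystem:
    """Tracks the MESI state of a single line in each core's cache."""

    def __init__(self, n_cores):
        self.states = ["I"] * n_cores
        self.cvbs = [0] * n_cores

    def read(self, core):
        if self.states[core] != "I":
            return  # hit: no state change
        holders = [c for c, bit in enumerate(self.cvbs) if bit]
        for c in holders:
            # An M holder writes back first; M, E and S holders all end in S.
            self.states[c] = "S"
        self.states[core] = "S" if holders else "E"
        self.cvbs[core] = 1

    def write(self, core):
        if self.states[core] not in ("M", "E"):
            # Request for ownership: every other copy is invalidated.
            for c in range(len(self.states)):
                if c != core:
                    self.states[c] = "I"
                    self.cvbs[c] = 0
            self.cvbs[core] = 1
        self.states[core] = "M"

    def row(self):
        return " ".join(self.states), "".join(str(b) for b in self.cvbs)


system = MesiSystem(4)
references = [("read", 0), ("read", 1), ("read", 2), ("write", 3), ("read", 0)]
print("initial", *system.row())
for op, core in references:
    getattr(system, op)(core)
    print(f"P{core} {op}s X", *system.row())
```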

10 MESI Question 2
Dual-core processor with MESI
Each core has an L1 cache; the L2 cache is shared
L1 can send the following packets to L2:
- Read Address (A): in case of an L1 miss on address A
- RFO (A)
- Data (A)
L2 can send the following packets:
- Read Address (A) to memory
- Data (A) to L1, including the MESI state
- RFO (A): after an RFO from L1
- Snoop (A) to L1
Memory can send the following packets:
- Data (A) to L2

11 MESI Question 2
Message times: 10 ns between L1 and L2; 100 ns between L2 and memory
Caches are empty upon reset
Series of requests to address A (table columns: P1's L1 state, P2's L1 state, P3's L1 state, L2 CVBs for P1 P2 P3, total message time):
- Initial state: I
- P2 reads A: P2's L1 state E; P2's CVB 1; total message time 220 ns (L1 miss/request to L2, L2 miss/request to memory, memory data to L2, L2 data to L1)
- P2 writes A: P2's L1 state M
- P1 reads A: line state S; total message time 40 ns (P1: miss/request to L2; P2: L2-to-L1 snoop; P2: L1-to-L2 data; P1: L2-to-L1 data)
- P3 reads A: total message time 20 ns (P3: L1 miss/request to L2; L2-to-P3 data)
- P3 writes A: total message time 30 ns (P3: L1-to-L2 RFO; L2 snoops P1 and P2; L2 to P3: RFO granted)
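The total message times follow from adding up the link latency of each message in the sequence. The sketch below redoes that arithmetic; the link labels and the `total_time` helper are illustrative. It assumes that the snoop sent to P1 and P2 on the final write counts as one message time, which is what the 30 ns total implies.

```python
# Per-message latency in ns, taken from the question statement.
LINK_NS = {"L1-L2": 10, "L2-MEM": 100}

# Each request is the ordered list of links its messages cross.
REQUESTS = {
    # L1 req to L2, L2 req to memory, memory data to L2, L2 data to L1
    "P2 reads A": ["L1-L2", "L2-MEM", "L2-MEM", "L1-L2"],
    # P1 req to L2, L2 snoop to P2, P2 data to L2, L2 data to P1
    "P1 reads A": ["L1-L2", "L1-L2", "L1-L2", "L1-L2"],
    # P3 req to L2, L2 data to P3
    "P3 reads A": ["L1-L2", "L1-L2"],
    # P3 RFO to L2, L2 snoop to P1 and P2 (assumed parallel), RFO granted
    "P3 writes A": ["L1-L2", "L1-L2", "L1-L2"],
}


def total_time(links):
    """Sum the latency of every message in a request, in ns."""
    return sum(LINK_NS[link] for link in links)


for request, links in REQUESTS.items():
    print(f"{request}: {total_time(links)} ns")
```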

12 Read For Ownership (RFO)
RFO request: a signal from a private cache to the shared cache (i.e. L1->L2) requesting exclusive ownership of a cache line with write intent
- MLC/LLC invalidates the cache line in other L1s
- MLC/LLC responds to L1 that the RFO has been granted
- L1 can now modify the cache line
Read M: what happens from L3 victimization until M reaches the core
Reminder, write back:
- We write only to the needed cache
- Update the next level when the entry is victimized
- Add a dirty bit to each entry

13 Global Observation (GO)
Assume L1 (private) requests a line from L2 (shared)
Global Observation (GO): before sending the actual data to L1, L2 notifies L1 that all other processors now observe the line as being in that L1
The Global Observation carries the MESI state (E or S)
So each L1->L2 line request is answered in 2 steps:
1. GO
2. Fill (actual data)

14 Ring Interconnect

15 Ring
2 x 4 rings: Req / Data / Ack / Snoop
Packets always use the shortest path
Static Even/Odd polarity per station
Each ring switches polarity on each cycle
[Figure: ring stations: Graphics, Core 0 / LLC 0, Core 1 / LLC 1, Core 2 / LLC 2, Core 3 / LLC 3, System Agent (Display, DMI, PCI Express*, IMC)]

16 Ring
In our example: 64B cache line (L1/L2/LLC), 32B/cycle bus BW
Cache line data transfers from LLC to L1 in two strokes
[Figure: ring stations: Graphics, Core 0 / LLC 0, Core 1 / LLC 1, Core 2 / LLC 2, Core 3 / LLC 3, System Agent (Display, DMI, PCI Express*, IMC)]

17 [Figure: ring with the System Agent (IO, iMPH, MC, NCU, PEG/DMI, DDR), Core 0 / L2 / LLC0, Core 1 / L2 / LLC1 and GFX; packets shown: DRd requests, GO, data chunks D1 and D2]
1. Core 0, Core 1 and the GFX issue data read requests
2. The GFX request was not sent to the ring in the current cycle, since the distance to the destination is odd (1 cycle) and it must arrive there on an Even cycle
3. Both cores' requests get a hit in the LLC, and the CVBs (core valid bits) indicate that no snoop is needed
4. LLC sends GO (Global Observation) to both cores to acknowledge that both cores can get the data. The GO carries the MESI state: E/S. The GO and the data may be sent at different cycles: the GO is the outcome of a tag hit, while the data comes from the data array
5. LLC sends the first chunk (1/2 cache line) to each one of the cores
6. LLC sends the second chunk to each one of the cores
Ring polarity per cycle (cycles 1 to 15 shown): on odd cycles, up ring E and down ring O; on even cycles, up ring O and down ring E
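The polarity rule can be sketched as a check made before a packet is placed on the ring. This is a rough model under stated assumptions, not a description of the actual hardware: the function names are hypothetical, the polarity of each ring per cycle is read off the cycle list in the figure, and which ring the GFX request uses (here the up ring, at cycle 1) is assumed for illustration.

```python
def ring_polarity(ring, cycle):
    """Polarity ("E" or "O") of a ring in a given cycle.

    Taken from the figure: on odd cycles the up ring is E and the down
    ring is O; on even cycles it is the other way round.
    """
    odd_cycle = cycle % 2 == 1
    if ring == "up":
        return "E" if odd_cycle else "O"
    return "O" if odd_cycle else "E"


def can_inject(ring, cycle, distance, station_polarity):
    """True if a packet sent now arrives when the ring polarity matches
    the static polarity of the destination station (assumed rule)."""
    arrival_cycle = cycle + distance
    return ring_polarity(ring, arrival_cycle) == station_polarity


# Assumed scenario: destination is an Even station, 1 hop away on the up ring.
print(can_inject("up", 1, 1, "E"))  # the request must wait this cycle
print(can_inject("up", 2, 1, "E"))  # one cycle later it can be sent
```

With an odd distance the polarity at arrival is the opposite of the polarity at injection, which is why the request in step 2 has to wait one cycle.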

