

Architecture and Design of AlphaServer GS320 Kourosh Gharachorloo, Madhu Sharma, Simon Steely, and Stephen Van Doren ASPLOS’2000 Presented By: Alok Garg.





2 Motivations
Coherence protocol
– Bandwidth limitations of snoopy-based protocols
– Inefficiencies in directory-based protocols
– Correctness issues related to rare protocol races
Implementation of consistency models
– Adds overhead to the common-case transaction flow

3 Paper Contributions
– Exploiting network ordering to simplify the cache coherence protocol
– Techniques to reduce network occupancy
– A simple solution to deadlock, livelock, starvation, and fairness problems
– Techniques for efficiently supporting memory ordering

4 Overview
– Architecture overview
– MOESI cache coherence protocol
– GS320 optimized cache coherence protocol
– Alpha consistency model
– Consistency model implementation
– Performance

5 Architecture Overview

6 Block Diagram
Figure: QBBs connected through an 8x8 global crossbar switch (link bandwidth shown: 1.6 GB/s)
– QBB: Quad-Processor Building Block

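7 Quad-Processor Building Block (QBB)
Figure: components of one QBB (32 Alpha 21264 processors in the full system)
– Four processors (P), each with an L2 cache
– 10-port local crossbar switch
– SDRAM memory: 8 GB, 64-bit, 200 MHz
– I/O: 64-entry cache; 4 PCI buses, 64-bit, 33 MHz
– Global port, with DTAG (Duplicate Tag Store), DIR (directory) and TTT (Transactions In Transit buffer)
– Arbitration point
– Link bandwidths shown: 1.6 GB/s and 3.2 GB/s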

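8 The Directory
– 14 bits per 64-byte memory line: a 6-bit owner field plus 8 sharer bits S0–S7, one per QBB (example shown: Owner = 0)
– Forwards are sent to the owner; invalidates are sent to every QBB whose sharer bit is set
– Inside each QBB (e.g., QBB0, QBB3), the duplicate tag store (DTAG) determines which of processors P0–P3 actually receive the invalidate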
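The 14-bit entry can be sketched in Python as a packed integer. The slide gives only the field widths (6-bit owner, 8 QBB sharer bits), so the bit positions and the helper names `pack_entry`, `unpack_entry` and `invalidate_targets` are hypothetical, chosen for illustration only.

```python
# Hypothetical bit layout: the slide gives only the field widths, so we
# assume the owner sits in bits 0-5 and sharer bits S0..S7 in bits 6-13.
OWNER_BITS = 6
NUM_QBBS = 8
ENTRY_BITS = OWNER_BITS + NUM_QBBS  # 14 bits per 64-byte memory line


def pack_entry(owner, sharer_qbbs):
    """Pack a 6-bit owner ID and a set of sharer QBB numbers into one entry."""
    if not 0 <= owner < (1 << OWNER_BITS):
        raise ValueError("owner does not fit in 6 bits")
    vector = 0
    for qbb in sharer_qbbs:
        if not 0 <= qbb < NUM_QBBS:
            raise ValueError("QBB number out of range")
        vector |= 1 << qbb
    return owner | (vector << OWNER_BITS)


def unpack_entry(entry):
    """Return (owner, set of sharer QBBs) for a 14-bit entry."""
    owner = entry & ((1 << OWNER_BITS) - 1)
    vector = entry >> OWNER_BITS
    return owner, {q for q in range(NUM_QBBS) if vector & (1 << q)}


def invalidate_targets(entry):
    """QBBs that must receive an invalidate. The directory only tracks QBBs;
    each QBB's duplicate tags (DTAG) then pick the processors P0-P3."""
    _, sharers = unpack_entry(entry)
    return sorted(sharers)
```

For the slide's example, `pack_entry(0, {0, 3})` yields an entry whose invalidates go to QBB0 and QBB3. Tracking sharers per QBB rather than per processor is what keeps the entry at 14 bits; the DTAG supplies the finer-grained filtering.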

9 Crossbar Switch
Network bisection bandwidth:
– Global switch (8x8): 12.8 GB/s
– Local switch (10-port): 6.4 GB/s

10 MOESI Cache Coherence Protocol

11 MOESI States
Cache states
– Invalid (I): no valid copy
– Shared (S): valid, (potentially) shared, clean
– Exclusive (E): valid, exclusive, clean
– Modified (M): valid, exclusive, (potentially) dirty
– Owner (O): valid, (potentially) shared, (potentially) dirty; responsible for supplying the data instead of memory
Request messages
– Read (Rd): data needed in a readable state (S or E)
– Read-Exclusive (RE): data needed in Modified state (M)
– Exclusive (Ex): write permission needed in Modified state (M) for a line already held in Shared state (an upgrade)
Home node: the node whose memory and directory hold the line
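The state properties listed above can be captured in a small Python sketch. One assumption is labelled in the code: a cache is treated as responsible for supplying data exactly when its copy may be dirty (M or O), because memory may then be stale.

```python
from enum import Enum


class State(Enum):
    """MOESI cache-line states as listed on the slide."""
    I = "Invalid"
    S = "Shared"
    E = "Exclusive"
    M = "Modified"
    O = "Owner"


def is_valid(state):
    return state is not State.I


def is_exclusive(state):
    # Only E and M guarantee that no other cache holds a copy.
    return state in (State.E, State.M)


def may_be_dirty(state):
    # M and O may hold data newer than memory.
    return state in (State.M, State.O)


def supplies_data(state):
    """Assumption for this sketch: a cache must answer instead of memory
    exactly when its copy may be dirty, since memory may then be stale."""
    return may_be_dirty(state)
```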

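12 MOESI Read
Initial state: N2 holds the line in M; N1, N3, N4 and N5 are I; H/D is the home node and its directory.
1. N5 sends Rd to H/D.
2. H/D sends a Forward to the owner N2 and a Marker to N5.
3. N2 sends the data Reply directly to N5 and moves from M to O.
Final state: N2/O, N5/S.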
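The read flow above can be sketched as a Python function that returns the message sequence and updates the node states. The function name `read_request` and the tuple format `(message, source, destination)` are hypothetical; the sketch handles only the case on the slide, where the owner holds the line in M or O.

```python
def read_request(states, requester, owner):
    """Hypothetical model of the Rd flow: the home forwards the read to the
    owner, which replies directly to the requester. `states` maps node -> state."""
    if states[owner] not in ("M", "O"):
        raise ValueError("this sketch only handles an owner in M or O")
    messages = [
        ("Rd", requester, "H/D"),
        ("Forward", "H/D", owner),
        ("Marker", "H/D", requester),
        ("Reply", owner, requester),
    ]
    states[owner] = "O"      # dirty sharing: the owner keeps the dirty data
    states[requester] = "S"  # the requester gets a shared copy
    return messages
```

With `states = {"N1": "I", "N2": "M", "N3": "I", "N4": "I", "N5": "I"}`, calling `read_request(states, "N5", "N2")` leaves N2 in O and N5 in S. Memory is never updated, which is the point of letting the owner keep supplying the data.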

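13 MOESI Read-Exclusive
Initial state: N2 holds the line in O and N5 in S; N1, N3 and N4 are I.
1. N3 sends RE to H/D.
2. H/D sends a Forward to the owner N2, an Invalidate to the sharer N5, and a Marker to N3.
3. N5 invalidates its copy (S to I) and returns an Ack.
4. N2 sends the data Reply to N3 and invalidates its own copy (O to I).
Final state: N2/I, N5/I, N3/M.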
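On the home side, handling an RE comes down to reading the directory entry and fanning out messages. This Python sketch is illustrative only. The directory is modelled as a dict mapping each line to `(owner, set of sharers)`, and the handler name is hypothetical. It also assumes the entry is rewritten at once with no busy state, as the GS320 protocol does.

```python
def home_read_exclusive(directory, line, requester):
    """Hypothetical home-node handler for RE. `directory` maps
    line -> (owner or None, set of sharer nodes). Returns the messages sent."""
    owner, sharers = directory[line]
    messages = []
    if owner is not None and owner != requester:
        # The owner supplies the data and invalidates its own copy.
        messages.append(("Forward", owner))
    for node in sorted(sharers - {requester, owner}):
        messages.append(("Invalidate", node))
    messages.append(("Marker", requester))
    # The directory entry is updated immediately: no transient "busy" state.
    directory[line] = (requester, set())
    return messages
```

For the slide's scenario, an entry `("N2", {"N2", "N5"})` and requester N3 produce a Forward to N2, an Invalidate to N5 and a Marker to N3, and the entry becomes `("N3", set())`.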

14 GS320 Optimized Cache Coherence Protocol
– Dirty sharing
– No negative acknowledgments (NAKs)
– Without NAKs, protocol races can create three deadlock conditions

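15 Late Request Race Condition
Initial state: N2 holds the line in M; N1, N3, N4 and N5 are I.
1. N5 sends Rd to H/D; H/D sends a Forward to N2 and a Marker to N5.
2. Meanwhile, N2 evicts the line (N2/X), sends a Write Back to H/D, and keeps the data in its write buffer.
3. The Forward reaches N2 after the eviction (a late request); N2 supplies the Reply to N5 from its write buffer.
4. H/D acknowledges the Write Back (Ack), and N2 releases the write-buffer entry.
Final state: N5/S.
Deadlock? No. Both the Forward and the Ack travel from H/D on Q1, so the Forward reaches N2 before the Ack. The write buffer therefore still holds the data, and the late Forward can be serviced without a NAK.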
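The write-buffer rule above can be sketched as a small Python class. `OwnerNode` and its method names are hypothetical. The class models one assumption only: an evicted dirty line stays in the write buffer until the home's Ack arrives, so a late Forward can still be answered.

```python
class OwnerNode:
    """Hypothetical owner that keeps evicted dirty data until the home
    acknowledges the write-back."""

    def __init__(self):
        self.cache = {}         # line -> data held in M/O
        self.write_buffer = {}  # line -> data awaiting the write-back Ack

    def evict(self, line):
        data = self.cache.pop(line)
        self.write_buffer[line] = data
        return ("WriteBack", line, data)

    def on_forward(self, line):
        # A late Forward is still serviced: no NAK is needed because the
        # data stays in the write buffer until the Ack arrives.
        if line in self.cache:
            return ("Reply", line, self.cache[line])
        if line in self.write_buffer:
            return ("Reply", line, self.write_buffer[line])
        raise RuntimeError("Forward for a line this node does not hold")

    def on_writeback_ack(self, line):
        del self.write_buffer[line]
```

The sequence on the slide is `evict`, then `on_forward`, then `on_writeback_ack`. Q1 ordering from the home guarantees that `on_forward` never runs after `on_writeback_ack` for the same race.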

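16 Early Request Race Condition
Initial state: N3 holds the line in M; N1, N2, N4 and N5 are I.
1. N2 sends RE to H/D; H/D records N2 as the owner, sends a Forward to N3 and a Marker to N2.
2. N5 sends Rd to H/D; H/D sends a Forward to the new owner N2 and a Marker to N5.
3. The Forward for N5's Rd reaches N2 before N2's data has arrived from N3 (an early request).
4. N3 sends the Reply to N2 (N3/I), and N2 enters M.
5. N2 then services the held Forward, replies to N5, and moves from M to O.
Final state: N2/O, N5/S.
Deadlock? No. H/D sent N2's Marker on Q1 before the Forward, so N2 has already seen its Marker and knows the Forward belongs to a later request. N2 holds the Forward until its own data arrives instead of NAKing it.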
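The "hold, don't NAK" rule above can be sketched in Python. The class and method names are hypothetical. The sketch assumes a Forward that arrives after the node's own Marker, but before its data, is queued and then serviced once the data arrives.

```python
from collections import deque


class Node:
    """Hypothetical requester that holds Forwards arriving before its data."""

    def __init__(self):
        self.state = {}            # line -> "M" or "O"
        self.data = {}             # line -> value
        self.pending = set()       # lines whose Marker arrived but data has not
        self.deferred = deque()    # early Forwards: (line, requester)

    def on_marker(self, line):
        # Our Marker precedes, on Q1, any Forward the home sends us later,
        # so it tells us we are the owner-to-be of this line.
        self.pending.add(line)

    def on_forward(self, line, requester):
        if line in self.pending:
            self.deferred.append((line, requester))  # early request: hold it
            return []
        return [self._serve_read(line, requester)]

    def on_data(self, line, value):
        self.pending.discard(line)
        self.state[line], self.data[line] = "M", value
        ready = [f for f in self.deferred if f[0] == line]
        for fwd in ready:
            self.deferred.remove(fwd)
        return [self._serve_read(ln, req) for ln, req in ready]

    def _serve_read(self, line, requester):
        self.state[line] = "O"     # dirty sharing after a forwarded Rd
        return ("Reply", line, self.data[line], requester)
```

Replaying the slide: N2 calls `on_marker("A")`, then `on_forward("A", "N5")` returns nothing, then `on_data("A", 7)` returns the deferred Reply to N5 and leaves the line in O.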

17 Crossbar Network
– Q0: requests to the home node (point-to-point order)
– Q1: forwards, replies and invalidations from the home node (global order)
– Q2: data replies from the owner to the requester
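The three queues act as separate virtual lanes. This Python sketch maps message types to lanes. The message names are hypothetical, and so is the service policy of always draining the highest-numbered non-empty lane first. That policy is a common way to keep replies from getting stuck behind requests; the slide does not give GS320's exact arbitration rule.

```python
from collections import deque
from enum import IntEnum


class Lane(IntEnum):
    Q0 = 0  # requests to the home node (point-to-point order)
    Q1 = 1  # forwards, replies, invalidations from the home (global order)
    Q2 = 2  # data replies from the owner to the requester


# Hypothetical message names, for illustration only.
LANE_OF = {
    "Rd": Lane.Q0, "RE": Lane.Q0, "Ex": Lane.Q0,
    "Forward": Lane.Q1, "Invalidate": Lane.Q1, "HomeReply": Lane.Q1,
    "OwnerReply": Lane.Q2,
}


def next_to_service(queues):
    """Assumed policy: serve the highest lane first, so a message that can
    always be consumed is never blocked behind one that generates new work."""
    for lane in sorted(Lane, reverse=True):
        if queues[lane]:
            return queues[lane].popleft()
    return None
```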

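18 Total Ordering on Q1!!
Initial state: P1 owns A and P2 owns B; HA and HB are the home nodes of A and B; each processor services its Q1 inbound queue in order.
1. P1 issues RE1(B) and P2 issues RE2(A). The homes make P2 the owner of A and P1 the owner of B, forwarding RE2(A) to P1 and RE1(B) to P2.
2. X issues RE4(A) and Y issues RE3(B). The homes forward RE4(A) to the new owner P2 and RE3(B) to the new owner P1; finally A (X), B (Y).
Without total ordering, P1 may receive RE3(B) before RE2(A) while P2 receives RE4(A) before RE1(B). Each processor then waits for a line that the other can only send after servicing the forward stuck behind: DEADLOCK.
With total ordering on Q1, this cannot happen. HA orders RE2 before RE4 and HB orders RE1 before RE3, so the bad interleaving would need the cycle RE3 < RE2 < RE4 < RE1 < RE3. P1 therefore sees RE2(A) before RE3(B), P2 sees RE1(B) before RE4(A), and both complete.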
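The deadlock argument above can be checked with a small Python simulation. The model is an assumption. A processor may service only the Forward at the head of its inbound Q1 queue, and only once it holds the line. Data then moves to the new owner immediately, because replies are assumed to travel on an unblocked lane.

```python
from collections import deque


def drain(inbound, holds):
    """Service each node's inbound Q1 Forwards strictly in FIFO order.

    inbound: node -> deque of (line, new_owner) Forwards
    holds:   node -> set of lines the node currently owns
    Returns True if every Forward completes, False on deadlock.
    """
    progress = True
    while progress:
        progress = False
        for node, queue in inbound.items():
            if queue and queue[0][0] in holds.get(node, set()):
                line, new_owner = queue.popleft()
                holds[node].discard(line)
                holds.setdefault(new_owner, set()).add(line)
                progress = True
    return all(not queue for queue in inbound.values())
```

With the totally ordered arrivals (P1 gets RE2(A) then RE3(B); P2 gets RE1(B) then RE4(A)), `drain` returns True and ends with X holding A and Y holding B. Swapping each queue's order reproduces the slide's deadlock, and `drain` returns False.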

19 Desirable Characteristics
– Dirty sharing: efficient for migratory accesses
– All directory state changes take effect immediately; each request needs just a single access to the home node and its directory
– Eliminates livelock, starvation, and fairness problems
– Writes can start as soon as the Exclusive request is issued

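20 Alpha Consistency Model
– MB: memory barrier
– Apart from same-address dependencies, the LOADs and STOREs between two MBs may be performed in any order; an MB forces every operation before it in program order to be performed before any operation after it
– Write atomicity is not violated: a processor cannot read another processor's write early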
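The MB rule above can be expressed as a checker. This Python sketch enforces only the barrier constraint and ignores same-address dependencies. The function name and the operation encoding (strings such as `"LD a"`) are hypothetical. Operations are grouped into epochs separated by MBs, and a perform order is legal when the epochs never decrease.

```python
def respects_mb(program, perform_order):
    """Check a perform order against Alpha MB semantics only.

    program: operations in program order, e.g. ["LD a", "ST b", "MB", "LD c"].
    perform_order: indices of the non-MB operations in the order they are
    performed. Same-address dependencies are ignored in this sketch.
    """
    epoch, current = {}, 0
    for index, op in enumerate(program):
        if op == "MB":
            current += 1            # operations after the MB start a new epoch
        else:
            epoch[index] = current
    if sorted(perform_order) != sorted(epoch):
        raise ValueError("perform_order must list every non-MB operation once")
    epochs = [epoch[i] for i in perform_order]
    # Legal iff epochs never decrease along the perform order.
    return all(a <= b for a, b in zip(epochs, epochs[1:]))
```

For `["LD a", "ST b", "MB", "LD c", "ST d"]`, the order `[1, 0, 4, 3]` is allowed because it reorders only within each epoch. The order `[0, 3, 1, 4]` is rejected because it lets `LD c` cross the barrier ahead of `ST b`.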

21 Consistency Model Implementation
Barrier performance (commit events)
– Early acknowledgment of invalidations
– Early acknowledgment of forwards (of Exclusive, Read-Exclusive and Read requests)
Overall performance
– Relax the total-order condition on Q1, relying on commit points: let replies (moved from Q1 to Q2) bypass forwards and invalidations in Q1

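22 Early Acknowledgement of Invalidation Request
Example (initially A = 0, B = 0; P2 caches A = 0):
P1: A = 1; MB; B = 1;    P2: u = B; MB; v = A;
As under SC, the outcome u = 1, v = 0 must be forbidden.
– P1 issues EX for A. The home places INVAL A (to P2) and P1's commit into Q1. P1's MB waits only for the commit, not for P2's invalidation acknowledgment, so B = 1 can be performed while INVAL A is still queued at P2.
– P2's Rd of B is ordered after P1's write of B, so its Marker (commit) enters P2's Q1 inbound queue behind INVAL A, and P2 gets u = 1.
– P2's MB waits for that commit. Because commits do not bypass invalidates, INVAL A has already been applied, so v = A misses and returns 1.
1. Optimize the memory barrier at P1 for write-to-write and write-to-read ordering
2. Use commit events in the Q1 queue for ordering purposes in case of replies
3. Sufficient condition: commit events must not bypass invalidates
4. The memory barrier at P2 waits for all commits before going ahead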
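The sufficient condition above can be illustrated with a Python model of P2 alone. The model is hypothetical. P2 dequeues messages from its inbound Q1 in order, and its MB completes as soon as the reply/commit for B is dequeued. Anything still queued behind that commit has not been applied when `v = A` executes.

```python
def run_p2(q1_arrivals, memory):
    """Hypothetical model of P2 running `u = B; MB; v = A`.

    q1_arrivals: messages in the order P2 dequeues them from its inbound Q1,
    e.g. ("INV", "A") or ("REPLY", "B", value). P2 starts with a stale
    shared copy A = 0. The MB completes once the reply/commit for B is
    dequeued; anything still behind it in Q1 has not been applied yet.
    """
    cache = {"A": 0}
    u = None
    for kind, line, *value in q1_arrivals:
        if kind == "INV":
            cache.pop(line, None)
        elif kind == "REPLY" and line == "B":
            u = value[0]
            break            # commit for the read of B seen: MB may complete
    # After the MB, v = A hits in the cache or misses and reads memory.
    v = cache["A"] if "A" in cache else memory["A"]
    return u, v
```

When the commit stays behind the invalidate, `run_p2([("INV", "A"), ("REPLY", "B", 1)], {"A": 1, "B": 1})` gives `(1, 1)`. If the commit were allowed to bypass it, `run_p2([("REPLY", "B", 1), ("INV", "A")], ...)` gives the forbidden `(1, 0)`.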

23 Commit Points
Figure: location of the commit point relative to the 8x8 global crossbar switch and the QBB's DTAG, DIR and TTT

24 Early Acknowledge of Forwards
Same example: P1: A = 1; MB; B = 1;    P2: u = B; MB; v = A;    (u = 1, v = 0 must be forbidden)
– P2's Read B is forwarded to the owner P1. The commit for the read (Commit/Rd B) is sent to P2 through Q1, and the Forward is acknowledged early (Fwd Ack), before the data reply arrives.
– If the commit for Read B could bypass INVAL A (Commit/INV A bypass), P2's MB could complete while P2 still holds A = 0, giving u = 1, v = 0.
Sufficient condition: commit events must not bypass invalidates, reads and read-exclusive forwards

25 Optimization Summary
Dirty sharing
– Reduces home node traffic
No negative acknowledgements
– Reduces network traffic (home node)
– Simple implementation of the directory
– Removes livelock, starvation, and fairness problems
– Network total ordering avoids deadlocks
– Write optimization
Bypass of replies in the Q1 queue
– Improves overall performance
Improved barrier performance
– Early invalidation acknowledgements
– Early forward responses (Rd, RE, Ex)
– Memory ordering based on commit events

26 Performance

27 DOUBTS?




