Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors. Karin Strauss, AMD Advanced Architecture and Technology Lab.



Presentation transcript:

1 Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors. Karin Strauss (AMD Advanced Architecture and Technology Lab), Xiaowei Shen (IBM Research), Josep Torrellas (University of Illinois at Urbana-Champaign). http://iacoma.cs.uiuc.edu

2 Karin Strauss - “Uncorq” Motivation: CMPs are ubiquitous. Shared memory + caches = cache coherence. Traditional cache coherence solutions: directory-based (indirection, storage) and shared bus-based (electrical, layout issues).

3 Embedded-ring cache coherence [ISCA 2006]: Novel snoopy cache coherence for mid-sized machines. A logical ring is embedded in the network; control messages use the ring, and data messages use any path. Simple and inexpensive to implement. However, snoop requests can have long latencies.

4 Contributions: Propose an invariant for transaction serialization. Propose performance enhancements: Uncorq, unconstrained snoop request delivery (reduces cache-to-cache transfer latency), and a simple hardware data prefetching technique (reduces memory-to-cache transfer latency).

5 Embedded-ring terminology: Snoopy, invalidate protocol. Single supplier protocol. Types of messages: control messages (snoop request, snoop response, combined snoop request + response) and data messages. A snoop operation produces an outcome; a positive snoop outcome (+) yields a positive response. [Diagram: nodes A and B on the logical ring, with request, response, and data messages.]

6 Ordering invariant

7 Transaction serialization. [Diagram: timeline for nodes A and B showing inv, ack, read, and data messages, cache-state transitions among M, S, and I, and the old value versus the new value.]

8 Serialization enforcement with embedded-ring: The logical unidirectional ring provides partial ordering. A distributed algorithm establishes a global order for same-address transactions. On simultaneous transactions to the same address, one is declared the “winner” (first to reach the supplier) and the others have to retry.

9 How to serialize transactions: Nodes A and B each issue a request and receive a response, and S is the supplier; by themselves the two see no clear “first” transaction. B’s request reaches S first. A receives B’s positive response (+) before its own, so A retries: B → A. The ring guarantees that responses are forwarded in the order in which S performed the snoop operations.

10 Enforcing transaction serialization: The node whose request arrives at the supplier node first is the “winner”. What we need to enforce transaction serialization: the loser node sees the other node’s positive response before its own. Ordering invariant: the order in which responses travel the ring after leaving the supplier must be the same as the order in which the supplier processed their corresponding requests.
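
The winner/loser rule on this slide can be sketched in a few lines of Python. This is a deliberately simplified model, not the authors' implementation: it assumes every requester observes all same-address responses in the order they travel the ring, and the names `supplier_order` and `classify` are hypothetical.

```python
def supplier_order(arrival_times):
    """Return requesters in the order the supplier processed their requests.

    arrival_times maps a requester name to the (hypothetical) time at which
    its request reached the supplier; earlier arrivals are processed first.
    """
    return sorted(arrival_times, key=arrival_times.get)


def classify(response_order):
    """Decide winner/retry from the order in which responses travel the ring.

    Assumption: every requester sees all same-address responses in ring
    order, so a node retries if another node's response precedes its own.
    """
    outcome = {}
    for position, node in enumerate(response_order):
        outcome[node] = "winner" if position == 0 else "retry"
    return outcome


# B's request reaches the supplier before A's.
processed = supplier_order({"A": 12, "B": 7})

# Invariant holds: responses leave the supplier in processing order.
ok = classify(processed)

# Invariant violated: responses are reordered on the ring, so the nodes
# pick a winner that disagrees with the supplier's processing order.
broken = classify(list(reversed(processed)))
```

When the invariant holds, the nodes' local decisions agree with the supplier's processing order; when responses are reordered, A would wrongly consider itself the winner.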

11 Uncorq: Unconstrained snoop request delivery

12 Uncorq idea: Requests do not have to follow the ring (but responses do). [Diagram: request and response paths in Baseline versus Uncorq.]

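13 Benefit of Uncorq: Reduced cache-to-cache transfer latency. [Diagram: timelines for Baseline and Uncorq showing the request, snoop, and data phases; the point at which the request reaches the supplier node is marked, and the difference between the two timelines is labeled as savings.]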
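
To get a feel for where the savings come from, the sketch below compares hop counts for a request that follows the ring with one that takes a shortest path, using the 64-node 2D torus from the experimental setup. The row-by-row "snake" embedding of the ring is an assumption made purely for illustration (the slides do not say how the ring is embedded), `snake_ring`, `ring_hops`, and `torus_hops` are hypothetical helper names, and hop count is only a rough proxy for latency.

```python
def snake_ring(rows, cols):
    """Embed a ring in a rows x cols torus by snaking through the rows.

    Assumed embedding: even rows run left to right, odd rows right to left.
    With an even number of rows the last node wraps back to (0, 0) over a
    torus wraparound link, so consecutive ring nodes are always neighbors.
    """
    ring = []
    for r in range(rows):
        columns = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        ring.extend((r, c) for c in columns)
    return ring


def ring_hops(ring, src, dst):
    """Hops from src to dst when the message must follow the one-way ring."""
    return (ring.index(dst) - ring.index(src)) % len(ring)


def torus_hops(rows, cols, src, dst):
    """Hops from src to dst along a shortest path in the torus."""
    dr = abs(src[0] - dst[0])
    dc = abs(src[1] - dst[1])
    return min(dr, rows - dr) + min(dc, cols - dc)


ring = snake_ring(8, 8)
baseline = ring_hops(ring, (0, 0), (7, 0))    # request follows the ring
uncorq = torus_hops(8, 8, (0, 0), (7, 0))     # request takes any path
```

The pair of nodes chosen here is a worst case for this particular embedding: the supplier is the requester's torus neighbor but sits at the far end of the ring.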

14 Implications of Uncorq: Uncorq no longer restricts the order of requests. Nodes may receive and process requests in any order. Responses may also get reordered. Problem: the distributed algorithm relies on the fact that response order reflects the order of requests at the supplier.

15 Example: incorrect transaction ordering. [Diagram: requests and responses among nodes A, B, and supplier S, with positive outcomes marked +.] Ordering invariant: a node cannot forward any other response if it has an outstanding positive snoop outcome.

16 How Uncorq stalls responses: Local transaction table (per-node structure) that records the messages the node is currently processing. [Diagram: a table with addr, requests, and responses fields, labels A, B, …, C, and positive outcomes marked +.]
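
A minimal sketch of this stalling rule follows. It is an illustrative model rather than the hardware design: the `Node` class, its field names, and the rule that a node stops being the supplier after answering one request are all assumptions, and responses are reduced to `(requester, addr, positive)` tuples.

```python
from collections import deque


class Node:
    """One ring node with a per-address local transaction table (sketch)."""

    def __init__(self, supplied_addrs):
        self.supplied = set(supplied_addrs)  # addresses this node can supply
        self.pending_positive = {}           # addr -> requester awaiting its response
        self.stalled = {}                    # addr -> deque of held-back responses
        self.forwarded = []                  # responses passed to the next ring node

    def on_request(self, requester, addr):
        """Snoop a request, which may arrive over any network path."""
        if addr in self.supplied:
            # Positive snoop outcome: remember it until the matching
            # response arrives on the ring. Assumption: the node stops
            # being the supplier once it has answered one request.
            self.supplied.discard(addr)
            self.pending_positive[addr] = requester

    def on_response(self, requester, addr, positive=False):
        """Handle a response arriving on the ring."""
        pending = self.pending_positive.get(addr)
        if pending is not None and pending != requester:
            # Outstanding positive outcome for another transaction:
            # stall this response instead of forwarding it.
            self.stalled.setdefault(addr, deque()).append((requester, positive))
            return
        if pending == requester:
            positive = True  # merge this node's positive outcome
            del self.pending_positive[addr]
        self.forwarded.append((requester, addr, positive))
        # Release responses that were waiting behind the positive one.
        for held_requester, held_positive in self.stalled.pop(addr, ()):
            self.on_response(held_requester, addr, held_positive)


s = Node({0x40})
s.on_request("B", 0x40)    # B's request reaches the supplier first
s.on_request("A", 0x40)    # A's request arrives second
s.on_response("A", 0x40)   # A's response overtakes B's: stalled
s.on_response("B", 0x40)   # B's response arrives: forwarded, then A's
```

After this sequence `s.forwarded` lists B's positive response ahead of A's response, which restores the order in which the supplier processed the requests.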

17 Optimization: prefetching from memory. Goal: reduce the latency of memory-to-cache transfers. Predict when no node will supply the data, and access memory in parallel with the ring snoop. [Diagram: unoptimized, requester R performs the ring snoop (1) and then the memory access (2); optimized, R performs the memory access together with the ring snoop (1).]
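
The effect on latency can be illustrated with a back-of-the-envelope model. It assumes the prediction is correct, that the data can be used only once both the ring snoop and the memory access have finished, and that the two overlap fully when prefetching; the function name and the cycle counts are hypothetical, not measurements from the talk.

```python
def memory_to_cache_latency(ring_snoop, memory_access, prefetch):
    """Latency of a miss that no cache can supply, in arbitrary cycles."""
    if prefetch:
        # Memory is accessed in parallel with the ring snoop.
        return max(ring_snoop, memory_access)
    # Memory is accessed only after the ring snoop completes.
    return ring_snoop + memory_access


unoptimized = memory_to_cache_latency(200, 150, prefetch=False)  # 350
optimized = memory_to_cache_latency(200, 150, prefetch=True)     # 200
```

A misprediction is not modeled here; in that case a cache supplies the data and the memory access is wasted work.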

18 Evaluation

19 Experimental setup: 64 nodes in a single CMP. Interconnection network: 2D torus with embedded ring. SESC simulator (sesc.sourceforge.net). SPLASH-2, SPECjbb, and SPECweb workloads.

20 Cache-to-cache transfer latency: substantial reduction in latency. [Figure: two panels, Baseline and Uncorq; x-axis: cache-to-cache transfer latency (0 to 600); left y-axis: distribution (%), 0 to 10; right y-axis: cumulative distribution (%), 0 to 100.]

21 Execution time: Uncorq significantly reduces execution time (reduction: 5-23%). Uncorq+Pref performs the best (reduction: 13-26%). [Figure: normalized execution time (0 to 1) for SPLASH-2, SPECjbb, and SPECweb under Baseline, Uncorq, and Uncorq+Pref.]

22 Also in the paper: serialization mechanism for the case with no supplier; system and node forward progress; fences and memory consistency issues; characterization of the prefetching mechanism; comparison against ccHyperTransport.

23 Conclusion: Propose an invariant for transaction serialization. Propose performance enhancements: Uncorq (unconstrained snoop request delivery) and a simple hardware data prefetching technique. Together they reduce execution time by 13-26%.

24 Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors. Karin Strauss (AMD Advanced Architecture and Technology Lab), Xiaowei Shen (IBM Research), Josep Torrellas (University of Illinois at Urbana-Champaign). http://iacoma.cs.uiuc.edu

