An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing. Liqun Cheng, John B. Carter and Donglai Dai, cs.utah.edu. Presented by Evangelos Vlachos.


1 An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing
Liqun Cheng, John B. Carter and Donglai Dai, cs.utah.edu
by Evangelos Vlachos

2 Introduction
Shared Memory multiprocessors
– Enterprise servers, Top500 supercomputers
Shared Memory paradigm
– Producer-consumer relationships
– Suffer from remote misses
Solutions
– high performance interconnects
– sophisticated latency hiding techniques
– effective caching/coherence mechanisms

3 Motivation
Update vs. invalidate protocols
– too much coherence traffic
Adaptive protocols to optimize for migratory sharing
– Identify dynamic sharing during execution
– Adapt the coherence protocol

4 The Problem – 3-hop latency

5 Basic Idea (1/2)
Directory delegation
– Identify shared blocks
– Producer node becomes the home node
– Consumers send requests directly to the producer
Decrease latency
– Each read/write access completes after 2 hops
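
The 3-hop versus 2-hop claim can be checked with a toy hop counter. This is only a sketch under a simplified model that is not taken from the slides: one hop per message, and a read miss travels requester to home, home to the current owner, and owner back to the requester. The function name `remote_read_hops` and the node labels are hypothetical.

```python
def remote_read_hops(requester, home, owner):
    """Count network hops for a read miss in a simplified directory model.

    Assumed message path (illustrative, not the authors' exact protocol):
    requester -> home -> owner -> requester, skipping any leg whose two
    endpoints are the same node.
    """
    if requester == owner:
        return 0  # data is already local, no network traffic
    hops = 0
    if requester != home:
        hops += 1  # request travels to the home node
    if home != owner:
        hops += 1  # home forwards an intervention to the owner
    hops += 1      # owner sends the data reply to the requester
    return hops


# Without delegation: consumer C, home H, producer P are three distinct nodes.
before = remote_read_hops("C", "H", "P")
# With delegation: the producer P also acts as the home node for the block.
after = remote_read_hops("C", "P", "P")
```

With three distinct nodes the model gives 3 hops; once the producer doubles as the home node, the same miss needs 2.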

6 Basic Idea (2/2)
The producer node can identify sharers
– Sharer nodes stored in the directory
– Speculate that new data will be requested
– Forward new data to sharers
Similar to…
– prefetching
– last write prediction

7 Architecture

8 RAC == Remote Access Cache
In the past
– Eliminate remote misses caused by small & low-associativity caches
– Not a problem today
In this work
– A location to push data at a remote node
– Location to store delegated blocks
– Victim cache (for remote misses) as before

9 Sharing Pattern Detection
Track access history only for frequently used blocks
– Directory entries reside in the directory cache
Keep saturating counters per directory entry
– last_writer id: 4 bits
– reader_count: 2 bits
– write_repeat: 2 bits
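
A minimal sketch of the per-entry history follows. The field names and bit widths come from the slide. Everything else is an assumption made for illustration: the update rules in `on_read` and `on_write`, the helper `saturating_inc`, and the detection threshold in `looks_producer_consumer`. The slides do not spell these out.

```python
from dataclasses import dataclass
from typing import Optional


def saturating_inc(value: int, bits: int) -> int:
    """Increment a counter but clamp it at the largest value `bits` can hold."""
    return min(value + 1, (1 << bits) - 1)


@dataclass
class DirEntryHistory:
    last_writer: Optional[int] = None  # 4-bit node id (None until first write)
    reader_count: int = 0              # 2-bit saturating counter
    write_repeat: int = 0              # 2-bit saturating counter

    def on_read(self, node: int) -> None:
        # Assumed rule: count reads from nodes other than the last writer.
        if node != self.last_writer:
            self.reader_count = saturating_inc(self.reader_count, 2)

    def on_write(self, node: int) -> None:
        # Assumed rule: the same writer writing again after at least one
        # remote read looks like another producer-consumer round.
        if node == self.last_writer and self.reader_count > 0:
            self.write_repeat = saturating_inc(self.write_repeat, 2)
        elif node != self.last_writer:
            self.write_repeat = 0
            self.last_writer = node & 0xF  # keep only 4 bits of node id
        self.reader_count = 0

    def looks_producer_consumer(self) -> bool:
        # Assumed threshold: the 2-bit write_repeat counter has saturated.
        return self.write_repeat == 3


history = DirEntryHistory()
history.on_write(3)          # node 3 produces
for _ in range(5):
    history.on_read(1)       # node 1 consumes
    history.on_write(3)      # node 3 produces again
```

In this sketch the pattern is flagged once the same node has written repeatedly with reads in between, and a write from a different node resets the history.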

10 Producer/Consumer Tables
Maintain state for blocks that don’t reside at home node
– Producer table: current node serves as a producer for some cache blocks
– Consumer table: current node is interested in some blocks found in a corresponding producer node
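
One way to picture how the two tables steer a request is the lookup below. It is a sketch only: `route_request`, the dictionary-based tables, and the `home_of` mapping are hypothetical names, and the real tables are hardware structures whose entry formats are not given on the slide.

```python
def route_request(block, node, home_of, producer_table, consumer_table):
    """Return the node that should service `node`'s request for `block`.

    producer_table: blocks for which `node` is the delegated home (assumed
                    here to be a dict from block to directory state).
    consumer_table: dict from block to the producer node holding the
                    delegated directory entry.
    home_of:        function giving the original home node of a block.
    """
    if block in producer_table:
        return node                    # this node is the delegated home
    if block in consumer_table:
        return consumer_table[block]   # go straight to the producer
    return home_of(block)              # not delegated: use the original home


# Hypothetical 4-node system where the home is chosen by block address.
home_of = lambda block: block % 4
producer_table = {0x40: {"sharers": {2}}}
consumer_table = {0x81: 3}

local = route_request(0x40, 1, home_of, producer_table, consumer_table)
direct = route_request(0x81, 1, home_of, producer_table, consumer_table)
default = route_request(0x22, 1, home_of, producer_table, consumer_table)
```

The consumer-table hit is the case that saves a hop, since the request skips the original home node.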

11 Delegate - Undelegate

12 One step further – Speculative updates
Eliminate remote misses
– Maintain sharers list after invalidation
– Forward new data to sharers
– Downgrade local state to SHARED
Need to choose carefully what data to forward
– Don’t want to change cpu core
– Delayed Intervention
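
The three steps listed for speculative updates can be sketched as a small state machine at the producer. This is an illustration under assumptions: the class `DelegatedBlock`, its method names, and the two-state model are hypothetical, and the timing of the push (the delayed intervention) is reduced to an explicit `push_updates` call.

```python
class DelegatedBlock:
    """Toy model of one delegated block at the producer node."""

    def __init__(self):
        self.state = "SHARED"
        self.sharers = set()        # nodes currently holding a valid copy
        self.last_sharers = set()   # sharers remembered across invalidation

    def read(self, consumer):
        self.sharers.add(consumer)

    def write(self):
        """Producer writes: invalidate sharers but remember who they were."""
        self.last_sharers = set(self.sharers)
        invalidated = sorted(self.sharers)
        self.sharers.clear()
        self.state = "EXCLUSIVE"
        return invalidated

    def push_updates(self):
        """Assumed to run once the delayed intervention fires: forward the
        new data to the remembered sharers and downgrade to SHARED."""
        targets = sorted(self.last_sharers)
        self.sharers = set(self.last_sharers)
        self.state = "SHARED"
        return targets


block = DelegatedBlock()
block.read(2)
block.read(5)
invalidated = block.write()      # nodes 2 and 5 lose their copies
pushed = block.push_updates()    # the same nodes receive the new data
```

Because the consumers receive the data before they ask for it, their next read hits locally (in the RAC) instead of missing remotely.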

13 Delayed Intervention

14 Evaluation

15 Benchmarks

16 Results

17 Results

18 Results

19 Conclusions
Adapting mechanisms to improve producer-consumer relationships
Eliminate remote misses
Directory delegation & speculative updates
Minor hardware cost
– 32-entry delegate cache & 32KB RAC: exec time ↓13%, remote misses ↓29%, network traffic ↓17%
– 1K-entry delegate cache & 1MB RAC: exec time ↓21%, remote misses ↓40%, network traffic ↓15%

