Coherence Decoupling: Making Use of Incoherence J. Huh, J. Chang, D. Burger, G. Sohi ASPLOS 2004.


1 Coherence Decoupling: Making Use of Incoherence J. Huh, J. Chang, D. Burger, G. Sohi ASPLOS 2004

2 Motivation: Multi-threading and multi-processing have become common. When a cache line is marked invalid, very often not all of the data in the line is incorrect. If the data in invalid lines can be used speculatively, there is great potential for performance improvement.

3 Background: Cache Coherence Protocol. Used in shared-memory multiprocessors to manage correct data sharing. Vital to the design of multiprocessors, since it contributes the most to inter-processor communication latency.

4 Proposed Idea: Separate the traditional cache coherence protocol into two parts. –Speculative cache lookup (SCL): uses a speculative value from an invalid cache line, allowing the processor to keep working. –Safe coherence protocol: obtains the correct value, which is then compared with the value provided by SCL.
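The split described on this slide can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not the authors' hardware design: `CacheLine`, `decoupled_load`, and the `fetch_coherent_line` callback are hypothetical names, the coherence state is reduced to a single valid bit, and the safe protocol is modeled as a function call that returns the up-to-date line.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    data: list   # words held in the line (possibly stale)
    valid: bool  # coherence state, reduced to valid/invalid for this sketch


def decoupled_load(line, word_index, fetch_coherent_line):
    """Return (correct_value, misspeculated) for a load of one word.

    fetch_coherent_line is a hypothetical stand-in for the safe coherence
    protocol: it returns the up-to-date contents of the line.
    """
    if line.valid:
        # Ordinary hit: nothing to speculate on.
        return line.data[word_index], False

    # Speculative cache lookup: use the stale word immediately.
    speculative = line.data[word_index]

    # Safe coherence protocol: obtain the correct line (the slow path).
    correct_line = fetch_coherent_line()
    correct = correct_line[word_index]

    # Install the coherent data and compare against the speculation.
    line.data = list(correct_line)
    line.valid = True
    return correct, speculative != correct
```

In real hardware the two paths overlap in time; here they run one after the other, so the sketch shows only the compare step, not the latency benefit.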

5 Coherence Decoupling

6 Related Work: Customized coherence protocols. Speculative coherence operations. Dynamic self-invalidation, coherence message predictor, token coherence, etc. Speculation on the outcome of events in multiprocessor execution.

7 Coherence Decoupling Architecture: Must support the following: 1. Split: a means to split a memory operation into a speculative load and a coherence operation. 2. Compute: mechanisms to support execution with speculative values. 3. Recover: a means to recover and roll back upon misprediction.
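The three requirements can be mimicked in software to show how they fit together. The sketch below is an assumption-laden toy, not the mechanism from the paper: registers are a plain dictionary, the checkpoint is a full copy, and `run_with_decoupling`, `get_correct_value`, and `dependent_op` are hypothetical names introduced for illustration.

```python
def run_with_decoupling(registers, speculative_value, get_correct_value, dependent_op):
    """Execute dependent_op with a speculative load value; roll back if wrong.

    Returns True if a rollback was needed, False otherwise.
    """
    # Recover support: snapshot the state before speculating.
    checkpoint = dict(registers)

    # Split: the speculative load supplies a value right away...
    registers["r1"] = speculative_value
    # Compute: ...and dependent work proceeds on that value.
    dependent_op(registers)

    # Split, second half: the coherence operation delivers the correct value.
    correct = get_correct_value()
    if correct == speculative_value:
        return False  # speculation verified; keep the results

    # Recover: restore the checkpoint and re-execute with the correct value.
    registers.clear()
    registers.update(checkpoint)
    registers["r1"] = correct
    dependent_op(registers)
    return True
```

A processor would reuse whatever rollback support it already has for other forms of speculation; the dictionary copy here only stands in for that machinery.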

8 SCL Protocols for Coherence Decoupling: Use a simple safe coherence protocol and rely on an aggressive SCL protocol to increase performance. Two components of an SCL protocol: –Read component: obtains the speculative value. –Update component: updates an invalid cache line so that subsequent speculative reads can use it (can be left out in some SCL protocols).

9 Read vs Update components: An SCL protocol with only a read component can be used if the word in an invalid block has: –Not changed remotely (false sharing). –Changed remotely to the same value (silent stores). –Changed remotely to a different value and then back to the original value (temporally silent stores). For truly shared data, an update component needs to be added: –Speculatively sends data around the system by writing it into invalid cache lines.
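The three cases in which a stale word is still safe to read can be told apart from the sequence of remote writes to the line. The following Python sketch is a hypothetical classifier written for illustration (the function name, the `(word_index, value)` write format, and the returned labels are assumptions); real hardware has no such global view and only learns the outcome when the correct value arrives.

```python
def classify_stale_word(stale_line, word_index, remote_writes):
    """Classify why a word in an invalidated line is (or is not) still correct.

    remote_writes is an ordered list of (word_index, value) pairs describing
    writes made by other processors after the local copy was invalidated.
    """
    stale = stale_line[word_index]
    # Values written remotely to the word we care about, in order.
    values = [value for (index, value) in remote_writes if index == word_index]

    if not values:
        return "false sharing"            # other words changed, not this one
    if all(value == stale for value in values):
        return "silent store"             # rewritten with the same value
    if values[-1] == stale:
        return "temporally silent store"  # changed, then changed back
    return "true sharing"                 # stale value is wrong; needs update
```

Only the last case produces a mis-speculation for a read-only SCL protocol, which is why an update component is needed for truly shared data.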

10 SCL protocol Read component: CD - Use the locally cached incoherent value for every L2 miss. Simple, but since it is triggered on every load operation it could produce many mis-speculations. CD-F - Add a PC-indexed confidence predictor to filter speculations. Reduces the number of (mis)speculative reads, thus improving the average accuracy.
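A PC-indexed confidence predictor of the kind CD-F adds is commonly built from a table of saturating counters. The sketch below assumes that structure; the table size, counter range, threshold, and reset-on-miss policy are illustrative choices, not parameters taken from the slides, and `ConfidenceFilter` is a hypothetical name.

```python
class ConfidenceFilter:
    """Table of saturating counters indexed by the load's program counter."""

    def __init__(self, entries=1024, max_count=3, threshold=2):
        # All three parameters are assumptions chosen for illustration.
        self.entries = entries
        self.max_count = max_count
        self.threshold = threshold
        self.table = [0] * entries

    def _index(self, pc):
        return pc % self.entries

    def should_speculate(self, pc):
        """Speculate only when this load has been right often enough."""
        return self.table[self._index(pc)] >= self.threshold

    def train(self, pc, was_correct):
        """Update confidence once the coherent value is known."""
        i = self._index(pc)
        if was_correct:
            self.table[i] = min(self.max_count, self.table[i] + 1)
        else:
            self.table[i] = 0  # reset on a wrong stale value
```

Because the stale value can be compared with the coherent one even when no speculation was issued, `train` can be called on every coherence miss, which lets a load regain confidence after a reset.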

11 SCL protocol Update component: CD-IA - Use invalidation piggyback to update all invalid blocks. CD-C - Use invalidation piggyback if the value is compressed.

12 SCL protocol Update component (Ctd.): CD-N - Update all sharers after N writes to a block. Increases the number of messages (bandwidth). CD-W - Update on every write if any sharers exist. The CD read component is assumed wherever a write update is used.
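The CD-N policy amounts to a per-block write counter at the writer. The Python sketch below shows one way such a counter could work; it is an assumption-based illustration, with `WriteCounterUpdater`, the `send_update` callback, and the choice to reset the counter after each update all being hypothetical. With `n=1` it updates on every write to a shared block, which mirrors the CD-W description.

```python
class WriteCounterUpdater:
    """Send speculative updates to sharers after every n writes to a block."""

    def __init__(self, n, send_update):
        self.n = n
        self.send_update = send_update  # hypothetical: (sharer, addr, data)
        self.counts = {}                # writes per block since last update

    def on_write(self, block_addr, data, sharers):
        """Record a write; return True if update messages were sent."""
        if not sharers:
            return False  # nobody holds an invalid copy worth refreshing

        count = self.counts.get(block_addr, 0) + 1
        if count < self.n:
            self.counts[block_addr] = count
            return False

        # Threshold reached: push the latest data into the invalid copies.
        self.counts[block_addr] = 0
        for sharer in sharers:
            self.send_update(sharer, block_addr, data)
        return True
```

Each update costs one message per sharer, which is the bandwidth increase the slide mentions; a larger `n` trades fresher speculative data for fewer messages.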

13 Methodology: Simulator: MP-Sauce & SimpleScalar. 16-node SMP systems simulated. Coherence protocol used: simple invalidation snooping-bus protocol. 3 commercial applications and 5 scientific shared-memory benchmarks from the SPLASH2 suite simulated.

14 Results - Microbenchmarks: Simple-fs: loads falsely shared data and then executes (in)dependent instructions. Critical-fs: forces a data dependence between two loads by placing consecutive false-sharing misses on the critical path.

15 L2 Miss Profiling Results

16 Coherence Decoupling Accuracy Results: CD, CD-F, CD-IA, CD-C, CD-N, CD-W

17 Timing Results

18 Bandwidth Requirements

19 Latency Tolerance Profiles: Executed instructions during coherence decoupling. The number of control-dependent instructions will grow in future processors.

20 Conclusions: Coherence misses are a significant fraction of L2 misses, ranging from 10% to 80%. Coherence decoupling has the potential to hide the miss latency for 40% to 90% of coherence misses. Mis-speculation occurs 20% of the time.

