Coherence Decoupling: Making Use of Incoherence J. Huh, J. Chang, D. Burger, G. Sohi ASPLOS 2004.

Motivation
– Multi-threading and multi-processing have become common.
– When a cache line is marked invalid, very often not all of the data in the line is incorrect.
– If the data in invalid lines can be used speculatively, there is great potential for performance improvement.

Background: Cache Coherence Protocol
– Used in shared-memory multiprocessors to manage correct data sharing.
– Vital to the design of multiprocessors, since it contributes the most to inter-processor communication latency.

Proposed Idea
Separate the traditional cache coherence protocol into two parts:
– Speculative cache lookup (SCL): uses a speculative value from an invalid cache line, allowing the processor to keep working.
– Safe coherence protocol: obtains the correct value, which is then compared with the value provided by SCL.

Coherence Decoupling

Related Work
– Customized coherence protocols
– Speculative coherence operations
– Dynamic self-invalidation, coherence message predictors, token coherence, etc.
– Speculation on the outcome of events in multiprocessor execution

Coherence Decoupling Architecture
Must support the following:
1. Split: a means to split a memory operation into a speculative load and a coherence operation
2. Compute: mechanisms to support execution with speculative values
3. Recover: a means to recover and roll back upon misprediction
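
The three requirements above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware mechanism from the paper: `CacheLine`, `decoupled_load`, `fetch_coherent`, and `compute` are hypothetical names, and "rollback" is modeled simply as redoing the dependent computation with the correct value.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    words: list   # data words held in the line (possibly stale)
    valid: bool   # False once the line has been invalidated


def decoupled_load(line, offset, fetch_coherent, compute):
    """Model one load. Returns (result, rolled_back).

    fetch_coherent() stands in for the safe coherence protocol and
    returns the up-to-date words of the line (hypothetical callback).
    compute(value) stands in for the dependent instructions.
    """
    if line.valid:
        # Ordinary hit: nothing to speculate on.
        return compute(line.words[offset]), False

    # Split: a speculative load from the invalid line, plus a coherence operation.
    speculative = line.words[offset]

    # Compute: dependent work proceeds using the speculative value.
    result = compute(speculative)

    # The coherence operation completes; install the correct data.
    correct_words = fetch_coherent()
    line.words = list(correct_words)
    line.valid = True

    # Recover: on a mismatch, discard the speculative work and redo it.
    if speculative != correct_words[offset]:
        return compute(correct_words[offset]), True
    return result, False
```

For example, if the remote writer changed only word 0 of the line, a speculative load of word 1 is verified without a rollback, while a speculative load of word 0 is rolled back.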

SCL Protocols for Coherence Decoupling
Use a simple safe coherence protocol and rely on an aggressive SCL protocol to increase performance.
Two components of an SCL protocol:
– Read component: obtains the speculative value
– Update component: updates an invalid cache line so that subsequent speculative reads can use it (can be left out in some SCL protocols)

Read vs Update Components
An SCL protocol with only a read component can be used if the word in an invalid block has:
– Not changed remotely (false sharing)
– Changed remotely to the same value (silent stores)
– Changed remotely to a different value and then back to the original value (temporally silent stores)
For truly shared data, an update component needs to be added:
– Speculatively sends data around the system by writing it into invalid cache lines
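
The cases above can be made concrete with a small classifier. This is an illustrative sketch only: the function names and the idea of passing the remote write history as a list are assumptions made for the example, not part of the paper's mechanism.

```python
def classify_stale_word(stale_value, remote_writes):
    """Classify a word in an invalid block.

    stale_value   -- the value still held in the local invalid line
    remote_writes -- values written remotely to this same word, in order,
                     since the local copy was invalidated (hypothetical input)
    """
    if not remote_writes:
        # Another word in the line was written, not this one.
        return "false sharing"
    if remote_writes[-1] != stale_value:
        # The current value differs from the stale copy.
        return "true sharing"
    if all(v == stale_value for v in remote_writes):
        # Every remote store wrote the value that was already there.
        return "silent store"
    # The value changed and was later restored.
    return "temporally silent store"


def read_only_scl_succeeds(stale_value, remote_writes):
    """A read-only SCL protocol guesses right unless the data is truly shared."""
    return classify_stale_word(stale_value, remote_writes) != "true sharing"
```

Only the "true sharing" case leaves the stale word wrong, which is why that case needs an update component to refresh the invalid line.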

SCL Protocol: Read Component
– CD: Use the locally cached incoherent value for every L2 miss. Simple, but because it speculates on every such miss it could produce many mis-speculations.
– CD-F: Add a PC-indexed confidence predictor to filter speculations. Reduces the number of (mis)speculative reads, thus improving the average accuracy.
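
A PC-indexed confidence filter of the kind CD-F describes could look roughly like the sketch below. The slides do not give the predictor's organization, so the table size, the 2-bit saturating counters, the threshold, the reset-on-miss policy, and the choice to train on every verified miss (whether or not it was speculated) are all assumptions made for illustration.

```python
class ConfidencePredictor:
    """Table of saturating counters indexed by load PC (illustrative only)."""

    def __init__(self, entries=1024, max_count=3, threshold=2):
        self.entries = entries        # assumed table size
        self.max_count = max_count    # assumed 2-bit counter ceiling
        self.threshold = threshold    # assumed confidence threshold
        self.table = [0] * entries

    def _index(self, pc):
        # Drop the low two bits (assumed 4-byte instructions), then fold into the table.
        return (pc >> 2) % self.entries

    def should_speculate(self, pc):
        """Speculate only when this load has been right often enough."""
        return self.table[self._index(pc)] >= self.threshold

    def train(self, pc, stale_value_was_correct):
        """Called when the coherent value arrives and is compared with the stale one."""
        i = self._index(pc)
        if stale_value_was_correct:
            self.table[i] = min(self.max_count, self.table[i] + 1)
        else:
            self.table[i] = 0  # one wrong guess removes all confidence
```

Training on every verified miss lets a counter that starts at zero still build confidence; a design that trained only on speculated loads would need a different initial value.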

SCL Protocol: Update Component
– CD-IA: Use invalidation piggyback to update all invalid blocks
– CD-C: Use invalidation piggyback if the value is compressed

SCL Protocol: Update Component (Ctd.)
– CD-N: Update all sharers after N writes to a block. Increases the number of messages (bandwidth).
– CD-W: Update on every write if any sharers exist
The CD read component is assumed wherever write update is used.
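
The CD-N trigger can be sketched as a per-block write counter. This is a minimal model under assumptions: the class name, the `has_sharers` flag, and resetting the counter after each update are illustrative choices that the slides do not specify.

```python
class UpdateAfterN:
    """Decide when a writer should send a speculative update (illustrative only)."""

    def __init__(self, n):
        self.n = n                # number of writes between updates
        self.write_counts = {}    # block address -> writes since the last update

    def on_write(self, block, has_sharers):
        """Return True if this write should trigger an update to the sharers."""
        if not has_sharers:
            return False  # nobody holds a stale copy worth refreshing
        count = self.write_counts.get(block, 0) + 1
        if count >= self.n:
            self.write_counts[block] = 0  # assumed: restart counting after an update
            return True
        self.write_counts[block] = count
        return False
```

With `n = 1` the sketch sends an update on every write to a shared block, which resembles the CD-W description; larger values of `n` trade update traffic against how fresh the invalid copies are.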

Methodology
– Simulator: MP-Sauce and SimpleScalar
– 16-node SMP systems simulated
– Coherence protocol used: simple invalidation-based snooping-bus protocol
– Workloads: 3 commercial applications and 5 scientific shared-memory benchmarks from the SPLASH-2 suite

Results: Microbenchmarks
– Simple-fs: loads falsely shared data and then executes (in)dependent instructions
– Critical-fs: forces a data dependence between two loads by placing consecutive false-sharing misses on the critical path

L2 Miss Profiling Results

Coherence Decoupling Accuracy Results
Protocols compared: CD, CD-F, CD-IA, CD-C, CD-N, CD-W

Timing Results

Bandwidth Requirements

Latency Tolerance Profiles
– Instructions executed during coherence decoupling
– The number of control-dependent instructions will grow in future processors

Conclusions
– Coherence misses are a significant fraction of L2 misses, ranging from 10% to 80%.
– Coherence decoupling has the potential to hide the miss latency for 40% to 90% of coherence misses.
– Mis-speculation occurs 20% of the time.
