Reuse-based Online Models for Caches

Presentation on theme: "Reuse-based Online Models for Caches"— Presentation transcript:

1 Reuse-based Online Models for Caches
Rathijit Sen, David A. Wood. ACM SIGMETRICS, CMU, Pittsburgh, PA, 6/20/2013

2 The Problem
Caches: power vs. performance
Reconfigurable caches, e.g., Ivy Bridge
The problem: which configuration to select, e.g., to get the best energy efficiency?
[Diagram labels: Core, LLC, Miss, Fetch, DRAM]

3 Cache Performance Prediction
We propose a framework: h = (r · B) · φ
h: hit ratio
r: reuse-distance distribution (novel hardware support)
B: stochastic Binomial matrix
φ: hit function (LRU, PLRU, RANDOM, NMRU)
Case study: Energy-Delay Product (EDP) within 7% of minimum

4 Agenda
The Problem
Framework: h = (r · B) · φ
  Locality (r)
  Matrix transformations (B)
  Hit functions (φ)
Hardware support
Case Study

5 Cache Overview
Limited storage
Sets of (usually 64-byte) blocks
#blocks/set = associativity (#ways)
Set index + address tags identify data
[Diagram: an address selects a set and its tag is compared. Tag match? Y: hit, N: miss. Array labels: Associativity (A), b, Sets (S)]

6 Workload Variation
Last-Level Cache (LLC)
[Figure: workload groups labeled swim; mgrid; zeus; apache; oltp; jbb; equake, gafort, wupwise; fma3d; ammp, blackscholes, bodytrack, fluidanimate, freqmine, swaptions]

7 Bad configurations hurt!
[Figure: EDP (energy-delay product), maximum vs. minimum across configurations; annotated values: 218% worse, 27% worse]
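The "% worse" figures on this slide compare a configuration's EDP with the minimum EDP. A minimal sketch of that arithmetic follows. The configuration names and the energy and delay values are hypothetical placeholders, not data from the talk.

```python
def edp(energy, delay):
    """Energy-delay product: lower is better."""
    return energy * delay

def percent_worse(value, best):
    """How much larger `value` is than `best`, in percent."""
    return 100.0 * (value - best) / best

# Hypothetical (energy in joules, delay in seconds) per cache configuration.
measurements = {
    "config_a": (3.0, 2.0),
    "config_b": (3.5, 1.5),
    "config_c": (5.0, 1.4),
}

edps = {name: edp(e, d) for name, (e, d) in measurements.items()}
best = min(edps, key=edps.get)  # configuration with the minimum EDP
penalties = {name: percent_worse(v, edps[best]) for name, v in edps.items()}
```

An online predictor would supply estimated rather than measured values here; the selection step itself stays the same.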

8 Problem Summary
Reconfigurable caches
Multiple replacement policies
Goal: online miss-ratio prediction
[Diagram labels: Associativity (A), b, Sets (S)]

9 Indexing Assumption
Mapping of unique addresses to cache sets
Assumption: independent, uniform [Smith, 1978]
Unique accesses as Bernoulli trials
(Partial) hashing: POWER4, POWER5, POWER6, Xeon
Simple XOR-based function [similar to Cypher, 2008]

10 Agenda (as on slide 4)

11 Temporal Locality Metrics
Unique Reuse Distance (URD): #unique intervening addresses
  x y z z y x : URD(x) = 2
  URD = Stack Distance [Mattson, 1970] – 1
  Large cache → large distances to track
Absolute Reuse Distance (ARD): #intervening addresses
  x y z z y x : ARD(x) = 4
[Figure: histogram r of P(URD=i) versus i; annotation: Size?]
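The two metrics can be checked on the slide's own trace, x y z z y x. The sketch below is a straightforward offline computation (quadratic in the trace length), not the online hardware mechanism the talk proposes. The function name `reuse_distances` is an illustrative choice.

```python
def reuse_distances(trace):
    """Return (urd, ard), one entry per access; None marks a first use."""
    last_seen = {}  # address -> index of its previous access
    urd, ard = [], []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            between = trace[last_seen[addr] + 1 : i]
            ard.append(len(between))       # all intervening accesses
            urd.append(len(set(between)))  # distinct intervening addresses
        else:
            ard.append(None)
            urd.append(None)
        last_seen[addr] = i
    return urd, ard

urd, ard = reuse_distances(list("xyzzyx"))
# The final access to x has URD 2 (y and z) and ARD 4 (y, z, z, y).
```

The normalized histogram of the non-None URD values is the distribution r used in the rest of the talk.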

12 Per-set Locality, r(S)
r(S) is “compressed” as S (#sets) increases
Less of the tail is important
[Figure: histogram r of P(URD=i) versus i; the addresses between two accesses to x, shown for #sets = S and for #sets = S' > S]

13 Agenda (as on slide 4)

14 Estimating per-set locality
Generalized stochastic Binomial matrices [Strum, 1977]
r(S) = r(1) · B(1 – 1/S, 1/S)
Composition: r(S') = r(S) · B(1 – S/S', S/S')
Entry (i, k) of B: P(k successes in i trials), i.e., P(k of i to the same set)
[Figure: row vector r, with entries P(URD=i), multiplied by matrix B with rows indexed by i and columns by k]
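The transformation can be sketched as a plain vector-matrix product: each of the i unique intervening addresses lands in the same set with probability S/S', so entry (i, k) of B is a binomial probability. The code below assumes r is truncated to a finite number of distances. The helper names and the example distribution are illustrative, not taken from the paper.

```python
from math import comb

def binomial_matrix(n, p):
    """n x n matrix with B[i][k] = P(k successes in i trials), 0 for k > i."""
    q = 1.0 - p
    return [[comb(i, k) * p**k * q**(i - k) if k <= i else 0.0
             for k in range(n)] for i in range(n)]

def vec_mat(r, B):
    """Row vector times matrix: result[k] = sum_i r[i] * B[i][k]."""
    n = len(r)
    return [sum(r[i] * B[i][k] for i in range(n)) for k in range(n)]

def per_set_locality(r, s_from, s_to):
    """r(s_to) = r(s_from) . B(1 - s_from/s_to, s_from/s_to)."""
    return vec_mat(r, binomial_matrix(len(r), s_from / s_to))

r1 = [0.1, 0.2, 0.3, 0.4]  # hypothetical r(1) over URD = 0..3
direct = per_set_locality(r1, 1, 4)
stepwise = per_set_locality(per_set_locality(r1, 1, 2), 2, 4)
```

Going from 1 set to 4 sets directly, or via 2 sets, gives the same result up to rounding, which is the composition property on the slide. Probability mass moves toward smaller distances as the number of sets grows.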

15 Computation reuse & speedup
“Shorter” tail → smaller matrices
Poisson approximation
[Figure: chain of distributions r(1), r(2^10), r(2^11), r(2^12), r(2^13), r(2^14); annotations: "Now: compute", "Later: hardware support", "Size?"]
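The slide does not spell out how the Poisson approximation is applied. As a sketch, a binomial row with i trials and a small success probability p can be replaced by a Poisson distribution with mean i · p, which avoids computing binomial coefficients. The values of i and p below are illustrative assumptions.

```python
from math import comb, exp

def binomial_row(i, p):
    """Exact row: P(k successes in i trials) for k = 0..i."""
    return [comb(i, k) * p**k * (1 - p)**(i - k) for k in range(i + 1)]

def poisson_row(i, p):
    """Poisson(i * p) probabilities for k = 0..i, built by recurrence."""
    lam = i * p
    row = [exp(-lam)]
    for k in range(1, i + 1):
        row.append(row[-1] * lam / k)  # P(k) = P(k-1) * lam / k
    return row

i, p = 200, 0.01  # illustrative: many trials, small probability
gap = sum(abs(b - q) for b, q in zip(binomial_row(i, p), poisson_row(i, p)))
```

By Le Cam's inequality the summed absolute difference is at most 2 · i · p², so the approximation is tight when p is small and degrades as p grows.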

16 Size of r(2^10)?
Prediction with r(2^10) limited to URD < n
[Figure: histogram r of P(URD=i) versus i]

17 Agenda (as on slide 4)

18 Hit Function, φ
φ_k: P(x will hit | URD(x) = k)
Monotonically decreasing (non-increasing) model
Intuition: larger URD → same or larger eviction probability
φ_0 = 1; φ_k ≤ φ_(k-1); φ_∞ = 0
[Figure: two accesses to x separated by accesses to addresses other than x]

19 Hit Function, φ
Example: A=8
[Figure not preserved in the transcript]

20 Formulating φ
φ(LRU): step function
  (r · B) · φ(LRU) corresponds to [Smith, 1978], [Hill & Smith, 1989]
φ(PLRU): assumes that, on average, traffic is evenly divided between subtrees
φ(RANDOM): estimates #intervening misses using ARD
φ(NMRU): similar to φ(RANDOM), except φ_1 = 1
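For LRU the step function is simple to write down: with associativity A, an access whose per-set URD is k hits exactly when k < A. The sketch below combines it with a per-set distribution to get the hit ratio. The distribution values are hypothetical, and the other policies' hit functions are not reproduced because the slides do not give their formulas.

```python
def phi_lru(assoc, n):
    """Step function: phi[k] = 1 for k < assoc, else 0, for k = 0..n-1."""
    return [1.0 if k < assoc else 0.0 for k in range(n)]

def hit_ratio(r_set, phi):
    """h = sum_k P(URD = k) * P(hit | URD = k)."""
    return sum(rk * pk for rk, pk in zip(r_set, phi))

r_set = [0.4, 0.3, 0.2, 0.1]  # hypothetical per-set URD distribution r(S)
phi = phi_lru(2, len(r_set))  # 2-way set-associative LRU
h = hit_ratio(r_set, phi)     # mass at URD 0 and 1 counts as hits
miss_ratio = 1.0 - h
```

Any other policy plugs into `hit_ratio` the same way, as long as its φ starts at 1 and never increases with k.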

21 Agenda (as on slide 4)

22 Prediction Accuracy
LRU, PLRU (A=2), NMRU (A=2): exact per-set model
Others: approximate per-set model

23 Overheads
r' = r · B : 6 to 80 μsec
Binomial → Poisson approximation for each row of B
h = (r · B) · φ : 20 to 30 μsec
Average over 24 configurations
B applied 8 times

24 Agenda (as on slide 4)

25 Computation reuse & speedup
“Shorter” tail → smaller matrices
Poisson approximation
[Figure: chain of distributions r(1), r(2^10), r(2^11), r(2^12), r(2^13), r(2^14); annotations: "Now: compute", "Later: hardware support", "Now" next to r(2^10), "Size=512"]

26 Insights
x y z z y x : URD(x) = 2
Unique → “remember” addresses
  Only cardinality, not full addresses
  Bloom filter for compact (approximate) representation
r(2^10) is seen by any set of a cache with S = 2^10
  Filter address stream
[Figure: histogram r of P(URD=i) versus i]

27 Hardware Support for estimating r(2^10)
[Block diagram. Components: reference address register, set filter, control logic, 1024-bit Bloom filter with 2 hash functions, 9-bit counter, 512-entry histogram array. Signals: access, filtered access, insert, load, hit, inc, reset, read. Flow: Start Sample; for each filtered access, Addr match? Y: End Sample; N: Unique? Y (not a Bloom filter hit): Remember]
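One possible reading of this diagram can be modeled in software as below. The sizes (1024-bit filter, 2 hash functions, 512-entry histogram) come from the slide. Everything else is an assumption made for illustration: the two hash functions, the saturating counter, the use of block addresses, and taking the low 10 address bits as the set index.

```python
class UrdSampler:
    """Software model of a sampled per-set URD histogram (a sketch)."""
    FILTER_BITS = 1024   # Bloom filter size (from the slide)
    HIST_ENTRIES = 512   # histogram entries; a 9-bit counter spans 0..511
    SET_BITS = 10        # model a cache with 2**10 sets

    def __init__(self):
        self.histogram = [0] * self.HIST_ENTRIES
        self.reference = None  # reference address register
        self._reset()

    def _reset(self):
        self.bloom = [False] * self.FILTER_BITS
        self.counter = 0

    def _hashes(self, addr):
        # Hypothetical hash functions over the tag bits; not from the talk.
        tag = addr >> self.SET_BITS
        return tag % self.FILTER_BITS, (7 * tag + 3) % self.FILTER_BITS

    def start_sample(self, addr):
        self.reference = addr
        self._reset()

    def access(self, addr):
        if self.reference is None:
            return
        mask = (1 << self.SET_BITS) - 1
        if (addr & mask) != (self.reference & mask):
            return  # set filter: ignore accesses to other sets
        if addr == self.reference:
            self.histogram[self.counter] += 1  # end sample: record the URD
            self.reference = None
            return
        h1, h2 = self._hashes(addr)
        if not (self.bloom[h1] and self.bloom[h2]):  # unique (not a hit)
            self.bloom[h1] = self.bloom[h2] = True   # remember it
            # Assumed behavior: the counter saturates at the last entry.
            self.counter = min(self.counter + 1, self.HIST_ENTRIES - 1)

sampler = UrdSampler()
sampler.start_sample(5)
for a in [5 + 1024, 5 + 2048, 6, 5 + 1024, 5]:
    sampler.access(a)
```

In this run two distinct same-set addresses intervene before the reference address recurs, so the sample is recorded in histogram entry 2. Bloom filter false positives can only make a recorded distance smaller than the true one.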

28 Agenda (as on slide 4, with the last item extended: Case Study + way counters)

29 LRU Way Counters [Suh, et al. 2002]
One counter per logical way (stack position)
Determining logical position is hard
  not totally (re-)ordered with every access
  heuristics, e.g., for PLRU [Kedzierski, et al. 2010]
Other limitations
  Inclusion property
  Fixed #sets
S' = S : special case of reuse framework
S' ≠ S ? Use B, provided enough tail of r(S) is available

30 Min. EDP configuration
EDP within 7% of minimum
Reuse models outperform PLRU way counters in most cases

31 Summary
The Problem: online miss-rate estimation for reconfigurable caches
We propose a framework: h = (r · B) · φ
h: hit ratio
r: reuse-distance distribution (novel hardware support)
B: stochastic Binomial matrix
φ: hit function (LRU, PLRU, RANDOM, NMRU)
Case study: EDP within 7% of minimum
Future work: more policies, applications/case studies

32 Also in the paper
r: lossy summarization of the address trace
Estimation for ARD
Optimizations for LRU
Conditions for PLRU eviction
More details on models & evaluation

33 Reuse-based Online Models for Caches
Questions?

34 Example LLC performance
OLTP (TPC-C + IBM DB2)

35 Estimating cache performance
Hit ratio = hits/access ≈ ∑_i P(URD=i) · P(hit|URD=i) = r · φ
Miss ratio = misses/access = 1 – hit ratio
Miss rate = misses/instruction = miss ratio × accesses/instruction
[Figure: histogram r of P(URD=i) versus i, multiplied by φ, P(hit|URD=i) versus i]

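36 URD vs ARD
[Figure: two accesses to x separated by the address runs {z0}*, {z0,z1}*, {z0,z1,z2}*, ..., {z0,z1,z2,...,zk-1}*, with z0, z1, ... marked; d_k labels the distance]
Approximation: d_k = d_(k-1) + 1/(… r_i …), where the terms in k around r_i are not preserved in the transcript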

