RATHIJIT SEN DAVID A. WOOD Reuse-based Online Models for Caches 6/20/2013 ACM SIGMETRICS CMU, Pittsburgh, PA 1.


1 Reuse-based Online Models for Caches
RATHIJIT SEN, DAVID A. WOOD
6/20/2013, ACM SIGMETRICS 2013 @ CMU, Pittsburgh, PA

2 The Problem
Caches: power vs performance
Reconfigurable caches, e.g., IvyBridge
The Problem: Which configuration to select, e.g., to get the best energy-efficiency?
[Figure: Core, LLC, and DRAM, with Miss and Fetch paths]

3 Cache Performance Prediction
We propose a framework: h = (r · B) · φ
  • h: hit ratio
  • r: reuse-distance distribution (novel hardware support)
  • B: stochastic Binomial matrix
  • φ: hit function (LRU, PLRU, RANDOM, NMRU)
Case study: Energy-Delay Product (EDP) within 7% of minimum

4 Agenda
The Problem
Framework
  • Locality (r)
  • Matrix transformations (B)
  • Hit functions (φ)
  • h = (r · B) · φ
Hardware support
Case Study

5 Cache Overview
Limited storage
  • Sets of (usually 64-byte) blocks
  • #blocks/set = associativity (#ways)
  • Set Index + Address tags identify data
[Figure: cache drawn as Sets (S) rows of Associativity (A) blocks; an address tag match is a Hit, otherwise a Miss]
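As a minimal sketch of how a set index plus a tag identify data, the following assumes plain modulo indexing on the block number and a power-of-two number of sets. The names `split_address` and `is_hit`, and the dictionary-of-sets cache layout, are illustrative assumptions and do not come from the slides.

```python
BLOCK_BYTES = 64  # "usually 64-byte" blocks, per the slide


def split_address(addr, num_sets, block_bytes=BLOCK_BYTES):
    """Split a byte address into (tag, set_index, block_offset).

    Assumes plain modulo indexing; real caches may hash the index bits.
    """
    block = addr // block_bytes
    return block // num_sets, block % num_sets, addr % block_bytes


def is_hit(cache, addr, num_sets):
    """cache maps set_index -> set of resident tags (hypothetical layout)."""
    tag, set_index, _ = split_address(addr, num_sets)
    return tag in cache.get(set_index, set())


# Example with a made-up address and a 1024-set cache.
tag, set_index, offset = split_address(0x12345, 1024)
print(tag, set_index, offset)
print(is_hit({set_index: {tag}}, 0x12345, 1024))
```

Associativity shows up here only as the maximum number of tags a set may hold; the sketch leaves replacement out entirely.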

6 Last-Level Cache (LLC) Workload Variation
[Figure: LLC behavior across workloads; labeled groups: swim; ammp, blackscholes, bodytrack, fluidanimate, freqmine, swaptions; equake, gafort, wupwise; apache; mgrid; zeus; oltp; jbb; fma3d]

7 Bad configurations hurt!
[Figure: EDP (energy-delay product) across configurations, with Minimum and Maximum marked and the annotations "27% worse" and "218% worse"]
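The "% worse" figures compare each configuration's EDP against the minimum. A small sketch of that arithmetic follows; the configuration names and the energy and delay numbers are made up for illustration and are not measurements from the talk.

```python
def edp(energy, delay):
    """Energy-delay product."""
    return energy * delay


def percent_worse_than_best(configs):
    """configs maps name -> (energy, delay).

    Returns (best_name, {name: percent above the minimum EDP}).
    """
    edps = {name: edp(e, d) for name, (e, d) in configs.items()}
    best = min(edps, key=edps.get)
    worse = {name: 100.0 * (v / edps[best] - 1.0) for name, v in edps.items()}
    return best, worse


# Hypothetical configurations: (energy in joules, delay in seconds).
configs = {"small": (2.0, 3.0), "large": (1.5, 5.0)}
best, worse = percent_worse_than_best(configs)
print(best, worse)
```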

8 Problem Summary
Reconfigurable caches
Multiple replacement policies
Goal: Online miss-ratio prediction
[Figure: cache drawn as Sets (S) rows of Associativity (A) blocks]

9 Indexing Assumption
Mapping of unique addresses to cache sets
Assumption: independent, uniform [Smith, 1978]
Unique accesses as Bernoulli trials
(Partial) Hashing
  • POWER4, POWER5, POWER6, Xeon
  • Simple XOR-based function [similar to Cypher, 2008]
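To illustrate why hashing the index helps the uniform-mapping assumption, here is a sketch of one simple XOR-based index function. It is a hypothetical example (XOR of the low index bits with the next group of bits), not the function used by any of the processors named above or by the cited work.

```python
def modulo_index(block, set_bits):
    """Conventional index: the low set_bits bits of the block number."""
    return block & ((1 << set_bits) - 1)


def xor_index(block, set_bits):
    """Hypothetical XOR hash: fold the next set_bits bits into the index."""
    mask = (1 << set_bits) - 1
    return (block ^ (block >> set_bits)) & mask


SET_BITS = 10  # 2^10 sets
# A strided stream whose low index bits are always zero.
strided = [i << SET_BITS for i in range(1 << SET_BITS)]

print(len({modulo_index(b, SET_BITS) for b in strided}))  # sets touched without hashing
print(len({xor_index(b, SET_BITS) for b in strided}))     # sets touched with XOR hashing
```

With modulo indexing every block in this stream lands in one set; the XOR variant spreads them over all sets, which is closer to the independent, uniform mapping the model assumes.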

10 Agenda (as on slide 4)

11 Temporal Locality Metrics
Unique Reuse Distance (URD)
  • #unique intervening addresses
  • x y z z y x : URD(x)=2
  • Stack Distance [Mattson, 1970] – 1
  • Large cache → large distances to track
Absolute Reuse Distance (ARD)
  • #intervening addresses
  • x y z z y x : ARD(x)=4
[Figure: histogram r of P(URD=i) versus i; Size?]
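The two metrics can be computed directly from a trace. The sketch below is a straightforward (quadratic-time) reference implementation meant only to make the definitions concrete; the function name `reuse_distances` is an assumption, and first-time accesses are reported with `None` distances.

```python
def reuse_distances(trace):
    """Return a list of (address, URD, ARD) tuples, one per access.

    URD counts unique intervening addresses; ARD counts all intervening
    addresses. First accesses to an address get (address, None, None).
    """
    last_pos = {}
    result = []
    for pos, addr in enumerate(trace):
        if addr in last_pos:
            between = trace[last_pos[addr] + 1:pos]
            result.append((addr, len(set(between)), len(between)))
        else:
            result.append((addr, None, None))
        last_pos[addr] = pos
    return result


# The slide's example trace: x y z z y x
for entry in reuse_distances(list("xyzzyx")):
    print(entry)
```

For the final access to x the intervening addresses are y z z y, which gives URD(x)=2 and ARD(x)=4 as on the slide.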

12 Per-set Locality, r(S)
r(S) is “compressed” as S (#sets) increases
  • Less of the tail is important
[Figure: histogram r of P(URD=i) versus i; the addresses between two accesses to x, as seen by one set with #sets: S and with #sets: S' > S]

13 Agenda (as on slide 4)

14 Estimating per-set locality
Generalized stochastic Binomial matrices [Strum, 1977]
r(S) = r(1) · B(1 – 1/S, 1/S)
Composition: r(S') = r(S) · B(1 – S/S', S/S')
Entry (i, k) of B: P(k successes in i trials), i.e., P(k of i to the same set)
[Figure: row vector r, P(URD=i) indexed by i, multiplied by the lower-triangular matrix B with rows i and columns k]
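The transformation can be sketched with a small dense matrix. This example assumes r is truncated to a finite vector and that going from S sets to S' sets uses success probability S/S' (the primes appear to have been lost in the transcript, so that reading is an assumption). The vector `r1` is made-up data, and the function names are illustrative.

```python
from math import comb


def binomial_matrix(n, p):
    """n x n matrix whose entry (i, k) is P(k successes in i trials)."""
    return [[comb(i, k) * p**k * (1 - p)**(i - k) if k <= i else 0.0
             for k in range(n)]
            for i in range(n)]


def apply_binomial(r, p):
    """Row vector r times B(1 - p, p)."""
    n = len(r)
    B = binomial_matrix(n, p)
    return [sum(r[i] * B[i][k] for i in range(n)) for k in range(n)]


r1 = [0.1, 0.2, 0.3, 0.25, 0.15]          # made-up r(1), truncated to URD < 5
r4_direct = apply_binomial(r1, 1 / 4)      # r(4) = r(1) . B(1 - 1/4, 1/4)
r2 = apply_binomial(r1, 1 / 2)             # r(2) = r(1) . B(1 - 1/2, 1/2)
r4_composed = apply_binomial(r2, 2 / 4)    # r(4) = r(2) . B(1 - 2/4, 2/4)

print(r4_direct)
print(r4_composed)
```

Both routes to r(4) agree up to floating-point error, which is the composition property that lets later distributions be derived from earlier ones instead of from r(1).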

15 Computation reuse & speedup
“Shorter” tail → smaller matrices
[Figure: chain of distributions r(1), r(2^10), r(2^11), r(2^12), r(2^13), r(2^14); labels: "Now: compute", "Later: hardware support", "Size?", "Poisson Approximation"; histogram r of P(URD=i) versus i]

16 Size of r(2^10)?
Prediction with r(2^10) limited to URD < n
[Figure: histogram r of P(URD=i) versus i]

17 Agenda (as on slide 4)

18 Hit Function, φ
φ_k : P(x will hit | URD(x)=k)
Monotonically decreasing model
  • Intuition: larger URD → same or larger eviction probability
φ_0 = 1, φ_k ≤ φ_{k-1}, φ_∞ = 0
[Figure: two accesses to x separated by intervening addresses that are not x]

19 Hit Function, φ
Example: A=8

20 Formulating φ
φ(LRU): step-function
  • (r · B) · φ(LRU) ≡ [Smith, 1978], [Hill & Smith, 1989]
φ(PLRU):
  • Assumes on average, traffic evenly divided between subtrees
φ(RANDOM):
  • Estimates #intervening misses using ARD
φ(NMRU): similar to φ(RANDOM) except φ_1 = 1
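For LRU, the step function follows from URD being the stack distance minus one: an access with per-set URD k hits exactly when k is less than the associativity A. The sketch below shows only this LRU case applied to an already per-set distribution; the per-set vector is made-up data, and the PLRU, RANDOM, and NMRU formulations are not reproduced here because the slides do not give them.

```python
def phi_lru(associativity, length):
    """Step function: phi_k = 1 for k < A, else 0."""
    return [1.0 if k < associativity else 0.0 for k in range(length)]


def hit_ratio(r_set, phi):
    """Dot product of a per-set reuse distribution with a hit function."""
    return sum(p * f for p, f in zip(r_set, phi))


def is_valid_hit_function(phi):
    """Checks phi_0 = 1 and phi_k <= phi_{k-1} over the stored prefix."""
    return phi[0] == 1.0 and all(b <= a for a, b in zip(phi, phi[1:]))


r_set = [0.4, 0.3, 0.2, 0.1]   # made-up per-set P(URD=k), k = 0..3
for ways in (1, 2, 4):
    phi = phi_lru(ways, len(r_set))
    print(ways, is_valid_hit_function(phi), hit_ratio(r_set, phi))
```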

21 Agenda (as on slide 4)

22 Prediction Accuracy
LRU, PLRU(A=2), NMRU(A=2): exact per-set model
Others: approximate per-set model

23 Overheads
r' = r · B : 6 → 80 μsec
  • Binomial → Poisson approximation for each row of B
h = (r · B) · φ : 20 → 30 μsec
  • Average over 24 configurations
  • B applied 8 times
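Row i of B(1 – p, p) is a Binomial(i, p) distribution, which for small p is close to a Poisson distribution with mean i·p and is cheaper to evaluate. The sketch below compares the two for one row; the row index, probability, and truncation width are arbitrary choices for illustration, not values from the talk.

```python
from math import comb, exp


def binomial_row(i, p, width):
    """First `width` entries of row i of B(1 - p, p)."""
    return [comb(i, k) * p**k * (1 - p)**(i - k) if k <= i else 0.0
            for k in range(width)]


def poisson_row(i, p, width):
    """Poisson(lambda = i * p) approximation of the same row."""
    lam = i * p
    row, term = [], exp(-lam)
    for k in range(width):
        row.append(term)
        term *= lam / (k + 1)   # P(k+1) = P(k) * lambda / (k+1)
    return row


i, p, width = 4096, 1 / 1024, 16   # arbitrary example: lambda = 4
exact = binomial_row(i, p, width)
approx = poisson_row(i, p, width)
print(max(abs(a - b) for a, b in zip(exact, approx)))
```

The Poisson row needs only one exponential and a running product, with no binomial coefficients.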

24 Agenda (as on slide 4)

25 Computation reuse & speedup
“Shorter” tail → smaller matrices
[Figure: chain of distributions r(1), r(2^10), r(2^11), r(2^12), r(2^13), r(2^14); labels: "Now: compute", "Later: hardware support" (marked "Now"), "Size=512", "Poisson Approximation"; histogram r of P(URD=i) versus i]

26 Insights
x y z z y x : URD(x)=2
Unique → “remember” addresses
  • Only cardinality, not full addresses
Bloom filter for compact (approximate) representation
r(2^10) is seen by any set of a cache with S=2^10
  • Filter address stream
[Figure: histogram r of P(URD=i) versus i]

27 Hardware Support for estimating r(2^10)
[Figure: block diagram. Components: Reference address register, Set Filter, Control Logic, 1024-bit Bloom Filter (2 hash fns), 9-bit Counter, 512-entry Histogram array. Signals: access, filtered access, insert, load, hit, inc, reset, read]
[Flowchart: Start Sample; Addr match? Y: End Sample; N: Unique? Y (not hit): Remember]
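A software sketch of the sampling flow may help make the diagram concrete. The component sizes follow the slide (1024-bit Bloom filter, 2 hash functions, 512-entry histogram, counter saturating at 511), but the class name, the two hash functions, and the choice to start a new sample on the access after a sample ends are assumptions made for illustration.

```python
class ReuseSampler:
    """Software sketch of the URD sampling flow; hashes are made up."""

    def __init__(self, num_sets=1024, bloom_bits=1024, hist_entries=512):
        self.num_sets = num_sets
        self.bloom_bits = bloom_bits
        self.hist = [0] * hist_entries   # histogram array: samples per URD
        self.ref = None                  # reference address register
        self.bloom = 0                   # Bloom filter bits packed in an int
        self.count = 0                   # unique-address counter

    def _bloom_mask(self, block):
        # Two hypothetical hash functions; the slide only says "2 hash fns".
        h1 = block % self.bloom_bits
        h2 = (block * 31 + 17) % self.bloom_bits
        return (1 << h1) | (1 << h2)

    def access(self, block):
        if self.ref is None:                         # Start Sample
            self.ref, self.bloom, self.count = block, 0, 0
            return
        if block % self.num_sets != self.ref % self.num_sets:
            return                                   # set filter: other set
        if block == self.ref:                        # Addr match: End Sample
            self.hist[self.count] += 1
            self.ref = None
            return
        mask = self._bloom_mask(block)
        if (self.bloom & mask) != mask:              # Unique (filter miss)
            self.bloom |= mask                       # Remember
            self.count = min(self.count + 1, len(self.hist) - 1)


sampler = ReuseSampler(num_sets=1)
for block in [1, 2, 3, 3, 2, 1]:    # x y z z y x
    sampler.access(block)
print(sampler.hist[:4])
```

Because a Bloom filter can report false positives, the counter may undercount unique addresses, which is why the slide calls the representation approximate.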

28 Agenda (as on slide 4), with the added annotation "+ way counters"

29 LRU Way Counters [Suh, et al. 2002]
One counter per logical way (stack position)
Determining logical position is hard
  • not totally (re-)ordered with every access
  • heuristics, e.g., for PLRU [Kedzierski, et al. 2010]
Other Limitations
  • Inclusion property
  • Fixed #sets
S' = S : special case of reuse framework
S' ≠ S ? Use B
  • provided enough tail of r(S) is available
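A minimal software model of way counters for true LRU is sketched below, assuming modulo set indexing and explicit per-set LRU stacks. The function names are illustrative. It relies on the inclusion property: the hits for an A-way cache are the sum of the first A counters, for the fixed number of sets that was monitored.

```python
def lru_way_counters(trace, num_sets, max_ways):
    """Count hits per LRU stack position (0 = most recently used)."""
    stacks = [[] for _ in range(num_sets)]
    counters = [0] * max_ways
    for block in trace:
        stack = stacks[block % num_sets]
        if block in stack:
            counters[stack.index(block)] += 1   # hit at this stack position
            stack.remove(block)
        elif len(stack) == max_ways:
            stack.pop()                          # evict the LRU block
        stack.insert(0, block)                   # block becomes MRU
    return counters


def hits_with_ways(counters, ways):
    """Predicted hits for a cache with `ways` ways and the same #sets."""
    return sum(counters[:ways])


counters = lru_way_counters([1, 2, 3, 3, 2, 1], num_sets=1, max_ways=4)
print(counters)
print([hits_with_ways(counters, a) for a in (1, 2, 3, 4)])
```

The counters say nothing about a different number of sets, which is the limitation the slide contrasts with applying B to r(S).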

30 Min. EDP configuration
EDP within 7% of minimum
Reuse models outperform PLRU way counters in most cases

31 Summary
The Problem: Online miss-rate estimation for reconfigurable caches
We propose a framework: h = (r · B) · φ
  • h: hit ratio
  • r: reuse-distance distribution (novel hardware support)
  • B: stochastic Binomial matrix
  • φ: hit function (LRU, PLRU, RANDOM, NMRU)
Case study: EDP within 7% of minimum
Future work: More policies, applications/case studies

32 Also in the paper
r: lossy summarization of the address trace
Estimation for ARD
Optimizations for LRU
Conditions for PLRU eviction
More details on models & evaluation

33 Reuse-based Online Models for Caches
Questions?

34 Example LLC performance
OLTP (TPC-C + IBM DB2)

35 Estimating cache performance
Hit ratio = hits/access ≈ $\sum_i P(\mathrm{URD}=i) \cdot P(\mathrm{hit} \mid \mathrm{URD}=i) = r \cdot \varphi$
Miss ratio = misses/access = 1 – hit ratio
Miss rate = misses/instruction = miss ratio × accesses/instruction
[Figure: r as a histogram of P(URD=i) versus i; φ as P(hit|URD=i) versus i]

36 URD vs ARD
[Figure: two accesses to x separated by new addresses z_0, z_1, z_2, z_3, ..., z_{k-1}, following the pattern {z_0}* {z_0, z_1}* {z_0, z_1, z_2}* ... {z_0, z_1, z_2, ..., z_{k-1}}*]
Approximation: $d_k = d_{k-1} + 1 / \sum_{i=k}^{\infty} r_i$

