Presentation on theme: "Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture Seongbeom Kim, Dhruba Chandra, and Yan Solihin Dept. of Electrical and Computer."— Presentation transcript:

1 Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture
Seongbeom Kim, Dhruba Chandra, and Yan Solihin
Dept. of Electrical and Computer Engineering, North Carolina State University
{skim16, dchandr, solihin}@ncsu.edu

2 Cache Sharing in CMP (PACT 2004, Seongbeom Kim, NCSU)
[Figure: Processor Core 1 and Processor Core 2, each with a private L1 cache, share a single L2 cache.]

3 Cache Sharing in CMP
[Figure: thread t1 runs on Processor Core 1 and fills part of the shared L2 cache.]

4 Cache Sharing in CMP
[Figure: thread t2 runs on Processor Core 2 and fills part of the shared L2 cache.]

5 Cache Sharing in CMP
[Figure: t1 and t2 run together; t1 occupies most of the shared L2 cache.]
t2's throughput is significantly reduced due to unfair cache sharing.

6 Shared L2 cache space contention
[Figure]

7 Shared L2 cache space contention
[Figure]

8 Impact of Unfair Cache Sharing
Problems of unfair cache sharing:
– Sub-optimal throughput
– Thread starvation
– Priority inversion
– Thread-mix dependent throughput
Fairness: uniform slowdown for co-scheduled threads.
[Figure: uniprocessor scheduling vs. 2-core CMP scheduling; threads t1–t4 are assigned to time slices on P1 and P2.]

9 Contributions
Cache fairness metrics:
– Easy to measure
– Approximate uniform slowdown well
Fair caching algorithms:
– Static/dynamic cache partitioning optimizing fairness
– Simple hardware modifications
Simulation results:
– Fairness: 4x improvement
– Throughput: 15% improvement, comparable to the cache miss minimization approach

10 Related Work
Cache miss minimization in CMP:
– G. Suh, S. Devadas, L. Rudolph, HPCA 2002
Balancing throughput and fairness in SMT:
– K. Luo, J. Gummaraju, M. Franklin, ISPASS 2001
– A. Snavely and D. Tullsen, ASPLOS 2000
– …

11 Outline
– Fairness Metrics
– Static Fair Caching Algorithms (See Paper)
– Dynamic Fair Caching Algorithms
– Evaluation Environment
– Evaluation
– Conclusions

12 Fairness Metrics
Uniform slowdown. T_alone_i: execution time of thread t_i when it runs alone.

13 Fairness Metrics
Uniform slowdown. T_shared_i: execution time of thread t_i when it shares the cache with others.

14–16 Fairness Metrics
Uniform slowdown: every co-scheduled thread should slow down by the same factor. With X_i = T_shared_i / T_alone_i, we want to minimize the sum over all thread pairs of |X_i − X_j|; ideally X_i = X_j for every pair of co-scheduled threads i and j.
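The slowdown-ratio idea can be sketched in code. This is a minimal illustration rather than the paper's implementation: the function names `m0` and `m3` merely echo the metric names, and the pairwise-difference form is reconstructed from the uniform-slowdown definition on these slides. M3 substitutes easily measured miss rates for execution times.

```python
# Sketch of the fairness metrics: a value of 0 means uniform slowdown.
def pairwise_diff(x):
    """Sum of |X_i - X_j| over all thread pairs."""
    return sum(abs(x[i] - x[j])
               for i in range(len(x)) for j in range(i + 1, len(x)))

def m0(t_shared, t_alone):
    """M0: based on execution-time slowdowns X_i = T_shared_i / T_alone_i."""
    return pairwise_diff([s / a for s, a in zip(t_shared, t_alone)])

def m3(missrate_shared, missrate_alone):
    """M3: same idea, approximated with miss rates (easy to measure online)."""
    return pairwise_diff([s / a for s, a in zip(missrate_shared, missrate_alone)])

# Example mirroring slide 22: P1 is unaffected (20%/20% = 1.0) while P2 is
# slowed 3x (15%/5%), so the metric is far from zero.
print(m3([20, 15], [20, 5]))  # 2.0
```

A scheduler or partitioning policy would drive this metric toward zero; the closer it is to zero, the more uniform the slowdown across co-scheduled threads.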

17 Outline
– Fairness Metrics
– Static Fair Caching Algorithms (See Paper)
– Dynamic Fair Caching Algorithms
– Evaluation Environment
– Evaluation
– Conclusions

18 Partitionable Cache Hardware
Modified LRU cache replacement policy (G. Suh et al., HPCA 2002).
[Figure: on a P2 miss, the current partition (P1: 448B, P2: 576B) differs from the target partition (P1: 384B, P2: 640B), so the LRU line belonging to P1 is chosen as the victim.]

19 Partitionable Cache Hardware
Modified LRU cache replacement policy (G. Suh et al., HPCA 2002).
[Figure: after the replacement, the current partition (P1: 384B, P2: 640B) matches the target partition.]
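The victim-selection rule these slides illustrate can be sketched as follows. This is a hypothetical software model, not the hardware logic: a flat list stands in for the lines of a cache set, and `pick_victim` is an invented name. The idea, per the modified LRU policy, is to evict the LRU line of a thread whose current allocation exceeds its target, falling back to plain LRU when the partitions already match.

```python
def pick_victim(lines, current, target):
    """
    lines:   list of (owner, age) tuples; larger age = older (more LRU).
    current: dict mapping thread -> bytes it currently holds.
    target:  dict mapping thread -> its target partition in bytes.
    Returns the index of the line to evict.
    """
    # Threads holding more than their target are over-allocated.
    over = [t for t in current if current[t] > target[t]]
    candidates = [i for i, (owner, _) in enumerate(lines) if owner in over]
    if not candidates:                    # partitions already match targets:
        candidates = range(len(lines))    # fall back to plain LRU
    return max(candidates, key=lambda i: lines[i][1])

# Example mirroring slide 18: P1 holds 448B against a 384B target, so a P2
# miss evicts P1's LRU line, moving the current partition toward the target.
lines = [("P1", 5), ("P1", 9), ("P2", 7), ("P2", 3)]
victim = pick_victim(lines, {"P1": 448, "P2": 576}, {"P1": 384, "P2": 640})
print(lines[victim])  # ('P1', 9)
```

Because eviction only ever takes lines from over-allocated threads, the current partition converges to the target partition one miss at a time, without flushing or moving any data.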

20 Dynamic Fair Caching Algorithm
Example: optimizing the M3 metric. The algorithm keeps, per repartitioning interval: MissRate_alone and MissRate_shared for P1 and P2, and the target partition.

21 Dynamic Fair Caching Algorithm
1st interval: MissRate_alone is P1: 20%, P2: 5%. The measured MissRate_shared is P1: 20%, P2: 15%. Target partition: P1: 256KB, P2: 256KB.

22 Dynamic Fair Caching Algorithm
Repartition! Evaluate M3: P1: 20% / 20% = 1.0, P2: 15% / 5% = 3.0. P2 suffers the larger slowdown, so the target partition becomes P1: 192KB, P2: 320KB (partition granularity: 64KB).

23 Dynamic Fair Caching Algorithm
2nd interval: under the new target partition (P1: 192KB, P2: 320KB), the measured MissRate_shared is P1: 20%, P2: 10%.

24 Dynamic Fair Caching Algorithm
Repartition! Evaluate M3: P1: 20% / 20% = 1.0, P2: 10% / 5% = 2.0. P2 is still slowed down more, so the target partition becomes P1: 128KB, P2: 384KB.

25 Dynamic Fair Caching Algorithm
3rd interval: under the target partition P1: 128KB, P2: 384KB, the measured MissRate_shared is P1: 25%, P2: 9%.

26 Dynamic Fair Caching Algorithm
Repartition! Rollback check: for P2, Δ = MR_old − MR_new = 10% − 9% = 1%. Roll back if Δ < T_rollback: here the gain is too small, so the target partition reverts from P1: 128KB, P2: 384KB to P1: 192KB, P2: 320KB.
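The repartitioning loop walked through on slides 20–26 can be sketched as below. This is a hedged reconstruction: the function names and the exact rollback rule are assumptions for illustration, and the real algorithm runs in hardware at every repartitioning interval.

```python
GRANULARITY = 64  # KB, the repartitioning granularity from the slides

def repartition(target, missrate_shared, missrate_alone):
    """Shrink the least-slowed thread's partition; grow the most-slowed one."""
    x = {t: missrate_shared[t] / missrate_alone[t] for t in target}
    best, worst = min(x, key=x.get), max(x, key=x.get)
    if best != worst and target[best] > GRANULARITY:
        target[best] -= GRANULARITY
        target[worst] += GRANULARITY
    return target

def maybe_rollback(target, prev_target, mr_old, mr_new, t_rollback):
    """If the miss rate improved by less than T_rollback, undo the change."""
    if mr_old - mr_new < t_rollback:
        return dict(prev_target)
    return target

# Example mirroring slides 21-22: P1 at 20%/20%, P2 at 15%/5%, so P2 is the
# most-slowed thread and gains 64KB at P1's expense.
target = repartition({"P1": 256, "P2": 256},
                     {"P1": 0.20, "P2": 0.15},
                     {"P1": 0.20, "P2": 0.05})
print(target)  # {'P1': 192, 'P2': 320}
```

The rollback step mirrors slide 26: a repartitioning that yields only a marginal miss-rate improvement (Δ below T_rollback) is reverted, which keeps the algorithm from chasing noise.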

27 Fair Caching Overhead
Partitionable cache hardware.
Profiling:
– Static profiling for M1, M3
– Dynamic profiling for M1, M3, M4
Storage:
– Per-thread registers holding the miss rate/count for the "alone" case and for the "shared" case
Repartitioning algorithm:
– < 100 cycles of overhead in a 2-core CMP
– Invoked at every repartitioning interval

28 Outline
– Fairness Metrics
– Static Fair Caching Algorithms (See Paper)
– Dynamic Fair Caching Algorithms
– Evaluation Environment
– Evaluation
– Conclusions

29 Evaluation Environment
UIUC's SESC simulator (cycle accurate).
– CMP cores: 2 cores, each 4-issue dynamic, 3.2GHz
– L1 I/D (private): WB, 32KB, 4-way, 64B line, RT: 3 cycles
– L2 unified (shared): WB, 512KB, 8-way, 64B line, RT: 14 cycles
– L2 replacement: LRU or pseudo-LRU
– RT memory latency: 407 cycles

30 Evaluation Environment
Algorithm parameters:
– Repartitioning granularity: 64KB
– Repartitioning interval: 10K, 20K, 40K, 80K L2 accesses
– T_rollback: 0%, 5%, 10%, 15%, 20%, 25%, 30%
– 18 benchmark pairs
Static algorithms: FairM1. Dynamic algorithms: FairM1Dyn, FairM3Dyn, FairM4Dyn.

31 Outline
– Fairness Metrics
– Static Fair Caching Algorithms (See Paper)
– Dynamic Fair Caching Algorithms
– Evaluation Environment
– Evaluation
  – Correlation results
  – Static fair caching results
  – Dynamic fair caching results
  – Impact of rollback threshold
  – Impact of time interval
– Conclusions

32 Correlation Results
[Figure]

33 Correlation Results
M1 & M3 show the best correlation with M0.

34 Static Fair Caching Results
[Figure]

35 Static Fair Caching Results
FairM1 achieves throughput comparable to MinMiss, with better fairness.

36 Static Fair Caching Results
Opt confirms that better fairness is achieved without throughput loss.

37 Dynamic Fair Caching Results
[Figure]

38 Dynamic Fair Caching Results
FairM1Dyn and FairM3Dyn show the best fairness and throughput.

39 Dynamic Fair Caching Results
Improvement in fairness results in a throughput gain.

40 Dynamic Fair Caching Results
Fair caching sometimes degrades throughput (in 2 out of 18 pairs).

41 Impact of Rollback Threshold in FairM1Dyn
[Figure]

42 Impact of Rollback Threshold in FairM1Dyn
A T_rollback of 20% shows the best fairness and throughput.

43 Impact of Repartitioning Interval in FairM1Dyn
[Figure]

44 Impact of Repartitioning Interval in FairM1Dyn
A repartitioning interval of 10K L2 accesses shows the best fairness and throughput.

45 Outline
– Fairness Metrics
– Static Fair Caching Algorithms (See Paper)
– Dynamic Fair Caching Algorithms
– Evaluation Environment
– Evaluation
– Conclusions

46 Conclusions
Problems of unfair cache sharing:
– Sub-optimal throughput
– Thread starvation
– Priority inversion
– Thread-mix dependent throughput
Contributions:
– Cache fairness metrics
– Static/dynamic fair caching algorithms
Benefits of fair caching:
– Fairness: 4x improvement
– Throughput: 15% improvement, comparable to the cache miss minimization approach
– Fair caching simplifies scheduler design
– Simple hardware support

47 Partitioning Histogram
Mostly oscillating between two partitioning choices.

48 Partitioning Histogram
A T_rollback of 35% can still find a better partition.

49 Impact of Partition Granularity in FairM1Dyn
A granularity of 64KB shows the best fairness and throughput.

50 Impact of Initial Partition in FairM1Dyn
Differences across various initial partitions are tolerable.

51 Impact of Initial Partition in FairM1Dyn
An initially equal partition alleviates the local-optimum problem.

52 Speedup over Batch Scheduling
FairM1Dyn and FairM3Dyn show the best speedup.

