1 Energy-efficiency potential of a phase-based cache resizing scheme for embedded systems. G. Pokam and F. Bodin

2 Plan
- Motivation
- Previous work
- Our approach
- Cache model
- Trace-based analysis
- Experimental setup
- Program behavior
- Preliminary results
- Conclusions and future work

3 Motivation (1/3)
- High performance is difficult to reconcile with low power
- Consider the cache hierarchy, for instance. Benefits of large caches:
  - maintain the embedded code + data workload on-chip
  - reduce off-chip memory traffic
- However:
  - caches account for ~80% of the transistor count
  - we usually devote half of the chip area to caches

4 Motivation (2/3)
- Cache impact on energy consumption:
  - static energy is disproportionately large compared to the rest of the chip: 80% of the transistors contribute steadily to the leakage power
  - dynamic energy (transistor switching activity) represents an important fraction of the total energy, due to the high access frequency of caches
- Cache design is therefore critical in the context of high-performance embedded systems

5 Motivation (3/3) (figure)

6 Previous work (1/2)
- Some configurable cache proposals that apply to embedded systems include:
  - Albonesi [MICRO'99]: selective cache ways, to disable/enable individual cache ways of a highly set-associative cache
  - Zhang & al. [ISCA'03]: way-concatenation, to reduce the cache associativity while still maintaining the full cache capacity

7 Previous work (2/2)
- These approaches only consider configuration on a per-application basis
- Problems:
  - empirically, no single best cache size exists for a given application
  - the dynamic cache behavior varies within an application, and from one application to another
- Therefore, these approaches do not adapt well to program phase changes

8 Our approach
- Objective: emphasize application-specific cache architectural parameters
- To do so, we consider a cache with a fixed line size and a modulus set-mapping function; power/performance is dictated by size and associativity
- Not all dynamic program phases may have the same requirements on cache size and associativity!
- Dynamically vary size and associativity to leverage the power/performance tradeoff at phase level

9 Cache model (1/8)
- Baseline cache model: the way-concatenation cache [Zhang ISCA'03]
- Functionality of the way-concatenation cache:
  - on each cache lookup, configuration logic selects the m active cache ways out of the n available cache ways
  - virtually, each active cache way is a multiple of the size of a single bank in the base n-way cache
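The addressing consequence of way-concatenation can be sketched as follows. This is a minimal model, not the authors' hardware: the bank count, sets per bank, and line size are assumptions chosen to match the 32KB 4-way, 32B-line cache used later in the talk. The point it illustrates is that total capacity is constant, so the set-index field widens as associativity shrinks.

```python
# Hedged sketch of way-concatenation addressing: a 4-bank cache keeps
# full capacity at every associativity by concatenating banks into
# wider ways, so lowering the associativity raises the set count.

NUM_BANKS = 4          # physical banks (base 4-way cache)
SETS_PER_BANK = 256    # assumed: 32 KB / 4 ways / 32 B lines
LINE_SIZE = 32         # bytes per cache line

def decode(addr, ways):
    """Return (set_index, tag) for a way-concatenated configuration."""
    assert ways in (1, 2, 4)
    # fewer ways => each way concatenates more banks => more sets
    sets = SETS_PER_BANK * (NUM_BANKS // ways)
    set_index = (addr // LINE_SIZE) % sets
    tag = addr // (LINE_SIZE * sets)
    return set_index, tag
```

For example, address 32768 maps to set 0 in every configuration here, but its tag differs because the index field has a different width in each one.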

10 Cache model (2/8)
- Our proposal:
  - modify the associativity while guaranteeing cache coherency
  - modify the cache size while preserving data availability on unused cache portions

11 Cache model (3/8)
- First enhancement: associativity level
- Problem with the baseline model; consider the following scenario (banks 0-3, address A):
  - Phase 0: 32K 2-way, active banks are 0 and 2; address A is cached
  - Phase 1: 32K 1-way, active bank is 2; A is modified, so the old copy of A left in a now-inactive bank needs invalidation

12 Cache model (4/8)
- Proposed solution:
  - assume a write-through cache
  - the unused tag and status arrays must be made accessible on a write to ensure coherency across cache configurations => associative tag array
  - action of the cache controller: access all tag arrays on a write request to set the corresponding status bit to invalid
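The controller action above can be sketched in a few lines. This is an illustrative model under the slide's assumptions (write-through, so no dirty data to write back); the class and method names are mine, not the authors':

```python
# Hedged sketch of the first enhancement: on a write, probe the tag
# arrays of ALL banks (active or not) and invalidate any stale copy,
# so a later reconfiguration cannot expose old data.

class TagEntry:
    def __init__(self):
        self.tag = None
        self.valid = False

class WayConcatCache:
    def __init__(self, banks=4, sets=256):
        self.tags = [[TagEntry() for _ in range(sets)] for _ in range(banks)]

    def write(self, set_index, tag, active_banks):
        # 1) invalidate matching entries in every bank, even inactive ones
        for bank in self.tags:
            entry = bank[set_index]
            if entry.valid and entry.tag == tag:
                entry.valid = False
        # 2) allocate in an active bank (replacement policy elided)
        entry = self.tags[active_banks[0]][set_index]
        entry.tag, entry.valid = tag, True
```

In the slide 11 scenario, a phase-1 write to A through bank 2 invalidates the phase-0 copy of A sitting in the inactive bank.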

13 Cache model (5/8)
- Second enhancement: cache size level
- Problem with the baseline model:
  - gated-Vdd is used to disconnect a bank => data are not preserved across two configurations!
- Proposed solution:
  - unused cache ways are put in a low-power mode => drowsy mode [Flautner & al. ISCA'02]
  - the tag portion is left unchanged!
- Main advantage: we can reduce the cache size and preserve the state of the unused memory cells across program phases, while still reducing leakage energy!

14 Cache model (6/8)
- Overall cache model (figure)

15 Cache model (7/8)
- Modified cache line (DVS is assumed) (figure)

16 Cache model (8/8)
- The drowsy circuitry accounts for less than 3% of the chip area
- Accessing a line in drowsy mode requires a 1-cycle delay [Flautner & al. ISCA'02]
- ISA extension: we assume the ISA can be extended with a reconfiguration instruction acting on the WCR
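One plausible reading of the reconfiguration instruction's effect, combining the two enhancements, is sketched below. The encoding (a bitmask of active ways) and names are hypothetical; the behavior shown, putting deselected data arrays into a state-preserving drowsy mode while leaving tags powered, is what the preceding slides describe:

```python
# Hedged sketch of a WCR write: select the active ways for the next
# phase and move newly unused data arrays into drowsy mode. Drowsy
# banks keep their contents (state-preserving low-Vdd mode); the tag
# arrays stay at full Vdd so they remain accessible for invalidation.

DROWSY, AWAKE = 0, 1

class CacheConfig:
    def __init__(self, banks=4):
        self.data_state = [AWAKE] * banks

    def write_wcr(self, active_mask):
        """Bit i of active_mask selects bank i as an active way."""
        for i in range(len(self.data_state)):
            self.data_state[i] = AWAKE if (active_mask >> i) & 1 else DROWSY
```

Waking a drowsy bank on a later WCR write (or on an access) would cost the 1-cycle delay noted above.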

17 Trace-based analysis (1/3)
- Goal: extract performance and energy profiles from the trace in order to adapt the cache structure to the dynamic application requirements
- Assumptions:
  - LRU replacement policy
  - no prefetching

18 Trace-based analysis (2/3)
- Define a sampling interval, a set-mapping function (for varying the associativity), and an LRU-stack distance d (for varying the cache size)
- Then define the LRU-stack profiles:
  - performance: for each (set-mapping function, stack distance) pair, the profile gives the number of dynamic references that hit in caches with that LRU-stack distance
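The performance profile above exploits a classic LRU property: under LRU, a reference hits in any cache whose associativity exceeds the reference's stack distance within its set, so one trace pass yields hit counts for a whole family of configurations. A minimal sketch, with the function name and list-based stack implementation being mine:

```python
# Hedged sketch of the LRU-stack performance profile: count, per
# stack distance, the dynamic references that hit at that distance.
# Misses (first-touch lines) are simply not counted here.

from collections import defaultdict

def lru_stack_profile(trace, num_sets, line_size=32):
    """trace: iterable of byte addresses. Returns {distance: hit count}."""
    stacks = defaultdict(list)        # per-set LRU stack of line addresses
    profile = defaultdict(int)
    for addr in trace:
        line = addr // line_size
        s = line % num_sets           # modulus set-mapping function
        stack = stacks[s]
        if line in stack:
            d = stack.index(line)     # 0 = MRU; hits if associativity > d
            profile[d] += 1
            stack.remove(line)
        stack.insert(0, line)         # promote to MRU position
    return dict(profile)
```

Varying `num_sets` replays the same trace against the different set-mapping functions of slide 9's way-concatenated configurations.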

19 Trace-based analysis (3/3)
- Energy: the profile sums the cache (data array) energy, the tag energy, the drowsy-transition energy, and the memory energy
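The slide's energy breakdown can be written as one equation; the symbol names here are my assumption, not the authors' notation:

```latex
E_{\text{total}} \;=\; E_{\text{cache}} \;+\; E_{\text{tag}}
  \;+\; N_{\text{trans}} \cdot e_{\text{drowsy}}
  \;+\; N_{\text{miss}} \cdot e_{\text{mem}}
```

with $N_{\text{trans}}$ the number of drowsy transitions and $N_{\text{miss}}$ the number of off-chip accesses in the sampling interval, and with $E_{\text{cache}}$ and $E_{\text{tag}}$ each combining a dynamic (per-access) and a leakage component for the simulated configuration.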

20 Experimental setup (1/2)
- Focus on the data cache
- Simulation platform:
  - 4-issue VLIW processor [Faraboschi & al. ISCA'00]
  - 32KB 4-way data cache, 32B block size, 20-cycle miss penalty
- Benchmarks:
  - MiBench: fft, gsm, susan
  - MediaBench: mpeg, epic
  - PowerStone: summin, whetstone, v42bis

21 Experimental setup (2/2)
- CACTI 3.0 to obtain energy values; we extend it to provide leakage energy values for each simulated cache configuration
- HotLeakage, from which we adapted the leakage energy calculation for each simulated leakage-reduction technique
- Estimated memory ratio = 50; drowsy energy from [Flautner & al. ISCA'02]

22 Program behavior (1/4)
- GSM (figure, log10 scale: profiles for the 32K, 16K, and 8K configurations, annotated with a capacity-miss effect, a tradeoff region, a sensitive region, and an insensitive region)

23 Program behavior (2/4)
- FFT (figure)

24 Program behavior (3/4)
- Working-set size sensitivity property: the working set can be partitioned into clusters with similar cache sensitivity
- Capturing sensitivity through working-set size clustering: the partitioning is done relative to the base cache configuration
- We use a simple metric based on the Manhattan distance between two points
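The clustering metric can be sketched concretely. Only the L1 distance itself comes from the slide; the greedy grouping loop and the threshold value are illustrative assumptions about how such a partitioning could be driven:

```python
# Hedged sketch of working-set clustering by Manhattan (L1) distance:
# phases whose profile points lie within a threshold (value assumed)
# of a cluster's first point are grouped as having similar sensitivity.

def manhattan(p, q):
    """L1 distance between two equal-length profile vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def cluster(points, threshold=2.0):
    """Greedy single-pass clustering; each cluster is (seed, members)."""
    clusters = []
    for p in points:
        for seed, members in clusters:
            if manhattan(p, seed) <= threshold:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))   # p starts a new cluster
    return clusters
```

Phases landing in the same cluster could then share one cache configuration, amortizing reconfiguration cost.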

25 Program behavior (4/4)
- More energy/performance profiles: summin, whetstone (figures)

26 Results (1/3)
- Dynamic energy reduction (figure)

27 Results (2/3)
- Leakage energy savings (0.07um); better savings due to gated-Vdd (figure)

28 Results (3/3)
- Performance: worst-case degradation (65% due to drowsy transitions) (figure)

29 Conclusions and future work
- We can do better for improving performance: reduce the frequency of drowsy transitions within a phase with refined cache bank access policies
- Manage reconfiguration at the compiler level: insert basic-block annotations in the trace; exploit feedback-directed compilation
- A promising scheme for embedded systems

