Chuanjun Zhang, UC Riverside 1 Low Static-Power Frequent-Value Data Caches Chuanjun Zhang*, Jun Yang, and Frank Vahid** *Dept. of Electrical Engineering.


1 Low Static-Power Frequent-Value Data Caches
Chuanjun Zhang*, Jun Yang, and Frank Vahid**
*Dept. of Electrical Engineering
Dept. of Computer Science and Engineering
University of California, Riverside
**Also with the Center for Embedded Computer Systems at UC Irvine
This work was supported in part by the National Science Foundation and the Semiconductor Research Corporation.

2 Leakage Power Dominates
Growing impact of leakage power:
Leakage power increases as transistor lengths and threshold voltages are scaled down.
The power budget limits the use of fast, leaky transistors.
Caches consume much static power:
Caches account for most of the transistors on a die.
Related work:
DRG: dynamically resizes the cache by monitoring the miss rate.
Cache line decay: dynamically turns off cache lines.
Drowsy cache: puts cache lines into a low-leakage mode.

3 Frequent Values in Data Cache
Frequently accessed values behavior (J. Yang and R. Gupta, Micro 2002).
[Figure: a microprocessor reading address/data pairs from the L1 data cache. The data read out from the L1 data cache includes values such as 00000000, 00100000, FFFFFFFF, 00000234, and FFFF1234, with 00000000 and FFFFFFFF recurring.]

4 Frequent Values in Data Cache
32 FVs account for around 36% of the total data cache accesses for 11 SPEC 95 benchmarks (J. Yang and R. Gupta, Micro 2002).
FVs can be dynamically captured.
FVs are also widespread within the data cache: they are not just frequently accessed, but also stored throughout.
FVs are stored in encoded form. 4 or 5 bits represent 16 or 32 FVs.
Non-FVs are stored in unencoded form.
The set of frequent values remains fixed for a given program run.
[Figure: example FVs (00000000, 00100000, FF000000, FFFFFFFF) shown among the values accessed ("FVs accessed") and among the values stored in the data cache ("FVs in D$").]

5 Original Frequent Value Data Cache Architecture
The data cache memory is separated into a low-bit array and a high-bit array.
5 bits encode 32 FVs.
The remaining 27 bits are not accessed for FVs.
A register file holds the decoded values.
Dynamic power is reduced.
Accessing a non-FV takes two cycles.
Flag bit: 1 = FV; 0 = NFV.
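
The encoding described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' hardware design: the function names, the software FV table, and the choice to place the low 5 bits of a non-FV in the low-bit array are all hypothetical. The flag convention (1 = FV, 0 = NFV) follows this slide.

```python
from typing import List, Tuple

INDEX_BITS = 5  # 5 bits can index 2**5 = 32 frequent values


def encode_word(word: int, fv_table: List[int]) -> Tuple[int, int, int]:
    """Return (flag, low_bits, high_bits) for one 32-bit word.

    flag = 1 marks a frequent value (FV), flag = 0 a non-FV.
    The low/high split of a non-FV is an assumption made for illustration.
    """
    if len(fv_table) > 2 ** INDEX_BITS:
        raise ValueError("FV table does not fit in the index width")
    word &= 0xFFFFFFFF
    if word in fv_table:
        # FV: only the 5-bit index is stored; the 27 high bits are unused.
        return 1, fv_table.index(word), 0
    # Non-FV: stored unencoded, split across the 5-bit and 27-bit arrays.
    return 0, word & 0x1F, word >> INDEX_BITS


def decode_word(flag: int, low: int, high: int, fv_table: List[int]) -> int:
    """Rebuild the 32-bit word from its stored form."""
    if flag == 1:
        return fv_table[low]  # look up the frequent value by index
    return (high << INDEX_BITS) | low


# Hypothetical FV table using values that appear in the slides' figures.
fv_table = [0x00000000, 0xFFFFFFFF, 0x00100000, 0xFF000000]
stored = encode_word(0xFFFFFFFF, fv_table)
restored = decode_word(*stored, fv_table)
```

For an FV, only the flag and the 5-bit index carry information, which is why the 27-bit portion does not need to be read.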

6 New FV Cache Design: One-Cycle Access to Non-FVs
There is no extra delay in determining whether the 27-bit portion must be accessed.
Leakage energy is proportional to program execution time.
The new driver is as fast as the original, achieved by tuning the NAND gate's transistor parameters.
Flag bit: 0 = FV; 1 = NFV.
[Figure: (a) original word line driver and (b) new word line driver, which combines the decoder output with the flag bit. The original cache line architecture is a single 32-bit line; the new cache line architecture uses subbanking, with flag bits, a 5-bit portion, and a 27-bit portion driven by the new driver. The figure also carries a "20 bits" label.]
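
A rough way to see the latency difference between the two designs is to compute the average number of cycles per cache hit. The sketch below is a simplified model, not the evaluation method used in the slides: it ignores misses and pipeline effects, and the function name is hypothetical. The 0.36 figure is the share of accesses to FVs quoted earlier for SPEC 95 and is used purely as an illustrative input.

```python
def average_access_cycles(fv_fraction: float, non_fv_cycles: int) -> float:
    """Average cycles per cache hit when FV hits take 1 cycle.

    fv_fraction: share of accesses that hit a frequent value (0.0 to 1.0).
    non_fv_cycles: cycles for a non-FV access (2 in the original design,
    1 in the new design).
    """
    if not 0.0 <= fv_fraction <= 1.0:
        raise ValueError("fv_fraction must be between 0 and 1")
    return fv_fraction * 1 + (1.0 - fv_fraction) * non_fv_cycles


# Illustrative input only: 36% of accesses go to FVs.
two_cycle_design = average_access_cycles(0.36, non_fv_cycles=2)
one_cycle_design = average_access_cycles(0.36, non_fv_cycles=1)
```

Under this model the original design averages about 1.64 cycles per hit and the new design 1 cycle. The effect on IPC depends on the rest of the processor and cannot be derived from this number alone.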

7 Low-Leakage SRAM Cell and Flag Bit
SRAM cell with a pMOS gated-Vdd control.
[Figure: an SRAM cell with gated-Vdd control (Vdd, Gnd, bitlines, Gated_Vdd control) and the flag bit SRAM cell, whose flag bit output drives the gated-Vdd control. Shown alongside the new cache line architecture with subbanking: flag bits, new driver, decoder output, and the 27-bit portion.]

8 Experiments
Simulator: SimpleScalar.
Eleven SPEC 2000 benchmarks.
Fast-forward the first 1 billion instructions and execute 500M.
[Table: configuration of the simulated processor.]

9 Performance Improvement of One-Cycle Access to Non-FVs
Two-cycle access to non-FVs hurts performance and hence increases leakage energy, since leakage energy grows with execution time.
One-cycle access to non-FVs achieves a 5.5% performance improvement, and hence reduces leakage energy correspondingly.
[Figure: hit rate of FVs in the data cache.]
[Figure: performance (IPC) improvement of the one-cycle FV cache vs. the two-cycle FV cache; 5.5% on average.]

10 Distribution of FVs in Data Cache
FVs are widely found in data cache memory: on average, 49.2% of words.
Leakage power reduction is proportional to the percentage occurrence of FVs.
[Figure: percentage of data cache words (on average) that are FVs.]
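
The proportionality stated on this slide can be illustrated with a back-of-the-envelope calculation. The sketch below estimates only the share of data-array bit cells that are candidates for being gated off, assuming a 32-bit word in which the 27-bit portion of every FV word can be shut off. It is not an energy figure: it ignores flag-bit overhead, residual leakage of gated cells, and all other cache structures. The function name is hypothetical.

```python
def gateable_bit_fraction(fv_word_fraction: float,
                          word_bits: int = 32,
                          index_bits: int = 5) -> float:
    """Fraction of data bit cells whose supply could be gated off.

    Assumes each FV word leaves (word_bits - index_bits) cells unused.
    """
    if not 0.0 <= fv_word_fraction <= 1.0:
        raise ValueError("fv_word_fraction must be between 0 and 1")
    unused_bits = word_bits - index_bits  # 27 for a 32-bit word, 5-bit index
    return fv_word_fraction * unused_bits / word_bits


# 49.2% is the average share of cache words that are FVs, from this slide.
fraction = gateable_bit_fraction(0.492)
```

With 49.2% of words being FVs, roughly 41.5% of the data bit cells are candidates for gating under these assumptions.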

11 Static Energy Reduction
33% total static energy savings for data caches.

12 How to Determine the FVs
Application-specific processors: the FVs can first be identified offline through profiling, and then synthesized into the cache so that power consumption is optimized for the hard-coded FVs.
Processors that run multiple applications: the FVs can be located in a register file to which different applications can write a different set of FVs.
Dynamically determined FVs: embed the process of identifying and updating FVs into registers, so that the design adapts dynamically and transparently to different workloads with different inputs.
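
The offline profiling option can be sketched as a simple frequency count over a trace of accessed data words. This is a minimal illustration, not the authors' profiling tool: the function name and the trace format (an iterable of 32-bit integers) are assumptions.

```python
from collections import Counter
from typing import Iterable, List


def top_frequent_values(accessed_words: Iterable[int], n: int = 32) -> List[int]:
    """Return the n most frequently accessed 32-bit values, most frequent first."""
    counts = Counter(word & 0xFFFFFFFF for word in accessed_words)
    return [value for value, _ in counts.most_common(n)]


# Hypothetical trace of data words read from the cache.
trace = [0x00000000, 0x00000000, 0x00000000,
         0xFFFFFFFF, 0xFFFFFFFF, 0x00000234]
fv_set = top_frequent_values(trace, n=2)
```

The resulting list could then be hard-coded for an application-specific processor or written to an FV register file when an application is loaded.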

13 Conclusion
Two improvements to the original FV data cache:
One-cycle access to non-FVs, which improves performance (5.5%) and hence reduces static leakage energy.
Shutting off the unused 27-bit portion of an FV.
The scheme does not increase the data cache miss rate.
The scheme further reduces data cache static energy by over 33% on average.

