Kyushu University Las Vegas, June 2007 The Effect of Nanometer-Scale Technologies on the Cache Size Selection for Low Energy Embedded Systems.

Kyushu University ESA’07 @ Las Vegas, June 2007 The Effect of Nanometer-Scale Technologies on the Cache Size Selection for Low Energy Embedded Systems Hamid Noori, Maziar Goudarzi, Koji Inoue, and Kazuaki Murakami Kyushu University

Outline
Motivations and Observations
Energy Evaluation
Problem Definition
Experimental Results
Conclusion

Outline
Motivations and Observations
Problem Formulation
Energy Evaluation Model
Experimental Results
Conclusion

Motivations and Observations (1/2)
Caches contribute a large portion of the energy consumption in embedded systems
Leakage power is increasing in new nanometer-scale technologies

Motivations and Observations (2/2)
4-way set-associative cache with 16-byte block size
Dynamic: the 180nm value is about 4x the 100nm value and 9x the 70nm value (CACTI 4.1)
Static: the 70nm value is about 400x the 180nm value and 5x the 100nm value (CACTI 4.1)

Goal
The effect of different nanometer-scale technologies on cache configuration selection in low-energy embedded systems

Outline
Energy Evaluation

Energy Evaluation (1/3)
energy_memory(Config, Tech) = energy_dynamic(Config, Tech) + energy_static(Config, Tech)

Energy Evaluation (2/3)
energy_dynamic(Config, Tech) = cache_accesses(Config) * energy_cache_access(Config, Tech) + cache_misses(Config) * energy_miss(Config, Tech)
energy_miss(Config, Tech) = energy_off_chip_access + energy_cache_block_refill(Config, Tech)
energy_static(Config, Tech) = executed_clock_cycles(Config) * clock_period * leakage_power(Config, Tech)

Energy Evaluation (3/3)
SimpleScalar
–cache_accesses
–cache_misses
–executed_clock_cycles
CACTI 4.1
–energy_cache_access
–energy_cache_block_refill
–leakage_power
energy_off_chip_access = 20 nJ
Clock freq = 200MHz
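The energy model on the Energy Evaluation slides can be sketched in Python as follows. This is a minimal illustration, not the authors' tooling: the CacheProfile class, its field names, and the example numbers are hypothetical. Only the 20 nJ off-chip access energy and the 200 MHz clock come from the slides. In practice the counts would come from SimpleScalar and the per-access energy and leakage figures from CACTI 4.1.

```python
from dataclasses import dataclass

ENERGY_OFF_CHIP_ACCESS_J = 20e-9  # 20 nJ per off-chip access (from the slides)
CLOCK_PERIOD_S = 1.0 / 200e6      # 200 MHz clock (from the slides)


@dataclass(frozen=True)
class CacheProfile:
    """Inputs for one (Config, Tech) pair. Class and field names are hypothetical."""
    cache_accesses: int               # simulator output (SimpleScalar in the slides)
    cache_misses: int                 # simulator output
    executed_clock_cycles: int        # simulator output
    energy_cache_access: float        # joules per access (CACTI 4.1 in the slides)
    energy_cache_block_refill: float  # joules per block refill
    leakage_power: float              # watts


def energy_miss(p: CacheProfile) -> float:
    # energy_miss = energy_off_chip_access + energy_cache_block_refill
    return ENERGY_OFF_CHIP_ACCESS_J + p.energy_cache_block_refill


def energy_dynamic(p: CacheProfile) -> float:
    # accesses * energy per access + misses * energy per miss
    return p.cache_accesses * p.energy_cache_access + p.cache_misses * energy_miss(p)


def energy_static(p: CacheProfile) -> float:
    # executed cycles * clock period * leakage power
    return p.executed_clock_cycles * CLOCK_PERIOD_S * p.leakage_power


def energy_memory(p: CacheProfile) -> float:
    return energy_dynamic(p) + energy_static(p)


if __name__ == "__main__":
    # Made-up inputs, chosen only to exercise the formulas.
    example = CacheProfile(
        cache_accesses=1000,
        cache_misses=10,
        executed_clock_cycles=2000,
        energy_cache_access=1e-9,
        energy_cache_block_refill=2e-9,
        leakage_power=0.01,
    )
    print(f"dynamic = {energy_dynamic(example):.3e} J")
    print(f"static  = {energy_static(example):.3e} J")
    print(f"total   = {energy_memory(example):.3e} J")
```

Keeping the static term separate makes it easy to see how a larger leakage_power, as in finer technologies, shifts the balance between the two terms.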

Outline
Problem Definition

Problem Definition
“For a given application, processor architecture, technology, and instruction- and data-cache organization (i.e. the cache associativity and line-size), find the cache size that results in minimum energy consumption (i.e. minimizes Equation 1 for a given technology) over the entire application run.”
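Read as a search problem, the definition above amounts to evaluating total energy for each candidate cache size in a given technology and keeping the smallest. A minimal sketch, assuming the per-size energies have already been computed: the function name and all energy values below are hypothetical and only illustrate that the minimum can land on different sizes in different technologies.

```python
from typing import Mapping


def select_min_energy_size(energy_by_size_kb: Mapping[int, float]) -> int:
    """Return the cache size (in KB) whose total energy is lowest.

    energy_by_size_kb maps each candidate size to the total energy of one
    application run in one technology. Ties go to the first size listed.
    """
    if not energy_by_size_kb:
        raise ValueError("no candidate cache sizes given")
    return min(energy_by_size_kb, key=energy_by_size_kb.__getitem__)


if __name__ == "__main__":
    # Made-up energies (arbitrary units), not measured results.
    coarse_tech = {16: 5.0, 32: 4.0, 64: 3.0}  # dynamic energy dominates
    fine_tech = {16: 2.0, 32: 2.6, 64: 3.9}    # static energy dominates
    print(select_min_energy_size(coarse_tech))
    print(select_min_energy_size(fine_tech))
```

An exhaustive sweep is enough here because the candidate set is small (the deck considers sizes from 1KB to 64KB).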

Outline
Experimental Results

Experimental Results
Applications from MiBench
SimpleScalar
CACTI 4.1
–Three technologies: 180nm, 100nm, and 70nm

Instruction Cache

Energy Evaluation for three different technologies - qsort

Energy Saving
The minimum-energy cache size falls at two different points: 64KB for 180nm, and 16KB for both 100nm and 70nm.
Selecting a 16KB instruction cache instead of 64KB reduces total energy by 38% in the 100nm process and by 55% in the 70nm process.
For this application (qsort), the saving comes at a performance penalty of 37%.
In the 180nm process, by contrast, a 64KB cache consumes 50% less energy than a 16KB cache; that is, a bigger cache used to result in less energy. As shown above, this trend is reversed in nanometer technologies.

Other Applications (energy saving and performance penalty in %)
             Cache size            100nm                70nm
Application  180nm  100nm  70nm    Saving   Penalty     Saving   Penalty
basicmath    32K    32K    32K     0.0      0.0         0.0      0.0
bitcounts    2K     2K     2K      0.0      0.0         0.0      0.0
cjpeg        16K    16K    4K      0.0      0.0         3.38     123.88
djpeg        16K    16K    4K      0.0      0.0         28.12    79.27
lame         32K    8K     8K      30.02    36.39       55.54    36.39
dijkstra     16K    16K    1K      0.0      0.0         14.41    211.07
patricia     32K    32K    32K     0.0      0.0         0.0      0.0
blowfish     32K    32K    8K      0.0      0.0         40.70    80.40
rijndael     32K    32K    16K     0.0      0.0         8.62     61.02
average                            3.33     4.04        16.75    65.78

Data Cache

Energy Evaluation for three different technologies - qsort

Energy Saving
According to the results, 32KB, 2KB and 1KB are the minimum-energy data cache sizes for 180nm, 100nm and 70nm, respectively.
The minimum-energy caches for the 100nm (2KB) and 70nm (1KB) technologies consume 88% and 56% less energy, respectively, than the minimum-energy cache of the 180nm process (32KB).
The corresponding performance penalties are only 9% and 14%, respectively.
In 180nm technology, the optimal cache size (32KB) consumes 28% and 40% less energy than the 2KB and 1KB caches, but this relation is reversed, with increasing significance, in 100nm and 70nm technologies.

Other Applications (energy saving and performance penalty in %)
             Cache size            100nm                70nm
Application  180nm  100nm  70nm    Saving   Penalty     Saving   Penalty
basicmath    4K     2K     2K      28.15    2.73        43.02    2.73
susan        8K     2K     2K      34.84    10.08       62.20    10.08
cjpeg        32K    8K     8K      48.13    12.21       66.22    12.21
djpeg        32K    8K     8K      25.46    25.96       58.71    25.96
lame         32K    16K    8K      21.93    12.97       47.52    53.85
dijkstra     32K    8K     8K      34.44    35.87       58.77    35.87
patricia     32K    8K     8K      57.04    9.85        77.69    24.79
blowfish     32K    8K     4K      57.91    11.43       69.28    52.10
rijndael     32K    16K    8K      36.61    9.00        59.98    33.89
sha          32K    1K     1K      74.53    13.7        91.34    13.72
average                            41.09    14.38       63.47    26.52

The effect of miss rate on optimal cache size for different technologies

Energy Evaluation

Results
For the direct-mapped cache, the minimum-energy cache size is 32KB for all three technologies.
For the 2-way cache, the minimum-energy sizes are 32KB, 16KB and 16KB for 180nm, 100nm and 70nm, respectively.
When the miss rate changes very sharply with cache size, dynamic energy dominates static energy, and so every technology arrives at the same cache size.
With a 2-way set-associative cache, however, the miss-rate curve flattens and static energy becomes more important again. That is why 100nm and 70nm have a different optimal point from 180nm in the 2-way cache.
Thus, as the miss-rate variation becomes gentler, the optimal cache sizes for different technologies move farther apart.
For the instruction cache, where the executed clock cycles change from 800 M to 17000 M (~21 times more), the optimal cache sizes are 64KB, 16KB and 16KB, whereas for the data cache, with its gentler variation from 800 M to 1000 M (only 1.2 times more), the minimum-energy cache sizes are 32KB, 2KB and 1KB.
In the case of the 2-way cache, the optimal cache size for the 100nm and 70nm processes (16KB in both) consumes 9% and 29% less energy, respectively, than the 180nm optimal cache (32KB), with a 25% performance loss.

Conclusions
The results show that when a low-energy embedded system is re-implemented in a new technology, the cache may need to be re-selected.
Our study showed that the sharper the slope of the miss rate across cache sizes, the less the optimal cache size varies between technologies.
The experiments showed that in all cases the optimal cache size decreases in finer technologies, despite the increase in misses and dynamic energy. This is due to the high impact of static energy in future technologies and confirms that, unlike in micrometer-scale technologies, simply adding more cache will not reduce total system energy; cache size must be reduced to minimize total system energy in future nanometer technologies.
In the data cache, which sees fewer accesses (and so less dynamic energy) than the instruction cache, this effect is magnified.
Since smaller caches are more suitable for low-energy systems in finer technologies, finding a cache configuration that simultaneously optimizes performance and energy will become increasingly difficult.

Thank you for your attention

Energy Saving & Performance Penalty
Energy Saving = (energy_cache180_NTech – energy_cacheNTech) / energy_cache180_NTech
Performance Penalty = (exec_time_cacheNTech – exec_time_cache180) / exec_time_cache180
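The two ratios above can be computed as below. This sketch rests on one reading of the slide's notation, which is an assumption: energy_cache180_NTech is taken to be the energy, in technology NTech, of the cache size that was optimal at 180nm, and energy_cacheNTech the energy of the size that is optimal in NTech itself. The function names and example inputs are hypothetical.

```python
def energy_saving(energy_cache180_ntech: float, energy_cache_ntech: float) -> float:
    """Fraction of energy saved in technology NTech by switching from the
    180nm-optimal cache size to the NTech-optimal cache size."""
    if energy_cache180_ntech <= 0:
        raise ValueError("reference energy must be positive")
    return (energy_cache180_ntech - energy_cache_ntech) / energy_cache180_ntech


def performance_penalty(exec_time_cache_ntech: float, exec_time_cache180: float) -> float:
    """Fractional increase in execution time caused by the same switch."""
    if exec_time_cache180 <= 0:
        raise ValueError("reference execution time must be positive")
    return (exec_time_cache_ntech - exec_time_cache180) / exec_time_cache180


if __name__ == "__main__":
    # Made-up inputs (arbitrary units), not values from the study.
    saving = energy_saving(energy_cache180_ntech=10.0, energy_cache_ntech=6.0)
    penalty = performance_penalty(exec_time_cache_ntech=1.25, exec_time_cache180=1.0)
    print(f"energy saving: {saving:.1%}, performance penalty: {penalty:.1%}")
```

Both ratios are fractions; multiplying by 100 gives percentages like those reported in the result tables.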

Instruction Cache – Energy Saving
100nm: 8%, 27%, and 41% for 20°C, 60°C, 100°C (max: 65%)
70nm: 1%, 6%, and 16% for 20°C, 60°C, 100°C (max: 45%)

Instruction Cache – Performance Penalty
100nm: 1%, 1.2%, and 2.2% for 20°C, 60°C, 100°C
70nm: 0.6%, 2.3%, and 16% for 20°C, 60°C, 100°C

Data Cache – Energy Saving
100nm: 3.3%, 25%, and 47% for 20°C, 60°C, 100°C (max: 75%)
70nm: 7%, 22%, and 33% for 20°C, 60°C, 100°C (max: 65%)

Data Cache – Performance Penalty
100nm: 0.8%, 5.3%, and 8% for 20°C, 60°C, 100°C
70nm: 3.6%, 10%, and 20% for 20°C, 60°C, 100°C

Architecture and Reconfiguration Flow for a Temperature-Aware Configurable Cache
Configurable Cache +
–Hardware
  Thermal sensor
  Accessible read port
–Software
  A table in the Operating System (OS) for recording temperature ranges and their suitable cache configurations
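The software side described above, a table that maps temperature ranges to cache configurations, could look roughly like the sketch below. Everything here is an assumption made for illustration: the table layout, the function names, the inclusive upper bounds, and the clamping outside the table. The two example ranges are borrowed from the qsort instruction-cache slide later in the deck (0°C to 80°C uses 64KB, 80°C to 100°C uses 32KB). Reading the thermal sensor and actually resizing the cache are hardware-specific and are left out.

```python
import bisect
from typing import List, Tuple

# Each entry is (upper bound of the range in °C, cache size in KB), sorted by
# upper bound. Treating the upper bound as inclusive is an assumption.
TEMPERATURE_TABLE: List[Tuple[float, int]] = [
    (80.0, 64),   # up to 80°C: 64KB
    (100.0, 32),  # above 80°C, up to 100°C: 32KB
]


def cache_size_for_temperature(
    temp_c: float, table: List[Tuple[float, int]] = TEMPERATURE_TABLE
) -> int:
    """Look up the cache size recorded for the range containing temp_c.

    Temperatures above the last range reuse the last entry, and temperatures
    below the first range use the first entry.
    """
    upper_bounds = [upper for upper, _ in table]
    index = bisect.bisect_left(upper_bounds, temp_c)
    index = min(index, len(table) - 1)
    return table[index][1]


def next_configuration(current_size_kb: int, temp_c: float) -> Tuple[int, bool]:
    """Return (size to use, whether a reconfiguration is needed)."""
    wanted = cache_size_for_temperature(temp_c)
    return wanted, wanted != current_size_kb


if __name__ == "__main__":
    for reading in (25.0, 80.0, 90.0):
        print(reading, next_configuration(64, reading))
```

Returning a flag alongside the size lets the caller skip reconfiguration when the temperature stays within the current range.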

Flow of configuring Temperature-Aware Configurable Cache

Temperature measurement accuracy (1/2)
T_j = T_a + θ_JA × P
–T_j: Junction Temperature
–T_a: Ambient Temperature
–P: Power
–θ_JA: Junction-to-Ambient Thermal Resistance

Temperature measurement accuracy (2/2)
Technology  Metric             ARM7TDMI   ARM966E-S
180nm       Power consumption  24.15 mW   140 mW
180nm       Frequency          115 MHz    200 MHz
130nm       Power consumption  7.98 mW    62.5 mW
130nm       Frequency          133 MHz    250 MHz
90nm        Power consumption  7.08 mW    51.7 mW
90nm        Frequency          236 MHz    470 MHz
θ_JA: 7°C/W ~ 35°C/W
ΔT = (T_j - T_a) ~ 5°C
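As a quick check of the ΔT figure, the relation T_j = T_a + θ_JA × P can be evaluated for the largest power in the table (ARM966E-S at 180nm, 140 mW) at both ends of the quoted thermal-resistance range. The function name is hypothetical; the inputs are the slide's own numbers.

```python
def junction_temperature(t_ambient_c: float, theta_ja_c_per_w: float, power_w: float) -> float:
    """T_j = T_a + theta_JA * P, with temperatures in °C and power in watts."""
    return t_ambient_c + theta_ja_c_per_w * power_w


if __name__ == "__main__":
    power_w = 0.140  # ARM966E-S at 180nm, 140 mW (from the table)
    for theta_ja in (7.0, 35.0):  # range of theta_JA quoted on the slide, in °C/W
        # With T_a = 0, the result is the temperature rise T_j - T_a.
        delta_t = junction_temperature(0.0, theta_ja, power_w)
        print(f"theta_JA = {theta_ja:4.1f} °C/W -> delta T = {delta_t:.2f} °C")
```

At the upper end of the range the rise is about 4.9°C, which agrees with the roughly 5°C stated on the slide; the lower-power entries in the table give smaller rises.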

Conclusions
Our results show that up to 66% and 45% of energy consumption can be saved for 100nm and 70nm, respectively, for the instruction cache when the temperature changes from 0°C to 100°C.
Due to the increasing leakage effect in finer technologies and at higher temperatures, smaller caches will be more energy efficient for future low-energy systems.
Since smaller caches are more suitable for low-energy systems in finer technologies and at higher temperatures, finding a cache configuration that simultaneously optimizes performance and energy will become increasingly difficult, especially at high temperatures.
Since the data cache is accessed less often than the instruction cache, it is more easily affected by temperature and technology. By using a configurable data cache, up to 74% and 64% of energy can be saved for 100nm and 70nm, respectively.

Thank you for your attention
Questions?

Motivations and Observations (3/4)
BSIM3 equation for subthreshold leakage

Experimental Results (1/)
Applications from MiBench
SimpleScalar
CACTI 4.1
–Three technologies: 180nm, 100nm, and 70nm
–Six Temperatures: 0°C, 20°C, 40°C, 60°C, 80°C, 100°C
Configurable Cache
–Size: 64KB~1KB

Qsort-Instruction Cache

Qsort-Instruction Cache
{0°C ~ 80°C} → 64KB, {80°C ~ 100°C} → 32KB
17% energy saving and 19.6% performance penalty

Qsort-Data Cache
2-way set-associative, 16-byte line size, 100nm.

Qsort-Data Cache
Fig. 12. Static energy for different data cache sizes (100nm).
