
Csaba Andras Moritz © 2007 Microprocessor Design in the Face of Process Variations Csaba Andras Moritz Electrical & Computer Engineering University of Massachusetts, Amherst Nov, 2007

Csaba Andras Moritz - Software Systems & Architecture Lab, Electrical & Computer Engineering; © 2007

Outline
- Introduction
- Impact of Process Variations
- A Process Variation Resilient Pipeline
- A Process Variation Resilient Adaptive Cache Architecture
- Results
- Conclusion

Introduction
As technology scales, feature sizes shrink, which requires an increasingly sophisticated fabrication process. Process variations grow as features shrink because small structures are difficult to fabricate consistently across a die or a wafer. These variations cause mismatches between nominally identical structures. At the circuit level, this translates into deviations of all device and interconnect parameters from their mean values.
Figure: Device and interconnect variation trends for different technology generations (Sani Nassif et al., "Models of Process Variations in Device and Interconnect," IEEE Press, 2000).

Introduction
Two main sources of process variation:
- Physical factors (intrinsic variation)
- Environmental factors (dynamic variation)
The physical factors are permanent and result from limitations in the fabrication process:
- Effective channel length (geometric variation): imperfections in photolithography (mask, lens, and photo-system deviations)
- Threshold voltage (electrical parameter variation): variation in device geometry, random dopant fluctuations, and changes in oxide thickness
The environmental factors depend on the operation of the circuit and include variations in:
- Temperature, power supply, and switching activity
These variations can greatly affect the performance and power consumption of integrated circuits.

Pipeline design: the delay of a pipeline stage is typically expressed in gate delays. Let us review the effect of variation using a NAND chain.

Figure: Chain of 15 NAND2 gates driving a load capacitance C_load. The side inputs are tied to "1"; with A = "1" and B switching "0" → "1", the output C switches "1" → "0". Body-bias voltages V_BN and V_BP are applied to the transistors.
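To see how random per-gate variation accumulates along such a chain, the sketch below samples a 15-gate chain with independent, normally distributed gate delays. The per-gate mean (90 ps) and sigma (10%) are hypothetical placeholders, not values from the HSPICE setup, and independence between gates is an assumption: correlated (systematic) variation would not average out this way.

```python
import random
import statistics

def sample_chain_delays(n_gates, gate_mean_ps, gate_sigma_ps, trials, seed=0):
    """Monte-Carlo samples of total chain delay (ps), assuming independent
    Gaussian per-gate delays (hypothetical model, not a circuit simulation)."""
    rng = random.Random(seed)
    return [
        sum(rng.gauss(gate_mean_ps, gate_sigma_ps) for _ in range(n_gates))
        for _ in range(trials)
    ]

# Hypothetical per-gate numbers: 90 ps mean, 9 ps sigma (10%).
delays = sample_chain_delays(n_gates=15, gate_mean_ps=90.0,
                             gate_sigma_ps=9.0, trials=20000)
mean = statistics.fmean(delays)
sigma = statistics.stdev(delays)
print(f"chain mean = {mean:.1f} ps, sigma = {sigma:.1f} ps, "
      f"sigma/mean = {sigma / mean:.3f}")
```

For independent gates the chain's relative spread shrinks roughly as 1/sqrt(n), so a 10% per-gate sigma becomes about 2.6% over 15 gates; worst-case corner analysis, by contrast, assumes every gate is slow at once.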

Assumptions
We target a future 32-nm technology process in which leakage and process variation are significant. For the nominal delay, we assume that process variations have no impact on the pipeline stage. For the worst case, we assume the worst value of each parameter variation at every transistor, i.e., the values that result in the maximum delay or power consumption. A body bias is a voltage applied between the source of a transistor and its body (substrate), which effectively changes the transistor's Vth. Depending on the polarity of the applied voltage, Vth increases or decreases: if it increases, the transistor becomes less leaky and slower (reverse body bias); if it decreases, the transistor becomes leakier and faster (forward body bias). Table 1 shows the parameter values of process variations for the different cases. Figure 3 and Table 2 show the delay of the pipeline stage at different body-bias voltages. Figure 4 and Table 3 show the average power consumption of the pipeline stage at different body-bias voltages.
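The Vth shift produced by body bias can be illustrated with the standard long-channel body-effect equation, Vth = Vth0 + γ(√(2φF + VSB) − √(2φF)). This is a minimal sketch: γ = 0.2 V^0.5 and 2φF = 0.8 V are hypothetical values, not 32-nm PTM parameters, and short-channel effects are ignored.

```python
import math

def nmos_vth(vth0, v_sb, gamma=0.2, two_phi_f=0.8):
    """Body-effect model: Vth = Vth0 + gamma*(sqrt(2*phiF + VSB) - sqrt(2*phiF)).
    gamma (V^0.5) and 2*phiF (V) are hypothetical, illustrative values."""
    if two_phi_f + v_sb <= 0:
        raise ValueError("forward bias too large for this simple model")
    return vth0 + gamma * (math.sqrt(two_phi_f + v_sb) - math.sqrt(two_phi_f))

VTH0 = 0.20  # nominal NMOS Vth from Table 1
for label, v_bn in [("forward", 0.5), ("none", 0.0), ("reverse", -0.5)]:
    # Source tied to ground, so VSB = 0 - VBN.
    print(f"{label:8s} VBN = {v_bn:+.1f} V -> Vth = {nmos_vth(VTH0, -v_bn):.3f} V")
```

With the NMOS source at ground, a positive VBN (forward bias) makes VSB negative and lowers Vth, while a negative VBN (reverse bias) raises it; the effect is asymmetric because of the square-root dependence.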

Device Parameter Variations: Leff, Vdd, and Vth
Table 1. Parameter values for different cases
Case         Leff        Vdd      Vthn     Vthp
Nominal      25.32 nm    0.90 V   0.20 V   -0.21 V
Best case    20.26 nm    0.96 V   0.18 V   -0.19 V
Worst case   … nm        0.84 V   0.22 V   -0.23 V

Delay of Pipeline Stage
Table 2. Delay of the pipeline stage
Body bias   Case         VBN      VBP     Delay
Nominal     Nominal      0 V      0.9 V   1.363 ns
Nominal     Best case    0 V      0.9 V   0.646 ns
Nominal     Worst case   0 V      0.9 V   3.811 ns
Forward     Nominal      0.5 V    0.4 V   1.271 ns
Forward     Best case    0.5 V    0.4 V   0.631 ns
Forward     Worst case   0.5 V    0.4 V   3.389 ns
Reverse     Nominal      -0.5 V   1.4 V   1.608 ns
Reverse     Best case    -0.5 V   1.4 V   0.696 ns
Reverse     Worst case   -0.5 V   1.4 V   4.731 ns
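The relative effect of body bias is easier to read as percentage changes. The script below only does arithmetic on the delays transcribed from Table 2; the dictionary layout and helper name are illustrative.

```python
# Delays (ns) transcribed from Table 2: {body bias: {case: delay}}
TABLE2_DELAY_NS = {
    "nominal": {"best": 0.646, "nominal": 1.363, "worst": 3.811},
    "forward": {"best": 0.631, "nominal": 1.271, "worst": 3.389},
    "reverse": {"best": 0.696, "nominal": 1.608, "worst": 4.731},
}

def pct_change(new, ref):
    """Percentage change of new relative to ref."""
    return 100.0 * (new - ref) / ref

for bias in ("forward", "reverse"):
    for case in ("best", "nominal", "worst"):
        ref = TABLE2_DELAY_NS["nominal"][case]  # zero body bias, same case
        delta = pct_change(TABLE2_DELAY_NS[bias][case], ref)
        print(f"{bias:7s} {case:7s}: {delta:+.1f}% vs. zero body bias")

spread = TABLE2_DELAY_NS["nominal"]["worst"] / TABLE2_DELAY_NS["nominal"]["nominal"]
print(f"worst/nominal delay ratio at zero bias: {spread:.2f}x")
```

Forward bias shortens the nominal-case delay by about 6.7% and the worst-case delay by about 11%, while the worst case itself is roughly 2.8 times slower than nominal.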

Figure: Delay of the pipeline stage at different body-bias voltages.

Power of Pipe Stage
Table 3. Average power of the pipeline stage
Body bias   Case         VBN      VBP     Average power
Nominal     Best case    0 V      0.9 V   7.843 μW
Nominal     Nominal      0 V      0.9 V   22.45 μW
Nominal     Worst case   0 V      0.9 V   219.4 μW
Forward     Best case    0.5 V    0.4 V   13.00 μW
Forward     Nominal      0.5 V    0.4 V   30.32 μW
Forward     Worst case   0.5 V    0.4 V   294.5 μW
Reverse     Best case    -0.5 V   1.4 V   7.772 μW
Reverse     Nominal      -0.5 V   1.4 V   19.68 μW
Reverse     Worst case   -0.5 V   1.4 V   178.7 μW
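Combining Tables 2 and 3 gives a rough power-delay figure of merit for each body-bias setting in the nominal-parameter case. This is only a sketch: Table 3 reports average power over the simulation rather than per-transition switching energy, so the product is a comparative indicator, not the true energy per operation.

```python
# (delay in ns from Table 2, average power in uW from Table 3), nominal parameters
NOMINAL_CASE = {
    "zero bias": (1.363, 22.45),
    "forward":   (1.271, 30.32),
    "reverse":   (1.608, 19.68),
}

def power_delay_product_fj(delay_ns, power_uw):
    # uW * ns = 1e-6 W * 1e-9 s = 1e-15 J, i.e. the product is in fJ
    return delay_ns * power_uw

for bias, (d, p) in NOMINAL_CASE.items():
    pdp = power_delay_product_fj(d, p)
    print(f"{bias:9s}: delay = {d:.3f} ns, power = {p:.2f} uW, PDP = {pdp:.1f} fJ")
```

By this measure, forward bias buys its speed at a noticeably higher power cost, while reverse bias saves power but lands close to the zero-bias product because the stage becomes slower.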

Figure: Average power of the pipeline stage with body bias (BB).

Effect of BB on Delay and Power
Table 4. Effect of the body-bias technique
Body bias   VBN      VBP      Delay (ns)   Average power (μW)
Forward     0.85 V   -0.6 V   …            …
Forward     …        -0.1 V   …            …
Forward     …        0.4 V    …            …
Nominal     0 V      0.9 V    …            …
Reverse     -0.5 V   1.4 V    …            …
Reverse     …        1.9 V    …            …
Reverse     …        2.3 V    …            …

Figure: Delay distribution.

All Parameters Summary
Table 5. Effect of all parameters on pipeline delay
Maximum (ns)   …
Minimum (ns)   1.214
Mean (ns)      1.389
Sigma (ns)     0.056
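If the delay distribution summarized in Table 5 is approximated as Gaussian (an assumption; the real distribution may be skewed), its mean and sigma give a quick estimate of timing yield for a chosen clock period. The helper name timing_yield is illustrative.

```python
from statistics import NormalDist

# Table 5: pipeline-stage delay over all parameter variations (ns)
MEAN_NS, SIGMA_NS = 1.389, 0.056

delay = NormalDist(mu=MEAN_NS, sigma=SIGMA_NS)  # Gaussian fit is an assumption

def timing_yield(clock_period_ns):
    """Fraction of stages expected to settle within the given clock period."""
    return delay.cdf(clock_period_ns)

for k in (1, 2, 3):
    period = MEAN_NS + k * SIGMA_NS
    print(f"period = mean + {k} sigma = {period:.3f} ns -> yield {timing_yield(period):.4f}")
```

Under this approximation a clock period of mean + 3σ (about 1.557 ns) covers roughly 99.87% of stages, far below the 3.811 ns that the corner-based worst case in Table 2 would require.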

Figure: Power distribution (nominal).

Summary: Power Consumption
Table 6. Effect of all parameters on pipeline power consumption
Maximum (μW)   …
Minimum (μW)   19.51
Mean (μW)      24.05
Sigma (μW)     1.168

Razor Latches
- A Razor latch samples the output of a stage at two different times: the main flip-flop at the clock edge and a shadow latch on a delayed clock.
- The two samples are compared.
- If they are not equal, the inter-stage latch is reloaded with the shadow-latch value and the pipeline is delayed by one cycle.
Implications?


Recovery Technique 1: Global Clock Gating
If any stage detects a timing problem:
- Stall the entire pipeline for one clock cycle.
- Use this additional clock cycle to recompute using the correct shadow-latch values.
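The Razor detection and global-clock-gating recovery described above can be sketched as a cycle counter. In this idealized model (hypothetical names and data), a latch captures the new value only if the stage has settled by its sampling time, the shadow latch samples shadow_skew_ns later, and every main/shadow mismatch costs one stall cycle. It assumes the shadow window covers the worst-case path delay; slower paths would go undetected.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class RazorSample:
    main: int    # captured by the main flip-flop at the clock edge
    shadow: int  # captured by the shadow latch on the delayed clock

def capture(new_value: int, old_value: int, path_delay_ns: float,
            period_ns: float, shadow_skew_ns: float) -> RazorSample:
    """Idealized capture: a latch sees new_value only if the stage output
    has settled by its sampling time; otherwise it still holds old_value."""
    main = new_value if path_delay_ns <= period_ns else old_value
    shadow = new_value if path_delay_ns <= period_ns + shadow_skew_ns else old_value
    return RazorSample(main, shadow)

def run(path_delays_ns: List[float], period_ns: float = 1.0,
        shadow_skew_ns: float = 0.5) -> Tuple[int, List[int]]:
    """Count cycles and committed values for a sequence of operations,
    stalling the whole pipeline one cycle on every detected error."""
    cycles, committed = 0, []
    for i, delay in enumerate(path_delays_ns):
        # Hypothetical data: operation i produces value i + 1, while the
        # stage output still holds the previous value i until it settles.
        s = capture(i + 1, i, delay, period_ns, shadow_skew_ns)
        cycles += 1
        if s.main != s.shadow:
            # Mismatch: global clock gating adds one cycle, and the correct
            # value is taken from the shadow latch.
            cycles += 1
            committed.append(s.shadow)
        else:
            committed.append(s.main)
    return cycles, committed

print(run([0.8, 1.2, 0.9, 1.4]))
```

Two of the four operations miss the 1 ns edge but settle within the shadow window, so the sequence costs six cycles instead of four while still committing the correct values.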


Recovery Technique 2: Counterflow Pipelining
When a mismatch between the regular and shadow latch contents is detected:
- Assert a bubble signal to specify that the erring pipeline slot is now to be considered a bubble.
- In the subsequent cycle, inject the shadow-latch value into the next stage, allowing the errant operation to continue with the correct values.
- Trigger a flush train that travels backwards from the errant stage, flushing the operations at each stage it visits.
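The three counterflow steps can be sketched on a list of pipeline slots. This is a simplified, hypothetical model: downstream operations keep advancing, the erring slot becomes a bubble while its corrected value moves to the next stage, and the flush train is applied all at once rather than one stage per cycle.

```python
from typing import List, Optional, Tuple

Op = Optional[str]  # None marks a bubble

def counterflow_recover(stages: List[Op], err_stage: int) -> Tuple[List[Op], List[str]]:
    """One recovery step of a simplified counterflow scheme.

    stages[i] is the operation in pipeline stage i (0 = earliest stage).
    Returns (pipeline contents in the next cycle, operations flushed by the
    backward-travelling flush train, in the order it visits them)."""
    n = len(stages)
    if not 0 <= err_stage < n - 1:
        raise ValueError("error must be detected before the last stage")
    nxt: List[Op] = [None] * n
    # Operations downstream of the error keep flowing; the one in the
    # last stage retires.
    for i in range(err_stage + 1, n - 1):
        nxt[i + 1] = stages[i]
    # The erring slot becomes a bubble, and its corrected (shadow-latch)
    # value is injected into the next stage so the operation continues.
    nxt[err_stage + 1] = stages[err_stage]
    # The flush train walks backwards from the errant stage; the flushed
    # (younger) operations must be re-fetched.
    flushed = [op for op in reversed(stages[:err_stage]) if op is not None]
    return nxt, flushed

pipe = ["i5", "i4", "i3", "i2", "i1"]  # i1 is the oldest operation
print(counterflow_recover(pipe, err_stage=2))
```

Unlike global clock gating, only the errant stage and the operations behind it are disturbed; older operations downstream of the error continue without stalling.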


Process Variation Impact on Memory Systems
Process variations are random in nature and are expected to become significant in the smaller-geometry transistors commonly used in memories. Process variations in caches affect the performance of:
- Sense amplifiers, which require identical device characteristics
- SRAM cells, whose stability is critical because large arrays in embedded, low-power applications use near-minimum-sized cells
- Address decoders, whose delay variation can leave less time for accessing the SRAM cells
The question is whether the overall delay variation is significant enough to drive a change in memory architecture design.

Motivation
To account for the worst-case scenario, a conventional design might need to increase the cache access time by 2 to 3 cycles. Application performance could be degraded by as much as 30-40%! These results suggest that process variations must be taken into consideration:
- New types of circuits and architectures?

Introduction
There are several ideas that could be exploited in a memory system:
- operate at a lower clock frequency, sacrificing performance (conservative approach)
- increase the cache access latency to cover the worst-case delay (conservative approach)
- use a variable-delay cache architecture (adaptive approach)

Cache Organization Overview
The focus of this presentation is on CAM-based caches.
Figure: CAM-tag cache organization with 16 banks; each cache bank has CAM tags with matchlines and a data SRAM of 32 lines × 8 words, followed by a MUX that delivers the data. The virtual address is divided into Tag, Bank, Word, and Byte fields.
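The virtual-address split shown in the figure can be sketched as bit-field extraction. The field widths are derived from the stated organization (16 banks, 32 lines per bank, 8 words per line, 32 B blocks, hence 4-byte words); the 32-bit address width and the function name split_address are assumptions for illustration.

```python
from typing import NamedTuple

BYTE_BITS = 2   # 4-byte words (32 B line / 8 words)
WORD_BITS = 3   # 8 words per line
BANK_BITS = 4   # 16 banks
ADDR_BITS = 32  # assumption: 32-bit virtual address
TAG_BITS = ADDR_BITS - BANK_BITS - WORD_BITS - BYTE_BITS

# Consistency check: 16 banks x 32 lines x 8 words x 4 B = 16 KB, 1 KB per bank.
assert (1 << BANK_BITS) * 32 * (1 << WORD_BITS) * (1 << BYTE_BITS) == 16 * 1024

class CacheFields(NamedTuple):
    tag: int
    bank: int
    word: int
    byte: int

def split_address(addr: int) -> CacheFields:
    """Split an address into Tag | Bank | Word | Byte (MSB to LSB)."""
    byte = addr & ((1 << BYTE_BITS) - 1)
    addr >>= BYTE_BITS
    word = addr & ((1 << WORD_BITS) - 1)
    addr >>= WORD_BITS
    bank = addr & ((1 << BANK_BITS) - 1)
    addr >>= BANK_BITS
    return CacheFields(tag=addr, bank=bank, word=word, byte=byte)

print(split_address(0x1234_5678))
```

The bank field selects one of the 16 banks; the tag is then compared in parallel against that bank's 32 CAM entries, so each bank behaves as a single 32-way set, and the word field drives the output MUX.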

Figure: Critical path of the CAM-tag cache.

Experiment Setup
Cadence tools were used to design the circuits at layout level, and HSPICE simulation was used to evaluate their performance. All circuits were designed in a 32-nm CMOS technology and simulated with a supply voltage of 0.9 V.
Configuration of our 16 KB low-power cache
Cache component     Power technique
Bank decoder        4-input static NOR gates
Tag array           10-transistor CAM cell
Data array          6T SRAM cell
Cache line          Wordline gating
Line decoder        Two-level decoding: 1st level 3-input DNAND gate, 2nd level 2-input NOR gate
Tag & data arrays   Cache subbanking (16 banks)
Bank size           1 KB
Sense amplifiers    Alpha latch & sharing sense amps

Worst-Case Conditions
Effective channel length variation:
- Imperfections in photolithography (mask, lens, and photo-system deviations)
A 40% variation in Leff is expected within a die [Sani Nassif, IEEE Press 2000].

Worst-Case Conditions
Effective channel length variation:
- A small variation in Leff changes the leakage power by as much as 60X from its nominal value.

Worst-Case Conditions
Threshold voltage variation:
- Accurate control of Vth is very important for many performance and power optimizations and for correct execution.

Worst-Case Conditions
Threshold voltage variation:
- The impact on leakage power could be as much as 40X.
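A first-order subthreshold model shows why leakage is so sensitive to Vth: the off-state current scales roughly as exp(−Vth / (n·kT/q)). In this sketch the slope factor n = 1.5 is a hypothetical value and 75 °C is taken from the expected-conditions setup; the model ignores DIBL, gate leakage, and Leff effects, so it is not meant to reproduce the 40X figure above.

```python
import math

K_B = 8.617333262e-5  # Boltzmann constant in eV/K, so kT/q comes out in volts

def thermal_voltage(temp_c):
    """kT/q in volts at the given temperature in degrees Celsius."""
    return K_B * (temp_c + 273.15)

def leakage_ratio(delta_vth, temp_c=75.0, n=1.5):
    """Subthreshold leakage relative to nominal for a Vth shift (V), using
    I_off ~ exp(-Vth / (n * kT/q)); n = 1.5 is a hypothetical slope factor."""
    return math.exp(-delta_vth / (n * thermal_voltage(temp_c)))

for dv_mv in (-60, -30, -15, 0, 15, 30):
    print(f"dVth = {dv_mv:+4d} mV -> leakage x{leakage_ratio(dv_mv / 1000):.2f}")
```

Because the dependence is exponential, a few tens of millivolts of Vth shift already changes leakage by a factor of two or more, and the factors compound across the many cells of a cache.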

Worst-Case Conditions
Power supply variation:
- Supply voltage is one of the most important environmental factors that cause variations in operating conditions.
- Voltage varies due to non-uniform power-supply distribution, switching activity, and IR drop.
- A total variation of 15% in Vdd was considered, with a nominal value of 0.9 V.
Figure: Delay (ns) and power (W) versus Vdd (V).
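The sensitivity of delay to supply voltage can be sketched with the alpha-power-law delay model (Sakurai and Newton), t ∝ Vdd / (Vdd − Vth)^α. Here the 15% total variation is interpreted as ±7.5% around 0.9 V, matching the expected-conditions setup; α = 1.3 and Vth = 0.2 V are assumptions, and only relative delay is reported.

```python
def alpha_power_delay(vdd, vth=0.2, alpha=1.3, k=1.0):
    """Alpha-power-law gate delay: t ~ k * Vdd / (Vdd - Vth)**alpha.
    alpha = 1.3 and k are hypothetical fitting constants."""
    if vdd <= vth:
        raise ValueError("Vdd must exceed Vth")
    return k * vdd / (vdd - vth) ** alpha

VDD_NOM = 0.9

def relative_delay(vdd):
    """Delay normalized to the nominal 0.9 V supply."""
    return alpha_power_delay(vdd) / alpha_power_delay(VDD_NOM)

# 15% total variation, interpreted as +/-7.5% around 0.9 V
for vdd in (VDD_NOM * 0.925, VDD_NOM, VDD_NOM * 1.075):
    print(f"Vdd = {vdd:.4f} V -> delay x{relative_delay(vdd):.3f}")
```

In this model a 7.5% supply droop slows the gate by roughly 5.5%, while a 7.5% overshoot speeds it up by about 4.6%, so low-voltage corners dominate the timing budget.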

Expected Conditions
To accurately predict the distribution of cache critical-path delay at the circuit level, cache delay variability can be studied through Monte-Carlo HSPICE circuit simulations. Monte-Carlo simulations verify model predictions over a wide range of process and design conditions and provide an estimate of expected behavior. We assume parameter variations to be normally distributed, with mean and sigma values derived from PTM and ITRS sources.
Parameter values and σ variations (32-nm technology)
Parameter     NMOS                PMOS
Leff          25.32 nm (± 20%)
Vth           0.2 V (± 7.5%)      -0.2 V (± 7.5%)
Vdd           0.9 V (± 7.5%)
Temperature   75 °C
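A toy version of the Monte-Carlo flow can be written by sampling the three parameters and feeding them into a first-order delay model instead of HSPICE. This is a sketch under stated assumptions: the ± ranges in the table are treated as 3σ bounds, the parameters are independent, and delay is modeled as Leff · Vdd / (Vdd − Vth)^α with a hypothetical α = 1.3.

```python
import random
import statistics

ALPHA = 1.3  # hypothetical alpha-power exponent

def model_delay(leff_nm, vth, vdd, alpha=ALPHA):
    """First-order delay model (arbitrary units): ~ Leff * Vdd / (Vdd - Vth)**alpha."""
    return leff_nm * vdd / (vdd - vth) ** alpha

def sample_delay(rng):
    """One Monte-Carlo sample; the table's +/- ranges are treated as 3 sigma."""
    leff = rng.gauss(25.32, 25.32 * 0.20 / 3)  # nm
    vth = rng.gauss(0.20, 0.20 * 0.075 / 3)    # V
    vdd = rng.gauss(0.90, 0.90 * 0.075 / 3)    # V
    return model_delay(leff, vth, vdd)

NOMINAL = model_delay(25.32, 0.20, 0.90)
rng = random.Random(1)
samples = [sample_delay(rng) / NOMINAL for _ in range(50_000)]
q = statistics.quantiles(samples, n=100)  # 1st..99th percentiles
print(f"median x{q[49]:.3f}, 99th percentile x{q[98]:.3f}, max x{max(samples):.3f}")
```

Even in this crude model most samples cluster near the nominal delay and only a thin tail approaches the corner values, which is the behavior the adaptive cache is designed to exploit.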

Expected Conditions
The delay distribution of a cache critical path was determined by Monte-Carlo sampling of supply voltage, threshold voltage, and transistor length. Under the expected conditions, a large fraction of accesses would still be close to the nominal value.
Figure: Critical-path delay distribution, nominal vs. expected conditions.

Architectural Techniques
How do we design a memory system in the face of process variations and mitigate their negative impact on performance?
We can select a cache design using worst-case assumptions:
- ALL VARIATIONS and ALL COMPONENTS on the critical path
Alternatively, we need to design circuits and architectures that work adaptively depending on the actual delay:
- Process variation resilient design
- Resilience against delays in different parts of the cache

Proposed Adaptive Cache Architecture
Two phases of operation: classification and execution.
Figure: Adaptive cache architecture: CAM tag and data arrays, each with delay storage, an adaptive controller, and a test-mode classifier, attached to the F-D-EX-MEM-WB pipeline through the address and data paths.

Classification Phase
During the classification phase:
- The cache is equipped with a built-in self-test (BIST) circuit to detect speed differences due to process variation.
- Each cache line is tested using BIST when the test-mode signal is on. A block is classified as fast, medium, slow, or failed.
Figure: Classification phase: under the test-mode signal, the BIST exercises the data array (row address, column MUX, sense amplifiers, data out) and records speed information, which depends on the operating conditions, in the delay storage.
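The classification step can be sketched as mapping each line's measured delay to an access latency in cycles, with lines needing more than three cycles marked as failed. The cycle time, the three-cycle limit (taken from the 3-cycle worst-case baseline), and the function names are assumptions; a real BIST would also apply a guard band for the operating conditions.

```python
import math
from typing import List, Optional

CLASS_BY_CYCLES = {1: "fast", 2: "medium", 3: "slow"}

def classify_line(delay_ns: float, cycle_ns: float, max_cycles: int = 3) -> Optional[int]:
    """Access latency in cycles for one cache line, or None if the line
    fails (needs more than max_cycles). Hypothetical BIST model."""
    cycles = max(1, math.ceil(delay_ns / cycle_ns))
    return cycles if cycles <= max_cycles else None

def build_delay_storage(line_delays_ns: List[float], cycle_ns: float) -> List[Optional[int]]:
    # One entry per cache line, written into the delay storage in test mode.
    return [classify_line(d, cycle_ns) for d in line_delays_ns]

storage = build_delay_storage([0.7, 1.0, 1.3, 2.6, 3.4], cycle_ns=1.0)
print([CLASS_BY_CYCLES.get(c, "failed") for c in storage])
```

Storing a small latency code per line (two bits suffice for four classes) keeps the delay storage cheap, and failed lines can simply be disabled.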

Execution Phase
During the execution phase:
- The speed information stored in the delay storage is used to control the sense amplifiers during regular operation of the circuit.
Figure: Execution phase: the controller reads the delay storage and times the sense amplifiers; the data array is accessed through the row address, column address, and column MUX, and the sense amplifiers drive data out.

Experimental Setup
The adaptive cache architecture is implemented in SimpleScalar, and we simulated SPEC2000 benchmarks using the adaptive approach. The adaptive cache configuration is based on the delay distribution determined by the Monte-Carlo simulation.
SimpleScalar parameters for the CPU
Instruction window               RUU = 16; LSQ = 8
Fetch, dispatch, commit width    4
Integer ALU / mult-div           4 / 1
FP ALU / mult-div                4 / 1
Number of banks                  16
L1 D-cache                       16 KB, 32-way set-assoc., 32 B blocks
L1 I-cache                       16 KB, 32-way set-assoc., 32 B blocks
L2 unified cache                 128 KB, 64-way, 64 B blocks, 8 cycles
Memory latency                   100 cycles
Memory ports                     2
TLB                              128-entry, fully assoc., 30-cycle miss penalty
Branch predictor                 Combination of bimodal and 2-level gshare; bimodal size 2048; first level … entries, history 10; second level … entries (global)
Branch target buffer             512-entry, 4-way associative
Return-address stack             8-entry

Performance Speedup
Baseline: 3-cycle D-cache with worst-case delay, 16 KB total size, 16 banks each 32-way; out-of-order 4-way issue.
Adaptive caching scheme: 1% of cache lines are accessed in 3 cycles, 24% in 2 cycles, and 75% in 1 cycle.
The results show that performance improves by 9% to 31%!
Figure: Performance speedup of the adaptive cache.
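The latency mix above implies an average hit latency that can be computed directly. This sketch assumes accesses are spread uniformly over cache lines, so the fraction of lines at each latency equals the fraction of accesses; the helper name is illustrative.

```python
# Fraction of cache lines at each access latency (cycles), from the adaptive scheme
LATENCY_MIX = {1: 0.75, 2: 0.24, 3: 0.01}
BASELINE_CYCLES = 3  # worst-case design: every access takes 3 cycles

def average_latency(mix):
    """Expected hit latency, assuming accesses are uniform over lines."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return sum(cycles * frac for cycles, frac in mix.items())

avg = average_latency(LATENCY_MIX)
print(f"average hit latency = {avg:.2f} cycles vs. {BASELINE_CYCLES} baseline")
```

The average hit latency drops from 3 to about 1.26 cycles. End-to-end speedup also depends on how much latency the out-of-order core hides and on miss behavior, so it need not track this ratio.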

Sensitivity to Issue Width
Speedup values are normalized with respect to the worst-case delay of 3 cycles. The 8-way issue design benefits more from the adaptive cache architecture than the 4-way issue design.

Hardware Required
Hardware required:
- BIST circuit
- delay storage
- control circuitry
We evaluated the hardware needed for the adaptive cache using the Synopsys Design Compiler tool.
Circuit   BIST, delay storage, and control circuitry   Cache
Delay     0 ns                                         0.95 ns
Power     0.55 mW                                      27.67 mW
Area      … mm^2                                       0.54 mm^2
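The overhead of the added circuitry relative to the cache follows directly from the table; the script below is plain arithmetic on those transcribed values (the area overhead cannot be computed because the extra-logic area is missing).

```python
# Values transcribed from the hardware table: extra logic vs. the cache itself
EXTRA_POWER_MW, CACHE_POWER_MW = 0.55, 27.67
EXTRA_DELAY_NS, CACHE_DELAY_NS = 0.0, 0.95

power_overhead_pct = 100.0 * EXTRA_POWER_MW / CACHE_POWER_MW
print(f"power overhead: {power_overhead_pct:.2f}% of cache power")
print(f"added access delay: {EXTRA_DELAY_NS:.2f} ns on a {CACHE_DELAY_NS:.2f} ns cache access")
```

The BIST, delay storage, and control logic add about 2% to cache power and, per the table, no delay to the access path.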

Power Issues

Leakage Power Variation

Leakage (contd.)

Leakage Enhanced Cells
In the inactive state, when the cell is neither being written nor read, most of the leakage power is dissipated by transistors that are off and have a voltage differential across their drain and source. If the cell stores a "0", transistors T1, N1, and P2 dissipate leakage power. A simple technique for reducing leakage power would be to replace all transistors with high-Vth ones, but this would slow bitline discharge and significantly degrade cell read performance. In our design we instead applied the same high Vth to all the NMOS transistors (an asymmetric cell design). By changing the Vth, we change the performance and power tradeoffs.

Tradeoffs between performance and power: what is visible at the application level?
Distribution of cache delay and leakage power for different high-Vth schemes. Results were obtained by Monte-Carlo simulations of the adaptive cache for various scenarios.
Scheme         Vth (V)   Delay (ns)   Mean leakage (W)   1 cycle   2 cycles   3 cycles
Conventional   …         …            …                  …         …          100%
A…             …         …            …                  …         24%        1%
A…             …         …            …                  …         30%        2%
A…             …         …            …                  …         40%        4%
A…             …         …            …                  …         50%        5%