Optimizing Power @ Design Time: Memory

Presentation transcript:

Optimizing Power @ Design Time: Memory

Role of Memory in ICs
- Memory is very important
- Focus in this chapter is embedded memory
- Percentage of area going to memory is increasing
[Ref: V. De, Intel 2006]

Processor Area Becoming Memory Dominated
- On-chip SRAM contains 50-90% of total transistor count
  - Xeon: 48M/110M
  - Itanium 2: 144M/220M
- SRAM is a major source of chip static power dissipation
  - Dominant in ultra-low power applications
  - Substantial fraction in others
[Figure: Intel Penryn™ die with the SRAM area marked. Picture courtesy of Intel.]

Chapter Outline
- Memory Introduction
- Power in the Cell Array
- Power for Read Access
- Power for Write Access
- New Memory Technologies

Basic Memory Structures [Ref: J. Rabaey, Prentice’03]

SRAM Metrics
- Functionality
  - Data retention
  - Readability
  - Writability
  - Soft errors
- Area
- Power
Why is functionality a “metric”?
- Process variations increase with scaling
- Large number of cells requires analysis of tails (out to 6σ or 7σ)
- Within-die VTH variation due to Random Dopant Fluctuations (RDFs)
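The need to look at tails out to 6σ or 7σ follows from simple arithmetic on array size. The sketch below assumes the cell metric is Gaussian and that cells fail independently; both are simplifying assumptions rather than claims from the slides, and the function names and the 64-Mb array size are illustrative only.

```python
import math

def tail_probability(k_sigma):
    """One-sided Gaussian tail: P(X > mean + k_sigma * sigma)."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2.0))

def array_yield(n_cells, k_sigma):
    """Probability that no cell in the array lies beyond k_sigma.

    Assumes independent, identically distributed cells.
    """
    p_fail = tail_probability(k_sigma)
    # log1p keeps precision when p_fail is extremely small
    return math.exp(n_cells * math.log1p(-p_fail))

n_cells = 64 * 2**20  # hypothetical 64-Mb array
for k in (4, 5, 6, 7):
    print(k, tail_probability(k), array_yield(n_cells, k))
```

Under these assumptions, a cell that is safe only out to 4σ or 5σ leaves a large array with many expected failures, which is why the outlying cells set the design limit.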

Where Does SRAM Power Go?
- Numerous analytical SRAM power models
- Great variety in power breakdowns
- Different applications cause different components of power to dominate
- Hence: depends on applications, e.g. high speed versus low power, portable

Traditional 6-Transistor (6T) SRAM cell
Three tasks of a cell:
- Hold data: WL=0; BLs=X
- Write: WL=1; BLs driven with new data
- Read: WL=1; BLs precharged and left floating
[Figure: 6T cell schematic with transistors M1-M6, storage nodes Q and QB, wordline WL, and bitlines BL.]
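The three tasks can be captured in a purely behavioral model. This is a logic-level sketch, not an electrical one: the class name `SixTCell` and its `access` method are hypothetical, a floating (precharged) bitline is represented by `None`, and read disturbance is ignored.

```python
class SixTCell:
    """Logic-level model of a 6T cell; QB is always the complement of Q."""

    def __init__(self, q=0):
        self.q = q

    def access(self, wl, bl=None, blb=None):
        """Return the (BL, BLB) values after the access.

        wl=0                     -> hold: bitlines are don't-care
        wl=1, bitlines driven    -> write the driven value
        wl=1, bitlines floating  -> read: the '0' side discharges its bitline
        """
        if wl == 0:
            return (bl, blb)
        if bl is not None and blb is not None:
            if bl == blb:
                raise ValueError("write needs complementary bitlines")
            self.q = bl
            return (bl, blb)
        return (self.q, 1 - self.q)

cell = SixTCell()
cell.access(wl=1, bl=1, blb=0)   # write a 1
print(cell.access(wl=1))         # read it back
```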

Key SRAM cell metrics
Key functionality metrics:
- Hold: Static Noise Margin (SNM), Data retention voltage (DRV)
- Read: Static Noise Margin (SNM)
- Write: Write Margin
Metrics:
- Area is primary constraint
- Next: Power, Delay
[Figure: traditional 6-transistor (6T) SRAM cell with M1-M6, Q, QB, WL, and BL.]

Static Noise Margin (SNM)
- SNM gives a measure of the cell’s stability by quantifying the DC noise required to flip the cell
- SNM is the length of the side of the largest embedded square on the butterfly curve
[Figure: 6T cell drawn as cross-coupled inverters Inv 1 and Inv 2 with noise sources VN; butterfly plot of QB (V) versus Q (V) showing the VTC for Inv 2 and the inverse VTC for Inv 1, each also drawn with VN = SNM.]
[Ref: E. Seevinck, JSSC’87]
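The largest-embedded-square definition can be evaluated numerically. The sketch below is one way to do it under stated assumptions: the inverters use an idealized tanh-shaped transfer curve (a stand-in, not a device model), the switching thresholds and gain are hypothetical, and all function names are illustrative. For each point on one curve it slides along a 45° diagonal until it meets the other curve, which gives the side of the square with those two corners; the SNM is the smaller of the two lobes.

```python
import math

def make_inverter(vdd, vm, gain):
    """Idealized inverter VTC; vm is the switching threshold."""
    def vtc(vin):
        return 0.5 * vdd * (1.0 - math.tanh(gain * (vin - vm) / vdd))
    return vtc

def lobe_square(inner, outer, vdd, steps=400):
    """Side of the largest square in one lobe of the butterfly curve.

    One corner lies on the curve traced by `inner`, the diagonally
    opposite corner on the curve traced by `outer`.
    """
    best = 0.0
    for i in range(steps + 1):
        t = vdd * i / steps
        base = inner(t)
        gap = lambda s: outer(base + s) - (t + s)  # decreasing in s
        if gap(0.0) <= 0.0:
            continue  # this point is not inside the lobe
        lo, hi = 0.0, vdd
        for _ in range(50):  # bisection for the root of gap(s)
            mid = 0.5 * (lo + hi)
            if gap(mid) > 0.0:
                lo = mid
            else:
                hi = mid
        best = max(best, lo)
    return best

def static_noise_margin(inv1, inv2, vdd):
    """Smaller of the two lobe squares."""
    return min(lobe_square(inv2, inv1, vdd), lobe_square(inv1, inv2, vdd))

vdd = 1.0
matched = static_noise_margin(make_inverter(vdd, 0.50, 20),
                              make_inverter(vdd, 0.50, 20), vdd)
skewed = static_noise_margin(make_inverter(vdd, 0.45, 20),
                             make_inverter(vdd, 0.55, 20), vdd)
print(matched, skewed)
```

With mismatched thresholds one lobe shrinks while the other grows, and the smaller one sets the SNM, which is the mechanism by which variation erodes stability.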

Static Noise Margin with Scaling
- Typical cell SNM deteriorates with scaling
- Tech and VDD scaling lower SNM
- Variations lead to failure from insufficient SNM
- Variations worsen tail of SNM distribution
(Results obtained from simulations with Predictive Technology Models [Ref: PTM; Y. Cao ’00])

Variability: Write Margin
- Dominant fight (ratioed)
[Figure: 6T cell (WL, BL, BLB) and three butterfly plots of normalized QB versus normalized Q, both axes 0 to 1. Panels: cell stability prior to write; successful write: negative “SNM”; write failure: positive SNM.]

Variability: Cell Writability
- Write margin limits VDD scaling for 6T cells to 600 mV, best case
- 65nm process, VDD = 0.6V
- Variability and large number of cells make this worse
[Figure: SNM (V) versus temperature (°C), from -40 to 120 °C, at VDD = 0.6 V for process corners TT, WW, SS, WS, and SW, with the write-fail region marked.]

Cell Array Power
- Leakage power dominates while the memory holds data
- Sub-threshold leakage
- Importance of gate tunneling and GIDL depends on technology and voltages applied
[Figure: 6T cell holding ‘0’ and ‘1’ on its internal nodes, with WL and BLs and the leakage paths marked.]

Using Threshold Voltage to Reduce Leakage
- High VTH cells necessary if all else is kept the same
- To keep leakage in a 1-Mb memory within bounds, VTH must be kept in the [0.4, 0.6] V range
[Figure: 1-Mb array retention current (A), 10^-8 to 10^-2, versus average extrapolated VTH (V) at 25 ºC, -0.2 to 1.0, for Tj = 25, 50, 75, 100, and 125 ºC. Marked levels: 10 μA and 0.1 μA. Marked points: high speed (0.49) and low power (0.71). Device sizes: Lg = 0.1 μm, W(QT) = 0.20 μm, W(QD) = 0.28 μm, W(QL) = 0.18 μm. Extrapolated VTH = VTH (nA/μm) + 0.3 V.]
[Ref: K. Itoh, ISCAS’06]
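The steep dependence of retention current on VTH comes from the exponential form of sub-threshold leakage, roughly I ∝ 10^(-VTH/S). The sketch below compares the two VTH values marked on the plot. The sub-threshold swing S = 100 mV/decade is an assumed round number, not a value from the slide, and the function name is illustrative.

```python
def subthreshold_leakage_ratio(vth_low, vth_high, swing_v_per_decade=0.1):
    """Leakage at vth_low divided by leakage at vth_high.

    Uses I ~ 10^(-VTH / S); S is an assumed sub-threshold swing.
    """
    return 10.0 ** ((vth_high - vth_low) / swing_v_per_decade)

# VTH values marked "high speed" and "low power" on the slide
ratio = subthreshold_leakage_ratio(0.49, 0.71)
print(f"high-speed cell leaks roughly {ratio:.0f}x more than the low-power cell")
```

The swing grows with temperature, so the ratio at high junction temperature would be smaller than this room-temperature-style estimate.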

Multiple Threshold Voltages
- Dual VTH cells with low VTH access transistors provide good tradeoffs in power and delay [Ref: Hamzaoglu, et al., TVLSI’02]
- Use high VTH devices to lower leakage for stored ‘0’, which is much more common than a stored ‘1’ [Ref: N. Azizi, TVLSI’03]
[Figure: two 6T cells (BL, WL) with high VTH and low VTH devices marked; the second cell is shown storing ‘0’.]

Selective usage of multiple voltages in cell array
- e.g. 16 fA/cell at 25 °C in 0.13 μm technology
- High VTH to lower sub-VTH leakage
- Raised source, raised VDD, and lower BL reduce gate stress while maintaining SNM
[Figure: cell bias voltages with WL = 0 V; labeled values 1.0 V, 1.0 V, 1.5 V, and 0.5 V.]
[Ref: K. Osada, JSSC’03]

Power Breakdown During Read
Accessing correct cell:
- Decoders, WL drivers
- For lower power: hierarchical WLs, pulsed decoders
Performing read:
- Charge and discharge large BL capacitance
- For lower power:
  - SAs and low BL swing
  - Lower VDD
  - Hierarchical BLs
  - Lower BL precharge
- May require read assist
[Figure: read path showing Address, WL, Mem Cell, VDD_Prech, Sense Amp, and Data.]

Hierarchical Word-line Architecture
- Reduces amount of switched capacitance
- Saves power and lowers delay
[Ref’s: Rabaey, Prentice’03; T. Hirose, JSSC’90]

Hierarchical Bitlines
- Divide up bitlines hierarchically
- Many variants possible
- Reduce RC delay, also decrease CV² power
- Lower BL leakage seen by accessed cell
[Figure: local BLs connected to global BLs.]
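A rough capacitance count shows where the CV² saving comes from. The sketch below is a first-order model only: every capacitance value and the segment count are hypothetical, the per-cell split into junction and wire capacitance is an assumption, and the helper names are illustrative.

```python
def bitline_switching_energy(capacitance_f, vdd, swing=None):
    """Energy drawn from the supply to restore a bitline after it swings.

    With swing=None the bitline swings rail to rail, giving C * VDD^2.
    """
    if swing is None:
        swing = vdd
    return capacitance_f * vdd * swing

def flat_bitline_cap(n_cells, c_cell, c_wire_per_cell):
    """Every cell hangs its junction and wire capacitance on one bitline."""
    return n_cells * (c_cell + c_wire_per_cell)

def hierarchical_bitline_cap(n_cells, n_segments, c_cell, c_wire_per_cell, c_switch):
    """Capacitance switched per access: one local BL plus the global BL."""
    local = (n_cells // n_segments) * (c_cell + c_wire_per_cell)
    global_bl = n_cells * c_wire_per_cell + n_segments * c_switch
    return local + global_bl

# Hypothetical numbers: 256 cells, 1 fF junction and 0.2 fF wire per cell,
# 8 segments with a 1 fF segment switch each, VDD = 1 V.
flat = flat_bitline_cap(256, 1e-15, 0.2e-15)
hier = hierarchical_bitline_cap(256, 8, 1e-15, 0.2e-15, 1e-15)
print(bitline_switching_energy(flat, 1.0), bitline_switching_energy(hier, 1.0))
```

The saving comes from removing most cell junction capacitance from the accessed path; the global wire and segment switches are the overhead that limits how far segmentation pays off.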

BL Leakage During Read Access
- Leakage into non-accessed cells
  - Raises power and delay
  - Affects BL differential
[Figure: bit-line shared by an accessed cell and non-accessed cells storing “1” and “0”.]
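The effect on the bitline differential can be estimated with a constant-current model. The sketch below assumes the worst case in which every non-accessed cell on the column leaks from the bitline that should stay high, so leakage subtracts directly from the read current. All current and capacitance values are hypothetical, and `time_to_sense` is an illustrative name.

```python
import math

def time_to_sense(i_read, i_leak, n_cells, c_bl, dv_sense):
    """Time for the bitline differential to reach dv_sense.

    Worst case assumed: the (n_cells - 1) non-accessed cells all leak
    from the opposite bitline, shrinking the net differential current.
    """
    i_net = i_read - (n_cells - 1) * i_leak
    if i_net <= 0.0:
        return math.inf  # the differential never develops
    return c_bl * dv_sense / i_net

# Hypothetical column: 256 cells, 100 fF bitline, 100 mV sense threshold
ideal = time_to_sense(20e-6, 0.0, 256, 100e-15, 0.1)
leaky = time_to_sense(20e-6, 10e-9, 256, 100e-15, 0.1)
print(ideal, leaky)
```

Because the penalty scales with the number of cells on the bitline, shorter (hierarchical) bitlines and lower per-cell leakage both help.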

Bitline Leakage Solutions
- Hierarchical BLs
- Raise VSS in cell (VGND)
- Negative wordline voltage (NWL)
- Longer access FETs
- Alternative bit-cells
- Active compensation
- Lower BL precharge voltage
[Figure: cells storing “1” and “0” shown for two schemes: raised VSS in cell (VGND, Vg) and negative wordline (VSSWL).]
[Ref: A. Agarwal, JSSC’03]

Lower Precharge Voltage
- Lower BL precharge voltage decreases power and improves Read SNM
- Internal bit-cell node rises less
- Sharp limit due to accidental cell writing if access FET pulls internal ‘1’ low

VDD Scaling
- Lower VDD (and other voltages) via classic voltage scaling
  - Saves power
  - Increases delay
  - Limited by lost margin (read and write)
- Recover Read SNM with read assist
  - Lower BL precharge
  - Boosted cell VDD [Ref: Bhavnagarwala’04, Zhang’06]
  - Pulsed WL and/or Write-After-Read [Ref: Khellah’06]
  - Lower WL [Ref: Ohbayashi’06]
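The power and delay sides of the tradeoff can be put in numbers with two textbook approximations: dynamic energy proportional to VDD², and delay following the alpha-power law, delay ∝ VDD / (VDD - VTH)^α. The sketch below uses assumed values VTH = 0.3 V and α = 1.3; neither comes from the slides, and the function names are illustrative.

```python
def dynamic_energy_ratio(vdd, vdd_ref):
    """Switching energy at vdd relative to vdd_ref (E ~ C * VDD^2)."""
    return (vdd / vdd_ref) ** 2

def alpha_power_delay_ratio(vdd, vdd_ref, vth, alpha=1.3):
    """Delay at vdd relative to vdd_ref using the alpha-power law."""
    def delay(v):
        return v / (v - vth) ** alpha
    return delay(vdd) / delay(vdd_ref)

# Scaling from 1.0 V to 0.7 V with an assumed VTH of 0.3 V
print(dynamic_energy_ratio(0.7, 1.0))
print(alpha_power_delay_ratio(0.7, 1.0, vth=0.3))
```

This model covers only power and delay; the lost read and write margin that actually limits scaling needs the SNM and write-margin analysis from the earlier slides.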

Power Breakdown During Write
Accessing cell:
- Similar to Read
- For lower power: hierarchical WLs
Performing write:
- Traditionally drive BLs full swing
- For lower power:
  - Charge sharing
  - Data dependencies
  - Low swing BLs with amplification
[Figure: write path showing Address, WL, Mem Cell, VDD_Prech, and Data.]

Charge recycling to reduce write power
- Share charge between BLs or pairs of BLs
- Saves for consecutive write operations
- Need to assess overhead
- Basic charge recycling saves 50% power in theory
[Figure: three steps. Old values: BL = 0 V, BLB = VDD. Connect floating BLs: BL = VDD/2, BLB = VDD/2. Disconnect and drive new values: BL = VDD, BLB = 0 V.]
[Ref’s: K. Mai, JSSC’98; G. Ming, ASICON’05]
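The 50% figure can be checked from the energy a supply delivers when charging a capacitor: raising a bitline of capacitance C from a starting voltage V0 to VDD draws C·VDD·(VDD - V0) from the supply. The sketch below assumes equal capacitance on BL and BLB and ignores the overhead of the equalizing switch; the function names and the 100 fF value are illustrative.

```python
def supply_energy(c_bl, vdd, v_start):
    """Energy drawn from VDD to charge a capacitor from v_start up to VDD."""
    return c_bl * vdd * (vdd - v_start)

def write_energy_conventional(c_bl, vdd):
    # The bitline that sat at 0 V is driven all the way to VDD;
    # the other one is simply discharged to ground.
    return supply_energy(c_bl, vdd, 0.0)

def write_energy_recycled(c_bl, vdd):
    # Shorting the floating pair first leaves both at VDD/2
    # (equal capacitances assumed), so only half the swing is supplied.
    return supply_energy(c_bl, vdd, vdd / 2.0)

c_bl, vdd = 100e-15, 1.0  # hypothetical bitline capacitance and supply
saving = 1.0 - write_energy_recycled(c_bl, vdd) / write_energy_conventional(c_bl, vdd)
print(saving)
```

The saving applies only when the new data flips the bitline pair, which is why the overhead needs to be assessed against the actual write pattern.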

Memory Statistics
- 0’s more common
  - SPEC2000: 90% 0s in data
  - SPEC2000: 85% 0s in instructions
- Assumed write value using inverted data as necessary [Ref: Y. Chang, ISLPED’99]
- New bitcell: 1R, 1W port [Ref: Y. Chang, TVLSI’04]
  - W0: WZ=0, WWL=1, WS=1
  - W1: WZ=1, WWL=1, WS=0
[Figure: bitcell with BL, WL, WWL, WZ, and WS.]
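Storing inverted data as necessary can be illustrated with a per-word inversion flag. This is a generic sketch of the idea and is not taken from the cited papers: the word-level granularity, the extra flag bit, and the function names are all assumptions.

```python
def encode_zero_biased(word, width):
    """Return (stored_word, inverted_flag) with at most width/2 ones stored.

    A word with more ones than zeros is stored inverted, and the flag
    bit records that so the original can be recovered.
    """
    mask = (1 << width) - 1
    word &= mask
    ones = bin(word).count("1")
    if ones > width // 2:
        return (~word) & mask, 1
    return word, 0

def decode_zero_biased(stored, inverted_flag, width):
    """Undo encode_zero_biased."""
    mask = (1 << width) - 1
    return (~stored) & mask if inverted_flag else stored & mask

stored, flag = encode_zero_biased(0xFF0F, 16)
print(hex(stored), flag, hex(decode_zero_biased(stored, flag, 16)))
```

A cell or write scheme that is cheaper for ‘0’ then sees the favorable value more often, at the cost of one flag bit per word and the inversion logic.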

Low-Swing Write
- Drive the BLs with low swing
- Use amplification in cell to restore values
[Figure: cell and write circuit with VDD_Prech, EQ, BL, BLB, SLC, WL, WE, Din, VWR, and column decoder; waveforms for EQ, WE, BL/BLB, WL, SLC, and Q/QB. BL/BLB swing between VDD-VTH and VDD-VTH-delVBL, with VWR = VDD-VTH-delVBL.]
[Ref: K. Kanda, JSSC’04]

Write Margin
- Fundamental limit to most power-reducing techniques
- Recover write margin with write assist, e.g.
  - Boosted WL
  - Collapsed cell VDD [Itoh’96, Bhavnagarwala’04]
  - Raised cell VSS [Yamaoka’04, Kanda’04]
  - Cell with amplification [Kanda’04]

Non-traditional cells
- Key tradeoff is with functional robustness
- Use alternative cell to improve robustness, then trade off for power savings
- e.g. remove read SNM
8T SRAM cell:
- Register file cell
- 1R, 1W port
- Read SNM eliminated
- Allows lower VDD
- 30% area overhead
- Robust layout
[Figure: 8T SRAM cell with RWL, WWL, RBL, and the two WBLs.]
[Ref: L. Chang, VLSI’05]

Cells with Pseudo-Static SNM Removal
- Isolate stored data during read
- Dynamic storage for duration of read
[Figure: two cells, one with differential read and one with single-ended read; labels BL, WL, WWL, WLW, and WLB.]
[Ref: S. Kosonocky, ICSICT’06] [Ref: K. Takeda, JSSC’06]

Emerging Devices: Double-gate MOSFET
- Emerging devices allow new SRAM structures
- Back-gate biasing of thin-body MOSFET provides improved control of short-channel effects, and re-instates effective dynamic control of VTH
- Double-gated (DG) MOSFET
- Back-gated (BG) MOSFET
  - Independent front and back gates
  - One switching gate and VTH control gate
[Figure: DG and BG fin structures with Source, Drain, gate length Lg, and fin width TSi. One device is labeled with a single Gate, the other with Gate1 (switching gate) and Gate2 (VTH control). Fin height HFIN is W for one device and W/2 for the other.]
[Ref: Z. Guo, ISLPED’05]

6T SRAM Cell with Feed-back
- Double-Gated (DG) NMOS pull-down and PMOS load devices
- Back-Gated (BG) NMOS access devices dynamically increase β-ratio
- SNM during read ~ 300mV
- Area penalty ~ 19%
[Figure: 6T DG-MOS and 6T BG-MOS cells.]
[Ref: Z. Guo, ISLPED’05]

Summary and Perspectives
- Functionality is main constraint in SRAM
  - Variation makes the outlying cells limiters
  - Look at hold, read, write modes
- Use various methods to improve robustness, then trade off for power savings
  - Cell voltages, thresholds
  - Novel bit-cells
  - Emerging devices
- Embedded memory is a major threat to continued technology scaling; innovative solutions are necessary

References
Books and Book Chapters
K. Itoh et al., Ultra-Low Voltage Nano-scale Memories, Springer, 2007.
A. Macii, “Memory Organization for Low-Energy Embedded Systems,” in Low-Power Electronics Design, C. Piguet, Ed., Chapter 26, CRC Press, 2005.
V. Moshnyaga and K. Inoue, “Low Power Cache Design,” in Low-Power Electronics Design, C. Piguet, Ed., Chapter 25, CRC Press, 2005.
J. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits, 2003.
T. Kawahara and K. Itoh, “Memory Leakage Reduction,” in Leakage in Nanometer CMOS Technologies, S. Narendra, Ed., Chapter 7, Springer, 2006.
Articles
A. Agarwal, H. Li, and K. Roy, “A Single-Vt Low-Leakage Gated-Ground Cache for Deep Submicron,” IEEE Journal of Solid-State Circuits, vol. 38, no. 2, pp. 319-328, Feb. 2003.
N. Azizi, F. Najm, and A. Moshovos, “Low-leakage Asymmetric-Cell SRAM,” IEEE Transactions on VLSI, vol. 11, no. 4, pp. 701-715, Aug. 2003.
A. Bhavnagarwala, S. Kosonocky, S. Kowalczyk, R. Joshi, Y. Chan, U. Srinivasan, and J. Wadhwa, “A Transregional CMOS SRAM with Single, Logic VDD and Dynamic Power Rails,” in Symposium on VLSI Circuits, pp. 292-293, 2004.
Y. Cao, T. Sato, D. Sylvester, M. Orshansky, and C. Hu, “New Paradigm of Predictive MOSFET and Interconnect Modeling for Early Circuit Design,” in Custom Integrated Circuits Conference (CICC), pp. 201-204, Oct. 2000.
L. Chang, D. Fried, J. Hergenrother, et al., “Stable SRAM cell design for the 32 nm node and beyond,” Symposium on VLSI Technology, pp. 128-129, June 2005.
Y. Chang, B. Park, and C. Kyung, “Conforming inverted data store for low power memory,” IEEE International Symposium on Low Power Electronics and Design, 1999.

References (cont.)
Y. Chang, F. Lai, and C. Yang, “Zero-aware asymmetric SRAM cell for reducing cache power in writing zero,” IEEE Transactions on VLSI Systems, vol. 12, no. 8, pp. 827-836, Aug. 2004.
Z. Guo, S. Balasubramanian, R. Zlatanovici, T.-J. King, and B. Nikolic, “FinFET-based SRAM design,” International Symposium on Low Power Electronics and Design, pp. 2-7, Aug. 2005.
F. Hamzaoglu, Y. Ye, A. Keshavarzi, K. Zhang, S. Narendra, S. Borkar, M. Stan, and V. De, “Analysis of Dual-VT SRAM Cells with Full-Swing Single-Ended Bit Line Sensing for On-Chip Cache,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 10, no. 2, pp. 91-95, Apr. 2002.
T. Hirose, H. Kuriyama, S. Murakami, et al., IEEE Journal of Solid-State Circuits, vol. 25, no. 5, pp. 1068-1074, Oct. 1990.
K. Itoh, A. Fridi, A. Bellaouar, and M. Elmasry, “A Deep Sub-V, Single Power-Supply SRAM Cell with Multi-VT, Boosted Storage Node and Dynamic Load,” Symposium on VLSI Circuits, pp. 132-133, June 1996.
K. Itoh, M. Horiguchi, and T. Kawahara, “Ultra-low voltage nano-scale embedded RAMs,” IEEE Symposium on Circuits and Systems, May 2006.
K. Kanda, H. Sadaaki, and T. Sakurai, “90% Write Power-Saving SRAM Using Sense-Amplifying Memory Cell,” IEEE Journal of Solid-State Circuits, vol. 39, no. 6, pp. 927-933, June 2004.
S. Kosonocky, A. Bhavnagarwala, and L. Chang, International Conference on Solid-State and Integrated Circuit Technology, pp. 689-692, Oct. 2006.
K. Mai, T. Mori, B. Amrutur, et al., IEEE Journal of Solid-State Circuits, vol. 33, no. 11, pp. 1659-1671, Nov. 1998.
G. Ming, Y. Jun, and X. Jun, “Low Power SRAM Design Using Charge Sharing Technique,” ASICON, pp. 102-105, 2005.
K. Osada, Y. Saitoh, E. Ibe, and K. Ishibashi, “16.7-fA/Cell Tunnel-Leakage-Suppressed 16-Mb SRAM for Handling Cosmic-Ray-Induced Multierrors,” IEEE Journal of Solid-State Circuits, vol. 38, no. 11, pp. 1952-1957, Nov. 2003.
PTM: Predictive Models. Available: http://www.eas.asu.edu/~ptm

References (cont.)
E. Seevinck, F. List, and J. Lohstroh, “Static Noise Margin Analysis of MOS SRAM Cells,” IEEE Journal of Solid-State Circuits, vol. SC-22, no. 5, pp. 748-754, Oct. 1987.
K. Takeda, Y. Hagihara, Y. Aimoto, M. Nomura, Y. Nakazawa, T. Ishii, and H. Kobatake, “A Read-Static-Noise-Margin-Free SRAM Cell for Low-Vdd and High-Speed Applications,” in IEEE International Solid-State Circuits Conference, pp. 478-479, Feb. 2005.
M. Yamaoka, Y. Shinozaki, N. Maeda, Y. Shimazaki, K. Kato, S. Shimada, K. Yanagisawa, and K. Osada, “A 300MHz 25μA/Mb Leakage On-Chip SRAM Module Featuring Process-Variation Immunity and Low-Leakage-Active Mode for Mobile-Phone Application Processor,” in IEEE International Solid-State Circuits Conference, pp. 494-495, 2004.