August 20101 VLSI Memory Design Shmuel Wimer Bar Ilan University, School of Engineering.

Slides:



Advertisements
Similar presentations
ECE555 Lecture 5 Nam Sung Kim University of Wisconsin – Madison
Advertisements

® Microsoft Office 2010 Excel Tutorial 3: Working with Formulas and Functions.
UNIT 5: CMOS subsystem design
MEMORY popo.
August 4, The following PEIMS reporting changes have been made to the PEIMS Collection in order to collect the Classroom Link information.
ACOT Intro/Copyright Succeeding in Business with Microsoft Excel 2010: Chapter1.
Fig Typical voltage transfer characteristic (VTC) of a logic inverter, illustrating the definition of the critical points.
Transmission Gate Based Circuits
RAM (RANDOM ACCESS MEMORY)
COEN 180 SRAM. High-speed Low capacity Expensive Large chip area. Continuous power use to maintain storage Technology used for making MM caches.
CSET 4650 Field Programmable Logic Devices
Semiconductor Memory Design. Organization of Memory Systems Driven only from outside Data flow in and out A cell is accessed for reading by selecting.
+ CS 325: CS Hardware and Software Organization and Architecture Internal Memory.
COEN 180 DRAM. Dynamic Random Access Memory Dynamic: Periodically refresh information in a bit cell. Else it is lost. Small footprint: transistor + capacitor.
C H A P T E R 15 Memory Circuits
CP208 Digital Electronics Class Lecture 11 May 13, 2009.
Introduction to CMOS VLSI Design Lecture 13: SRAM
Topic 9 MOS Memory and Storage Circuits
Designing Combinational Logic Circuits: Part2 Alternative Logic Forms:
11/29/2004EE 42 fall 2004 lecture 371 Lecture #37: Memory Last lecture: –Transmission line equations –Reflections and termination –High frequency measurements.
Introduction to CMOS VLSI Design SRAM/DRAM
Registers  Flip-flops are available in a variety of configurations. A simple one with two independent D flip-flops with clear and preset signals is illustrated.
Lecture 4: Computer Memory
S. Reda EN160 SP’07 Design and Implementation of VLSI Systems (EN0160) Lecture 31: Array Subsystems (SRAM) Prof. Sherief Reda Division of Engineering,
Memory and Advanced Digital Circuits 1.
Modern VLSI Design 2e: Chapter 6 Copyright  1998 Prentice Hall PTR Topics n Memories: –ROM; –SRAM; –DRAM. n PLAs.
Lecture 19: SRAM.
Parts from Lecture 9: SRAM Parts from
Power, Energy and Delay Static CMOS is an attractive design style because of its good noise margins, ideal voltage transfer characteristics, full logic.
Digital Integrated Circuits for Communication
1 EE 587 SoC Design & Test Partha Pande School of EECS Washington State University
Case Study - SRAM & Caches
Khaled A. Al-Utaibi Memory Devices Khaled A. Al-Utaibi
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 28: November 15, 2013 Memory Periphery.
Lecture on Electronic Memories. What Is Electronic Memory? Electronic device that stores digital information Types –Volatile v. non-volatile –Static v.
1 Delay Estimation Most digital designs have multiple data paths some of which are not critical. The critical path is defined as the path the offers the.
MOS Transistors The gate material of Metal Oxide Semiconductor Field Effect Transistors was original made of metal hence the name. Present day devices’
EE415 VLSI Design DYNAMIC LOGIC [Adapted from Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al.]
Review: Basic Building Blocks  Datapath l Execution units -Adder, multiplier, divider, shifter, etc. l Register file and pipeline registers l Multiplexers,
Abdullah Aldahami ( ) Feb26, Introduction 2. Feedback Switch Logic 3. Arithmetic Logic Unit Architecture a.Ripple-Carry Adder b.Kogge-Stone.
Washington State University
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 28: November 16, 2012 Memory Periphery.
Chapter 3 Digital Logic Structures. Copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display. 3-2 Combinational vs.
FPGA-Based System Design: Chapter 3 Copyright  2004 Prentice Hall PTR Topics n Latches and flip-flops. n RAMs and ROMs.
Modern VLSI Design 4e: Chapter 6 Copyright  2008 Wayne Wolf Topics Memories: –ROM; –SRAM; –DRAM; –Flash. Image sensors. FPGAs. PLAs.
Ratioed Circuits Ratioed circuits use weak pull-up and stronger pull-down networks. The input capacitance is reduced and hence logical effort. Correct.
DCSL & LVDCSL: A High Fan-in, High Performance Differential Current Switch Logic Families Dinesh Somasekhaar, Kaushik Roy Presented by Hazem Awad.
Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 12.1 EE4800 CMOS Digital IC Design & Analysis Lecture 12 SRAM Zhuo Feng.
Memory System Unit-IV 4/24/2017 Unit-4 : Memory System.
Digital Design: Principles and Practices
Advanced VLSI Design Unit 06: SRAM
CSE477 L24 RAM Cores.1Irwin&Vijay, PSU, 2002 CSE477 VLSI Digital Circuits Fall 2002 Lecture 24: RAM Cores Mary Jane Irwin ( )
SEQUENTIAL CIRCUITS Component Design and Use. Register with Parallel Load  Register: Group of Flip-Flops  Ex: D Flip-Flops  Holds a Word of Data 
PHY 201 (Blum)1 Transistor Odds and Ends. PHY 201 (Blum)2 RTL NOR.
Chapter 1 Combinational CMOS Logic Circuits Lecture # 4 Pass Transistors and Transmission Gates.
Lecture 10: Circuit Families. CMOS VLSI DesignCMOS VLSI Design 4th Ed. 10: Circuit Families2 Outline  Pseudo-nMOS Logic  Dynamic Logic  Pass Transistor.
Computer Memory Storage Decoding Addressing 1. Memories We've Seen SIMM = Single Inline Memory Module DIMM = Dual IMM SODIMM = Small Outline DIMM RAM.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 28: November 16, 2011 Memory Periphery.
Washington State University
Memory Devices 1. Memory concepts 2. RAMs 3. ROMs 4. Memory expansion & address decoding applications 5. Magnetic and Optical Storage.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 28: November 7, 2014 Memory Overview.
EE 466/586 VLSI Design Partha Pande School of EECS Washington State University
EE141 Project: 32x32 SRAM Abhinav Gupta, Glen Wong Optimization goals: Balance between area and performance Minimize area without sacrificing performance.
Penn ESE370 Fall DeHon 1 ESE370: Circuit-Level Modeling, Design, and Optimization for Digital Systems Day 28: November 8, 2013 Memory Overview.
Engineered for Tomorrow Date : 11/10/14 Prepared by : MN PRAPHUL & ASWINI N Assistant professor ECE Department Engineered for Tomorrow Subject Name: Fundamentals.
EE 466/586 VLSI Design Partha Pande School of EECS Washington State University
CSE477 L25 Memory Peripheral.1Irwin&Vijay, PSU, 2003 CSE477 VLSI Digital Circuits Fall 2003 Lecture 25: Peripheral Memory Circuits Mary Jane Irwin (
RAM RAM - random access memory RAM (pronounced ramm) random access memory, a type of computer memory that can be accessed randomly;
Norhayati Soin 06 KEEE 4426 WEEK 15/1 6/04/2006 CHAPTER 6 Semiconductor Memories.
Lecture 19: SRAM.
Presentation transcript:

August VLSI Memory Design Shmuel Wimer Bar Ilan University, School of Engineering

August A memory has 2 n words of 2 m bits each. Usually 2 n >> 2 m, (e.g. 1M Vs. 64) which will result a very tall structure. The array is therefore folded into 2 n-k rows, each containing 2 k words, namely, every row contains 2 m+k bits. Consider 8-words of 4-bit memory. We’d like to organize it in 4 lines and 8 columns. The memory is folded into 4- word by 8-bit, so n=3, m=2 and k=1. Larger memories are built from smaller sub-arrays to maintain short word and bit lines.

August Word-lines Bit-lines Bit-line Conditioning Row Decoder Column Decoder Column Circuitry Array of 2 n x2 m cells, organized in 2 n-k rows by 2 m+k columns 2 m bits n-k n k General Memory architecture 4-word by 8-bit folded memory

August Transistor SRAM Cell When write=1 the value at the bit is passed to the middle inverter while the upper tri-state inverter is in high Z. Once write=0 the upper and the center inverters are connected in a positive feedback loop to retain cell’s value as long as write=0. bit The value of bit-line needs to override the value stored at the cell. It requires careful design of transistor size for proper operation.

August Transistor SRAM Cell bit When read=1 the output of the lower tri-state inverter gets connected to the bit so cell’s value appears on the bit-line. The bit-line is first pre- charged to one, so only if the value stored at cell is zero the bit-line is pulled down.

August Though robust, 12-transistor cell consumes large area. Since it dominates the SRAM area, a 6-transistor is proposed, where some of the expense is charged on the peripheral circuits. 6-Transistor SRAM Cell

August Layout of IBM 0.18u SRAM cell Layout design Lithography simulation Silicon

August SRAM operation is divided into two phases called Φ1 and Φ2, which can be obtained by clk and its complement. Read Write Operations P1 P2 N1 N2 N4 N3 Pre-charge both bit-lines high. Turn on word-line. One of the bit-lines must be pulled-down. Since bit-line was high, the 0 node will go positive for a short time, but must not go too high to avoid cell switch. This is called read stability.read stability

August Read Stability A must remain below threshold, otherwise cell may flip. Therefore N1>>N2.

August Weak Medium Strong Let A=0 and assume that we write 1 into cell. In that case bit is pre-charged high and its complement should be pulled down. It follows from read stability that N1>>N2 hence A=1 cannot be enforced through N2. Hence A complement must be enforced through N4, implying N4>>P2. This constraint is called writability.writability P1 P2 N1 N2 N4 N3

August Writability

August SRAM Column Read Operation For delay reduction outputs can be sensed by high-skew inverters (low noise margin). SRAM Cell Bit-line Conditioning H H word Φ2Φ2 Φ1Φ1 bit Bit-lines are precharged high

August SRAM Column Write Operation data write Write Driver SRAM Cell Bit-line Conditioning Bit-lines (and complements) are precharged high. At write one is pulled down. Write operation overrides one of the pMOS transistors of the loop inverters. Therefore, the series resistance of transistors in write driver must be low enough to overpower the pMOS transistors.

August Decoders To decode word-lines we need AND gates of n-k inputs. This is a problem when fan-in of more than 4 since it slows down decoding. It is possible to break the AND gates into few levels as shown in the 4:16 decoder. common factor word15 word1 word0 A0 A1 A2 A3

August A0 A1 A2 A3 word15 word1 word0 Terms repeat themselves, so pre-decoding will eliminate the redundant ones. This is called pre-decoding. Less area with same drive as before. Pre-coded Lines

August word0 word1 word2 word3 Vcc x x x x 2x Lyon-Schediwy Fast Decoders In a NOR implementation output is pulled up via serial pMOS devices, which slows transition, so pMOS needs sizing, but this consumes lot of area. Since only single word is pulled up at a time, pMOS can be shared between words in a binary tree fashion and sized to yield same current as in pull down.

August Sum-addressed Decoders Sometimes an address of memory is calculated as BASE+OFFSET (e.g., in a cache), which requires an addition before decoding. Addition can be time consuming if Ripple Carry Adder (RCA) is used, and even Carry Look Ahead (CLA) my be too slow. It is possible to use a K = A + B comparator without carry propagation or look-ahead calculation.

August Sum-addressed Decoders If we know A and B, we can deduce what must be the carry in of every bit if it would happen that K = A + B. But then we can also deduce what should be the carry out. It follows that if every bit pair agrees on the carry out of the previous with the carry in of the next, then K=A+B is true indeed. We can therefore use a comparator to every word-line (k), where equality will hold only for one word.

August AiAi BiBi KiKi C in_i (required) C out_i (generated) We can derive the equations of the carries from the required and generated carries below.

August

August

August

August

August Below is a comparison of sum-addressed decoder with ordinary decoder combined with a ripple carry adder (RCA) and carry look ahead adder (CLA). A significant delay and area improvement is achieved.

August Bit-Line Conditioning Circuits Used to precharge bits high before R/W operation. Most simple is the following circuit. If a clock is not available it is possible to use a weak pMOS device connected as a pseudo-nMOS SRAM. Precharge can be done with nMOS, a case where precharge voltage is V dd -V t. It results faster R/W since swing is smaller, but noise margin is worsen.

August Each column contains write driver and read sensing circuit. A high skew read inverter has been shown. Sense amplifier provides faster sensing by responding to a smaller voltage swing.write driver high skew read inverter N2 N1 N3 P1 P2 This is a differential analog pair. N3 is a current source where current flows either in left or right branches. Circuit doesn’t need a clock but it consumes significant amount of DC power. Sense Amplifiers

August Regenerative Feedback Isolation Devices To speed up response bit-lines are disconnected at sensing to avoid their high capacitive load. The regenerative feedback loop is now isolated. When sense clock is high the values stored in bit-lines are regenerated, while the lines are disconnected, speeding up response.

August Regenerative Feedback Isolation Devices Sense amplifiers are susceptible to differential noise on bit-lines since they respond to small voltage differences.

August The SRAM is physically organized by 2 n-k rows and 2 m+k columns. Each row has 2 m groups of 2 k bits. Therefore, a 2 k :1 column multiplexers are required to extract the appropriate 2 m bits from the 2 m+k ones. Column Multiplexers

August Tree Decoder Column Multiplexer To sense Amps and Write Circuits The problem of this MUX is the delay occurring by the series of pass transistors.

August It is possible to implement the multiplexer such that data is passed trough a single transistor, while column decoding takes place concurrently with row decoding, thus not affecting delay. A1 A0 B0 B1B2 B3 B0 B1 Y

August DRAM – Dynamic RAM Store their charge on a capacitor rather than in a feedback loop Basic cell is substantially smaller than SRAM. To avoid charge leakage it must be periodically read and refresh It is built in a special process technology optimized for density Offers order of magnitude higher density than SRAM but has much higher latency than SRAM

August word bit C cell x A 1-transistor (1T) DRAM cell consists of a transistor and a capacitor. Cell is accessed by asserting the word- line to connect the capacitor to the bit- line. word On a read the bit-line is first precharged to V DD /2. When the word-line rises, the capacitor shares its charge with the bit- line, causing a voltage change of ΔV that can be sensed. bit x The read disturbs the cell contents at x, so the cell must be re-written after each read. On a write the voltage of the bit-line is forced onto the capacitor

August DRAM Cell C cell must be small to obtain high density, but big enough to obtain voltage swing at read.

August Like SRAMs, large DRAMs are divided into sub-arrays, whose size represents a tradeoff between area and performance. Large sub-arrays amortize sense amplifiers and decoders among more cells but are slower and have less swing due to higher capacitance of word and bit lines. bit0 bit1bit511 word0 word1 word255 Bit-line capacitance is far larger than cell, hence voltage swing ΔV during read is very small and sense amplifier is used.

August Open bit-line architecture Open bit-line architecture Is useful for small DRAMs. It has dense layout but sense amps are exposed to differential noise since their inputs come from different sub-arrays, while word line is asserted in one array. Folded bit-line architecture Folded bit-line architecture solves the problem of differential noise on the account of area expansion. Sense amps input are connected to adjacent bit-lines exposed to similar noise sources. When a word-line is asserted, one bit line is being read while its neighbor serves as the quiet reference. Smart layout Smart layout and aggressive manufacturing design rules (e.g. 45 degrees polygons) enable effective area increase of only 33%.

August Sub-array 1 Sub-array 2 Word-Line Decoders Sense Amps Open Bit-Line Architecture

August Word-Line Decoders Sense Amps Folded Bit-Line Architecture

August Word-Line Decoder Sense Amp Polysilicon Word-Line Metal Bit-Line n+ Diffusion Bit-Line Contact Capacitor

August P1 P2 N1 N2 VpVp VnVn bit’ bit” DRAM Sense Amp bit’ and bit” are initialized to V DD /2. V p =0 and V n = V DD /2, so all transistors are initially OFF. During read one bit-line is changing while the other stays float in V DD /2. Let bit’ change to 0. Once it reaches V DD /2-V t, N1 conducts and it follows bit’. Hence V n is pulled down.Hence V n is pulled down Meanwhile bit” is pulled up, which opens P2 and raise V p to V DD.

August V DD /2 V DD Bit’ Bit” 0 V DD /2 V DD VnVn VpVp 0