UNIT- III SEQUENTIAL LOGIC CIRCUITS

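Static Latches and Registers The Bistability Principle: Static memories use positive feedback to create a bistable circuit, that is, a circuit having two stable states that represent 0 and 1. The basic bistable element is a pair of cascaded inverters whose output is fed back to the input. (Figure: two cascaded inverters)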

Static Latches and Registers (Figure: voltage transfer characteristics)

The resulting circuit has only three possible operation points (A, B, and C), as demonstrated on the combined VTC. Under the condition that the gain of the inverter in the transient region is larger than 1, only A and B are stable operation points, and C is a metastable operation point. A bistable circuit has two stable states. Static Latches and Registers
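To make the stability argument concrete, here is a minimal Python sketch. It assumes an idealized inverter whose VTC is a tanh curve, a hypothetical stand-in for a real transistor-level characteristic, with made-up constants `VDD` and `K`. The sketch finds the operation points of two cascaded inverters (where the output of the second equals the input of the first) and classifies each one by the slope of the loop. A slope magnitude below 1 means a small disturbance dies out (stable); above 1 it grows (metastable).

```python
import math

VDD = 1.0
K = 8.0  # hypothetical steepness; inverter gain at VDD/2 is K * VDD / 2 = 4

def inv(v: float) -> float:
    """Idealized inverter VTC (a tanh curve, not a transistor model)."""
    return VDD / 2 * (1 - math.tanh(K * (v - VDD / 2)))

def loop(v: float) -> float:
    """Two cascaded inverters; operation points satisfy loop(v) == v."""
    return inv(inv(v))

def operation_points(n: int = 10_000) -> list[float]:
    """Solve loop(v) = v on [0, VDD] by scanning for sign changes, then bisecting."""
    g = lambda v: loop(v) - v
    points = []
    xs = [i * VDD / n for i in range(n + 1)]
    for a, b in zip(xs, xs[1:]):
        if g(a) == 0:
            points.append(a)
        elif g(a) * g(b) < 0:
            for _ in range(60):
                m = (a + b) / 2
                if g(a) * g(m) <= 0:
                    b = m
                else:
                    a = m
            points.append((a + b) / 2)
    return points

def loop_gain(v: float, h: float = 1e-6) -> float:
    """Numerical slope of the loop; |slope| < 1 means the point is stable."""
    return (loop(v + h) - loop(v - h)) / (2 * h)

for v in operation_points():
    kind = "stable" if abs(loop_gain(v)) < 1 else "metastable"
    print(f"V = {v:.4f} V: {kind}")
```

With an inverter gain of 4 in the transition region, the sketch reports three points. Two are stable and sit near the rails (A and B). The third is metastable at VDD/2 (C), where the loop gain is 4 × 4 = 16.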

In the absence of any triggering, the circuit remains in a single state (assuming that the power supply remains applied to the circuit), and hence remembers a value. A trigger pulse must be applied to change the state of the circuit. Another common name for a bistable circuit is flip-flop. Static Latches and Registers

SR Flip-Flops The cross-coupled inverter pair provides an approach to store a binary variable in a stable way. However, extra circuitry must be added to enable control of the memory states. (Figure: NOR-based SR flip-flop)

SR Flip-Flops

When both S and R are 0, the flip-flop is in a quiescent state and both outputs retain their value. If a positive (or 1) pulse is applied to the S input, the Q output is forced into the 1 state. Vice versa, a 1 pulse on R resets the flip-flop and the Q output goes to 0. The characteristic table is the truth table of the gate and lists the output states as functions of all possible input conditions. SR Flip-Flops
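The following is a gate-level sketch of the NOR-based SR flip-flop. It assumes ideal zero-delay NOR gates that are re-evaluated until the feedback loop settles, and the function names are illustrative. The sketch reproduces the characteristic table. For S = R = 1, both outputs are forced to 0, so Q and Q-bar are no longer complementary; this is why that input combination is normally treated as forbidden.

```python
def nor(a: int, b: int) -> int:
    return int(not (a or b))

def sr_latch(s: int, r: int, q: int, q_bar: int) -> tuple[int, int]:
    """NOR-based SR latch: Q = NOR(R, Q_bar), Q_bar = NOR(S, Q).

    The two gates are re-evaluated until the outputs stop changing,
    mimicking the feedback loop settling.
    """
    for _ in range(10):
        q_next = nor(r, q_bar)
        q_bar_next = nor(s, q_next)
        if (q_next, q_bar_next) == (q, q_bar):
            break
        q, q_bar = q_next, q_bar_next
    return q, q_bar

# Characteristic table, starting from a stored 0 (Q = 0, Q_bar = 1)
for s, r in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(s, r, sr_latch(s, r, 0, 1))
```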

Most systems operate in a synchronous fashion, with transition events referenced to a clock. One possible realization of a clocked SR flip-flop is a level-sensitive positive latch. It consists of a cross-coupled inverter pair plus 4 extra transistors that drive the flip-flop from one state to another and provide clocked operation. SR Flip-Flops

The combination of transistors M4, M7, and M8 forms a ratioed inverter. In order to make the latch switch, we must succeed in bringing Q below the switching threshold of the inverter M1-M2. Once this is achieved, the positive feedback causes the flip-flop to invert states. The presented flip-flop does not consume any static power. SR Flip-Flops

Multiplexer Based Latches Multiplexer-based latches can provide similar functionality to the SR latch, but have the important added advantage that the sizing of devices only affects performance and is not critical to the functionality.

For a negative latch, when the clock signal is low, the input 0 of the multiplexer is selected, and the D input is passed to the output. When the clock signal is high, the input 1 of the multiplexer, which connects to the output of the latch, is selected. The feedback holds the output stable while the clock signal is high. Similarly in the positive latch, the D input is selected when clock is high, and the output is held (using feedback) when clock is low. Multiplexer Based Latches
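The multiplexer view translates directly into a behavioral model. The following Python sketch uses hypothetical helper names and ideal logic values. Each latch is expressed as a 2:1 mux whose input 1 (negative latch) or input 0 (positive latch) is the fed-back output.

```python
def mux(sel: int, in0: int, in1: int) -> int:
    """2:1 multiplexer: returns in1 when sel is 1, otherwise in0."""
    return in1 if sel else in0

def negative_latch(clk: int, d: int, q: int) -> int:
    # CLK = 0 selects input 0 (D); CLK = 1 selects input 1 (the fed-back Q)
    return mux(clk, d, q)

def positive_latch(clk: int, d: int, q: int) -> int:
    # CLK = 1 selects D; CLK = 0 selects the fed-back Q
    return mux(clk, q, d)

print(negative_latch(0, 1, 0), negative_latch(1, 1, 0))  # transparent, then hold
print(positive_latch(1, 1, 0), positive_latch(0, 1, 0))  # transparent, then hold
```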

Transistor-level implementation of a positive latch built using transmission gates. When CLK is high, the bottom transmission gate is on and the latch is transparent, that is, the D input is copied to the Q output. During this phase, the feedback loop is open since the top transmission gate is off. When CLK is low, the bottom transmission gate turns off and the top one closes the feedback loop, so the latch holds its value.

Master-Slave Based Edge Triggered Register The most common approach for constructing an edge-triggered register is to use a master-slave configuration. The register consists of cascading a negative latch (master stage) with a positive latch (slave stage).

On the low phase of the clock, the master stage is transparent and the D input is passed to the master stage output, QM. During this period, the slave stage is in the hold mode, keeping its previous value using feedback. On the rising edge of the clock, the master stage stops sampling the input, and the slave stage starts sampling. During the high phase of the clock, the slave stage samples the output of the master stage (QM), while the master stage remains in a hold mode. Master-Slave Based Edge Triggered Register
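The master-slave behavior can be checked with a small cycle-by-cycle simulation. This sketch assumes ideal, zero-delay latches and samples the clock once per time step. `latch` and `master_slave` are illustrative names, not a standard API.

```python
def latch(transparent: bool, d: int, q: int) -> int:
    """Level-sensitive latch: pass D when transparent, otherwise hold Q."""
    return d if transparent else q

def master_slave(samples: list[tuple[int, int]]) -> list[int]:
    """Simulate a negative-latch master followed by a positive-latch slave.

    `samples` is a time-ordered list of (CLK, D) values; returns Q at each step.
    """
    qm, q = 0, 0  # assumed initial state
    trace = []
    for clk, d in samples:
        qm = latch(clk == 0, d, qm)  # master: transparent while CLK is low
        q = latch(clk == 1, qm, q)   # slave: transparent while CLK is high
        trace.append(q)
    return trace

print(master_slave([(0, 1), (1, 0), (1, 0), (0, 0), (1, 1)]))
```

For this sample sequence, Q changes only at the two rising edges. The D changes while the clock is high are ignored. At the last step, the register outputs the value D had before the edge (0), not the 1 applied together with the edge.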

When the clock is low (CLK-bar = 1), T1 is on and T2 is off, and the D input is sampled onto node QM. When the clock goes high, the master stage stops sampling the input and goes into a hold mode.

Low-Voltage Static Latches The scaling of supply voltages is critical for low-power operation. Unfortunately, certain latch structures do not function at reduced supply voltages. Scaling to low supply voltages hence requires the use of reduced-threshold devices, which exponentially increases the subthreshold leakage. When the registers are constantly accessed, the leakage energy is typically insignificant compared to the switching power. However, with the use of conditional clocks, registers may be idle for extended periods, and the leakage energy expended by registers can then be quite significant.
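The leakage argument comes down to a ratio of two energies per clock period. Switching energy is removed by a conditional clock on idle cycles, while leakage energy is spent regardless. The following is a minimal sketch with hypothetical device numbers, not taken from any particular process.

```python
def leakage_fraction(c_sw: float, vdd: float, i_leak: float,
                     t_cycle: float, activity: float) -> float:
    """Share of register energy per clock period spent on leakage.

    activity: fraction of cycles in which the (conditionally clocked) register
    actually switches. Switching energy scales with activity; leakage does not.
    """
    e_switch = activity * c_sw * vdd ** 2
    e_leak = i_leak * vdd * t_cycle
    return e_leak / (e_switch + e_leak)

# Hypothetical numbers: 1 fF switched, 1 V supply, 100 nA leakage, 1 ns cycle
for activity in (1.0, 0.1, 0.01):
    print(activity, round(leakage_fraction(1e-15, 1.0, 1e-7, 1e-9, activity), 3))
```

With these numbers, leakage accounts for about 9% of the energy when the register switches every cycle. That share rises to 50% at 10% activity and about 91% at 1% activity.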

Low-Voltage Static Latches Many solutions are being explored to address the problem of high leakage during idle periods.

Dynamic Latches and Registers Storage in a static sequential circuit relies on the concept that a cross-coupled inverter pair produces a bistable element and can thus be used to memorize binary values. The major disadvantage of the static gate, however, is its complexity. Registers can instead store values dynamically, as charge on parasitic capacitances. The principle is exactly identical to the one used in dynamic logic: charge stored on a capacitor can be used to represent a logic signal. The absence of charge denotes a 0, while its presence stands for a stored 1.

Dynamic Transmission-Gate Based Edge-triggered Registers When CLK = 0, the input data is sampled on storage node 1, which has an equivalent capacitance C1 consisting of the gate capacitance of I1, the junction capacitance of T1, and the overlap gate capacitance of T1.

During this period, the slave stage is in a hold mode, with node 2 in a high-impedance (floating) state. On the rising edge of the clock, the transmission gate T2 turns on, and the value sampled on node 1 right before the rising edge propagates to the output Q (note that node 1 is stable during the high phase of the clock since the first transmission gate is turned off). Node 2 now stores the inverted version of node 1. This implementation of an edge-triggered register is very efficient, as it requires only 8 transistors. Dynamic Transmission-Gate Based Edge-triggered Registers
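Nodes 1 and 2 hold their values only as charge on a capacitance, so the stored level slowly decays through leakage. This limits how long the clock may stay in one phase. The sketch below sums the three components of C1 listed above and estimates the hold time under a constant-leakage assumption. All component values and the leakage current are hypothetical.

```python
def storage_capacitance(c_gate_i1: float, c_junction_t1: float,
                        c_overlap_t1: float) -> float:
    """C1 = gate cap of I1 + junction cap of T1 + overlap gate cap of T1."""
    return c_gate_i1 + c_junction_t1 + c_overlap_t1

def hold_time(c_node: float, v_stored: float, v_min: float,
              i_leak: float) -> float:
    """Time until a stored high level decays to v_min, assuming a constant
    leakage current (dV/dt = -I_leak / C)."""
    return c_node * (v_stored - v_min) / i_leak

c1 = storage_capacitance(1.0e-15, 0.5e-15, 0.2e-15)  # hypothetical values
t_hold = hold_time(c1, 1.0, 0.6, 1e-12)              # 1 pA leakage (assumed)
print(f"C1 = {c1 * 1e15:.1f} fF, hold time = {t_hold * 1e3:.2f} ms")
```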

C²MOS Dynamic Register: A Clock Skew Insensitive Approach (Figure: the C²MOS register)

CLK = 0 (CLK-bar = 1): The first tri-state driver is turned on, and the master stage acts as an inverter sampling the inverted version of D on the internal node X. The master stage is in the evaluation mode. Meanwhile, the slave section is in a high-impedance mode, or in a hold mode. The roles are reversed when CLK = 1. C²MOS Dynamic Register: A Clock Skew Insensitive Approach

True Single-Phase Clocked Register (TSPCR) In the two-phase clocking schemes described above, care must be taken in routing the two clock signals to ensure that overlap is minimized. The True Single-Phase Clocked Register (TSPCR) uses a single clock (without an inverse clock).

For the positive latch, when CLK is high, the latch is in the transparent mode and corresponds to two cascaded inverters; the latch is non-inverting and propagates the input to the output. When CLK = 0, both inverters are disabled, and the latch is in hold mode. Only the pull-up networks are still active, while the pull-down circuits are deactivated. A register can be constructed by cascading positive and negative latches. True Single-Phase Clocked Register (TSPCR)

The main advantage is the use of a single clock phase. The disadvantage is a slight increase in the number of transistors: 12 transistors are required. TSPC offers an additional advantage: the possibility of embedding logic functionality into the latches. This reduces the delay overhead associated with the latches. True Single-Phase Clocked Register (TSPCR)

When CLK = 0, the input inverter is sampling the inverted D input on node X. The second (dynamic) inverter is in the precharge mode. The third inverter is in the hold mode. True Single-Phase Clocked Register (TSPCR)

Pulse Registers A fundamentally different approach for constructing a register uses pulse signals. The idea is to construct a short pulse around the rising (or falling) edge of the clock. This pulse acts as the clock input to a latch, sampling the input only in a short window. Race conditions are thus avoided by keeping the opening time (i.e., the transparent period) of the latch very short. The combination of the glitch-generation circuitry and the latch results in a positive edge-triggered register.

Pulse Registers

This in turn activates MN, pulling X and eventually CLKG low. The length of the pulse is controlled by the delay of the AND gate and the two inverters. Pulse Registers
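The self-resetting generator described above is hard to capture without the full schematic. The sketch below therefore swaps in a simpler, widely used pulse generator: CLK ANDed with a delayed, inverted copy of itself. The pulse width equals the delay of the inverting path, measured here in discrete time steps (a hypothetical unit). A latch clocked by this pulse behaves like a positive edge-triggered register.

```python
def pulse_generator(clk: list[int], delay: int) -> list[int]:
    """CLKG = CLK AND NOT(CLK delayed by `delay` steps).

    Produces a pulse `delay` time steps wide after each rising edge of CLK.
    Before t = delay, the clock is assumed to have been constant at clk[0].
    """
    out = []
    for t, c in enumerate(clk):
        delayed = clk[t - delay] if t >= delay else clk[0]
        out.append(int(c == 1 and delayed == 0))
    return out

def pulsed_latch(clkg: list[int], d: list[int], q0: int = 0) -> list[int]:
    """Latch that is transparent only while the short pulse CLKG is high."""
    q, trace = q0, []
    for g, di in zip(clkg, d):
        q = di if g else q
        trace.append(q)
    return trace

clk = [0] * 4 + [1] * 6 + [0] * 4 + [1] * 6
clkg = pulse_generator(clk, delay=2)
print([t for t, g in enumerate(clkg) if g])  # time steps where the pulse is high

d = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
q = pulsed_latch(clkg, d)
print(q)
```

D falls at step 6 and rises again at step 16, outside the pulse windows. Neither change reaches Q until a pulse opens the latch, and the rise at step 16 is never captured.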

The advantage of the approach is the reduced clock load and the small number of transistors required. The glitch-generation circuitry can be amortized over multiple register bits. The disadvantage is a substantial increase in verification complexity. This has prevented widespread use. Pulse Registers

Sense-Amplifier Based Registers A sense-amplifier structure can also be used to implement an edge-triggered register. Sense amplifier circuits accept small input signals and amplify them to generate rail-to-rail swings. There are many techniques to construct these amplifiers; a common one uses positive feedback (e.g., cross-coupled inverters).

Sense-Amplifier Based Registers Positive edge-triggered register based on sense-amplifier

The circuit uses a precharged front-end amplifier that samples the differential input signal on the rising edge of the clock signal. The outputs of the front end are fed into a NAND cross-coupled SR FF that holds the data and guarantees that the differential outputs switch only once per clock cycle. The differential inputs in this implementation do not need a rail-to-rail swing, and hence this register can be used as a receiver for a reduced-swing differential bus. Sense-Amplifier Based Registers
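The following behavioral sketch of the register rests on simplifying assumptions. The precharged front end is modeled as resolving the sign of the differential input once, at each rising clock edge, by pulling one active-low input of the NAND SR latch low. At all other times, both latch inputs are treated as high, so the latch holds. Voltages and helper names are hypothetical.

```python
def nand(a: int, b: int) -> int:
    return int(not (a and b))

def nand_sr(s_bar: int, r_bar: int, q: int, q_bar: int) -> tuple[int, int]:
    """NAND cross-coupled SR latch with active-low inputs; iterate to settle."""
    for _ in range(4):
        q = nand(s_bar, q_bar)
        q_bar = nand(r_bar, q)
    return q, q_bar

def sense_amp_register(clk: list[int], v_in: list[float],
                       v_in_bar: list[float]) -> list[int]:
    q, q_bar, prev = 0, 1, 0
    trace = []
    for c, vp, vn in zip(clk, v_in, v_in_bar):
        if c == 1 and prev == 0:
            # rising edge: the front end resolves the sign of the small input difference
            s_bar, r_bar = (0, 1) if vp > vn else (1, 0)
        else:
            # simplification: outside the edge the SR latch just holds
            s_bar, r_bar = 1, 1
        q, q_bar = nand_sr(s_bar, r_bar, q, q_bar)
        trace.append(q)
        prev = c
    return trace

# Reduced-swing differential inputs (hypothetical 100-200 mV differences)
clk = [0, 1, 1, 0, 1, 1]
v_in = [0.45, 0.55, 0.40, 0.40, 0.40, 0.60]
v_in_bar = [0.55, 0.45, 0.60, 0.60, 0.60, 0.40]
print(sense_amp_register(clk, v_in, v_in_bar))
```

Changes of the small differential input between edges do not disturb Q. This matches the once-per-cycle switching described above.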

Pipelining: An approach to optimize sequential circuits Pipelining is a popular design technique often used to accelerate the operation of the datapaths in digital processors. The goal of the presented circuit is to compute log(|a - b|), where both a and b represent streams of numbers, that is, the computation must be performed on a large set of input values. The minimal clock period Tmin necessary to ensure correct evaluation is given as:

$T_{min} = t_{c-q} + t_{pd,logic} + t_{su}$

where $t_{c-q}$ and $t_{su}$ are the propagation delay and the set-up time of the register, respectively. The term $t_{pd,logic}$ stands for the worst-case delay path through the combinational network, which consists of the adder, absolute value, and logarithm functions. Pipelining is a technique to improve the resource utilization and increase the functional throughput. Pipelining: An approach to optimize sequential circuits

The advantage of pipelined operation becomes apparent when examining the minimum clock period of the modified circuit. The combinational circuit block has been partitioned into three sections, each of which has a smaller propagation delay than the original function. This effectively reduces the value of the minimum allowable clock period:

$T_{min,pipe} = t_{c-q} + \max(t_{pd,add}, t_{pd,abs}, t_{pd,log}) + t_{su}$

Pipelining: An approach to optimize sequential circuits

Suppose that all logic blocks have approximately the same propagation delay, and that the register overhead is small with respect to the logic delays. Under these assumptions, the pipelined network outperforms the original circuit by a factor of three, or $T_{min,pipe} \approx T_{min}/3$. The increased performance comes at the relatively small cost of two additional registers and an increased latency. Pipelining: An approach to optimize sequential circuits
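The clock-period expressions can be checked with a few lines of arithmetic. Both cases use the same rule: the minimum period is the register overhead ($t_{c-q} + t_{su}$) plus the slowest logic stage. The delays below are hypothetical picosecond values, and the equal block delays match the assumption stated above.

```python
def t_min(t_cq: int, t_su: int, logic_delays: list[int]) -> int:
    """Minimum clock period: register overhead plus the slowest logic stage."""
    return t_cq + max(logic_delays) + t_su

# Hypothetical delays in ps: adder, absolute value, logarithm
t_add, t_abs, t_log = 300, 300, 300
t_cq, t_su = 10, 10

unpipelined = t_min(t_cq, t_su, [t_add + t_abs + t_log])  # one big stage
pipelined = t_min(t_cq, t_su, [t_add, t_abs, t_log])      # three stages
print(unpipelined, pipelined, round(unpipelined / pipelined, 2))
print("pipeline latency:", 3 * pipelined)                 # 3 cycles per result
```

The register overhead keeps the speedup slightly below three (920 ps vs. 320 ps, a factor of about 2.88). At the same time, the latency grows from 920 ps to 3 × 320 = 960 ps.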

Latch- vs. Register-Based Pipelines Pipelined circuits can be constructed using level-sensitive latches instead of edge-triggered registers. Here, the pipeline system is implemented based on pass-transistor-based positive and negative latches instead of edge-triggered registers. That is, logic is introduced between the master and slave latches of a master-slave system. Latch-based systems give significantly more flexibility in implementing a pipelined system, and often offer higher performance.

Latch- vs. Register-Based Pipelines Operation of two-phase pipelined circuit using dynamic registers

NORA-CMOS: A Logic Style for Pipelined Structures This topology has one important property: a C²MOS-based pipelined circuit is race-free as long as all the logic functions F (implemented using static logic) between the latches are noninverting. Under this condition, the only way a signal can race from stage to stage is when the logic function F is inverting, for example when F is replaced by a single static CMOS inverter.

NORA-CMOS: A Logic Style for Pipelined Structures

Logic and latch are clocked in such a way that both are simultaneously in either evaluation or hold (precharge) mode. A block that is in evaluation during CLK = 1 is called a CLK-module, while the inverse is called a CLK-bar-module. A NORA datapath consists of a chain of alternating CLK and CLK-bar modules. While one class of modules is precharging with its output latch in hold mode, preserving the previous output value, the other class is evaluating. NORA-CMOS: A Logic Style for Pipelined Structures

Memory architecture

Semiconductor Memory Classification
- Read-Write Memory
  - Random Access: SRAM, DRAM
  - Non-Random Access: FIFO, LIFO, Shift Register, CAM
- Non-Volatile Read-Write Memory: EPROM, E²PROM, FLASH
- Read-Only Memory: Mask-Programmed, Programmable (PROM)

Memory Timing: Definitions

Memory Architecture: Decoders An intuitive architecture for an N x M memory (N words of M bits each) gives every word its own select signal S0 ... S(N-1), connecting the storage cells of the selected word to the M-bit input-output. This leads to too many select signals: N words require N select signals. A decoder reduces the number of select signals: a word is selected by $K = \log_2 N$ address bits A0 ... A(K-1), which the decoder expands into the N word selects. (Figure: intuitive N x M architecture versus decoder-based architecture)
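The following is a behavioral sketch of the decoder. The function name is hypothetical, and A0 is taken as the least significant bit. K address bits select exactly one of N = 2^K word lines.

```python
def decode(address_bits: list[int]) -> list[int]:
    """Expand K address bits (A0 first, as LSB) into N = 2**K one-hot word selects."""
    index = sum(bit << i for i, bit in enumerate(address_bits))
    return [int(word == index) for word in range(2 ** len(address_bits))]

selects = decode([1, 0, 1])  # A0 = 1, A1 = 0, A2 = 1 -> word 5
print(selects)               # one-hot over 8 select lines S0..S7
```

For a 1024-word memory, 10 address bits replace 1024 separate select signals.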

Content-Addressable Memory (Figure: CAM block diagram with address decoder, I/O buffers, command inputs, validity bits, and priority encoder)
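Functionally, a CAM answers the question "which stored word equals this key?" The following Python sketch assumes one validity bit per entry. It also assumes a priority encoder that reports the lowest matching address, which is a common choice but an assumption here.

```python
def cam_search(entries: list[int], valid: list[bool], key: int):
    """Compare `key` with every stored word; only valid entries can match.

    In hardware all comparisons happen in parallel; the priority encoder
    then reports one matching address (here: the lowest index).
    """
    match_lines = [v and e == key for e, v in zip(entries, valid)]
    for address, hit in enumerate(match_lines):
        if hit:
            return True, address
    return False, None

entries = [0x12, 0x34, 0x12, 0x56]
valid = [False, True, True, True]  # entry 0 holds stale data
print(cam_search(entries, valid, 0x12))
print(cam_search(entries, valid, 0x99))
```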

Memory Timing: Approaches (Figure: DRAM timing with multiplexed addressing; SRAM timing, self-timed)

Read-Only Memory Cells (Figure: cells storing a 1 and a 0 in a diode ROM and in two MOS ROM variants, MOS ROM 1 and MOS ROM 2; labels WL, BL, VDD, GND)

MOS OR ROM (Figure: 4 x 4 array with word lines WL[0]-WL[3], bit lines BL[0]-BL[3], VDD, and pull-down loads biased by Vbias)

MOS NOR ROM (Figure: 4 x 4 array with word lines WL[0]-WL[3], bit lines BL[0]-BL[3], GND, and pull-up devices connected to VDD)

MOS NAND ROM All word lines are high by default, with the exception of the selected row. (Figure: 4 x 4 array with word lines WL[0]-WL[3], bit lines BL[0]-BL[3], and pull-up devices connected to VDD)
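The read rules for the two array styles differ only in how a transistor at the selected row affects the bit line. The following sketch assumes ideal switches, a hypothetical 2 x 2 transistor pattern, and the word-line conventions of the two arrays. In the NOR array, only the selected word line is high; in the NAND array, all word lines are high except the selected one.

```python
def nor_rom_read(has_tx: list[list[bool]], row: int) -> list[int]:
    """NOR ROM: selected WL high, others low. A transistor at (row, col)
    pulls its bit line to GND, so the column reads 0; otherwise the
    pull-up keeps it at 1."""
    return [0 if present else 1 for present in has_tx[row]]

def nand_rom_read(has_tx: list[list[bool]], row: int) -> list[int]:
    """NAND ROM: all WLs high except the selected one. Each bit line is a
    series chain, and a missing transistor is a short. The chain conducts
    (bit line pulled low, reads 0) unless a transistor sits on the selected row."""
    out = []
    for col in range(len(has_tx[0])):
        chain_conducts = all(
            not has_tx[r][col] or r != row  # shorted, or on because its WL is high
            for r in range(len(has_tx))
        )
        out.append(0 if chain_conducts else 1)
    return out

pattern = [[True, False],
           [False, True]]  # hypothetical transistor placement
print(nor_rom_read(pattern, 0), nand_rom_read(pattern, 0))
```

The same transistor pattern therefore stores complementary data in the two styles. In the NOR array, a transistor encodes a 0; in the NAND array, it encodes a 1.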

Equivalent Transient Model for MOS NOR ROM
Word line parasitics:
- wire capacitance and gate capacitance
- wire resistance (polysilicon)
Bit line parasitics:
- resistance not dominant (metal)
- drain and gate-drain capacitance
(Figure: model for NOR ROM with rword and cword on WL, Cbit on BL, VDD)
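The polysilicon word line behaves as a distributed RC line. A standard way to estimate its delay is the Elmore delay of an RC ladder with one segment per cell, which grows quadratically with the number of cells. The following sketch uses unit resistance and capacitance per segment (hypothetical normalized values).

```python
def elmore_ladder(r_seg: float, c_seg: float, n: int) -> float:
    """Elmore delay at the far end of an RC ladder of n identical segments.

    Node i (1..n) charges its capacitance c_seg through i upstream resistors.
    """
    return sum(i * r_seg * c_seg for i in range(1, n + 1))

# Word line modeled as one (r_word, c_word) segment per cell; unit values
for cells in (32, 64, 128):
    print(cells, elmore_ladder(1.0, 1.0, cells))
```

Doubling the word-line length roughly quadruples the delay (2080 vs. 8256 normalized units for 64 and 128 cells).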

Equivalent Transient Model for MOS NAND ROM
Word line parasitics:
- similar to NOR ROM
Bit line parasitics:
- resistance of cascaded transistors dominates
- drain/source and complete gate capacitance
(Figure: model for NAND ROM with rword and cword on WL, rbit and cbit along BL, load CL, VDD)

Non-Volatile Memories The floating-gate transistor (FAMOS) (Figure: device cross-section showing the gate above a floating gate, n+ source and drain in a p substrate, and oxide thickness tox; schematic symbol with terminals G, S, D)

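Floating-Gate Transistor Programming
- Avalanche injection: high programming voltages (about 20 V in the figure) on the gate and drain inject electrons onto the floating gate.
- Removing the programming voltage leaves the charge trapped; with 0 V on the gate, the floating gate sits at a negative potential (about -5 V in the figure).
- Programming results in a higher VT, so a programmed device stays off at the normal 5 V operating voltage.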

Flash EEPROM (Figure: control gate over a floating gate, separated from the p-substrate by a thin tunneling oxide, with n+ source and n+ drain; arrows mark programming and erasure) Many other options …

Basic Operations in a NOR Flash Memory―Erase

Basic Operations in a NOR Flash Memory―Write

Basic Operations in a NOR Flash Memory―Read
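The erase and write operations impose an asymmetry that a simple model captures. Erase works on a whole block, while a write can only move bits in one direction. The sketch assumes the common convention that erased cells read as 1 and programming clears bits to 0. The block size, byte width, and class name are hypothetical.

```python
class NorFlashBlock:
    """Toy model of one erase block (hypothetical, byte-wide)."""

    ERASED = 0xFF  # assumption: erased cells read as 1

    def __init__(self, size: int) -> None:
        self.data = [self.ERASED] * size

    def erase(self) -> None:
        """Erase works on the whole block, returning every bit to 1."""
        self.data = [self.ERASED] * len(self.data)

    def write(self, addr: int, value: int) -> None:
        """Programming can only clear bits (1 -> 0), so the result is an AND."""
        self.data[addr] &= value

    def read(self, addr: int) -> int:
        return self.data[addr]

block = NorFlashBlock(4)
block.write(0, 0xA5)
print(hex(block.read(0)))
block.write(0, 0x5A)      # cannot set bits back to 1 without an erase
print(hex(block.read(0)))
block.erase()
print(hex(block.read(0)))
```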

NAND Flash Memory (Figure: unit cell layout showing the word line (poly) and the source line (diffusion layer); courtesy Toshiba)

Read-Write Memories (RAM)
Static (SRAM):
- data stored as long as supply is applied
- large (6 transistors/cell)
- fast
- differential
Dynamic (DRAM):
- periodic refresh required
- small (1-3 transistors/cell)
- slower
- single ended

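6-transistor CMOS SRAM Cell (Figure: cross-coupled inverters formed by M1-M4 hold Q and its complement; access transistors M5 and M6, controlled by WL, connect the two nodes to BL and BL-bar; VDD)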

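CMOS SRAM Analysis (Read) (Figure: read operation on a cell storing complementary values 1 and 0, showing M1, M4, M5, M6, WL, VDD, and the bit lines BL and BL-bar with capacitance Cbit)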

CMOS SRAM Analysis (Write) (Figure: write operation with BL = 1 and BL-bar = 0 driving a cell that initially holds Q = 0 and Q-bar = 1; labels M1, M4, M5, M6, VDD, WL)

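3-Transistor DRAM Cell
- No constraints on device ratios
- Reads are non-destructive
- Value stored at node X when writing a “1”: $V_X = V_{WWL} - V_{Tn}$
(Figure: cell with write word line WWL, read word line RWL, bit lines BL1 and BL2, transistors M1-M3, and storage capacitance CS at node X; waveforms labeled VDD - VT and ΔV)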

1-Transistor DRAM Cell
Write: CS is charged or discharged by asserting WL and BL.
Read: charge redistribution takes place between the bit line and the storage capacitance:

$\Delta V = V_{BL} - V_{PRE} = (V_{BIT} - V_{PRE}) \frac{C_S}{C_S + C_{BL}}$

The voltage swing is small, typically around 250 mV.
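Charge redistribution between the cell capacitance C_S and the bit-line capacitance C_BL gives ΔV = (V_BIT - V_PRE) · C_S / (C_S + C_BL). Plugging numbers into this relation shows why the swing is small. The sketch assumes a bit line precharged to VDD/2 and uses hypothetical values for VDD, C_S, and C_BL.

```python
def bitline_swing(v_bit: float, v_pre: float, c_s: float, c_bl: float) -> float:
    """Delta V = V_BL - V_PRE = (V_BIT - V_PRE) * C_S / (C_S + C_BL)."""
    return (v_bit - v_pre) * c_s / (c_s + c_bl)

VDD = 2.5                    # hypothetical supply
V_PRE = VDD / 2              # bit line precharged to VDD/2 (assumption)
C_S, C_BL = 30e-15, 120e-15  # hypothetical cell and bit-line capacitances
print(bitline_swing(VDD, V_PRE, C_S, C_BL))  # stored "1"
print(bitline_swing(0.0, V_PRE, C_S, C_BL))  # stored "0"
```

With a 1:4 ratio between cell and bit-line capacitance, the swing is about ±250 mV, the magnitude quoted above.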

DRAM Cell Observations
- 1T DRAM requires a sense amplifier for each bit line, due to charge-redistribution read-out.
- DRAM memory cells are single ended, in contrast to SRAM cells.
- The read-out of the 1T DRAM cell is destructive; read and refresh operations are necessary for correct operation.
- Unlike the 3T cell, the 1T cell requires the presence of an extra capacitance that must be explicitly included in the design.
- When writing a “1” into a DRAM cell, a threshold voltage is lost. This charge loss can be circumvented by bootstrapping the word lines to a higher value than VDD.

Static CAM Memory Cell

CAM in Cache Memory (Figure: CAM array with input drivers and hit logic, next to an SRAM array with address decoder and sense amps/input drivers; signals Tag, Hit, Address, Data, R/W)

Row Decoders: a collection of 2^M complex logic gates, organized in a regular and dense fashion. (Figure: (N)AND decoder and NOR decoder)

Hierarchical Decoders: a multi-stage implementation improves performance. (Figure: NAND decoder using 2-input pre-decoders on A0, A1 and A2, A3 and their complements, driving word lines WL0, WL1, ...)
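Pre-decoding can be expressed as two 2-to-4 pre-decoders whose outputs are combined by 2-input gates, one per word line. The following is a behavioral sketch for 4 address bits, with hypothetical helper names and A0 as the least significant bit. It is checked against a direct one-hot decode.

```python
def predecode2(a0: int, a1: int) -> list[int]:
    """2-to-4 pre-decoder: one-hot over the value 2*a1 + a0."""
    return [int(i == 2 * a1 + a0) for i in range(4)]

def hierarchical_decode(addr_bits: list[int]) -> list[int]:
    """Combine the (A1, A0) and (A3, A2) pre-decoder outputs with 2-input ANDs."""
    a0, a1, a2, a3 = addr_bits
    lo = predecode2(a0, a1)
    hi = predecode2(a2, a3)
    return [hi[word >> 2] & lo[word & 3] for word in range(16)]

print(hierarchical_decode([1, 1, 0, 1]))  # A = 0b1011 -> word line 11
```

Structurally, this replaces sixteen 4-input gates with eight pre-decoder gates plus sixteen 2-input gates. The smaller fan-in per stage is the usual argument for the speed gain.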

Dynamic Decoders (Figure: 2-input NOR decoder and 2-input NAND decoder with precharge devices, inputs A0, A1 and their complements, word lines WL0 to WL3, VDD, and GND)

4-input pass-transistor based column decoder (Figure: a 2-input NOR decoder driven by A0 and A1 generates select signals S0-S3, each gating one bit line onto the data line D)
Advantages: speed (tpd of the decoder does not add to the overall memory access time); only one extra transistor in the signal path.
Disadvantage: large transistor count.

4-to-1 tree based column decoder (Figure: two levels of pass transistors controlled by A0/A0-bar and A1/A1-bar connect the bit lines to D)
- Number of devices drastically reduced
- Delay increases quadratically with the number of sections; prohibitive for large decoders
Solutions: buffers, progressive sizing, or a combination of tree and pass-transistor approaches.

Decoder for circular shift-register

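Sense Amplifiers
$t_p = \frac{C \cdot \Delta V}{I_{av}}$, where C is large and $I_{av}$ is small, so make $\Delta V$ as small as possible.
Idea: use a sense amplifier (s.a.), which turns a small transition at its input into a full-swing output.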
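A quick numeric comparison shows how strongly the required swing ΔV sets the delay $t_p = C \Delta V / I_{av}$. The bit-line capacitance, cell current, and both swings below are hypothetical values.

```python
def propagation_delay(c: float, delta_v: float, i_av: float) -> float:
    """t_p = C * Delta V / I_av."""
    return c * delta_v / i_av

C_BITLINE = 500e-15  # hypothetical bit-line capacitance
I_CELL = 50e-6       # hypothetical average cell current
full_swing = propagation_delay(C_BITLINE, 1.8, I_CELL)  # hypothetical 1.8 V swing
sensed = propagation_delay(C_BITLINE, 0.1, I_CELL)      # 100 mV swing for a sense amp
print(f"{full_swing * 1e9:.1f} ns vs {sensed * 1e9:.1f} ns")
```

With these numbers, sensing a 100 mV swing instead of waiting for a 1.8 V transition cuts the delay from 18 ns to 1 ns.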

Differential Sense Amplifier: directly applicable to SRAMs. (Figure: sense amplifier built from M1-M5 with supply VDD, inputs bit and bit-bar, enable SE, output Out, and internal node y)

Differential Sensing ― SRAM

Latch-Based Sense Amplifier (DRAM)
- Initialized at its metastable point with EQ.
- Once an adequate voltage gap has been created, the sense amp is enabled with SE.
- Positive feedback quickly forces the output to a stable operating point.
(Figure: cross-coupled latch between BL and BL-bar with equalization EQ, enable SE, and VDD)
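The regeneration can be reproduced with a small continuous-time sketch. Two idealized inverters, modeled with a hypothetical tanh transfer curve and an inverter gain of 5, each charge their output node with time constant tau. The simulation starts from the small gap left after equalization. Forward-Euler integration is used for simplicity, and all values are assumptions.

```python
import math

K = 10.0  # hypothetical steepness; small-signal inverter gain K * VDD / 2 = 5

def inv(v: float, vdd: float = 1.0) -> float:
    """Idealized inverter transfer curve (tanh shape, not a device model)."""
    return vdd / 2 * (1 - math.tanh(K * (v - vdd / 2)))

def sense(v_bl: float, v_bl_bar: float, tau: float = 1.0,
          dt: float = 0.01, steps: int = 2000) -> tuple[float, float]:
    """Cross-coupled inverters released from (near) the metastable point.

    Each node relaxes toward the output of the inverter driven by the other
    node: tau * dv/dt = inv(other) - v (forward-Euler integration).
    """
    a, b = v_bl, v_bl_bar
    for _ in range(steps):
        da = (inv(b) - a) / tau
        db = (inv(a) - b) / tau
        a, b = a + dt * da, b + dt * db
    return a, b

# EQ sets both sides to VDD/2; the cell then adds a small (hypothetical) 50 mV gap
print(sense(0.55, 0.50))
print(sense(0.50, 0.55))
```

Whichever side starts higher ends near VDD, and the other ends near ground. The initial 50 mV difference grows roughly exponentially until the inverters saturate.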

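Sources of Power Dissipation in Memories (Figure from [Itoh00]: an array of n rows by m columns with row decoder, column decoder, and periphery, supplied by VDD and VSS). The supply current is $I_{DD} = \sum C_i \Delta V_i f + \sum I_{DCP}$, which for the memory becomes

$I_{DD} = m\,i_{act} + m(n-1)\,i_{hld} + (n C_{DE} + m C_{DE}) V_{INT} f + C_{PT} V_{INT} f + I_{DCP}$

where $i_{act}$ is the current of an active (selected) cell, $i_{hld}$ the holding current of a non-selected cell, $C_{DE}$ the output node capacitance of each decoder, $C_{PT}$ the total capacitance in the periphery, $V_{INT}$ the internal supply voltage, $f$ the operating frequency, and $I_{DCP}$ the static (DC) current of the periphery.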
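To see how the terms combine, the sketch below evaluates the supply current of an array with n rows and m columns. It uses $I_{DD} = m\,i_{act} + m(n-1)\,i_{hld} + (n C_{DE} + m C_{DE}) V_{INT} f + C_{PT} V_{INT} f + I_{DCP}$ and returns the individual contributions as well. The function name and the numbers passed in are illustrative.

```python
def memory_supply_current(m, n, i_act, i_hld, c_de, c_pt, v_int, f, i_dcp):
    """Return (I_DD, per-term breakdown) for an n-row by m-column array."""
    terms = {
        "active cells (selected row)": m * i_act,
        "holding cells (other rows)": m * (n - 1) * i_hld,
        "row + column decoders": (n * c_de + m * c_de) * v_int * f,
        "periphery switching": c_pt * v_int * f,
        "periphery DC": i_dcp,
    }
    return sum(terms.values()), terms

total, breakdown = memory_supply_current(
    m=2, n=3, i_act=1.0, i_hld=0.5, c_de=1.0, c_pt=2.0, v_int=1.0, f=1.0, i_dcp=0.25
)  # small normalized values, chosen only to make the arithmetic easy to follow
print(total)
for name, value in breakdown.items():
    print(f"  {name}: {value}")
```

For large n, the m(n - 1) holding-current term multiplies the per-cell leakage by nearly the full cell count. This is why the leakage-suppression techniques that follow target the array.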

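Suppressing Leakage in SRAM Two approaches: inserting extra resistance (sleep transistors between the SRAM cells and the supply rails, creating internal rails VDD,int and VSS,int), and reducing the supply voltage (a sleep signal switches the internal supply VDD,int of the cells from VDD to a lower VDDL). (Figure: both schemes; a low-threshold transistor is marked)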

Clocking
- Synchronous systems use a clock to keep operations in sequence:
  - distinguish this from previous or next
  - determine the speed at which the machine operates
- The clock must be distributed to all the sequencing elements (flip-flops and latches).
- The clock is also distributed to other elements (domino circuits and memories).

Clock Distribution
- On a small chip, the clock distribution network is just a wire, and possibly an inverter for clkb.
- On practical chips, the RC delay of the wire resistance and gate load is very long.
  - Variations in this delay cause the clock to arrive at different elements at different times.
  - This is called clock skew.
- Most chips use repeaters to buffer the clock and equalize the delay.
  - This reduces, but does not eliminate, skew.

Review: Skew Impact
- Ideally, the full cycle is available for work.
- Skew adds sequencing overhead.
- It increases hold time too.
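The overhead can be expressed with the usual textbook flip-flop sequencing constraints. The max-delay constraint is $T_c \ge t_{pcq} + t_{pd} + t_{setup} + t_{skew}$, and the min-delay constraint is $t_{cd} \ge t_{hold} - t_{ccq} + t_{skew}$. Here $t_{pcq}$ and $t_{ccq}$ are the flip-flop clock-to-Q propagation and contamination delays, and $t_{pd}$ and $t_{cd}$ are those of the logic. The following is a minimal checker with hypothetical picosecond values.

```python
def setup_ok(t_c: int, t_pcq: int, t_pd: int, t_setup: int, t_skew: int) -> bool:
    """Max-delay constraint: t_pd <= T_c - (t_pcq + t_setup + t_skew)."""
    return t_pd <= t_c - (t_pcq + t_setup + t_skew)

def hold_ok(t_cd: int, t_ccq: int, t_hold: int, t_skew: int) -> bool:
    """Min-delay constraint: t_cd >= t_hold - t_ccq + t_skew."""
    return t_cd >= t_hold - t_ccq + t_skew

# Hypothetical values in ps
for skew in (0, 20):
    print(skew, setup_ok(500, 50, 400, 40, skew), hold_ok(5, 30, 20, skew))
```

Here, 20 ps of skew turns both a passing setup check and a passing hold check into failures. Skew eats into the cycle and also raises the effective hold requirement.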

- Reduce clock skew
  - Careful clock distribution network design
  - Plenty of metal wiring resources
- Analyze clock skew
  - Only budget actual, not worst-case, skews
  - Local vs. global skew budgets
- Tolerate clock skew
  - Choose circuit structures insensitive to skew

Skew Tolerance
- Flip-flops are sensitive to skew because of hard edges:
  - data launches at the latest rising edge of the clock
  - data must set up before the earliest next rising edge of the clock
  - overhead would shrink if we could soften the edge
- Latches tolerate moderate amounts of skew: data can arrive any time the latch is transparent.

Skew: Latches (Figure: timing of 2-phase latches and pulsed latches)

Dynamic Circuit Review
- Static circuits are slow because fat pMOS transistors load the input.
- Dynamic gates use precharge to remove pMOS transistors from the inputs:
  - Precharge: φ = 0, output forced high
  - Evaluate: φ = 1, output may pull low

Domino Circuits
- Dynamic inputs must monotonically rise during evaluation.
  - Place an inverting stage between each dynamic gate.
  - The dynamic/static pair is called a domino gate.
- Domino gates can be safely cascaded.
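The monotonicity requirement is easiest to see in a discrete-time sketch. It models a dynamic gate as precharging high when φ = 0. During evaluation, the gate discharges irreversibly as soon as its pull-down network conducts. The function name and the waveforms are hypothetical.

```python
def dynamic_gate(phi: list[int], pulldown: list[int]) -> list[int]:
    """Dynamic gate output over time.

    phi = 0: precharge, output forced high.
    phi = 1: evaluate; once the pull-down network conducts, the output is
    discharged and cannot rise again until the next precharge.
    """
    out, y = [], 1
    for p, pd in zip(phi, pulldown):
        if p == 0:
            y = 1
        elif pd:
            y = 0
        out.append(y)
    return out

phi = [0, 1, 1, 1]  # one precharge step, then evaluation
a = [0, 0, 1, 1]    # input A rises partway through evaluation
x = dynamic_gate(phi, a)                      # first dynamic stage: X = NOT A
bad = dynamic_gate(phi, x)                    # cascading X directly
good = dynamic_gate(phi, [1 - v for v in x])  # domino: static inverter in between
print(x, bad, good)
```

Cascading X directly (`bad`) discharges the second gate at the start of evaluation, while X is still precharged high. The gate cannot recover when X later falls. Inserting the static inverter (`good`) makes every dynamic input rise monotonically, so the second gate evaluates correctly.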

Clock Skew
- Skew increases sequencing overhead.
  - Traditional domino has hard edges.
  - Evaluate at the latest rising edge.
  - Setup at the latch by the earliest falling edge.

Time Borrowing
- Logic may not exactly fit a half-cycle.
  - No flexibility to borrow time to balance logic between half cycles.
- Traditional domino sequencing overhead is about 25% of the cycle time in fast systems!

Skew-Tolerant Domino
- Use overlapping clocks to eliminate latches at phase boundaries.
  - The second phase evaluates using the results of the first.

Full Keeper
- After the second phase evaluates, the first phase precharges.
- The input to the second phase falls.
  - Does this violate monotonicity? No, because we no longer need the value.
- Now the second gate has a floating output.
  - A full keeper is needed to hold it either high or low.

Time Borrowing
- Overlap can be used to:
  - tolerate clock skew
  - permit time borrowing
- No sequencing overhead

Multiple Phases
- With more clock phases, each phase overlaps more.
  - This permits more skew tolerance and time borrowing.

Clock Generation

Timing issues Setup and hold time: every flip-flop has restrictive time regions around the active clock edge in which the input should not change. We call them restrictive because if the input changes in these regions, the output may not be the expected one: it may be derived from the old input, the new input, or even something in between the two.

Timing issues The setup time is the interval before the clock edge during which the data must be held stable. The hold time is the interval after the clock edge during which the data must be held stable. Hold time can be negative, which means the data can change slightly before the clock edge and still be properly captured. Most present-day flip-flops have zero or negative hold time.
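The setup/hold window can be written as a simple check on data transition times. The sketch uses hypothetical times in picoseconds. The window is treated as open at both ends, which is an arbitrary modeling choice.

```python
def window_violations(transitions: list[int], t_edge: int,
                      t_setup: int, t_hold: int) -> list[int]:
    """Return the data transitions inside the forbidden window
    (t_edge - t_setup, t_edge + t_hold). A negative t_hold moves the end of
    the window before the clock edge."""
    lo, hi = t_edge - t_setup, t_edge + t_hold
    return [t for t in transitions if lo < t < hi]

print(window_violations([70, 90, 103, 110], t_edge=100, t_setup=20, t_hold=5))
print(window_violations([70, 90, 97], t_edge=100, t_setup=20, t_hold=-5))
```

With a hold time of -5 ps, a transition 3 ps before the edge (at 97 ps) is still captured correctly, as described above.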

Timing issues

To avoid setup time violations:
- Optimize the combinational logic between the flip-flops for minimum delay.
- Redesign the flip-flops to get a smaller setup time.
- Tweak the launch flip-flop to have a better slew at its clock pin; this makes the launch flip-flop faster, which helps fix setup violations.
- Use clock skew deliberately (useful skew).
To avoid hold time violations:
- Add delay (using buffers).
- Add lockup latches in cases where the hold time requirement is very large (basically to avoid data slip).