332:578 Deep Submicron VLSI Design Lecture 14 Design for Clock Skew

Slides:



Advertisements
Similar presentations
Transmission Gate Based Circuits
Advertisements

ECE C03 Lecture 71 Lecture 7 Delays and Timing in Multilevel Logic Synthesis Hai Zhou ECE 303 Advanced Digital Design Spring 2002.
Sequential Circuit Design
Modern VLSI Design 4e: Chapter 5 Copyright  2008 Wayne Wolf Topics n Performance analysis of sequential machines.
Introduction to CMOS VLSI Design Sequential Circuits.
VLSI Design EE 447/547 Sequential circuits 1 EE 447/547 VLSI Design Lecture 9: Sequential Circuits.
Introduction to CMOS VLSI Design Sequential Circuits
MICROELETTRONICA Sequential circuits Lection 7.
Lecture 11: Sequential Circuit Design. CMOS VLSI DesignCMOS VLSI Design 4th Ed. 11: Sequential Circuits2 Outline  Sequencing  Sequencing Element Design.
Lecture: 1.6 Tri-states, Mux, Latches & Flip Flops
Introduction to CMOS VLSI Design Lecture 10: Sequential Circuits David Harris Harvey Mudd College Spring 2004.
Introduction to CMOS VLSI Design Lecture 1: Circuits & Layout
(Neil west - p: ). Finite-state machine (FSM) which is composed of a set of logic input feeding a block of combinational logic resulting in a set.
Sequential Circuits. Outline  Floorplanning  Sequencing  Sequencing Element Design  Max and Min-Delay  Clock Skew  Time Borrowing  Two-Phase Clocking.
Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis EE4800 CMOS Digital IC Design & Analysis Lecture 11 Sequential Circuit Design Zhuo Feng.
Introduction to CMOS VLSI Design Lecture 19: Design for Skew David Harris Harvey Mudd College Spring 2004.
S. Reda EN160 SP’07 Design and Implementation of VLSI Systems (EN1600) Lecture 21: Dynamic Combinational Circuit Design Prof. Sherief Reda Division of.
Introduction to CMOS VLSI Design Clock Skew-tolerant circuits.
EE141 © Digital Integrated Circuits 2nd Timing Issues 1 Digital Integrated Circuits A Design Perspective Timing Issues Jan M. Rabaey Anantha Chandrakasan.
Lecture 21: Packaging, Power, & Clock
CSE477 L19 Timing Issues; Datapaths.1Irwin&Vijay, PSU, 2002 CSE477 VLSI Digital Circuits Fall 2002 Lecture 19: Timing Issues; Introduction to Datapath.
Clock Design Adopted from David Harris of Harvey Mudd College.
Chapter 11 Timing Issues in Digital Systems Boonchuay Supmonchai Integrated Design Application Research (IDAR) Laboratory August 20, 2004; Revised - July.
Lecture 8: Clock Distribution, PLL & DLL
Introduction to CMOS VLSI Design Circuit Families.
Circuit Families Adopted from David Harris of Harvey Mudd College.
Introduction to CMOS VLSI Design Lecture 10: Sequential Circuits
Introduction to CMOS VLSI Design Lecture 10: Sequential Circuits Credits: David Harris Harvey Mudd College (Material taken/adapted from Harris’ lecture.
S. Reda EN160 SP’07 Design and Implementation of VLSI Systems (EN0160) Lecture 23: Sequential Circuit Design (1/3) Prof. Sherief Reda Division of Engineering,
Z. Feng MTU EE4800 CMOS Digital IC Design & Analysis 10.1 EE4800 CMOS Digital IC Design & Analysis Lecture 10 Combinational Circuit Design Zhuo Feng.
© Digital Integrated Circuits 2nd Sequential Circuits Digital Integrated Circuits A Design Perspective Designing Sequential Logic Circuits Jan M. Rabaey.
EE415 VLSI Design DYNAMIC LOGIC [Adapted from Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al.]
EE 447 VLSI Design Lecture 8: Circuit Families.
Introduction to CMOS VLSI Design Lecture 25: Package, Power, Clock, and I/O David Harris Harvey Mudd College Spring 2007.
Modern VLSI Design 3e: Chapter 3 Copyright  1998, 2002 Prentice Hall PTR Topics n Pseudo-nMOS gates. n DCVS gates. n Domino gates.
Digital System Clocking: High-Performance and Low-Power Aspects Vojin G. Oklobdzija, Vladimir M. Stojanovic, Dejan M. Markovic, Nikola M. Nedovic Wiley-Interscience.
Modern VLSI Design 4e: Chapter 3 Copyright  2008 Wayne Wolf Topics n Pseudo-nMOS gates. n DCVS logic. n Domino gates. n Design-for-yield. n Gates as IP.
Lecture 10: Circuit Families. CMOS VLSI DesignCMOS VLSI Design 4th Ed. 10: Circuit Families2 Outline  Pseudo-nMOS Logic  Dynamic Logic  Pass Transistor.
Advanced VLSI Design Unit 04: Combinational and Sequential Circuits.
Introduction to CMOS VLSI Design Lecture 9: Circuit Families
Clocking System Design
Dynamic Logic.
EE141 Timing Issues 1 Chapter 10 Timing Issues Rev /11/2003 Rev /28/2003 Rev /05/2003.
EE141 Timing Issues 1 Chapter 10 Timing Issues Rev /11/2003.
Digital Integrated Circuits A Design Perspective
Lecture 11: Sequential Circuit Design
CSE 140 – Discussion 7 Nima Mousavi.
REGISTER TRANSFER LANGUAGE (RTL)
Chapter 7 Designing Sequential Logic Circuits Rev 1.0: 05/11/03
Sequential circuit design with metastability
SEQUENTIAL LOGIC -II.
Introduction to CMOS VLSI Design Lecture 10: Sequential Circuits
CMOS VLSI Design Chapter 13 Clocks, DLLs, PLLs
Timing Analysis 11/21/2018.
Lecture 10: Circuit Families
触发器 Flip-Flops 刘鹏 浙江大学信息与电子工程学院 March 27, 2018
Subject Name: Fundamentals Of CMOS VLSI Subject Code: 10EC56
332:479 Concepts in VLSI Design Lecture 24 Power Estimation
Future Directions in Clocking Multi-GHz Systems ISLPED 2002 Tutorial This presentation is available at: under Presentations.
Clocking in High-Performance and Low-Power Systems Presentation given at: EPFL Lausanne, Switzerland June 23th, 2003 Vojin G. Oklobdzija Advanced.
Topics Performance analysis..
Chapter 10 Timing Issues Rev /11/2003 Rev /28/2003
CMOS VLSI Design Chapter 13 Clocks, DLLs, PLLs
Introduction to CMOS VLSI Design Lecture 5: Logical Effort
Day 21: October 29, 2010 Registers Dynamic Logic
Lecture 10: Circuit Families
Lecture 19 Logistics Last lecture Today
Lecture 3: Timing & Sequential Circuits
Presentation transcript:

332:578 Deep Submicron VLSI Design Lecture 14 Design for Clock Skew David Harris Harvey Mudd College Spring 2005

Deep Submicron VLSI Des. Lec. 14 Outline Clock Distribution Clock Skew Skew-Tolerant Static Circuits Traditional Domino Circuits Skew-Tolerant Domino Circuits Summary Material from: CMOS VLSI Design, by Weste and Harris, Addison-Wesley, 2005 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Clocking Synchronous systems use a clock to keep operations in sequence Distinguish this from previous or next Determine speed at which machine operates Clock must be distributed to all the sequencing elements Flip-flops and latches Also distribute clock to other elements Domino circuits and memories 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Clock Distribution On a small chip, the clock distribution network is just a wire And possibly an inverter for clkb On practical chips, the RC delay of the wire resistance and gate load is very long Variations in this delay cause clock to get to different elements at different times This is called clock skew Most chips use repeaters to buffer the clock and equalize the delay Reduces but doesn’t eliminate skew 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Example Skew comes from differences in gate and wire delay With right buffer sizing, clk1 and clk2 could ideally arrive at the same time. But power supply noise changes buffer delays clk2 and clk3 will always see RC skew 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Review: Skew Impact Ideally full cycle is available for work Skew adds sequencing overhead Increases hold time too 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Review: Skew Impact 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Review: Skew Impact 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Cycle Time Trends Much of CPU performance comes from higher f f is improving faster than simple process shrinks Sequencing overhead is bigger part of cycle 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Cycle Time Trends 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Cycle Time Trends 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Cycle Time Trends 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Solutions Reduce clock skew Careful clock distribution network design Plenty of metal wiring resources Analyze clock skew Only budget actual, not worst case skews Local vs. global skew budgets Tolerate clock skew Choose circuit structures insensitive to skew 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Clock Dist. Networks Ad hoc Grids H-tree Hybrid 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Clock Grids Use grid on two or more levels to carry clock Make wires wide to reduce RC delay Ensures low skew between nearby points But possibly large skew across die 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Alpha Clock Grids 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Alpha Clock Grids 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Alpha Clock Grids 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 H-Trees Fractal structure Gets clock arbitrarily close to any point Matched delay along all paths Delay variations cause skew A and B might see big skew 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Itanium 2 H-Tree Four levels of buffering: Primary driver Repeater Second-level clock buffer (SLCB) Gater Route around obstructions 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Itanium 2 H-Tree 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Hybrid Networks Use H-tree to distribute clock to many points Tie these points together with a grid Ex: IBM Power4, PowerPC H-tree drives 16-64 sector buffers Buffers drive total of 1024 points All points shorted together with grid 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Skew Tolerance Flip-flops are sensitive to skew because of hard edges Data launches at latest rising edge of clock Must setup before earliest next rising edge of clock Overhead would shrink if we can soften edge Latches tolerate moderate amounts of skew Data can arrive anytime latch is transparent 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Skew: Latches 2-Phase Latches 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Skew: Latches Pulsed Latches 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Dynamic Circuit Review Static circuits are slow because fat pMOS load input Dynamic gates use precharge to remove pMOS transistors from the inputs Precharge: f = 0 output forced high Evaluate: f = 1 output may pull low 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Domino Circuits Dynamic inputs must monotonically rise during evaluation Place inverting stage between each dynamic gate Dynamic / static pair called domino gate Domino gates can be safely cascaded 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Domino Timing Domino gates are 1.5 – 2x faster than static CMOS Lower logical effort because of reduced Cin Challenge is to keep precharge off critical path Look at clocking schemes for precharge and eval Traditional schemes have severe overhead Skew-tolerant domino hides this overhead 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Traditional Domino Ckts. Hide precharge time by ping-ponging between half-cycles One evaluates while other precharges Latches hold results during precharge 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Clock Skew Skew increases sequencing overhead Traditional domino has hard edges Evaluate at latest rising edge Setup at latch by earliest falling edge 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Time Borrowing Logic may not exactly fit half-cycle No flexibility to borrow time to balance logic between half cycles Traditional domino sequencing overhead is about 25% of cycle time in fast systems! 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Relaxing the Timing Sequencing overhead caused by hard edges Data departs dynamic gate on late rising edge Must setup at latch on early falling edge Latch functions Prevent glitches on inputs of domino gates Holds results during precharge Is the latch really necessary? No glitches if inputs come from other domino Can we hold the results in another way? 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Skew-Tolerant Domino Use overlapping clocks to eliminate latches at phase boundaries. Second phase evaluates using results of first 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Skew-Tolerant Domino 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Full Keeper After second phase evaluates, first phase precharges Input to second phase falls Violates monotonicity? But we no longer need the value Now the second gate has a floating output Need full keeper to hold it either high or low 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Time Borrowing Overlap can be used to Tolerate clock skew Permit time borrowing No sequencing overhead 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Multiple Phases With more clock phases, each phase overlaps more Permits more skew tolerance and time borrowing 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Clock Generation 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Methods to Avoid Skew 12/31/2018 Deep Submicron VLSI Des. Lec. 14

Deep Submicron VLSI Des. Lec. 14 Summary Clock skew effectively increases setup and hold times in systems with hard edges Managing skew Reduce: good clock distribution network Analyze: local vs. global skew Tolerate: use systems with soft edges Flip-flops and traditional domino are costly Latches and skew-tolerant domino perform at full speed even with moderate clock skews 12/31/2018 Deep Submicron VLSI Des. Lec. 14