Reading Assignment: Rabaey: Chapter 10


ELEC 516 VLSI System Design and Design Automation, Spring 2010. Lecture 6: Timing and Clocking Issues. Note: some of the figures in this slide set are adapted from the slide set of "Digital Integrated Circuits" by Rabaey et al., Copyright 2002.

System Timing. Proper clocking ensures that incorrect values are never stored. Flip-flop-based pipeline system (figure: a register feeding combinational logic with delay Td, clocked by the clock; register delay Tq and setup time Ts). Primary inputs change after the clock (φ) edge and must stabilize before the next clock edge. These rules allow changes to propagate through the combinational logic for the next cycle, while the flip-flop outputs hold the current-state values used for the next-state computation.

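Timing Definitions: Latch Parameters (figure: D, Clk, and Q waveforms annotated with clock period T, minimum pulse width PWm, setup time tsu, hold time thold, clock-to-Q delay tc-q, and data-to-Q delay td-q). Delays can be different for rising and falling data transitions.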

Register Parameters (figure: D, Clk, and Q waveforms annotated with clock period T, setup time tsu, hold time thold, and clock-to-Q delay tc-q). Delays can be different for rising and falling data transitions.

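Clock Period. In every clock cycle, the clock period must be longer than the sum of the combinational logic delay and the memory-element overhead (clock-to-Q delay plus setup time). The period therefore depends on the longest path. Unbalanced delays: logic with unbalanced delays between registers uses the logic inefficiently, because the slowest stage forces a long clock period while the faster stages sit idle for part of each cycle (figure: short clock period versus long clock period).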
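A small numeric sketch of the unbalanced-delay point, using hypothetical stage delays in picoseconds. It assumes the period is the slowest stage's logic delay plus the register overhead (tc-q + tsu). Splitting the same total logic delay evenly across stages then gives a much shorter period.

```python
def min_clock_period(stage_logic_delays, t_cq, t_su):
    """Minimum period: slowest stage logic delay plus register overhead (all in ps)."""
    return max(stage_logic_delays) + t_cq + t_su


def logic_utilization(stage_logic_delays, t_cq, t_su):
    """Fraction of each clock cycle that each stage's logic spends doing useful work."""
    period = min_clock_period(stage_logic_delays, t_cq, t_su)
    return [d / period for d in stage_logic_delays]


# Hypothetical 3-stage pipelines with the same total logic delay (6000 ps).
T_CQ, T_SU = 200, 100
unbalanced = [1000, 4000, 1000]
balanced = [2000, 2000, 2000]

print(min_clock_period(unbalanced, T_CQ, T_SU))   # 4300
print(min_clock_period(balanced, T_CQ, T_SU))     # 2300
print(logic_utilization(unbalanced, T_CQ, T_SU))  # first and last stages mostly idle
```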

Retiming. Retiming moves memory elements through combinational logic. Retiming properties: retiming changes the encoding of the values stored in registers, but the proper values can be reconstructed with combinational logic. Retiming may increase the number of registers required. Retiming must preserve the number of latches around every cycle, which may not be possible with reconvergent fanout.
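The sketch below shows retiming in its simplest setting: a hypothetical linear chain of gates with a single pipeline register, where there is no feedback and no reconvergent fanout. The register is moved to whichever gate boundary minimizes the slower of the two stages. Register overhead is ignored because it adds the same constant to every placement. General retiming on arbitrary circuit graphs is more involved and is not shown.

```python
from itertools import accumulate


def best_register_position(gate_delays):
    """Place one pipeline register in a linear chain of gates to minimize the period.

    Returns (k, worst_stage_delay): the register sits after the first k gates.
    With no feedback or fanout, moving the register keeps the register count at one.
    """
    if len(gate_delays) < 2:
        raise ValueError("need at least two gates to place a register between them")
    prefix = [0, *accumulate(gate_delays)]  # prefix[k] = delay of the first k gates
    total = prefix[-1]
    best = None
    for k in range(1, len(gate_delays)):
        worst = max(prefix[k], total - prefix[k])  # stage before vs. after the register
        if best is None or worst < best[1]:
            best = (k, worst)
    return best


# Hypothetical chain (ps). With the register after gate 4 the stages are 1400/100.
print(best_register_position([300, 500, 200, 400, 100]))  # (2, 800)
```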

Latch-Based Design (figure: a latch feeding combinational logic A with delay Tda, followed by a latch and combinational logic B with delay Tdb; latch delay Tq and setup time Ts). Latch-based machines must use multiple ranks of latches, and multiple ranks require multiple clock phases.

Clock Race. In a synchronous system, clock race problems can occur if the data input to a register does not obey the setup- and hold-time constraints. A clock race results in erroneous data being stored in registers. In a perfectly synchronous system with ideal clocks, zero-hold-time registers, and a clock-to-Q delay greater than the setup time, no clock race should occur. At the chip level, however, these conditions can be hard to guarantee.

Hold Time Violation (figure: registers M1 and M2 connected through logic with delay Td2; the clock reaches M1 at Tc1 and M2 at Tc2 through separate clock-path delays; waveforms show old and new data at the input of M2). The edge at Tc2 samples the new data when it should still be sampling the old data. This happens when Tc2 lags behind the arrival of the new data. A violation is more likely when the clock path to M2 has extra delay and the register and logic delays are short. The worst case corresponds to the minimum delay through the logic.

Hold Time Condition. Data must be held properly so that there is no race between data and clock. Hold-time constraint: $t_{c-q} + t_{logic,min} > t_{hold}$. The minimum delays $t_{c-q}$ and $t_{logic,min}$ are also called contamination delays, and their sum must exceed the hold time of the flip-flop.
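A minimal check of the hold constraint above, with hypothetical delays in picoseconds. The clock period never appears in the expression, so a hold problem cannot be fixed by slowing the clock. It has to be fixed by adding delay to the short path.

```python
def hold_slack(t_cq_min, t_logic_min, t_hold):
    """Hold slack in ps; positive means new data arrives after the hold window closes."""
    return t_cq_min + t_logic_min - t_hold


def meets_hold(t_cq_min, t_logic_min, t_hold):
    """Strict form of the constraint t_c-q + t_logic,min > t_hold."""
    return hold_slack(t_cq_min, t_logic_min, t_hold) > 0


# Hypothetical numbers in ps.
print(meets_hold(80, 0, 50))    # True: back-to-back registers still pass
print(meets_hold(40, 0, 60))    # False: 20 ps short
print(hold_slack(40, 25, 60))   # 5: adding 25 ps of logic/buffer delay fixes it
```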

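How Fast Can We Run? (figure: registers M1 and M2 connected through logic, each with its own clock-path delay; launching edge at Tc1, clock-to-Q delay Tq1, maximum logic delay Tlmax, and setup time Tsetup2 at M2). Setup-time requirement: the minimum cycle time is $T = t_{c-q} + t_{su} + t_{logic,max}$. In the first timing diagram, Tq1 + Tlmax + Tsetup2 fits within the cycle and some margin remains. In the second, it does not fit, which is a setup-time violation.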
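The setup-time requirement translates directly into a minimum period and a maximum clock frequency. The sketch uses hypothetical delays in picoseconds. It converts to MHz using 1 µs = 10^6 ps.

```python
def min_cycle_time(t_cq, t_logic_max, t_su):
    """T = t_c-q + t_su + t_logic,max, all in ps."""
    return t_cq + t_logic_max + t_su


def max_frequency_mhz(t_cq, t_logic_max, t_su):
    """1 / T, with T in ps converted to MHz (1e6 ps per microsecond)."""
    return 1e6 / min_cycle_time(t_cq, t_logic_max, t_su)


def setup_slack(period, t_cq, t_logic_max, t_su):
    """Positive: some margin remains; negative: setup-time violation."""
    return period - min_cycle_time(t_cq, t_logic_max, t_su)


print(min_cycle_time(100, 1800, 100))     # 2000 ps
print(max_frequency_mhz(100, 1800, 100))  # 500.0 MHz
print(setup_slack(2500, 100, 1800, 100))  # 500 ps of margin
print(setup_slack(1670, 100, 1800, 100))  # -330: violation at roughly 600 MHz
```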

The earliest that new data can appear at the input of register M2 is time Tc1 + Tq1, assuming zero delay in the logic block. The clock reaches M2 at time Tc2. Assuming zero setup and hold times, if Tc2 lags the data change (Tc2 > Tc1 + Tq1), M2 stores the data from the current cycle rather than the previous cycle. This is a hold-time violation. In practice, it can be caused by Tc1 and Tq1 being close to zero while extra delay is introduced into the Tc2 clock line. Conversely, if (Tc1 + Tq1 + Tlmax) - Tc2 is larger than the cycle time T, the data arrives late at M2 and causes a setup-time violation. This occurs when the circuit is too slow for the clock period used. Tc2 can be increased deliberately to give the data more time to set up. The constraint Tc2 < Tc1 + Tq1 then becomes harder to meet, and delay may have to be added to the data paths to satisfy it.
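The two failure modes above can be checked together for one M1 → logic → M2 path. This is a sketch under the same assumption of zero setup and hold times. It generalizes the text slightly by letting the logic have a nonzero minimum delay tl_min. All times are hypothetical values in picoseconds.

```python
def check_m1_to_m2(tc1, tq1, tc2, tl_min, tl_max, period):
    """Return the list of violations for the M1 -> logic -> M2 path (times in ps).

    Zero setup and hold times are assumed.
    """
    earliest_data = tc1 + tq1 + tl_min  # new data can first appear at M2 here
    latest_data = tc1 + tq1 + tl_max    # and has settled by here
    violations = []
    if tc2 >= earliest_data:
        # M2's edge comes after the new data arrives: it captures current-cycle data.
        violations.append("hold")
    if latest_data - tc2 > period:
        # Data settles after M2's next edge at tc2 + period.
        violations.append("setup")
    return violations


print(check_m1_to_m2(0, 100, 150, 0, 900, 1000))   # ['hold']: clock to M2 too late
print(check_m1_to_m2(0, 100, 50, 0, 1000, 1000))   # ['setup']: logic too slow
print(check_m1_to_m2(0, 100, 50, 0, 800, 1000))    # []: both constraints met
```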

Combating Races in Latch-Based Design: Strict Two-Phase Clocking. The strict two-phase clocking discipline is conservative, but it works. It makes a latch-based machine behave more like a flip-flop design, at the cost of requiring multiple clock phases. The phases must not overlap, so there is a non-overlap region between them.

Two-Phase Clocking. Each phase has a one-sided constraint: the phase must be long enough for all the combinational delays in its section of the machine. If there are no combinational loops, a phase can always be stretched to make that section of the machine work. The total clock period depends on the sum of the phase periods.

Clock Uncertainties (figure: sources of clock uncertainty).

Clock Nonidealities. Clock skew, tSK: spatial variation in the arrival of temporally equivalent clock edges, with deterministic and random components. Clock jitter: temporal variation between consecutive edges of the clock signal, caused by modulation and random noise; it is characterized as cycle-to-cycle (short-term) jitter tJS and long-term jitter tJL. Variation of the pulse width: important for level-sensitive clocking.

Clock Skew and Jitter (figure: Clk waveforms showing skew tSK and jitter tJS). Both skew and jitter affect the effective cycle time. Only skew affects the race margin.

Clock Skew (figure: histogram of the number of registers versus clock-edge arrival time; the earliest occurrence of the Clk edge is at nominal - δ/2 and the latest at nominal + δ/2, giving a maximum clock skew of δ on top of the insertion delay). The absolute delay through a clock-distribution path is not important. What matters is the relative arrival time at the registers at the end of each path. Skew can be positive or negative. Skew does not change the clock period; it only introduces a phase shift.

Sources of Skew and Jitter. Systematic errors are nominally identical from chip to chip and are predictable. Random errors are due to manufacturing variations that are difficult to model.

- Clock-signal generation: the clock is generated by deriving a high-frequency signal from a low-frequency one using a VCO, which is sensitive to device noise, power-supply variations, and substrate coupling.
- Manufacturing device variations: matching of the devices in the buffers along multiple clock paths is critical.
- Interconnect variations: variations in vertical and lateral dimensions cause the interconnect capacitance and resistance to vary. A major source of the problem is interlayer dielectric (ILD) thickness variation.
- Environmental variations: temperature and power supply. Temperature gradients across the chip are large as a consequence of clock gating. Device parameters (Vth and μ) depend on temperature, so the clock delay can vary from path to path. Does temperature contribute to skew or to jitter?
- Capacitive coupling: any coupling between a clock wire and an adjacent signal wire results in timing uncertainty.

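The Clock Skew Problem. Clock rates are as high as 2 GHz in CMOS (T = 0.5 ns). (Figure: pipeline In → CL1 → R1 → CL2 → R2 → CL3 → R3 → Out, clocked by φ with edge time t_i at each register; logic delays bounded by t_l,min and t_l,max, register delays by t_r,min and t_r,max.) Clock edge timing depends on position. Positive skew: data and clock are routed in the same direction (clk1, clk2).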

Delay of Clock Wire (figure: clock wire modeled as a distributed RC line with total resistance R and capacitance C). Tungsten wire: sheet resistance r = 0.07 Ω/□, capacitance per unit area c = 0.04 fF/μm².
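A rough estimate of clock-wire delay using the tungsten parameters above. Two assumptions are made: only the parallel-plate (area) capacitance counts, with fringe capacitance and driver resistance ignored, and the 50% delay of a distributed RC line is approximated by the commonly used 0.38·R·C. Under these assumptions, the width cancels (R ∝ L/W, C ∝ W·L), and the delay grows with the square of the length.

```python
R_SHEET = 0.07    # ohm per square, tungsten (value from the slide)
C_AREA = 0.04     # fF per um^2 (value from the slide); fringe capacitance ignored


def wire_rc(length_um, width_um):
    """Total resistance (ohm) and capacitance (F) of a rectangular wire."""
    r_total = R_SHEET * length_um / width_um          # number of squares * r
    c_total = C_AREA * 1e-15 * width_um * length_um   # area * c
    return r_total, c_total


def distributed_rc_delay_ps(length_um, width_um):
    """50% delay of a distributed RC line, approximated as 0.38 * R * C."""
    r_total, c_total = wire_rc(length_um, width_um)
    return 0.38 * r_total * c_total * 1e12


print(round(distributed_rc_delay_ps(10_000, 1.0), 1))  # 1 cm wire: 106.4 ps
print(round(distributed_rc_delay_ps(10_000, 4.0), 1))  # 4x wider: same, W cancels
print(round(distributed_rc_delay_ps(20_000, 1.0), 1))  # 2 cm: 425.6 ps, 4x longer
```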

Positive Skew. The launching edge arrives before the receiving edge.

Positive Skew. The output of the combinational circuit must be valid one setup time before the rising edge of CLK2 (point 4): $T + \delta \ge t_{c-q} + t_{su} + t_{logic,max}$, or $T \ge t_{c-q} + t_{su} + t_{logic,max} - \delta$. This equation suggests that clock skew can actually improve the performance of the circuit. That is true, but increasing the skew makes the circuit susceptible to race conditions. The problem arises if the new value at the output of R1 propagates through the logic and becomes valid at the input of R2 before the hold time following edge 2 has elapsed. To avoid this, we must ensure that $\delta + t_{hold} < t_{c-q} + t_{logic,min}$, or $\delta < t_{c-q} + t_{logic,min} - t_{hold}$.
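The two skew inequalities can be evaluated together for a signed skew δ. Here δ > 0 means the receiving edge arrives later (positive skew), and δ < 0 covers negative skew. The register and logic delays are hypothetical values in picoseconds. The last two checks illustrate the positive- and negative-skew trade-off: a skew that causes a race is not fixed by a longer period, and negative skew is race-free but demands a longer period.

```python
def setup_ok(period, delta, t_cq, t_su, t_logic_max):
    """T + delta >= t_c-q + t_su + t_logic,max; delta > 0 means the receiving edge is later."""
    return period + delta >= t_cq + t_su + t_logic_max


def hold_ok(delta, t_cq, t_logic_min, t_hold):
    """delta < t_c-q + t_logic,min - t_hold; the clock period does not appear."""
    return delta < t_cq + t_logic_min - t_hold


# Hypothetical register and logic delays in ps.
P = {"t_cq": 100, "t_su": 50, "t_hold": 30, "t_logic_max": 800, "t_logic_min": 60}


def check(period, delta):
    return (setup_ok(period, delta, P["t_cq"], P["t_su"], P["t_logic_max"]),
            hold_ok(delta, P["t_cq"], P["t_logic_min"], P["t_hold"]))


print(check(900, 100))    # (True, True): positive skew lets T drop below 950 ps
print(check(900, 200))    # (True, False): too much skew, race condition
print(check(5000, 200))   # (True, False): slowing the clock does not help
print(check(900, -100))   # (False, True): negative skew needs T >= 1050 ps
```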

Negative Skew (figure: register R1, clocked by CLK1, drives combinational logic that feeds register R2, clocked by CLK2; annotations t_c-q, t_c-q,cd, t_su, t_hold, t_logic, t_logic,cd, and clock skew δ). The receiving edge arrives before the launching edge.

Negative Skew. Negative skew hurts performance, because the effective period (from position 1 to position 4) is shortened by |δ|: $T - |\delta| \ge t_{c-q} + t_{su} + t_{logic,max}$, or $T \ge t_{c-q} + t_{su} + t_{logic,max} + |\delta|$. On the other hand, negative skew means the system never fails because of a race, since edge 2 happens before edge 1. There is no race issue.

Positive and Negative Skew.

(a) Positive skew: the clock is routed in the same direction as the data flow (figure: chain of CL/R stages clocked by φ). The skew has to be strictly controlled and kept below its maximum allowed value; otherwise the circuit will malfunction. Reducing the clock frequency does not help.

(b) Negative skew: the clock is routed in the direction opposite to the data flow (figure: chain of CL/R stages clocked by φ). When the skew is negative, the race condition never happens, and the circuit operates correctly independent of the skew. However, negative skew reduces throughput. It shortens the time available for the actual computation, so the clock period has to be increased by |δ|.

How to Counter Clock Skew? Routing the clock in the direction opposite to the data flow can relieve the race problem caused by clock skew, but it hampers performance. Moreover, the data flow of a circuit is sometimes not unidirectional (figure: register and logic with feedback from Out to In; the clock distribution gives positive skew along one direction and negative skew along the other). The best solution is to ensure that the clock skew between communicating registers is bounded.

Example of Clock Skew (figure: register, gate, and multiplexer clocked by φ). Let tg be the gate delay, tm the mux delay, ts the setup time, tq the register clock-to-Q delay, and T the clock period. Assuming the input signals arrive early enough, the maximum bound on the skew is: The equilibrium requirement at the time of latching imposes another constraint on the skew: Combining these constraints, we have:

Example: Propagation and Contamination Delay Evaluation. Propagation and contamination delays are not always easy to evaluate because of false paths (figure: inputs A, B, C, D, and In1 drive gates OR1, OR2, AND1, AND2, and AND3 that produce Out between registers; PATH1 and PATH2 are marked). The contamination delay is 2 tgate (through OR1 and OR2). It would appear that the worst case is PATH1, with 5 tgate, but this is a false path, since the output does not even depend on C and D. If A = 1, the critical path (CP) is through OR1 and OR2. If A = 0 and B = 0, the CP is through I1, OR1, and OR2. If A = 0 and B = 1, the CP is through I1, OR1, AND3, and OR2, which is 4 tgate. Because of false paths, the worst-case delay cannot be obtained just by adding the propagation delays along the longest structural path.

Static Timing Analysis. The 0→1 and 1→0 delays are generally different. The simplest delay problem to analyze is to change the value of only one input and determine how long it takes for the effect to propagate to a single output (provided there is a path from the selected input to that output). A logic simulator can be used, but it would have to simulate all possible transition values. Static timing analysis is value-independent: it builds a graph that models the delays through the network and identifies the longest (and shortest) delay paths.
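A minimal, value-independent static timing pass over a hypothetical netlist, with one delay per driver→sink edge in picoseconds. Nodes are visited in topological order. The latest arrival at each node gives the propagation (longest-path) delay, and the earliest arrival gives the contamination (shortest-path) delay. This simplified sketch does not separate 0→1 from 1→0 delays and, like any purely structural analysis, reports false paths as if they were real.

```python
from graphlib import TopologicalSorter


def arrival_times(edges):
    """Latest and earliest arrival time at every node of a combinational DAG.

    `edges` maps (driver, sink) -> delay in ps. Nodes with no incoming edge are
    treated as primary inputs that switch at t = 0.
    """
    preds, nodes = {}, set()
    for (u, v), d in edges.items():
        preds.setdefault(v, []).append((u, d))
        nodes.update((u, v))
    graph = {n: {u for u, _ in preds.get(n, [])} for n in nodes}

    latest, earliest = {}, {}
    for n in TopologicalSorter(graph).static_order():  # drivers before sinks
        incoming = preds.get(n)
        if not incoming:
            latest[n] = earliest[n] = 0
        else:
            latest[n] = max(latest[u] + d for u, d in incoming)      # propagation
            earliest[n] = min(earliest[u] + d for u, d in incoming)  # contamination
    return latest, earliest


# Hypothetical netlist: a and b feed gate n1; n1 and c feed the output gate.
edges = {("a", "n1"): 200, ("b", "n1"): 100, ("n1", "out"): 300, ("c", "out"): 100}
latest, earliest = arrival_times(edges)
print(latest["out"], earliest["out"])  # 500 100
```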

Critical Path. The longest delay path is known as the critical path, because it limits system performance. The critical path not only determines the system cycle time, but also points out which part of the combinational logic must be changed to improve performance. Gates on the critical path can be sped up by increasing transistor sizes or reducing wiring capacitance, or the logic along the critical path can be redesigned to use a faster gate configuration. Speeding up the system may require modifying several sections of logic, since the critical path can have multiple branches. To do so, identify the critical path, find the cutset of the graph that covers it, and then determine which edges (gates) in that cutset to speed up.
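To know which gates to speed up, the analysis has to return the path itself, not just its delay. The sketch below records, for every node, the predecessor that produced the latest arrival. It then walks back from the latest-arriving node. The netlist and delays are hypothetical, and ties between equal-delay predecessors are broken by the first one listed.

```python
from graphlib import TopologicalSorter


def critical_path(edges):
    """Return (delay, path) for the longest input-to-output path of a DAG.

    `edges` maps (driver, sink) -> delay in ps.
    """
    preds, nodes = {}, set()
    for (u, v), d in edges.items():
        preds.setdefault(v, []).append((u, d))
        nodes.update((u, v))
    graph = {n: {u for u, _ in preds.get(n, [])} for n in nodes}

    arrival, via = {}, {}
    for n in TopologicalSorter(graph).static_order():
        arrival[n], via[n] = 0, None  # primary inputs start at t = 0
        for u, d in preds.get(n, []):
            if via[n] is None or arrival[u] + d > arrival[n]:
                arrival[n], via[n] = arrival[u] + d, u

    node = max(arrival, key=arrival.get)  # latest-arriving node ends the critical path
    path = [node]
    while via[path[-1]] is not None:      # walk back along the worst predecessor
        path.append(via[path[-1]])
    return arrival[node], path[::-1]


edges = {("a", "n1"): 200, ("b", "n1"): 100, ("n1", "out"): 300, ("c", "out"): 100}
print(critical_path(edges))  # (500, ['a', 'n1', 'out'])
```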

False Path. False paths are critical paths that can never be exercised during normal circuit operation. In that case, the actual critical path is shorter than predicted by the first-order analysis. Detecting false paths is not easy, since it requires an understanding of the logic function of the network. Determining whether a path is false is an NP-complete problem; however, CAD tools and algorithms are now available that find false paths in practical networks.
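One common way to flag a false path is static sensitization. A path is statically sensitizable if some input vector sets every side input along it to the value that lets the path through. This criterion is only one of several used in practice. The sketch below applies it by brute force to a small, hypothetical two-mux circuit, not the circuit in the figure. In that circuit, the long path would need the shared select line to be 0 and 1 at the same time.

```python
from itertools import product


def statically_sensitizable(side_conditions, inputs):
    """True if some input vector satisfies every side-input condition of a path.

    Brute force over all 2**len(inputs) vectors, so only for tiny examples.
    """
    for values in product((0, 1), repeat=len(inputs)):
        env = dict(zip(inputs, values))
        if all(cond(env) for cond in side_conditions):
            return True
    return False


# Hypothetical circuit: two 2:1 muxes share the select line s.
#   m1  = a if s == 0 else b   (assume a arrives through a long chain of logic)
#   out = m1 if s == 1 else c
# Each entry lists what the side inputs must be for the path to pass through.
paths = {
    "a -> m1 -> out": [lambda e: e["s"] == 0,   # m1 must select a
                       lambda e: e["s"] == 1],  # out must select m1
    "b -> m1 -> out": [lambda e: e["s"] == 1,
                       lambda e: e["s"] == 1],
    "c -> out":       [lambda e: e["s"] == 0],
}
for name, conds in paths.items():
    print(name, statically_sensitizable(conds, ["s"]))
# The first path prints False: it is a false path even if it is the longest one.
```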

Example of False Path (figure: inputs a, b, c, d, and e drive gates with internal node y and output z). The path a → c → d → e → z is a false path.

Impact of Jitter: temporal variation in the clock edge.

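Longest Logic Path in Edge-Triggered Systems: Setup-Time Condition (figure: the launching clock edge at its latest point, followed by TClk-Q, the maximum logic delay TLM, and TSU, which must end before the earliest arrival of the next-cycle edge, early by TJI + δ). If the launching edge is late and the receiving edge is early, the data will not arrive too late if $T_{c-q} + T_{LM} + T_{SU} < T - T_{JI,1} - T_{JI,2} - \delta$. With equal jitter $T_{JI}$ at both edges, the minimum cycle time is determined by the maximum delays through the logic: $T_{c-q} + T_{LM} + T_{SU} + \delta + 2T_{JI} < T$. Here δ is the skew in the unfavorable direction (receiving edge earlier); in general, skew can be either positive or negative.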
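A sketch of the setup condition with jitter and skew. `skew` is δ in the unfavorable direction, as in the inequality above. The delays are hypothetical values in picoseconds. The returned period is the boundary value, and since the inequality is strict, any period strictly greater than it works.

```python
def min_period_setup(t_cq, t_lm, t_su, skew, jitter_launch, jitter_capture):
    """Boundary of T_c-q + T_LM + T_SU < T - T_JI,1 - T_JI,2 - delta (ps).

    `skew` is delta in the unfavorable direction (receiving edge early).
    """
    return t_cq + t_lm + t_su + jitter_launch + jitter_capture + skew


def setup_met(period, t_cq, t_lm, t_su, skew, jitter_launch, jitter_capture):
    return t_cq + t_lm + t_su < period - jitter_launch - jitter_capture - skew


print(min_period_setup(100, 1600, 50, 0, 0, 0))     # 1750: ideal clock
print(min_period_setup(100, 1600, 50, 30, 20, 20))  # 1820: skew + 2 * jitter added
```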

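Clock Constraints in Edge-Triggered Systems: Shortest Path, Hold-Time Condition (figure: the launching clock edge at its earliest point, followed by TClk-Q and the minimum logic delay TLm; the data must not arrive before the nominal receiving edge plus TH). If the launching edge is early and the receiving edge is late, the hold condition is $T_{c-q} + T_{Lm} - T_{JI,1} > T_H + T_{JI,2} + \delta$. With equal jitter $T_{JI}$ at both edges, the minimum logic delay must satisfy $T_{c-q} + T_{Lm} > T_H + 2T_{JI} + \delta$. Here δ is the skew in the unfavorable direction (receiving edge later).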
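The matching hold check, with `skew` as δ in the unfavorable direction (receiving edge late) and hypothetical delays in picoseconds. The helper returns the value that the minimum logic delay must strictly exceed. A designer would then pad the short paths to exceed it.

```python
def hold_met(t_cq, t_lm, t_h, skew, jitter_launch, jitter_capture):
    """T_c-q + T_Lm - T_JI,1 > T_H + T_JI,2 + delta (all in ps)."""
    earliest_data = t_cq + t_lm - jitter_launch
    hold_window_end = t_h + jitter_capture + skew
    return earliest_data > hold_window_end


def min_logic_delay(t_cq, t_h, skew, jitter_launch, jitter_capture):
    """T_Lm must be strictly greater than this value."""
    return t_h + jitter_launch + jitter_capture + skew - t_cq


print(min_logic_delay(80, 40, 30, 20, 20))  # 30
print(hold_met(80, 31, 40, 30, 20, 20))     # True
print(hold_met(80, 30, 40, 30, 20, 20))     # False: exactly at the boundary
```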

Latch-Based Design. The L1 latch is transparent when φ = 0 (figure: alternating logic and latches: Logic, Latch, Latch, Logic).

Slack-borrowing

Clock-Distribution Network Design Parameters:

- the interconnect material used for the clock network;
- the shape of the clock-distribution network;
- the clock driver and buffering scheme used;
- the load on the clock lines (i.e., the clock fan-out);
- the rise and fall times of the clock.

Clock Distribution to Bound Skew (figure: H-tree). This scheme is very attractive for regular structures.
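The idea behind an H-tree is that every leaf sits at the same wire distance from the root, so the nominal skew is zero. This sketch generates the leaf taps of an idealized, hypothetical H-tree on an integer grid and confirms that all root-to-leaf wire lengths match. It assumes identical wires and loads. In a real chip, the mismatch sources listed earlier (devices, interconnect, temperature, supply) still produce skew.

```python
def h_tree_leaves(x, y, size, levels, wire=0):
    """Leaf taps of an idealized H-tree centered at (x, y).

    Returns a list of (x, y, wire_length_from_root). At each level the clock
    runs size/2 horizontally to each end of the "H" bar and size/2 vertically
    to its four corners; each corner then drives a half-size H. `size` must be
    divisible by 2**levels so all coordinates stay integers.
    """
    if levels == 0:
        return [(x, y, wire)]
    half = size // 2
    leaves = []
    for dx in (-half, half):
        for dy in (-half, half):
            leaves += h_tree_leaves(x + dx, y + dy, half, levels - 1, wire + 2 * half)
    return leaves


taps = h_tree_leaves(0, 0, 64, 3)
print(len(taps))                           # 64
print({length for _, _, length in taps})   # {112}: every tap sees the same wire length
```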

Clock Network with Distributed Buffering (figure: a main clock driver feeding secondary clock drivers, each serving a local area or module). Distributed buffering reduces the absolute delay and makes power-down easier, but it is sensitive to variations in buffer delay. The local clock delay is equalized through careful routing of the clock signals combined with a hierarchical clock-buffering scheme.

More realistic H-tree [Restle98]

The Grid System. No RC matching is required, but the grid dissipates a large amount of power.

Example: DEC Alpha 21164. Uses a clock grid instead of a clock tree.

Clock Skew in Alpha Processor

EV6 (Alpha 21264) Clocking. 600 MHz, 0.35 μm CMOS: trise = 0.35 ns, tskew = 50 ps, tcycle = 1.67 ns (figure: global clock waveform). Two-phase clocking with multiple conditional buffered clocks; 2.8 nF clock load; 40 cm final driver width. Local clocks can be gated off to save power, which reduces load and skew and reduces thermal issues. However, multiple clocks complicate race checking.

Hybrid Grid DEC Alpha 21264, Bailey JSSC 11/98

DEC Alpha 21264 global clock distribution network

Global Clock Grid

EV7 Clock Hierarchy. Active skew management and multiple clock domains.
+ widely dispersed drivers
+ DLLs compensate for static and low-frequency variation
+ divides the design and verification effort
+ tailored clocks
- DLL design and verification is added work

Example 2: Intel IA-64 Itanium. Uses deskew buffers in a three-level hierarchy.

- Global distribution: on-die phase-locked loop feeding the deskew buffers (DSK).
- Regional distribution: from the deskew buffers to 30 clock regions, each with a regional clock grid driven by regional clock drivers (RCD).
- Local distribution: local clock buffers (LCB), including generation of the opportunity-time-borrowing (OTB) delay clocks.

Intel IA-64 Itanium clock distribution topology

Global Clock Distribution. Two clocks are distributed, the core clock and a reference clock, using two identical, balanced H-trees on the top two metal layers. To reduce capacitive noise coupling and to ensure a good inductive return path, the H-trees are fully shielded laterally with Vcc/Vss.

Regional Clock Distribution. A distributed array of deskew buffers (DSK) compensates for within-die process variations. The regional clock grids are driven by modular regional clock drivers (RCD). There are 30 clock regions, with M4 used in the x-direction and M5 in the y-direction. There is full support for scan and clock gating.

Local Clock Distribution. Local clock buffers (LCB) also generate the delay clocks needed for opportunity time borrowing (OTB); in other words, they act as intentional-skew buffers.