Reading Assignment: Rabaey: Chapter 10


1 Reading Assignment: Rabaey: Chapter 10
ELEC 516 VLSI System Design and Design Automation, Spring. Lecture 6: Timing and Clocking Issues. Reading Assignment: Rabaey, Chapter 10. Note: some of the figures in this slide set are adapted from the slide set of “Digital Integrated Circuits” by Rabaey et al., Copyright 2002.

2 System Timing
Clocking is essential to ensure that improper values are never stored.
[Figure: flip-flop-based pipeline system with registers A and B around a combinational logic block (delay Td), a shared clock, and register timing labels Tq and Ts.]
Primary inputs change after the clock (φ) edge.
Primary inputs must stabilize before the next clock edge.
These rules allow changes to propagate through the combinational logic in time for the next cycle.
Flip-flop outputs hold the current-state values for the next-state computation.

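3 Timing Definitions: Latch Parameters
[Figure: latch timing diagram showing Clk (period T, minimum pulse width PWm), D (setup time tsu, hold time thold) and Q (clock-to-Q delay tc-q, data-to-Q delay td-q).]
Delays can be different for rising and falling data transitions.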

4 Register Parameters
[Figure: register timing diagram showing Clk (period T), D (setup time tsu, hold time thold) and Q (clock-to-Q delay tc-q).]
Delays can be different for rising and falling data transitions.

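5 Clock period
In each clock cycle, the clock period must be longer than the sum of the combinational delay and the memory-element propagation delay. The period is set by the longest path.
Unbalanced delays: logic with unbalanced delays uses the clock period inefficiently, because the period is dictated by the longest path while the shorter paths finish early and sit idle. [Figure: short clock period versus long clock period.]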
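The effect of unbalanced delays can be sketched numerically. The following is a minimal illustration, not part of the original slides: the function names and the delay values (in ns) are assumptions chosen only to show that the slowest stage sets the period.

```python
def clock_period(stage_delays, t_reg):
    """Minimum clock period: slowest combinational stage plus register delay."""
    return max(stage_delays) + t_reg


def stage_utilization(stage_delays, t_reg):
    """Fraction of each cycle that every stage actually spends computing."""
    period = clock_period(stage_delays, t_reg)
    return [delay / period for delay in stage_delays]


# Hypothetical stage delays in ns; both pipelines do 10 ns of logic in total.
unbalanced = [2.0, 8.0]
balanced = [5.0, 5.0]
t_reg = 1.0  # assumed memory-element propagation delay in ns

print(clock_period(unbalanced, t_reg))  # 9.0
print(clock_period(balanced, t_reg))    # 6.0
print(stage_utilization(unbalanced, t_reg))
```

With the same total amount of logic, the balanced split allows a shorter period, which is the motivation for moving registers through the logic.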

6 Retiming Retiming moves memory elements through combinational logic:
Retiming properties: Retiming changes encoding of values in registers, but proper values can be reconstructed with combinational logic. Retiming may increase number of registers required. Retiming must preserve number of latches around a cycle—may not be possible with reconvergent fanout.

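7 Latch-based design
[Figure: latch-based pipeline with latches A, B and C, combinational logic A (delay Tda) and combinational logic B (delay Tdb), a clock input, and timing labels Tq and Ts.]
Latch-based machines must use multiple ranks of latches. Multiple ranks require multiple phases of the clock.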

8 Clock Race In a synchronous system, if the data input to a register does not obey the setup and hold-time constraints, then potential clock race problems may occur. Clock race results in erroneous data being stored in registers. Assuming a perfectly synchronous system with perfect clocks, zero hold-time registers, and clock-to-Q time greater than the setup time, no clock race problem should occur. However, at the chip level this might be hard to ensure.

9 Hold time violation
[Figure: registers M1 and M2 connected through a logic block; the clock reaches them through separate delays, arriving at times Tc1 and Tc2, and the data reaches M2 at time Td2. The waveform shows old and new data around the Tc2 edge, marked as a hold-time violation.]
The Tc2 edge samples the new data when it is supposed to sample the old data. This happens when Tc2 lags behind the data arrival Td2, which is more likely with a long delay on the clock line and short delays through the registers and logic. The worst case corresponds to the minimum delay of the logic.

10 Hold time condition
Data must be held properly, avoiding a race between data and clock.
Hold time constraint: tc-q + tlogic,min > thold
The minimum delay tc-q + tlogic,min is also called the contamination delay; it must exceed the hold time of the flip-flop.
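The hold-time constraint can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical and the delays (in ps) are made-up values, not data from the lecture.

```python
def hold_slack(t_cq, t_logic_min, t_hold):
    """Hold slack: contamination delay (tc-q + tlogic,min) minus thold."""
    return t_cq + t_logic_min - t_hold


def meets_hold(t_cq, t_logic_min, t_hold):
    """True when tc-q + tlogic,min > thold, i.e. the slack is positive."""
    return hold_slack(t_cq, t_logic_min, t_hold) > 0


# Hypothetical values in ps.
print(hold_slack(t_cq=50, t_logic_min=20, t_hold=40))  # 30
print(meets_hold(t_cq=50, t_logic_min=20, t_hold=40))  # True

# A direct register-to-register connection (no logic) with a long hold time.
print(meets_hold(t_cq=50, t_logic_min=0, t_hold=60))   # False
```

Note that the clock period does not appear anywhere in this check.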

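11 How fast can we run
[Figure: registers M1 and M2 connected through a logic block, with clock delays Tc1 and Tc2. The waveform marks Tq1, Tq1 + Tlmax and Tsetup2 for a case where there is still a margin and for a setup-time violation.]
Setup time requirement: minimum cycle time T = tc-q + tsu + tlogic, where tlogic is the worst-case logic delay.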
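As a worked instance of the minimum cycle time, the following sketch adds up the three terms and converts the result to a maximum clock frequency. The function names and the delay values (in ps) are assumptions for illustration.

```python
def min_cycle_time_ps(t_cq, t_su, t_logic_max):
    """Minimum cycle time T = tc-q + tsu + tlogic (all values in ps)."""
    return t_cq + t_su + t_logic_max


def max_frequency_ghz(t_cq, t_su, t_logic_max):
    """Maximum clock frequency in GHz for delays given in ps."""
    return 1000.0 / min_cycle_time_ps(t_cq, t_su, t_logic_max)


# Hypothetical delays in ps.
print(min_cycle_time_ps(t_cq=100, t_su=50, t_logic_max=850))  # 1000
print(max_frequency_ghz(t_cq=100, t_su=50, t_logic_max=850))  # 1.0
```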

12 The earliest that data appears at the input of register M2 is at time Tc1 + Tq1, assuming zero delay in the logic block. The clock appears at register M2 at time Tc2. Assuming zero setup and hold times, if Tc2 lags the data change (Tc2 > Tc1 + Tq1), module M2 will store the data from the current cycle rather than the previous cycle. This is a hold-time violation; in practice it may be caused by Tc1 and Tq1 being close to zero while a delay is introduced into the Tc2 clock line. If the delay (Tc1 + Tq1) - Tc2 is larger than the cycle time Tc, the data will arrive late at M2, causing a setup-time violation. This occurs when the circuit is too slow for the clock cycle used. While Tc2 may be artificially increased to allow more time for the data to set up, the constraint Tc2 < (Tc1 + Tq1) then becomes harder to meet, and data delays may have to be added artificially to meet it.

13 Combating racing for latch-based design
Strict two-phase clocking discipline:
The strict two-phase discipline is conservative but works.
A strict two-phase machine makes a latch-based machine behave more like a flip-flop design, but requires multiple phases.
Phases must not overlap. [Figure: two clock phases separated by a non-overlap region.]

14 Two phase clocking Each phase has a one-sided constraint: phase must be long enough for all combinational delays. If there are no combinational loops, phases can always be stretched to make that section of the machine work. Total clock period depends on sum of phase periods.
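The one-sided phase constraints can be sketched as follows. This is a simplified model under stated assumptions: each phase is stretched to exactly its worst-case combinational delay, and a fixed non-overlap gap (a hypothetical parameter) follows every phase. Latch overheads are ignored.

```python
def phases_long_enough(phase_widths, phase_delays):
    """One-sided constraint: every phase must cover its combinational delay."""
    return all(w >= d for w, d in zip(phase_widths, phase_delays))


def total_period(phase_widths, t_nonoverlap=0):
    """Clock period: sum of the phase widths plus one gap after each phase."""
    return sum(phase_widths) + len(phase_widths) * t_nonoverlap


# Hypothetical worst-case combinational delays per phase, in ns.
delays = [3, 5]

# Stretch each phase to exactly cover its own section of logic.
widths = list(delays)

print(phases_long_enough(widths, delays))       # True
print(total_period(widths, t_nonoverlap=1))     # 10
print(phases_long_enough([3, 4], delays))       # False
```

Because each constraint is one-sided, lengthening a phase never breaks its own section; it only costs total period.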

15 Clock Uncertainties Sources of clock uncertainty

16 Clock Nonidealities
Clock skew: spatial variation in temporally equivalent clock edges; deterministic + random, tSK
Clock jitter: temporal variations in consecutive edges of the clock signal; modulation + random noise
  Cycle-to-cycle (short-term): tJS
  Long-term: tJL
Variation of the pulse width: important for level-sensitive clocking

17 Clock Skew and Jitter
[Figure: clock waveforms illustrating skew tSK between two clock signals and jitter tJS on a clock edge.]
Both skew and jitter affect the effective cycle time.
Only skew affects the race margin.

18 Clock Skew
[Figure: histogram of the number of registers versus clock delay. The earliest occurrence of the Clk edge is at nominal - δ/2 and the latest at nominal + δ/2; the plot also marks the insertion delay, the maximum Clk skew, and a bad-design case.]
The absolute delay through a clock distribution path is not important. What matters is the relative arrival time at the register points at the end of each path. Skew can be positive or negative.
Skew causes no variation in the clock period, only a phase shift.

19 Sources of skew and Jitter
Systematic errors are nominally identical from chip to chip and are predictable, while random errors are due to manufacturing variations that are difficult to model.
Clock-signal generation: the clock is produced by generating a high-frequency signal from a low-frequency one (VCO), which is sensitive to device noise, power supply variations and substrate coupling.
Manufacturing device variations: matching of the devices in the buffers along multiple clock paths is critical.
Interconnect variations: vertical and lateral dimension variations cause the interconnect capacitance and resistance to vary. One source of the problem is inter-layer dielectric (ILD) thickness variation.
Environmental variations: temperature and power supply. Temperature gradients across the chip can be large as a consequence of clock gating. Device parameters (Vth and μ) depend on temperature, so the clock delay can vary from path to path. Does temperature contribute to skew or to jitter?
Capacitive coupling: any coupling between a clock wire and an adjacent signal results in timing uncertainty.

20 The Clock Skew Problem
Clock rates as high as 2 GHz in CMOS! (T = 0.5 ns)
[Figure: pipeline In -> CL1 -> R1 -> CL2 -> R2 -> CL3 -> R3 -> Out clocked by φ, with delay labels tl,min, tl,max, tr,min, tr,max and ti, and clock signals clk1 and clk2.]
Clock edge timing depends upon position.
Positive skew: data and clock are routed in the same direction.

21 Delay of Clock Wire
[Figure: RC model of a clock wire with labels C, r, c and RS.]
r = 0.07 Ω/□, c = 0.04 fF/μm² (Tungsten wire)

22 Positive Skew Launching edge arrives before the receiving edge

23 Positive Skew
The output of the combinational circuit must be valid one setup time before the rising edge of CLK2 (point 4):
T + δ >= tc-q + tsu + tlogic,max, or T >= tc-q + tsu + tlogic,max - δ
This equation suggests that clock skew actually has the potential to improve the performance of the circuit. This is indeed true, but increasing the skew makes the circuit more susceptible to race conditions. A problem arises if the new value at the output of R1 propagates through the logic and becomes valid at the input of R2 before edge 2. To avoid this we have to ensure that:
δ + thold < tc-q + tlogic,min, or δ < tc-q + tlogic,min - thold
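The two positive-skew constraints can be evaluated together: one gives the shortened minimum period, the other bounds how much skew the race condition tolerates. The sketch below uses hypothetical function names and made-up delays in ps.

```python
def min_period_with_skew(t_cq, t_su, t_logic_max, skew):
    """Setup constraint: T >= tc-q + tsu + tlogic,max - skew."""
    return t_cq + t_su + t_logic_max - skew


def skew_bound_for_hold(t_cq, t_logic_min, t_hold):
    """Race constraint: the skew must stay strictly below this value."""
    return t_cq + t_logic_min - t_hold


# Hypothetical delays in ps.
t_cq, t_su, t_hold = 100, 50, 40
t_logic_min, t_logic_max = 60, 850

bound = skew_bound_for_hold(t_cq, t_logic_min, t_hold)
print(bound)  # 120

for skew in (0, 80, 150):
    period = min_period_with_skew(t_cq, t_su, t_logic_max, skew)
    print(skew, period, "race-free" if skew < bound else "RACE")
```

A skew of 80 ps shortens the minimum period from 1000 ps to 920 ps and is still race-free, while 150 ps would allow 850 ps on paper but violates the race bound.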

24 Negative Skew
Receiving edge arrives before the launching edge.
[Figure: register R1 (input In, clock CLK1) drives combinational logic feeding register R2 (clock CLK2); labels give the register parameters tc-q, tc-q,cd, tsu, thold and the logic delays tlogic, tlogic,cd.]

25 Negative Skew
Negative skew hurts performance, as the effective period (from position 1 to position 4) is made shorter by δ:
T - δ >= tc-q + tsu + tlogic,max, or T >= tc-q + tsu + tlogic,max + δ
where δ here denotes the magnitude of the skew. However, a negative skew implies that the system never fails, since edge 2 happens before edge 1. There is no race issue.

26 Positive and Negative Skew
(a) Positive skew (clock φ is routed in the same direction as the data flow). [Figure: Data -> CL -> R -> CL -> R -> CL -> R.]
The skew has to be strictly controlled and kept below its maximum allowed value; otherwise the circuit will malfunction. Reducing the clock frequency does not help.
(b) Negative skew (clock φ is routed in the opposite direction to the data flow). [Figure: the same pipeline with the clock routed against the data.]
When the skew is negative, the race condition never happens, and the circuit operates correctly independent of the skew. However, negative skew reduces throughput: it reduces the time available for the actual computation, so the clock period has to be increased by |δ|.
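The claim that reducing the clock frequency does not help against excessive positive skew can be checked with a small sweep over clock periods. This sketch treats the skew as a signed number (positive when the clock travels with the data); the function name and all delay values (in ps) are assumptions for illustration.

```python
def circuit_works(period, skew, t_cq, t_su, t_hold, t_logic_min, t_logic_max):
    """Check the setup and race constraints for a signed skew value."""
    setup_ok = period + skew >= t_cq + t_su + t_logic_max
    hold_ok = skew + t_hold < t_cq + t_logic_min  # independent of the period
    return setup_ok and hold_ok


# Hypothetical delays in ps.
params = dict(t_cq=100, t_su=50, t_hold=40, t_logic_min=60, t_logic_max=850)
periods = [900, 1000, 1100, 2000, 10000]

# Too much positive skew: fails at every period, however slow the clock.
too_positive = [circuit_works(T, 150, **params) for T in periods]
print(too_positive)  # [False, False, False, False, False]

# Negative skew: never races, but needs a longer period (T >= 1050 here).
negative = [circuit_works(T, -50, **params) for T in periods]
print(negative)      # [False, False, True, True, True]
```

The race check contains no period term, which is why slowing the clock cannot repair it.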

27 How to counter Clock Skew?
Routing the clock in the opposite direction to the data can relieve the race problem caused by clock skew, but it hampers performance. Also, the data flow of a circuit is sometimes not unidirectional.
[Figure: registers (REG) and logic between In and Out, clocked by φ from the clock distribution, with positive skew on some paths and negative skew on others.]
The best solution is to ensure that the clock skew between communicating registers is bounded.

28 Example of Clock skew
[Figure: circuit with registers (REG) and a multiplexer (MUX), clocked by φ.]
tg = gate delay, tm = mux delay, ts = setup time, tq = register clock-to-q delay, T = clock period
Assuming the input signals arrive early enough, the maximum bound on the skew is [equation missing from transcript].
The equilibrium requirement at the time of latching imposes another constraint on the skew: [equation missing from transcript].
Combining these constraints, we have: [equation missing from transcript].

29 Example: Propagation and contamination delay evaluation
Propagation and contamination delays are not always easy to evaluate because of false paths.
[Figure: circuit between registers (REG) with inputs A, B, C and D, gates OR1, OR2, AND1, AND2, AND3 and In1, output Out, and two marked paths, PATH1 and PATH2.]
The contamination delay is 2 tgate (through OR1, OR2).
It would appear that the worst case is PATH1, 5 tgate, but this is a false path (the output does not even depend on C and D):
If A = 1, the critical path (CP) is through OR1 and OR2.
If A = 0 and B = 0, the CP is through I1, OR1, OR2.
If A = 0 and B = 1, the CP is through I1, OR1, AND3, OR2, which is 4 tgate.
Because of false paths, the worst-case delay cannot be obtained just by adding up propagation delays.

30 Static Timing Analysis
0->1 and 1->0 delays are generally different.
The simplest delay problem to analyze is to change the value at only one input and determine how long it takes for the effect to propagate to a single output (provided there is a path from the selected input to the output).
A logic simulator can be used, but it has to simulate all possible transition values.
Static timing analysis is value-independent: it builds a graph that models the delays through the network and identifies the longest (and shortest) delay path.
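The graph-based idea can be sketched in a few lines. This is a minimal illustration, not the algorithm of any particular tool: it assumes a single delay per edge (ignoring the rise/fall difference), and the netlist and delay values are hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def analyze(edges):
    """Earliest/latest arrival times for a DAG given as {(u, v): delay}."""
    preds = {}
    for (u, v), delay in edges.items():
        preds.setdefault(u, [])
        preds.setdefault(v, []).append((u, delay))
    # TopologicalSorter expects {node: predecessors}; sources come first.
    order = TopologicalSorter(
        {node: [u for u, _ in fanin] for node, fanin in preds.items()}
    ).static_order()
    earliest, latest, via = {}, {}, {}
    for node in order:
        fanin = preds[node]
        if not fanin:  # primary input: arrival time 0
            earliest[node] = latest[node] = 0
            via[node] = None
            continue
        earliest[node] = min(earliest[u] + d for u, d in fanin)
        latest[node], via[node] = max((latest[u] + d, u) for u, d in fanin)
    return earliest, latest, via


def critical_path(via, sink):
    """Follow the recorded worst-case predecessors back from the sink."""
    path = [sink]
    while via[path[-1]] is not None:
        path.append(via[path[-1]])
    return path[::-1]


# Hypothetical network: inputs a, b, c; gates g1, g2; output out.
edges = {("a", "g1"): 2, ("b", "g1"): 1, ("g1", "g2"): 2,
         ("c", "g2"): 1, ("g2", "out"): 1, ("g1", "out"): 2}
earliest, latest, via = analyze(edges)
print(latest["out"], earliest["out"])  # 5 2
print(critical_path(via, "out"))       # ['a', 'g1', 'g2', 'out']
```

No input values are simulated: each node is visited once, so the cost grows with the size of the graph rather than with the number of input patterns. The price is that the reported longest path may be one that can never be exercised.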

31 Critical Path
The longest delay path is known as the critical path, since that path limits the system performance.
The critical path not only tells us the system cycle time, it also points out which part of the combinational logic must be changed to improve system performance.
Speed up gates on the critical path by increasing transistor sizes or reducing wiring capacitance, or redesign the logic along the critical path to use a faster gate configuration.
Speeding up the system may require modifying several sections of logic, since the critical path can have multiple branches.
Identify the critical path and the cutset of the graph that represents it, then determine the edge (gate) to speed up.

32 False Path
False paths are critical paths that can never be exercised during normal circuit operation. In this case the actual critical path is shorter than what first-order analysis predicts.
Detecting a false path is not easy, since it requires an understanding of the logic functionality of the network. Determining whether a path is false is an NP-complete problem; however, CAD tools and algorithms are now available to find false paths in practical networks.
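For a very small network, falseness can be tested by brute force. The sketch below uses static sensitization as the criterion (a path counts as exercisable only if some input assignment sets every side input along it to a non-controlling value); this is one common criterion, not necessarily the one used by the tools the slide refers to. The two-multiplexer circuit and all names are hypothetical.

```python
from itertools import product


def is_statically_sensitizable(side_conditions, input_names):
    """True if one input assignment satisfies every side-input condition."""
    for values in product([0, 1], repeat=len(input_names)):
        assignment = dict(zip(input_names, values))
        if all(cond(assignment) for cond in side_conditions):
            return True
    return False


# Hypothetical circuit with a shared select signal s:
#   mux1: m   = a if s == 1 else b
#   mux2: out = m if s == 0 else c
inputs = ["s", "a", "b", "c"]

# Path a -> mux1 -> mux2 -> out needs s == 1 at mux1 and s == 0 at mux2.
path_a = [lambda v: v["s"] == 1, lambda v: v["s"] == 0]
# Path b -> mux1 -> mux2 -> out needs s == 0 at both multiplexers.
path_b = [lambda v: v["s"] == 0, lambda v: v["s"] == 0]

print(is_statically_sensitizable(path_a, inputs))  # False: a false path
print(is_statically_sensitizable(path_b, inputs))  # True
```

The enumeration visits all 2^n input assignments, which is workable only for tiny examples and reflects why the general problem is hard.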

33 Example of False Path
[Figure: network with nodes a, b, c, d, e, y and z.]
The path Va -> Vc -> Vd -> Ve -> Vz is a false path.

34 Impact of Jitter Temporal variation in the clock edge.

35 Longest Logic Path in Edge-Triggered Systems
Setup time condition
[Figure: Clk waveform with period T showing TClk-Q, TLM and TSU between the latest point of launching and the earliest arrival of the next cycle, with uncertainty TJI + δ.]
If the launching edge is late and the receiving edge is early, the data will not be too late if:
Tc-q + TLM + TSU < T - TJI,1 - TJI,2 - δ
The minimum cycle time is determined by the maximum delays through the logic:
Tc-q + TLM + TSU + δ + 2 TJI < T
Skew can be either positive or negative.
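The setup condition with skew and jitter can be turned into a bound on the clock period. The sketch follows this slide's sign convention, in which a positive δ shortens the time available; the function names and the delay values (in ps) are assumptions.

```python
def period_bound(t_cq, t_logic_max, t_su, delta, t_jitter):
    """The period T must exceed Tc-q + TLM + TSU + delta + 2*TJI."""
    return t_cq + t_logic_max + t_su + delta + 2 * t_jitter


def meets_setup(period, t_cq, t_logic_max, t_su, delta, t_jitter):
    """Strict inequality, as written on the slide."""
    return period > period_bound(t_cq, t_logic_max, t_su, delta, t_jitter)


# Hypothetical delays in ps.
print(period_bound(t_cq=100, t_logic_max=850, t_su=50, delta=30, t_jitter=20))
# 1070

print(meets_setup(1100, 100, 850, 50, delta=30, t_jitter=20))  # True
print(meets_setup(1050, 100, 850, 50, delta=30, t_jitter=20))  # False
```

Jitter enters twice because both the launching and the receiving edge can move in the unfavourable direction.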

36 Clock Constraints in Edge-Triggered Systems: Shortest path
Hold time condition
[Figure: Clk waveform showing TClk-Q and TLm after the earliest point of launching, and the hold time TH after the nominal clock edge; data must not arrive before this time.]
If the launching edge is early and the receiving edge is late, the data must not arrive too early:
Tc-q + TLm - TJI,1 > TH + TJI,2 + δ
Minimum logic delay:
Tc-q + TLm > TH + 2 TJI + δ
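Since data must not arrive before the hold window closes, the shortest-path condition yields a lower bound on the minimum logic delay, and from it the amount of delay padding a fast path needs. The sketch below is illustrative: the function names are hypothetical and the values (in ps) are made up.

```python
def min_logic_delay_bound(t_cq, t_hold, delta, t_jitter):
    """The minimum logic delay TLm must exceed TH + 2*TJI + delta - Tc-q."""
    return t_hold + 2 * t_jitter + delta - t_cq


def padding_needed(t_logic_min, t_cq, t_hold, delta, t_jitter):
    """Shortfall of the shortest path relative to the bound (0 if none)."""
    bound = min_logic_delay_bound(t_cq, t_hold, delta, t_jitter)
    return max(0, bound - t_logic_min)


# Hypothetical values in ps.
print(min_logic_delay_bound(t_cq=100, t_hold=40, delta=30, t_jitter=20))   # 10
print(min_logic_delay_bound(t_cq=100, t_hold=40, delta=100, t_jitter=20))  # 80

# A 60 ps shortest path is safe for delta = 30 but not for delta = 100.
print(padding_needed(60, t_cq=100, t_hold=40, delta=30, t_jitter=20))   # 0
print(padding_needed(60, t_cq=100, t_hold=40, delta=100, t_jitter=20))  # 20
```

Because the condition is a strict inequality, a path that lands exactly on the bound still needs some additional margin.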

37 Latch-Based Design
The L1 latch is transparent when φ = 0.
[Figure: alternating latches and logic blocks.]

38 Slack-borrowing

39 Clock-distribution network design parameters
Interconnect material used for the clock network
Shape of the clock-distribution network
Clock driver and the buffer scheme used
Load on the clock lines (i.e. the clock fan-out)
Rise and fall time of the clock

40 Clock Distribution to bound skew
Very attractive for regular structure

41 Clock Network with Distributed Buffering
[Figure: main clock driver feeding secondary clock drivers, each serving a module in its local area.]
Reduces absolute delay and makes power-down easier.
Sensitive to variations in buffer delay.
Equalize the local clock delay through careful routing of the clock signals combined with a hierarchical clock-buffering scheme.

42 More realistic H-tree [Restle98]

43 The Grid System
No RC-matching
Large power

44 Example: DEC Alpha 21164 Use Clock grid instead of clock tree

45

46 Clock Skew in Alpha Processor

47 EV6 (Alpha 21264) Clocking
600 MHz, 0.35 micron CMOS
[Figure: global clock waveform with trise = 0.35 ns, tskew = 50 ps, tcycle = 1.67 ns.]
2-phase, with multiple conditional buffered clocks
2.8 nF clock load
40 cm final driver width
Local clocks can be gated “off” to save power
Reduced load/skew
Reduced thermal issues
Multiple clocks complicate race checking

48 Hybrid Grid DEC Alpha 21264, Bailey JSSC 11/98

49 DEC Alpha 21264 global clock distribution network

50 Global Clock Grid

51 EV7 Clock Hierarchy
Active skew management and multiple clock domains
+ widely dispersed drivers
+ DLLs compensate static and low-frequency variation
+ divides design and verification effort
- DLL design and verification is added work
+ tailored clocks

52 Example 2: Intel IA-64 Itanium
Use of deskew buffers; 3-level hierarchy:
Global distribution: on-die phase-locked loop, deskew buffer (DSK)
Regional distribution: from the deskew buffers to 30 clock regions (regional clock grid, RCD)
Local distribution: local clock buffer (LCB); generation of opportunity-time-borrowing (OTB) delay clocks

53 Intel IA-64 Itanium clock distribution topology

54 Global Clock Distribution
Distributes two clocks: the core clock and the reference clock.
Uses two identical, balanced H-trees on the top two metal layers.
To reduce capacitive noise coupling and to ensure a good inductive return path, the H-tree is fully shielded laterally with Vcc/Vss.

55 Regional clock distribution
Distributed array of deskew buffers (DSK) to reduce within-die process variations
Regional clock grid driven by modular Regional Clock Drivers
30 clock regions
M4 for x-direction, M5 for y-direction
Full support for scan and clock gating

56 Local Clock distribution
Local clock buffer
Generation of the delay clocks needed for opportunity-time-borrowing (OTB), i.e. an intentional-skew buffer

