Timing Issues Mohammad Sharifkhani. Reading Textbook II, Chapter 10 Textbook I, Chapters 12 and 13.

1 Timing Issues Mohammad Sharifkhani

2 Reading Textbook II, Chapter 10 Textbook I, Chapters 12 and 13

3 Motivation Time is of the essence! –We do things in order, and so do processors: procedural dependency, resource reusability. Synchronous architectures are preferred –Ease of implementation –Predictability –Compatibility with well-known arithmetic algorithms. A reference clock plays a key role –We usually neglect the non-idealities of the clock in the design cycle

4 Timing

5 Clock frequency

6 Two signals Signals that can only transition at predetermined times with respect to a signal clock are called “{syn,meso,plesio}chronous” An asynchronous signal can transition at any arbitrary time.

7 Definitions data passed between two different clock domains

8 Mesochronous Timing

9 unknown interconnect delay

10 Plesiochronous: two interacting modules have independent clocks generated from separate crystal oscillators

11 Asynchronous Interconnect No clock is needed Speed is determined by job completion

12 Hand Shaking The four-phase handshake is level-sensitive, while the two-phase handshake is edge-triggered (fewer transitions at the expense of edge-triggered circuitry). System A places data on the bus. It then raises Req to indicate that the data is valid. System B samples the data when it sees a high value on Req and raises Ack to indicate that the data has been captured. System A lowers Req, then system B lowers Ack. Req is not synchronized to clkB → a synchronizer is needed.
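The four-phase sequence on this slide can be traced with a minimal sketch. The function name `four_phase_transfer` and the log format are hypothetical, and the model ignores clock domains and synchronizer delay; it only records the order of the Req/Ack transitions.

```python
def four_phase_transfer(data):
    """Trace one four-phase (level-sensitive) transfer from system A to system B.

    Returns (log, captured), where log lists (event, Req, Ack) after each step.
    This is an ordering sketch only; no timing or metastability is modeled.
    """
    req, ack = 0, 0
    captured = None
    log = []

    bus = data                      # A places data on the bus
    req = 1                         # A raises Req: data is valid
    log.append(("A raises Req", req, ack))

    if req == 1:                    # B samples the bus while Req is high
        captured = bus
        ack = 1                     # B raises Ack: data has been captured
    log.append(("B raises Ack", req, ack))

    req = 0                         # A lowers Req
    log.append(("A lowers Req", req, ack))

    ack = 0                         # B lowers Ack: both wires back at zero
    log.append(("B lowers Ack", req, ack))
    return log, captured


if __name__ == "__main__":
    log, captured = four_phase_transfer(0xA5)
    for event, req, ack in log:
        print(f"{event:14s} Req={req} Ack={ack}")
    print("captured:", hex(captured))
```

Four signal transitions are needed per transfer, which is the cost the slide contrasts with the two-phase scheme.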

13 Hand Shaking (Cont'd)

14 Synchronous Timing

15 A quick look

16 Timing Definitions and Basics

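17 Latch Parameters [Figure: latch with D, Q and Clk; timing parameters $t_{c-q}$, $t_{d-q}$, $t_{su}$, $t_{hold}$, minimum pulse width $PW_m$ and period $T$; transparent and opaque phases.] Delays can be different for rising and falling data transitions

18 Register Parameters [Figure: register with D, Q and Clk; timing parameters $t_{c-q}$, $t_{su}$, $t_{hold}$ and period $T$.] Delays can be different for rising and falling data transitions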

19 Clock Uncertainties Sources of clock uncertainty

20 Clock Nonidealities Clock skew –Spatial variation in temporally equivalent clock edges; deterministic + random, $t_{SK}$ Clock jitter –Temporal variations in consecutive edges of the clock signal; modulation + random noise –Cycle-to-cycle (short-term) $t_{JS}$ –Long term $t_{JL}$ Variation of the pulse width –Important for level-sensitive clocking

21 Clock Skew and Jitter Both skew and jitter affect the effective cycle time. Only skew affects the race margin. [Figure: Clk waveform annotated with $t_{SK}$ and $t_{JS}$.]

22 Clock Skew and Jitter Do not touch the clock signal if not necessary! –Sometimes the simplest architecture is the safest –But not necessarily the lowest power! [Figure: Clk waveform annotated with $t_{SK}$ and $t_{JS}$.]

23 Clock Skew and Jitter Data- and state-independent clock distribution is desired. Enabled FF is a popular choice in the design. Consider clock load on power!

24 Clock Skew [Figure: number of registers versus Clk delay, marking the insertion delay and the maximum Clk skew $\delta$; earliest occurrence of the Clk edge at nominal $- \delta/2$, latest occurrence at nominal $+ \delta/2$.]

25 Positive and Negative Skew

26 Positive Skew Launching edge arrives before the receiving edge

27 Negative Skew Receiving edge arrives before the launching edge

28 Timing Constraints (positive skew) Minimum cycle time: $T + \delta > t_{c-q} + t_{su} + t_{logic}$. Worst case is when the receiving edge arrives early ($\delta < 0$); with positive $\delta$ there is more time to process the data.

29 Timing Constraints (positive skew) Hold time constraint: $t_{(c-q,cd)} + t_{(logic,cd)} > t_{hold} + \delta$. Worst case is when the receiving edge arrives late: a race between data and clock (positive skew). Otherwise the receiving register cannot latch In1 before it changes after the CLK1 edge. The constraint on $\delta$ and $t_{hold}$ is independent of T.
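The two skew-only constraints from slides 28 and 29 can be checked numerically. The sketch below assumes the sign convention of slides 26 and 27 (positive skew means the receiving edge is later than the launching edge); the function names and all delay values are hypothetical, in picoseconds.

```python
def cycle_time_bound(t_cq, t_su, t_logic, skew):
    """Lower bound on T from  T + skew > t_cq + t_su + t_logic."""
    return t_cq + t_su + t_logic - skew


def hold_ok(t_cq_cd, t_logic_cd, t_hold, skew):
    """True when  t_cq_cd + t_logic_cd > t_hold + skew  (no race).

    The clock period does not appear: a hold violation cannot be
    fixed by slowing the clock down.
    """
    return t_cq_cd + t_logic_cd > t_hold + skew


if __name__ == "__main__":
    # Illustrative numbers only (ps), not taken from the slides.
    print(cycle_time_bound(100, 50, 800, skew=50))    # 900: positive skew relaxes T
    print(cycle_time_bound(100, 50, 800, skew=-50))   # 1000: negative skew costs |skew|
    print(hold_ok(60, 40, 30, skew=50))               # True
    print(hold_ok(60, 40, 30, skew=80))               # False: race
    print(hold_ok(60, 40, 30, skew=-50))              # True: negative skew cannot race here
```

Positive skew helps the cycle time but tightens the hold check; negative skew does the opposite.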

30 Considerations δ > 0: This corresponds to a clock routed in the same direction as the flow of the data through the pipeline. The skew has to be strictly controlled. If the hold constraint is not met, the circuit malfunctions independent of the clock period.

31 Question Would there be any race if the skew is negative? What would you do to avoid race?

32 Negative Skew δ < 0: When the clock is routed in the opposite direction of the data, the skew is negative and the condition to avoid a race is unconditionally met. The circuit operates correctly independent of the skew. The skew reduces the time available for actual computation, so the clock period, T, has to be increased by |δ|. If race (hold time) is a problem, route the clock in the opposite direction.

33 Impact of Jitter Both skew and jitter should be accounted for in feedback structures

34 Longest Logic Path in Edge-Triggered Systems [Figure: Clk waveform over one period $T$ showing $T_{Clk-Q}$, $T_{LM}$ and $T_{SU}$, from the latest point of launching (considering jitter) to the earliest arrival of the next cycle, with uncertainty $T_{JI} + \delta$.]

35 Clock Constraints in Edge-Triggered Systems Minimum cycle time is determined by the maximum delays through the logic. If the launching edge is late and the receiving edge is early, the data will not be too late if: $T_{c-q} + T_{LM} + T_{SU} < T - T_{JI,1} - T_{JI,2} - \delta$, that is, $T_{c-q} + T_{LM} + T_{SU} + \delta + 2T_{JI} < T$. Skew can be either positive or negative.

36 Shortest Path [Figure: Clk waveforms showing $T_{Clk-Q}$ and $T_{Lm}$ from the earliest point of launching, and the hold time $T_H$ after the nominal clock edge; data must not arrive before this time.]

37 Clock Constraints in Edge-Triggered Systems Minimum logic delay. If the launching edge is early and the receiving edge is late: $T_{c-q} + T_{Lm} - T_{JI,1} > T_H + T_{JI,2} + \delta$, that is, $T_{c-q} + T_{Lm} > T_H + 2T_{JI} + \delta$.
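The setup and hold inequalities of slides 35 and 37 can be evaluated as below. This is a sketch under the assumption that the same jitter bound applies to both edges (so the two jitter terms add to twice the bound) and that the skew enters with the sign written on those slides; the function names and numbers are hypothetical, in picoseconds.

```python
def period_bound(t_cq, t_logic_max, t_su, skew, t_jitter):
    """Slide 35: T must exceed t_cq + t_logic_max + t_su + skew + 2 * t_jitter."""
    return t_cq + t_logic_max + t_su + skew + 2 * t_jitter


def hold_margin(t_cq, t_logic_min, t_hold, skew, t_jitter):
    """Slide 37: margin of  t_cq + t_logic_min > t_hold + 2 * t_jitter + skew.

    A positive result means the shortest path is slow enough; zero or
    negative means the constraint is not met.
    """
    return (t_cq + t_logic_min) - (t_hold + 2 * t_jitter + skew)


if __name__ == "__main__":
    # Illustrative numbers only (ps).
    print(period_bound(100, 800, 50, skew=30, t_jitter=10))   # 1000
    print(hold_margin(100, 40, 60, skew=30, t_jitter=10))     # 30
    print(hold_margin(100, 0, 60, skew=30, t_jitter=10))      # -10: violation
```

Jitter appears in both checks, which matches the remark on slide 33 that both skew and jitter have to be accounted for.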

38 False path Path 1 (5 tgate) never exercised. If A = 1, the critical path goes through OR1 and OR2. If A = 0 and B = 0, the critical path is through I1,OR1 and OR2 (corresponding to a delay of 3 tgate). For the case when A= 0 and B =1, the critical path is through I1,OR1, AND3 and OR2. Does not depend on C,D.

39 How to counter Clock Skew?

40 Sources of uncertainty

41 Device variation Variation Matching –Poly orientation –Dopant profiles Can be modeled and compensated for

42 Interconnect variation (ILD)

43 Pattern and ILD correlation Use of fillers is necessary

44 Temp. and Power Temp. –Time varying (millisecond) –Effect of clock gating –Has a gradient → systematic → can be compensated for Power –Instantaneous IR drop (switching activity) –Jitter (short pulses, data dependent) –Cannot be compensated for (only decoupling caps)

45 Data dependent loading It is modeled as a form of jitter due to its random nature. Capacitive coupling and crosstalk work the same way.

46 Clock Distribution Clock is distributed in a tree-like fashion H-tree

47 Example Clock H-Tree –Clock skew: time difference between the arrival time of the clock signal between two leaves –Identical branches and leaves
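The skew definition on this slide (arrival-time difference between leaves) can be illustrated with a toy tree. The sketch uses a binary tree as a stand-in for the H-tree, with a hypothetical per-segment delay function; it is not the model used for the results on the following slides.

```python
from itertools import product


def leaf_arrivals(segment_delay, levels):
    """Clock arrival time at every leaf of a binary distribution tree.

    segment_delay(level, prefix) returns the delay of the branch reached by
    the turn sequence `prefix` (a tuple of 0/1 choices) at that level.
    """
    arrivals = {}
    for path in product((0, 1), repeat=levels):
        arrivals[path] = sum(
            segment_delay(level, path[: level + 1]) for level in range(levels)
        )
    return arrivals


def clock_skew(arrivals):
    """Largest arrival-time difference between any two leaves."""
    times = list(arrivals.values())
    return max(times) - min(times)


if __name__ == "__main__":
    # Identical branches: every leaf sees the same delay, so skew is zero.
    balanced = leaf_arrivals(lambda level, prefix: 10, levels=3)
    print(len(balanced), clock_skew(balanced))        # 8 0

    # One first-level branch is 2 units slower (hypothetical variation).
    skewed = leaf_arrivals(
        lambda level, prefix: 12 if prefix == (0,) else 10, levels=3
    )
    print(clock_skew(skewed))                         # 2
```

With perfectly identical branches the skew vanishes, so in this picture any remaining skew comes from variation between nominally equal segments.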

48 Example Considering three parameters: –Both FETs and wires; 64 samples + main buffer –All deterministic factors are nulled out → only within-chip variation is considered –Random ΔL of FET with distribution N(0, 0.035 um) –Random ΔW of wires with N(0, 0.25 um) –Spatial ΔL: $\Delta L = w_0 + w_x x + w_y y$

49 Example

50 Results –In case of random ΔL → 139ps vs. 171ps without considering spatial constraints –In case of random ΔW → 41ps vs. 49ps –Without considering spatial constraints, the worst case is too pessimistic

51 More realistic H-tree [Restle98] 10 balanced segments. Each segment contains 580 drivers. All RC-matched. If we leave the clock tree for the last minute, we may end up with multiple timing constraint violations!

52 The Grid System No rc-matching Large power Absolute delay is minimized Allows late design changes

53 Examples Alpha 21064 (0.75um) 200MHz Clock load 3.25nF (40%) Skew < 200pSec (10%)

54 Example: DEC Alpha 21164

55 21164 Clocking 2 phase single wire clock, distributed globally. 2 distributed driver channels –Reduced RC delay/skew –Improved thermal distribution –3.75nF clock load –58 cm final driver width. Local inverters for latching. Conditional clocks in caches to reduce power. More complex race checking. Device variation. Skew: 90pSec (65pSec effective). [Figure: clock waveform with t_rise = 0.35 ns, t_skew = 150 ps, t_cycle = 3.3 ns; location of the clock driver on the die, with pre-driver and final drivers.]

56 21164 Clocking Clock buffers carefully sized to minimize the skew. The direction of the clock is considered. One gate between the latches. Dummy fillers (increase cap) –Dummies are shielded

57 Reducing Skew
1. Balance clock paths from a central distribution source to individual clocking elements using H-tree structures.
2. The use of local clock grids (instead of routed trees) can reduce skew at the cost of increased capacitive load and power dissipation.
3. If data-dependent clock load variations cause significant jitter, differential registers that have a data-independent clock load should be used. –The use of gated clocks to save power also results in data-dependent clock load and increased jitter. In clock networks where the fixed load is large (e.g., using clock grids), the data-dependent variation might not be significant.
4. If data flows in one direction, route data and clock in opposite directions. This eliminates races at the cost of performance.
5. Shield clock wires from adjacent signal wires.
6. ILD: dummy fills.
7. Temperature: delay-locked loops, as discussed later in this chapter, can easily compensate for temperature variations.
8. Power supply variation: on-chip decoupling capacitors. Unfortunately, decoupling capacitors require a significant amount of area, and efficient packaging solutions must be leveraged to reduce chip area.

58

59 Clock Skew in Alpha Processor

60 EV6 (Alpha 21264) Clocking 600 MHz, 0.35 micron CMOS. 2 phase, with multiple conditional buffered clocks –2.8 nF clock load –40 cm final driver width. Local clocks can be gated “off” to save power. Reduced load/skew. Reduced thermal issues. Multiple clocks complicate race checking. [Figure: global clock waveform with t_rise = 0.35 ns, t_skew = 50 ps, t_cycle = 1.67 ns.]

61 21264 Clocking Hierarchical clocking. Trade-off between power and skew. Flexibility in types of clocks at each region. Not shielded.

62 EV6 Clock Results [Figures: GCLK skew at Vdd/2 crossings, scale 5 to 50 ps; GCLK rise times, 20% to 80% extrapolated to 0% to 100%, scale 300 to 345 ps.]

63 EV7 Clock Hierarchy + widely dispersed drivers + DLLs compensate static and low- frequency variation + divides design and verification effort - DLL design and verification is added work + tailored clocks Active Skew Management and Multiple Clock Domains

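64 Latch based timing We can have combinational circuits between the two latches of a FF –More flexibility in terms of timing

65 Flip-Flop-Based Timing [Figure: flip-flop, logic, flip-flop pipeline; the clock cycle is divided into flip-flop delay ($T_{Clk-Q}$), logic delay, setup time ($T_{SU}$) and skew.] Representation after M. Horowitz, VLSI Circuits 1996.

66 Latch timing [Figure: latch with D, Clk and Q, showing $t_{D-Q}$ when data arrives at a transparent latch and $t_{Clk-Q}$ when data arrives at a closed latch.] When data arrives at a closed latch, it has to be ‘re-launched’. A latch is a ‘soft’ barrier.

67 Single-Phase Clock with Latches [Figure: latch followed by logic, clocked by Clk with period P and pulse width PW; skews $T_{skl}$ and $T_{skt}$ on the clock edges; interval in which the latch is transparent.]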

68 Preventing late arrivals Case 1: - The LM can start ahead of time - c2q limits Case 2: d2q limits Lgk can still operate

69 Preventing late arrivals

70 Preventing Premature Arrivals Otherwise the data loops within the transparent window of time Data should not pass through the latch more than once during its transparent mode

71 Single latch timing

72 Latch-Based Design [Figure: L1 latch, logic, L2 latch, clocked by φ.] L1 latch is transparent when φ = 0. L2 latch is transparent when φ = 1.

73 Latch-Based Timing [Figure: L1 latch, static logic, L2 latch, with a long and a short Path 1 and the intervals in which L1 and L2 are transparent; skew marked on the clock.] Can tolerate skew! Data that hits the L2 latch before it opens has to wait until L2 becomes transparent; data that hits L2 while it is transparent goes through L2.

74 Latch based timing Trans. when high Trans. when low

75 Slack-borrowing [Figure: CLB_A and CLB_B with delays tpdA and tpdB between latches that are transparent when the clock is high.] CLB_B starts before edge (3) arrives to latch its input; i.e., since CLB_A finished earlier than (3), the extra time is passed to CLB_B → again, e is valid before edge (4) latches the input of the next CLB.

76 Example T = 125. L4 becomes transparent at the edge → it does not matter exactly when f arrives at L4.

77 Design consideration If the falling edge of clk2 comes with too much skew, THL might not be able to latch the previous data because of a hold time violation (i.e., D2 is overwritten too quickly after the edge). Data available for CLL. Hold time violation.
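How time moves between stages in a latch pipeline can be sketched with a simple arrival-time model. The assumptions here are mine, not the slide's: each latch has a transparent window [open, close], data arriving before the window waits for the opening edge, data arriving inside the window passes straight through, and data arriving after the window is a violation. Latch delay and setup time default to zero, and all numbers are hypothetical.

```python
def propagate(arrival, stages, t_latch=0, t_su=0):
    """Propagate a data arrival time through a chain of level-sensitive latches.

    stages: list of (open_time, close_time, logic_delay_after_latch).
    Returns (final_time, into_window), where into_window[i] is how far into
    latch i's transparent window the data arrived (0 if it had to wait).
    """
    into_window = []
    t = arrival
    for open_t, close_t, logic_delay in stages:
        if t > close_t - t_su:
            raise ValueError(f"data at t={t} misses latch closing at {close_t}")
        into_window.append(max(0, t - open_t))
        # The latch launches at the later of data arrival and its opening edge.
        t = max(t, open_t) + t_latch + logic_delay
    return t, into_window


if __name__ == "__main__":
    # T = 100 with alternating half-cycle windows (illustrative).
    stages = [(0, 50, 70), (50, 100, 20), (100, 150, 0)]
    print(propagate(0, stages))   # (100, [0, 20, 0])
```

In this example the first logic block takes 70, longer than the half-cycle of 50, yet the pipeline still works because the second latch is transparent when the data arrives and the second block only needs 20.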

78 Domino logic with delays

79 Clock skew

80 No time slack borrowing

81 Skew tolerant domino Can we borrow time?

82 Multiphase

83 Time borrowing is possible

84 Self-timed and Asynchronous Design Functions of clock in synchronous design 1) Acts as completion signal 2) Ensures the correct ordering of events Truly asynchronous design 2) Ordering of events is implicit in logic 1) Completion is ensured by careful timing analysis Self-timed design 1) Completion ensured by completion signal 2) Ordering imposed by handshaking protocol

85 Synchronous Pipelined Datapath The clock does two things: 1- it ensures that the physical timing constraints are met; 2- clock events serve as a logical ordering mechanism for the global system events. If we can guarantee these two items by other means, we can remove the clock and save the power, area and complexity of the clock tree.

86 Synch. design It assumes that all clock events or timing references happen simultaneously over the complete circuit. This is not the case in reality, because of effects such as clock skew and jitter. Significant current flows over a very short period of time. Linking of physical and logical constraints has some obvious effects (e.g. on throughput).

87 Self-Timed Pipelined Datapath Hand shaking blocks. What does each signal do? The logical ordering of the operations is ensured by the acknowledge-request scheme, often called a handshaking protocol.

88 Asynch. properties Timing signals are generated locally: no high-precision clock distribution over the chip (skew, etc.). Separating the physical and logical ordering → performance (data dependency and no worst-case design). The automatic shut-down of blocks that are not in use can result in power savings. Robust to variations in manufacturing and operating conditions such as temperature.

89 Completion Signal Generation

90

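91 Completion Signal in DCVSL [Figure: DCVSL gate with pull-down network (PDN), inputs In1 and In2, Start signal and supply $V_{DD}$; outputs B0 and B1, with Done generated from B0 and B1.]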

92 Self-Timed Adder

93 Completion Signal Using Current Sensing Minimum delay. Data independent → reference!

94 Hand-Shaking Protocol Two Phase Handshake Every transition means that the action is valid! The four events, data change, request, data acceptance, acknowledge proceed in a cyclic order.

95 Event Logic – The Muller-C Element Seq. element
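The behavior of a two-input Muller C-element can be stated in a few lines: the output follows the inputs when they agree and holds its previous value when they differ. The class name `MullerC` is hypothetical; the sketch is a behavioral model, not a circuit description.

```python
class MullerC:
    """Behavioral model of a two-input Muller C-element (a sequential element)."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        # Inputs agree: output copies them. Inputs differ: output is held.
        if a == b:
            self.out = a
        return self.out


if __name__ == "__main__":
    c = MullerC()
    inputs = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
    print([c.update(a, b) for a, b in inputs])   # [0, 0, 1, 1, 0]
```

The held state is what makes the element sequential: the same input pair (1, 0) or (0, 1) can produce either output, depending on history.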

96 2-Phase Handshake Protocol Advantage: FAST, minimal number of signaling events (important for global interconnect). Disadvantage: requires the detection of transitions that may occur in either direction → initialization is important. Start from (DataReady, Ack) = (0, 0). When the inputs go to (1, 0), Req = 1. The C-element is then blocked (and locked), and no new data is sent to the data bus (Req stays high) as long as the transmitted data has not been processed by the receiver, no matter what DataReady is.
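The blocking behavior described on this slide can be traced with a small sketch. It assumes that Req is the output of a C-element whose inputs are DataReady and the inverted Ack; the inversion is inferred from the statement that (DataReady, Ack) = (1, 0) gives Req = 1, and the function names are hypothetical.

```python
def c_element(a, b, prev):
    """Two-input C-element: follow the inputs when they agree, else hold."""
    return a if a == b else prev


def sender_req(events, req=0):
    """Return Req after each (data_ready, ack) event of a two-phase sender.

    Assumed wiring: Req = C(DataReady, NOT Ack).
    """
    trace = []
    for data_ready, ack in events:
        req = c_element(data_ready, 1 - ack, req)
        trace.append(req)
    return trace


if __name__ == "__main__":
    events = [
        (0, 0),  # initial state
        (1, 0),  # DataReady toggles: Req toggles to 1
        (0, 0),  # DataReady toggles again, receiver not done: Req is held
        (0, 1),  # Ack toggles: C-element released, Req toggles to 0
    ]
    print(sender_req(events))   # [0, 1, 1, 0]
```

The third event shows the lock: a new DataReady transition has no effect on Req until the receiver acknowledges the previous transfer.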

97 Problem: Self-timed FIFO All 1s or 0s -> pipeline empty Alternating 1s and 0s -> pipeline full

98 2-Phase Protocol

99 Example From [Horowitz] Assume there is a register at the input which loads the data at the beginning of Eval phase

100 Example DataReady1 is asserted. → Req to the second block is asserted, first C-element is locked. → The second block loads data and starts the evaluation process.

101 Example DataReady2 is asserted. → Req to the third block is asserted, second C-element is locked. → The third block loads data and starts the evaluation process. → The first C-element is released. It can accept a DataReady from the previous stage. (If Req has already come, the first Req is unleashed and goes to the eval phase.)

102 Example

103 4-Phase Handshake Protocol Slower, but unambiguous Also known as RTZ

104 Problem: 4-Phase Handshake Protocol Implementation using Muller-C elements

105 Example Latches: positive edge-triggered or a level-sensitive implementation (latch when level=1)

106 Self-Resetting Logic Post-charge logic. Self-resetting.

107 Clock-Delayed Domino This is a style of dynamic logic, where there is no global clock signal. Instead, the clock for one stage is derived from the previous stage.

108

109 Asynchronous-Synchronous Interface

110 Synchronizers and Arbiters Arbiter: circuit to decide which of 2 events occurred first. Synchronizer: arbiter with clock φ as one of the inputs. Problem: the circuit HAS to make a decision in limited time; which decision is not important. Caveat: it is impossible to ensure correct operation. But we can decrease the error probability at the expense of delay.
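The trade between error probability and delay can be made concrete with the commonly used first-order synchronizer model, in which the mean time to failure grows exponentially with the time allowed for resolution. The formula and parameter names below (resolution time, regeneration time constant `tau`, metastability window `t_window`) are the standard textbook form and are assumptions here, since the slides' own expression is not in this transcript; the numbers are illustrative only.

```python
import math


def synchronizer_mtf(t_resolve, tau, t_window, f_clk, f_data):
    """Mean time to failure (seconds) in the first-order model:

        MTF = exp(t_resolve / tau) / (t_window * f_clk * f_data)
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)


if __name__ == "__main__":
    params = dict(tau=50e-12, t_window=20e-12, f_clk=500e6, f_data=50e6)
    one_cycle = synchronizer_mtf(t_resolve=1e-9, **params)
    two_cycles = synchronizer_mtf(t_resolve=2e-9, **params)
    print(f"MTF with 1 ns to resolve: {one_cycle:.3e} s")
    print(f"MTF with 2 ns to resolve: {two_cycles:.3e} s")
    print(f"improvement factor: {two_cycles / one_cycle:.3e}")
```

In this model every extra `tau` of waiting multiplies the mean time to failure by e, which is why adding latency, for example by cascading synchronizer stages, lowers the failure rate so effectively.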

111 A Simple Synchronizer Data sampled on rising edge of the clock Latch will eventually resolve the signal value, but... this might take infinite time!

112 Synchronizer: Output Trajectories Single-pole model for a flip-flop

113 Mean Time to Failure

114 Example

115 Influence of Noise Low amplitude noise does not influence synchronization behavior

116 Typical Synchronizers Using delay line 2 phase clocking circuit

117 Cascaded Synchronizers Reduce the Failure Rate (increase MTF)

118 Arbiters

119 PLL-Based Synchronization

120 PLL Block Diagram

121 Phase Detector Output before filtering Transfer characteristic

122 Phase-Frequency Detector

123 PFD Response to Frequency

124 PFD Phase Transfer Characteristic

125 Charge Pump

126 PLL Simulation

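127 Clock Generation using DLLs [Figure: Delay-Locked Loop (delay-line based): $f_{REF}$ → phase detector → U/D → charge pump → filter → delay line (DL) → $f_O$. Phase-Locked Loop (VCO-based): $f_{REF}$ → phase detector → U/D → charge pump → filter → VCO → $f_O$, with a ÷N divider in the loop.]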

128 Delay Locked Loop

129 DLL-Based Clock Distribution

