Published by Isaiah Colmer. Modified about 1 year ago.

1
Timing Issues Mohammad Sharifkhani

2
Reading Textbook II, Chapter 10 Textbook I, Chapters 12 and 13

3
Motivation
Time is the essence!
– We do things in order, and so do processors
– Procedural dependency
– Resource reusability
Synchronous architectures are preferred
– Ease of implementation
– Predictability
– Compatibility with well-known arithmetic algorithms
A reference clock plays a key role
– We usually neglect the non-idealities in the clock in the design cycle

4
Timing

5
Clock frequency

6
Two signals
Signals that can only transition at predetermined times with respect to a clock signal are called “{syn,meso,plesio}chronous”.
An asynchronous signal can transition at any arbitrary time.

7
Definitions data passed between two different clock domains

8
Mesochronous Timing

9
unknown interconnect delay

10
Plesiochronous
Two interacting modules have independent clocks generated from separate crystal oscillators

11
Asynchronous Interconnect No clock is needed Speed is determined by job completion

12
Hand Shaking
The four-phase handshake is level-sensitive, while the two-phase handshake is edge-triggered (fewer transitions at the expense of edge-triggered circuitry).
Four-phase sequence:
1. System A places data on the bus, then raises Req to indicate that the data is valid.
2. System B samples the data when it sees a high value on Req and raises Ack to indicate that the data has been captured.
3. System A lowers Req, then System B lowers Ack.
Req is not synchronous to clkB, so a synchronizer is needed.
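The four-phase sequence described on this slide can be traced with a small behavioural sketch. It is only an illustration: the function name and the trace format are hypothetical, and the model ignores real delays and the synchronizer on Req.

```python
def four_phase_transfer(words):
    """Trace a four-phase (return-to-zero) handshake for each data word.

    Returns the words captured by system B and a trace of
    (who_acted, req, ack) tuples. Purely behavioural, no timing.
    """
    req = ack = 0
    received, trace = [], []
    for word in words:
        bus = word                      # A places data on the bus
        req = 1                         # A raises Req: data is valid
        trace.append(("A", req, ack))
        received.append(bus)            # B samples the bus while Req is high
        ack = 1                         # B raises Ack: data captured
        trace.append(("B", req, ack))
        req = 0                         # A lowers Req
        trace.append(("A", req, ack))
        ack = 0                         # B lowers Ack: both lines back to idle
        trace.append(("B", req, ack))
    return received, trace
```

Each word costs four signalling events, which is the overhead the two-phase protocol avoids.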

13
Hand Shaking (Cont’)

14
Synchronous Timing

15
A quick look

16
Timing Definitions and Basics

17
Latch Parameters
[Figure: latch with D, Q, Clk; waveforms over period T marking $t_{su}$, $t_{hold}$, $t_{c-q}$, $t_{d-q}$ and $PW_m$; transparent and opaque phases]
Delays can be different for rising and falling data transitions

18
Register Parameters
[Figure: register with D, Q, Clk; waveforms over period T marking $t_{su}$, $t_{hold}$ and $t_{c-q}$]
Delays can be different for rising and falling data transitions

19
Clock Uncertainties Sources of clock uncertainty

20
Clock Nonidealities
Clock skew
– Spatial variation in temporally equivalent clock edges; deterministic + random, $t_{SK}$
Clock jitter
– Temporal variations in consecutive edges of the clock signal; modulation + random noise
– Cycle-to-cycle (short-term) $t_{JS}$
– Long-term $t_{JL}$
Variation of the pulse width
– Important for level-sensitive clocking
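As a rough illustration of the two definitions, skew compares the same edge at two places, while jitter compares successive edges at one place. The sketch below assumes edge arrival times are available as plain numbers (for example in ps); the function names are hypothetical, and the cycle-to-cycle measure used here (largest change between adjacent periods) is one common convention, not necessarily the one used in the lecture.

```python
def clock_skew(edge_at_a, edge_at_b):
    """Skew: arrival-time difference of the same clock edge at two points."""
    return edge_at_b - edge_at_a


def cycle_to_cycle_jitter(edges):
    """Largest change between consecutive periods seen at a single point."""
    periods = [later - earlier for earlier, later in zip(edges, edges[1:])]
    return max(abs(q - p) for p, q in zip(periods, periods[1:]))
```

For example, edges at 0, 1000, 2010 and 3000 ps give periods of 1000, 1010 and 990 ps, so the largest period-to-period change is 20 ps.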

21
Clock Skew and Jitter
Both skew and jitter affect the effective cycle time
Only skew affects the race margin
[Figure: Clk waveform annotated with $t_{SK}$ and $t_{JS}$]

22
Clock Skew and Jitter
Do not touch the clock signal if not necessary!
– Sometimes the simplest architecture is the safest
– But not necessarily the lowest power!
[Figure: Clk waveform annotated with $t_{SK}$ and $t_{JS}$]

23
Clock skew and Jitter Data and state independent clock distribution is desired Enabled FF is a popular choice in the design Consider clock load on power!

24
Clock Skew
[Figure: histogram of # of registers vs. Clk delay, marking the insertion delay and the max Clk skew $\delta$]
Earliest occurrence of Clk edge: Nominal – $\delta/2$
Latest occurrence of Clk edge: Nominal + $\delta/2$

25
Positive and Negative Skew

26
Positive Skew Launching edge arrives before the receiving edge

27
Negative Skew Receiving edge arrives before the launching edge

28
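Timing Constraints (positive skew)
Minimum cycle time: $T + \delta > t_{c-q} + t_{su} + t_{logic}$
With positive skew the receiving edge arrives late, so there is more time to process the data.
The worst case for this constraint is when the receiving edge arrives early (negative $\delta$).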
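The cycle-time constraint can be turned into a one-line calculation. This is a minimal sketch that assumes the sign convention of the Positive Skew slide (skew is positive when the receiving edge is later than the launching edge); the function and parameter names are hypothetical and all delays share one unit.

```python
def min_clock_period(t_cq, t_logic, t_su, skew):
    """Smallest period T satisfying T + skew >= t_cq + t_logic + t_su.

    skew > 0: receiving edge is late, which relaxes the constraint.
    skew < 0: receiving edge is early, which tightens it by |skew|.
    """
    return t_cq + t_logic + t_su - skew
```

With t_cq = 100, t_logic = 800 and t_su = 50 (ps), a skew of +50 gives a minimum period of 900, while a skew of -50 pushes it to 1000.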

29
Timing Constraints (positive skew)
Hold time constraint: $t_{(c-q,cd)} + t_{(logic,cd)} > t_{hold} + \delta$
Worst case is when the receiving edge arrives late (positive skew): a race between data and clock.
Otherwise the receiving register cannot latch In1 before it changes after CLK1 edge 1.
The constraint depends on $t_{hold}$ and is independent of the period $T$.
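A hedged sketch of the hold (race) check follows. It uses the contamination delays from the constraint on this slide; the function and parameter names are hypothetical. Note that the clock period does not appear anywhere, which is why slowing the clock cannot fix a hold violation.

```python
def hold_margin(t_cq_cd, t_logic_cd, t_hold, skew):
    """Slack of the hold constraint t_cq_cd + t_logic_cd > t_hold + skew.

    A negative or zero result means the new data can race through and
    overwrite the value the receiving register is still holding.
    """
    return t_cq_cd + t_logic_cd - (t_hold + skew)


def max_safe_skew(t_cq_cd, t_logic_cd, t_hold):
    """Upper bound on positive skew; the actual skew must stay below it."""
    return t_cq_cd + t_logic_cd - t_hold
```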

30
Considerations
δ > 0: this corresponds to a clock routed in the same direction as the flow of the data through the pipeline. The skew has to be strictly controlled. If the hold-time constraint is not met, the circuit malfunctions independent of the clock period.

31
Question Would there be any race if the skew is negative? What would you do to avoid race?

32
Negative Skew
δ < 0: when the clock is routed in the opposite direction of the data, the skew is negative and the condition to avoid a race is unconditionally met. The circuit operates correctly independent of the skew. The skew reduces the time available for actual computation, so the clock period T has to be increased by |δ|.
If race (hold time) is a problem, route the clock in the opposite direction to the data.

33
Impact of Jitter Both skew and jitter should be accounted for in feedback structures

34
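Longest Logic Path in Edge-Triggered Systems
[Figure: Clk waveform over period T showing $T_{Clk-Q}$, $T_{LM}$ and $T_{SU}$, from the latest point of launching (considering jitter) to the earliest arrival of the next cycle, $T_{JI} + \delta$ before the nominal edge]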

35
Clock Constraints in Edge-Triggered Systems
If the launching edge is late and the receiving edge is early, the data will not be too late if:
$T_{c-q} + T_{LM} + T_{SU} < T - T_{JI,1} - T_{JI,2} - \delta$
Minimum cycle time is determined by the maximum delays through the logic:
$T_{c-q} + T_{LM} + T_{SU} + \delta + 2T_{JI} < T$
Skew can be either positive or negative

36
Shortest Path
[Figure: Clk waveforms showing $T_{Clk-Q}$ and $T_{Lm}$ from the earliest point of launching, and $T_H$ after the nominal clock edge; data must not arrive before this time]

37
Clock Constraints in Edge-Triggered Systems
Minimum logic delay. If the launching edge is early and the receiving edge is late:
$T_{c-q} + T_{Lm} - T_{JI,1} > T_H + T_{JI,2} + \delta$
$T_{c-q} + T_{Lm} > T_H + 2T_{JI} + \delta$
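The two edge-triggered constraints (maximum and minimum logic delay) can be evaluated together once jitter is included. The sketch below assumes the same jitter bound on both edges and takes the skew as the worst-case amount in the harmful direction for each check; the function and parameter names are hypothetical.

```python
def min_period_with_jitter(t_cq, t_logic_max, t_su, skew, t_jitter):
    """Period bound: T > t_cq + t_logic_max + t_su + skew + 2 * t_jitter.

    Launching edge late by t_jitter, receiving edge early by
    skew + t_jitter.
    """
    return t_cq + t_logic_max + t_su + skew + 2 * t_jitter


def min_logic_delay(t_cq, t_hold, skew, t_jitter):
    """Shortest-path bound: t_logic_min > t_hold + 2 * t_jitter + skew - t_cq.

    Launching edge early by t_jitter, receiving edge late by
    skew + t_jitter. A result <= 0 means any logic delay is safe.
    """
    return t_hold + 2 * t_jitter + skew - t_cq
```

Jitter appears twice in both bounds because it can move the launching and the receiving edge in opposite directions.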

38
False path
Path 1 (5 tgate) is never exercised.
If A = 1, the critical path goes through OR1 and OR2.
If A = 0 and B = 0, the critical path is through I1, OR1 and OR2 (corresponding to a delay of 3 tgate).
If A = 0 and B = 1, the critical path is through I1, OR1, AND3 and OR2.
The critical path does not depend on C, D.

39
How to counter Clock Skew?

40
Sources of uncertainty

41
Device variation Variation Matching –Poly orientation –Dopant profiles Can be modeled and compensated for

42
Interconnect variation (ILD)

43
Pattern and ILD correlation Use of fillers is necessary

44
Temp. and Power
Temp.
– Time varying (millisecond)
– Effect of clock gating
– Has a gradient
– Systematic, can be compensated for
Power
– Instantaneous IR drop (switching activity)
– Jitter (short pulses, data dependent)
– Cannot be compensated for (only decoupling caps)

45
Data dependent loading
It is modeled as a form of jitter due to its random nature.
Capacitive coupling and crosstalk work the same way.

46
Clock Distribution Clock is distributed in a tree-like fashion H-tree

47
Example
Clock H-Tree
– Clock skew: difference in the arrival time of the clock signal between two leaves
– Identical branches and leaves
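Under this definition, the skew of a tree follows directly from the root-to-leaf delays. The sketch below is only illustrative: it assumes each leaf's path is given as a list of segment delays (hypothetical numbers, one unit throughout) and that the path delay is simply their sum.

```python
def leaf_arrival_times(paths):
    """paths: one list of segment delays per leaf, ordered root to leaf."""
    return [sum(segments) for segments in paths]


def worst_case_skew(arrivals):
    """Largest arrival-time difference between any two leaves."""
    return max(arrivals) - min(arrivals)
```

In a perfectly balanced H-tree every path has identical segments, so the result is zero; any nonzero value comes from mismatch between nominally identical branches.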

48
Example
Considering three parameters:
– Both FETs and wires; 64 samples + main buffer
– All deterministic factors are nulled out; only within-chip variation is considered
– Random ΔL of FET with distribution N(0, 0.035 um)
– Random ΔW of wires with N(0, 0.25 um)
– Spatial ΔL: $\Delta L = w_0 + w_x x + w_y y$

49
Example

50
Results
– In case of random ΔL: 139 ps vs. 171 ps without considering spatial constraints
– In case of random ΔW: 41 ps vs. 49 ps
– Without considering spatial constraints, the worst case is too pessimistic

51
More realistic H-tree [Restle98]
10 balanced segments
Each segment contains 580 drivers
All RC-matched
If we leave the clock tree for the last minute, we may end up with multiple timing constraint violations!

52
The Grid System No rc-matching Large power Absolute delay is minimized Allows late design changes

53
Examples Alpha (0.75um) 200MHz Clock load 3.25nF (40%) Skew < 200pSec (10%)

54
Example: DEC Alpha 21164

55
21164 Clocking
2 phase single wire clock, distributed globally
2 distributed driver channels
– Reduced RC delay/skew
– Improved thermal distribution
– 3.75 nF clock load
– 58 cm final driver width
Local inverters for latching
Conditional clocks in caches to reduce power
More complex race checking
Device variation
Skew: 90 ps (65 ps effective)
Clock waveform: t_rise = 0.35 ns, t_skew = 150 ps, t_cycle = 3.3 ns
[Figure: location of clock driver on die, showing pre-driver and final drivers]

56
Clocking
Clock buffers carefully sized to minimize the skew
The direction of the clock is considered
One gate between the latches
Dummy fillers (increase cap)
– Dummies are shielded

57
Reducing Skew
1. Balance clock paths from a central distribution source to individual clocking elements using H-tree structures.
2. The use of local clock grids (instead of routed trees) can reduce skew at the cost of increased capacitive load and power dissipation.
3. If data dependent clock load variations cause significant jitter, differential registers that have a data independent clock load should be used.
– The use of gated clocks to save power also results in data dependent clock load and increased jitter. In clock networks where the fixed load is large (e.g., using clock grids), the data dependent variation might not be significant.
4. If data flows in one direction, route data and clock in opposite directions. This eliminates races at the cost of performance.
5. Shield clock wires from adjacent signal wires.
6. ILD: dummy fills.
7. Temperature: delay locked loops, as discussed later in this chapter, can easily compensate for temperature variations.
8. Power supply variation: on-chip decoupling capacitors. Unfortunately, decoupling capacitors require a significant amount of area, and efficient packaging solutions must be leveraged to reduce chip area.

58

59
Clock Skew in Alpha Processor

60
EV6 (Alpha 21264) Clocking
600 MHz, 0.35 micron CMOS
2 phase, with multiple conditional buffered clocks
– 2.8 nF clock load
– 40 cm final driver width
Local clocks can be gated “off” to save power
Reduced load/skew
Reduced thermal issues
Multiple clocks complicate race checking
Global clock waveform: t_rise = 0.35 ns, t_skew = 50 ps, t_cycle = 1.67 ns

61
21264 Clocking
Hierarchical clocking
Trade-off between power and skew
Flexibility in types of clocks at each region
Not shielded

62
EV6 Clock Results
[Figures: GCLK skew in ps (at Vdd/2 crossings); GCLK rise times in ps (20% to 80%, extrapolated to 0% to 100%)]

63
EV7 Clock Hierarchy
Active Skew Management and Multiple Clock Domains
+ widely dispersed drivers
+ DLLs compensate static and low-frequency variation
+ divides design and verification effort
- DLL design and verification is added work
+ tailored clocks

64
Latch based timing We can have comb. Circuits between the two latches of a FF –More flexibility in terms of timing

65
Flip-Flop-Based Timing
[Figure: flip-flop, logic, flip-flop; the cycle is divided into flip-flop delay ($T_{Clk-Q}$), logic delay, $T_{SU}$ and skew]
Representation after M. Horowitz, VLSI Circuits 1996.

66
Latch timing
[Figure: latch with D, Clk, Q]
When data arrives at a transparent latch: delay is $t_{D-Q}$
When data arrives at a closed latch: delay is $t_{Clk-Q}$; data has to be ‘re-launched’
Latch is a ‘soft’ barrier

67
Single-Phase Clock with Latches
[Figure: latch and logic; Clk with period P and pulse width PW (latch transparent), annotated with $T_{skl}$ and $T_{skt}$]

68
Preventing late arrivals Case 1: - The LM can start ahead of time - c2q limits Case 2: d2q limits Lgk can still operate

69
Preventing late arrivals

70
Preventing Premature Arrivals Otherwise the data loops within the transparent window of time Data should not pass through the latch more than once during its transparent mode

71
Single latch timing

72
Latch-Based Design
[Figure: L1 latch, logic, L2 latch]
L1 latch is transparent when the clock = 0
L2 latch is transparent when the clock = 1

73
Latch-Based Timing
[Figure: L1 latch, logic, L2 latch, logic (static logic); clock waveform with the L1 and L2 transparent phases and skew; long path 1 and short path 1]
Can tolerate skew!
Data that hits the L2 latch before it opens has to wait until L2 becomes transparent.
Data that hits L2 while it is transparent goes straight through L2.

74
Latch based timing Trans. when high Trans. when low

75
Slack-borrowing
[Figure: CLB_A (tpdA) and CLB_B (tpdB) between latches that are transparent when high]
CLB_B starts before edge (3) arrives to latch its input, i.e., since CLB_A finished earlier than (3), the extra time is passed on to CLB_B.
Again, e is valid before edge (4) latches the input of the next CLB.
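Slack passing and time borrowing can be illustrated by walking data through a chain of level-sensitive latches. This is a simplified sketch, not the timing model of the lecture: it assumes zero latch delay and zero setup time, and the function name, window format and numbers are hypothetical.

```python
def latch_pipeline(windows, logic_delays, start=0):
    """Return the time borrowed at each receiving latch.

    windows[i] is the (open, close) transparency window of the latch
    that receives the output of logic block i. Data that arrives while
    the latch is transparent flows straight through, so the next block
    starts at the arrival time rather than at a clock edge.
    """
    depart = start
    borrowed = []
    for (t_open, t_close), delay in zip(windows, logic_delays):
        arrival = depart + delay
        if arrival > t_close:
            raise ValueError("data arrives after the latch has closed")
        borrowed.append(max(0, arrival - t_open))
        depart = max(arrival, t_open)   # wait only if the latch is still opaque
    return borrowed
```

With windows (50, 100) and (100, 150) and logic delays of 70 and 30, the first block overruns its half cycle by 20 and the second block absorbs it, so the data still reaches the last latch exactly when it opens.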

76
Example T=125 L4 Becomes transp. at edge no problem when exactly f arrives L4

77
Design consideration
If the falling edge of clk2 comes with too much skew, THL might not be able to latch the previous data because of a hold time violation (i.e., D2 is overwritten too quickly after the edge).
[Figure labels: data available for CLL; hold time violation]

78
Domino logic with delays

79
Clock skew

80
No time slack borrowing

81
Skew tolerant domino Can we borrow time?

82
Multiphase

83
Time borrowing is possible

84
Self-timed and Asynchronous Design
Functions of clock in synchronous design
1) Acts as completion signal
2) Ensures the correct ordering of events
Truly asynchronous design
1) Completion is ensured by careful timing analysis
2) Ordering of events is implicit in logic
Self-timed design
1) Completion ensured by completion signal
2) Ordering imposed by handshaking protocol

85
Synchronous Pipelined Datapath
The clock does two things:
1) It ensures that the physical timing constraints are met
2) Clock events serve as a logical ordering mechanism for the global system events
If we can guarantee these two items by other means, we can remove the clock and save the power, area, complexity of the clock tree…

86
Synch. design
It assumes that all clock events or timing references happen simultaneously over the complete circuit. This is not the case in reality, because of effects such as clock skew and jitter.
Significant current flows over a very short period of time.
Linking of physical and logical constraints has some obvious effects (e.g. on throughput).

87
Self-Timed Pipelined Datapath
Handshaking blocks
What does each signal do?
The logical ordering of the operations is ensured by the acknowledge-request scheme, often called a handshaking protocol.

88
Asynch. properties
Timing signals are generated locally… no high precision clock distribution over the chip (skew, etc.)
Separating the physical and logical ordering
Performance (data dependency and no worst case design)
The automatic shut-down of blocks that are not in use can result in power savings (power)
Robust to variations in manufacturing and operating conditions such as temperature

89
Completion Signal Generation

90

91
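Completion Signal in DCVSL
[Figure: DCVSL gate with pull-down network (PDN), inputs In, Start, supply $V_{DD}$, dual-rail outputs B0 and B1, and a Done signal derived from B0 and B1]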
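The idea behind deriving Done from the two output rails can be sketched as follows. This assumes a dual-rail encoding in which both rails low means "precharged, not ready" and exactly one rail high means "valid"; the exact polarity in the slide's circuit is not recoverable from the transcript, and the function names are hypothetical.

```python
def bit_done(b0, b1):
    """A dual-rail bit is complete once exactly one rail has gone high."""
    if b0 and b1:
        raise ValueError("both rails high is not a legal dual-rail code")
    return bool(b0 or b1)


def word_done(rails):
    """A word is complete only when every bit has resolved.

    rails is a list of (b0, b1) pairs, one per bit.
    """
    return all(bit_done(b0, b1) for b0, b1 in rails)
```

Because Done depends on the data actually settling, the completion time tracks the real delay of each operation instead of a worst-case estimate.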

92
Self-Timed Adder

93
Completion Signal Using Current Sensing Minimum delay Data independent reference!

94
Hand-Shaking Protocol
Two Phase Handshake
Every transition means that the action is valid!
The four events (data change, request, data acceptance, acknowledge) proceed in a cyclic order.

95
Event Logic – The Muller-C Element Seq. element
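A behavioural sketch of the Muller C-element makes its sequential nature concrete: the output copies the inputs when they agree and holds its previous value when they differ. The class name and interface are hypothetical; a two-input element is assumed.

```python
class CElement:
    """Two-input Muller C-element (behavioural model)."""

    def __init__(self, state=0):
        self.out = state            # stored output, needed while inputs disagree

    def update(self, a, b):
        if a == b:                  # inputs agree: output follows them
            self.out = a
        return self.out             # inputs differ: output is unchanged
```

The held state is what lets the element wait for both events (for example, "data ready" and "previous transfer acknowledged") before producing its own event.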

96
2-Phase Handshake Protocol
Advantage: FAST, minimal # of signaling events (important for global interconnect)
Disadvantage: requires the detection of transitions that may occur in either direction; initialization is important
Start from DataReady, Ack = 0,0. When they go to 1,0, Req = 1. The C-element is then blocked (and locked): no new data is sent to the data bus (Req stays high) as long as the transmitted data has not been processed by the receiver, no matter what DataReady is.
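The blocking behaviour described here can be reproduced with a small sketch. It assumes the sender's Req is produced by a C-element whose inputs are DataReady and the inverted Ack, which matches the start-up sequence on the slide but is not confirmed by the transcript; the function names are hypothetical.

```python
def c_element(a, b, prev):
    """Muller C-element: follow the inputs when equal, else hold prev."""
    return a if a == b else prev


def sender_req(data_ready, ack, prev_req):
    """Req = C(DataReady, NOT Ack); assumed sender-side wiring."""
    return c_element(data_ready, 1 - ack, prev_req)


def run(events, req=0):
    """Apply a list of (data_ready, ack) pairs and record Req after each."""
    history = []
    for data_ready, ack in events:
        req = sender_req(data_ready, ack, req)
        history.append(req)
    return history
```

Running the events (1, 0), (0, 0), (0, 1) gives Req values 1, 1, 0: the second DataReady transition is held back until Ack toggles, and only then does Req make its next transition.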

97
Problem: Self-timed FIFO All 1s or 0s -> pipeline empty Alternating 1s and 0s -> pipeline full
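The empty and full conditions on this slide can be checked mechanically from the control bits of the pipeline stages. This is a minimal sketch assuming one control bit per stage, listed in pipeline order; the function names are hypothetical.

```python
def is_empty(states):
    """All control bits equal (all 0s or all 1s): no stage holds data."""
    return len(set(states)) <= 1


def is_full(states):
    """Strictly alternating control bits: every stage holds data."""
    return all(a != b for a, b in zip(states, states[1:]))
```

Any other pattern, such as 0, 1, 1, 0, corresponds to a partially filled pipeline.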

98
2-Phase Protocol

99
Example From [Horowitz] Assume there is a register at the input which loads the data at the beginning of Eval phase

100
Example DataReady1 is asserted. Req to the second block is asserted, First C-element is locked. The second block loads data and starts the evaluation process.

101
Example
DataReady2 is asserted. Req to the third block is asserted, and the second C-element is locked. The third block loads data and starts the evaluation process.
The first C-element is released and can accept a DataReady from the previous stage. (If Req has already come, the first Req is unleashed and goes to the eval phase.)

102
Example

103
4-Phase Handshake Protocol Slower, but unambiguous Also known as RTZ

104
Problem: 4-Phase Handshake Protocol Implementation using Muller-C elements

105
Example
Latches: positive edge-triggered or a level-sensitive implementation (latch when level = 1)

106
Self-Resetting Logic
Post-charge logic
Self-resetting

107
Clock-Delayed Domino This is a style of dynamic logic, where there is no global clock signal. Instead, the clock for one stage is derived from the previous stage.

108

109
Asynchronous-Synchronous Interface

110
Synchronizers and Arbiters
Arbiter: circuit to decide which of 2 events occurred first
Synchronizer: arbiter with clock as one of the inputs
Problem: the circuit HAS to make a decision in limited time; which decision it makes is not important
Caveat: it is impossible to ensure correct operation, but we can decrease the error probability at the expense of delay

111
A Simple Synchronizer Data sampled on rising edge of the clock Latch will eventually resolve the signal value, but... this might take infinite time!

112
Synchronizer: Output Trajectories Single-pole model for a flip-flop

113
Mean Time to Failure
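The slide's own expression is not in the transcript, so the sketch below uses a widely quoted first-order estimate instead: MTTF = exp(t_resolve / tau) / (t_window * f_clk * f_data). Here tau is the regeneration time constant of the flip-flop, t_window is its metastability window, and all parameter names and values are assumptions for illustration.

```python
import math


def synchronizer_mttf(t_resolve, tau, t_window, f_clk, f_data):
    """First-order mean time to failure of a synchronizer, in seconds.

    t_resolve: time allowed for the output to settle before it is used
    tau:       regeneration time constant of the bistable element
    t_window:  width of the vulnerable window around the clock edge
    f_clk:     sampling clock frequency
    f_data:    average rate of asynchronous input transitions
    """
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)
```

Because the resolution time sits in the exponent, waiting a few more time constants (for example by adding another synchronizer stage) improves the MTTF by orders of magnitude.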

114
Example

115
Influence of Noise Low amplitude noise does not influence synchronization behavior

116
Typical Synchronizers Using delay line 2 phase clocking circuit

117
Cascaded Synchronizers
Reduce the failure probability (increase the MTF)

118
Arbiters

119
PLL-Based Synchronization

120
PLL Block Diagram

121
Phase Detector Output before filtering Transfer characteristic

122
Phase-Frequency Detector

123
PFD Response to Frequency

124
PFD Phase Transfer Characteristic

125
Charge Pump

126
PLL Simulation

127
Clock Generation using DLLs
[Figure: block diagrams. Delay-Locked Loop (delay line based): phase detector, charge pump, filter, delay line (DL). Phase-Locked Loop (VCO based): PD, CP, filter, VCO, ÷N divider. Signals: $f_{REF}$, U, D, $f_O$]

128
Delay Locked Loop

129
DLL-Based Clock Distribution
