경종민 1 Clock skew and signal reflection
2 1. Clocking schemes based on each storage element. Waveforms for D-latch, +ve edge-triggered D-f/f, and 2-phase double latch (the latter two are equivalent to each other)
3 Finite state machines based on each storage element. clk1 and clk2 do not overlap each other
4 Clock skew: positive skew and negative skew
[Figure: f/f 1 and f/f 2 separated by combinational logic CL, clocked by clk and clk' respectively; the skew is positive (+ve) when the clock runs in the signal direction and negative (-ve) when it runs against it.]
–Positive skew: $t_{skew} < t_{delay,min}$ must be obeyed. Otherwise the 2nd f/f, at the current sample point, samples the next value, not the current one (which is the correct one). This is called double clocking.
–Negative skew: $|t_{skew}| < T_p - (t_{delay,max} + t_{setup})$ must be obeyed. Otherwise the 2nd f/f, at the next sample point, samples the old value, not the updated value (which is the correct one).
5 Single-phase system with edge-triggered flip-flops (the negative-skew case is shown)
6 i) Maximum allowable clock skew, $t_{skew,max}$ (race equation): to prevent a race condition, i.e., to prevent the f/f from deciding Q with the next input rather than the current input,
$t_{skew,max} < t_{f/f,min} + t_{cl,min} - t_{hold,max}$
($t_{hold}$: min. time a signal needs to stay stable after the clock edge)
ii) Min. clock cycle time for correct operation with stable f/f inputs, considering clock skew (delay equation):
$T_{p,min} > t_{f/f,max} + t_{cl,max} + t_{setup,max} - t_{skew,max}$
iii) $t_{clk-width} > t_{hold}$, to guarantee correct data capture.
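The race and delay equations above can be evaluated directly. The following is a minimal sketch, assuming a signed skew as in the delay equation; the function names and the picosecond values are hypothetical and chosen only for illustration.

```python
def max_allowable_skew(t_ff_min, t_cl_min, t_hold_max):
    """Race equation: the clock skew must stay below this bound."""
    return t_ff_min + t_cl_min - t_hold_max


def min_clock_period(t_ff_max, t_cl_max, t_setup_max, t_skew):
    """Delay equation: the clock period must exceed this bound (t_skew is signed)."""
    return t_ff_max + t_cl_max + t_setup_max - t_skew


# Hypothetical delays in picoseconds (not taken from the slides).
skew_limit = max_allowable_skew(t_ff_min=200, t_cl_min=300, t_hold_max=100)
period_neg = min_clock_period(t_ff_max=500, t_cl_max=3000, t_setup_max=200, t_skew=-300)
period_pos = min_clock_period(t_ff_max=500, t_cl_max=3000, t_setup_max=200, t_skew=300)

print(skew_limit)  # bound on skew from the race equation
print(period_neg)  # negative skew lengthens the minimum period
print(period_pos)  # positive skew shortens it, but must stay below skew_limit
```

With these numbers the positive skew of 300 ps is still below the 400 ps race limit, so it buys a shorter period without causing double clocking.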
7 Single-phase system with latches (the negative-skew case is shown)
8 i) Race condition: double-sided constraint on the clock width, $t_{clk-width}$
–The clock width must be greater than $t_{setup}$ ($t_{setup}$ for a latch is the min. time a signal should remain stable before the falling clock edge):
$t_{clk-width} > t_{setup,max}$
–The clock width must be shorter than the sum of the 1-stage delay (consisting of $t_{latch}$ and $t_{cl}$) minus hold time and skew, to prevent any signal from passing through more than one stage:
$t_{clk-width} < t_{latch,min} + t_{cl,min} - t_{hold,max} - t_{skew,max}$
ii) Min. cycle time (in the critical stage):
$t_{cycle,min} > t_{latch,max} + t_{cl,max} + t_{setup,max} + t_{skew,max} - t_{clk-width,min}$
–A delay of up to this amount can be transferred to the preceding or succeeding non-critical stages.
9 2-phase non-overlapping clock using double latch
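The latch constraints above amount to a window for the clock width plus a lower bound on the cycle time. A minimal sketch, with hypothetical function names and picosecond values that are assumptions for illustration only:

```python
def clock_width_window(t_setup_max, t_latch_min, t_cl_min, t_hold_max, t_skew_max):
    """Return (lower, upper) bounds on the clock width for a single-phase latch system."""
    lower = t_setup_max
    upper = t_latch_min + t_cl_min - t_hold_max - t_skew_max
    return lower, upper


def min_cycle_time(t_latch_max, t_cl_max, t_setup_max, t_skew_max, t_clk_width_min):
    """Lower bound on the cycle time of the critical stage."""
    return t_latch_max + t_cl_max + t_setup_max + t_skew_max - t_clk_width_min


# Hypothetical delays in picoseconds (not taken from the slides).
lower, upper = clock_width_window(
    t_setup_max=100, t_latch_min=150, t_cl_min=400, t_hold_max=100, t_skew_max=200
)
cycle = min_cycle_time(
    t_latch_max=300, t_cl_max=3000, t_setup_max=100, t_skew_max=200, t_clk_width_min=150
)

print(lower, upper)  # the clock width must lie strictly between these two values
print(cycle)
```

If `upper` does not exceed `lower`, no clock width satisfies both sides of the constraint, which signals that the minimum stage delay is too short for the given skew.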
10 Intentional clock skew
11 When $t_{cl2,min} > t_{cl1,min}, t_{cl3,min}$:
–By shifting CLK' earlier and CLK'' later,
–the time left over in CL1 and CL3 is put to use in CL2.
–If CLK' is advanced too far (or CLK'' is delayed too far), the min. delay path of CL2 becomes shorter than $t_a$ (for edge-triggered f/fs) or $t_b$ (for latches), and a race can occur.
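The trade-off of intentional skew can be illustrated by applying the race and delay equations of the edge-triggered case to each stage of a three-stage pipeline. This is a sketch under stated assumptions: the clock arrival offsets, the stage delays, and the single f/f delay value are hypothetical, and `check_pipeline` is an illustrative helper, not something from the slides.

```python
def check_pipeline(arrivals, stages, t_ff, t_setup, t_hold):
    """Per stage, return (minimum period, race_free).

    arrivals[i] is the clock arrival offset at register i;
    stages[i] = (t_cl_min, t_cl_max) for the logic between register i and i+1.
    """
    results = []
    for i, (cl_min, cl_max) in enumerate(stages):
        skew = arrivals[i + 1] - arrivals[i]          # signed skew across the stage
        min_period = t_ff + cl_max + t_setup - skew   # delay equation
        race_free = skew < t_ff + cl_min - t_hold     # race equation
        results.append((min_period, race_free))
    return results


# Hypothetical values in picoseconds: CL2 is the slow stage.
STAGES = [(200, 1000), (300, 2000), (200, 1000)]
T_FF, T_SETUP, T_HOLD = 100, 50, 50

# Registers are clocked by CLK, CLK', CLK'', CLK in that order.
no_skew = check_pipeline([0, 0, 0, 0], STAGES, T_FF, T_SETUP, T_HOLD)
moderate = check_pipeline([0, -150, 150, 0], STAGES, T_FF, T_SETUP, T_HOLD)
too_far = check_pipeline([0, -300, 300, 0], STAGES, T_FF, T_SETUP, T_HOLD)

for name, res in [("no skew", no_skew), ("moderate", moderate), ("too far", too_far)]:
    period = max(p for p, _ in res)
    print(name, period, all(ok for _, ok in res))
```

In this example the moderate shift lowers the limiting period from 2150 ps to 1850 ps with every stage still race-free, while the larger shift violates the race equation in CL2.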
12 Relation between the race condition (on max. clock skew) and the delay condition (on min. clock period)
i) When data and clock run in the same direction (positive skew): clock skew should be tightly controlled to prevent a race condition. With +ve skew, the clock frequency can be increased for higher performance.
ii) When data and clock run in opposite directions (negative skew): there is no need to worry about a race condition, but -ve skew degrades performance by increasing the min. clock period according to the delay equation.
13 How to suppress a race condition
1) Route the clock in the opposite direction of the data (easy to implement in a datapath), at the cost of performance degradation.
2) Control the non-overlap periods of the clock (in 2-phase clocking).
3) Design a good clock distribution network so that the clock skew at the local clock points is as uniform as possible. (The absolute skew between the clock source and a local clock point is irrelevant.)
4) Clock dist. network: interconnect material; shape of the dist. network; clock driver/buffering schemes; load, i.e., fan-out on the clock lines; rise/fall time of the clock.
5) Avoid a global clock; use a self-timed approach.
14 2. Clock Distribution Network. H-tree as clock dist. network –clock receiver (photo-diode) at the center, receiving a sharp laser pulse through a glass window in the package
15 Two-level buffering(Hierarchy)
16 Composition of a PLL (Phase-Locked Loop)
i) Loop filter: –The loop filter is introduced to remove clock jitter. –A 1st- to 3rd-order LPF is generally used, as excessive phase shift due to high-order filtering can cause instability in this feedback structure.
ii) Lock range: range of input frequency over which the output follows the input with the given relationship.
iii) Lock time: time for the PLL to lock onto the input.
iv) Jitter: the loop filter (LPF) helps remove jitter.
17 How to minimize clock skew in a multi-chip system, i.e., a board or multiple-board system. Global clock source: i) Global dist. network. ii) On-chip clock generator/buffer; a PLL can help only here. iii) Local dist. network.
18 Each component of skew:
i) Chip-to-chip clock skew due to the global dist. network can be suppressed by: –placing clock pins/pads at identical positions on the chip carrier/chip; –keeping the lead length and capacitive loading of clock pins and wires from the global clock source to each clock pin as identical as possible.
ii) Skew due to the on-chip clock generator/buffer can be suppressed by a PLL.
19 Each Component of PLL(Phase detector, LPF, voltage-controlled delay line)
20 Methodology for dealing with timing problems in LARGE systems: 1) Divide the whole system into a number of regions, with each region operating in a synchronous manner. 2) Communication among the regions is either i) through a global clock slower (by a factor of N) than the local clock or ii) asynchronous, using a self-timed discipline.
21 Using a PLL for local synchronization between global and local clocks. The delay of the local clock is adjusted via the PLL so that the local clock edge occurs simultaneously with the global clock edge.
22 Minimal skew system 1) equal-length chip-to-chip interconnection 2) PLL-based clock generator/buffer, and 3) equal-length on-chip distribution(H-tree)
23 Symmetric clock trees (H- vs X-tree) - The H-tree is better than the X-tree in that i) in the H-tree there are no corners sharper than 90°, so inductive discontinuities are smaller and reflection is small; ii) in the H-tree the fan-out is only 2, simplifying impedance matching.
24 Reduction of inductive discontinuities at the corners of H-tree.
25 Matching condition at the branch point: impedance matching occurs when $Z_k = Z_{k+1} \parallel Z_{k+1} = Z_{k+1}/2$
[Figure: a line of impedance $Z_k$ splitting into two branches of impedance $Z_{k+1}$.]
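The matching condition can be turned into a small calculation of per-level impedances and of the reflection seen at a mismatched branch. This is a minimal sketch assuming ideal lossless lines and no reflections returning from further down the tree; the function names and the 100-ohm leaf value are hypothetical.

```python
def matched_impedances(z_leaf, levels):
    """Line impedances from root to leaf such that Z_k = Z_{k+1} / 2 at every branch."""
    z = [z_leaf]
    for _ in range(levels - 1):
        z.append(z[-1] / 2)  # each parent line sees two children in parallel
    return z[::-1]           # index 0 is the root


def branch_reflection(z_k, z_next):
    """Reflection coefficient for a wave on Z_k reaching a split into two Z_next lines."""
    z_load = z_next / 2      # two identical branches in parallel
    return (z_load - z_k) / (z_load + z_k)


tree = matched_impedances(z_leaf=100.0, levels=4)
print(tree)                           # root-to-leaf impedances in ohms
print(branch_reflection(25.0, 50.0))  # matched branch
print(branch_reflection(50.0, 50.0))  # same impedance on both sides of the split
```

Keeping the same impedance across a split gives a reflection coefficient of -1/3, which is why the parent line must have half the impedance of its branches.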
26 Driving the clock lines :
27 Sharpening the clock signal at the receiver front end, before distribution in the subblock, using a Schmitt trigger or a source-end-terminated buffer. (Look at the sharp rise of $V_b$ in the previous slide.)
28 RC network representation of the H-clock tree (simplified as a distributed RC line): when a tailored H-clock tree is used, i.e., if the line width is halved at each branching point, the distributed RC tree network above is equivalent to a uniformly distributed RC line. ($R_1 = R_2 = R_3 = \dots$, $C_1 = 2C_2 = 4C_3 = \dots$)
29 Requirement on the cross-sectional geometry (height, width) of the interconnection line:
1) From the distributed RC model:
–Total distance from the clock source to an end point ($l_{tot}$) in the H-tree: (equation not preserved in the transcript)
–Time required for the last node to reach 90% of its final value: (equation not preserved in the transcript)
$R_{int}$: resistance of the interconnection per unit length; $C_{int}$: capacitance per unit length
30 2) From the lossy transmission line RC model: Eq. (1) and (2) need to be considered in determining H and W. At high frequency, the skin effect makes thickening the interconnection beyond 2-4 times the skin depth ineffective. At 1 GHz, the skin depth of aluminum is 2.8 µm.
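The skin-depth figure quoted above can be checked with the standard formula $\delta = \sqrt{\rho / (\pi f \mu)}$. The sketch below is only a rough check: the resistivity of 3.0e-8 ohm·m is an assumed value for an aluminum interconnect, not a number from the slides, and the function name is hypothetical. Bulk aluminum (about 2.65e-8 ohm·m) gives roughly 2.6 µm instead.

```python
import math

MU_0 = 4e-7 * math.pi  # permeability of free space, H/m


def skin_depth(resistivity, frequency, mu_r=1.0):
    """Skin depth in metres for a conductor with the given resistivity (ohm*m)."""
    return math.sqrt(resistivity / (math.pi * frequency * MU_0 * mu_r))


RHO_AL_ASSUMED = 3.0e-8  # assumed interconnect resistivity, ohm*m

delta = skin_depth(RHO_AL_ASSUMED, 1e9)
print(f"{delta * 1e6:.2f} um")
```

By the slide's rule of thumb, a conductor thicker than a few multiples of this value gains little additional high-frequency conductance.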
31 Simulation of H-clock tree with the last stage unmatched.