
CMPEN 411 VLSI Digital Circuits Spring 2009 Lecture 17: Dynamic Sequential Circuits And Timing Issues [Adapted from Rabaey’s Digital Integrated Circuits,




1 CMPEN 411 VLSI Digital Circuits Spring 2009
Lecture 17: Dynamic Sequential Circuits And Timing Issues
[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, © J. Rabaey, A. Chandrakasan, B. Nikolic]

2 This Lecture
Dynamic sequential circuits
  Reading assignment: Rabaey, et al, 7.3, 7.7
Timing issues, Intro to datapath design
  Reading assignment: Rabaey, et al,
Next lecture
  Intro to datapath design
    Reading assignment: Rabaey, et al,
  Adder design
    Reading assignment: Rabaey, et al, 11.3

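3 Last Lecture: Static MS ET Implementation
[Figure: static master-slave edge-triggered flipflop built from inverters I1 to I6 and transmission gates T1 to T4, with input D, internal node QM and output Q, clocked by clk and !clk.]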

4 Dynamic ET Flipflop
[Figure: master stage (T1, I1, storage node C1) followed by slave stage (T2, I2, storage node C2), with input D, internal node QM and output Q. T1 and T2 are clocked by clk and !clk in opposite phases: in one phase the master is transparent and the slave holds, in the other the master holds and the slave is transparent.]
tsu = tpd_tx
thold = zero
tc-q = 2 tpd_inv + tpd_tx
C1 is the sum of the gate cap of I1, the junction cap of T1 and the overlap gate cap of T1.
Only 8 transistors, so very efficient.
tsu is the delay of the transmission gate (the time it takes C1 to sample the D input).
thold is zero since T1 is turned off on the clock edge, so further input changes are ignored.
tc-q is two inverter delays plus the delay of T2.
Remember: dynamic nodes (C1 and C2) hold their state only for a limited time, so the FF has to be refreshed periodically to prevent state loss due to charge leakage.
A typical question from class is “why do you need the inverters?” The inverters add necessary robustness (they are regenerative, whereas transmission gates are not), and their gate capacitances contribute to C1 (and C2), so there is enough capacitance to hold the state for a reasonable period of time.
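The three timing expressions on this slide can be collected in a small helper. This is only a first-order sketch: the function name `dynamic_ff_timing` and the delay values (in ps) are hypothetical and chosen purely for illustration.

```python
def dynamic_ff_timing(tpd_tx: float, tpd_inv: float) -> dict:
    """First-order timing of the transmission-gate dynamic ET flipflop.

    tpd_tx  : propagation delay of one transmission gate (T1 or T2)
    tpd_inv : propagation delay of one inverter (I1 or I2)
    """
    return {
        "tsu": tpd_tx,                 # time for C1 to sample D through T1
        "thold": 0.0,                  # T1 turns off on the clock edge
        "tc_q": 2 * tpd_inv + tpd_tx,  # two inverter delays plus T2
    }


# Hypothetical delays in picoseconds, not taken from the lecture.
timing = dynamic_ff_timing(tpd_tx=40.0, tpd_inv=30.0)
print(timing)
```

With these assumed numbers the setup time equals one transmission-gate delay and the clock-to-Q delay is 100 ps.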

5 Pseudostatic Dynamic Latch
Robustness considerations limit the use of dynamic FF’s
  coupling between signal nets and internal storage nodes can inject significant noise and destroy the FF state
  leakage currents cause state to leak away with time
  internal dynamic nodes don’t track fluctuations in VDD, which reduces noise margins
A simple fix is to make the circuit pseudostatic
  [Figure: latch clocked by clk and !clk, with a weak feedback inverter added between Q and QM.]
  Adding a weak feedback inverter to each latch (or at least to the slave) comes at a slight cost in delay (it adds to the capacitive load) and power consumption, but it improves noise immunity significantly
  The feedback logic above is added to all dynamic latches

6 Dynamic ET FF Race Conditions
[Figure: the dynamic ET flipflop (D, T1, I1, QM, T2, I2, Q, storage nodes C1 and C2) with clk and !clk waveforms showing a 0-0 overlap and a 1-1 overlap.]
Clock overlap leads to race conditions.
0-0 overlap race condition: toverlap0-0 < tT1 + tI1 + tT2
1-1 overlap race condition: toverlap1-1 < thold
The 1-1 race is fixed by enforcing a hold time: data must be stable during the high-high overlap period.
The 0-0 race is fixed by making sure there is enough delay between D and C2 so that new data sampled by the master does not propagate to the slave (this can be ensured by enforcing an appropriate setup time).
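The two overlap conditions can be checked mechanically. The sketch below assumes all quantities are given in the same time unit; the function name `overlap_races` and the numbers are hypothetical.

```python
def overlap_races(t_overlap_00, t_overlap_11, t_t1, t_i1, t_t2, t_hold):
    """Return (race_00, race_11); True means that overlap can cause a race.

    The FF is safe when
      t_overlap_00 < t_t1 + t_i1 + t_t2   (0-0 overlap)
      t_overlap_11 < t_hold               (1-1 overlap)
    """
    race_00 = not (t_overlap_00 < t_t1 + t_i1 + t_t2)
    race_11 = not (t_overlap_11 < t_hold)
    return race_00, race_11


# Hypothetical values in ps: a short overlap is safe ...
safe = overlap_races(t_overlap_00=50, t_overlap_11=20,
                     t_t1=40, t_i1=30, t_t2=40, t_hold=30)
# ... while a 0-0 overlap longer than the T1 + I1 + T2 path races.
risky = overlap_races(t_overlap_00=120, t_overlap_11=20,
                      t_t1=40, t_i1=30, t_t2=40, t_hold=30)
print(safe, risky)
```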

7 Fix 1: Dynamic Two-Phase ET FF
[Figure: the dynamic ET flipflop (D, T1, I1, QM, T2, I2, Q, storage nodes C1 and C2) with T1 clocked by clk1 and !clk1 and T2 clocked by clk2 and !clk2. While clk1 is active the master is transparent and the slave holds; while clk2 is active the master holds and the slave is transparent. The two phases are separated by tnon_overlap.]
Keep the clock non-overlap time large enough that no overlap occurs even in the presence of clock skew.
But now there are 4 clock signals to route!

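8 Fix 2: C2MOS (Clocked CMOS) ET Flipflop
A clock-skew insensitive FF
[Figure: C2MOS master-slave flipflop. Master stage: transistors M1 to M4 driving storage node C1 (QM). Slave stage: transistors M5 to M8 driving storage node C2 (Q). Input D; clocked by clk and !clk.]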

9 C2MOS (Clocked CMOS) ET Flipflop
A clock-skew insensitive FF
[Figure: C2MOS master-slave flipflop (master M1 to M4 with storage node C1 at QM, slave M5 to M8 with storage node C2 at Q), with the clocked devices marked on/off in each phase. In one clock phase the master is transparent and the slave holds; in the other the master holds and the slave is transparent.]
A positive edge-triggered MS flipflop, just like the one two slides ago (and again only 8 transistors and 4 clock loads), but with one important difference:
A C2MOS flipflop with clk and !clk clocking is insensitive to clock overlap as long as the rise and fall times of the clock edges are sufficiently small.

10 C2MOS FF 0-0 Overlap Case
Clock-skew insensitive as long as the rise and fall times of the clock edges are sufficiently small.
[Figure: C2MOS flipflop during the 0-0 overlap, showing transistors M1, M2, M4, M5, M6 and M8 with D, QM, Q and storage nodes C1 and C2; clk and !clk waveforms for the two possible overlap orderings (left and right).]
Does any new data sampled during the overlap window propagate to Q (race)?
New data is sampled on QM, but cannot propagate to Q since M7 is off (slave is in hold). Any new data sampled on the falling clock edge is not seen at Q.
For the clocking on the left: at the end of the overlap period !clk = 1 and both M7 and M8 are turned off, putting the slave stage in hold mode.
For the clocking on the right: at the end of the overlap period clk = 1 and both M3 and M4 are turned off, putting the master in hold mode (this affects the setup time as well). This means that the FF is slower (longer tc-q).

11 C2MOS FF 1-1 Overlap Case
[Figure: C2MOS flipflop during the 1-1 overlap (transistor labels M2, M3, M6), with D, QM, Q and storage nodes C1 and C2; clk and !clk waveforms for the two possible overlap orderings.]
Does any new data sampled during the overlap window (right after the clock goes high) propagate to Q (race)?
New data is sampled on QM, but cannot propagate to Q since M8 is off (slave is in hold). Any new data sampled on the falling clock edge is not seen at Q.
This case is a bit more problematic than the 0-0 overlap. For the clocking on the right, a hold time must be enforced on D, so that a change on D that makes it to QM is not copied to Q when the overlap time is over (and !clk goes to zero, turning on M8). Imposing a hold time on D, i.e. requiring that D be stable during the clock overlap, overcomes this problem as well.
However, if the rise/fall times of the clock are sufficiently slow, a race is possible. The circuit works correctly as long as the clock rise/fall time is smaller than approximately five times the propagation delay of the flipflop.

12 Fix 3: True Single Phase Clocked (TSPC) Latches
[Figure: negative latch and positive latch, each with input In, output Q and clocked only by clk.]
Negative latch: hold when clk = 1, transparent when clk = 0
Positive latch: transparent when clk = 1, hold when clk = 0
Uses only a single clock, so there is no clock overlap (skew) to worry about; the clock load is also reduced.
Transparent mode is equivalent to two cascaded inverters (the latch is non-inverting).

13 Embedding Logic in TSPC Latch
[Figure: TSPC latch with the input stage replaced by a pull-up network (PUN) and pull-down network (PDN) driven by In, and an example AND latch with inputs A and B, output Q, clocked by clk.]
Logic can be embedded into the latch (or FF), which reduces the delay overhead associated with the latch.
The set-up time increases, but overall performance improves: the increase in the set-up time is typically smaller than the delay of an AND gate. E.g., using minimum size devices, the set-up time of the AND latch is 140 psec, whereas the conventional approach of an AND gate followed by a latch has an effective set-up time of 600 psec.
The technique was used extensively in the design of the EV4 DEC Alpha microprocessor and many other high performance processors.

14 TSPC ET FF
[Figure: TSPC master-slave flipflop with input D, internal node QM and output Q, clocked only by clk, with the clocked devices marked on/off in each phase. In one clock phase the master is transparent and the slave holds; in the other the master holds and the slave is transparent.]
Clock load of 4 transistors (similar to the transmission gate or C2MOS designs) but only one clock to drive and route (12 transistors as compared to 8 in the previous two designs).
Virtually all constraints are removed: no clocks to overlap, no race.
Warning: similar to C2MOS, TSPC malfunctions when the slope of the clock is not sufficiently steep. A slow clock causes both the NMOS and PMOS clocked transistors to be on simultaneously, resulting in undefined state values and race conditions. Clock slopes must therefore be carefully engineered. If necessary, local buffers must be introduced to ensure the quality of the clock signal.

15 Choosing a Clocking Strategy
Choosing the right clocking scheme affects the functionality, speed, and power of a circuit
Two-phase designs
  + robust and conceptually simple
  - need to generate and route two clock signals
  - have to design to accommodate possible skew between the two clock signals
Single phase designs
  + only need to generate and route one clock signal
  + supported by most automated design methodologies
  + don’t have to worry about skew between the two clocks
  - have to have guaranteed slopes on the clock edges

16 Review: Sequential Definitions
Use two level-sensitive latches of opposite type to build one master-slave flipflop that changes state on a clock edge (when the slave is transparent)
Static storage
  uses a bistable element with feedback to store its state, and thus preserves state as long as the power is on
  loading new data into the element: 1) cutting the feedback path (mux based); 2) overpowering the feedback path (SRAM based)
Dynamic storage
  stores state on parasitic capacitors, so the state is held for only a period of time (milliseconds) and requires periodic refresh
  is usually simpler (fewer transistors), faster and lower power, but because of noise immunity issues the circuit is always modified (by adding a feedback loop on the output) so that it is pseudostatic
  reading the stored value from a capacitor without disrupting the charge requires a device with a high input impedance
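As a rough illustration of why dynamic state lasts only milliseconds, the retention time can be estimated as the time a constant leakage current needs to discharge the storage capacitance by the tolerable voltage droop. The values below (10 fF, 0.5 V, 1 pA) are assumed for illustration and do not come from the lecture.

```latex
t_{retain} \approx \frac{C \,\Delta V}{I_{leak}}
           = \frac{10\,\mathrm{fF} \times 0.5\,\mathrm{V}}{1\,\mathrm{pA}}
           = 5\,\mathrm{ms}
```

Under these assumptions the node must be refreshed (or re-clocked) more often than every few milliseconds; a larger leakage current or a smaller storage capacitance shortens this proportionally.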

17 Timing Classifications
Synchronous systems
  All memory elements in the system are simultaneously updated using a globally distributed periodic synchronization signal (i.e., a global clock signal)
  Functionality is ensured by strict constraints on clock signal generation and distribution, to minimize
    Clock skew (spatial variations in clock edges)
    Clock jitter (temporal variations in clock edges)
Asynchronous systems
  Self-timed (controlled) systems
  No need for a globally distributed clock, but have asynchronous circuit overheads (handshaking logic, etc.)
Hybrid systems
  Synchronization between different clock domains
  Interfacing between asynchronous and synchronous domains

18 Review: Synchronous Timing Basics
[Figure: two registers (D, Q) separated by combinational logic with input In; the clock clk arrives at the first register at tclk1 and at the second at tclk2. Register parameters: tc-q, tsu, thold, tcdreg. Logic parameters: tplogic, tcdlogic.]
Under ideal conditions (i.e., when tclk1 = tclk2)
  $T \ge t_{c-q} + t_{plogic} + t_{su}$
  $t_{hold} \le t_{cdlogic} + t_{cdreg}$
Under real conditions, the clock signal can have both spatial (clock skew) and temporal (clock jitter) variations
Clock skew is due to spatial variations in the arrival time of a clock transition
  skew is constant from cycle to cycle (by definition)
  skew can be positive (clock and data flowing in the same direction) or negative (clock and data flowing in opposite directions)
Clock jitter is due to temporal variations in the clock period (T changes on a cycle-by-cycle basis)
Skew delta is tphi’’ - tphi’, where the tphi’s are the local clock times. Skew can be positive or negative depending on the routing direction of the clock. The tmin’s are the best case delays of the logic and the tmax’s are the worst case delays (assume the register set-up time is included in the combinational logic time). ti is the propagation delay of the interconnect.
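The two ideal-clock constraints on this slide (the clock period T must be at least tc-q + tplogic + tsu, and thold must not exceed tcdlogic + tcdreg) can be sketched as a simple check. The function name `check_ideal_timing` and the numbers are hypothetical; all times are assumed to be in ps.

```python
def check_ideal_timing(T, tc_q, tplogic, tsu, thold, tcdlogic, tcdreg):
    """Check the zero-skew, zero-jitter constraints.

    Setup: T >= tc_q + tplogic + tsu
    Hold : thold <= tcdlogic + tcdreg
    Returns (setup_ok, hold_ok).
    """
    setup_ok = T >= tc_q + tplogic + tsu
    hold_ok = thold <= tcdlogic + tcdreg
    return setup_ok, hold_ok


# Hypothetical register and logic delays in ps.
params = dict(tc_q=100, tplogic=700, tsu=50, thold=30, tcdlogic=80, tcdreg=60)
relaxed = check_ideal_timing(T=1000, **params)  # 1000 ps period
too_fast = check_ideal_timing(T=800, **params)  # 800 ps period
print(relaxed, too_fast)
```

Note that the hold check does not involve T at all: slowing the clock can repair a setup failure but never a hold failure.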

19 Sources of Clock Skew and Jitter in Clock Network
[Figure: clock network from the PLL through the clock drivers to the loads, with numbered sources of variation: 1 clock generation, 2 clock drivers, 3 interconnect, 4 power supply, 5 temperature, 6 capacitive load, 7 capacitive coupling.]
Skew
  manufacturing device variations in clock drivers
  interconnect variations
  environmental variations (power supply and temperature)
Jitter
  clock generation
  capacitive loading and coupling
  environmental variations (power supply and temperature)
The clock is distributed using multiple matched paths to the sequential elements. The clock path includes the wiring and the associated distributed buffers required to drive the interconnect and loads. What matters is the relative arrival time at the register points at the end of each path, not the absolute delay through the clock distribution path.
Variations are both systematic (nominally identical from chip to chip and predictable, so easy to model and correct for at design time) and random (due to manufacturing variations, so difficult to model and eliminate).

20 Positive Clock Skew
Clock and data flow in the same direction
[Figure: two registers (R1 driving R2) separated by combinational logic; clk reaches the first register at tclk1 and, after a delay, the second at tclk2. Timing diagram: edges 1 and 3 on tclk1 are one period T apart; edges 2 and 4 on tclk2 lag them by the skew $\delta > 0$. The setup check spans $T + \delta$ and the hold check spans $\delta + t_{hold}$.]
T: $T + \delta \ge t_{c-q} + t_{plogic} + t_{su}$, so $T \ge t_{c-q} + t_{plogic} + t_{su} - \delta$
thold: $t_{hold} + \delta \le t_{cdlogic} + t_{cdreg}$, so $t_{hold} \le t_{cdlogic} + t_{cdreg} - \delta$
Clock skew has the potential to improve the performance of the circuit. Unfortunately, increasing skew makes the circuit more susceptible to race conditions! If the minimum delay of the combinational logic block is small, the inputs to R2 may change before R2’s first rising edge. To avoid races, we must ensure that the minimum delay through the register and logic is long enough that the inputs to R2 are valid for a hold time after that edge. Reducing the clock frequency can’t fix it!
$\delta > 0$: Improves performance, but makes thold harder to meet. If thold is not met (race conditions), the circuit malfunctions independent of the clock period!
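A short worked example may make the trade-off concrete. Here $\delta$ denotes the skew (clock arrival at the receiving register minus arrival at the sending register), and all numbers are assumed for illustration: $t_{c-q} = 100$ ps, $t_{plogic} = 700$ ps, $t_{su} = 50$ ps, $t_{hold} = 30$ ps, $t_{cdlogic} = 80$ ps, $t_{cdreg} = 60$ ps.

```latex
\begin{aligned}
\delta = 40\,\mathrm{ps}:\quad
  & T \ge 100 + 700 + 50 - 40 = 810\,\mathrm{ps}
  && (\text{versus } 850\,\mathrm{ps} \text{ at zero skew}) \\
  & t_{hold} = 30 \le 80 + 60 - 40 = 100\,\mathrm{ps}
  && (\text{hold met}) \\[4pt]
\delta = 120\,\mathrm{ps}:\quad
  & T \ge 100 + 700 + 50 - 120 = 730\,\mathrm{ps} \\
  & t_{hold} = 30 \not\le 80 + 60 - 120 = 20\,\mathrm{ps}
  && (\text{hold violated for every } T)
\end{aligned}
```

In the second case the minimum period looks better, but the hold inequality fails and contains no $T$ term, so no choice of clock period makes the circuit work.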

21 Negative Clock Skew
Clock and data flow in opposite directions
[Figure: two registers separated by combinational logic; the clock is routed against the data direction, so it reaches the second register (tclk2) before the first (tclk1). Timing diagram: edges 1 and 3 on tclk1 are one period T apart; edges 2 and 4 on tclk2 lead them, so the skew is $\delta < 0$. The setup check spans $T + \delta$.]
T: $T + \delta \ge t_{c-q} + t_{plogic} + t_{su}$, so $T \ge t_{c-q} + t_{plogic} + t_{su} - \delta$
thold: $t_{hold} + \delta \le t_{cdlogic} + t_{cdreg}$, so $t_{hold} \le t_{cdlogic} + t_{cdreg} - \delta$
A negative skew adversely impacts the performance of the system. However, assuming $t_{hold} + \delta < t_{cdreg} + t_{cdlogic}$, a negative skew implies that the system never fails, since edge 2 happens before edge 1, i.e., there is never a race condition. Unfortunately, in general logic, signals flow in both directions, so skew can be both positive and negative in the same circuit.
$\delta < 0$: Degrades performance, but thold is easier to meet (eliminating race conditions)

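22 Clock Jitter
Jitter causes T to vary on a cycle-by-cycle basis
[Figure: a register feeding combinational logic with input In, clocked at tclk; each clock edge can arrive anywhere between $-t_{jitter}$ and $+t_{jitter}$ around its nominal time, so two consecutive edges can be as close as $T - 2t_{jitter}$.]
T: $T - 2t_{jitter} \ge t_{c-q} + t_{plogic} + t_{su}$, so $T \ge t_{c-q} + t_{plogic} + t_{su} + 2t_{jitter}$
Jitter directly reduces the performance of a sequential circuit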

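23 Combined Impact of Skew and Jitter
Constraints on the minimum clock period ($\delta > 0$)
[Figure: registers R1 and R2 separated by combinational logic with input In, clocked at tclk1 and tclk2; timing diagram with period T, skew $\delta > 0$ and a jitter of $\pm t_{jitter}$ on each clock edge.]
$T \ge t_{c-q} + t_{plogic} + t_{su} - \delta + 2t_{jitter}$
$t_{hold} \le t_{cdlogic} + t_{cdreg} - \delta - 2t_{jitter}$
$\delta > 0$ with jitter: Degrades performance, and makes thold even harder to meet. (The acceptable skew is reduced by jitter.)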
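The combined constraints (minimum period tc-q + tplogic + tsu - delta + 2 tjitter, and thold at most tcdlogic + tcdreg - delta - 2 tjitter) can be evaluated with two small helpers. This is a sketch under the slide's worst-case model; the function names and the numbers (in ps) are hypothetical.

```python
def min_clock_period(tc_q, tplogic, tsu, delta=0.0, tjitter=0.0):
    """Smallest T satisfying T >= tc_q + tplogic + tsu - delta + 2*tjitter."""
    return tc_q + tplogic + tsu - delta + 2 * tjitter


def hold_margin(thold, tcdlogic, tcdreg, delta=0.0, tjitter=0.0):
    """Slack in thold <= tcdlogic + tcdreg - delta - 2*tjitter.

    A negative result means a race, regardless of the clock period.
    """
    return tcdlogic + tcdreg - delta - 2 * tjitter - thold


# Hypothetical delays in ps.
ideal_T = min_clock_period(100, 700, 50)                          # no skew, no jitter
skewed_T = min_clock_period(100, 700, 50, delta=40)               # positive skew helps T
real_T = min_clock_period(100, 700, 50, delta=40, tjitter=20)     # jitter takes it back
margin = hold_margin(30, 80, 60, delta=40, tjitter=20)
print(ideal_T, skewed_T, real_T, margin)
```

With these assumed values, 40 ps of positive skew is exactly cancelled by 20 ps of jitter on the setup side, while both effects reduce the hold margin.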

24 Clock Distribution Networks
Clock skew and jitter can ultimately limit the performance of a digital system, so designing a clock network that minimizes both is important
  In many high-speed processors, a majority of the dynamic power is dissipated in the clock network.
  To reduce dynamic power, the clock network must support clock gating (shutting down units by disabling their clock)
Clock distribution techniques
  Balanced paths (H-tree network, matched RC trees)
    In the ideal case, can eliminate skew
    Could take multiple cycles for the clock signal to propagate to the leaves of the tree
  Clock grids
    Typically used in the final stage of the clock distribution network
    Minimizes absolute delay (not relative delay)

25 H-Tree Clock Network
[Figure: H-tree distributing Clock from the center to the leaves, with a gating element combining Clock and an Idle condition to produce a Gated clock.]
If the paths are perfectly balanced, clock skew is zero
Can insert clock gating at multiple levels in the clock tree
Can shut off an entire subtree if all gating conditions are satisfied
Things to consider: interconnect material used for routing the clock, shape of the network, clock drivers and buffers used, load on the clock lines, rise and fall time of the clock (may have to consider transmission line effects as well!)
The H-tree network also helps with the clock skew problem (it helps to minimize skew between neighboring elements), since only relative skew is important.

26 Clock Grid Network
[Figure: Clock feeding a main clock buffer, which drives secondary clock buffers distributed around a grid; each secondary buffer serves a local logic area.]
Distributed buffering reduces absolute delay and makes clock gating easier, but is sensitive to variations in the buffer delay
The secondary buffers isolate the local clock nets from the upstream load and amplify the clock signals degraded by the RC network
  decreases absolute skew
  gives steeper clocks
Only have to bound the skew within the local logic area

27 DEC Alpha (EV5) Example
300 MHz clock (9.3 million transistors on a 16.5x18.1 mm die in 0.5 micron CMOS technology)
  single phase clock
3.75 nF total clock load
  Extensive use of dynamic logic
20 W (out of 50) in clock distribution network
Two level clock distribution
  Single 6 inverter stage main clock buffer at the center of the chip
  Secondary clock buffers drive the left and right sides of the clock grid in m3 and m4
Total equivalent driver size of 58 cm !!
0.55 micron CMOS, 4 layer metal
Clock load accounts for 40% of the total effective capacitance of the chip
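As a sanity check on the magnitude of these numbers, the dynamic power needed to charge and discharge the stated clock load once per cycle can be estimated with $P = C V_{DD}^2 f$. The supply voltage is not given on the slide; $V_{DD} = 3.3$ V is assumed here purely for illustration.

```latex
P_{clk} \approx C_{clk}\, V_{DD}^{2}\, f
        = 3.75\,\mathrm{nF} \times (3.3\,\mathrm{V})^{2} \times 300\,\mathrm{MHz}
        \approx 12\,\mathrm{W}
```

Under that assumption the load capacitance alone accounts for roughly 12 W, the same order of magnitude as the 20 W quoted for the whole clock distribution network; this estimate covers only the stated 3.75 nF load.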

28 Secondary Clock Buffers

29 Clock Skew in Alpha Processor
Absolute skew smaller than 90 ps The critical instruction and execution units all see the clock within 65 ps

30 ASIC example

31 Microprocessor example

32 Dealing with Clock Skew and Jitter
To minimize skew, balance clock paths using H-tree or matched-tree clock distribution structures.
If possible, route data and clock in opposite directions; this eliminates races at the cost of performance.
The use of gated clocks to help with dynamic power consumption makes jitter worse.
Shield clock wires (route power lines, VDD or GND, next to clock lines) to minimize or eliminate coupling with neighboring signal nets.
Use dummy fills to reduce skew by reducing variations in interconnect capacitance due to interlayer dielectric thickness variations.
Beware of temperature and supply rail variations and their effects on skew and jitter.
Power supply noise fundamentally limits the performance of clock networks.

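33 Clock Skew Scheduling
[Figure: example circuit with path delays of 12 and 16.]
Minimum clock period with zero skew: 16

34 Clock Skew Scheduling
[Figure: registers i, j and k with path delays 12 and 16 and a label of 1; timing diagram with T = 15 showing the clock pulses at i and k (marked tight) and the pulse at j.]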
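The idea behind skew scheduling can be sketched numerically. The code below assumes one plausible reading of the figure, which is not confirmed by the transcript: a two-stage pipeline i to j to k with worst-case stage delays of 16 and 12 (register overheads lumped into the delays), and a clock at j that is deliberately delayed by 1. The function name `min_period_with_skew` is hypothetical, and hold constraints are ignored.

```python
def min_period_with_skew(stage_delays, clock_offsets):
    """Minimum clock period for a linear pipeline with scheduled clock offsets.

    stage_delays[n]  : worst-case delay from register n to register n+1
    clock_offsets[n] : intentional clock arrival offset at register n
    Setup constraint per stage: T + (offset[n+1] - offset[n]) >= delay[n]
    """
    assert len(clock_offsets) == len(stage_delays) + 1
    return max(
        delay - (clock_offsets[n + 1] - clock_offsets[n])
        for n, delay in enumerate(stage_delays)
    )


delays = [16, 12]                                      # assumed: i->j = 16, j->k = 12
zero_skew = min_period_with_skew(delays, [0, 0, 0])    # limited by the slow stage
scheduled = min_period_with_skew(delays, [0, 1, 0])    # clock at j delayed by 1
print(zero_skew, scheduled)
```

Under these assumptions the zero-skew period is 16, and delaying the clock at j by 1 lends one time unit from the fast stage to the slow one, giving a period of 15.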

35 Clock Scaling

36 Next Lecture and Reminders
Next lecture
  Intro to datapath design
    Reading assignment: Rabaey, et al,
  Adder design
    Reading assignment: Rabaey, et al, 11.3

