1 ECE 260B – CSE 241A Clocking (http://vlsicad.ucsd.edu)
ECE260B – CSE241A Winter 2005: Clocking
Website: http://vlsicad.ucsd.edu/courses/ece260b-w05
Slides courtesy of Prof. Andrew B. Kahng

2 Outline
• Problem Statement
• Clock Distribution Structures
• Robustness / Signal Integrity Control
• Clock Design:
• Skew Scheduling
• Topology Construction
• Embedding

3 Why Clocks?
• Clocks provide the means to synchronize
  - By allowing events to happen at known timing boundaries, we can sequence these events
• Greatly simplifies building of state machines
• No need to worry about variable delay through combinational logic (CL)
  - All signals are delayed until the clock edge (the clock imposes the worst-case delay)
[Figure: combinational logic and registers arranged as a dataflow pipeline and as an FSM. Courtesy K. Yang, UCLA]

4 Clock Distribution Network
• General goal of clock distribution
  - Deliver clock to all memory elements with acceptable skew
  - Deliver clock edges with acceptable sharpness
• Clocking network design is one of the greatest challenges in the design of a large chip
  - Can consume up to 1/3 of chip power
  - Requires accurate signal delay
  - Requires signal integrity
  - Subject to uncertainty / variation across processes and operating conditions

5 Clock Design Components
• Oscillator
• Dividers
• Buffers
  - Strong drivers
  - Reduce delay
  - Signal integrity / slew rate
• Interconnects
  - Balanced trees, meshes, etc.
  - Shielding (e.g., for crosstalk reduction)
  - Non-tree links / feedback loops

6 Clock Distribution Objective
• Minimum / bounded skew
  - Performance / hold time requirements
• Guaranteed slew rate / signal integrity
• Small insertion delay
• Robustness under process / operating condition variation
• Minimum cell / routing area
• Minimum power consumption

7 Clock Distribution Robustness
Subject to:
• Radically different loading (flip-flop density)
  - Across the die
  - ECO (Engineering Change Order)
• Interconnect coupling
  - Signal integrity
  - Delay variation
• Process variation
  - From lot to lot
  - Across the die
  - Buffers
  - Metal width
• Supply voltage variation across the die
  - Both static IR drop and dynamic voltage drop
• Temperature

8 Issues in Clock Distribution Network Design
• Skew
  - Process, voltage, and temperature
  - Data dependence
  - Noise coupling
  - Load balancing
• Power, CV²f (can consume up to 1/3 of total chip power)
  - Clock gating
• Flexibility / Tunability
  - Compactness: fit into existing layout/design
  - Facilitate ECO

9 Skew: Clock Delay Varies With Position

10 Clock Skew Causes
• Designed (unavoidable) variations: mismatch in buffer load sizes, interconnect lengths
• Process variation: process spread across the die yields different L_eff, T_ox, etc. values
• Temperature gradients: change MOSFET performance across the die
• IR voltage drop in power supply: changes MOSFET performance across the die
• Note: delay from clock generator to fan-out points (clock latency) is not important by itself
  - BUT: increased latency leads to larger skew for the same amount of relative variation
(Sylvester / Shepard, 2001)
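The closing note on latency can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a hypothetical first-order model in which each branch delay deviates from nominal by at most a fixed fraction; the function name and all numbers are illustrative and do not come from the slides.

```python
def worst_case_skew(latency_ps: float, rel_variation: float) -> float:
    """Worst-case skew between two nominally balanced clock branches.

    Assumed model (not from the slides): each branch delay lies in
    [latency * (1 - r), latency * (1 + r)], so one branch can be
    latency * r late while the other is latency * r early.
    """
    return 2.0 * rel_variation * latency_ps


# Same 5% relative variation, two hypothetical insertion delays.
short_tree = worst_case_skew(latency_ps=500.0, rel_variation=0.05)
long_tree = worst_case_skew(latency_ps=2000.0, rel_variation=0.05)
print(short_tree, long_tree)
```

Under this model the skew scales linearly with latency, which is why small insertion delay appears among the distribution objectives even though latency is not a timing constraint by itself.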

11 Outline
• Problem Statement
• Clock Distribution Structures
• Robustness / Signal Integrity Control
• Clock Design:
• Skew Scheduling
• Topology Construction
• Embedding

12 Clock Distribution Structures
• RC-Tree
  - Less capacitance
  - More accuracy
  - Flexible wiring
• Grids
  - Reliable
  - Less data dependency
  - Tunable (late in design)
[Figure: both structures, shown here for final-stage drivers driving F/F loads]

13 Grids
• Gridded clock distribution was common on earlier DEC Alpha microprocessors
• Advantages:
  - Skew determined by grid density, not too sensitive to load position
  - Clock signals available everywhere
  - Tolerant to process variations
  - Usually yields extremely low skew values
• Disadvantages:
  - Huge amount of wiring and power
  - To minimize such penalties, the grid pitch must be made coarser, which loses the grid advantage
[Figure: pre-drivers feeding a global grid]
(Sylvester / Shepard, 2001)

14 H-Tree
• H-tree (Bakoglu)
  - One large central driver, recursive structure to match wirelengths
  - Halve wire width at branching points to reduce reflections
• Disadvantages
  - Slew degradation along long RC paths
  - Unrealistically large central driver
    - Clock drivers can create large temperature gradients (e.g., Alpha 21064: ~30 °C)
  - Non-uniform load distribution
  - Inherently non-scalable (wire R growth)
  - Partial solution: intermediate buffers at branching points
[Figure courtesy of P. Zarkesh-Ha]
(Sylvester / Shepard, 2001)
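The recursive structure that matches wirelengths can be sketched in a few lines. This is an illustrative generator only, assuming an idealized square die and ignoring wire widths, buffers, and loads; `h_tree_sinks` and `path_length` are hypothetical helper names.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def h_tree_sinks(center: Point, half_span: float, levels: int) -> List[Point]:
    """Return the leaf (sink) positions of a recursive H-tree.

    Each level draws one 'H': a horizontal bar through `center`, then a
    vertical bar at each end. The four tips of the H become the centers
    of the next, half-sized H.
    """
    if levels == 0:
        return [center]
    cx, cy = center
    sinks: List[Point] = []
    for dx in (-half_span, half_span):
        for dy in (-half_span, half_span):
            tip = (cx + dx, cy + dy)
            sinks.extend(h_tree_sinks(tip, half_span / 2, levels - 1))
    return sinks


def path_length(half_span: float, levels: int) -> float:
    """Routed wirelength from the root to any sink (the same for every sink)."""
    # Each level contributes one horizontal and one vertical half-bar.
    return sum(2 * half_span / 2 ** k for k in range(levels))


sinks = h_tree_sinks((0.0, 0.0), half_span=4.0, levels=2)
print(len(sinks), path_length(4.0, 2))
```

Because every root-to-sink path uses the same sequence of segment lengths, the nominal wire delay is identical at all sinks; the disadvantages listed above come from everything this sketch leaves out (drivers, RC growth, non-uniform loads).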

15 Buffered H-tree
• Advantages
  - Ideally zero-skew
  - Can be low power (depending on skew requirements)
  - Low area (silicon and wiring)
  - CAD tool friendly (regular)
• Disadvantages
  - Sensitive to process variations
    - Devices: want same-size buffers at each level of the tree
    - Wires: want similar segment lengths on each layer in each source-sink path
  - Local clocking loads are inherently non-uniform
(Sylvester / Shepard, 2001)

16 Tree Balancing
Some techniques:
  a) Introduce dummy loads
  b) Snaking of wirelength to match delays
Con: routing area is often more valuable than silicon
(Sylvester / Shepard, 2001)

17 Examples From Processor Chips
• H-Tree, Asymmetric RC-Tree (IBM)
• Grids: DEC [Alphas]
• Serpentines: Intel x86 [Young ISSCC97]

18 Example Skews From Processor Chips
[Figure panels: DEC Alpha 21064 clock spines; DEC Alpha 21064 RC delays; DEC Alpha 21164 RC delays for global distribution (spine + grid); DEC Alpha 21164 RC local delays]

19 ReShape Clocks Example (High-End ASIC)
• Balanced, shielded H-tree for pre-clock distribution
• Mesh for block-level distribution (output mesh)
• All routes 5-6 µm wide on M6/M5, shielded with 1 µm grounds
• ~10 buffers per node
  - E.g., ganged BUFx20's
• Output mesh must hit every sub-block

20 Block Level Mesh (0.18 µm)
• Max 600 µm stride
• 1 µm M5 ribs every 20-30 µm (4 to 6 rows)
• Shielded input and output M6 shorting straps
• Clumps of 1-6 clock buffers, surrounded by capacitor pads
• Pre-clock connects to input shorting straps

21 Problems with Meshes
• Burn more power at low frequencies
• Block more routing resources (solution: integrating power distribution with the ribs can provide shielding for 'free')
• Difficult for 'spare' clock domains that will not tolerate regioning
• Post-placement (and routing) tuning required
• No 'beneficial skew' possible
• Clock gating only easy at root
• Fighting tools to do analysis:
  - Clumped buffers are a problem in Static Timing Analysis (STA) tools
  - Large shorted meshes are a problem for STA tools
  - What does an Elmore delay calculation look like for a non-tree?
  - Need full extraction and SPICE-like simulation to determine skew
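To see why the Elmore question is awkward for meshes, it helps to look at the calculation on a tree. The sketch below is a minimal illustration, assuming a lumped RC tree with one resistance per edge and one capacitance per node; the function name and the component values are hypothetical.

```python
from typing import Dict, List, Tuple


def elmore_delays(children: Dict[str, List[Tuple[str, float]]],
                  cap: Dict[str, float],
                  root: str) -> Dict[str, float]:
    """Elmore delay from `root` to every node of an RC tree.

    children[u] lists (v, R_uv) for each child v of u; cap[v] is the
    lumped capacitance at node v.
    """
    downstream: Dict[str, float] = {}

    def total_cap(u: str) -> float:
        # Capacitance of the whole subtree rooted at u.
        downstream[u] = cap[u] + sum(total_cap(v) for v, _ in children.get(u, []))
        return downstream[u]

    total_cap(root)
    delay = {root: 0.0}
    stack = [root]
    while stack:
        u = stack.pop()
        for v, r in children.get(u, []):
            # Each edge resistance charges everything downstream of it.
            delay[v] = delay[u] + r * downstream[v]
            stack.append(v)
    return delay


# Hypothetical tree: src -> n -> {s1, s2}, arbitrary units.
tree = {"src": [("n", 1.0)], "n": [("s1", 2.0), ("s2", 2.0)]}
caps = {"src": 0.0, "n": 1.0, "s1": 3.0, "s2": 3.0}
delays = elmore_delays(tree, caps, "src")
print(delays["s1"], delays["s2"])
```

Both passes rely on each node having exactly one parent, so "downstream capacitance" and "the path from the root" are well defined. A shorted mesh has loops, so neither notion applies directly, which is consistent with the slide's conclusion that full extraction and SPICE-like simulation are needed.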

22 Benefits of Meshes
• Deterministic, since shielded all the way down to rib distribution
• No ECO placement required: all buffers are preplaced before block placement
• Low latency, since it uses shorted (= ganged, parallel) drivers, and therefore lower skew
• ECO placements of FFs later do not require rebalancing of the tree
• "Idealized" clocking environment for the "concurrent dance" of RTL design and timing convergence

23 Hybrid Structure
• Balanced tree on the top
• Mesh in the middle
  - Minimize skew
• Steiner minimum tree at the bottom
  - Minimize cost
  - Facilitate ECO

24 Outline
• Problem Statement
• Clock Distribution Structures
• Robustness / Signal Integrity Control
• Clock Design:
• Skew Scheduling
• Topology Construction
• Embedding

25 Process Variation
• Intra-die and inter-die variations
  - Intra-die variation has become increasingly significant since the 0.13 µm technology node
• Systematic and random variations
  - Systematic variation is due to equipment, process, etc.
    - Global lens aberration in lithography causes systematic variation
    - Pattern-dependent effects: optical proximity, chemical mechanical polish (CMP)
  - Random variation is due to inherent variation
• Spatial correlation across a chip
  - Fast vs. slow corners

26 Process Variation
• Metal wires
  - Width variation can be estimated by a lookup table, LUT(width, spacing)
  - Thickness variation is driven by CMP, which depends on local density
  - Thickness variation also depends on wire width and spacing
  - Could be up to 30-40% in a 90 nm process
• Transistors
  - Channel length variation (delay ~ L^1.5)
  - Thin gate oxide t_ox variation → V_th variation
  - Up to 30% variation in terms of driving capability
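The "delay ~ L^1.5" rule translates a channel-length excursion into a delay excursion. The sketch below takes the slide's exponent at face value as a pure power law; the ±10% length excursions are hypothetical inputs chosen only for illustration.

```python
def relative_delay_change(rel_length_change: float, exponent: float = 1.5) -> float:
    """Fractional delay change for a fractional channel-length change,
    assuming delay is proportional to L**exponent (1.5 per the slide)."""
    return (1.0 + rel_length_change) ** exponent - 1.0


for dl in (-0.10, 0.10):  # hypothetical +/-10% channel-length excursions
    print(f"dL = {dl:+.0%} -> delay change = {relative_delay_change(dl):+.1%}")
```

Because the exponent exceeds 1, a 10% increase in channel length costs roughly 15% in delay, so length variation is amplified rather than merely passed through.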

27 Process Variations: SPICE model
• Process variations are reflected in a statistical SPICE model
  - Usually only a few parameters have a statistical distribution (e.g., {ΔL, ΔW, T_OX, V_Tn, V_Tp}) and the others are set to a nominal value
  - The nominal SPICE model is obtained by setting the statistical parameters to their nominal values
(Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer of UCB)

28 Global Variations (Inter-die)
• Process variations → performance variations
• All devices have the same set of model parameter values
[Figure: critical path delay of a 16-bit adder]
(Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer of UCB)

29 Local Variations (Intra-die)
• Each device instance has a slightly different set of model parameter values (aka device mismatch)
• The performance of some analog circuits strongly depends on the degree of matching of device properties
• Digital circuits are in general more immune to mismatch, but the clock distribution network is sensitive (clock skew)
(Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer of UCB)

30 Statistical Design
• Need to account for process variations during the design phase
• Statistical design
  - Nominal design
  - Yield optimization
  - Design centering
(Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer of UCB)

31 Statistical Design
[Figure only]
(Slide courtesy of A. Nardi, J. Rabaey, K. Keutzer of UCB)

32 Process Variation Tolerance Enhancement
• Rule of thumb: balanced tree
  - Identical buffers at identical heights
  - Drive identical subtree loads
• Can we do better than this?
• Process variation tolerant clock design
  - Bounded-skew DME
  - Topology construction
    - With process variation tolerance in the objective
  - Useful skew scheduling
    - To the center of permissible ranges

33 Signal Integrity
• Crosstalk
  - Capacitive, inductive
• Supply voltage drop
  - IR, L dI/dt, LC resonance
• Temperature
  - Increased resistance with higher temperature
• Substrate coupling
  - Parasitic resistance, capacitance in the substrate layer

34 Crosstalk
• Due to the coupling capacitance between interconnections, a signal switching on a net (aggressor) may affect the voltage waveform on a neighboring net (victim)
[Figure panels: noise propagation; increased delay]

35 Circuit Model for Crosstalk
[Figure only]

36 Crosstalk Simulation
[Figure only]

37 Design for Crosstalk
• It can be both capacitive and inductive
  - Capacitive is dominant at current switching speeds
• To reduce it:
  - Use of shielding layer (inter-layer)
  - Use of shielding wire (intra-layer)
[Figure: shielding cross-section; labels: GND, V_DD, GND, substrate]

38 Clock Gating
• Reduce power consumption by temporarily shutting down part of the circuit
• Additional cost of enabling circuits
[Figure: gated clock circuit; labels: CLK, ENABLING, CLK1, CLK2, FF (D/Q), combinational logic]

39 Outline
• Problem Statement
• Clock Distribution Structures
• Robustness / Signal Integrity Control
• Clock Design:
• Skew Scheduling
• Topology Construction
• Embedding

40 Skew = Local Constraint
• D: longest path delay; d: shortest path delay between two sequentially adjacent FFs
• Permissible range: -d + t_hold ≤ skew ≤ T_period - D - t_setup
  - skew < -d + t_hold: race condition
  - skew > T_period - D - t_setup: cycle time violation
  - In between: safe
• Timing is correct as long as the clock signals of sequentially adjacent FFs arrive within a permissible skew range
(W. Dai, UC Santa Cruz)
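The permissible range is easy to evaluate per FF pair. The sketch below is illustrative only: it assumes skew is measured as launch-clock arrival minus capture-clock arrival (a sign convention chosen here so that the two bounds come out as on the slide), and the function names and numbers are hypothetical.

```python
from typing import Tuple


def permissible_range(d_min: float, d_max: float, t_period: float,
                      t_setup: float, t_hold: float) -> Tuple[float, float]:
    """Permissible skew range [lo, hi] for one pair of sequentially adjacent FFs.

    d_min / d_max are the shortest / longest combinational path delays
    (d and D on the slide). Skew is assumed to be launch-clock arrival
    minus capture-clock arrival.
    """
    lo = -d_min + t_hold              # below this: race condition
    hi = t_period - d_max - t_setup   # above this: cycle time violation
    return lo, hi


def classify(skew: float, lo: float, hi: float) -> str:
    """Name the region of the skew axis that `skew` falls into."""
    if skew < lo:
        return "race condition"
    if skew > hi:
        return "cycle time violation"
    return "safe"


# Hypothetical numbers in ns.
lo, hi = permissible_range(d_min=2.0, d_max=5.0, t_period=6.0,
                           t_setup=0.5, t_hold=0.25)
print(lo, hi, classify(0.0, lo, hi))
```

The range is a property of each FF pair, which is what makes skew a local constraint: zero skew is just one admissible point, and not necessarily the most robust one.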

41 "Useful Skew" → Design Robustness
[Figure: chain of FFs with path delays of 2 ns and 6 ns, T = 6 ns, shown against the permissible skew ranges]
  - Skews "0 0 0": at verge of violation
  - Skews "2 0 2": more safety margin
• The design will be more robust if the clock signal arrival time is in the middle of the permissible skew range, rather than on its edge
(W. Dai, UC Santa Cruz)

42 Constraints on Skews
• FF i receives the clock signal delayed by x_i ≥ MIN_DEL
  - 0 < α ≤ 1 ≤ β: if the nominal clock delay is x_i, then the actual clock delay x must fall within the interval α·x_i ≤ x ≤ β·x_i
  - For an FF to operate correctly when the clock edge arrives at time x, the correct input data must be present and stable during the time interval (x - SETUP, x + HOLD)
  - For 1 ≤ i, j ≤ L (#FFs), we compute lower and upper bounds MIN(i,j) and MAX(i,j) for the time required for a signal edge to propagate from FF i to FF j
• Avoid double-clocking (race condition)
  - α·x_i + MIN(i,j) ≥ β·x_j + HOLD
• Avoid zero-clocking
  - β·x_i + SETUP + MAX(i,j) ≤ α·x_j + P; P = clock period

43 Optimal Useful Skews by Linear Programming
• LP_SPEED (clock period reduction): minimize P subject to
  - α·x_i - β·x_j ≥ HOLD - MIN(i,j)
  - α·x_j - β·x_i + P ≥ SETUP + MAX(i,j)
  - x_i ≥ MIN_DEL
• LP_SAFETY (robustness): maximize M subject to
  - α·x_i - β·x_j - M ≥ HOLD - MIN(i,j)
  - α·x_j - β·x_i - M ≥ SETUP + MAX(i,j) - P
  - x_i ≥ MIN_DEL
• Notes
  - J. P. Fishburn, "Clock Skew Optimization", IEEE Trans. Computers 39(7) (1990), pp. 945-951.
  - T. G. Szymanski, "Computing Optimal Clock Schedules", Proc. DAC, June 1992, pp. 399-404.
  - Useful skew optimization is similar to retiming optimization
  - Peak current reductions are a side benefit
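As an illustration of LP_SAFETY, the sketch below solves a simplified instance without an LP solver. It assumes α = β = 1 (no clock-delay uncertainty), so the constraints reduce to x_i - x_j - M ≥ HOLD - MIN(i,j) and x_j - x_i - M ≥ SETUP + MAX(i,j) - P. These are difference constraints, so the sketch swaps the general LP for a binary search on M with a Bellman-Ford feasibility check. With α = β = 1 only differences of the x_i matter, so x_i ≥ MIN_DEL can be met afterwards by adding a constant to every delay. All names and numbers are hypothetical.

```python
from typing import List, Optional, Tuple

Path = Tuple[int, int, float, float]  # (i, j, MIN(i,j), MAX(i,j))


def schedule(n: int, paths: List[Path], period: float, margin: float,
             setup: float = 0.0, hold: float = 0.0) -> Optional[List[float]]:
    """Clock delays x_0..x_{n-1} meeting both constraints with the given
    margin, or None if no such delays exist."""
    edges = []  # (u, v, w) encodes the difference constraint x_v - x_u <= w
    for i, j, dmin, dmax in paths:
        edges.append((i, j, dmin - hold - margin))            # no double-clocking
        edges.append((j, i, period - setup - dmax - margin))  # no zero-clocking
    x = [0.0] * n
    for _ in range(n):
        changed = False
        for u, v, w in edges:
            if x[u] + w < x[v] - 1e-12:
                x[v] = x[u] + w
                changed = True
        if not changed:
            base = min(x)
            return [xi - base for xi in x]  # shift so the earliest delay is 0
    return None  # still relaxing after n passes: negative cycle, infeasible


def max_margin(n: int, paths: List[Path], period: float,
               setup: float = 0.0, hold: float = 0.0,
               iters: int = 60) -> Tuple[float, Optional[List[float]]]:
    """Binary search for the largest feasible safety margin M."""
    # At this lower bound every edge weight is >= 0, so x = 0 is feasible.
    lo = min(min(dmin - hold, period - setup - dmax) for _, _, dmin, dmax in paths)
    hi = period  # M = P is never feasible when MIN <= MAX
    for _ in range(iters):
        mid = (lo + hi) / 2
        if schedule(n, paths, period, mid, setup, hold) is None:
            hi = mid
        else:
            lo = mid
    return lo, schedule(n, paths, period, lo, setup, hold)


# Hypothetical chain FF0 -> FF1 -> FF2 with 2 ns and 6 ns paths, P = 6 ns.
margin, delays = max_margin(3, [(0, 1, 2.0, 2.0), (1, 2, 6.0, 6.0)], period=6.0)
print(round(margin, 6), [round(d, 6) for d in delays])
```

A negative `margin` would mean the circuit cannot meet timing at the given period under any schedule. For the general case with α < 1 < β the constraints are no longer pure differences, and a real LP solver is the natural tool.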

44 Outline
• Problem Statement
• Clock Distribution Structures
• Robustness / Signal Integrity Control
• Clock Design:
• Skew Scheduling
• Topology Design
• Embedding
  - For zero skew (ZST-DME)
  - For bounded skew (BST-DME)

45 Zero-Skew Tree (ZST) Problem
• Zero Skew Clock Routing Problem (S, G): given a set S of sink locations and a connection topology G, construct a ZST T(S) with topology G and having minimum cost.
• Skew = maximum value of |t_d(s_0, s_i) - t_d(s_0, s_j)| over all sink pairs s_i, s_j in S
• t_d = signal delay (from source s_0)
• Connection topology G = rooted binary tree with the nodes of S as leaves
  - Edge e_a in G is the edge from a to its parent
  - |e_a| is the (assigned) length of edge e_a
• Cost = total edge length

46 Zero-Skew Example (555 sinks, 40 obstacles)

47 A Zero-Skew Routing Algorithm
• Finds a ZST under the linear delay model with minimum cost over all ZSTs with topology G and sink set S
• Terms
  - Manhattan arc: line segment with slope +1 or -1
  - Tilted Rectangular Region (TRR): collection of points within a fixed distance of a Manhattan arc
    - Core = Manhattan arc
    - Radius = distance
  - Merging segment ms(v) = locus of feasible locations for a node v in the topology, consistent with minimum wirelength
    - If v is a sink, then ms(v) = {v}
    - If v is an internal node with children a and b, then ms(v) is the set of all points within distance |e_a| of ms(a) and within distance |e_b| of ms(b)
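One convenient way to compute with TRRs, sketched below, is to rotate coordinates by 45 degrees: with u = x + y and v = x - y, Manhattan distance in (x, y) equals Chebyshev (max-norm) distance in (u, v), so a Manhattan arc becomes a degenerate axis-aligned rectangle and a TRR becomes an ordinary rectangle. This representation and the helper names are assumptions made for illustration; the slides do not prescribe an implementation.

```python
from typing import Optional, Tuple

# Rectangle in rotated coordinates: (u_lo, u_hi, v_lo, v_hi), u = x + y, v = x - y.
Rect = Tuple[float, float, float, float]


def point(x: float, y: float) -> Rect:
    """A single location (e.g. a sink) as a degenerate rectangle."""
    return (x + y, x + y, x - y, x - y)


def trr(core: Rect, radius: float) -> Rect:
    """All points within Manhattan distance `radius` of `core`."""
    u_lo, u_hi, v_lo, v_hi = core
    return (u_lo - radius, u_hi + radius, v_lo - radius, v_hi + radius)


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Intersection of two TRRs, or None if they do not overlap."""
    u_lo, u_hi = max(a[0], b[0]), min(a[1], b[1])
    v_lo, v_hi = max(a[2], b[2]), min(a[3], b[3])
    if u_lo > u_hi or v_lo > v_hi:
        return None
    return (u_lo, u_hi, v_lo, v_hi)


def unrotate(u: float, v: float) -> Tuple[float, float]:
    """Map a rotated point back to (x, y)."""
    return ((u + v) / 2.0, (u - v) / 2.0)


# Two sinks at Manhattan distance 6, merged with |e_a| = |e_b| = 3.
ms_a, ms_b = point(0.0, 0.0), point(4.0, 2.0)
ms_v = intersect(trr(ms_a, 3.0), trr(ms_b, 3.0))
print(ms_v, unrotate(ms_v[0], ms_v[2]), unrotate(ms_v[1], ms_v[3]))
```

In this example ms(v) is the Manhattan arc from (1, 2) to (3, 0): every point on it is at distance 3 from both sinks.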

48 Phase 1: Tree of Merging Segments
• Goal: construct a tree of merging segments corresponding to topology G
  - The merging segment of a node depends on the merging segments of its children → bottom-up construction
  - Let a, b be the children of v. We want placements of v that allow the subtrees TS_a and TS_b to be merged with minimum added wire while preserving zero skew
  - Merging cost = |e_a| + |e_b|
• Fact: the intersection of two TRRs is also a TRR and can be found in constant time
• Constant time per new merging segment → linear time (in the size of S) to construct the entire tree
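Under the linear delay model, choosing |e_a| and |e_b| for a merge is a small closed-form calculation. The sketch below assumes delay equals wirelength (unit delay per unit length) and that t_a and t_b are the already-balanced sink delays of the two subtrees; the function name is hypothetical.

```python
from typing import Tuple


def merge_edge_lengths(t_a: float, t_b: float, dist: float) -> Tuple[float, float]:
    """Edge lengths (|e_a|, |e_b|) that merge two zero-skew subtrees.

    t_a, t_b: delay from each subtree root to its sinks.
    dist:     Manhattan distance between ms(a) and ms(b).
    Solves t_a + e_a = t_b + e_b with e_a + e_b = dist when possible.
    """
    e_a = (dist + t_b - t_a) / 2.0
    if e_a < 0.0:
        # Subtree a is slower by more than dist: v sits on ms(a) and the
        # wire to b is lengthened (snaked) to make up the difference.
        return 0.0, t_a - t_b
    if e_a > dist:
        # Symmetric case: v sits on ms(b) and the wire to a is snaked.
        return t_b - t_a, 0.0
    return e_a, dist - e_a


print(merge_edge_lengths(0.0, 0.0, 6.0))
print(merge_edge_lengths(5.0, 1.0, 6.0))
print(merge_edge_lengths(10.0, 0.0, 4.0))
```

In the first two cases the merging cost equals the distance between the child segments; in the third the detour makes the cost exceed it, which is the price of preserving zero skew.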

49 Phase 2: Find Node Placements
• Goal: find exact locations ("embeddings") pl(v) of internal nodes v in the ZST topology
• If v is the root node, then any point on ms(v) can be chosen as pl(v)
• If v is an internal node other than the root, and p is the parent of v, then v can be embedded at any point in ms(v) that is at distance |e_v| or less from pl(p)
  - Detail: create a square TRR trr_p with radius |e_v| and core equal to pl(p); the placement of v can be any point in ms(v) ∩ trr_p
• Each instruction is executed at most once for each node in G, and TRR intersection takes O(1) time → Find_Exact_Placements is O(n) → DME is O(n)
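The top-down step can be sketched with the same 45-degree rotation trick (u = x + y, v = x - y), under which ms(v) is an axis-aligned rectangle and Manhattan distance becomes max-norm distance. The sketch picks the point of ms(v) nearest to pl(p), which is one valid choice among the points of ms(v) ∩ trr_p; the representation and the function name are assumptions for illustration.

```python
from typing import Tuple

# Merging segment in rotated coordinates: (u_lo, u_hi, v_lo, v_hi).
Rect = Tuple[float, float, float, float]


def place_child(ms_v: Rect, parent_xy: Tuple[float, float],
                e_v: float) -> Tuple[float, float]:
    """Embed node v at the point of ms(v) closest to its parent's location.

    Raises ValueError if even the closest point is farther than |e_v|,
    i.e. ms(v) does not intersect trr_p.
    """
    pu = parent_xy[0] + parent_xy[1]
    pv = parent_xy[0] - parent_xy[1]
    # Clamping each rotated coordinate gives the nearest point of the rectangle.
    u = min(max(pu, ms_v[0]), ms_v[1])
    v = min(max(pv, ms_v[2]), ms_v[3])
    if max(abs(u - pu), abs(v - pv)) > e_v + 1e-9:
        raise ValueError("ms(v) does not intersect trr_p")
    return ((u + v) / 2.0, (u - v) / 2.0)


# ms(v) is the arc from (1, 2) to (3, 0); the parent was placed at (3, 3).
placement = place_child((3.0, 3.0, -1.0, 3.0), parent_xy=(3.0, 3.0), e_v=3.0)
print(placement)
```

Each node is handled once with a constant amount of work, in line with the O(n) bound stated for Find_Exact_Placements.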

50 Outline
• Problem Statement
• Clock Distribution Structures
• Robustness / Signal Integrity Control
• Clock Design:
• Skew Scheduling
• Topology Design
• Embedding
  - For zero skew (ZST-DME)
  - For bounded skew (BST-DME)

51 Non-Zero Skew Bounds
[Figure: example topology with source s_0, sinks s_1 to s_4 and internal nodes a, b, v; layouts drawn on grids with axes marked 0 to 6]
• Given a skew bound, where can internal nodes of the given topology (e.g., a, b, v) be placed?

52 BST-DME Bottom-Up Phase
[Figure: topology (s_0; v; a, b; s_1 to s_4) and merging regions mr(a), mr(b), mr(v) for skew bound B = 4]
• Bottom-up: build a tree of merging regions corresponding to the given topology

53 BST-DME Top-Down Phase
[Figure: topology (s_0; v; a, b; s_1 to s_4) and the embedded locations of a, b, v for skew bound B = 4]

54 Good Luck for the Mid-Term!

