Published by Tony Douberly. Modified about 1 year ago.

1
Traditional SoC Design Flow
Source: Advanced ASIC Chip Synthesis, 2nd Ed., Himanshu Bhatnagar, Kluwer Academic Publishers.
Key problem: the timing assumptions made during pre-layout synthesis differ widely from the post-layout reality. This happens because interconnect delay dominates the overall propagation delay in DSM (deep sub-micron) technologies. As a result, achieving timing closure becomes a challenge.

2
1. Develop HDL files
2. Specify Libraries (library objects): link_library, target_library, symbol_library, synthetic_library
3. Read Design: analyze, elaborate, read_file
4. Define Design Environment: set_operating_conditions, set_wire_load_model, set_drive, set_driving_cell, set_load, set_fanout_load, set_min_library
5. Set Design Constraints
– Design Rule Constraints: set_max_transition, set_max_fanout, set_max_capacitance
– Design Optimisation Constraints: create_clock, set_clock_latency, set_propagated_clock, set_clock_uncertainty, set_clock_transition, set_input_delay, set_output_delay, set_max_area
6. Select Compile Strategy: Top Down, Bottom Up
7. Optimize the Design: compile
8. Analyze and Resolve Design Problems: check_design, report_area, report_constraint, report_timing
9. Save the Design Database: write

3
Design Compiler Setup Files
The .synopsys_dc.setup file holds:
– Library paths
– Company-wide and project-wide design environment variables and commands
– UNIX variables
There are three files at three locations. All three are read, in the following order:
– Synopsys root, $SYNOPSYS/admin/setup: affects all users. Only the system administrator can modify this file. In small startups with only a single ASIC project, this is the place to enforce project-wide discipline.
– Home directory: content affects all DC activities. Project-wide enforcement could happen at this level if the designer is involved in a single project (less likely).
– Working directory: affects the current invocation of DC. If a person is working on more than one Synopsys project (more likely), then project-wide enforcement should happen at this level, with one working directory for each project.
Repeated commands are overridden by the file read later.

4
Libraries & Search Path
– Technology Library: created by the ASIC vendor in Synopsys format, which is now an open standard. Cells are defined by their names, function, timing, net delay, parasitic information, and units for time, resistance, capacitance etc.
– Target Library: a technology library that Design Compiler maps to during optimization.
– Link Library: the technology library that contains the definitions of the cells used in the mapped design. In principle it should be the same as target_library, unless a technology translation is being performed.
– Symbol Library: definition of graphics symbols. Cells in Symbol Library must match
– DesignWare Library: a DesignWare component library is a collection of reusable circuit-design building blocks that are tightly integrated into the Synopsys synthesis environment.
– GTECH Library: the GTECH library is the Synopsys generic technology library. It is technology-independent and included with the Design Compiler software. GTECH parts are Synopsys unmapped representations of Boolean functions (library cell placeholders). GTECH instantiation allows for a technology-independent HDL description and the accuracy of instantiation.
– search_path: if the library variables only specify file names, search_path is used to locate the libraries. By default it points to the current working directory and $SYNOPSYS/libraries/syn.

5
Synopsys Design Objects
– Design: a circuit that performs one or more logical functions
– Cell: an instance of a design or library primitive within a design
– Reference: the name of the original design that a cell instance points to
– Port: the input or output of a design
– Pin: the input or output of a cell
– Net: a wire that connects ports to ports or ports to pins
– Clock: a timing reference object that describes a waveform for timing analysis

6
Synopsys Design Objects - Schematic

7
Synopsys Design Objects - VHDL

8

9
Reading Assignment
Read about these commands in the Synopsys documentation:
– find and filter
– read / analyze / elaborate
– compile
– report_timing
Also read about what attributes and variables are.

10
Outline of this course module
– Synopsys Design Environment Essentials
– CMOS essentials for logic synthesis
– Constraint Classification
– Load and Drive Constraints
– Clocking constraints
– Operating Conditions Constraints
– Static Timing Analysis
– Chip Level Timing and Multiple Clock Domains

11
MOSFET Transistor Source: MIT. Course Lecture L

12
Key qualitative Characteristics of MOSFET transistors Source: MIT. Course Lecture L

13

14

15
RC Model of an inverter Source: MIT. Course Lecture L

16

17

18

19

20
Wires Source: MIT. Course Lecture L

21
Distributed RC wire model. Source: MIT. Course Lecture L. This is also known as the Elmore delay model.
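A minimal sketch of the Elmore delay for a distributed RC wire, approximated by splitting the wire into equal RC sections. The function names and the normalised values R = 1, C = 1 are illustrative assumptions and do not come from the lecture.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay at the far end of an RC ladder.

    resistances[i] is the series resistance of section i and
    capacitances[i] is the capacitance to ground at the end of section i.
    """
    delay = 0.0
    upstream_r = 0.0
    for r, c in zip(resistances, capacitances):
        upstream_r += r            # total resistance between the driver and this node
        delay += upstream_r * c    # each capacitor is charged through that resistance
    return delay


def distributed_wire_delay(total_r, total_c, sections):
    """Split a wire with total R and C into equal sections and apply Elmore."""
    r = total_r / sections
    c = total_c / sections
    return elmore_delay([r] * sections, [c] * sections)


# Hypothetical normalised wire: R = 1, C = 1, so the lumped time constant RC is 1.
for n in (1, 2, 10, 100):
    print(n, distributed_wire_delay(1.0, 1.0, n))
```

With a single section the result equals RC, which is the lumped model. As the number of sections grows, the result tends towards RC/2.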

22
Manual insertion of Repeaters Source: MIT. Course Lecture L

23
Lumped RC wire model Source: MIT. Course Lecture L

24
Estimate the rise time Source: MIT. Course Lecture L

25
Transistor capacitance:
1. The width of a transistor is found by multiplying its scaling factor (16/8/2/1) by the minimum transistor width, which is 0.5 µm.
2. Multiply $C_{g,N}$, $C_{g,P}$, $C_{d,N}$ and $C_{d,P}$ by the width of the transistor to get the gate and drain capacitances of the N and P transistors.
3. A wider transistor has more capacitance.
Transistor resistance:
1. Divide $R_{eff,N}$ and $R_{eff,P}$ by the width of the transistor to get the resistance of the N and P transistors.
2. A wider transistor has less resistance.
Wire resistance: the sheet resistance (0.07) is for a unit square. Since the wire width is 0.25 µm, the resistance of a 1 µm × 0.25 µm piece of wire is 0.07/0.25. This factor is multiplied by the length, 250 µm.
Wire capacitance is made up of two parts:
– The bottom (area) capacitance is found as 250 × 0.25 (area) × $C_{A,M2}$.
– The side capacitance is found by multiplying the length, 250, by $C_{L,M32}$.
The factor 2.2 comes from the 10% to 90% $V_{dd}$ swing: $\ln(0.9V_{dd} / 0.1V_{dd}) = \ln 9 \approx 2.2$.
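A rough numeric sketch of the wire part of this calculation. It uses only the wire values stated above (sheet resistance 0.07 per square, width 0.25 µm, length 250 µm) and the 2.2 factor. The sheet resistance is assumed to be in ohms per square. The total capacitance is a hypothetical placeholder, because the capacitance coefficients are not given in the text.

```python
import math


def wire_resistance(sheet_res_per_square, width_um, length_um):
    """Resistance of a rectangular wire: sheet resistance times number of squares."""
    return sheet_res_per_square * (length_um / width_um)


def rise_time_10_90(resistance_ohm, capacitance_farad):
    """10% to 90% rise time of a single RC stage: ln(0.9 / 0.1) * R * C."""
    return math.log(0.9 / 0.1) * resistance_ohm * capacitance_farad


r_wire = wire_resistance(0.07, 0.25, 250.0)   # wire values from the slide
c_total = 50e-15                              # HYPOTHETICAL capacitance (50 fF), not from the slide
t_rise = rise_time_10_90(r_wire, c_total)

print(f"R_wire = {r_wire:.1f} ohm")
print(f"t_rise = {t_rise * 1e12:.2f} ps")
```

In the full calculation the driver resistance and the gate and drain capacitances would be added to the wire terms before applying the 2.2 factor.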

26
Constraints
Technology, Operating and Manufacturing Constraints
– Max rise time, max capacitance
– Operating conditions: Vdd, temperature, drive current, load
– Process variations: fast corner, slow corner
– Physical design: antenna rules
Optimisation Constraints
– Performance: clock
– Area
– Power

27
Generic Synthesis Flow
[Flow diagram: create a solution, evaluate the solution (analysis), check whether the constraints are met. Inputs: Design Optimisation Constraints; Technology, Operating & Manufacturing Constraints.]

28
Static Timing Analysis (STA)
Exhaustively verifies that
– the timing constraints (clock) are met for a design
– for a given technology (standard cell library) and
– a set of specified operating conditions
Limitations of the alternative, simulation:
– Not exhaustive
– Accuracy: RTL; gate level (SDF back-annotation, dependent on STA); circuit-level SPICE simulations are impractical
– Time (STA also takes time, but it is bounded)
Example:
    PROCESS (clk) BEGIN
      IF rising_edge(clk) THEN
        s <= a * b;
      END IF;
    END PROCESS;

29
Timing Models - Accuracy
Untimed
Transaction Level - SystemC
– Multiple cycles
– Bus transactions, transmit/receive, encode/decode
Cycle Accurate - RTL
– What happens in each clock cycle is accurately known
Gate Level - Event Driven
– Physical details of computation, storage and interconnect operations are known
– Delay in wire is not known
– Clock is ideal
Layout Level
– Delay in wire is known
– Clock is real
– Relative position of standard cells is known

30
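Delay Parameters – Intrinsic Delay & Slew
[Figure: gate with inputs A = 1 and B and output Z, with internal nodes P, Q, R and x, y, z. Waveforms of B and Z with times t1 and t2 marked against the levels 0.5 Vdd, and 0.3 Vdd and 0.7 Vdd.]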

31
Path Delay Calculation
[Flow diagram: Library and Design; Environment Conditions for Analysis; Delay Computation Through Gate; Delay Computation Through Wire; Delay and Slew At Gate Output; Delay and Slew At Next Gate Input. Nodes A, B, C, D.]
– The intrinsic delays and the slews are characterised using SPICE simulation, by sweeping the many parameters that affect the intrinsic delay and slew.
– All the paths are exhaustively covered.

32
Paths & Path Groups
Paths
– Start point: input ports or clock pins of sequential devices
– End point: output ports or data input pins of sequential devices
Path groups
– Paths are organised in groups identified by the clocks controlling their endpoints.

33
Timing Arcs
– Positive unate timing arc: combines rise delays with rise delays, and fall delays with fall delays. An example is an AND gate cell delay or an interconnect (net) delay.
– Negative unate timing arc: combines incoming rise delays with local fall delays, and incoming fall delays with local rise delays. An example is a NAND gate.
– Non-unate timing arc: combines the local delay with the worst-case incoming delay value. Non-unate timing arcs are present in logic functions whose output value change cannot be predicted from the direction of the change on the input value. An example is an XOR gate.
Accuracy of estimates is critical:
– Intrinsic delays are accurate after logic synthesis.
– Slew and net delays are estimated, and are known accurately only after physical synthesis.
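The three arc types can be sketched as a small arrival-time propagation function. This is an illustrative model only: the function name, the arc-type strings and the delay values (in ps) are assumptions, not taken from any tool.

```python
def propagate(arc_type, in_rise, in_fall, cell_rise, cell_fall):
    """Return (out_rise, out_fall) arrival times through one timing arc.

    in_rise / in_fall: arrival times of rising / falling transitions at the input.
    cell_rise / cell_fall: local delays to a rising / falling output transition.
    """
    if arc_type == "positive_unate":      # e.g. AND gate or a net
        return in_rise + cell_rise, in_fall + cell_fall
    if arc_type == "negative_unate":      # e.g. NAND gate: output falls when input rises
        return in_fall + cell_rise, in_rise + cell_fall
    if arc_type == "non_unate":           # e.g. XOR gate: use the worst incoming arrival
        worst_in = max(in_rise, in_fall)
        return worst_in + cell_rise, worst_in + cell_fall
    raise ValueError(f"unknown arc type: {arc_type}")


# Hypothetical arrival times and local delays, in ps.
for kind in ("positive_unate", "negative_unate", "non_unate"):
    print(kind, propagate(kind, in_rise=100, in_fall=120, cell_rise=30, cell_fall=20))
```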

34
Factors Affecting Delay and Slew
Discrete factors:
1. Geometry & Dimension
2. Specific Path
3. Transition Direction
4. Related Pin
[Figure: 4 Input NAND gate, with labels A, B, P1, P2, N1, N2 and Z.]

35
Factors Affecting Delay and Slew
Load on the gate
– Load of all the inputs that this output has to drive
– Load of the interconnect wires
– Tri-stated wires
Input slew
– Transition time at the previous gate
– The interconnect
– Primary input: drive strength, driver cell

36
Constraints
Technology Constraints
– Max Transition
– Max Fanout
– Max Capacitance
– Min Capacitance
Design Constraints
– Set Load
– Set Drive (inverse of resistance)

37
– If drive or driving cell is not specified, the synthesis tool assumes infinite drive strength.
– If load is not specified, the synthesis tool assumes zero load.
[Figure labels: Technology Constraint (cannot be relaxed); Design Constraint.]

38
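Interpolation and Extrapolation
[Figure: piecewise linear model. Delay table with slew axis (S1, S2) and load axis (L1, L2) and entries D11, D12, D21, D22. The delay D at a point (S, L) is obtained from the intermediate values D1 and D2.]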
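A minimal sketch of the table lookup suggested by the figure labels, using bilinear interpolation between four characterised points. The indexing convention (first index for slew, second for load) and all numeric values are assumptions made for illustration.

```python
def interpolate_delay(s, l, s1, s2, l1, l2, d11, d12, d21, d22):
    """Bilinear interpolation in a 2x2 delay table.

    Assumed convention: d11 is the delay at (s1, l1), d12 at (s1, l2),
    d21 at (s2, l1) and d22 at (s2, l2).
    Points outside the table are extrapolated linearly.
    """
    wl = (l - l1) / (l2 - l1)
    d1 = d11 + wl * (d12 - d11)    # delay at slew s1 and load l
    d2 = d21 + wl * (d22 - d21)    # delay at slew s2 and load l
    ws = (s - s1) / (s2 - s1)
    return d1 + ws * (d2 - d1)     # delay at slew s and load l


# Hypothetical table: slew 0.0..1.0 ns, load 0.0..2.0 pF, delays in ps.
table = dict(s1=0.0, s2=1.0, l1=0.0, l2=2.0, d11=10.0, d12=20.0, d21=30.0, d22=40.0)
print(interpolate_delay(0.5, 1.0, **table))   # inside the table
print(interpolate_delay(0.0, 4.0, **table))   # beyond L2: extrapolation
```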

39
Process, Voltage, Temperature (PVT) Variation & Operating Conditions
[Figures: delay versus process, delay versus voltage and delay versus temperature, each marked with best, nominal and worst points.]
Operating Conditions
Name | Library | Process | Temp | Volt | Interconnect Model
WCCOM | my_lib | | | | worst_case_tree
WCIND | my_lib | | | | worst_case_tree
WCMIL | my_lib | | | | worst_case_tree
BCCOM | my_lib | | | | best_case_tree
BCIND | my_lib | | | | best_case_tree
BCMIL | my_lib | | | | best_case_tree

40
PVT Variation: An Example
Consider a minimum-size NMOS device in a 1.2 µm CMOS process, with $V_{GS} = V_{DS} = 5$ V. The nominal saturation current is taken for the device size W = 1.8 µm, $L_{eff}$ = 0.9 µm.
Now consider the variation in the following parameters:
– 25% variation in threshold voltage $V_t$
– 10% variation in transconductance $k'_n$, mainly due to variation in oxide thickness
– ±0.15 µm (about 10%) variation in W and L. Variations in W and L are uncorrelated.
– ±0.5 V (10%) variation in power supply voltage
The speed of a device is proportional to its drain current, so these variations result in variation of the speed of the circuit.
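To get a feel for the size of the effect, the sketch below applies the listed variations to a simple long-channel square-law current model. Both the model choice and the nominal threshold voltage of 0.75 V are assumptions made here for illustration, since the slide does not state them. Pushing every parameter to its extreme at once is also an assumption.

```python
def saturation_current(k_prime, w, l, v_gs, v_t):
    """Square-law saturation current: (k'/2) * (W/L) * (Vgs - Vt)^2."""
    return 0.5 * k_prime * (w / l) * (v_gs - v_t) ** 2


# Values from the slide.
W_NOM, L_NOM, V_NOM = 1.8, 0.9, 5.0    # um, um, V
# HYPOTHETICAL nominal values, not given in the slide.
VT_NOM = 0.75                          # V
K_NOM = 1.0                            # normalised; cancels out in the ratios below

nominal = saturation_current(K_NOM, W_NOM, L_NOM, V_NOM, VT_NOM)
# Fast corner: every parameter pushed in the direction that increases current.
fast = saturation_current(1.10 * K_NOM, W_NOM + 0.15, L_NOM - 0.15,
                          V_NOM + 0.5, 0.75 * VT_NOM)
# Slow corner: every parameter pushed in the direction that decreases current.
slow = saturation_current(0.90 * K_NOM, W_NOM - 0.15, L_NOM + 0.15,
                          V_NOM - 0.5, 1.25 * VT_NOM)

fast_ratio = fast / nominal
slow_ratio = slow / nominal
print(f"fast/nominal = {fast_ratio:.2f}, slow/nominal = {slow_ratio:.2f}")
```

Under these assumptions the drive current, and hence the speed, spreads from roughly half to roughly twice the nominal value.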

41
Derating
– Libraries are characterized for various operating conditions.
– Further characterisation is done to see how the delay model responds to changes in process, voltage and temperature. This is done by holding two parameters constant and sweeping the third.
– This yields derating factors for process, voltage and temperature.

42
Sequential Arcs
Timing relationship between
1. two input pins
2. two consecutive events on the same input pin
Sequential arcs:
1. Pulse Width
2. Setup
3. Hold
4. Recovery
5. Removal

43
Pulse Width
[Timing diagram: rst_n. Pulse width requirement not met: reset may have no effect.]
1. Width of the high and low phases of clocks
2. Width of the active level of asynchronous inputs like reset

44
Setup
[Timing diagram: clk and data. Setup requirement not met: new data may not get latched.]
Data should be stable for the setup time before the arrival of the clock edge.
What happens if the setup time is violated?

45
Hold
[Timing diagram: clk and data. Hold requirement not met: old data may not get latched.]
Data should be stable for the hold time after the arrival of the clock edge.
What happens if the hold time is violated?

46
Recovery and Removal
[Timing diagrams: rst_n and clk. Recovery requirement not met: clk may not have effect. Removal requirement not met: clk may override rst_n.]
– Recovery: minimum time between the de-assertion of an asynchronous control signal and the next active clock edge. Can be formulated as a setup check.
– Removal: minimum time after an active clock edge for which an asynchronous control signal should remain asserted. Can be formulated as a hold check.

47
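What is the reason for setup and hold?
[Figure: two cross-coupled inverters with $V_{in1} = V_{out2}$ and $V_{in2} = V_{out1}$. The transfer curves (Vin 1, Vout 2) and (Vin 2, Vout 1) are plotted together, with points a, b and c marked.]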

48
Transistor Level Schematic of a D-Flop

49
Working of the D-Flop at Transistor Level

50
Setup and Hold Time at Circuit Level The time it takes data D to reach node Z is called the setup time. The time it takes data D to reach node W is called the hold time.

51
Negative Hold Time

52
Generalizing Setup & Hold Constraints
[Figure: flop F1 inside the boundary of the flop, with delay D1 on the data path and delay C1 on the clk path.]
Setup Constraint
1. Assume C1 is zero.
2. clk reaches F1 before data has arrived at F1, and F1 registers wrong data.
3. To avoid this, data should stabilize D1 time before the arrival of clk.
4. In reality, C1 is never zero, so data should stabilize D1 - C1 time before the arrival of clk.
5. As there are multiple D1 paths and multiple C1 paths, the complete and safe setup constraint is max(data path delays) - min(clock path delays).
Hold Constraint
1. Assume D1 is zero.
2. Data reaches F1 before clk has arrived at F1. When the clk arrives, new data has overwritten the previous data.
3. To avoid this, data should remain stable C1 time after the arrival of clk.
4. In reality, D1 is never zero, so data should remain stable C1 - D1 time after the arrival of clk.
5. The complete and safe hold constraint is max(clock path delays) - min(data path delays).
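The two final formulas can be evaluated directly for a set of internal path delays. The sketch below uses hypothetical delays in ps; the function names are illustrative.

```python
def setup_constraint(data_path_delays, clock_path_delays):
    """Safe setup constraint: max(data path delays) - min(clock path delays)."""
    return max(data_path_delays) - min(clock_path_delays)


def hold_constraint(data_path_delays, clock_path_delays):
    """Safe hold constraint: max(clock path delays) - min(data path delays)."""
    return max(clock_path_delays) - min(data_path_delays)


# Hypothetical internal delays of one flop, in ps.
data_paths = [120, 150]
clock_paths = [60, 80]

setup = setup_constraint(data_paths, clock_paths)
hold = hold_constraint(data_paths, clock_paths)
print(f"setup = {setup} ps, hold = {hold} ps, setup + hold = {setup + hold} ps")
```

With these numbers the data paths are slower than the clock paths, so the hold constraint comes out negative while the sum of setup and hold stays positive.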

53
Negative Hold
[Timing diagrams: clk and data (Stable, New) at the device interface and at the latching element. Negative hold is seen at the device interface. Figure: flop F1 inside the boundary of the flop, with data path delay D1 and clock path delay C1.]
1. Typically, clock paths are well buffered and faster.
2. There can be substantial data path delay, especially in scan flops.
3. max(data path delays) - min(clock path delays) is always positive. This implies that the setup constraint is never negative.
4. max(clock path delays) - min(data path delays) can be negative. This implies that the hold constraint can be negative.
Setup + Hold (cannot be negative) = max(clock path) + max(data path) - min(clock path) - min(data path)

54
Specifying Input Delay
set_input_delay -clock Clock 8 "data_in_2"
– Good design practice mandates that inBlock does not have combinatorial logic ("m") driving its output.
– These days "m" is more likely to be the result of global interconnect delay.
– Early floorplanning is a good way to estimate the delay due to "m".
– If floorplanning is not done, a good bet is 50-60% of the clock cycle.
– The characterize command automatically calculates the input delay from the parent design.

55
Specifying Output Delay set_output_delay -clock Clk -max -fall 10 {"Z " "Z "}

56
General Timing Constraints
[Figure: inputs I1, I2 and clk; flops F1, F2, F3; combinatorial blocks C0 to C4; outputs O1, O2.]
Four kinds of path groups exist:
1. Input to Output, e.g., I2 to O2
2. Input to Register, e.g., I1 to F1
3. Register to Register, e.g., F1 to F2
4. Register to Output, e.g., F3 to O1
Notation:
– TI1, TI2 are input delays
– DQ1, DQ2 and DQ3 are clk-to-Q delays
– S1, S2 and S3 are setup constraints
– H1, H2 and H3 are hold constraints
– C0 to C4 are combinatorial delays
– P is the clock period
Input to Output: O2 = TI2 + C4
Input to Register (I1 to F1):
– TI1 + C0 ≤ P - S1
– TI1 + C0 ≥ H1
– Setup Slack: P - S1 - TI1 - C0
– Hold Slack: TI1 + C0 - H1
Register to Register (F1 to F2):
– DQ1 + C1 ≤ P - S2
– DQ1 + C1 ≥ H2
– Setup Slack: P - S2 - DQ1 - C1
– Hold Slack: DQ1 + C1 - H2
Setup and hold slacks should be positive.
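The slack expressions share one shape: the launch delay is the input delay for input-to-register paths and the clk-to-Q delay of the launching flop for register-to-register paths. A small sketch with hypothetical values in ps follows; the function and parameter names are illustrative.

```python
def setup_slack(period, setup, launch_delay, comb_delay):
    """Setup slack: P - S - (launch delay) - (combinatorial delay)."""
    return period - setup - launch_delay - comb_delay


def hold_slack(hold, launch_delay, comb_delay):
    """Hold slack: (launch delay) + (combinatorial delay) - H."""
    return launch_delay + comb_delay - hold


P = 5000  # hypothetical clock period in ps

# Input to register (I1 to F1): launch delay is the input delay TI1.
in_setup = setup_slack(P, setup=200, launch_delay=2000, comb_delay=1500)
in_hold = hold_slack(hold=100, launch_delay=2000, comb_delay=1500)

# Register to register (F1 to F2): launch delay is the clk-to-Q delay of F1.
rr_setup = setup_slack(P, setup=200, launch_delay=300, comb_delay=4600)
rr_hold = hold_slack(hold=100, launch_delay=300, comb_delay=4600)

for name, slack in [("in->reg setup", in_setup), ("in->reg hold", in_hold),
                    ("reg->reg setup", rr_setup), ("reg->reg hold", rr_hold)]:
    status = "met" if slack >= 0 else "VIOLATED"
    print(f"{name}: {slack} ps ({status})")
```

In this example the register-to-register path has too much combinatorial delay for the chosen period, so its setup slack is negative.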

57
Gate Level Simulation
[Flow diagram with blocks: Gate Level Design, Simulator, Timing Analysis Tool, Simulation Library, Timing Library, SDF File.]

58
Clock Distribution Source: MIT. Course Lecture L

59
Clock Skew
The basic assumption in a synchronous system is that all the sequential elements in the design sample their inputs at the same time, marked by a clock signal. In reality, the clock signal does not arrive at all the sequential elements at the same time. The difference in time between the reference clock signal and the local clock signal at a sequential element is called the clock skew.
Clock skew would not be a problem if the clock signal were uniformly delayed at all the sequential elements. It is the non-uniform delay of the clock signal that creates the problem. The delay depends on the distance of the sequential element from the clock source and on the local load.
The primary reason for the delay is the large load seen by the clock signal. The load consists of all the sequential elements in the design and the clock net itself, which behaves as a distributed RC line (or a higher-order model) and can be several centimetres long in a large chip. The total capacitance of a single clock line easily measures hundreds of pF and can reach into the nF range. The total clock capacitance of the Alpha processor equals 3.25 nF, which is 40% of the total switching capacitance of the entire chip.
[Figure: Clock Skew in Alpha Processor]

60
Clock Skew Source: MIT. Course Lecture L

61
Clock Jitter Source: MIT. Course Lecture L

62

63
Clock Skew and Sequential Circuit Performance
Each synchronous module is composed of combinational logic (CL) and a flop, and is characterised by six timing parameters:
– the min. and max. propagation (pg) delays of the register, $t_{r,min}$ and $t_{r,max}$
– the min. and max. propagation delays of the combinational logic, $t_{l,min}$ and $t_{l,max}$
– the propagation delay of the interconnect, $t_i$
– the local clock skew, $\delta$
The max pg. delay corresponds to the time taken by the slowest output to respond to any transition at the input. This delay constrains the max. allowable clock speed.
The min pg. delay corresponds to the time taken by at least one output to start responding to a transition at the input. This delay is typically much smaller than the max delay and determines the amount of skew a circuit can tolerate before a race condition occurs.
If $\delta$ is greater than $t_{r,min} + t_i + t_{l,min}$, then the inputs at R2 can change before the previous inputs are latched. With $t'$ and $t''$ the clock arrival times at the two registers and $\delta = t'' - t'$:
$t'' \le t' + t_{r,min} + t_i + t_{l,min}$, or $\delta \le t_{r,min} + t_i + t_{l,min}$
$t'' + T \ge t' + t_{r,max} + t_i + t_{l,max}$, or $T \ge t_{r,max} + t_i + t_{l,max} - \delta$

64
Positive and Negative Clock Skew
Positive skew, $\delta > 0$: in this case the clock is routed in the same direction as the data, and the first equation needs to be satisfied. Violating it will result in malfunctioning of the circuit. Observe that increasing the clock period does not help. The positive skew actually helps improve the clock speed, as it is a negative term in the constraint on the clock period T.
Negative skew, $\delta < 0$: negative skew occurs when the data is routed in the direction opposite to the clock signal. The first equation is unconditionally satisfied, and the circuit works correctly independent of the skew. Unfortunately, negative skew limits the clock speed and thus lowers the performance, as predicted by the second equation: the skew reduces the time available for computation by $|\delta|$.
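The effect of the sign of the skew can be checked numerically. The sketch below assumes the two constraints in the form: skew ≤ t_r,min + t_i + t_l,min (race condition) and T ≥ t_r,max + t_i + t_l,max - skew (clock period). All delay values are hypothetical, in ps, and the function names are illustrative.

```python
def race_free(skew, t_r_min, t_i, t_l_min):
    """Race condition check: the skew must not exceed the minimum path delay."""
    return skew <= t_r_min + t_i + t_l_min


def min_clock_period(skew, t_r_max, t_i, t_l_max):
    """Smallest clock period allowed by the maximum path delay and the skew."""
    return t_r_max + t_i + t_l_max - skew


# Hypothetical module timing, in ps.
T_R_MIN, T_R_MAX = 100, 200     # register propagation delays
T_L_MIN, T_L_MAX = 150, 900     # combinational logic propagation delays
T_I = 50                        # interconnect delay

for skew in (-100, 0, 100, 400):
    ok = race_free(skew, T_R_MIN, T_I, T_L_MIN)
    period = min_clock_period(skew, T_R_MAX, T_I, T_L_MAX)
    print(f"skew {skew:5d} ps: race free = {ok}, minimum period = {period} ps")
```

A positive skew shortens the minimum period but can trigger a race once it exceeds the minimum path delay (300 ps here). A negative skew is always race free but lengthens the minimum period.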

65
[Timing diagram: launch clock and capture clock, with data values 0, a, b, c, d. Setup time met, hold time met.]

66
[Timing diagram: launch clock and capture clock, with data values 0, a, b, c, d and captured values a', b'. Setup time violated, hold time violated.]

67
[Timing diagram: launch clock and capture clock, with data values 0, a, b, c, d. Setup time violated, hold time met.]

68
– Setup violations result from worst-case timing.
– Hold violations result from best-case timing.
[Figure: FF 1 (startpoint), logic, FF 2 (endpoint), logic, with the setup relationship and the hold relationship marked.]

69
Chip Level Timing Issues
[Figure labels: 1, CGU.]
– Blocks 4 & 8 communicate and need their clocks to be skew aligned.
– The data signals between Blocks 4 & 8 could take more than one clock cycle and can get routed through blocks 5 and 6.
– This makes chip-level timing closure difficult and sensitive to geometry.
– A hierarchical design style: each chiplet is timing closed independently, and the chip can be composed from such chiplets.
– Solution: latency-insensitive design.

70
Categories of Synchronization
[Diagram: clock based synchronization versus data based synchronization. Labels: GS; GALS; GRLS (KTH Technology); Double Latch; Handshake: 2 Phase, 4 Phase; Asynchronous – 2 Clock FIFO. Annotations: Constraints, Complexity, Latency ambiguity.]

71
Send and Forget – Double Latching
[Figure: source S (clock CLK s, payload P S) and destination D (clock CLK D, payload P D) connected by the ACL through D-Q flops.]
ACL: Asynchronous Communication Link

72
Send and Forget – Double Latching
Advantages
– Good choice for single-bit control data
– Gray-coded multi-bit data payloads are also a target
Disadvantages
– No flow control: send and forget
– A metastable signal going to multiple targets could resolve to different values
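Gray coding suits this scheme because consecutive values differ in exactly one bit, so a payload sampled mid-transition is either the old or the new value. A minimal sketch of that property follows, using the standard reflected binary Gray code; the helper names are illustrative.

```python
def binary_to_gray(n):
    """Reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def bits_changed(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


# Compare how many bits toggle on each increment of a 4-bit counter.
for value in range(15):
    plain = bits_changed(value, value + 1)
    gray = bits_changed(binary_to_gray(value), binary_to_gray(value + 1))
    print(f"{value:2d} -> {value + 1:2d}: binary toggles {plain} bit(s), Gray toggles {gray}")
```

The plain binary step from 7 to 8 toggles all four bits, which is the case where a double-latched multi-bit bus could capture an inconsistent value.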

73
Handshake
[Figure: source S (clock CLK s) and destination D (clock CLK D) connected by the ACL, with D-Q flops, an FSM, payloads P S and P D, and handshake signals R S, A S, R D, A D.]
ACL: Asynchronous Communication Link
P s: Source Payload
P d: Destination Payload

74
The data payload frequency is limited by the worst-case round-trip delay of the flow control:
– 2-phase: $3T_s + 3T_d \ge T_{Ps}$
– 4-phase: $6T_s + 6T_d \ge T_{Ps}$
Example: source at 27 MHz, destination at 200 MHz. The maximum isochronous data rate using the 2-phase protocol follows from 3 × (37 ns) + 3 × (5 ns) = 126 ns, which corresponds to about 7.9 MHz.

75
The period for which data remains valid/asserted is $T_{Ps}$:
– 2-phase: $3T_s + 3T_d \ge T_{Ps}$
– 4-phase: $6T_s + 6T_d \ge T_{Ps}$
1. Note that $T_{Ps}$ does not decide the data payload frequency. $T_{Ps}$ is less than the round-trip delay, so that the next payload can be transferred immediately after the round trip is over.
2. The period $T_{PL}$ corresponding to the data payload frequency has to be at least the worst-case round-trip delay, i.e. $3T_s + 3T_d \le T_{PL}$ and $6T_s + 6T_d \le T_{PL}$ for the 2-phase and 4-phase protocols respectively. This is illustrated by the example of a 27 MHz source and a 200 MHz destination: 3 × (37 ns) + 3 × (5 ns) = 126 ns, i.e. about 7.9 MHz for the 2-phase protocol.
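The round-trip arithmetic can be reproduced with a few lines of code. The sketch below assumes 3 source cycles plus 3 destination cycles for the 2-phase protocol and 6 plus 6 for the 4-phase protocol, as in the inequalities above. The function names are illustrative.

```python
CYCLES_PER_SIDE = {2: 3, 4: 6}   # handshake phases -> clock cycles spent on each side


def round_trip_ns(source_mhz, dest_mhz, phases):
    """Worst-case round-trip delay of the flow control, in ns."""
    t_s = 1e3 / source_mhz       # source clock period in ns
    t_d = 1e3 / dest_mhz         # destination clock period in ns
    n = CYCLES_PER_SIDE[phases]
    return n * t_s + n * t_d


def max_payload_rate_mhz(source_mhz, dest_mhz, phases):
    """Highest payload rate whose period still covers the round trip."""
    return 1e3 / round_trip_ns(source_mhz, dest_mhz, phases)


for phases in (2, 4):
    delay = round_trip_ns(27.0, 200.0, phases)
    rate = max_payload_rate_mhz(27.0, 200.0, phases)
    print(f"{phases}-phase: round trip = {delay:.1f} ns, max rate = {rate:.2f} MHz")
```

Without rounding the 27 MHz period to 37 ns, the 2-phase round trip comes out at about 126.1 ns, which agrees with the 126 ns and 7.9 MHz quoted in the example.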

76
2 Clock Asynchronous FIFO
Fail safe, self correcting:
– Write logic could think the FIFO is full when it is not
– Read logic could think that the FIFO is empty when it is not
Not suitable for island hopping:
– Storage in the write island is a problem
– Typically the read side needs to be read every cycle

77
GALS: Globally Asynchronous Locally Synchronous. Source: ETH, Zurich

78
GALS

79
Clocking and Communication Schemes
– Synchronous design: phase and skew aligned
– Mesochronous design: same clk freq and phase aligned
– Ratiochronous design: different clock freqs with a rational relationship, phase aligned (KTH research)
– Plesiochronous: no rational clock relationship, phase relationship drifts
– Asynchronous

80
Ideal vs Real Clock
– During the initial phase of synthesis, the clock is ideal.
– The set_auto_disable_drc_nets command should be used to prevent DC from wasting time on fixing DRC violations on high-fanout nets like resets and clocks.
– Model skew and jitter effects using the set_clock_uncertainty command.
– Model clock network latency using the set_clock_latency command.
– Once the clock tree has been inserted, use the set_propagated_clock command to use the actual clock. Back annotation using the read_sdf command is required.

81
Modelling Clock Skew
