2EE141Static CMOS CircuitsAt every point in time (except during the switchingtransients) each gate output is connected to eitherVDDorssvia a low-resistive path.The outputs of the gates assume at all times the valueof the Boolean function, implemented by the circuit(ignoring, once again, the transient effects duringswitching periods).
3Static Complementary CMOS EE141Static Complementary CMOSVDDIn1PMOS onlyIn2PUN…InNF(In1,In2,…InN)In1In2PDN…NMOS onlyInNOne and only one of the networks (PUN or PDN) is conducting in steady statePUN and PDN are logically dual logic networks
4NMOS Transistors in Series/Parallel Connection EE141NMOS Transistors in Series/Parallel ConnectionTransistors can be thought as a switch controlled by its gate signalNMOS switch closes when switch control input is high
5PMOS Transistors in Series/Parallel Connection EE141PMOS Transistors in Series/Parallel Connection
6Threshold Drops VDD VDD PUN VDD 0 VDD 0 VDD - VTn VGS CL CL PDN EE141Threshold DropsVDDVDDPUNSDVDDDS0 VDD0 VDD - VTnVGSCLCLPDNVDD 0VDD |VTp|Why PMOS in PUN and NMOS in PDN … threshold dropNMOS transistors produce strong zeros; PMOS transistors generate strong onesVGSCLCLDSVDDSD
10Complex CMOS Gate B C A D OUT = D + A • (B + C) A D B C EE141 Shown synthesis of pull up from pull down structureDBC
11Constructing a Complex Gate EE141Constructing a Complex GateLogic Dual need not be Series/Parallel DualIn general, many logical dual exist, need to choose one with best characteristicsUse Karnaugh-Map to find good dualsGoal: find 0-cover and 1-cover with best parasitic or layout propertiesMaximize connections to power/groundPlace critical transistors closest to output nodeAAD(b) Deriving the pull-up networkBhierarchically by identifyingsub-netsVB(c) complete gateDDVCDDCASN2SN3DSN4SN1C(a) pull-down networkFFDFCBADB
12Example: Carry Gate F = (ab+bc+ac)’ Carry ‘c’ is critical AB’1A’B’A’BF = (ab+bc+ac)’Carry ‘c’ is criticalFactor c out:F=(ab+c(a+b))’0-cover is n-pd1-cover is p-pu
13Example: Carry Gate (2) Pull Down is easy Order by maximizing connections to ground and critical transistorsFor pull up – Might guess series dual– would guess wrong
15Example: Carry Gate (4) Pull Up from 1 cover of Kmap Get a’b’+a’c’+b’c’Factor c’ out3 connections to Vdd2 series transistorsCo-Euler path layoutMoral: Use Kmap!
16Cell Design Standard Cells Datapath Cells General purpose logic EE141Cell DesignStandard CellsGeneral purpose logicCan be synthesizedSame height, varying widthDatapath CellsFor regular, structured designs (arithmetic)Includes some wiring in the cellFixed height and width
17Standard Cell Layout Methodology – 1980s EE141Standard Cell Layout Methodology – 1980sRoutingchannelVDDsignalsContacts and wells not shown.What does this implement??GND
18Standard Cell Layout Methodology – 1990s EE141Standard Cell Layout Methodology – 1990sMirrored CellNo RoutingchannelsVDDVDDM2Contacts and wells not shown.What does this implement??M3GNDMirrored CellGND
19Standard Cells Cell height 12 metal tracks EE141Standard CellsN WellCell height 12 metal tracksMetal track is approx. 3 + 3Pitch = repetitive distance between objectsCell height is “12 pitch”VDDOutIn2Rails ~10GNDCell boundary
20Standard Cells With minimal diffusion routing With silicided diffusion EE141Standard CellsWith minimal diffusion routingVDDWith silicided diffusionVDDOutInOutInGNDGND
22Stick Diagrams Contains no dimensions EE141Stick DiagramsContains no dimensionsRepresents relative positions of transistorsVDDVDDInverterNAND2OutOutInABGNDGND
23Stick Diagrams Logic Graph j VDD X i GND A B C PUN PDN A j C B EE141Stick DiagramsLogic GraphjVDDXiGNDABCPUNPDNAjCBX = C • (A + B)CiSystematic approach to derive order of input signal wires so gate can be laid out to minimize areaNote PUN and PDN are duals (parallel <-> series)Vertices are nodes (signals) of circuit, VDD, X, GND and edges are transitionsABABC
24Two Versions of C • (A + B) EE141Two Versions of C • (A + B)ACBABCVDDVDDXXLine of diffusion layout – abutting source-drain connectionsNote crossover eliminated by A B C orderingGNDGND
25Consistent Euler Path X C i VDD X B A j A B C GND EE141 A path through all nodes in the graph such that each edge is visited once and only once.The sequence of signals on the path is the signal ordering for the inputs.PUN and PDN Euler paths are (must be) consistent (same sequence)If you can define a Euler path then you can generate a layout with no diffusion breaksA B CC A BB C A no PDNB A CA C B -> no PDNC B AABCGND
26OAI22 Logic Graph X PUN A C D C B D VDD X X = (A+B)•(C+D) C D B A A B EE141OAI22 Logic GraphXPUNACDCBDVDDXX = (A+B)•(C+D)CDBAABPDNAGNDBCD
28Properties of Complementary CMOS Gates Snapshot EE141Properties of Complementary CMOS Gates SnapshotHigh noise margins:VandVare atVandGND, respectively.OHOLDDNo static power consumption:There never exists a direct path betweenVandDDV(GND) in steady-state mode.SSComparable rise and fall times:(under appropriate sizing conditions)
29CMOS Properties Full rail-to-rail swing; high noise margins EE141CMOS PropertiesFull rail-to-rail swing; high noise marginsLogic levels not dependent upon the relative device sizes; ratiolessAlways a path to Vdd or Gnd in steady state; low output impedanceExtremely high input resistance; nearly zero steady-state input currentNo direct path steady state between power and ground; no static power dissipationPropagation delay function of load capacitance and resistance of transistors
30Switch Delay Model Req A A B Rp A Rn CL Cint CL B Rn A Rp Cint A Rp A EE141Switch Delay ModelReqAABRpARnCLCintCLBRnARpCintARpARpARnCLNote capacitance on the internal node – due to the source grain of the two fets in series and the overlap gate capacitances of the two fets in seriesNOR2INVNAND2
31Input Pattern Effects on Delay EE141Input Pattern Effects on DelayDelay is dependent on the pattern of inputsLow to high transitionboth inputs go lowdelay is 0.69 Rp/2 CLone input goes lowdelay is 0.69 Rp CLHigh to low transitionboth inputs go highdelay is Rn CLARpBRpCLRnBARnCint
32Delay Dependence on Input Patterns EE141Delay Dependence on Input PatternsInput DataPatternDelay(psec)A=B=0167A=1, B=0164A= 01, B=161A=B=1045A=1, B=1080A= 10, B=181A=B=10A=1 0, B=1Voltage [V]A=1, B=10Gate sizing should result in approximately equal worst case rise and fall times.Reason for difference in the last two delays is due to internal node capacitance of the pulldown stack. When A transitions, the pullup only has to charge CL; when A=1 and B transitions pullup have to charge up both CL and Cint.For high to low transitions (first three cases) delay depends on state of internal node. Worst case happens when internal node is charged up to VDD – VTn.Conclusions: Estimates of delay can be fairly complex – have to consider internal node capacitances and the data patterns.time [ps]NMOS = 0.5m/0.25 mPMOS = 0.75m/0.25 mCL = 100 fF
33Transistor Sizing CL B Rn A Rp Cint B Rp A Rn CL Cint 4 2 2 1 EE141 Assumes Rp = Rn1
35Transistor Sizing a Complex CMOS Gate EE141Transistor Sizing a Complex CMOS GateAB8643C86D46OUT = D + A • (B + C)For class lecture.Red sizing assuming Rp = RnFollow short path first; note PMOS for C and B 4 rather than 3 – average in pull-up chain of three – (4+4+2)/3 = 3Also note structure of pull-up and pull-down to minimize diffusion cap at output (e.g., single PMOS drain connected to output)Green for symmetric response and for performance (where Rn = 3 Rp)Sizing rules of thumbPMOS = 3 * NMOS1 in series = 12 in series = 23 in series = 3etc.A2D1B2C2
36Fan-In Considerations EE141Fan-In ConsiderationsABCDCLADistributed RC model(Elmore delay)tpHL = 0.69 Reqn(C1+2C2+3C3+4CL)Propagation delay deteriorates rapidly as a function of fan-in – quadratically in the worst case.C3BC2CWhile output capacitance makes full swing transition (from VDD to 0), internal nodes only transition from VDD-VTn to GNDC1, C2, C3 on the order of 0.85 fF for W/L of 0.5/0.25 NMOS and 0.375/0.25 PMOSCL of 3.2 fF with no output load (all diffusion capacitance – intrinsic capacitance of the gate itself).To give a 80.3 psec tpHL (simulated as 86 psec)C1D
37tp as a Function of Fan-In EE141tp as a Function of Fan-IntpHLquadraticlineartpGates with a fan-in greater than 4 should be avoided.tp (psec)tpLHFixed fan-out (NMOS 0.5 micrcon, PMOS 1.5 micron)tpLH increases linearly due to the linearly increasing value of the diffusion capacitancetpHL increase quadratically due to the simultaneous incrase in pull-down resistance and internal capacitancefan-in
38tp as a Function of Fan-Out EE141tp as a Function of Fan-OutAll gates have the same drive current.tpNOR2tpNAND2tpINVtp (psec)Slope is a function of “driving strength”slope is a function of the driving strengtheff. fan-out
39tp as a Function of Fan-In and Fan-Out EE141tp as a Function of Fan-In and Fan-OutFan-in: quadratic due to increasing resistance and capacitanceFan-out: each additional fan-out gate adds two gate capacitances to CLtp = a1FI + a2FI2 + a3FOa1 term is for parallel chain, a2 term is for serial chain, a3 is fan-out
40Practical Optimization The previous arguments regarding tp raise the question – why build nor at all?Criticality is not a path– but a transition so it is usually only on rising or falling (but not both)NOR forms have bad pull-up but good pull downNAND forms have bad pull-down but good pull upDetermine the critical transition(s) and design for them– using Elmore or Simulation on the appropriate edge only!Logical Effort presupposes uniform rise and fall times, so good in general, but can be beatStatic Timing Analyzers nearly always get this wrong!
41Fast Complex Gates: Design Technique 1 EE141Fast Complex Gates: Design Technique 1Transistor sizingas long as fan-out capacitance dominatesProgressive sizingCLDistributed RC lineM1 > M2 > M3 > … > MN(the fet closest to theoutput is the smallest)InNMNM1 have to carry the discharge current from M2, M3, … MN and CL so make it the largestMN only has to discharge the current from MN (no internal capacitances)C3In3M3C2In2M2Can reduce delay by more than 20%; decreasing gains as technology shrinksC1In1M1
42Fast Complex Gates: Design Technique 2 EE141Fast Complex Gates: Design Technique 2Transistor orderingcritical pathcritical path01CLCLchargedcharged1In1In3M3M31C21C2In2In2M2dischargedM2chargedFor lecture.Critical input is latest arriving signalPlace latest arriving signal (critical path) closest to the output1C1C1In3dischargedIn1chargedM1M101delay determined by time to discharge CL, C1 and C2delay determined by time to discharge CL
43Fast Complex Gates: Design Technique 3 EE141Fast Complex Gates: Design Technique 3Alternative logic structuresF = ABCDEFGHReduced fan-in -> deeper logic depthReduction in fan-in offsets, by far, the extra delay incurred by the NOR gate (second configuration).Only simulation will tell which of the last two configurations is faster, lower power
44Fast Complex Gates: Design Technique 4 EE141Fast Complex Gates: Design Technique 4Isolating fan-in from fan-out using buffer insertionCLCLReduce CL on large fan-in gates, especially for large CL, and size the inverters progressively to handle the CL more effectively
45Fast Complex Gates: Design Technique 5 EE141Fast Complex Gates: Design Technique 5Reducing the voltage swinglinear reduction in delayalso reduces power consumptionBut the following gate may be much slower!Large fan-in/fan-out requires use of “sense amplifiers” to restore the signal (memory)tpHL = 0.69 (3/4 (CL VDD)/ IDSATn )= 0.69 (3/4 (CL Vswing)/ IDSATn )
46Sizing Logic Paths for Speed Frequently, input capacitance of a logic path is constrainedLogic also has to drive some capacitanceExample: ALU load in an Intel’s microprocessor is 0.5pFHow do we size the ALU datapath to achieve maximum speed?We have already solved this for the inverter chain – can we generalize it for any type of logic?
47Buffer Example In Out CL 1 2 N (in units of tinv) EE141Buffer ExampleInOutCL12N(in units of tinv)For given N: Ci+1/Ci = Ci/Ci-1To find N: Ci+1/Ci ~ 4How to generalize this to any logic path?
48EE141Logical Effortp – intrinsic delay (3kRunitCunitg) - gate parameter f(W)g – logical effort (kRunitCunit) – gate parameter f(W)f – effective fanoutNormalize everything to an inverter:ginv =1, pinv = 1Divide everything by tinv(everything is measured in unit delays tinv)Assume g = 1.
49Delay in a Logic Gate Gate delay: d = h + p effort delay EE141Delay in a Logic GateGate delay:d = h + peffort delayintrinsic delayEffort delay:h = g flogical efforteffective fanout = Cout/CinLogical effort is a function of topology, independent of sizingEffective fanout (electrical effort) is a function of load/gate size
50EE141Logical EffortInverter has the smallest logical effort and intrinsic delay of all static CMOS gatesLogical effort of a gate presents the ratio of its input capacitance to the inverter capacitance when sized to deliver the same currentLogical effort increases with the gate complexity
51EE141Logical EffortLogical effort is the ratio of input capacitance of a gate to the inputcapacitance of an inverter with the same output currentg = 1g = 4/3g = 5/3
52Logical Effort of Gates EE141Logical Effort of GatestpNANDg =p =d =pINVtNormalized delay (d)g =p =d =F(Fan-in)1234567Fan-out (h)
59Method of Logical Effort EE141Method of Logical EffortCompute the path effort: F = GBHFind the best number of stages N ~ log4FCompute the stage effort f = F1/NSketch the path with this number of stagesWork either from either end, find sizes: Cin = Cout*g/fReference: Sutherland, Sproull, Harris, “Logical Effort, Morgan-Kaufmann 1999.
60Example: Optimize Path EE141Example: Optimize Pathg = 1 f = ag = 5/3 f = b/ag = 5/3 f = c/bg = 1 f = 5/cEffective fanout, F =G =H =h =a =b =
61Example: Optimize Path EE141Example: Optimize Pathg = 1 f = ag = 5/3 f = b/ag = 5/3 f = c/bg = 1 f = 5/cEffective fanout, F = 5G = 25/9H = 125/9 = 13.9h = 1.93a = 1.93b = ha/g2 = 2.23c = hb/g3 = 5g4/f = 2.59
65Homework 5Using the Carry cell design from earlier homework, optimally size the carry propagate chain for a 16-bit adder to minimize worst case delay where Cin is driven by a 1u/0.6u inverter and Cout drives a fanout of 4 such loads. (use logic effort, show your work!)For the problems below, use parameters from class for 0.5um and use 2x voltages as applicable. Chap 5: problems: 4, 7, 8, 15Chap 6, problems: 2, 4, 5, 7Design the parity tree: c = a xor b xor c xor d in Complementary Pass Transistor Logic, insert inverters to restore the output swing – Given input drive from an inverter stage, and an inverter every 2 stages of logic, and inverter output restore, estimate the propagation time for devices using the AMI 0.5um model.New (digital) AMI model (for minimum length only!):n-channel: VT=0.77, l=0.03, Vsat=1.56V, k=32mA/V2p-channel: VT=-0.95, l=0.03 Vsat=2.8V, k=-16mA/V2