Slide 2. Fast Complex Gates: Design Technique 1
• Transistor sizing: effective as long as fan-out capacitance dominates.
• Progressive sizing: model the series stack as a distributed RC line and size M1 > M2 > M3 > … > MN (the FET closest to the output is the smallest).
• M1 has to carry the discharge current from M2, M3, …, MN and CL, so make it the largest.
• MN only has to carry the discharge current of CL (no internal capacitances).
• Can reduce delay by more than 20%.
• Figure: series NMOS stack M1 … MN with inputs In1 … InN, internal node capacitances C1, C2, C3, and load CL at the output.

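
The progressive-sizing argument can be checked with a first-order Elmore model. The sketch below is illustrative only: it assumes each transistor's resistance is inversely proportional to its width, and it keeps the node capacitances fixed regardless of width, which a real layout would not. The function names and the capacitance values are hypothetical, and the numbers are not meant to reproduce the 20% figure.

```python
import math

def elmore_discharge_delay(widths, node_caps, r_unit=1.0):
    """Elmore delay of a series NMOS stack discharging to ground.

    widths[0] belongs to M1 (next to ground), widths[-1] to MN (next to the
    output). node_caps[i] is the capacitance at the drain of transistor i,
    so node_caps[-1] is CL. Each device has resistance r_unit / width.
    """
    delay = 0.0
    for i, w in enumerate(widths):
        # Transistor i carries the discharge current of its own node
        # and of every node above it.
        delay += (r_unit / w) * sum(node_caps[i:])
    return 0.69 * delay

def progressive_widths(node_caps, total_width):
    """Widths that minimize the Elmore delay for a fixed total width.

    Minimizing sum(a_i / w_i) subject to sum(w_i) = W gives w_i
    proportional to sqrt(a_i), where a_i is the capacitance that
    transistor i has to discharge.
    """
    weights = [math.sqrt(sum(node_caps[i:])) for i in range(len(node_caps))]
    scale = total_width / sum(weights)
    return [scale * x for x in weights]

node_caps = [1.0, 1.0, 1.0, 2.0]   # C1, C2, C3, CL in arbitrary units (assumed)
uniform = [2.0] * 4                # every transistor the same width
progressive = progressive_widths(node_caps, total_width=sum(uniform))

t_uniform = elmore_discharge_delay(uniform, node_caps)
t_progressive = elmore_discharge_delay(progressive, node_caps)
print(progressive, t_uniform, t_progressive)
```

Under these assumptions the optimal widths decrease monotonically from M1 to MN, which matches the ordering M1 > M2 > … > MN.
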
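Slide 3. Fast Complex Gates: Design Technique 2
• Transistor ordering: the critical input is the latest-arriving signal.
• Place the latest-arriving signal (critical path) closest to the output.
• Figure (left): the critical input In1 (0→1) drives M1, the transistor farthest from the output, while In2 = In3 = 1. C1, C2 and CL are all charged, so the delay is determined by the time to discharge CL, C1 and C2.
• Figure (right): the critical input In1 (0→1) drives M3, the transistor closest to the output, while In2 = In3 = 1. C1 and C2 are already discharged, so the delay is determined by the time to discharge CL.
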
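
A first-order comparison of the two orderings, as a minimal sketch. It assumes three series transistors with equal on-resistance `r` and uses the Elmore approximation. The function names and the numeric values are hypothetical.

```python
def delay_critical_far_from_output(r, c1, c2, cl):
    """Critical input on M1 (next to ground): C1, C2 and CL all start charged.

    M1 discharges C1 + C2 + CL, M2 discharges C2 + CL, M3 discharges CL.
    """
    return 0.69 * (r * (c1 + c2 + cl) + r * (c2 + cl) + r * cl)

def delay_critical_next_to_output(r, cl):
    """Critical input on M3 (next to the output).

    M1 and M2 are already on, so C1 and C2 are already discharged and only
    CL has to be discharged through the three series resistances.
    """
    return 0.69 * (3 * r) * cl

t_far = delay_critical_far_from_output(r=1.0, c1=1.0, c2=1.0, cl=3.0)
t_near = delay_critical_next_to_output(r=1.0, cl=3.0)
print(t_far, t_near)
```
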
Slide 4. Fast Complex Gates: Design Technique 3
• Alternative logic structures, for example F = ABCDEFGH.
• Reduced fan-in leads to deeper logic depth.
• The reduction in fan-in more than offsets the extra delay incurred by the NOR gate (second configuration).
• Only simulation will tell which of the last two configurations is faster or lower power.

Slide 5. Fast Complex Gates: Design Technique 4
• Isolate fan-in from fan-out using buffer insertion.
• Reduce CL on large fan-in gates, especially for large CL, and size the inverters progressively to handle CL more effectively.

Slide 6. Fast Complex Gates: Design Technique 5
• Reducing the voltage swing gives a linear reduction in delay: with a reduced swing, $t_{pHL} = 0.69 \cdot \frac{3}{4} \cdot \frac{C_L V_{DD}}{I_{DSATn}}$ becomes $t_{pHL} = 0.69 \cdot \frac{3}{4} \cdot \frac{C_L V_{swing}}{I_{DSATn}}$.
• It also reduces power consumption.
• But the following gate is much slower!
• Or it requires the use of "sense amplifiers" to restore the signal level (memory design).

Slide 7. Sizing Logic Paths for Speed
• Frequently, the input capacitance of a logic path is constrained.
• The logic also has to drive some capacitance.
• Example: the ALU load in an Intel microprocessor is 0.5 pF.
• How do we size the ALU datapath to achieve maximum speed?
• We have already solved this for the inverter chain. Can we generalize it to any type of logic?

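Slide 8. Buffer Example
• Figure: chain of N inverters (stages 1, 2, …, N) between In and Out, driving CL. Delay is expressed in units of tinv.
• For a given N: $C_{i+1}/C_i = C_i/C_{i-1}$
• To find N: $C_{i+1}/C_i \approx 4$
• How do we generalize this to any logic path?
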
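
The two rules on this slide (equal stage ratios, and a ratio near 4) can be turned into a small sizing routine. This is a minimal sketch: the function name and the example capacitances are hypothetical, and capacitances are in arbitrary units.

```python
import math

def size_inverter_chain(c_in, c_load, target_stage_fanout=4.0):
    """Pick N so that each stage ratio is close to the target, then size
    the stages geometrically so that C[i+1]/C[i] is the same everywhere."""
    fanout = c_load / c_in
    n = max(1, round(math.log(fanout) / math.log(target_stage_fanout)))
    f = fanout ** (1.0 / n)                   # actual per-stage ratio
    caps = [c_in * f ** i for i in range(n)]  # input capacitance of each stage
    return n, f, caps

n, f, caps = size_inverter_chain(c_in=1.0, c_load=64.0)
print(n, f, caps)
```

For an overall fan-out of 64 this gives three stages, each roughly four times larger than the previous one.
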
Slide 9. Logical Effort
• p: intrinsic delay ($3kR_{unit}C_{unit}\gamma$), a gate parameter that is not a function of W.
• g: logical effort ($kR_{unit}C_{unit}$), a gate parameter that is not a function of W.
• f: effective fanout.
• Normalize everything to an inverter: $g_{inv} = 1$, $p_{inv} = 1$.
• Divide everything by tinv (everything is measured in unit delays tinv).
• Assume $\gamma = 1$.

Slide 10. Delay in a Logic Gate
• Gate delay: $d = h + p$ (effort delay plus intrinsic delay).
• Effort delay: $h = g f$ (logical effort times effective fanout, with $f = C_{out}/C_{in}$).
• Logical effort is a function of topology and is independent of sizing.
• Effective fanout (electrical effort) is a function of load/gate size.

Slide 11. Logical Effort
• The inverter has the smallest logical effort and intrinsic delay of all static CMOS gates.
• The logical effort of a gate is the ratio of its input capacitance to the inverter input capacitance when sized to deliver the same current.
• Logical effort increases with gate complexity.

Slide 12. Intrinsic Delay
• The inverter has the smallest intrinsic delay of all static CMOS gates.
• The intrinsic delay of a gate is the ratio of its output capacitance to the inverter output capacitance when sized to deliver the same current.
• Intrinsic delay increases with gate complexity.

Slide 13. Logical Effort
• Logical effort is the ratio of the input capacitance of a gate to the input capacitance of an inverter with the same output current.
• Figure values: inverter g = 1, p = 1; 2-input NAND g = 4/3, p = 2; 2-input NOR g = 5/3, p = 2.

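
As a quick numerical check of d = g f + p, the sketch below evaluates the normalized delay of the three gates for the same electrical effort. The mapping of the (g, p) pairs to inverter, 2-input NAND and 2-input NOR is the standard one and is assumed to match the figure. The fan-out of 4 and the function name are hypothetical.

```python
def gate_delay(g, p, c_out, c_in):
    """Normalized gate delay d = g * f + p, in units of the inverter delay."""
    f = c_out / c_in          # effective fanout (electrical effort)
    return g * f + p

# (g, p) pairs from the slide; each gate drives four times its input capacitance.
d_inv = gate_delay(1.0, 1.0, c_out=4.0, c_in=1.0)
d_nand2 = gate_delay(4 / 3, 2.0, c_out=4.0, c_in=1.0)
d_nor2 = gate_delay(5 / 3, 2.0, c_out=4.0, c_in=1.0)
print(d_inv, d_nand2, d_nor2)
```
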
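Slide 19. Example: 8-input AND
Logical efforts (g) and intrinsic delays (p) of three implementations. The fan-out is not known here.
• Configuration 1 (two stages): g = 10/3, p = 8; then g = 1, p = 1.
• Configuration 2 (two stages): g = 2, p = 4; then g = 5/3, p = 2.
• Configuration 3 (four stages): g = 4/3, g = 5/3, g = 4/3, g = 1. The intrinsic delays are left blank on the slide except p = 1 for the last stage.
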
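
Because the fan-out is not fixed, one way to compare the three configurations is to evaluate the minimum path delay N·H^(1/N) + P over a range of electrical efforts. This is a sketch under stated assumptions: it uses the standard logical-effort result that all stages carry equal effort at the optimum, it assumes no branching, and it assumes p = 2 for the two-input gates in the third configuration, where the slide leaves the values blank. The configuration labels and function names are hypothetical.

```python
import math

def min_path_delay(stages, electrical_effort, branching=1.0):
    """Minimum normalized delay of a path given as a list of (g, p) pairs."""
    n = len(stages)
    path_g = math.prod(g for g, _ in stages)
    path_p = sum(p for _, p in stages)
    path_effort = path_g * branching * electrical_effort
    return n * path_effort ** (1.0 / n) + path_p

configs = {
    "config 1": [(10 / 3, 8), (1, 1)],
    "config 2": [(2, 4), (5 / 3, 2)],
    # p = 2 for the two-input gates is an assumption (blank on the slide).
    "config 3": [(4 / 3, 2), (5 / 3, 2), (4 / 3, 2), (1, 1)],
}

def fastest(electrical_effort):
    """Name of the configuration with the smallest minimum delay."""
    return min(configs, key=lambda k: min_path_delay(configs[k], electrical_effort))

for effort in (1.0, 10.0, 100.0):
    print(effort, fastest(effort))
```

In this model the two-stage structure with the lower intrinsic delay wins for small loads, while the four-stage structure wins once the load is large.
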
Slide 20. Example: Optimize Path
• Stage 1: $g_1 = 1$, $f_1 = a g_2 / g_1$
• Stage 2: $g_2 = 5/3$, $f_2 = b g_3 / (a g_2)$
• Stage 3: $g_3 = 5/3$, $f_3 = c g_4 / (b g_3)$
• Stage 4: $g_4 = 1$, $f_4 = 5 / (c g_4)$
• The stage fan-out is $f_i$, and a, b, c are scale factors comparing a gate's size to the minimum-size gate with the same speed as the inverter.
• Effective fanout (output load / input load): $F = 5$
• Path electrical effort: $F = C_{out}/C_{in}$
• Path logical effort: $G = g_1 g_2 \cdots g_N$
• Path effort: $H = GFB$
• Stage effort: $h_i = g_i f_i$
• Unknowns: a, b, c.

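Slide 21. Example: Optimize Path
• Stage 1: $g_1 = 1$, $f_1 = a g_2 / g_1$
• Stage 2: $g_2 = 5/3$, $f_2 = b g_3 / (a g_2)$
• Stage 3: $g_3 = 5/3$, $f_3 = c g_4 / (b g_3)$
• Stage 4: $g_4 = 1$, $f_4 = 5 / (c g_4)$
• The stage fan-out is $f_i$, and a, b, c are scale factors comparing a gate's size to the minimum-size gate with the same speed as the inverter.
• Effective fanout: $F = 5$
• $G = 25/9$
• $H = GFB = 125/9$ (there is no branching here, so $B = 1$)
• $h = H^{1/4} \approx 1.93$ (the optimum effort for each gate)
• $a = h/g_2 \approx 1.16$ (from $h = f_1 g_1 = 1.93$ and $f_1 = a g_2/g_1$)
• $b = h a/g_3 \approx 1.34$ (from $h = f_2 g_2 = 1.93$ and $f_2 = b g_3/(a g_2)$)
• $c = h b/g_4 \approx 2.59$ (from $h = f_3 g_3 = 1.93$ and $f_3 = c g_4/(b g_3)$)
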
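
The arithmetic of this example can be reproduced in a few lines. This is a direct transcription of the slide's formulas. Only the variable names are introduced here.

```python
g1, g2, g3, g4 = 1.0, 5 / 3, 5 / 3, 1.0
F = 5.0    # path electrical effort, Cout / Cin
B = 1.0    # no branching

G = g1 * g2 * g3 * g4      # path logical effort
H = G * F * B              # path effort
h = H ** (1 / 4)           # optimum stage effort for four stages

# Scale factors, from h = g_i * f_i applied stage by stage.
a = h / g2
b = h * a / g3
c = h * b / g4

# Consistency check: the last stage must also see effort h.
f4 = F / (c * g4)
print(round(h, 2), round(a, 2), round(b, 2), round(c, 2), round(g4 * f4, 2))
```
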
Slide 22. Method of Logical Effort
1. Compute the path effort: $H = GBF$.
2. Find the best number of stages: $N \approx \log_4 H$.
3. Compute the stage effort: $h = H^{1/N}$.
4. Sketch the path with this number of stages.
5. Work from either end to find the sizes: $C_{in} = C_{out} \cdot g / h$.
Reference: Sutherland, Sproull, Harris, "Logical Effort", Morgan Kaufmann, 1999.

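
The steps above can be sketched as a small routine that works backward from the load. It is a simplified illustration: it assumes a path without branching (B = 1), keeps the number of stages given by the caller, and only reports log4(H) as a hint. The function name `size_path` is hypothetical.

```python
import math

def size_path(logical_efforts, c_load, c_in):
    """Return (H, log4(H), h, input capacitance of each stage)."""
    n = len(logical_efforts)
    G = math.prod(logical_efforts)
    H = G * (c_load / c_in)          # B = 1 assumed
    stage_hint = math.log(H, 4)      # suggested number of stages
    h = H ** (1.0 / n)               # stage effort for the given n
    c_ins = [0.0] * n
    c_out = c_load
    # Work from the output toward the input: Cin = Cout * g / h.
    for i in reversed(range(n)):
        c_ins[i] = c_out * logical_efforts[i] / h
        c_out = c_ins[i]
    return H, stage_hint, h, c_ins

H, stage_hint, h, c_ins = size_path([1.0, 5 / 3, 5 / 3, 1.0], c_load=5.0, c_in=1.0)
print(H, stage_hint, h, c_ins)
```

A useful sanity check is that the computed input capacitance of the first stage comes back equal to the specified `c_in`.
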
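Slide 23. Ratio Based Logic
• Goal: to reduce the number of devices over complementary CMOS.
• Figure: three load styles above a pull-down network (PDN) with inputs In1, In2, In3 and output F, between VDD and VSS: (a) resistive load RL, (b) depletion-load NMOS (VT < 0), (c) pseudo-NMOS (PMOS load).
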
Slide 24. Ratio Based Logic
• Resistive load: N transistors + load.
• $V_{OH} = V_{DD}$
• $V_{OL} = \frac{R_{PDN}}{R_{PDN} + R_L} V_{DD}$
• Asymmetrical response.
• Static power consumption.
• $t_{pLH} = 0.69\, R_L C_L$

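
A minimal numerical sketch of the resistive-load gate, assuming the pull-down network can be represented by a single on-resistance `r_pdn` so that the low output level is a resistive divider. The function name and all component values are hypothetical.

```python
def resistive_load_gate(vdd, r_load, r_pdn, c_load):
    """First-order figures of merit for a gate with a resistive load."""
    v_oh = vdd                                  # PDN off: output pulled to VDD
    v_ol = vdd * r_pdn / (r_pdn + r_load)       # PDN on: resistive divider
    t_plh = 0.69 * r_load * c_load              # low-to-high delay through RL
    p_static = vdd ** 2 / (r_pdn + r_load)      # power drawn while output is low
    return v_oh, v_ol, t_plh, p_static

v_oh, v_ol, t_plh, p_static = resistive_load_gate(
    vdd=2.5, r_load=40e3, r_pdn=10e3, c_load=10e-15
)
print(v_oh, v_ol, t_plh, p_static)
```

The example shows the ratio trade-off: a larger RL lowers VOL and the static power, but increases tpLH.
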
Slide 25. Ratio Based Logic Problems
• Problem with a resistive load: $I_L = (V_{DD} - V_{out})/R_L$, so the charging current drops rapidly once $V_{out}$ starts to rise.
• Solution: use a current source! The available current is independent of the voltage.
• Reduces $t_{pLH}$ by 25%.

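
The benefit of a current-source load can be estimated with a back-of-the-envelope comparison. The sketch assumes an ideal current source that supplies the same current as the resistor does at Vout = 0, that is I = VDD/RL, and it measures the delay to the 50% point. The function names and values are hypothetical.

```python
def t_plh_resistor(r_load, c_load):
    """Exponential charging through RL: time to reach VDD/2."""
    return 0.69 * r_load * c_load

def t_plh_current_source(i_load, c_load, vdd):
    """Constant-current charging: time to reach VDD/2."""
    return c_load * (vdd / 2) / i_load

vdd, r_load, c_load = 2.5, 40e3, 10e-15
t_res = t_plh_resistor(r_load, c_load)
t_cs = t_plh_current_source(i_load=vdd / r_load, c_load=c_load, vdd=vdd)
reduction = 1 - t_cs / t_res
print(t_res, t_cs, reduction)
```

With these assumptions the reduction comes out a little above one quarter, in line with the figure quoted on the slide.
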
Slide 26. Active Loads
• Figure: depletion-load NMOS (VT < 0) and pseudo-NMOS (PMOS load) gates, each with a pull-down network (PDN) with inputs In1, In2, In3 and output F, between VDD and VSS.

Slide 27. Active Loads
• Depletion-mode NMOS load: $V_{GS} = 0$, $I_L \approx \frac{k_{n,load}}{2} |V_{Tn}|^2$
• Deviates from an ideal current source because of:
  - channel length modulation
  - body effect: $V_{SB}$ varies with $V_{out}$, which reduces $|V_{Tn}|$, hence $I_L$ gets smaller for increasing $V_{out}$

Slide 28. Active Loads
• Pseudo-NMOS load: no body effect, $V_{SB} = 0$ V.
• $V_{GS} = -V_{DD}$, giving a higher load current: $I_L = \frac{k_p}{2} (V_{DD} - |V_{Tp}|)^2$
• The larger $|V_{GS}|$ causes the pseudo-NMOS load to leave saturation sooner than the depletion NMOS load.

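
To see why the pseudo-NMOS load delivers more current, the two saturation-current expressions can be compared directly. This is a rough sketch: it ignores channel length modulation, body effect and velocity saturation, and it assumes equal gain factors for both loads. All device parameters are hypothetical.

```python
def depletion_load_current(k_n_load, vt_n):
    """Depletion NMOS load with VGS = 0: IL = (k / 2) * |VTn|^2."""
    return 0.5 * k_n_load * abs(vt_n) ** 2

def pseudo_nmos_load_current(k_p, vdd, vt_p):
    """PMOS load with VGS = -VDD: IL = (k / 2) * (VDD - |VTp|)^2."""
    return 0.5 * k_p * (vdd - abs(vt_p)) ** 2

k = 100e-6   # A/V^2, assumed equal for both loads
i_depletion = depletion_load_current(k, vt_n=-1.0)
i_pseudo = pseudo_nmos_load_current(k, vdd=2.5, vt_p=-0.4)
print(i_depletion, i_pseudo)
```
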
Slide 33. Improved Loads (1)
• For fast low-to-high transitions in standby circuits.

Slide 34. Improved Loads (2)
• Differential Cascode Voltage Switch Logic (DCVSL).
• Has no static current.
• Requires that each gate generates both Out and its complement.
• Figure labels: VDD, VSS, pull-down networks PDN1 and PDN2, inputs A and B, load transistors M1 and M2, output Out.

Slide 39. NMOS-Only Logic
• Figure: voltage [V] (0.0 to 3.0) versus time [ns] (0 to 2) for the nodes In, x and Out.

Slide 40. NMOS-only Switch
• Figure labels: A = 2.5 V, node B, pass transistor Mn, load CL, transistor M1.
• $V_B$ does not pull up to 2.5 V, but to 2.5 V − $V_{TN}$.
• The threshold voltage loss causes static power consumption.
• The NMOS has a higher threshold than the PMOS (body effect).

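
Because the body effect makes the threshold depend on the node voltage itself, the high level at B has to be solved self-consistently. The sketch below iterates the standard body-effect expression VT = VT0 + γ(√(2φF + VSB) − √(2φF)) with VSB = VB. The device parameters (VT0, γ, 2φF) are assumed values chosen for illustration, and the function name is hypothetical.

```python
import math

def pass_transistor_high_level(vdd, vt0, gamma, two_phi_f, iterations=50):
    """Fixed-point solution of VB = VDD - VTn(VB) for an NMOS pass transistor."""
    v_b = vdd - vt0                      # first guess: no body effect
    vt = vt0
    for _ in range(iterations):
        # The source of the pass transistor sits at node B, so VSB = VB.
        vt = vt0 + gamma * (math.sqrt(two_phi_f + v_b) - math.sqrt(two_phi_f))
        v_b = vdd - vt
    return v_b, vt

# Assumed parameters: VT0 = 0.43 V, gamma = 0.4 V^0.5, 2*phi_F = 0.6 V.
v_b, vt = pass_transistor_high_level(vdd=2.5, vt0=0.43, gamma=0.4, two_phi_f=0.6)
print(v_b, vt)
```

With these numbers the node settles well below VDD − VT0, which illustrates why the body effect makes the threshold loss worse.
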
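Slide 41. Pass-Transistor Logic - Solution 1: Level Restoring Transistor
• Figure labels: level restorer Mr (weak transistor) connected to VDD, pass transistor Mn with signals A and B, node X, transistors M1 and M2, output Out.
• Advantages: full swing, no static power dissipation.
• The restorer adds capacitance and takes away pull-down current at X.
• Ratio problem.
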
Slide 42. Restorer Transistor Sizing
• Figure: voltage [V] (0.0 to 3.0) versus time [ps] (up to 500) for restorer sizes W/Lr = 1.0/0.25, 1.25/0.25, 1.50/0.25 and 1.75/0.25.
• The level-restoring transistor cannot be too strong, otherwise it prevents the output from reaching VDD.
• This sets an upper limit on the restorer size.
• The pass-transistor pull-down can have several transistors in its stack.

Slide 43. Pass-Transistor Logic - Solution 2: Single Transistor Pass Gate with VT = 0
• If the pass transistors have VT = 0, the output does not require a level restorer, but there is a leakage current.
• WATCH OUT FOR LEAKAGE CURRENTS.
• Figure labels: VDD, 2.5 V, 0 V, Out.