Presentation transcript:

After Tech. Mapping

7. Circuit Level Design

Buffer Chain
Delay analysis of buffer chain
Delay analysis considering parasitic capacitance, C_p
–C_k, P_k: total capacitance and power at the output of the stage-k buffer
–P_T: power consumption of the buffer chain
–P_n: power consumption of the load capacitance C_L
–Eff: power efficiency, P_n / P_T
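The quantities defined on this slide can be illustrated with a small numerical sketch. The model below is an assumption, not taken from the slides: a uniformly tapered chain whose stage k is beta^(k-1) times the size of the first buffer, a parasitic capacitance that grows with buffer size, and dynamic power C V^2 f per node. The function name `buffer_chain_power` and all parameter names are hypothetical.

```python
def buffer_chain_power(c_in, c_p, c_load, n_stages, vdd, freq):
    """Sketch of C_k, P_k, P_T, P_n and Eff for a tapered buffer chain.

    Assumptions (not from the slides): uniform taper factor beta,
    parasitic capacitance proportional to buffer size, P = C * Vdd^2 * f.
    """
    # Uniform taper factor so that the last stage drives exactly c_load.
    beta = (c_load / c_in) ** (1.0 / n_stages)
    stage_caps = []
    for k in range(1, n_stages + 1):
        size = beta ** (k - 1)          # size of stage k relative to stage 1
        parasitic = size * c_p          # self-loading grows with buffer size
        # Stage k drives the next buffer's input, or C_L for the last stage.
        driven = c_load if k == n_stages else size * beta * c_in
        stage_caps.append(parasitic + driven)              # C_k
    stage_powers = [c * vdd ** 2 * freq for c in stage_caps]  # P_k
    p_total = sum(stage_powers)                            # P_T
    p_load = c_load * vdd ** 2 * freq                      # P_n
    return beta, stage_caps, stage_powers, p_total, p_load / p_total


if __name__ == "__main__":
    beta, caps, powers, p_t, eff = buffer_chain_power(
        c_in=1.0, c_p=0.5, c_load=64.0, n_stages=3, vdd=1.0, freq=1.0)
    print(f"beta={beta:.2f}  C_k={caps}  P_T={p_t:.2f}  Eff={eff:.3f}")
```

Under these assumptions, a larger parasitic capacitance lowers Eff because more of P_T is spent inside the chain rather than on C_L.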

Slew Rate Determining rise/fall time

Slew Rate(Cont’d) Power consumption of Short circuit current in Oscillation Circuit

Pass Transistor Logic
Reducing Area/Power
–Macro cell (large part of chip area) → XOR/XNOR/MUX (primitive) → Pass Tr. logic
–Not using charge/discharge scheme → appropriate for low-power logic
Pass Tr. logic family
–CPL (Complementary Pass Transistor Logic)
–DPL (Dual Pass Transistor Logic)
–SRPL (Swing Restored Pass Transistor Logic)
CPL
–Basic scheme
–Inverter buffering

Pass Transistor Logic (Cont’d)
DPL
–Pass Tr. network + dual p-MOS
–Enables rail-to-rail swing
–Characteristics: increased input capacitance (delay); increased driving ability because two ON paths exist; equals CPL in input loading capacitance
SRPL
–Pass Tr. network + cross-coupled inverter
–Restores the logic level
–Inverter size must not be too big

Dynamic Logic
Uses a precharge/evaluation scheme
Family
–Domino logic
–NORA (NO RAce) logic
Characteristics
–Decreased input loading capacitance
–Power consumption in the precharge clock
–Increased useless switching during the precharge period
Basic architecture of Domino logic

Input Pin Ordering
Reorder the equivalent inputs to a transistor based on critical path delays and power consumption
N-input primitive CMOS logic
–Symmetrical at the function level
–Antisymmetrical at the transistor level: capacitance of output stage, body effect
Scheme
–The signal that has many transitions must be far from the output
–If it is hard to estimate the switching frequency, determine the pin ordering by considering path and path delay balance from the primary inputs to the transistor inputs
Example of N-input CMOS logic
Experimented with a TI gate array: for a 4-input NAND gate in TI’s BiCMOS gate array library (with a load of 13 inverters), the delay varies by 20% and the power dissipation by 10% between a good and a bad ordering

INPUT PIN Reordering Simulation result (t_cycle = 50 ns, t_f/t_r = 1 ns): 38.4 µW when A is the critical input, 47.2 µW when D is the critical input

Sensitization Example Definition –sensitization : input signal that forces output transition event –sensitization vector : the other inputs if one signal is sensitized
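The definition above can be made concrete with a short sketch: for a chosen input, list every assignment of the other inputs under which toggling that input changes the gate output. The function name `sensitization_vectors` and the example gates are hypothetical and only illustrate the definition.

```python
from itertools import product


def sensitization_vectors(func, n_inputs, target):
    """Return assignments of the other inputs that sensitize input `target`.

    An assignment sensitizes the target input when flipping that input
    (0 -> 1) changes the output of `func`.
    """
    vectors = []
    for others in product((0, 1), repeat=n_inputs - 1):
        # Rebuild the full input tuple with the target forced to 0 and to 1.
        low = others[:target] + (0,) + others[target:]
        high = others[:target] + (1,) + others[target:]
        if func(*low) != func(*high):
            vectors.append(others)
    return vectors


if __name__ == "__main__":
    and3 = lambda a, b, c: a & b & c
    xor2 = lambda a, b: a ^ b
    print(sensitization_vectors(and3, 3, 0))  # other inputs must both be 1
    print(sensitization_vectors(xor2, 2, 0))  # every vector sensitizes XOR
```

An input with few sensitization vectors rarely propagates its transitions to the output, which is why the concept is useful when looking for unnecessary switching.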

Sensitization (Cont’d) Considering sensitization in combinational logic: remove unnecessary transitions in the combinational logic. Considering sensitization in sequential logic: also reduces the power consumption in the flip-flops.

TTL-Compatible TTL level signal → CMOS input Characteristic curve of CMOS inverter

TTL Compatible (Cont’d)
CMOS output signal → TTL input
–Because of the sink current I_OL, the CMOS device generates a large amount of heat
–Increased chip operating temperature
–Power consumption of whole system

INPUT PIN Reordering
To reduce the power dissipation, one should place the input with low transition density near the ground end.
(a) If MNA turns off, only C_L needs to be charged
(b) If MND turns off, all of C_L, C_B, C_C and C_D need to be charged
(c) If the critical input is rising and placed near the output node, the initial charges of C_B, C_C and C_D are zero and the delay time of C_L discharging is less than in (d)
(d) If the critical input is rising and placed near the ground end, the charges of C_B, C_C and C_D must discharge before the charge of C_L discharges to zero

Low-Power Booth Multiplier Design Sungkyunkwan University, School of Electrical and Computer Engineering Jin Hyuk Kim, Jun Sung Lee, Jun Dong Cho

Modified Booth Multiplier Multibit recoding halves the number of partial products, enabling high-speed multiplication. Multiplicand: X, multiplier: Y. Recoded digit $= Y_{2i-1} + Y_{2i} - 2Y_{2i+1}$, with $Y_{-1} = 0$.
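As a hedged illustration of the recoding rule on this slide, the sketch below recodes an n-bit two's-complement multiplier Y into n/2 digits in {-2, -1, 0, 1, 2} using digit i = Y(2i-1) + Y(2i) - 2 Y(2i+1) with Y(-1) = 0, then checks that the digits weighted by powers of 4 reproduce Y. The function names are hypothetical.

```python
def booth_radix4_digits(y, n_bits):
    """Recode an n_bits-wide two's-complement integer into radix-4 Booth digits.

    Digit i is Y[2i-1] + Y[2i] - 2*Y[2i+1], with Y[-1] = 0.
    Returns n_bits // 2 digits, least significant first.
    """
    if n_bits % 2 != 0:
        raise ValueError("n_bits must be even for radix-4 recoding")

    def bit(k):
        # Python's >> on negative ints is arithmetic, so this yields
        # the two's-complement bit for any k >= 0.
        return 0 if k < 0 else (y >> k) & 1

    return [bit(2 * i - 1) + bit(2 * i) - 2 * bit(2 * i + 1)
            for i in range(n_bits // 2)]


def digits_to_value(digits):
    """Rebuild the signed value: sum of digit_i * 4**i."""
    return sum(d * 4 ** i for i, d in enumerate(digits))


if __name__ == "__main__":
    for y in (6, -3, 7, -8):
        digits = booth_radix4_digits(y, 4)
        print(y, digits, digits_to_value(digits))
```

A 4-bit multiplier produces two recoded digits instead of four bit-level partial products, which is the halving the slide refers to; for instance, 6 recodes to the digits -2 and 2, since -2 + 2*4 = 6.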

Modified Booth Multiplier: Example

Wallace Tree - 4:2 Compressor

Multipliers - Area 16-bit Multiplier Area

Multiplier - Delay Average Power Dissipation (16-bit)

Multiplier - Power Worst-Case Delay (16-bit)

Instruction Level Power Analysis
Estimate power dissipation of instruction sequences and power dissipation of a program
–E_b: base cost of individual instructions
–E_s: circuit state change effects
–E_M: the overall energy cost of a program
–B_i: the base cost of a type i instruction
–N_i: the number of type i instructions
–O_{i,j}: the cost incurred when a type i instruction is followed by a type j instruction
–N_{i,j}: the number of occurrences in which a type i instruction is immediately followed by a type j instruction
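The slide lists the symbols, but the formula itself did not survive in the transcript. A commonly used form, assumed here rather than confirmed by the source, is E_M = E_b + E_s, with E_b the sum of B_i N_i and E_s the sum of O_{i,j} N_{i,j}. The sketch below evaluates that form for an instruction trace; the function name, the cost tables, and the numbers are hypothetical.

```python
from collections import Counter


def program_energy(trace, base_cost, pair_cost):
    """Estimate (E_b, E_s, E_M) for an instruction trace.

    Assumed model: E_M = sum_i B_i * N_i + sum_{i,j} O_ij * N_ij.
    Pairs missing from pair_cost are treated as zero overhead.
    """
    n_i = Counter(trace)                    # N_i: count per instruction type
    n_ij = Counter(zip(trace, trace[1:]))   # N_ij: count per adjacent pair
    e_b = sum(base_cost[i] * n for i, n in n_i.items())
    e_s = sum(pair_cost.get(pair, 0.0) * n for pair, n in n_ij.items())
    return e_b, e_s, e_b + e_s


if __name__ == "__main__":
    # Hypothetical costs in arbitrary energy units.
    base = {"add": 1.0, "mul": 3.0}
    overhead = {("add", "mul"): 0.5, ("mul", "add"): 0.25}
    print(program_energy(["add", "mul", "add", "add"], base, overhead))
```

Because E_s depends only on adjacent pairs, reordering instructions changes E_s while leaving E_b unchanged under this model.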

Instruction Ordering
Develop a technique of operand swapping
–Recoding weight: necessary operation cost of operands
–W_total: total recoding weight of the input operand
–W_i: weight of an individual recoded digit i in the Booth multiplier
–W_b: base weight of an instruction
–W_inter: inter-operation weight of instructions
Therefore, if an operand has a lower W_total, put it in the second input (multiplier).
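The operand-swapping rule can be sketched as follows. The slides do not give the per-digit weights W_i, so this example assumes, purely for illustration, that every nonzero radix-4 Booth digit costs 1 and a zero digit costs 0, and that W_total is the sum of the W_i. The function names are hypothetical.

```python
def booth_digits(y, n_bits):
    """Radix-4 Booth digits of a two's-complement value (n_bits must be even)."""
    def bit(k):
        return 0 if k < 0 else (y >> k) & 1
    return [bit(2 * i - 1) + bit(2 * i) - 2 * bit(2 * i + 1)
            for i in range(n_bits // 2)]


def total_recoding_weight(y, n_bits):
    """W_total under the assumed weights: 1 per nonzero digit, 0 otherwise."""
    return sum(1 for d in booth_digits(y, n_bits) if d != 0)


def order_operands(a, b, n_bits):
    """Return (multiplicand, multiplier), placing the lower-W_total operand
    in the multiplier position; ties keep the original order."""
    if total_recoding_weight(a, n_bits) < total_recoding_weight(b, n_bits):
        return b, a
    return a, b


if __name__ == "__main__":
    # 5 recodes to two nonzero digits, 4 to one, so 4 becomes the multiplier.
    print(order_operands(4, 5, 4))
```

Swapping is safe because multiplication is commutative; only the switching activity inside the recoder and partial-product array changes.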

RESULT

Conclusion [Chart: axes "Power [pJ]" and "bits"; label "% of instances with circuit states effects"; annotated reductions of 4.0%, 12.0%, and 9.0%]

8. Layout Level Design

Device Scaling by a Factor of S
–Constant scaled wire increases coupling capacitance by S and wire resistance by S
–Supply voltage by 1/S, threshold voltage by 1/S, current drive by 1/S
–Gate capacitance by 1/S, gate delay by 1/S
–Global interconnection delay, RC load + parasitics, by S
–Interconnect delay: 50-70% of clock cycle
–Area: 1/S^2
–Power dissipation by 1/S to 1/S^2 (P = nC V_dd^2 f, where nC is the sum of capacitance times number of transitions)
–SIA (Semiconductor Industry Association): in 2007, physical limitation: 0.1 µm, 20 billion transistors, 10 square centimeters, 12 or 16 inch wafer
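A quick arithmetic check of the power scaling, using P = nC V_dd^2 f from this slide. The sketch assumes that switched capacitance and supply voltage both scale by 1/S, and additionally assumes that the clock frequency rises by S because gate delay falls by 1/S; that last step is an assumption, not stated on the slide. Under these assumptions the power per circuit scales by 1/S^2, the lower end of the quoted range. The function name is hypothetical.

```python
def scaled_dynamic_power(n_c, vdd, freq, s):
    """Return (P_before, P_after) for P = nC * Vdd^2 * f under scaling by s.

    Assumptions: nC -> nC / s, Vdd -> Vdd / s, f -> f * s.
    """
    p_before = n_c * vdd ** 2 * freq
    p_after = (n_c / s) * (vdd / s) ** 2 * (freq * s)
    return p_before, p_after


if __name__ == "__main__":
    before, after = scaled_dynamic_power(n_c=1e-9, vdd=3.3, freq=100e6, s=2.0)
    print(f"ratio after/before = {after / before:.3f}")  # about 1 / s**2
```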

Delay Variations at Low-Voltage At high supply voltage, the delay increases with temperature (mobility decreases with temperature), while at very low supply voltages the delay decreases with temperature (V_T decreases with temperature). At low supply voltages, the delay ratio between large and minimum transistor widths W increases by a factor of several. Clock trees are delay-balanced by wire snaking in order to avoid clock skew. In this case, at low supply voltages, slight V_T variations can significantly modify the delay balancing.

Quarter Micron Challenge
–Computers/peripherals (SOC): 1996 ($50 billion), 1999 ($70 billion)
–Wiring dominates delay: wire R comparable to gate driver R; wire/wire coupling C > C to ground
–Push beyond 0.07 micron
–Quest for area (past), speed-speed (now), power-power-power (future)
–Accelerated increases of clock frequencies
–Signal integrity-based tools
–Design styles (chip + packages)
–System-level design (system partitioning)
–Synthesis with multiple constraints (power, area, timing)
–Partitioning/MCM
–Increasing speed limits complicate clock and power distribution
–Design bounded by wires, vias, via resistance, coupling
–Reverse scaling: adding area/spacing as needed: widening, thickening of wires, metal shielding & noise avoidance - adding metal

CLOCK POWER CONSUMPTION Clock power consumption is as large as the logic power. Because the clock signal carries the heaviest load and switches at high frequency, clock distribution is a major source of power dissipation. In a microprocessor, 18% of the total power is consumed by clocking. Clock distribution is designed as a hierarchical clock tree, according to the decomposition principle.

Power Consumption per block in typical microprocessor

Crosstalk

Solution for Clock Skew
Dynamic effects on skew
–Capacitance coupling
–Supply voltage deviation (clock driver and receiver voltage difference)
–Capacitance deviation by circuit operation
–Global and local temperature
Layout issues: clocks routed first
–Must be aware of all sources of delay
–Increased spacing
–Wider wires
–Insert buffers
–Specialized clocks need net matching
Two approaches: single driver, H-tree driver
Gated clocks: local clocks that are conditionally enabled so that the registers are only clocked during the write cycles. The clock is partitioned into different blocks and each block is clocked with its own clock.
Gating the clocks to infrequently used blocks does not provide an acceptable level of power savings
Divide the basic clock frequency to provide the lowest clock frequency needed to different parts of the circuit
Clock distribution: large clock buffers waste power. Use smaller clock buffers with a well-balanced clock tree.

PowerPC Clocking Scheme

CLOCK DRIVERS IN THE DEC ALPHA 21164

DRIVER for PADS or LARGE CAPACITANCES Off-chip power (drivers and pads) is increasing and is very difficult to reduce, as the pad and driver sizes cannot be decreased with new technologies.

Layout-Driven Resynthesis for Lower Power

Low Power Process Dynamic Power Dissipation

Crosstalk
In deep-submicron layouts, some of the net lengths for connections between modules can be so long that their resistance is comparable to the resistance of the driver.
Each net in a mixed analog/digital circuit is classified according to its crosstalk sensitivity:
–1. Noisy = high impedance signal that can disturb other signals, e.g., clock signals.
–2. High-Sensitivity = high impedance analog nets; the most noise-sensitive nets, such as the input nets to operational amplifiers.
–3. Mid-Sensitivity = low/medium impedance analog nets.
–4. Low-Sensitivity = digital nets that directly affect the analog part in some cells, such as control signals.
–5. Non-Sensitivity = the most noise-insensitive nets, such as pure digital nets.
The crosstalk between two interconnection wires also depends on the frequencies (i.e., signal activities) of the signals traveling on the wires.
Recently, deep-submicron designs require crosstalk-free channel routing.

Power Measure in Layout
The average dynamic power consumed by a CMOS gate is given below, where C_l is the load capacitance at the output node, V_dd is the supply voltage, T_cycle is the global clock period, N is the number of transitions of the gate output per clock cycle, C_g is the load capacitance due to the input capacitance of fanout gates, and C_w is the load capacitance due to the interconnection tree formed between the driver and its fanout gates.
$P_{av} = 0.5 \, \frac{V_{dd}^2}{T_{cycle}} \, C_l N = 0.5 \, \frac{V_{dd}^2}{T_{cycle}} \, (C_g + C_w) N$
Logic synthesis for low power attempts to minimize $\sum_i C_{g_i} N_i$. Physical design for low power tries to minimize $\sum_i C_{w_i} N_i$. Here $C_{w_i}$ consists of $C_{x_i} + C_{s_i}$, where $C_{x_i}$ is the capacitance of net i due to its crosstalk and $C_{s_i}$ is the substrate capacitance of net i.
For low power layout applications, power dissipation due to crosstalk is minimized by ensuring that wires carrying high activity signals are placed sufficiently far from the other wires. Similarly, power dissipation due to substrate capacitance is proportional to the wire length and its signal activity.
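A minimal sketch of the power measure, reading the formula in its dimensionally consistent form P_av = 0.5 V_dd^2 (C_g + C_w) N / T_cycle. It also evaluates the two activity-weighted capacitance sums that logic synthesis and physical design try to minimize. The function names and the per-net dictionary keys (`c_g`, `c_x`, `c_s`, `n`) are hypothetical.

```python
def average_dynamic_power(vdd, t_cycle, c_gate, c_wire, n_transitions):
    """P_av = 0.5 * Vdd^2 / T_cycle * (C_g + C_w) * N for one gate output."""
    return 0.5 * vdd ** 2 / t_cycle * (c_gate + c_wire) * n_transitions


def switched_capacitance(nets):
    """Return (sum_i C_gi * N_i, sum_i C_wi * N_i) with C_wi = C_xi + C_si.

    Each net is a dict with keys c_g, c_x (crosstalk), c_s (substrate)
    and n (transitions per clock cycle).
    """
    gate_term = sum(net["c_g"] * net["n"] for net in nets)
    wire_term = sum((net["c_x"] + net["c_s"]) * net["n"] for net in nets)
    return gate_term, wire_term


if __name__ == "__main__":
    nets = [
        {"c_g": 2.0, "c_x": 1.0, "c_s": 1.0, "n": 0.5},    # high-activity net
        {"c_g": 4.0, "c_x": 0.0, "c_s": 2.0, "n": 0.25},   # low-activity net
    ]
    print(switched_capacitance(nets))
    print(average_dynamic_power(vdd=2.0, t_cycle=4.0,
                                c_gate=3.0, c_wire=1.0, n_transitions=0.5))
```

The wire term shows why spacing matters most for high-activity nets: a given reduction in crosstalk capacitance is weighted by that net's transition count.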

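Low-Power Layout Design Using Dual Supply Voltages Sungkyunkwan University, School of Electrical and Computer Engineering Jin Hyuk Kim, Jun Sung Lee, Jun Dong Cho

Contents Research purpose; research background; Clustered Voltage Scaling structure; Row by Row Power Supply structure; Mix-And-Match Power Supply structure; Level Converter structure; Mix-And-Match Power Supply design flow; experimental results; conclusion

Research Purpose and Background Propose a dual-voltage layout technique that reduces the power consumption of combinational circuits. Reduce the extra wiring and tracks that arise when, with dual-voltage cells, cells of the same voltage are placed in one cell row. Implement a level converter circuit that uses the minimum number of transistors. Apply Clustered Voltage Scaling [Usami, ’95], which uses dual voltages while maintaining device performance. The proposed Mix-And-Match Power Supply layout structure improves on the existing Row by Row Power Supply [Usami, ’97] layout structure, reducing power and area.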

Clustered Voltage Scaling Generates a low-power netlist

Row by Row Power Supply Structure

Mix-And-Match Power Supply Structure

Structure Comparison Conventional, RRPS, MAMPS circuit

Level Converter Structure Number of transistors: 6 (conventional) → 4 (proposed); effective in terms of power and area

Mix-And-Match Power Supply Design Flow

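Experimental Results Total power, total area

Conclusion Compared with a single-voltage circuit, power is reduced by 49.4% at the cost of a 5.6% area overhead. Compared with the existing RRPS structure, area is reduced by 10% and power by 2%. The proposed level converter reduces area by 30% and power by 35% compared with the existing level converter.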

9. CAD tools

Low Power Design Tools
Transistor Level Tools (5-10% of silicon)
–SPICE, PowerMill (Epic), ADM (Avanti/Anagram), Lsim Power Analyst (Mentor)
Logic Level Tools (10-15%)
–DesignPower and PowerGate (Synopsys), WattWatcher/Gate (Sente), PowerSim (System Sciences), POET (Viewlogic), and QuickPower (Mentor)
Architectural (RTL) Level Tools (20-25%)
–WattWatcher/Architect (Sente): 20-25% accuracy
Behavioral (spreadsheet) Level Tools (50-100%)
–Active area of academic research

Commercial synthesis systems

Research synthesis systems A - Architectural synthesis. L - Logic synthesis.

Low-Power CAD sites
Alternative System Concepts, Inc.: 7X power reduction through optimization, contact and Jake Karrfalt at or (603)
Reduction of glitch and clock power; modeling and optimization of interconnect power; power optimization for data-dominated designs with limited control flow.
Mentor Graphics QuickPower: Hierarchical of determining overall benefit of exchanging the blocks for lower power; powering down or disabling blocks when not in use by gated clocks; choose candidates for power-down; calculate the effect of the power-down logic
Synopsys's Power Compiler
Sente's WattWatcher/Architect: first commercial tool operating at the architecture level (20-25% accuracy)
Behavioral tools: Hyper-LP (optimization), Explore (estimation) by J. Rabaey

Design Power(Synopsys) DesignPower(TM) provides a single, integrated environment for power analysis in multiple phases of the design process: – Early, quick feedback at the HDL or gate level through probabilistic analysis. – Improved accuracy through simulation-based analysis for gate level and library exploration. DesignPower estimates switching, internal cell and leakage power. It accepts user-defined probabilities, simulation toggle data or a combination of both as input. DesignPower propagates switching information through sequential devices, including flip-flops and latches. It supports sequential, hierarchical, gated-clock, and multiple-clock designs. For simulation toggle data, it links directly to Verilog and VHDL simulators, including Synopsys' VSS.
