Christopher LaFrieda and Rajit Manohar Computer Systems Laboratory Cornell University Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits.

Presentation transcript:

Christopher LaFrieda and Rajit Manohar Computer Systems Laboratory Cornell University Reducing Power Consumption with Relaxed Quasi Delay-Insensitive Circuits

Outline Motivation / Background Contributions Relaxed Quasi Delay-Insensitive (RQDI) RQDI Voltage Scaling RQDI Two Phase Circuits Results Summary

Motivation: How Does Dynamic Power Scale? α – activity factor (1x). N – total number of transistors (2x). C_L – average load capacitance per transistor (0.7x). V_dd – doesn’t scale well anymore: scaled by 17-20% from 130nm to 65nm, by 10% at 45nm, and by 5.5% at 32nm.
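The scaling factors on this slide can be combined into a rough per-generation estimate. The sketch below assumes the standard dynamic power relation, P proportional to α · N · C_L · V_dd² · f (the slide's own equation did not survive in this transcript), holds frequency fixed, and reads "scaled by 10%" as a 10% reduction in V_dd relative to the previous node. The function name `dynamic_power_scale` is illustrative only.

```python
def dynamic_power_scale(activity, transistors, load_cap, vdd, frequency=1.0):
    """Relative dynamic power, assuming P ~ alpha * N * C_L * Vdd^2 * f.

    Every argument is a per-generation scaling factor (1.0 = unchanged).
    """
    return activity * transistors * load_cap * vdd ** 2 * frequency


# Per-generation factors from the slide: alpha 1x, N 2x, C_L 0.7x.
# Frequency is held fixed; the Vdd reductions are the 45nm and 32nm figures.
for node, vdd_reduction in [("45nm", 0.10), ("32nm", 0.055)]:
    scale = dynamic_power_scale(1.0, 2.0, 0.7, 1.0 - vdd_reduction)
    print(f"{node}: dynamic power changes by {scale:.3f}x")
```

Under these assumptions the result is above 1x in both cases, i.e. dynamic power grows from one generation to the next even when frequency is not increased.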

Motivation: Power Scaling With Fixed Frequency

Motivation: Process Variations Getting Worse Process Variation in 65nm: FO4 delays across corners: FF is 70% faster than SS. Circuits need to be robust w.r.t. process variations. QDI is a logical place to start.
Corner | FO4 delay
SS | 13.6 ps
TT | 18.2 ps
FF | 22.6 ps

Background: QDI – WCHB Buffer Simple buffer. Neutrality is checked in the pull-up stack of the c-element. Timing assumption?
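For readers unfamiliar with the C-element mentioned here, a minimal behavioral sketch follows. It models only the generic two-input Muller C-element (output follows the inputs when they agree, otherwise holds its value); it is not the transistor-level WCHB circuit from the slide, and the class name `CElement` is an illustrative choice. In a WCHB the two inputs would typically be an input data rail and the enable from the next stage, which is an assumption here rather than something the transcript states.

```python
class CElement:
    """Behavioral model of a two-input Muller C-element."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        # When both inputs agree, the output takes their value;
        # when they disagree, the output holds its previous state.
        if a == b:
            self.out = a
        return self.out


c = CElement()
# Data arrives, enable follows, data returns to neutral, enable drops.
trace = [c.update(a, b) for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]]
print(trace)  # [0, 1, 1, 0]
```

The output only falls once both inputs are low again, which is the sense in which the gate "checks" that its inputs have returned to neutral before resetting.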

RQDI: Staticizer Timing Assumption I Data is neutral and enable is high.

RQDI: Staticizer Timing Assumption II Data is neutral and enable is high. Data becomes valid which sets _R0 low. If R0 inverter is slow, R0 will remain low.

RQDI: Staticizer Timing Assumption III Data is neutral and enable is high. Data becomes valid which sets _R0 low. If R0 inverter is slow, R0 will remain low. Nothing is fighting the weak feedback, _R0 can go high.

RQDI: Half Cycle Timing Assumption The half cycle timing assumption (HCTA): A small amount of combinational logic (1-2 transitions) will always switch within one half cycle of a process. There is a 4.5x timing margin (at 18 transitions per cycle, t.p.c.). With worst case corners, 2.7x margin in 65nm. Wire delays make the assumption even more conservative. QDI has an HCTA in staticizers. RQDI allows them everywhere.
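One plausible way to arrive at the two margins quoted above is sketched below. It assumes the margin is the number of transitions in a half cycle divided by the number of transitions in the assumed logic path, and that the worst case pairs the slowest FO4 delay for that logic with the fastest FO4 delay for the rest of the cycle, using the smallest and largest delays from the process variation slide (13.6 ps and 22.6 ps). The transcript does not spell out this derivation, so treat it as a reading of the numbers rather than the authors' stated method; `hcta_margin` is a hypothetical helper.

```python
def hcta_margin(transitions_per_cycle, logic_transitions,
                slow_delay=1.0, fast_delay=1.0):
    """Timing margin of the half cycle timing assumption (sketch).

    The assumed logic path (logic_transitions long) runs at slow_delay per
    transition; the half cycle it races against runs at fast_delay.
    """
    half_cycle = transitions_per_cycle / 2
    nominal = half_cycle / logic_transitions
    return nominal * fast_delay / slow_delay


# 18 transitions per cycle, two transitions of combinational logic.
print(f"nominal margin: {hcta_margin(18, 2):.1f}x")
# Worst case: logic at the slowest delay, the half cycle at the fastest.
worst = hcta_margin(18, 2, slow_delay=22.6, fast_delay=13.6)
print(f"worst-case margin: {worst:.1f}x")
```

With these inputs the sketch gives 4.5x nominally and about 2.7x in the worst case, matching the figures on the slide.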

RQDI: HCHB Template N tracks neutrality. Check N+, but assume N- happens in the first half cycle. Two transition latency. 14 transition cycle time. Validity must be checked by pull-down.

RQDI Voltage Scaling: Scaling Scenarios Two possible scenarios for voltage scaling. Top: mismatched slack. Lower pipeline can run slower. Bottom: Token limited loop. Latency through loop should be minimal, but cycle time can scale. In some applications these can’t be avoided.

RQDI Voltage Scaling: Slack Mismatch In An FPGA Logic blocks (LB) for logic. Switch boxes (SB) for routing. Limited routing resources. Imperfect slack matching. Can scale voltage on blue path.

RQDI Voltage Scaling: DVHB: Dual Voltage Template Data rails are full swing. Acknowledges are low swing. Latency remains constant through voltage scaling. Cycle time can be adjusted through voltage scaling.

RQDI Two Phase Circuits: Two Phase Buffer (HCFB2P) An HCTA exists on the right pair of XORs. Two transition latency. Seven transition cycle time. Twice the area of a WCHB. However, it can replace two stages.

RQDI Two Phase Circuits: Two Phase In An FPGA Replace routing (SB) with two phase logic. Logic (LB) remains four phase. Phase converters are placed around logic blocks. Routing makes up over half the area in an asynchronous FPGA, so power savings can be large.

RQDI Two Phase Circuits: Converters Need to convert between two phase (for routing) and four phase (for logic). The 4:2 converter is 3x larger than a WCHB. The 2:4 converter is 3.25x larger than a WCHB.

Experimental Setup Simulated in HSpice with a 65nm bulk technology. Circuits are sized to the drive strength of a 20/10 lambda inverter.
Name | Description | Inputs | Outputs | Implies Validity?
and2 | And | 2 | 1 | No
or2 | Or | 2 | 1 | No
xor2 | Exclusive Or | 2 | 1 | Yes
fa | Full Adder | 3 | 2 | Yes
benc | Booth Encoder | 3 | 2 | No

Results: HCHB – Energy Per Cycle HCHB consumes 32% less energy than PCHB. HCHB consumes 36% less energy than PCEHB. Slight frequency improvement. Negligible latency penalty.

Results: HCHB – Total Transistor Area Despite the additional transistors to check validity, HCHB is smaller. HCHB is about 20% smaller than PCHB. HCHB is about 15% smaller than PCEHB.

Results: DVHB – Low voltage vs. Dual Voltage

Results: HCFB2P Switch – Energy Reduction vs. WCHB Wider switches mean larger MUXes and larger PCs. The associated caps switch half as much. Over 50% reduction in power, due to replacing two stages.
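The claim that the capacitance switches half as much can be illustrated by counting data-rail transitions per token. The sketch below assumes dual-rail data with transition signaling for two phase (one rail toggles per token) and return-to-zero signaling for four phase (the rail rises, then resets). The transcript does not specify the HCFB2P encoding, so this is a generic comparison, and it ignores acknowledge wires; `count_transitions` is an illustrative name.

```python
def count_transitions(bits, two_phase):
    """Count data-rail transitions needed to send a sequence of bits."""
    rails = [0, 0]  # rails[0] signals a 0 token, rails[1] signals a 1 token
    transitions = 0
    for bit in bits:
        rails[bit] ^= 1  # signal the token by toggling its rail
        transitions += 1
        if not two_phase:
            rails[bit] ^= 1  # four phase: return the rail to neutral
            transitions += 1
    return transitions


tokens = [1, 0, 0, 1, 1]
print("four phase:", count_transitions(tokens, two_phase=False))  # 10
print("two phase: ", count_transitions(tokens, two_phase=True))   # 5
```

Each token costs two rail transitions in four phase and one in two phase, so under these assumptions the routing capacitance is charged or discharged half as often for the same data.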

RQDI Two Phase Circuits: Results – Area Overhead Typically, there are about 8 stages of 4-wide switches between logic blocks. Area overhead is 15%. With direct connections, there are about 10 stages with an overhead of 10%.

Summary RQDI allows half cycle timing assumptions outside of staticizers. With RQDI, we can simplify the PCHB logic template. The resulting template, HCHB, consumes 32% less energy. The dual voltage logic template can be used to adjust the dynamic slack of a stage. This allows us to save energy with a minimal throughput penalty in token limited loops. Replacing the routing in an FPGA with two phase logic can reduce energy consumption by 50%. Using the RQDI two phase buffer and converters will achieve this with a 10-15% area overhead.

Questions?