Reader: Pushpinder Kaur Chouhan

Presentation transcript:

Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders
Authors: Steven M. Nowick, Kenneth Y. Yun, Peter A. Beerel and Ayoob E. Dooply
Reader: Pushpinder Kaur Chouhan

Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders Introduction Basic Concepts Architecture of Speculative Completion Speculative Adder Design Basic Dynamic Brent-Kung Adders Conclusion References

Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders Introduction Goal of the article Motivation Basic Concept Counters Classification Architecture of Speculative Completion Speculative Adder Design Conclusion References

Introduction Goal of the article – To design high performance asynchronous datapath components, which are faster than synchronous designs and yet have low area overhead.

Motivation
Potential advantages of asynchronous design:
Low power consumption – components use power only “on demand”
High performance – systems not limited to “worst-case” clock rate
Robustness and scalability – no global timing
Ease of design – global clock distribution and synchronization can be avoided
Use of speculative completion to design asynchronous datapath components that deliver early results.

Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders Introduction Basic Concept Bundled datapath Completion detection Adders Basic Binary lookahead carry adder design Architecture of Speculative Completion Speculative Adder Design Basic Dynamic Brent-Kung Adders Conclusion

Basic Concepts
Bundled datapath – [Figure: function block (C/L) with a worst-case matched delay on the req/ack path]
Completion detection – implementation in dual-rail, where each bit is mapped to a pair of wires that encode both the value and the validity of the data.
Advantages – easy implementation, low power, limited area.

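Basic Concepts
Adders: basic 1-bit full adder
$S_i = (A_i \oplus B_i) \oplus C_i$
$C_{i+1} = A_i B_i + (A_i \oplus B_i) C_i$
In terms of the generate (g), propagate (p) and absorb (a) signals:
$g_i = A_i B_i$
$p_i = A_i \oplus B_i$
$a_i = \overline{\overline{A_i}\,\overline{B_i}} = A_i + B_i$
$S_i = p_i \oplus C_i$
$C_{i+1} = g_i + p_i C_i$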
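As a quick check of the generate/propagate form above, the following minimal Python sketch evaluates the 1-bit full adder for all eight input combinations. The function name `full_adder` is illustrative and not taken from the slides.

```python
from itertools import product


def full_adder(a, b, c):
    """1-bit full adder written with generate (g) and propagate (p) signals."""
    g = a & b              # generate: both inputs are 1
    p = a ^ b              # propagate: exactly one input is 1
    s = p ^ c              # sum bit, S_i = p_i XOR C_i
    c_out = g | (p & c)    # carry out, C_{i+1} = g_i + p_i C_i
    return s, c_out


if __name__ == "__main__":
    # Exhaustive check against ordinary integer addition.
    for a, b, c in product((0, 1), repeat=3):
        s, c_out = full_adder(a, b, c)
        print(a, b, c, "->", s, c_out)
```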

Binary Lookahead Carry Adder

Binary Lookahead Carry Adder
The adder computes cumulative P and G values.
Level-1 computes all 2-bit P and G values, where $P^1_i = p_i p_{i-1}$ and $G^1_i = g_i + p_i g_{i-1}$.
Level-2 computes all 4-bit P and G values, where $P^2_i = P^1_i P^1_{i-2}$ and $G^2_i = G^1_i + P^1_i G^1_{i-2}$, and so on.
Level-6 computes the ith sum bit $S_i$, where $S_i = p_i \oplus G^5_{i-1}$.
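The level-by-level recurrences can be sketched in Python as follows. This is an illustrative model, not the circuit from the slides: it assumes a carry-in of zero, evaluates the P/G recurrence at every bit position on each level, and leaves positions with no lower neighbour unchanged. The name `blc_add` is hypothetical.

```python
def blc_add(a, b, width=32):
    """Add two unsigned integers with a lookahead (parallel-prefix) carry network.

    Assumes carry-in = 0 and returns the sum modulo 2**width.
    """
    p0 = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]   # Level-0 propagate
    g0 = [((a >> i) & (b >> i)) & 1 for i in range(width)]   # Level-0 generate

    P, G = p0[:], g0[:]
    distance = 1                     # 1, 2, 4, ... : span doubles on every level
    while distance < width:
        new_P, new_G = P[:], G[:]
        for i in range(distance, width):
            new_P[i] = P[i] & P[i - distance]
            new_G[i] = G[i] | (P[i] & G[i - distance])
        P, G = new_P, new_G
        distance *= 2

    # Final level: S_i = p_i XOR (carry into bit i), where the carry into
    # bit i is the cumulative generate of bits 0..i-1.
    total = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p0[i] ^ carry_in) << i
    return total


if __name__ == "__main__":
    print(blc_add(123456789, 987654321))
```

For a 32-bit width the loop runs five prefix levels (distances 1, 2, 4, 8, 16) before the sum level, which matches the six levels described above.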

Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders Introduction Basic Concept Architecture of Speculative Completion Multiple model delays Abort detection networks Modified result logic Speculative Adder Design Basic Dynamic Brent-Kung Adders Conclusion

Architecture of Speculative Completion
[Block diagram: function block (C/L) with three matched-delay paths from req to done (worst-case, medium and short); two abort logic blocks produce the Abort 1 and Abort 2 signals.]

Architecture of Speculative Completion
Multiple model delays: one for the worst case and the remaining ones for speculative completion. These speculative delays allow different speeds of early completion. For example, in a ripple-carry adder, an “average-case” delay might be used if the adder inputs result in short carry chains; a “best-case” delay might be used if there is no carry chain.

Architecture of Speculative Completion
Abort detection network: one is associated with each speculative delay. The network determines whether the corresponding speculative completion must be aborted because of worst-case data. Abort detection is computed in parallel with the datapath computation.
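The interplay between the matched delays and the abort signals can be modelled behaviourally. The sketch below is an assumption-laden abstraction rather than the actual circuit: each speculative path is represented as a hypothetical `(delay, aborted)` pair, and `done` is taken to fire at the earliest path that has not been aborted, with the worst-case path never aborted.

```python
def completion_delay(speculative_paths, worst_case_delay):
    """Return the delay at which `done` is signalled.

    speculative_paths: iterable of (delay, aborted) pairs, one per
        speculative matched delay (illustrative representation).
    worst_case_delay: delay of the worst-case path, which always completes.
    """
    # Keep only the speculative paths whose abort signal is not asserted.
    live = [delay for delay, aborted in speculative_paths if not aborted]
    live.append(worst_case_delay)     # fallback path is always available
    return min(live)


if __name__ == "__main__":
    # Short and medium speculative delays of 3 and 5 units, worst case 7.
    print(completion_delay([(3, False), (5, False)], 7))  # no abort
    print(completion_delay([(3, True), (5, False)], 7))   # short path aborted
    print(completion_delay([(3, True), (5, True)], 7))    # both aborted
```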

Speculative Completion
Modified result logic: With speculative completion, early completion is allowed when results can be produced early. Modified result logic is required to take advantage of the early production of the inputs to the result logic. For example, in adder designs the carry may be produced earlier, and hence the sum logic needs to be modified.

Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders Introduction Basic Concept Architecture of Speculative Completion Speculative Adder Design Multiple model delays Abort detection networks Modified result logic Basic Dynamic Brent-Kung Adders Conclusion

Speculative Adder Design
[Block diagram: 32-bit inputs A and B, a 32-bit ADDER producing SUM, an abort detection network producing Abort, and a completion network (matched delays) from req to done.]

Speculative Adder Design
Completion Network – Each inverter roughly corresponds to the delay of one level in the BLC adder. The worst-case delay path has 7 gate delays. The speculative delay path has only 5 gate delays. In the early-completion case, the final generate values are already available at Level-3. The speculative path is disabled by an abort signal.
[Figure: completion network (matched delays) from req to done, gated by the abort signal]
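With the two path lengths given above (5 and 7 gate delays), the average completion time depends only on how often the abort signal fires. The sketch below is a back-of-the-envelope model; the abort probability is a hypothetical input, not a figure from the slides.

```python
def average_gate_delay(abort_probability, early_delay=5, late_delay=7):
    """Expected completion time in gate delays for a 2-speed adder.

    abort_probability: fraction of additions for which the speculative
        path is aborted (assumed value supplied by the caller).
    """
    if not 0.0 <= abort_probability <= 1.0:
        raise ValueError("abort_probability must lie in [0, 1]")
    return (1.0 - abort_probability) * early_delay + abort_probability * late_delay


if __name__ == "__main__":
    for q in (0.0, 0.25, 1.0):
        print(q, average_gate_delay(q))
```

The model makes the probabilistic nature of the approach explicit: the gain over the worst-case path shrinks linearly as aborts become more frequent.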

Speculative Adder Design Abort Detection Network – Conditions for late completion – late completion can only occur if there exists a run of 8 consecutive Level-0 propagate signals. At the nth level, a generate function of the ith stage is computed as: Detecting late completion Simple detection network

Speculative Adder Design Abort Detection Network – Conditions for late completion Detecting late completion Simple detection network

Speculative Adder Design
Abort Detection Network – Simple detection network
A simple sum-of-products detection network can be used, where each product contains a short run of Level-0 propagate signals. For example, with 4-literal products, each product contains a run of 4 Level-0 propagate signals, and the network contains 5 products. If any of these runs occurs, the corresponding product is 1 and the abort signal is asserted. The sum-of-products equation:
$p_4 p_5 p_6 p_7 + p_9 p_{10} p_{11} p_{12} + p_{14} p_{15} p_{16} p_{17} + p_{19} p_{20} p_{21} p_{22} + p_{24} p_{25} p_{26} p_{27}$
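The five-product network is conservative: it may abort when no run of 8 propagates exists, but it should never miss one. The Python sketch below checks this for a 32-bit propagate word. The helper names are illustrative, and the propagate word is assumed to be the bitwise XOR of the two operands.

```python
PRODUCT_STARTS = (4, 9, 14, 19, 24)   # lowest index of each 4-literal product


def propagate_word(a, b, width=32):
    """Level-0 propagate signals p_i = A_i XOR B_i, packed into an integer."""
    return (a ^ b) & ((1 << width) - 1)


def abort_signal(p):
    """Sum-of-products abort: true if any listed run of 4 propagates is all 1s."""
    return any(((p >> k) & 0xF) == 0xF for k in PRODUCT_STARTS)


def has_propagate_run(p, length=8, width=32):
    """Exact condition for late completion: `length` consecutive propagates."""
    mask = (1 << length) - 1
    return any(((p >> s) & mask) == mask for s in range(width - length + 1))


if __name__ == "__main__":
    # Every possible run of 8 consecutive propagates must trigger the abort.
    for start in range(32 - 8 + 1):
        run = 0xFF << start
        assert has_propagate_run(run) and abort_signal(run)
    print("all runs of 8 are covered by the 5-product network")
```

Because the network is a sum of products of uncomplemented propagate signals, setting additional propagate bits can never clear the abort, so checking the 25 minimal runs is sufficient.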

Speculative Adder Design Abort Detection Network – Conditions for late completion Detecting late completion Simple detection network

Speculative Adder Design Modified Sum Generation –

Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders Introduction Basic Concept Architecture of Speculative Completion Speculative Adder Design Basic Dynamic Brent-Kung Adders Completion network Abort detection networks Modified sum generation Conclusion References

Basic Dynamic Brent-Kung Adders
Basic Dynamic P/G Cell –
$P^n_i = P^{n-1}_i P^{n-1}_j$ and $G^n_i = G^{n-1}_i + P^{n-1}_i G^{n-1}_j$
$S_i = p_i \oplus G^N_{i-1}$

Basic Dynamic Brent-Kung Adders Completion Network

Basic Dynamic Brent-Kung Adders Abort Detection Network

Basic Dynamic Brent-Kung Adders Modified Sum Generation (a) 2-speed adder, (b) 3-speed adder

Basic Dynamic Brent-Kung Adders Modified Sum Generation

Speculative Completion for the Design of High-Performance Asynchronous Dynamic Adders Introduction Basic Concept Architecture of Speculative Completion Speculative Adder Design Basic Dynamic Brent-Kung Adders Conclusion References

Conclusion
With speculative completion, early completion is allowed when results can be produced early.
An asynchronous adder is selected because of the potential advantages of asynchronous design.
The dynamic Brent-Kung adder is preferable because with dynamic logic all nodes are reset during the precharge phase, so the values of internal nodes are known, whereas in a static CMOS implementation internal nodes are never reset, so their state is generally unknown.
No late-enable signal needs to be distributed in dynamic logic, whereas in the static CMOS implementation late-enable signals had to be distributed to the different sum modules.

Conclusion
Advantages
Little area overhead (less than 5%)
Performance increase for average-case data (up to 29% in 64-bit and 19% in 32-bit BK adders for random input data)
Disadvantages
Probabilistic approach, hence the performance gain depends on the distribution of input data.


References Design of low-latency asynchronous adder using speculative completion by S.M.Nowick High-performance adders with speculative completion by Ayoob E. Dooply

Questions ?

Dual Rail Monotonic Encoding
Def. Glitch: nonfinal transition
Def. Hazard: potential for glitch
Encode every signal, X, with two wires, XH and XL:
XH=0, XL=0: data not ready
XH=0, XL=1: logic “0”
XH=1, XL=0: logic “1”
XH=1, XL=1: not used
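The encoding table above maps directly onto a small encode/decode pair. The following Python sketch is illustrative only; the function names are hypothetical, and "data not ready" is represented as `None`.

```python
def encode_dual_rail(bit):
    """Return the (XH, XL) wire pair for a valid logic value."""
    return (1, 0) if bit else (0, 1)


def decode_dual_rail(xh, xl):
    """Return 0 or 1 for valid data, or None while the data is not ready."""
    if (xh, xl) == (0, 0):
        return None                      # spacer: data not ready
    if (xh, xl) == (0, 1):
        return 0
    if (xh, xl) == (1, 0):
        return 1
    raise ValueError("XH=1, XL=1 is not a used code word")


def word_is_valid(pairs):
    """Completion detection for a word: every bit carries a valid code."""
    return all((xh, xl) in ((0, 1), (1, 0)) for xh, xl in pairs)


if __name__ == "__main__":
    word = [encode_dual_rail(b) for b in (1, 0, 1, 1)]
    print(word, word_is_valid(word))
    word[2] = (0, 0)                     # one bit has not arrived yet
    print(word, word_is_valid(word))
```

Because validity is carried on the data wires themselves, completion of a word can be detected without a separate timing assumption, at the cost of two wires per bit.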

Static: At every point in time (except during the switching transient), each gate output is connected to either VDD or VSS via a low-resistance path. Slower and more complex than dynamic, but "safer".
Dynamic: Relies on the temporary storage of signal values on the capacitance of high-impedance circuit nodes. Simpler in design and faster than static, but more complicated in operation and sensitive to noise.

Fan-in The number of standard loads drawn by an input to ensure reliable operation. Most inputs have a fan-in of 1. Fan-out The number of standard loads that can be reliably driven by an output, without causing the output voltage to shift out of its legal range of values.

Benefit: Low Power
No clock or PLL to start/stop
Faster (instantaneous!) recovery from idling
Easier to idle for short periods
Clock itself is a high-power node
Only draw power when doing work
No need to explicitly enable/disable units
Automatic fine granularity of power saving

Asynchronous Design
Several potential advantages:
Lower power: no clock ==> components use power only “on demand”
Robustness, scalability: no global timing ==> “mix-and-match” varied components
Higher performance: systems not limited to “worst-case” clock rate

Should we use Asynch?
Benefits: early completion, better EM, low power, environmental adaptability; no global clock to distribute!
Drawbacks: design challenges; testing and tools

Asynchronous circuits are advantageous in:
· Low-power applications, by: automatic turn-off of idle parts, since synchronization is done by handshaking only where needed; adaptive scaling of supply voltage, as the performance of speed-independent circuits does not depend on component speeds and scales continuously over a wide range of power supply voltages.
· Improved EMI characteristics, including: reduced noise through the absence of clock harmonics; reduced switching activity; accommodation of delays due to electromagnetic noise if communication is done delay-insensitively. If the average signal transition time is $T$ for a voltage swing of $V$, then an induced electromotive force of $\Delta V$ will cause a signal delay of $T \, \Delta V / V$.
· High-speed applications: for circuits with completion detection, the speed of the system is determined by the average-case rather than the worst-case speeds of the components.
· Applications in heterogeneous system timing. According to semiconductor industry forecasts such as ITRS (previously known as the SIA roadmap), the systems on chip of the near future will require multiple clock domains. As die sizes increase and the distance that can be traveled by a signal within a clock period becomes smaller, the number of time zones on a chip will grow rapidly, approaching 1000 by 2006 and 10000 by 2012.

Introduction Synchronous vs. Asynchronous Systems? Synchronous Systems: use a global clock entire system operates at fixed-rate uses “centralized control” clock

Introduction (cont.) Synchronous vs. Asynchronous Systems? (cont.) Asynchronous Systems: no global clock components can operate at varying rates communicate locally via “handshaking” uses “distributed control” “handshaking interfaces”

Introduction (cont.)
Asynchronous Circuits: long history (since early 1950’s), but... early approaches often impractical: slow, complex
Synchronous Circuits: used almost everywhere: highly successful; benefits: simplicity, support by existing design tools
But recently: renewed interest in asynchronous circuits