MOUSETRAP Ultra-High-Speed Transition-Signaling Asynchronous Pipelines Montek Singh & Steven M. Nowick Department of Computer Science Columbia University,


MOUSETRAP Ultra-High-Speed Transition-Signaling Asynchronous Pipelines Montek Singh & Steven M. Nowick Department of Computer Science Columbia University, New York, NY IEEE

Agenda Review Introduction MOUSETRAP Preliminary Experiment Results Conclusions

Review. Synchronous pipelines: wave pipelining, clock-delayed domino, skew-tolerant domino, self-resetting circuits. Asynchronous pipelines: micropipelines, GasP, IPCMOS.

Asynchronous circuits' benefits: no clock-skew problem; low power consumption; faster speed (average case); fewer global timing issues; tolerance of variations in fabrication, temperature, etc.; low EMI and noise.

Low power consumption: On high-performance chips, clock power is a significant proportion of total power consumption. Gated clocks reduce the waste, but they make clock skew worse and incur some power cost of their own. All parts of a clocked circuit run at the same frequency.

Performance: A synchronous design must be toleranced for worst-case conditions (fabrication, temperature, voltage, data values, clock skew). Asynchronous circuits self-adjust to the operating and data conditions.

Agenda Review Introduction MOUSETRAP Preliminary Experiment Results Conclusions

Introduction: Asynchronous design styles. Protocol: level signaling (four-phase) or transition signaling (two-phase). Logic: bundled data (e.g., single-rail) or self-timed (e.g., dual-rail).

Level signaling (four-phase): A sends data to B.
Active phase:
Step 1: A puts data on the bus and sets req = 1.
Step 2: B gets data from the bus and sets ack = 1.
Return-to-zero phase:
Step 3: A sets req = 0.
Step 4: B sets ack = 0.
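The four steps can be traced with a small behavioral sketch. This is only an illustration of the signal ordering, not a circuit model; the function name `four_phase_transfer` and the `wires` dictionary are hypothetical names introduced here.

```python
def four_phase_transfer(data):
    """Simulate one four-phase (return-to-zero) transfer from A to B.

    Hypothetical helper for illustration: returns the value B received,
    the ordered handshake events, and the final wire state.
    """
    wires = {"data": None, "req": 0, "ack": 0}
    trace = []

    # Step 1: A puts data on the bus, then raises req.
    wires["data"] = data
    wires["req"] = 1
    trace.append(("A", "req", 1))

    # Step 2: B sees req = 1, reads the bus, then raises ack.
    received = wires["data"]
    wires["ack"] = 1
    trace.append(("B", "ack", 1))

    # Step 3 (return to zero): A sees ack = 1 and lowers req.
    wires["req"] = 0
    trace.append(("A", "req", 0))

    # Step 4 (return to zero): B sees req = 0 and lowers ack.
    wires["ack"] = 0
    trace.append(("B", "ack", 0))

    return received, trace, wires
```

One transfer costs four signal events, and both wires end where they started, which is why the protocol is also called return-to-zero.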

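Transition signaling (two-phase): A sends data to B. Every transition on req or ack is an event, so there is no return-to-zero phase and each req/ack pair of transitions moves one data item.
Step 1: A puts data on the bus and sets req = 1.
Step 2: B gets data from the bus and sets ack = 1.
Step 3: A puts the next data on the bus and sets req = 0.
Step 4: B gets data from the bus and sets ack = 0.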
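A minimal behavioral sketch of transition signaling follows. It assumes a single sender and receiver sharing one link; the class `TwoPhaseLink` and the function `transfer_all` are hypothetical names used only for this illustration. A transfer is pending whenever req and ack differ, and the link is idle whenever they are equal.

```python
class TwoPhaseLink:
    """Toy model of a two-phase (transition-signaling) channel."""

    def __init__(self):
        self.req = 0
        self.ack = 0
        self.data = None

    def send(self, data):
        # The sender may only start when the previous item was acknowledged.
        assert self.req == self.ack, "previous transfer not yet acknowledged"
        self.data = data
        self.req ^= 1  # any transition on req (0->1 or 1->0) is a request

    def receive(self):
        # The receiver acts only when a request is pending.
        assert self.req != self.ack, "no pending request"
        value = self.data
        self.ack ^= 1  # any transition on ack is an acknowledge
        return value


def transfer_all(items):
    """Send every item over one link; return received items and event count."""
    link = TwoPhaseLink()
    received = []
    transitions = 0
    for item in items:
        link.send(item)
        received.append(link.receive())
        transitions += 2  # one req toggle plus one ack toggle per item
    return received, transitions
```

Two items cost four transitions in total, compared with eight signal events for the same two items under four-phase signaling.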

Introduction: Asynchronous design styles. Protocol: level signaling (four-phase) or transition signaling (two-phase). Logic: bundled data (e.g., single-rail) or self-timed (e.g., dual-rail).

C-element: Z_next = AB + Z(A+B). When A = 1 and B = 1, Z_next = 1. When A = 0 and B = 0, Z_next = 0. When A and B differ, Z holds its previous value.
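The next-state equation can be checked directly. The sketch below evaluates Z_next = AB + Z(A+B) on 0/1 integers; the function name `c_element` is a hypothetical helper for this illustration.

```python
def c_element(a, b, z):
    """Next state of a C-element: Z_next = A*B + Z*(A + B), on 0/1 ints."""
    return (a & b) | (z & (a | b))


def run_sequence(inputs, z=0):
    """Apply a list of (a, b) input pairs in order; return the Z history."""
    history = []
    for a, b in inputs:
        z = c_element(a, b, z)
        history.append(z)
    return history
```

For example, `run_sequence([(1, 0), (1, 1), (0, 1), (0, 0)])` shows the output rising only once both inputs are 1 and falling only once both are 0.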

Micropipeline 4-phase latch FIFO req ack

Bundled-data

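Self-timed: generates a completion-detection signal. Uses delay-insensitive (DI) coding, e.g., dual-rail coding (two-phase coding), with two wires per bit: 00 -> invalid value (no data); 01 and 10 -> the two valid data values (logic 0 and logic 1); 11 -> not used.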
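A small sketch of dual-rail encoding with completion detection is given below. The rail ordering `(true_rail, false_rail)` is an assumption made for this example, since the slide does not fix which code word means 0 and which means 1; all function names are hypothetical.

```python
SPACER = (0, 0)  # "no data" code word


def encode_bit(bit):
    # Assumed convention for this sketch: (true_rail, false_rail).
    return (1, 0) if bit else (0, 1)


def encode_word(bits):
    return [encode_bit(b) for b in bits]


def is_complete(word):
    """Completion detection: every bit must have exactly one rail high."""
    return all((t ^ f) == 1 for t, f in word)


def decode_word(word):
    if not is_complete(word):
        raise ValueError("word still contains spacer or illegal codes")
    # Under the assumed convention the true rail carries the bit value.
    return [t for t, _f in word]
```

Because validity is carried by the data wires themselves, the receiver needs no separate request wire or matched delay: it waits until `is_complete` holds. A hardware detector would typically OR the two rails of each bit, relying on 11 never occurring.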

Self-timed (dual-rail coding)

Performance Comparison of Asynchronous Adders Mark A. Franklin & Tienyo Pan

Agenda Review Introduction MOUSETRAP Preliminary Experiment Results Conclusions

MOUSETRAP: Minimal-Overhead Ultrahigh-SpEed Transition-signaling Asynchronous Pipeline

MOUSETRAP-FIFO Latch delay is 110 ps XNOR delay is 65 ps

MOUSETRAP with logic (bundled data)

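Bundled data scheme: Req n must arrive at stage N after the data inputs to that stage have stabilized, so the request path is delayed to match the worst-case delay of the logic. Because data is sampled only after it has settled, the logic circuits are allowed to have hazards.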
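The bundling constraint reduces to a one-sided inequality: the matched delay on the request wire must be at least the worst-case data-path delay. The sketch below checks it; the function names and the picosecond values in the usage note are hypothetical and are not taken from the slides.

```python
def bundling_slack_ps(logic_path_delays_ps, matched_delay_ps):
    """Margin by which the delayed request trails the slowest data path.

    A negative result means the request could arrive before the data
    has stabilized, which violates the bundled-data constraint.
    """
    worst_case = max(logic_path_delays_ps)
    return matched_delay_ps - worst_case


def is_bundling_safe(logic_path_delays_ps, matched_delay_ps):
    return bundling_slack_ps(logic_path_delays_ps, matched_delay_ps) >= 0
```

With illustrative path delays of 180, 240 and 310 ps, a 330 ps matched delay leaves 20 ps of slack, while a 300 ps matched delay would be unsafe.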

Delay buffer options: an inverter chain; a chain of transmission gates; or a duplicate of the worst-case critical path, which gives a more accurate delay but is more area-expensive.

Timing-forward latency

Timing-Cycle time
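As a worked example, the FIFO delays quoted on the MOUSETRAP-FIFO slide (110 ps latch, 65 ps XNOR) can be plugged into a simple timing model. The model used here is an assumption based on the published MOUSETRAP analysis, since the slide's figure did not survive in this transcript: forward latency per stage is t_latch + t_logic, and cycle time is 2 * t_latch + t_logic + t_xnor. Function names are hypothetical.

```python
def forward_latency_ps(t_latch_ps, t_logic_ps=0.0):
    # Assumed model: data passes through one transparent latch plus logic.
    return t_latch_ps + t_logic_ps


def cycle_time_ps(t_latch_ps, t_xnor_ps, t_logic_ps=0.0):
    # Assumed model: latch N, logic, latch N+1, then the XNOR that
    # re-enables latch N once stage N+1 has acknowledged.
    return 2 * t_latch_ps + t_logic_ps + t_xnor_ps


def throughput_ghz(cycle_ps):
    # 1 / (cycle in ps) expressed in GHz.
    return 1000.0 / cycle_ps


fifo_latency = forward_latency_ps(110)   # FIFO has no logic between stages
fifo_cycle = cycle_time_ps(110, 65)
fifo_rate = throughput_ghz(fifo_cycle)
```

Under these assumptions the FIFO has a 110 ps per-stage latency and a 285 ps cycle, which corresponds to roughly 3.5 GHz.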

Standard synchronous pipeline Forward latency Cycle time

MOUSETRAP-Setup time

MOUSETRAP-Hold time

Clocked-CMOS (C²MOS) logic

C²MOS benefits: smaller delay; smaller area; lower power consumption.

MOUSETRAP with C²MOS: forward latency, cycle time

Handling wide datapaths: datapath partitioning; control kiting (buffer insertion).

Optimization: sliding door; change MOS width (lower )

Non-Linear Pipeline-fork

Non-Linear Pipeline-join

Experiment:
0.25 μm TSMC, 2.5 V, 300 K: a pass-gate implementation of an XNOR/XOR; a standard 6-transistor pass-gate dynamic D-latch.
0.6 μm HP, 3.3 V, 300 K: a pass-gate implementation of an XNOR/XOR; a clocked-CMOS style latch.
10-stage pipeline, 16-bit datapath; pre-layout simulation (HSPICE).

result

Conclusions: uses small and fast latches; low latch-controller overhead (XNOR); transition-signaling protocol (efficient and concurrent); no complex timing or design effort; supports variable-speed environments (elasticity).

Comparison: IPCMOS (asynchronous interlocked pipelined CMOS): 3.3 to 4.5 GHz, IBM 0.18 μm, post-layout simulation.