1 Clockless Logic Montek Singh Thu, Jan 13, 2004.

Presentation transcript:

1 Clockless Logic Montek Singh Thu, Jan 13, 2004

2 Preliminaries
• How is data represented in an asynchronous system?
• How is information exchanged? Control signaling (handshake styles)

3 Data Encoding: “Bundled Data”
Single-rail “bundled datapath”: the simplest approach, widely used.
Features:
• datapath: 1 wire per bit (e.g. standard sync blocks)
• matched delay: produces a delayed “done” signal
  • worst-case delay: longer than the slowest path
+ Practical style: can reuse sync components; small area
- Fixed (worst-case) completion time
[Figure: function block with inputs bit 1 ... bit n and outputs bit 1 ... bit m; “request” passes through a matched delay to produce “done”, which indicates valid data]

4 Bundled Data: Completion Sensing
Delay matching:
• either a single worst-case delay
• or fine-grain delay
Speculative completion:
• choose the delay “on the fly”
• start with the shortest delay; increase as needed
[Figure: “request” feeds a bank of delays; a delay selector drives a MUX that produces “done”]
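
The bank-of-delays idea can be pictured as a lookup: for each operation, the selector picks the shortest matched delay that still covers the computation. The sketch below is only a software analogy. The function name `select_delay`, the delay values, and the abstract time units are assumptions for illustration, not something taken from the slides.

```python
from bisect import bisect_left

def select_delay(bank, required):
    """Return the smallest delay in `bank` that covers `required`.

    `bank` models the bank of matched delays feeding the MUX; `required`
    is the (hypothetical) actual delay of the current operation.
    """
    ordered = sorted(bank)
    index = bisect_left(ordered, required)
    if index == len(ordered):
        raise ValueError("no delay in the bank covers this operation")
    return ordered[index]

bank = [1.0, 2.0, 4.0]          # abstract time units; 4.0 is the worst case
print(select_delay(bank, 1.5))  # a fast operation gets the 2.0 delay
print(select_delay(bank, 3.2))  # a slow one falls back to the worst case
```

With a single worst-case delay the bank would hold one entry, and every operation would pay that full delay.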

5 Data Encoding: Dual-Rail
Dual-rail: uses 2 wires per data bit.
Each dual-rail pair provides both the data value and its validity.
+ Provides robust data-dependent completion
- Needs completion detectors
[Figure: function block with dual-rail inputs bit 1 ... bit n and dual-rail outputs bit 1 ... bit m]
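
A minimal sketch of how one pair of wires can carry both a value and its validity. The rail ordering `(true_rail, false_rail)`, the name `SPACER`, and the helper names are assumptions chosen for this example; the slides do not fix a particular convention.

```python
SPACER = (0, 0)  # neither rail asserted: no valid data yet

def encode(bit):
    """Map one bit to a (true_rail, false_rail) pair."""
    return (1, 0) if bit else (0, 1)

def decode(pair):
    """Return 0 or 1 for a valid pair, None for the spacer."""
    if pair == SPACER:
        return None
    if pair == (1, 1):
        raise ValueError("both rails high is not a legal codeword")
    return 1 if pair == (1, 0) else 0

print(decode(encode(1)))  # the value is recovered from the rails
print(decode(SPACER))     # None: the pair itself says "not valid yet"
```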

6 Dual-Rail: Completion Sensing
Dual-rail completion detector:
• combines dual-rail signals
• indicates when all bits are valid (or reset)
• OR together the 2 rails of each bit
• merge the results using a Muller “C-element”
C-element:
• if all inputs = 1, output → 1
• if all inputs = 0, output → 0
• else, maintain the output value
[Figure: per-bit OR gates (bit 0, bit 1, ..., bit n) feeding a C-element whose output is “Done”]
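
The C-element rule and the OR-then-merge structure can be modeled in a few lines. This is a behavioral sketch only; the class name `CElement`, the function `completion_detector`, and the representation of each bit as a tuple of two rails are assumptions made for the example.

```python
class CElement:
    """Behavioral model of a C-element: it holds state between updates."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, inputs):
        if all(v == 1 for v in inputs):
            self.out = 1          # all inputs high: output goes to 1
        elif all(v == 0 for v in inputs):
            self.out = 0          # all inputs low: output goes to 0
        return self.out           # otherwise: keep the previous value

def completion_detector(c_element, pairs):
    """OR the two rails of each bit, then merge the results in the C-element."""
    return c_element.update([a | b for a, b in pairs])

c = CElement()
print(completion_detector(c, [(1, 0), (0, 0)]))  # one bit still empty: not done
print(completion_detector(c, [(1, 0), (0, 1)]))  # all bits valid: done
print(completion_detector(c, [(0, 0), (0, 1)]))  # partially reset: holds done
print(completion_detector(c, [(0, 0), (0, 0)]))  # fully reset: done drops
```

The hold behavior is what lets the same detector signal both "all bits valid" and "all bits reset".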

7 Handshaking Styles: 4-Phase
4-phase: requires 4 events per handshake.
+ “Level-sensitive” → simpler logic implementation
- Overhead of “return-to-zero” (RTZ, or resetting): extra events which do no useful computation
[Figure: Request and Acknowledge waveforms, labeled: start event, done, get ready for next event, ready for next event]
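
The four events can be written out explicitly as a trace of wire changes. The sketch assumes both wires start low and that events strictly alternate between sender and receiver; the function name `four_phase_trace` is made up for the example.

```python
def four_phase_trace(handshakes):
    """List the wire events of `handshakes` consecutive 4-phase handshakes."""
    trace = []
    for _ in range(handshakes):
        trace.append(("req", 1))  # sender: start event
        trace.append(("ack", 1))  # receiver: done
        trace.append(("req", 0))  # sender: return to zero
        trace.append(("ack", 0))  # receiver: ready for next event
    return trace

for wire, level in four_phase_trace(1):
    print(wire, level)
```

Only the first two events of each handshake carry information; the last two are the return-to-zero overhead named on the slide.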

8 Handshaking Styles: 2-Phase
2-phase: requires 2 events per handshake (a.k.a. transition signaling).
+ Elegant: no return-to-zero
- Slower logic implementation: logic primitives are inherently level-sensitive, not event-based (at least in CMOS)
[Figure: Request and Acknowledge waveforms, labeled: start event, done, start next event, next event done]
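
In transition signaling every toggle of a wire is an event, regardless of direction. The sketch below assumes both wires start low; `two_phase_trace` and `pending` are illustrative names, not terms from the slides.

```python
def two_phase_trace(handshakes):
    """List the wire events of `handshakes` consecutive 2-phase handshakes."""
    req = ack = 0
    trace = []
    for _ in range(handshakes):
        req ^= 1                    # sender toggles request: start event
        trace.append(("req", req))
        ack ^= 1                    # receiver toggles acknowledge: done
        trace.append(("ack", ack))
    return trace

def pending(req, ack):
    """A request is outstanding whenever the two wire levels differ."""
    return req != ack

print(two_phase_trace(2))
print(pending(1, 0), pending(1, 1))
```

Two handshakes here produce the same four wire changes that a single 4-phase handshake would, which is the saving the slide points to. The cost shows up in `pending`: the receiver has to compare levels rather than test one, which hints at why event-based logic is more awkward to build from level-sensitive gates.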

9 Handshaking Styles: Pulse Mode
Pulse mode: combines benefits of 2-phase and 4-phase.
• uses pulses to represent events
+ No return-to-zero (like 2-phase)
+ Level-based implementation (like 4-phase)
- Needs a timing constraint on pulse width
[Figure: Request and Acknowledge pulse waveforms, labeled: start event, done, start next event, next event done]

10 Handshaking Styles: Single-Track
Single-track: combines req and ack onto a single wire!
• one wire used for bidirectional communication
• sender raises, receiver lowers
+ Efficient protocol: no return-to-zero, level-based
- Needs aggressive low-level design techniques → much effort to ensure reliability
[Figure: separate Request and Acknowledge wires (req, ack) replaced by a single “req + ack” wire]
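
The "sender raises, receiver lowers" rule can be sketched as a tiny state machine on one shared wire. The class name `SingleTrackWire` and the choice to raise an error on a protocol violation are assumptions for this example; real single-track circuits enforce the rule electrically, not with exceptions.

```python
class SingleTrackWire:
    """One wire shared by both parties: high means a token is waiting."""

    def __init__(self):
        self.level = 0

    def send(self):
        """Sender raises the wire; only legal while it is low."""
        if self.level != 0:
            raise RuntimeError("previous token not yet consumed")
        self.level = 1

    def receive(self):
        """Receiver lowers the wire; only legal while it is high."""
        if self.level != 1:
            raise RuntimeError("no token to consume")
        self.level = 0

wire = SingleTrackWire()
wire.send()      # request: wire goes high
wire.receive()   # acknowledge: wire goes low, ready for the next token
print(wire.level)
```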

11 Handshaking + Data Representation
Several combinations possible:
• dual-rail 4-phase, single-rail 4-phase, dual-rail 2-phase, and single-rail 2-phase
Example: dual-rail 4-phase
• dual-rail data: functions as an implicit “request”
• 4-phase cycle: between acknowledge and implicit request
[Figure: A sends dual-rail data bit 1 ... bit m to B; B returns ack to A]
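
In the dual-rail 4-phase example there is no request wire: the arrival of a complete codeword plays that role, and the return to the all-zero spacer plays the role of the request going low. The sketch below walks a receiver through a sequence of wire states. The rail ordering `(true_rail, false_rail)` and all function names are assumptions for illustration.

```python
def encode_word(bits):
    """Dual-rail encode a list of bits as (true_rail, false_rail) pairs."""
    return [(1, 0) if b else (0, 1) for b in bits]

def transmit(words):
    """Sender side: each codeword is followed by an all-zero spacer."""
    states = []
    for bits in words:
        states.append(encode_word(bits))
        states.append([(0, 0)] * len(bits))
    return states

def receive(wire_states):
    """Receiver side: return (received words, ack level after each state)."""
    ack = 0
    words, acks = [], []
    for pairs in wire_states:
        if ack == 0 and all(t != f for t, f in pairs):
            words.append([t for t, _ in pairs])   # implicit request seen
            ack = 1
        elif ack == 1 and all(p == (0, 0) for p in pairs):
            ack = 0                               # spacer seen: reset ack
        acks.append(ack)
    return words, acks

words, acks = receive(transmit([[1, 0], [0, 0]]))
print(words)
print(acks)
```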

12 Other Data Representation Styles
• Level-Encoded Dual-Rail (LEDR)
  • 2 wires per bit: “data” and “phase”
  • exactly one wire per bit changes value
    • if the new value is different, the “data” wire changes value
    • else the “phase” wire changes value
• M-of-N codes
  • N wires used for a data word
  • M wires (M <= N) change value
  • values of N and M have an impact on information transmitted, power consumed and logic complexity
• Knuth codes, Huffman codes, ...
[Figure: “data” and “phase” waveforms]
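
The LEDR rule as stated on the slide translates directly into code: compare the new value with the current "data" wire and toggle exactly one of the two wires. For M-of-N codes, the sketch assumes each codeword has exactly M of its N wires active, so there are C(N, M) distinct codewords. The function names and the `(data, phase)` tuple layout are assumptions for this example.

```python
from math import comb

def ledr_step(state, new_value):
    """Advance one LEDR bit: exactly one of (data, phase) changes."""
    data, phase = state
    if new_value != data:
        return (new_value, phase)   # value differs: "data" wire changes
    return (data, phase ^ 1)        # value repeats: "phase" wire changes

def ledr_encode(values, start=(0, 0)):
    """Return the (data, phase) wire state after each transmitted value."""
    state = start
    states = []
    for value in values:
        state = ledr_step(state, value)
        states.append(state)
    return states

def m_of_n_bits(m, n):
    """Whole bits of information carried by one m-of-n codeword."""
    return comb(n, m).bit_length() - 1   # floor(log2(C(n, m)))

print(ledr_encode([1, 1, 0, 0]))
print(m_of_n_bits(1, 4), m_of_n_bits(3, 7))
```

Under that assumption a 1-of-4 code carries 2 bits per symbol, the same payload as two dual-rail pairs on the same four wires, but with one wire active per symbol instead of two.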

13 Which to use?
Depends on several performance parameters:
• speed
  • single-rail vs. dual-rail
    • single-rail may be faster (if designed aggressively)
    • dual-rail may be faster (if completion times vary widely)
  • 2-phase vs. 4-phase
    • 2-phase may be faster (if logic overhead is small)
    • 4-phase may be faster (if overhead of return-to-zero is small)
• power consumption
  • 2-phase typically has fewer gate transitions (→ lower power)
• amount of logic used (#gates/wires/pins → chip area)
  • single-rail needs fewer gates/wires/pins
• design and verification effort
  • dual-rail, 1-of-N, M-of-N, Knuth codes, ...: delay-insensitive, i.e. robust in the presence of arbitrary delays
  • single-rail: requires greater timing verification effort
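
Two of these trade-offs can be put into rough numbers. The wire counts below assume a point-to-point channel where single-rail (bundled) data uses one wire per bit plus separate request and acknowledge wires, and dual-rail uses two wires per bit plus an acknowledge, with the request implicit in the data. The events-per-handshake figures are the ones given on the 2-phase and 4-phase slides. The function names are hypothetical.

```python
def wire_count(data_bits, encoding):
    """Wires in one channel, under the assumptions stated above."""
    if encoding == "single-rail":
        return data_bits + 2        # data wires + request + acknowledge
    if encoding == "dual-rail":
        return 2 * data_bits + 1    # two rails per bit + acknowledge
    raise ValueError("unknown encoding: " + encoding)

def control_events(handshakes, style):
    """Handshake events needed to transfer `handshakes` data items."""
    events_per_handshake = {"2-phase": 2, "4-phase": 4}
    return handshakes * events_per_handshake[style]

print(wire_count(32, "single-rail"), wire_count(32, "dual-rail"))
print(control_events(100, "2-phase"), control_events(100, "4-phase"))
```

These counts say nothing about speed or verification effort, which depend on the circuit-level factors listed on the slide.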