Asynchronous Circuits. Kent Orthner, Wed. March 2nd, 2005. Presentation for: High Speed and Low Power VLSI, Dr. Maitham Shams.

Presentation transcript:

Asynchronous Circuits
Kent Orthner, Wed. March 2nd, 2005
Presentation for: High Speed and Low Power VLSI, Dr. Maitham Shams

For High Speed and Low Power VLSI, Carleton University. Kent Orthner, March 2nd 2005.

Agenda
- What are Asynchronous Circuits?
- Advantages & Disadvantages
- Example Asynchronous Circuit
- GasP
- FPGAs
- Design Project

What are Asynchronous Circuits?

Synchronous Circuits
- Everything is synchronized to a global clock.
  - Clock edges determine the time instants at which data is sampled.
  - Register inputs are sampled at the rising clock edge.
  - Data wires may glitch between clock edges.
- "Worst case" operation:
  - The clock frequency is limited by the speed of the slowest stage.
  - The clock frequency must be slow enough that the circuit will work with worst-case PVT and worst-case data.

[Figure: clocked pipeline driven by Clock; labels 9 ns, 10 ns, 4 ns, 6 ns.]

What are Asynchronous Circuits?

Asynchronous Circuits
- Eliminate the global clock signal.
- States are defined in terms of input values and internal actions.
- Data transfer is synchronized by other means: handshaking, flow control.
- "Average-case" performance: each block goes as fast as it goes.

[Figure: pipeline with Req and Ack handshake signals; labels 9 ns, 10 ns, 5 ns, 7 ns, 4 ns, 6 ns.]
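The worst-case versus average-case argument can be illustrated with a small calculation. The sketch below is an idealized model, not something taken from the slides: it reuses the 9, 10, 4 and 6 ns labels from the figures as stage delays, and the `handshake_overhead` parameter is a hypothetical per-stage cost that defaults to zero.

```python
# Stage delays in nanoseconds, borrowed from the figure labels on the slides.
stage_delays_ns = [9, 10, 4, 6]

def synchronous_latency(delays):
    """Every stage is given one full clock period, set by the slowest stage."""
    clock_period = max(delays)
    return clock_period * len(delays)

def asynchronous_latency(delays, handshake_overhead=0):
    """Each stage passes its result on as soon as it finishes.

    handshake_overhead is a hypothetical per-stage cost of the Req/Ack
    exchange; the default of 0 shows only the idealized case.
    """
    return sum(d + handshake_overhead for d in delays)

sync_ns = synchronous_latency(stage_delays_ns)
async_ns = asynchronous_latency(stage_delays_ns)
print(f"synchronous: {sync_ns} ns, asynchronous (idealized): {async_ns} ns")
```

With a non-zero `handshake_overhead` the gap narrows, which is one reason the comparison has to be measured rather than assumed.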

Micropipelines
- Each data channel is associated with two abstract control signals:
  - Rdy: indicates when the upstream stage has data.
  - Ack: indicates when the downstream stage is finished with the previous data.
- Data moves through a stage when the upstream stage has data available and the downstream stage is ready for new data.
- If no logic processing is being performed, the circuit acts as an elastic FIFO.

[Figure: chain of four C-elements with control signals R in, A in, R1, A1, R2, A2, R3, A3, R out, A out, and a data path from D in to D out.]
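The C-elements in the figure are the state-holding gates that make this control work. The sketch below models a Muller C-element behaviorally and wires it the way micropipeline control is usually drawn, with the downstream acknowledge inverted. The class and function names are illustrative, and the inverted-acknowledge wiring is an assumption about the slide's figure.

```python
class CElement:
    """Muller C-element: the output follows the inputs only when they agree."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        if a == b:          # both inputs agree: copy them to the output
            self.out = a
        return self.out     # inputs disagree: hold the previous output


def stage_control(c_element, req_in, ack_from_next):
    """One control stage (assumed wiring): C(req_in, NOT ack_from_next)."""
    return c_element.update(req_in, 1 - ack_from_next)


c = CElement()
print(stage_control(c, req_in=1, ack_from_next=0))  # request passes: 1
print(stage_control(c, req_in=0, ack_from_next=0))  # inputs disagree, holds: 1
print(stage_control(c, req_in=0, ack_from_next=1))  # both low after inversion: 0
```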

Advantages
- Performance
  - Average case instead of worst case.
- Low power
  - The clock accounts for 30 to 50% of chip dynamic power.
  - Automatic clock gating in asynchronous circuits.
- Escape from metastability
  - No concern about clock crossing: circuits are metastable-safe by design.
- Easier circuit synthesis
  - No clock distribution, no clock skews, no clock buffering tree analysis.
  - No timing-driven placement necessary.
- Technology scaling potential
  - No circuit retiming/re-pipelining necessary.
  - Technology-independent, in some ways.
  - Automatic adaptation to physical properties, PVT.
- Lower EMI
  - Activity in synchronous circuits produces predictable EMI patterns.
- Ease of composition
  - Easier to interface heterogeneous IP cores.
  - No timing assumptions necessary.

Disadvantages
- Vulnerable to circuit hazards & glitches.
- Circuits are larger: more area for control & handshaking logic, encoding scheme, hazard avoidance.
- More difficult & less mature than synchronous designs.
  - Benefits not explored on large-scale VLSI.
  - Synchronous designs are well understood:
    - it's easier to think sequentially than concurrently;
    - they provide a simple way to deal with noise and hazards;
    - they are tolerant to glitches.
- CAD tools
  - Synchronous tools are quite mature.
  - No such established asynchronous tools.

Example Asynchronous Circuit

TOKYO, Japan, February 9, 2005: Epson Develops the World's First Flexible 8-Bit Asynchronous Microprocessor
- Seiko Epson Corp. ("Epson") has announced that it has developed the world's first flexible 8-bit asynchronous microprocessor using low-temperature polysilicon thin-film transistors (LTPS-TFTs) on a plastic substrate.
- With energy consumption reduced by 70% compared to the synchronous microprocessors now in everyday use, Epson is now researching potential applications for its invention.
- Using asynchronous circuit design technology, Epson has been able to:
  1. Make a stable 8-bit microprocessor composed of 32,000 LTPS-TFTs,
  2. Achieve energy consumption 70% lower than the synchronous design,
  3. Reduce electromagnetic radiation by 20 dB.

GasP
- A family of asynchronous circuits that provide controls for:
  - simple pipelines,
  - branching and joining,
  - scatter & gather,
  - join on demand with arbitration.
- In excess of 1.5 G data items / second in 0.35 um.
- A single wire (the state conductor) carries both Ack and Req messages by indicating empty or full.
- Relies on careful choice of transistor widths to equalize delay in logic gates.

GasP Circuit
1. If the upstream state conductor is full (low) and the downstream state conductor is empty (high), b and x both conduct, driving the voltage at (1) low.
2. This causes transistor p to turn on, making the data latch momentarily transparent.

GasP Circuit
3. The low voltage at (2) causes transistor d to turn on, driving the downstream state conductor low (full).
4. This also causes transistor y to turn on, driving the upstream state conductor high (empty).
5. Transistor t turns on, resetting the top of the NAND gate to a high value, causing pass transistor p to turn off.

GasP Circuit
- The propagation of data in the forward direction through the circuit is four gate delays per stage: a → b → c → d.
- The transistors for logic functions must be sized such that the logic functions take no more than four gate delays.
- The propagation of holes in the reverse direction is two gate delays per stage: x → y.
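The firing rule described on the GasP slides (a stage fires when its upstream state conductor is full and its downstream one is empty, then swaps the two) can be sketched as a token-level simulation. This is a behavioral abstraction only; it ignores transistors and real timing, and the function names are illustrative. The gate-delay constants are the per-stage figures quoted above.

```python
FULL, EMPTY = 0, 1          # per the slides: full is low, empty is high

FORWARD_GATE_DELAYS = 4     # data moving forward, per stage
REVERSE_GATE_DELAYS = 2     # holes moving backward, per stage


def step(wires):
    """Fire every stage whose upstream wire is FULL and downstream wire is EMPTY.

    wires[i] is the state conductor upstream of stage i and wires[i + 1] is
    the one downstream of it. Two neighbouring stages can never fire together,
    because they would need their shared wire to be both FULL and EMPTY.
    """
    new = list(wires)
    for i in range(len(wires) - 1):
        if wires[i] == FULL and wires[i + 1] == EMPTY:
            new[i] = EMPTY       # upstream conductor driven high (empty)
            new[i + 1] = FULL    # downstream conductor driven low (full)
    return new


def forward_latency(stages):
    """Gate delays for one data item to cross `stages` empty stages."""
    return stages * FORWARD_GATE_DELAYS


wires = [FULL, FULL, EMPTY, EMPTY]
history = [wires]
for _ in range(3):
    wires = step(wires)
    history.append(wires)

for state in history:
    print(state)
```

Running it shows the two full places migrating to the right-hand end of the chain while the empty places (holes) move left.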

FPGAs
- Commonly built of 4-input look-up tables (LUTs).
  - Effectively a small RAM block with 1 data bit and 16 memory locations.
  - Any logic function with up to 4 inputs can be made from a 4-input LUT. Combinations of LUTs are used to create larger logic functions.
  - The RAM is programmed at configuration time, or during operation.
  - A register for each logic element.
- Connected with a 'sea of programmable interconnect'.
  - SRAM is used to configure it at start-up time.
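The "16 locations by 1 bit" view of a LUT can be made concrete with a short sketch. The function names are illustrative, and the bit ordering (A as the most significant address bit) is an assumption chosen for the example.

```python
def make_lut4(func):
    """Build the 16-entry truth table ("configuration RAM") for a 4-input function."""
    table = []
    for addr in range(16):
        a, b, c, d = (addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
        table.append(int(bool(func(a, b, c, d))))
    return table


def lut4_read(table, a, b, c, d):
    """Evaluating the LUT is a memory read addressed by the four inputs."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


and4 = make_lut4(lambda a, b, c, d: a and b and c and d)
xor2 = make_lut4(lambda a, b, c, d: a ^ b)   # inputs c and d are ignored

print(and4)
print(lut4_read(xor2, 1, 0, 1, 1))
```

Both functions cost exactly one table read, which is the sense in which a simple 2-input XOR takes as long as a complex 4-input function.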

FPGAs
- Almost exclusively synchronous.
  - Frequency is limited by the worst-case path from a register, through one or more lookup tables, through the routing matrix, and into the next register.
  - The delay through a LUT is constant (and worst case!): a 2-input XOR function takes as much time as a complex 4-input function.
  - The path from a register to the next register is very granular: if the logic function has 5 inputs, then the propagation delay is almost doubled over the 4-input case.
- High power.
  - The clock distribution network goes everywhere.
  - Power is consumed to drive logic elements that aren't used for a given design.

Design Project
- 16:1 pipelined multiplexer in four stages, using a GasP pipeline.
  - Essentially a 4-input LUT.
- Compare with an equivalent synchronous design with the same gate sizes.
  - Performance, power & energy per cycle, circuit size.
- SPICE simulations with 0.13 um technology.
  - Using TSMC models from MOSIS.

Example: Out = ABCD

[Figure: 16:1 multiplexer with inputs In 0 to In 14 set to 0 and In 15 set to 1; select lines Sel [ABCD] applied as D-Sel, C-Sel, B-Sel, A-Sel across stages 0 to 3; a Delay element; output Out.]
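The four-stage structure can be sketched as a tree of 2:1 selections, one select line per stage, applied in the order shown in the figure (D first, A last). This is a purely functional model with an illustrative function name; it says nothing about the GasP latches between stages.

```python
def mux_tree(inputs, a, b, c, d):
    """16:1 multiplexer as four stages of 2:1 selections.

    Stage 0 uses D, stage 1 uses C, stage 2 uses B and stage 3 uses A,
    so the selected input is inputs[8*a + 4*b + 2*c + d].
    """
    level = list(inputs)
    for sel in (d, c, b, a):
        # Each stage halves the number of candidates.
        level = [level[i + sel] for i in range(0, len(level), 2)]
    return level[0]


# Example from the slide: Out = ABCD, i.e. only In 15 is 1.
and4_inputs = [0] * 15 + [1]
print(mux_tree(and4_inputs, 1, 1, 1, 1))
print(mux_tree(and4_inputs, 1, 1, 1, 0))
```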

Design Project: Motivation
- The pipeline is shortened when some inputs are not used, leading to reduced propagation delay.
- If GasP latches are at each stage within the LUT, the flip-flop after each LUT is not required.
- The effective operating frequency is now due to the propagation between GasP stages, not LUTs.
- Performance can be further increased by incorporating GasP FIFO stages into the routing network.

Example: Z = AB

[Figure: 16:1 multiplexer with inputs In 0 to In 11 set to 0 and In 12 to In 15 set to 1; select lines Sel [ABCD] applied as D-Sel, C-Sel, B-Sel, A-Sel across stages 0 to 2; a Delay element; output Out.]
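One way to read "the pipeline is shortened when some inputs are not used" is that a stage is only needed if its select line can actually change the output. The sketch below checks that for a given LUT configuration. This interpretation, the helper names, and the bit ordering (A as the most significant select bit) are assumptions made for illustration.

```python
SELECT_BITS = {"D": 0, "C": 1, "B": 2, "A": 3}   # A is the most significant bit


def depends_on(inputs, bit):
    """True if flipping the given select bit can change the selected value."""
    return any(inputs[addr] != inputs[addr ^ (1 << bit)] for addr in range(16))


def needed_stages(inputs):
    """Names of the select stages that cannot be skipped for this configuration."""
    return [name for name, bit in SELECT_BITS.items() if depends_on(inputs, bit)]


z_ab = [0] * 12 + [1] * 4      # slide example Z = AB: In 12 to In 15 are 1
out_abcd = [0] * 15 + [1]      # Out = ABCD: only In 15 is 1

print(needed_stages(z_ab))
print(needed_stages(out_abcd))
```

For Z = AB only the B and A stages matter, so under this reading two of the four stages could be bypassed.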

Tentative Schedule

Milestone                                                 Date
Background Research                                       February
Design & Implementation of GasP & Synchronous Circuits    Early / Mid March
Testing & Result Collection                               Late March
Class Presentation                                        Early April
Prepare Report                                            April



Classification: Timing
- Delay-Insensitive (DI)
  - Designed to operate correctly regardless of the delays on gates & wires. An "unbounded" gate & wire delay model is assumed.
  - The class of simple DI operations built out of basic gates is almost empty. Practical DI circuits can be built with complex components that use timing assumptions within the component. Example: C-element.
- Quasi-Delay-Insensitive (QDI)
  - Same as DI, but with the isochronic fork delay assumption. An isochronic fork is a forked wire where all branches have the same or a bounded delay.
  - The weakest compromise to true DI circuits needed to build practical circuits.
- Speed-Independent (SI)
  - Unbounded delays for gates and "negligible" (optimistic) delays for wires.
- Self-timed
  - The circuit contains a number of elements, where each element may be SI internally.
  - Communication between regions is assumed to be delay-insensitive.

Classification: Signaling
- Control signaling
  - Request/Acknowledge (self-timed) is popular.
  - Four-phase / return-to-zero / level signaling: Req / → Ack / → Req \ → Ack \ : 1 cycle.
  - Two-phase / non-RTZ / transition signaling: Req / → Ack / : 1 cycle. Req \ → Ack \ : 1 cycle.
- Data signaling
  - Bundled data
    - Normal wires, one wire per bit.
    - Use control signals to indicate when data is valid.
  - Dual-rail data
    - 2 wires per bit; the encoding implies data validity.
    - 00 = no data, 01 = 0, 10 = 1, 11 = invalid.
    - Simple acknowledge control wire.
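The dual-rail encoding can be sketched directly from the code table on the slide (00 = no data, 01 = 0, 10 = 1, 11 = invalid). The function names are illustrative, and representing each bit as a two-element tuple read in the same order as the slide's two-digit codes is an assumption.

```python
NO_DATA = (0, 0)


def encode_bit(bit):
    """Dual-rail encode one bit: 0 becomes 01 and 1 becomes 10."""
    return (1, 0) if bit else (0, 1)


def decode_pair(pair):
    """Decode one wire pair; returns None while the pair carries no data."""
    if pair == (0, 1):
        return 0
    if pair == (1, 0):
        return 1
    if pair == NO_DATA:
        return None
    raise ValueError("11 is not a valid dual-rail code")


def word_is_valid(pairs):
    """Completion detection: the word is valid once every pair carries data."""
    return all(pair in ((0, 1), (1, 0)) for pair in pairs)


word = [encode_bit(b) for b in (1, 0, 1)]
print(word, word_is_valid(word))
print(word_is_valid([encode_bit(1), NO_DATA]))
```

Because validity is carried by the data wires themselves, the receiver needs no separate request signal, only the acknowledge wire back to the sender.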