Asynchronous Circuits
Jordi Cortadella, Universitat Politècnica de Catalunya, Barcelona
Collège de France, May 14th, 2013


Goals
Convince ourselves that:
– designing an asynchronous circuit is easy
– synchronous and asynchronous circuits are similar
– asynchronous circuits bring new advantages
Not to discourage designers with exotic and sophisticated asynchronous schemes.

Clocking
Example: Nvidia Kepler™ GK110 (28 nm, 7.1B transistors, 550 mm², 2688 CUDA cores, base clock 836 MHz, memory clock 6 GHz)
– How to distribute the clock?
– How to determine the clock frequency?
– How to implement robust communications?
– How to reduce and manage energy?


Synchronous circuits

Synchronous circuit
[Figure: combinational logic and flip-flops clocked by a PLL]

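Synchronous circuit
[Figure: combinational logic (CL) between flip-flops clocked through a clock tree from a PLL]
Two competing paths: the launching path and the capturing path.
Launching path < Capturing path + Period
$t_{\mathrm{CLKtree}} + t_{\mathrm{CL}} < t_{\mathrm{CLKtree}} + T_{\mathrm{period}}$
$t_{\mathrm{CL}} < T_{\mathrm{period}}$ (no clock skew: the two clock-tree delays are equal and cancel)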
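The launching/capturing inequality can be checked numerically. The sketch below is a simplified model under stated assumptions: the helper names `slack` and `meets_timing` are hypothetical, delays are integers in picoseconds, and flip-flop setup time and clock-to-Q delay are ignored. Unequal clock-tree delays represent clock skew.

```python
def slack(clk_launch: int, t_cl: int, clk_capture: int, period: int) -> int:
    """Capturing path minus launching path, in ps (hypothetical helper).

    launching path = clock-tree delay to the launching flip-flop + CL delay
    capturing path = clock-tree delay to the capturing flip-flop + one period
    Setup time and clock-to-Q delay are ignored in this simplified model.
    """
    launching = clk_launch + t_cl
    capturing = clk_capture + period
    return capturing - launching


def meets_timing(clk_launch: int, t_cl: int, clk_capture: int, period: int) -> bool:
    # Launching path < Capturing path + Period  <=>  positive slack
    return slack(clk_launch, t_cl, clk_capture, period) > 0
```

With no skew (`clk_launch == clk_capture`) the clock-tree terms cancel and the check reduces to `t_cl < period`; a later capturing clock (positive skew) gives the logic slightly more time.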

Source-synchronous
[Figure: clock generator (CLK gen) with a matched delay; launching and capturing paths]
– No global clock required
– More tolerance to PVT (process, voltage, temperature) variations
– Period > longest combinational path
– Good for acyclic pipelines

Source-synchronous with forks and joins
[Figure: clock generator (CLK gen) with forking and joining paths]
How to synchronize incoming events?

C element (Muller 1959)
A  B  | C
0  0  | 0
0  1  | C (unchanged)
1  0  | C (unchanged)
1  1  | 1
One implementation is a majority gate whose output is fed back as its third input, C = MAJ(A, B, C) (many implementations exist).
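The majority-gate implementation can be checked against the truth table in a few lines. This is a behavioural sketch only (the class name `CElement` is hypothetical): it models the feedback C = MAJ(A, B, C) at the logic level and ignores gate delays and hazards.

```python
def majority(a: int, b: int, c: int) -> int:
    """Output is 1 when at least two of the three inputs are 1."""
    return (a & b) | (a & c) | (b & c)


class CElement:
    """Two-input Muller C element modelled as MAJ(A, B, C) with feedback."""

    def __init__(self, initial: int = 0) -> None:
        self.c = initial

    def update(self, a: int, b: int) -> int:
        # When A == B the majority is A (output follows the inputs);
        # when A != B the fed-back C breaks the tie (output holds).
        self.c = majority(a, b, self.c)
        return self.c
```

Usage: starting from 0, the inputs `(1, 0)` leave the output at 0, `(1, 1)` sets it to 1, and `(0, 1)` keeps it at 1 until both inputs return to 0.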

Multi-input C element
[Figure: a tree of two-input C elements combining inputs a1 to a7 into the output c]

Completion detection

[Figure: a clock generator (CLKgen) with a fixed delay]
The fixed delay must be longer than the worst-case logic delay (plus a margin for variability).
Q: could we detect when a computation has completed, as soon as possible (ASAP)?

Delay-insensitive codes: dual rail
Dual rail: every bit A is encoded with two signals, A.t and A.f.
A.t  A.f  | A
0    0    | Spacer (SP)
0    1    | 0
1    0    | 1
1    1    | Not used
Consecutive values are separated by spacers, e.g. 1, SP, 0, SP, 1, SP, 1, SP.
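The encoding table translates directly into code. In this sketch (function names are hypothetical) a dual-rail bit is a `(t, f)` tuple, the spacer is `(0, 0)`, and `None` stands for "no data yet" when decoding a spacer.

```python
from typing import List, Optional, Tuple

DualRail = Tuple[int, int]  # (A.t, A.f)
SPACER: DualRail = (0, 0)


def encode(bit: Optional[int]) -> DualRail:
    """Map 1 -> (1, 0), 0 -> (0, 1), None -> spacer (0, 0)."""
    if bit is None:
        return SPACER
    return (1, 0) if bit else (0, 1)


def decode(rails: DualRail) -> Optional[int]:
    """Return the bit value, or None for a spacer; (1, 1) is not a valid code."""
    t, f = rails
    if (t, f) == SPACER:
        return None
    if t == 1 and f == 1:
        raise ValueError("(1, 1) is not used in dual-rail encoding")
    return t


def with_spacers(bits: List[int]) -> List[DualRail]:
    """Insert a spacer after every value, e.g. 1, SP, 0, SP, ..."""
    stream: List[DualRail] = []
    for b in bits:
        stream.append(encode(b))
        stream.append(SPACER)
    return stream
```

Because every valid code has exactly one rail at 1 and the spacer has none, a receiver can tell "data has arrived" from the wires alone, without knowing any delay.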

Dual-rail AND gate
[Figure: truth table including the spacer (SP), and the gate with inputs A.t, A.f, B.t, B.f and outputs C.t, C.f]
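One simple way to build a dual-rail AND is sketched below. This is an assumption, not necessarily the circuit on the slide: C.t = A.t AND B.t, C.f = A.f OR B.f. Valid inputs give the correct valid output and spacer inputs give a spacer. Note that with this variant the output can already become "false" when only one input has arrived with value 0. Implementations built from C elements avoid this by waiting for both inputs.

```python
from typing import Tuple

DualRail = Tuple[int, int]  # (x.t, x.f); (0, 0) is the spacer


def dr_and(a: DualRail, b: DualRail) -> DualRail:
    """Dual-rail AND (one possible implementation, assumed here).

    True rail:  both inputs true.
    False rail: at least one input false.
    """
    at, af = a
    bt, bf = b
    return (at & bt, af | bf)
```

Usage: `dr_and((1, 0), (0, 1))` returns `(0, 1)`, i.e. 1 AND 0 = 0, and `dr_and((0, 0), (0, 0))` returns the spacer.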

Dual-rail inverter
[Figure: truth table including the spacer (SP); the inverter just crosses the rails, Z.t = A.f and Z.f = A.t]

Dual-rail AND/OR gate
[Figure: the dual-rail AND gate (inputs A.t, A.f, B.t, B.f; outputs C.t, C.f), and the same gate with the .t and .f rails swapped at every input and output, which computes OR]
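Since inversion is only a rail swap, swapping the rails around an AND gate gives an OR gate by De Morgan's law: A OR B = NOT(NOT A AND NOT B). The sketch reuses the simple AND variant assumed above (C.t = A.t·B.t, C.f = A.f + B.f). The function names are hypothetical.

```python
from typing import Tuple

DualRail = Tuple[int, int]  # (x.t, x.f); (0, 0) is the spacer


def dr_not(a: DualRail) -> DualRail:
    """Dual-rail inversion: no gate needed, just cross the two rails."""
    t, f = a
    return (f, t)


def dr_and(a: DualRail, b: DualRail) -> DualRail:
    """Simple dual-rail AND (assumed variant)."""
    return (a[0] & b[0], a[1] | b[1])


def dr_or(a: DualRail, b: DualRail) -> DualRail:
    """Same AND gate with all rails swapped: A OR B = NOT(NOT A AND NOT B)."""
    return dr_not(dr_and(dr_not(a), dr_not(b)))
```

Expanding the swaps gives OR.t = A.t + B.t and OR.f = A.f · B.f, so the spacer still maps to the spacer.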

Dual rail: completion detection
[Figure: dual-rail logic whose outputs feed a completion-detection tree built with C elements, producing the done signal]
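Completion detection can be modelled behaviourally as follows. Each output bit is valid when either rail is 1, and the C-element tree raises done when all bits are valid and lowers it when all have returned to spacer. The sketch models the tree as a single n-input C element (`CompletionDetector` is a hypothetical name). That is equivalent only because, in the spacer/valid alternation, every validity signal changes monotonically within each phase.

```python
from typing import List, Tuple

DualRail = Tuple[int, int]  # (x.t, x.f); (0, 0) is the spacer


class CompletionDetector:
    """Behavioural model of an OR per bit followed by a C-element tree."""

    def __init__(self) -> None:
        self.done = 0

    def update(self, word: List[DualRail]) -> int:
        valid = [t | f for (t, f) in word]  # per-bit OR of the two rails
        if all(valid):
            self.done = 1                   # every bit has a value
        elif not any(valid):
            self.done = 0                   # every bit is back to spacer
        # mixed state: the C elements hold their previous output
        return self.done
```

Usage: a two-bit word going from `[(0, 0), (0, 0)]` through `[(1, 0), (0, 0)]` to `[(1, 0), (0, 1)]` raises done only at the last step, as soon as the slowest bit arrives.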

Dual rail: completion detection
[Figure: a dual-rail network of ANDOR, INV and AND gates; completion detection with C elements drives the clock generator (CLKgen)]

Single rail data vs. dual rail
Some back-of-the-envelope estimations:
                Single rail   Dual rail
Area            1             2
Delay           1             << 1
Static power    1             2
Dynamic power   < 0.2         2
Dual rail: good for speed, but large area and high power consumption.

Handshaking

Handshaking
[Figure: a clock generator (CLKgen) driving logic with unknown delay]
Assume that the source module can provide data at any rate. When should the CLK generator send an event if the internal delays of the circuit are unknown?
Solution: handshaking.

Handshaking
[Figure: a sender ("I have data") and a receiver ("I want data") connected by Data, Request and Acknowledge wires]

Asynchronous elastic pipeline
[Figure: a chain of C elements with ReqIn/AckIn on the input side and ReqOut/AckOut on the output side]
– David Muller's pipeline (late 1950s)
– Sutherland's Micropipelines (Turing Award lecture, 1989)
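The elasticity of a Muller pipeline can be seen in a small simulation. Stage i is a C element with inputs c[i-1] and NOT c[i+1], and every transition counts as one data item (2-phase view). The sketch below is a zero-delay, logic-level model under these assumptions; `simulate` and its parameters are hypothetical. Firing order does not change the final state, because each stage only waits for its neighbours.

```python
from typing import List


def simulate(n_stages: int, n_events: int, sink_ready: bool) -> int:
    """Return how many sender events (Req transitions) the pipeline accepts.

    Stage i: c[i] <- C(left, NOT right), with left = ReqIn or c[i-1] and
    right = AckOut or c[i+1].  AckIn is c[0].
    If sink_ready is False, the receiver never acknowledges (AckOut stays 0).
    """
    c: List[int] = [0] * n_stages
    req = 0
    ack_out = 0
    accepted = 0
    for _ in range(n_events):
        if c[0] != req:              # previous request not yet acknowledged
            break
        req ^= 1                     # sender issues a new data item
        changed = True
        while changed:               # fire enabled C elements until stable
            changed = False
            for i in range(n_stages):
                left = req if i == 0 else c[i - 1]
                right = ack_out if i == n_stages - 1 else c[i + 1]
                if left == 1 - right and c[i] != left:
                    c[i] = left      # both C-element inputs agree: output follows
                    changed = True
            if sink_ready and ack_out != c[-1]:
                ack_out = c[-1]      # receiver acknowledges immediately
                changed = True
        if c[0] == req:              # AckIn has answered this request
            accepted += 1
    return accepted
```

With a stalled receiver the pipeline behaves as a FIFO: an n-stage pipeline absorbs n items (the stage outputs end up alternating 0/1) and then stops acknowledging the sender. With a ready receiver, every item flows through.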

Multiple inputs and outputs
[Figure: pipeline stages with multiple inputs and outputs, including a delay element]

Channel-based communication
A channel contains data and handshake wires (Data, Req, Ack).

Two-phase protocol
– Every edge is active.
– It may require double-edge-triggered flip-flops or pulse generators.
[Figure: Req, Ack and Data waveforms transferring Data 1, Data 2 and Data 3]
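A two-phase transfer can be written as a trace of signal events. In this sketch (`two_phase` is a hypothetical name, and the handshake is modelled sequentially with no timing), each item costs exactly one Req transition and one Ack transition. The edges alternate between rising and falling, which is why both edges must be treated as active.

```python
from typing import List, Tuple

Event = Tuple[str, int]  # (signal name, new value)


def two_phase(items: List[int]) -> Tuple[List[int], List[Event]]:
    """Transfer items with a 2-phase (transition-signalling) handshake."""
    req = ack = 0
    received: List[int] = []
    trace: List[Event] = []
    for item in items:
        data = item             # sender drives Data before signalling
        req ^= 1                # any Req edge, rising or falling, announces data
        trace.append(("Req", req))
        received.append(data)   # receiver latches Data when Req != Ack
        ack ^= 1                # any Ack edge tells the sender it was consumed
        trace.append(("Ack", ack))
    return received, trace
```

Three items produce six transitions: Req and Ack rise for the first item, fall for the second and rise again for the third.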

Four-phase protocol
– Valid data on the active edge of Req.
– Req and Ack must return to zero before the next transfer.
– Different variations of the four-phase protocol exist.
[Figure: Req, Ack and Data waveforms transferring Data 1, Data 2 and Data 3]
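For comparison, here is a sequential sketch of the basic four-phase variant (`four_phase` is a hypothetical name; other variants order the edges differently). Only the rising edge of Req carries data. The return-to-zero of Req and Ack doubles the number of transitions per item, but it lets ordinary level- or single-edge-sensitive logic be used.

```python
from typing import List, Tuple

Event = Tuple[str, int]  # (signal name, new value)


def four_phase(items: List[int]) -> Tuple[List[int], List[Event]]:
    """Transfer items with a basic 4-phase (return-to-zero) handshake."""
    received: List[int] = []
    trace: List[Event] = []
    for item in items:
        data = item
        trace.append(("Req", 1))   # active (rising) edge: Data is valid
        received.append(data)      # receiver latches Data
        trace.append(("Ack", 1))   # receiver acknowledges
        trace.append(("Req", 0))   # return-to-zero phase: no data moves
        trace.append(("Ack", 0))   # channel idle, ready for the next item
    return received, trace
```

Three items now take twelve transitions instead of six, and every transfer starts from Req = Ack = 0.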

How to memorize?
[Figure: combinational logic between latches (L), with a delay element and C elements controlling the latches]
2-phase or 4-phase?

How to memorize? (2-phase)
[Figure: the same latch-based structure, controlled through a pulse generator]

How to memorize? (4-phase)
[Figure: the same latch-based structure with a 4-phase protocol]

Performance analysis

Ring oscillators
[Figure: a network of C elements forming several rings]
– Every ring requires an odd number of inverters.
– The cycle period is determined by the slowest ring.
– The cycle period adapts to the operating conditions (temperature, voltage).
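"Determined by the slowest ring" can be made concrete with a deliberately small model, assumed here rather than taken from the slide. One C element's output feeds several inverting chains (each with an odd number of inverters) that return to its inputs, so each ring is the C element plus one chain. The C element waits for its last input, so every half period equals the slowest ring delay. Both function names are hypothetical.

```python
from typing import List


def ring_period(c_delay: int, chain_delays: List[int]) -> int:
    """Oscillation period: each output transition takes one trip around the
    slowest ring, and a full period needs two transitions (rise and fall)."""
    slowest_ring = max(c_delay + d for d in chain_delays)
    return 2 * slowest_ring


def c_output_transitions(c_delay: int, chain_delays: List[int], n: int) -> List[int]:
    """Times of the first n transitions of the C-element output.

    Initially every input is the inverse of the output, so the C element
    fires after c_delay.  Each chain returns the inverted value after its
    own delay, and the C element fires again once the last one arrives.
    """
    times: List[int] = []
    t = c_delay
    for _ in range(n):
        times.append(t)
        arrivals = [t + d for d in chain_delays]
        t = max(arrivals) + c_delay
    return times
```

With `c_delay = 1` and chains of delay 3 and 5, the output toggles at t = 1, 7, 13, 19, so the period is 12. If temperature or voltage slows every gate, all delays grow and the period follows automatically.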


Why asynchronous?

Modularity
Time-independent functional composability:
– performance may be affected, but not functionality.
[Figure: modules A and B communicating through a Data/Req/Ack channel]

Tracking variability
[Figure: a multi-corner matched delay compared with the critical-path delays at the best, typical and worst corners]
The matched delay has good correlation with the critical paths for:
– Process variability (systematic)
– Global voltage fluctuations
– Temperature
– Aging (partially)

Margins
Rigid clocks: cycle period = gate and wire delays (typ) + margins for P, V, T, aging, PLL jitter and skew.
Elastic clocks: cycle period = gate and wire delays (typ) + margins for P, V, T, aging and skew.
Margin reduction translates into speed-up or power savings.

Clock elasticity
[Figure: with a rigid clock, every cycle period contains the computation time plus wasted time; with an elastic clock, the cycle period follows the computation time]
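The wasted time in the figure can be quantified with simple arithmetic. A rigid clock pays the full worst-case period every cycle. An elastic clock ideally pays only each cycle's own computation time, plus whatever handshake overhead the implementation adds, modelled here as a constant. The function names and the per-cycle times (in ps) are hypothetical illustration values, not measurements.

```python
from typing import List


def rigid_total_time(comp_times: List[int], period: int) -> int:
    """Every cycle lasts one full period, sized for the worst case."""
    if any(t > period for t in comp_times):
        raise ValueError("a computation exceeds the clock period")
    return period * len(comp_times)


def elastic_total_time(comp_times: List[int], overhead: int = 0) -> int:
    """Each cycle lasts as long as its own computation (plus overhead)."""
    return sum(t + overhead for t in comp_times)


# Hypothetical per-cycle computation times in ps
cycles = [600, 1000, 400, 800]
rigid = rigid_total_time(cycles, period=1000)
elastic = elastic_total_time(cycles)
wasted = rigid - elastic
```

Here the rigid clock takes 4000 ps, the ideal elastic clock 2800 ps, and 1200 ps are wasted. A non-zero `overhead` shows how quickly handshake cost eats into that gain.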

Voltage scaling and power savings
[Figure: three ARM926 cores on the same die; savings of 24% and 14%]

Design Automation

Design automation paradigms
– Synthesis of asynchronous controllers: logic synthesis from Petri nets or asynchronous FSMs
– Syntax-directed translation: correct-by-construction composition of handshake components
– De-synchronization: automatic transformation from synchronous to asynchronous

Synthesis of asynchronous controllers
[Figure: a controller with signals DSr, DTACK, LDS, LDTACK and D, specified by a Petri net with the transitions DSr+, LDS+, LDTACK+, D+, DTACK+, DSr-, D-, DTACK-, LDS- and LDTACK-, next to the synthesized circuit]
Example tool: Petrify

Syntax-directed translation
P = (A || B) ; C
[Figure: the handshake circuit for P: a seq component first activates a par component, which runs A and B in parallel, and then activates C]
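The compositional idea can be mimicked in software by giving each construct its own reusable "component". In this sketch, `seq` and `par` are hypothetical Python combinators that model only the control semantics of the seq and par handshake components. Python threads stand in for concurrency, and joining the threads plays the role of the C element that waits for both branches. Handshake wires and the 4-phase protocol are not modelled.

```python
import threading
from typing import Callable, List

Process = Callable[[], None]


def seq(*procs: Process) -> Process:
    """Activate each sub-process only after the previous one has completed."""
    def run() -> None:
        for p in procs:
            p()
    return run


def par(*procs: Process) -> Process:
    """Activate all sub-processes at once; complete when all have completed."""
    def run() -> None:
        threads = [threading.Thread(target=p) for p in procs]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # like a C element: wait for every branch
    return run


log: List[str] = []
lock = threading.Lock()


def action(name: str) -> Process:
    """Hypothetical leaf process that just records its activation."""
    def run() -> None:
        with lock:
            log.append(name)
    return run


# P = (A || B) ; C, built by composing components exactly as the syntax is nested
P = seq(par(action("A"), action("B")), action("C"))
```

Calling `P()` records A and B in either order and always records C last. The structure of the expression, not any timing assumption, guarantees the ordering.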

Syntax-directed translation
P = (A ; B)
[Figure: a seq component activating A and then B]

Syntax-directed translation
c := a + b
[Figure: an adder (+) with operands a and b, writing its result to c]

Syntax-directed translation
    int = type [0..255]
    & gcd : main proc (in? chan <<int, int>> & out! chan int)
    begin x, y : var int
    | forever do
          in?<<x, y>>
        ; do x <> y then
              if x < y then y := y - x else x := x - y fi
          od
        ; out!x
      od
    end
Sources: J. Kessels and A. Peeters. DESCALE: A Design Experiment for a Smart Card Application Consuming Low Energy. In Principles of Asynchronous Circuit Design: A Systems Perspective, eds. J. Sparsø and S. Furber, Kluwer Academic Publishers. P. A. Beerel, R. O. Ozdag and M. Ferretti. A Designer's Guide to Asynchronous VLSI, Cambridge University Press.
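For readers unfamiliar with the handshake-language notation, here is a behavioural Python equivalent of the GCD process. It keeps the same subtract-until-equal loop, takes the `in?` channel as an iterable of pairs, and yields one `out!x` per pair. The function names are hypothetical. The range check mirrors `int = type [0..255]` and also rejects 0, which would make the subtraction loop run forever.

```python
from typing import Iterable, Iterator, Tuple


def gcd_by_subtraction(x: int, y: int) -> int:
    """Euclid's algorithm by repeated subtraction, as in the program above."""
    if not (1 <= x <= 255 and 1 <= y <= 255):
        raise ValueError("inputs must be in 1..255")
    while x != y:          # do x <> y then ... od
        if x < y:
            y = y - x
        else:
            x = x - y
    return x


def gcd_process(inputs: Iterable[Tuple[int, int]]) -> Iterator[int]:
    """forever do in?<<x, y>> ; <loop> ; out!x od, one output per input pair."""
    for x, y in inputs:
        yield gcd_by_subtraction(x, y)
```

In the circuit, each statement becomes a handshake component (sequencer, loop, comparator, subtractors, channel ports), and the data-dependent number of iterations costs no clock margin.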

De-synchronization
Strategy: replace the clock tree with local clocks and handshakes.
– Combinational logic and latches are not modified.
– More tolerance to variability: similar area, less power and/or more speed.
Cortadella, Kondratyev, Lavagno and Sotiriou. Desynchronization: Synthesis of asynchronous circuits from synchronous specifications. IEEE TCAD, Oct.

Synchronous operation
[Figure: a synchronous circuit driven by a clock generator (CLK gen)]
Transforming a synchronous circuit into an asynchronous one (automatically).


De-synchronization
Transforming a synchronous circuit into an asynchronous one (automatically).


Conclusions
Asynchrony offers flexibility in time:
– Modularity
– Dynamic adaptability
– Tolerance to variability
Better optimization of power/performance.
Why isn't it an important trend in circuit design?
– Lack of commercial EDA support (timing sign-off)
– Designers do not feel comfortable with unpredictable timing
– Other aspects: testing, verification, …
De-synchronization might be a viable solution.
