1 Bridging the gap between asynchronous design and designers Thanks to Jordi Cortadella, Luciano Lavagno, Mike Kishinevsky and many others.

Presentation transcript:

1 Bridging the gap between asynchronous design and designers Thanks to Jordi Cortadella, Luciano Lavagno, Mike Kishinevsky and many others

2 Outline 1. Basic concepts on asynchronous circuit design 2. Logic synthesis from concurrent specifications 3. Design automation for asynchronous circuits

3 Basic concepts on asynchronous circuit design

4 Outline: What is an asynchronous circuit? Asynchronous communication; Asynchronous design styles (Micropipelines); Asynchronous logic building blocks; Control specification and implementation; Delay models and classes of async circuits; Why asynchronous circuits?

5 Synchronous circuit [Figure: registers (R) separated by combinational logic (CL), driven by a global clock CLK] Implicit (global) synchronization between blocks. Clock period > Max Delay (CL + R). Time is an independent physical variable (quantity).

6 Asynchronous circuit [Figure: registers (R) separated by combinational logic (CL), linked by Req / Ack signals] Explicit (local) synchronization: Req / Ack handshakes. Time = events + quantity. Time does not exist if nothing happens (Aristotle).

7 Motivation for asynchronous. Asynchronous design is often unavoidable: asynchronous interfaces, arbiters etc. Modern clocking is multi-phase and distributed, and virtually 'asynchronous' (cf. GALS, next slide): mesochronous (clock travels together with data); local (possibly stretchable) clock generation. Robust asynchronous design flow is coming (e.g. VLSI programming from Philips, NCL from Theseus Logic, fine-grain pipelining from Fulcrum).

8 Globally Async Locally Sync (GALS) [Figure: a clocked domain (local CLK, registers R, combinational logic CL) inside an async-to-sync wrapper, connected to the asynchronous world through the handshake pairs Req1/Ack1 to Req4/Ack4]

9 Key Design Differences. Synchronous logic design: proceeds without taking timing correctness (hazards, signal ack-ing etc.) into account; combinational logic and memory latches (registers) are built separately; static timing analysis of CL is sufficient to determine the Max Delay (clock period); fixed set-up and hold conditions for latches.

10 Key Design Differences. Asynchronous logic design: must ensure hazard-freedom, signal ack-ing, local timing constraints; combinational logic and memory latches (registers) are often mixed in "complex gates"; dynamic timing analysis of logic is needed to determine relative delays between paths. To avoid complex issues, circuits may be built as delay-insensitive and/or speed-independent (Muller's theory vs Huffman asynchronous automata).

11 Verification and Testing Differences. Synchronous logic verification and testing: only the functional correctness aspect is verified and tested; testing can be done with standard ATE and at low speed. Asynchronous logic verification and testing: in addition to functional correctness, the temporal aspect is crucial (e.g. causality and order, deadlock-freedom); testing must cover faults in complex gates (logic+memory) and must proceed at normal operation rate; delay fault testing may be needed.

12 Synchronous communication. Clock edges determine the time instants where data must be sampled. Data wires may glitch between clock edges (set-up/hold times must be satisfied). Data are transmitted at a fixed rate (clock frequency).

13 Dual rail. Two wires with L (low) and H (high) per bit: "LL" = "spacer", "LH" = "0", "HL" = "1". n-bit data communication requires 2n wires. Each bit is self-timed. Other delay-insensitive codes exist (e.g. k-of-n), as does event-based signalling (choice criteria: pin and power efficiency).
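
The encoding on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are made up here, and the slide does not say how the unused "HH" pair is treated, so the sketch simply reports it as not valid.

```python
# Dual-rail code from the slide: "LL" = spacer, "LH" = 0, "HL" = 1.
# Function names are illustrative assumptions, not taken from the slides.
SPACER = "LL"
ENCODE = {0: "LH", 1: "HL"}
DECODE = {"LH": 0, "HL": 1}


def encode_word(bits):
    """Encode n bits as n wire pairs, i.e. 2n wires in total."""
    return [ENCODE[b] for b in bits]


def is_valid(word):
    """A word is valid once every pair carries a 0 or a 1 (no spacer, no "HH")."""
    return all(pair in DECODE for pair in word)


def decode_word(word):
    """Return the bits, or None while any pair is still a spacer (or is "HH")."""
    if not is_valid(word):
        return None
    return [DECODE[pair] for pair in word]
```

Because each bit announces its own validity, a receiver can wait for `is_valid` instead of waiting for a clock edge.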

14 Bundled data. Validity signal: similar to an aperiodic local clock. n-bit data communication requires n+1 wires. Data wires may glitch when not valid. Signaling protocols: level sensitive (latch); transition sensitive (register): 2-phase / 4-phase.

15 Example: memory read cycle. Transition signaling, 4-phase. [Waveform: Address (A) carrying a valid address, Data (D) carrying valid data]

16 Example: memory read cycle. Transition signaling, 2-phase. [Waveform: Address (A) carrying a valid address, Data (D) carrying valid data]
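
The difference between the two read cycles can be illustrated by listing the signal events each one produces. The waveforms themselves are not in the transcript, so this sketch assumes that A signals "address valid" and D signals "data valid", and that D always answers A. The function names are hypothetical.

```python
def four_phase_read(reads):
    """4-phase (return-to-zero): each read takes four events and ends with A = D = 0."""
    events = []
    for _ in range(reads):
        events += ["A+", "D+", "A-", "D-"]
    return events


def two_phase_read(reads):
    """2-phase: every transition is an event, so each read takes two events."""
    events, a, d = [], 0, 0
    for _ in range(reads):
        a ^= 1  # any change on A announces a new address
        events.append("A+" if a else "A-")
        d ^= 1  # any change on D announces new data
        events.append("D+" if d else "D-")
    return events
```

Under these assumptions two 2-phase reads produce the same event list as one 4-phase read, which is why 2-phase signaling needs half as many transitions per transfer.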

17 Asynchronous modules. Signaling protocol: reqin+ start+ [computation] done+ reqout+ ackout+ ackin+ reqin- start- [reset] done- reqout- ackout- ackin- (more concurrency is also possible). [Figure: DATA PATH with Data IN / Data OUT and CONTROL with req in, req out, ack in, ack out, linked by start / done]
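
As a rough check of the signaling protocol above, the following sketch replays the event sequence on six wires that all start at 0. It only verifies that every event really changes its wire; it does not model the data path. The helper name `replay` is an assumption of this sketch.

```python
# One full cycle of the module protocol, in the order given on the slide.
CYCLE = ["reqin+", "start+", "done+", "reqout+", "ackout+", "ackin+",
         "reqin-", "start-", "done-", "reqout-", "ackout-", "ackin-"]


def replay(events):
    """Apply events to all-zero wires and return the final wire values.

    Raises ValueError if an event would not change its wire
    (a rise on a wire that is already 1, or a fall on a wire that is 0).
    """
    state = {name[:-1]: 0 for name in CYCLE}
    for ev in events:
        sig, new = ev[:-1], 1 if ev[-1] == "+" else 0
        if state[sig] == new:
            raise ValueError(f"{ev} does not change {sig}")
        state[sig] = new
    return state
```

Replaying the whole cycle returns every wire to 0, the return-to-zero property of a 4-phase handshake, while replaying only the first six events leaves every wire at 1.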

18 Asynchronous latches: C element. Truth table (A B -> Z): 0 0 -> 0; 0 1 -> Z (hold); 1 0 -> Z (hold); 1 1 -> 1. [Figure: static logic implementation between Vdd and Gnd, van Berkel 91]
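
The C-element's behavior can be modelled with the next-state equation given later in these slides, z = ab + zb + za. The sketch below is a zero-delay behavioral model only, not any of the transistor-level implementations, and the helper `run` is a name introduced here for illustration.

```python
def c_element(a, b, z):
    """Next output of a C-element: z = ab + zb + za (majority of a, b and z)."""
    return (a & b) | (z & b) | (z & a)


def run(inputs, z=0):
    """Feed a sequence of (a, b) pairs to the C-element and record the output."""
    outputs = []
    for a, b in inputs:
        z = c_element(a, b, z)
        outputs.append(z)
    return outputs
```

The output rises only when both inputs are 1, falls only when both are 0, and otherwise holds its previous value.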

19 C-element: Other implementations. [Figures: quasi-static (with weak inverter) and dynamic implementations, inputs A and B, output Z, between Vdd and Gnd]

20 Dual-rail logic. Dual-rail AND gate [Figure: inputs A.t, A.f, B.t, B.f; outputs C.t, C.f]. Valid behavior for monotonic environment.
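
The gate drawing is not in the transcript, so the sketch below uses one common way to build a dual-rail AND and should be read as an assumption: C.t = A.t AND B.t and C.f = A.f OR B.f. Each signal is a (t, f) pair, with (0, 0) as the spacer.

```python
# Each dual-rail signal is a (t, f) pair; names are illustrative.
TRUE, FALSE, SPACER = (1, 0), (0, 1), (0, 0)


def dual_rail_and(a, b):
    """Assumed dual-rail AND: C.t = A.t & B.t, C.f = A.f | B.f."""
    (at, af), (bt, bf) = a, b
    return (at & bt, af | bf)
```

With this construction a single false input is enough to produce a false output even while the other input is still a spacer, so the gate only behaves well if the environment changes its inputs monotonically between spacer and valid values.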

21 Completion detection. [Figure: dual-rail logic whose outputs feed a completion detection tree of C-elements producing done]
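
A behavioral sketch of completion detection, under two assumptions that the transcript does not confirm: each bit is checked with an OR of its two rails, and the tree of C-elements is modelled as one n-input C-element. The function names are hypothetical.

```python
def bit_valid(bit):
    """A dual-rail bit (t, f) is valid when exactly one rail is high; OR detects it."""
    t, f = bit
    return t | f


def completion(word, done):
    """Next value of done for a list of dual-rail bits.

    done rises when every bit is valid, falls when every bit is a spacer,
    and otherwise keeps its previous value (C-element behavior).
    """
    valids = [bit_valid(b) for b in word]
    if all(valids):
        return 1
    if not any(valids):
        return 0
    return done
```

Holding the previous value while the word is partly valid is what lets done signal both "all data arrived" and "all wires reset" without glitching in between.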

22 Differential cascode voltage switch logic. 3-input AND/NAND gate [Figure: N-type transistor network with inputs A.t, B.t, C.t, A.f, B.f, C.f, outputs Z.t, Z.f, and start / done signals]

23 Examples of dual-rail design. Asynchronous dual-rail ripple-carry adder (A. Martin, 1991): critical delay is proportional to log N (N = number of bits); 32-bit adder delay (1.6 µm MOSIS CMOS): 11 ns versus 40 ns for synchronous; async cell transistor count = 34 versus synchronous = 28. More recent success stories (modularity and automatic synthesis) of dual-rail logic: Null Convention Logic from Theseus Logic.

24 Bundled-data logic blocks. Conventional logic + matched delay [Figure: single-rail logic in parallel with a delay element from start to done]
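
The matched-delay idea reduces to one inequality: the delay from start to done must be at least the slowest path through the logic, usually with some safety margin. The sketch below checks that condition. The function name, the `margin` parameter and the numbers in the usage note are assumptions for illustration, not values from the slides.

```python
def matched_delay_ok(path_delays_ns, delay_line_ns, margin=0.0):
    """True if the delay line covers the slowest logic path plus a relative margin.

    path_delays_ns: delays of the paths through the single-rail logic
    delay_line_ns:  delay of the element between start and done
    margin:         extra fraction of the worst-case delay (0.1 = 10 %)
    """
    worst_case = max(path_delays_ns)
    return delay_line_ns >= worst_case * (1 + margin)
```

For example, with hypothetical path delays of 1.2 ns and 2.0 ns, a 2.5 ns delay line passes with a 10 % margin while a 2.1 ns one does not.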

25 Micropipelines (Sutherland 89). Micropipeline (2-phase) control blocks: C (Join); Merge; Toggle (in, out 0, out 1); Select (in, sel, outt, outf); Call (r1, r2, r, a, a1, a2); Request-Grant-Done (RGD) Arbiter (r1, r2, g1, g2, d1, d2).

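26 Micropipelines (Sutherland 89). [Figure: latch stages (L) with logic between them, controlled by a chain of C-elements and delay elements; R in and A out on one side, R out and A in on the other]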

27 Data-path / Control. [Figure: latches (L) and logic in the data path; CONTROL block with R in, R out, A in, A out] Synthesis of control is a major challenge.

28 Control specification. [Figure: transitions A+, B+, A-, B- on signals A and B; A is an input, B is an output]

29 Control specification. [Figure: transitions A+, B-, A-, B+ on signals A and B]

30 Control specification. [Figure: transitions A+, C-, A-, C+, B+, B- on signals A, B and C]

31 Control specification. [Figure: transitions A+, C-, A-, C+, B+, B- on signals A, B and C]

32 Control specification: FIFO cntrl. [Figure: FIFO controller with two C-elements and signals Ri, Ro, Ai, Ao; transitions Ri+, Ao+, Ri-, Ao-, Ro+, Ai+, Ro-, Ai-]

33 Gate vs wire delay models. Gate delay model: delays in gates, no delays in wires. Wire delay model: delays in gates and wires.

34 Delay models for async. circuits. Bounded delays (BD): realistic for gates and wires. Technology mapping is easy, verification is difficult. Speed independent (SI): unbounded (pessimistic) delays for gates and "negligible" (optimistic) delays for wires. Technology mapping is more difficult, verification is easy. Delay insensitive (DI): unbounded (pessimistic) delays for gates and wires. The DI class (built out of basic gates) is almost empty. Quasi-delay insensitive (QDI): delay insensitive except for critical wire forks (isochronic forks). In practice it is the same as speed independent. [Class diagram: BD, SI ≈ QDI, DI]

35 Environment models. Slow enough environment = Fundamental mode (inputs change AFTER system has settled). Reactive environment = I/O mode (inputs may change once the first output changes).

36 Correctness of a circuit wrt delay assumptions. C-element: z = ab + zb + za [Figure: gate-level circuit with inputs a, b and output z]

37 Motivation (designer's view). Modularity for system-on-chip design: plug-and-play interconnectivity. Average-case performance: no worst-case delay synchronization. Many interfaces are asynchronous: buses, networks, ...

38 Motivation (technology aspects). Low power: automatic clock gating. Electromagnetic compatibility: no peak currents around clock edges. Security: no 'electro-magnetic difference' between logical '0' and '1' in dual rail code. Robustness: high immunity to technology and environment variations (temperature, power supply, ...).

39 Resistance. Concurrent models for specification: CSP, Petri nets, ...: no more FSMs. Difficult to design: hazards, synchronization. Complex timing analysis: difficult to estimate performance. Difficult to test: no way to stop the clock.

40 But... some success stories: Philips; AMULET microprocessors; Sharp; Intel (RAPPID); start-up companies: Theseus Logic, Fulcrum, Self-Timed Solutions. Recent blurb: It's Time for Clockless Chips, by Claire Tristram (MIT Technology Review, v. 104, no. 8, October 2001: ct01/tristram.asp) ….