1 Bridging the gap between asynchronous design and designers Hao Zheng.

Presentation transcript:

1 Bridging the gap between asynchronous design and designers Hao Zheng

2 Outline What is an asynchronous circuit ? Asynchronous communication Asynchronous design styles (Micropipelines) Asynchronous logic building blocks Control specification and implementation Delay models and classes of async circuits Why asynchronous circuits ?

3 Synchronous circuit RRRRCL CLK Implicit (global) synchronization between blocks Clock period > Max Delay (CL + R) Time is an independent physical variable (quantity)

4 Asynchronous circuit RRRRCL Req Ack Explicit (local) synchronization: Req / Ack handshakes Time = events + quantity Time does not exist if nothing happens (Aristotle)

5 Motivation for Asynchronous Asynchronous design is often unavoidable: asynchronous interfaces, arbiters, etc. Modern clocking is multi-phase and distributed, and virtually asynchronous (cf. GALS, slide 8): mesochronous (clock travels together with data); local (possibly stretchable) clock generation. Robust asynchronous design flows are coming (e.g. VLSI programming from Philips, NCL from Theseus Logic, fine-grain pipelining from Fulcrum)

6 Motivation (Technology Aspects) Low power Automatic clock gating Electromagnetic compatibility No peak currents around clock edges Security No electromagnetic difference between logical 0 and 1 in dual-rail code Robustness High immunity to technology and environment variations (temperature, power supply,...)

7 Motivation (Designer's View) Modularity for system-on-chip design Plug-and-play interconnectivity Average-case performance No worst-case delay synchronization Many interfaces are asynchronous Buses, networks,...

8 Globally Async Locally Sync (GALS) Local CLK RR CL Async-to-sync Wrapper Req1 Req2 Req3 Req4 Ack3 Ack4 Ack2 Ack1 Asynchronous World Clocked Domain

9 Key Design Differences Synchronous logic design: proceeds without taking timing correctness (hazards, signal ack-ing etc.) into account Combinational logic and memory latches (registers) are built separately Static timing analysis of CL is sufficient to determine the Max Delay (clock period) Fixed set-up and hold conditions for latches

10 Key Design Differences Asynchronous logic design: Must ensure hazard-freedom, signal ack-ing, local timing constraints Combinational logic and memory latches (registers) are often mixed in complex gates Dynamic timing analysis of logic is needed to determine relative delays between paths To avoid complex issues, circuits may be built as Delay-insensitive and/or Speed-independent (Muller's theory vs Huffman asynchronous automata)

11 Verification and Testing Differences Synchronous logic verification and testing: Only functional correctness aspect is verified and tested Testing can be done with standard ATE and at low speed Asynchronous logic verification and testing: In addition to functional correctness, temporal aspect is crucial: e.g. causality and order, deadlock-freedom Testing must cover faults in complex gates (logic+memory) and must proceed at normal operation rate Delay fault testing may be needed

12 Synchronous communication Clock edges determine the time instants where data must be sampled Data wires may glitch between clock edges (set-up/hold times must be satisfied) Data are transmitted at a fixed rate (clock frequency)

13 Dual Rail Two wires with L(low) and H (high) per bit LL = spacer, LH = 0, HL = 1 n-bit data communication requires 2n wires Each bit is self-timed Other delay-insensitive codes exist (e.g. k-of-n) and event-based signalling (choice criteria: pin and power efficiency)
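The dual-rail code on this slide can be sketched in a few lines of Python. This is only an illustration: the helper names (`encode_word`, `is_valid`, `decode_word`) are made up here, and the wire pairs are written in the same order as on the slide (LL = spacer, LH = 0, HL = 1).

```python
# Dual-rail code from the slide: two wires per bit.
# LL = spacer, LH = 0, HL = 1. HH is not a legal codeword.
SPACER = ("L", "L")
ENCODE = {0: ("L", "H"), 1: ("H", "L")}
DECODE = {pair: bit for bit, pair in ENCODE.items()}

def encode_word(bits):
    """Encode n bits as n wire pairs, i.e. 2n wires in total."""
    return [ENCODE[b] for b in bits]

def is_spacer(pairs):
    """True when every bit position carries the spacer LL."""
    return all(p == SPACER for p in pairs)

def is_valid(pairs):
    """True when every bit position carries a data codeword (LH or HL)."""
    return all(p in DECODE for p in pairs)

def decode_word(pairs):
    """Decode a complete word; reject words that still contain LL or HH."""
    if not is_valid(pairs):
        raise ValueError("word is not complete (spacer or illegal HH present)")
    return [DECODE[p] for p in pairs]
```

Because each bit announces its own validity, a receiver can wait until `is_valid` holds for the whole word instead of relying on a clock edge.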

14 Bundled Data Validity signal Similar to an aperiodic local clock n-bit data communication requires n+1 wires Data wires may glitch while the validity signal is not asserted. Signaling protocols: level sensitive (latch), transition sensitive (register): 2-phase / 4-phase

15 Example: Memory Read Cycle Transition signaling, 4-phase Valid address Address Valid data Data AA DD

16 Example: Memory Read Cycle Transition signaling, 2-phase Valid address Address Valid data Data AA DD

17 Asynchronous Modules Signaling protocol: reqin+ start+ [computation] done+ reqout+ ackout+ ackin+ reqin- start- [reset] done- reqout- ackout- ackin- (more concurrency is also possible). Figure: a DATA PATH block (Data IN, Data OUT) and a CONTROL block (req in, req out, ack in, ack out) linked by start and done
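The event ordering above can be replayed with a small Python sketch. The function name `replay` and the convention that every wire starts low are assumptions made for this illustration; the sequence itself is copied from the slide.

```python
# One full cycle of the signaling protocol, in the order listed on the slide.
CYCLE = [
    "reqin+", "start+", "done+", "reqout+", "ackout+", "ackin+",
    "reqin-", "start-", "done-", "reqout-", "ackout-", "ackin-",
]

def replay(events, levels=None):
    """Apply signal edges in order and return the resulting wire levels.

    "x+" is a rising edge and "x-" a falling edge of wire x. Wires are
    assumed to start low. An edge that would not change the wire (for
    example "x+" while x is already high) is rejected.
    """
    levels = dict(levels or {})
    for event in events:
        name, edge = event[:-1], event[-1]
        target = 1 if edge == "+" else 0
        if levels.get(name, 0) == target:
            raise ValueError(f"{event}: wire {name} is already at {target}")
        levels[name] = target
    return levels
```

After the first six events every wire is high; after all twelve every wire is low again, so the same cycle can repeat for the next data item.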

18 Asynchronous Latches: C element Truth table (A B : Z): 0 0 : 0; 0 1 : Z (hold); 1 0 : Z (hold); 1 1 : 1. Static Logic Implementation [van Berkel 91] (transistor schematic with Vdd, Gnd, inputs A and B, output Z)

19 C-element: Other Implementations A A B B Gnd Vdd Z A A B B Gnd Vdd Z Weak inverter Quasi-Static Dynamic

20 Dual-Rail Logic A.t A.f B.t B.f C.t C.f Dual-rail AND gate Valid behavior for monotonic environment
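A dual-rail AND gate can be sketched as follows. The slide's schematic is not recoverable from the transcript, so this is one simple realization chosen for illustration, not necessarily the one drawn there. Each signal is a pair `(t, f)` of 0/1 wire levels, with `(0, 0)` as the spacer, `(1, 0)` as true and `(0, 1)` as false; the function name `dual_rail_and` is hypothetical.

```python
SPACER, TRUE, FALSE = (0, 0), (1, 0), (0, 1)

def dual_rail_and(a, b):
    """Dual-rail AND of two (t, f) pairs.

    C.t rises only when both inputs are true; C.f rises as soon as either
    input is false. With spacer inputs the output is the spacer.
    """
    a_t, a_f = a
    b_t, b_f = b
    c_t = a_t & b_t
    c_f = a_f | b_f
    return (c_t, c_f)
```

With this realization a single false input already produces a false output while the other input is still the spacer. The output rails only change monotonically if the inputs do, which matches the slide's remark that the behavior is valid for a monotonic environment.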

21 Completion Detection Dual-rail logic C done Completion detection tree
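Completion detection can be sketched in Python as below. As a simplification, the tree of C-elements is modelled by a single n-input C-element, and each bit's validity is taken as the OR of its two rails; the names `CElement` and `completion` are made up for this example.

```python
class CElement:
    """n-input Muller C-element: rises when all inputs are 1,
    falls when all inputs are 0, otherwise holds its value."""

    def __init__(self):
        self.z = 0

    def update(self, *inputs):
        if all(inputs):
            self.z = 1
        elif not any(inputs):
            self.z = 0
        return self.z

def completion(word, c):
    """Return 'done' for a dual-rail word given as a list of (t, f) pairs.

    A bit is valid when exactly one of its rails is high, so t OR f is
    enough here (both rails high is assumed never to occur).
    """
    valid_bits = [t | f for t, f in word]
    return c.update(*valid_bits)
```

`done` rises only once every bit carries data and falls only once every bit has returned to the spacer, so it brackets both phases of the dual-rail protocol.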

22 Differential Cascode Voltage Switch Logic start A.t B.t C.t A.fB.f C.f Z.tZ.f done 3-input AND/NAND gate N-type transistor network

23 Examples of Dual-Rail Design Asynchronous dual-rail ripple-carry adder (A. Martin, 1991): critical delay is proportional to log N (N = number of bits); 32-bit adder delay (1.6 µm MOSIS CMOS): 11 ns versus 40 ns for synchronous; async cell transistor count = 34 versus synchronous = 28. More recent success stories (modularity and automatic synthesis) of dual-rail logic come from Null Convention Logic from Theseus Logic

24 Bundled-Data Logic Blocks Single-rail logic delay startdone Conventional logic + matched delay

25 Micropipelines (Sutherland 89) C Join Merge Toggle r1 r2 g1 g2 d1 d2 Request- Grant-Done (RGD)Arbiter Call r1 r2 r a a1 a2 Select in outf outt sel in out 0 out 1 Micropipeline (2-phase) control blocks

26 Micropipelines (Sutherland 89) LLLLlogic R in A out C C CC R out A in delay
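The control part of such a pipeline can be sketched as a chain of C-elements. This is a simplified model under stated assumptions: each C-element sees the output of its left neighbour and the inverted output of its right neighbour, the matched delays and latches are left out, and the names `c_element` and `settle` are hypothetical.

```python
def c_element(a, b, z):
    """Muller C-element: follow the inputs when they agree, else hold z."""
    return a if a == b else z

def settle(stages, r_in, a_in):
    """Fire enabled C-elements until no output can change.

    stages[i] is the output of the i-th C-element. Element i sees the
    output of element i-1 (r_in for the first one) and the inverted output
    of element i+1 (the inverted a_in for the last one).
    """
    stages = list(stages)
    changed = True
    while changed:
        changed = False
        for i in range(len(stages)):
            left = r_in if i == 0 else stages[i - 1]
            right = a_in if i == len(stages) - 1 else stages[i + 1]
            new = c_element(left, 1 - right, stages[i])
            if new != stages[i]:
                stages[i] = new
                changed = True
    return stages
```

In this 2-phase reading every transition is an event. Starting from `[0, 0, 0]`, raising `r_in` sends one event through all three stages; toggling `r_in` again while `a_in` is unchanged lets the second event advance only as far as the stage behind the first, which is how the chain queues items.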

27 DataPath / Control LLLLlogic R in R out CONTROL A in A out Synthesis of control is a major challenge

28 Control specification A+ B+ A- B- A B A input B output

29 Control specification A+ B- A- B+ A B

30 Control specification A+ C- A- C+ A C B+ B- B C

31 Control specification A+ C- A- C+ A C B+ B- B C

32 Control Specification C C Ri Ro Ai Ao Ri+ Ao+ Ri- Ao- Ro+ Ai+ Ro- Ai- Ri Ro Ao Ai FIFO cntrl

33 Gate vs Wire delay models Gate delay model: delays in gates, no delays in wires Wire delay model: delays in gates and wires

34 Delay Models for Async. Circuits Bounded delays (BD): realistic for gates and wires. Technology mapping is easy, verification is difficult. Speed independent (SI): Unbounded (pessimistic) delays for gates and negligible (optimistic) delays for wires. Technology mapping is more difficult, verification is easy. Delay insensitive (DI): Unbounded (pessimistic) delays for gates and wires. DI class (built out of basic gates) is almost empty. Quasi-delay insensitive (QDI): Delay insensitive except for critical wire forks (isochronic forks). In practice it is the same as speed independent. (Figure: the classes BD, SI, QDI, DI.)

35 Environment models Slow enough environment = Fundamental mode (Inputs change AFTER system has settled) Reactive environment = I/O mode (Inputs may change once the first output changes)

36 Correctness of a Circuit wrt Delay Assumptions C-element: z = ab + zb + za (figure: inputs a, b and output z)
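The C-element equation on this slide can be checked directly. The sketch below evaluates the next-state function z = ab + zb + za for 0/1 values; the function name `c_next` is chosen only for this illustration.

```python
def c_next(a, b, z):
    """Next value of the C-element output: z = ab + zb + za."""
    return (a & b) | (z & b) | (z & a)

# Enumerate the full behavior: (a, b, current z) -> next z.
TABLE = {
    (a, b, z): c_next(a, b, z)
    for a in (0, 1)
    for b in (0, 1)
    for z in (0, 1)
}
```

The table shows the expected behavior: the output becomes 1 when both inputs are 1, becomes 0 when both are 0, and keeps its current value when the inputs differ. The equation treats the feedback of z as instantaneous, so whether a gate-level realization stays hazard-free depends on the delay assumptions made for its gates and wires.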

37 Resistance Concurrent models for specification CSP, Petri nets,...: no more FSMs Difficult to design Hazards, synchronization Complex timing analysis Difficult to estimate performance Difficult to test No way to stop the clock

38 But... some success stories Philips; AMULET microprocessors; Sharp; Intel (RAPPID); Start-up companies: Theseus Logic, Fulcrum, Self-Timed Solutions. Recent blurb: It's Time for Clockless Chips, by Claire Tristram (MIT Technology Review, v. 104, no. 8, October 2001: ct01/tristram.asp) ….