Fall 2009 / Winter 2010 Ran Ginosar (www.ee.technion.ac.il/~ran)

VLSI Architectures
© 2002-2009 Ran Ginosar Asynchronous Design and Synchronization

Topics
- Asynchronous VLSI design
- SoC (System on Chip) global timing, clocking, synchronization
- Many-core parallel processors on chip

Sources
- Sparsø and Furber, Principles of Asynchronous Circuit Design, Ch. 1-12. Free copy: http://www.ee.technion.ac.il/courses/048878/book.pdf
- Dally and Poulton, Digital Systems Engineering, Ch. 9-10
- Journal / conference papers
- Slides will be posted on the web. No paper copies.

Requirements
- Attendance (roll call)
- Readings
- Homework
- Final project

First Assignment
- Read Chapters 1-2 in Sparsø & Furber
- Read David, Ginosar & Yoeli, “An Efficient Implementation of Boolean Functions as Self-Timed Circuits,” http://www.ee.technion.ac.il/~ran/papers/C-41-1-David-Ginosar-Yoeli-1992-STCL.pdf
  You may skip the mathematical proofs; focus on the logic design.
- Assignment #1 (due by 8 Nov 2009):
  - Using the method described in the paper, design a three-input XOR gate.
  - Simulate it (by "hand" or with a logic simulator), showing inputs and outputs for all eight combinations of the input bits. Note that it is a dual-rail circuit, so there should be empty (null, undefined) values as well as valid (data, defined) values.
  - Look for the DIMS implementation method in the book, and re-implement the circuit using DIMS.
  - Compare the two implementations on number of gates (area), power, speed, leakage, and ease of design.

What’s the problem?
An example SoC: 12 clock domains
- 54 Mbps 802.11
- 1 Mbps Bluetooth
- 100 Mbps Ethernet
- 133 MHz CPU
- 12 Mbps USB
- 384 Kbps 3G
- 75 MHz DSP
- 20 MHz Flash Memory
- 50 MHz Memory
- 66 MHz PCI
- 1 MHz CF

Another, related challenge: Mesochronous SoC

Yet Another Challenge: DVS
[Figure: modules running at different operating points (50 MHz at 1.1 V, 100 MHz at 1.2 V, 200 MHz at 1.3 V) exchanging bit streams]

Asynchronous alternatives in SoC
- Complete ASYNC chip: no clocks
- ASYNC modules among synchronous (clocked) modules
  - A.k.a. mixed-mode or mixed-timing
- Mutually-asynchronous SYNC modules
  - Modules are clocked with different clocks
  - The interfaces are asynchronous
  - A.k.a. multi-clock domains (MCD)

Why Asynchronous Circuits?
- We are used to sync design: logic and timing aspects are simpler (why?)
- Common arguments:
  - Low power (works)
  - High speed (very hard, but works too)
  - Low emission (works)
  - Low sensitivity to PVT (Process, Voltage, Temperature) variations (works)
  - High modularity (SoC)
  - No clock distribution and timing problems (works)
  - Secure chips (kind of)
- Cannot achieve all the above at the same time…

Why Not to Go Async
- Overhead (area, speed, power)
- Hard to design
- Non-decomposable into small combinational logic blocks
- Converting sync to async is hard / does not achieve the results
- You have to learn something new!
- Few CAD tools

Why do we care about Async?
- We have to. Sync is only a nice model for small worlds.
- Async realities:
  - On-chip clock domain interfaces
  - Off-chip communication timing
  - Sync techniques get ridiculously complex due to ignorance of Async methods
  - Modular SoC

Clocking replaced by Handshaking
[Figure; label: CLK]

Clocking replaced by Handshaking
[Figure labels: CTL, CL, REQ, ACK, DATA, link / channel, token flow; CL is transparent to handshaking; example with 4 CL blocks]

Token Flow
- Transfer of one token corresponds to one handshake cycle
- Register k is FULL when it has data
- When register k+1 gets the data from k:
  - Register k+1 becomes FULL
  - Register k now has a BUBBLE (data that has already been copied)
- A FULL register cannot receive data. Only a BUBBLE register may receive data.
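
The FULL / BUBBLE rule above can be played as a small game. The following is a minimal sketch, not taken from the slides: the function name `step` and the use of `None` to mark a BUBBLE are assumptions made for illustration, and all handshakes of one round are decided from the same starting state.

```python
from typing import List, Optional

def step(regs: List[Optional[str]]) -> List[Optional[str]]:
    """One round of handshakes in a linear pipeline (hypothetical model).

    regs[k] holds a token (FULL) or None (BUBBLE).
    A token moves from register k to k+1 only if k+1 held a BUBBLE
    at the start of the round.
    """
    nxt = list(regs)
    for k in range(len(regs) - 1):
        if regs[k] is not None and regs[k + 1] is None:
            nxt[k + 1] = regs[k]   # register k+1 becomes FULL
            nxt[k] = None          # register k now holds a BUBBLE
    return nxt

if __name__ == "__main__":
    state = ["A", "B", None, None]
    for _ in range(3):
        print(state)
        state = step(state)
```

Tokens are neither created nor lost by `step`, and a pipeline with no BUBBLE does not move at all.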

Token “Preservation”
- Tokens do not disappear
- Tokens do not appear (from nowhere)
- One token does not overtake another
- A block (register or CL) with n inputs and m outputs:
  - waits (when it has a BUBBLE) for n tokens on its inputs
  - generates m tokens on its outputs

Comments on the Tokens Game
- Abstract all communications (handshake) and computations
- Hide implementation details
- CL is transparent: it does NOT store tokens; they only pass through
- Special type of CL required: “Function Blocks”
- Local “clocks” spread over time
- Lower power
- Lower emissions
- No need to synchronize events
- Before playing more token games, let’s consider some implementation details

Handshake Protocols
Bundled data (aka “single rail”), 4-phase protocol
- Push channel: DATA and REQ travel in the same direction
- n DATA wires, plus REQ and ACK
- 4-phase protocol: always like this, with some variations
[Figure: push channel with REQ, ACK and n-bit DATA; 4-phase timing diagram]
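
The order of events in one 4-phase push transfer can be written out as a trace. This is a sketch under the usual reading of the protocol (data valid, REQ up, ACK up, REQ down, ACK down); the function name `four_phase_push` and the tuple format of the trace are assumptions for illustration.

```python
def four_phase_push(words):
    """Return (trace, received) for a sender pushing `words` over a
    4-phase bundled-data channel (hypothetical event-level model)."""
    req = ack = 0
    trace, received = [], []
    for w in words:
        data = w
        trace.append(("DATA", w))   # data must be valid before REQ rises
        req = 1
        trace.append(("REQ", 1))    # sender: data is valid
        received.append(data)       # receiver latches the data
        ack = 1
        trace.append(("ACK", 1))    # receiver: data accepted
        req = 0
        trace.append(("REQ", 0))    # return to zero, first half
        ack = 0
        trace.append(("ACK", 0))    # return to zero, channel idle again
    assert (req, ack) == (0, 0)
    return trace, received

if __name__ == "__main__":
    events, got = four_phase_push([5, 7])
    for e in events:
        print(e)
```

Each word costs four control-wire transitions, two of which only restore the idle state.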

Handshake Protocols
Bundled data (aka “single rail”), 2-phase protocol
- Push channel: DATA and REQ travel in the same direction
- n DATA wires, plus REQ and ACK
[Figure: push channel with REQ, ACK and n-bit DATA; 2-phase timing diagram of DATA, REQ and ACK]

Bundling Assumption
- Each data line is a single wire: “bundled data”, aka “single rail”
- On the sender side, time(DATA) < time(REQ)
- This order is preserved at the receiving end: Valid(DATA) → REQ [data valid precedes REQ=1]
- Non-trivial: inter-line skew must be taken care of and hidden
  - Placement and routing
  - Safety margins at the sending end
  - Buffer insertion
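
The bundling constraint can be checked with simple arithmetic on wire delays. The sketch below assumes a simplified model that is not from the slides: all DATA bits are launched at time 0, REQ is launched `sender_margin` later, and each wire has a fixed delay. The function names are hypothetical.

```python
def bundling_ok(data_wire_delays, req_wire_delay, sender_margin):
    """True if every data bit is valid at the receiver before REQ arrives."""
    data_arrival = max(data_wire_delays)          # slowest data bit
    req_arrival = sender_margin + req_wire_delay  # REQ launched later
    return data_arrival < req_arrival

def required_margin(data_wire_delays, req_wire_delay):
    """Skew the sender margin has to exceed (0 if REQ is already slowest)."""
    return max(0.0, max(data_wire_delays) - req_wire_delay)

if __name__ == "__main__":
    delays = [1.0, 1.4, 1.2]   # made-up delays, arbitrary time units
    print(bundling_ok(delays, req_wire_delay=1.0, sender_margin=0.5))
    print(required_margin(delays, req_wire_delay=1.0))
```

In this model, buffer insertion on REQ corresponds to raising `req_wire_delay`, and a safety margin at the sending end corresponds to raising `sender_margin`.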

4-phase vs 2-phase
- 4-phase: “return to zero” (RZ) is overhead (time and power); “level signaling”
- 2-phase: “non-return to zero” (NRZ) seems to have lower overhead; “transition signaling”
- But the 2-phase implementation is more complex
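
The return-to-zero overhead can be made concrete by counting REQ/ACK transitions per transfer. This is an illustrative sketch only; the function names are assumptions, and data wires are ignored.

```python
def four_phase_transitions(n_transfers: int) -> int:
    """Control-wire transitions with level (RZ) signaling."""
    req = ack = 0
    count = 0
    for _ in range(n_transfers):
        for _ in range(2):      # active half, then return-to-zero half
            req ^= 1
            count += 1
            ack ^= 1
            count += 1
        assert (req, ack) == (0, 0)   # channel idle after every transfer
    return count

def two_phase_transitions(n_transfers: int) -> int:
    """Control-wire transitions with transition (NRZ) signaling."""
    req = ack = 0
    count = 0
    for _ in range(n_transfers):
        req ^= 1                # any edge on REQ is a request
        count += 1
        ack ^= 1                # any edge on ACK is an acknowledge
        count += 1
        assert req == ack       # idle means REQ and ACK are equal
    return count

if __name__ == "__main__":
    print(four_phase_transitions(3), two_phase_transitions(3))
```

The count is halved in 2-phase, but the receiver has to react to both rising and falling edges, which is where the extra implementation complexity comes from.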

4-phase dual rail protocol
Each data bit is encoded into 2 wires:

  VALUE       d.t  d.f
  EMPTY        0    0
  VALID “0”    0    1
  VALID “1”    1    0
  Not used     1    1

- Push channel: 2n DATA wires, plus ACK
- No REQ line, but this is how it would look if we had one
[Figure: timing diagram with DATA alternating EMPTY / VALID / EMPTY / VALID ... and ACK]
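
The encoding table above translates directly into code. In this sketch a bit is a pair `(d.t, d.f)`; the function names and the `"PARTIAL"` label for a word that is only partly valid are assumptions for illustration.

```python
EMPTY = (0, 0)

def encode_bit(b: int):
    """Dual-rail codeword (d.t, d.f) for a valid bit."""
    return (1, 0) if b else (0, 1)

def decode_bit(pair):
    """Return 0 or 1 for a valid codeword, None for EMPTY."""
    if pair == (0, 0):
        return None
    if pair == (1, 1):
        raise ValueError("codeword 1 1 is not used")
    return pair[0]              # d.t carries the value

def word_state(pairs):
    """'VALID' if every bit is valid, 'EMPTY' if every bit is empty,
    otherwise 'PARTIAL' (the word is still in transition)."""
    valid = [decode_bit(p) is not None for p in pairs]
    if all(valid):
        return "VALID"
    if not any(valid):
        return "EMPTY"
    return "PARTIAL"

if __name__ == "__main__":
    word = [encode_bit(b) for b in (1, 0, 1)]
    print(word, word_state(word))
```

A receiver that waits for `word_state` to report `VALID` needs no REQ wire: validity is embedded in the data itself.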

4-phase dual rail protocol
- Delay Insensitive (DI): each bit can propagate at its own speed
- 4 phases at a higher level (than signals):
  1. Sender sends valid word (V)
  2. Receiver sets ACK
  3. Sender sends empty word (E) (removes the data)
  4. Receiver resets ACK
- Each change is acknowledged / indicated
- Problems: glitches, hazards

Bundled DataDual Rail © 2002-2009 Ran Ginosar Asynchronous Design and Synchronization

Muller C-Element

  a  b  z
  0  0  0
  0  1  no change
  1  0  no change
  1  1  1

Alternative specs:
- If a=b then z:=a
- a=b → z:=a
- z := ab + z(a+b)
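
The equation z := ab + z(a+b) can be checked against the truth table with a few lines of code. A minimal sketch; the function name `c_element` is an assumption, and `z` is the previous output passed in explicitly.

```python
def c_element(a: int, b: int, z: int) -> int:
    """Next output of a Muller C-element: z := ab + z(a+b)."""
    return (a & b) | (z & (a | b))

if __name__ == "__main__":
    for z in (0, 1):
        for a in (0, 1):
            for b in (0, 1):
                print(f"a={a} b={b} z_prev={z} -> z={c_element(a, b, z)}")
```

When the inputs agree the output follows them, and when they differ the output holds its previous value.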

1-of-4 Signaling
Each 2 bits take 4 wires:

  Value  Code
  00     1000
  01     0100
  10     0010
  11     0001
  Null   0000

- Still 2x wires
- Still no bundling assumption needed
- Half as many transitions (half power)
- Less noise sensitive
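
The code table above, and the "half as many transitions" claim, can be sketched as follows. The function names are assumptions for illustration; codewords are written as 4-character strings in the same wire order as the table, and the dual-rail helper uses (d.t, d.f) per bit.

```python
NULL_1OF4 = "0000"

def encode_1of4(two_bits: int) -> str:
    """1-of-4 codeword for a 2-bit value, e.g. 0b10 -> '0010'."""
    if not 0 <= two_bits <= 3:
        raise ValueError("expected a 2-bit value")
    return "".join("1" if i == two_bits else "0" for i in range(4))

def decode_1of4(code: str):
    """Return the 2-bit value, or None for the Null codeword."""
    if code == NULL_1OF4:
        return None
    if len(code) != 4 or code.count("1") != 1:
        raise ValueError("not a 1-of-4 codeword")
    return code.index("1")

def dual_rail_2bits(two_bits: int) -> str:
    """The same 2 bits in dual rail: (d.t, d.f) for the MSB, then the LSB."""
    bits = [(two_bits >> 1) & 1, two_bits & 1]
    return "".join("10" if b else "01" for b in bits)

if __name__ == "__main__":
    for v in range(4):
        print(format(v, "02b"), encode_1of4(v), dual_rail_2bits(v))
```

Both encodings use 4 wires for 2 bits, but a valid 1-of-4 codeword raises one wire where dual rail raises two, so each valid/null cycle toggles half as many wires.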

Asynchronous Design and Synchronization Bundled Data1-of-4 © 2002-2009 Ran Ginosar Asynchronous Design and Synchronization

DS: 1-of-2 (2-phase) Signaling (dual-rail)
- Each bit on two wires
  - One wire (D) is the data value (0, 1)
  - The other wire (S) is a “strobe”, helps with phase
- To change from one value to the next:
  - If different value, toggle D
  - If same value, toggle S
- Each bit alternates valid/valid/…; no NULL values
- Potentially faster than 4-phase dual-rail
- Interesting, but rarely employed
[Figure: the four DS codes 00, 01, 10, 11, grouped into Even and Odd phases]
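
The toggle rule above can be sketched as a tiny encoder. The function name `ds_encode` and the starting wire state D=0, S=0 are assumptions for illustration; the slides do not state an initial state.

```python
def ds_encode(bits, d=0, s=0):
    """Encode a bit sequence on wires (D, S): toggle D if the value
    changes, otherwise toggle S. Returns the wire state after each bit."""
    out = []
    for b in bits:
        if b != d:
            d ^= 1      # different value: toggle the data wire
        else:
            s ^= 1      # same value: toggle the strobe wire
        out.append((d, s))
    return out

def phase(state):
    """Parity of (D, S); it flips on every new bit."""
    d, s = state
    return d ^ s

if __name__ == "__main__":
    for st in ds_encode([1, 1, 0, 0]):
        print(st, "phase", phase(st))
```

Exactly one wire toggles per bit, so the receiver detects a new bit from the change of phase and reads its value from D, with no NULL codeword in between.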

Classification of Protocols
- Handshake / signaling: 2-phase or 4-phase
- Direction: push or pull
- Encoding: bundled data (single rail), or dual rail (1-of-2), or 1-of-n (e.g. 1-of-4), or m-of-n, …

Acknowledgement / Indication
- A gate / circuit acknowledges its input if, for every input change, there is an output change. Example: wire
- Non-indicating example: AND gate
  - Acknowledges all ones: {01,10}→11
  - Does not acknowledge 00→{01,10}
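
The definition above can be tested mechanically: an input transition is acknowledged if the output differs before and after it. A minimal sketch; `acknowledges`, `and_gate` and `wire` are hypothetical names.

```python
def acknowledges(gate, before, after) -> bool:
    """True if the input change before -> after is visible on the output."""
    return gate(*before) != gate(*after)

def and_gate(a, b):
    return a & b

def wire(a):
    return a

if __name__ == "__main__":
    print(acknowledges(and_gate, (0, 1), (1, 1)))   # {01,10} -> 11
    print(acknowledges(and_gate, (0, 0), (0, 1)))   # 00 -> {01,10}
    print(acknowledges(wire, (0,), (1,)))
```

An observer of the AND output alone cannot tell whether the 00 → 01 transition has happened yet, which is why the AND gate is the non-indicating example.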

Muller Pipeline
- “The” delay-insensitive handshake machine
- C[i] accepts 1/0 from C[i-1] only if C[i+1]=0/1
- Think of 1010101.. as waves: 10 10 10 1..
- The C-elements propagate waves precisely
- Timing depends on local delays and may vary along the pipe
- If RIGHT is quiet, the pipe will fill (1010101…) and stall
- Same for 4-phase and 2-phase
- Symmetric: same right-to-left (like electrons and holes)
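
The fill-and-stall behaviour can be reproduced with a small simulation. This is a sketch under stated assumptions, not the lecture's circuit: C[i] takes C[i-1] and the inverse of C[i+1] as inputs, the left environment always offers the next handshake (its request is the inverse of C[0]), the right environment never answers (`right_ack` stays 0), and gates are evaluated one at a time in left-to-right sweeps, whereas the real circuit is concurrent.

```python
def c_element(a: int, b: int, z: int) -> int:
    return (a & b) | (z & (a | b))

def sweep(c, right_ack):
    """Evaluate every C-element once, left to right, updating in place."""
    c = list(c)
    n = len(c)
    for i in range(n):
        a = (1 - c[0]) if i == 0 else c[i - 1]            # request from the left
        b = 1 - (right_ack if i == n - 1 else c[i + 1])   # inverted ack from the right
        c[i] = c_element(a, b, c[i])
    return c

def run_until_stable(n, right_ack=0, max_sweeps=100):
    """Start from an empty pipe and sweep until nothing changes."""
    c = [0] * n
    for _ in range(max_sweeps):
        nxt = sweep(c, right_ack)
        if nxt == c:
            return c
        c = nxt
    raise RuntimeError("pipeline did not settle")

if __name__ == "__main__":
    print(run_until_stable(4))
```

With a quiet right side the pipe settles into an alternating pattern and stops changing, which is the filled-and-stalled state described above.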

Pipeline Styles
All based on the Muller pipeline:
- 4-phase bundled data: similar to sync pipes, based on timing assumptions
- 2-phase bundled data: aka micropipeline
- 4-phase dual rail: “the original” Muller pipe

4-phase bundled data circuits

4-phase bundled data circuits
- Looks like a sync pipe, with local clocks
- When full, the C-elements are 1010101…, so only half the latches store data
- Similar to master-slave flip-flops
- Speed limited by handshake (2-way comm)
- We will study better implementations

2-phase bundled data (micropipelines)
- Transition signaling
- Special “capture-pass” latches alternate between capture and pass

Capture-Pass transition-controlled latch
- Transitions on C and P alternate
- Micropipelines: “elegant”, no RZ overhead
- But implementation (latches and other control circuits) is complex

4-phase dual rail circuits
- Muller pipeline (again) with Completion Detection
- No REQ: it is embedded in the data

4-phase dual rail: many bits

4-phase dual rail: function blocks
- DIMS: Delay Insensitive Minterm Synthesis
- Another example for home assignment
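
As an illustration of the DIMS idea, here is a sketch of a dual-rail 2-input AND built from one C-element per input minterm, with the minterm outputs ORed onto the true or false output rail. The class name `DimsAnd`, the choice of AND as the example function, and the `(t, f)` tuple format are assumptions for illustration.

```python
def c_element(a: int, b: int, z: int) -> int:
    return (a & b) | (z & (a | b))

class DimsAnd:
    """Dual-rail AND, DIMS style: 4 C-elements (one per minterm) + OR gates."""

    def __init__(self):
        # state of the C-element for minterm (value of a, value of b)
        self.m = {(x, y): 0 for x in (0, 1) for y in (0, 1)}

    def apply(self, a, b):
        """a and b are dual-rail pairs (t, f); returns (y.t, y.f)."""
        rail_a = {1: a[0], 0: a[1]}   # which wire is high for value 1 / value 0
        rail_b = {1: b[0], 0: b[1]}
        for x, y in list(self.m):
            self.m[(x, y)] = c_element(rail_a[x], rail_b[y], self.m[(x, y)])
        y_t = self.m[(1, 1)]                                  # AND is 1 only for 11
        y_f = self.m[(0, 0)] | self.m[(0, 1)] | self.m[(1, 0)]
        return (y_t, y_f)

if __name__ == "__main__":
    EMPTY, ZERO, ONE = (0, 0), (0, 1), (1, 0)
    g = DimsAnd()
    for a, b in [(EMPTY, EMPTY), (ONE, EMPTY), (ONE, ZERO), (ONE, EMPTY), (EMPTY, EMPTY)]:
        print(a, b, "->", g.apply(a, b))
```

The output becomes valid only after both inputs are valid, and returns to empty only after both inputs are empty, so the block indicates every input change.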