VLSI Architectures, Fall 2009 / Winter 2010. Ran Ginosar (www.ee.technion.ac.il/~ran). © 2002-2009 Ran Ginosar. Asynchronous Design and Synchronization
Topics: asynchronous VLSI design; SoC (System on Chip) global timing, clocking, and synchronization; many-core parallel processors on chip.
Sources: Sparsø and Furber, Principles of Asynchronous Circuit Design, Ch. 1-12 (free copy: http://www.ee.technion.ac.il/courses/048878/book.pdf); Dally and Poulton, Digital Systems Engineering, Ch. 9-10; journal and conference papers. Slides will be posted on the web; there will be no paper copies.
Requirements: attendance (roll call), readings, homework, final project.
First Assignment: Read Chapters 1-2 in Sparsø and Furber. Read David, Ginosar, and Yoeli, “An Efficient Implementation of Boolean Functions as Self-Timed Circuits,” http://www.ee.technion.ac.il/~ran/papers/C-41-1-David-Ginosar-Yoeli-1992-STCL.pdf. You may skip the mathematical proofs; focus on the logic design. Assignment #1 (due by 8 Nov 2009): Using the method described in the paper, design a three-input XOR gate. Simulate it (by hand or with a logic simulator), showing inputs and outputs for all eight combinations of the input bits. Note that it is a dual-rail circuit, so there should be empty (null, undefined) values as well as valid (data, defined) values. Look up the DIMS implementation method in the book and re-implement the circuit using DIMS. Compare the two implementations on number of gates (area), power, speed, leakage, and ease of design.
What’s the problem? An example SoC with 12 clock domains: 54 Mbps 802.11, 1 Mbps Bluetooth, 100 Mbps Ethernet, 133 MHz CPU, 12 Mbps USB, 384 kbps 3G, 75 MHz DSP, 20 MHz flash memory, 50 MHz memory, 66 MHz PCI, 1 MHz CF.
Another related challenge: mesochronous SoC
Yet Another Challenge: DVS (dynamic voltage scaling) [Figure: data streams exchanged between modules running at different frequency/voltage operating points, e.g. 50 MHz at 1.1 V, 100 MHz at 1.2 V, and 200 MHz at 1.3 V]
Asynchronous alternatives in SoC: (1) a complete ASYNC chip with no clocks; (2) ASYNC modules among synchronous (clocked) modules, a.k.a. mixed-mode or mixed-timing; (3) mutually asynchronous SYNC modules, where modules are clocked with different clocks and the interfaces are asynchronous, a.k.a. multi-clock domains (MCD).
Why Asynchronous Circuits? We are used to sync design, where the logic and timing aspects are simpler (why?). Common arguments for async: low power (works); high speed (very hard, but works too); low emission (works); low sensitivity to PVT (process, voltage, temperature) variations (works); high modularity (SoC); no clock distribution and timing problems (works); secure chips (kind of). But we cannot achieve all of the above at the same time.
Why Not Go Async: overhead (area, speed, power); hard to design; not decomposable into small combinational logic blocks; converting sync to async is hard and does not achieve the expected results; you have to learn something new; few CAD tools.
Why do we care about Async? We have to: sync is only a nice model for small worlds. Async realities: on-chip clock-domain interfaces, off-chip communication timing, modular SoC. Sync techniques get ridiculously complex when async methods are ignored.
Clocking replaced by Handshaking [Figure: synchronous pipeline driven by a global clock, CLK]
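Clocking replaced by Handshaking (continued) [Figure: the global clock is replaced by local control (CTL) at each register; neighboring controllers exchange REQ and ACK over a link/channel, and tokens flow along the pipeline; the combinational logic (CL) is transparent to handshaking; data example]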
Token Flow: transferring one token takes one handshake cycle. Register k is FULL when it holds data. When register k+1 takes the data from register k, register k+1 becomes FULL and register k now holds a BUBBLE (data that has already been copied). A FULL register cannot receive data; only a BUBBLE register may receive data.
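A minimal Python sketch of the token-flow rule, assuming a linear pipeline in which each register is either a BUBBLE (modeled here as None) or FULL (holding its token value). The function name handshake_cycle is hypothetical; one call advances, by one stage, every token that is allowed to move.

```python
from typing import List, Optional

# A register is modeled as None (BUBBLE) or as the token value it holds (FULL).
Pipeline = List[Optional[int]]


def handshake_cycle(regs: Pipeline) -> Pipeline:
    """Move every token that can advance by one register.

    A token moves from register k to k+1 only if k is FULL and k+1 is a
    BUBBLE. Decisions are taken on the old snapshot, so all moves in one
    call happen "simultaneously" and each token advances at most one stage.
    """
    nxt = list(regs)
    for k in range(len(regs) - 1):
        if regs[k] is not None and regs[k + 1] is None:
            nxt[k + 1] = regs[k]  # register k+1 becomes FULL
            nxt[k] = None         # register k now holds a BUBBLE
    return nxt
```

For example, [1, None, 2, None] becomes [None, 1, None, 2] after one call and [None, None, 1, 2] after two; from then on nothing moves until the right end drains a token. Tokens are never created or lost and never overtake each other.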
Token “Preservation”: tokens do not disappear; tokens do not appear (from nowhere); one token does not overtake another. A block (register or CL) with n inputs and m outputs (when it has a BUBBLE) waits for n tokens on its inputs, then generates m tokens on its outputs.
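The n-input, m-output rule can be sketched as a small Python class, assuming one FIFO per input channel (so tokens on a channel cannot overtake each other). FunctionBlock and fire are hypothetical names, and the output-side BUBBLE condition is not modeled: the block is assumed to always have room for its outputs.

```python
from collections import deque
from typing import Callable, Deque, List, Optional, Sequence


class FunctionBlock:
    """Token-game block with n input channels and m output channels (hypothetical model)."""

    def __init__(self, n: int, m: int, fn: Callable[..., Sequence[int]]) -> None:
        # One FIFO per input channel: tokens on a channel keep their order.
        self.inputs: List[Deque[int]] = [deque() for _ in range(n)]
        self.m = m
        self.fn = fn

    def fire(self) -> Optional[List[int]]:
        """Consume one token per input and emit m tokens, or return None and keep waiting."""
        if not all(self.inputs):  # some input channel is still empty
            return None
        args = [q.popleft() for q in self.inputs]  # exactly n tokens consumed
        outs = list(self.fn(*args))
        if len(outs) != self.m:  # exactly m tokens must be produced
            raise ValueError(f"expected {self.m} output tokens, got {len(outs)}")
        return outs
```

With a 2-input, 2-output block computing (a + b, a - b), tokens 1 and 5 waiting on input 0 produce nothing until a token arrives on input 1; a 10 on input 1 then yields [11, -9], and the 5 keeps waiting for its partner.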
Comments on the Token Game: it abstracts all communication (handshake) and computation, hiding implementation details. CL is transparent: it does NOT store tokens, they only pass through it. A special type of CL is required: “Function Blocks”. Local “clocks” are spread over time, giving lower power and lower emissions, with no need to synchronize events. Before playing more token games, let’s consider some implementation details.
Handshake Protocols: bundled data (a.k.a. “single rail”). Push channel (DATA and REQ travel in the same direction) with signals REQ, ACK, and n-bit DATA. 4-phase protocol: always like this, with some variations. [Figure: 4-phase timing diagram of REQ, ACK, DATA]
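A sketch of the 4-phase push handshake as an event trace, assuming the receiver latches DATA while REQ is high and before raising ACK. The function four_phase_push is hypothetical; it only records the order of control events, not their timing.

```python
from typing import List, Tuple


def four_phase_push(words: List[int]) -> Tuple[List[str], List[int]]:
    """Trace the REQ/ACK events of a 4-phase bundled-data push channel.

    Returns the event trace and the words seen by the receiver.
    """
    trace: List[str] = []
    received: List[int] = []
    req = ack = 0
    for word in words:
        data = word              # sender drives DATA before raising REQ (bundling)
        req = 1
        trace.append("REQ+")     # 1. DATA is valid
        received.append(data)    #    receiver latches DATA ...
        ack = 1
        trace.append("ACK+")     # 2. ... and acknowledges
        req = 0
        trace.append("REQ-")     # 3. sender returns REQ to zero
        ack = 0
        trace.append("ACK-")     # 4. receiver returns ACK to zero: channel idle
    assert req == 0 and ack == 0  # every cycle ends where it started
    return trace, received
```

Every transfer costs four control transitions, two of which only return the wires to zero.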
Handshake Protocols: bundled data (a.k.a. “single rail”), 2-phase protocol. Push channel (DATA and REQ travel in the same direction) with signals REQ, ACK, and n-bit DATA. [Figure: 2-phase timing diagram of DATA, REQ, ACK]
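The same channel with 2-phase (transition) signaling can be sketched as below, assuming the receiver captures DATA on each REQ transition. two_phase_push is a hypothetical name; note that the wire levels alternate between transfers and carry no meaning by themselves.

```python
from typing import List, Tuple


def two_phase_push(words: List[int]) -> Tuple[List[str], List[int]]:
    """Trace a 2-phase (transition-signaling) bundled-data push channel.

    Any transition on REQ means "new data"; any transition on ACK means
    "data taken". The wire levels themselves carry no meaning.
    """
    trace: List[str] = []
    received: List[int] = []
    req = ack = 0
    for word in words:
        data = word              # DATA settles before the REQ transition
        req ^= 1
        trace.append(f"REQ={req}")
        received.append(data)    # receiver captures DATA on the REQ event
        ack ^= 1
        trace.append(f"ACK={ack}")
    return trace, received
```

Three transfers produce REQ=1, ACK=1, REQ=0, ACK=0, REQ=1, ACK=1: two control transitions per transfer instead of four.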
Bundling Assumption: each data line is a single wire (“bundled data”, a.k.a. “single rail”). On the sender side, time(DATA) < time(REQ). This order must be preserved at the receiving end: Valid(DATA) → REQ [data valid precedes REQ=1]. This is non-trivial: inter-line skew must be taken care of and hidden, using placement and routing, safety margins at the sending end, and buffer insertion.
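The bundling constraint can be checked as simple arithmetic on path delays. This sketch assumes the arrival time of every data wire and of REQ at the receiver is known (for example from post-layout timing) and uses integer picoseconds; the function names and the numbers in the example are hypothetical.

```python
from typing import Sequence


def bundling_ok(data_ps: Sequence[int], req_ps: int, margin_ps: int) -> bool:
    """True if the slowest data bit arrives at least margin_ps before REQ."""
    return max(data_ps) + margin_ps <= req_ps


def extra_req_delay(data_ps: Sequence[int], req_ps: int, margin_ps: int) -> int:
    """Delay (ps) that would have to be added to the REQ path, e.g. by buffer insertion."""
    return max(0, max(data_ps) + margin_ps - req_ps)
```

With data arriving at 1000 ps and 1400 ps, REQ at 1500 ps, and a 200 ps margin, the constraint fails and REQ needs 100 ps of extra delay.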
4-phase vs. 2-phase: 4-phase is “return to zero” (RZ), or “level signaling”; the RZ is overhead (time and power). 2-phase is “non-return to zero” (NRZ), or “transition signaling”; it seems to have lower overhead, but its implementation is more complex.
4-phase dual rail protocol: each data bit is encoded onto 2 wires, d.t and d.f:
  Value        d.t  d.f
  EMPTY         0    0
  VALID “0”     0    1
  VALID “1”     1    0
  Not used      1    1
Push channel: 2n DATA wires plus ACK. There is no REQ line; the figure shows how it would look if we had one. DATA alternates EMPTY, VALID, EMPTY, VALID, EMPTY, VALID, ..., with ACK rising after each VALID and falling after each EMPTY.
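A sketch of the dual-rail code in the table, assuming each bit is represented as a (d.t, d.f) pair. The helper names are hypothetical; word_state shows why an n-bit word can be partly valid while its bits propagate at different speeds.

```python
from typing import List, Optional, Tuple

DualRail = Tuple[int, int]  # (d.t, d.f)
EMPTY: DualRail = (0, 0)


def encode(bit: int) -> DualRail:
    """VALID "1" -> (1, 0), VALID "0" -> (0, 1)."""
    return (1, 0) if bit else (0, 1)


def decode(code: DualRail) -> Optional[int]:
    """Return the bit, or None while the pair is still EMPTY."""
    if code == (1, 1):
        raise ValueError("code 11 is not used")
    if code == EMPTY:
        return None
    return code[0]  # for a VALID pair, d.t equals the bit value


def word_state(word: List[DualRail]) -> str:
    """Classify an n-bit word carried on 2n wires."""
    bits = [decode(c) for c in word]
    if all(b is None for b in bits):
        return "EMPTY"
    if all(b is not None for b in bits):
        return "VALID"
    return "IN TRANSITION"  # each bit propagates at its own speed
```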
4-phase dual rail protocol: Delay Insensitive (DI); each bit can propagate at its own speed. The 4 phases operate at a higher level (words rather than signals): the sender sends a valid word (V); the receiver sets ACK; the sender sends an empty word (E), removing the data; the receiver resets ACK. Each change is acknowledged (indicated). Problems: glitches, hazards.
Bundled Data vs. Dual Rail [Figure: comparison of the two encodings]
Muller C-Element (inputs a, b; output z):
  a  b  z
  0  0  0
  0  1  no change
  1  0  no change
  1  1  1
Alternative specs: if a=b then z:=a; equivalently, z := ab + z(a+b).
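The equation z := ab + z(a+b) translates directly into a stateful model. This Python sketch assumes 0/1 integer signals and an initial output of 0; CElement and update are hypothetical names.

```python
class CElement:
    """Muller C-element: output follows the inputs when they agree, else holds."""

    def __init__(self, z: int = 0) -> None:
        self.z = z

    def update(self, a: int, b: int) -> int:
        # z := ab + z(a + b)
        self.z = (a & b) | (self.z & (a | b))
        return self.z
```

Driving it with (0,1), (1,1), (1,0), (0,1), (0,0) gives 0, 1, 1, 1, 0: the output rises only when both inputs are 1 and falls only when both are 0.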
1-of-4 Signaling: each 2 bits take 4 wires:
  00   → 1000
  01   → 0100
  10   → 0010
  11   → 0001
  Null → 0000
Still 2x wires, and still no bundling assumption needed. Half as many transitions (half the power) and less noise sensitive.
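The “half as many transitions” claim can be checked by counting wire transitions over one 4-phase cycle (Null → valid → Null) for a 2-bit symbol. This sketch compares 1-of-4 with two dual-rail bits, both on 4 wires; the helper names are hypothetical.

```python
from typing import Tuple

Code = Tuple[int, ...]


def one_of_four(b1: int, b0: int) -> Code:
    """Raise exactly one of 4 wires: wire index = value of the 2-bit symbol."""
    wires = [0, 0, 0, 0]
    wires[2 * b1 + b0] = 1
    return tuple(wires)


def dual_rail_pair(b1: int, b0: int) -> Code:
    """Two dual-rail bits, each as (d.t, d.f), on the same 4 wires."""
    def enc(b: int) -> Code:
        return (1, 0) if b else (0, 1)
    return enc(b1) + enc(b0)


def transitions_per_cycle(code: Code) -> int:
    """Wire transitions for one 4-phase cycle: Null -> valid -> Null."""
    return 2 * sum(code)  # every wire raised for the valid code falls back for Null
```

For every 2-bit symbol, 1-of-4 costs 2 transitions per cycle while dual rail costs 4.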
Bundled Data vs. 1-of-4 [Figure: comparison of the two encodings]
DS: 1-of-2 (2-phase) Signaling (dual-rail): each bit is carried on two wires. One wire (D) carries the data value (0, 1); the other (S) is a “strobe” that helps track the phase. To change from one value to the next: if the value is different, toggle D; if it is the same, toggle S. Each bit alternates valid/valid/..., with no NULL values, so DS is potentially faster than 4-phase dual rail. Interesting, but rarely employed. [Figure: DS states 00, 01, 10, 11 in even and odd phases]
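A sketch of the DS toggling rule for a stream of bits, assuming both wires start at 0. Exactly one wire changes per bit, so the phase D xor S alternates between odd and even; ds_encode and ds_decode are hypothetical names.

```python
from typing import List, Tuple

State = Tuple[int, int]  # (D, S)


def ds_encode(bits: List[int], d: int = 0, s: int = 0) -> List[State]:
    """Return the (D, S) wire states after each transferred bit."""
    states = []
    for b in bits:
        if b != d:
            d ^= 1  # different value: toggle D (D now equals b)
        else:
            s ^= 1  # same value: toggle S
        states.append((d, s))
    return states


def ds_decode(states: List[State], d0: int = 0, s0: int = 0) -> List[int]:
    """Recover the bits; each new state must differ from the previous in one wire."""
    bits, prev = [], (d0, s0)
    for st in states:
        if (st[0] ^ prev[0]) + (st[1] ^ prev[1]) != 1:
            raise ValueError("DS requires exactly one wire transition per bit")
        bits.append(st[0])  # D always holds the current value
        prev = st
    return bits
```

For the bits 1, 1, 0, 0, 1 the wires go through (1,0), (1,1), (0,1), (0,0), (1,0), and D xor S reads 1, 0, 1, 0, 1.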
Classification of Protocols: handshake / signaling: 2-phase or 4-phase; direction: push or pull; encoding: bundled data (single rail), dual rail (1-of-2), 1-of-n (e.g. 1-of-4), m-of-n, …
Acknowledgement / Indication: a gate or circuit acknowledges (indicates) its input if, for every input change, there is an output change. Example: a wire. Non-indicating example: an AND gate acknowledges the transitions to all ones, {01,10} → 11, but does not acknowledge 00 → {01,10}.
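A brute-force check of the definition for 2-input gates, assuming only single-input changes are considered. The sketch lists the changes a gate fails to indicate; the function names are hypothetical.

```python
from itertools import product
from typing import Callable, Iterator, List, Tuple

Inputs = Tuple[int, int]
Gate = Callable[[int, int], int]


def and_gate(a: int, b: int) -> int:
    return a & b


def xor_gate(a: int, b: int) -> int:
    return a ^ b


def single_input_changes() -> Iterator[Tuple[Inputs, Inputs]]:
    """Every transition of a 2-input gate in which exactly one input changes."""
    for a, b in product((0, 1), repeat=2):
        yield (a, b), (1 - a, b)
        yield (a, b), (a, 1 - b)


def unindicated(gate: Gate) -> List[Tuple[Inputs, Inputs]]:
    """Input changes that produce no output change (not acknowledged)."""
    return [(x, y) for x, y in single_input_changes() if gate(*x) == gate(*y)]
```

For AND, 00 → 01, 00 → 10, 01 → 00, and 10 → 00 go unnoticed at the output, while XOR indicates every single-input change.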
Muller Pipeline: “the” delay-insensitive handshake machine. C[i] accepts a 1 (0) from C[i-1] only if C[i+1] = 0 (1). Think of 1010101... as waves: 10 10 10 1... The C-elements propagate the waves precisely. Timing depends on local delays and may vary along the pipe. If the RIGHT side is quiet, the pipe fills (1010101...) and stalls. The same holds for 4-phase and 2-phase. Symmetric: it works the same right-to-left (like electrons and holes).
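A sketch of an n-stage Muller pipeline, assuming C[i] = C(C[i-1], not C[i+1]), an eager left environment that sends a new wave whenever the previous one was accepted, and a quiet right environment whose acknowledge stays at 0. Arbitrary gate delays are modeled by firing one randomly chosen enabled element at a time; run_muller_pipeline is a hypothetical name.

```python
import random
from typing import List


def c_elem(a: int, b: int, z: int) -> int:
    """Muller C-element: z := ab + z(a + b)."""
    return (a & b) | (z & (a | b))


def run_muller_pipeline(n: int, seed: int = 0, max_events: int = 100_000) -> List[int]:
    """Fill an n-stage pipeline whose RIGHT end is quiet; return the stalled stage values.

    c[1..n] are the stages, c[0] is the left environment's request, and
    c[n+1] is the right environment's acknowledge, held at 0 (quiet).
    """
    rng = random.Random(seed)
    c = [0] * (n + 2)
    for _ in range(max_events):
        enabled = [0] if c[0] == c[1] else []  # left env: last wave accepted, send a new one
        for i in range(1, n + 1):
            # C[i] = C(C[i-1], not C[i+1])
            if c_elem(c[i - 1], 1 - c[i + 1], c[i]) != c[i]:
                enabled.append(i)
        if not enabled:
            return c[1:n + 1]  # nothing can fire: the pipe is full and stalled
        i = rng.choice(enabled)  # arbitrary delays: fire any one enabled element
        c[i] ^= 1
    raise RuntimeError("pipeline did not stall")
```

Whatever the firing order (seed), a 5-stage pipe stalls in the same alternating state [1, 0, 1, 0, 1], which illustrates that the result does not depend on the local delays.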
Pipeline Styles, all based on the Muller pipeline: 4-phase bundled data (similar to sync pipes, based on timing assumptions); 2-phase bundled data (a.k.a. micropipeline); 4-phase dual rail (“the original” Muller pipe).
4-phase bundled data circuits
4-phase bundled data circuits: the pipeline looks like a sync pipe with local clocks. When full, the C-elements hold 1010101..., so only half the latches store data, similar to master-slave flip-flops. Speed is limited by the handshake (two-way communication). We will study better implementations.
2-phase bundled data (micropipelines): transition signaling; special “capture-pass” latches alternate between capture and pass.
Capture-Pass transition-controlled latch: transitions on C and P alternate. Micropipelines are “elegant”, with no RZ overhead, but the implementation (latches and other control circuits) is complex.
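A behavioral sketch of the capture-pass idea, under the modeling assumption that the latch starts in pass (transparent) mode, a C transition captures the current input, and the following P transition returns it to pass mode. CapturePassLatch and its methods are hypothetical; this models the control behavior only, not the transistor-level latch.

```python
from typing import Optional


class CapturePassLatch:
    """Latch controlled by transitions on C and P (hypothetical behavioral model).

    The latch passes (is transparent) when C and P have seen the same number
    of transitions, and holds (captures) after an extra C transition.
    """

    def __init__(self) -> None:
        self.c = 0                    # capture control wire
        self.p = 0                    # pass control wire
        self.q: Optional[int] = None  # stored / output value

    @property
    def passing(self) -> bool:
        return self.c == self.p

    def output(self, d: int) -> Optional[int]:
        if self.passing:
            self.q = d  # transparent: output follows input
        return self.q

    def toggle_c(self, d: int) -> None:
        if not self.passing:
            raise RuntimeError("C and P transitions must alternate")
        self.q = d   # capture the current input ...
        self.c ^= 1  # ... and hold it until the next P transition

    def toggle_p(self) -> None:
        if self.passing:
            raise RuntimeError("C and P transitions must alternate")
        self.p ^= 1  # back to pass mode
```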
4-phase dual rail circuits: Muller pipeline (again) with completion detection. There is no REQ: it is embedded in the data.
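A sketch of completion detection for an n-bit dual-rail word, assuming an OR of d.t and d.f per bit followed by a C-element across all bits (in hardware typically a tree of 2-input C-elements; here modeled as one n-input C-element). CompletionDetector is a hypothetical name; its output plays the role of the acknowledge.

```python
from typing import List, Tuple

DualRail = Tuple[int, int]  # (d.t, d.f)


class CompletionDetector:
    """Rises when every bit is VALID, falls when every bit is EMPTY, else holds."""

    def __init__(self) -> None:
        self.done = 0

    def update(self, word: List[DualRail]) -> int:
        valid = [t | f for t, f in word]  # per-bit OR: 1 once that bit is VALID
        if all(valid):
            self.done = 1  # whole word VALID
        elif not any(valid):
            self.done = 0  # whole word EMPTY
        return self.done   # otherwise hold: the word is in transition
```

While bits arrive one by one the output stays low, and while they return to EMPTY it stays high, so it changes only once the whole word has changed.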
4-phase dual rail – many bits
4-phase dual rail – function blocks: DIMS (Delay-Insensitive Minterm Synthesis). Another example for the home assignment.
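A sketch of DIMS applied to a 2-input AND (deliberately not the XOR of the assignment), assuming dual-rail (t, f) inputs: one C-element per input minterm, and each output rail is the OR of the minterms that produce that output value. DimsAnd is a hypothetical name.

```python
from typing import Tuple

DualRail = Tuple[int, int]  # (t, f)


def c_elem(a: int, b: int, z: int) -> int:
    """Muller C-element: z := ab + z(a + b)."""
    return (a & b) | (z & (a | b))


class DimsAnd:
    """DIMS 2-input AND: one C-element per minterm, ORed per output rail."""

    def __init__(self) -> None:
        # one C-element state per minterm (x, y) in {0,1}^2
        self.state = {(x, y): 0 for x in (0, 1) for y in (0, 1)}

    def update(self, a: DualRail, b: DualRail) -> DualRail:
        """a and b are dual-rail inputs; returns the dual-rail output (z.t, z.f)."""
        for (x, y), z in list(self.state.items()):
            a_rail = a[0] if x else a[1]  # a.t for minterm x=1, a.f for x=0
            b_rail = b[0] if y else b[1]
            self.state[(x, y)] = c_elem(a_rail, b_rail, z)
        zt = self.state[(1, 1)]  # AND is 1 only for minterm 11
        zf = self.state[(0, 0)] | self.state[(0, 1)] | self.state[(1, 0)]
        return (zt, zf)
```

Because every minterm C-element needs one rail from each input, the output stays EMPTY until both inputs are VALID, and returns to EMPTY only after both inputs are EMPTY again.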