1 Fall 2009 / Winter 2010 Ran Ginosar (www.ee.technion.ac.il/~ran)
VLSI Architectures, Fall 2009 / Winter 2010, Ran Ginosar © Ran Ginosar Asynchronous Design and Synchronization

2 Topics
- Asynchronous VLSI design
- SoC (System on Chip) global timing, clocking, synchronization
- Many-core parallel processors on chip

3 Sources
- Sparsø and Furber, Principles of Asynchronous Circuit Design, Ch. 1-12 (free copy available online)
- Dally and Poulton, Digital System Engineering, Ch. 9-10
- Journal / conference papers
- Slides will be posted on the web. No paper copies.

4 Requirements
- Attendance (roll call)
- Readings
- Homework
- Final project

5 First Assignment
- Read Chapters 1-2 in Sparsø & Furber.
- Read David, Ginosar & Yoeli, “An Efficient Implementation of Boolean Functions as Self-Timed Circuits.” You may skip the mathematical proofs; focus on the logic design.
- Assignment #1 (due by 8 Nov 2009):
  - Using the method described in the paper, design a three-input XOR gate.
  - Simulate it (by hand or with a logic simulator), showing inputs and outputs for all eight combinations of the input bits. Note that it is a dual-rail circuit, so there should be empty (null, undefined) values as well as valid (data, defined) values.
  - Look up the DIMS implementation method in the book and re-implement the circuit using DIMS.
  - Compare the two implementations on number of gates (area), power, speed, leakage, and ease of design.

6 What’s the problem? An example SOC – 12 clock domains
[Figure labels: 54 Mbps; 1 Mbps Bluetooth; 100 Mbps Ethernet; 133 MHz CPU; 12 Mbps USB; 384 Kbps 3G; 75 MHz DSP; 20 MHz Flash Memory; 50 MHz Memory; 66 MHz PCI; 1 MHz CF]

7 Another, related challenge: Mesochronous SoC

8 Yet Another Challenge: DVS
[Figure labels: 50 MHz, 1.1 V; 200 MHz, 1.3 V; 100 MHz, 1.2 V; 50 MHz, 1.1 V; data 1010]

9 Asynchronous alternatives in SoC
- Complete ASYNC chip: no clocks
- ASYNC modules among synchronous (clocked) modules
  - A.k.a. mixed-mode or mixed-timing
- Mutually asynchronous SYNC modules
  - Modules are clocked with different clocks
  - The interfaces are asynchronous
  - A.k.a. multi-clock domains (MCD)

10 Why Asynchronous Circuits?
- We are used to sync design
  - Logic and timing aspects are simpler (why?)
- Common arguments:
  - Low power (works)
  - High speed (very hard, but works too)
  - Low emission (works)
  - Low sensitivity to PVT (process, voltage, temperature) variations (works)
  - High modularity (SoC)
  - No clock distribution and timing problems (works)
  - Secure chips (kind of)
- Cannot achieve all the above at the same time…

11 Why Not to Go Async
- Overhead (area, speed, power)
- Hard to design
- Non-decomposable into small combinational logic blocks
- Converting sync to async is hard / does not achieve the results
- You have to learn something new!
- Few CAD tools

12 Why do we care about Async?
- We have to. Sync is only a nice model for small worlds.
- Async realities:
  - On-chip clock domain interfaces
  - Off-chip communication timing
- Sync techniques get ridiculously complex due to ignorance of async methods
- Modular SoC

13 Clocking replaced by Handshaking
[Figure label: CLK]

14 Clocking replaced by Handshaking
[Figure labels: CTL; CL4; REQ; ACK; DATA; LINK / CHANNEL; TOKEN FLOW; CL TRANSPARENT TO HANDSHAKING; EXAMPLE]

15 Token Flow
- Transfer of one token = one handshake cycle
- Register k is FULL when it has data
- When register k+1 gets the data from k:
  - Register k+1 becomes FULL
  - Register k now has a BUBBLE (data that has already been copied)
- A FULL register cannot receive data. Only a BUBBLE register may receive data.
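
The FULL / BUBBLE rules can be played out in a few lines of Python. This is only an illustrative sketch, not part of the course material: the `step` function, the use of `None` to mark a BUBBLE, and the right-to-left scan order are all assumptions made for the example.

```python
def step(regs, new_token=None, sink_ready=True):
    """Advance every token that can move. regs[k] is None for a BUBBLE.

    Returns (new_regs, token_delivered_or_None, new_token_accepted).
    """
    regs = list(regs)
    delivered = None
    # The last register hands its token to the environment, if it is ready.
    if sink_ready and regs[-1] is not None:
        delivered, regs[-1] = regs[-1], None
    # Scan right to left: a token moves only into a BUBBLE.
    for k in range(len(regs) - 2, -1, -1):
        if regs[k] is not None and regs[k + 1] is None:
            regs[k + 1], regs[k] = regs[k], None
    # A new token enters only if the first register holds a BUBBLE.
    accepted = new_token is not None and regs[0] is None
    if accepted:
        regs[0] = new_token
    return regs, delivered, accepted
```

Feeding tokens into a three-register pipe whose sink is not ready fills it; a further token is then refused until a BUBBLE appears at the input, and tokens leave in the order they entered.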

16 Token “Preservation”
- Tokens do not disappear
- Tokens do not appear (from nowhere)
- One token does not overtake another
- A block (register or CL) with n inputs and m outputs:
  - (when it has a BUBBLE) waits for n tokens on its inputs
  - Generates m tokens on its outputs

17 Comments on the Tokens Game
- Abstract all communications (handshake) and computations
  - Hide implementation details
- CL is transparent
  - It does NOT store tokens. They only pass through
  - Special type of CL required: “Function Blocks”
- Local “clocks” spread over time
  - Lower power
  - Lower emissions
- No need to synchronize events
- Before playing more token games, let’s consider some implementation details

18 Handshake Protocols
- Bundled data (aka “single rail”)
- Push channel (DATA & REQ same direction)
- 4-phase protocol: always like this, with some variations
[Figure labels: REQ; ACK; DATA (n wires)]

19 Handshake Protocols
- Bundled data (aka “single rail”)
- Push channel (DATA & REQ same direction)
- 2-phase protocol
[Figure labels: REQ; ACK; DATA (n wires)]

20 Bundling Assumption
- Each data line is a single wire
  - “Bundled data” aka “single rail”
- On the sender side, time(DATA) < time(REQ)
- This order is preserved on the receiving end:
  - Valid(DATA) → REQ [data valid precedes REQ=1]
- Non-trivial: inter-line skew must be taken care of and hidden
  - Placement and routing
  - Safety margins at the sending end
  - Buffer insertion
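
As a rough illustration of the bundling constraint, the check below compares arrival times at the receiver. The function name, the delay figures, and the `margin` parameter are hypothetical; a real flow would take these numbers from placement, routing, and timing analysis.

```python
def bundling_holds(data_delays, req_delay, margin=0.0):
    """True if the slowest data wire settles more than `margin` before REQ.

    data_delays: sender-to-receiver delay of each data wire
    req_delay:   sender-to-receiver delay of the REQ wire (including any
                 inserted buffers); all values in the same time unit
    """
    return max(data_delays) + margin < req_delay
```

For example, `bundling_holds([1.0, 1.2, 0.9], 1.5)` holds, while the same wires with a required margin of 0.4 do not; in that case REQ needs more delay (buffer insertion) or the data wires need less skew.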

21 4-phase vs 2-phase
- 4-phase: “return to zero” (RZ) is overhead (time and power)
  - “level signaling”
- 2-phase: “non-return to zero” (NRZ) seems to have lower overhead
  - “transition signaling”
  - But implementation is more complex
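
To make the overhead argument concrete, the sketch below lists the control events of each protocol on a bundled-data push channel and counts REQ/ACK transitions per transfer. It assumes the usual event order (data valid, then REQ, then ACK); the function names and trace format are made up for this example.

```python
def four_phase(words):
    """Event trace of a 4-phase (RZ) push channel: REQ and ACK return to 0."""
    trace = []
    for w in words:
        trace += [("DATA", w), ("REQ", 1), ("ACK", 1), ("REQ", 0), ("ACK", 0)]
    return trace


def two_phase(words):
    """Event trace of a 2-phase (NRZ) push channel: every transition is an event."""
    trace, req, ack = [], 0, 0
    for w in words:
        req ^= 1                      # one REQ transition announces the data
        trace += [("DATA", w), ("REQ", req)]
        ack ^= 1                      # one ACK transition completes the transfer
        trace.append(("ACK", ack))
    return trace


def control_transitions(trace):
    """Count REQ and ACK events in a trace."""
    return sum(1 for wire, _ in trace if wire in ("REQ", "ACK"))
```

Under these assumptions a 4-phase transfer costs four control transitions and a 2-phase transfer costs two, which is the RZ overhead the slide refers to.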

22 4-phase dual rail protocol
- Each data bit is encoded onto 2 wires, d.t and d.f:

  Value        d.t  d.f
  EMPTY         0    0
  VALID “0”     0    1
  VALID “1”     1    0
  Not used      1    1

- Push channel: 2n DATA wires plus ACK
- No REQ line, but this is how it would look if we had one
[Figure labels: DATA: EMPTY, VALID, EMPTY, VALID, EMPTY, VALID; ACK]
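
A small encoder / decoder helps when simulating dual-rail circuits by hand. The sketch assumes the usual convention that d.t = 1 signals a valid “1”, d.f = 1 signals a valid “0”, and both low is EMPTY; the helper names and the LSB-first word order are choices made for this example.

```python
EMPTY = (0, 0)  # (d.t, d.f) with both rails low


def encode_bit(b):
    """Return the (d.t, d.f) pair for a valid bit."""
    return (1, 0) if b else (0, 1)


def decode_bit(pair):
    """Return 0 or 1 for a valid pair, None for EMPTY; (1, 1) is not used."""
    if pair == (1, 1):
        raise ValueError("illegal dual-rail code (1, 1)")
    if pair == EMPTY:
        return None
    return 1 if pair == (1, 0) else 0


def encode_word(value, n):
    """Encode an n-bit integer, least significant bit first, onto 2n wires."""
    return [encode_bit((value >> i) & 1) for i in range(n)]
```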

23 4-phase dual rail protocol
- Delay insensitive (DI)
  - Each bit can propagate at its own speed
- 4-phase at a higher level (than signals):
  1. Sender sends valid word (V)
  2. Receiver raises ACK
  3. Sender sends empty word (E) (removes the data)
  4. Receiver lowers ACK
- Each change is acknowledged / indicated
- Problems: glitches, hazards

24 Bundled DataDual Rail
© Ran Ginosar Asynchronous Design and Synchronization

25 Muller C-Element

  a  b  | z
  0  0  | 0
  0  1  | no change
  1  0  | no change
  1  1  | 1

Alternative specs:
- If a=b then z:=a
- a=b → z:=a
- z:=ab+z(a+b)
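
The last specification, z := ab + z(a+b), translates directly into a behavioural model. The class below is a minimal sketch for simulation by hand; the class and method names are made up for this example and it models no gate delays.

```python
class CElement:
    """Two-input Muller C-element with state z."""

    def __init__(self, z=0):
        self.z = z

    def update(self, a, b):
        # z := ab + z(a + b): follow the inputs when they agree, else hold.
        self.z = (a & b) | (self.z & (a | b))
        return self.z
```

Starting from z = 0, the input sequence (1,0), (1,1), (0,1), (0,0) produces z = 0, 1, 1, 0: the output rises only when both inputs are 1 and falls only when both are 0.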

26 1-of-4 Signaling
- Each 2 bits take 4 wires; Null is 0000
- Still 2x wires
- Still no bundling assumption needed
- Half as many transitions (half power)
- Less noise sensitive
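
The transition count can be checked with a short comparison against dual rail. Which of the four wires stands for which 2-bit value is an assumption here (the transcript preserves only the Null code 0000), as are the function names.

```python
def one_of_four(two_bits):
    """One-hot code on 4 wires for a 2-bit value (wire assignment assumed)."""
    if not 0 <= two_bits <= 3:
        raise ValueError("expected a 2-bit value")
    return tuple(1 if i == two_bits else 0 for i in range(4))


def dual_rail(two_bits):
    """The same 2-bit value as two (d.t, d.f) pairs, also on 4 wires."""
    wires = []
    for i in range(2):
        b = (two_bits >> i) & 1
        wires += [b, 1 - b]
    return tuple(wires)


def rising_wires(code):
    """Number of wires that must rise when leaving the all-zero Null code."""
    return sum(code)
```

For every 2-bit value, one wire rises in the 1-of-4 code and two rise in dual rail, so each Null-to-valid-to-Null cycle has half as many transitions.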

27 Bundled Data ↔ 1-of-4

28 DS: 1-of-2 (2-phase) Signaling (dual-rail)
- Each bit on two wires
  - One wire (D) is the data value (0, 1)
  - The other wire (S) is a “strobe”, helps with phase
- To change from one value to the next:
  - If different value, toggle D
  - If same value, toggle S
- Each bit alternates valid/valid/…
  - No NULL values
  - Potentially faster than 4-phase dual-rail
- Interesting, but rarely employed.
[Figure labels: DS codes 00, 01, 10, 11; Even, Odd]
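
The toggle rule is easy to try out. The sketch below follows the two rules on the slide; the function name and the starting state (D, S) = (0, 0) are assumptions made for the example.

```python
def ds_encode(bits, d=0, s=0):
    """Return the successive (D, S) wire states for a bit stream."""
    states = []
    for b in bits:
        if b != d:
            d ^= 1   # value changes: toggle the data wire
        else:
            s ^= 1   # value repeats: toggle the strobe wire
        states.append((d, s))
    return states
```

Exactly one wire toggles per symbol, D always carries the data value, and the parity D xor S flips on every symbol, which is what lets the receiver tell consecutive symbols apart without a NULL in between.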

29 Classification of Protocols
- Handshake / Signaling: 2-phase or 4-phase
- Direction: push or pull
- Encoding: bundled data (single rail), or dual rail (1-of-2), or 1-of-n (e.g. 1-of-4), or m-of-n, …

30 Acknowledgement / Indication
- A gate / circuit acknowledges its input if, for every input change, there is an output change.
  - Example: wire
- Non-indicating example: AND gate
  - Acknowledges all ones: {01,10} → 11
  - Does not acknowledge 00 → {01,10}
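
The definition can be tested mechanically for a single input change. The helper below is a minimal sketch with hypothetical names; it treats a gate as a pure function of its inputs and checks only whether the output changes.

```python
def acknowledges(gate, before, after):
    """True if changing the inputs from `before` to `after` changes the output."""
    return gate(*before) != gate(*after)


def wire(a):
    return a


def and_gate(a, b):
    return a & b
```

A wire acknowledges every change. The AND gate acknowledges (0,1) → (1,1) and (1,0) → (1,1), but not (0,0) → (0,1) or (0,0) → (1,0), since its output stays at 0.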

31 Muller Pipeline
- “The” delay-insensitive handshake machine
- C[i] accepts 1/0 from C[i-1] only if C[i+1]=0/1
- Think of it as waves: the C-elements propagate waves precisely
- Timing depends on local delays and may vary along the pipe
- If RIGHT is quiet, the pipe will fill ( …) and stall
- Same for 4-phase and 2-phase
- Symmetric: same right-to-left (like electrons and holes)
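
A zero-delay simulation shows the fill-and-stall behaviour. The sketch assumes each stage is a C-element fed by its left neighbour and by the inverted output of its right neighbour, a left environment that toggles its request as soon as stage 1 responds, and a right environment that never responds. The function names and the left-to-right update order are arbitrary choices for the example.

```python
def c_element(a, b, z):
    """Follow the inputs when they agree, otherwise hold the old output z."""
    return a if a == b else z


def run_until_stall(n, max_rounds=100):
    """Return the outputs of C[1..n] once nothing can change any more."""
    c = [0] * (n + 2)   # c[0]: left environment, c[n+1]: right environment
    c[0] = 1            # left environment issues the first request
    for _ in range(max_rounds):
        before = list(c)
        for i in range(1, n + 1):
            # C[i] accepts c[i-1] only if the next stage holds the opposite value.
            c[i] = c_element(c[i - 1], 1 - c[i + 1], c[i])
        c[0] = 1 - c[1]  # left environment: next request once stage 1 responds
        # c[n+1] is never updated: RIGHT is quiet
        if c == before:
            return c[1:n + 1]
    raise RuntimeError("pipeline did not stall")
```

With four stages and a quiet right side, this model stalls with the C-element outputs alternating between 0 and 1.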

32 Pipeline Styles
All based on the Muller pipeline:
- 4-phase bundled data: similar to sync pipes, based on timing assumptions
- 2-phase bundled data: aka micropipeline
- 4-phase dual rail: “the original” Muller pipe

33 4-phase bundled data circuits

34 4-phase bundled data circuits
- Looks like a sync pipe, with local clocks
- When full, the C-elements are …, so only half the latches store data
  - Similar to master-slave flip-flops
- Speed limited by handshake (2-way comm)
- We will study better implementations

35 2-phase bundled data (micropipelines)
- Transition signaling
- Special “capture-pass” latches alternate between capture and pass

36 Capture-Pass transition-controlled latch
- Transitions on C and P alternate
- Micropipelines are “elegant”: no RZ overhead
- But implementation (latches and other control circuits) is complex

37 4-phase dual rail circuits
- Muller pipeline (again) with completion detection
- No REQ: it is embedded in the data
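
One common way to build completion detection, sketched here as an assumption rather than as the circuit on the slide, is to OR the two rails of every bit and combine the per-bit results in a C-element, so the output rises only when all bits are valid and falls only when all are empty. Class and function names are hypothetical.

```python
def c_element_n(inputs, z):
    """n-input C-element: 1 when all inputs are 1, 0 when all are 0, else hold z."""
    if all(inputs):
        return 1
    if not any(inputs):
        return 0
    return z


class CompletionDetector:
    """Derives a request-like signal from a dual-rail word."""

    def __init__(self):
        self.done = 0

    def update(self, word):
        # word: list of (d.t, d.f) pairs; a bit is valid when either rail is 1.
        bit_valid = [t | f for t, f in word]
        self.done = c_element_n(bit_valid, self.done)
        return self.done
```

Because of the C-element's hysteresis, a partially valid or partially empty word leaves the output unchanged, so bits may arrive and leave at their own speed.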

38 4-phase dual rail – many bits

39 4-phase dual rail – function blocks
- DIMS: Delay Insensitive Minterm Synthesis
- Another example for home assignment
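
As a small illustration of the DIMS idea, the sketch below models a two-input dual-rail AND: one C-element per input minterm, with OR gates collecting the minterms into the output rails. It is a behavioural model under the assumed (d.t, d.f) encoding, with made-up class and method names, and it is deliberately not the three-input XOR of the assignment.

```python
def c_element(a, b, z):
    """Follow the inputs when they agree, otherwise hold the old output z."""
    return a if a == b else z


class DimsAnd2:
    """Two-input dual-rail AND built from four minterm C-elements."""

    def __init__(self):
        # One C-element per minterm (x, y) of the two inputs.
        self.m = {(x, y): 0 for x in (0, 1) for y in (0, 1)}

    def apply(self, a, b):
        """a, b are (d.t, d.f) pairs; returns the output pair (y.t, y.f)."""
        for (x, y) in self.m:
            rail_a = a[0] if x else a[1]   # true rail for x=1, false rail for x=0
            rail_b = b[0] if y else b[1]
            self.m[(x, y)] = c_element(rail_a, rail_b, self.m[(x, y)])
        y_t = self.m[(1, 1)]                                    # AND is 1 only for 11
        y_f = self.m[(0, 0)] | self.m[(0, 1)] | self.m[(1, 0)]  # AND is 0 otherwise
        return (y_t, y_f)
```

The output becomes valid only after both inputs are valid, and returns to empty only after both inputs are empty, so each input change is indicated at the output.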

