Asynchronous Datapath Design Adders Comparators Multipliers Registers Completion Detection Bus Pipeline …..

1 Asynchronous Datapath Design Adders Comparators Multipliers Registers Completion Detection Bus Pipeline …..

2 Asynchronous Adder Design Motivation Background: Sync and Async adders Delay-insensitive carry-lookahead adders Complexity Analysis Conclusions

3 Motivation Integer addition is one of the most important operations in digital computer systems. Statistics show that in a prototypical RISC machine (DLX), 72% of the instructions perform additions (or subtractions) in the datapath. In ARM processors the figure even reaches 80%. The performance of a processor is therefore significantly influenced by the speed of its adders.

4 Background Adders: synchronous or asynchronous. Synchronous adders: worst-case performance. Asynchronous adders: average-case performance. For example: Ripple-Carry Adders (synchronous): O(n); Carry-Completion Sensing Adders (asynchronous): O(log n).

5 Background: Binary Addition
Worst case:
      00000001
    + 11111111
    ----------
  S   00000000
  C   11111111
    ----------
     100000000
Best case:
      00000000
    + 00000000
    ----------
  S   00000000
  C   00000000
    ----------
     000000000
Asynchronous adders can exploit this spread and exhibit average-case behavior.

6 Background Ripple-Carry Adders: One-stage full adder: Logic complexity: O(n) Time complexity: O(n)

7 Background Carry-Sensing Completion Detection Adders: (asynchronous version of RCA)

8 Background One-stage CSCD Adder: Carry-Sensing Completion Detection Adders: Logic complexity: O(n) Time complexity: O(log n)
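The O(log n) figure refers to average-case behavior: for random operands the longest carry chain grows roughly with log2 n. The following Monte Carlo sketch illustrates this trend. It is not the analysis behind the slide; the helper names, the chain-length convention, the trial count, and the seed are all assumptions.

```python
import math
import random


def longest_carry_chain(a: int, b: int, n: int) -> int:
    """Longest run of one generating stage followed by propagating stages."""
    longest = run = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        if ai & bi:
            run = 1
        elif (ai ^ bi) and run:
            run += 1
        else:
            run = 0
        longest = max(longest, run)
    return longest


def average_longest_chain(n: int, trials: int = 2000, seed: int = 0) -> float:
    """Average longest carry chain over uniformly random n-bit operand pairs."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += longest_carry_chain(rng.getrandbits(n), rng.getrandbits(n), n)
    return total / trials


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        print(n, round(average_longest_chain(n), 2), round(math.log2(n), 2))
```

Doubling n should lengthen the average chain by about one stage, which is why a completion-sensing adder can signal "done" long before the n-stage worst case on typical inputs.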

9 Background Delay-Insensitive Ripple-Carry Adders: (DI version of RCA):

10 Background One-stage DIRCA: DIRCA Adders: Logic complexity: O(n) Time complexity: O(log n) One of the most robust adders

11 Background Completion detection for asynchronous adders:
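The circuit on this slide is not reproduced in the transcript. As a generic illustration only, completion detection over dual-rail signals is commonly built by OR-ing the two rails of each bit and combining the per-bit results; the sketch below models that idea in software. The names and the `(true_rail, false_rail)` tuple layout are assumptions.

```python
from typing import List, Tuple

Rail = Tuple[int, int]  # (true_rail, false_rail); (1, 1) is illegal and assumed never to occur


def bit_valid(bit: Rail) -> bool:
    """A dual-rail bit carries data as soon as either rail is high."""
    t, f = bit
    return bool(t | f)


def done(bits: List[Rail]) -> bool:
    """Completion: every bit carries valid data."""
    return all(bit_valid(b) for b in bits)


def is_reset(bits: List[Rail]) -> bool:
    """Spacer state: every rail is low, so the next computation may start."""
    return all(t == 0 and f == 0 for t, f in bits)


if __name__ == "__main__":
    print(done([(1, 0), (0, 1), (0, 0)]))  # one bit still empty
    print(done([(1, 0), (0, 1), (0, 1)]))  # all bits valid
```

In hardware the `all(...)` step would typically be a tree of gates or C-elements, so the detector itself adds delay on top of the adder.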

12 Background DI adder VS Bundling Constraint adder:

13 Carry-Lookahead Adders RCA requires n stage-propagation delays. For high speed processors, this scheme is undesirable. One way to improve adder performance is to use parallel processing in computing the carries. That is why Carry-Lookahead Adders (CLA) are introduced. CLAs: Logic complexity: O(n) Time complexity: O(log n)
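To illustrate how carries can be computed in parallel in O(log n) levels, here is a minimal sketch using a Kogge-Stone-style prefix scan over (generate, propagate) pairs. This is a swapped-in textbook technique chosen for brevity, not the A/B-module tree from the slides, and unlike that tree its logic cost grows as n log n rather than n. The function name `cla_add` is hypothetical.

```python
def cla_add(a: int, b: int, n: int = 8, cin: int = 0):
    """Add two n-bit numbers; return (sum, carry_out, number_of_prefix_levels)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]    # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]  # bit propagate
    G, P = g[:], p[:]
    d = 1
    levels = 0
    while d < n:
        new_g, new_p = G[:], P[:]
        for i in range(d, n):
            # Merge the group ending at bit i with the group ending at bit i - d.
            new_g[i] = G[i] | (P[i] & G[i - d])
            new_p[i] = P[i] & P[i - d]
        G, P = new_g, new_p
        d *= 2
        levels += 1
    # G[i], P[i] now describe the whole group of bits 0..i.
    carries = [cin] + [G[i] | (P[i] & cin) for i in range(n)]
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[n], levels


if __name__ == "__main__":
    print(cla_add(0b00000001, 0b11111111))
```

For 8-bit operands the scan needs three levels, compared with eight stage delays for a ripple-carry chain in the worst case.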

14 Carry-Lookahead Adders

15 A module: B module:

16 DI Carry-Lookahead Adders Delay-Insensitive Carry-Lookahead Adders (DICLA) may be implemented by using delay-insensitive codes.
1. Dual-rail signaling: inputs, sums, and carry bits
   a. A1=0 A0=0: no data
   b. A1=0 A0=1: valid 0
   c. A1=1 A0=0: valid 1
   d. A1=1 A0=1: illegal
2. One-hot code: internal signals
   a. No data: 000
   b. 001
   c. 010
   d. 100
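The two codes listed on this slide can be modeled in a few lines. This is a sketch for illustration; the function names are assumptions, and `None` is used here to stand for the "no data" state.

```python
from typing import Optional, Tuple


def encode_dual_rail(bit: Optional[int]) -> Tuple[int, int]:
    """Return (A1, A0) for None (no data), 0, or 1."""
    if bit is None:
        return (0, 0)
    if bit == 0:
        return (0, 1)
    if bit == 1:
        return (1, 0)
    raise ValueError("bit must be None, 0, or 1")


def decode_dual_rail(a1: int, a0: int) -> Optional[int]:
    """Inverse of encode_dual_rail; (1, 1) is the illegal code word."""
    if (a1, a0) == (1, 1):
        raise ValueError("illegal dual-rail code word")
    if (a1, a0) == (0, 0):
        return None
    return 1 if a1 else 0


def is_one_hot_or_empty(code: int, width: int = 3) -> bool:
    """True for 000 (no data) and for code words with exactly one bit set."""
    return 0 <= code < (1 << width) and bin(code).count("1") <= 1


if __name__ == "__main__":
    print(encode_dual_rail(1), decode_dual_rail(0, 1))
    print([format(c, "03b") for c in range(8) if is_one_hot_or_empty(c)])
```

Both codes share the property that a receiver can tell from the wires alone whether data has arrived, without relying on a timing assumption.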

17 QDI Carry-Lookahead Adders DI C module: 1. internal signals: one-hot code, k, g, p 2. input and sum bits: dual-rail signals CLA A module
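The C module's gate-level circuit is not in the transcript. The sketch below shows one plausible behavioral model of turning two dual-rail input bits into a one-hot (k, g, p) signal, using the usual definitions (kill when both bits are 0, generate when both are 1, propagate when they differ). The function name, the tuple layout, and the output ordering are assumptions.

```python
from itertools import product
from typing import Tuple

Rail = Tuple[int, int]  # (A1, A0): (0, 1) is a valid 0, (1, 0) is a valid 1, (0, 0) is no data


def c_module(a: Rail, b: Rail) -> Tuple[int, int, int]:
    """Return one-hot (k, g, p); stays all-zero until both inputs carry data."""
    a1, a0 = a
    b1, b0 = b
    k = a0 & b0                # both inputs are 0: carry kill
    g = a1 & b1                # both inputs are 1: carry generate
    p = (a1 & b0) | (a0 & b1)  # inputs differ: carry propagate
    return k, g, p


if __name__ == "__main__":
    valid = {0: (0, 1), 1: (1, 0)}
    for x, y in product((0, 1), repeat=2):
        print(x, y, c_module(valid[x], valid[y]))
    print(c_module((1, 0), (0, 0)))  # second input not yet arrived
```

Because every output term needs one rail from each input, the output stays at the "no data" code until both operands have arrived, which is the behavior a delay-insensitive stage needs.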

18 QDI Carry-Lookahead Adders DI D module: 1. Internal signals: one-hot code, K, G, P 2. Carry bits: dual-rail signals CLA B module

19 DI Carry-Lookahead Adders

20 If $A_3 = B_3$, then $C_3$ is a carry kill or a carry generate ($k_3$, $g_3$)

21 DI Carry-Lookahead Adders $G_{3,2}$ and $K_{3,2}$ can be used to speed up the carry computation too. ($k_3$, $g_3$; $K_{3,2}$, $G_{3,2}$)

22 Speeding Up DICLA Idea: Send the carry-generates and carry-kills to every stage that needs this information, so that carries can be computed immediately. D module with speed-up circuitry

23 Speeding Up DICLA General form: D module with speed-up circuitry for carry-kill and for carry-generate. Carry-generate $= g_{j-1} + g_{j-2} p_{j-1} + \dots + g_0 p_1 p_2 \cdots p_{j-1}$. This is in fact the full carry-lookahead scheme.
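The carry-generate expression can be evaluated directly and checked against a ripple recurrence. This is a sketch under the assumption that the expression denotes the carry into stage j with no external carry-in; the function names are illustrative.

```python
from itertools import product
from typing import List


def full_lookahead_carry(g: List[int], p: List[int], j: int) -> int:
    """Evaluate g[j-1] + g[j-2]p[j-1] + ... + g[0]p[1]...p[j-1] as a flat sum of products."""
    carry = 0
    for i in range(j):
        term = g[i]
        for m in range(i + 1, j):
            term &= p[m]  # the carry from stage i must propagate through stages i+1..j-1
        carry |= term
    return carry


def ripple_carry(g: List[int], p: List[int], j: int) -> int:
    """Reference: the same carry computed stage by stage."""
    c = 0
    for i in range(j):
        c = g[i] | (p[i] & c)
    return c


if __name__ == "__main__":
    n = 4
    ok = all(
        full_lookahead_carry(list(g), list(p), j) == ripple_carry(list(g), list(p), j)
        for g in product((0, 1), repeat=n)
        for p in product((0, 1), repeat=n)
        for j in range(n + 1)
    )
    print(ok)
```

The flat form needs j product terms, the widest with j inputs, so the gate count and fan-in grow quickly with the word length.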

24 Speeding Up DICLA Problems of the full carry-lookahead scheme: practical limitations on fan-in and fan-out, irregular structure, and many long wires; logic complexity increases more than linearly. Solution: use the properties of the tree-like structure. New speed-up circuitry:

25 SP focuses on the root node of a subtree. All leftmost root node of its right subtree

26 Power of Speed-up Circuitry x: carry chain; x' in the right subtree, x − x' in the left subtree

27 Power of Speed-up Circuitry Without Speed-up circuitry

28 Power of Speed-up Circuitry With Speed-up circuitry

29 Optimization: Simplified D module Simplified D’ module Better logic complexity Delay-Insensitive again

30

31 Complexity Analysis DICLASP Logic Complexity: Θ(n) Time Complexity: Θ(log log n) Best area-time efficiency: Θ(n log log n)
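To give a feel for how slowly a log log n time bound grows compared with log n and n, the following sketch tabulates the three functions for a few word lengths. The numbers are plain function values; asymptotic bounds hide constant factors, so they are not delay predictions. The helper name `growth_row` is an assumption.

```python
import math


def growth_row(n: int):
    """Return (n, log2 n, log2 log2 n) for a word length n > 1."""
    log_n = math.log2(n)
    return n, log_n, math.log2(log_n)


if __name__ == "__main__":
    print(f"{'n':>5} {'log2 n':>8} {'log2 log2 n':>12}")
    for n in (8, 16, 32, 64, 128):
        _, log_n, loglog_n = growth_row(n)
        print(f"{n:>5} {log_n:>8.2f} {loglog_n:>12.2f}")
```

Going from 16-bit to 64-bit operands quadruples n and raises log2 n from 4 to 6, while log2 log2 n only moves from 2 to about 2.58.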

32 Complexity Analysis

33 CMOS: C module

34 CMOS: SD module

35 CMOS: SD’ module

36 SPICE Simulation: The SPICE simulation contains two parts: Random number inputs: 10,000 randomly generated input pairs. Statistical data: running examples on a 32-bit ARM emulator

37 SPICE Simulation: Random number input distribution

38 SPICE Simulation: SPICE simulation results: random number inputs Speedup: DIRCA vs RCA: 6.39 DICLASP vs CLA: 2.64

39 SPICE Simulation: Breakdown of addition/subtraction operations, obtained by running three benchmark programs (Dhrystone f1, Dhrystone f2, and Espresso dc2) on a 32-bit ARM simulator

40 SPICE Simulation: dynamic traces

41 SPICE Simulation: dynamic traces For 83.92% of the instructions, |carry chain| < 17

42 SPICE Simulation: SPICE simulation results: dynamic traces Average computation time: DIRCA 9.61 ns, DICLASP 5.25 ns Speedup: DIRCA vs RCA: 4.1 DICLASP vs CLA: 2.2

43 Conclusion DICLASP Best area-time efficiency: Θ(n log log n) Correctness: No adder is more robust than DICLASP. Cost (Logic Complexity): No parallel adder is cheaper than DICLASP (Θ(n)). Speed (Time Complexity): No adder is better than DICLASP (Θ(log log n)). Suitable for VLSI implementation.

