Presentation is loading. Please wait.

Presentation is loading. Please wait.

D ISTRIBUTED A RITHMETIC (DA) 1. D EFINITION DA is basically (but not necessarily) a bit- serial computational operation that forms an inner (dot) product.

Similar presentations


Presentation on theme: "D ISTRIBUTED A RITHMETIC (DA) 1. D EFINITION DA is basically (but not necessarily) a bit- serial computational operation that forms an inner (dot) product."— Presentation transcript:

1 D ISTRIBUTED A RITHMETIC (DA) 1

2 D EFINITION DA is basically (but not necessarily) a bit- serial computational operation that forms an inner (dot) product of a pair of vectors in single direct step. Why is it called DA? DA is so named because the arithmetic operations that appear in single processing (+,-,*) are not “lumped” in a comfortable familiar fashion, but in an often unrecognizable fashion. 2

3 Motivation The extreme computational efficiency is the most important factor and that factor can be best exploited in circuit design. By careful design one may reduce the total gate count in a signal processing arithmetic unit by a number seldom smaller than 50 percent and often larger than 80 percent. 3

4 Is DA slow? Although it seems to be slow because of the bit- serial natural, but that is not real. 1- if the number of elements in each vector is commensurate with the number of bits in each vector element, e.g., the time required to input eight 8-bits words one at a time in parallel fashion is exactly the same as the time required to input all eight words serially 2-Other modifications to increase the speed may be made by employing techniques such as bit pairing or partitioning the input words into the most significant half and least significant half, the least significant half of the most significant half, etc. 4

5 S O W HY U SE DA? The advantages of DA are best exploited in data- path circuit designing Area savings from using DA can be up to 80% and seldom less than 50% in digital signal processing hardware designs An old technique that has been revived by the wide spread use of Field Programmable Gate Arrays (FPGAs) for Digital Signal Processing (DSP) DA efficiently implements the MAC using basic building blocks (Look Up Tables) in FPGAs 5

6 I NNER - PRODUCT (DA EXAMPLE ): The following expression represents an inner-product (multiply and accumulate) operation A numerical example 6 6

7 A FEW POINTS ABOUT THE INNER-PRODUCT Consider this Note a few points A=[A 1, A 2,…, A K ] is a matrix of “ constant” values x=[x 1, x 2,…, x K ] is matrix of input “variables” Each A k is of M-bits Each x k is of N-bits y should be able large enough to accommodate the result 7

8 A P OSSIBLE H ARDWARE FOR I NNER - PRODUCT 8-bits Multiplier 8-bits Adder 8

9 T HE REALIZATION OF MULTIPLIER IN H ARDWARE Add and shift method to realize a multiplier. Example of 4 bits multiplier: (1011 * 0101) MultiplierMultiplicandPartial result Result 01010000,10110000,0000 (MSB) 00000,1011 1 0 (LSB) 10000,1011 + and << 0000,0000 + and << 0000,0000 0000,1011 + and << 0001,0110 + and 0010,11000011,0111 << 9

10 A POSSIBLE HARDWARE (NOT DA YET!!!) Let, Multi-bit AND gate Registers to hold sum of partial products Shift registers Each scaling accumulator calculates Ai X xi Shift right Adder/Subtractor 10

11 H OW DOES DA WORK ? The “basic” DA technique is bit-serial in nature DA is basically a bit-level rearrangement of the multiply and accumulate operation DA hides the explicit multiplications by ROM look-ups  an efficient technique to implement on Field Programmable Gate Arrays (FPGAs) 11

12 MOVING CLOSER TO DISTRIBUTED ARITHMETIC Consider once again Where A k are fixed coefficients, and X k are the input data words. X k is N-bits fraction signed number → | x k | < 1 x k : { b k0, b k1, b k2 ……, b k(N-1) } where bk0 is the sign bit We can express x k either as signed and magnitude or as 2’s complement … (2) Sign and magnitudeTwo’s complement 12

13 CONTINUE …. For two’s complement representation: From this point, the distributed arithmetic (DA) can be define Because this made the summation has only Instead of compute these value on line, we may precompute the values and store them in ROM with size. The input data (X) can be used to directly address the memory and the result can be dropped into an accumulator. This term represent the negative value of X, Therefore, we need a ROM of size to cover all the negative and positive value of X 13

14 L ET U S REMEMBER O UR H ARDWARE 14

15 B Y USING THE R EARRANGE EQUATION T HE ROM COMES H ERE 15

16 T HE ROM C ONSTRUCTION has only 2 K possible values i.e. (5) can be pre-calculated for all possible values of b 1n b 2n … b Kn We can store these in a look-up table of 2 K words addressed by K-bits i.e. b 1n b 2n … b Kn 16

17 E XAMPLE : The memory must contain all possible combination (16 values). b 1n b 2n b 3n b 4n Contents 00000 0001A 4 =0.11 0010A 3 =0.95 0011A 3 + A 4 =1.06 0100A 2 =-0.30 0101A 2 + A 4 = -0.19 0110A 2 + A 3 =0.65 0111A 2 + A 3 + A 4 =0.75 1000A 1 =0.72 1001A 1 + A 4 =0.83 1010A 1 + A 3 =1.67 1011A 1 + A 3 + A 4 =1.78 1100A 1 + A 2 =0.42 1101A 1 + A 2 + A 4 =0.53 1110A 1 + A 2 + A 3 =1.37 1111A 1 + A 2 + A 3 + A 4 =1.48 What about negative value? 17

18 C ONTINUE …. n=0 TsTs b 1n b 2n b 3n b 4n Contents 100000 10001-A 4 =-0.11 10010-A 3 =-0.95 10011-(A 3 + A 4 )=-1.06 10100-A 2 =0.30 10101-(A 2 + A 4 )= 0.19 10110-(A 2 + A 3 )=-0.65 10111-(A 2 + A 3 + A 4 )=-0.75 11000-A 1 =-0.72 11001-(A 1 + A 4 )=-0.83 11010-(A 1 + A 3 )=-1.67 11011-(A 1 + A 3 + A 4 )=-1.78 11100-(A 1 + A 2 )=-0.42 11101-(A 1 + A 2 + A 4 )=-0.53 11110-(A 1 + A 2 + A 3 )=-1.37 11111-(A 1 + A 2 + A 3 + A 4 )=-1.48 1<=n<=(N-1) TsTs b 1n b 2n b 3n b 4n Contents 000000 00001A 4 =0.11 00010A 3 =0.95 00011A 3 + A 4 =1.06 00100A 2 =-0.30 00101A 2 + A 4 = -0.19 00110A 2 + A 3 =0.65 00111A 2 + A 3 + A 4 =0.75 01000A 1 =0.72 01001A 1 + A 4 =0.83 01010A 1 + A 3 =1.67 01011A 1 + A 3 + A 4 =1.78 01100A 1 + A 2 =0.42 01101A 1 + A 2 + A 4 =0.53 01110A 1 + A 2 + A 3 =1.37 01111A 1 + A 2 + A 3 + A 4 =1.48 It is a single-bit timing signal. During the sign-bit time the control signal Ts=1, otherwise Ts=0. Address 5 bitsData 18

19 A DDER AND FULL M EMORY 19 The input data (2’s-complement number) is delivered in a one- bit-at-a-time (1BAAT) fashion During the sign-bit time Ts= 1, otherwise, Ts=0 When SWA in position: 1- Add and Shift, during the accumulation time. 2- Pass the final result to y (output).

20 K EY I SSUE : ROM S IZE The size of ROM is very important for high speed implementation as well as area efficiency ROM size grows exponentially with each added input address line The number of address lines are equal to the number of elements in the vector plus one i.e. K+1 Elements up to 16 and more are common => 2 17 =128K of ROM!!! We have to reduce the size of ROM 20

21 21 S OLUTION OF THE MEMORY SIZE :  First solution is to reduce the size of memory to 2 K instead of 2 K+1 by modify the adder to adder/ subtractor and using Ts as add/sub-control line.

22 S OLUTION OF THE MEMORY SIZE : Second reduction can lead to reduce the ROM size to 2 K-1. 2‘s-complement=1’s-complement+1 22

23 C ONTINUE …. 23

24 U SING THE NEW FORMULA OF X K : 24 &

25 THE NEW FORMULATION IN OFFSET CODE Constant 25

26 THE BENEFIT: ONLY HALF VALUES TO STORE b 1n b 2n b 3n b 4n c 1n c 2n c 3n c 4n Contents 0000 -1/2 (A 1 + A 2 + A 3 + A 4 ) = -0.74 0001 1-1/2 (A 1 + A 2 + A 3 - A 4 ) = - 0.63 0010 1 -1/2 (A 1 + A 2 - A 3 + A 4 ) = 0.21 0011 1 1-1/2 (A 1 + A 2 - A 3 - A 4 ) = 0.32 0100 1 -1/2 (A 1 - A 2 + A 3 + A 4 ) = -1.04 0101 1 1-1/2 (A 1 - A 2 + A 3 - A 4 ) = - 0.93 0110 1 1 -1/2 (A 1 - A 2 - A 3 + A 4 ) = - 0.09 0111 1 1 1-1/2 (A 1 - A 2 - A 3 - A 4 ) = 0.02 1000 1 -1/2 (-A 1 + A 2 + A 3 + A 4 ) = -0.02 1001 1 1-1/2 (-A 1 + A 2 + A 3 - A 4 ) = 0.09 1010 1 1 -1/2 (-A 1 + A 2 - A 3 + A 4 ) = 0.93 1011 1 1 1-1/2 (-A 1 + A 2 - A 3 - A 4 ) = 1.04 1100 1 1 -1/2 (-A 1 - A 2 + A 3 + A 4 ) = - 0.32 1101 1 1 1-1/2 (-A 1 - A 2 + A 3 - A 4 ) = - 0.21 1110 1 1 1-1/2 (-A 1 - A 2 - A 3 + A 4 ) = 0.63 1111 1 1 1 1-1/2 (-A 1 - A 2 - A 3 - A 4 ) = 0.74 Inverse symmetry 26 b 1n b 2n b 3n b 4n c 1n c 2n c 3n c 4n Contents 0000 -1/2 (A 1 + A 2 + A 3 + A 4 ) = -0.74 0001 1-1/2 (A 1 + A 2 + A 3 - A 4 ) = - 0.63 0010 1 -1/2 (A 1 + A 2 - A 3 + A 4 ) = 0.21 0011 1 1-1/2 (A 1 + A 2 - A 3 - A 4 ) = 0.32 0100 1 -1/2 (A 1 - A 2 + A 3 + A 4 ) = -1.04 0101 1 1-1/2 (A 1 - A 2 + A 3 - A 4 ) = - 0.93 0110 1 1 -1/2 (A 1 - A 2 - A 3 + A 4 ) = - 0.09 0111 1 1 1-1/2 (A 1 - A 2 - A 3 - A 4 ) = 0.02 1111 1 1 1 1-1/2 (-A 1 - A 2 - A 3 - A 4 ) = 0.74 1110 1 1 1-1/2 (-A 1 - A 2 - A 3 + A 4 ) = 0.63 1101 1 1 1-1/2 (-A 1 - A 2 + A 3 - A 4 ) = - 0.21 1100 1 1 -1/2 (-A 1 - A 2 + A 3 + A 4 ) = - 0.32 1011 1 1 1-1/2 (-A 1 + A 2 - A 3 - A 4 ) = 1.04 1010 1 1 -1/2 (-A 1 + A 2 - A 3 + A 4 ) = 0.93 1001 1 1-1/2 (-A 1 + A 2 + A 3 - A 4 ) = 0.09 1000 1 -1/2 (-A 1 + A 2 + A 3 + A 4 ) = -0.02 inv +ve Normal -ve

27 H ARDWARE U SING O FFSET C ODING x1 selects between the two symmetric halves Ts indicates when the sign bit arrives 27 The rest of input data

28 I NCREASING THE S PEED OF DA Because the data enter in serial ;One Bit At A Time (1 BAAT) No. of Clock Cycles Required = N If K=N, then essentially we are taking N clock cycle per dot product. If K>N the DA processor is faster than single parallel multiplier/accumulator. The speed can be increase interring L bit instead of one. We could have 2 BAAT or up to N BAAT in the extreme case N BAAT  One complete result/cycle 28

29 29 Two bit at a time The same contain in both memories Two bits shift We can increase the parallelism as much as we want, but that will lead to increase the number of need gate. 29

30 INCREASE SPEED BY REDUCING THE CARRY PROPAGATION OF ADDER The speed in the critical path is limited by the width of the carry propagation Speed can be improved upon by using techniques to limit the carry propagation Carry Save adder. Carry-Skip adder Carry-lookahead adder 30

31 C ONCLUSION DA is a very efficient means to mechanize computations that are dominated by inner products. By using DA instead of traditional way, a huge reduction in area can be happened. DA is not slow cause of it serial nature, yet sometimes it is faster than parallel one. 31

32 Question ? 32


Download ppt "D ISTRIBUTED A RITHMETIC (DA) 1. D EFINITION DA is basically (but not necessarily) a bit- serial computational operation that forms an inner (dot) product."

Similar presentations


Ads by Google