A Survey on Low Power Multiplication / Accumulation Speaker : Byoung-Woon Kim.

1 A Survey on Low Power Multiplication / Accumulation Speaker : Byoung-Woon Kim

2 Contents
Introduction
[1] Interlaced Accumulation Programming
[2] Operand Swapping
[3] Selective Coefficient Negation
[4] Coefficient Optimization
[5] Coefficient Reordering
Conclusion & Future Works

3 Power Distribution of a DSP, Hirotsugu [ISLPED '96], for each test program
[Figure: normalized power consumption (%) per component: control, address generation, data operation, memory, pins, peripherals, clocking, bus, misc.; annotated with the variation due to data dependency.]

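4 Multiplication and Accumulation: MAC
The major operation in a DSP.
Datapath blocks: X, Y, MULT, PR (product register), ALU, ACC. MUL > (5 * ALU).
[Modified Booth Encoding] For each 2 bits of Y, one of 0, X, -X, 2X, -2X is selected as a partial product; the partial products are reduced by a carry-save adder (CSA) array and a final carry-propagate adder (CPA) into PR.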
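The radix-4 modified Booth step above can be sketched in Python as a behavioral model only. It assumes a 16-bit two's-complement Y. The names `booth_radix4_digits`, `booth_multiply` and `mac` are illustrative and do not come from the talk. Each digit looks at an overlapping triple of Y bits, so one partial product (0, ±X or ±2X) is produced per 2 bits of Y. Plain integer addition stands in for the CSA/CPA hardware.

```python
def booth_radix4_digits(y: int, width: int = 16) -> list[int]:
    """Recode a width-bit two's-complement Y into radix-4 Booth digits.

    Digit i is -2*y[2i+1] + y[2i] + y[2i-1] (with y[-1] = 0), so every digit
    is one of -2, -1, 0, +1, +2. Digits are returned least significant first.
    """
    if width % 2:
        raise ValueError("width must be even")
    bits = y & ((1 << width) - 1)  # two's-complement bit pattern of Y
    digits = []
    prev = 0  # y[-1] = 0
    for i in range(0, width, 2):
        low = (bits >> i) & 1
        high = (bits >> (i + 1)) & 1
        digits.append(-2 * high + low + prev)
        prev = high  # becomes y[2i-1] for the next digit
    return digits


def booth_multiply(x: int, y: int, width: int = 16) -> int:
    """Multiply by adding one selected partial product per Booth digit of Y."""
    product = 0
    for i, d in enumerate(booth_radix4_digits(y, width)):
        partial = d * x  # one of 0, X, -X, 2X, -2X
        product += partial * 4 ** i  # digit i carries weight 4**i
    return product


def mac(acc: int, x: int, y: int) -> int:
    """One multiply-accumulate step: ACC = ACC + X * Y."""
    return acc + booth_multiply(x, y)


print(booth_radix4_digits(0x8000))  # [0, 0, 0, 0, 0, 0, 0, -2]
print(booth_radix4_digits(0x5555))  # [1, 1, 1, 1, 1, 1, 1, 1]
```

Y = 0x8000 recodes to a single nonzero digit, while Y = 0x5555 recodes to eight. This may help explain the different averages measured when these constants sit on the Y input.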

5 Power Consumption by a Multiplier
Power consumption by data dependency.
[Figure: energy per cycle (nJ) vs. number of input transitions, for a 36-bit ALU and a 16x16 multiplier (MPY).]
16x16 MPY: average = 7 nJ; little correlation between energy and the number of input transitions.

6 Power Consumption by a Multiplier
What is an important input in terms of power?
[Figure: energy-per-cycle distributions (nJ) for two input cases.]
0x8000 x (random): average = 5 nJ
(random) x 0x8000: average = 1 nJ

7 Power Consumption by a Multiplier
Booth encoding is a significant overhead.
[Figure: energy-per-cycle distributions (nJ) for two input cases.]
0x5555 x (random): average = 6 nJ
(random) x 0x5555: average = 4 nJ

8 Interlaced Accumulation Programming (1/2) Hirotsugu [ISLPED '96]
3-tap FIR filter (n = 3):
Y(k) = C0 * X(k) + C1 * X(k-1) + C2 * X(k-2)
Y(k+1) = C0 * X(k+1) + C1 * X(k) + C2 * X(k-1)
Y(k+2) = C0 * X(k+2) + C1 * X(k+1) + C2 * X(k)
Traditional order: the three MACs of Y(k) are executed 1st, 2nd, 3rd, and those of Y(k+1) 4th, 5th, 6th.
Interlaced order: the MACs of Y(k) are executed 2nd, 4th, 6th, and those of Y(k+1) 1st, 3rd, 5th. As a result, consecutive MACs share either the coefficient or the data sample.

9 Interlaced Accumulation Programming (2/2)
More than 40% power is saved by
– Keeping a constant at one operand of the multiplier
X is kept: 7nJ -> 5 ~ 6nJ
Y is kept: 7nJ -> 1 ~ 3nJ
– Reducing the number of memory accesses by half
Traditional: two memory operands
Interlaced: one memory operand (data re-use via a temporary register)
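The two MAC orders for the 3-tap filter can be compared with a small Python model. The model assumes one coefficient register and one data register. It counts a memory load whenever the operand needed by the next MAC differs from the one already held. The interlaced order is read off the 1 to 6 numbering of the filter slide. `run_schedule` and the schedule constants are hypothetical names.

```python
# Each MAC is (out, tap): out 0 -> Y(k), out 1 -> Y(k+1); tap i multiplies
# C[i] by X(k + out - i), following the 3-tap FIR equations.
TRADITIONAL = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
# Interlaced: Y(k) terms run 2nd, 4th, 6th; Y(k+1) terms run 1st, 3rd, 5th.
INTERLACED = [(1, 0), (0, 0), (1, 1), (0, 1), (1, 2), (0, 2)]


def run_schedule(schedule, coeffs, x, k):
    """Run the MACs in order; return ([Y(k), Y(k+1)], number of memory loads)."""
    acc = [0, 0]
    coef_reg = None  # index of the coefficient currently held
    data_reg = None  # index of the sample currently held
    loads = 0
    for out, tap in schedule:
        n = k + out - tap  # sample index of X(n)
        if coef_reg != tap:
            coef_reg = tap
            loads += 1
        if data_reg != n:
            data_reg = n
            loads += 1
        acc[out] += coeffs[tap] * x[n]
    return acc, loads


coeffs = [3, -1, 2]
x = [5, 7, -2, 4, 9]
print(run_schedule(TRADITIONAL, coeffs, x, k=3))  # ([28, 19], 12)
print(run_schedule(INTERLACED, coeffs, x, k=3))   # ([28, 19], 7)
```

Both orders give the same outputs. In the interlaced order, every MAC after the first changes only one operand, so loads drop from two per MAC toward one. The power effect of holding one multiplier input steady is not modeled.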

10 Operand Swapping (1/2)
Weight = how many additions are needed, i.e. the number of nonzero partial products after Booth encoding. Example: Y = 00111100 is encoded as 00X000X0, so weight = 2.
Operands: A has low weight but high switching (values 7FFF, 0001); B has high weight (values AAAA, 6666).
Current (mA) with A*B: 22.0, 31.6, 28.8; with B*A (swapped): 10.0, 12.2; savings: 54%, 68%, 58%.

11 Operand Swapping (2/2)
For filter operations, one operand is usually constant. => Operand swapping at compile time.
Current (mA) was measured for combinations of X switching (LowS: low switching, HighS: high switching) with Y switching and weight transitions (LowW: low weight, HighW: high weight; LowW -> LowW, HighW -> HighW, LowW -> HighW). The measured values are 4.0, 7.7, 9.5, 13.0, 11.9, 21.6, 21.2, 31.2, 19.2, 27.5.
Some of these cases are marked as candidates for operand swapping.
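Below is a minimal Python sketch of the weight metric and of one possible compile-time swapping rule. It assumes that Y is the Booth-encoded input, as on the MAC slide. The rule itself is hypothetical and does not come from the talk: put the constant coefficient on Y only if its Booth weight is below the average weight of random 16-bit data. The rule ignores switching activity, even though the measurements above also vary it. `booth_weight` and `order_operands` are illustrative names.

```python
def booth_weight(y: int, width: int = 16) -> int:
    """Number of nonzero radix-4 Booth digits of Y, i.e. partial products to add."""
    bits = y & ((1 << width) - 1)
    weight = 0
    prev = 0  # y[-1] = 0
    for i in range(0, width, 2):
        low = (bits >> i) & 1
        high = (bits >> (i + 1)) & 1
        if -2 * high + low + prev != 0:
            weight += 1
        prev = high
    return weight


# For uniformly random bits each radix-4 digit is nonzero with probability
# 6/8, so random 16-bit data has 8 * 6/8 = 6 nonzero digits on average.
EXPECTED_RANDOM_WEIGHT = 6


def order_operands(coeff: int, width: int = 16) -> tuple[str, str]:
    """Hypothetical compile-time rule: return the (X, Y) roles for coeff * data."""
    if booth_weight(coeff, width) < EXPECTED_RANDOM_WEIGHT:
        return ("data", "coeff")  # light constant goes to the Booth-encoded side
    return ("coeff", "data")  # heavy constant stays off the Booth-encoded side


for value in (0x7FFF, 0x0001, 0xAAAA, 0x6666):
    print(hex(value), booth_weight(value), order_operands(value))
```

The results agree with the operand-swapping measurements. 7FFF and 0001 get weights 2 and 1, while AAAA and 6666 both get weight 8. The 8-bit example 00111100 gets weight 2.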

12 Selective Coefficient Negation
To reduce toggling:
– store Coeff[i] or -Coeff[i] in memory
Depending on whether the coefficient was negated:
– use 'multiply and add' (MAC+ instruction): ACC = ACC + (X * Y)
– use 'multiply and subtract' (MAC- instruction): ACC = ACC - (X * Y)
GSM vocoder: 11% power reduction
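One way to choose the signs is sketched below in Python. The sketch assumes coefficients are stored as 16-bit two's-complement words. It also assumes the toggle cost is the Hamming distance between successively fetched words. The slide does not say how the signs were chosen, so the two-state dynamic program (`choose_signs`) and the helper names are hypothetical. -32768 is never negated because +32768 does not fit in 16 bits.

```python
WIDTH = 16
MASK = (1 << WIDTH) - 1
MOST_NEGATIVE = -(1 << (WIDTH - 1))


def toggles(a: int, b: int) -> int:
    """Bits that flip when word a is followed by word b (two's complement)."""
    return bin((a ^ b) & MASK).count("1")


def choose_signs(coeffs: list[int]) -> tuple[int, list[int]]:
    """Pick +1/-1 per coefficient to minimise total toggles of the stored words.

    Two-state dynamic program: best[s] holds the cheapest (cost, signs)
    prefix whose last coefficient is stored with sign s.
    """
    def options(c: int) -> list[int]:
        return [1] if c == MOST_NEGATIVE else [1, -1]

    best = {s: (0, [s]) for s in options(coeffs[0])}
    for prev_c, c in zip(coeffs, coeffs[1:]):
        best = {
            s: min(
                (cost + toggles(ps * prev_c, s * c), signs + [s])
                for ps, (cost, signs) in best.items()
            )
            for s in options(c)
        }
    return min(best.values())


def mac_with_negation(x: list[int], coeffs: list[int], signs: list[int]) -> int:
    """Accumulate with MAC+ for stored +C[i] and MAC- for stored -C[i]."""
    acc = 0
    for xi, c, s in zip(x, coeffs, signs):
        stored = s * c  # word actually kept in memory
        if s == 1:
            acc = acc + xi * stored  # MAC+: ACC = ACC + (X * Y)
        else:
            acc = acc - xi * stored  # MAC-: ACC = ACC - (X * Y)
    return acc


coeffs = [1, -1, 1, -1]
print(sum(toggles(a, b) for a, b in zip(coeffs, coeffs[1:])))  # 45 without negation
cost, signs = choose_signs(coeffs)
print(cost)  # 0
print(mac_with_negation([3, 5, -2, 7], coeffs, signs))  # -11
```

Negation only changes what is stored and which instruction is issued, so the accumulated result equals the plain sum of X * Coeff[i].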

13 Coefficient Optimization Mahesh [TVLSI '98]
The design of a finite-wordlength FIR filter
– Given N coefficients and constraints,
– Find a new set of coefficients such that the total Hamming distance between successive coefficients is minimized.
=> using coefficient perturbation and an algorithm similar to simulated annealing
But Hamming distance is not a good cost function, since slide 5 shows little correlation between multiplier energy and the number of input transitions.
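A toy version of this search is sketched below in Python, assuming 16-bit coefficients. The real filter-design constraints are replaced by a hypothetical box: every new coefficient must stay within ±`tol` of its original value. The annealing schedule, `total_toggles` and `anneal_coefficients` are illustrative choices. This is a generic simulated-annealing loop, not the published algorithm. It optimizes Hamming distance, which the slide warns may not track multiplier power well.

```python
import math
import random

WIDTH = 16
MASK = (1 << WIDTH) - 1
LO, HI = -(1 << (WIDTH - 1)), (1 << (WIDTH - 1)) - 1


def total_toggles(coeffs: list[int]) -> int:
    """Total Hamming distance between successive coefficients (the cost)."""
    return sum(bin((a ^ b) & MASK).count("1") for a, b in zip(coeffs, coeffs[1:]))


def anneal_coefficients(coeffs, tol=2, steps=5000, t0=2.0, seed=0):
    """Perturb coefficients within +/-tol of the originals to lower the cost."""
    rng = random.Random(seed)
    cur = list(coeffs)
    cur_cost = total_toggles(cur)
    best, best_cost = list(cur), cur_cost
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9  # linear cooling toward zero
        i = rng.randrange(len(cur))
        cand = list(cur)
        # Perturbation: a value near the ORIGINAL coefficient, clamped to 16 bits.
        cand[i] = min(HI, max(LO, coeffs[i] + rng.randint(-tol, tol)))
        cost = total_toggles(cand)
        # Always accept improvements; accept worse moves with Metropolis probability.
        if cost <= cur_cost or rng.random() < math.exp((cur_cost - cost) / temp):
            cur, cur_cost = cand, cost
            if cost < best_cost:
                best, best_cost = list(cand), cost
    return best, best_cost


original = [127, -128, 255, -256, 511]
best, best_cost = anneal_coefficients(original)
print(total_toggles(original), "->", best_cost, best)
```

Keeping the best solution seen means the result is never worse than the original set. A realistic version would check the filter response instead of the ±`tol` box.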

14 Coefficient Ordering
The accumulation in the MAC is commutative and associative, so the taps can be processed in any order.
Finding a good ordering
– N! cases for an N-tap filter
Y(k) = C0 * X(k) + C1 * X(k-1) + C2 * X(k-2)
Y(k) = C1 * X(k-1) + C0 * X(k) + C2 * X(k-2)
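For small N, a brute-force search over the N! tap orders is easy to sketch. The Python below assumes 16-bit coefficients. Purely for illustration, it reuses the successive-coefficient Hamming distance as the cost. `pair_cost` can be replaced by a better multiplier power model, which the coefficient-optimization slide argues is needed. Reordering the taps does not change Y(k). All function names are hypothetical.

```python
from itertools import permutations


def pair_cost(a: int, b: int) -> int:
    """Toggles between successive 16-bit coefficients (placeholder power model)."""
    return bin((a ^ b) & 0xFFFF).count("1")


def order_cost(coeffs, order, cost=pair_cost):
    """Total cost of feeding the coefficients to the multiplier in this order."""
    return sum(cost(coeffs[a], coeffs[b]) for a, b in zip(order, order[1:]))


def best_order(coeffs, cost=pair_cost):
    """Exhaustively try all N! tap orders; practical only for small N."""
    return min(
        permutations(range(len(coeffs))),
        key=lambda order: order_cost(coeffs, order, cost),
    )


def fir_output(coeffs, samples, order):
    """Y(k) = sum of C[i] * X(k-i), accumulated in the given tap order."""
    acc = 0
    for i in order:
        acc += coeffs[i] * samples[i]  # samples[i] holds X(k-i)
    return acc


coeffs = [1, 255, 2, 254]
samples = [3, -1, 4, 2]  # X(k), X(k-1), X(k-2), X(k-3)
natural = tuple(range(len(coeffs)))
best = best_order(coeffs)
print(order_cost(coeffs, natural), order_cost(coeffs, best))  # 20 9
print(fir_output(coeffs, samples, natural), fir_output(coeffs, samples, best))  # 264 264
```

Exhaustive search grows as N!, so a practical tool for longer filters would need a heuristic such as greedy nearest-neighbor ordering.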

15 Conclusion & Future Works
Power characteristics of a multiplier
Some techniques for low-power MACs
– Interlaced accumulation programming
– Operand swapping
– Selective coefficient negation
– Coefficient optimization & ordering
Future works: find an accurate power model for a multiplier
– Cost function for coefficient optimization & instruction-level power optimization
An implementation of a multiplier supporting
– Selective 'operand swapping' & 'negation'

