
Distributed Arithmetic Dr Sumam David S. Dept. of E&C, NITK Surathkal Courtesy for slides – Xilinx Professor’s Workshop Resources.


1 Distributed Arithmetic Dr Sumam David S. Dept. of E&C, NITK Surathkal Courtesy for slides – Xilinx Professor’s Workshop Resources

2 Objective: Distributed arithmetic. What? Where? How?

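3 What is DA? Multiplication using look-up tables (LUTs): instead of hardware multipliers, sums of products with constant coefficients are built from precomputed LUT entries, shifts and additions. It is used to implement multipliers in LUT-rich FPGAs.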

4 Two's Complement Multiplication, one bit at a time.
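The equation on this slide was lost in extraction. The standard two's complement expansion it relies on is sketched below; the notation ($B$ bits per sample, $x_{k,b}$ for bit $b$ of sample $X_k$, coefficients $C_k$, $K$ taps) is ours, not the slide's. Bit $B-1$ is the sign bit and carries negative weight, so a multiply becomes a series of shifted additions with one final subtraction. In the second line, each inner sum over $k$ depends only on the $K$ bits $x_{0,b}, \dots, x_{K-1,b}$, which is why it can be read from a look-up table.

$$
X_k = -x_{k,B-1}\,2^{B-1} + \sum_{b=0}^{B-2} x_{k,b}\,2^{b}
$$

$$
y = \sum_{k=0}^{K-1} C_k X_k = -2^{B-1}\sum_{k=0}^{K-1} C_k\,x_{k,B-1} + \sum_{b=0}^{B-2} 2^{b} \sum_{k=0}^{K-1} C_k\,x_{k,b}
$$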

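5 SDA (serial DA) 1-Tap FIR Filter. [Figure: N-bit-wide sample data X0 passes through a parallel-to-serial converter; each serial bit drives address A0 of a partial product ROM whose LUT contains two locations (address 0: 0, address 1: C0); the ROM output feeds a scaling accumulator (+/-, Z^-1).]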
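A minimal Python sketch of the 1-tap structure, assuming LSB-first serialization and an `n_bits` two's complement sample; `sda_one_tap` is a hypothetical name. The two-entry list plays the role of the partial product ROM and `acc` the scaling accumulator. Hardware usually shifts the accumulator right each cycle rather than shifting each ROM output left; the two are arithmetically equivalent when no bits are discarded.

```python
def sda_one_tap(c0: int, x0: int, n_bits: int) -> int:
    """Hypothetical helper: bit-serial product c0 * x0 for an n_bits two's complement x0."""
    lut = [0, c0]  # partial product ROM: address 0 -> 0, address 1 -> C0
    acc = 0        # scaling accumulator
    for b in range(n_bits):          # one iteration per clock cycle, LSB first
        bit = (x0 >> b) & 1          # output of the parallel-to-serial converter
        term = lut[bit] << b         # scale the partial product by 2**b
        if b == n_bits - 1:
            acc -= term              # sign bit has weight -2**(n_bits - 1): subtract
        else:
            acc += term
    return acc


print(sda_one_tap(-7, 7, n_bits=4))  # -49
```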

6 Distributed Arithmetic for a 2-Tap Filter (Serial-Data / Tap-Parallel Multiply). With $C_0 = -7$, $X_0 = 7$, $C_1 = 6$ and $X_1 = 5$: $C_0 X_0 + C_1 X_1 = (-49) + (30) = -19$. Partial products of equal weight are added together (with sign extension) before being summed into the next higher partial-product weight: $(-1) + (-14) + (-4) + (0) = -19$. Create a look-up table of the summed partial products.
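The slide's numbers can be checked with a short sketch. It assumes 4-bit two's complement samples, which is consistent with the four partial sums shown; `bit_weight_partial_sums` is a hypothetical helper name.

```python
def bit_weight_partial_sums(coeffs, samples, n_bits):
    """Hypothetical helper: per-bit-weight partial sums of sum(c * x)."""
    sums = []
    for b in range(n_bits):
        # Add the partial products of equal weight first ...
        s = sum(c for c, x in zip(coeffs, samples) if (x >> b) & 1)
        # ... then scale by the bit weight (negative for the sign bit).
        weight = -(1 << b) if b == n_bits - 1 else (1 << b)
        sums.append(s * weight)
    return sums


terms = bit_weight_partial_sums([-7, 6], [7, 5], n_bits=4)
print(terms)       # [-1, -14, -4, 0]
print(sum(terms))  # -19
```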

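7 SDA 2-Tap FIR Filter. The LUT contains all possible sums of the partial products: 0, C0, C1 and C0 + C1. [Figure: the serialized bits of the N-bit-wide samples X0 and X1 drive address lines A0 and A1 of the partial product ROM; the ROM output feeds the scaling accumulator (+/-, Z^-1).]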
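A sketch of how such a partial product ROM could be filled, assuming address bit k is driven by sample Xk (A0 by X0, A1 by X1, as in the figure); `build_da_lut` is a hypothetical name. A K-tap table has 2**K entries.

```python
def build_da_lut(coeffs):
    """Hypothetical helper: entry at address a is the sum of coeffs[k] over the set bits k of a."""
    lut = []
    for addr in range(1 << len(coeffs)):  # every combination of address bits
        lut.append(sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1))
    return lut


# 2-tap example from the previous slide: C0 = -7, C1 = 6
print(build_da_lut([-7, 6]))  # [0, -7, 6, -1], i.e. 0, C0, C1, C0 + C1
```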

8 SDA 4-Tap FIR Filter. [Figure: the serialized bits of the N-bit-wide samples X0 to X3 drive address lines A0 to A3 of a partial product ROM holding all possible sums of C0, C1, C2 and C3; the ROM output feeds the scaling accumulator (+/-, Z^-1).]
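Extending the same idea to a streaming 4-tap filter, here is a hypothetical `da_fir` sketch: it keeps a delay line of the last four samples, forms one 4-bit ROM address per bit position and accumulates the scaled look-ups. It is checked against a conventional multiply-accumulate `direct_fir`. Integer coefficients and 4-bit two's complement samples are assumptions made for brevity.

```python
def build_da_lut(coeffs):
    """Entry at address a is the sum of coeffs[k] over the set bits k of a."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_fir(coeffs, samples, n_bits):
    """Hypothetical bit-serial DA FIR: y[n] = sum_k coeffs[k] * x[n - k]."""
    lut = build_da_lut(coeffs)      # 16 entries for 4 taps
    taps = [0] * len(coeffs)        # delay line: taps[k] holds x[n - k]
    out = []
    for x in samples:
        taps = [x] + taps[:-1]      # shift the new sample in
        acc = 0
        for b in range(n_bits):     # one ROM look-up per bit (per clock)
            addr = sum(((t >> b) & 1) << k for k, t in enumerate(taps))
            term = lut[addr] << b
            acc += -term if b == n_bits - 1 else term  # subtract at the sign bit
        out.append(acc)
    return out


def direct_fir(coeffs, samples):
    """Reference: conventional multiply-accumulate FIR."""
    taps = [0] * len(coeffs)
    out = []
    for x in samples:
        taps = [x] + taps[:-1]
        out.append(sum(c * t for c, t in zip(coeffs, taps)))
    return out


coeffs = [3, -5, 7, -2]
samples = [1, -8, 7, 0, -3, 5]
print(da_fir(coeffs, samples, n_bits=4))  # [3, -29, 68, -93, 56, 16]
print(direct_fir(coeffs, samples))        # [3, -29, 68, -93, 56, 16]
```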

9 SDA 8-Tap FIR Filter. Each 4-input LUT contains all possible sums of its partial products. [Figure: the N-bit-wide samples are split into two groups; X0 to X3 and X4 to X7 each drive address lines A0 to A3 of their own partial product ROM; the figure also shows an adder and a pre-adder ahead of the scaling accumulator (+/-, Z^-1).]
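Splitting the 8 taps into two 4-input LUTs needs 2 × 16 = 32 ROM words instead of 256 for a single 8-input LUT, at the cost of an adder. The sketch below computes one inner product that way. It assumes the two ROM outputs are simply summed before the scaling accumulator, which is our reading of the figure; `partitioned_da` is a hypothetical name.

```python
def build_da_lut(coeffs):
    """Entry at address a is the sum of coeffs[k] over the set bits k of a."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def partitioned_da(coeffs, samples, n_bits, inputs_per_lut=4):
    """Hypothetical sketch: sum(c * x) using several small LUTs instead of one large one."""
    groups = [(build_da_lut(coeffs[i:i + inputs_per_lut]), samples[i:i + inputs_per_lut])
              for i in range(0, len(coeffs), inputs_per_lut)]
    acc = 0
    for b in range(n_bits):
        partial = 0
        for lut, xs in groups:       # one partial product ROM per group of taps
            addr = sum(((x >> b) & 1) << k for k, x in enumerate(xs))
            partial += lut[addr]     # adder combining the ROM outputs
        term = partial << b
        acc += -term if b == n_bits - 1 else term  # subtract at the sign bit
    return acc


coeffs = [3, -5, 7, -2, 11, 0, -9, 4]
samples = [100, -128, 7, 64, -1, 127, -50, 33]  # 8-bit two's complement range
print(partitioned_da(coeffs, samples, n_bits=8) == sum(c * x for c, x in zip(coeffs, samples)))  # True
```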

10 Xilinx DA FIR Performance. fclk = 200 MHz for both processor and FPGA; B = data sample precision for the FPGA. [Chart: performance (MMACs/s) and sample rate (MSPS) versus filter length (taps) for a single-MAC processor, a dual-MAC processor, a serial FPGA FIR, and DA FIRs with B = 8, B = 12 and a third precision whose value was lost.]
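The chart's data did not survive extraction, so it is not reproduced here. As a rough, idealized model (our assumption, ignoring pipelining, routing and achievable clock rates), a bit-serial DA filter finishes one sample every B clocks regardless of filter length, while a processor with M MAC units needs N/M clocks per output of an N-tap filter. The function names are hypothetical.

```python
def da_sample_rate_msps(fclk_mhz, b_bits, bits_per_clock=1):
    """Idealized DA FIR: one output every b_bits / bits_per_clock clocks, for any tap count."""
    return fclk_mhz * bits_per_clock / b_bits


def mac_sample_rate_msps(fclk_mhz, taps, n_macs=1):
    """Idealized processor FIR: `taps` MACs per output, `n_macs` of them per clock."""
    return fclk_mhz * n_macs / taps


def equivalent_mmacs(sample_rate_msps, taps):
    """MACs per second an N-tap filter performs at a given sample rate."""
    return sample_rate_msps * taps


for taps in (16, 64):
    da = da_sample_rate_msps(200, 8)  # B = 8 at fclk = 200 MHz
    print(taps, da, equivalent_mmacs(da, taps), mac_sample_rate_msps(200, taps, n_macs=2))
# 16 25.0 400.0 25.0
# 64 25.0 1600.0 6.25
```

Under this model the DA sample rate stays flat as the filter length grows, while the processor's sample rate falls as 1/N.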

11 Serial-DA, multi-bit-per-clock DA and Parallel-DA: trade clock cycles for logic area. Hardware over-sampling = 8: the sample is serialized and processed 1 bit per clock cycle, so 8 clock cycles are required to process the whole sample (20 Ms/s). Hardware over-sampling = 4: the sample is serialized and processed 2 bits per clock cycle, so 4 clock cycles are required. Hardware over-sampling = 2: the sample is serialized and processed 4 bits per clock cycle. Hardware over-sampling = 1: the sample is processed in parallel, 8 bits per clock cycle (160 Ms/s).
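A sketch of the multi-bit variant, under the assumption that processing L bits per clock means L copies of the same LUT, one per bit position handled in that clock, whose shifted outputs are summed; `da_multibit` is hypothetical. The returned cycle count shows the B/L trade-off: the result is unchanged while the number of clocks drops as more logic is used.

```python
def build_da_lut(coeffs):
    """Entry at address a is the sum of coeffs[k] over the set bits k of a."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_multibit(coeffs, samples, n_bits, bits_per_clock):
    """Hypothetical sketch: inner product with bits_per_clock bits handled per clock."""
    if n_bits % bits_per_clock:
        raise ValueError("n_bits must be a multiple of bits_per_clock")
    lut = build_da_lut(coeffs)  # every LUT copy holds the same contents
    acc = 0
    cycles = 0
    for start in range(0, n_bits, bits_per_clock):     # one iteration = one clock
        cycles += 1
        for b in range(start, start + bits_per_clock):  # one LUT copy per bit
            addr = sum(((x >> b) & 1) << k for k, x in enumerate(samples))
            term = lut[addr] << b
            acc += -term if b == n_bits - 1 else term   # subtract at the sign bit
    return acc, cycles


coeffs, samples = [3, -5, 7, -2], [100, -128, 7, 64]
for bits in (1, 2, 4, 8):
    print(bits, da_multibit(coeffs, samples, n_bits=8, bits_per_clock=bits))
# 1 (861, 8)
# 2 (861, 4)
# 4 (861, 2)
# 8 (861, 1)
```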

12 Conclusion: efficient computation without hardware multipliers; slow, as it is bit-serial; memory requirements grow with the number of taps (a K-input LUT holds 2^K entries).
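To put the memory point in numbers, a small sketch that counts ROM words when the taps are split into LUTs of at most a given number of inputs; `lut_entries` is hypothetical and ignores word width and the adders needed to combine partitions.

```python
def lut_entries(taps, inputs_per_lut):
    """Hypothetical helper: total ROM words for taps split into LUTs of at most inputs_per_lut inputs."""
    full, rest = divmod(taps, inputs_per_lut)
    return full * (1 << inputs_per_lut) + ((1 << rest) if rest else 0)


for taps in (8, 16, 32):
    print(taps, lut_entries(taps, taps), lut_entries(taps, 4))
# 8 256 32
# 16 65536 64
# 32 4294967296 128
```

A single LUT grows exponentially with the number of taps, while 4-input partitions grow linearly.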
