Published by Yasmin Hugh. Modified about 1 year ago.

1
Distributed Arithmetic
Dr Sumam David S., Dept. of E&C, NITK Surathkal
Slides courtesy of Xilinx Professor’s Workshop Resources

2
Objective
Distributed arithmetic: What? Where? How?

3
What is DA?
- Multiplication using a LUT
- Used to implement multipliers in LUT-rich FPGAs

4
Two's Complement Multiplication
One bit at a time:
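The equation on this slide did not survive extraction. As a rough illustration of what "one bit at a time" usually means for two's complement operands, the sketch below adds one shifted partial product per sample bit and subtracts the partial product of the sign bit. The function names and the fixed bit width are assumptions made for this example, not taken from the slides.

```python
def to_bits(value, width):
    """Return the two's complement bits of value, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def bit_serial_multiply(coeff, sample, width):
    """Multiply coeff by a signed width-bit sample, one sample bit per step."""
    if not -(1 << (width - 1)) <= sample < (1 << (width - 1)):
        raise ValueError("sample does not fit in the given width")
    acc = 0
    for i, bit in enumerate(to_bits(sample, width)):
        partial = (coeff if bit else 0) * (1 << i)  # partial product, weight 2^i
        if i == width - 1:
            acc -= partial  # the sign bit carries negative weight
        else:
            acc += partial
    return acc


print(bit_serial_multiply(-7, 7, 4))
print(bit_serial_multiply(6, 5, 4))
```

The operands in the two calls are the ones used in the 2-tap example later in the deck.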

5
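SDA 1-Tap FIR Filter
[Figure: the N-bit-wide sample data X0 passes through a parallel-to-serial converter; the serial bit drives address line A0 of the partial product ROM, whose output feeds a +/- scaling accumulator with a Z^-1 register.]
LUT contains two locations (A0 = 0 and A0 = 1) holding the partial products of C0.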

6
Distributed Arithmetic for a 2-Tap Filter (Serial-Data / Tap-Parallel Multiply)
C0 = (-7), X0 = (7): C0 x X0 = (-49)
C1 = (6), X1 = (5): C1 x X1 = (30)
Sum of products: (-49) + (30) = (-19)
Summed partial products by weight: (-1) + (-14) + (-4) + (0) = (-19)
(The figure marks the sign-extension bits.)
- Partial products of equal weight are added together before being summed to the next higher partial product weight
- Create a look-up table of summed partial products
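The worked example above can be reproduced with a short sketch. It assumes 4-bit two's complement samples, which is consistent with the four partial sums on the slide; the function names are hypothetical. The LUT is addressed by one bit from each sample, and each LUT output is weighted by its bit position, with the sign-bit term subtracted.

```python
def build_lut(coeffs):
    """Entry at address a is the sum of coeffs[k] for every set bit k of a."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_dot(coeffs, samples, width):
    """Bit-serial distributed-arithmetic dot product of coeffs and samples."""
    lut = build_lut(coeffs)
    acc = 0
    trace = []  # weighted LUT output for each bit position, LSB first
    for i in range(width):
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> i) & 1) << k  # bit i of sample k drives address line k
        term = lut[addr] << i
        if i == width - 1:
            term = -term  # sign bits carry negative weight
        trace.append(term)
        acc += term
    return acc, trace


lut = build_lut([-7, 6])
result, trace = da_dot([-7, 6], [7, 5], 4)
print(lut)     # [0, -7, 6, -1]
print(trace)   # [-1, -14, -4, 0]
print(result)  # -19
```

The trace matches the four summed partial products listed on the slide, and the LUT holds 0, C0, C1 and C0 + C1.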

7
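SDA 2-Tap FIR Filter
[Figure: the N-bit-wide sample data X0 and X1 are serialized 1 bit at a time onto address lines A0 and A1 of the partial product ROM, which feeds a +/- scaling accumulator with a Z^-1 register.]
LUT contains all possible sums of the partial products, including C0, C1, and C0 + C1.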

8
SDA 4-Tap FIR Filter
[Figure: the N-bit-wide sample data X0 to X3 are serialized 1 bit at a time onto address lines A0 to A3 of the partial product ROM, which holds the sums of C0 to C3 and feeds a +/- scaling accumulator with a Z^-1 register.]

9
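SDA 8-Tap FIR Filter
[Figure: the N-bit-wide sample data X0 to X7 are serialized 1 bit at a time. X0 to X3 drive address lines A0 to A3 of one partial product ROM and X4 to X7 drive A0 to A3 of a second partial product ROM. A pre-adder sums the two ROM outputs and feeds a +/- scaling accumulator with a Z^-1 register.]
Each LUT contains all possible sums of the partial products for its inputs.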
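A minimal sketch of the 8-tap arrangement, assuming the taps are split into two groups of four as the figure's address lines suggest. The coefficient and sample values in the usage lines are made up for illustration, and the function names are hypothetical.

```python
def build_lut(coeffs):
    """Entry at address a is the sum of coeffs[k] for every set bit k of a."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def address(samples, bit):
    """Pack bit `bit` of each sample into a LUT address (sample k -> line k)."""
    return sum(((x >> bit) & 1) << k for k, x in enumerate(samples))


def da_fir_8tap(coeffs, samples, width):
    """8-tap bit-serial DA using two 4-input LUTs and a pre-adder."""
    lut_lo = build_lut(coeffs[:4])  # taps 0..3
    lut_hi = build_lut(coeffs[4:])  # taps 4..7
    acc = 0
    for i in range(width):
        # Pre-adder: combine both LUT outputs before the scaling accumulator.
        pre_sum = lut_lo[address(samples[:4], i)] + lut_hi[address(samples[4:], i)]
        term = pre_sum << i
        acc += -term if i == width - 1 else term  # sign bit is subtracted
    return acc


coeffs = [1, -2, 3, -4, 5, -6, 7, -8]   # illustrative values only
samples = [7, -8, 0, 5, -1, 3, -4, 2]   # 4-bit two's complement samples
print(da_fir_8tap(coeffs, samples, 4))
print(sum(c * x for c, x in zip(coeffs, samples)))  # direct reference result
```

Splitting the table this way needs 2 x 16 = 32 LUT words, where a single 8-input table would need 256.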

10
Xilinx DA FIR Performance
[Figure: two plots against Filter Length (Taps). Performance (MMACs/s) compares Serial FPGA FIR, Dual MAC, and DA FIR at B=8, B=12 and B=. Sample Rate (MSPS) compares Single MAC, Serial FPGA FIR, and DA FIR at B=8, B=12 and B=.]
fclk = 200 MHz for both processor and FPGA
B = data sample precision for FPGA

11
Trade Clock Cycles for Logic Area
- Serial DA (hardware over-sampling = 8): the sample is serialized and processed 1 bit per clock cycle. 8 clock cycles are thus required to process the whole sample.
- Multiple bits per clock cycle (hardware over-sampling = 4): the sample is serialized and processed 2 bits per clock cycle. 4 clock cycles are thus required to process the whole sample.
- Multiple bits per clock cycle (hardware over-sampling = 2): the sample is serialized and processed 4 bits per clock cycle.
- Parallel DA (hardware over-sampling = 1): the sample is processed in parallel, 8 bits per clock cycle.
Sample rate: 20 Ms/s (serial DA) to 160 Ms/s (parallel DA).
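The trade-off above can be sketched by letting each clock cycle handle several bit positions at once, which in hardware corresponds to replicated LUTs. The 160 MHz clock in the usage lines is an assumption inferred from the 20 Ms/s and 160 Ms/s figures; the function names and data values are hypothetical.

```python
def build_lut(coeffs):
    """Entry at address a is the sum of coeffs[k] for every set bit k of a."""
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << len(coeffs))]


def da_dot_multibit(coeffs, samples, width, bits_per_clock):
    """DA dot product processing bits_per_clock bit positions per clock cycle."""
    if width % bits_per_clock != 0:
        raise ValueError("width must be a multiple of bits_per_clock")
    lut = build_lut(coeffs)
    acc = 0
    cycles = 0
    for base in range(0, width, bits_per_clock):
        # In hardware these lookups run in parallel, one LUT copy per bit.
        for i in range(base, base + bits_per_clock):
            addr = sum(((x >> i) & 1) << k for k, x in enumerate(samples))
            term = lut[addr] << i
            acc += -term if i == width - 1 else term
        cycles += 1
    return acc, cycles


def sample_rate(fclk_hz, width, bits_per_clock):
    """Samples per second when one sample takes width / bits_per_clock cycles."""
    return fclk_hz / (width // bits_per_clock)


for bpc in (1, 2, 4, 8):
    result, cycles = da_dot_multibit([-7, 6], [100, -37], 8, bpc)
    print(bpc, cycles, result, sample_rate(160e6, 8, bpc))
```

Every setting returns the same result; only the cycle count, and therefore the number of LUT copies needed, changes.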

12
Conclusion
- Efficiency of computation
- Slow, as it is bit-serial
- Memory requirements

13
References
The role of Distributed Arithmetic in FPGA based signal processing,
