
Published by Yasmin Hugh

1
Distributed Arithmetic
Dr. Sumam David S.
Dept. of E&C, NITK Surathkal
Courtesy for slides: Xilinx Professor's Workshop Resources

2
Objective
Distributed arithmetic: What? Where? How?

3
What is DA?
Multiplication using look-up tables (LUTs)
Used to implement multipliers in LUT-rich FPGAs
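Why a LUT can replace the multipliers: write each B-bit two's-complement sample in terms of its bits and swap the order of summation. The notation below (K taps, B-bit samples, b_{k,n} = bit n of sample x_k, LUT(...) = table addressed by one bit from each sample) is ours, not taken from the slides; it is a sketch of the standard DA identity.

```latex
y = \sum_{k=0}^{K-1} C_k x_k, \qquad
x_k = -b_{k,B-1}\,2^{B-1} + \sum_{n=0}^{B-2} b_{k,n}\,2^{n}

y = -2^{B-1} \underbrace{\sum_{k=0}^{K-1} C_k\, b_{k,B-1}}_{\mathrm{LUT}(b_{0,B-1},\,\dots,\,b_{K-1,B-1})}
  \;+\; \sum_{n=0}^{B-2} 2^{n} \underbrace{\sum_{k=0}^{K-1} C_k\, b_{k,n}}_{\mathrm{LUT}(b_{0,n},\,\dots,\,b_{K-1,n})}
```

Each inner sum depends only on K bits, so it can be precomputed into a table of 2^K entries; the hardware then only looks up, shifts, and adds (subtracting at the sign bit).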

4
Two's Complement Multiplication
One bit at a time
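A minimal sketch of one-bit-at-a-time two's-complement multiplication, assuming both operands fit in B bits (the function names `to_bits` and `serial_multiply` are hypothetical). Every bit of x adds a shifted copy of c, except the sign bit, whose weight is -2^(B-1), so its partial product is subtracted. The usage lines reuse the operands from the 2-tap worked example below (C0 = -7, X0 = 7 and C1 = 6, X1 = 5).

```python
def to_bits(x: int, B: int) -> list:
    """Two's-complement bits of x, LSB first (x must fit in B bits)."""
    assert -(1 << (B - 1)) <= x < (1 << (B - 1)), "x does not fit in B bits"
    # Python's >> is an arithmetic shift, so negative x yields its sign-extended bits
    return [(x >> n) & 1 for n in range(B)]


def serial_multiply(c: int, x: int, B: int) -> int:
    """Multiply c by the B-bit two's-complement value x, one bit of x per step."""
    acc = 0
    for n, bit in enumerate(to_bits(x, B)):
        partial = (c << n) if bit else 0      # shifted copy of c, or zero
        if n == B - 1:
            acc -= partial                    # sign bit has weight -2^(B-1)
        else:
            acc += partial
    return acc


print(serial_multiply(-7, 7, 4))   # -49
print(serial_multiply(6, 5, 4))    # 30
```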

5
SDA 1-Tap FIR Filter
Block diagram: N-bit-wide sample data X0 → parallel-to-serial converter → 1-bit address A0 → partial product ROM → +/- scaling accumulator (Z^-1 feedback)
The LUT contains two locations: A0 = 0 → 00000...0; A0 = 1 → C0
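A behavioural sketch of the 1-tap serial DA datapath, not a hardware description. Two assumptions not stated on the slide: the parallel-to-serial converter shifts out the MSB first, and the scaling accumulator therefore doubles its stored value each clock (many designs go LSB first and shift right instead). The function name `sda_1tap` is hypothetical.

```python
def sda_1tap(c0: int, x0: int, B: int) -> int:
    """Compute c0 * x0 with a 2-location ROM and a scaling accumulator."""
    rom = [0, c0]                      # partial product ROM: A0 = 0 -> 0, A0 = 1 -> C0
    acc = 0                            # accumulator register (the Z^-1 element)
    for n in range(B - 1, -1, -1):     # assumed MSB-first serial bit stream
        a0 = (x0 >> n) & 1             # one serial bit addresses the ROM
        term = rom[a0]
        # +/- control: subtract during the sign-bit cycle, add otherwise
        acc = 2 * acc + (-term if n == B - 1 else term)
    return acc


print(sda_1tap(-7, 7, 4))   # -49
```

After B clocks the accumulator holds the product, so the tap needs no hardware multiplier, only a 2-word ROM and an adder/subtractor.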

6
Distributed Arithmetic for a 2-Tap Filter (Serial-Data / Tap-Parallel Multiply)
Bit weights (two's complement): -2^3, 2^2, 2^1, 2^0; partial products are sign-extended.
Tap 0: C0 = 1001 (-7), X0 = 0111 (7); C0 × X0 = 11001111 (-49)
Tap 1: C1 = 0110 (6), X1 = 0101 (5); C1 × X1 = 00011110 (30)
Partial products of equal weight are added together before being summed with the next higher partial-product weight:
Weight 2^0 (X0 bit = 1, X1 bit = 1): C0 + C1 = -1
Weight 2^1 (X0 bit = 1, X1 bit = 0): 2 × C0 = -14
Weight 2^2 (X0 bit = 1, X1 bit = 1): 4 × (C0 + C1) = -4
Weight -2^3 (X0 bit = 0, X1 bit = 0): 0
Sum: (-1) + (-14) + (-4) + (0) = 11101101 (-19), which equals -49 + 30
Create a look-up table of the summed partial products.
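The regrouping in the worked example can be checked numerically. This sketch (the function name `weighted_partial_sums` is hypothetical) sums the partial products of equal weight first, one per tap, and then applies the bit weight, negating it for the sign bit.

```python
def weighted_partial_sums(C, X, B):
    """For each bit weight, sum the partial products of all taps, then apply the weight."""
    out = []
    for n in range(B):
        # add partial products of equal weight first: C_k wherever bit n of X_k is 1
        summed = sum(c for c, x in zip(C, X) if (x >> n) & 1)
        weight = -(1 << n) if n == B - 1 else (1 << n)   # sign bit weighs -2^(B-1)
        out.append(summed * weight)
    return out


parts = weighted_partial_sums([-7, 6], [7, 5], 4)
print(parts)        # [-1, -14, -4, 0]
print(sum(parts))   # -19
```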

7
SDA 2-Tap FIR Filter
Block diagram: N-bit-wide sample data X0, X1 → serial bits on address lines A0, A1 → partial product ROM → +/- scaling accumulator (Z^-1 feedback)
The LUT contains all possible sums of the partial products: A1 A0 = 00 → 0000...0; 01 → C0; 10 → C1; 11 → C0 + C1
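A behavioural sketch of the 2-tap SDA filter, generalized to any number of taps sharing one ROM. Assumptions not from the slides: bit n of sample X_k drives address line A_k, and the bits are serialized MSB first as in a doubling accumulator. `build_lut` and `sda_fir` are hypothetical names.

```python
def build_lut(coeffs):
    """Entry a holds the sum of C_k for every address bit A_k that is set in a."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (a >> k) & 1) for a in range(1 << n)]


def sda_fir(coeffs, samples, B):
    """y = sum_k C_k * X_k using one shared ROM and a scaling accumulator."""
    rom = build_lut(coeffs)
    acc = 0
    for n in range(B - 1, -1, -1):                 # MSB first
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> n) & 1) << k            # bit n of X_k on address line A_k
        term = rom[addr]
        acc = 2 * acc + (-term if n == B - 1 else term)   # subtract on the sign bit
    return acc


print(build_lut([-7, 6]))            # [0, -7, 6, -1]
print(sda_fir([-7, 6], [7, 5], 4))   # -19
```

The ROM has 2^N words for N taps, which is why larger filters split the taps across several smaller ROMs.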

8
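SDA 4-Tap FIR Filter
Two 2-input partial product ROMs:
ROM 1, addressed by the serial bits of X0, X1 (A0, A1): 0000...0, C0, C1, C0 + C1
ROM 2, addressed by the serial bits of X2, X3 (A2, A3): 0000...0, C2, C3, C2 + C3
The ROM outputs are added (+) and fed to the +/- scaling accumulator (Z^-1 feedback); the sample data is N bits wide.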
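A sketch of the partitioned structure: the taps are split into groups of k, each group gets its own 2^k-word ROM, and an adder combines the ROM outputs before the scaling accumulator. The grouping parameter `k`, the MSB-first order, and all function names are assumptions for illustration. The example coefficients and samples are made up.

```python
def build_lut(coeffs):
    """Entry a holds the sum of C_k for every address bit set in a."""
    return [sum(c for k, c in enumerate(coeffs) if (a >> k) & 1)
            for a in range(1 << len(coeffs))]


def sda_fir_partitioned(coeffs, samples, B, k=2):
    """SDA FIR with one k-input ROM per group of k taps, outputs summed each clock."""
    groups = [range(i, min(i + k, len(coeffs))) for i in range(0, len(coeffs), k)]
    roms = [build_lut([coeffs[j] for j in g]) for g in groups]
    acc = 0
    for n in range(B - 1, -1, -1):                  # MSB first
        term = 0
        for g, rom in zip(groups, roms):
            addr = 0
            for pos, j in enumerate(g):
                addr |= ((samples[j] >> n) & 1) << pos
            term += rom[addr]                       # adder combining the ROM outputs
        acc = 2 * acc + (-term if n == B - 1 else term)
    return acc


print(sda_fir_partitioned([-7, 6, 3, -2], [7, 5, -4, 1], 4))   # -33
```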

9
SDA 8-Tap FIR Filter
N-bit-wide sample data X0 to X7; two 4-input partial product ROMs, one addressed by the serial bits of X0 to X3 and one by X4 to X7 (each on address lines A0 to A3)
Each 4-input LUT contains all possible sums of its partial products
Diagram labels: pre-adder; adder (+); +/- scaling accumulator (Z^-1 feedback)
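To see why the 8-tap design uses two 4-input LUTs instead of one 8-input LUT, this sketch counts ROM words for different LUT sizes (word width and the cost of the extra adders are ignored; `lut_words` is a hypothetical helper).

```python
def lut_words(num_taps: int, k: int) -> int:
    """Total ROM words when the taps are split into k-input LUTs (last LUT may be smaller)."""
    full, rem = divmod(num_taps, k)
    return full * (1 << k) + ((1 << rem) if rem else 0)


for k in (8, 4, 2):
    print(f"8 taps, {k}-input LUTs: {lut_words(8, k)} words")
# 8 taps, 8-input LUTs: 256 words
# 8 taps, 4-input LUTs: 32 words
# 8 taps, 2-input LUTs: 16 words
```

Smaller LUTs shrink memory exponentially but add one adder input per extra LUT, so the split size is an area trade-off; 4-input LUTs match the native LUT size of many FPGA families.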

10
Xilinx DA FIR Performance (fclk = 200 MHz for both processor and FPGA; B = data sample precision for FPGA)
[Figure: two plots against filter length (taps), performance (MMACs/s) and sample rate (MSPS), comparing a serial FPGA FIR, single-MAC and dual-MAC FIRs, and DA FIRs with B = 8, 12, and 16]
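A first-order throughput model behind this kind of comparison, assuming one bit per clock for serial DA and one multiply-accumulate per clock per MAC unit; it ignores pipelining, routing, and whether a given filter fits the device. The function names are hypothetical; only fclk = 200 MHz comes from the slide.

```python
F_CLK_MHZ = 200.0  # slide: fclk = 200 MHz for both processor and FPGA


def da_sample_rate_msps(b_bits: int, bits_per_clock: int = 1) -> float:
    """Serial DA needs B / bits_per_clock clocks per sample, independent of filter length."""
    return F_CLK_MHZ * bits_per_clock / b_bits


def da_equivalent_mmacs(taps: int, b_bits: int) -> float:
    """Each output sample of a taps-long filter counts as `taps` multiply-accumulates."""
    return taps * da_sample_rate_msps(b_bits)


def mac_sample_rate_msps(taps: int, macs: int = 1) -> float:
    """A MAC-based FIR spends taps / macs clocks per sample."""
    return F_CLK_MHZ * macs / taps


for b in (8, 12, 16):
    print(f"DA FIR, B={b}: {da_sample_rate_msps(b):.2f} MSPS at any filter length")
print(f"Single MAC, 64 taps: {mac_sample_rate_msps(64):.3f} MSPS")
```

Under this model the DA sample rate depends on B, not on the number of taps, so its equivalent MMAC/s grows with filter length while a fixed number of MACs does not.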

11
Trade Clock Cycles for Logic Area: from Serial DA to Parallel DA (8-bit samples, bits b0 to b7)
Serial DA: the sample is serialized and processed 1 bit per clock cycle, so 8 clock cycles are required to process the whole sample (hardware over-sampling = 8).
Multiple bits per clock cycle: the sample is serialized and processed 2 bits per clock cycle, so 4 clock cycles are required (hardware over-sampling = 4).
The sample is serialized and processed 4 bits per clock cycle, so 2 clock cycles are required (hardware over-sampling = 2).
Parallel DA: the whole sample is processed in parallel, 8 bits per clock cycle (hardware over-sampling = 1).
Sample rates shown range from 20 MS/s (serial) to 160 MS/s (parallel).
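A sketch of processing L bits per clock: L copies of the same LUT are read in parallel (one per bit plane), their outputs are weighted and added, and the scaling accumulator shifts by L instead of 1. It assumes B is a multiple of L and MSB-first order; `da_fir_multibit` is a hypothetical name and the sample values are made up. The result is identical for every L, only the clock count changes.

```python
def build_lut(coeffs):
    """Entry a holds the sum of C_k for every address bit set in a."""
    return [sum(c for k, c in enumerate(coeffs) if (a >> k) & 1)
            for a in range(1 << len(coeffs))]


def da_fir_multibit(coeffs, samples, B, L):
    """Return (y, clocks) when L bits of each sample are processed per clock."""
    assert B % L == 0, "B must be a multiple of L"
    rom = build_lut(coeffs)

    def addr(n):  # address formed from bit n of every sample
        return sum(((x >> n) & 1) << k for k, x in enumerate(samples))

    acc, clocks = 0, 0
    for hi in range(B - 1, -1, -L):             # top bit of this clock's group
        lo = hi - L + 1
        term = 0
        for n in range(hi, lo - 1, -1):         # L parallel LUT copies
            t = rom[addr(n)]
            if n == B - 1:
                t = -t                          # sign bit is subtracted
            term += t << (n - lo)               # weight inside the group
        acc = (acc << L) + term                 # scaling accumulator shifts by L
        clocks += 1
    return acc, clocks


for L in (1, 2, 4, 8):
    print(L, da_fir_multibit([-7, 6], [100, -57], 8, L))
```

Doubling L halves the clock cycles per sample at the cost of one more LUT copy and a wider adder, which is the clock-cycles-for-area trade described on the slide.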

12
Conclusion
Efficient computation: multiplications are replaced by LUT look-ups, shifts, and additions
Slow, as it is bit-serial
Memory requirements: a K-input LUT needs 2^K words

13
References
The Role of Distributed Arithmetic in FPGA-Based Signal Processing, Xilinx, www.xilinx.com
