Automatic Generation of Customized Discrete Fourier Transform IPs Grace Nordin, Peter A. Milder, James C. Hoe, Markus Püschel Carnegie Mellon University.

1 Automatic Generation of Customized Discrete Fourier Transform IPs. Grace Nordin, Peter A. Milder, James C. Hoe, Markus Püschel, Carnegie Mellon University. This project is supported in part by NSF awards ITR/NGS-0325687 and SYS-0310941 and a DARPA DESA program. www.spiral.net

2 Slide 2 The Paradox of Reusable IPs. Boon to productivity: zero effort required; zero knowledge required; zero chance to introduce new bugs. Why repeat what has already been done? Bane to optimality: finding the right functionality with the right interface; design tradeoffs among performance, area, power, accuracy, and so on. Are you getting what you really wanted? Solution: parameterized automatic IP generators, which need zero effort and zero knowledge and introduce zero bugs, allow application-specific customization, and facilitate design exploration.

3 Slide 3 Our Work: Discrete Fourier Transform IPs. The Discrete Fourier Transform (DFT) is an important building block in DSP applications, and numerous design "cores" are available. Current IP libraries support various sizes, number formats, and data orderings, but only a small number of microarchitecture choices (Xilinx LogiCore DFT gives 3 choices). We generate IPs with custom design tradeoffs: degree of parallelism in the microarchitecture (from min to max) and resource preference (e.g. BRAM vs. slices in FPGAs). The approach is extensible to other common linear DSP transforms.

4 Slide 4 Outline: Introduction; Formula-Driven Design Generation; Microarchitecture Parameterization; Generator User Interface; Experimental Results; Conclusions.

5 Slide 5 Transforms as Formulas [www.spiral.net]. Transform computation is represented as matrix-vector multiplication; a general matrix-vector multiplication takes O(n^2) operations. "Fast" algorithms factor the transform into a sequence of structured sparse matrices, which takes O(n log n) operations. [The DFT and FFT formulas shown on the slide are not preserved in this transcript.] A datapath is easily formed from the factorized formulas.
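
The operation counts on this slide can be illustrated with a small sketch. It is not taken from the slides: the function names are hypothetical, the sign convention exp(-2*pi*i/n) is an assumption, and the recursive radix-2 algorithm is just one common "fast" factorization, not necessarily the one the generator uses.

```python
import cmath

def dft_naive(x):
    """DFT as a dense matrix-vector product: about n^2 multiply-adds."""
    n = len(x)
    w = cmath.exp(-2j * cmath.pi / n)  # assumed sign convention
    return [sum(w ** (k * j) * x[j] for j in range(n)) for k in range(n)]

def fft_radix2(x):
    """Recursive radix-2 FFT: about n*log2(n) operations, n a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft_radix2(x[0::2])  # DFT of the even-indexed samples
    odd = fft_radix2(x[1::2])   # DFT of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out
```

Both functions compute the same transform; only the amount of arithmetic differs, which is the point of factoring the matrix.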

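6 Slide 6 Formula to Datapath. [The example formula and datapath figure are not preserved in this transcript.] Each formula construct maps to a datapath structure: a product means apply one block, then the next; a permutation matrix means permute the data; a parallel construct means apply a block several times in parallel; a diagonal matrix means scale the data.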
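
Since the slide's own example formula is lost, the following sketch uses the standard 4-point Cooley-Tukey factorization as a stand-in to show how the four constructs (product, permutation, parallel application, diagonal scaling) compose into a datapath. All function names are hypothetical, and the sign convention (twiddle -i) is an assumption.

```python
def dft2(v):
    """2-point butterfly, the basic block."""
    a, b = v
    return [a + b, a - b]

def parallel(block, size, x):
    """Apply `block` independently to each contiguous chunk of `size` values."""
    out = []
    for i in range(0, len(x), size):
        out.extend(block(x[i:i + size]))
    return out

def permute(perm, x):
    """Permutation: output position i takes input position perm[i]."""
    return [x[p] for p in perm]

def scale(diag, x):
    """Diagonal matrix: pointwise multiplication."""
    return [d * v for d, v in zip(diag, x)]

def dft4(x):
    """DFT_4 written as a product of sparse factors, applied right to left."""
    stride = [0, 2, 1, 3]        # stride permutation on 4 points
    twiddle = [1, 1, 1, -1j]     # diagonal of twiddle factors
    x = permute(stride, x)
    x = parallel(dft2, 2, x)     # two butterflies side by side
    x = scale(twiddle, x)
    x = permute(stride, x)
    x = parallel(dft2, 2, x)
    return permute(stride, x)
```

Each line of `dft4` corresponds to one factor of the formula and to one column of the resulting datapath.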

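8 Slide 8 Example: Pease DFT. A simple, regular structure is embodied in the formula: k stages (stage 1, stage 2, stage 3 in the example), each composed of a diagonal, a permutation, and butterflies applied in parallel. [The formula itself is not preserved in this transcript.]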

9 Slide 9 Pease DFT Example: DFT 8. [Datapath figure with three stages (stage 1, stage 2, stage 3) not preserved in this transcript.] The formula is applied from right to left; the datapath is built left to right. The repeating column structure allows hardware reuse without performance penalty.
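
The repeating column structure can be sketched in software. The exact Pease formula from the slides is not recoverable here, so the code below is a constant-geometry, decimation-in-frequency radix-2 variant in the same spirit: every stage uses the same wiring, and only the twiddle factors change. Function names, the sign convention, and the choice of variant are assumptions. Its output comes out in bit-reversed order.

```python
import cmath

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def constant_geometry_fft(x):
    """Radix-2 FFT with identical wiring in every stage; bit-reversed output."""
    n = len(x)
    if n < 1 or n & (n - 1):
        raise ValueError("length must be a power of two")
    k = n.bit_length() - 1  # number of stages
    w = [cmath.exp(-2j * cmath.pi * e / n) for e in range(n // 2)]
    for s in range(k):
        y = [0j] * n
        for j in range(n // 2):
            # Same data flow in every stage: inputs j and j + n/2,
            # outputs 2j and 2j + 1.
            a, b = x[j], x[j + n // 2]
            y[2 * j] = a + b
            # Only the twiddle exponent depends on the stage index s.
            y[2 * j + 1] = (a - b) * w[(j >> s) << s]
        x = y
    return x
```

Because the stage body does not depend on `s` except through the twiddle lookup, a single hardware column could in principle be reused for all stages.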

10 Slide 10 Horizontal folding. [Datapath figure with input, bypass, and register not preserved in this transcript.] This is our baseline design. The remaining degree of freedom is vertical parallelism, set by the parameter p.

11 Slide 11 Vertical (V-)folding according to p. [Plot of cost vs. latency not preserved in this transcript.] Fine-grained control over the cost/latency tradeoff.
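
A rough way to picture the cost/latency tradeoff is a software model in which one stage is reused for all log2(n) passes (horizontal folding) and only p butterflies operate per cycle (vertical folding). This is an idealized sketch, not the generator's actual microarchitecture: it counts one cycle per group of p butterflies and ignores pipeline depth, memories, and permutation hardware. The function name and the constant-geometry stage are assumptions.

```python
import cmath

def folded_fft(x, p):
    """Return (bit-reversed DFT of x, idealized cycle count) for p butterflies."""
    n = len(x)
    k = n.bit_length() - 1
    if n < 2 or n != 1 << k:
        raise ValueError("length must be a power of two, at least 2")
    if p < 1 or (n // 2) % p != 0:
        raise ValueError("p must divide n/2")
    w = [cmath.exp(-2j * cmath.pi * e / n) for e in range(n // 2)]
    cycles = 0
    for s in range(k):                      # horizontal folding: one stage, k passes
        y = [0j] * n
        for base in range(0, n // 2, p):    # vertical folding: p butterflies per cycle
            for j in range(base, base + p):
                a, b = x[j], x[j + n // 2]
                y[2 * j] = a + b
                y[2 * j + 1] = (a - b) * w[(j >> s) << s]
            cycles += 1
        x = y
    return x, cycles
```

In this model the cost is p butterflies and the latency is log2(n) * n / (2p) cycles, so doubling p halves the cycle count. For n = 8, p = 1 takes 12 cycles and p = 4 (the maximum, n/2) takes 3.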

13 Slide 13 User Interface (http://www.spiral.net/hardware/dftgen.html). [Screenshot not preserved in this transcript; it shows the common DFT options and the customization options.]

15 Slide 15 Evaluation. We compare Xilinx's fixed design against our variable generated designs. The baseline is Xilinx LogiCore DFT Ver. 3.1 (radix-4, burst I/O interface). Datapath: Xilinx is fixed, with one radix-4 basic block; SPIRAL is variable, with p radix-2 basic blocks. Cost-performance tradeoff: Xilinx is fixed; SPIRAL is user-controlled and varies with p. Comparison setup: DFT n = {64, 1024, 2048}; width = 16; bit-reversed output; Xilinx ISE ver. 6.1; Xilinx Virtex2-Pro XC2VP100-6.

16 Slide 16 DFT 1024 relative to Xilinx. [Plot not preserved in this transcript.] Performance and resources scale with p. Normalization (Xilinx = 1.0): logic, 1.0 = 1955 slices; storage, 1.0 = 7 BRAMs; performance, 1.0 = 1 transform / 5.6 µsec.

17 Slide 17 Resource usage preferences. [Three plots not preserved in this transcript: relative slices vs. p, relative BRAMs vs. p, and speedup vs. p, for p = 1, 2, 4, 8, 16, 32, with the Xilinx design as reference.] Normalization (Xilinx = 1.0): logic, 1.0 = 1955 slices; storage, 1.0 = 7 BRAMs; performance, 1.0 = 1 transform / 5.6 µsec.

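18 Slide 18 Resource usage preferences. The generator can control the tradeoff between slices and BRAMs: exchanging BRAM for slices causes very little change in performance. [Plot with the Xilinx reference point not preserved in this transcript.] Normalization (Xilinx = 1.0): logic, 1.0 = 1955 slices; storage, 1.0 = 7 BRAMs; performance, 1.0 = 1 transform / 5.6 µsec.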

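19 Slide 19 DFT 64 and DFT 2048. The trends hold for sizes 64 and 2048. [Plots with the Xilinx reference not preserved in this transcript.] Normalization for DFT 2048: 1.0 = 2140 slices; 1.0 = 7 BRAMs; 1.0 = 1 transform / 24.578 µsec. Normalization for DFT 64: 1.0 = 1743 slices; 1.0 = 8 BRAMs; 1.0 = 1 transform / 0.648 µsec.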

20 Slide 20 Related Work. Kumhom, Johnson, Nagvajara, ASIC/SOC 2000: a universal FFT processor microarchitecture based on processing elements interconnected by an on-chip reconfigurable network; the microarchitecture is scalable in the number of elements and supports both Cooley-Tukey and Pease. Choi, Scrofano, Prasanna, Jang, FPGA'2003: mapped the radix-4 Cooley-Tukey algorithm onto log2(n)/2 DFT 4 primitives; the datapath is scalable between 1 element and 4 elements at a time; the authors show energy and performance improvements from scaling.

21 Slide 21 Conclusions. Parameterized DFT IP generator: matrix formula-driven synthesis; performance/cost tradeoff with fine-grained control over resources vs. latency; resource usage preference that can balance the tradeoff between slices and BRAM. Key results: efficient (the Xilinx design point can be matched); customizable (design tradeoffs are directly controllable); easy to use (simple yet powerful web interface).

22 Slide 22 Web Generator: http://www.spiral.net/hardware/dftgen.html SPIRAL: This work is part of the SPIRAL project, which aims to push the limits of automation in software and hardware development for DSP algorithms. For more information visit: www.spiral.net

23 Slide 23 V-folding according to p (continued). [Figure of V-folded datapaths and data orderings not preserved in this transcript.] The parallelism parameter ranges from p min = 1 to p max = n/2.

24 Slide 24 V-Folding of Permutations [Takala, et al. ICASSP'2001]. [The formula and its definitions are not preserved in this transcript.]

