Published by Peyton Keys. Modified over 2 years ago.

0
**Rader’s FFT algorithm acceleration using Maxeler**

Author: Tadej Matek

1
**Fourier Transform**

- The Fourier transform decomposes a signal into its frequency components
- Used in telecommunications, data compression, digital signal processing, fast multiplication of polynomials ...

2
**Fourier Transform and computers**

- Transformation: Discrete Fourier Transform (DFT). Time: O(n^2)
- Algorithm(s): Fast Fourier Transform (FFT) (Cooley-Tukey, Bruun’s FFT, Rader’s FFT, Bluestein’s FFT …). Time: O(n log n)
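The O(n^2) cost of the plain DFT can be seen in a minimal Python sketch that evaluates the definition directly. The function name `naive_dft` and the sign convention of the exponent are assumptions made for illustration; they do not come from the slides.

```python
import cmath

def naive_dft(x):
    """Direct DFT: each of the n outputs is a sum over all n inputs, so O(n^2)."""
    n = len(x)
    out = []
    for k in range(n):          # one output per frequency index
        acc = 0j
        for j in range(n):      # full pass over the input for every output
            acc += x[j] * cmath.exp(-2j * cmath.pi * j * k / n)
        out.append(acc)
    return out
```

An FFT produces the same n values but reuses partial sums, which is where the O(n log n) bound comes from.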

3
**Why is FFT faster than DFT**

- Divide & conquer + properties of primitive roots
- Primitive root of unity: [formula not preserved]
- Conquer step (butterfly): [formula not preserved]
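As a rough illustration of the conquer step, the sketch below assumes the usual complex primitive n-th root of unity, w = exp(-2*pi*i/n). The sign of the exponent and the names `primitive_root` and `butterfly` are assumptions, not taken from the slides. The property that matters is w^(n/2) = -1, which lets one multiplication serve two outputs.

```python
import cmath

def primitive_root(n):
    """Complex primitive n-th root of unity (sign convention is an assumption)."""
    return cmath.exp(-2j * cmath.pi / n)

def butterfly(even, odd, w):
    """Combine two half-size results into two outputs with one multiplication.

    Because w^(n/2) = -1, the second output reuses t with the sign flipped.
    """
    t = w * odd
    return even + t, even - t
```

For n = 8, `primitive_root(8) ** 4` is -1 up to rounding error, which is the symmetry the butterfly exploits.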

4
**Rader’s FFT algorithm overview**

- Primitive root defined as: [formula not preserved]
- Bit reversal rev_k(i): reverse the k-bit binary representation of i
- Example, rev_4(3): 3 (decimal) = 0011 (binary) → 1100 (binary) = 12 (decimal)
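The bit reversal can be sketched in a few lines of Python. The function name `rev` and its argument order are chosen here for illustration.

```python
def rev(i, k):
    """Reverse the k-bit binary representation of i."""
    result = 0
    for _ in range(k):
        result = (result << 1) | (i & 1)  # append the lowest bit of i
        i >>= 1
    return result

print(rev(3, 4))  # 0011 -> 1100, prints 12
```

For k = 2 the indices 0, 1, 2, 3 map to 0, 2, 1, 3.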

5
**Example of calculation**

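Parameters: n, k = log(n), z, p = 13

Input: 8, 2, 2, 4

Level 1, with s = rev_k(i):

- i = 0, s = 0: (8 + z^0 * 2) % 13 = 10 and (2 + z^0 * 4) % 13 = 6
- i = 1, s = 2: (8 + z^2 * 2) % 13 = 6 and (2 + z^2 * 4) % 13 = 11

Result after level 1: 10, 6, 6, 11

Level 2, with s = rev_k(i):

- i = 0, s = 0
- i = 1, s = 2
- i = 2, s = 1
- i = 3, s = 3

Result: 3, 4, 9, 3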
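The numbers in this example can be cross-checked with a short recursive sketch in modular arithmetic. It assumes n = 4 and z = 5. Neither value is stated in the surviving text: the step (8 + z^2 * 2) % 13 = 6 requires z^2 = 12 (mod 13), so z is 5 or 8, and z = 5 is the choice that reproduces the final row 3, 4, 9, 3. The function names are hypothetical, and this recursive formulation is not necessarily how the author implemented it.

```python
def rev(i, k):
    """Reverse the k-bit binary representation of i."""
    r = 0
    for _ in range(k):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def ntt(a, z, p):
    """Evaluate polynomial a at z^0 .. z^(n-1) mod p (n a power of two).

    z is assumed to be a primitive n-th root of unity modulo p.
    """
    n = len(a)
    if n == 1:
        return [a[0] % p]
    even = ntt(a[0::2], z * z % p, p)
    odd = ntt(a[1::2], z * z % p, p)
    out = [0] * n
    w = 1
    for j in range(n // 2):
        t = w * odd[j] % p
        out[j] = (even[j] + t) % p
        out[j + n // 2] = (even[j] - t) % p  # uses z^(n/2) = -1 (mod p)
        w = w * z % p
    return out

natural = ntt([8, 2, 2, 4], 5, 13)             # [3, 9, 4, 3]
slide_order = [natural[rev(i, 2)] for i in range(4)]
print(slide_order)                             # prints [3, 4, 9, 3]
```

The two half-size calls return 10, 6 and 6, 11, matching the intermediate values of the example, and reading the result in bit-reversed order gives the final row.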

6
**Example: fast multiplication**

How to multiply two large polynomials?

- Basic approach: multiply each term of the 1st with each term of the 2nd -> O(n^2)
- Using FFT: compute the DFT of both polynomials, multiply pointwise in O(n) time and apply the inverse FFT -> O(n log n)
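The FFT-based multiplication can be sketched as follows. This is a minimal illustration using complex floating-point arithmetic and integer coefficients (both assumptions; the slides do not say which arithmetic is used here), and the names `fft` and `multiply` are hypothetical.

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for j in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * j / n) * odd[j]
        out[j] = even[j] + t
        out[j + n // 2] = even[j] - t
    return out

def multiply(p, q):
    """Multiply two integer-coefficient polynomials (lowest degree first)."""
    size = len(p) + len(q) - 1
    n = 1
    while n < size:                  # pad to a power of two
        n *= 2
    fp = fft([complex(c) for c in p] + [0j] * (n - len(p)))
    fq = fft([complex(c) for c in q] + [0j] * (n - len(q)))
    prod = fft([x * y for x, y in zip(fp, fq)], invert=True)
    # The inverse transform here is unnormalised, so divide by n.
    return [round((v / n).real) for v in prod[:size]]

print(multiply([1, 2], [3, 4]))  # (1 + 2x)(3 + 4x), prints [3, 10, 8]
```

The pointwise product in the middle is the O(n) step; the three transforms dominate the cost.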

7
**Dataflow implementation (1)**

- Level input: 8, 2, 2, 4; level output: 10, 6, 6, 11
- Data dependency: the kernel needs the updated data for each level
- Solution: LMem

8
**Dataflow implementation (2)**

- Components: CPU, Manager, Kernel, LMem
- Steps: input sequence, call the kernel k times, ..., output sequence
- The Manager streams data in and out of the Kernel
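The level-by-level structure can be mimicked on the CPU with a plain Python simulation. This is not Maxeler code and uses no Maxeler API: the list `lmem` merely stands in for on-board memory, `kernel_pass` and `run` are hypothetical names, and z = 5 is an assumption consistent with the earlier worked example (p = 13). Each pass reads the previous level's buffer and writes the next one, which is why the kernel has to be invoked k times.

```python
def rev(i, k):
    """Reverse the k-bit binary representation of i."""
    r = 0
    for _ in range(k):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def kernel_pass(buf, level, k, z, p):
    """Simulate one kernel call: compute one level from the previous buffer."""
    n = len(buf)
    seg = n >> level                   # length of each output segment
    out = [0] * n
    for i in range(1 << level):        # i indexes the output segments
        w = pow(z, rev(i, k), p)       # twiddle z^s with s = rev_k(i)
        base = (i >> 1) * 2 * seg      # input block that feeds segment i
        for j in range(seg):
            out[i * seg + j] = (buf[base + j] + w * buf[base + seg + j]) % p
    return out

def run(seq, z, p):
    """Host-side loop: one simulated kernel call per level."""
    k = len(seq).bit_length() - 1      # k = log2(n), n a power of two
    lmem = list(seq)                   # stand-in for LMem contents
    for level in range(1, k + 1):
        lmem = kernel_pass(lmem, level, k, z, p)
    return lmem

print(kernel_pass([8, 2, 2, 4], 1, 2, 5, 13))  # prints [10, 6, 6, 11]
print(run([8, 2, 2, 4], 5, 13))                # prints [3, 4, 9, 3]
```

The output is in bit-reversed order: position i holds the transform value for the power z^rev_k(i).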

9
**Dataflow implementation (3)**

- LMem works in bursts (example: 384 B, but this depends on the DFE)
- Good for consecutive calculations
- The z values are calculated on the CPU and written to LMem
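One practical consequence of burst-based access is that buffer sizes are typically rounded up to a whole number of bursts. The helper below is a minimal sketch of that arithmetic, using the 384 B example from the slide as the default. The name `padded_size`, the 4-byte element size in the usage note, and the requirement of whole-burst transfers are assumptions, not details confirmed by the slides.

```python
def padded_size(n_bytes, burst_bytes=384):
    """Round n_bytes up to the next multiple of the burst size."""
    return -(-n_bytes // burst_bytes) * burst_bytes  # ceiling division
```

Under these assumptions, a sequence of 32 four-byte values (128 B) would occupy one full burst of 384 B, while 256 values (1024 B) would need three bursts, or 1152 B. Packing many consecutive sequences back to back amortises this padding.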

10
**Performance & results (1)**

- CPU used for testing: Intel Core2 Quad Processor Q GHz
- A Maxeler card of type MAX2336B was used for DFE testing

11
**Performance & results (2)**

- Conditions: big data, 95% of the run time spent in loops
- Type of experiments: consecutive calculations, starting from 10K and going up to 10M
- Consecutive calculations for input sequences of length 32, 64, 128 and 256

12
**Performance & results (3)**

Execution time, N = 32, for CPU and DFE

13
**Performance & results (4)**

Speedup as a function of the number of consecutive calculations for N = 32

14
**Performance & results (5)**

Speedup as a function of the number of consecutive calculations for N = 64

15
**Performance & results (6)**

Speedup as a function of the number of consecutive calculations for N = 256

16
**Performance & results (7)**

Speedup as a function of the input sequence size (for 100K calculations)

17
**Conclusion**

- FFTs are among the most widely used algorithms today
- Massive speedups are possible, but they require consecutive calculations
- Power usage: reduced due to the lower clock frequency (200 MHz vs 2.86 GHz)
