# Parallel Processing (CS 730) Lecture 7: Shared Memory FFTs*

Introduction
Objective: To derive and implement a shared-memory parallel program for computing the fast Fourier transform (FFT).
Topics
Derivation of the FFT
Recursive version
Iterative version
A parallel divide & conquer algorithm using threads
A parallel loop version using OpenMP
Obtaining additional parallelism

FFT as a Matrix Factorization
Compute y = F_n x, where F_n is the n-point Fourier matrix.
Feb. 14, 2001 Parallel Processing
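As a point of reference for the factorizations that follow, the product y = F_n x can be formed directly. This is a minimal sketch that assumes the convention omega_n = exp(-2*pi*i/n), which is the one used by the iterative listings and by MATLAB's built-in `fft`; the test vector `x` is arbitrary.

```matlab
% Build the n-point Fourier matrix explicitly and apply it to x.
% Assumption: omega_n = exp(-2*pi*i/n), the same sign as MATLAB's fft.
n = 8;
k = 0:n-1;
Fn = exp(-2*pi*1i/n) .^ (k' * k);   % Fn(j+1,k+1) = omega_n^(j*k)
x = (1:n)';                         % arbitrary test vector
y = Fn * x;                         % O(n^2) matrix-vector product
err = norm(y - fft(x));             % expected to be at rounding-error level
```

The direct product costs O(n^2) operations; the factorizations of F_n are what reduce this to O(n log n).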

Matrix Factorizations and Algorithms
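function y = fft(x)
% Recursive radix-2 FFT of a row vector x of length n = 2^t, with m = n/2.
% Note: this name shadows MATLAB's built-in fft.
n = length(x);
if n == 1
    y = x;
else
    % [x0 x1] = L^n_2 x (even- and odd-indexed halves)
    x0 = x(1:2:n-1); x1 = x(2:2:n);
    % [t0 t1] = (I_2 tensor F_m)[x0 x1]
    t0 = fft(x0); t1 = fft(x1);
    % w = W_m(omega_n), with omega_n = exp(-2*pi*i/n) as in the iterative versions
    w = exp((-2*pi*i/n)*(0:n/2-1));
    % y = [y0 y1] = (F_2 tensor I_m) T^n_m [t0 t1]
    y0 = t0 + w.*t1; y1 = t0 - w.*t1;
    y = [y0 y1];
end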
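The comments in the recursive listing correspond to the factorization F_n = (F_2 tensor I_m) T^n_m (I_2 tensor F_m) L^n_2 with n = 2m. The sketch below checks that identity numerically for n = 8 using `kron`. The helper `fmat` is a hypothetical name introduced here, and the sign convention omega_n = exp(-2*pi*i/n) is an assumption taken from the iterative listings.

```matlab
% Numerical check of one Cooley-Tukey step for n = 2m.
n = 8; m = n/2;
fmat = @(k) exp(-2*pi*1i/k) .^ ((0:k-1)' * (0:k-1));  % hypothetical helper: k-point Fourier matrix
F2 = fmat(2); Fm = fmat(m); Fn = fmat(n);
I = eye(n);
L = I([1:2:n, 2:2:n], :);                             % L^n_2: even-indexed entries first
T = diag([ones(1,m), exp(-2*pi*1i/n).^(0:m-1)]);      % T^n_m = I_m (+) W_m(omega_n)
A = kron(F2, eye(m)) * T * kron(eye(2), Fm) * L;      % right-hand side of the factorization
err = norm(A - Fn);                                   % expected to be at rounding-error level
```

Reading the product from right to left reproduces the four steps of the recursive function: split, two half-size transforms, twiddle, and butterfly.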

Rewrite Rules

FFT Variants
Cooley-Tukey
Recursive FFT
Iterative FFT
Vector FFT (Stockham)
Vector FFT (Korn-Lambiotte)
Parallel FFT (Pease)

Tensor Permutations
A natural class of permutations compatible with the FFT.
Let σ be a permutation of {1,…,t}.
Mixed-radix counting permutation of vector indices.
Well-known examples are stride permutations and bit reversal.
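For n = 2^t, one way to picture a tensor permutation is as a reordering of the t bits of each vector index. The following sketch takes that view; the variable names and the choice of t = 3 are illustrative assumptions, and bit 1 is taken to be the most significant bit.

```matlab
% Permute the bits of each index 0..n-1 according to a permutation of {1,...,t}.
t = 3; n = 2^t;
idx = (0:n-1)';
bits = zeros(n, t);
for k = 1:t
    bits(:, k) = bitget(idx, t-k+1);   % column 1 holds the most significant bit
end
weights = 2.^(t-1:-1:0)';              % binary place values, most significant first

sigma_rev = t:-1:1;                    % reverse the bit order
sigma_rot = [t, 1:t-1];                % rotate: least significant bit moves to the front
rev = bits(:, sigma_rev) * weights;    % bit-reversal index map
rot = bits(:, sigma_rot) * weights;    % stride index map
```

With these maps, `x(rev+1)` is the bit reversal of `x`, and `x(rot+1)` gathers the entries of `x` at stride 4, which is a stride permutation.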

Example (Stride Permutation)
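A stride permutation can be written as a matrix transposition of the reshaped vector. The sketch below assumes the convention that L^n_s gathers entries at stride s, which agrees with L^n_2 splitting x into even- and odd-indexed halves in the recursive listing; n = 8 is an arbitrary choice.

```matlab
% L^n_s x as a transpose of x viewed as an s-by-(n/s) matrix.
n = 8; s = 2;
x = (0:n-1)';
y = reshape(reshape(x, s, n/s).', n, 1);     % L^8_2 x: even-indexed entries, then odd
z = reshape(reshape(y, n/s, s).', n, 1);     % L^8_4 undoes L^8_2
```

Here `y` holds 0 2 4 6 1 3 5 7 and `z` recovers the original order, which reflects that L^n_{n/s} is the inverse of L^n_s.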

Example (Bit Reversal)
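Bit reversal sends the entry at index (b_1 ... b_t) in binary to index (b_t ... b_1). The sketch below is one possible way to compute it with string conversion; it is only a stand-in for the `bitreversal` helper that the later listings call, whose actual definition is not given in these slides.

```matlab
% Bit-reversal permutation of a vector of length n = 2^t.
x = (0:7)';
n = length(x); t = round(log2(n));
r = bin2dec(fliplr(dec2bin(0:n-1, t)));   % r(k+1) = k with its t bits reversed
y = x(r + 1);                             % bit-reversed ordering of x
z = y(r + 1);                             % applying it twice restores x
```

For n = 8 the reordered vector is 0 4 2 6 1 5 3 7. The string conversion is convenient for illustration but slow for large n.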

Iterative Cooley-Tukey Algorithm
[Figure: dataflow of the iterative Cooley-Tukey algorithm, showing the bit reversal R followed by Stages 0 to 3]

Modified Pease Algorithm
[Figure: dataflow of the modified Pease algorithm, Stages 0 to 3, with index labels
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2 4 6 8 10 12 14 1 3 5 7 9 11 13 15]

Iterative Implementation
function y = ifft2(x)
% Input: x a vector of length n. n = 2^t, t an integer, t >= 0.
% Output: y = F_{2^t} x (returned as a column vector)
% Algorithm: Iterative.
% F_{2^t} = { Prod_{c=1}^t (I_{2^{c-1}} tensor F_2 tensor I_{2^{t-c}})
%             (I_{2^{c-1}} tensor T^{2^{t-c+1}}_{2^{t-c}}) } R^{2^t}
% Note: this name shadows MATLAB's built-in ifft2 (2-D inverse FFT).
n = length(x); t = ceil(log2(n));
% Start from yt = xt so that n = 1 (no stages) returns x unchanged.
xt = bitreversal(x(:)); yt = xt;
for c=t:-1:1
    m = 2^(c-1); p = 2^(t-c);
    % W = W_p(omega_{2p})
    W = exp(-2*pi*i/(2*p)*(0:p-1)');
    % yt = (I_m tensor F_2 tensor I_p)(I_m tensor T^{2p}_p) xt
    for j=0:m-1
        % y^{2p}_{j*2p+1} = (F_2 tensor I_p) T^{2p}_p x^{2p}_{j*2p+1}
        %                 = (F_2 tensor I_p)(I_p (+) W) x^{2p}_{j*2p+1}
        lo = j*2*p+1:(j*2+1)*p;       % first half of block j
        hi = (j*2+1)*p+1:(j+1)*2*p;   % second half of block j
        xt(hi) = W .* xt(hi);
        yt(lo) = xt(lo) + xt(hi);
        yt(hi) = xt(lo) - xt(hi);
    end
    xt = yt;
end
y = yt;
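The stages of the iterative listing can be checked against the Fourier matrix by building each factor with `kron`. This is a sketch for n = 8 under the assumption omega_n = exp(-2*pi*i/n); `fmat` is a hypothetical helper name, and the bit reversal is computed inline rather than with the `bitreversal` routine, which is not defined in these slides.

```matlab
% Multiply out the stage matrices of the iterative algorithm and compare with F_n.
t = 3; n = 2^t;
fmat = @(k) exp(-2*pi*1i/k) .^ ((0:k-1)' * (0:k-1));  % hypothetical helper: k-point Fourier matrix
F2 = fmat(2);
r = bin2dec(fliplr(dec2bin(0:n-1, t)));               % bit-reversed indices
I = eye(n);
A = I(r+1, :);                                        % R^{2^t} as a permutation matrix
for c = t:-1:1                                        % the c = t factor is applied first
    m = 2^(c-1); p = 2^(t-c);
    W = exp(-2*pi*1i/(2*p)) .^ (0:p-1);               % W_p(omega_{2p})
    T = diag([ones(1,p), W]);                         % T^{2p}_p = I_p (+) W_p
    S = kron(eye(m), kron(F2, eye(p)) * T);           % I_m tensor (F_2 tensor I_p) T^{2p}_p
    A = S * A;
end
err = norm(A - fmat(n));                              % expected to be at rounding-error level
```

Each `S` is block diagonal with m identical blocks of size 2p, which matches the inner loop over j in the listing.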

Iterative Implementation
function y = ipfft2(x)
% In-place Pease FFT algorithm.
% Input: x a vector of length n. n = 2^t, t an integer, t >= 0.
% Output: y = F_{2^t} x
% Algorithm: Conjugated Pease.
% F_{2^t} = { Prod_{c=1}^t (... tensor F_2) T_c L^n_{2^c} } R^{2^t}
%
n = length(x); t = ceil(log2(n));
y = bitreversal(x);
w = exp(-2*pi*i/n);
for c=t-1:-1:0
    % Stage c: 2^(t-1) independent butterflies on blocks of length 2^(t-c).
    for r=0:2^(t-1)-1
        % r0 selects the block, r1 the offset within its first half.
        r0 = mod(r,2^c); r1 = floor(r/2^c);
        a0 = r0*2^(t-c) + r1; a1 = a0 + 2^(t-c-1);
        % The twiddle w^(r1*2^c) = omega_{2^(t-c)}^r1 scales the second-half entry.
        y0 = y(a0+1); y1 = w^(r1*2^c) * y(a1+1);
        y(a0+1) = y0 + y1; y(a1+1) = y0 - y1;
    end
end
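Within one stage of the in-place listing, the butterflies indexed by r touch disjoint pairs of entries, so they can be carried out in any order or all at once. The sketch below illustrates this by replacing the inner loop with vector operations over all r and comparing the result with MATLAB's built-in `fft`. The inline bit reversal is a stand-in for `bitreversal`, and the test vector is arbitrary. The same independence is what a parallel loop over r (for example with OpenMP in C) would rely on, although that version is not shown here.

```matlab
% All 2^(t-1) butterflies of a stage performed as one vector operation.
t = 4; n = 2^t;
x = (1:n)' + 1i*(n:-1:1)';                    % arbitrary complex test vector
rev = bin2dec(fliplr(dec2bin(0:n-1, t)));     % stand-in for bitreversal
y = x(rev + 1);
w = exp(-2*pi*1i/n);
r = (0:n/2-1)';                               % butterfly indices of one stage
for c = t-1:-1:0
    r0 = mod(r, 2^c); r1 = floor(r / 2^c);
    a0 = r0*2^(t-c) + r1; a1 = a0 + 2^(t-c-1);
    y0 = y(a0+1); y1 = w.^(r1*2^c) .* y(a1+1);
    y(a0+1) = y0 + y1;                        % a0 and a1 together cover 0..n-1 exactly once,
    y(a1+1) = y0 - y1;                        % so no two butterflies write the same entry
end
err = norm(y - fft(x));                       % expected to be at rounding-error level
```

The stages themselves remain sequential, since stage c reads the results of stage c+1.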