Parallelizing the Fast Fourier Transform David Monismith cs599.



Outline
– An example use of the Fast Fourier Transform (FFT): audio processing.
– Explanation of the Discrete Fourier Transform (DFT).
– Using divide and conquer to implement the recursive FFT (Cooley-Tukey algorithm).
– Creating an iterative algorithm.
– Parallelization of the FFT.

Examples: Audio Processing
Compression
– In audio and video processing, often only certain frequencies can be heard or seen.
– The FFT or a similar tool can be used to remove frequency data that cannot be seen or heard.
– Only the important data (the frequencies we care about) needs to be stored.
– An inverse FFT (or similar operation) can be applied to decompress the data.
Audio synthesis
– Frequencies can be quickly added or adjusted and converted to a signal (a sound) by using the FFT.
– Such operations are often applied in audio synthesizers.

Band Pass Filter Algorithm
1. Convert the signal to the frequency domain with the FFT.
2. Multiply the frequency components inside the desired band by 1 (keep them unchanged).
3. Multiply the remaining frequency components by zero (discard them).
4. Apply the inverse FFT to convert the signal back to the space-time domain.
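The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `dft` and `band_pass` are hypothetical, the transform is a naive O(n^2) DFT standing in for the FFT, and the band is given as a set of frequency-bin indices.

```python
import cmath
import math

def dft(x, sign=+1):
    """Naive O(n^2) DFT. sign=+1 uses e^(+2*pi*i/n), the convention of the
    pseudocode in these slides; sign=-1 gives the opposite direction."""
    n = len(x)
    return [sum(x[j] * cmath.exp(sign * 2j * cmath.pi * j * k / n)
                for j in range(n))
            for k in range(n)]

def band_pass(signal, keep):
    """Keep only the frequency bins listed in `keep`; zero all others."""
    n = len(signal)
    spectrum = dft(signal, +1)                       # step 1: to frequency domain
    filtered = [c if k in keep else 0                # steps 2 and 3: multiply by 1 or 0
                for k, c in enumerate(spectrum)]
    return [v / n for v in dft(filtered, -1)]        # step 4: inverse, scaled by 1/n

# Test signal: a 1-cycle sine plus a weaker 3-cycle sine over 16 samples.
n = 16
low = [math.sin(2 * math.pi * t / n) for t in range(n)]
high = [0.5 * math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
mixed = [a + b for a, b in zip(low, high)]

# A real sinusoid occupies a bin and its mirror image, so keep bins 1 and n-1.
recovered = [v.real for v in band_pass(mixed, {1, n - 1})]
```

Here `recovered` matches the 1-cycle component up to rounding error. In practice the naive `dft` would be replaced by an FFT routine; the filtering logic stays the same.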

Discrete Fourier Transform
Converts a sequence of values from the space-time domain to the frequency domain. Useful for signal processing. The standard DFT is too slow for practical use on large inputs because it requires O(n^2) operations. Notice that each output value is a summation over all the components in the space-time sequence.

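Discrete Fourier Transform
The primitive nth root of unity is $\omega_n = e^{2\pi i/n}$ (the sign convention used by the pseudocode in these slides). The DFT of a sequence x[0], ..., x[n-1] is $X[k] = \sum_{j=0}^{n-1} x[j]\,\omega_n^{jk}$ for k = 0, ..., n-1. This is a matrix-vector product with the n × n matrix whose (k, j) entry is $\omega_n^{jk}$. Many values in the matrix are repeated, but the repetition is not obvious.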
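A minimal Python sketch of the DFT as a matrix-vector product, assuming the positive-exponent convention e^(2*pi*i/n) that the pseudocode in these slides uses. The names `dft_matrix` and `naive_dft` are illustrative and do not come from the original slides.

```python
import cmath

def dft_matrix(n):
    """Return the n x n matrix whose (k, j) entry is omega_n^(j*k)."""
    omega = cmath.exp(2j * cmath.pi / n)
    return [[omega ** (j * k) for j in range(n)] for k in range(n)]

def naive_dft(x):
    """Multiply the DFT matrix by x: n^2 multiply-adds in total."""
    n = len(x)
    w = dft_matrix(n)
    return [sum(w[k][j] * x[j] for j in range(n)) for k in range(n)]

# An impulse has a flat spectrum; a constant signal has only a zero-frequency term.
impulse_spectrum = naive_dft([1, 0, 0, 0])
constant_spectrum = naive_dft([1, 1, 1, 1])
```

Every output element touches every input element, which is where the O(n^2) cost comes from.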

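Example of Repetition
Assume that N = 5. The first power of the root of unity is $\omega_5^{1} = e^{2\pi i/5}$. The sixth power is $\omega_5^{6} = e^{2\pi i \cdot 6/5} = e^{2\pi i}\,e^{2\pi i/5} = \omega_5^{1}$, so the powers repeat with period N.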
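The repetition can be checked numerically. The sketch below assumes the root of unity e^(2*pi*i/N); the helper name `root_power` is hypothetical.

```python
import cmath

def root_power(n, k):
    """Return the k-th power of the primitive n-th root of unity."""
    return cmath.exp(2j * cmath.pi * k / n)

n = 5
first = root_power(n, 1)
sixth = root_power(n, 6)

# Collect every entry omega^(j*k) of the 5 x 5 DFT matrix, rounded so that
# values differing only by floating-point error compare equal.
distinct = {
    (round(root_power(n, j * k).real, 9), round(root_power(n, j * k).imag, 9))
    for j in range(n)
    for k in range(n)
}
```

The first and sixth powers agree up to rounding, and the 25 matrix entries take only 5 distinct values.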

Fast Fourier Transform (FFT)
Repetition in the roots of unity is easiest to exploit when the array size is a power of two. A divide and conquer algorithm called the FFT takes advantage of this. The FFT can compute a DFT in O(n lg n) time by multiplying array values by the appropriate root of unity and adding the array value at an appropriate stride. This algorithm allows for fast compression and manipulation of signals.

Divide and Conquer in the FFT
[Figure: data flow of an 8-point FFT in three stages, from the inputs x[0..7] through the intermediate arrays s1[0..7] and s2[0..7] to the outputs X[0..7].]

Cooley-Tukey FFT Algorithm
y = fft(x, n, stride)
  if(n == 1)
    y[0] = x[0];
  else
    y1 = fft(x, n/2, 2*stride);          //even-indexed elements
    y2 = fft(x+stride, n/2, 2*stride);   //odd-indexed elements
    for(int i = 0; i < n/2; i++)
      y[i] = y1[i] + e^((2*PI*I*i)/n)*y2[i];
      y[i+n/2] = y1[i] + e^((2*PI*I*(i+n/2))/n)*y2[i];
    end for
  end if
end fft
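A runnable Python sketch of the recursive pseudocode above, assuming n is a power of two. Because Python lists have no pointer arithmetic, the `x+stride` argument is modelled with an extra `start` offset; that parameter and the name `fft_recursive` are assumptions of this sketch.

```python
import cmath

def fft_recursive(x, n, start=0, stride=1):
    """Return the n-point DFT of x[start], x[start+stride], x[start+2*stride], ...

    Uses the positive-exponent twiddle factor e^(2*pi*i*k/n), as in the pseudocode.
    """
    if n == 1:
        return [x[start]]
    # Even-indexed and odd-indexed halves of the current subsequence.
    y1 = fft_recursive(x, n // 2, start, 2 * stride)
    y2 = fft_recursive(x, n // 2, start + stride, 2 * stride)
    y = [0j] * n
    for i in range(n // 2):
        twiddle = cmath.exp(2j * cmath.pi * i / n)
        y[i] = y1[i] + twiddle * y2[i]
        # e^(2*pi*i*(i + n/2)/n) equals -twiddle, so the second output reuses it.
        y[i + n // 2] = y1[i] - twiddle * y2[i]
    return y

spectrum = fft_recursive([1, 1, 1, 1, 0, 0, 0, 0], 8)
```

Each level of recursion does O(n) work and there are lg n levels, which gives the O(n lg n) bound.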

Example Code
Let’s quickly take a look at some example code for the recursive FFT. Given an array of values: [ 8, 7.0, 9.0, -1.3, 6.3, 8.5, 4.2, 9.1, -5.2 ] Matlab (and Octave) tell us that the fft is: [ *i, *i, *i, *i, *i, *i, *i, *i]

Iterative FFT
y = fft(x, n) {
  r = ceil(log2(n));
  //Allocate arrays R and S to be of size n
  R = x;
  for(m = 0; m < r; m++) {
    S = R; //copy all n elements of R into S
    //Elements to add at each stage differ in exactly one bit.
    bit = 1 << (r - m - 1);
    notbit = ~bit;
    for(i = 0; i < n; i++) {
      j = i & notbit;
      k = i | bit;
      expFactor = revAndShift(i, r, m);
      R[i] = S[j] + S[k] * cexp( (2*PI*I*expFactor)/n );
    }
  }
  y = R; //results are stored in bit-reversed index order
}

Reverse and Shift Function
//Given i = b0 b1 b2 … b(r-1), obtain b(m) b(m-1) … b0 0 … 0
//Note that there are r - m - 1 zeros.
result = revAndShift(unsigned int i, int r, int m) {
  i = i >> (r-m-1); //remove unwanted bits
  result = 0;
  for(int j = 0; j < m+1; j++) {
    result |= i & 1;
    i = i >> 1;
    if(j < m)
      result = result << 1;
  }
  //pad result with zeros
  result = result << (r-m-1);
  return result;
}
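The iterative algorithm and its helper can be combined into a runnable Python sketch, assuming n is a power of two. The names `rev_and_shift`, `fft_iterative`, and `bit_reverse` are chosen for this sketch. The final array holds the DFT values in bit-reversed index order, so `bit_reverse` is included to read them back in natural order.

```python
import cmath

def rev_and_shift(i, r, m):
    """Take the top m+1 bits of the r-bit index i, reverse them,
    and pad the result on the right with r-m-1 zero bits."""
    i >>= r - m - 1                      # drop the unwanted low bits
    result = 0
    for _ in range(m + 1):
        result = (result << 1) | (i & 1)
        i >>= 1
    return result << (r - m - 1)

def fft_iterative(x):
    """Iterative FFT with lg(n) stages; output is in bit-reversed order."""
    n = len(x)
    r = n.bit_length() - 1               # log2(n) when n is a power of two
    R = list(x)
    for m in range(r):
        S = list(R)                      # snapshot of the previous stage
        bit = 1 << (r - m - 1)           # partners differ in exactly this bit
        for i in range(n):
            j = i & ~bit
            k = i | bit
            exponent = rev_and_shift(i, r, m)
            R[i] = S[j] + S[k] * cmath.exp(2j * cmath.pi * exponent / n)
    return R

def bit_reverse(i, r):
    """Reverse all r bits of i."""
    return rev_and_shift(i, r, r - 1)

raw = fft_iterative([1, 2, 3, 4])
natural = [raw[bit_reverse(k, 2)] for k in range(4)]
```

Reading `raw` through `bit_reverse` is a cheap O(n) pass and avoids reordering the input before the stages run.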

Parallelizing the FFT
Assume the number of processes (i.e. running programs) is a power of 2. The number of processes will be referred to as npes. Assume the array size (N) is a power of 2 and is larger than the number of processes. Partition the array into npes chunks of N/npes elements each. Assign one chunk to each process.
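The block partitioning can be sketched as below. The function name `chunk_bounds` is hypothetical; it returns the half-open index range owned by one process.

```python
def chunk_bounds(n, npes, rank):
    """Return (start, end) of the contiguous chunk owned by `rank`,
    where each of the npes processes gets n // npes elements."""
    if n % npes != 0:
        raise ValueError("n must be divisible by npes")
    work = n // npes
    start = rank * work
    return start, start + work

# Eight elements over four processes: two elements per process.
bounds = [chunk_bounds(8, 4, rank) for rank in range(4)]
```

With both n and npes powers of two, the division is always exact, so no process receives a short chunk.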

Parallelizing the FFT
[Figure: the 8-point FFT data-flow diagram (x[0..7], s1[0..7], s2[0..7], X[0..7]) with the array elements divided among Process 0, Process 1, Process 2, and Process 3.]

Parallelizing the FFT
Notice that every process must send and receive data in each of the first lg(npes) stages. Additionally, notice that in a stage where data transfer must occur, each process sends data to one partner process and receives data from that same partner. This exchange can be accomplished with the MPI_Sendrecv function from the Message Passing Interface (MPI) API. We will investigate the algorithm to perform this operation next.
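The pairing of processes at each stage can be tabulated without any MPI calls. The sketch below applies the split-point rule from the parallel algorithm slides that follow; the function name `exchange_partner` is an assumption, and `None` marks stages where all required data is already local.

```python
def exchange_partner(rank, npes, m):
    """Return the rank this process exchanges data with at stage m,
    or None once no communication is needed."""
    split_point = npes // (1 << (m + 1))
    if split_point == 0:
        return None                       # remaining stages are purely local
    if rank % (2 * split_point) < split_point:
        return rank + split_point         # lower half of the pair
    return rank - split_point             # upper half of the pair

# Four processes, eight elements: lg(8) = 3 stages in total.
schedule = [[exchange_partner(rank, 4, m) for rank in range(4)]
            for m in range(3)]
```

For four processes the first two stages need an exchange and the third does not. The pairing is symmetric, which is why a combined send-and-receive call such as MPI_Sendrecv fits this pattern.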

Parallel Algorithm
//Rank is the process id and npes is the
//number of processes
y = fft(x, n, rank, npes) {
  r = ceil(log2(n));
  workToDo = n/npes;
  start = rank*workToDo;
  end = start + workToDo;
  //Allocate arrays S, Sk, and R of size workToDo
  R = x[start…end-1];
  for(int m = 0; m < r; m++) {
    Sk = S = R;
    bit = 1 << (r - m - 1), notbit = ~bit;
    splitPoint = npes / (1 << (m+1));

Parallel FFT Algorithm, Cont’d
    if(splitPoint > 0) {
      if( ( rank % (splitPoint << 1) ) < splitPoint)
        Send S to process rank + splitPoint, and receive Sk from rank + splitPoint.
      else
        Send Sk to process rank - splitPoint, and receive S from rank - splitPoint.
    }
    else
      Sk = S;

Parallel FFT Algorithm, Cont’d
    for(int i = start, l = 0; l < workToDo; i++, l++) {
      j = (i & notbit) % workToDo;
      k = (i | bit) % workToDo;
      expFactor = revAndShift(i, r, m);
      R[l] = S[j] + Sk[k] * e^( (2*PI*I*expFactor)/n );
    }
  } //end of the loop over stages m
  y = R;
}
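The parallel algorithm can be exercised without an MPI installation by simulating every rank inside one Python process. This is a sketch under stated assumptions, not an MPI program: the name `parallel_fft_simulated` is hypothetical, and the send/receive step is replaced by reading the partner's chunk from a snapshot list. A real implementation would perform that exchange with MPI_Sendrecv.

```python
import cmath

def rev_and_shift(i, r, m):
    """Reverse the top m+1 bits of the r-bit index i, then pad with zeros."""
    i >>= r - m - 1
    result = 0
    for _ in range(m + 1):
        result = (result << 1) | (i & 1)
        i >>= 1
    return result << (r - m - 1)

def parallel_fft_simulated(x, npes):
    """Run the parallel FFT stages for all ranks in turn.

    Assumes len(x) and npes are powers of two with npes <= len(x).
    Returns the chunks concatenated; values are in bit-reversed order.
    """
    n = len(x)
    r = n.bit_length() - 1
    work = n // npes
    # chunks[rank] plays the role of the local array R on that process.
    chunks = [list(x[rank * work:(rank + 1) * work]) for rank in range(npes)]
    for m in range(r):
        bit = 1 << (r - m - 1)
        split_point = npes // (1 << (m + 1))
        previous = [list(c) for c in chunks]    # what each rank would send
        for rank in range(npes):
            S = Sk = previous[rank]
            if split_point > 0:
                if rank % (2 * split_point) < split_point:
                    Sk = previous[rank + split_point]   # "receive Sk"
                else:
                    S = previous[rank - split_point]    # "receive S"
            start = rank * work
            for l in range(work):
                i = start + l                           # global index
                j = (i & ~bit) % work
                k = (i | bit) % work
                w = cmath.exp(2j * cmath.pi * rev_and_shift(i, r, m) / n)
                chunks[rank][l] = S[j] + Sk[k] * w
    return [value for chunk in chunks for value in chunk]

result = parallel_fft_simulated([1, 2, 3, 4, 5, 6, 7, 8], 4)
```

The simulation makes the data dependencies visible: for stages with a nonzero split point, the partner's data sits at the same local offset because the flipped bit is a multiple of the chunk size.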
