CS 179: GPU Programming Lecture 9 / Homework 3. Recap Some algorithms are “less obviously parallelizable”: – Reduction – Sorts – FFT (and certain recursive.

Slides:



Advertisements
Similar presentations
DFT & FFT Computation.
Advertisements

Michael Phipps Vallary S. Bhopatkar
Digital Kommunikationselektronik TNE027 Lecture 5 1 Fourier Transforms Discrete Fourier Transform (DFT) Algorithms Fast Fourier Transform (FFT) Algorithms.
Processor Architecture Needed to handle FFT algoarithm M. Smith.
CS 179: GPU Programming Lecture 8. Last time GPU-accelerated: – Reduction – Prefix sum – Stream compaction – Sorting (quicksort)
CS 179: GPU Programming Lecture 7. Week 3 Goals: – More involved GPU-accelerable algorithms Relevant hardware quirks – CUDA libraries.
Chapter 8: The Discrete Fourier Transform
Lecture #17 INTRODUCTION TO THE FAST FOURIER TRANSFORM ALGORITHM Department of Electrical and Computer Engineering Carnegie Mellon University Pittsburgh,
FFT-based filtering and the Short-Time Fourier Transform (STFT) R.C. Maher ECEN4002/5002 DSP Laboratory Spring 2003.
Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 16: Application-Driven Hardware Acceleration (1/4)
Lecture 14: Laplace Transform Properties
EE-2027 SaS, L11 1/13 Lecture 11: Discrete Fourier Transform 4 Sampling Discrete-time systems (2 lectures): Sampling theorem, discrete Fourier transform.
Image (and Video) Coding and Processing Lecture 2: Basic Filtering Wade Trappe.
Discrete-Time Convolution Linear Systems and Signals Lecture 8 Spring 2008.
Topic 7 - Fourier Transforms DIGITAL IMAGE PROCESSING Course 3624 Department of Physics and Astronomy Professor Bob Warwick.
1 Chapter 8 The Discrete Fourier Transform 2 Introduction  In Chapters 2 and 3 we discussed the representation of sequences and LTI systems in terms.
Numerical Analysis – Digital Signal Processing Hanyang University Jong-Il Park.
CSC589 Introduction to Computer Vision Lecture 8
Processor Architecture Needed to handle FFT algoarithm M. Smith.
CS 179: GPU Computing Lecture 3 / Homework 1. Recap Adding two arrays… a close look – Memory: Separate memory space, cudaMalloc(), cudaMemcpy(), … – Processing:
Transforms. 5*sin (2  4t) Amplitude = 5 Frequency = 4 Hz seconds A sine wave.
: Chapter 14: The Frequency Domain 1 Montri Karnjanadecha ac.th/~montri Image Processing.
CS 6068 Parallel Computing Fall 2013 Lecture 10 – Nov 18 The Parallel FFT Prof. Fred Office Hours: MWF.
Digital Signal Processing – Chapter 10
Technion, CS department, SIPC Spring 2014 Tutorials 12,13 discrete signals and systems 1/39.
Zhongguo Liu_Biomedical Engineering_Shandong Univ. Chapter 8 The Discrete Fourier Transform Zhongguo Liu Biomedical Engineering School of Control.
Hossein Sameti Department of Computer Engineering Sharif University of Technology.
Real time DSP Professors: Eng. Julian S. Bruno Eng. Jerónimo F. Atencio Sr. Lucio Martinez Garbino.
Digital Signal Processing Chapter 3 Discrete transforms.
Lecture 6: DFT XILIANG LUO 2014/10. Periodic Sequence  Discrete Fourier Series For a sequence with period N, we only need N DFS coefs.
ACCESS IC LAB Graduate Institute of Electronics Engineering, NTU Under-Graduate Project Case Study: Single-path Delay Feedback FFT Speaker: Yu-Min.
OFDM DFT  DFT  Inverse DFT  An N-point DFT (or inverse DFT) requires a total of N 2 complex multiplications  This transform can be implemented very.
7- 1 Chapter 7: Fourier Analysis Fourier analysis = Series + Transform ◎ Fourier Series -- A periodic (T) function f(x) can be written as the sum of sines.
EEE 503 Digital Signal Processing Lecture #2 : EEE 503 Digital Signal Processing Lecture #2 : Discrete-Time Signals & Systems Dr. Panuthat Boonpramuk Department.
Digital Signal Processing
Linear filtering based on the DFT
DEPARTMENTT OF ECE TECHNICAL QUIZ-1 AY Sub Code/Name: EC6502/Principles of digital Signal Processing Topic: Unit 1 & Unit 3 Sem/year: V/III.
Fast Fourier Transforms. 2 Discrete Fourier Transform The DFT pair was given as Baseline for computational complexity: –Each DFT coefficient requires.
The Discrete Fourier Transform
CS 179: GPU Programming Lecture 9 / Homework 3. Recap Some algorithms are “less obviously parallelizable”: – Reduction – Sorts – FFT (and certain recursive.
Learning from the Past, Looking to the Future James R. (Jim) Beaty, PhD - NASA Langley Research Center Vehicle Analysis Branch, Systems Analysis & Concepts.
Lecture 17 Outline: DFT Properties and Circular Convolution
EE345S Real-Time Digital Signal Processing Lab Fall 2006 Lecture 17 Fast Fourier Transform Prof. Brian L. Evans Dept. of Electrical and Computer Engineering.
CS 179: GPU Programming Lecture 7. Week 3 Goals: – More involved GPU-accelerable algorithms Relevant hardware quirks – CUDA libraries.
بسم الله الرحمن الرحيم Digital Signal Processing Lecture 14 FFT-Radix-2 Decimation in Frequency And Radix -4 Algorithm University of Khartoum Department.
1 Chapter 8 The Discrete Fourier Transform (cont.)
 Carrier signal is strong and stable sinusoidal signal x(t) = A cos(  c t +  )  Carrier transports information (audio, video, text, ) across.
Chapter 4 Discrete-Time Signals and transform
Lecture 7: Z-Transform Remember the Laplace transform? This is the same thing but for discrete-time signals! Definition: z is a complex variable: imaginary.
CS 591 S1 – Computational Audio
DIGITAL SIGNAL PROCESSING ELECTRONICS
CE Digital Signal Processing Fall Discrete-time Fourier Transform
FFT-based filtering and the
Introduction to Complex Numbers
Fast Fourier Transform
CS 179: GPU Programming Lecture 7.
Real-time 1-input 1-output DSP systems
4.1 DFT In practice the Fourier components of data are obtained by digital computation rather than by analog processing. The analog values have to be.
Chapter 8 The Discrete Fourier Transform
APPLICATION of the DFT: Convolution of Finite Sequences.
Z TRANSFORM AND DFT Z Transform
Chapter 9 Computation of the Discrete Fourier Transform
Lecture 15 Outline: DFT Properties and Circular Convolution
LECTURE 18: FAST FOURIER TRANSFORM
Lecture 16 Outline: Linear Convolution, Block-by Block Convolution, FFT/IFFT Announcements: HW 4 posted, due tomorrow at 4:30pm. No late HWs as solutions.
Chapter 8 The Discrete Fourier Transform
Chapter 8 The Discrete Fourier Transform
Fast Fourier Transform
CE Digital Signal Processing Fall Discrete Fourier Transform (DFT)
LECTURE 18: FAST FOURIER TRANSFORM
Presentation transcript:

CS 179: GPU Programming Lecture 9 / Homework 3

Recap Some algorithms are “less obviously parallelizable”: – Reduction – Sorts – FFT (and certain recursive algorithms)

Parallel FFT structure (radix-2) Bit-reversed access Stage 1Stage 2Stage 3

cuFFT 1D example Correction: Remember to use cufftDestroy(plan) when finished with transforms

Today Homework 3 – Large-kernel convolution Project Introductions

Systems Given input signal(s), produce output signal(s)

LTI system review (Week 1)

Parallelization

A problem… This worked for small impulse responses – E.g. h[n], 0 ≤ n ≤ 20 in HW 1 Homework 1 was “small-kernel convolution”: – (Vocab alert: Impulse responses are often called “kernels”!)

A problem…

DFT/FFT Same problem with Discrete Fourier Transform! Successfully optimized and GPU-accelerated! – O(n 2 ) to O(n log n) – Can we do the same here?

“Circular” convolution

Linear convolution: Circular convolution:

Example: x[0..3], h[0..1] Linear convolution: y[0] = x[0]h[0] y[1] = x[0]h[1] + x[1]h[0] y[2] = x[1]h[1] + x[2]h[0] y[3] = x[2]h[1] + x[3]h[0] y[4] = x[3]h[1] + x[4]h[0] Circular convolution: y[0] = x[0]h[0] + x[3]h[1] + x[2]h[2] + x[3]h[1] y[1] = x[0]h[1] + x[1]h[0] + x[2]h[3] + x[3]h[2] y[2] = x[1]h[1] + x[2]h[0] + x[3]h[3] + x[0]h[2] y[3] = x[2]h[1] + x[3]h[0] + x[0]h[3] + x[1]h[2] = 0

Circular Convolution Theorem* * DFT case

Circular Convolution Theorem* * DFT case O(n log n) Assume n > m O(m log m) O(n) O(n log n) Total: O(n log n)

x[n] and h[n] are different lengths? How to linearly convolve using circular convolution?

Padding x[n] and h[n] – presumed zero where not defined – Computationally: Store x[n] and h[n] as larger arrays – Pad both to at least x.length + h.length - 1

Example: (Padding) x[0..3], h[0..1] Linear convolution: y[0] = x[0]h[0] y[1] = x[0]h[1] + x[1]h[0] y[2] = x[1]h[1] + x[2]h[0] y[3] = x[2]h[1] + x[3]h[0] y[4] = x[3]h[1] + x[4]h[0] Circular convolution: y[0] = x[0]h[0] + x[1]h[4] + x[2]h[3] + x[3]h[2] + x[4]h[1] y[1] = x[0]h[1] + x[1]h[0] + x[2]h[4] + x[3]h[3] + x[4]h[2] y[2] = x[1]h[1] + x[2]h[0] + x[3]h[4] + x[4]h[3] + x[0]h[2] y[3] = x[2]h[1] + x[3]h[0] + x[4]h[4] + x[0]h[3] + x[1]h[2] y[4] = x[3]h[1] + x[4]h[0] + x[0]h[4] + x[1]h[3] + x[2]h[2] N is now (4 + 2 – 1) = 5

Summary Alternate algorithm for large impulse response convolution! – Serial: O(n log n) vs. O(mn) Small vs. large m determines algorithm choice Runtime does “carry over” to parallel situations (to some extent)

Homework 3, Part 1 Implement FFT (“large-kernel”) convolution – Use cuFFT for FFT/IFFT (if brave, try your own) Use “batch” variable to save FFT calculations Correction: Good practice in general, but results in poor performance on Homework 3 – Complex multiplication kernel: Week 1-style – (HW1 difference: Consider right-hand boundary region)

Complex numbers cufftComplex: cuFFT complex number type – Example usage: cufftComplex a; a.x = 3;// Real part a.y = 4;// Imaginary part Element-wise multiplying: (a + bi)(c + di) = (ac - bd) + (ad + bc)i

Homework 3, Part 2

Normalization Amplitudes must lie in range [-1, 1] – Normalize s.t. maximum magnitude is 1 (or 1 - ε) How to find maximum amplitude?

Reduction This time, maximum (instead of sum) – Lecture 7 strategies – “Optimizing Parallel Reduction in CUDA” (Harris)

Homework 3, Part 2 Implement GPU-accelerated normalization – Find maximum (reduction) – Divide by maximum to normalize

(Demonstration) Rooms can be modeled as LTI systems!

Other notes Machines: – Normal mode: haru, mx, minuteman – Audio mode: haru Due date: Friday (4/24), 3 PM Correction: 11:59 PM – Extra office hours: Thursday (4/23), 8-10 PM

Projects