CS 179: GPU Programming Lecture 9 / Homework 3. Recap Some algorithms are “less obviously parallelizable”: – Reduction – Sorts – FFT (and certain recursive.

Slides:



Advertisements
Similar presentations
DFT & FFT Computation.
Advertisements

Michael Phipps Vallary S. Bhopatkar
Computer Vision Lecture 7: The Fourier Transform
Digital Kommunikationselektronik TNE027 Lecture 5 1 Fourier Transforms Discrete Fourier Transform (DFT) Algorithms Fast Fourier Transform (FFT) Algorithms.
DFT/FFT and Wavelets ● Additive Synthesis demonstration (wave addition) ● Standard Definitions ● Computing the DFT and FFT ● Sine and cosine wave multiplication.
CS 179: GPU Programming Lecture 8. Last time GPU-accelerated: – Reduction – Prefix sum – Stream compaction – Sorting (quicksort)
Chapter 8: The Discrete Fourier Transform
FFT-based filtering and the Short-Time Fourier Transform (STFT) R.C. Maher ECEN4002/5002 DSP Laboratory Spring 2003.
General Functions A non-periodic function can be represented as a sum of sin’s and cos’s of (possibly) all frequencies: F(  ) is the spectrum of the function.
Sampling, Reconstruction, and Elementary Digital Filters R.C. Maher ECEN4002/5002 DSP Laboratory Spring 2002.
Reconfigurable Computing S. Reda, Brown University Reconfigurable Computing (EN2911X, Fall07) Lecture 16: Application-Driven Hardware Acceleration (1/4)
EE-2027 SaS, L11 1/13 Lecture 11: Discrete Fourier Transform 4 Sampling Discrete-time systems (2 lectures): Sampling theorem, discrete Fourier transform.
Chapter 5. Operations on Multiple R. V.'s 1 Chapter 5. Operations on Multiple Random Variables 0. Introduction 1. Expected Value of a Function of Random.
Discrete-Time Convolution Linear Systems and Signals Lecture 8 Spring 2008.
FILTERING GG313 Lecture 27 December 1, A FILTER is a device or function that allows certain material to pass through it while not others. In electronics.
CELLULAR COMMUNICATIONS DSP Intro. Signals: quantization and sampling.
Topic 7 - Fourier Transforms DIGITAL IMAGE PROCESSING Course 3624 Department of Physics and Astronomy Professor Bob Warwick.
EE421, Fall 1998 Michigan Technological University Timothy J. Schulz 08-Sept, 98EE421, Lecture 11 Digital Signal Processing (DSP) Systems l Digital processing.
Numerical Analysis – Digital Signal Processing Hanyang University Jong-Il Park.
CSC589 Introduction to Computer Vision Lecture 8
Processor Architecture Needed to handle FFT algoarithm M. Smith.
CS 179: GPU Computing Lecture 3 / Homework 1. Recap Adding two arrays… a close look – Memory: Separate memory space, cudaMalloc(), cudaMemcpy(), … – Processing:
Transforms. 5*sin (2  4t) Amplitude = 5 Frequency = 4 Hz seconds A sine wave.
: Chapter 14: The Frequency Domain 1 Montri Karnjanadecha ac.th/~montri Image Processing.
Digital Signal Processing – Chapter 10
CS 179: GPU Programming Lecture 9 / Homework 3. Recap Some algorithms are “less obviously parallelizable”: – Reduction – Sorts – FFT (and certain recursive.
1 BIEN425 – Lecture 10 By the end of the lecture, you should be able to: –Describe the reason and remedy of DFT leakage –Design and implement FIR filters.
README Lecture notes will be animated by clicks. Each click will indicate pause for audience to observe slide. On further click, the lecturer will explain.
Parallel Algorithms Patrick Cozzi University of Pennsylvania CIS Spring 2012.
Technion, CS department, SIPC Spring 2014 Tutorials 12,13 discrete signals and systems 1/39.
Zhongguo Liu_Biomedical Engineering_Shandong Univ. Chapter 8 The Discrete Fourier Transform Zhongguo Liu Biomedical Engineering School of Control.
Hossein Sameti Department of Computer Engineering Sharif University of Technology.
Real time DSP Professors: Eng. Julian S. Bruno Eng. Jerónimo F. Atencio Sr. Lucio Martinez Garbino.
Digital Signal Processing Chapter 3 Discrete transforms.
Copyright ©2010, ©1999, ©1989 by Pearson Education, Inc. All rights reserved. Discrete-Time Signal Processing, Third Edition Alan V. Oppenheim Ronald W.
Lecture 6: DFT XILIANG LUO 2014/10. Periodic Sequence  Discrete Fourier Series For a sequence with period N, we only need N DFS coefs.
OFDM DFT  DFT  Inverse DFT  An N-point DFT (or inverse DFT) requires a total of N 2 complex multiplications  This transform can be implemented very.
7- 1 Chapter 7: Fourier Analysis Fourier analysis = Series + Transform ◎ Fourier Series -- A periodic (T) function f(x) can be written as the sum of sines.
EEE 503 Digital Signal Processing Lecture #2 : EEE 503 Digital Signal Processing Lecture #2 : Discrete-Time Signals & Systems Dr. Panuthat Boonpramuk Department.
Linear filtering based on the DFT
DEPARTMENTT OF ECE TECHNICAL QUIZ-1 AY Sub Code/Name: EC6502/Principles of digital Signal Processing Topic: Unit 1 & Unit 3 Sem/year: V/III.
Fast Fourier Transforms. 2 Discrete Fourier Transform The DFT pair was given as Baseline for computational complexity: –Each DFT coefficient requires.
The Discrete Fourier Transform
Filters– Chapter 6. Filter Difference between a Filter and a Point Operation is that a Filter utilizes a neighborhood of pixels from the input image to.
Learning from the Past, Looking to the Future James R. (Jim) Beaty, PhD - NASA Langley Research Center Vehicle Analysis Branch, Systems Analysis & Concepts.
EE345S Real-Time Digital Signal Processing Lab Fall 2006 Lecture 17 Fast Fourier Transform Prof. Brian L. Evans Dept. of Electrical and Computer Engineering.
بسم الله الرحمن الرحيم Digital Signal Processing Lecture 14 FFT-Radix-2 Decimation in Frequency And Radix -4 Algorithm University of Khartoum Department.
1 Chapter 8 The Discrete Fourier Transform (cont.)
Chapter 4 Discrete-Time Signals and transform
CS 591 S1 – Computational Audio
DIGITAL SIGNAL PROCESSING ELECTRONICS
CE Digital Signal Processing Fall Discrete-time Fourier Transform
FFT-based filtering and the
(C) 2002 University of Wisconsin, CS 559
Fast Fourier Transform
General Functions A non-periodic function can be represented as a sum of sin’s and cos’s of (possibly) all frequencies: F() is the spectrum of the function.
2D Fourier transform is separable
Real-time 1-input 1-output DSP systems
4.1 DFT In practice the Fourier components of data are obtained by digital computation rather than by analog processing. The analog values have to be.
Chapter 8 The Discrete Fourier Transform
Chapter 9 Computation of the Discrete Fourier Transform
Lecture 15 Outline: DFT Properties and Circular Convolution
LECTURE 18: FAST FOURIER TRANSFORM
Lecture 16 Outline: Linear Convolution, Block-by Block Convolution, FFT/IFFT Announcements: HW 4 posted, due tomorrow at 4:30pm. No late HWs as solutions.
Digital Image Processing Week IV
Chapter 8 The Discrete Fourier Transform
Chapter 8 The Discrete Fourier Transform
Fast Fourier Transform
CE Digital Signal Processing Fall Discrete Fourier Transform (DFT)
LECTURE 18: FAST FOURIER TRANSFORM
Presentation transcript:

CS 179: GPU Programming Lecture 9 / Homework 3

Recap Some algorithms are “less obviously parallelizable”: – Reduction – Sorts – FFT (and certain recursive algorithms)

Parallel FFT structure (radix-2) Bit-reversed access Stage 1Stage 2Stage 3

cuFFT 1D example Correction: Remember to use cufftDestroy(plan) when finished with transforms

Today Homework 3 – Large-kernel convolution

Systems Given input signal(s), produce output signal(s)

LTI system review (Week 1)

Parallelization

A problem… This worked for small impulse responses – E.g. h[n], 0 ≤ n ≤ 20 in HW 1 Homework 1 was “small-kernel convolution”: – (Vocab alert: Impulse responses are often called “kernels”!)

A problem…

DFT/FFT Same problem with Discrete Fourier Transform! Successfully optimized and GPU-accelerated! – O(n 2 ) to O(n log n) – Can we do the same here?

“Circular” convolution “equivalent” of convolution for periodic signals

“Circular” convolution Linear convolution: Circular convolution:

Example: x[0..3], h[0..1] Linear convolution: y[0] = x[0]h[0] x0x1x2x3 h0h x0x1x2x h1h

Example: x[0..3], h[0..1] Linear convolution: y[0] = x[0]h[0] y[1] = x[0]h[1] + x[1]h[0] x0x1x2x3 h0h x0x1x2x h1h x0x1x2x h1h0 0000

Example: x[0..3], h[0..1] Linear convolution: y[0] = x[0]h[0] y[1] = x[0]h[1] + x[1]h[0] y[2] = x[1]h[1] + x[2]h[0] y[3] = x[2]h[1] + x[3]h[0] x0x1x2x3 h0h x0x1x2x h1h x0x1x2x h1h0 000

Example: x[0..3], h[0..1] Linear convolution: y[0] = x[0]h[0] y[1] = x[0]h[1] + x[1]h[0] y[2] = x[1]h[1] + x[2]h[0] y[3] = x[2]h[1] + x[3]h[0] y[4] = x[3]h[1] + x[4]h[0] x0x1x2x3 h0h100

Example: x[0..3], h[0..1] Linear convolution: y[0] = x[0]h[0] y[1] = x[0]h[1] + x[1]h[0] y[2] = x[1]h[1] + x[2]h[0] y[3] = x[2]h[1] + x[3]h[0] y[4] = x[3]h[1] + x[4]h[0] Circular convolution: y[0] = x[0]h[0] + x[3]h[1] + x[2]h[2] + x[3]h[1] y[1] = x[0]h[1] + x[1]h[0] + x[2]h[3] + x[3]h[2] y[2] = x[1]h[1] + x[2]h[0] + x[3]h[3] + x[0]h[2] y[3] = x[2]h[1] + x[3]h[0] + x[0]h[3] + x[1]h[2] x0x1x2x3 h0h100 x0x1x2x3 h0 00 h1 00 x0x1x2x h1h

Circular Convolution Theorem* * DFT case

Circular Convolution Theorem* * DFT case O(n log n) Assume n > m O(m log m) O(n) O(n log n) Total: O(n log n)

x[n] and h[n] are different lengths? How to linearly convolve using circular convolution?

Padding x[n] and h[n] – presumed zero where not defined – Computationally: Store x[n] and h[n] as larger arrays – Pad both to at least x.length + h.length - 1

Example: (Padding) x[0..3], h[0..1] Linear convolution: y[0] = x[0]h[0] y[1] = x[0]h[1] + x[1]h[0] y[2] = x[1]h[1] + x[2]h[0] y[3] = x[2]h[1] + x[3]h[0] y[4] = x[3]h[1] + x[4]h[0] Circular convolution: y[0] = x[0]h[0] + x[3]h[1] + x[2]h[2] + x[3]h[1] y[1] = x[0]h[1] + x[1]h[0] + x[2]h[3] + x[3]h[2] y[2] = x[1]h[1] + x[2]h[0] + x[3]h[3] + x[0]h[2] y[3] = x[2]h[1] + x[3]h[0] + x[0]h[3] + x[1]h[2]

Example: (Padding) x[0..3], h[0..1] Linear convolution: y[0] = x[0]h[0] y[1] = x[0]h[1] + x[1]h[0] y[2] = x[1]h[1] + x[2]h[0] y[3] = x[2]h[1] + x[3]h[0] y[4] = x[3]h[1] + x[4]h[0] Circular convolution: y[0] = x[0]h[0] + x[1]h[4] + x[2]h[3] + x[3]h[2] + x[4]h[1] y[1] = x[0]h[1] + x[1]h[0] + x[2]h[4] + x[3]h[3] + x[4]h[2] y[2] = x[1]h[1] + x[2]h[0] + x[3]h[4] + x[4]h[3] + x[0]h[2] y[3] = x[2]h[1] + x[3]h[0] + x[4]h[4] + x[0]h[3] + x[1]h[2] y[4] = x[3]h[1] + x[4]h[0] + x[0]h[4] + x[1]h[3] + x[2]h[2] N is now (4 + 2 – 1) = 5

Summary Alternate algorithm for large impulse response convolution! – Serial: O(n log n) vs. O(mn) Small vs. large m determines algorithm choice Runtime does “carry over” to parallel situations (to some extent)

Homework 3, Part 1 Implement FFT (“large-kernel”) convolution – Use cuFFT for FFT/IFFT (if brave, try your own) Use “batch” variable to save FFT calculations Correction: Good practice in general, but results in poor performance on Homework 3 Don’t forget padding – Complex multiplication kernel: Multiply the FFT values pointwise!

Complex numbers cufftComplex: cuFFT complex number type – Example usage: cufftComplex a; a.x = 3;// Real part a.y = 4;// Imaginary part Complex Multiplication: (a + bi)(c + di) = (ac - bd) + (ad + bc)i

Homework 3, Part 2 For a normalized Gaussian filter, output values cannot be larger than the largest input Not true in general Low Pass Filter Test Signal and Response

Normalization Amplitudes must lie in range [-1, 1] – Normalize s.t. maximum magnitude is 1 (or 1 - ε) How to find maximum amplitude?

Reduction This time, maximum (instead of sum) – Lecture 7 strategies – “Optimizing Parallel Reduction in CUDA” (Harris)

Homework 3, Part 2 Implement GPU-accelerated normalization – Find maximum (reduction) The max amplitude may be a negative sample – Divide by maximum to normalize

(Demonstration) Rooms can be modeled as LTI systems!

Other notes Machines: – Normal mode: haru, mako, mx, minuteman – Audio mode: haru, mako Due date: – Wednesday (4/20), 3 PM