1
**David Hansen and James Michelussi**

Is F Better than D

2
**Introduction**

- Discrete Fourier Transform (DFT)
- Fast Fourier Transform (FFT)
- FFT Algorithm – Applying the Mathematics
- Implementations of DFT and FFT
- Hardware Benchmarks
- Conclusion

3
**DFT**

- Introduced in 1807 by Jean Baptiste Joseph Fourier.
- Allows a sampled (discrete) periodic signal to be transformed from the time domain to the frequency domain.
- Computed as a correlation between the time-domain signal and N cosine and N sine waves.
- X(k) = DFT frequency-domain signal
- N = number of sample points
- x(n) = time-domain signal
- W_N = twiddle factor
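The legend above names the terms of the DFT sum. Assuming the standard definition, with a negative exponent in the twiddle factor and x(n) denoting the time-domain signal (the exact form used on the slide is not reproduced here), it can be written as:

```latex
X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \dots, N-1,
\qquad W_N = e^{-j 2\pi / N}

W_N^{nk} = \cos\!\left(\frac{2\pi n k}{N}\right) - j \sin\!\left(\frac{2\pi n k}{N}\right)
```

The second line shows why the DFT can be read as a correlation with N cosine and N sine waves: each output X(k) is the sum of the input multiplied by one cosine and one sine at frequency index k.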

4
**DFT (Walking Speed)**

Why is this important? Where is this used?

- Allows machines to calculate the frequency-domain representation of a signal.
- Allows signals to be convolved by simply multiplying their transforms together in the frequency domain.
- Used in digital spectral analysis for speech, imaging and pattern recognition, as well as in signal manipulation using filters.
- But the DFT requires N² multiplications!

5
**FFT (Jet Speed)**

- J. W. Cooley and J. W. Tukey are given credit for bringing the FFT to the world in the 1960s.
- Simply an algorithm for calculating the DFT more efficiently.
- Takes advantage of symmetry and periodicity in the twiddle factors, and uses a divide-and-conquer method.
- Symmetry: W_N^(r+N/2) = -W_N^r
- Periodicity: W_N^(r+N) = W_N^r
- Requires only (N/2)·log2(N) multiplications!
- Faster computation times.
- More precise results due to less round-off error.
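The two twiddle-factor properties and the multiplication counts can be checked numerically. The following is a minimal sketch, assuming the convention W_N^r = exp(-j·2π·r/N); the function names `twiddle`, `dft_multiplies` and `fft_multiplies` are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstdint>

// Twiddle factor W_N^r = exp(-j*2*pi*r/N) (assumed sign convention).
std::complex<double> twiddle(unsigned r, unsigned n) {
    assert(n > 0);
    const double pi = std::acos(-1.0);
    return std::polar(1.0, -2.0 * pi * static_cast<double>(r) / static_cast<double>(n));
}

// Complex multiplications for a direct N-point DFT: N^2.
std::uint64_t dft_multiplies(std::uint64_t n) {
    return n * n;
}

// Complex multiplications for a radix-2 FFT: (N/2)*log2(N).
// N must be a power of two.
std::uint64_t fft_multiplies(std::uint64_t n) {
    assert(n >= 1 && (n & (n - 1)) == 0);
    std::uint64_t stages = 0;                 // stages == log2(N)
    for (std::uint64_t m = n; m > 1; m >>= 1) ++stages;
    return (n / 2) * stages;
}
```

For N = 1024 this gives 1,048,576 multiplications for the direct DFT against 5,120 for the FFT, roughly a 205-fold reduction. Adding `twiddle(r + n / 2, n)` to `twiddle(r, n)` should give a value close to zero (symmetry), and `twiddle(r + n, n)` should match `twiddle(r, n)` up to floating-point error (periodicity).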

6
**FFT Algorithm**

- Several different types of FFT algorithms exist (Radix-2, Radix-4, DIT and DIF).
- Focus on Radix-2 using the Decimation in Time (DIT) method.
- Breaks down the DFT calculation into a number of 2-point DFTs.
- Each 2-point DFT uses an operation called the butterfly.
- These groups are then recombined with another group of two, and so on, for log2(N) stages.
- Using the DIT method, the input time-domain points must be reordered using bit reversal.

7
**Butterfly Operation**
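As a rough illustration of the butterfly, the sketch below combines two complex inputs with one twiddle factor, which is the usual radix-2 decimation-in-time form. The name `butterfly` and the `cplx` alias are assumptions made for this example, not identifiers from the presentation.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <utility>

using cplx = std::complex<float>;

// One radix-2 DIT butterfly.
// Inputs:  a, b (two complex points) and the twiddle factor w.
// Outputs: {a + w*b, a - w*b}.
std::pair<cplx, cplx> butterfly(cplx a, cplx b, cplx w) {
    // Twiddle factors lie on the unit circle; this is a debug-only sanity check.
    assert(std::abs(std::abs(w) - 1.0f) < 1e-3f);
    const cplx t = w * b;      // the only complex multiplication
    return {a + t, a - t};     // one addition and one subtraction
}
```

With w = 1 the butterfly reduces to a plain 2-point DFT: inputs 1 and 2 produce 3 and -1. The single multiplication followed by a paired add and subtract is the pattern that a combined add/subtract instruction can accelerate.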

8
**Bit Reversal**
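Bit reversal reorders the input so that index i moves to the position whose binary digits are those of i read backwards. A minimal sketch follows; `reverse_bits` and `bit_reverse_permute` are hypothetical names chosen for this example and are not the presentation's own routines.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Reverse the lowest `bits` bits of `index` (with 3 bits: 001 -> 100).
unsigned reverse_bits(unsigned index, unsigned bits) {
    unsigned reversed = 0;
    for (unsigned b = 0; b < bits; ++b) {
        reversed = (reversed << 1) | (index & 1u);
        index >>= 1;
    }
    return reversed;
}

// Reorder a power-of-two-length array in place into bit-reversed order.
template <typename T>
void bit_reverse_permute(std::vector<T>& data) {
    const std::size_t n = data.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    unsigned bits = 0;                        // bits == log2(n)
    while ((std::size_t{1} << bits) < n) ++bits;
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t j = reverse_bits(static_cast<unsigned>(i), bits);
        if (j > i) std::swap(data[i], data[j]);   // swap each pair only once
    }
}
```

For an 8-point input, the indices 0 to 7 end up in the order 0, 4, 2, 6, 1, 5, 3, 7.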

9
**8-Point Radix-2 FFT Example**

10
**8-Point Radix-2 FFT Example**
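To make the 8-point example concrete, here is a compact sketch of an in-place radix-2 decimation-in-time FFT built on `std::complex`. It is an illustration written for this transcript, not the presenters' code: the name `fft_radix2` is hypothetical, and twiddle factors are computed on the fly with `std::polar` for clarity rather than speed.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

using cplx = std::complex<double>;

// In-place radix-2 DIT FFT; data.size() must be a power of two.
void fft_radix2(std::vector<cplx>& data) {
    const std::size_t n = data.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    const double pi = std::acos(-1.0);

    // Bit-reversal reordering of the time-domain input.
    for (std::size_t i = 1, j = 0; i < n; ++i) {
        std::size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) j ^= bit;
        j ^= bit;
        if (i < j) std::swap(data[i], data[j]);
    }

    // log2(N) stages; each stage doubles the size of the sub-DFTs.
    for (std::size_t size = 2; size <= n; size <<= 1) {
        const std::size_t half = size / 2;
        for (std::size_t start = 0; start < n; start += size) {
            for (std::size_t k = 0; k < half; ++k) {
                // Twiddle factor W_size^k = exp(-j*2*pi*k/size).
                const cplx w = std::polar(1.0, -2.0 * pi * double(k) / double(size));
                const cplx t = w * data[start + k + half];
                const cplx u = data[start + k];
                data[start + k] = u + t;          // butterfly, upper output
                data[start + k + half] = u - t;   // butterfly, lower output
            }
        }
    }
}
```

For N = 8 the outer loop runs three times (sub-DFT sizes 2, 4 and 8) with four butterflies per stage. Feeding in eight samples equal to 1 should leave a value of 8 in bin 0 and values near zero elsewhere, and one cycle of a cosine across the eight samples should concentrate in bins 1 and 7.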

11
**Implementations of DFT and FFT**

David Hansen

12
**DFT Implementation**

```c
#include <math.h>

int r, k;
for (r = 0; r <= samples / 2; r++) {
    float re = 0.0f, im = 0.0f;
    /* Angle step for bin r: -2*pi*r / N */
    float part = (float)r * -2.0f * PI / (float)samples;
    for (k = 0; k < samples; k++) {
        float theta = part * (float)k;
        re += data_in[k] * cos(theta);   /* correlate with the cosine wave */
        im += data_in[k] * sin(theta);   /* correlate with the sine wave */
    }
    /* re and im now hold the real and imaginary parts of bin r */
}
```

- Nested for loop, (N/2)·N iterations: O(N²)
- Cycles / Sample (123 cycles per inner loop iteration)
- Obvious inefficiencies: the cos and sin math.h functions
- Efficient assembly coding could reduce the inner loop to 3 cycles per iteration (1,536 cycles / sample)

13
**C++ FFT Implementation**

```cpp
void fft_float(unsigned NumSamples,
               float *RealIn, float *ImagIn,
               float *RealOut, float *ImagOut)
{
    // Iterate over the samples and perform the bit-reversal
    for (i = 0; i < NumSamples; i++) {
        j = ReverseBits(i, NumBits);
    }

    BlockEnd = 1;
    // Following loop iterates Log2(NumSamples) times
    for (BlockSize = 2; BlockSize <= NumSamples; BlockSize <<= 1) {
        // Perform angle calculations (using math.h sin/cos)

        // Following 2 loops iterate over NumSamples/2
        for (i = 0; i < NumSamples; i += BlockSize) {
            for (j = i, n = 0; n < BlockEnd; j++, n++) {
                // Perform butterfly calculations
            }
        }
        BlockEnd = BlockSize;
    }
}
```

14
**C++ FFT Implementation**

- Bit-reverse for loop: N iterations
- Nested for loops
- First outer loop: log2(N) iterations
- Made use of the sin/cos math.h functions
- Second outer loop: N / BlockSize iterations
- Inner loop: BlockSize/2 iterations
- O(N + log2(N) · (N/BlockSize) · (BlockSize/2)) = O(N + N·log2(N))
- Cycles / Sample
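The iteration counts above can be confirmed by running the same loop nest with the butterfly replaced by a counter. This sketch reuses the loop variables `NumSamples`, `BlockSize` and `BlockEnd` from the slide; the `LoopCounts` struct and the function `count_fft_iterations` are hypothetical additions for the purpose of counting.

```cpp
#include <cassert>
#include <cstdint>

struct LoopCounts {
    std::uint64_t bit_reverse;   // iterations of the bit-reverse loop
    std::uint64_t butterflies;   // iterations of the innermost loop
};

// Walk the FFT loop structure and count iterations instead of computing.
LoopCounts count_fft_iterations(std::uint64_t NumSamples) {
    assert(NumSamples >= 2 && (NumSamples & (NumSamples - 1)) == 0);
    LoopCounts counts{0, 0};

    for (std::uint64_t i = 0; i < NumSamples; i++) ++counts.bit_reverse;

    std::uint64_t BlockEnd = 1;
    for (std::uint64_t BlockSize = 2; BlockSize <= NumSamples; BlockSize <<= 1) {
        for (std::uint64_t i = 0; i < NumSamples; i += BlockSize) {
            for (std::uint64_t n = 0; n < BlockEnd; n++) {
                ++counts.butterflies;    // one butterfly per inner iteration
            }
        }
        BlockEnd = BlockSize;            // BlockEnd is always BlockSize / 2
    }
    return counts;
}
```

For 1024 samples the bit-reverse loop runs 1,024 times and the inner loop 5,120 times, which is (N/2)·log2(N): each of the 10 stages performs N/2 = 512 butterflies regardless of BlockSize.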

15
**Assembly FFT Implementation**

- Bit-reverse address generation
- Hide the bit-reverse operation inside the first and second FFT stages
- Sin and cos values stored in a look-up table (LUT)
- 256 Kbyte LUT added to Data1
- Needed to grow the Data1 memory space using the LDF file
- Interleaved real and imaginary arrays
- Quad reads load 2 complex points per cycle
- Supports the real FFT for input signals with no imaginary component
- 40% algorithm-based savings
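The look-up-table idea can be sketched in portable C++: compute the N/2 twiddle factors once, store them as interleaved cosine and sine values, and index the table with a stride inside each stage. The layout and the names `make_twiddle_table` and `twiddle_index` are assumptions for illustration only; the actual table used in the assembly version is not shown in the slides.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Build an interleaved twiddle table for an N-point FFT:
// table[2*k] = cos(-2*pi*k/N), table[2*k + 1] = sin(-2*pi*k/N), k = 0..N/2-1.
std::vector<float> make_twiddle_table(std::size_t n) {
    assert(n >= 2 && (n & (n - 1)) == 0);
    const double pi = std::acos(-1.0);
    std::vector<float> table(n);              // N/2 entries, 2 floats each
    for (std::size_t k = 0; k < n / 2; ++k) {
        const double theta = -2.0 * pi * static_cast<double>(k) / static_cast<double>(n);
        table[2 * k]     = static_cast<float>(std::cos(theta));
        table[2 * k + 1] = static_cast<float>(std::sin(theta));
    }
    return table;
}

// Table entry for the k-th twiddle of a stage with the given block size:
// W_blockSize^k equals W_N^(k * N / blockSize).
std::size_t twiddle_index(std::size_t k, std::size_t block_size, std::size_t n) {
    assert(block_size >= 2 && block_size <= n);
    return k * (n / block_size);
}
```

Replacing the per-stage sin/cos calls with table reads trades memory for cycles, and interleaving the two components means one wide read can fetch a complete twiddle factor.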

16
**Assembly FFT Implementation**

- Special butterfly instruction
- Can perform addition/subtraction in parallel in one compute block
- Speeds up the innermost loop
- VLIW and SIMD operations
- Performs simultaneous operations in both compute blocks
- Loop unrolling and instruction scheduling keep the entire processor busy with instructions
- 11.35 cycles per sample

17
**Assembly FFT Implementation**

```
_BflyLoop:
    q[j2+=4]=r27:26; k5=k5+k9; fr6=r30*r12; fr16=r6-r7;;
    yr3:0=q[j0+=4]; k3=k5 and k4; fr15=r23*r4; fr24=r8+r18, fr26=r8-r18;;
    xr3:0=q[j0+=4]; r5:4=l[k7+k3]; fr7=r31*r13; fr25=r9+r19, fr27=r9-r19;;
    q[j1+=4]=r25:24; fr14=r30*r13; fr17=r14+r15;;
    q[j2+=4]=r27:26; k5=k5+k9; fr6=r2*r4; fr18=r6-r7;;
    yr11:8=q[j0+=4]; k3=k5 and k4; fr15=r31*r12; fr24=r20+r16, fr26=r20-r16;;
    xr11:8=q[j0+=4]; r13:12=l[k7+k3]; fr7=r3*r5; fr25=r21+r17, fr27=r21-r17;;
    q[j1+=4]=r25:24; fr14=r2*r5; fr19=r14+r15;;
    q[j2+=4]=r27:26; k5=k5+k9; fr6=r10*r12; fr16=r6-r7;;
    yr23:20=q[j0+=4]; k3=k5 and k4; fr15=r3*r4; fr24=r28+r18, fr26=r28-r18;;
    xr23:20=q[j0+=4]; r5:4=l[k7+k3]; fr7=r11*r13; fr25=r29+r19, fr27=r29-r19;;
    q[j1+=4]=r25:24; fr14=r10*r13; fr17=r14+r15;;
    q[j2+=4]=r27:26; k5=k5+k9; fr6=r22*r4; fr18=r6-r7;;
    yr31:28=q[j0+=4]; k3=k5 and k4; fr15=r11*r12; fr24=r0+r16, fr26=r0-r16;;
    xr31:28=q[j0+=4]; r13:12=l[k7+k3]; fr7=r23*r5; fr25=r1+r17, fr27=r1-r17;;
.align_code 4;
    if NLC0E, jump _BflyLoop;
```

18
**DC FFT Test**

[Figures: FFT Source Array; FFT Output Magnitude]

19
**Audio FFT Test**

[Figures: FFT Source Array; FFT Output Magnitude]

20
**1024 Point DFT / FFT Comparison**

| Implementation | Cycles Per Sample |
| --- | --- |
| DFT Implemented in C | 63, cycles / sample |
| DFT Implemented in Assembly | 1,536 cycles / sample |
| FFT Implemented in C | cycles / sample |
| FFT Implemented in Assembly | 11.35 cycles / sample |

21
**1024 Point Radix-2 FFT Hardware Comparison**

| Processor Architecture | Cycles Per Sample | Processor Frequency | Execution Time |
| --- | --- | --- | --- |
| ADSP (SHARC) | 8.98 cycles / sample | 400 MHz | 22.99 µs |
| TigerSHARC (website) | 9.16 cycles / sample | 600 MHz | 15.63 µs |
| TigerSHARC (our results) | 11.35 cycles / sample | | 19.37 µs |
| TMS320C6000™ | cycles / sample | 350 MHz | 41.33 µs |
| TMS320DM644x™ | 7.59 cycles / sample | 594 MHz | 13.08 µs |
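The execution times in this comparison appear to follow from cycles per sample × N ÷ clock frequency, with N = 1024. The sketch below reproduces that arithmetic; `execution_time_us` is a hypothetical helper, and the relationship itself is inferred from the listed figures rather than stated in the slides.

```cpp
#include <cassert>

// Execution time in microseconds for an N-point transform.
// A clock in MHz is the same as cycles per microsecond, so
// total cycles / clock_mhz yields microseconds directly.
double execution_time_us(double cycles_per_sample, unsigned n, double clock_mhz) {
    assert(clock_mhz > 0.0);
    const double total_cycles = cycles_per_sample * static_cast<double>(n);
    return total_cycles / clock_mhz;
}
```

For example, 8.98 cycles per sample at 400 MHz gives about 22.99 µs, and 7.59 cycles per sample at 594 MHz gives about 13.08 µs, matching the listed values. By the same formula, 11.35 cycles per sample with a 19.37 µs execution time would correspond to a clock of roughly 600 MHz.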

22
**Conclusion**

- The FFT algorithm is very useful when computing the frequency domain on a DSP.
- The FFT is much faster than a regular DFT algorithm.
- The FFT is more precise because it accumulates fewer round-off errors.
- The timed coding examples further support this claim and demonstrate how to code the algorithm.
- The Radix-2 FFT isn't the fastest, but it uses a less complex addressing and twiddle factor routine.
- In this case (unlike in school) F is better than D.
