David Hansen and James Michelussi. Introduction; Discrete Fourier Transform (DFT); Fast Fourier Transform (FFT); FFT Algorithm – Applying the Mathematics.

1 David Hansen and James Michelussi

2 Introduction
- Discrete Fourier Transform (DFT)
- Fast Fourier Transform (FFT)
- FFT Algorithm – Applying the Mathematics
- Implementations of DFT and FFT
- Hardware Benchmarks
- Conclusion

3 DFT
- Based on Fourier analysis, introduced by Jean Baptiste Joseph Fourier in 1807.
- Transforms a sampled (discrete), periodic signal from the time domain to the frequency domain.
- Computed as a correlation between the time-domain signal and N cosine and N sine waves.
- Symbols: $X(k)$ = DFT frequency-domain signal, $N$ = number of sample points, $x(n)$ = time-domain signal, $W_N$ = twiddle factor.
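The equation shown on this slide did not survive in the transcript. Assuming the common textbook convention for the listed symbols (the negative exponent and the absence of a 1/N scale factor are assumptions, not taken from the slide), and writing the time-domain signal as $x(n)$, the DFT can be stated as:

```latex
X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad k = 0, 1, \dots, N-1, \qquad W_N = e^{-j 2\pi / N}

X(k) = \sum_{n=0}^{N-1} x(n) \left[ \cos\!\left(\frac{2\pi n k}{N}\right) - j \sin\!\left(\frac{2\pi n k}{N}\right) \right]
```

The second form makes the "correlation with N cosine and N sine waves" reading explicit: the real part of each $X(k)$ correlates the signal with a cosine, the imaginary part with a sine.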

4 DFT (Walking Speed)
- Why is this important? Where is it used?
- Lets machines calculate the frequency-domain representation of a signal.
- Turns the convolution of two signals into a simple point-by-point multiplication of their transforms.
- Used in digital spectral analysis for speech, imaging and pattern recognition, as well as in signal manipulation using filters.
- But the DFT requires $N^2$ multiplications!
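The convolution claim can be checked numerically. The sketch below is an illustration rather than the presenters' code: the names `dft`, `circular_convolve` and `convolve_via_dft` are hypothetical, and it assumes circular convolution of equal-length sequences, which is what multiplying two DFTs corresponds to.

```cpp
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Direct O(N^2) DFT. sign = -1 gives the forward transform,
// sign = +1 the inverse transform without its 1/N scaling.
cvec dft(const cvec& x, int sign) {
    const std::size_t N = x.size();
    const double pi = std::acos(-1.0);
    cvec X(N);
    for (std::size_t k = 0; k < N; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t n = 0; n < N; ++n) {
            const double angle =
                sign * 2.0 * pi * static_cast<double>(k * n) / static_cast<double>(N);
            acc += x[n] * std::polar(1.0, angle);
        }
        X[k] = acc;
    }
    return X;
}

// Circular convolution computed directly in the time domain.
cvec circular_convolve(const cvec& a, const cvec& b) {
    const std::size_t N = a.size();
    cvec y(N);  // elements start at 0 + 0j
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t m = 0; m < N; ++m)
            y[n] += a[m] * b[(n + N - m) % N];
    return y;
}

// Same result through the frequency domain:
// transform both inputs, multiply point by point, transform back.
cvec convolve_via_dft(const cvec& a, const cvec& b) {
    const std::size_t N = a.size();
    cvec A = dft(a, -1);
    const cvec B = dft(b, -1);
    for (std::size_t k = 0; k < N; ++k) A[k] *= B[k];
    cvec y = dft(A, +1);
    for (auto& v : y) v /= static_cast<double>(N);  // inverse-DFT scaling
    return y;
}
```

For a = {1, 2, 3, 4} and b = {1, 0, 0, 1}, both routes give {3, 5, 7, 5} up to rounding.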

5 FFT (Jet Speed)
- J. W. Cooley and J. W. Tukey are given credit for bringing the FFT to the world in the 1960s.
- Simply an algorithm for calculating the DFT more efficiently.
- Takes advantage of symmetry and periodicity in the twiddle factors, and uses a divide-and-conquer method.
  - Symmetry: $W_N^{r+N/2} = -W_N^{r}$
  - Periodicity: $W_N^{r+N} = W_N^{r}$
- Requires only $(N/2)\log_2 N$ multiplications!
- Faster computation times
- More precise results due to less round-off error
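The two twiddle-factor identities and the multiplication counts are easy to verify numerically. This is a minimal sketch under the assumption that $W_N^r = e^{-j 2\pi r / N}$; the function names are hypothetical.

```cpp
#include <cmath>
#include <complex>

// Twiddle factor W_N^r = exp(-j * 2*pi * r / N) (sign convention assumed).
std::complex<double> twiddle(unsigned r, unsigned N) {
    const double pi = std::acos(-1.0);
    return std::polar(1.0, -2.0 * pi * static_cast<double>(r) / static_cast<double>(N));
}

// Multiplication count quoted for the direct DFT: N^2.
unsigned long dft_multiplications(unsigned long N) { return N * N; }

// Multiplication count quoted for the radix-2 FFT: (N/2) * log2(N),
// with N assumed to be a power of two.
unsigned long fft_multiplications(unsigned long N) {
    unsigned long log2N = 0;
    for (unsigned long m = N; m > 1; m >>= 1) ++log2N;
    return (N / 2) * log2N;
}
```

For N = 1024 the counts are 1,048,576 for the direct DFT against 5,120 for the FFT, roughly a 200-fold reduction.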

6 FFT Algorithm
- Several different types of FFT algorithms exist (Radix-2, Radix-4, DIT and DIF).
- The focus here is Radix-2 using the Decimation in Time (DIT) method.
- Breaks the DFT calculation down into a number of 2-point DFTs.
- Each 2-point DFT uses an operation called the butterfly.
- These groups are then recombined with another group of two, and so on, for $\log_2 N$ stages.
- With the DIT method, the input time-domain points must be reordered using bit reversal.

7 Butterfly Operation
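The butterfly diagram is not reproduced in the transcript. As a rough illustration, assuming the usual radix-2 DIT form, one butterfly takes two complex points and one twiddle factor and produces a sum and a difference. The function name `butterfly` is hypothetical.

```cpp
#include <complex>

// One radix-2 decimation-in-time butterfly, computed in place:
//   a' = a + w * b
//   b' = a - w * b
// Only one complex multiplication is needed because w * b is shared.
void butterfly(std::complex<double>& a, std::complex<double>& b,
               const std::complex<double>& w) {
    const std::complex<double> t = w * b;
    b = a - t;
    a = a + t;
}
```

Sharing the product `w * b` between both outputs is the symmetry $W_N^{r+N/2} = -W_N^{r}$ at work: the second output reuses the first output's twiddle with the sign flipped.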

8 Bit Reversal
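The bit-reversal figure is also missing from the transcript. A minimal sketch of the reordering follows, assuming N is a power of two with `num_bits` = log2(N); `reverse_bits` and `bit_reverse_permute` are hypothetical names, not the presenters' routines.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Reverse the lowest num_bits bits of index (with 3 bits: 001 -> 100).
unsigned reverse_bits(unsigned index, unsigned num_bits) {
    unsigned reversed = 0;
    for (unsigned b = 0; b < num_bits; ++b) {
        reversed = (reversed << 1) | (index & 1u);
        index >>= 1;
    }
    return reversed;
}

// Reorder data in place into bit-reversed order.
// data.size() is assumed to equal 2^num_bits.
template <typename T>
void bit_reverse_permute(std::vector<T>& data, unsigned num_bits) {
    for (unsigned i = 0; i < data.size(); ++i) {
        const unsigned j = reverse_bits(i, num_bits);
        if (i < j) std::swap(data[i], data[j]);  // swap each pair only once
    }
}
```

For eight points, the input order 0, 1, 2, 3, 4, 5, 6, 7 becomes 0, 4, 2, 6, 1, 5, 3, 7.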

9 8-Point Radix-2 FFT Example


11 Implementations of DFT and FFT
David Hansen

12 DFT Implementation
- Nested for loop, (N/2)*N iterations: $O(N^2)$
- Cycles / Sample (123 cycles per inner loop iteration)
- Obvious inefficiencies: cos and sin math.h functions
- Efficient assembly coding could reduce the inner loop to 3 cycles per iteration (1,536 cycles / sample)

    for (r=0; r<=samples/2; r++) {
        float re = 0.0f, im = 0.0f;
        float part = (float)r * -2.0f * PI / (float)samples;
        for (k=0; k
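The code on this slide is cut off in the transcript, so the inner loop body is unknown. The sketch below shows one plausible way to complete a direct DFT with the same outer-loop shape. The function name `dft_real`, the output layout and the inner loop are assumptions, not the presenters' code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Direct DFT of a real-valued signal for bins 0..N/2 (assumed layout).
// Nested loops give O(N^2) work, with cos/sin evaluated on every inner iteration.
void dft_real(const std::vector<float>& in,
              std::vector<float>& re_out,
              std::vector<float>& im_out) {
    const std::size_t N = in.size();
    const float pi = std::acos(-1.0f);
    re_out.assign(N / 2 + 1, 0.0f);
    im_out.assign(N / 2 + 1, 0.0f);
    for (std::size_t r = 0; r <= N / 2; ++r) {
        float re = 0.0f, im = 0.0f;
        // Angle step for bin r, as in the slide's visible "part" expression.
        const float part = static_cast<float>(r) * -2.0f * pi / static_cast<float>(N);
        for (std::size_t k = 0; k < N; ++k) {   // assumed inner loop
            const float angle = part * static_cast<float>(k);
            re += in[k] * std::cos(angle);
            im += in[k] * std::sin(angle);
        }
        re_out[r] = re;
        im_out[r] = im;
    }
}
```

A constant (DC) input of eight ones yields 8 in bin 0 and approximately zero in every other bin.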

13 C++ FFT Implementation

    void fft_float (unsigned NumSamples, float *RealIn, float *ImagIn, float *RealOut, float *ImagOut)
    {
        for ( i=0; i < NumSamples; i++ )
        {
            // Iterate over the samples and perform the bit-reversal
            j = ReverseBits ( i, NumBits );
        }

        BlockEnd = 1;
        // Following loop iterates Log2(NumSamples) times
        for ( BlockSize = 2; BlockSize <= NumSamples; BlockSize <<= 1 )
        {
            // Perform angle calculations (using math.h sin/cos)
            // Following 2 loops together iterate NumSamples/2 times
            for ( i=0; i < NumSamples; i += BlockSize )
            {
                for ( j=i, n=0; n < BlockEnd; j++, n++ )
                {
                    // Perform butterfly calculations
                }
            }
            BlockEnd = BlockSize;
        }
    }

14 C++ FFT Implementation
- Bit-reverse for loop: N iterations
- Nested for loops
  - First outer loop: $\log_2 N$ iterations
    - Made use of sin/cos math.h functions
  - Second outer loop: N / BlockSize iterations
  - Inner loop: BlockSize/2 iterations
- $O(N + \log_2 N \cdot (N/\mathrm{BlockSize}) \cdot (\mathrm{BlockSize}/2))$
- $O(N + N \log_2 N)$
- Cycles / Sample
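The loop structure described here can be filled in as a complete radix-2 DIT FFT. This is a sketch under stated assumptions, not the presenters' `fft_float`: it uses `std::complex<double>` instead of separate real and imaginary float arrays, works in place, and the name `fft_radix2` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <utility>
#include <vector>

// In-place radix-2 decimation-in-time FFT (forward transform, no scaling).
// data.size() must be a power of two.
void fft_radix2(std::vector<std::complex<double>>& data) {
    const std::size_t n = data.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    const double pi = std::acos(-1.0);

    // Bit-reversal reordering: N iterations.
    unsigned num_bits = 0;
    while ((std::size_t{1} << num_bits) < n) ++num_bits;
    for (std::size_t i = 0; i < n; ++i) {
        std::size_t j = 0;
        for (unsigned b = 0; b < num_bits; ++b)
            if (i & (std::size_t{1} << b)) j |= std::size_t{1} << (num_bits - 1 - b);
        if (i < j) std::swap(data[i], data[j]);
    }

    // log2(N) stages; each stage performs N/2 butterflies in total.
    for (std::size_t block_size = 2; block_size <= n; block_size <<= 1) {
        const std::size_t half = block_size / 2;
        for (std::size_t i = 0; i < n; i += block_size) {      // N / block_size blocks
            for (std::size_t j = i, k = 0; k < half; ++j, ++k) {  // block_size / 2 butterflies
                // Twiddle factor via sin/cos on every butterfly, as in the C version.
                const std::complex<double> w = std::polar(
                    1.0, -2.0 * pi * static_cast<double>(k) / static_cast<double>(block_size));
                const std::complex<double> t = w * data[j + half];
                data[j + half] = data[j] - t;
                data[j] += t;
            }
        }
    }
}
```

For the input {1, 2, 3, 4} the result is {10, -2+2j, -2, -2-2j}, which matches the direct DFT of the same sequence.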

15 Assembly FFT Implementation
- Bit-reverse address generation
  - Hides the bit-reverse operation inside the first and second FFT stages
- Sin and cos values stored in a look-up table (LUT)
  - 256 Kbyte LUT added to Data1
  - Needed to grow the Data1 memory space using the LDF file
- Interleaved real and imaginary arrays
  - Quad reads load 2 complex points per cycle
- Supports the real FFT for input signals with no imaginary component
  - 40% algorithm-based savings
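The look-up-table idea can be illustrated in C++. The slide does not give the layout of the real 256 Kbyte table, so everything here is an assumption: the table holds interleaved {cos, sin} pairs for $W_N^k$, $k = 0 \dots N/2-1$, and the names `make_twiddle_table` and `twiddle_index` are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Precompute twiddle factors once, stored as interleaved {real, imag} floats
// so that one wide read can fetch a whole complex value.
std::vector<float> make_twiddle_table(std::size_t N) {
    const double pi = std::acos(-1.0);
    std::vector<float> table(2 * (N / 2));  // N/2 complex values, 2 floats each
    for (std::size_t k = 0; k < N / 2; ++k) {
        const double angle = -2.0 * pi * static_cast<double>(k) / static_cast<double>(N);
        table[2 * k]     = static_cast<float>(std::cos(angle));  // real part
        table[2 * k + 1] = static_cast<float>(std::sin(angle));  // imaginary part
    }
    return table;
}

// A stage with block size M (M divides N) needs W_M^k, which equals W_N^(k*N/M),
// so every stage can stride through the same table.
std::size_t twiddle_index(std::size_t N, std::size_t M, std::size_t k) {
    return k * (N / M);
}
```

Replacing per-butterfly `sin`/`cos` calls with a table read trades memory for cycles, which is the trade the slide describes.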

16 Assembly FFT Implementation
- Special butterfly instruction
  - Can perform addition and subtraction in parallel in one compute block
  - Speeds up the innermost loop
- VLIW and SIMD operations
  - Performs simultaneous operations in both compute blocks
- Loop unrolling and instruction scheduling keep the entire processor busy with instructions.
- Cycles per Sample

17 Assembly FFT Implementation

    _BflyLoop:
    q[j2+=4]=r27:26; k5=k5+k9; fr6=r30*r12; fr16=r6-r7;;
    yr3:0=q[j0+=4]; k3=k5 and k4; fr15=r23*r4; fr24=r8+r18, fr26=r8-r18;;
    xr3:0=q[j0+=4]; r5:4=l[k7+k3]; fr7=r31*r13; fr25=r9+r19, fr27=r9-r19;;
    q[j1+=4]=r25:24; fr14=r30*r13; fr17=r14+r15;;
    q[j2+=4]=r27:26; k5=k5+k9; fr6=r2*r4; fr18=r6-r7;;
    yr11:8=q[j0+=4]; k3=k5 and k4; fr15=r31*r12; fr24=r20+r16, fr26=r20-r16;;
    xr11:8=q[j0+=4]; r13:12=l[k7+k3]; fr7=r3*r5; fr25=r21+r17, fr27=r21-r17;;
    q[j1+=4]=r25:24; fr14=r2*r5; fr19=r14+r15;;
    q[j2+=4]=r27:26; k5=k5+k9; fr6=r10*r12; fr16=r6-r7;;
    yr23:20=q[j0+=4]; k3=k5 and k4; fr15=r3*r4; fr24=r28+r18, fr26=r28-r18;;
    xr23:20=q[j0+=4]; r5:4=l[k7+k3]; fr7=r11*r13; fr25=r29+r19, fr27=r29-r19;;
    q[j1+=4]=r25:24; fr14=r10*r13; fr17=r14+r15;;
    q[j2+=4]=r27:26; k5=k5+k9; fr6=r22*r4; fr18=r6-r7;;
    yr31:28=q[j0+=4]; k3=k5 and k4; fr15=r11*r12; fr24=r0+r16, fr26=r0-r16;;
    xr31:28=q[j0+=4]; r13:12=l[k7+k3]; fr7=r23*r5; fr25=r1+r17, fr27=r1-r17;;
    .align_code 4;
    if NLC0E, jump _BflyLoop;

18 DC FFT Test
[Figures: FFT Source Array; FFT Output Magnitude]

19 Audio FFT Test
[Figures: FFT Source Array; FFT Output Magnitude]

20 1024 Point DFT / FFT Comparison

| Implementation | Cycles Per Sample |
|---|---|
| DFT Implemented in C | 63, cycles / sample |
| DFT Implemented in Assembly | 1,536 cycles / sample |
| FFT Implemented in C | cycles / sample |
| FFT Implemented in Assembly | 11.35 cycles / sample |
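The DFT figures can be related to the loop counts quoted on the DFT implementation slide. Assuming cycles per sample is simply the inner-loop cost times the N/2 inner iterations charged to each sample (an inference from the "(N/2)*N iterations" bullet, not a formula stated in the deck), a small helper with the hypothetical name `dft_cycles_per_sample` reproduces the assembly estimate:

```cpp
// Cycles per sample for the direct DFT: (N/2)*N inner iterations in total,
// spread over N samples, leaves N/2 inner iterations per sample.
double dft_cycles_per_sample(double cycles_per_inner_iteration, unsigned n_points) {
    return cycles_per_inner_iteration * (n_points / 2);
}
```

With N = 1024, 3 cycles per iteration gives the quoted 1,536 cycles per sample. The same arithmetic applied to the 123-cycle C inner loop gives 62,976, and 1,536 divided by the 11.35 cycles per sample of the assembly FFT is a speed-up of roughly 135 times.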

21 1024 Point Radix-2 FFT Hardware Comparison

| Processor Architecture | Cycles Per Sample | Processor Frequency | Execution Time |
|---|---|---|---|
| ADSP (SHARC) | 8.98 cycles / sample | 400 MHz | 22.99 µs |
| TigerSHARC (website) | 9.16 cycles / sample | 600 MHz | 15.63 µs |
| TigerSHARC (our results) | 11.35 cycles / sample | 600 MHz | 19.37 µs |
| TMS320C6000™ | cycles / sample | 350 MHz | 41.33 µs |
| TMS320DM644x™ | 7.59 cycles / sample | 594 MHz | 13.08 µs |
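The execution times in this comparison appear consistent with a simple relation: cycles per sample times the number of points, divided by the clock rate. That relation is inferred from the numbers rather than stated in the deck, and the helper name `execution_time_us` is hypothetical.

```cpp
// Execution time in microseconds for an N-point transform.
// A clock in MHz is cycles per microsecond, so the division yields microseconds.
double execution_time_us(double cycles_per_sample, unsigned n_points, double clock_mhz) {
    return cycles_per_sample * n_points / clock_mhz;
}
```

For example, `execution_time_us(11.35, 1024, 600.0)` evaluates to about 19.37, matching the "our results" TigerSHARC row, and the rows with a complete cycle count reproduce their listed times the same way.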

22 Conclusion
- The FFT algorithm is very useful for computing the frequency-domain representation of a signal on a DSP.
- The FFT is much faster than a direct DFT computation.
- The FFT is more precise because it accumulates fewer round-off errors.
- The timed coding examples support these claims and demonstrate how to code the algorithm.
- The Radix-2 FFT isn't the fastest variant, but it uses a less complex addressing and twiddle-factor routine.
- In this case (unlike in school) F is better than D.
