CS 591 S1 – Computational Audio

Wayne Snyder
Computer Science Department
Boston University
Lecture 14
Issues with the DFT:
Resolution
"Leakage" and "spray"
Hann Windows
Lecture 15
The FFT: Fast DFT
The Inverse FFT: Fast Synthesis of Musical Signals
Convolution and Reverberation

Digital Audio Fundamentals: The Discrete Fourier Transform
Summary from last time…..

Note: The spectrum consists of complex numbers (each with an amplitude and a phase) giving the SUM (not the average) of the products of the probe waves with the test wave, so you must divide by len(X) to get the amplitude. The components of the test wave are split between positive and negative frequencies, at half amplitude each. Negative frequencies occur in the last half of the spectrum of length len(X): for len(X) = 44100, this would be: … …
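The note above can be checked with a small sketch. The sample rate, the one-second window, and the single 880 Hz component with amplitude 0.8 are assumptions chosen for illustration, not values taken from the lecture's own code.

```python
import numpy as np

SR = 44100                                # assumed sample rate; one second gives 1 Hz bins
t = np.arange(SR) / SR
X = 0.8 * np.sin(2 * np.pi * 880 * t)     # test wave: one 880 Hz component, amplitude 0.8

S = np.fft.fft(X)                         # raw spectrum holds SUMS, not averages
amps = np.abs(S) / len(X)                 # divide by len(X) to get amplitudes

freqs = np.fft.fftfreq(len(X), d=1 / SR)  # frequency (Hz) of each bin, negatives in the last half
pos_amp = amps[880]                       # bin for +880 Hz
neg_amp = amps[len(X) - 880]              # bin for -880 Hz, found in the last half
neg_freq = freqs[len(X) - 880]

print(pos_amp, neg_amp, neg_freq)         # each amplitude is about 0.4, half of 0.8
```

The two bins together account for the full amplitude of 0.8, which is why the later slides add (or double) them.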

Interpreting Outputs from the Discrete Fourier Transform: Since a negative amplitude has the same effect as a phase shift by π, and the magnitude of a complex number does not depend on its phase, there are no negative amplitudes. There are negative frequencies, but there is always a “mirror image” among positive and negative frequencies. Spectrum: [ ( -880, 0.8, 0 ), (1760, -0.6, 0), (2640, 0.4, 2.3) ]

The function np.fft.rfft(…) returns the spectrum as positive frequencies only, again with the sum of the amplitudes instead of the average (simply divide the magnitudes by len(X)).
[Listing: sample rfft output for bins S[0] through S[4], each printed as a complex value followed by an angle; the only values that survive are ∠ 1.0 for S[2], ∠ 2.0 for S[3], and ∠ 1.319 for S[4].]

Interpreting Outputs from the Discrete Fourier Transform: By adding together the negative and positive amplitudes (or simply doubling the positive ones), we obtain the spectrum that represents the actual components of the real signal, no matter what the phase. Spectrum: [ ( 880, 0.8, 1 ), (1760, 0.6, 2.3), (2640, 0.4, -1.2) ]
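A minimal sketch of recovering such a spectrum with np.fft.rfft follows. It assumes a 44100 Hz sample rate, a one-second window, and components written as cosines (so that np.angle returns the phase directly); the lecture's own phase convention is not stated on the slide and may differ.

```python
import numpy as np

SR = 44100                                # assumed sample rate; one second gives 1 Hz bins
t = np.arange(SR) / SR

# (frequency in Hz, amplitude, phase in radians), as in the spectrum above
components = [(880, 0.8, 1.0), (1760, 0.6, 2.3), (2640, 0.4, -1.2)]
X = sum(a * np.cos(2 * np.pi * f * t + p) for f, a, p in components)

S = np.fft.rfft(X)                        # positive frequencies only
freqs = np.fft.rfftfreq(len(X), d=1 / SR)
amp = 2 * np.abs(S) / len(X)              # divide by len(X), then double to fold in the
                                          # negative half (not valid for the 0 Hz and Nyquist bins)
phase = np.angle(S)

recovered = [
    (float(freqs[k]), round(float(amp[k]), 3), round(float(phase[k]), 3))
    for k in np.flatnonzero(amp > 0.01)   # keep only bins with a real component
]
print(recovered)
```

Under the cosine assumption the recovered triples match the components that were synthesized, whatever their phases.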

Window-Based Analysis using the DFT
We are going to start looking at spectra for windows of musical signals and try to understand what they tell us about the musical performance. There is a tradeoff between:
Temporal Resolution – What is the shortest musical event we can observe?
Spectral Resolution – How many frequencies can we measure?
[Figure: a window of W samples]

The time period of the window is W / Sample Rate, and the lowest window frequency (which is also the increment for the spectrum) is Sample Rate / W; e.g., a window of 2000 samples lasts 2000/44100 ≈ 0.0454 seconds, which is a window frequency of 44100/2000 = 22.05 Hz.

So: In a window of W samples, we can measure a fundamental frequency f whose period is the same as the window, and the harmonics 2*f, 3*f, …, (W/2)*f; thus, in a window of 2000 samples, we can measure 22.05 Hz, 44.1, 66.15, 88.2, …, 22050. The measurable frequencies are multiples of the window frequency f. These are the ONLY frequencies we can measure.
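The arithmetic for a 2000-sample window at 44100 samples per second can be sketched as follows. The variable names are illustrative only; np.fft.rfftfreq is used to list the bin frequencies.

```python
import numpy as np

SAMPLE_RATE = 44100
W = 2000                                   # window size in samples

window_seconds = W / SAMPLE_RATE           # duration of the window
window_freq = SAMPLE_RATE / W              # lowest measurable frequency and bin spacing

# Bin frequencies 0, f, 2f, ..., (W/2)*f; drop the 0 Hz bin to list the harmonics of f
measurable = np.fft.rfftfreq(W, d=1 / SAMPLE_RATE)[1:]

print(f"window lasts {window_seconds:.4f} s, window frequency {window_freq:.2f} Hz")
print(measurable[:4], "...", measurable[-1])
```

Doubling W halves the spacing between measurable frequencies but doubles the length of time each spectrum summarizes, which is the tradeoff described above.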

Fourier Transform
It is instructive to consider how the frequencies correspond to the frequencies of the chromatic scale as exhibited on a piano keyboard:

Digital Audio Fundamentals: Discrete Sine Transform
Let’s assume that 1/10th of a second is the longest we could make a window and still get musically-meaningful results in the general case (e.g., the spectrum analyzer I demonstrated a few weeks back uses this window size). Then the window size n = 4410 would be able to detect 10 Hz, 20 Hz, etc. with a granularity of 10 Hz up to the Nyquist Limit. Here is how these match up to the frequencies in the octave starting at middle C:

Digital Audio Fundamentals: Multiplying Sine Waves
You see the problem: Only one note matches (A4)! Worse, the detectable frequencies are in (musically) random locations between the notes; here are the detectable frequencies right around Middle C:
Detectable frequencies (Hz): 240, 250, 260, 270, 280
Notes (Hz): B 246.9, C 261.6, C# 277.2
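The mismatch can be reproduced with a short sketch. It assumes equal temperament with A4 = 440 Hz and the 4410-sample window at 44100 samples per second discussed above; the note-name spelling is an illustrative choice.

```python
SR = 44100
N = 4410                                   # 1/10 second window
bin_hz = SR / N                            # spacing of detectable frequencies: 10 Hz

names = ["C", "C#", "D", "Eb", "E", "F", "F#", "G", "Ab", "A", "Bb", "B"]
# Equal temperament (assumed): C4 is 9 semitones below A4 = 440 Hz
notes = {name + "4": 440.0 * 2 ** ((i - 9) / 12) for i, name in enumerate(names)}

matches = []
for name, f in notes.items():
    nearest = bin_hz * round(f / bin_hz)   # closest detectable frequency
    if abs(f - nearest) < 1e-9:            # exact match only
        matches.append(name)
    print(f"{name:4s} {f:8.2f} Hz  nearest bin {nearest:6.1f} Hz  off by {f - nearest:+.2f} Hz")

print("exact matches:", matches)
```

Only A4 lands exactly on a bin. F#4 (about 369.99 Hz) comes within roughly 0.01 Hz of the 370 Hz bin, but that is a coincidence of this window size rather than a match by design.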

And since the detectable frequencies are on a linear scale and the piano frequencies are on a logarithmic scale, the number of detectable frequencies per octave doubles with each octave as we go up the keyboard. Here are the top two notes: B 3951.0 Hz and C 4186.0 Hz, with the detectable frequencies running from 3950 to 4190 Hz in steps of 10 Hz.

In the lowest complete octave of the piano, the detectable frequencies look like this!
Detectable frequencies (Hz): 30, 40
Notes (Hz): A 27.5, Bb 29.1, B 30.9, C 32.7, C# 34.6, D 36.7, Eb 38.9, E 41.2

The problem is NOT that it is impossible to find a window size and a window frequency which can detect musical frequencies; the problem is that in one run of the FFT, you must choose a single window size.

What happens when the frequencies in the test wave do not match any of the frequencies in the DFT? The problem is in the “non-integral” components of the signal, which do not fit precisely in the window; these will be interpreted noisily by the DFT:

The problem is with the “incomplete” waveforms at the edges of the rectangular window:

When frequencies in the test signal match the DFT probe frequencies exactly, you get a sharp spike which gives the amplitude; when they do NOT, you get “spray” or “leakage” around the test frequency. In this video, the DFT tests for frequencies 10, 20, …, 380, 390, 400, 410, 420, …, and the test signal is a single sine wave which I vary from 380 to 420 Hz in steps of 0.1 Hz:
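Two frames of that experiment can be sketched as follows, assuming a 44100 Hz sample rate and a 4410-sample rectangular window (so that the probe frequencies are 10 Hz apart). The function name amplitude_spectrum is illustrative.

```python
import numpy as np

SR = 44100                                 # assumed sample rate
N = 4410                                   # assumed window: probe frequencies every 10 Hz
t = np.arange(N) / SR

def amplitude_spectrum(freq_hz):
    """Amplitude at each probe frequency for a unit sine wave (rectangular window)."""
    x = np.sin(2 * np.pi * freq_hz * t)
    return 2 * np.abs(np.fft.rfft(x)) / N

on_bin = amplitude_spectrum(400.0)         # matches the 400 Hz probe exactly
off_bin = amplitude_spectrum(404.0)        # falls between the 400 and 410 Hz probes

for label, spec in (("400.0 Hz", on_bin), ("404.0 Hz", off_bin)):
    shown = ", ".join(f"{10 * k} Hz: {spec[k]:.3f}" for k in range(38, 43))
    print(f"{label} -> {shown}")
```

The 400 Hz signal puts all of its amplitude in one bin, while the 404 Hz signal shows a reduced peak at 400 Hz and substantial amplitude in the neighbouring bins.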

The typical solution is to de-emphasize the signal components at the edges by tapering the amplitude of the signal, using either a triangular function (which multiplies the amplitude of the signal):
W(n,N) = 1 – Abs[(n – (N/2))/(N/2)] for 1 <= n <= N

Or some more complex function:
However you do it, these modifications will be interpreted as noise; the author of Musimathics called this “leakage, because energy that should be in one spectral harmonic spreads away (leaks) into adjacent harmonics” (p. 139).
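As a sketch, the triangular function above and a Hann window can be written as follows. The Hann formula used here is the standard periodic definition, 0.5 * (1 - cos(2πn/N)); it is an assumption that this is the "more complex function" the slide showed, and the function names are illustrative.

```python
import numpy as np

def triangular_window(N):
    """W(n, N) = 1 - |(n - N/2) / (N/2)| for 1 <= n <= N, as in the formula above."""
    n = np.arange(1, N + 1)
    return 1 - np.abs((n - N / 2) / (N / 2))

def hann_window(N):
    """Periodic Hann window (standard definition, not taken from the slides)."""
    n = np.arange(N)
    return 0.5 * (1 - np.cos(2 * np.pi * n / N))

N = 1000
tri = triangular_window(N)
hann = hann_window(N)

# Both windows taper to zero at the edges and average 0.5 over the window,
# so windowed amplitudes come out at about half their true size.
print(tri.mean(), hann.mean())

# Applying a window is an element-wise multiply before the DFT:
x = np.sin(2 * np.pi * 5 * np.arange(N) / N)
windowed = x * hann
```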

Let’s try two of these in our experiment on a non-integral frequency of 50.4 Hz, using the Rectangular (as before), the Triangular, and the Hann Windows:
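A rough version of this experiment is sketched below. The sample rate, the one-second window (1 Hz probe spacing), and the leakage measure (fraction of spectral energy more than two bins from the peak) are all assumptions made for illustration; the lecture's plots may have used different settings.

```python
import numpy as np

SR = 44100                                 # assumed sample rate
N = SR                                     # assumed one-second window: probes 1 Hz apart
n = np.arange(N)
x = np.sin(2 * np.pi * 50.4 * n / SR)      # non-integral test frequency

windows = {
    "rectangular": np.ones(N),
    "triangular": 1 - np.abs((n - N / 2) / (N / 2)),   # indexed from 0 here, not 1
    "hann": 0.5 * (1 - np.cos(2 * np.pi * n / N)),
}

peak, leak = {}, {}
for name, w in windows.items():
    amp = 2 * np.abs(np.fft.rfft(x * w)) / N
    k = int(np.argmax(amp))                # strongest probe frequency
    energy = amp ** 2
    near = energy[max(k - 2, 0): k + 3].sum()
    peak[name] = (k, float(amp[k]))
    leak[name] = float(1 - near / energy.sum())   # share of energy far from the peak
    print(f"{name:12s} peak at {k} Hz, amplitude {amp[k]:.3f}, far leakage {leak[name]:.5f}")
```

Under these assumptions the tapered windows keep far more of the energy close to 50 Hz than the rectangular window does, while reporting a smaller peak amplitude.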

Clearly both of these reduce the “leakage,” at the cost of reducing the amplitude; how does this play out in the operation of the DFT on a more complex signal? First let’s try integral components and the rectangular window. We’ll calculate the mean absolute error.

Next let’s try integral components and the triangular window.

Next let’s try integral components and the Hann window. Punchline: The Triangular and Hann windows reduce the overall sum of the amplitudes by a factor of 0.5, so the measured amplitudes are off by a factor of 0.5; after correcting for this, the Hann window does a perfect job on integral components.
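The correction can be sketched as follows. The test signal, the one-second window, and the choice to measure the mean absolute error only at the bins of the known components are assumptions; the lecture's own error measure is not shown on the slide.

```python
import numpy as np

SR = 44100                                 # assumed sample rate
N = SR                                     # assumed one-second window: bin k is k Hz
n = np.arange(N)

# Integral components: (frequency in Hz, amplitude, phase in radians)
components = [(880, 0.8, 1.0), (1760, 0.6, 2.3), (2640, 0.4, -1.2)]
x = sum(a * np.cos(2 * np.pi * f * n / SR + p) for f, a, p in components)

hann = 0.5 * (1 - np.cos(2 * np.pi * n / N))   # periodic Hann window, mean value 0.5

def mean_abs_error(window, correction):
    """Mean absolute amplitude error at the component bins (assumed metric)."""
    amp = correction * 2 * np.abs(np.fft.rfft(x * window)) / N
    return float(np.mean([abs(amp[f] - a) for f, a, _ in components]))

mae_rect = mean_abs_error(np.ones(N), 1.0)     # rectangular window, no correction
mae_hann_raw = mean_abs_error(hann, 1.0)       # Hann window, uncorrected
mae_hann = mean_abs_error(hann, 2.0)           # Hann window, corrected by 1 / 0.5

print(mae_rect, mae_hann_raw, mae_hann)
```

Without the correction every Hann-windowed amplitude is half its true value (a mean error of 0.3 for these components); dividing by the window's mean of 0.5 removes that error at the component bins.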

Conclusions on windowing for the DFT:
(1) Window size determines frequency resolution: given a window size of N samples, with a fundamental frequency of F = Sample Rate / N, we can only probe for the integral frequencies (the harmonics of F): F, 2*F, 3*F, …, k*F, …, ceiling(Sample Rate / 2) - 1. Any other frequencies will be subject to the “picket fence” problem and only approximated.
(2) Non-integral frequencies cause “leakage” to adjacent integral frequencies; good windowing functions (e.g., Hann) mitigate leakage effects and provide reasonably accurate measurements of the amplitude of components, after correction.

Professional tools such as Electroacoustics Toolbox allow you to set these features, as well as window length, whether windows overlap, whether and how to average the successive measurements, whether and how to weight the measures to the psychoacoustical properties of human hearing, how to display the result, etc., etc., etc. and to output the analysis to a file.
