Multiresolution STFT for Analysis and Processing of Audio


1 Multiresolution STFT for Analysis and Processing of Audio
Talk at B.U., Sept. 2010. Alexey Lukin, Moscow State University, Russia; iZotope Inc., Cambridge, MA

2 Short-Time Fourier Transform
The most commonly used transform for audio: spectral analysis; noise reduction (spectral subtraction algorithm); time-variable filters and other effects. Advantages: very fast implementation for a large number of bands via FFT; good energy compaction for many musical signals. Drawbacks: many oscillations in basis functions → ringing (Gibbs phenomenon); uniform frequency resolution → inadequate resolution at low frequencies. A. Lukin, J. Todd “Adaptive Time-Frequency Resolution”

3 Short-Time Fourier Transform
Spectrogram: displays the evolution of the spectrum over time.
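
A spectrogram of this kind can be sketched in a few lines of NumPy. This is a minimal illustration, not the presenter's implementation: the function name `stft_magnitude`, the Hann window, and the test-signal parameters are all assumptions chosen for the example.

```python
import numpy as np

def stft_magnitude(x, win_len, hop, n_fft=None):
    """Magnitude STFT: rows are frames (time), columns are frequency bins."""
    n_fft = n_fft or win_len
    window = np.hanning(win_len)              # assumed window shape
    n_frames = 1 + (len(x) - win_len) // hop  # only full frames are used
    frames = np.stack([x[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

# Illustrative input: 1 s of a 1 kHz sine sampled at 8 kHz
fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1000 * t)

spec = stft_magnitude(x, win_len=256, hop=128)
peak_bin = int(np.argmax(spec.mean(axis=0)))
peak_hz = peak_bin * fs / 256                 # bin index -> frequency in Hz
```

Plotting `spec.T` on a dB scale (time on the horizontal axis, frequency on the vertical axis) gives the conventional linear-frequency spectrogram.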

4 Spectrograms
Problems: most perceptually meaningful energy is concentrated in a narrow band below 4 kHz, so not enough detail is visible; time/frequency resolution trade-off. Figure: conventional STFT spectrogram (linear frequency scale).

5 Spectrograms
Problems: poor frequency resolution at low frequencies, so bass harmonics cannot be separated from the bass drum; time/frequency resolution trade-off. Figure: mel-scale STFT spectrogram (window size = 12 ms).

6 Spectrograms
Problems: poor time resolution at transients, which causes time-smearing of drums and other percussive sounds. Figure: mel-scale STFT spectrogram (window size = 93 ms).
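
The trade-off between the 12 ms and 93 ms windows can be made concrete with a little arithmetic. The sketch below assumes a 44.1 kHz sampling rate and power-of-two window lengths of 512 and 4096 samples; the slides give only the durations, so these values are a guess that happens to match them. The FFT bin spacing is used as a rough proxy for frequency resolution (the true resolution also depends on the window shape).

```python
def window_tradeoff(n_samples, fs):
    """Return (window duration in ms, FFT bin spacing in Hz) for one window size."""
    duration_ms = 1000.0 * n_samples / fs
    bin_spacing_hz = fs / n_samples
    return duration_ms, bin_spacing_hz

fs = 44100  # assumed sampling rate, not stated in the slides
for n in (512, 4096):
    dur, df = window_tradeoff(n, fs)
    print(f"{n:5d} samples: {dur:5.1f} ms window, {df:5.1f} Hz between bins")
```

Under these assumptions the short window spaces its bins about 86 Hz apart, which is too coarse to separate bass harmonics, while the long window spaces them about 11 Hz apart but averages over roughly 93 ms of signal.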

7 Filter banks
Idea: decompositions of the time-frequency plane. Diagram: x[n] → decomposition → processing of subband signals → synthesis → y[n]. Diagram: tilings of the time-frequency plane (axes f and t) for STFT and DWT, limited by the uncertainty principle.

8 Filter banks
Perceptual coding of audio. Diagram of an mp3 encoder, with blocks: x[n], filter bank, FFT, psychoacoustic model, Q (quantization), Huffman coding, mp3 file.

9 Filter banks
Window size switching (guided by transient detection). Figure panels: transient; pre-echo; reduced pre-echo.

10 Proposed approach
Transforms should vary their time-frequency resolution in a perceptually motivated way: imitation of the time-frequency resolution of human hearing; adaptation of resolution to local signal features.

11 Spectrograms
Simple solution: combine spectrograms with different resolutions. Take bass from a spectrogram with good frequency resolution and treble from a spectrogram with good time resolution. Figure: combined resolution spectrogram (window sizes from 12 to 93 ms).

12 Spectrograms
Simple solution: combine spectrograms with different resolutions. Each spectrogram is computed on the same grid of time-frequency points (using zero padding).
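
A minimal sketch of this combination, assuming just two window sizes and a hard crossover frequency. The slides describe window sizes ranging from 12 to 93 ms, so the two-window split, the `crossover_hz` parameter, the function names, and the normalization by the window sum are simplifications introduced here, not the presenter's method.

```python
import numpy as np

def centered_stft_mag(x, win_len, hop, n_fft):
    """Magnitude STFT with frames centered at multiples of `hop` and
    zero-padded to `n_fft`, so all window sizes share one time-frequency grid."""
    window = np.hanning(win_len)
    pad = win_len // 2
    xp = np.concatenate([np.zeros(pad), x, np.zeros(win_len - pad)])
    n_frames = 1 + len(x) // hop
    frames = np.stack([xp[i * hop : i * hop + win_len] * window
                       for i in range(n_frames)])
    # Divide by the window sum so levels are comparable across window sizes
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) / window.sum()

def combined_spectrogram(x, fs, short_win=512, long_win=4096, hop=256,
                         crossover_hz=1000.0):
    """Bass from the long-window spectrogram, treble from the short-window one."""
    n_fft = long_win  # the short window is zero-padded up to this size
    s_short = centered_stft_mag(x, short_win, hop, n_fft)
    s_long = centered_stft_mag(x, long_win, hop, n_fft)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return np.where(freqs[None, :] < crossover_hz, s_long, s_short)

# Illustrative input: white noise at an assumed 44.1 kHz sampling rate
rng = np.random.default_rng(0)
x = rng.standard_normal(8192)
combined = combined_spectrogram(x, fs=44100)
```

Because both spectrograms live on the same grid, the selection reduces to an element-wise choice; a smoother version could crossfade between several window sizes across frequency instead of switching at one point.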

13 Spectrograms
Better approach: select the best resolution for each time-frequency neighborhood. Criteria? Better frequency resolution at bass (reflects a priori psychoacoustic knowledge); maximal energy compaction (to minimize spectral smearing in both time and frequency, i.e. maximize sparsity). Figure: STFT window sizes 6 ms, 12 ms, 24 ms, 48 ms, 96 ms, with the best one selected.

14 Spectrograms
Calculation of sparsity (in a given block, for all T/F resolutions r). [The formula shown on the slide is not preserved in this transcript.] Here a_{i,r} are the STFT magnitudes in the block, S_r is the spectrum sparsity for the given resolution r, and r_0 is the resolution with the best sparsity. Figure: STFT window sizes 6 ms, 12 ms, 24 ms, 48 ms, 96 ms, with the best one selected.
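
The selection step can be sketched as follows. The sparsity formula from the slide is not available here, so this example substitutes a common stand-in, the ratio of the L2 norm to the L1 norm of the magnitudes (higher means sparser). That measure, and the names `sparsity` and `best_resolution`, are assumptions for illustration only and may differ from what the authors used.

```python
import numpy as np

def sparsity(a, eps=1e-12):
    """Assumed sparsity measure S_r: L2 norm / L1 norm of the magnitudes
    a_{i,r} in one block (NOT necessarily the measure used on the slide)."""
    a = np.abs(np.ravel(a))
    return np.sqrt(np.sum(a ** 2)) / (np.sum(a) + eps)

def best_resolution(blocks):
    """blocks maps a resolution label r to the STFT magnitudes a_{i,r} of the
    same time-frequency block. Returns (r0, {r: S_r}) with r0 = argmax S_r."""
    scores = {r: sparsity(a) for r, a in blocks.items()}
    r0 = max(scores, key=scores.get)
    return r0, scores

# Toy block: the same energy region seen at two resolutions
blocks = {
    "12 ms": np.array([[1.0, 1.0, 1.0, 1.0]]),  # energy smeared across bins
    "93 ms": np.array([[0.0, 4.0, 0.0, 0.0]]),  # energy concentrated in one bin
}
r0, scores = best_resolution(blocks)
```

For this comparison to be fair, every resolution should be evaluated on the same grid of time-frequency points, so that each block contains the same number of coefficients.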

15 Spectrograms
Benefits: sharper bass drum hits and other transients, even in the mid-frequency range; sharper guitar harmonics at high frequencies. Figure: adaptive resolution spectrogram (window sizes from 12 to 93 ms).

16 Spectrograms
Simple solution: combine spectrograms with different resolutions. Take bass from a spectrogram with good frequency resolution and treble from a spectrogram with good time resolution. Figure: combined resolution spectrogram (window sizes from 12 to 93 ms).

17 Spectrograms
More examples. Figures: tone onset waveform; conventional STFT spectrogram.

18 Spectrograms
More examples. Figures: adaptive resolution spectrogram; combined resolution spectrogram.

19 Processing framework
General framework for multi-resolution processing: perform processing with several different resolutions; adaptively combine (mix) the results in time-frequency space. Mixing is controlled by a priori knowledge of psychoacoustics and by analysis of local signal features (e.g. transience or sparsity).

20 Noise reduction
Spectral subtraction algorithm: (1) STFT of a noisy signal; (2) estimate the power spectrum of the noise (manually or automatically); (3) subtract the noise power spectrum from the signal power spectrum; (4) inverse STFT. Diagram: x[t] → STFT → X[f,t] → subtraction of the estimated noise spectrum W[f] → S[f,t] → inverse STFT → s[t].
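
The four steps can be sketched in NumPy as below, reusing the slide's symbols X[f,t], W[f], S[f,t]. Several details are assumptions made for this example rather than taken from the slides: the square-root Hann window with 50% overlap, the reuse of the noisy phase, the `floor` parameter, and the availability of a separate noise-only excerpt for estimating W[f].

```python
import numpy as np

def spectral_subtraction(x, noise, win_len=1024, floor=0.0):
    """Power spectral subtraction with overlap-add resynthesis.
    `noise` is a noise-only excerpt (at least win_len samples) used for W[f]."""
    hop = win_len // 2
    n = np.arange(win_len)
    # Periodic sqrt-Hann: analysis * synthesis windows sum to 1 at 50% overlap
    w = np.sqrt(0.5 - 0.5 * np.cos(2 * np.pi * n / win_len))

    def stft(sig):
        n_frames = 1 + (len(sig) - win_len) // hop
        frames = np.stack([sig[i * hop : i * hop + win_len] * w
                           for i in range(n_frames)])
        return np.fft.rfft(frames, axis=1)

    X = stft(x)                                      # step 1: X[f, t]
    W = np.mean(np.abs(stft(noise)) ** 2, axis=0)    # step 2: W[f]
    P = np.abs(X) ** 2
    # step 3: subtract noise power, clamp at a fraction of the input power
    gain = np.sqrt(np.maximum(P - W, floor * P) / np.maximum(P, 1e-20))
    S = gain * X                                     # S[f, t], noisy phase kept
    frames = np.fft.irfft(S, n=win_len, axis=1) * w  # step 4: inverse STFT
    s = np.zeros(len(x))
    for i, frame in enumerate(frames):
        s[i * hop : i * hop + win_len] += frame
    return s

# Illustrative use: a 440 Hz tone in white noise (assumed 16 kHz sampling rate)
rng = np.random.default_rng(0)
t = np.arange(16384) / 16000.0
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.1 * rng.standard_normal(t.size)
denoised = spectral_subtraction(noisy, 0.1 * rng.standard_normal(16384))
```

The first and last half-window of the output are attenuated because only one frame covers them. Plain subtraction like this also tends to leave "musical noise" artifacts, which practical denoisers reduce with smoothing or over-subtraction.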

21 Noise reduction
Example of adaptive resolution: better frequency resolution at low frequencies (according to the resolution of human hearing); better temporal resolution near signal transients (to reduce the Gibbs phenomenon). Diagram blocks: spectral subtraction with short through long windows (signals x1[t], x2[t], x3[t]); STFT; mixer of coefficients, controlled by transience analysis; synthesis; output y[t].
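
The transience-controlled mixing can be illustrated with a deliberately simplified sketch. The slide mixes STFT coefficients; the example below instead crossfades two already processed time-domain signals. The transience measure (relative rise in short-time energy), the one-frame spreading around each onset, and the names `transience_weight` and `mix` are all hypothetical choices for this example.

```python
import numpy as np

def transience_weight(x, frame=256):
    """Per-sample weight in [0, 1]: close to 1 near energy onsets, 0 elsewhere."""
    n_frames = len(x) // frame
    energy = np.array([np.sum(x[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n_frames)])
    prev = np.concatenate([energy[:1], energy[:-1]])
    rise = np.maximum(energy - prev, 0.0) / (energy + prev + 1e-12)
    # Spread each onset to neighboring frames so the region just before
    # the transient (where pre-ringing would appear) is also covered
    padded = np.concatenate([[0.0], rise, [0.0]])
    spread = np.maximum(np.maximum(padded[:-2], padded[1:-1]), padded[2:])
    return np.repeat(spread, frame)

def mix(short_result, long_result, weight):
    """Short-window result near transients, long-window result elsewhere."""
    n = min(len(short_result), len(long_result), len(weight))
    w = weight[:n]
    return w * short_result[:n] + (1.0 - w) * long_result[:n]

# Toy input: silence followed by a step onset at sample 512
x = np.zeros(1024)
x[512:] = 1.0
weight = transience_weight(x)
y = mix(np.ones(1024), np.zeros(1024), weight)  # y simply exposes the weight
```

A fuller version would compute the weight per time-frequency cell and bias it toward long windows at low frequencies, following the psychoacoustic prior named on the slide.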

22 Noise reduction
Results of single-resolution and multi-resolution algorithms. Figure: noisy recording (guitar + castanets).

23 Noise reduction
Results of single-resolution and multi-resolution algorithms. Figures: single resolution; multi-resolution (notice less pre-ringing on transients).

24 Conclusion
When using STFT, do care about the window size! Choose the size wisely: maximize sparsity (spectrogram sharpness); account for human perception.

25 Your questions?
Demo web page: http://www.izotope.com/tech/aes_adapt/

