
EEL 6586: AUTOMATIC SPEECH PROCESSING Windows Lecture Mark D. Skowronski Computational Neuro-Engineering Lab University of Florida February 10, 2003.



2 No, not MS Windows ® …

3 …not those either!

4 Speech windows
Speech is NONSTATIONARY.

5 Speech windows
Assume speech is stationary over a ‘short’ window of time.
[Figure: utterance ‘SEVEN’]

6 What is a ‘short’ window of time?
– 10 μs: smallest interaural time difference detectable by the auditory system (sound localization)
– 3 ms: shortest phoneme (plosive burst)
– 10 ms: typical glottal pulse period
– 100 ms: average phoneme duration
– 4 s: exhale period during speech
‘Short’ depends on the application.

7 Applications using windows
– Automatic speech recognition
– Speech coding/decoding
– Speaker identification
– Text-to-speech synthesis
– Noise reduction
Typical window (frame) length: 20–30 ms
Typical frame rate: 100 frames/s
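As a quick check of what those typical numbers mean in samples, the sketch below converts a frame length and a frame rate into a window length N and a hop size. The 16 kHz sample rate and the 25 ms frame (inside the quoted 20–30 ms range) are assumptions, and `frame_params` and `num_frames` are hypothetical helper names.

```python
def frame_params(fs_hz, frame_ms=25.0, frames_per_sec=100.0):
    """Return (frame_length_samples, hop_samples) for a given sample rate.

    frame_ms and frames_per_sec are assumed defaults chosen from the typical
    ranges quoted above (20-30 ms frames, 100 frames/s).
    """
    n_frame = int(round(fs_hz * frame_ms / 1000.0))  # window length N in samples
    hop = int(round(fs_hz / frames_per_sec))         # samples between frame starts
    return n_frame, hop


def num_frames(n_samples, n_frame, hop):
    """Number of complete frames that fit in an utterance of n_samples."""
    if n_samples < n_frame:
        return 0
    return 1 + (n_samples - n_frame) // hop


# Example: 1 s of speech sampled at 16 kHz (assumed sample rate)
N, R = frame_params(16000)
print(N, R, num_frames(16000, N, R))  # 400 160 98
```

At 100 frames/s the hop is 10 ms, so consecutive 25 ms frames overlap by 15 ms.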

8 Short-time analysis
s(n): entire speech utterance
w(n): window function
x(n): frame of speech, $x(n) = s(n)\,w(n)$
The window function is non-zero for N samples, n = 0,…,N−1.
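A minimal sketch of short-time analysis. A single frame is x(n) = s(n) w(n); to cover a whole utterance this sketch assumes frames are taken every `hop` samples, so frame m is x_m(n) = s(n + m·hop) w(n). The hop and the helper name `short_time_frames` are illustrative assumptions, not part of the slide.

```python
import numpy as np


def short_time_frames(s, w, hop):
    """Slice utterance s into windowed frames x_m(n) = s(n + m*hop) * w(n).

    w is non-zero for n = 0..N-1 (N = len(w)); frame m starts at sample m*hop.
    Frames that would run past the end of s are dropped.
    """
    s = np.asarray(s, dtype=float)
    w = np.asarray(w, dtype=float)
    N = len(w)
    if len(s) < N:
        return np.empty((0, N))
    M = 1 + (len(s) - N) // hop
    # Row m holds the window-weighted samples s[m*hop : m*hop + N]
    return np.stack([s[m * hop : m * hop + N] * w for m in range(M)])


s = np.arange(10.0)                       # toy "utterance"
X = short_time_frames(s, np.ones(4), hop=2)  # rectangular window, N = 4
print(X.shape)  # (4, 4)
print(X[1])     # [2. 3. 4. 5.]
```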

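9 Short-time Fourier transform
s(m): entire speech utterance
w(m): window function
X(n,ω): STFT of the speech at time n
$X(n,\omega) = \sum_{m=-\infty}^{\infty} s(m)\,w(n-m)\,e^{-j\omega m}$
The STFT is a smoothed version of the original spectrum: multiplying s by the window in time convolves its spectrum S(ω) with the window spectrum W(ω) in frequency.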
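The STFT sum can be evaluated directly for one analysis time n. This sketch assumes the common textbook form X(n,ω) = Σ_m s(m) w(n − m) e^{−jωm} with a window that is non-zero only for 0,…,N−1, so only N terms contribute; `stft_at` is a hypothetical name. At the DFT frequencies ω = 2πk/N its magnitude equals that of an N-point FFT of the time-reversed window applied to the segment ending at n, which gives a convenient cross-check against `np.fft.fft`.

```python
import numpy as np


def stft_at(s, w, n, omegas):
    """Evaluate X(n, w) = sum_m s(m) w(n - m) exp(-j*w*m) at the given frequencies.

    w(k) is non-zero only for k = 0..N-1, so only m = n-N+1..n contribute;
    samples of s outside its range are treated as zero.
    """
    s = np.asarray(s, dtype=float)
    w = np.asarray(w, dtype=float)
    N = len(w)
    m = np.arange(max(0, n - N + 1), min(len(s), n + 1))  # contributing sample indices
    weights = s[m] * w[n - m]                              # s(m) w(n - m)
    omegas = np.atleast_1d(np.asarray(omegas, dtype=float))
    # One complex exponential per (frequency, sample) pair, summed over m
    return np.exp(-1j * np.outer(omegas, m)) @ weights


# Cross-check against the FFT of the time-reversed windowed segment
rng = np.random.default_rng(1)
s = rng.standard_normal(64)
w = np.hamming(16)
n = 40
omegas = 2 * np.pi * np.arange(16) / 16
direct = np.abs(stft_at(s, w, n, omegas))
via_fft = np.abs(np.fft.fft(s[n - 15 : n + 1] * w[::-1]))
print(np.allclose(direct, via_fft))  # True
```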

10 STFT example
s(n): pure sinewave of infinite length, frequency ω0
w(n): rectangular window, $w(n) = 1$ for $0 \le n \le N-1$ and $w(n) = 0$ otherwise

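11 STFT example
[Figure: |W(ω)| * |S(ω)| = |X(ω)|, where * denotes convolution; convolving the window spectrum with the sinewave's spectral line at ω0 places a copy of |W(ω)| centered at ω0]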
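A numerical version of this example, assuming a 64-sample rectangular window and a sinewave placed halfway between DFT bins (both illustrative choices). Heavy zero-padding traces out |X(ω)|: its peak sits at ω0, to within a small perturbation from the cosine's mirror component at −ω0, and its first null lies 2π/N above ω0, exactly where |W(ω)| has its first null.

```python
import numpy as np

N = 64                          # rectangular window length (assumed)
k0 = 10.5                       # sinewave frequency in DFT bins, between bins (assumed)
w0 = 2 * np.pi * k0 / N         # omega_0 in rad/sample
x = np.cos(w0 * np.arange(N))   # s(n) w(n): the rectangular window keeps N samples

nfft = 4096                     # zero-padding samples |X(omega)| on a fine grid
X = np.abs(np.fft.rfft(x, nfft))
omega = 2 * np.pi * np.arange(len(X)) / nfft

peak = omega[np.argmax(X)]                # centre of the copy of |W(omega)|
k_null = int(round((k0 + 1) * nfft / N))  # grid index of omega_0 + 2*pi/N
print(f"peak at {peak / w0:.3f} * omega_0, "
      f"relative level at omega_0 + 2*pi/N: {X[k_null] / X.max():.1e}")
```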

12 Window types
– Rectangular
– Hann (raised cosine, tapers to zero at both ends)
– Hamming (raised cosine on a pedestal, ends at 0.08)
– Blackman
– Kaiser-Bessel
Choosing a window is a tradeoff between leakage and blurring.
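The five window types can be written out explicitly. This sketch uses the common symmetric definitions (cosine terms in 2πn/(N−1)); the classic Blackman coefficients 0.42/0.5/0.08 and the Kaiser-Bessel shape parameter `beta = 8.6` are assumptions, since the slide gives no formulas, and `make_window` is a hypothetical helper. The cosine-sum forms agree with NumPy's `np.hanning`, `np.hamming` and `np.blackman`, and the Kaiser form agrees with `np.kaiser`.

```python
import numpy as np


def make_window(kind, N, beta=8.6):
    """Symmetric N-point window (N >= 2); beta is an assumed Kaiser-Bessel shape."""
    n = np.arange(N)
    phase = 2 * np.pi * n / (N - 1)
    if kind == "rectangular":
        return np.ones(N)
    if kind == "hann":       # raised cosine, zero at both ends
        return 0.5 - 0.5 * np.cos(phase)
    if kind == "hamming":    # raised cosine on a pedestal: ends at 0.08
        return 0.54 - 0.46 * np.cos(phase)
    if kind == "blackman":   # classic three-term Blackman coefficients
        return 0.42 - 0.5 * np.cos(phase) + 0.08 * np.cos(2 * phase)
    if kind == "kaiser":     # Kaiser-Bessel: zeroth-order modified Bessel function I0
        r = 2 * n / (N - 1) - 1
        return np.i0(beta * np.sqrt(np.clip(1 - r ** 2, 0, None))) / np.i0(beta)
    raise ValueError(f"unknown window: {kind}")


print(make_window("hamming", 5))  # endpoints 0.08, centre 1.0
```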

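13 Window tradeoff
– Blurring: main-lobe width (A). A wider main lobe smears closely spaced frequency components together.
– Leakage: sidelobe suppression (B). Poorly suppressed sidelobes let energy from strong components spill into distant frequencies.
[Figure: window magnitude spectrum with main-lobe width A and sidelobe level B marked]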

14 Popular windows
Window          Main-lobe width (relative to rectangle)   Highest sidelobe
Rectangle       1                                          -13 dB
Hann            2                                          -31 dB
Hamming         2                                          -43 dB
Blackman        3                                          -68 dB
Kaiser-Bessel   4                                          -91 dB
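Both columns of the table can be measured from a zero-padded FFT of each window. A sketch, assuming N = 64 symmetric NumPy windows and a hypothetical helper `lobe_stats`: the first local minimum of |W(ω)| gives the main-lobe half-width in bins of 2π/N (1 bin for the rectangle), and the largest value beyond it gives the highest sidelobe. Symmetric windows span N−1 sample intervals, so their measured half-widths come out slightly above 2 and 3. With NumPy's classic Blackman coefficients the sidelobe measures near -58 dB rather than -68 dB; -68 dB is the value usually quoted for the "exact Blackman" coefficients, so the table may refer to that variant. Kaiser-Bessel is left out because its sidelobe level depends on its shape parameter.

```python
import numpy as np


def lobe_stats(w, nfft=1 << 16):
    """Return (main-lobe half-width in bins of 2*pi/N, highest sidelobe in dB)."""
    N = len(w)
    mag = np.abs(np.fft.rfft(w, nfft))
    mag /= mag[0]                  # main-lobe peak (at DC) normalized to 0 dB
    # Walk outward from omega = 0 until the magnitude stops falling: first null
    k = 1
    while mag[k + 1] < mag[k]:
        k += 1
    half_width_bins = k * N / nfft  # FFT grid index -> bins of 2*pi/N
    sidelobe_db = 20 * np.log10(mag[k:].max())
    return half_width_bins, sidelobe_db


N = 64
windows = {
    "rectangle": np.ones(N),
    "hann": np.hanning(N),
    "hamming": np.hamming(N),
    "blackman": np.blackman(N),
}
for name, w in windows.items():
    hw, sl = lobe_stats(w)
    print(f"{name:10s} half-width {hw:4.2f} bins, highest sidelobe {sl:6.1f} dB")
```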

15 Practical issues
Rule of thumb:
– Time domain: use a rectangular window
– Frequency domain: use a Hamming window
Why?

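16 Time domain issues
Tapered windows interfere with time-domain correlation: they attenuate samples near the frame edges, so the product of the frame with its lagged copy loses weight as the lag grows.
Example: 20 ms of /eh/, male speaker, pitch measured with the normalized autocorrelation. The first side peak (at the pitch period) is lower with a Hamming window than with a rectangular window.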
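The effect can be reproduced on a synthetic frame. This sketch assumes fs = 8 kHz, a 100 Hz pitch (80-sample period), a 20 ms (160-sample) frame built by repeating one period of noise, and the normalization r(τ) = R(τ)/R(0); none of these come from the slide's /eh/ data, and `normalized_autocorr` is a hypothetical helper. For an exactly periodic frame the rectangular window gives r(80) = 0.5, because only the first half of the frame overlaps its lagged copy. Each Hamming product w(n)w(n+80) is at most the average of w(n)² and w(n+80)², so the Hamming value can only be lower.

```python
import numpy as np


def normalized_autocorr(x, max_lag):
    """r(tau) = R(tau) / R(0), with R(tau) = sum_n x(n) x(n + tau), tau = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    R = np.array([np.dot(x[: len(x) - tau], x[tau:]) for tau in range(max_lag + 1)])
    return R / R[0]


fs = 8000                    # sample rate (assumed)
f0 = 100                     # pitch of the synthetic "male" voice (assumed)
period = fs // f0            # 80-sample pitch period
N = int(round(0.020 * fs))   # 20 ms frame = 160 samples
rng = np.random.default_rng(0)
one_period = rng.standard_normal(period)
s = np.tile(one_period, N // period)  # exactly periodic stand-in for the /eh/ frame

r_rect = normalized_autocorr(s, max_lag=120)                  # rectangular: frame as-is
r_hamm = normalized_autocorr(s * np.hamming(N), max_lag=120)  # Hamming-tapered frame
print(f"r({period}) rectangular: {r_rect[period]:.2f}")  # 0.50
print(f"r({period}) Hamming:     {r_hamm[period]:.2f}")  # lower than 0.50
```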

17 Frequency domain issues
fs = 12.5 kHz, /eh/, 800 samples (64 ms), male speaker.
[Figure: blurring/leakage tradeoff evidence]
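A sketch of the same tradeoff using the slide's fs = 12.5 kHz and 800-sample frame, but with a synthetic 1 kHz tone standing in for the /eh/ recording (an assumption). The 3 dB main-lobe width measures blurring, and the largest spectral level 400–600 Hz above the tone measures leakage; the band, the zero-padding length and the helper names are arbitrary illustrative choices.

```python
import numpy as np

fs = 12500.0      # sampling rate from the slide
N = 800           # frame length from the slide (64 ms)
f_tone = 1000.0   # synthetic test tone (assumed; the slide used real /eh/ data)
x = np.cos(2 * np.pi * f_tone * np.arange(N) / fs)

nfft = 1 << 15    # zero-padding to sample each spectrum finely
freqs = np.fft.rfftfreq(nfft, d=1 / fs)


def spectrum_db(w):
    """Magnitude spectrum of the windowed tone in dB, peak normalized to 0 dB."""
    mag = np.abs(np.fft.rfft(x * w, nfft))
    return 20 * np.log10(mag / mag.max() + 1e-12)


def blur_and_leak(w):
    db = spectrum_db(w)
    width_3db = np.count_nonzero(db > -3.0) * fs / nfft   # main-lobe 3 dB width, Hz
    far = (freqs > f_tone + 400) & (freqs < f_tone + 600)  # band well outside main lobe
    return width_3db, db[far].max()


results = {name: blur_and_leak(w)
           for name, w in [("rectangular", np.ones(N)), ("Hamming", np.hamming(N))]}
for name, (width, leak) in results.items():
    print(f"{name:11s} 3 dB width {width:5.1f} Hz, leakage 400-600 Hz away {leak:6.1f} dB")
```

Expect the Hamming window to widen the main lobe (more blurring) while pushing the distant leakage well below the rectangular window's level, which is consistent with the rule of thumb on slide 15.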

