EEL 6586: AUTOMATIC SPEECH PROCESSING
Windows Lecture
Mark D. Skowronski
Computational Neuro-Engineering Lab, University of Florida
February 10, 2003
No, not MS Windows ® …
…not those either!
Speech windows
Speech is NONSTATIONARY.
Assume speech is stationary over a ‘short’ window of time.
[Figure: the utterance ‘SEVEN’]
What is a ‘short’ window of time?
– 10 μs: smallest interaural time difference detectable by the auditory system (sound localization)
– 3 ms: shortest phoneme (plosive burst)
– 10 ms: glottal pulse period
– 100 ms: average phoneme duration
– 4 s: exhale period during speech
What counts as ‘short’ depends on the application.
Applications using windows
– Automatic speech recognition
– Speech coding/decoding
– Speaker identification
– Text-to-speech synthesis
– Noise reduction
Typical window (frame) length: ms
Typical frame rate: 100 frames/sec
Short-time analysis
s(n): entire speech utterance
w(n): window function
x(n): frame of speech, x(n) = s(n) w(n)
The window function is non-zero only for N samples, n = 0, …, N−1, so each frame holds N samples of speech.
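A minimal sketch of short-time analysis, assuming frames are taken every R samples as x_r(n) = s(rR + n) w(n), n = 0, …, N−1. The 8 kHz sampling rate and 20 ms frame length are illustrative assumptions. The hop of fs/100 samples follows from the typical rate of 100 frames/sec. `make_frames` is a hypothetical helper name.

```python
import math

def hamming(N):
    """Symmetric Hamming window, w(n) = 0.54 - 0.46 cos(2 pi n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def make_frames(s, w, hop):
    """Hypothetical helper: frames x_r(n) = s(r*hop + n) * w(n), n = 0..N-1.

    Only frames that fit entirely inside s are kept.
    """
    N = len(w)
    frames = []
    start = 0
    while start + N <= len(s):
        frames.append([s[start + n] * w[n] for n in range(N)])
        start += hop
    return frames

fs = 8000                  # assumed sampling rate (Hz)
N = round(0.020 * fs)      # assumed 20 ms frame -> 160 samples
hop = fs // 100            # 100 frames/sec -> hop of 80 samples

s = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]  # 1 s test tone
x = make_frames(s, hamming(N), hop)
print(len(x), len(x[0]))   # prints: 99 160
```

Only frames that lie completely inside the utterance are kept here. Zero-padding the tail so the last samples also get a frame is a common alternative.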
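Short-time Fourier transform (STFT)
s(m): entire speech utterance
w(m): window function
X(n,ω): STFT of speech at time n
X(n,ω) = Σ_m s(m) w(n − m) e^(−jωm)
For a fixed n, X(n,ω) is the Fourier transform of the windowed segment, which equals the spectrum of the speech convolved with the spectrum of the window. The STFT is therefore a smoothed version of the original spectrum.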
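The STFT can be evaluated directly from its definition at a single time and frequency. This sketch assumes the convention X(n,ω) = Σ_m s(m) w(n − m) e^(−jωm), with w non-zero only for 0 ≤ m < N. Under that convention the window covers samples n − N + 1 through n. `stft_point` is a hypothetical name, and the Hann window (N = 32) and test signal are illustrative choices.

```python
import cmath
import math

def stft_point(s, w, n, omega):
    """X(n, omega) = sum_m s(m) w(n - m) exp(-j omega m).

    w is non-zero only for 0 <= k < N, so only m = n-N+1 .. n contribute;
    samples outside the utterance are treated as zero.
    """
    N = len(w)
    total = 0j
    for m in range(n - N + 1, n + 1):
        if 0 <= m < len(s):
            total += s[m] * w[n - m] * cmath.exp(-1j * omega * m)
    return total

# For a complex exponential at omega0, every term at omega = omega0 reduces
# to w(n - m), so X(n, omega0) is just the sum of the window samples.
omega0 = 0.3
s = [cmath.exp(1j * omega0 * m) for m in range(200)]
w = [0.5 - 0.5 * math.cos(2 * math.pi * k / 31) for k in range(32)]  # Hann, N = 32
X = stft_point(s, w, 100, omega0)
print(abs(X), sum(w))  # both about 15.5
```

Sweeping ω at a fixed n gives one spectral slice of the utterance. Sweeping n at a fixed ω tracks a single frequency over time.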
STFT example
s(n): pure sinewave of infinite length, e.g., s(n) = cos(ω0 n)
w(n): rectangular window: w(n) = 1 for n = 0, …, N−1, and w(n) = 0 otherwise
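STFT example
[Figure: |W(ω)| * |S(ω)| = |X(ω)|, where * denotes convolution; the window spectrum is convolved with the line spectrum of the sinewave at ±ω0]
For s(n) = cos(ω0 n), S(ω) is a pair of impulses at ±ω0, so X(ω) = ½[W(ω − ω0) + W(ω + ω0)]: a copy of the window spectrum centered at each of ±ω0.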
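The convolution on this slide can be checked numerically. This sketch assumes s(n) = cos(ω0 n) and the rectangular window w(n) = 1 for n = 0, …, N−1. It compares the DTFT of the windowed sinewave, computed by direct summation, against ½[W(ω − ω0) + W(ω + ω0)] built from the closed form W(ω) = e^(−jω(N−1)/2) sin(Nω/2)/sin(ω/2). N = 32 and ω0 = 0.7 are arbitrary, and the function names are hypothetical.

```python
import cmath
import math

def dtft(x, omega):
    """Direct DTFT X(omega) = sum_n x(n) exp(-j omega n) of a finite sequence."""
    return sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(x))

def rect_spectrum(N, omega):
    """Closed-form DTFT of w(n) = 1, n = 0..N-1 (a phase-shifted Dirichlet kernel)."""
    half = math.sin(omega / 2)
    if abs(half) < 1e-12:          # limit at omega = 0 (mod 2 pi) is N
        return complex(N)
    return cmath.exp(-1j * omega * (N - 1) / 2) * math.sin(N * omega / 2) / half

N, omega0 = 32, 0.7
x = [math.cos(omega0 * n) for n in range(N)]   # rectangular window: x(n) = s(n)

for omega in (0.0, 0.5, omega0, 1.2, 2.5):
    direct = dtft(x, omega)
    shifted = 0.5 * (rect_spectrum(N, omega - omega0) + rect_spectrum(N, omega + omega0))
    print(f"omega={omega:.2f}  |X|={abs(direct):8.4f}  error={abs(direct - shifted):.1e}")
```

The error column stays at floating-point rounding level. Near ω0 the magnitude is about N/2, the peak of the shifted copy of |W(ω)|.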
Window types
– Rectangular
– Hann (cosine)
– Hamming (raised cosine)
– Blackman
– Kaiser-Bessel
Every window trades off leakage against blurring.
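The window types listed above can be written down directly. This sketch uses the common symmetric definitions over n = 0, …, N−1: Hann 0.5 − 0.5cos, Hamming 0.54 − 0.46cos, and Blackman 0.42 − 0.5cos + 0.08cos. These are the classic Blackman coefficients; variants such as the "exact" Blackman use slightly different values. The Kaiser-Bessel shape parameter β is an assumption, since the slide gives none. The Bessel function I0 is computed from its power series.

```python
import math

def rectangular(N):
    return [1.0] * N

def hann(N):
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def hamming(N):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def blackman(N):
    # Classic Blackman coefficients (0.42, 0.5, 0.08).
    return [0.42 - 0.5 * math.cos(2 * math.pi * n / (N - 1))
            + 0.08 * math.cos(4 * math.pi * n / (N - 1)) for n in range(N)]

def bessel_i0(x, terms=40):
    """Modified Bessel function I0 via its power series sum of ((x/2)^k / k!)^2."""
    total = term = 1.0
    for k in range(1, terms):
        term *= (x / (2 * k)) ** 2
        total += term
    return total

def kaiser(N, beta):
    """Kaiser-Bessel window; beta (an assumed parameter) controls the taper."""
    out = []
    for n in range(N):
        r = 2 * n / (N - 1) - 1            # maps n = 0..N-1 onto [-1, 1]
        out.append(bessel_i0(beta * math.sqrt(max(0.0, 1 - r * r))) / bessel_i0(beta))
    return out

for name, w in [("rectangular", rectangular(5)), ("hann", hann(5)),
                ("hamming", hamming(5)), ("blackman", blackman(5)),
                ("kaiser", kaiser(5, 8.6))]:
    # "+ 0.0" turns a rounded -0.0 (from floating-point cancellation) into 0.0
    print(f"{name:12s}", [round(v, 3) + 0.0 for v in w])
```

With β = 0 the Kaiser-Bessel window reduces to the rectangular window. Larger β tapers it more strongly, trading a wider main lobe for lower side lobes.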
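Window tradeoff
Blurring: set by the main lobe width (A). A wider main lobe smears nearby frequency components together.
Leakage: set by the side lobe suppression (B). Weaker suppression lets energy spill far from the true frequency.
[Figure: window spectrum with main lobe width A and side lobe level B marked]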
Popular windows
Window          Main lobe half-width (×2π/N)   Peak side lobe
Rectangle       1                              -13 dB
Hann            2                              -31 dB
Hamming         2                              -43 dB
Blackman        3                              -68 dB
Kaiser-Bessel   4                              -91 dB
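The first three rows of the table can be checked numerically. This sketch evaluates |W(ω)| by direct summation on a dense grid over 0 ≤ ω ≤ π. It walks down the main lobe to the first null, reported in units of 2π/N, and takes the largest magnitude beyond that null as the peak side lobe. N = 64 and the grid density are assumptions. Results should land close to the table, with small differences due to finite N and the symmetric window definitions used here.

```python
import cmath
import math

def make_window(name, N):
    """Symmetric windows over n = 0..N-1."""
    if name == "rectangular":
        return [1.0] * N
    if name == "hann":
        return [0.5 - 0.5 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
    if name == "hamming":
        return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
    raise ValueError(f"unknown window: {name}")

def magnitude_response(w, M):
    """|W(omega)| at omega = pi * k / M for k = 0..M (direct DTFT sum)."""
    mags = []
    for k in range(M + 1):
        omega = math.pi * k / M
        mags.append(abs(sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(w))))
    return mags

def lobe_stats(w, M=2048):
    """Return (main-lobe half-width in units of 2*pi/N, peak side lobe in dB)."""
    mags = magnitude_response(w, M)
    k = 0
    while k < M and mags[k + 1] <= mags[k]:   # descend the main lobe to its first null
        k += 1
    half_width = (math.pi * k / M) / (2 * math.pi / len(w))
    side_db = 20 * math.log10(max(mags[k:]) / mags[0])
    return half_width, side_db

N = 64
results = {name: lobe_stats(make_window(name, N))
           for name in ("rectangular", "hann", "hamming")}
for name, (width, side) in results.items():
    print(f"{name:12s} half-width {width:.2f}  peak side lobe {side:.1f} dB")
```

The Hann and Hamming half-widths come out slightly above 2. The symmetric definitions place the first null at 4π/(N−1) rather than 4π/N.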
Practical issues
Rule of thumb:
– Time domain: use a rectangular window
– Frequency domain: use a Hamming window
Why?
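Time domain issues
Tapered windows interfere with correlation in the time domain. At lag k, each product x(n)x(n+k) carries the weight w(n)w(n+k), and for lags comparable to the frame length the overlapping samples sit near the tapered frame edges.
Example: 20 ms of /eh/ from a male speaker, pitch measured by normalized autocorrelation. The first side peak is lower with the Hamming window than with the rectangular window.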
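The pitch-measurement effect can be reproduced on a synthetic signal. This sketch assumes a 12.5 kHz sampling rate, a 20 ms frame, and an exactly periodic 100 Hz "voiced" signal built from five harmonics. The signal is a stand-in for the /eh/ recording, not real data. The code evaluates the normalized autocorrelation r(k)/r(0) at the pitch lag with and without a Hamming window.

```python
import math

def hamming(N):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def normalized_autocorr(x, lag):
    """r(lag) / r(0) with r(k) = sum over n of x(n) x(n + k) within the frame."""
    r0 = sum(v * v for v in x)
    rk = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
    return rk / r0

fs = 12500                  # assumed sampling rate (Hz)
N = round(0.020 * fs)       # 20 ms frame -> 250 samples
P = round(fs / 100)         # assumed 100 Hz pitch -> period of 125 samples

# Synthetic, exactly periodic "voiced" frame: five harmonics of the pitch (not real speech).
s = [sum(math.cos(2 * math.pi * h * n / P) / h for h in range(1, 6)) for n in range(N)]

w = hamming(N)
r_rect = normalized_autocorr(s, P)
r_ham = normalized_autocorr([v * wn for v, wn in zip(s, w)], P)
print(f"r(P)/r(0): rectangular {r_rect:.3f}, Hamming {r_ham:.3f}")
```

With the rectangular window the value is exactly 0.5 here. At a lag of one period only half of the frame overlaps itself. The Hamming weights w(n)w(n+P) pull the value down further, consistent with the lower side peak described above.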
Frequency domain issues
Example: fs = 12.5 kHz, /eh/, 800 samples (64 ms), male speaker.
Evidence of the blurring/leakage tradeoff:
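The leakage half of the tradeoff can be illustrated with the frame settings on this slide (fs = 12.5 kHz, 800 samples). This sketch replaces the /eh/ recording with a synthetic tone at an assumed 1007.8 Hz, deliberately placed between DFT bins. For the rectangular and Hamming windows, it reports the strongest spectral component in an assumed far band (2.5 to 3.5 kHz), relative to the tone's peak.

```python
import cmath
import math

fs = 12500.0        # sampling rate from the slide (Hz)
N = 800             # frame length from the slide (samples, 64 ms)
f0 = 1007.8         # assumed tone frequency, between DFT bins (spacing fs/N = 15.625 Hz)

def dtft_mag(x, f):
    """|X| at analog frequency f (Hz), by direct summation."""
    omega = 2 * math.pi * f / fs
    return abs(sum(v * cmath.exp(-1j * omega * n) for n, v in enumerate(x)))

def leakage_db(window, f_lo=2500.0, f_hi=3500.0, step=2.0):
    """Strongest level in [f_lo, f_hi] relative to the level at f0, in dB."""
    x = [math.cos(2 * math.pi * f0 * n / fs) * w for n, w in enumerate(window)]
    peak = dtft_mag(x, f0)
    count = int(round((f_hi - f_lo) / step)) + 1
    worst = max(dtft_mag(x, f_lo + k * step) for k in range(count))
    return 20 * math.log10(worst / peak)

rect = [1.0] * N
hamming = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

rect_db = leakage_db(rect)
ham_db = leakage_db(hamming)
print(f"far-band leakage: rectangular {rect_db:.1f} dB, Hamming {ham_db:.1f} dB")
```

The Hamming window keeps far less of the tone's energy in the distant band. Its wider main lobe, the blurring side of the tradeoff, is not measured here.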