Ch4 Short-time Fourier Analysis of Speech Signal


1 Ch4 Short-time Fourier Analysis of Speech Signal
Fourier analysis is spectral analysis, and it is an important method for analyzing speech signals. Short-time Fourier analysis applies a stationary analysis method to a non-stationary signal (speech) by treating the signal as stationary within each short segment. It is also called the time-dependent Fourier transform.

2 4.1 Short-time Fourier Transformation (1)
4.1.1 Definition of Short-time Fourier Transformation
$$X_n(e^{j\omega}) = \sum_{m=-\infty}^{\infty} x(m)\, w(n-m)\, e^{-j\omega m}$$
where $n$ is discrete and $\omega$ is continuous. It is called the short-time Fourier transform or time-frequency function.
Two interpretations:
- For fixed $n = n_0$, it is a spectrum: the Fourier transform of the windowed segment $x(m)\,w(n_0-m)$.
- For fixed $\omega = \omega_0$, it is the output of a band-pass filter, built from the window $w(n)$, whose center frequency is $\omega_0$.
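The definition can be evaluated directly for a single pair $(n, \omega)$. The sketch below is illustrative only: the function name `stft_at` and the convention that `x` and `w` are zero outside their stored samples are assumptions, not part of the slides.

```python
import numpy as np

def stft_at(x, w, n, omega):
    """Evaluate X_n(e^{j omega}) = sum_m x(m) w(n-m) e^{-j omega m}.

    Assumes x is zero outside 0..len(x)-1 and w is zero outside
    0..len(w)-1, so only m in [n-L+1, n] contributes (L = len(w)).
    """
    L = len(w)
    total = 0j
    for m in range(max(0, n - L + 1), min(len(x), n + 1)):
        total += x[m] * w[n - m] * np.exp(-1j * omega * m)
    return total

# Example signal and window (arbitrary choices for illustration).
rng = np.random.default_rng(0)
x = rng.standard_normal(200)
w = np.hamming(16)
omega0 = 0.3

# Fixed-omega view: demodulate x by e^{-j omega0 n}, then filter with w(n).
demodulated = x * np.exp(-1j * omega0 * np.arange(len(x)))
filtered = np.convolve(demodulated, w)
```

For any `n` inside the signal, `stft_at(x, w, n, omega0)` agrees with `filtered[n]`, which is the filtering interpretation of the same sum.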

3 4.2 Spectrograms Based on Short-time Fourier Transformation (1)
4.2.1 Frequency energy density function $P_n(\omega)$
$$P_n(\omega) = |X_n(e^{j\omega})|^2 = \sum_{k} R_n(k)\, e^{j\omega k}$$
$$R_n(k) = \sum_{m=-\infty}^{\infty} x(m)\, w(n-m)\, x(m+k)\, w(n-m-k)$$
Note: if the window length is $L$, $R_n(k)$ is nonzero only for $|k| \le L-1$, so it has length $2L-1$ (roughly $2L$).
If we draw a picture from $P_n(\omega)$ with time on the x axis, frequency on the y axis, and the gray level of each pixel given by $P_n(\omega)$, the picture is called a spectrogram (or sonogram).
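The identity between the squared magnitude and the transform of the short-time autocorrelation can be checked numerically on one windowed frame. This is a minimal sketch for a real-valued frame; the function names are hypothetical, and `frame` stands for the already windowed segment $x(m)\,w(n-m)$.

```python
import numpy as np

def power_from_spectrum(frame, omegas):
    """P_n(omega) as the squared magnitude of the frame's Fourier transform."""
    m = np.arange(len(frame))
    X = np.array([np.sum(frame * np.exp(-1j * om * m)) for om in omegas])
    return np.abs(X) ** 2

def power_from_autocorr(frame, omegas):
    """P_n(omega) as the Fourier transform of the autocorrelation R_n(k)."""
    L = len(frame)
    R = np.correlate(frame, frame, mode="full")   # lags -(L-1) .. L-1
    k = np.arange(-(L - 1), L)
    return np.array([np.sum(R * np.exp(1j * om * k)).real for om in omegas])

# Arbitrary example frame: random samples under a Hamming window.
rng = np.random.default_rng(1)
frame = rng.standard_normal(32) * np.hamming(32)
omegas = np.linspace(0.0, np.pi, 9)
p_direct = power_from_spectrum(frame, omegas)
p_autocorr = power_from_autocorr(frame, omegas)
```

Stacking `p_direct` for successive frame positions as columns would give the gray levels of a spectrogram.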

4 Spectrograms Based on Short-time Fourier Transformation (2)
4.2.2 Frequency resolution
In the first interpretation, $n$ is fixed and $X_n(e^{j\omega})$ is a spectrum. Multiplying $x(m)$ by the window in time corresponds to convolving $X(e^{j\omega})$ with $W(e^{j\omega})$ in frequency, so the bandwidth $b$ of $W(e^{j\omega})$ limits the frequency resolution. High frequency resolution requires a small $b$ and therefore a large window length $N$, since $b \sim 1/N$.
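The relation $b \sim 1/N$ can be illustrated by measuring the main-lobe width of a window at two lengths. The sketch below assumes a Hamming window and uses the -3 dB width as the bandwidth measure; both choices, and the function name, are assumptions made for illustration.

```python
import numpy as np

def half_power_bandwidth(window, nfft=1 << 16):
    """Two-sided -3 dB width (rad/sample) of the window's main lobe."""
    W = np.abs(np.fft.rfft(window, nfft))
    W = W / W[0]                                  # normalize DC gain to 1
    first_below = np.argmax(W < 1 / np.sqrt(2))   # first bin under -3 dB
    return 2 * first_below * 2 * np.pi / nfft

b_short = half_power_bandwidth(np.hamming(64))
b_long = half_power_bandwidth(np.hamming(128))
ratio = b_short / b_long   # close to 2 when the length doubles
```

Doubling the window length roughly halves the bandwidth, which is the sense in which a longer window gives finer frequency resolution.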

5 Spectrograms Based on Short-time Fourier Transformation (3)
4.2.3 Time resolution
In the second interpretation, $\omega = \omega_k$ is fixed. The window $w(n)$ acts as a low-pass filter applied to $x(n)\,e^{-j\omega_k n}$, so the bandwidth of the output is the bandwidth $b$ of $w(n)$. By the sampling theorem, the output can be sampled at rate $2b$, which gives a time resolution of $1/(2b)$. High time resolution requires a large $b$ and therefore a small $N$. The two resolution requirements conflict.

6 Spectrograms Based on Short-time Fourier Transformation (4)
4.2.4 Wide-band and narrow-band sonograms
For practical purposes we sometimes need both.
As an example, a wide-band analysis uses a window length of 6.4 ms and a narrow-band analysis uses 51.2 ms. A window of length 1 s has a bandwidth of about 2 Hz, so the frequency resolution is about 39 Hz in the narrow-band case and about 313 Hz in the wide-band case.
The wide-band sonogram is used for seeing formants. The narrow-band sonogram is used for seeing changes of pitch and the harmonic structure.
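The numbers above follow from scaling the rule of thumb (a 1 s window has about 2 Hz bandwidth) inversely with window length, combined with the $1/(2b)$ time resolution from 4.2.3. A small sketch, with hypothetical function names and the 2 Hz constant taken from the slide:

```python
def bandwidth_hz(window_s, hz_for_one_second=2.0):
    """Bandwidth b in Hz, assuming b scales as 1 / window length."""
    return hz_for_one_second / window_s

def time_resolution_s(window_s):
    """Time resolution 1 / (2b) in seconds."""
    return 1.0 / (2.0 * bandwidth_hz(window_s))

wide_b = bandwidth_hz(0.0064)          # 312.5 Hz, quoted as 313 Hz
narrow_b = bandwidth_hz(0.0512)        # 39.0625 Hz, quoted as 39 Hz
wide_t = time_resolution_s(0.0064)     # 1.6 ms
narrow_t = time_resolution_s(0.0512)   # 12.8 ms
```

Under this rule the time resolution is one quarter of the window length, so the wide-band setting resolves events eight times more finely in time than the narrow-band setting.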

7 4.4 Perceptually Motivated Representations (1)
4.4.1 The Bark and Mel Scales
Fletcher's work pointed to the existence of critical bands in the cochlear response. Critical bands are of great importance in understanding many auditory phenomena such as the perception of loudness, pitch and timbre. The auditory system analyzes sounds into their component frequencies. One critical-band scale is the Bark frequency scale. It is hoped that by treating spectral energy over the Bark scale, a more natural fit with spectral information processing in the ear can be achieved. The Bark scale ranges from 1 to 24 Barks, corresponding to the 24 critical bands of hearing:

8 Perceptually Motivated Representations (2)
Bark Band #   Edge (Hz)   Center (Hz)
 1              100          50
 2              200         150
 3              300         250
 4              400         350
 5              510         450
 6              630         570
 7              770         700
 8              920         840
 9             1080        1000
10             1270        1170
11             1480        1370
12             1720        1600

9 Perceptually Motivated Representations (3)
Bark Band #   Edge (Hz)   Center (Hz)
13             2000        1850
14             2320        2150
15             2700        2500
16             3150        2900
17             3700        3400
18             4400        4000
19             5300        4800
20             6400        5800
21             7700        7000
22             9500        8500
23            12000       10500
24            15500       13500
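The table can be used as a lookup from frequency to Bark band. The sketch below assumes that the Edge column is the upper edge of each band (consistent with band 1 having center 50 Hz and edge 100 Hz) and that band 1 starts at 0 Hz; the function name is hypothetical.

```python
import numpy as np

# Upper band edges in Hz, copied from the Edge column of the table.
BARK_EDGES_HZ = np.array([
    100, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720,
    2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500,
    12000, 15500,
])

def bark_band(freq_hz):
    """Return the Bark band number (1..24) that contains freq_hz."""
    if freq_hz < 0 or freq_hz > BARK_EDGES_HZ[-1]:
        raise ValueError("frequency outside the 24 tabulated bands")
    # Index of the first edge that is >= freq_hz, converted to 1-based.
    return int(np.searchsorted(BARK_EDGES_HZ, freq_hz, side="left")) + 1
```

For example, `bark_band(1000)` falls between the edges 920 Hz and 1080 Hz and returns band 9, whose tabulated center is 1000 Hz.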

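10 Perceptually Motivated Representations (4)
4.4.2 Mel scale frequency cepstrum
The mel scale is another perceptual scale, defined so that 1000 Hz corresponds to approximately 1000 mels:
$$\mathrm{Mel}(f) = 1125 \ln(1 + f/700)$$
How to get the MFCC: take the DFT of the input frame, then the log energy at the output of each filter $H_m[k]$:
$$X_a[k] = \sum_{n=0}^{N-1} x(n)\, e^{-j 2\pi n k / N}$$
$$S[m] = \ln\left[\sum_{k=0}^{N-1} |X_a[k]|^2\, H_m[k]\right]$$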
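The mel mapping and its algebraic inverse are short enough to write out. The inverse is not given on the slide; it is obtained here by solving the formula for $f$, and both function names are illustrative.

```python
import math

def hz_to_mel(f_hz):
    """Mel(f) = 1125 ln(1 + f / 700), as on the slide."""
    return 1125.0 * math.log(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse mapping, derived by solving Mel(f) for f."""
    return 700.0 * (math.exp(mel / 1125.0) - 1.0)

mel_at_1k = hz_to_mel(1000.0)   # about 998 mels, i.e. close to 1000
```

The inverse is useful for placing filter boundary points uniformly in mels and converting them back to Hz.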

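11 Perceptually Motivated Representations (5)
$H_m[k]$ is a triangular filter, where $f[m-1]$, $f[m]$ and $f[m+1]$ are its lower edge, peak and upper edge:
$$H_m[k] = \begin{cases}
0 & k < f[m-1] \\
\dfrac{2\,(k - f[m-1])}{(f[m+1]-f[m-1])\,(f[m]-f[m-1])} & f[m-1] \le k \le f[m] \\
\dfrac{2\,(f[m+1] - k)}{(f[m+1]-f[m-1])\,(f[m+1]-f[m])} & f[m] \le k \le f[m+1] \\
0 & k > f[m+1]
\end{cases}$$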
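A direct transcription of the piecewise definition for one filter is sketched below. The argument names `f_lo`, `f_mid` and `f_hi` stand for $f[m-1]$, $f[m]$ and $f[m+1]$ and are assumed to satisfy `f_lo < f_mid < f_hi`; how those boundary points are chosen is outside this sketch.

```python
def triangle_filter(k, f_lo, f_mid, f_hi):
    """Value of H_m[k] for a filter with edges f_lo < f_mid < f_hi."""
    if k < f_lo or k > f_hi:
        return 0.0
    if k <= f_mid:
        # Rising slope from the lower edge to the peak.
        return 2.0 * (k - f_lo) / ((f_hi - f_lo) * (f_mid - f_lo))
    # Falling slope from the peak to the upper edge.
    return 2.0 * (f_hi - k) / ((f_hi - f_lo) * (f_hi - f_mid))

peak = triangle_filter(10, 5, 10, 20)   # equals 2 / (f_hi - f_lo)
```

The peak height $2/(f[m+1]-f[m-1])$ makes each triangle have unit area when treated as a continuous function, so wider filters are proportionally lower.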

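12 Perceptually Motivated Representations (6)
The mel-frequency cepstrum is the discrete cosine transform of the $M$ filter outputs:
$$c(n) = \sum_{m=0}^{M-1} S[m] \cos\left(\frac{\pi n\,(m + 1/2)}{M}\right), \quad 0 \le n < M$$
$M$ is typically 24 to 40, and only the first 12 to 13 coefficients $c(n)$ are kept.
MFCCs are used extensively in the speech community.
Besides the MFCCs themselves, their first-order and second-order differences are used as components of the feature vector. With $c_n[t]$ denoting coefficient $n$ at frame $t$:
$$d_n(t) = \frac{\sum_{j=1}^{L} j\,\left(c_n[t+j] - c_n[t-j]\right)}{2 \sum_{j=1}^{L} j^2}, \quad n = 1, \dots, 12$$
$$a_n(t) = \frac{\sum_{j=1}^{L} j\,\left(d_n[t+j] - d_n[t-j]\right)}{2 \sum_{j=1}^{L} j^2}, \quad n = 1, \dots, 12$$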
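The cosine transform and the difference features can be sketched as follows. The function names, the default `L=2`, and the handling of the first and last `L` frames (repeating the edge frame) are assumptions; the slides do not say how frame boundaries are treated.

```python
import numpy as np

def cepstrum_from_log_energies(S, num_ceps=13):
    """c(n) = sum_m S[m] cos(pi n (m + 1/2) / M) for n = 0..num_ceps-1."""
    S = np.asarray(S, dtype=float)
    M = len(S)
    m = np.arange(M)
    return np.array([np.sum(S * np.cos(np.pi * n * (m + 0.5) / M))
                     for n in range(num_ceps)])

def delta(features, L=2):
    """Difference features over +/- L frames; features has shape (T, D).

    Assumption: frames beyond either end repeat the edge frame.
    """
    feats = np.asarray(features, dtype=float)
    T = feats.shape[0]
    padded = np.pad(feats, ((L, L), (0, 0)), mode="edge")
    denom = 2.0 * sum(j * j for j in range(1, L + 1))
    out = np.zeros_like(feats)
    for j in range(1, L + 1):
        # padded[L + t] is feats[t], so these slices are c[t+j] and c[t-j].
        out += j * (padded[L + j:L + j + T] - padded[L - j:L - j + T])
    return out / denom

# Toy input: every coefficient rises by 1 per frame.
ramp = np.arange(10.0)[:, None] * np.ones((1, 3))
d = delta(ramp)    # first-order differences
a = delta(d)       # second-order differences
```

Applying `delta` twice gives the second-order features; away from the edges, a coefficient that rises by 1 per frame yields a first difference of 1 and a second difference of 0.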

