Spectral Analysis Models


Presentation on theme: "Spectral Analysis Models"— Presentation transcript:

1 Spectral Analysis Models
Pattern-recognition (a) and acoustic-phonetic (b) approaches to speech recognition

2 Spectral Analysis Models
LPC analysis model

3 THE BANK-OF-FILTERS FRONT-END PROCESSOR
Complete bank-of-filter analysis model

4 THE BANK-OF-FILTERS FRONT-END PROCESSOR

5 THE BANK-OF-FILTERS FRONT-END PROCESSOR
Typical waveforms and spectra for analysis of a pure sinusoid in the filter-bank model

6 THE BANK-OF-FILTERS FRONT-END PROCESSOR
Typical waveforms and spectra of a voiced speech signal in the bank-of-filters analysis model

7 THE BANK-OF-FILTERS FRONT-END PROCESSOR
Ideal (a) and realistic (b) set of filter responses of a Q-channel filter bank covering the frequency range Fs/N to (Q+1/2)Fs/N

8 Types of Filter Bank Used for Speech Recognition

9 Nonuniform Filter Banks

10 Nonuniform Filter Banks

11 Types of Filter Bank Used for Speech Recognition

12 Types of Filter Bank Used for Speech Recognition
Ideal specification of a 4-channel octave-band filter bank (a), a 12-channel third-octave-band filter bank (b), and a 7-channel critical-band-scale filter bank (c) covering the telephone bandwidth range ( Hz).
The variation of bandwidth with frequency for the perceptually based critical band scale.

13 Implementations of Filter Banks
Instead of direct convolution, which is computationally expensive, we assume that the impulse response of the $i$-th bandpass filter, $h_i(n)$, can be represented by a fixed lowpass window, $w(n)$, modulated by a complex exponential: $h_i(n) = w(n)\,e^{j\omega_i n}$, where $\omega_i$ denotes the center frequency of the $i$-th channel.
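A minimal sketch of the modulated-window idea, assuming a Hamming window for w(n), a 10 kHz sampling rate, and a 1 kHz center frequency (all illustrative choices, not taken from the slides). It forms h_i(n) = w(n) exp(j ω_i n) and compares the filter gain at the center frequency with the gain 1 kHz away.

```python
import cmath
import math

def hamming(length):
    """Hamming window; an assumed choice for the lowpass window w(n)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]

def bandpass_impulse_response(window, center_hz, fs_hz):
    """h_i(n) = w(n) * exp(j * omega_i * n), with omega_i = 2*pi*center_hz/fs_hz."""
    omega = 2 * math.pi * center_hz / fs_hz
    return [w * cmath.exp(1j * omega * n) for n, w in enumerate(window)]

def magnitude_response(h, freq_hz, fs_hz):
    """|H(e^{j omega})| evaluated at one frequency by direct summation."""
    omega = 2 * math.pi * freq_hz / fs_hz
    return abs(sum(x * cmath.exp(-1j * omega * n) for n, x in enumerate(h)))

h = bandpass_impulse_response(hamming(101), center_hz=1000.0, fs_hz=10000.0)
gain_at_center = magnitude_response(h, 1000.0, 10000.0)
gain_off_center = magnitude_response(h, 2000.0, 10000.0)
```

Because every channel reuses the same window and differs only in the modulation frequency, the whole bank can share one windowing step, which is what makes an FFT-based implementation possible.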

14 Implementations of Filter Banks
The signals s(m) and w(n-m) used in evaluation of the short-time Fourier transform

15 Frequency Domain Interpretation of the Short-Time Fourier Transform
Short-time Fourier transform using a long (500 points or 50 msec) Hamming window on a section of voiced speech. [Figure: waveform (value vs. sample) and log magnitude (dB) vs. frequency]

16 Frequency Domain Interpretation of the Short-Time Fourier Transform
Short-time Fourier transform using a short (50 points or 5 msec) Hamming window on a section of voiced speech. [Figure: waveform (value vs. sample) and log magnitude (dB) vs. frequency]

17 Frequency Domain Interpretation of the Short-Time Fourier Transform
Short-time Fourier transform using a long (500 points or 50 msec) Hamming window on a section of unvoiced speech. [Figure: waveform (value vs. sample) and log magnitude (dB) vs. frequency]

18 Frequency Domain Interpretation of the Short-Time Fourier Transform
Short-time Fourier transform using a short (50 points or 5 msec) Hamming window on a section of unvoiced speech. [Figure: waveform (value vs. sample) and log magnitude (dB) vs. frequency]
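The trade-off between the long and short windows can be made concrete with a rough calculation. The sketch below uses the common approximation that a Hamming window of L samples has a null-to-null main-lobe width of about 4·Fs/L Hz; the 10 kHz sampling rate is implied by the captions (500 points = 50 msec), while the function name is hypothetical.

```python
def hamming_mainlobe_width_hz(window_length, fs_hz):
    """Approximate null-to-null main-lobe width of a Hamming window: 4 * Fs / L."""
    return 4.0 * fs_hz / window_length

FS_HZ = 10000.0  # implied by the captions: 500 points correspond to 50 msec

long_window_width = hamming_mainlobe_width_hz(500, FS_HZ)   # long (50 msec) window
short_window_width = hamming_mainlobe_width_hz(50, FS_HZ)   # short (5 msec) window
```

Under this approximation the long window gives a main lobe of about 80 Hz, narrow enough to separate individual pitch harmonics in voiced speech, while the short window gives about 800 Hz and shows mainly the spectral envelope.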

19 Linear Filter Interpretation of the STFT

20 FFT Implementation of a Uniform Filter Bank

21 Direct implementation of an arbitrary filter bank

22 Nonuniform FIR Filter Bank Implementations
Two arbitrary nonuniform filter-bank filter specifications consisting of either 3 bands (part a) or 7 bands (part b).

23 Tree Structure Realizations of Nonuniform Filter Banks

24 Practical Examples of Speech-Recognition Filter Banks
[Figure: value vs. time in samples; magnitude (dB) vs. frequency (kHz)]

25 Practical Examples of Speech-Recognition Filter Banks
Window sequence, w(n) (part a), the individual filter responses (part b), and the composite response (part c) of a Q = 15 channel uniform filter bank, designed using a 101-point Kaiser-window-smoothed lowpass window (after Dautrich et al.). [Figure axes: magnitude (dB) vs. frequency (kHz)]

26 Practical Examples of Speech-Recognition Filter Banks
[Figure: value vs. time in samples; magnitude (dB) vs. frequency (kHz)]

27 Practical Examples of Speech-Recognition Filter Banks
Window sequence, w(n) (part a), the individual filter responses (part b), and the composite response (part c) of a Q = 15 channel uniform filter bank, designed using a 101-point Kaiser window directly as the lowpass window (after Dautrich et al.). [Figure axes: magnitude (dB) vs. frequency (kHz)]

28 Generalizations of Filter-Bank Analyzer

29 Generalizations of Filter-Bank Analyzer

30 Generalizations of Filter-Bank Analyzer

31 Generalizations of Filter-Bank Analyzer

32 The Real Cepstrum
Goal: deconvolve the spectrum for multiplicative processes. In practice, we use the “real” cepstrum. V(f) and U(f) manifest themselves at the low and high ends of the “quefrency” domain, respectively. We can derive cepstral parameters directly from LP analysis. To obtain the relationship between cepstral and predictor coefficients, we differentiate both sides with respect to $z^{-1}$.

33 The Real Cepstrum (cont.)
The result simplifies to a recursion between the cepstral and predictor coefficients. Note that the order of the cepstral coefficients need not be the same as the order of the LP model. Typically LP coefficients are used to generate cepstral coefficients.
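A sketch of the LP-to-cepstrum recursion, under an assumed sign convention that the slides do not confirm: the LP model is H(z) = 1/A(z) with A(z) = 1 - Σ a_k z^-k, and ln H(z) = Σ c_n z^-n for n ≥ 1. With that convention, c_n = a_n + Σ_{k=1}^{n-1} (k/n) c_k a_{n-k}, where a_n is taken as zero for n > p. The function name is hypothetical and the gain term c_0 is omitted.

```python
def lpc_to_cepstrum(predictor, num_cepstra):
    """Cepstral coefficients c_1..c_num_cepstra from predictor coefficients a_1..a_p.

    Assumes A(z) = 1 - sum_k a_k z^-k (other texts flip the sign of a_k).
    """
    order = len(predictor)
    cepstrum = [0.0] * (num_cepstra + 1)  # index 0 is unused (gain term omitted)
    for n in range(1, num_cepstra + 1):
        value = predictor[n - 1] if n <= order else 0.0
        # Only terms with 1 <= n - k <= order contribute to the sum.
        for k in range(max(1, n - order), n):
            value += (k / n) * cepstrum[k] * predictor[n - k - 1]
        cepstrum[n] = value
    return cepstrum[1:]

# First-order check: for A(z) = 1 - a z^-1 the series is c_n = a^n / n.
cepstrum = lpc_to_cepstrum([0.5], 4)
```

The example requests four cepstral coefficients from a first-order model, which illustrates the point that the cepstral order need not match the LP order.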

34 The Signal Model (front-end)

35 A Typical Front-End

39 The MFCC Method
The MFCC method is based on how the human ear perceives sounds.
The unit of human auditory perception is the Mel, which is obtained from the following relation:
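The relation itself did not survive in the transcript. One widely used form of the Hz-to-mel mapping is mel = 2595 log10(1 + f/700); the sketch below assumes that form, which may differ from the one on the original slide.

```python
import math

def hz_to_mel(freq_hz):
    """Hz to mel using the common 2595 * log10(1 + f / 700) form (an assumption)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

mel_at_1khz = hz_to_mel(1000.0)
```

With these constants, 1000 Hz maps to approximately 1000 mel, and the scale is close to linear at low frequencies and logarithmic at high frequencies.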

40 Steps of the MFCC Method
Step 1: Map the signal from the time domain to the frequency domain using the short-time FFT. Z(n): speech signal; W(n): window function, such as the Hamming window; $W_F = e^{-j2\pi/F}$; m = 0, …, F − 1; F: length of the speech frame.
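A minimal sketch of Step 1 using the slide's symbols, assuming the transform has the usual form X(m) = Σ_n Z(n) W(n) W_F^(mn). A direct O(F²) sum stands in for the FFT to keep the example dependency-free; the test tone and frame length are illustrative.

```python
import cmath
import math

def hamming(length):
    """Hamming window W(n)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]

def short_time_dft(frame, window):
    """X(m) = sum_n Z(n) W(n) W_F^(m n), with W_F = exp(-j 2 pi / F), m = 0..F-1."""
    size = len(frame)
    return [
        sum(frame[n] * window[n] * cmath.exp(-2j * math.pi * m * n / size)
            for n in range(size))
        for m in range(size)
    ]

F = 32  # frame length (illustrative)
frame = [math.cos(2 * math.pi * 4 * n / F) for n in range(F)]  # tone centered on bin 4
spectrum = short_time_dft(frame, hamming(F))
peak_bin = max(range(F // 2 + 1), key=lambda m: abs(spectrum[m]))
```

The window spreads the tone's energy over neighboring bins, but the magnitude peak stays at the bin matching the tone frequency.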

41 Steps of the MFCC Method
Step 2: Find the energy of each filter-bank channel.
is the function of the filters in the filter bank.
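A sketch of Step 2, assuming triangular filters whose edges are equally spaced on the mel scale (a common design; the slides do not specify the filter shape). The mel formula, helper names, and parameter values are assumptions for illustration.

```python
import math

def hz_to_mel(freq_hz):
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank_energies(power_spectrum, fs_hz, num_filters):
    """Weighted sum of power-spectrum bins under each triangular mel filter.

    power_spectrum holds bins 0..F/2, i.e. frequencies 0..fs_hz/2.
    """
    num_bins = len(power_spectrum)
    bin_hz = (fs_hz / 2.0) / (num_bins - 1)
    top_mel = hz_to_mel(fs_hz / 2.0)
    # num_filters + 2 edge frequencies, equally spaced in mel.
    edges = [mel_to_hz(top_mel * i / (num_filters + 1)) for i in range(num_filters + 2)]
    energies = []
    for q in range(1, num_filters + 1):
        low, center, high = edges[q - 1], edges[q], edges[q + 1]
        energy = 0.0
        for k in range(num_bins):
            freq = k * bin_hz
            if low < freq <= center:
                weight = (freq - low) / (center - low)    # rising slope
            elif center < freq < high:
                weight = (high - freq) / (high - center)  # falling slope
            else:
                weight = 0.0
            energy += weight * power_spectrum[k]
        energies.append(energy)
    return energies

flat_spectrum = [1.0] * 129  # flat power spectrum, F = 256
energies = mel_filterbank_energies(flat_spectrum, fs_hz=8000.0, num_filters=10)
```

For a flat input spectrum the higher channels collect more energy, because mel-spaced filters widen with frequency.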

42 Filter Distribution Based on the Mel Scale

43 Steps of the MFCC Method
Step 4: Compress the spectrum and apply the DCT to obtain the MFCC coefficients. In the relation above, n = 0, …, L is the order of the MFCC coefficients.
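A sketch of this step, assuming that the compression is a natural logarithm and that the DCT takes the common form c_n = Σ_q log(E_q) cos(π n (q − 1/2) / Q) for q = 1..Q. The slide's own relation is not preserved, so both choices are assumptions, as is the function name.

```python
import math

def mfcc_from_energies(energies, max_order):
    """MFCCs c_0..c_max_order from filter-bank channel energies."""
    num_channels = len(energies)
    log_energies = [math.log(e) for e in energies]  # log compression (assumed)
    return [
        sum(log_energies[q] * math.cos(math.pi * n * (q + 0.5) / num_channels)
            for q in range(num_channels))  # q is zero-based, hence q + 0.5
        for n in range(max_order + 1)
    ]

# Equal channel energies: all of the information lands in c_0.
coefficients = mfcc_from_energies([math.e] * 8, max_order=4)
```

A flat log spectrum produces a nonzero c_0 only, which shows how the DCT packs smooth spectral shapes into the low-order coefficients.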

44 The Mel Cepstrum Approach
Time Signal → Windowing → |FFT|² → Mel-scaling → Logarithm → IDCT → Low-order coefficients → Cepstra
Cepstra → Differentiator → Delta & Delta-Delta Cepstra

45 Time-Frequency analysis
Short-term Fourier Transform
Standard way of frequency analysis: decompose the incoming signal into its constituent frequency components.
W(n): windowing function; N: frame length; p: step size.
Speech varies over time, but quasi-stationarity can be assumed within a frame.
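The roles of the window W(n), the frame length N, and the step size p can be sketched as follows, assuming a Hamming window and dropping any trailing samples that do not fill a whole frame (both are illustrative choices).

```python
import math

def hamming(length):
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]

def frame_signal(signal, frame_length, step):
    """Cut the signal into windowed frames of frame_length samples, step samples apart."""
    window = hamming(frame_length)
    frames = []
    for start in range(0, len(signal) - frame_length + 1, step):
        frames.append([signal[start + n] * window[n] for n in range(frame_length)])
    return frames

# 10 samples, N = 4, p = 2: frames start at samples 0, 2, 4 and 6.
frames = frame_signal([1.0] * 10, frame_length=4, step=2)
```

Choosing p smaller than N makes consecutive frames overlap, so each frame covers a short, roughly stationary stretch of speech while the sequence of frames tracks how the spectrum changes.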

46 Critical band integration
Related to the masking phenomenon: the threshold of a sinusoid is elevated when its frequency is close to the center frequency of a narrow-band noise. Frequency components within a critical band are not resolved; the auditory system interprets the signals within a critical band as a whole. The relation between the critical bandwidth and frequency is given separately below and above 1 kHz.

47 Bark scale
Describes the frequency-dependent bandwidth of a masking signal over a sinusoidal signal.
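The slides' formulas are not preserved. As a sketch, the widely quoted Zwicker and Terhardt approximations for the Bark scale and the critical bandwidth are assumed below; the original deck may use different expressions.

```python
import math

def hz_to_bark(freq_hz):
    """Bark value: 13 atan(0.00076 f) + 3.5 atan((f / 7500)^2) (assumed form)."""
    return 13.0 * math.atan(0.00076 * freq_hz) + 3.5 * math.atan((freq_hz / 7500.0) ** 2)

def critical_bandwidth_hz(freq_hz):
    """Critical bandwidth: 25 + 75 (1 + 1.4 (f / 1000)^2)^0.69 Hz (assumed form)."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (freq_hz / 1000.0) ** 2) ** 0.69

bark_at_1khz = hz_to_bark(1000.0)
bandwidth_at_100hz = critical_bandwidth_hz(100.0)
bandwidth_at_1khz = critical_bandwidth_hz(1000.0)
bandwidth_at_4khz = critical_bandwidth_hz(4000.0)
```

Under these approximations the critical bandwidth stays near 100 Hz at low frequencies and grows steadily above roughly 1 kHz, which matches the below/above 1 kHz distinction on the critical-band slide.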

48 Feature orthogonalization
Spectral values in adjacent frequency channels are highly correlated. This correlation leads to a Gaussian model with many parameters, because all the elements of the covariance matrix have to be estimated. Decorrelation is therefore useful for improving parameter estimation.

49 Cepstrum
Computed as the inverse Fourier transform of the log magnitude of the Fourier transform of the signal. The log magnitude is real and symmetric, so the transform is equivalent to the Discrete Cosine Transform. The resulting coefficients are approximately decorrelated.
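The equivalence between the inverse Fourier transform and a cosine transform can be checked numerically. The sketch below uses a direct DFT instead of an FFT and a short decaying test sequence whose spectrum has no zeros; both are illustrative assumptions.

```python
import cmath
import math

def dft(values, inverse=False):
    """Direct DFT (or inverse DFT) of a sequence."""
    size = len(values)
    sign = 1.0 if inverse else -1.0
    out = [sum(values[n] * cmath.exp(sign * 2j * math.pi * k * n / size)
               for n in range(size)) for k in range(size)]
    return [v / size for v in out] if inverse else out

def log_magnitude(signal):
    return [math.log(abs(v)) for v in dft(signal)]

def real_cepstrum(signal):
    """Inverse DFT of the log-magnitude spectrum."""
    return dft(log_magnitude(signal), inverse=True)

def real_cepstrum_cosine(signal):
    """Same quantity computed with cosines only."""
    log_mag = log_magnitude(signal)
    size = len(log_mag)
    return [sum(log_mag[k] * math.cos(2 * math.pi * k * n / size)
                for k in range(size)) / size for n in range(size)]

signal = [1.0, 0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0]
via_idft = real_cepstrum(signal)
via_cosine = real_cepstrum_cosine(signal)
```

The imaginary parts of the inverse DFT vanish up to rounding, because the sine terms cancel for a real, symmetric log-magnitude spectrum.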

50 Principal Component Analysis (PCA)
A mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components (PCs).
Finds an orthogonal basis such that the reconstruction error over the training set is minimized. This turns out to be equivalent to diagonalizing the sample autocovariance matrix.
Complete decorrelation.
Computes the principal dimensions of variability, but does not necessarily provide the optimal discrimination among classes.

51 PCA (Cont.)
Algorithm: Input (N-dim vectors) → Covariance matrix → Eigenvalues and eigenvectors → Transform matrix → Apply transform → Output (R-dim vectors)
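The algorithm can be sketched in pure Python, with one simplification that is not in the slides: power iteration finds only the leading eigenvector (R = 1), whereas a full PCA would keep the R eigenvectors with the largest eigenvalues, usually via a library eigensolver. The toy data and helper names are hypothetical.

```python
import math

def mean_and_covariance(vectors):
    """Sample mean and covariance matrix of the input vectors."""
    count, dim = len(vectors), len(vectors[0])
    mean = [sum(v[d] for v in vectors) / count for d in range(dim)]
    centered = [[v[d] - mean[d] for d in range(dim)] for v in vectors]
    cov = [[sum(c[i] * c[j] for c in centered) / (count - 1) for j in range(dim)]
           for i in range(dim)]
    return mean, cov

def leading_eigenpair(matrix, iterations=100):
    """Largest eigenvalue and its eigenvector by power iteration."""
    dim = len(matrix)
    vec = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    eigenvalue = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(dim))
                     for i in range(dim))
    return eigenvalue, vec

def project(vectors, mean, direction):
    """Apply the transform: one principal-component score per input vector."""
    return [sum((v[d] - mean[d]) * direction[d] for d in range(len(mean)))
            for v in vectors]

data = [[-2.0, -4.0], [-1.0, -2.0], [0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]  # points on y = 2x
mean, cov = mean_and_covariance(data)
eigenvalue, direction = leading_eigenpair(cov)
scores = project(data, mean, direction)
```

Because the toy points lie exactly on a line, a single component captures all of the variance and the 2-dimensional input reduces to 1-dimensional scores without loss.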

52 PCA (Cont.) PCA in speech recognition systems

53 Linear discriminant Analysis
Find an orthogonal basis such that the ratio of the between-class variance to the within-class variance is maximized. This also turns out to be a generalized eigenvalue-eigenvector problem.
Complete decorrelation.
Provides the optimal linear separability under quite restrictive assumptions.
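For two classes the eigenproblem has a closed-form solution, the Fisher discriminant direction w ∝ S_w⁻¹ (m_a − m_b). The sketch below covers only this two-class, two-dimensional special case; the data and function names are hypothetical.

```python
def mean_vector(points):
    return [sum(p[d] for p in points) / len(points) for d in range(2)]

def scatter_matrix(points, mean):
    """2x2 within-class scatter of one class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        diff = [p[0] - mean[0], p[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += diff[i] * diff[j]
    return s

def fisher_direction(class_a, class_b):
    """w = S_w^-1 (m_a - m_b), with S_w the summed within-class scatter."""
    mean_a, mean_b = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter_matrix(class_a, mean_a), scatter_matrix(class_b, mean_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det], [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mean_a[0] - mean_b[0], mean_a[1] - mean_b[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

class_a = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
class_b = [[3.0, 0.0], [4.0, 0.0], [3.0, 1.0], [4.0, 1.0]]  # class_a shifted along x
w = fisher_direction(class_a, class_b)
projected_a = [w[0] * p[0] + w[1] * p[1] for p in class_a]
projected_b = [w[0] * p[0] + w[1] * p[1] for p in class_b]
```

Unlike PCA, the direction is chosen from class labels: here it lies along the x axis, the only axis on which the two classes differ, even though both axes have the same within-class spread.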

54 PCA vs. LDA

55 Spectral smoothing
Formant information is crucial for recognition.
Enhance and preserve the formant information by truncating the number of cepstral coefficients, or by linear prediction (peak-hugging property).

56 Temporal processing
To capture the temporal features of the spectral envelope and to provide robustness:
Delta features: first- and second-order differences; regression.
Cepstral mean subtraction: normalizes for channel effects and adjusts for spectral slope.
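A sketch of both operations on a sequence of cepstral vectors. The delta uses the common regression form Σ_k k (c[t+k] − c[t−k]) / (2 Σ_k k²) with a span of two frames and repeats the edge frames at the boundaries; the span and the edge handling are assumptions, not taken from the slides.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over the whole utterance."""
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / len(frames) for d in range(dim)]
    return [[f[d] - mean[d] for d in range(dim)] for f in frames]

def delta_features(frames, span=2):
    """Regression-based time derivative of each coefficient trajectory."""
    last = len(frames) - 1
    denom = 2.0 * sum(k * k for k in range(1, span + 1))
    deltas = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[0])):
            acc = 0.0
            for k in range(1, span + 1):
                ahead = frames[min(t + k, last)][d]   # repeat the last frame at the end
                behind = frames[max(t - k, 0)][d]     # repeat the first frame at the start
                acc += k * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas

# Coefficient 0 rises by 1 per frame; coefficient 1 is a constant channel offset.
cepstra = [[float(t), 5.0] for t in range(7)]
normalized = cepstral_mean_subtraction(cepstra)
deltas = delta_features(cepstra)
delta_deltas = delta_features(deltas)  # second-order differences
```

The constant offset disappears under mean subtraction and has a zero delta, which is why both operations help against fixed channel effects.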

57 RASTA (RelAtive SpecTral Analysis)
Filtering of the temporal trajectories of some function of each of the spectral values, to provide more reliable spectral features. This is usually a bandpass filter that maintains the linguistically important spectral envelope modulation (1-16 Hz).
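A sketch of such a bandpass filter applied to one temporal trajectory, assuming the commonly quoted RASTA transfer function H(z) = 0.1 (2 + z⁻¹ − z⁻³ − 2z⁻⁴) / (1 − 0.98 z⁻¹) in causal form. The coefficients, and in particular the pole value (0.94 is also used), are assumptions that the slides do not confirm.

```python
def rasta_filter(trajectory, pole=0.98):
    """Filter one spectral trajectory (one value per frame) with a RASTA-style IIR filter."""
    numerator = [0.2, 0.1, 0.0, -0.1, -0.2]  # sums to zero, so DC is blocked
    output = []
    previous = 0.0
    for t in range(len(trajectory)):
        fir = sum(b * trajectory[t - k] for k, b in enumerate(numerator) if t - k >= 0)
        previous = fir + pole * previous  # single-pole recursive part
        output.append(previous)
    return output

# A constant trajectory models a fixed channel offset in the log-spectral domain.
filtered = rasta_filter([1.0] * 2000)
```

After the start-up transient the output decays to zero, which shows how the filter suppresses slowly varying components such as a stationary channel while passing faster speech modulations.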

59 RASTA-PLP

60 Perceptual Linear Prediction
Goals: apply greater weight to perceptually important portions of the spectrum; avoid uniform weighting across the frequency band.
Algorithm:
1. Compute the spectrum via a DFT.
2. Warp the spectrum along the Bark frequency scale.
3. Convolve the warped spectrum with the power spectrum of the simulated critical-band masking curve and downsample (typically to 18 spectral samples).
4. Preemphasize by the simulated equal-loudness curve.
5. Simulate the nonlinear relationship between intensity and perceived loudness by performing a cubic-root amplitude compression.
6. Compute an LP model.
Claims: improves speaker-independent recognition performance; increases robustness to noise, variations in the channel, and microphones.
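Two of the steps above, the Bark warping and the cubic-root compression, are simple enough to sketch. The warping is assumed to be Ω(f) = 6 asinh(f / 600), the form usually associated with PLP; the slides do not give the formula. The critical-band convolution, equal-loudness weighting, and LP modeling are not shown.

```python
import math

def plp_bark_warp(freq_hz):
    """Hz to Bark using 6 * asinh(f / 600) (assumed PLP warping)."""
    return 6.0 * math.asinh(freq_hz / 600.0)

def cubic_root_compress(power_values):
    """Intensity-to-loudness approximation: cube root of each power value."""
    return [p ** (1.0 / 3.0) for p in power_values]

bark_at_1khz = plp_bark_warp(1000.0)
bark_at_4khz = plp_bark_warp(4000.0)
compressed = cubic_root_compress([8.0, 27.0, 1000.0])
```

The cube root shrinks the dynamic range of the spectrum considerably (1000 becomes about 10), which reduces the influence of large spectral peaks on the LP model computed afterwards.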
