Spectral Analysis Models

Spectral Analysis Models
(a) Pattern-recognition and (b) acoustic-phonetic approaches to speech recognition

Spectral Analysis Models
LPC analysis model

THE BANK-OF-FILTERS FRONT-END PROCESSOR
Complete bank-of-filter analysis model

Typical waveforms and spectra for analysis of a pure sinusoid in the filter-bank model

Typical waveforms and spectra of a voiced speech signal in the bank-of-filters analysis model

Ideal (a) and realistic (b) set of filter responses of a Q-channel filter bank covering the frequency range Fs/N to (Q+1/2)Fs/N

Types of Filter Bank Used for Speech Recognition

Non-uniform Filter Banks

Nonuniform Filter Banks

Ideal specification of a 4-channel octave-band filter bank (a), a 12-channel third-octave-band filter bank (b), and a 7-channel critical-band-scale filter bank (c) covering the telephone bandwidth range ( Hz)

The variation of bandwidth with frequency for the perceptually based critical band scale

Implementations of Filter Banks
Instead of direct convolution, which is computationally expensive, we assume that the impulse response of the i-th bandpass filter, $h_i(n)$, is a fixed lowpass window, $w(n)$, modulated by a complex exponential:
$h_i(n) = w(n)\, e^{j\omega_i n}$
where $\omega_i$ is the center frequency of the i-th channel.
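A minimal sketch of one modulated-window channel may help make the idea concrete. The Hamming window, the 8 kHz sampling rate, the 1 kHz center frequency, and the helper name `bandpass_from_window` are all assumptions chosen for illustration, not values taken from the slides.

```python
import numpy as np

def bandpass_from_window(w, center_hz, fs):
    """Build h_i(n) = w(n) * exp(j * omega_i * n) with omega_i = 2*pi*center_hz/fs."""
    n = np.arange(len(w))
    return w * np.exp(1j * 2.0 * np.pi * center_hz / fs * n)

fs = 8000.0                      # assumed sampling rate
w = np.hamming(101)              # assumed lowpass window
w = w / w.sum()                  # unity gain at the channel center
h = bandpass_from_window(w, 1000.0, fs)

# Probe the channel with one tone at its center and one far outside it.
t = np.arange(2000) / fs
tone_in = np.cos(2.0 * np.pi * 1000.0 * t)
tone_out = np.cos(2.0 * np.pi * 3000.0 * t)
level_in = np.abs(np.convolve(tone_in, h, mode="valid")).mean()
level_out = np.abs(np.convolve(tone_out, h, mode="valid")).mean()
print(level_in, level_out)
```

Because the filter is complex, it keeps only the positive-frequency half of the real cosine, so the in-band level is about half the tone amplitude while the out-of-band tone is strongly attenuated.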

Implementations of Filter Banks
The signals s(m) and w(n-m) used in evaluation of the short-time Fourier transform

Frequency Domain Interpretation of the Short-Time Fourier Transform
Short-time Fourier transform using a long (500 points or 50 msec) Hamming window on a section of voiced speech (panels: value vs. sample; log magnitude in dB vs. frequency)

Short-time Fourier transform using a short (50 points or 5 msec) Hamming window on a section of voiced speech (panels: value vs. sample; log magnitude in dB vs. frequency)

Short-time Fourier transform using a long (500 points or 50 msec) Hamming window on a section of unvoiced speech (panels: value vs. sample; log magnitude in dB vs. frequency)

Short-time Fourier transform using a short (50 points or 5 msec) Hamming window on a section of unvoiced speech (panels: value vs. sample; log magnitude in dB vs. frequency)
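The effect of window length on frequency resolution can be sketched with a synthetic signal. The example below assumes a 10 kHz sampling rate (implied by 500 points = 50 msec) and uses two tones 100 Hz apart as a stand-in for adjacent pitch harmonics; the helper names and the 20 dB peak threshold are illustrative assumptions.

```python
import numpy as np

fs = 10_000                               # 500 points = 50 msec implies 10 kHz
n = np.arange(500)
# Two tones 100 Hz apart, standing in for neighboring pitch harmonics.
x = np.cos(2 * np.pi * 1000 * n / fs) + np.cos(2 * np.pi * 1100 * n / fs)

def log_spectrum(frame, nfft=4096):
    """Log-magnitude spectrum (dB) of a Hamming-windowed frame."""
    win = np.hamming(len(frame))
    mag = np.abs(np.fft.rfft(frame * win, nfft))
    return 20.0 * np.log10(mag + 1e-12)

def count_peaks(spec_db, floor_db=20.0):
    """Count local maxima that lie within floor_db of the global maximum."""
    thresh = spec_db.max() - floor_db
    mid = spec_db[1:-1]
    is_peak = (mid > spec_db[:-2]) & (mid > spec_db[2:]) & (mid > thresh)
    return int(is_peak.sum())

long_peaks = count_peaks(log_spectrum(x[:500]))   # 50 msec window
short_peaks = count_peaks(log_spectrum(x[:50]))   # 5 msec window
print(long_peaks, short_peaks)
```

The long window separates the two components, while the short window merges them into a single broad peak, which mirrors the loss of harmonic detail in the short-window plots.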

Linear Filter Interpretation of the STFT

FFT Implementation of a Uniform Filter Bank

Direct implementation of an arbitrary filter bank

Nonuniform FIR Filter Bank Implementations
Two arbitrary nonuniform filter-bank filter specifications consisting of either 3 bands (part a) or 7 bands (part b).

Tree Structure Realizations of Nonuniform Filter Banks

Practical Examples of Speech-Recognition Filter Banks
Window sequence, w(n) (part a; value vs. time in samples), the individual filter responses (part b), and the composite response (part c; magnitude in dB vs. frequency in kHz) of a Q = 15 channel uniform filter bank, designed using a 101-point Kaiser-window-smoothed lowpass window (after Dautrich et al.).

Window sequence, w(n) (part a; value vs. time in samples), the individual filter responses (part b), and the composite response (part c; magnitude in dB vs. frequency in kHz) of a Q = 15 channel uniform filter bank, designed using a 101-point Kaiser window directly as the lowpass window (after Dautrich et al.).

Generalizations of Filter-Bank Analyzer

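The Real Cepstrum
Goal: deconvolve the spectrum for multiplicative processes. With $S(f) = V(f)\,U(f)$, taking the log magnitude turns the product into a sum: $\log|S(f)| = \log|V(f)| + \log|U(f)|$.
In practice, we use the "real" cepstrum, the inverse Fourier transform of the log magnitude spectrum: $c(n) = \mathcal{F}^{-1}\{\log|S(f)|\}$.
V(f) and U(f) manifest themselves at the low and high ends of the "quefrency" domain, respectively.
We can derive cepstral parameters directly from LP analysis. To obtain the relationship between cepstral and predictor coefficients, write $\ln H(z) = \sum_{n} c_n z^{-n}$ for the all-pole LP model $H(z)$ and differentiate both sides with respect to $z^{-1}$.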

The Real Cepstrum (cont.)
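This simplifies to the recursion
$c_1 = a_1, \qquad c_n = a_n + \sum_{k=1}^{n-1} \frac{k}{n}\, c_k\, a_{n-k} \quad (n > 1)$
where $a_k$ are the predictor coefficients of the all-pole model $H(z) = 1/\bigl(1 - \sum_{k=1}^{p} a_k z^{-k}\bigr)$ and $a_n = 0$ for $n > p$.
Note that the order of the cepstral coefficients need not be the same as the order of the LP model. Typically, LP coefficients are used to generate cepstral coefficients.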
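The relationship between predictor and cepstral coefficients can be sketched as a short recursion. This assumes the sign convention H(z) = 1 / (1 - sum_k a_k z^-k) with unity gain; other texts flip the sign of the a_k, and the function name `lpc_to_cepstrum` is hypothetical.

```python
import numpy as np

def lpc_to_cepstrum(a, n_ceps):
    """Cepstral coefficients c_1..c_n_ceps from predictor coefficients.

    Assumed convention: H(z) = 1 / (1 - sum_{k=1..p} a[k-1] * z^-k).
    Recursion: c_n = a_n + sum_{k=1}^{n-1} (k/n) * c_k * a_{n-k},
    with a_n taken as 0 for n > p.
    """
    p = len(a)
    c = np.zeros(n_ceps + 1)          # c[0] (gain term) is left unused
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]

# One-pole check: for H(z) = 1 / (1 - 0.9 z^-1) the cepstrum is 0.9**n / n.
ceps = lpc_to_cepstrum(np.array([0.9]), 6)
print(ceps)
```

The cepstral order `n_ceps` is independent of the LP order `p`, which matches the note that the two orders need not agree.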

The Signal Model (front-end)

A Typical Front-End

The MFCC Method
The MFCC method is based on how the human ear perceives sounds.
The unit of human auditory perception is the mel, which is obtained from the following relation:
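One widely used form of the mel mapping is mel(f) = 2595 log10(1 + f/700). Treat this particular pair of constants as an assumption, since several variants of the mel formula are in use. A minimal sketch, with hypothetical helper names:

```python
import numpy as np

def hz_to_mel(f_hz):
    """Map frequency in Hz to mels (assumed form: 2595 * log10(1 + f/700))."""
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

freqs = np.array([100.0, 500.0, 1000.0, 4000.0])
mels = hz_to_mel(freqs)
print(mels)
```

With these constants, 1000 Hz maps to roughly 1000 mel, and the scale grows approximately linearly at low frequencies and logarithmically at high frequencies.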

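Steps of the MFCC Method
Step 1: Map the signal from the time domain to the frequency domain using the short-time FFT:
$X(m) = \sum_{n=0}^{F-1} Z(n)\, W(n)\, W_F^{nm}, \quad m = 0, \ldots, F-1$
Z(n): speech signal
W(n): window function, such as the Hamming window
$W_F = e^{-j2\pi/F}$
F: speech frame length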

Steps of the MFCC Method
Step 2: Find the energy of each filter-bank channel.
The weighting term in this step is the response function of the filter-bank filters.

Filter distribution based on the mel scale

Steps of the MFCC Method
Step 4: Compress the spectrum and apply the DCT to obtain the MFCC coefficients.
In the relation above, n = 0, …, L is the order of the MFCC coefficients.
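The MFCC steps can be sketched end to end for a single frame. This is a minimal illustration under stated assumptions: triangular filters equally spaced on the mel scale, the 2595 log10(1 + f/700) mel formula, log compression, a DCT-II, and 24 filters with L = 12. None of these specifics are confirmed by the slides, and all function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(fs / 2.0), n_filters + 2))
    bin_freqs = np.arange(n_fft // 2 + 1) * fs / n_fft
    fbank = np.zeros((n_filters, len(bin_freqs)))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fbank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fbank

def mfcc_frame(frame, fs, n_filters=24, n_ceps=12):
    """MFCCs of order n = 0..n_ceps for one speech frame."""
    F = len(frame)
    # Step 1: windowed short-time FFT, then power spectrum.
    power = np.abs(np.fft.rfft(frame * np.hamming(F))) ** 2
    # Step 2: energy in each mel filter-bank channel.
    energies = mel_filterbank(n_filters, F, fs) @ power
    # Compression (log assumed) followed by a DCT-II.
    log_e = np.log(energies + 1e-10)
    k = np.arange(n_filters)
    n = np.arange(n_ceps + 1)[:, None]
    basis = np.cos(np.pi * n * (k + 0.5) / n_filters)
    return basis @ log_e

fs = 8000
frame = np.cos(2 * np.pi * 440 * np.arange(256) / fs)
coeffs = mfcc_frame(frame, fs)
print(coeffs.shape)
```

The DCT is written out as an explicit cosine basis to keep the sketch dependent on NumPy only.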

The Mel Cepstrum Approach
Block diagram: Time signal → Windowing → |FFT|² → Mel-scaling → Logarithm → IDCT → Low-order coefficients → Cepstra; Cepstra → Differentiator → Delta & Delta-Delta Cepstra

Time-Frequency analysis
Short-term Fourier Transform
Standard way of frequency analysis: decompose the incoming signal into its constituent frequency components.
W(n): windowing function; N: frame length; p: step size.
Speech varies over time, but it can be assumed quasi-stationary within a frame.
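The roles of the window, the frame length N, and the step size p can be sketched as follows. The Hamming window, the 8 kHz sampling rate, and the specific N and p values are assumptions for illustration.

```python
import numpy as np

def stft(x, N, p):
    """Short-term Fourier transform with frame length N and step size p.

    Returns an array of shape (n_frames, N // 2 + 1) of complex spectra.
    """
    win = np.hamming(N)                       # assumed windowing function
    n_frames = 1 + (len(x) - N) // p
    frames = np.stack([x[i * p : i * p + N] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

fs = 8000                                     # assumed sampling rate
x = np.cos(2 * np.pi * 1000 * np.arange(fs) / fs)   # one second of a 1 kHz tone
S = stft(x, N=256, p=80)                      # 32 msec frames, 10 msec step
peak_bins = np.abs(S).argmax(axis=1)
print(S.shape, peak_bins[0])
```

Each row is the spectrum of one quasi-stationary frame; for the test tone, every frame peaks in the bin nearest 1 kHz.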

Critical band integration
Related to the masking phenomenon: the threshold of a sinusoid is elevated when its frequency is close to the center frequency of a narrow-band noise.
Frequency components within a critical band are not resolved; the auditory system interprets the signals within a critical band as a whole.
The relation between critical bandwidth and frequency is described separately below and above 1 kHz.

Bark scale: describes the frequency-dependent bandwidth of a masking signal over a sinusoidal signal.
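A minimal sketch of the Bark mapping and the critical bandwidth, assuming the Zwicker and Terhardt analytic approximations. The slides do not state which formula they use, so these expressions and the function names are assumptions.

```python
import numpy as np

def hz_to_bark(f_hz):
    """Bark value, assumed form: 13*atan(0.00076 f) + 3.5*atan((f/7500)^2)."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def critical_bandwidth(f_hz):
    """Critical bandwidth in Hz, assumed form: 25 + 75*(1 + 1.4*(f/1000)^2)^0.69."""
    f = np.asarray(f_hz, dtype=float)
    return 25.0 + 75.0 * (1.0 + 1.4 * (f / 1000.0) ** 2) ** 0.69

for f in (100.0, 500.0, 1000.0, 4000.0):
    print(f, float(hz_to_bark(f)), float(critical_bandwidth(f)))
```

Under these approximations the bandwidth stays near 100 Hz at low frequencies and widens steadily above about 1 kHz.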

Feature orthogonalization
Spectral values in adjacent frequency channels are highly correlated.
The correlation results in a Gaussian model with many parameters: all the elements of the covariance matrix have to be estimated.
Decorrelation is useful for improving the parameter estimation.

Cepstrum: computed as the inverse Fourier transform of the log magnitude of the Fourier transform of the signal.
The log magnitude is real and symmetric, so the transform is equivalent to the Discrete Cosine Transform.
The resulting coefficients are approximately decorrelated.
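The definition can be sketched directly with an FFT. The echo test signal and the function name `real_cepstrum` are assumptions chosen so that the deconvolution effect is easy to see.

```python
import numpy as np

def real_cepstrum(x, nfft=None):
    """Inverse FFT of the log magnitude of the FFT of x."""
    nfft = nfft or len(x)
    log_mag = np.log(np.abs(np.fft.fft(x, nfft)) + 1e-12)
    # log_mag is real and symmetric, so the inverse transform is real.
    return np.fft.ifft(log_mag).real

# A unit impulse plus an echo of gain 0.5 delayed by 20 samples.
x = np.zeros(256)
x[0] = 1.0
x[20] = 0.5
c = real_cepstrum(x)
print(c[20], c[40])
```

The echo, which is convolved with the signal in time, shows up as isolated peaks at multiples of its delay in the quefrency domain, and the cepstrum itself is symmetric as expected from a real, symmetric log magnitude.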

Principal Component Analysis (PCA)
A mathematical procedure that transforms a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables called principal components (PCs).
Find an orthogonal basis such that the reconstruction error over the training set is minimized.
This turns out to be equivalent to diagonalizing the sample autocovariance matrix.
Gives complete decorrelation.
Computes the principal dimensions of variability, but does not necessarily provide the optimal discrimination among classes.

PCA (Cont.)
Algorithm: Input (N-dim vectors) → Covariance matrix → Eigenvalues and eigenvectors → Transform matrix → Apply transform → Output (R-dim vectors)
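The PCA algorithm (covariance matrix, eigen-decomposition, transform matrix, projection) can be sketched as below. The function names and the synthetic data are illustrative assumptions.

```python
import numpy as np

def pca_fit(X, r):
    """Return (mean, transform matrix, eigenvalues) for the top r components.

    X has shape (n_samples, N); the transform matrix has shape (N, r).
    """
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)        # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # symmetric eigen-decomposition
    order = np.argsort(eigvals)[::-1][:r]       # largest variance first
    return mean, eigvecs[:, order], eigvals[order]

def pca_apply(X, mean, W):
    """Project N-dim vectors onto the R principal directions."""
    return (X - mean) @ W

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3)) @ rng.normal(size=(3, 3))   # correlated features
mean, W, vals = pca_fit(X, r=2)
Y = pca_apply(X, mean, W)
print(np.round(np.cov(Y, rowvar=False), 6))
```

The covariance of the projected data is diagonal, with the retained eigenvalues on the diagonal, which is the "complete decorrelation" property.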

PCA (Cont.) PCA in speech recognition systems 

Linear Discriminant Analysis
Find an orthogonal basis such that the ratio of the between-class variance to the within-class variance is maximized.
This also turns out to be a generalized eigenvalue-eigenvector problem.
Gives complete decorrelation.
Provides the optimal linear separability under quite restrictive assumptions.
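A minimal sketch of LDA as an eigenproblem, assuming the common scatter-matrix formulation in which the directions are eigenvectors of Sw^-1 Sb. The function name and the toy data are assumptions, and directions obtained this way are in general not mutually orthogonal in the original feature space.

```python
import numpy as np

def lda_fit(X, y, r):
    """Return an (N, r) matrix of discriminant directions."""
    mean = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))                       # within-class scatter
    Sb = np.zeros((d, d))                       # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(eigvals.real)[::-1][:r]
    return eigvecs[:, order].real

# Two classes: large shared variance along axis 0, class separation along axis 1.
rng = np.random.default_rng(1)
scale = np.array([5.0, 0.5])
X0 = rng.normal(size=(200, 2)) * scale
X1 = rng.normal(size=(200, 2)) * scale + np.array([0.0, 2.0])
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)
W = lda_fit(X, y, r=1)
print(W[:, 0])
```

On this data, PCA would pick the high-variance axis 0, whereas the discriminant direction points almost entirely along axis 1, where the classes actually differ.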

PCA vs. LDA

Spectral smoothing
Formant information is crucial for recognition.
Enhance and preserve the formant information by:
Truncating the number of cepstral coefficients
Linear prediction: peak-hugging property

Temporal processing
Captures the temporal features of the spectral envelope and provides robustness.
Delta features: first- and second-order differences; regression.
Cepstral Mean Subtraction: normalizes for channel effects and adjusts for spectral slope.
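Both temporal operations can be sketched in a few lines. The regression form of the delta, the half-width K = 2, and the edge-padding choice are common conventions assumed here, not details taken from the slides.

```python
import numpy as np

def cepstral_mean_subtraction(C):
    """Subtract the per-coefficient mean over time; C has shape (frames, coeffs)."""
    return C - C.mean(axis=0, keepdims=True)

def delta(C, K=2):
    """Regression-based delta over a window of +/- K frames.

    d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2), edges padded by repetition.
    """
    T = len(C)
    padded = np.pad(C, ((K, K), (0, 0)), mode="edge")
    num = sum(k * (padded[K + k : K + k + T] - padded[K - k : K - k + T])
              for k in range(1, K + 1))
    return num / (2.0 * sum(k * k for k in range(1, K + 1)))

# Two coefficient tracks that ramp with slopes 1 and -2, plus a constant offset.
C = np.outer(np.arange(20.0), [1.0, -2.0]) + 5.0
C_norm = cepstral_mean_subtraction(C)
d1 = delta(C)          # first-order (delta) features
d2 = delta(d1)         # second-order (delta-delta) features
print(d1[5], d2[10])
```

A constant offset in the cepstrum, which is how a fixed channel appears, is removed by the mean subtraction and does not affect the deltas at all.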

RASTA (RelAtive SpecTral Analysis)
Filtering of the temporal trajectories of some function of each of the spectral values, to provide more reliable spectral features.
This is usually a bandpass filter that maintains the linguistically important spectral envelope modulation (1-16 Hz).
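A sketch of the temporal bandpass filtering, assuming the commonly cited RASTA transfer function with numerator 0.1 (2 + z^-1 - z^-3 - 2 z^-4) and a single pole at 0.98, applied to one trajectory sampled at an assumed 100 frames per second. The coefficients, the frame rate, and the function name are assumptions.

```python
import numpy as np

def rasta_filter(traj, pole=0.98):
    """Bandpass-filter one temporal trajectory (e.g. a log band energy over frames)."""
    b = np.array([0.2, 0.1, 0.0, -0.1, -0.2])    # FIR part, sums to zero (no DC gain)
    x = np.concatenate([np.zeros(4), np.asarray(traj, dtype=float)])
    y = np.zeros(len(traj))
    prev = 0.0
    for t in range(len(traj)):
        fir = b @ x[t : t + 5][::-1]             # b0*x[t] + b1*x[t-1] + ... + b4*x[t-4]
        y[t] = fir + pole * prev                 # one-pole recursive part
        prev = y[t]
    return y

frame_rate = 100.0                               # assumed frames per second
t = np.arange(2000) / frame_rate
constant = rasta_filter(np.ones(2000))           # a fixed channel offset
speechlike = rasta_filter(np.sin(2 * np.pi * 4.0 * t))   # 4 Hz modulation
print(abs(constant[-1]), speechlike[1000:].std())
```

The constant offset decays to essentially zero, while the 4 Hz modulation, which lies inside the 1-16 Hz range, passes through with little attenuation.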

RASTA-PLP

Perceptual Linear Prediction
Goals:
Apply greater weight to perceptually important portions of the spectrum.
Avoid uniform weighting across the frequency band.
Algorithm:
(1) Compute the spectrum via a DFT.
(2) Warp the spectrum along the Bark frequency scale.
(3) Convolve the warped spectrum with the power spectrum of the simulated critical-band masking curve and downsample (typically to 18 spectral samples).
(4) Pre-emphasize by the simulated equal-loudness curve.
(5) Simulate the nonlinear relationship between intensity and perceived loudness by performing a cubic-root amplitude compression.
(6) Compute an LP model.
Claims:
Improves speaker-independent recognition performance.
Increases robustness to noise, variations in the channel, and microphones.
