By Sarita Jondhale

Generalizations of Filter Bank Analyzer
Signal preprocessor: "conditions" the speech signal s(n) into a new form that is more suitable for analysis.
Postprocessor: operates on the output x(m) to give a processed output that is more suitable for recognition.
Preprocessor Operations
- Signal preemphasis: higher frequencies are increased in amplitude.
- Noise elimination.
- Signal enhancement (to make the formant peaks more prominent).
The purpose of the preprocessor is to make the speech signal as clean as possible.
Postprocessor Operations
- Temporal smoothing of sequential filter-bank output vectors.
- Frequency smoothing of individual filter-bank output vectors.
- Normalization of each filter-bank output vector.
- Thresholding and/or quantization of the filter-bank output vectors.
- Principal components analysis of the filter-bank output vectors.
The purpose of the postprocessor is to clean up the output so as to best represent the spectral information in the speech signal.
Spectral Analysis
Two methods:
- The filter-bank spectrum
- Linear Predictive Coding (LPC)
Linear Predictive Coding
Estimates the current speech sample as a linear combination of past speech samples.
Linear Predictive Coding Model for Speech Recognition
The reasons why LPC has been widely used:
- For the quasi-steady-state voiced regions of speech, LPC provides a good model of the speech signal.
- During unvoiced and transient regions of speech, the LPC model is less effective, but it still provides an acceptably useful model for speech recognition purposes.
- The method of LPC is mathematically precise and is simple and straightforward to implement in either software or hardware.
- The computation involved in LPC processing is considerably less than that required for an implementation of the bank-of-filters model.
- The performance of speech recognizers based on LPC front ends is better than that of recognizers based on filter-bank front ends.
Linear Predictive Coding (or "LPC") is a method of predicting a sample of a speech signal based on several previous samples. We can use the LPC coefficients to separate a speech signal into two parts: the transfer function (which contains the vocal quality) and the excitation (which contains the pitch and the sound).
We can predict the nth sample in a sequence of speech samples as a weighted sum of the p previous samples:

ŝ(n) = a_1 s(n−1) + a_2 s(n−2) + … + a_p s(n−p)
The LPC Model
The speech sample at time n can be approximated as a linear combination of the past p speech samples:

s(n) ≈ a_1 s(n−1) + a_2 s(n−2) + … + a_p s(n−p)
The number of samples p is referred to as the "order" of the LPC. As p approaches infinity, we should be able to predict the nth sample exactly. However, p is usually on the order of ten to twenty, where it provides an accurate enough representation at a limited computational cost. The weights on the previous samples, a_k, are chosen to minimize the squared error between the real sample and its predicted value. Thus we want the error signal e(n), sometimes referred to as the LPC residual, to be as small as possible.
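The prediction and residual described above can be sketched in a few lines of NumPy. This is an illustrative helper (the function name and zero-padding convention are assumptions, not from the slides): given a set of weights a_k, it forms e(n) = s(n) − Σ_k a_k s(n−k).

```python
import numpy as np

def lpc_residual(s, a):
    """Prediction error e(n) = s(n) - sum_k a[k-1] * s(n-k).

    `s` is the signal, `a` holds the p predictor coefficients
    (a[0] multiplies s(n-1)).  Samples before n = 0 are taken as zero.
    """
    s = np.asarray(s, dtype=float)
    p = len(a)
    e = s.copy()
    for k in range(1, p + 1):
        e[k:] -= a[k - 1] * s[:-k]   # subtract the weighted past sample
    return e
```

For a signal that really is a first-order recursion, e.g. s(n) = 0.9 s(n−1), the residual vanishes everywhere except at the onset, which is exactly the "small e(n)" behaviour the text describes.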
The LPC Model

s(n) = Σ_{k=1…p} a_k s(n−k) + G u(n)    (1)

where u(n) is the excitation and G is its gain. In the z-domain this gives the all-pole transfer function H(z) = S(z)/(G U(z)) = 1 / (1 − Σ_{k=1…p} a_k z^{−k}) = 1/A(z).
The LPC Model
Interpretation of equation (1): an excitation source drives an all-pole system H(z) = 1/A(z) to produce the speech signal. The excitation source can be either a quasiperiodic train of pulses (voiced sounds) or a random noise source (unvoiced sounds).
The LPC Model
A voiced/unvoiced switch chooses either a quasiperiodic train of pulses as the excitation for voiced sounds or a random noise sequence for unvoiced sounds; H(z) controls the spectral characteristics of the speech being produced.
The LPC Model
The parameters of this model are:
- the voiced/unvoiced classification,
- the pitch period for voiced sounds,
- the gain parameter G, and
- the coefficients of the digital filter, {a_k}.
These parameters all vary slowly with time.
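The source-filter model above can be simulated directly: run an excitation (a pulse train for voiced sounds, or noise for unvoiced) through the all-pole filter H(z) = G/A(z). This is a minimal sketch; the function names and parameter choices are illustrative, not from the slides.

```python
import numpy as np

def lpc_synthesize(a, gain, excitation):
    """All-pole synthesis: s(n) = G*u(n) + sum_k a[k-1] * s(n-k)."""
    p = len(a)
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = gain * excitation[n]
        for k in range(1, p + 1):
            if n - k >= 0:
                acc += a[k - 1] * s[n - k]   # feedback through past outputs
        s[n] = acc
    return s

def pulse_train(length, period):
    """Quasiperiodic excitation for voiced sounds (period is a free choice)."""
    u = np.zeros(length)
    u[::period] = 1.0
    return u
```

Driving the filter with `pulse_train` mimics voiced speech; replacing the excitation with `np.random.randn(length)` mimics the unvoiced case, with the same filter controlling the spectral envelope.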
Error Signal
It is defined as the difference between the actual and predicted samples:

e(n) = s(n) − ŝ(n)

where s(n) is the actual signal and ŝ(n) is the predicted signal.
The spectrum of the error signal e(n) will have a different structure depending on whether the sound it comes from is voiced or unvoiced. Voiced sounds are produced by vibrations of the vocal cords; their spectrum is periodic with some fundamental frequency (which corresponds to the pitch). Examples of voiced sounds include all of the vowels. Unvoiced signals, however, have no fundamental frequency or harmonic structure.
LPC Analysis Equations
The prediction is a linear combination of past speech samples, and the total squared prediction error over a segment s_n(m) is

E_n = Σ_m e_n²(m) = Σ_m [ s_n(m) − Σ_{k=1…p} a_k s_n(m−k) ]²
LPC Analysis Equations
The basic problem is to determine the set of predictor coefficients {a_k} directly from the speech signal, so that the spectral properties of the digital filter match those of the speech waveform within the analysis window. Since the spectral characteristics of speech vary over time, the predictor coefficients at a given time n must be estimated from a short segment of the speech signal occurring around time n. The need is to find the set of predictor coefficients that minimizes the mean-squared prediction error over a short segment of the speech waveform.
LPC Analysis Equations
To calculate the prediction coefficients, differentiate E_n with respect to each a_k and set the result to zero:

∂E_n/∂a_i = 0,  i = 1, 2, …, p
LPC Analysis Equations
Two methods of defining the range of s_n(m):
- The autocorrelation method
- The covariance method
The Autocorrelation Method
The speech signal s(m+n) is multiplied by a finite window w(m), which is zero outside the range 0 ≤ m ≤ N−1. The purpose of the window is to taper the signal near m = 0 and near m = N−1 so as to minimize the errors at section boundaries.
The Autocorrelation Method
[Figure] The upper panel shows the running speech waveform s(m); the middle panel shows the windowed section of speech; the bottom panel shows the resulting error signal e_n(m), based on the optimum selection of predictor parameters.
The Autocorrelation Method
The autocorrelation of the windowed segment is

R_n(i) = Σ_{m=0…N−1−i} s_n(m) s_n(m+i),  i = 0, 1, …, p

and minimizing E_n leads to the set of equations

Σ_{k=1…p} a_k R_n(|i−k|) = R_n(i),  i = 1, 2, …, p
The Autocorrelation Method
For m < 0 the prediction error e_n(m) = 0, since s_n(m) = 0 for all m < 0. For m > N−1+p there is again no prediction error, because s_n(m) = 0 for all m > N−1.
The Autocorrelation Method
The p×p matrix of autocorrelation values is a Toeplitz matrix (symmetric, with equal elements along each diagonal). It can be solved efficiently by several procedures, such as Durbin's algorithm.
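As an illustration of the Toeplitz structure, the normal equations can be formed and solved directly with a dense NumPy solve (Durbin's algorithm, mentioned above, is the efficient route in practice). Function names here are illustrative:

```python
import numpy as np

def autocorr(s, p):
    """R(i) = sum_m s(m) s(m+i) over the frame, for i = 0..p."""
    s = np.asarray(s, dtype=float)
    return np.array([np.dot(s[:len(s) - i], s[i:]) for i in range(p + 1)])

def lpc_autocorrelation(s, p):
    """Solve sum_k a_k R(|i-k|) = R(i); the matrix is symmetric Toeplitz."""
    R = autocorr(s, p)
    T = np.array([[R[abs(i - k)] for k in range(p)] for i in range(p)])
    return np.linalg.solve(T, R[1 : p + 1])
```

For a decaying exponential s(m) = 0.9^m (an AR(1)-like frame), a first-order analysis recovers a_1 ≈ 0.9, as expected from the model equations.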
Covariance Method
Instead of using a weighting function or window to define s_n(m), we can fix the interval over which the mean-squared error is computed to the range 0 ≤ m ≤ N−1.
Covariance Method
The matrix form of the LPC analysis equations becomes a set of equations in the covariances φ_n(i,k). The resulting covariance matrix is symmetric (since φ_n(i,k) = φ_n(k,i)) but not Toeplitz; it can be solved by the Cholesky decomposition method.
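A minimal sketch of the covariance method follows, with the Cholesky step done via NumPy. The interface and the convention that p samples of history precede the analysis interval are assumptions for illustration:

```python
import numpy as np

def lpc_covariance(s, p):
    """Covariance method: phi(i,k) = sum_m s(m-i) s(m-k) over a fixed
    interval, solved by Cholesky (matrix is symmetric, not Toeplitz)."""
    s = np.asarray(s, dtype=float)
    N = len(s) - p                     # analysis interval after p history samples
    m = np.arange(p, p + N)

    def phi(i, k):
        return np.dot(s[m - i], s[m - k])

    Phi = np.array([[phi(i, k) for k in range(1, p + 1)] for i in range(1, p + 1)])
    psi = np.array([phi(i, 0) for i in range(1, p + 1)])
    L = np.linalg.cholesky(Phi)        # Phi = L L^T
    y = np.linalg.solve(L, psi)        # forward substitution
    return np.linalg.solve(L.T, y)     # back substitution
```

Because no window tapers the data, an exactly autoregressive segment is recovered exactly: for s(n) = 0.9 s(n−1) the method returns a_1 = 0.9 to machine precision, unlike the windowed autocorrelation estimate.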
Examples of LPC Analysis
The figure shows the effect of the LPC prediction order p on the prediction error E_n, for both voiced and unvoiced speech:
- For small values of p (1-4) there is a sharp decrease in prediction error.
- As p increases, the error decreases much more slowly.
- For a given value of p, the prediction error for unvoiced speech is significantly higher than for voiced speech, i.e., unvoiced speech is less linearly predictable than voiced speech.
As p increases, more of the detailed properties of the signal spectrum are preserved in the LPC spectrum. Beyond some value of p, however, the details that are preserved are generally irrelevant. Values of p on the order of 8-10 are reasonable for most speech recognition applications.
LPC Processor for Speech Recognition
1. Preemphasis: the digital system (a first-order FIR filter) used in the preemphasizer is either fixed or slowly adaptive to average transmission conditions and noise background. The output is related to the input by

s̃(n) = s(n) − ã s(n−1)

where ã is typically close to 1 (e.g., 0.95).
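The preemphasis step is one line of array arithmetic. A minimal sketch (the default value 0.95 is a common choice, not one fixed by the slides):

```python
import numpy as np

def preemphasize(s, alpha=0.95):
    """First-order FIR preemphasis: out(n) = s(n) - alpha * s(n-1)."""
    s = np.asarray(s, dtype=float)
    out = s.copy()
    out[1:] -= alpha * s[:-1]   # boosts high frequencies, flattens the spectrum
    return out
```

Note that a constant (DC-like) input is almost cancelled after the first sample, which is exactly the high-pass behaviour preemphasis is meant to provide.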
2. Frame Blocking: in this step the preemphasized speech signal is blocked into frames of N samples, with adjacent frames separated by M samples.
Typical LPC Analysis Parameters
- N: number of samples in the analysis frame
- M: number of samples shift between analysis frames
- p: LPC analysis order
- Q: dimension of the LPC-derived cepstral vector
- K: number of frames over which cepstral time derivatives are computed
In the figure above, M = N/3. The first frame consists of the first N speech samples. The second frame begins M samples after the first frame and overlaps it by N−M samples. Similarly, the third frame begins 2M samples after the first frame (or M samples after the second frame) and overlaps it by N−2M samples. This process continues until all the speech is accounted for within one or more frames.
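The frame-blocking scheme described above can be sketched as follows (a simple version that assumes the signal is at least N samples long and drops any incomplete final frame):

```python
import numpy as np

def block_frames(s, N, M):
    """Split `s` into overlapping frames of N samples, shifted by M samples
    (adjacent frames overlap by N - M samples)."""
    s = np.asarray(s, dtype=float)
    n_frames = 1 + (len(s) - N) // M          # full frames that fit
    return np.stack([s[i * M : i * M + N] for i in range(n_frames)])
```

With N = 4 and M = 2, a 10-sample signal yields four frames starting at samples 0, 2, 4 and 6, each overlapping its neighbour by N−M = 2 samples.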
3. Windowing: the next step is to window each individual frame so as to minimize the signal discontinuities at the beginning and end of each frame (as in short-time spectral analysis), i.e., we use the window to taper the signal to zero at the beginning and end of each frame.
A typical window used for the autocorrelation method of LPC is the Hamming window:

w(n) = 0.54 − 0.46 cos(2πn / (N−1)),  0 ≤ n ≤ N−1
4. Autocorrelation Analysis: each frame of windowed signal is next autocorrelated to give

r_l(m) = Σ_{n=0…N−1−m} x̃_l(n) x̃_l(n+m),  m = 0, 1, …, p
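Steps 3 and 4 combine naturally into one per-frame routine: apply the Hamming window, then compute the p+1 autocorrelation lags. A sketch (the function name is illustrative):

```python
import numpy as np

def frame_autocorrelation(frame, p):
    """Hamming-window a frame, then compute r(m) for m = 0..p."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    return np.array([np.dot(x[:len(x) - m], x[m:]) for m in range(p + 1)])
```

The zeroth lag r(0) is the windowed frame's energy, so it always dominates the higher lags; the vector r(0..p) is exactly the input Durbin's method expects in the next step.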
5. LPC Analysis: the next step converts each frame of p+1 autocorrelations into an "LPC parameter set". The set might be:
- the LPC coefficients,
- the reflection (or PARCOR) coefficients,
- the log area ratio coefficients,
- the cepstral coefficients, or
- any desired transformation of the above sets.
The method for converting from autocorrelation coefficients to an LPC parameter set is known as Durbin's method.
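Durbin's method exploits the Toeplitz structure to solve the normal equations in O(p²) operations. The following is a standard textbook formulation of the recursion, sketched in NumPy (the sign convention matches the model s(n) = Σ a_k s(n−k) + e(n) used here):

```python
import numpy as np

def durbin(R):
    """Levinson-Durbin recursion: autocorrelations R(0..p) ->
    predictor coefficients a(1..p) and final prediction error E."""
    p = len(R) - 1
    a = np.zeros(p)
    E = R[0]
    for i in range(1, p + 1):
        # reflection (PARCOR) coefficient for order i
        k = (R[i] - np.dot(a[:i - 1], R[i - 1:0:-1])) / E
        a_prev = a[:i - 1].copy()
        a[i - 1] = k
        a[:i - 1] = a_prev - k * a_prev[::-1]   # update lower-order coefficients
        E *= (1.0 - k * k)                      # error shrinks at each order
    return a, E
```

The intermediate k values are themselves the reflection (PARCOR) coefficients listed above as an alternative parameter set, so one pass of the recursion yields both representations.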
6. LPC Parameter Conversion to Cepstral Coefficients: a very important parameter set, which can be derived directly from the LPC coefficient set, is the set of LPC cepstral coefficients c(m). The recursion used is

c_m = a_m + Σ_{k=1…m−1} (k/m) c_k a_{m−k},  1 ≤ m ≤ p
c_m = Σ_{k=m−p…m−1} (k/m) c_k a_{m−k},  m > p
The cepstral coefficients are the coefficients of the Fourier transform representation of the log magnitude spectrum. They are a more robust, reliable feature set for speech recognition than the LPC coefficients, the PARCOR coefficients, or the log area ratio coefficients.
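The LPC-to-cepstrum recursion translates directly into code. A sketch (c_0, which depends on the model gain, is omitted, and the function name is illustrative):

```python
import numpy as np

def lpc_to_cepstrum(a, Q):
    """Convert LPC coefficients a(1..p) into Q cepstral coefficients c(1..Q):
    c_m = a_m + sum_{k} (k/m) c_k a_{m-k}, with a_m = 0 for m > p."""
    p = len(a)
    c = np.zeros(Q + 1)                       # index 0 unused here
    for m in range(1, Q + 1):
        acc = a[m - 1] if m <= p else 0.0
        for k in range(max(1, m - p), m):     # only terms with a_{m-k} defined
            acc += (k / m) * c[k] * a[m - k - 1]
        c[m] = acc
    return c[1:]
```

A handy check: for a single-pole model 1/(1 − a z⁻¹) the cepstrum is known in closed form, c_m = a^m / m, and the recursion reproduces it.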
7. Parameter Weighting: because of the sensitivity of the low-order cepstral coefficients to overall spectral slope and the sensitivity of the high-order cepstral coefficients to noise, it is necessary to weight the cepstral coefficients by a tapered window so as to minimize these sensitivities:

ĉ_m = w_m c_m,  with w_m = 1 + (Q/2) sin(πm/Q),  1 ≤ m ≤ Q
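The raised-sine weighting is a one-liner in NumPy; this sketch assumes that particular taper (a common choice for this step), with the function name illustrative:

```python
import numpy as np

def weight_cepstrum(c):
    """Taper c(1..Q) by w_m = 1 + (Q/2) sin(pi*m/Q): de-emphasizes both
    the low-order (spectral-slope-sensitive) and high-order
    (noise-sensitive) cepstral coefficients."""
    Q = len(c)
    m = np.arange(1, Q + 1)
    w = 1.0 + (Q / 2.0) * np.sin(np.pi * m / Q)
    return w * np.asarray(c, dtype=float)
```

The window peaks near m = Q/2 and falls back toward 1 at both ends, so the mid-order coefficients carry the most weight in the recognizer's distance computations.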
By differentiating the Fourier representation of the log magnitude spectrum:
- any fixed spectral slope in the log magnitude spectrum becomes a constant;
- any prominent spectral peak in the log magnitude spectrum (a formant) is well preserved.
Temporal Cepstral Derivative: the cepstral representation of the speech spectrum provides a good representation of the local spectral properties of the signal for the given frame. A better representation can be obtained by also including information about the temporal cepstral derivative, approximated over a finite window of 2K+1 frames:

Δc_m(t) ≈ μ Σ_{k=−K…K} k c_m(t+k)

where μ is an appropriate normalization constant.
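The delta-cepstrum computation above amounts to a least-squares slope fit over 2K+1 neighbouring frames. A sketch (edge handling by clamping to the first/last frame is an assumption for illustration):

```python
import numpy as np

def delta_cepstrum(C, K=2):
    """Temporal cepstral derivative: delta_c(t) = mu * sum_{k=-K..K} k*c(t+k).

    C has shape (n_frames, Q); mu = 1 / sum(k^2) normalizes the fit so a
    linearly changing track yields its true slope."""
    C = np.asarray(C, dtype=float)
    T = len(C)
    mu = 1.0 / np.sum(np.arange(-K, K + 1) ** 2)
    D = np.zeros_like(C)
    for t in range(T):
        for k in range(-K, K + 1):
            D[t] += k * C[min(max(t + k, 0), T - 1)]   # clamp at the edges
        D[t] *= mu
    return D
```

In a recognizer the delta coefficients are appended to the (weighted) cepstral vector, doubling the feature dimension but adding the dynamic spectral information the static cepstrum lacks.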