Speech Processing Final Project

Estimation of pole and zero model in voiced speech by Rafael A Alvarez

Introduction This presentation shows the results of speech estimation using a pole-zero model. The pole-zero model was derived using linear prediction coding (LPC) and homomorphic filtering. The combination of the two methods is called homomorphic prediction. Several signals are analyzed, and the problems encountered in each case are presented.

Objectives This project attempts to address the following areas:
Modeling of speech using a pole-zero model.
Modeling of speech using linear prediction and homomorphic filtering methods.
Results of pole estimation using homomorphic prediction.
Results of zero estimation using inverse filtering and homomorphic prediction.
Problems estimating the model.
Other possible applications.

Speech modeling The complete discrete-time speech production model.
(Block diagram: speech source, gain, mixer, vocal tract and lip radiation.)

Speech modeling Periodic or voiced speech can be modeled with:
(Equation terms: speech signal, gain, glottal flow, vocal tract (poles and zeros), radiation impedance.)

Speech modeling Complete transfer function of the speech signal for voiced sound. (Equation terms: radiation impedance, vocal tract zeros (minimum and maximum phase), glottal flow, vocal tract poles.)
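The equation on this slide did not survive in the transcript. As a hedged reconstruction, one common textbook idealization that matches the four labeled terms is sketched below. All symbols (A, alpha, beta, b_k, d_k, c_k and the product limits) are assumed notation, not taken from the slide, and the slide may have used a different form.

```latex
H(z) = A \, R(z) \, G(z) \, V(z), \qquad
R(z) \approx 1 - \alpha z^{-1}, \qquad
G(z) = \frac{1}{(1 - \beta z)^{2}},
\]
\[
V(z) = \frac{\prod_{k=1}^{M_i} \left(1 - b_k z^{-1}\right) \prod_{k=1}^{M_o} \left(1 - d_k z\right)}
            {\prod_{k=1}^{C} \left(1 - c_k z^{-1}\right)\left(1 - c_k^{*} z^{-1}\right)},
\qquad |b_k|,\ |d_k|,\ |c_k| < 1 .
```

Here R(z) stands for the radiation impedance, G(z) for the glottal flow, the numerator of V(z) for the minimum-phase (b_k) and maximum-phase (d_k) vocal tract zeros, and the denominator of V(z) for the vocal tract poles.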

Linear Prediction Analysis
Linear prediction coding approximates the system using an all-pole model. The zero produced by the radiation at the lips is approximated by a long set of poles (not efficient). The resulting transfer function:
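The transfer function itself is missing from the transcript. A standard way to write an all-pole model is given below as a sketch; the gain A, the order p and the predictor coefficients a_k are assumed notation.

```latex
H(z) = \frac{S(z)}{U_g(z)} = \frac{A}{1 - \sum_{k=1}^{p} a_k z^{-k}}
```

Every root of the denominator is a pole of the model, so a zero can only be imitated by adding more poles, which is why the order p grows when zeros are present.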

Linear prediction analysis
Linear combination of past values (time-domain representation). When the train of unit samples ug[n] = 0, the above equation results in:
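The two equations on this slide were lost. Assuming an all-pole model with gain A, order p and predictor coefficients a_k (assumed notation), the time-domain form and its special case for ug[n] = 0 would read:

```latex
s[n] = \sum_{k=1}^{p} a_k \, s[n-k] + A \, u_g[n]
\]
\[
u_g[n] = 0 \;\Rightarrow\; s[n] = \sum_{k=1}^{p} a_k \, s[n-k]
```

Between excitation pulses, each sample is therefore exactly a linear combination of the p previous samples.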

Linear prediction analysis
Two implementations of this analysis are:
Covariance method: uses the signal values outside the window.
Autocorrelation method: considers the values outside the window to be zero, and uses a window such as the Hamming window.
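As an illustration of the autocorrelation method, here is a minimal NumPy sketch. The function names `levinson_durbin` and `lpc_autocorrelation` and the `use_window` flag are hypothetical helpers introduced for this example, not code from the presentation. The returned array holds the coefficients of A(z) = 1 + c[1] z^-1 + ... + c[p] z^-p, so the predictor coefficients are the negatives of c[1:].

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the autocorrelation normal equations by Levinson-Durbin recursion.

    r     : autocorrelation values r[0..order]
    return: (coefficients of A(z) starting with 1.0, final prediction error)
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]  # assumes a non-silent frame, so r[0] > 0
    for i in range(1, order + 1):
        # Correlation between the current prediction error and lag i
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err  # reflection coefficient
        prev = a.copy()
        a[1:i] = prev[1:i] + k * prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

def lpc_autocorrelation(frame, order, use_window=True):
    """Autocorrelation-method LPC: samples outside the frame are treated as zero."""
    frame = np.asarray(frame, dtype=float)
    if use_window:
        frame = frame * np.hamming(len(frame))  # taper the frame edges
    full = np.correlate(frame, frame, mode="full")
    r = full[len(frame) - 1:len(frame) + order]  # lags 0..order
    return levinson_durbin(r, order)
```

A call such as `lpc_autocorrelation(frame, 10)` would fit a 10-pole model to one frame. The window is optional here only so that the recursion can be checked on an unwindowed impulse response.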

Linear Prediction analysis
Results of LPC using autocorrelation methods

Homomorphic filtering
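Based on the concept of superposition: linearly combined signals can be separated easily. Generalized superposition: non-linearly combined signals can also be separated, provided the required properties apply (the equations are not preserved in this transcript). Canonical formulation of a homomorphic system (block diagram): the input x[n] passes through a characteristic system that maps the input operation to addition, then through a linear system L, then through an inverse characteristic system that produces the output y[n].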

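Homomorphic system for convolution Applying the previous results to signals combined by convolution gives the following chain (block diagram): a transform that turns convolution into multiplication, a logarithm that turns multiplication into addition, a linear system acting on the additive components, and finally an exponential that turns addition back into multiplication.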
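A small numerical sketch may make the convolution-to-addition property concrete. It uses the real cepstrum (log magnitude only), which is a simplification of the full homomorphic system because it discards phase. The function name `real_cepstrum`, the FFT length and the two toy sequences are assumptions chosen for illustration.

```python
import numpy as np

def real_cepstrum(x, nfft):
    """Real cepstrum: inverse DFT of the log magnitude of the DFT."""
    spectrum = np.fft.fft(x, nfft)
    # The small constant guards against log(0); it is an implementation choice
    log_magnitude = np.log(np.abs(spectrum) + 1e-12)
    return np.fft.ifft(log_magnitude).real

# Two short toy sequences and their convolution
x1 = np.array([1.0, 0.5])
x2 = np.array([1.0, -0.3, 0.2])
combined = np.convolve(x1, x2)

nfft = 64  # long enough to hold the full linear convolution
c_sum = real_cepstrum(x1, nfft) + real_cepstrum(x2, nfft)
c_combined = real_cepstrum(combined, nfft)

# Convolution in time has become addition in the cepstral domain
print(np.max(np.abs(c_combined - c_sum)))
```

The printed difference is at the level of floating-point rounding, which is the property that lets a lifter pick one component and discard the other.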

Algorithm to combine homomorphic filtering and LPC: w[n] x s[n] -> cepstrum -> liftering -> inverse cepstrum -> LPC. With the liftering operation (filtering in the quefrency domain), convolutionally combined signals can be separated.
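The chain above can be sketched as follows. This is a simplified variant under stated assumptions, not the presenter's implementation: it uses the real cepstrum, a symmetric low-time lifter, and obtains the autocorrelation for LPC directly from the smoothed power spectrum. The function name `homomorphic_lpc` and the parameters `lifter_cutoff` and `nfft` are hypothetical.

```python
import numpy as np

def homomorphic_lpc(frame, order, lifter_cutoff, nfft=1024):
    """Window -> cepstrum -> low-time liftering -> inverse cepstrum -> LPC."""
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))          # w[n] x s[n]

    # Cepstrum of the windowed frame (real cepstrum, phase discarded)
    log_mag = np.log(np.abs(np.fft.fft(windowed, nfft)) + 1e-12)
    cepstrum = np.fft.ifft(log_mag).real

    # Low-time lifter: keep quefrencies below the cutoff on both sides
    lifter = np.zeros(nfft)
    lifter[:lifter_cutoff] = 1.0
    lifter[nfft - lifter_cutoff + 1:] = 1.0

    # Inverse cepstrum: back to a smoothed log spectrum, then to power
    smooth_log_mag = np.fft.fft(cepstrum * lifter).real
    power = np.exp(2.0 * smooth_log_mag)

    # Autocorrelation of the smoothed spectrum, then the LPC normal equations
    r = np.fft.ifft(power).real[:order + 1]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    predictor = np.linalg.solve(R, r[1:order + 1])

    # Coefficients of A(z) = 1 - sum_k predictor[k-1] z^-k
    return np.concatenate(([1.0], -predictor))
```

The cutoff would normally be chosen below the pitch period in samples, so that the pitch peaks in the cepstrum are removed before the all-pole fit.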

Homomorphic filtering combined with LPC, Homomorphic prediction

Homomorphic prediction
Combining the previous methods, we can derive a pole-zero model estimation method. Recall the pole-zero model of speech given earlier, and recall that multiplication in the frequency domain becomes convolution in the time domain.

From the previous equation we have: S(z) = original signal, P(z) = glottal flow train, B(z) = vocal tract zeros, A(z) = vocal tract poles. The system zeros and glottal flow poles can be obtained by filtering the signal with the inverse of the vocal tract poles (inverse filtering).
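The relation between these four terms is not preserved in the transcript. Reading A(z) as the denominator polynomial whose roots are the vocal tract poles (an assumption, though it agrees with the later block diagram in which inverse filtering S(z) yields B(z)P(z)), the model and the inverse-filtering step can be written as:

```latex
S(z) = \frac{P(z)\,B(z)}{A(z)}
\qquad\Longrightarrow\qquad
S(z)\,A(z) = B(z)\,P(z)
```

Multiplying by A(z) cancels the poles and leaves only the zeros and the glottal flow train.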

An algorithm to estimate poles and zeros can be derived. First we obtain an approximation of the vocal tract poles as presented before: w[n] x s[n] -> LPC. Here w[n] must window an area free of zeros and glottal flow poles. The resulting impulse response should represent all the poles in the system. This result can then be used to inverse filter the original signal.
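The inverse-filtering step can be illustrated with a toy example. The helper names `all_pole_synthesize` and `inverse_filter`, the pole polynomial and the excitation are all assumptions made up for this sketch. The pole polynomial is written as A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p.

```python
import numpy as np

def all_pole_synthesize(excitation, a):
    """Filter an excitation through 1/A(z), with a = [1, a1, ..., ap]."""
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = excitation[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * s[n - k]
        s[n] = acc
    return s

def inverse_filter(s, a):
    """Filter s through A(z), which cancels the poles of 1/A(z)."""
    return np.convolve(s, a)[:len(s)]

# Toy excitation: an impulse train (period 50) convolved with one zero
excitation = np.zeros(200)
excitation[::50] = 1.0
excitation = np.convolve(excitation, [1.0, -0.5])[:200]

# Toy pole pair at radius 0.9 and angle 0.3*pi
a = np.array([1.0, -2 * 0.9 * np.cos(0.3 * np.pi), 0.81])

s = all_pole_synthesize(excitation, a)
residual = inverse_filter(s, a)  # should match the excitation
```

With the exact poles, the residual reproduces the excitation sample by sample. With estimated poles the cancellation is only approximate, which is why the accuracy of the pole estimate matters for the later steps.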

Zero and glottal flow deconvolution. Separate the glottal flow from the zeros, then separate the minimum- and maximum-phase zeros. (Block diagram: S(z) -> inverse filtering -> B(z)P(z) -> cepstrum -> high and low liftering -> inverse cepstrum, giving P(z) and B(z); B(z) is then liftered again and inverted to give Bmax(z) and Bmin(z).)
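The first separation step can be sketched on a toy residual. In this example the compact zero term ends up at low quefrency and the periodic train at high quefrency. The helper names, the cutoff of 20 samples, the period of 40 samples and the decaying pulse train are all assumptions for illustration, and the real cepstrum used here cannot separate minimum-phase from maximum-phase zeros (that needs the complex cepstrum).

```python
import numpy as np

def real_cepstrum(x, nfft):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.fft(x, nfft)) + 1e-12)
    return np.fft.ifft(log_mag).real

def split_cepstrum(cepstrum, cutoff):
    """Split a cepstrum into low-time and high-time parts (symmetric lifter)."""
    n = len(cepstrum)
    low_mask = np.zeros(n)
    low_mask[:cutoff] = 1.0
    low_mask[n - cutoff + 1:] = 1.0
    low = cepstrum * low_mask
    high = cepstrum - low
    return low, high

# Toy inverse-filtered signal: decaying pulse train convolved with one zero
period = 40
train = np.zeros(512)
pulses = len(train[::period])
train[::period] = 0.9 ** np.arange(pulses)
residual = np.convolve(train, [1.0, -0.5])

c = real_cepstrum(residual, 1024)
low, high = split_cepstrum(c, cutoff=20)

# The strongest high-time peak sits at the pulse spacing
pitch_lag = 20 + int(np.argmax(high[20:512]))
print(pitch_lag)
```

Taking the DFT of `low` or `high` and exponentiating would give the smoothed spectrum of each component, which is the inverse-cepstrum step in the diagram.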

Example of “quefrency” domain signal of a voiced signal.

Results Signal #1 First we examine a simple case: a synthesized signal. (Figure: the synthesized signal with the pitch period marked.)

Zero-free area of the original signal, pitch-synchronized method. L = glottal width (19), M = number of vocal tract zeros (2), I = zero-free area of the vocal tract (2 poles). (Figure: regions L, M and I marked on the signal.)

Results Frequency response of the estimated poles of the vocal tract

Results Inverse-Filtered signal
After filtering only the glottal flow train and zeros remain

Results Cepstrum of inverse-filtered signal
By liftering (filtering) the high and low parts of the cepstrum, the glottal flow can be separated from the zeros.

Results Approximated glottal flow and zeros

Results Signal #2 Second, a more realistic signal.

Results Signal #2. Problems: How many zeros?
What is the length of the glottal flow? How many poles?

Results Signal #2 Why are the results so different?

Results Signal #2 Signal #1
The autocorrelation function of Signal #2 shows aliasing. The method will not work if the signal was not sampled at a high enough frequency.

Results Signal #3. Voice recorded at 20 kHz
Area extracted for processing

Results Extract the area free of zeros. Problems:
How many zeros? 6 zeros.
What is the length of the glottal flow? 38, since the sampling rate is 20 kHz.
How many poles? 10 poles.

Results Spectrum of zero-free area, all-pole approximation and original signal

Results Resulting inverse-filtered signal
The area should be flat if all the poles were approximated accurately. Compared with other areas, it seems flat.

Possible enhancements
Implement an iterative algorithm that optimizes the results by trying different combinations of values for the variables: length of the glottal pulse, number of zeros, number of poles.
Try different approaches using the homomorphic filtering (HF) and LPC tools to get a better approximation.
Use homomorphic filtering to remove the zeros and/or glottal flow first.
Use a pitch estimation algorithm to better establish the pitch period.
Establish a better relationship between the zeros and poles in the quefrency domain.

Problems Problems in the method:
Requires a good approximation of the area free of zeros.
Requires a good approximation of the number of zeros, the number of poles and the length of the glottal flow.
Requires a good all-pole approximation.
Requires a high sampling rate for the original signal.
May not work for high-pitched voices.

Applications Possible applications include:
Speech synthesis: recreating the human voice.
Speech processing: human-machine interaction.
Speaker recognition: extraction of key features of the speaker.
