Speech Processing Final Project

Speech Processing Final Project: Estimation of a Pole-Zero Model in Voiced Speech, by Rafael A. Alvarez

Introduction This presentation shows the results of speech estimation using a pole-zero model. The model was derived using linear predictive coding (LPC) and homomorphic filtering; the combination of the two methods is known as homomorphic prediction. Several signals are analyzed and the problems encountered in each case are presented.

Objectives This project attempts to address the following areas:
- Modeling of speech using a pole-zero model
- Modeling of speech using linear prediction and homomorphic filtering methods
- Results of pole estimation using homomorphic prediction
- Results of zero estimation using inverse filtering and homomorphic prediction
- Problems in estimating the model
- Other possible applications

Speech modeling The complete discrete-time speech production model. (Block diagram: speech source, gain, mixer, vocal tract, and lip radiation.)

Speech modeling Periodic or voiced speech can be modeled with a gain, the glottal flow, the vocal tract (poles and zeros), and the radiation impedance, which together produce the speech signal.

Speech modeling The complete transfer function of the speech signal for a voiced sound contains the glottal flow, the vocal tract zeros (minimum and maximum phase), the vocal tract poles, and the radiation impedance.
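The slide's equation image did not survive the transcript; a standard form consistent with these labels (the gain A and the exact factor grouping are assumptions) is

S(z) = A \, G(z) \, \frac{\prod_{k=1}^{M_i}(1 - b_k z^{-1}) \; \prod_{k=1}^{M_o}(1 - c_k z)}{\prod_{k=1}^{N}(1 - a_k z^{-1})} \, R(z)

where G(z) is the glottal flow, the numerator collects the minimum-phase and maximum-phase vocal tract zeros, the denominator holds the vocal tract poles, and R(z) is the radiation impedance (often approximated as 1 - z^{-1}).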

Linear Prediction Analysis Linear predictive coding approximates the system using an all-pole model. A zero, such as the one produced by the radiation at the lips, is approximated by a long set of poles (not efficient). The resulting transfer function is shown below.
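The transfer function referred to on the slide is the standard all-pole LPC form

H(z) = \frac{A}{1 - \sum_{k=1}^{p} a_k z^{-k}}

with p the model order, a_k the prediction coefficients, and A the gain.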

Linear prediction analysis In the time domain the model expresses each sample as a linear combination of past values plus the excitation. When the train of unit samples u_g[n] = 0, the equation reduces to a pure prediction from past samples, as written below.
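Written out, the two time-domain equations implied on the slide are (with u_g[n] the glottal excitation and A the gain):

s[n] = \sum_{k=1}^{p} a_k \, s[n-k] + A \, u_g[n], \qquad u_g[n] = 0 \;\Rightarrow\; s[n] = \sum_{k=1}^{p} a_k \, s[n-k].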

Linear prediction analysis Two implementations of this analysis are:
- Covariance method: considers the values outside the analysis window.
- Autocorrelation method: assumes the values outside the window are zero and applies a tapering window such as the Hamming window.
A sketch of the autocorrelation method is given below.
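As a concrete illustration of the autocorrelation method, here is a minimal Python/NumPy sketch using the Levinson-Durbin recursion (the project does not show code; the function name and defaults are mine):

```python
import numpy as np

def lpc_autocorrelation(frame, order):
    """Autocorrelation-method LPC: fit an all-pole model to one frame.

    Returns the prediction-error filter A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p
    and the final prediction error. The slide's prediction coefficients
    (s[n] ~ sum_k a_k s[n-k]) are the negatives of a[1:].
    """
    # Taper the frame so samples outside the window act as zeros.
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))

    # Autocorrelation lags r[0..order].
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])

    # Levinson-Durbin recursion.
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                    # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err
```

For example, a, err = lpc_autocorrelation(frame, order=10) gives an all-pole model H(z) ≈ sqrt(err) / A(z) for that frame.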

Linear prediction analysis Results of LPC using the autocorrelation method.

Homomorphic filtering Ordinary filtering is based on the concept of superposition and can easily separate linearly combined signals. Generalized superposition extends this to signals combined by a non-linear operation, provided certain properties hold. The canonical formulation of a homomorphic system is a characteristic system that maps the non-linear combination to addition, an ordinary linear system L operating on x[n] to give y[n], and the inverse characteristic system.
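The properties the slide alludes to are the usual statement of generalized superposition (standard formulation, not taken verbatim from the slide), for a system H with input combination rule □, output rule ○, and scalar rule ::

H\{x_1[n] \,\Box\, x_2[n]\} = H\{x_1[n]\} \,\circ\, H\{x_2[n]\}, \qquad H\{c : x[n]\} = c \cdot H\{x[n]\}.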

Homomorphic filtering Homomorphic system for convolution: applying the canonical form to signals combined by convolution gives a characteristic system built from the Fourier transform, the logarithm, and the inverse transform; the inverse characteristic system uses the exponential in place of the logarithm.
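Concretely, for a signal formed by convolution, x[n] = x_1[n] * x_2[n], the characteristic system is the cepstrum:

\hat{x}[n] = \mathcal{F}^{-1}\{\log \mathcal{F}\{x[n]\}\} = \hat{x}_1[n] + \hat{x}_2[n]

so the convolved components become additive in the quefrency domain and can be separated by ordinary linear filtering (liftering).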

Homomorphic filtering Algorithm combining homomorphic filtering and LPC: window the signal (s[n] multiplied by w[n]), take the cepstrum, lifter it, take the inverse cepstrum, and run LPC on the result. With the liftering operation (filtering in the quefrency domain) convolutionally combined signals can be separated. A sketch of this pipeline is given below.
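A rough sketch of the window → cepstrum → liftering → inverse cepstrum → LPC pipeline, using the real cepstrum and the lpc_autocorrelation function from the earlier sketch (the cutoff value and FFT size are illustrative only):

```python
import numpy as np

def real_cepstrum(x, nfft=1024):
    """Characteristic system for convolution: DFT -> log magnitude -> inverse DFT."""
    spectrum = np.fft.rfft(x, nfft)
    return np.fft.irfft(np.log(np.abs(spectrum) + 1e-12), nfft)

def homomorphic_then_lpc(s, w, order=10, cutoff=30, nfft=1024):
    """Lifter away the high-quefrency (excitation) part of the windowed
    signal, return to the time domain, then fit an all-pole model."""
    c = real_cepstrum(s * w, nfft)            # cepstrum of s[n] w[n]

    lifter = np.zeros(nfft)                   # keep only low-quefrency content
    lifter[:cutoff] = 1.0
    lifter[-(cutoff - 1):] = 1.0              # mirror of the low-time region

    smooth_log_spec = np.fft.rfft(c * lifter, nfft).real
    envelope = np.exp(smooth_log_spec)        # smoothed magnitude spectrum
    h = np.fft.fftshift(np.fft.irfft(envelope, nfft))  # zero-phase smoothed signal
    return lpc_autocorrelation(h, order)      # from the earlier sketch
```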

Homomorphic filtering Homomorphic filtering combined with LPC is known as homomorphic prediction.

Homomorphic prediction Combining the previous techniques we can derive a pole-zero model estimation method. Recall the pole-zero model of speech given earlier, and that multiplication in the frequency domain corresponds to convolution in the time domain.

Homomorphic prediction From the previous equation, with S(z) the original signal, P(z) the glottal flow train, B(z) the vocal tract zeros, and A(z) the vocal tract poles, the system zeros and the glottal flow contribution can be obtained by filtering the signal with the inverse of the vocal tract poles (inverse filtering).
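Written out, the relation the slide relies on is (using g for the gain, an assumption, to avoid clashing with A(z)):

S(z) = g \, \frac{P(z) \, B(z)}{A(z)} \quad\Longrightarrow\quad A(z) \, S(z) = g \, P(z) \, B(z)

so filtering s[n] with the estimated prediction-error filter A(z) removes the vocal tract poles and leaves the glottal flow train convolved with the zero component.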

Homomorphic prediction An algorithm to estimate poles and zeros can be derived. First we obtain an approximation of the vocal tract poles as presented before: multiply s[n] by a window w[n] and run LPC, where w[n] must cover an area free of zeros and glottal flow poles. The resulting impulse response should represent all the poles in the system, and this result can then be used to inverse filter the original signal. A sketch of these two steps follows.
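A minimal sketch of the two steps (the segment location, length, and order are chosen by hand; SciPy's lfilter performs the inverse filtering; lpc_autocorrelation is the function from the earlier sketch):

```python
from scipy.signal import lfilter

def estimate_poles_and_inverse_filter(s, start, length, order=10):
    """Fit the vocal tract poles on a pitch-synchronous, (assumed) zero-free
    segment of s[n], then inverse filter the whole signal with the
    prediction-error filter A(z), i.e. compute y[n] with Y(z) = A(z) S(z)."""
    segment = s[start:start + length]              # zero-free area, chosen by hand
    a, _ = lpc_autocorrelation(segment, order)     # poles of the vocal tract
    residual = lfilter(a, [1.0], s)                # ~ glottal flow train * zeros
    return a, residual
```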

Homomorphic prediction Zero and glottal flow deconvolution: after inverse filtering S(z) down to B(z)P(z), the cepstrum is liftered to separate the glottal flow from the zeros, and to separate the minimum- and maximum-phase zeros. (Block diagram: S(z) → inverse filtering → B(z)P(z) → cepstrum; liftering of different quefrency regions followed by the inverse cepstrum yields P(z), Bmin(z), and Bmax(z).) A sketch of the cepstral separation follows.
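A sketch of the quefrency-domain separation using the real cepstrum (a simplification: separating minimum- from maximum-phase zeros, as the slide's diagram does, would require the complex cepstrum with phase unwrapping; the cutoff choice is illustrative):

```python
import numpy as np

def lifter_separate(residual, pitch_period, nfft=1024):
    """Split the inverse-filtered signal B(z)P(z) in the quefrency domain:
    low-quefrency content ~ the zero component B(z); high-quefrency content,
    with peaks at multiples of the pitch period, ~ the glottal flow train P(z).
    Returns the magnitude spectra of the two parts."""
    spectrum = np.fft.rfft(residual, nfft)
    c = np.fft.irfft(np.log(np.abs(spectrum) + 1e-12), nfft)

    cut = pitch_period // 2                 # split point, in quefrency samples
    low = np.zeros(nfft)
    low[:cut] = 1.0
    low[-(cut - 1):] = 1.0                  # mirror of the low-time region
    high = 1.0 - low

    b_mag = np.exp(np.fft.rfft(c * low, nfft).real)    # smooth part -> B(z)
    p_mag = np.exp(np.fft.rfft(c * high, nfft).real)   # periodic part -> P(z)
    return b_mag, p_mag
```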

Homomorphic prediction Example of a quefrency-domain (cepstral) signal for a voiced sound.

Results Signal #1: First we examine the simple case of a synthesized signal (the figure marks the pitch period).

Homomorphic prediction Zero-free area of the original signal, pitch-synchronous method: L = glottal width (19 samples), M = number of vocal tract zeros (2), I = zero-free area of the vocal tract (2 poles).

Results Frequency response of the estimated poles of the vocal tract

Results Inverse-filtered signal: after filtering, only the glottal flow train and the zeros remain.

Results Cepstrum of the inverse-filtered signal: by liftering (filtering) the high and low parts of the cepstrum, the glottal flow can be separated from the zeros.

Results Approximated glottal flow and zeros

Results Signal #2: Second, a more realistic signal.

Results Signal #2. Problems: How many zeros? What is the length of the glottal flow? How many poles?

Results Signal #2 Why are the results so different?

Results Signal #2 vs. Signal #1: the autocorrelation function of Signal #2 shows aliasing. The method will not work if the signal was not sampled at a high enough frequency.

Results Signal #3: Voice recorded at 20 kHz (the figure marks the area extracted for processing).

Results Extract an area free of zeros. Parameter choices:
- How many zeros? 6 zeros.
- What is the length of the glottal flow? 38 samples, since the sampling rate is 20 kHz.
- How many poles? 10 poles.

Results Spectrum of zero-free area, all-pole approximation and original signal

Results Resulting inverse-filtered signal: the area should be flat if all the poles were approximated accurately. Compared with other areas, it seems flat.

Possible enhancements
- Implement an iterative algorithm that optimizes the results by trying different values for the key variables: length of the glottal pulse, number of zeros, number of poles.
- Try different approaches using the homomorphic filtering and LPC tools to get a better approximation, for example using homomorphic filtering to remove the zeros and/or the glottal flow first.
- Use a pitch estimation algorithm to better establish the pitch period.
- Establish a better relationship between the zeros and poles in the quefrency domain.

Problems Problems in the method:
- Requires a good estimate of the area free of zeros.
- Requires a good estimate of the number of zeros, the number of poles, and the length of the glottal flow.
- Requires an accurate all-pole approximation.
- Requires a high sampling rate for the original signal.
- May not work for high-pitched voices.

Applications Possible applications include:
- Speech synthesis: recreating the human voice.
- Speech processing: machine-human interaction.
- Speaker recognition: extraction of key features of the speaker.