
Speech Recognition Chapter 3

Speech Front-Ends: Linear Prediction Analysis, Linear-Prediction-Based Processing, Cepstral Analysis, Auditory Signal Processing.

Linear Prediction Analysis: Introduction, the Linear Prediction Model, Linear Prediction Coefficient Computation, Linear Prediction for Automatic Speech Recognition, Linear Prediction in Speech Processing, and How Good Is the LP Model?

Signal Processing Front End: converts the speech waveform s(k) into some type of parametric representation, using either a filterbank front end or a linear prediction front end; the output is the observation sequence O = o(1) o(2) … o(T) (e.g. linear prediction coefficients).

Introduction. Over short intervals, linear prediction provides a good model of the speech signal. It is mathematically precise and simple, easy to implement in software or hardware, and works well for recognition applications. It also has applications in formant and pitch estimation, speech coding and speech synthesis.

Linear Prediction Model. Basic idea: each speech sample is approximated by a linear combination of past samples, s(n) ≈ Σ_{k=1}^{M} a_k s(n−k), where the a_k are called the LP (Linear Prediction) coefficients. By including the excitation signal we obtain s(n) = Σ_{k=1}^{M} a_k s(n−k) + G u(n), where u(n) is the normalised excitation and G is the gain of the excitation.

In the z-domain (Sec. 1.1.4, p. 15, Deller) this becomes S(z) = Σ_{k=1}^{M} a_k z^{-k} S(z) + G U(z), leading to the all-pole transfer function H(z) = S(z) / (G U(z)) = 1 / (1 − Σ_{k=1}^{M} a_k z^{-k}) = 1 / A(z) (Fig. 3.27).
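As a rough illustration of this synthesis model, the sketch below (assuming NumPy and SciPy; the coefficient values, gain and pitch period are hypothetical) drives the all-pole filter H(z) = G/A(z) with a normalised impulse-train excitation:

```python
import numpy as np
from scipy.signal import lfilter

# Hypothetical LP coefficients a_1..a_M and gain G, for illustration only.
a = np.array([1.3, -0.7, 0.2])
G = 0.1

# A(z) = 1 - sum_k a_k z^-k, so the all-pole synthesis filter is H(z) = G / A(z).
A = np.concatenate(([1.0], -a))

# Normalised excitation u(n): an impulse train (pitch period of 80 samples)
# as a stand-in for voiced speech.
u = np.zeros(400)
u[::80] = 1.0

# Synthesised speech: s(n) = sum_k a_k s(n-k) + G u(n).
s = lfilter([G], A, u)
```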

The LP model retains the spectral magnitude, but it is minimum phase (Sec. 1.1.7, Deller). In practice, however, phase is not very important for speech perception. Observation: H(z) also models the glottal filter G(z) and the lip radiation R(z).

Linear Prediction Coefficient Computation: Introduction, Methodologies.

Linear Prediction Coefficient Computation. The LP coefficients are obtained by minimising the mean squared prediction error, which leads to the following system of equations (Sec. 3.3.2, proof): Σ_{k=1}^{M} a_k φ(i, k) = φ(i, 0), i = 1, …, M, where φ(i, k) is the correlation of the speech samples within the analysis frame.

Methodologies: the Autocorrelation Method and the Covariance Method (the latter is not commonly used in speech recognition).

Autocorrelation Method. Assumptions: each frame is independent of the others (Fig. 3.29). Solution (Juang, Sec. 3.3.3, pp. 105-106): Σ_{k=1}^{M} a_k R(|i − k|) = R(i), i = 1, …, M, (2) where R(i) is the short-time autocorrelation of the windowed frame and M is the number of LPC parameters. These equations are known as the Yule-Walker equations.

Using matrix notation: R a = r, where R is the M × M autocorrelation matrix with entries R(|i − k|), a is the vector of LP coefficients and r = [R(1), …, R(M)]ᵀ; or a = R⁻¹ r.

Features of this matrix: it is symmetric and the elements along each diagonal are the same: a Toeplitz matrix.

This matrix is known as a Toeplitz matrix. A linear system with this matrix can be solved very efficiently. Examples (Figs. 3.32 and 3.33), (Fig. 3.34), (Fig. 3.35), (Fig. 3.36).
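A minimal sketch of solving the Yule-Walker system while exploiting the Toeplitz structure, assuming NumPy/SciPy; scipy.linalg.solve_toeplitz uses a Levinson-type recursion, and the frame and order below are illustrative only:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def autocorr_lpc(frame, M=8):
    """Solve the Yule-Walker equations R a = r for one windowed frame."""
    frame = np.asarray(frame, dtype=float)
    # Short-time autocorrelation R(0) .. R(M).
    R = np.array([np.dot(frame[:len(frame) - i], frame[i:]) for i in range(M + 1)])
    # Toeplitz system: first column R(0)..R(M-1), right-hand side R(1)..R(M).
    return solve_toeplitz(R[:M], R[1:M + 1])   # LP coefficients a_1 .. a_M

# Illustrative frame: a noisy sinusoid.
rng = np.random.default_rng(0)
frame = np.sin(0.2 * np.arange(200)) + 0.01 * rng.standard_normal(200)
print(autocorr_lpc(frame, M=8))
```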

Linear Prediction for Automatic Speech Recognition. The LPC front end consists of: preemphasis (flattens the spectrum), frame blocking, windowing (to minimise signal discontinuities), autocorrelation analysis (equation (2), usually M = 8), LPC analysis (Durbin's algorithm), conversion to cepstral coefficients, parameter weighting (to minimise noise sensitivity), and temporal derivatives (to incorporate signal dynamics).

Preemphasis. The transfer function of the glottis can be modelled as two real poles close to z = 1: G(z) ≈ 1 / (1 − α z⁻¹)², α ≈ 1. The radiation effect can be modelled as a single zero close to z = 1: R(z) ≈ 1 − z⁻¹.

The radiation zero cancels one of the glottal poles. Hence, to obtain the transfer function of the vocal tract alone, the other pole must be cancelled by a preemphasis filter, H_pre(z) = 1 − a z⁻¹, with a close to 1 (typically a ≈ 0.95).

Preemphasis should be applied only to sonorant sounds. The process can be automated by choosing the preemphasis coefficient adaptively, a = R(1)/R(0), where R(·) is the autocorrelation function of the frame.
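A minimal preemphasis sketch, assuming NumPy; a = 0.95 is a commonly used fixed value, and the adaptive choice a = R(1)/R(0) follows the automated rule above:

```python
import numpy as np

def preemphasis(frame, a=None):
    """Apply H_pre(z) = 1 - a z^-1.  If a is None, choose it adaptively
    as R(1)/R(0), the automated rule described above."""
    frame = np.asarray(frame, dtype=float)
    if a is None:
        r0 = np.dot(frame, frame)
        r1 = np.dot(frame[:-1], frame[1:])
        a = r1 / r0 if r0 > 0 else 0.0
    out = np.empty_like(frame)
    out[0] = frame[0]
    out[1:] = frame[1:] - a * frame[:-1]
    return out

# y = preemphasis(x, a=0.95)   # common fixed coefficient
# y = preemphasis(x)           # adaptive, a = R(1)/R(0)
```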

Frame blocking: frames of N samples, with a shift of M samples between consecutive frames (here M denotes the frame shift, not the LPC order).

Windowing: minimises the signal discontinuities at the edges of the frames. A typical window is the Hamming window, w(n) = 0.54 − 0.46 cos(2πn / (N − 1)).
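A possible framing-and-windowing sketch, assuming NumPy; the frame size N = 240 and shift M = 80 are hypothetical example values (e.g. 30 ms frames with a 10 ms shift at 8 kHz):

```python
import numpy as np

def frame_and_window(signal, N=240, M=80):
    """Split the signal into frames of N samples shifted by M samples and
    apply a Hamming window (0.54 - 0.46 cos(2 pi n / (N - 1))) to each."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - N) // M
    window = np.hamming(N)
    return np.stack([signal[t * M: t * M + N] * window for t in range(n_frames)])
```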

LPC Analysis: converts the autocorrelation coefficients into an LPC “parameter set”: LPC coefficients, reflection (PARCOR) coefficients, or log area ratio coefficients. The formal method for obtaining the LPC parameter set is known as Durbin’s method.

Durbin’s method: E_0 = R(0); for i = 1, …, M: k_i = [R(i) − Σ_{j=1}^{i−1} a_j^(i−1) R(i − j)] / E_{i−1}; a_i^(i) = k_i; a_j^(i) = a_j^(i−1) − k_i a_{i−j}^(i−1) for j = 1, …, i − 1; E_i = (1 − k_i²) E_{i−1}. The final LP coefficients are a_j = a_j^(M), j = 1, …, M.
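A sketch of Durbin’s recursion as written above, assuming NumPy:

```python
import numpy as np

def durbin(R, M):
    """Durbin's recursion.  R holds the autocorrelations R(0)..R(M).
    Returns the LP coefficients a_1..a_M, the reflection (PARCOR)
    coefficients k_1..k_M and the final prediction error E."""
    a = np.zeros(M)
    k = np.zeros(M)
    E = R[0]
    for i in range(1, M + 1):
        # Reflection coefficient k_i.
        k[i - 1] = (R[i] - np.dot(a[:i - 1], R[i - 1:0:-1])) / E
        # Update the predictor coefficients of order i.
        a_new = a.copy()
        a_new[i - 1] = k[i - 1]
        for j in range(i - 1):
            a_new[j] = a[j] - k[i - 1] * a[i - 2 - j]
        a = a_new
        # Update the prediction error.
        E *= 1.0 - k[i - 1] ** 2
    return a, k, E
```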

LPC (Typical values)

LPC Parameter Conversion: conversion to cepstral coefficients, a robust feature set for speech recognition. Algorithm: c_0 = ln G²; c_n = a_n + Σ_{k=1}^{n−1} (k/n) c_k a_{n−k} for 1 ≤ n ≤ M; c_n = Σ_{k=n−M}^{n−1} (k/n) c_k a_{n−k} for n > M.
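A sketch of the LPC-to-cepstrum recursion above, assuming NumPy; the number of cepstral coefficients Q is a free parameter (often somewhat larger than M):

```python
import numpy as np

def lpc_to_cepstrum(a, G, Q):
    """Convert LP coefficients a_1..a_M and gain G into Q cepstral
    coefficients c_1..c_Q using the recursion given above."""
    M = len(a)
    c = np.zeros(Q + 1)
    c[0] = np.log(G ** 2)
    for n in range(1, Q + 1):
        acc = sum((k / n) * c[k] * a[n - k - 1] for k in range(max(1, n - M), n))
        c[n] = acc + (a[n - 1] if n <= M else 0.0)
    return c[1:]
```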

Parameter weighting: the low-order cepstral coefficients are highly sensitive to noise (and to the overall spectral slope), so the cepstrum is weighted (liftered); a common choice is the raised-sine lifter w_m = 1 + (Q/2) sin(πm/Q), m = 1, …, Q.
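A sketch of the raised-sine weighting, assuming NumPy:

```python
import numpy as np

def lifter(c, Q=None):
    """Weight cepstral coefficients c_1..c_Q with the raised-sine lifter
    w_m = 1 + (Q/2) sin(pi m / Q), de-emphasising the noise-sensitive
    low-order (and the high-order) coefficients."""
    c = np.asarray(c, dtype=float)
    Q = Q or len(c)
    m = np.arange(1, Q + 1)
    return c[:Q] * (1.0 + (Q / 2.0) * np.sin(np.pi * m / Q))
```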

Temporal Cepstral Derivatives: first- or second-order derivatives are enough. The derivative can be approximated by a regression (least-squares) fit over a window of ±K frames: Δc_m(t) ≈ Σ_{k=−K}^{K} k · c_m(t + k) / Σ_{k=−K}^{K} k².
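A sketch of the regression-based delta computation, assuming NumPy; the window half-width K = 2 is a typical but hypothetical choice, and edge frames are handled by simple padding:

```python
import numpy as np

def delta(cepstra, K=2):
    """Regression approximation of the temporal derivative of a cepstral
    trajectory: delta_c(t) = sum_k k (c(t+k) - c(t-k)) / (2 sum_k k^2),
    with k = 1..K.  `cepstra` has shape (n_frames, Q)."""
    cepstra = np.asarray(cepstra, dtype=float)
    padded = np.pad(cepstra, ((K, K), (0, 0)), mode="edge")
    d = np.zeros_like(cepstra)
    for k in range(1, K + 1):
        d += k * (padded[K + k: K + k + len(cepstra)]
                  - padded[K - k: K - k + len(cepstra)])
    return d / (2.0 * sum(i * i for i in range(1, K + 1)))
```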


Hamming-windowed frame: large prediction errors at the start of the frame, since the speech there is predicted from previous samples that are arbitrarily set to zero.

Large prediction errors at the start of the frame, since the speech is predicted from previous samples that are arbitrarily set to zero.

Unvoiced signals are not position sensitive: the prediction error shows no special effect at the frame edges.

Observe the “whitening” phenomenon in the error spectrum.

Observe the periodic behaviour of the error waveform; this is taken as the basis for pitch estimators.

Observe that a sharp decrease in the prediction error is obtained already for small values of M (M = 1…4), and that the unvoiced signal has a higher RMS error.

Observe the ability of the all-pole model to match the spectrum.

Linear Prediction in Speech Processing: LPC for Vocal Tract Shape Estimation, LPC for Pitch Detection, LPC for Formant Detection.

LPC for Vocal Tract Shape Estimation: preemphasis (so that the signal is free of glottis and radiation effects), windowing (to minimise signal discontinuities), parameter calculation, and vocal tract shape estimation.

Parameter Calculation: either Durbin’s method (as in speech recognition; in this case the autocorrelation analysis must be performed first) or a lattice filter.

Lattice Filter: the reflection coefficients are obtained directly from the signal, avoiding the autocorrelation analysis. Methods: Itakura-Saito (PARCOR), Burg, and newer forms. Advantage: easier to implement in hardware. Disadvantage: needs around 5 times more computation.

Itakura-Saito (PARCOR): k_m = Σ_n f_{m−1}(n) b_{m−1}(n−1) / sqrt( Σ_n f_{m−1}(n)² · Σ_n b_{m−1}(n−1)² ), where f and b are the forward and backward prediction errors of the lattice and the sums accumulate over time n. It can be shown that the PARCOR coefficients obtained by the Itakura-Saito method are exactly the same as the reflection coefficients obtained by the Levinson-Durbin algorithm. Example.

Burg: k_m = 2 Σ_n f_{m−1}(n) b_{m−1}(n−1) / ( Σ_n f_{m−1}(n)² + Σ_n b_{m−1}(n−1)² ), i.e. the geometric-mean denominator of the PARCOR formula is replaced by the arithmetic mean of the forward and backward error energies. Example.
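A sketch of Burg’s lattice estimation, assuming NumPy; the comment marks where the Itakura-Saito (geometric-mean) denominator would differ:

```python
import numpy as np

def burg_reflection(x, M):
    """Reflection coefficients estimated directly from the signal with
    Burg's method (no explicit autocorrelation analysis)."""
    x = np.asarray(x, dtype=float)
    f = x.copy()            # forward prediction error
    b = x.copy()            # backward prediction error
    k = np.zeros(M)
    for m in range(M):
        fm = f[m + 1:]      # f_m(n),   n = m+1 .. N-1
        bm = b[m:-1]        # b_m(n-1), n = m+1 .. N-1
        # Burg: arithmetic-mean denominator.  The Itakura-Saito (PARCOR)
        # estimate would use sqrt(sum(fm**2) * sum(bm**2)) instead.
        k[m] = 2.0 * np.dot(fm, bm) / (np.dot(fm, fm) + np.dot(bm, bm))
        # Lattice update of the forward and backward errors.
        f_next = fm - k[m] * bm
        b_next = bm - k[m] * fm
        f[m + 1:] = f_next
        b[m + 1:] = b_next
    return k
```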

Example: Itakura-Saito vs. Burg.

New Forms: P. Strobach, “New forms of Levinson and Schur algorithms,” IEEE Signal Processing Magazine, pp. 12-36, 1991.

Vocal Tract Shape Estimation. From the reflection (PARCOR) coefficients k_m we obtain the ratios of adjacent cross-sectional areas of a lossless acoustic-tube model of the vocal tract. Therefore, by setting the lip area to an arbitrary value we can obtain the vocal tract configuration relative to that initial condition. This technique has been successfully used to train deaf persons.
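A heavily hedged sketch of the area computation, assuming NumPy; the sign convention of the reflection coefficients varies between texts, so the ratio may have to be inverted for a particular analysis:

```python
import numpy as np

def tube_areas(k, lip_area=1.0):
    """Relative cross-sectional areas of a lossless-tube vocal tract model,
    obtained from reflection coefficients by fixing the lip area to an
    arbitrary value and working back towards the glottis.  Assumes the
    convention A_{m+1}/A_m = (1 + k_m)/(1 - k_m); with the opposite sign
    convention the ratio must be inverted."""
    areas = [float(lip_area)]
    for km in reversed(k):
        areas.append(areas[-1] * (1.0 - km) / (1.0 + km))
    return np.array(areas[::-1])    # ordered from the glottis to the lips
```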

LPC for Pitch Detection (speech sampled at 10 kHz): low-pass filter at 800 Hz → 5:1 downsampler → LPC analysis → inverse filtering with A(z) → autocorrelation → peak finding → voiced/unvoiced decision and pitch.
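A sketch of a SIFT-style pitch detector along the lines of this pipeline, assuming NumPy/SciPy; the thresholds and the low LPC order are illustrative assumptions, not values from the slides:

```python
import numpy as np
from scipy.signal import butter, lfilter, decimate
from scipy.linalg import solve_toeplitz

def lpc_pitch(frame, fs=10000, order=4, fmin=60.0, fmax=400.0, vu_thresh=0.3):
    """SIFT-style pitch detector: 800 Hz low-pass, 5:1 downsampling,
    low-order LPC, inverse filtering with A(z), autocorrelation of the
    residual and peak picking; returns 0.0 for frames judged unvoiced."""
    frame = np.asarray(frame, dtype=float)
    # Low-pass filter at 800 Hz, then 5:1 downsampling (10 kHz -> 2 kHz).
    b, a = butter(4, 800.0 / (fs / 2.0))
    x = decimate(lfilter(b, a, frame), 5)
    fs_d = fs / 5.0
    # Low-order LPC analysis (autocorrelation method).
    R = np.array([np.dot(x[:len(x) - i], x[i:]) for i in range(order + 1)])
    a_lp = solve_toeplitz(R[:order], R[1:order + 1])
    # Inverse filtering: residual e(n) = x(n) - sum_k a_k x(n-k).
    e = lfilter(np.concatenate(([1.0], -a_lp)), [1.0], x)
    # Autocorrelation of the residual and peak picking in the pitch range.
    r = np.correlate(e, e, mode="full")[len(e) - 1:]
    lo, hi = int(fs_d / fmax), int(fs_d / fmin)
    lag = lo + np.argmax(r[lo:hi])
    voiced = r[lag] / r[0] > vu_thresh          # crude V/U decision
    return fs_d / lag if voiced else 0.0
```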

LPC for Formant Detection: sampled speech → LPC analysis → LPC spectrum → peak emphasis (second derivative) → peak finding → formants.

LPC Spectrum. LP assumes that the vocal tract system can be modelled with an all-pole system, H(z) = G / (1 − Σ_{k=1}^{M} a_k z^{-k}) = G / A(z). The spectrum is obtained by evaluating H(z) on the unit circle, |H(e^{jω})| = G / |A(e^{jω})|. In order to emphasise the formant peaks, one option is to evaluate A(z) on a circle of radius slightly smaller than one (z = r e^{jω}, r < 1), which sharpens the resonances.

In order to increase the spectral resolution we pad the coefficient sequence {1, −a_1, …, −a_M} with zeros before taking the DFT; the DFT then samples the DTFT A(e^{jω}) on a denser frequency grid. Choosing the padded length as a power of two also allows an FFT algorithm to be used.

Calculate the spectral magnitude |A(k)| with the DFT and invert it, G / |A(k)|. This spectrum is called the LPC spectrum.
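A sketch of the LPC-spectrum computation, assuming NumPy; n_fft = 512 is an illustrative padding length:

```python
import numpy as np

def lpc_spectrum(a, G=1.0, n_fft=512):
    """LPC spectrum: zero-pad {1, -a_1, ..., -a_M} to n_fft points, take the
    FFT magnitude |A(k)| and invert it, giving |H(k)| = G / |A(k)|."""
    A = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return G / np.abs(np.fft.rfft(A, n=n_fft))

# Example: 20 * np.log10(lpc_spectrum(a, G)) gives the LPC spectrum in dB.
```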

How Good Is the LP Model? As shown by the physiological analysis of the vocal tract, the speech model contains both poles and zeros (glottal source, vocal tract and lip radiation): S(z) = G(z) V(z) R(z) U(z). However, it can be shown that the LP model is good for estimating the magnitude of a pole-zero system.

Proof. According to Lemma 1 and Lemma 2, H(z) can be written as H(z) = H_min(z) H_ap(z), the product of a minimum-phase component (expressible as an all-pole system) and an all-pass component. The LP estimates are calculated such that they correspond to the minimum-phase, all-pole part of this model.

Since |H_ap(e^{jω})| = 1, we have |H(e^{jω})| = |H_min(e^{jω})|; therefore, if the estimates are exact, we obtain at least a model with the correct magnitude.

Lemma 1 (System Decomposition): any causal rational system can be decomposed (proof below) as H(z) = H_min(z) H_ap(z), the product of a minimum-phase component and an all-pass component.

Proof (for a system with two poles and two zeros): write the transfer function as a product of first-order factors; for each zero lying outside the unit circle, multiply and divide by the corresponding factor with the zero reflected to its conjugate-reciprocal position inside the unit circle. Re-arranging this equation, the reflected zeros together with the poles form a minimum-phase component, while the remaining ratios of factors form an all-pass component. End of proof.

Lemma 2: a minimum-phase component can be expressed as an all-pole system, H_min(z) = G / (1 − Σ_{k=1}^{M} a_k z^{-k}); in theory the order M goes to infinity, in practice it is limited.

Linear Prediction Based Processing: Criticisms of the Linear Prediction Model, Perceptual Linear Prediction (PLP), LP Cepstra.

Criticisms of the Linear Prediction Model: the LP spectrum approximates the speech spectrum equally well at all frequencies of the analysis band. This property is inconsistent with human hearing.

Perceptual Linear Prediction (PLP): critical-band spectral analysis → equal-loudness pre-emphasis → intensity-to-loudness conversion → IDFT → solution of the Yule-Walker equations.

Critical Band Analysis: each speech frame (20 ms, i.e. 200 samples at Fs = 10 kHz, padded with 56 zeros) is windowed with a 20 ms Hamming window; the DFT yields the short-term spectrum, which is then reduced to critical-band spectral resolution.

Critical-Band Spectral Resolution: frequency warping (Hertz → Bark), followed by convolution with a filter-bank approximation of the masking curve and downsampling.

Equal-Loudness Pre-emphasis: approximates the unequal sensitivity of human hearing at different frequencies.

Intensity-Loudness Power Law: approximates the non-linear relation between the intensity of a sound and its perceived loudness (cube-root compression).
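The three perceptual operations can be sketched as follows, assuming NumPy; the Bark warping and the equal-loudness approximation below are forms commonly quoted for PLP, not necessarily the exact curves used in these slides:

```python
import numpy as np

def hz_to_bark(f):
    """Frequency warping Hertz -> Bark (the form usually quoted for PLP)."""
    return 6.0 * np.arcsinh(np.asarray(f, dtype=float) / 600.0)

def equal_loudness(f):
    """One common approximation of the equal-loudness pre-emphasis weight."""
    fsq = np.asarray(f, dtype=float) ** 2
    return (fsq / (fsq + 1.6e5)) ** 2 * (fsq + 1.44e6) / (fsq + 9.61e6)

def intensity_to_loudness(power):
    """Intensity-loudness power law: cube-root amplitude compression."""
    return np.power(power, 0.33)
```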

Cepstral Analysis: Introduction, Homomorphic Processing, Cepstral Spectrum, Cepstrum, Mel-Cepstrum, Cepstrum in Speech Processing.

Introduction. When speech is pre-emphasised, the excitation is not needed to estimate the vocal tract transfer function. Therefore, it is desirable to separate the excitation information from the vocal tract information.

If we think of the speech spectrum itself as a signal, we can observe that it is composed of the product of a slowly varying signal (the vocal tract envelope) and a rapidly varying signal (the excitation). Therefore, we can try to exploit this knowledge. The formal technique which exploits this feature is called “homomorphic processing”.

Homomorphic Processing: a technique for filtering signals that are combined non-linearly. In homomorphic processing, the non-linearly related signals are transformed to a domain where they combine linearly (characteristic system H[·]), filtered there with F(z), and transformed back with the inverse system H⁻¹[·].

For multiplied spectra, the characteristic system is the complex logarithm: in order to obtain a linear (additive) combination, a complex log transformation is applied to the speech spectrum (log[·], processing in the transformed domain, then exp[·]).

Cepstral Spectrum. Definition: the logarithm of the magnitude of the short-time spectrum, log |S(ω)|, where S(ω) is the STFT of the frame.

Cepstrum. Definition: the inverse Fourier transform of the log-magnitude spectrum, c(n) = (1/2π) ∫_{−π}^{π} log |S(ω)| e^{jωn} dω.
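A sketch of the (real) cepstrum computation, assuming NumPy; the small floor added before the logarithm is an implementation detail to avoid log(0):

```python
import numpy as np

def real_cepstrum(frame, n_fft=512):
    """Real cepstrum: inverse FFT of the log-magnitude spectrum.  Low
    quefrencies describe the vocal-tract envelope, high quefrencies the
    excitation."""
    spectrum = np.fft.rfft(np.asarray(frame, dtype=float), n=n_fft)
    return np.fft.irfft(np.log(np.abs(spectrum) + 1e-12), n=n_fft)
```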

Cepstrum in Speech Processing: Pitch Estimation, Formant Estimation, Pitch and Formant Estimation.

Pitch Estimation: sampled speech → cepstrum → high-pass liftering → peak emphasis (second derivative) → peak finding → pitch.
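A sketch of cepstral pitch estimation along these lines, assuming NumPy; the search range 60-400 Hz is a hypothetical choice, and no voiced/unvoiced decision is included:

```python
import numpy as np

def cepstral_pitch(frame, fs=10000, fmin=60.0, fmax=400.0, n_fft=1024):
    """Pitch from the cepstrum: keep only the quefrencies in the expected
    pitch range (a crude high-pass lifter) and pick the largest peak."""
    spectrum = np.fft.rfft(np.asarray(frame, dtype=float), n=n_fft)
    cep = np.fft.irfft(np.log(np.abs(spectrum) + 1e-12), n=n_fft)
    lo, hi = int(fs / fmax), int(fs / fmin)     # quefrency range in samples
    q = lo + np.argmax(cep[lo:hi])
    return fs / q                                # pitch estimate in Hz
```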

Formant Estimation: sampled speech → cepstrum → low-pass liftering → peak emphasis (second derivative) → peak finding → formants.

Pitch and Formant Estimation: sampled speech → cepstrum, then two branches: high-pass liftering → peak emphasis (second derivative) → peak finding → pitch, and low-pass liftering → peak emphasis (second derivative) → peak finding → formants.