A Tutorial on Bayesian Speech Feature Enhancement

A Tutorial on Bayesian Speech Feature Enhancement. SCALE Workshop, January 2010. Friedrich Faubel.

I Motivation

Speech Recognition System Overview A speech recognition system converts speech to text. It basically consists of two components: Front End: extracts speech features from the audio signal. Decoder: finds the sentence (sequence of acoustic states) that is the most likely explanation for the observed sequence of speech features.

Speech Feature Extraction Windowing

Speech Feature Extraction Time Frequency Analysis Performing spectral analysis separately for each frame yields a time-frequency representation.

Speech Feature Extraction Perceptual Representation Emulation of the logarithmic frequency and intensity perception of the human auditory system
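A minimal sketch of the three steps just described (windowing, per-frame spectral analysis, and a log Mel representation); the frame sizes are illustrative and the Mel filterbank is passed in as an argument rather than reproduced here, so this is not the tutorial's exact front end:

```python
# Sketch of a log Mel front end: windowing, per-frame power spectra, log Mel.
import numpy as np

def log_mel_features(signal, mel_filterbank, frame_len=400, hop=160):
    """signal: 1-D array of samples; mel_filterbank: (n_mels, frame_len//2 + 1) array."""
    window = np.hamming(frame_len)
    frames = np.array([signal[i:i + frame_len] * window           # windowing
                       for i in range(0, len(signal) - frame_len + 1, hop)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2               # time-frequency analysis
    mel_power = power @ mel_filterbank.T                           # perceptual frequency scale
    return np.log(mel_power + 1e-10)                               # logarithmic intensity
```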

Background Noise Background noise distorts speech features. Result: the features don't match the features used during training. Consequence: severely degraded recognition performance.

Overview of the Tutorial
I - Motivation
II - The effect of noise on speech features
III - Transforming probabilities
IV - The MMSE solution to speech feature enhancement
V - Model-based speech feature enhancement
VI - Experimental results
VII - Extensions

II The Effect of Noise: Interaction Function

Interaction Function Principle of Superposition: signals are additive (clean speech + noise = noisy speech).

Interaction Function In the signal domain we have the following relationship: the noisy speech signal is the sum of the clean speech signal and the noise signal. After Fourier transformation, this becomes the same additive relationship between the complex spectra. Taking the magnitude square on both sides then relates the squared magnitude spectra.
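The equations on these slides appear only as images in the transcript; the standard additive-noise relationships they presumably show are:

```latex
% Presumed slide equations (standard additive-noise model)
\begin{align}
  y[t] &= s[t] + n[t] && \text{(signal domain)}\\
  Y(f) &= S(f) + N(f) && \text{(after Fourier transformation)}\\
  |Y(f)|^2 &= |S(f)|^2 + |N(f)|^2 + 2\,|S(f)|\,|N(f)|\cos\theta(f) && \text{(magnitude square)}
\end{align}
```

Here θ(f) denotes the relative phase between speech and noise at frequency f.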

Interaction Function Hence, in the power spectral domain we have an additive relationship plus a cross term that involves the relative phase between speech and noise.

Interaction Function The relative phase between two waves describes their relative offset in time (delay).

Interaction Function When two sound sources are present, the following can happen, depending on their relative phase: amplification, attenuation, or cancellation.

Interaction Function The relative-phase (cross) term in the power spectral relationship is zero on average.

Interaction Function Dropping the phase term, since it is zero on average, and moving to the log power spectral domain, we obtain the classical interaction function between clean speech, noise, and noisy speech (Acero, 1990). But is that really right?
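The interaction function itself is likewise shown only as an image; its standard log power spectral form from the literature, which is presumably what is meant here, is:

```latex
% Log-spectral interaction function (Acero, 1990); assumed reconstruction
y \;=\; s + \log\!\left(1 + e^{\,n - s}\right)
```

where y, s, and n denote the log power (or log Mel power) of noisy speech, clean speech, and noise in one band.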

Interaction Function The mean of a nonlinearly transformed random variable is not necessarily equal to the nonlinear transform of the random variable's mean.

Interaction Function Phase-averaged relationship between clean and noisy speech:

III Transforming Probabilities

Transforming Probabilities Motivation In the signal domain, speech and noise are simply additive; in the log Mel domain, this translates into a nonlinear interaction function.

Transforming Probabilities Motivation (Figures: probability distributions of clean speech power, noise power, and the resulting noisy speech power.)

Transforming Probabilities Motivation Transformation results in a non-Gaussian probability distribution for noisy speech features.

Transforming Probabilities Introduction Transformation of a random variable: a transformation maps each x to a y, and conversely each y can be identified with an x through the inverse transformation. Idea: use the inverse transformation to map the distribution of y back to the distribution of x (change of variables), which brings in the Jacobian determinant. This yields the fundamental transformation law of probability.
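The law itself appears as an equation image on the slide; for an invertible transformation y = g(x), its standard statement is:

```latex
% Fundamental transformation law of probability (standard statement, assumed symbols)
p_Y(y) \;=\; p_X\!\big(g^{-1}(y)\big)\,
  \left|\det \frac{\partial g^{-1}(y)}{\partial y}\right|
```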

Transforming Probabilities Monte Carlo Idea: approximate the probability distribution by samples drawn from the distribution, i.e. by a discrete probability mass (samples can be drawn from the pdf through its cumulative distribution function). Then: transform each sample; a histogram of the transformed samples approximates the transformed pdf.
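A minimal sketch of this Monte Carlo procedure (not code from the tutorial; the one-dimensional distribution parameters and the use of the log-spectral interaction function from above are assumptions):

```python
# Monte Carlo transformation of a Gaussian through the interaction function.
import numpy as np

def interaction(s, n):
    """Assumed log-spectral interaction function y = s + log(1 + exp(n - s))."""
    return s + np.log1p(np.exp(n - s))

rng = np.random.default_rng(0)
num_samples = 100_000

# Hypothetical clean-speech and noise distributions in one log-spectral band.
s = rng.normal(loc=10.0, scale=2.0, size=num_samples)   # clean speech samples
n = rng.normal(loc=8.0, scale=1.0, size=num_samples)    # noise samples

y = interaction(s, n)                                    # transform each sample

# The histogram of the transformed samples approximates the (generally
# non-Gaussian) distribution of the noisy speech feature.
hist, edges = np.histogram(y, bins=100, density=True)
print("empirical mean / std of noisy speech:", y.mean(), y.std())
```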

Transforming Probabilities Local Linearization Idea: locally linearize the interaction function around the mean of speech and noise, using a first-order Taylor series expansion; this is the Vector Taylor Series approach (Moreno, 1996). Note: a linear transformation of a Gaussian random variable results in a Gaussian random variable.
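In assumed notation, with the interaction function y = f(s, n) = s + log(1 + e^(n-s)) from above, the first-order expansion around the means would read:

```latex
% First-order (vector) Taylor expansion of the interaction function
\begin{align}
  y \;\approx\; f(\mu_s, \mu_n)
      + \underbrace{\tfrac{\partial f}{\partial s}\big|_{(\mu_s,\mu_n)}}_{A}\,(s - \mu_s)
      + \underbrace{\tfrac{\partial f}{\partial n}\big|_{(\mu_s,\mu_n)}}_{B}\,(n - \mu_n)
\end{align}
```

so that, for independent Gaussian s and n, y is again approximately Gaussian with mean f(μ_s, μ_n) and variance A²σ_s² + B²σ_n².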

Transforming Probabilities The Unscented Transform Idea: as in Monte Carlo, select points, but in a deterministic fashion and in such a way that they capture the mean and covariance of the distribution. Then transform the points and re-estimate the parameters of the Gaussian distribution from the transformed points. (Figure: comparison of local linearization and the unscented transform.)
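A minimal sketch of such an unscented transform (a basic 2d+1 sigma-point scheme; an illustration, not the author's implementation):

```python
# Unscented transform: sigma points -> nonlinear map -> re-estimated Gaussian.
import numpy as np

def unscented_transform(mean, cov, f, kappa=1.0):
    d = mean.shape[0]
    # Columns of the matrix square root of (d + kappa) * cov give the offsets.
    sqrt_cov = np.linalg.cholesky((d + kappa) * cov)
    points = [mean]
    for i in range(d):
        points.append(mean + sqrt_cov[:, i])    # points on lines through the mean
        points.append(mean - sqrt_cov[:, i])
    points = np.array(points)
    # Standard weights: kappa/(d+kappa) for the center point, 1/(2(d+kappa)) else.
    weights = np.full(2 * d + 1, 1.0 / (2.0 * (d + kappa)))
    weights[0] = kappa / (d + kappa)
    # Transform every sigma point and re-estimate mean and covariance.
    transformed = np.array([f(p) for p in points])
    new_mean = weights @ transformed
    centered = transformed - new_mean
    new_cov = (weights[:, None] * centered).T @ centered
    return new_mean, new_cov

# Example: a joint (clean speech, noise) Gaussian pushed through the interaction
# function in one log-spectral band (made-up parameters).
mean = np.array([10.0, 8.0])
cov = np.diag([4.0, 1.0])
noisy_mean, noisy_cov = unscented_transform(
    mean, cov, lambda x: np.array([x[0] + np.log1p(np.exp(x[1] - x[0]))]))
```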

Transforming Probabilities The Unscented Transform The points selected by the unscented transform lie on lines through the center point. After the nonlinear transformation, the points might no longer lie on a line. Hence we can measure the degree of nonlinearity as the average distance of each triple of points from a linear fit of those three points. This can be shown to be closely related to the R² measure used in linear regression.

Transforming Probabilities The Unscented Transform With a high degree of nonlinearity, a single Gaussian fit does not represent the transformed distribution well. (Figure: true distribution vs. Gaussian fit.)

Transforming Probabilities An Adaptive Level of Detail Approach Idea: splitting a Gaussian into two Gaussian components decreases the covariance and thereby the nonlinearity.

Transforming Probabilities An Adaptive Level of Detail Approach Algorithm, Adaptive Level of Detail Transform [ALoDT]:
1. Start with one Gaussian g.
2. Transform that Gaussian with the UT.
3. Identify the Gaussian component with the highest degree of nonlinearity (dnl).
4. Split that component into 2 Gaussians g1, g2.
5. Transform g1 and g2 with the UT.
6. While #(Gaussians) < N: repeat from step 3.
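A compact sketch of that loop, reusing unscented_transform() from the previous sketch; the split rule and the nonlinearity measure dnl are simplified stand-ins for the ones described on these slides:

```python
# Adaptive Level of Detail Transform (ALoDT) sketch: grow a Gaussian-mixture
# approximation of f(X) by repeatedly splitting the most nonlinear component.
import numpy as np

def split_gaussian(weight, mean, cov):
    """Split one Gaussian into two along its principal axis (simplified rule)."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    offset = eigvecs[:, -1] * np.sqrt(eigvals[-1])
    reduced = cov - 0.5 * np.outer(offset, offset)   # shrink along the split axis
    return [(weight / 2, mean + 0.5 * offset, reduced),
            (weight / 2, mean - 0.5 * offset, reduced)]

def alodt(mean, cov, f, dnl, max_components=8):
    """dnl(mean, cov, f) measures the degree of nonlinearity of one component."""
    components = [(1.0, mean, cov)]
    while len(components) < max_components:
        # Identify the component with the highest degree of nonlinearity ...
        idx = max(range(len(components)), key=lambda i: dnl(*components[i][1:], f))
        w, m, C = components.pop(idx)
        # ... and split it into two Gaussians.
        components.extend(split_gaussian(w, m, C))
    # Finally, transform every component with the unscented transform.
    return [(w,) + unscented_transform(m, C, f) for (w, m, C) in components]
```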

Transforming Probabilities An Adaptive Level of Detail Approach (Figures: density approximation with the Adaptive Level of Detail Transform for the plain unscented transform and for ALoDT-2, ALoDT-4, ALoDT-8, ALoDT-16, and ALoDT-32.)

Transforming Probabilities An Adaptive Level of Detail Approach Kullback-Leibler divergence between the approximated and the true distribution (the latter obtained by Monte Carlo with 10M samples), as a function of the number N of components used by the Adaptive Level of Detail Transform:

N    |   1   |   2   |   4   |   8   |  16   |  32
KLD  | 0.190 | 0.078 | 0.025 | 0.017 | 0.007 | 0.004

That is a decrease by a factor of 48.

IV Speech Feature Enhancement The MMSE Solution

Speech Feature Enhancement The MMSE Solution Idea: train the speech recognition system on clean speech and try to map distorted features to clean speech features. Systematic approach: derive an estimator for clean speech given noisy speech.

Speech Feature Enhancement The MMSE Solution Let ŝ(y) be an estimator for the clean speech s, given the noisy speech y. Then the expected mean square error introduced by using the estimate instead of the true clean speech can be written down, and minimizing this MSE with respect to the estimator yields the optimal estimator with respect to the MMSE criterion. But how to obtain this distribution?
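The formulas themselves appear only as images on the slides; in assumed but standard notation (ŝ denotes the estimator), the MSE and its minimizer, the conditional mean, read:

```latex
% MSE of an estimator and the MMSE-optimal estimator (assumed reconstruction)
\begin{align}
  \mathrm{MSE}(\hat{s}) &= \mathbb{E}\big[\,\lVert \hat{s}(Y) - S \rVert^2\,\big],\\
  \hat{s}_{\mathrm{MMSE}}(y) &= \mathbb{E}[\,S \mid Y = y\,]
    \;=\; \int s\, p(s \mid y)\,\mathrm{d}s .
\end{align}
```

The distribution in question is thus the posterior p(s | y) of clean speech given noisy speech.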

Speech Feature Enhancement The MMSE Solution Idea: assume that the joint distribution of S and Y is Gaussian; this is the assumption behind Stereo-Based Stochastic Mapping (Afify, 2007). Then the conditional distribution of S|Y is again Gaussian, with conditional mean and covariance matrix given by the usual conditioning formulas.
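The slide shows those formulas as images; the standard Gaussian conditioning results they presumably correspond to are:

```latex
% Conditional mean and covariance of S given Y = y for a joint Gaussian
\begin{align}
  \mu_{S \mid Y=y} &= \mu_S + \Sigma_{SY}\,\Sigma_{YY}^{-1}\,(y - \mu_Y),\\
  \Sigma_{S \mid Y} &= \Sigma_{SS} - \Sigma_{SY}\,\Sigma_{YY}^{-1}\,\Sigma_{YS}.
\end{align}
```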

Speech Feature Enhancement The MMSE Solution Under the Gaussian assumption, the integral defining the MMSE estimator is easily obtained: it is just the conditional mean above. This is exactly what you get with the vector Taylor series approach (Moreno, 1996). Problem: speech is known to be multimodal.

Speech Feature Enhancement The MMSE Solution To handle multimodality, model clean speech with a mixture of Gaussians and introduce the index k of the mixture component as a hidden variable. The MMSE estimator can then be rewritten as follows: pull the sum over k out of the integral; the component posterior is independent of s, so it can be pulled out of the integral as well. What remains is a weighted sum in which the weights are the probability that the clean speech originated from the k-th Gaussian, given the noisy speech spectrum y (obtained via Bayes' theorem from the joint distribution), and each remaining integral is the clean speech estimate of the k-th Gaussian.
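Written out in assumed but standard notation, the resulting mixture-based MMSE estimator is:

```latex
% Mixture-based MMSE estimator (assumed reconstruction)
\begin{align}
  \hat{s}_{\mathrm{MMSE}}(y) \;=\; \sum_k P(k \mid y)\; \mathbb{E}[\,S \mid Y = y,\, k\,],
  \qquad
  P(k \mid y) \;=\; \frac{c_k\, p(y \mid k)}{\sum_j c_j\, p(y \mid j)} ,
\end{align}
```

where E[S | Y = y, k] is the conditional mean of the k-th joint Gaussian, computed with the conditioning formulas given above.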

V Model-Based Speech Feature Enhancement

Model-Based Speech Feature Enhancement The distribution of clean speech is modeled as a Gaussian mixture. Noise is modeled as a single Gaussian. The presence of noise changes the clean speech distribution according to the interaction function. Based on this model, construct the joint distribution of clean and noisy speech.
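In assumed notation, the joint model takes the form of a Gaussian mixture over stacked clean and noisy speech vectors, where the noisy-speech blocks of each component follow from propagating the clean-speech component and the noise Gaussian through the interaction function (e.g., with VTS or the unscented transform):

```latex
% Joint Gaussian-mixture model of clean and noisy speech (assumed notation)
p(s, y) \;=\; \sum_k c_k\,
  \mathcal{N}\!\left(
    \begin{bmatrix} s \\ y \end{bmatrix};
    \begin{bmatrix} \mu_{s,k} \\ \mu_{y,k} \end{bmatrix},
    \begin{bmatrix} \Sigma_{ss,k} & \Sigma_{sy,k} \\ \Sigma_{ys,k} & \Sigma_{yy,k} \end{bmatrix}
  \right)
```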

Model-Based Speech Feature Enhancement Noise Estimation: find the noise distribution (mean and covariance of the noise) that is the most likely explanation for the observed noisy speech features. Problem: the observations also depend on speech, which acts as a hidden variable. Hence, the Expectation Maximization algorithm is used (Rose, 1994; Moreno, 1996).

Model-Based Speech Feature Enhancement Expectation Step: construct the joint distribution using the current noise parameter estimate; then calculate, for each observation, the probability that the clean speech originated from each Gaussian component.

Model-Based Speech Feature Enhancement Maximization Step: re-estimate the noise parameters by accumulating statistics of the instantaneous noise estimates for each possible component, weighted by the probability that the clean speech originated from this Gaussian. But how to obtain this distribution? We have the joint distribution of clean and noisy speech; what is needed is the distribution conditioned on the observed noisy speech. That, however, is just the conditional Gaussian distribution, with the conditional mean and covariance given earlier.
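A high-level sketch of this EM loop, under assumptions: the names clean_gmm, predict_noisy, and instantaneous_noise_estimate are hypothetical, and the tutorial's exact formula for the instantaneous noise estimate is not reproduced here; the sketch only illustrates the weighted accumulation described above.

```python
# EM-style noise estimation sketch (illustration under assumptions, not the
# author's exact algorithm).  `clean_gmm` is a list of (weight, mean, cov)
# tuples for clean speech, `frames` holds the observed noisy feature vectors,
# `predict_noisy(m, C, nm, nC)` returns the predicted noisy-speech (mean, cov)
# of one component (e.g., via VTS or the unscented transform), and
# `instantaneous_noise_estimate(y, k, nm, nC)` is a hypothetical helper
# returning a per-frame, per-component noise estimate.
import numpy as np
from scipy.stats import multivariate_normal

def em_noise_estimation(frames, clean_gmm, noise_mean, noise_cov,
                        predict_noisy, instantaneous_noise_estimate, iters=5):
    for _ in range(iters):
        # E-step: predict the noisy-speech GMM under the current noise estimate
        # and compute the component posteriors for every frame.
        noisy_gmm = [(w,) + predict_noisy(m, C, noise_mean, noise_cov)
                     for (w, m, C) in clean_gmm]
        gamma = np.array([[w * multivariate_normal.pdf(y, m, C)
                           for (w, m, C) in noisy_gmm] for y in frames])
        gamma /= gamma.sum(axis=1, keepdims=True)

        # M-step: accumulate the instantaneous noise estimates, weighted by the
        # posterior that the frame was generated by that component.
        est = np.array([[instantaneous_noise_estimate(y, k, noise_mean, noise_cov)
                         for k in range(len(clean_gmm))] for y in frames])
        noise_mean = np.einsum('tk,tkd->d', gamma, est) / len(frames)
        centered = est - noise_mean
        noise_cov = np.einsum('tk,tkd,tke->de', gamma, centered, centered) / len(frames)
    return noise_mean, noise_cov
```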

VI Experimental Results

Experimental Results Speech Recognition Experiments: clean speech from the MC-WSJ-AV corpus; noise from the NOISEX-92 database (artificially added); MFCC features with 13 components, stacking of 15 frames, LDA; cepstral mean and variance normalization; 1743 acoustic states; 70308 Gaussians.

Experimental Results WER, destroyer engine noise

Experimental Results WER, factory noise

VII Extensions

Extensions Sequential noise estimation:
- Sequential expectation maximization (SEM), Kim, 1998
- Interacting Multiple Model (IMM) Kalman filter, Kim, 1999
- Particle filter, Yao, 2001
Improve speech recognition through:
- Combination with Joint Uncertainty Decoding, Shinohara, 2008
- Combination with bounded conditional mean imputation?