1 MAXENT 2007 R. F. Astudillo, D. Kolossa and R. Orglmeister

2 PROPAGATION OF STATISTICAL INFORMATION THROUGH NON-LINEAR FEATURE EXTRACTIONS FOR ROBUST SPEECH RECOGNITION
R. F. Astudillo, D. Kolossa and R. Orglmeister - TU-Berlin
Overview:
1. Introduction: Automatic speech recognition.
2. Problem: Imperfect noise suppression.
3. Proposed solution: Uncertainty propagation.
4. Tests & results.
5. Conclusions.

3 Automatic Speech Recognizer (ASR)
Feature extraction transforms the signal into a domain more suitable for recognition.
The speech recognizer models abstract speech components such as phonemes or triphones and generates a transcription.
Most speech recognition applications need noise suppression as preprocessing.

4 Feature Extraction
Non-linear transformations that imitate the way humans process speech.
Robust against inter-speaker and intra-speaker variability.
Examples: Mel-cepstral and RASTA-PLP transformations.

5 Speech Recognition
Statistical models are used to model speech.
Hidden Markov models with multivariate Gaussian mixtures for the emitting states.
Example: Mel-cepstral features.

6 Noise Suppression
Most methods obtain an estimate of the short-time Fourier transform (STFT) of the clean signal.
MMSE-LSA Bayesian estimation [Ephraim1985] is one of the most widely used.
It leaves residual noise and introduces artifacts in the speech.
Problem: imperfect estimation.

7 Solution: Modeling Uncertainty of Estimation
We model each element of the STFT as a complex Gaussian random variable.
The mean is set equal to the estimated clean value.
The variance parameter controls the uncertainty.
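For illustration only, a minimal numpy sketch of this model for a single frame (the bin count, the variance values and the equal split of the variance over real and imaginary parts are assumptions, not taken from the slides):

```python
import numpy as np

# One frame of a 512-point STFT: 257 non-redundant frequency bins (assumed size).
n_bins = 257
stft_mean = np.zeros(n_bins, dtype=np.complex128)  # estimated clean STFT values
stft_var = np.full(n_bins, 1e-2)                   # uncertainty of each estimate

# Drawing one sample from the model, splitting the variance equally
# over the real and imaginary part of each bin:
noise = (np.random.randn(n_bins) + 1j * np.random.randn(n_bins)) * np.sqrt(stft_var / 2)
stft_sample = stft_mean + noise
```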

8 Propagation of Uncertainty
We propagate the first- and second-order moments of the distributions.
Correlations between features appear (covariance).
The resulting uncertainty is combined with the statistical model parameters for robust speech recognition.

10 Approaches to Uncertainty Propagation
Analytic solutions: imply complex calculations and are specific to each transformation.
Pseudo-Monte-Carlo: Unscented Transform [Julier1996]; inefficient for a high number of dimensions (e.g. the STFT has 256 dimensions per frame).
► Piecewise propagation: an efficient combination of both methods, valid for many feature extractions (e.g. MELSPEC, MFCC, RASTA-PLP).

11 Piecewise Uncertainty Propagation
Exemplified with the Mel-cepstral transformation:
1. Modulus extraction (non-linear).
2. Filterbank (linear).
3. Logarithm (non-linear).
4. Discrete cosine transform (linear).
5. Delta and acceleration coefficients (linear).
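For reference, a minimal point-estimate sketch of this chain in numpy/scipy (the mel filterbank matrix, the flooring constant and the number of kept cepstra are placeholder assumptions, not the authors' settings):

```python
import numpy as np
from scipy.fftpack import dct

def mel_cepstral_pipeline(stft_frame, mel_fb, n_ceps=13):
    """Deterministic version of the chain the uncertainty is propagated through.

    stft_frame : complex STFT coefficients of one frame
    mel_fb     : mel filterbank matrix of shape (n_filters, n_bins)
    """
    mag = np.abs(stft_frame)                             # 1. modulus extraction (non-linear)
    mel = mel_fb @ mag                                   # 2. filterbank (linear)
    log_mel = np.log(mel + 1e-12)                        # 3. logarithm (non-linear), floored
    ceps = dct(log_mel, type=2, norm='ortho')[:n_ceps]   # 4. DCT (linear)
    return ceps                                          # 5. deltas/accelerations are linear filters over time
```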

12 Propagation through Modulus
By integrating out the phase of the complex Gaussian distribution we obtain the Rice distribution.
Its mean and variance can be calculated in closed form, where L_{1/2} is the Laguerre polynomial of order 1/2.
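As a hedged sketch of this step (not the authors' exact code): for a bin with complex-Gaussian mean mu and total variance var, the Rician mean and variance of its modulus can be computed as below. Splitting the variance equally over real and imaginary parts and expressing L_{1/2} through exponentially scaled Bessel functions are implementation assumptions made for numerical stability.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel functions I_v

def rician_moments(mu, var):
    """Mean and variance of |X| for X ~ complex Gaussian with mean mu and variance var."""
    nu2 = np.abs(mu) ** 2        # non-centrality parameter
    s2 = var / 2.0               # per-component (real/imaginary) variance
    x = -nu2 / (2.0 * s2)
    # Laguerre polynomial L_{1/2}(x) written with Bessel functions; the exp(x/2)
    # factor cancels exactly against the scaling of ive, avoiding overflow.
    L_half = (1.0 - x) * ive(0, -x / 2.0) - x * ive(1, -x / 2.0)
    mean = np.sqrt(np.pi * s2 / 2.0) * L_half
    variance = nu2 + 2.0 * s2 - mean ** 2   # E[|X|^2] - E[|X|]^2
    return mean, variance
```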

13 Propagation through Filterbank
Each filter output m is a weighted sum of frequency moduli, so it can be expressed as a matrix multiplication.
The mean and covariance then follow directly from the linearity of the transform.
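A minimal sketch of this linear step (standard mean/covariance algebra rather than code from the paper): with a filterbank matrix W, the output mean is W times the input mean and the output covariance is W Σ Wᵀ.

```python
import numpy as np

def propagate_linear(W, mean, cov):
    """Propagate mean and covariance through a linear stage y = W @ x.

    cov is the full input covariance matrix; the output covariance is in
    general no longer diagonal.
    """
    mean_out = W @ mean
    cov_out = W @ cov @ W.T
    return mean_out, cov_out
```

The same function applies to the DCT and to the delta/acceleration filters mentioned on the next slide, since they are also linear maps of the features.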

14 Full Covariance and Other Linear Transformations
The DCT, delta and acceleration coefficients can be computed similarly.
The covariance after the filterbank is no longer diagonal, which implies additional computational cost.

15 Propagation through Logarithm
Non-linear transformation.
The distribution after the filterbank is difficult to model and its covariance is not diagonal.
The dimensionality of the Mel features is much smaller than that of the STFT features.
► The unscented transform can therefore be applied efficiently.

16 Unscented Transform
Only 2N+1 points must be propagated: the mean plus 2N points on a scaled covariance contour, where N is the feature dimension.
Example for N = 2.

17 Unscented Transform II
The mean and covariance are calculated using weighted averages of the propagated points.
The parameter κ allows higher moments of the distribution to be considered.
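A hedged sketch of the unscented transform of [Julier1996], applied here to the element-wise logarithm; using a Cholesky factor as the matrix square root and the default κ = 0 are implementation assumptions.

```python
import numpy as np

def unscented_transform(mean, cov, f, kappa=0.0):
    """Propagate (mean, cov) through a non-linearity f with 2N+1 sigma points."""
    n = mean.shape[0]
    L = np.linalg.cholesky((n + kappa) * cov)          # scaled matrix square root
    points = [mean] + [mean + L[:, i] for i in range(n)] \
                    + [mean - L[:, i] for i in range(n)]
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))  # weights of the contour points
    w[0] = kappa / (n + kappa)                         # kappa weights the mean point
    Y = np.array([f(p) for p in points])               # propagate each sigma point
    mean_out = w @ Y
    D = Y - mean_out
    cov_out = (w[:, None] * D).T @ D
    return mean_out, cov_out

# Hypothetical usage: propagate mel-domain uncertainty through the logarithm,
# e.g. with (mel_mean, mel_cov) obtained from the linear filterbank stage:
# log_mean, log_cov = unscented_transform(mel_mean, mel_cov, np.log)
```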

18 Use of Uncertainty
After propagation of the uncertainty, missing-feature techniques or uncertainty decoding may be applied.
These techniques combine the uncertainty with the model information to ignore or reestimate noisy features.
(Figure: parameters of state f1.)

19 Use of Uncertainty II
Modified imputation [Kolossa2005] showed the best performance.
It reestimates the features for each state q by maximizing the probability of the features given the uncertainty and the state, assuming multivariate Gaussian distributions for both.
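Under the Gaussian assumption, maximizing the product of the uncertainty distribution and the state output distribution yields a precision-weighted combination of their means; the sketch below shows that combination (variable names are hypothetical, not the authors').

```python
import numpy as np

def modified_imputation(mu_u, cov_u, mu_q, cov_q):
    """Reestimate a feature vector for HMM state q.

    mu_u, cov_u : mean and covariance of the propagated uncertainty
    mu_q, cov_q : mean and covariance of the state's output Gaussian
    Returns the x that maximizes N(x; mu_u, cov_u) * N(x; mu_q, cov_q).
    """
    P_u = np.linalg.inv(cov_u)   # precision of the uncertainty
    P_q = np.linalg.inv(cov_q)   # precision of the model
    return np.linalg.solve(P_u + P_q, P_u @ mu_u + P_q @ mu_q)
```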

20 Recognition Tests
TI-DIGITS database, % of correctly identified words; test files from 20 different speakers.
Conditions: wind noise and street noise, each at -15 dB and 5 dB SNR.
Compared: clean speech (98.76), noisy speech, MMSE-LSA, approximate uncertainty, and ideal uncertainty.
Best and second-best results highlighted.

21 Conclusions
The use of uncertainty in the Mel-cepstral domain helps to compensate for imperfect estimation during noise suppression.
Piecewise uncertainty propagation is valid for multiple feature extractions.
A better estimation of the uncertainty should further improve the results.

22 Thank You!
Some literature:
[Ephraim1985] Y. Ephraim and D. Malah, "Speech enhancement using a minimum mean-square error log-spectral amplitude estimator," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 443-445, 1985.
[Julier1996] S. Julier and J. Uhlmann, "A general method for approximating nonlinear transformations of probability distributions," Tech. rep., University of Oxford, UK, 1996.
[Kolossa2005] D. Kolossa, A. Klimas, and R. Orglmeister, "Separation and robust recognition of noisy, convolutive speech mixtures using time-frequency masking and missing data techniques," IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2005.