ICASSP 2007 Robustness Techniques Survey Presenter: Shih-Hsiang Lin.

2 Topics
–Word Graph based Feature Enhancement for Noisy Speech Recognition
–Stereo-based Stochastic Mapping for Robust Speech Recognition
–Combination of Recognizers and Fusion of Features Approach to Missing Data ASR Under Non-Stationary Noise Conditions

WORD GRAPH BASED FEATURE ENHANCEMENT FOR NOISY SPEECH RECOGNITION
Zhi-Jie Yan 1, Frank K. Soong 2, Ren-Hua Wang 1
1 iFlytek Speech Lab, University of Science and Technology of China, Hefei, P. R. China
2 Microsoft Research Asia, Beijing, P. R. China
SPE-L3: Robust Features and Acoustic Modeling

4 Introduction
This paper presents a word graph based feature enhancement method for robust speech recognition in noise
–Compared with using only the first-best hypothesis, the word graph is more likely to contain the correct hypotheses, even though they may carry lower posterior probabilities (or likelihoods) than the incorrect first-best hypothesis
The proposed method is based upon Wiener filtering of the Mel-filter bank energies, given
–the input noisy speech
–a signal processing based estimate of the noise
–a clean-trained Hidden Markov Model (HMM)
Therefore, the enhanced speech features after Wiener filtering match the clean speech model better in the acoustic space, which leads to improved recognition performance
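The enhancement described above amounts to a per-band Wiener gain applied to the Mel filter-bank energies. The function below is an illustrative sketch, not the paper's implementation; the array shapes and the flooring constant are assumptions:

```python
import numpy as np

def wiener_enhance_mel(noisy_mel, noise_mel, clean_mel_estimate):
    """Apply a Wiener-style gain to Mel filter-bank energies.

    noisy_mel:          observed noisy filter-bank energies, shape (T, B)
    noise_mel:          signal-processing-based noise estimate, shape (B,) or (T, B)
    clean_mel_estimate: model-based estimate of the clean energies, shape (T, B)
    """
    # Wiener gain H = S / (S + N), computed per frame and per Mel band;
    # the small constant avoids division by zero in silent bands
    gain = clean_mel_estimate / (clean_mel_estimate + noise_mel + 1e-10)
    return gain * noisy_mel
```

Because the gain lies in (0, 1], the enhanced energies are always attenuated versions of the noisy observations, pulled toward the model-based clean estimate.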

5 Algorithm Overview
(Block diagram; labeled stages include: rough estimate of the noise spectrum, re-estimated noise, mean-normalized speech, model-based estimate of the clean speech, final estimate of the clean speech, and the corresponding clean speech)

6 More Details…
Kernel posterior probabilities can be calculated for each Gaussian component of the model
–These posterior probabilities serve as the weighting coefficients for synthesizing the model-based clean speech used in Wiener filtering
–Using the word graph, the posterior probability of kernel k at time t, given the entire observation sequence, factors into a Word Posterior Probability (WPP), a State Occupancy Probability, and a Kernel Occupancy Probability
The model-based clean speech estimate for Wiener filtering is constructed in two steps
–First, for each time frame t, the expected values of the mean and covariance of the clean speech feature are calculated using the kernel posterior probabilities along with the kernel parameters
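The first step above is a posterior-weighted expectation over the kernels. A minimal sketch, assuming diagonal covariances and ignoring the between-kernel spread term that the paper's exact covariance formula may include:

```python
import numpy as np

def expected_clean_stats(posteriors, means, variances):
    """Posterior-weighted expectation of the clean-speech mean and variance.

    posteriors: kernel posteriors gamma[t, k] per frame (rows sum to 1)
    means:      kernel means mu[k, d]
    variances:  kernel diagonal variances var[k, d]
    """
    mu_t = posteriors @ means       # expected clean mean per frame, shape (T, D)
    var_t = posteriors @ variances  # weighted variance per frame (cross terms ignored)
    return mu_t, var_t
```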

7 More Details… (cont.)
–Second, the clean speech S3 is synthesized in the ML sense; the ML solution of S3 is obtained by solving the weighted normal equations
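The weighted normal equations have the standard weighted least-squares form. The sketch below is an assumption chosen by analogy with HMM-based parameter generation (the matrix W and the inverse-variance weighting D are illustrative, not taken from the paper):

```python
import numpy as np

def solve_weighted_normal(W, mu, var):
    """Solve the weighted normal equations (W^T D W) s = W^T D mu for s,
    where D = diag(1/var) weights each target by its inverse variance."""
    D = np.diag(1.0 / var)
    A = W.T @ D @ W
    b = W.T @ D @ mu
    return np.linalg.solve(A, b)
```

When W is the identity the solution reduces to the targets themselves, which is a quick sanity check on the formulation.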

8 More Details… (cont.)
Wiener filtering of the Mel-filter bank energies is performed in the linear spectral domain
In the last step, the enhanced feature is converted to the cepstral domain, and the word graph is rescored
–S4 is re-decoded within the constrained search space defined by the word graph to obtain the final estimate of the clean speech
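The conversion from linear-spectral Mel energies to cepstra is the standard MFCC construction: take the log, then a type-II DCT. A self-contained sketch (the flooring constant and number of cepstra are assumptions):

```python
import numpy as np

def mel_energies_to_cepstra(mel_energies, n_ceps=13):
    """Convert Mel filter-bank energies (T, B) to cepstral features (T, n_ceps):
    log compression followed by a type-II DCT."""
    log_e = np.log(np.maximum(mel_energies, 1e-10))  # floor to avoid log(0)
    B = log_e.shape[-1]
    j = np.arange(B)
    # DCT-II basis: c[i] = sum_j log_e[j] * cos(pi * i * (j + 0.5) / B)
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), j + 0.5) / B)
    return log_e @ basis.T
```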

9 Experimental Results
–Signal processing based feature enhancement consistently improves the recognition performance; the overall relative error rate reduction is 35.44%
–The GER of the decoded word graph is significantly lower than the WER of the first-best hypothesis (only about 1/4 ∼ 1/5)
–The results show that an overall relative error rate reduction of 57.89% is obtained
–Using word graph constrained second-pass decoding, this result is obtained with only a minor increase in computational cost
–The experimental results suggest that the difference between the two decoding scenarios is minimal

STEREO-BASED STOCHASTIC MAPPING FOR ROBUST SPEECH RECOGNITION
Mohamed Afify, Xiaodong Cui, and Yuqing Gao
IBM T.J. Watson Research Center, 1101 Old Kitchawan Road, Yorktown Heights, NY
SPE-L3: Robust Features and Acoustic Modeling

11 Introduction
The idea is based on building a GMM for the joint distribution of the clean and noisy channels during training, and using an iterative compensation algorithm during testing
–This can also be interpreted as a mixture of linear transforms that are estimated in a special way using stereo data
–Both the clean and noisy channels are stacked to form a large augmented space, and a statistical model is built in this new space
The observed noisy speech and the augmented statistical model are then used to predict the clean speech

12 Algorithm Formulation
Assume we have a set of stereo data {(x_i, y_i)}
Define z ≡ (x, y) as the concatenation of the two channels
The first step in constructing the mapping is training the joint probability model p(z)
Once this model is constructed, it can be used during testing to estimate the clean speech given the noisy observations
–The problem of estimating x from this model resembles a mixture estimation problem
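The stacked-space estimate can be sketched with the non-iterative MMSE variant of the mapping: under a joint GMM over z = (x, y), the clean estimate is a posterior-weighted sum of per-component linear predictions. This is a simplified stand-in; the paper's actual solution is the iterative EM procedure described next. All function and parameter names here are illustrative:

```python
import numpy as np

def gauss_pdf(y, mu, S):
    """Multivariate Gaussian density N(y; mu, S)."""
    d = len(mu)
    diff = y - mu
    quad = diff @ np.linalg.solve(S, diff)
    return np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** d * np.linalg.det(S))

def mmse_clean_estimate(y, weights, mu_x, mu_y, S_yy, S_xy):
    """MMSE estimate of clean x from noisy y under a joint GMM over z = (x, y):
    x_hat = sum_k p(k|y) * (mu_x[k] + S_xy[k] S_yy[k]^{-1} (y - mu_y[k]))"""
    K = len(weights)
    lik = np.array([weights[k] * gauss_pdf(y, mu_y[k], S_yy[k]) for k in range(K)])
    post = lik / lik.sum()  # component posteriors p(k | y)
    x_hat = np.zeros_like(np.asarray(mu_x[0], dtype=float))
    for k in range(K):
        # per-component conditional mean of x given y (a linear transform of y)
        cond_mean = mu_x[k] + S_xy[k] @ np.linalg.solve(S_yy[k], y - mu_y[k])
        x_hat = x_hat + post[k] * cond_mean
    return x_hat
```

This makes the "mixture of linear transforms" interpretation explicit: each component contributes one affine map from y to x, mixed by its posterior.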

13 Algorithm Formulation (cont.)
Hence, we iteratively optimize an EM objective function, in which the value of x from the previous iteration is used

14 Algorithm Formulation (cont.)
Differentiating the objective with respect to x and setting the resulting derivative to zero yields the update equation
An interesting special case arises when x is a scalar
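For the scalar case, the zero-derivative solution of a Gaussian-mixture EM auxiliary function takes the form of an inverse-variance weighted average. The sketch below is a plausible reconstruction under that assumption, not the paper's exact update (the posteriors are taken as given, computed from the previous iterate):

```python
import numpy as np

def scalar_em_step(gammas, cond_means, cond_vars):
    """One EM update for scalar x: setting the derivative of the auxiliary
    function to zero gives an inverse-variance weighted average of the
    per-component conditional means, weighted by the posteriors gammas."""
    gammas = np.asarray(gammas, dtype=float)
    cond_means = np.asarray(cond_means, dtype=float)
    w = gammas / np.asarray(cond_vars, dtype=float)
    return float(np.sum(w * cond_means) / np.sum(w))
```

Components with small conditional variance dominate the update, which matches the intuition that confident components pull the estimate hardest.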

15 Experimental Results
Task: digit recognition in the car
–The first three lines refer to train/test conditions where "clean" refers to the CT and "noisy" to the HF
–The proposed mapping outperforms SPLICE for all GMM sizes, with the difference decreasing as the GMM size increases
–Both methods are considerably better than the VTS result in the last row of Table 1
–Using a time window gives an improvement over the baseline SSM, with a slight cost at runtime
–These results are not given for SPLICE because using biases requires that the input and output spaces have the same dimensions

16 Experimental Results (cont.)
Task: English large vocabulary speech recognition (MFCC features vs. LDA+MLLT features)
–SSM brings considerable improvement over MST even in the clean speech condition
–Building the maps in the final feature space (after LDA and MLLT) appears slightly better than in the original cepstral space

Combination of Recognizers and Fusion of Features Approach to Missing Data ASR Under Non-Stationary Noise Conditions
Neil Joshi and Ling Guan
Department of Electrical and Computer Engineering, Ryerson University, Toronto ON M5B 2K3, Canada
SPE-P14: Robustness II

18 Introduction
This paper proposes a method to enhance speech recognition performance using missing data techniques under non-stationary noise conditions
–More resilient feature sets are incorporated into the decoding process
–Two separate HMM based models are used: one using spectral features, the other MFCC features
The statistical dependencies between the models are captured with a coupled HMM methodology, the Fused HMM model
–One recognizer uses standard ASR techniques (traditional MFCC based HMM models)
–The other is missing data based (missing data theory spectral HMM models)

19 Coupled Fused HMM
The fused HMM models the relationship between two HMMs using a probabilistic fusion model
The statistical dependencies between the two HMM processes are then given by this fusion model
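The fused HMM's exact coupling models the dependencies between the two hidden chains; as a much simpler stand-in, a log-linear combination of the two stream scores illustrates how the MFCC-based and missing-data spectral recognizers can be fused at decode time. The weight and function names below are assumptions, not from the paper:

```python
import numpy as np

def fuse_stream_scores(logp_mfcc, logp_spectral, weight=0.5):
    """Log-linear fusion of per-hypothesis log-likelihoods from two streams."""
    return weight * np.asarray(logp_mfcc) + (1.0 - weight) * np.asarray(logp_spectral)

def decode_fused(hyp_scores_mfcc, hyp_scores_spec, weight=0.5):
    """Pick the hypothesis with the best fused score."""
    fused = fuse_stream_scores(hyp_scores_mfcc, hyp_scores_spec, weight)
    return int(np.argmax(fused))
```

With weight = 1.0 this degenerates to the MFCC-only decoder; intermediate weights let the missing-data stream override the MFCC stream when its evidence is stronger.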

20 Experimental Results
The fused decoder is found to significantly increase recognition performance over the conventional missing data decoding process