Combination of Feature and Channel Compensation (1/2)


Combination of Feature and Channel Compensation (1/2) It is often the case that, in addition to environmental acoustic noise, there is also linear channel distortion, which may be caused by transducer mismatch.
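A linear channel is convolutional in the time domain and therefore additive in the cepstrum domain, which is why cepstral mean subtraction (CMS, later combined with CSM in the experiments) removes stationary channel effects. A minimal numpy sketch, with illustrative names and synthetic data:

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Subtract the per-utterance cepstral mean from each frame."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# A stationary channel multiplies each frame's spectrum, so it adds a
# constant offset to every frame's cepstrum; CMS cancels that offset.
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 13))   # hypothetical per-frame cepstra
channel = rng.normal(size=13)        # stationary channel cepstrum offset
assert np.allclose(cepstral_mean_subtraction(frames + channel),
                   cepstral_mean_subtraction(frames))
```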

Combination of Feature and Channel Compensation (2/2)

Cepstral-Domain Acoustic Feature Compensation Based on Decomposition of Speech and Noise for ASR in Noisy Environments
Hong Kook Kim, Senior Member, IEEE, and Richard C. Rose, Senior Member, IEEE
Presented by Chen-Wei Liu, Graduate Institute of Computer Science and Information Engineering, National Taiwan Normal University, 2005-03-31

References
[1] D. Malah, R. V. Cox, and A. J. Accardi, "Tracking speech-presence uncertainty to improve speech enhancement in non-stationary noise environments," ICASSP, 1999.
[2] Y. Ephraim and D. Malah, "Speech enhancement using a MMSE-LSA estimator," IEEE Trans. ASSP, 1985.

Introduction (1/5) All techniques for HMM and feature compensation in ASR are complicated by the fact that the interaction between speech and acoustic background noise in the cepstrum domain is highly nonlinear. Whereas environmental noise and speech are considered additive in the linear spectrum domain, their interaction is more difficult to characterize in the log-spectral-amplitude and cepstrum domains. Consequently, the goal of decomposing a noise-corrupted speech signal into clean speech and pure noise components has always been difficult to achieve.

Introduction (2/5) This paper presents an approach to cepstrum-domain feature compensation in ASR that exploits noisy-speech decomposition. The approach relies on a minimum mean-squared error log-spectral amplitude (MMSE-LSA) estimator: the clean-speech magnitude spectrum is estimated as the noisy-speech magnitude spectrum multiplied by a frequency-dependent spectral gain function derived from the noise spectrum estimate, the SNR, and the speech-absence probability. As a result, the estimated log spectrum of clean speech becomes the sum of the log spectra of the noisy speech and the gain function.

Introduction (3/5) By converting these log spectra into cepstra, it turns out that estimated noise and clean speech can be considered additive in the cepstrum domain. The proposed approach performs frame-level decomposition of the noisy-speech cepstrum and compensates for additive noise in the cepstrum domain. Furthermore, the cepstrum-decomposition technique can be extended into a low-complexity, robust algorithm for implementing parallel model combination (PMC).

Introduction (4/5) The PMC algorithm combines separately estimated speech and noise HMMs to obtain a single HMM that describes the noise-corrupted cepstrum observation vectors. Owing to the highly nonlinear interaction of speech and noise in the cepstrum domain, traditional approaches to PMC require a great deal of computation to convert to a linear spectral representation, where additive models of speech and noise can be assumed for combining the speech and background model distributions.

Introduction (5/5) By applying the proposed approach, corrupted HMMs can be obtained by adding the means and variances of the clean HMMs to those of the estimated noise HMMs. While the cepstrum-decomposition technique is also applicable to model compensation, this paper considers it only from a feature-compensation point of view.
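Because speech and noise become additive in the cepstrum domain under this decomposition, the model-combination step reduces to adding Gaussian moments. A minimal sketch with hypothetical per-state parameters (the values and the diagonal-covariance, independence assumptions are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical per-state Gaussian parameters (cepstral means, diagonal variances)
clean_mean, clean_var = np.array([1.0, -0.5, 0.2]), np.array([0.3, 0.2, 0.1])
noise_mean, noise_var = np.array([0.4,  0.1, 0.0]), np.array([0.1, 0.1, 0.05])

# Under the cepstrum-decomposition view, combining clean-speech and noise
# HMMs amounts to adding their moments (assuming independence):
corrupted_mean = clean_mean + noise_mean
corrupted_var  = clean_var + noise_var
```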

Review of Speech Enhancement Algorithm based on MMSE-LSA (1/6) A nonlinear, frequency-dependent gain function is applied to the spectral components of the noisy speech signal in an attempt to obtain estimates of the spectral components of the corresponding clean speech. A modified MMSE-LSA estimation criterion, with a soft-decision modification that takes speech presence into account, is used to derive the gain function.

Review of Speech Enhancement Algorithm based on MMSE-LSA (2/6)

Review of Speech Enhancement Algorithm based on MMSE-LSA (3/6) The objective of MMSE-LSA is to find the estimator that minimizes the distortion measure for a given noisy observation spectrum. The modified MMSE-LSA gives an estimate of the clean speech spectrum in the form of the noisy spectrum multiplied by a gain function and a gain modification function.
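The gain function here is the Ephraim-Malah MMSE-LSA gain, G(xi, gamma) = xi/(1+xi) * exp(0.5 * E1(v)) with v = xi*gamma/(1+xi), where E1 is the exponential integral. A sketch in pure Python that approximates E1 with the Abramowitz-Stegun rational fits; the soft-decision gain modification function is omitted:

```python
import math

def exp_integral_E1(x):
    """Exponential integral E1(x), via Abramowitz & Stegun 5.1.53 / 5.1.56."""
    if x <= 0:
        raise ValueError("E1 requires x > 0")
    if x <= 1.0:
        a = [-0.57721566, 0.99999193, -0.24991055,
             0.05519968, -0.00976004, 0.00107857]
        return -math.log(x) + sum(c * x**i for i, c in enumerate(a))
    num = x**4 + 8.5733287401*x**3 + 18.0590169730*x**2 + 8.6347608925*x + 0.2677737343
    den = x**4 + 9.5733223454*x**3 + 25.6329561486*x**2 + 21.0996530827*x + 3.9584969228
    return math.exp(-x) / x * num / den

def lsa_gain(xi, gamma):
    """MMSE-LSA spectral gain (Ephraim & Malah):
    G = xi/(1+xi) * exp(0.5 * E1(v)),  v = xi*gamma/(1+xi)."""
    v = xi * gamma / (1.0 + xi)
    return xi / (1.0 + xi) * math.exp(0.5 * exp_integral_E1(v))
```

At high SNR (xi and gamma large) the gain approaches xi/(1+xi), i.e. nearly unity, so clean regions pass almost unmodified.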

Review of Speech Enhancement Algorithm based on MMSE-LSA (4/6) Definitions of the a posteriori SNR and the a priori SNR.
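In this framework the a posteriori SNR is gamma_k = |Y_k|^2 / lambda_d(k) and the a priori SNR is xi_k = lambda_x(k) / lambda_d(k); xi is commonly tracked with Ephraim and Malah's decision-directed estimator. A hedged numpy sketch, where the parameter names and the defaults alpha = 0.98 and the xi floor are illustrative, not values from the paper:

```python
import numpy as np

def decision_directed_xi(prev_amp_est, noise_psd, gamma, alpha=0.98, xi_min=1e-3):
    """Decision-directed a priori SNR estimate:
    xi(t) = alpha * |A(t-1)|^2 / lambda_d + (1 - alpha) * max(gamma - 1, 0),
    floored at xi_min to limit musical noise."""
    xi = alpha * prev_amp_est**2 / noise_psd \
        + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0)
    return np.maximum(xi, xi_min)
```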

Review of Speech Enhancement Algorithm based on MMSE-LSA (5/6) Likelihood ratio between the speech-presence and speech-absence hypotheses.

Review of Speech Enhancement Algorithm based on MMSE-LSA (6/6) There are no unique optimal parameter settings, because the parameters also depend on the characteristics of the input noise and the efficiency of the noise PSD estimation. Use of this speech enhancement algorithm as a preprocessor to feature extraction will be referred to in the next section as the speech-enhancement-based front-end (SE).

Cepstrum-Domain Feature Compensation (1/3) Cepstrum Subtraction Method: the speech enhancement algorithm works by multiplying the frequency-dependent gain function with the noisy magnitude spectrum; the inverse Fourier transform is then applied to the resulting log spectrum.

Cepstrum-Domain Feature Compensation (2/3)

Cepstrum-Domain Feature Compensation (3/3) Assuming that the enhanced speech signal is an estimate of the clean speech signal, the cepstrum for clean speech is approximated accordingly. This implies that the noisy-speech cepstrum can be decomposed into a linear combination of the estimated clean-speech cepstrum and the noise cepstrum; this is the so-called Cepstrum Subtraction Method (CSM).
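The decomposition can be checked numerically: because the gain is multiplicative in the spectral domain, the log spectra add, and the linearity of the cepstral transform carries that additivity into the cepstrum domain. A minimal numpy sketch with synthetic data standing in for a real frame and a real enhancement gain:

```python
import numpy as np

rng = np.random.default_rng(1)
frame = rng.normal(size=256)                       # stand-in noisy speech frame
Y = np.fft.rfft(frame * np.hanning(256))
G = rng.uniform(0.1, 1.0, size=Y.shape)            # stand-in spectral gain

# Enhanced magnitude spectrum: |Xhat| = G * |Y|  =>  log|Xhat| = log G + log|Y|
log_Y = np.log(np.abs(Y) + 1e-12)
log_G = np.log(G)

c_y    = np.fft.irfft(log_Y)                       # noisy-speech cepstrum
c_g    = np.fft.irfft(log_G)                       # gain cepstrum
c_xhat = np.fft.irfft(log_Y + log_G)               # clean-estimate cepstrum

# CSM: the clean estimate is obtained purely in the cepstrum domain by
# adding the gain cepstrum to (i.e., subtracting its negative from) c_y.
assert np.allclose(c_xhat, c_y + c_g)
```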

Speech Recognition Experiments
Baseline front-end: 100 Hz frame rate; 512-point FFT over the windowed speech segment; 24 log-magnitude filterbank channels; 13 MFCCs with first- and second-difference MFCCs. Each word was modeled by a 16-state, 3-mixture HMM.
Database: Aurora 2.0 and a subset of Aurora 3.0. Aurora 2.0 clean-condition training: 8440 digit strings; multi-condition training: 8440 digit strings divided into 20 groups.
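The baseline front-end can be sketched for a single frame as below. The mel filterbank construction is a standard recipe rather than the paper's exact one, and the 8 kHz sampling rate (typical for Aurora) is an assumption:

```python
import numpy as np

def mfcc_frame(frame, sr=8000, n_fft=512, n_mels=24, n_ceps=13):
    """Sketch of the baseline front-end for one frame:
    window -> 512-point FFT -> 24 log mel filterbank energies -> 13 MFCCs."""
    spec = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), n_fft)) ** 2
    # Triangular mel filterbank (standard construction, assumed here)
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    edges = inv_mel(np.linspace(0.0, mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * edges / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    log_e = np.log(fbank @ spec + 1e-10)          # 24 log filterbank energies
    # DCT-II to decorrelate; keep the first 13 coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return dct @ log_e
```

In the full front-end these 13 static coefficients are augmented with their first and second differences computed across frames.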

Clean Condition: findings

Clean Condition: Baseline vs. CSM. CSM reduced the average WER, except in the clean condition; CSM increased the WER for subway noise at 10 dB in Set A and Set C.

Clean Condition CSM+CMS

Multi Condition

Multi Condition Baseline CSM

Multi Condition CSM+CMS

Mismatched Transducer Condition A total of 274 context-dependent sub-word models with a head-body-tail structure: head and tail models were represented with 3 states, the body with 4 states, and each state had 8 Gaussians. The recognition system comprised 274 HMMs, 831 states, and 6672 mixtures. The data were recorded over the PSTN.

Mismatched Transducer Condition The dominant source of variability was transducer variability: the training data covered a vast array of transducers, but the testing set did not (a significant mismatch). Table 5 reflects only simulated channel mismatch.

Real Adverse Environment Case Aurora 3.0 - Finnish

Conclusion Advantages of CSM: the ability to make a soft decision about whether a given frequency bin within an input frame corresponds to speech or noise, and the provision of estimates that are updated for each analysis frame. CSM gave better performance than SE under all acoustic noise and transducer conditions tested, and the best performance was achieved by combining CSM with CMS.