Presenter: Shih-Hsiang(士翔)


Log-Energy Dynamic Range Normalization for Robust Speech Recognition (Weizhong Zhu and Douglas O'Shaughnessy, ICASSP 2005)
Presenter: Shih-Hsiang (士翔)

Abstract

Introduction
A speech recognition system can be divided into two parts:
- Front-end processing (feature extraction): suppress the noise, get more robust parameters
- Back-end processing (HMM decoding): compensate for the noise, adapt the model parameters
To obtain more noise-robust features, there have been numerous efforts (based on MFCCs):
- Add pre-processing: noise reduction, speech enhancement
- Incorporate algorithms into the MFCC calculation framework: frequency masking, SNR normalization
- Add feature post-processing techniques: cepstral channel normalization, cepstral mean normalization

Introduction
CMN is known as a simple, noise-robust feature post-processing technique. Compared with the cepstral coefficients, the log-energy feature has quite different characteristics, yet log-energy (or C0) is usually treated in the same way as the other cepstral coefficients. The authors therefore propose a log-energy dynamic range normalization (ERN) method to minimize the mismatch between training and testing data.
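As context for the baseline the paper builds on, CMN itself is a one-line operation: subtract the per-utterance mean from each cepstral dimension. A minimal NumPy sketch (not the paper's code):

```python
import numpy as np

def cmn(cepstra):
    """Cepstral Mean Normalization: subtract the per-utterance mean
    of each cepstral dimension. Input shape: (frames, coefficients)."""
    c = np.asarray(cepstra, dtype=float)
    return c - c.mean(axis=0, keepdims=True)
```

After CMN, every cepstral dimension of the utterance has zero mean, which removes a constant (channel-like) offset; the paper's point is that this treatment is less appropriate for log-energy than for the other coefficients.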

Energy Dynamic Range Normalization
Additive noise leads to a mismatch between clean and noisy speech (illustrated at 10 dB SNR). Compared with clean speech, the log-energy feature sequence of noisy speech shows:
- An elevated minimum value
- Valleys buried by the additive noise energy, while peaks are not affected as much

Energy Dynamic Range Normalization
The log-energy dynamic range of a sequence is Max - Min.
Algorithm:
1. Find Max = max(log(Energy_i)) and Min = min(log(Energy_i)), i = 1..n
2. Calculate the target minimum: T_Min = α × Max
3. If Min < T_Min, rescale the sequence for i = 1..n, using either a linear or a non-linear mapping
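The slide gives the algorithm's skeleton but not the rescaling formula, so the linear mapping below is an assumption: it maps the observed range [Min, Max] onto [T_Min, Max], lifting the valleys toward the target minimum while leaving the peak untouched.

```python
import numpy as np

def ern_linear(log_energy, alpha=0.5):
    """Log-energy dynamic range normalization, linear scheme (sketch).

    T_Min = alpha * Max is the target minimum. If the observed
    minimum is below it, the whole sequence is linearly remapped
    from [Min, Max] to [T_Min, Max]. The exact mapping in the
    paper may differ; this is one plausible realization.
    """
    e = np.asarray(log_energy, dtype=float)
    e_max, e_min = e.max(), e.min()
    t_min = alpha * e_max                      # target minimum
    if e_min >= t_min:
        return e                               # range already small enough
    # e_max maps to e_max, e_min maps to t_min, linear in between
    return e_max - (e_max - e) * (e_max - t_min) / (e_max - e_min)
```

Applying the same normalization to training and testing data shrinks the valley mismatch caused by additive noise, which is the effect the paper exploits.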

Energy Dynamic Range Normalization

Experiment
Evaluated on the Aurora 2 corpus; results are reported as relative improvement (R.I.) and overall accuracy.
Two experiments:
- Experiment 1: explore how large the relative improvements are
- Experiment 2: compare with other techniques

Experiment 1

Experiment 2

Conclusions
- The proposed log-energy dynamic range normalization algorithm achieves an overall relative performance improvement of about 30.83%
- It can be combined with cepstral mean or variance normalization to achieve even better results
- The proposed method does not require any prior knowledge of the noise type or level
- Energy dynamic range normalization has difficulty dealing with channel distortion
- Reducing the mismatch in log-energy leads to a large recognition improvement

Presenter: Shih-Hsiang (士翔), ASRU 2003

Introduction
Methods for robust speech recognition can be classified into two approaches:
- Finding more robust parameters that are minimally affected by the noise, e.g., formants and their movements (but these are not easy to estimate reliably)
- Compensating for the noise effect
Error rates could decrease if the noise type were known and well trained into the HMMs, but there are countless kinds of noise in real conditions, so a training-testing mismatch always occurs when unknown noise is involved.

Introduction (cont.)
Human auditory properties and noise-masking theory show that spectral peaks (formants) can be used to discriminate speech from noise. This paper introduces noise reduction and spectral emphasis techniques that:
- Do not require retraining the acoustic models
- Reuse the gain coefficients estimated during noise reduction for spectral emphasis

Noise Reduction
Spectral subtraction: the basic principle is to subtract the magnitude of the noise spectrum from the noisy speech. Assuming an additive noise model, the input noisy speech signal is
  y(t) = x(t) + n(t)    (time domain)
  Y(f) = X(f) + N(f)    (frequency domain)
The enhanced signal is recovered from the subtracted magnitude spectrum by an inverse Fourier transform.
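The subtraction step can be sketched in a few lines; the spectral floor (here a fraction of the noisy magnitude, an assumed but common choice) prevents the negative magnitudes that plain subtraction would produce:

```python
import numpy as np

def spectral_subtraction(noisy_mag, noise_mag, floor=0.01):
    """Magnitude spectral subtraction (sketch).

    noisy_mag: magnitude spectrum |Y(f)| of the noisy frame
    noise_mag: estimated noise magnitude |N(f)|
    The result is floored at floor * |Y(f)| to avoid negative values.
    """
    cleaned = noisy_mag - noise_mag
    return np.maximum(cleaned, floor * noisy_mag)
```

In a full enhancer this runs per frame on an STFT, and the noisy phase is reused when inverting back to the time domain.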

Noise Reduction & Spectral Emphasis
Noise does not affect the speech signal uniformly over the entire spectrum, so multi-band or non-linear spectral subtraction works better, with a frequency-dependent over-subtraction factor. In this paper, the gain function is computed as a function of the instantaneous estimated SNRs in each frequency band, on a frame-by-frame basis, with the following parameters:
- r: forgetting factor for the noise estimate
- b, c: flooring factors
- a: scale factor
- m: weight factor for spectral emphasis
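The slide names the parameters but not the exact formulas, so the sketch below is an assumed realization: the noise spectrum is tracked recursively with forgetting factor r, and each band's gain shrinks as its instantaneous SNR drops, floored to keep it positive.

```python
import numpy as np

def snr_gain(noisy_mag, r=0.98, a=1.0, floor=0.1):
    """SNR-dependent gain per frame and band (sketch, not the
    paper's exact formula).

    noisy_mag: (frames, bands) magnitude spectrogram.
    Returns gains G(t, f) in [floor, 1].
    """
    noise = noisy_mag[0].copy()           # crude init from first frame
    gains = np.empty_like(noisy_mag)
    for t, frame in enumerate(noisy_mag):
        # recursive (minimum-tracking) noise estimate, forgetting factor r
        noise = r * noise + (1.0 - r) * np.minimum(frame, noise)
        snr = frame / np.maximum(noise, 1e-12)
        g = 1.0 - a / np.maximum(snr, 1e-12)   # low SNR -> small gain
        gains[t] = np.clip(g, floor, 1.0)
    return gains
```

Because the gain already encodes which time-frequency cells are speech-dominated, the same G(t, f) can be reused for spectral emphasis and voice activity detection, as the paper does.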

Spectral Emphasis
G(t,f) is an SNR-related value: it is higher at the spectral peaks (formants) than at the spectral valleys, which are buried by noise. To compensate for the spectral mismatch between clean and noisy speech, previous methods manipulate the values of the spectral valleys (peak-to-valley ratio locking, SNR normalization); this paper instead emphasizes the values of the spectral peaks. Selecting the weight factor is a trade-off; the authors suggest a larger factor to emphasize the formants and discriminate them from the noise.
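One simple way to realize this idea (an assumption, since the slide does not give the formula) is to raise the gain to a power m > 1 before applying it: because G(t,f) is already larger at peaks than at valleys, exponentiation widens the peak/valley contrast.

```python
import numpy as np

def spectral_emphasis(noisy_mag, gains, m=2.0):
    """Spectral emphasis sketch: sharpen the peak/valley contrast
    of the gain with weight factor m, then apply it to the spectrum.
    gains: G(t, f) from the noise-reduction stage, values in (0, 1]."""
    return (gains ** m) * noisy_mag
```

With m = 2, a peak cell with gain 1.0 is untouched while a valley cell with gain 0.5 is attenuated to 0.25, reflecting the trade-off the paper notes: a larger m helps noisy speech but can hurt clean speech.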

Voice Activity Detector
Non-speech parts cause insertion errors, so deleting non-speech frames improves performance. The estimated gain function is itself a good detector: a frame whose gain takes the flooring value in all frequency bands is marked as non-speech. A 10-frame non-speech segment is kept before and after each speech segment, and all other non-speech frames are deleted.
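The frame-dropping rule above can be sketched directly; the function below assumes a per-frame boolean saying whether the gain hit the floor in every band (the hypothetical `is_floored_all_bands` input), and keeps a hangover of 10 non-speech frames around each speech region:

```python
import numpy as np

def vad_with_hangover(is_floored_all_bands, hangover=10):
    """Frame selection sketch for the gain-based VAD.

    is_floored_all_bands: bool per frame; True means the gain was at
    its floor in all frequency bands (i.e., non-speech).
    Keeps every speech frame plus `hangover` frames on each side;
    all other non-speech frames are dropped. Returns a keep-mask.
    """
    speech = ~np.asarray(is_floored_all_bands, dtype=bool)
    keep = speech.copy()
    for i in np.flatnonzero(speech):
        lo = max(0, i - hangover)
        hi = min(len(keep), i + hangover + 1)
        keep[lo:hi] = True          # hangover window around speech
    return keep
```

Dropping the remaining frames before decoding removes the insertion-prone silence while the hangover protects weak onsets and offsets.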

Experiment
Evaluated on the Aurora 2 corpus, with four experiments:
- Noise reduction: a total relative performance improvement of 37.76% is obtained by the proposed noise reduction alone
- Spectral emphasis: a more aggressive weight factor yields a bigger gain for noisy speech, but is also harmful to clean speech
- Voice activity detector: works well, especially in low-SNR conditions (about 5% improvement)
- Noise reduction + spectral emphasis + voice activity detector combined

Recognition results – Noise Reduction: from 38.61% to 67.02%

Recognition results – Spectral Emphasis

Recognition results

Recognition results

Conclusions
- Noise reduction and spectral emphasis techniques are used to improve ASR performance in noisy conditions
- The proposed algorithms can easily be embedded in a standard front-end MFCC calculation program, with a low computational load, suitable for real-time operation
- The method works well for all 8 types of test noise, as well as for noise plus channel distortion