MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), 2007.

MODULATION SPECTRUM EQUALIZATION FOR ROBUST SPEECH RECOGNITION Source: IEEE Workshop on Automatic Speech Recognition & Understanding (ASRU), 2007 Authors: Liang-che Sun, Chang-wen Hsu, and Lin-shan Lee Professor: 陳嘉平 Reporter: 楊治鏞

Introduction The performance of speech recognition systems is very often degraded due to the mismatch between the acoustic conditions of the training and testing environments. In this paper, we propose a new approach for modulation spectrum equalization in which the modulation spectra of noisy speech utterances are equalized to those of clean speech.

Introduction Two equalization methods are proposed. The first equalizes the cumulative distribution functions (CDFs) of the modulation spectra of clean and noisy speech, such that the differences between them are reduced. The second equalizes the magnitude ratio of lower- to higher-frequency components in the modulation spectrum.

Modulation spectrum (1/2) Given a sequence of feature vectors x[n], n = 0, 1, ..., N-1, for an utterance, each including D feature parameters x_d[n], where n is the time index and d is the parameter index.

Modulation spectrum (2/2) The modulation spectrum X_d[k] of the d-th time trajectory x_d[n] can be obtained by applying the discrete Fourier transform: X_d[k] = Σ_{n=0}^{N-1} x_d[n] e^{-j2πnk/N}, k = 0, 1, ..., N-1.
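The time-trajectory DFT above can be sketched in a few lines of numpy. This is an illustrative sketch (the function name and the toy signal are not from the paper): each feature dimension's trajectory over frames is transformed along the time axis.

```python
import numpy as np

def modulation_spectrum(features):
    """Magnitude modulation spectrum of each feature dimension.

    features: (N, D) array -- N frames, D feature parameters per frame.
    Returns |X_d[k]| with shape (N//2 + 1, D), using a real FFT along
    the time (frame) axis.
    """
    return np.abs(np.fft.rfft(features, axis=0))

# Toy check: a trajectory with 2 cycles over 8 frames puts its energy
# at modulation-frequency bin k = 2.
feats = np.zeros((8, 2))
feats[:, 0] = np.cos(2 * np.pi * np.arange(8) * 2 / 8)
spec = modulation_spectrum(feats)
```

Note that the modulation frequency axis here is in cycles per frame; with a typical 10 ms frame shift, bin k corresponds to k / (N * 0.01) Hz.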

Spectral Histogram Equalization We first calculate the cumulative distribution function (CDF) of the magnitudes of the modulation spectra, |X_d[k]|, for all utterances in the clean training data of AURORA 2, to be used as the reference CDF, F_ref. For any test utterance, the CDF of its modulation spectrum magnitudes can be similarly obtained as F_test.

Spectral Histogram Equalization Hence the equalized magnitude of the modulation spectrum is |X~_d[k]| = F_ref^{-1}( F_test( |X_d[k]| ) ).
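The CDF mapping F_ref^{-1}(F_test(.)) can be sketched with empirical CDFs estimated from sorted magnitude samples. This is a minimal illustration, not the paper's exact implementation: the rank-based estimate of F_test and the quantile-based inverse of F_ref are assumptions made here for a self-contained example.

```python
import numpy as np

def equalize_histogram(test_mags, ref_mags):
    """Map each test magnitude through F_ref^{-1}(F_test(.)).

    test_mags: magnitudes of one modulation-spectrum bin over test data.
    ref_mags:  magnitudes of the same bin over clean training data.
    Both CDFs are estimated empirically from the samples themselves.
    """
    test_sorted = np.sort(test_mags)
    # F_test(x): empirical rank of x among the test samples, in (0, 1].
    ranks = np.searchsorted(test_sorted, test_mags, side="right") / len(test_mags)
    # F_ref^{-1}(p): the p-quantile of the clean reference distribution.
    return np.quantile(np.sort(ref_mags), ranks)

# Example: test magnitudes a factor of 10 below the clean reference are
# mapped back onto the reference distribution's range.
eq = equalize_histogram(np.array([1.0, 2.0, 3.0, 4.0]),
                        np.array([10.0, 20.0, 30.0, 40.0]))
```

The mapping is monotonic, so the ordering of magnitudes within the utterance is preserved while their distribution is pulled toward that of clean speech.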

Magnitude Ratio Equalization We first define a magnitude ratio (MR) of lower- to higher-frequency components for each parameter index d as follows: MR_d = ( Σ_{k ≤ K_c} |X_d[k]| ) / ( Σ_{k > K_c} |X_d[k]| ), where K_c is the bin index of the cut-off frequency f_c used here and N is the order of the discrete Fourier transform.

Magnitude Ratio We can observe from this figure that the mean value of MR_d decreases as the SNR degrades, and thus MR_d is highly correlated with SNR. It is therefore reasonable to equalize the value of MR_d for a noisy utterance to a reference value obtained from clean training data.

Magnitude Ratio Equalization We first calculate the average of MR_d over all utterances in the clean training data of AURORA 2 as the reference value MR_d,ref. We then calculate the value MR_d,test for each test utterance.

Magnitude Ratio Equalization We equalize the magnitude of the modulation spectrum for the test utterance by scaling its lower-frequency components (those up to the cut-off bin K_c): |X~_d[k]| = ( MR_d,ref / MR_d,test )^γ · |X_d[k]|, where γ is the weighting power for the scaling factor.
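The MR computation and the lower-band scaling can be sketched as follows. Function names, the exact bin indexing at the cut-off, and the default γ = 1 are illustrative assumptions; the slides do not show the paper's precise indexing.

```python
import numpy as np

def magnitude_ratio(mags, k_cut):
    """MR = (sum of |X[k]| for k <= k_cut) / (sum for k > k_cut)."""
    return mags[: k_cut + 1].sum() / mags[k_cut + 1:].sum()

def mre_equalize(mags, mr_ref, k_cut, gamma=1.0):
    """Scale the lower-frequency bins so the utterance's MR moves
    toward the clean reference value; gamma is the weighting power."""
    mr_test = magnitude_ratio(mags, k_cut)
    scale = (mr_ref / mr_test) ** gamma
    out = mags.copy()
    out[: k_cut + 1] *= scale
    return out

# Noisy utterance with MR = (4+2)/(1+1) = 3; equalize toward a clean
# reference MR of 6 by boosting the lower band.
noisy = np.array([4.0, 2.0, 1.0, 1.0])
equalized = mre_equalize(noisy, mr_ref=6.0, k_cut=1)
```

With γ = 1 the equalized utterance's MR matches the reference exactly; smaller γ applies a gentler, partial correction.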

EXPERIMENTAL SETUP The experiments were conducted on the AURORA 2 corpus. The speech features were extracted by the AURORA WI007 front-end. [Figure 4]

Integration of MRE with Other Feature Normalization Techniques We consider only MRE here because the additional improvement obtainable with SHE+MRE, as shown in Table 1, was found to be limited and involved much higher computational cost.