Subband Feature Statistics Normalization Techniques Based on a Discrete Wavelet Transform for Robust Speech Recognition. Jeih-weih Hung, Member, IEEE, and Hao-Teng Fan.

Presentation transcript:

Subband Feature Statistics Normalization Techniques Based on a Discrete Wavelet Transform for Robust Speech Recognition
Jeih-weih Hung, Member, IEEE, and Hao-Teng Fan
Wen-Yi Chu, Department of Computer Science & Information Engineering, National Taiwan Normal University

Outline
Introduction
Subband Feature Statistics Normalization Method
Experimental Setup
Experimental Results and Discussions
Concluding Remarks and Future Work

Introduction
This letter proposes a novel scheme that applies feature statistics normalization techniques to subband feature streams for robust speech recognition. The feature stream is decomposed into subband streams, and the normalization process is then performed on some or all of the subband streams separately. The new feature stream is reconstructed by properly integrating all substreams. In particular, the decomposition and reconstruction procedures are based on the well-known discrete wavelet transform (DWT).

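Subband Feature Statistics Normalization Method (1/4)
Discrete Wavelet Transform (DWT)
x[n]: the discrete input signal.
g[n]: low-pass filter. It removes the high-frequency part of the input signal and outputs the low-frequency part.
h[n]: high-pass filter. The opposite of the low-pass filter: it removes the low-frequency part and outputs the high-frequency part.
Q: downsampling filter. It reduces the sampling rate of the output signal to 1/Q of that of the input signal. Q = 2 is used as the example here.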
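The filter-then-downsample step described on this slide can be sketched in a few lines. This is a minimal illustration only: it assumes the Haar wavelet for g[n] and h[n] (the slide does not name a wavelet), and the helper names `filter_and_downsample` and `dwt_one_level` are hypothetical.

```python
import math

# Haar analysis filters. This choice is an assumption for illustration;
# the slide does not say which wavelet the authors used.
G = [1 / math.sqrt(2), 1 / math.sqrt(2)]   # g[n]: low-pass filter
H = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # h[n]: high-pass filter


def filter_and_downsample(x, taps, q=2):
    """Convolve x with taps, then keep every q-th fully overlapped output."""
    n_taps = len(taps)
    out = []
    for n in range(n_taps - 1, len(x), q):
        out.append(sum(taps[k] * x[n - k] for k in range(n_taps)))
    return out


def dwt_one_level(x):
    """Return (low-band, high-band) streams for an even-length sequence."""
    if len(x) % 2:
        raise ValueError("this sketch expects an even number of samples")
    return filter_and_downsample(x, G), filter_and_downsample(x, H)


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
low, high = dwt_one_level(x)
print(len(x), len(low), len(high))  # each output stream has half the samples
```

With Q = 2, each branch carries half as many samples as the input, so one level of decomposition keeps the total number of samples unchanged.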

Subband Feature Statistics Normalization Method (2/4)
We consider the mel-scaled filter-bank cepstral coefficients (MFCC) for speech recognition.

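Subband Feature Statistics Normalization Method (3/4)
Given that the frame rate of the feature stream is $F_s$ Hz, so that the stream lies within the modulation spectral band $[0, F_s/2]$, the band range of the $\ell$-th subband stream produced by an $(L-1)$-level DWT can be approximately represented as
$$[0,\ F_s/2^{L}] \ \text{for } \ell = 1, \qquad [F_s/2^{L-\ell+2},\ F_s/2^{L-\ell+1}] \ \text{for } \ell = 2, \dots, L.$$
If MVN is selected as the normalization method, then the relationship between the original subband stream $x_\ell[n]$ and the normalized stream $\tilde{x}_\ell[n]$ is, in its standard form,
$$\tilde{x}_\ell[n] = \frac{\sigma_{T,\ell}}{\sigma_\ell}\,\bigl(x_\ell[n] - \mu_\ell\bigr) + \mu_{T,\ell},$$
where $\mu_\ell$ and $\sigma_\ell$ are the mean and standard deviation of $x_\ell[n]$ within the utterance, and $\mu_{T,\ell}$ and $\sigma_{T,\ell}$ are the target mean and standard deviation for subband $\ell$.
If HEQ is selected as the normalization method, then the relationship between $x_\ell[n]$ and $\tilde{x}_\ell[n]$ is, in its standard form,
$$\tilde{x}_\ell[n] = F_{T,\ell}^{-1}\bigl(F_\ell(x_\ell[n])\bigr),$$
where $F_\ell$ is the cumulative distribution function of $x_\ell[n]$ within the utterance and $F_{T,\ell}$ is the target distribution function for subband $\ell$.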
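As a rough illustration of the two normalization options named on this slide, the sketch below applies textbook MVN and a simple quantile-mapping HEQ to one subband stream. The function names `mvn` and `heq`, the mid-rank CDF convention, and the use of a reference sample to stand in for the target distribution are all assumptions made for the example, not details taken from the letter.

```python
import bisect
import statistics


def mvn(stream, target_mean=0.0, target_std=1.0):
    """Shift and scale a stream so it has the target mean and standard deviation."""
    mu = statistics.fmean(stream)
    sigma = statistics.pstdev(stream)
    if sigma == 0.0:
        # A constant stream carries no variance to rescale.
        return [target_mean for _ in stream]
    return [target_std * (v - mu) / sigma + target_mean for v in stream]


def heq(stream, reference):
    """Map each value to the reference quantile that matches its empirical CDF."""
    ref = sorted(reference)
    ordered = sorted(stream)
    n = len(stream)
    out = []
    for v in stream:
        # Mid-rank empirical CDF keeps the value strictly inside (0, 1).
        rank = bisect.bisect_right(ordered, v)
        cdf = (rank - 0.5) / n
        idx = min(len(ref) - 1, int(cdf * len(ref)))
        out.append(ref[idx])
    return out


subband = [1.0, 2.0, 3.0, 4.0]
print(mvn(subband, target_mean=5.0, target_std=2.0))
print(heq([3.0, 1.0, 2.0], reference=[10.0, 20.0, 30.0]))
```

MVN matches only the first two moments of the stream, whereas HEQ matches the whole distribution, which is why the two methods need different kinds of target statistics.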

Subband Feature Statistics Normalization Method (4/4)
Finally, we reconstruct the new feature stream for the utterance from the updated subband streams, together with the unchanged streams, using the $(L-1)$-level inverse discrete wavelet transform (IDWT), where $L$ is the number of subband streams, as depicted on the right side of Fig. 1.
In SB-MVN, the streams corresponding to different subbands have different target means and variances. A similar condition holds for SB-HEQ: the streams for different subbands employ different target distribution functions.
In the proposed methods, the subbands are narrower and more numerous at the lower frequencies.
Due to the down-sampling operation in the DWT, the total number of data points in all of the subband streams is approximately equal to that of the original stream.
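The decompose, normalize, and reconstruct procedure can be sketched end to end as follows. This is an illustration under stated assumptions rather than the authors' implementation: it uses the Haar wavelet, requires the stream length to be divisible by 2 to the power of the number of levels, and the names `decompose`, `reconstruct`, and `subband_mvn` are hypothetical.

```python
import math
import statistics

S = 1 / math.sqrt(2)  # Haar scaling constant (the Haar wavelet is an assumption)


def haar_split(x):
    """One analysis level: return (approximation, detail), each half as long."""
    approx = [(x[i] + x[i + 1]) * S for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * S for i in range(0, len(x), 2)]
    return approx, detail


def haar_merge(approx, detail):
    """One synthesis level: the exact inverse of haar_split."""
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) * S, (a - d) * S])
    return x


def decompose(x, levels):
    """Return levels + 1 subband streams, ordered from lowest to highest band."""
    details = []
    approx = list(x)
    for _ in range(levels):
        approx, d = haar_split(approx)
        details.append(d)
    return [approx] + details[::-1]


def reconstruct(subbands):
    """Inverse DWT: merge the subbands back into one stream."""
    approx = subbands[0]
    for d in subbands[1:]:
        approx = haar_merge(approx, d)
    return approx


def mvn(stream, target_mean, target_std):
    mu = statistics.fmean(stream)
    sigma = statistics.pstdev(stream)
    if sigma == 0.0:
        return [target_mean for _ in stream]
    return [target_std * (v - mu) / sigma + target_mean for v in stream]


def subband_mvn(x, levels, targets):
    """targets maps a subband index to (mean, std); other subbands pass through."""
    subbands = decompose(x, levels)
    for idx, (t_mean, t_std) in targets.items():
        subbands[idx] = mvn(subbands[idx], t_mean, t_std)
    return reconstruct(subbands)


x = [0.5 * n + math.sin(n) for n in range(16)]
print([len(b) for b in decompose(x, 3)])   # subband lengths sum to len(x)
y = subband_mvn(x, 3, {0: (0.0, 1.0)})     # normalize only the lowest subband
```

In this Haar sketch the subband lengths add up exactly to the original length; with longer wavelet filters and boundary extension the total is only approximately equal, which matches the wording on the slide.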

Experimental Setup
Each feature sequence for each utterance in both the training and testing sets is decomposed into $L$ subband streams.
For each subband, the features of all of the utterances in the training set are used to estimate the required target statistics, which are then used for each utterance in the training and testing sets.
The parameter $L$ is preliminarily set to 4, which indicates that a three-level DWT is performed. With $F_s$ denoting the frame rate in Hz, the frequency ranges for the four octave subband streams are approximately $[0, F_s/16]$, $[F_s/16, F_s/8]$, $[F_s/8, F_s/4]$, and $[F_s/4, F_s/2]$, respectively.
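The two bookkeeping steps in this setup, computing the octave band edges and estimating per-subband target statistics from the training set, might look like the sketch below. The 100 Hz frame rate in the usage lines is an assumed value (a common 10 ms frame shift), not one stated in the transcript. Pooling all training samples of a subband is one plausible reading of the setup. The names `octave_band_edges` and `estimate_targets` are hypothetical.

```python
import statistics


def octave_band_edges(frame_rate_hz, num_subbands):
    """Approximate (low, high) edges in Hz for each subband, lowest band first."""
    nyquist = frame_rate_hz / 2
    edges = [(0.0, nyquist / 2 ** (num_subbands - 1))]
    for level in range(num_subbands - 1, 0, -1):
        edges.append((nyquist / 2 ** level, nyquist / 2 ** (level - 1)))
    return edges


def estimate_targets(decomposed_utterances):
    """Pool each subband over all training utterances; return (mean, std) per subband.

    decomposed_utterances is a list of utterances, each a list of subband streams.
    """
    num_subbands = len(decomposed_utterances[0])
    targets = []
    for b in range(num_subbands):
        pooled = [v for utt in decomposed_utterances for v in utt[b]]
        targets.append((statistics.fmean(pooled), statistics.pstdev(pooled)))
    return targets


# Assumed frame rate of 100 Hz, with L = 4 subbands as in the setup above.
print(octave_band_edges(100.0, 4))

# Toy training set: two utterances, two subbands each.
train = [[[1.0, 3.0], [0.0, 2.0]], [[5.0, 7.0], [4.0, -2.0]]]
print(estimate_targets(train))
```

The same targets are then applied to every training and testing utterance, so the normalized subband streams of clean and noisy speech are pulled toward common statistics.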

Experimental Results and Discussions (1/3)
The results in Fig. 2 indicate that all of the normalization methods provide significant accuracy improvements for all noise types.

Experimental Results and Discussions (2/3)
These results are broadly consistent with the observation in past research that the modulation frequency components between 1 Hz and 16 Hz are particularly important for speech recognition.
These results imply that, given a fixed number of subbands, placing more subbands at the lower frequencies is more helpful in the proposed methods.

Experimental Results and Discussions (3/3)

Concluding Remarks and Future Work
In this letter, we propose performing a normalization process on the subband feature streams and show that subband MVN and HEQ are superior to the conventional full-band MVN and HEQ.
In future work, we will integrate other normalization techniques, such as HOCMN and CSN, into the subband processing scheme to determine whether better performance can be achieved.
We will also apply other types of wavelet functions in the DWT and IDWT processes of our approach to investigate whether a different analysis/synthesis operation influences the recognition accuracy.