HIWIRE Meeting – Nancy, July 6-7, 2006
José C. Segura, Ángel de la Torre

Slide 2 – Schedule
- Non-linear feature normalization for mobile platform
  - Integration scheme
  - Results and discussion
- Rapid speaker adaptation
  - Combination of adaptation at the signal level and at the acoustic-model level
  - Results and discussion
- Assessment of two non-linear techniques for feature normalization
  - Non-linear parametric equalization
  - Model-based feature compensation (VTS)
- New improvements in robust VAD
  - Model-based VAD

Slide 5 – Non-linear Parametric Equalization
- Feature normalization: motivation of PEQ
  - Limitations of linear methods:
    - Cepstral Mean Normalization
    - Cepstral Mean and Variance Normalization
  - Limitations of non-linear methods (HEQ, OSEQ):
    - Speech/non-speech ratio
    - Estimation problems
- Parametric Equalization (PEQ):
  - Two-Gaussian model (speech / non-speech)
  - Gaussians trained on clean data; noisy Gaussians estimated
  - Non-linear transformation: a combination of two linear transformations (one for speech, one for non-speech)
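As a concrete picture of the "combination of two linear transformations", the following is a minimal one-dimensional sketch for a single cepstral coefficient stream. The Gaussian parameters and the speech prior are illustrative placeholders, and the PEQ estimation of the noisy Gaussians themselves is not reproduced:

```python
import numpy as np

def parametric_eq(x, clean_speech, clean_nonspeech, noisy_speech, noisy_nonspeech,
                  p_speech=0.5):
    """Parametric equalization of one cepstral coefficient stream x.

    Each class is a (mean, std) Gaussian. Frames are mapped by two
    linear transformations (speech and non-speech), blended by the
    posterior probability of speech under the noisy two-Gaussian model.
    """
    def gauss(v, mu, sig):
        return np.exp(-0.5 * ((v - mu) / sig) ** 2) / (sig * np.sqrt(2.0 * np.pi))

    mu_s, sig_s = noisy_speech
    mu_n, sig_n = noisy_nonspeech
    mu_sc, sig_sc = clean_speech
    mu_nc, sig_nc = clean_nonspeech

    # Posterior P(speech | x) under the noisy two-Gaussian model
    ps = p_speech * gauss(x, mu_s, sig_s)
    pn = (1.0 - p_speech) * gauss(x, mu_n, sig_n)
    post = ps / (ps + pn)

    # One linear transformation per class: map noisy mean/std onto clean
    x_speech = mu_sc + (sig_sc / sig_s) * (x - mu_s)
    x_nonspeech = mu_nc + (sig_nc / sig_n) * (x - mu_n)
    return post * x_speech + (1.0 - post) * x_nonspeech
```

Frames that the noisy model assigns confidently to one class are moved by that class's linear transform; ambiguous frames get a smooth interpolation, which is what makes the overall mapping non-linear.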

Slide 6 – Non-linear Parametric Equalization
- Aurora-2 results:

              Aver. WER   Relative improv.
    BASELINE  34.1 %      0.0 %
    OSEQ      17.5 %      48.6 %
    PEQ       18.6 %      45.3 %

- Aurora-4 results:

              Aver. WER   Relative improv.
    BASELINE  45.6 %      0.0 %
    OSEQ      37.5 %      17.8 %
    PEQ       31.5 %      30.1 %

Slide 7 – Non-linear Parametric Equalization
- Additional problem of non-linear transformations:
  - Once the transformation is estimated, it is an "instantaneous transformation"
  - Temporal correlations are not exploited
- Temporal Smoothing (TES):
  - Each equalized cepstrum is time-filtered with an ARMA filter that restores the autocorrelation of clean data
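The smoothing idea can be sketched with a simple ARMA filter that mixes the p previous outputs with the current and q following inputs. This is an illustrative smoother of that general shape, not necessarily the exact TES filter from the slides:

```python
import numpy as np

def arma_smooth(c, p=2, q=2):
    """Illustrative ARMA time-smoothing of one equalized cepstral
    trajectory c (1-D array of frames): each interior output is the
    average of the p previous outputs and the current plus q following
    inputs; the first p and last q frames are left unchanged."""
    c = np.asarray(c, dtype=float)
    out = np.copy(c)
    for t in range(p, len(c) - q):
        out[t] = (out[t - p:t].sum() + c[t:t + q + 1].sum()) / (p + q + 1)
    return out
```

The autoregressive part (feeding previous outputs back in) is what re-introduces temporal correlation into the instantaneously equalized features.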

Slide 8 – Non-linear Parametric Equalization
- Aurora-2 results (without / with TES):

              Aver. WER   Improv.   Aver. WER (TES)   Improv. (TES)
    BASELINE  34.1 %      0.0 %     31.6 %            6.5 %
    OSEQ      17.5 %      48.6 %    15.5 %            54.3 %
    PEQ       18.6 %      45.3 %    ---               ---

- Aurora-4 results (without / with TES):

              Aver. WER   Improv.   Aver. WER (TES)   Improv. (TES)
    BASELINE  45.6 %      0.0 %     43.4 %            4.9 %
    OSEQ      37.5 %      17.8 %    35.5 %            22.2 %
    PEQ       31.5 %      30.1 %    30.7 %            32.6 %

Slide 9 – Model-Based Feature Compensation (VTS)
- VTS feature normalization:
  - Performed in the log-FBE domain (prior to the DCT)
  - Based on a Gaussian mixture model trained on clean speech
  - Allows feature compensation and uncertainty estimation
- Summary of the VTS (vector Taylor series) approach:
  1. Given the noisy conditions, VTS derives a noisy Gaussian from each clean Gaussian
  2. The noisy Gaussian mixture model allows computation of the probabilities P(k|y)
  3. An estimate of the clean speech x is then possible
  4. An estimate of the uncertainty is also possible

Slide 10 – Model-Based Feature Compensation (VTS)
- Step 1: estimation of a noisy Gaussian from each clean Gaussian, where the functions g0, f0 and h0 are evaluated at the mean of the clean Gaussian and at the mean of the noise.
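The slide's equations are not in the transcript, so the sketch below adopts the common first-order VTS linearization in the log-FBE domain, y = x + log(1 + exp(n - x)), with diagonal covariances; mapping its evaluated terms onto the names g0, f0 and h0 is an assumption about the slide's notation:

```python
import numpy as np

def vts_noisy_gaussian(mu_x, var_x, mu_n, var_n):
    """First-order VTS: derive a noisy Gaussian (mu_y, var_y) from a
    clean Gaussian (mu_x, var_x) and a noise Gaussian (mu_n, var_n),
    linearizing y = x + g(x, n), g(x, n) = log(1 + exp(n - x)),
    at the two means."""
    g0 = np.log1p(np.exp(mu_n - mu_x))       # g evaluated at the means
    f0 = 1.0 / (1.0 + np.exp(mu_n - mu_x))   # dy/dx at the means
    h0 = 1.0 - f0                            # dy/dn at the means
    mu_y = mu_x + g0
    var_y = f0 ** 2 * var_x + h0 ** 2 * var_n
    return mu_y, var_y
```

In the low-noise limit the noisy Gaussian collapses onto the clean one (g0 -> 0, f0 -> 1); in the high-noise limit it collapses onto the noise Gaussian, which is the behaviour the compensation relies on.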

Slide 11 – Model-Based Feature Compensation (VTS)
- Step 2: estimation of P(k|y), where N_k(y) is the k-th noisy Gaussian evaluated at the noisy speech y, and P(k) is the a-priori probability of that Gaussian:

    P(k|y) = P(k) N_k(y) / sum_j P(j) N_j(y)

- Step 3: estimation of the clean speech.
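Steps 2 and 3 can be sketched for a scalar feature and K Gaussians. The posterior follows the Bayes rule stated above; the MMSE form used for Step 3 (subtracting the posterior-weighted clean-to-noisy mean shift) is one standard choice and an assumption about the slide's exact estimator:

```python
import numpy as np

def gauss_pdf(y, mu, var):
    """Scalar Gaussian density, vectorized over mixture components."""
    return np.exp(-0.5 * (y - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def vts_compensate(y, priors, mu_y, var_y, mu_x):
    """Given the noisy mixture (priors, mu_y, var_y) and the clean
    means mu_x (all length-K arrays), return P(k|y) and an MMSE
    estimate of the clean feature x."""
    lik = priors * gauss_pdf(y, mu_y, var_y)
    post = lik / lik.sum()                   # Step 2: P(k|y)
    x_hat = y - np.dot(post, mu_y - mu_x)    # Step 3: MMSE clean estimate
    return post, x_hat
```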

Slide 12 – Model-Based Feature Compensation (VTS)
- Step 4: estimation of the uncertainty: the uncertainty of the clean speech can be estimated from the estimate of the clean speech, assuming small values of the variance of the noise.

Slide 13 – Model-Based Feature Compensation (VTS)
- Aurora-2 results:

                           Aver. WER   Relative improv.
    BASELINE               34.1 %      0.0 %
    VTS + MVN              14.0 %      58.9 %
    VTS + MVN + UNCERT.    13.5 %      60.0 %

- Some considerations about VTS:
  - Computational load
  - Better than HEQ, PEQ, etc., but only valid for additive noise or channel distortion
  - Estimation of the noise is critical
  - There are some approximations in the formulation
  - Uncertainty: small improvement (insertions, substitutions, deletions)
  - Alternative: model-based compensation based on numerical integration of pdfs

Slide 15 – Model-Based VAD
- Fundamentals of model-based VAD:
  - Gaussian mixture model in the log-FBE domain, trained on clean speech
  - VTS provides a noisy version of the GMM
  - From the noisy GMM, P(k|y) can be estimated for each observation y and each Gaussian k
  - The a-priori probability P(V|k) of the k-th Gaussian being speech can be estimated from the training data
  - The probability P(V|y) of the noisy observation y being speech is then given by:

    P(V|y) = sum_k P(V|k) P(k|y)
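The marginalization above can be sketched end-to-end for a scalar observation; the two-component toy mixture and the 0.5 decision threshold in the test are illustrative choices, not HIWIRE settings:

```python
import numpy as np

def gauss_pdf(y, mu, var):
    """Scalar Gaussian density, vectorized over mixture components."""
    return np.exp(-0.5 * (y - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def model_based_vad(y, priors, mu_y, var_y, p_speech_given_k, thresh=0.5):
    """Model-based VAD for one observation y: compute P(k|y) under the
    VTS-adapted noisy mixture (priors, mu_y, var_y), then marginalize
    the per-Gaussian speech probabilities P(V|k):
        P(V|y) = sum_k P(V|k) P(k|y)."""
    lik = priors * gauss_pdf(y, mu_y, var_y)
    post = lik / lik.sum()                          # P(k|y)
    p_speech = float(np.dot(post, p_speech_given_k))  # P(V|y)
    return p_speech, p_speech > thresh
```

Because the mixture itself is VTS-adapted to the current noise, the same P(V|k) labels learned on clean speech keep working as the SNR drops, which is the stability argument on the next slide.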

Slide 16 – Model-Based VAD
- Some considerations about model-based VAD:
  - The VAD decision relies on a Gaussian mixture model trained on clean speech (on the speech events observed in the training database), not on energy
  - Based on observations in the log-FBE domain
  - VTS adapts the Gaussian mixture to the noisy conditions, so the VAD performance is expected to be stable over a wide range of SNRs
  - Computational load

Slide 17 – Model-Based VAD
- Model-based VAD for different SNRs (figure)

Slide 18 – Model-Based VAD
- Comparison with other VADs: HR1 and HR0 evaluated on AURORA-2 (figure)

Slide 20 – Model-Based VAD
- Aurora-2 recognition results (WAcc):

               WF        WF+FD
    G                    57.8 %
    AMR                  65.0 %
    AMR                  78.5 %
    AFE        75.3 %    79.0 %
    VTS-VAD    78.4 %    80.2 %

  Baseline: 60.5 % (no VAD, no WF, no FD)
