Subproject II: Robustness in Speech Recognition

Members (1/2)
–Hsiao-Chuan Wang (PI), National Tsing Hua University
–Jeih-Weih Hung (Co-PI), National Chi Nan University
–Sin-Horng Chen, National Chiao Tung University
–Jen-Tzung Chien (Co-PI), National Cheng Kung University
–Lin-shan Lee, National Taiwan University
–Hsin-min Wang, Academia Sinica

Members (2/2)
–Yih-Ru Wang, National Chiao Tung University
–Yuan-Fu Liao, National Taipei University of Technology
–Berlin Chen, National Taiwan Normal University

Research Theme
[Diagram: input speech → output recognition results, with robustness addressed at three levels]
–Signal level: signal processing, feature extraction & transformation
–Model level: adaptive HMM models
–Lexical level: adaptive pronunciation lexicon, adaptive language models
–Speech decoding, including word graph rescoring

Research Roadmap
Current achievements:
–Speech enhancement & wavelet processing
–Cepstral moment normalization & temporal filtering
–Microphone array and noise cancellation approaches
–Discriminative adaptation for acoustic and linguistic models
–Maximum entropy modeling & data mining algorithms
–Robust language modeling
Future directions & applications:
–Speech recognition in different adverse environments (e.g. car, home)
–Robust broadcast news transcription
–Lecture speech recognition
–Spontaneous speech recognition
–Next generation automatic speech recognition
–Powerful machine learning approaches for complicated robustness problems

Signal Level Approaches
–Speech Enhancement: harmonic retaining, perceptual factor analysis, etc.
–Robust Feature Representation: higher-order cepstral moment normalization, data-driven temporal filtering, etc.
–Microphone Array Processing: microphone array with post-filtering, etc.
–Missing-Feature Approaches: sub-space missing-feature imputation and environment sniffing, mismatch-aware stochastic matching, etc.

Higher-Order Cepstral Moment Normalization (HOCMN) (1/3)
Widely used cepstral feature normalization methods:
–CMS: normalizes the first moment
–CMVN: normalizes the first and second moments
–HEQ: normalizes the full distribution (moments of all orders)
Why not normalize only a few higher-order moments? Disturbances of larger magnitudes may be the major sources of recognition errors, and these are better reflected in the higher-order moments.

Higher-Order Cepstral Moment Normalization (HOCMN) (2/3)
Experimental results: Aurora 2, clean-condition training, word accuracy averaged over 0–20 dB and all types of noise (sets A, B, C). [Figures (a), (b): HOCMN with the first and N-th moments normalized, compared with CMVN (L=86)]

Higher-Order Cepstral Moment Normalization (HOCMN) (3/3)
Experimental results: Aurora 2, clean-condition training, word accuracy averaged over 0–20 dB for each type of noise condition (sets A, B, C)
–HOCMN is significantly better than CMVN for all types of noise
–HOCMN is better than HEQ in most types of noise, except the “Subway” and “Street” noise
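The contrast between CMVN and a higher-order normalization can be sketched as follows. This is a minimal illustration of the idea only, assuming a simple bisection solver for the odd-moment bias; the function names, moment orders, and solver are illustrative, not the published HOCMN recipe.

```python
import numpy as np

def cmvn(c):
    """Cepstral mean and variance normalization: per feature dimension,
    zero mean and unit variance (normalizes the first two moments)."""
    return (c - c.mean(axis=0)) / (c.std(axis=0) + 1e-12)

def hocmn(c, odd=3, even=4, iters=60):
    """Sketch of higher-order moment normalization: per dimension, find a
    bias making the odd-order moment zero (the moment is monotone in the
    bias for odd orders, so bisection converges), then scale so the
    even-order absolute moment equals one."""
    c = np.asarray(c, dtype=float).copy()
    for d in range(c.shape[1]):
        x = c[:, d]
        lo, hi = x.min(), x.max()        # the root lies inside the data range
        for _ in range(iters):           # bisection on the bias b
            b = 0.5 * (lo + hi)
            if np.mean((x - b) ** odd) > 0:
                lo = b
            else:
                hi = b
        x = x - b
        scale = np.mean(np.abs(x) ** even) ** (1.0 / even)
        c[:, d] = x / (scale + 1e-12)
    return c
```

After `hocmn`, each dimension has (numerically) zero third moment and unit fourth absolute moment, while CMVN only fixes the first two moments.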

Data-Driven Temporal Filtering
–The derived filters operate along the temporal trajectories of the original features
–The filters can be obtained in a data-driven manner according to the PCA/LDA/MCE criteria
–They can be integrated with cepstral mean and variance normalization (CMVN) for further performance gains
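The PCA variant of such data-driven filter derivation can be sketched roughly as follows: stack sliding windows of feature trajectories and take the principal eigenvector of their covariance as FIR filter taps. The window length and helper names are assumptions; LDA/MCE-derived filters would replace the eigen-analysis with their own objectives.

```python
import numpy as np

def pca_temporal_filter(trajectories, L=11):
    """Derive FIR filter taps from data: collect length-L sliding windows of
    the (one-dimensional) feature trajectories and take the principal
    eigenvector of their covariance matrix (the PCA criterion)."""
    windows = np.vstack([
        np.lib.stride_tricks.sliding_window_view(t, L) for t in trajectories
    ])
    cov = np.cov(windows, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    return eigvecs[:, -1]                    # taps = principal component

def filter_trajectory(traj, taps):
    """Apply the derived FIR filter along the time axis of one feature dim."""
    return np.convolve(traj, taps[::-1], mode="same")
```

In practice one such filter would be derived per cepstral dimension and applied after CMVN, matching the integration described above.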

Microphone Array Processing (1/3)
–Integrated with model-level approaches (MLLR)
–Uses the Time Domain Coherence Measure (TDCM)

Microphone Array Processing (2/3)
–Further improved with Wiener filtering and a Spectral Weighting Function (SWF)
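A textbook per-bin Wiener gain, of the kind such post-filtering builds on, can be sketched as follows; this is a generic sketch, not the project's specific spectral weighting function.

```python
import numpy as np

def wiener_gain(noisy_psd, noise_psd):
    """Per-frequency Wiener filter gain: estimate the clean-speech PSD by
    spectral subtraction, then G = max(P_y - P_n, 0) / P_y, so each bin is
    attenuated according to its estimated local SNR."""
    clean_est = np.maximum(noisy_psd - noise_psd, 0.0)
    return clean_est / (noisy_psd + 1e-12)
```

The enhanced spectrum is simply the gain applied bin-wise to the noisy spectrum; bins dominated by noise are driven toward zero.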

Microphone Array Processing (3/3)
Applications for in-car speech recognition
–The Power Spectral Coherence Measure (PSCM) is used to estimate the time delay
[Figures: physical configuration; configuration in the car]
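Time-delay estimation from the cross-power spectrum can be sketched with the standard generalized cross-correlation (GCC-PHAT) formulation; the slide's PSCM is a related coherence-based measure, so treat this as a stand-in rather than the exact method.

```python
import numpy as np

def estimate_delay(x, y, fs=1.0):
    """Estimate the delay of channel y relative to channel x from the phase
    of the cross-power spectrum (GCC with PHAT weighting)."""
    n = 2 * max(len(x), len(y))               # zero-pad to avoid wrap-around
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    cross = np.conj(X) * Y
    cross /= np.abs(cross) + 1e-12            # PHAT: keep phase only
    cc = np.fft.irfft(cross, n)
    lags = np.arange(n)
    lags[lags >= n // 2] -= n                 # map indices to signed lags
    return lags[np.argmax(cc)] / fs
```

For a two-microphone pair in the car, the estimated delay steers the array (delay-and-sum) toward the driver before the post-filtering stage.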

Model Level Approaches
–Improved Parallel Model Combination
–Bayesian Learning of Speech Duration Models
–Aggregate a Posteriori Linear Regression Adaptation

Aggregate a Posteriori Linear Regression (AAPLR) (1/3)
–Discriminative linear regression adaptation
–A prior density of the regression matrix is incorporated to provide Bayesian learning capability
–A closed-form solution is obtained for rapid adaptation
[Diagram: prior information of the regression matrix + discriminative criterion → AAPLR, combining Bayesian learning with a closed-form solution]

Aggregate a Posteriori Linear Regression (AAPLR) (2/3)
Unlike MAPLR, AAPLR uses a discriminative training criterion: the objective is aggregated over all model classes m with probabilities P_m. [Figure: MAPLR vs. AAPLR objective functions]

Aggregate a Posteriori Linear Regression (AAPLR) (3/3)
Comparison with other approaches:

Method | Estimation criterion | Discriminative adaptation | Bayesian learning | Closed-form solution
MLLR   | ML                   | No                        | No                | Yes
MAPLR  | MAP                  | No                        | Yes               | Yes
MCELR  | MCE                  | Yes                       | No                | No
CMLLR  | MCE, MMI             | Yes                       | No                | Yes
AAPLR  | MAP, AAP             | Yes                       | Yes               | Yes
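The practical appeal of a closed-form regression update can be illustrated with a simplified count-weighted least-squares estimate of the affine transform on Gaussian means. This is a stand-in for the ML/MAP auxiliary-function solutions behind MLLR/MAPLR/AAPLR, not the AAPLR derivation itself; the function and parameter names are illustrative.

```python
import numpy as np

def estimate_affine_transform(model_means, adapt_means, counts):
    """Closed-form estimate of (W, b) mapping model Gaussian means to
    adaptation-data means by occupancy-weighted least squares, solved in
    one shot via the extended-mean formulation [mu; 1]."""
    n, d = model_means.shape
    X = np.hstack([model_means, np.ones((n, 1))])     # extended means
    w = np.sqrt(np.asarray(counts, float))[:, None]   # per-class occupancy
    A, *_ = np.linalg.lstsq(w * X, w * adapt_means, rcond=None)
    return A[:d].T, A[d]                              # W (d x d), bias b
```

Each adapted mean is then `W @ mu + b`; because the solution is a single linear solve rather than an iterative search, adaptation is rapid, which is the property the closed-form column in the table above records.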

Lexical Level Approaches
–Pronunciation Modeling for Spontaneous Mandarin Speech
–Language Model Adaptation: latent semantic analysis and smoothing; maximum entropy principle
–Association Pattern Language Model

Pronunciation Modeling for Spontaneous Mandarin Speech
Automatically constructing a multiple-pronunciation lexicon with a three-stage framework that reduces the confusion introduced by the added pronunciations:
–Automatically generate possible surface forms while avoiding confusion across different words
–Rank the pronunciations to avoid confusion across different words
–Keep only the necessary pronunciations to avoid confusion across different words

Association Pattern Language Model (1/5)
–N-grams consider only local relations
–Trigger pairs consider long-distance relations, but only between two associated words
–Word associations can be expanded to cover more than two distant words
–A new algorithm discovers association patterns via data mining techniques

Association Pattern Language Model (2/5)
[Figure: bigram & trigram vs. trigger pairs]

Association Pattern Language Model (3/5)
[Figure: association patterns]

Association Pattern Language Model (4/5)
[Figure: association pattern mining procedure]

Association Pattern Language Model (5/5)
–Construct the association pattern set AS, covering different association steps
–Merge the mutual information of all association patterns
–Estimate the association pattern n-gram
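The mining step can be illustrated with a minimal association-pair miner that scores co-occurrences within a distance window by pointwise mutual information, the building block for expanding trigger pairs into longer association patterns. The window size, count threshold, and scoring here are illustrative assumptions, not the project's exact algorithm.

```python
import math
from collections import Counter

def mine_association_pairs(sentences, window=5, min_count=2):
    """Collect ordered word pairs co-occurring within `window` positions and
    score frequent pairs by (unnormalized) pointwise mutual information:
    log( count(w,v) * N / (count(w) * count(v)) )."""
    unigram, pairs, total = Counter(), Counter(), 0
    for sent in sentences:
        unigram.update(sent)
        total += len(sent)
        for i, w in enumerate(sent):
            for v in sent[i + 1 : i + 1 + window]:
                pairs[(w, v)] += 1
    scores = {}
    for (w, v), c in pairs.items():
        if c >= min_count:
            scores[(w, v)] = math.log(c * total / (unigram[w] * unigram[v]))
    return scores
```

Pairs mined this way go beyond adjacent bigrams (e.g. "stock … prices" at distance two), and chaining high-scoring pairs yields the multi-word association patterns whose mutual information is merged in the steps above.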

Future Directions
Robustness in Detecting Speech Attributes and Events
–Detection-based processing for next-generation automatic speech recognition
–Robustness in sequential hypothesis testing for acoustic and linguistic detectors
Beyond Current Robustness Approaches
–The maximum entropy framework is useful for building robust speech and linguistic models
–Develop new machine learning approaches (e.g. ICA, LDA) for speech technologies
–Build powerful technologies to handle complicated robustness problems
Application of Robustness Techniques in Spontaneous Speech Recognition
–The robustness issue is ubiquitous in speech areas
–Towards robustness at different levels
–Robustness in establishing applications and systems