A New Approach to Utterance Verification Based on Neighborhood Information in Model Space. Author: Hui Jiang, Chin-Hui Lee. Reporter: 陳燦輝.


2 Reference [1] Hui Jiang and Chin-Hui Lee, "A new approach to utterance verification based on neighborhood information in model space," IEEE Transactions on Speech and Audio Processing, vol. 11, no. 5, 2003. [2] H. Jiang, K. Hirose, and Q. Huo, "Robust speech recognition based on Bayesian prediction approach," IEEE Transactions on Speech and Audio Processing, vol. 7, pp. 426–440, July. [3] N. Merhav and C.-H. Lee, "A minimax classification approach with application to robust speech recognition," IEEE Transactions on Speech and Audio Processing, vol. 1, pp. 90–100, 1993.

3 Outline Introduction UV based on neighborhood information Bayes factors: a Bayesian tool for verification problems Experiments Summary and Conclusions

4 Introduction The major difficulty with likelihood ratio test (LRT)-based utterance verification is how to model the alternative hypothesis. It is therefore very important to know the properties of the competing source distributions. In this paper, we investigate a novel idea: performing utterance verification based on neighborhood information in model space.
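For context on what the LRT baseline looks like, the conventional confidence score can be sketched as below; the log-likelihoods, frame count, and threshold are hypothetical illustration values, not figures from the paper.

```python
def lrt_confidence(loglik_target, loglik_anti, num_frames):
    """Frame-normalized log-likelihood ratio between the recognized word's
    model and an anti-model standing in for the alternative hypothesis."""
    return (loglik_target - loglik_anti) / num_frames

def accept_word(loglik_target, loglik_anti, num_frames, threshold):
    """Accept the recognized word when the confidence score clears the threshold."""
    return lrt_confidence(loglik_target, loglik_anti, num_frames) >= threshold

# Hypothetical 50-frame word hypothesis.
score = lrt_confidence(-2500.0, -2600.0, 50)  # 2.0
```

The quality of this score hinges entirely on how well the anti-model captures the alternative hypothesis, which is exactly the difficulty the paper sets out to avoid.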

5 UV based on neighborhood information Nested neighborhoods in model space :

6 UV based on neighborhood information (cont) Nested neighborhoods in model space (cont) : Fig. 1. Illustration of the structure of nested neighborhoods in HMM model space.

7 UV based on neighborhood information (cont) Nested neighborhoods in model space (cont) :

8 UV based on neighborhood information (cont) Nested neighborhoods in model space (cont) :

9 UV based on neighborhood information (cont) For a given speech segment X, assume that an ASR system recognizes it as word W, which is represented by an HMM. Traditionally, we formulate UV as a statistical hypothesis testing problem. Here, we translate the above hypothesis test into the following neighborhood-based ones.
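The slide's formulas did not survive extraction; the translation can be written out as follows, where the notation (λ_W for the recognized word's model, λ* for the true source, N(λ_W) for the neighborhood) is assumed here rather than taken from the slide.

```latex
% Conventional formulation:
H_0:\ X \text{ is generated by } \lambda_W
\qquad\text{vs.}\qquad
H_1:\ X \text{ is generated by some } \lambda \neq \lambda_W

% Neighborhood-based formulation:
H_0':\ \lambda^{*} \in N(\lambda_W)
\qquad\text{vs.}\qquad
H_1':\ \lambda^{*} \notin N(\lambda_W)
```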

10 UV based on neighborhood information (cont) Fig. 2. Illustration of hypothesis testing in the scenario of detecting speech recognition errors based on the neighborhood information.

11 Bayes factors The Bayesian approach to hypothesis testing involves the calculation and evaluation of the so-called Bayes factor. Given the observation X along with two competing hypotheses, the Bayes factor is computed as
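The slide's formula was lost in extraction; the standard definition of the Bayes factor, writing the hypotheses as H_0 and H_1 and the HMM parameters as λ, is:

```latex
B \;=\; \frac{p(X \mid H_1)}{p(X \mid H_0)}
  \;=\; \frac{\int p(X \mid \lambda)\, p(\lambda \mid H_1)\, d\lambda}
             {\int p(X \mid \lambda)\, p(\lambda \mid H_0)\, d\lambda}
```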

12 Bayes factors (cont) In order to use Bayes factors to solve the above hypothesis testing problem, two important issues must be addressed: How to properly choose the prior distribution p(.) of the HMM model parameters for each hypothesis. How to quantitatively define the neighborhoods.

13 Bayes factors (cont)

14 Bayes factors (cont)

15 Bayes factors (cont)

16 Bayes factors (cont)

17 Bayes factors (cont)

18 Bayes factors (cont)

19 Bayes factors (cont)

20 Bayes factors (cont)

21 Bayes factors (cont)

22 Bayes factors (cont)

23 Bayes factors (cont)

24 Bayes factors (cont) In this paper, in order to balance the contributions from different models in the neighborhood, we introduce an exponential scale factor into the integral calculation. The exponential scale factor is important to equalize the contributions from different models in the neighborhood during the computation of the Bayes factor. If the scale factor is chosen larger than one, the models with large likelihood values are emphasized; on the other hand, if it is smaller than one, the models with smaller likelihood values are given more weight.
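A minimal numerical sketch of this equalizing effect, assuming we simply raise each model's likelihood to a power alpha before summing (the paper's exact integral form is not reproduced here):

```python
def scaled_weights(likelihoods, alpha):
    """Relative contribution of each model after exponential scaling p**alpha."""
    scaled = [p ** alpha for p in likelihoods]
    total = sum(scaled)
    return [s / total for s in scaled]

liks = [0.9, 0.1]                   # one dominant and one weak model
sharp = scaled_weights(liks, 2.0)   # alpha > 1: dominant model emphasized further
flat = scaled_weights(liks, 0.5)    # alpha < 1: weak model's share is raised
```

With alpha = 2 the dominant model's share rises above its raw 0.9; with alpha = 0.5 it falls below 0.9, so the weaker model contributes relatively more.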

25 Bayes factors (cont)

26 Bayes factors (cont)

27 Bayes factors (cont)

28 Bayes factors (cont)

29 Experiments We evaluate the proposed methods on the Bell Labs Communicator system. In our recognition system, we use a 38-dimension feature vector, consisting of 12 Mel LPCCEP, 12 delta CEP, 12 delta-delta CEP, and delta and delta-delta log-energy. The acoustic models are state-tied, tri-phone CDHMM models, which consist of roughly 4K distinct HMM states with an average of 13.2 Gaussian mixture components per state.
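The 38 dimensions add up as 12 static + 12 delta + 12 delta-delta cepstra plus delta and delta-delta log-energy (the static log-energy itself is not included). A toy sketch of the stacking, using a simple two-point difference as a stand-in for whatever regression window the actual front end used:

```python
def delta(frames):
    """First-order difference approximation of delta features, with edge padding."""
    n = len(frames)
    return [
        [(b - a) / 2.0
         for a, b in zip(frames[max(t - 1, 0)], frames[min(t + 1, n - 1)])]
        for t in range(n)
    ]

def stack_features(cepstra, log_energy):
    """Per frame: 12 cep + 12 delta + 12 delta-delta + delta-E + delta-delta-E = 38."""
    d = delta(cepstra)
    dd = delta(d)
    e = [[v] for v in log_energy]
    de = delta(e)
    dde = delta(de)
    return [cepstra[t] + d[t] + dd[t] + de[t] + dde[t] for t in range(len(cepstra))]
```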

30 Experiments (cont) A class-based, tri-gram LM covering 2600 words is used. The ASR system achieves 15.8% WER on our independent evaluation set, which includes 1395 utterances in total. Based on the word and phoneme segmentations generated by the recognizer, we calculate a confidence score for every recognized word.

31 Experiments (cont) Baseline system: likelihood ratio test. New approach with settings in Case I: We choose the neighborhood and a constrained uniform prior distribution. Since we use static, delta, and delta-delta features, we slightly modify the neighborhood definition in (2) as

32 Experiments (cont) New approach with settings in Case I (cont): For the state-dependent setting, we first set one bound to a small value and the other to a large value; according to (26), we manually checked the corresponding ranges. New approach with settings in Case II: We choose the delta priors in (27) and (28) at the level of HMM states. At first, for each distinct state, we calculate its distance from all other states. The distance between two HMM states is computed as the minimum Euclidean distance between every possible pair of Gaussian components from these states.

33 Experiments (cont) New approach with settings in Case II (cont): For each state, we sort all other states according to their distances from the underlying state. In the first case, denoted Case II-A, for each underlying HMM state we choose the neighborhood sizes to include exactly the chosen numbers of other states. In the second case, denoted Case II-B, from the top 1500 sorted states we choose the neighborhood sizes to include all other states with distance less than one threshold, and those with distance between that threshold and a second one.
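The distance computation and neighbor sorting described above can be sketched as follows; each state is represented here just by its list of Gaussian component mean vectors, which is an assumption (the slide does not say whether covariances enter the distance):

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def state_distance(means_a, means_b):
    """Minimum Euclidean distance over all pairs of Gaussian components,
    one drawn from each state."""
    return min(euclidean(u, v) for u in means_a for v in means_b)

def sorted_neighbors(states, idx):
    """All other states sorted by distance from state `idx` (nearest first)."""
    ref = states[idx]
    return sorted(
        ((j, state_distance(ref, s)) for j, s in enumerate(states) if j != idx),
        key=lambda pair: pair[1],
    )
```

A neighborhood of a given size is then just a prefix of this sorted list, or all entries below a distance threshold.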

34 Experiments (cont) Table I. Verification performance comparison (equal error rate in %) of the baseline UV method (LRT + anti-models) with the proposed new approach in several different settings. In each case, the best performance of the new approach and its corresponding parameter setting are given. Here the scale factor is always fixed at 1.2.
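Since Table I reports equal error rates, here is one common way to approximate the EER from per-word confidence scores; the score lists below are made up for illustration, not taken from the paper.

```python
def equal_error_rate(correct_scores, error_scores):
    """Sweep a threshold over all observed scores and return the operating
    point where false-acceptance and false-rejection rates are closest."""
    best_gap, eer = None, None
    for thr in sorted(correct_scores + error_scores):
        fr = sum(s < thr for s in correct_scores) / len(correct_scores)  # falsely rejected
        fa = sum(s >= thr for s in error_scores) / len(error_scores)     # falsely accepted
        gap = abs(fa - fr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (fa + fr) / 2.0
    return eer

correct = [0.9, 0.8, 0.7, 0.4]  # scores of correctly recognized words
errors = [0.6, 0.3, 0.2, 0.1]   # scores of mis-recognized words
```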

35 Experiments (cont) Fig. 3. Comparison of ROC curves for different methods when verifying mis-recognized words against correctly recognized words in ASR outputs.

36 Summary and Conclusions The basic idea is to assume that all competing models of a given model sit inside one neighborhood of the underlying model. More research is still needed to find a better neighborhood definition in high-dimensional HMM model space. As another direction for future work, methods other than Bayes factors, such as the generalized likelihood ratio test (GLRT), could also be used to implement the neighborhood-based UV.