Confidence Measures for Speech Recognition Reza Sadraei

Confidence Measures

In speech recognition, confidence measures (CM) are used to evaluate the reliability of recognition results. Confidence measures can help ASR systems migrate from the laboratory to the real world.

Overview

The approaches for computing CM fall into three major categories:
– CM as a combination of predictor features
– CM as a posterior probability
– CM as utterance verification

Utterance Verification

Utterance verification (UV) is a procedure used to verify how reliable the recognition results are. Utterance verification is usually treated as a statistical hypothesis testing problem.

Utterance Verification

Posing the problem in this fashion:
– Does the input speech (X) contain the keyword corresponding to the most likely keyword (W) as determined by the speech recognizer?

Utterance Verification

For a typical pattern classifier, given an observation X as input, we always get a pattern class W as output. X could come from several sources:
– X actually comes from the class W
– X comes from a class other than W
– X is an outlier

Neyman-Pearson Lemma

According to the Neyman-Pearson lemma, an optimal test is to evaluate a likelihood ratio between two hypotheses H0 and H1:
– H0: X is correctly recognized (null hypothesis)
– H1: X is wrongly recognized (alternative hypothesis)

LR(X) = L(X|H0) / L(X|H1); accept H0 if LR(X) ≥ τ, reject otherwise,

where τ is the decision threshold.
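In code, the test reduces to a threshold on the log likelihood ratio. A minimal sketch, assuming the recognizer supplies log-likelihoods for the two hypotheses (the function name and score values here are illustrative, not from the slides):

```python
import math

def lrt_accept(loglik_h0: float, loglik_h1: float, tau: float) -> bool:
    """Neyman-Pearson test: accept H0 (the recognition result is
    correct) when L(X|H0) / L(X|H1) >= tau. Working in the log
    domain avoids underflow with very small likelihoods."""
    return (loglik_h0 - loglik_h1) >= math.log(tau)

# Illustrative values: H0 is noticeably more likely than H1.
print(lrt_accept(loglik_h0=-120.3, loglik_h1=-125.8, tau=2.0))  # True
```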

Difficulty

Computing the likelihood under the null hypothesis is straightforward, but the alternative hypothesis is a composite one, so it is always very difficult to model H1.

First Approach

The likelihood ratio can be written as:

LR(X) = L(X|W) / Σ_{W′ ≠ W} L(X|W′),

where L(X|W) is the likelihood of the observation X given the pattern class W.

First Approach

The models used for computing the alternative hypothesis are called competing models. Computing the LRT as defined requires evaluating the likelihood of the speech segment X under each model in the model set. To reduce the computational complexity, we can consider a smaller number of competing models.
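A sketch of this first approach, assuming per-class log-likelihoods are already available; the sum over competing classes is computed stably in the log domain (the vocabulary and scores are made up for illustration):

```python
import math

def log_sum_exp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def log_lr_all_competing(loglik_w, competing_logliks):
    """log LR(X) = log L(X|W) - log of the summed likelihood
    of all competing models."""
    return loglik_w - log_sum_exp(competing_logliks)

# Hypothetical per-class log-likelihoods from a small vocabulary.
scores = {"yes": -110.2, "no": -118.7, "stop": -121.4, "go": -125.0}
best = max(scores, key=scores.get)
others = [v for w, v in scores.items() if w != best]
print(best, log_lr_all_competing(scores[best], others))
```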

Second Approach

The competing set (or cohort set) for a given W is defined to be a fixed number (K) of the pattern classes that are most confusable with W.

Second Approach

The likelihood ratio can be written as:

LR(X) = L(X|W) / [ (1/K) Σ_{W′ ∈ cohort(W)} L(X|W′) ]
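A sketch of the cohort variant, following the averaged formula above. The slide defines the cohort as a fixed set per class W; here, for illustration, it is approximated per utterance by the K highest-scoring competitors (all names and values are made up):

```python
import math

def log_sum_exp(vals):
    """Numerically stable log(sum(exp(v) for v in vals))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))

def log_lr_cohort(loglik_w, competing_logliks, k=3):
    """Cohort-based log LR: normalize L(X|W) by the average
    likelihood of the K most confusable competitors, here taken
    as the K highest-scoring competing classes for this X."""
    cohort = sorted(competing_logliks, reverse=True)[:k]
    return loglik_w - (log_sum_exp(cohort) - math.log(len(cohort)))

competing = [-118.7, -121.4, -125.0, -131.2, -140.5]
print(log_lr_cohort(-110.2, competing, k=3))
```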

Third Approach: UV Based on Rival Models

Recall that an observation X classified as W could actually come from the class W, come from another class, or be an outlier. If an observation X is classified as W but does not actually belong to the class W, we call it a rival of the class W.

Third Approach: UV Based on Rival Models

The set of all rivals of W: S_r(W) = { X : X is classified as W but does not belong to W }
The set of observations from W: S_c(W) = { X : X is classified as W and belongs to W }

Third Approach: UV Based on Rival Models

The capability of utterance verification depends on how well we can distinguish S_c(W) from S_r(W). Statistical hypothesis testing can still be adopted as a tool to separate S_c(W) from S_r(W) statistically. The simplest way to model S_c(W) and S_r(W) is to estimate two different models, Λ_c and Λ_r, for S_c(W) and S_r(W) respectively, based on all available training data from each of the sets.

Third Approach: UV Based on Rival Models

Once Λ_c and Λ_r are given, utterance verification operates on the following likelihood ratio:

LR(X) = L(X|Λ_c) / L(X|Λ_r); accept W if LR(X) ≥ τ
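A toy sketch of the rival-model test, using one-dimensional Gaussians as stand-ins for the correct model Λ_c and the rival model Λ_r; real systems would use HMMs, and everything below is illustrative:

```python
import math

class Gaussian:
    """Toy 1-D stand-in for a trained model (Lambda_c or Lambda_r)."""
    def __init__(self, mean: float, var: float):
        self.mean, self.var = mean, var

    def log_likelihood(self, x: float) -> float:
        return -0.5 * (math.log(2 * math.pi * self.var)
                       + (x - self.mean) ** 2 / self.var)

def verify(x, model_c, model_r, tau):
    """Accept W iff L(x|Lambda_c) / L(x|Lambda_r) >= tau."""
    return model_c.log_likelihood(x) - model_r.log_likelihood(x) >= math.log(tau)

lambda_c = Gaussian(mean=0.0, var=1.0)  # estimated from S_c(W)
lambda_r = Gaussian(mean=2.0, var=1.0)  # estimated from S_r(W)
print(verify(0.3, lambda_c, lambda_r, tau=1.0))  # True: x resembles S_c(W)
```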

Third Approach: UV Based on Rival Models

It is straightforward to define S_c(W) and S_r(W) for every isolated word W. But for continuous speech recognition, it is very hard to associate a definite part of the data with the rival set, because numerous segment boundaries are possible.

Using UV in the Search Procedure

It is possible to use utterance verification to correct some possible recognition errors made by the recognizer during search. At every time instant t, a likelihood ratio test is conducted for the current path; if its score falls below some threshold, the path is rejected. A wrong path with a high likelihood but a low verification score can thus be rejected during search, as in the sketch below.
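A minimal sketch of verification-based pruning, assuming each active path carries an accumulated log likelihood-ratio score in a "log_lr" field; the path structure and values are hypothetical:

```python
def prune_by_verification(active_paths, tau_log):
    """Keep only the paths whose accumulated verification score
    stays at or above the log-domain threshold."""
    return [p for p in active_paths if p["log_lr"] >= tau_log]

paths = [
    {"hyp": "call home", "loglik": -98.1, "log_lr": 1.8},
    {"hyp": "fall home", "loglik": -97.5, "log_lr": -2.5},  # high likelihood, low verification score
]
print(prune_by_verification(paths, tau_log=0.0))  # only "call home" survives
```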

Using UV in the Search Procedure

Another advantage of this method is that a likelihood-ratio-based confidence measure is calculated and attached to every phone in all possible paths. These phone scores can easily be combined to obtain confidence measures for a word, a phrase, or the whole sentence.
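A sketch of that aggregation, assuming each recognized phone already carries its log likelihood-ratio score; averaging rather than summing keeps words of different lengths comparable (the phone scores below are illustrative):

```python
def confidence(phone_log_lrs):
    """Average per-phone log LR scores to get a length-normalized
    confidence for a word, phrase, or whole sentence."""
    return sum(phone_log_lrs) / len(phone_log_lrs)

word_scores = {"cat": [1.2, 0.4, 0.9], "hat": [-0.3, 0.4, 0.9]}
for word, scores in word_scores.items():
    print(word, round(confidence(scores), 2))
```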

Representing Results