ASR Evaluation Julia Hirschberg CS 4706

Outline
Intrinsic Methods
 – Transcription Accuracy
   – Word Error Rate
   – Automatic methods, toolkits
   – Limitations
 – Concept Accuracy
   – Limitations
Extrinsic Methods

Evaluation
How to evaluate the 'goodness' of a word string output by a speech recognizer?
Terms:
 – ASR hypothesis: the ASR output
 – Reference transcription: ground truth – what was actually said

Transcription Accuracy
Word Error Rate (WER)
 – Based on Minimum Edit Distance: the distance in words between the ASR hypothesis and the reference transcription
 – Errors are Substitutions, Insertions, and Deletions; for ASR these are usually weighted equally, but different weights can be used to penalize different types of errors
 – WER = 100 * (Substitutions + Insertions + Deletions) / N, where N is the number of words in the reference transcription
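
To make the computation concrete, here is a minimal Python sketch of WER via word-level minimum edit distance, assuming all error types are weighted equally (the example sentences are from the alignment on the next slide):

    def wer(ref: str, hyp: str) -> float:
        """WER as a percentage: 100 * edit_distance / reference length."""
        r, h = ref.split(), hyp.split()
        # d[i][j] = edit distance between first i ref words and first j hyp words
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i                      # i deletions
        for j in range(len(h) + 1):
            d[0][j] = j                      # j insertions
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                sub = 0 if r[i - 1] == h[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,          # deletion
                              d[i][j - 1] + 1,          # insertion
                              d[i - 1][j - 1] + sub)    # substitution or match
        return 100.0 * d[len(r)][len(h)] / len(r)

    print(wer("portable phone upstairs last night so",
              "portable form of stores last night so"))   # -> 50.0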

WER Calculation
Word Error Rate = 100 * (Insertions + Substitutions + Deletions) / Total Words in Correct (Reference) Transcript
Alignment example:
REF:  portable  ****  PHONE  UPSTAIRS  last  night  so
HYP:  portable  FORM  OF     STORES    last  night  so
Eval:           I     S      S
WER = 100 * (1 + 2 + 0) / 6 = 50%

Word Error Rate = 100 * (Insertions + Substitutions + Deletions) / Total Words in Correct (Reference) Transcript
Alignment example:
REF:  portable    ****  phone  upstairs  last  night  so  ***
HYP:  preferable  form  of     stores    next  light  so  far
Eval: S           I     S      S         S     S          I
WER = 100 * (2 + 5 + 0) / 6 ≈ 117%
Note that WER can exceed 100%, since insertions can make the hypothesis longer than the reference.

NIST sctk-1.3 scoring software: computing WER with sclite
sclite aligns a hypothesized text (HYP, from the recognizer) with a correct or reference text (REF, human-transcribed):
id: (2347-b-013)
Scores: (#C #S #D #I)
REF:  was an engineer SO  I   i was always with **** **** MEN  UM   and they
HYP:  was an engineer **  AND i was always with THEM THEY ALL  THAT and they
Eval:                 D   S                     I    I    S    S
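
For reference, a hedged sketch of driving sclite from Python: it assumes sctk is installed with sclite on the PATH, and the exact flags may differ across sctk versions (check sclite's usage message).

    import subprocess

    # .trn transcript format: one utterance per line, ending with "(utterance-id)".
    with open("ref.trn", "w") as f:
        f.write("portable phone upstairs last night so (utt-001)\n")
    with open("hyp.trn", "w") as f:
        f.write("portable form of stores last night so (utt-001)\n")

    # "-o all dtl" requests the summary report plus the detailed report,
    # which includes the confusion-pair listing shown on the next slide.
    subprocess.run(["sclite", "-r", "ref.trn", "trn",
                    "-h", "hyp.trn", "trn",
                    "-o", "all", "dtl"], check=True)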

Sclite output for error analysis
CONFUSION PAIRS   Total (972)   With >= 1 occurances (972)
 1:  6 -> (%hesitation) ==> on
 2:  6 -> the ==> that
 3:  5 -> but ==> that
 4:  4 -> a ==> the
 5:  4 -> four ==> for
 6:  4 -> in ==> and
 7:  4 -> there ==> that
 8:  3 -> (%hesitation) ==> and
 9:  3 -> (%hesitation) ==> the
10:  3 -> (a-) ==> i
11:  3 -> and ==> i
12:  3 -> and ==> in
13:  3 -> are ==> there
14:  3 -> as ==> is
15:  3 -> have ==> that
16:  3 -> is ==> this

Sclite output for error analysis (continued)
17:  3 -> it ==> that
18:  3 -> mouse ==> most
19:  3 -> was ==> is
20:  3 -> was ==> this
21:  3 -> you ==> we
22:  2 -> (%hesitation) ==> it
23:  2 -> (%hesitation) ==> that
24:  2 -> (%hesitation) ==> to
25:  2 -> (%hesitation) ==> yeah
26:  2 -> a ==> all
27:  2 -> a ==> know
28:  2 -> a ==> you
29:  2 -> along ==> well
30:  2 -> and ==> it
31:  2 -> and ==> we
32:  2 -> and ==> you
33:  2 -> are ==> i
34:  2 -> are ==> were

Other Types of Error Analysis
Which speakers are most often misrecognized? (Doddington '98)
 – Sheep: speakers who are easily recognized
 – Goats: speakers who are really hard to recognize
 – Lambs: speakers who are easily impersonated
 – Wolves: speakers who are good at impersonating others

What (context-dependent) phones are least well recognized?
 – Can we predict this?
What words are most confusable (confusability matrix)? (see the sketch below)
 – Can we predict this?
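
A minimal sketch of tallying confusion pairs in the spirit of sclite's detailed report: recover one optimal alignment by backtracing the same edit-distance table used for WER above, and count the substituted word pairs. The function names here are illustrative, not sclite's API.

    from collections import Counter

    def substitution_pairs(ref: str, hyp: str) -> list:
        """Collect (ref_word, hyp_word) substitutions from one min-edit alignment."""
        r, h = ref.split(), hyp.split()
        d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
        for i in range(len(r) + 1):
            d[i][0] = i
        for j in range(len(h) + 1):
            d[0][j] = j
        for i in range(1, len(r) + 1):
            for j in range(1, len(h) + 1):
                sub = 0 if r[i - 1] == h[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + sub)
        pairs, i, j = [], len(r), len(h)
        while i > 0 and j > 0:                       # backtrace one optimal path
            cost = 0 if r[i - 1] == h[j - 1] else 1
            if d[i][j] == d[i - 1][j - 1] + cost:
                if cost:
                    pairs.append((r[i - 1], h[j - 1]))   # substitution
                i, j = i - 1, j - 1
            elif d[i][j] == d[i - 1][j] + 1:
                i -= 1                               # deletion
            else:
                j -= 1                               # insertion
        return pairs

    confusions = Counter(substitution_pairs(
        "portable phone upstairs last night so",
        "portable form of stores last night so"))
    for (ref_w, hyp_w), n in confusions.most_common():
        print(f"{n}: {ref_w} ==> {hyp_w}")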

Are there better metrics than WER?
WER is useful for measuring transcription accuracy
But should we be more concerned with meaning ("semantic error rate")?
 – A good idea, but it is hard to agree on an approach
 – Applied mostly in spoken dialogue systems, where the desired semantics are clear
 – For which ASR applications might the right metric differ? Speech-to-speech translation? Medical dictation systems?

Concept Accuracy
Spoken Dialogue Systems are often based on recognition of Domain Concepts
Input: I want to go to Boston from Baltimore on September 29.
Goal: maximize concept accuracy (the proportion of the domain concepts in the reference transcription of the user input that are correctly recognized)

Concept       Value
-----------   ------------
Source City   Baltimore
Target City   Boston
Travel Date   Sept. 29

 – CA Score: how many domain concepts were correctly recognized, out of the N mentioned in the reference transcription
Reference: I want to go from Boston to Baltimore on September 29
Hypothesis: Go from Boston to Baltimore on December 29
2 concepts correctly recognized / 3 concepts in reference transcription * 100 ≈ 66.7% Concept Accuracy
 – What is the WER? (0 Ins + 2 Subst + 3 Del) / 11 * 100 ≈ 45% WER (55% Word Accuracy), counting "I want to" as deletions and "go"/"Go" as a substitution under case-sensitive scoring
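
A minimal sketch of the CA score, assuming concept/value slots have already been extracted by some understanding component (the slot dictionaries below are hard-coded for illustration):

    def concept_accuracy(ref_slots: dict, hyp_slots: dict) -> float:
        """Percentage of reference concepts whose values were correctly recognized."""
        correct = sum(1 for k, v in ref_slots.items() if hyp_slots.get(k) == v)
        return 100.0 * correct / len(ref_slots)

    ref = {"source city": "Boston", "target city": "Baltimore", "travel date": "September 29"}
    hyp = {"source city": "Boston", "target city": "Baltimore", "travel date": "December 29"}
    print(concept_accuracy(ref, hyp))   # -> 66.66... (2 of 3 concepts correct)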

Sentence Error Rate
Percentage of sentences with at least one error:
 – Transcription error
 – Concept error
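
A minimal sketch of transcription-level sentence error rate, reusing the wer() function defined earlier; a sentence counts as an error if its hypothesis differs at all from the reference:

    def sentence_error_rate(refs: list, hyps: list) -> float:
        """Percentage of sentences whose hypothesis contains at least one error."""
        wrong = sum(1 for r, h in zip(refs, hyps) if wer(r, h) > 0)
        return 100.0 * wrong / len(refs)

    refs = ["portable phone upstairs last night so", "and they"]
    hyps = ["portable form of stores last night so", "and they"]
    print(sentence_error_rate(refs, hyps))   # -> 50.0 (one of two sentences has errors)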

Which Metric is Better?
Transcription accuracy? Semantic accuracy?

Next Class
Human speech perception