Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech
Keith Vertanen
July 30th, 2004



The problem

- Speech recognizers make mistakes
- Correcting mistakes is inefficient:
  - 140 WPM: uncorrected dictation
  - 14 WPM: corrected dictation, mouse/keyboard
  - 32 WPM: corrected typing, mouse/keyboard
- Voice-only correction is even slower and more frustrating

Research overview

Make correction of dictation:
- More efficient
- More fun
- More accessible

Approach:
- Build a word lattice from a recognizer's n-best list
- Expand the lattice to cover likely recognition errors
- Make a language model from the expanded lattice
- Use the model in a continuous gesture interface to perform confirmation and correction

Building the lattice

Example n-best list:
1: jack studied very hard
2: jack studied hard
3: jill studied hard
4: jill studied very hard
5: jill studied little
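The merging step can be sketched as a prefix trie over the n-best hypotheses. This is a simplification: a real recognizer lattice also carries acoustic scores and time alignments, which this toy version drops.

```python
def build_lattice(nbest):
    """Merge n-best hypotheses into a prefix trie: {word: child_trie}.

    Shared prefixes (e.g. "jill studied ...") collapse into one path,
    which is the lattice-merging effect the slide illustrates.
    """
    root = {}
    for hyp in nbest:
        node = root
        for word in hyp.split():
            node = node.setdefault(word, {})
    return root

def paths(node, prefix=()):
    """Enumerate the complete word sequences stored in the trie."""
    if not node:
        yield prefix
        return
    for word, child in node.items():
        yield from paths(child, prefix + (word,))

nbest = [
    "jack studied very hard",
    "jack studied hard",
    "jill studied hard",
    "jill studied very hard",
    "jill studied little",
]
lattice = build_lattice(nbest)
```

Walking `paths(lattice)` recovers the five original hypotheses, but the trie stores the shared words ("studied", "hard") only once per branch.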

Insertion errors

Acoustic confusions

- Given a word, find words that sound similar
- Look up its pronunciation in the dictionary:
  studied → s t ah d iy d
- Use observed phone confusions to generate alternative pronunciations:
  s t ah d iy d → s ao d iy, s t ah d iy, …
- Map the pronunciations back to words:
  s t ah d iy d → studied
  s ao d iy → saudi
  s t ah d iy → study
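A toy sketch of this pipeline. The dictionary entries and the phone-confusion table below are invented stand-ins: a real system would use a full pronunciation dictionary (e.g. CMUdict) plus confusion statistics measured from recognizer output, and would allow more than the single phone edit used here for brevity.

```python
# Toy pronunciation dictionary (ARPAbet-style phones, an assumption).
DICT = {
    "studied": ("s", "t", "ah", "d", "iy", "d"),
    "study":   ("s", "t", "ah", "d", "iy"),
}

# phone -> plausible substitutes; "" means the phone may be deleted.
# These confusions are made up for illustration.
CONFUSIONS = {"d": [""], "ah": ["ao"]}

def confusable_words(word):
    """Words whose pronunciation is at most one phone edit away."""
    phones = DICT[word]
    variants = {phones}
    for i, p in enumerate(phones):
        for sub in CONFUSIONS.get(p, []):
            repl = (sub,) if sub else ()          # "" deletes the phone
            variants.add(phones[:i] + repl + phones[i + 1:])
    by_pron = {pron: w for w, pron in DICT.items()}
    return {by_pron[v] for v in variants if v in by_pron}
```

Deleting the final "d" of "studied" yields the pronunciation of "study", so the expansion proposes "study" as an acoustic confusion, mirroring the slide's example.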

Acoustic confusions: “Jack studied hard”

Language model confusions: "Jack studied hard"

Look at the words before or after a node, and add likely alternate words based on an n-gram LM
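Proposing alternates from a bigram model might look like the sketch below; the counts are invented for illustration and stand in for a real n-gram language model.

```python
# Invented bigram counts standing in for a real n-gram LM.
BIGRAMS = {
    "jack":    {"studied": 5, "worked": 3, "slept": 1},
    "studied": {"hard": 6, "very": 2, "law": 1},
}

def lm_alternates(prev_word, k=2):
    """Top-k words most likely to follow prev_word under the toy LM.

    These are the candidate alternate words to splice into the
    lattice after the node for prev_word.
    """
    following = BIGRAMS.get(prev_word, {})
    ranked = sorted(following.items(), key=lambda kv: -kv[1])
    return [word for word, _ in ranked[:k]]
```

For the node after "studied", this proposes "hard" and "very" as likely continuations to add alongside whatever the recognizer already hypothesized.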

Expansion results (on WSJ1)

Probability model

Our confirmation and correction interface requires the probability of a letter given the prior letters, P(l_i | l_1 … l_{i-1}).

Probability model

- Keep track of possible paths in the lattice
- Base predictions on the next letter along those paths
- Interpolate with a default language model

Example: the user has entered "the_cat"
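The interpolation step can be sketched as a simple mixture; the weight λ and both distributions below are toy assumptions, not values from the talk.

```python
LAMBDA = 0.9  # weight given to the lattice-based prediction (assumed)

def letter_prob(letter, lattice_dist, default_dist, lam=LAMBDA):
    """P(letter) = lam * P_lattice(letter) + (1 - lam) * P_default(letter)."""
    return (lam * lattice_dist.get(letter, 0.0)
            + (1 - lam) * default_dist.get(letter, 0.0))

# After "the_cat", suppose the surviving lattice paths continue with
# "cat " or "cattle", so the lattice predicts a space or a "t":
lattice_dist = {" ": 0.5, "t": 0.5}
default_dist = {"t": 0.1, "s": 0.1, "e": 0.05}
p_t = letter_prob("t", lattice_dist, default_dist)  # 0.9*0.5 + 0.1*0.1 = 0.46
```

The default model's small share keeps every letter reachable even when no lattice path predicts it, which matters once the user starts typing a word the recognizer never hypothesized.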

Handling word errors

- Use the default language model during entry of an erroneous word
- Rebuild the paths, allowing for an additional deletion or substitution error

Example: the user has entered "the_cattle_"
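A word-level sketch of the rebuilt matching. The real system works at the letter level inside the lattice; this toy check only illustrates the idea of re-admitting paths that differ from the user's entry by one substituted word or one deleted path word.

```python
def matches_with_error(typed, path):
    """Does the tuple `typed` match a prefix of the tuple `path`,
    allowing at most one substituted word or one deleted path word?"""
    n = len(typed)
    if typed == path[:n]:                     # exact prefix match
        return True
    for i in range(n):                        # one substituted word
        if typed[:i] == path[:i] and typed[i + 1:n] == path[i + 1:n]:
            return True
    for i in range(len(path)):                # one word deleted from path
        shortened = path[:i] + path[i + 1:]
        if typed == shortened[:n]:
            return True
    return False
```

So after the user types "the cattle", a lattice path reading "the cat sat" survives via the substitution case, letting prediction snap back onto the lattice after the erroneous word.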

Evaluating expansion

Assume a good model requires as little information from the user as possible

Results on test set

Model evaluated on a held-out test set (Hub1):
- Default language model: 2.4 bits/letter (user decides between 5.3 letters)
- Best speech-based model: 0.61 bits/letter (user decides between 1.5 letters)
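The "user decides between N letters" figures are 2 raised to the bits-per-letter, i.e. the model's perplexity over letters:

```python
def effective_choices(bits_per_letter):
    """Perplexity: the equivalent number of equally likely letters."""
    return 2.0 ** bits_per_letter

default_lm = effective_choices(2.4)    # ≈ 5.3 letters
speech_lm = effective_choices(0.61)    # ≈ 1.5 letters
```

Dropping from 2.4 to 0.61 bits/letter means the interface needs roughly a quarter of the information from the user per letter, which is the payoff of conditioning on the speech lattice.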

“To the mouse snow means freedom from want and fear”

Questions?