Efficient Computer Interfaces Using Continuous Gestures, Language Models, and Speech
Keith Vertanen
Inference Group
August 4th, 2004

The problem
- Speech recognizers make mistakes
- Correcting mistakes is inefficient:
  - Uncorrected dictation: 140 WPM
  - Corrected dictation, mouse/keyboard: 14 WPM
  - Corrected typing, mouse/keyboard: 32 WPM
- Voice-only correction is even slower and more frustrating

Research overview
Make correction of dictation:
- More efficient
- More fun
- More accessible
Approach:
- Build a word lattice from a recognizer’s n-best list
- Expand the lattice to cover likely recognition errors
- Make a language model from the expanded lattice
- Use the model in a continuous gesture interface to perform confirmation and correction

Building lattice
Example n-best list:
1: jack studied very hard
2: jack studied hard
3: jill studied hard
4: jill studied very hard
5: jill studied little
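One simple way to merge an n-best list like this into a lattice is sketched below, assuming every occurrence of a word type shares a single node (a word-pair graph). This is an illustrative stand-in, not necessarily the construction used in the talk; `build_lattice`, `paths` and the `<s>`/`</s>` markers are hypothetical, and merging by word identity can create cycles if a hypothesis repeats a word.

```python
from collections import defaultdict

START, END = "<s>", "</s>"

def build_lattice(nbest):
    """Merge hypotheses into a word graph: one node per word type and an
    arc wherever one word directly follows another in some hypothesis."""
    arcs = defaultdict(set)
    for hyp in nbest:
        words = [START] + hyp.split() + [END]
        for prev, nxt in zip(words, words[1:]):
            arcs[prev].add(nxt)
    return arcs

def paths(arcs, node=START, prefix=()):
    """Enumerate every word sequence from <s> to </s> (assumes no cycles)."""
    if node == END:
        yield " ".join(prefix)
        return
    for nxt in sorted(arcs[node]):
        yield from paths(arcs, nxt, prefix if nxt == END else prefix + (nxt,))

nbest = [
    "jack studied very hard",
    "jack studied hard",
    "jill studied hard",
    "jill studied very hard",
    "jill studied little",
]
lattice = build_lattice(nbest)
for sentence in paths(lattice):
    print(sentence)
```

The merged graph generates six sentences, including "jack studied little", which is not in the n-best list; this generalization is a side effect of sharing nodes between hypotheses.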

Insertion errors
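Reading "insertion errors" as words the recognizer inserted that were never spoken, covering them means letting a path skip a recognized word. A minimal sketch under that assumption, for a linear lattice built from one hypothesis; `skip_arc_lattice`, `expand_insertions` and the `max_skips` limit are hypothetical names, not code from the talk.

```python
from itertools import product

def skip_arc_lattice(words):
    """Linear lattice for one hypothesis: between nodes i and i+1 keep the
    recognized word and add a parallel epsilon ("skip") arc. Each arc is
    (label, is_expansion); a label of None means the word is skipped."""
    return [[(w, False), (None, True)] for w in words]

def expand_insertions(words, max_skips=1):
    """Yield (sentence, expansion_arcs_used) for paths skipping <= max_skips words."""
    for choice in product(*skip_arc_lattice(words)):
        skips = sum(is_exp for _, is_exp in choice)
        if skips <= max_skips:
            yield " ".join(w for w, _ in choice if w is not None), skips

for sentence, skips in expand_insertions("jack studied very hard".split()):
    print(skips, sentence)
```

With `max_skips=1` the hypothesis "jack studied very hard" also yields "jack studied hard", which is the correct transcription if "very" was an insertion error.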

Acoustic confusions
- Given a word, find words that sound similar
- Look up the pronunciation in a dictionary: studied -> s t ah d iy d
- Use observed phone confusions to generate alternative pronunciations: s t ah d iy d, s ao d iy, s t ah d iy, …
- Map pronunciations back to words: s t ah d iy d -> studied, s ao d iy -> saudi, s t ah d iy -> study
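A toy version of the three steps (look up, confuse, map back) is sketched below. The three pronunciations follow the slide; the confusion tables `SUBSTITUTE` and `DELETABLE`, the edit budget `max_edits`, and all function names are hypothetical, whereas a real system would derive weighted confusions from observed recognizer errors and use a full dictionary such as CMUdict.

```python
from itertools import product

# Toy pronunciation dictionary (ARPAbet, stress marks removed).
PRON = {
    "studied": ("s", "t", "ah", "d", "iy", "d"),
    "study": ("s", "t", "ah", "d", "iy"),
    "saudi": ("s", "ao", "d", "iy"),
}
# Hypothetical phone confusions: allowed substitutions and droppable phones.
SUBSTITUTE = {"ah": ["ao"]}
DELETABLE = {"t", "d"}

def phone_options(phone):
    """(replacement, cost) pairs for one phone; None means the phone is deleted."""
    opts = [(phone, 0)] + [(alt, 1) for alt in SUBSTITUTE.get(phone, [])]
    if phone in DELETABLE:
        opts.append((None, 1))
    return opts

def alternative_prons(pron, max_edits=3):
    """Every variant pronunciation reachable with at most max_edits confusions."""
    for choice in product(*(phone_options(p) for p in pron)):
        if sum(cost for _, cost in choice) <= max_edits:
            yield tuple(p for p, _ in choice if p is not None)

def acoustic_confusions(word, max_edits=3):
    # Reverse index: pronunciation -> words with that pronunciation.
    by_pron = {}
    for w, p in PRON.items():
        by_pron.setdefault(p, set()).add(w)
    found = set()
    for alt in alternative_prons(PRON[word], max_edits):
        found |= by_pron.get(alt, set())
    return found - {word}

print(sorted(acoustic_confusions("studied")))  # ['saudi', 'study']
```

"study" needs one confusion (dropping the final d) while "saudi" needs three, so a tighter `max_edits` keeps only the closer match.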

Acoustic confusions: “Jack studied hard”

Morphology confusions
- Given a word, find words that share the same “root”
- Using the Porter stemmer:
  - jacking, jacks, jack, jacked -> jack
  - study, studying, studied, studies -> studi
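The grouping step can be sketched as "bucket the vocabulary by stem, then return the other words in the query's bucket". To keep the sketch dependency-free, `crude_stem` below is a hypothetical suffix stripper that only reproduces the stems on this slide; it is not the Porter algorithm, and `morphology_confusions` and the vocabulary are likewise illustrative.

```python
from collections import defaultdict

def crude_stem(word):
    """Tiny suffix stripper standing in for the Porter stemmer; it handles
    only the suffixes in this example and is NOT the real algorithm."""
    if word.endswith(("ies", "ied")):
        return word[:-3] + "i"
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    if word.endswith("y"):
        word = word[:-1] + "i"
    return word

def morphology_confusions(word, vocabulary, stem=crude_stem):
    """Other vocabulary words that share `word`'s stem."""
    groups = defaultdict(set)
    for w in vocabulary:
        groups[stem(w)].add(w)
    return groups[stem(word)] - {word}

vocab = ["jack", "jacks", "jacked", "jacking",
         "study", "studies", "studied", "studying", "hard"]
print(sorted(morphology_confusions("studied", vocab)))
# ['studies', 'study', 'studying']
```

Because the stemmer is a parameter, a real implementation (for example the `stem` method of NLTK's `PorterStemmer`) could be passed in place of `crude_stem`.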

Morphology confusions: “Jack studied hard”

Language model confusions: “Jack studied hard”
- Look at the words before or after a node and add likely alternative words based on an n-gram LM
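A minimal bigram version of this idea, assuming a candidate w for the node between words l and r is scored by P(w | l) · P(r | w). The toy corpus, the unsmoothed estimates and the names `p` and `lm_alternatives` are hypothetical; a real system would use a smoothed n-gram model trained on a large corpus.

```python
from collections import Counter

# Hypothetical toy training text standing in for a real n-gram corpus.
corpus = """jack studied hard . jack worked hard . jill worked hard .
jack studied late . jill studied math . jack played hard .""".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p(word, prev):
    """Relative-frequency bigram estimate P(word | prev), unsmoothed."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

def lm_alternatives(left, right, current, k=2):
    """Rank words that could replace `current` between `left` and `right`
    by P(w | left) * P(right | w); keep the top k with non-zero score."""
    scores = {w: p(w, left) * p(right, w) for w in unigrams if w != current}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [w for w in ranked[:k] if scores[w] > 0]

print(lm_alternatives("jack", "hard", current="studied"))
```

For the "studied" node in "jack studied hard", both neighbours constrain the choice, so "worked" and "played" are proposed while "late" and "math" (which never follow "jack") are not.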

Expansion results (on WSJ1)

Probability model
Our confirmation and correction interface requires the probability of a letter given the prior letters: $P(c_n \mid c_1, \ldots, c_{n-1})$

Probability model
- Keep track of the possible paths in the lattice
- Base the prediction on the next letter along each path
- Interpolate with a default language model
- Example: the user has entered “the_cat” (figure: path weight 1.00)
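A minimal sketch of this prediction step, assuming each lattice path is stored as a string with "_" for spaces plus a weight, and a fixed interpolation weight `lam`. The names `next_letter_dist`, `lam` and the uniform default model are hypothetical stand-ins; the talk does not specify the interpolation weight or the default model.

```python
from collections import defaultdict
import string

ALPHABET = string.ascii_lowercase + "_"   # "_" marks a space, as on the slides

def next_letter_dist(paths, typed, default, lam=0.9):
    """P(next letter | letters typed so far).

    paths   -- {path string: weight} for lattice paths
    default -- {letter: probability} from the default language model
    lam     -- hypothetical interpolation weight on the lattice prediction
    """
    live = {s: w for s, w in paths.items()
            if s.startswith(typed) and len(s) > len(typed)}
    total = sum(live.values())
    lattice = defaultdict(float)
    if total > 0:
        # Spread each live path's normalized weight onto its next letter.
        for s, w in live.items():
            lattice[s[len(typed)]] += w / total
    else:
        lam = 0.0   # no path matches: fall back entirely to the default model
    return {c: lam * lattice[c] + (1 - lam) * default.get(c, 0.0) for c in ALPHABET}

lattice_paths = {"the_cat_sat": 1.0, "the_cats_sat": 1.0}   # hypothetical paths
default = {c: 1 / len(ALPHABET) for c in ALPHABET}           # uniform stand-in
dist = next_letter_dist(lattice_paths, "the_cat", default)
print(round(dist["_"], 3), round(dist["s"], 3), round(dist["a"], 3))
```

Interpolating with the default model keeps every letter reachable, so the user can still enter text the lattice does not contain.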

Handling word errors
- Use the default language model while the user enters an erroneous word
- Rebuild paths allowing for an additional deletion or substitution error
- Example: the user has entered “the_cattle_” (figure: path weights 0.25 and 0.0625)
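The figure's weights (0.25, then 0.0625 = 0.25²) suggest that each assumed error multiplies a path's weight by a constant. The sketch below assumes exactly that, with a hypothetical `penalty` of 0.25, and counts errors with a small dynamic program: a typed word may match a path word, substitute for it, or be a word the recognizer deleted. `min_errors` and `path_weight` are illustrative names, not the talk's implementation.

```python
def min_errors(typed, path):
    """Fewest word errors that align every typed word with a prefix of a
    lattice path. A typed word may match a path word (free), substitute for
    it (1 error), or be a word the recognizer deleted (1 error)."""
    inf = float("inf")
    # dp[i][j]: fewest errors aligning typed[:i] with path[:j]
    dp = [[inf] * (len(path) + 1) for _ in range(len(typed) + 1)]
    dp[0][0] = 0
    for i in range(1, len(typed) + 1):
        for j in range(len(path) + 1):
            deleted = dp[i - 1][j] + 1          # recognizer missed typed[i-1]
            if j > 0:
                cost = 0 if typed[i - 1] == path[j - 1] else 1
                dp[i][j] = min(deleted, dp[i - 1][j - 1] + cost)
            else:
                dp[i][j] = deleted
    return min(dp[len(typed)])

def path_weight(typed, path, penalty=0.25):
    """Hypothetical weighting: each assumed error multiplies the weight by `penalty`."""
    return penalty ** min_errors(typed, path)

print(path_weight(["the", "cattle"], ["the", "cat", "sat"]))  # 0.25
print(path_weight(["the", "cattle"], ["a", "cat", "sat"]))    # 0.0625
```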

Using expanded lattice
- Paths using arcs added during lattice expansion are penalized
- Example: the user has entered “jack_” (figure: path weights 0.04 and 1.00)
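A sketch of how such a penalty might turn into path probabilities, assuming each path records how many expansion arcs it uses and each such arc multiplies its weight by a hypothetical `penalty` of 0.04 (chosen to echo the figure, not taken from the talk). Paths inconsistent with the typed prefix are dropped and the rest renormalized.

```python
def path_probs(paths, typed, penalty=0.04):
    """Normalize the weights of lattice paths still consistent with `typed`.

    paths   -- {path string: number of expansion arcs the path uses}
    penalty -- hypothetical factor applied once per expansion arc
    """
    live = {s: penalty ** n for s, n in paths.items() if s.startswith(typed)}
    total = sum(live.values())
    return {s: w / total for s, w in live.items()}

paths = {
    "jack_studied_hard": 0,   # original n-best path
    "jack_study_hard": 1,     # uses one arc added by acoustic expansion
    "jill_studied_hard": 0,   # ruled out once "jack_" has been typed
}
for sentence, prob in path_probs(paths, "jack_").items():
    print(f"{prob:.3f} {sentence}")
```

The expansion path stays available but receives only about 4% of the probability mass, so it costs little when the recognizer's own hypothesis was right.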

Evaluating expansion
- Assume a good model requires as little information from the user as possible, measured in bits per letter
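The information a model requires from the user can be measured as the average of -log2 P(letter | preceding letters) over a test text. A minimal sketch, where `model(prefix)` is a hypothetical interface returning a letter distribution and `uniform` is an illustrative baseline:

```python
import math
import string

def bits_per_letter(text, model):
    """Average information the user supplies per letter: the mean of
    -log2 P(letter | preceding letters) under `model`, where
    model(prefix) returns a {letter: probability} dict."""
    total = 0.0
    for i, letter in enumerate(text):
        total -= math.log2(model(text[:i])[letter])
    return total / len(text)

ALPHABET = string.ascii_lowercase + "_"

def uniform(prefix):
    # Hypothetical baseline: every symbol equally likely regardless of context.
    return {c: 1 / len(ALPHABET) for c in ALPHABET}

print(bits_per_letter("the_cat", uniform))  # log2(27), about 4.75
```

A model that always predicts the next letter with certainty scores 0 bits/letter; the better the model, the lower the score.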

Results on test set
Model evaluated on a held-out test set (Hub1):
- Default language model: 2.4 bits/letter, user decides between 5.3 letters
- Best speech-based model: 0.61 bits/letter, user decides between 1.5 letters
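The "decides between N letters" figures appear to be 2 raised to the bits-per-letter value (the perplexity): the number of equally likely letters that would carry the same information. That reading is an assumption, but it reproduces both numbers on the slide; `effective_choices` is a hypothetical name.

```python
def effective_choices(bits_per_letter):
    """Perplexity: number of equally likely letters carrying the same information."""
    return 2 ** bits_per_letter

for name, bits in [("Default language model", 2.4), ("Best speech-based model", 0.61)]:
    print(f"{name}: {bits} bits/letter -> {effective_choices(bits):.1f} letters")
```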

“To the mouse snow means freedom from want and fear”

“The hibernating skunk curled up in his deep den uncurls himself and ventures forth to prowl the world”

Conclusions
- One-third of recognition errors are covered by expanding the lattice
- Only insertion error expansion improves efficiency
- The speech-based model significantly improves efficiency (2.4 bits -> 0.61 bits per letter)
- A good correction interface is possible using Dasher and an off-the-shelf recognizer

Future work
- Update Speech Dasher to use the lattice-based probability model
- Incorporate hypothesis probabilities into the lattice (or, better still, access the recognizer’s own lattice)
- Improve efficiency on sentences with few or no errors
- Run user trials to validate the numeric results

Questions?
