Download presentation
1
Speech User Interfaces
include material from UPA workshop and CHI 2000 tutorial by Jennifer Lai include more speech UI examples & less of Speechacts video (show only usage parts). Show an example of a multimodal UI in action
2
Outline Motivation for speech UIs Speech recognition
UI problems with speech UIs SpeechActs: Guidelines for speech UIs Speech UI design tools Multimodal UIs
3
Motivation for Speech UIs: Pervasive Information Access
I-Land vision by Streitz, et. al. Information & Services
4
UIs in the Pervasive Computing Era
Future computing devices won’t have the same UI as current PCs wide range of devices small or embedded in environment often w/ “alternative” I/O & w/o screens information appliances I-Land vision by Streitz, et. al. pens, speech, vision...
5
Information Access via Speech
Read my important How to Design?
6
Industry Leaders Nuance Corporation Applications: TellMe, …
Users: Government, Computers- Microsoft, IBM,
7
Speech UI Motivation Smaller devices -> difficult I/O
people can talk at ~ 90 wpm -> high speed “Virtually unlimited” set of commands Freedom for other body parts imagine you are working on your car & need to know something from the manual Natural evolutionarily selected for reading, writing, & typing are not (too new)
8
Why are Speech UIs Hard to Get Right?
Speech recognition far from perfect imagine inputting commands w/ the mouse & getting the wrong result 5-20% of the time Speech UIs have no visible state can’t see what you have done before or what affect your commands have had Speech UIs are hard to learn how do you explore the interface? how do you find out what you can say?
9
Speech UIs Require Speech recognition Speech production (or synthesis)
the computer understanding what the customer is saying Speech production (or synthesis) the computer talking to the customer
10
Speech Recognition Continuous vs. non-continuous
Speaker independent vs. dependent Speech often misunderstood by people feedback via speech, facial expressions, & gesture Recognizers trained with real samples often get gender-based problems Based on probabilities (HMMs - Bayes) trigrams of sounds or words Several popular recognizers Nuance, SpeechWorks, IBM ViaVoice
11
Speech Production Three frequency regions of great intensity visible on oscilloscope come from larynx, throat, mouth Two needed for recognition but “tinny” Can generate emotion affect in speech Demo anger, disgust, gladness, sadness, fear, & surprise
12
Recognition Problems Good recognition Background noise Speed
humans < 1% error rate on dictation top recognition systems get <1-X% error rates computers don’t use much context Key is to be application specific for lower error rates Background noise even worse recognition rates (20-40% error) Speed Better as hardware getting faster in 10 years gone from 5 high-end workstations required to some speech systems running on laptops or even PDAs
13
More Recognition Problems
Isolated, short words difficult common words become short Segmentation silly versus sill lea Spelling mail vs. male -> need to understand language
14
Speech UI Problems Speech UI no-nos
modes (no feedback) certain commands only work when in specific states deep hierarchies (aka voice mail hell) Verbose feedback wastes time/patience only confirm consequential things use meaningful, short cues Interruption half-duplex communication (i.e., no barge-in support) Too much speech on the part of customer is tiring Speech takes up space in working memory can cause problems when problem solving
15
SpeechActs: Guidelines for Speech UIs
Speech interface to computer tools , calendar, weather, stock quotes Establish common ground & shared context make sure people know where they are in the conversation Pacing recog. delays are unnatural, make it clear when this occurs barge-in lets user interrupt like in real conversations tapering of prompts progressive assistance: short errors messages at first, longer when user needs more help implicit confirmation: include confirm in next command
16
One Vision of Future User Interfaces
Star Trek style UI verbally ask the computer for information may be common in mobile/hands-busy situations problem: hard to design, build, & use! requires perfect speech recognition & language understanding
17
Our Vision of Future User Interfaces
Multimodal, Context-aware UIs multimodal uses multiple input modalities (speech & gesture) to disambiguate user says “move it to this screen” while pointing context-aware apps can be aware of location, user, what they are doing, … people are talking -> don’t rely on speech I/O Problem: how to prototype & test new ideas? Informal UI Design Tools! combine Wizard of Oz & informal storyboarding At a recent conference on user interface software and technology, Jim Foley, a pioneer in the HCI field, noted that in the 20 years since “Put That There,” there are still no tools that would let application designers build it multimodal UIs allow computers to be used in more situations & places and by more diverse people Ame’s boyfriend got a new Audi & the manual is on CD – try to repair from CD – takes 2 people & a laptop These previous projects have influenced this new work! WoZ from SUEDE, Storyboarding from DENIM, etc.
18
Multimodal Error Correction
Dictation error correction study found users are better at correcting recognition errors with a different input modality recognizer got it wrong the first time -> it will get it wrong the second time hyperarticulating aggravates Correct dictation errors with vocal spelling, writing, typing, etc
19
Summary Speech UIs UI tools are needed for speech UI design
may permit more natural computer access allow us to use computers in more situations are hard to get to work well lack of visible state, tax working memory, recognition problems, etc. UI tools are needed for speech UI design Multimodal UIs address some of the problems with pure speech UIs help disambiguate help w/ correction
Similar presentations
© 2025 SlidePlayer.com Inc.
All rights reserved.