Speech User Interface
10/26/2010

Pervasive Information Access
- Information & Services
- I-Land vision by Streitz et al.

Motivations
- Smaller devices -> difficult I/O
- People can talk at 90 words/minute
- "Virtually unlimited" set of commands
- Freedom for other body parts
  - people drive and talk on the phone all the time
- Natural: evolutionarily selected for

Why Are Speech UIs Hard to Get Right?
- Speech recognition is far from perfect
  - imagine inputting commands with the mouse and getting the wrong result 5-20% of the time
- Speech UIs have no visible state
  - you can't see what you have done before or what effect your commands have had
- Speech UIs are hard to learn
  - how do you explore the interface?
  - how do you find out what you can say?

Key Components
- Speech recognition: the computer understanding what the customer is saying
- Speech production (or synthesis): the computer talking to the customer

Speech Recognition
- Continuous vs. non-continuous
- Speaker independent vs. dependent
- Speech is often misunderstood by people
  - feedback via speech, facial expressions, & gesture
- Recognizers trained with real samples often get gender-based problems
- Based on probabilities (HMMs, Bayes)
  - trigrams of sounds or words
- Several popular recognizers
  - Nuance, Dragon NaturallySpeaking, IBM ViaVoice
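The "based on probabilities (HMMs, Bayes)" and "trigrams" points are commonly written as the decision rule below. This is a textbook sketch, not taken from the slides: the symbols A (the acoustic observations) and W = w_1 ... w_n (a candidate word sequence) are notation assumed here for illustration.

```latex
% Bayes decision rule: pick the word sequence that is most probable
% given the audio. P(A) does not depend on W, so it drops out.
\[
\hat{W} = \arg\max_{W} P(W \mid A)
        = \arg\max_{W} \frac{P(A \mid W)\, P(W)}{P(A)}
        = \arg\max_{W} P(A \mid W)\, P(W)
\]
% Trigram approximation of the language model P(W):
% each word depends only on the two words before it.
\[
P(W) \approx \prod_{i=1}^{n} P(w_i \mid w_{i-2}, w_{i-1})
\]
```

In this reading, the HMMs supply the acoustic term P(A | W) and the trigram counts supply the language term P(W).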

Speech Production
- Also known as text-to-speech (TTS)

TTS Demo (Mandarin)
- NTHU MIR Lab
- NTU CSIE
- GUTTS
- Bell Lab Demo
- ITRI Information and Communications Research Laboratories (工研院資通所)
- iFLYTEK (科大訊飛)

TTS Demo (English)
- AT&T Natural Voices
- Sample text: "Good evening, class. Today we are going to discuss an important type of human-computer interface: speech UI, also known as voice UI. We will demonstrate a TTS engine developed by AT&T, which, in my opinion, is the best TTS so far."

Recognition Problems
- Poor recognition
  - humans: < 1% error rate on dictation
  - top recognition systems: 5-10% error rates
  - computers don't use much context
- Background noise
  - even worse recognition rates (20-40% error)
- Slow
  - simple matter of hardware getting faster
  - in 10 years, gone from 5 high-end workstations required to some speech systems running on laptops or even PDAs
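The error rates quoted above are usually reported as word error rate (WER): the number of word substitutions, deletions, and insertions needed to turn the recognizer output into the reference transcript, divided by the reference length. The slides do not say how their figures were measured, so the following is only a minimal sketch of that common metric; the function name and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into nothing
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


reference = "please read my new mail and then call the office"
hypothesis = "please read my new male and then call the office"
print(word_error_rate(reference, hypothesis))  # 0.1 (1 error in 10 words)
```

One wrong word in a ten-word utterance already gives a 10% error rate, which is the upper end of the range the slide quotes for top systems.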

More Recognition Problems
- Isolated, short words are difficult
  - common words become short
- Segmentation
  - "silly" versus "sill lea"
- Spelling
  - "mail" vs. "male" -> need to understand language
- What about Mandarin?
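The "mail" vs. "male" point can be illustrated with a toy language model: the two words sound the same, so only the surrounding words can decide between them. The sketch below counts word pairs in a tiny corpus and picks the candidate that most often follows the previous word. The corpus, function names, and the use of pair counts are all assumptions made up for this example; a real recognizer would use far more data and combine these scores with acoustic evidence.

```python
from collections import Counter

# Toy training sentences, invented for illustration only.
CORPUS = [
    "check my mail",
    "read my mail",
    "send the mail today",
    "a male voice",
    "the male speaker",
]


def bigram_counts(sentences):
    """Count how often each (previous word, word) pair occurs."""
    counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()  # <s> marks the sentence start
        counts.update(zip(words, words[1:]))
    return counts


def pick_homophone(previous_word, candidates, counts):
    """Return the candidate seen most often after previous_word.

    Unseen pairs count as zero; ties go to the first candidate listed.
    """
    return max(candidates, key=lambda word: counts[(previous_word, word)])


counts = bigram_counts(CORPUS)
print(pick_homophone("my", ["mail", "male"], counts))  # mail
print(pick_homophone("a", ["mail", "male"], counts))   # male
```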

Speech UI Problems
- Major problems: modes (no feedback)
  - certain commands only work when in specific states
  - deep hierarchies (also known as voice mail hell)
- Verbose feedback wastes time/patience
  - only confirm consequential things
  - use meaningful, short cues
- Interruption
  - half-duplex communication (i.e., no barge-in support)
- Too much speech on the part of the customer is tiring
- Speech takes up space in working memory
  - can cause problems when problem solving
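The advice "only confirm consequential things" and "use meaningful, short cues" can be read as a simple feedback policy. The sketch below is one possible interpretation, not something the slides specify: the `Command` type, its `consequential` flag, and the prompt wording are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    name: str            # spoken name of the command (hypothetical)
    consequential: bool  # True when the action is hard to undo


def feedback_for(command: Command) -> str:
    """Choose the spoken feedback for a recognized command."""
    if command.consequential:
        # Hard-to-undo actions get an explicit confirmation question.
        return f"{command.name.capitalize()}. Are you sure?"
    # Everything else gets a short cue instead of a full sentence.
    return "OK."


print(feedback_for(Command("delete all messages", consequential=True)))
print(feedback_for(Command("next message", consequential=False)))
```

Keeping the routine case down to a one-word cue is what saves the caller's time; the longer prompt is reserved for the few commands where a recognition error would be costly.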

Developing VUI
- VoiceXML
  - VoiceXML (VXML) is the W3C's standard XML format for specifying interactive voice dialogues between a human and a computer.
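To give a feel for what a VoiceXML dialogue looks like, the sketch below builds a one-question document with Python's standard `xml.etree.ElementTree` module. It assumes VoiceXML 2.0 element names (`vxml`, `form`, `field`, `prompt`, `option`, `noinput`, `nomatch`, `reprompt`, `filled`, `value`); the function name, field name, and prompt wording are made up for illustration, and the output has not been run through a VoiceXML interpreter.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # VoiceXML 2.0 namespace


def build_menu_document(prompt_text, choices):
    """Return a VoiceXML string that asks one question with fixed answers."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "main"})
    field = ET.SubElement(form, "field", {"name": "choice"})

    # What the system says to the caller.
    ET.SubElement(field, "prompt").text = prompt_text

    # Each option is one phrase the caller is allowed to say.
    for word in choices:
        ET.SubElement(field, "option", {"value": word}).text = word

    # Short recovery messages for silence and for unrecognized speech.
    for event, message in (("noinput", "Sorry, I did not hear you."),
                           ("nomatch", "Sorry, I did not understand.")):
        handler = ET.SubElement(field, event)
        handler.text = message
        ET.SubElement(handler, "reprompt")

    # Runs once the field has a recognized value: echo it back.
    filled = ET.SubElement(field, "filled")
    echo = ET.SubElement(filled, "prompt")
    echo.text = "You chose "
    value = ET.SubElement(echo, "value", {"expr": "choice"})
    value.tail = "."

    return ET.tostring(vxml, encoding="unicode")


print(build_menu_document("Would you like coffee or tea?", ["coffee", "tea"]))
```

The `noinput` and `nomatch` handlers are where a designer would address the earlier problems of missing feedback and imperfect recognition.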
