The Universal Speech Interface (USI) PDG Progress Report Thomas Harris, Stefanie Tomko, Arthur Toth, James Sanders, Alex Rudnicky, Roni Rosenfeld School of Computer Science Carnegie Mellon University 4 June 2003

Outline USI Project Summary USI Device Control USI User Studies Tech Transfer Initiative –USI Application Generator

Program Goals and Plan Overall program goal: –Design a universal (i.e. device-independent) interface for speech-based interaction with wearable and home devices Program plan & milestones: –Q1: analysis, interaction principles –Q2: build device-simulation environment –Q3: build first device prototype –Q4: initial user studies; development tools

Program Deliverables A novel universal design for speech-based interaction with wearable- and home- devices At least one demonstration system exemplifying the new interface A set of tools for rapid prototyping of compliant applications

The Universal Speech Interface (USI) In a Nutshell Unifying approach to human-machine speech communication Unified “look and feel” across all applications –analogous to the Xerox/Macintosh/Windows GUI look-and-feel Stylized, semi-natural interaction –analogous to the “Graffiti” alphabet for the Palm PDA

Existing Speech Paradigm 1: Command-and-control Systems Specialized language, optimized for a given application –each application has its own interface Intensive training of each user Daily use helps retain knowledge

Existing Speech Paradigm 2: Unconstrained Dialog Systems “Off-the-street” users, no training required System models existing human behavior But this comes at a cost: –each application requires a great deal of data, labor, human expertise –Speech Recognition technology is pushed to the limit –user does not easily grasp the application’s functional limits Out-Of-Vocabulary words (OOV) Out-Of-Domain concepts, requests

Is a Third Paradigm Needed? In practice, people are likely to use: –a handful of apps daily: scheduler, contact manager, email,... –many apps occasionally: weather, restaurants,... To exploit this, we need: –flexible, powerful interface for familiar applications. –immediate engagement with occasional or new applications.

Our Approach Identify application-independent universals: –user-side –machine-side Find suitable, general solutions –Human and machine meeting halfway Design a stylized, universal “look and feel” Teach it in 5 minutes

Universal Semantic primitives Help primitives –what can the machine do? how do I do X? what can I say? Speech channel primitives –detect & correct ASR errors; finished talking? Interaction primitives –turn taking; question answering; session management; undo Application primitives –environment variables: query, set –objects (e.g. lists): describe, navigate, create, modify, delete

USI Systems Developed Information Access –MovieLine –FlightLine –ApartmentLine Device Control –Stereo system –X-10 control (e.g., lights) –Alarm Clock applet –Digital Video Camera –Windows Media Player

USI Demonstration MovieLine –Experimental subject

USI Device Control

Device Interaction Analysis Analysis was done on multiple devices –alarm clock / radio –VCR –cell phone –MP3 player –memo pad / email / vmail –copier/fax

USI/Device Design Issues Confirmation strategy Error handling strategy Exploration Navigation Disambiguation / context mgmt Orientation Querying state variables

USI/Device Design Issues Confirmation strategy: restate-&-execute Error handling strategy: ignore Exploration: “OPTIONS” Navigation: use concept of ‘focus’ Disambiguation / context mgmt: implicit Orientation: “STATUS” Querying state variables: “WHAT IS THE...?”
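The design choices above can be illustrated with a minimal sketch. It is not the project's implementation: the `AlarmClock` class, the `handle` function, the state-variable names, and the "<variable> is <value>" phrasing for setting a value are all hypothetical. Only the keywords "OPTIONS", "STATUS", and "WHAT IS THE...?" and the restate-&-execute and ignore strategies come from the slide.

```python
class AlarmClock:
    """Toy device with a few state variables (all names are hypothetical)."""

    def __init__(self):
        self.state = {"alarm time": "7:00", "alarm": "off", "volume": "5"}


def handle(device, utterance):
    """Return the system's spoken response to one recognized utterance."""
    text = utterance.strip().lower()

    # Exploration: "OPTIONS" lists what the user can talk about.
    if text == "options":
        return "You can set: " + ", ".join(sorted(device.state))

    # Orientation: "STATUS" reads back the whole device state.
    if text == "status":
        return ", ".join(f"{k} is {v}" for k, v in sorted(device.state.items()))

    # Querying state variables: "WHAT IS THE <variable>?"
    if text.startswith("what is the "):
        name = text[len("what is the "):].rstrip("?").strip()
        if name in device.state:
            return f"{name} is {device.state[name]}"
        return ""  # error handling strategy: ignore

    # Setting a variable, assumed phrasing "<variable> is <value>".
    if " is " in text:
        name, value = (part.strip() for part in text.split(" is ", 1))
        if name in device.state:
            device.state[name] = value   # execute the command
            return f"{name} is {value}"  # restate it as confirmation

    return ""  # anything not understood is ignored
```

For example, `handle(AlarmClock(), "WHAT IS THE VOLUME?")` returns `"volume is 5"`, while an utterance the sketch cannot interpret produces an empty response rather than an error prompt.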

Hooking up with the PUC project Fits within the PUC project’s vision of automatically generated interfaces with different modalities and form factors But, can also be used as a standalone speech interface Compatibility with visual design is desirable, but not always natural: –nameless states (speech interface must have name for everything!) –speech interface can have shortcuts (“MODE: CD” vs. “CD”)

Meshing with the PUC project Device capabilities specified by XML doc States vs. Action dichotomy of the visual interface does not always conform to speech interface intuition. For now, creating our own interface specification document Ultimately, will augment XML DTD, so both interfaces can co-exist

USI Device control (a.k.a. James the Butler) Hardware hacking courtesy of the PUC project

USI Demonstration Device Control –Alarm Clock Example

User Studies

User study Compared Speech Graffiti (SG) & natural language MovieLines How does Speech Graffiti compare to a natural language interface? –Subjective user satisfaction –Task completion rates –Word error rates How well do users "get" Speech Graffiti? –How often do they speak within the grammar? –In what ways do they deviate from the grammar?

Subjective user satisfaction 17 of 23 preferred Speech Graffiti (SG) SG user satisfaction ratings higher than NL in all categories SG ratings positive except in annoyance & habitability

Computer experience & training Computer Science / Engineering backgrounds and / or programming experience –Higher user satisfaction ratings –Better task completion rates Training in-domain vs. out-of-domain –No differences in user satisfaction or task completion rates

Task completion Overall –67.9% SG tasks –67.4% NL tasks Individual means –5.43 of 8 SG tasks –5.30 of 8 NL tasks

Time-to-completion Completed tasks –67.9 seconds SG –73.4 seconds NL [Chart: time in seconds for SG-ML and NL-ML, "best case" vs. "real world", including incomplete tasks; 59 incompletes]

Turns-to-completion Completed tasks –8.2 turns SG –3.9 turns NL [Chart: number of turns for SG-ML and NL-ML, "best case" vs. "real world", including incomplete tasks; 59 incompletes]

Word error rates Very high for both systems –On "cleaned" set (on-task, non-noisy utts) Concept error is lower for USI –SG: –29.2% from WER –NL: +0.8% from WER Low error rate is key to acceptance –6 who preferred NL-ML had highest SG WER
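For reference, word error rate is conventionally computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. The sketch below follows that standard definition; the slides do not say which scoring tool was used in the study, and the function name `word_error_rate` is an assumption. Concept error, which the slide also reports, is a separate measure and is not computed here.

```python
def word_error_rate(reference, hypothesis):
    """Standard WER: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, this measure can exceed 100% when the recognizer output is much longer than the reference.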

WER & user satisfaction Good correlation for SG [Chart: user satisfaction rating vs. % word-error rate, for SG-ML and NL-ML]

How often do users speak within the Speech Graffiti grammar? Actually, pretty often! … and grammaticality leads to user satisfaction

How do users deviate from the grammar?

Future Interface Design Work Redesign Help facility –SG works best for those who "get it" –Current system provides no assistance to "clueless user" Error analysis –Compare failure cases in SG and NL interfaces –Compare user recovery attempts in SG and NL Address issues of generalizability –Promoting transparency of slot set and response sets –Accessing information sets rather than single items Adjust grammar components

Future Architecture Work Integrate current USI environments –Information Access –Device Control Improve interface between PUC and USI components Identify USI-specific techniques to achieve lower WER Improved documentation and distribution packaging

Tech Transfer Initiative

Tools for creating new USI apps –3 days to create a new application –prior exposure to speech technology highly beneficial –decided to further reduce the barrier: create an application generator

From 3 Days to a Few Hours A USI Application Generator New USI applications w/out programming! XML document fully specifies the application –slot names –accepted inputs –data types –slot properties –...
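To make the idea concrete, the sketch below shows what such a specification and a loader for it might look like. The actual USI schema is not given in these slides, so every element name, attribute name, and slot in `SPEC` is hypothetical, as is the `load_slots` helper. Only the kinds of information (slot names, accepted inputs, data types, slot properties) come from the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification; element and attribute names are assumptions.
SPEC = """
<application name="MovieLine">
  <slot name="title" type="string" required="false"/>
  <slot name="theater" type="string" required="false"/>
  <slot name="genre" type="string" required="false">
    <input>comedy</input>
    <input>drama</input>
  </slot>
</application>
"""


def load_slots(xml_text):
    """Parse the specification into a dict keyed by slot name."""
    root = ET.fromstring(xml_text.strip())
    slots = {}
    for slot in root.findall("slot"):
        slots[slot.get("name")] = {
            "type": slot.get("type"),                    # data type
            "required": slot.get("required") == "true",  # a slot property
            # Accepted inputs; an empty list means no restriction is listed.
            "inputs": [item.text for item in slot.findall("input")],
        }
    return slots
```

Calling `load_slots(SPEC)` yields one entry per slot, which a generic dialog engine could then use without any application-specific code.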

From a Few Hours to 15 minutes? Created a Web interface for generating the XML document Form filling, pulldown menus Strong effort to further simplify the process, minimize complexity of form –many defaults –for less common choices, edit the XML doc. More importantly, no computer savvy needed

Web Application Generator Repository and tool for creating USI database applications Abundant online help to guide users through process Accessible to anyone with an Internet connection

Web Application Generator Two step process: –General specification –Slot-by-slot specification choose datatype from built-in list, or create own Fully featured system with save, copy, delete functionality Hides intricacies of XML document writing Advanced users have ability to further alter the final XML document

General Specification screen with help box displayed.

Web Application Generator Built-in generic voice; can record own voice DB backend –Postgres –Oracle –ODBC (including ASCII files) –Ultimately: web tables Platform: –originally: mixed Unix/Windows, telephone based –converted to: pure Windows, telephone or laptop

Transferring USI to PDG members We do house calls! –Carnegie Mellon will install USI developer environment for each interested member and will train member staff in the use of the developer environment –Provide a short tutorial on USI principles and interface design

Thank you! Pittsburgh Digital Greenhouse
