The Universal Speech Interface (USI) PDG Progress Report Thomas Harris, Stefanie Tomko, Arthur Toth, James Sanders, Alex Rudnicky, Roni Rosenfeld School.

Presentation transcript:

The Universal Speech Interface (USI) PDG Progress Report Thomas Harris, Stefanie Tomko, Arthur Toth, James Sanders, Alex Rudnicky, Roni Rosenfeld School of Computer Science Carnegie Mellon University 4 June 2003

Outline USI Project Summary USI Device Control USI User Studies Tech Transfer Initiative –USI Application Generator

Program Goals and Plan Overall program goal: –Design a universal (i.e. device-independent) interface for speech-based interaction with wearable and home devices Program plan & milestones: –Q1: analysis, interaction principles –Q2: build device-simulation environment –Q3: build first device prototype –Q4: initial user studies; development tools

Program Deliverables A novel universal design for speech-based interaction with wearable- and home- devices At least one demonstration system exemplifying the new interface A set of tools for rapid prototyping of compliant applications

The Universal Speech Interface (USI) In a Nutshell Unifying approach to human-machine speech communication Unified “look and feel” across all applications –analogous to the Xerox/Macintosh/Windows GUI look-and-feel Stylized, semi-natural interaction –analogous to the “Graffiti” alphabet for the Palm PDA

Existing Speech Paradigm 1: Command-and-control Systems Specialized language, optimized for a given application –each application has its own interface Intensive training of each user Daily use helps retain knowledge

Existing Speech Paradigm 2: Unconstrained Dialog Systems “Off-the-street” users, no training required System models existing human behavior But this comes at a cost: –each application requires a great deal of data, labor, human expertise –Speech Recognition technology is pushed to the limit –user does not easily grasp the application’s functional limits Out-Of-Vocabulary words (OOV) Out-Of-Domain concepts, requests

Is a Third Paradigm Needed? In practice, people are likely to use: –a handful of apps daily: scheduler, contact manager, email, ... –many apps occasionally: weather, restaurants, ... To exploit this, we need: –flexible, powerful interface for familiar applications. –immediate engagement with occasional or new applications.

Our Approach Identify application-independent universals: –user-side –machine-side Find suitable, general solutions –Human and machine meeting halfway Design a stylized, universal “look and feel” Teach it in 5 minutes

Universal Semantic primitives Help primitives –what can the machine do? how do I do X? what can I say? Speech channel primitives –detect & correct ASR errors; finished talking? Interaction primitives –turn taking; question answering; session management; undo Application primitives –environment variables: query, set –objects (e.g. lists): describe, navigate, create, modify, delete

USI Systems Developed Information Access –MovieLine –FlightLine –ApartmentLine Device Control –Stereo system –X-10 control (e.g., lights) –Alarm Clock applet –Digital Video Camera –Windows Media Player

USI Demonstration MovieLine –Experimental subject

USI Device Control

Device Interaction Analysis Analysis was done on multiple devices –alarm clock / radio –VCR –cell phone –MP3 player –memo pad / email / vmail –copier/fax

USI/Device Design Issues Confirmation strategy Error handling strategy Exploration Navigation Disambiguation / context mgmt Orientation Querying state variables

USI/Device Design Issues Confirmation strategy: restate-&-execute Error handling strategy: ignore Exploration: “OPTIONS” Navigation: use concept of ‘focus’ Disambiguation / context mgmt: implicit Orientation: “STATUS” Querying state variables: “WHAT IS THE...?”
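
The design choices on this slide can be sketched as a small text-level command handler. This is only an illustration under stated assumptions: the `AlarmClock` class, its state variables, and the reply wording are hypothetical and not taken from the USI system. "Restate-&-execute" is read here as "repeat what was understood and act immediately, with no confirmation question", and "ignore" as "return nothing for input that is not understood".

```python
class AlarmClock:
    """Toy device with named state variables (all names are hypothetical)."""

    def __init__(self):
        self.state = {"alarm": "off", "alarm time": "seven a m", "radio": "off"}
        # Variables with a closed set of legal values; others accept anything.
        self.allowed = {"alarm": {"on", "off"}, "radio": {"on", "off"}}

    def handle(self, utterance: str) -> str:
        text = utterance.strip().lower().rstrip("?").strip()

        # Exploration: "OPTIONS" lists what can be talked about.
        if text == "options":
            return "You can say: " + ", ".join(sorted(self.state))

        # Orientation: "STATUS" reads back every state variable.
        if text == "status":
            return ", ".join(f"{k} is {v}" for k, v in sorted(self.state.items()))

        # Querying state variables: "WHAT IS THE ...?"
        prefix = "what is the "
        if text.startswith(prefix):
            name = text[len(prefix):]
            if name in self.state:
                return f"{name} is {self.state[name]}"
            return ""  # error handling strategy: ignore

        # Setting a variable: "<variable> <value>". Longest names are tried
        # first so that "alarm time ..." is not mistaken for "alarm ...".
        for name in sorted(self.state, key=len, reverse=True):
            if text.startswith(name + " "):
                value = text[len(name) + 1:]
                if name in self.allowed and value not in self.allowed[name]:
                    return ""  # illegal value: ignore
                self.state[name] = value      # execute ...
                return f"{name}, {value}"     # ... and restate

        return ""  # anything else: ignore
```

For example, `AlarmClock().handle("alarm on")` changes the state and replies `"alarm, on"`, while an out-of-domain request such as `"make me a sandwich"` yields an empty reply.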

Hooking up with the PUC project Fits within the PUC project’s vision of automatically generated interfaces with different modalities and form factors But, can also be used as a standalone speech interface Compatibility with visual design is desirable, but not always natural: –nameless states (speech interface must have name for everything!) –speech interface can have shortcuts (“MODE: CD” vs. “CD”)

Meshing with the PUC project Device capabilities specified by XML doc States vs. Action dichotomy of the visual interface does not always conform to speech interface intuition. For now, creating our own interface specification document Ultimately, will augment XML DTD, so both interfaces can co-exist

USI Device control (a.k.a. James the Butler) Hardware hacking courtesy of the PUC project

USI Demonstration Device Control –Alarm Clock Example

User Studies

User study Compared Speech Graffiti (SG) & natural language MovieLines How does Speech Graffiti compare to a natural language interface? –Subjective user satisfaction –Task completion rates –Word error rates How well do users "get" Speech Graffiti? –How often do they speak within the grammar? –In what ways do they deviate from the grammar?

Subjective user satisfaction 17 of 23 preferred Speech Graffiti (SG) SG user satisfaction ratings higher than NL in all categories SG ratings positive except in annoyance & habitability

Computer experience & training Computer Science / Engineering backgrounds and / or programming experience –Higher user satisfaction ratings –Better task completion rates Training in-domain vs. out-of-domain –No differences in user satisfaction or task completion rates

Task completion Overall –67.9% SG tasks –67.4% NL tasks Individual means –5.43 of 8 SG tasks –5.30 of 8 NL tasks

Time-to-completion Completed tasks –67.9 seconds SG –73.4 seconds NL Incomplete tasks: [Chart: time, in seconds, for SG-ML and NL-ML; “best case” vs. “real world” (inc. incompletes)]

Turns-to-completion Completed tasks –8.2 turns SG –3.9 turns NL Incomplete tasks: [Chart: # of turns for SG-ML and NL-ML; “best case” vs. “real world” (inc. incompletes); chart label: “59 incompletes”]

Word error rates Very high for both systems –On "cleaned" set (on-task, non-noisy utts) Concept error is lower for USI –SG: –29.2% from WER –NL: +0.8% from WER Low error rate is key to acceptance –6 who preferred NL-ML had highest SG WER
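
For readers unfamiliar with the metric, word error rate is conventionally defined as (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below computes it with a word-level edit distance. It is a generic illustration: the slides do not say which scoring tool was used in the study, and the function name `word_error_rate` is an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

With a four-word reference, one substituted word and one dropped word give `word_error_rate("a b c d", "a x c") == 0.5`.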

WER & user satisfaction Good correlation for SG [Chart: user satisfaction rating against % word-error rate, for SG-ML and NL-ML]

How often do users speak within the Speech Graffiti grammar? Actually, pretty often! … and grammaticality leads to user satisfaction

How do users deviate from the grammar?

Future Interface Design Work Redesign Help facility –SG works best for those who "get it" –Current system provides no assistance to "clueless user" Error analysis –Compare failure cases in SG and NL interfaces –Compare user recovery attempts in SG and NL Address issues of generalizability –Promoting transparency of slot set and response sets –Accessing information sets rather than single items Adjust grammar components

Future Architecture Work Integrate current USI environments –Information Access –Device Control Improve interface between PUC and USI components Identify USI-specific techniques to achieve lower WER Improved documentation and distribution packaging

Tech Transfer Initiative

Tools for creating new USI apps –3 days to create a new application –prior exposure to speech technology highly beneficial –decided to further reduce the barrier –→ create an application generator

From 3 Days to a Few Hours A USI Application Generator New USI applications w/out programming! XML document fully specifies the application –slot names –accepted inputs –data types –slot properties –...
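
To make the idea of an XML document that "fully specifies the application" concrete, here is a minimal sketch of what such a specification and its loader might look like. The element and attribute names (`application`, `slot`, `input`, `type`, `required`), the sample slot values, and the `load_slots` function are all hypothetical; the actual USI schema is not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical specification: slot names, accepted inputs, data types,
# and slot properties, mirroring the items listed on the slide.
SPEC = """
<application name="MovieLine">
  <slot name="theater" type="string" required="false">
    <input>downtown cinema</input>
    <input>mall cinema</input>
  </slot>
  <slot name="show time" type="time" required="false"/>
</application>
"""


def load_slots(xml_text: str):
    """Return (application name, {slot name: slot description})."""
    root = ET.fromstring(xml_text.strip())
    slots = {}
    for slot in root.findall("slot"):
        slots[slot.get("name")] = {
            "type": slot.get("type"),
            # Accepted inputs; empty list means "anything of this type".
            "inputs": [e.text.strip() for e in slot.findall("input")],
            # Every other attribute is treated as a slot property.
            "properties": {
                k: v for k, v in slot.attrib.items() if k not in ("name", "type")
            },
        }
    return root.get("name"), slots
```

Calling `load_slots(SPEC)` returns the application name together with a dictionary keyed by slot name, which a generator could then use to build the recognition grammar and database queries.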

From a Few Hours to 15 minutes? Created a Web interface for generating the XML document Form filling, pulldown menus Strong effort to further simplify the process, minimize complexity of form –many defaults –for less common choices, edit the XML doc. More importantly, no computer savvy needed

Web Application Generator Repository and tool for creating USI database applications Abundant online help to guide users through process Accessible to anyone with an Internet connection

Web Application Generator Two step process: –General specification –Slot-by-slot specification choose datatype from built-in list, or create own Fully featured system with save, copy, delete functionality Hides intricacies of XML document writing Advanced users have ability to further alter the final XML document

General Specification screen with help box displayed.

Web Application Generator Built-in generic voice; can record own voice DB backend –Postgres –Oracle –ODBC (including ASCII files) –Ultimately: web tables Platform: –originally: mixed Unix/Windows, telephone based –converted to: pure Windows, telephone or laptop

Transferring USI to PDG members We do house calls! –Carnegie Mellon will install USI developer environment for each interested member and will train member staff in the use of the developer environment –Provide a short tutorial on USI principles and interface design

Thank you! Pittsburgh Digital Greenhouse