Designing Systems for Next-Generation I/O Devices Mitchell Tsai, Peter Reiher, Jerry Popek UCLA May 20, 1999
Problem Next-Generation I/O performs poorly with existing applications and operating systems. –Examples of next-generation sensors/actuators Speech, vision, handwriting, physical location… –AI meets real General-Purpose Systems. Not in the sandbox anymore! –What should OSs provide for these technologies?
Current Systems Keyboard & Mouse → GUI Interface → OS & Applications: one input at a time, from one source; requires 100% accuracy in critical situations. Sounds → Speech Recognition Engine (Grammar → Best Phrase) → Speech Enabler (Command) → OS & Applications: 80-99% accuracy. Example: “Make the text blue” becomes TextRange.Font.Color = ppAccent1.
Noise & Errors Existing metrics (accuracy & speed) are not good enough. Dictation: 99% accuracy at 150 wpm; 10 sec/error = 20% of time spent correcting errors!

Type             Time (sec)   Speed (wpm)   % Total Time
T_speech             38           160            16%
T_delay              33            85            14%
T_corrections       131            30            57%
T_proofreading       29            26            13%
T_total             230            26           100%

T_total = T_speech + T_delay + T_corrections + T_proofreading
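A minimal sketch of the slide's total-task-time metric, using the numbers from the measurement table above (function names are mine, for illustration):

```python
def total_task_time(t_speech, t_delay, t_corrections, t_proofreading):
    """T_total = T_speech + T_delay + T_corrections + T_proofreading."""
    return t_speech + t_delay + t_corrections + t_proofreading

def effective_wpm(words, t_total_sec):
    """Words per minute over the whole task, not just the speaking phase."""
    return words * 60.0 / t_total_sec

# Component times from the table; they sum to 231 s (the slide rounds to 230).
t_total = total_task_time(38, 33, 131, 29)

# ~100 words dictated: the raw 160 wpm speaking rate collapses to ~26 wpm
# once delays, corrections, and proof-reading are counted.
overall = effective_wpm(100, t_total)
```

This is why the slide argues that accuracy and raw speed alone are not adequate metrics: correction time dominates the total.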
Command & Control Errors 1) Most programs have No Undo capability 2) One Keystroke Loss –Cancel in MS Money –Paste instead of Copy on PalmPilot 3) Undo requires advanced knowledge –MS Word accidental shift to outline mode 4) Undo is inconsistent between programs –One text selection (Outlook Mail) or two (Netscape Mail)
From Dictation to Commands Commands are worse than dictation –Con: Errors can be irreversible and/or dangerous –Con: Dictation delays processing to increase accuracy –Pro: Smaller grammars produce higher accuracy Error handling “ad hoc” & insufficient –Handled twice by sensor processor & application –Programmers design custom interfaces (or programs!) –Users confused by inconsistencies How to leverage new inputs? –Context-sensitive and ambiguous commands
Outline Problems of Next-Generation Sensors BabySteps: Some Dialogue Management Services Related Work Design Issues for Post-GUI Environments
Next-Generation Sensors Direct –speech, handwriting, vision (eye gaze, pointing, gesture) Indirect –vision (head and eye focus), geographic location, identification badges, emotions (affective computing). Traditional –network connectivity, computer resources.
4 Main Problems of Next-Generation Sensors 1) Noise –“Make this b… red”, Sporadic incorrect GPS readings 2) Errors –Accidental user errors, Sensor processor mistakes 3) Ambiguity –“Make this box red”: Which box? 4) Fragmentation –Simultaneous inputs from speech, pointing, & vision
Sequences of Errors Series of commands –“cd thisdir; mv foo ..; rm *” Linear Undo Stack problems –Accidentally undo a few operations (X, Y, Z) –Type “A” –Lose all operations on the stack (X, Y, Z) Quit without Save, Accidental Command Mode –Oops! Confirmed a “Yes/No/Cancel” box.
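The linear-undo-stack problem above can be sketched in a few lines; this is a generic model of how most GUI applications behave, not code from any particular program:

```python
class LinearUndoStack:
    """Minimal linear undo model: redo history is a single branch, so
    performing a new action after an undo discards the undone tail."""
    def __init__(self):
        self.done = []      # operations currently applied
        self.undone = []    # operations available for redo
    def do(self, op):
        self.done.append(op)
        self.undone.clear()     # the lossy step: any undone ops are gone
    def undo(self):
        if self.done:
            self.undone.append(self.done.pop())

h = LinearUndoStack()
for op in ["X", "Y", "Z"]:
    h.do(op)
h.undo(); h.undo(); h.undo()    # accidentally undo X, Y, Z
h.do("A")                       # type "A"
# X, Y, Z can no longer be redone: only "A" remains
```

A tree-structured history (as raised later under command and task history) would keep the X, Y, Z branch reachable instead of clearing it.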
BabySteps (diagram): Sounds → Speech Interpreter → Command Processing → OS & Applications. Command Processing Modules distinguish dangerous commands from safe commands; Context Management announces “We are in context 7 now.” and supplies the grammar for context 7 and the Command Properties for context 7.
Command Management 1) Command Services must be provided by OS –Recording, editing, filtering,... 2) Command Properties must be communicated to OS –Ambiguous, context-sensitive events (from sensors) –Safety, reversibility, usage patterns, cost (from applications) 3) Command Processing Modules –Safety Filter, Usage Tracker, Cost Evaluator
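One way a Safety Filter module might consume application-reported Command Properties; the field names and threshold are illustrative assumptions, not from BabySteps:

```python
from dataclasses import dataclass

@dataclass
class CommandProperties:
    """Properties an application reports to the OS command services.
    Field names are illustrative, not from the original system."""
    reversible: bool
    dangerous: bool
    cost: float          # normalized to 0..1

def safety_filter(props):
    """Safety Filter sketch: hold dangerous or costly irreversible
    commands for user confirmation; pass safe ones straight through."""
    if props.dangerous or (not props.reversible and props.cost > 0.5):
        return "confirm"
    return "execute"
```

For example, `safety_filter(CommandProperties(reversible=False, dangerous=False, cost=0.9))` returns `"confirm"`, while a cheap reversible edit passes through as `"execute"`.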
How Speech Recognition Works Word lattice of homophones: “Aisle of view”, “I'll of view”, “I loathe you”, “I love you”, “I love Hugh”. The Acoustic Model Best Match, the Language Model Best Match, and the Two-Model Best Match can all differ, and the best phrase can change in a different context. 4 Models in Current Systems: Acoustic, Language, Vocabulary, Topic.
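The two-model ranking can be sketched with the slide's homophone example; the scores below are invented for illustration, chosen so the acoustic model alone and the combined models pick different phrases:

```python
# Hypothesis scores (made up): acoustically, "Aisle of view" sounds best,
# but the language model knows "I love you" is a far likelier phrase.
acoustic = {"Aisle of view": 0.9, "I'll of view": 0.8,
            "I loathe you": 0.6, "I love you": 0.5, "I love Hugh": 0.5}
language = {"Aisle of view": 0.1, "I'll of view": 0.05,
            "I loathe you": 0.3, "I love you": 0.9, "I love Hugh": 0.2}

def two_model_best(acoustic, language):
    """Best phrase under the product of acoustic and language scores."""
    return max(acoustic, key=lambda p: acoustic[p] * language[p])
```

Here the acoustic-only best match is "Aisle of view", but the two-model best match is "I love you"; a topic or context model could shift the winner again (e.g. toward "I love Hugh" when Hugh is in the conversation).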
Methods for Better Accuracy Speech engines can produce scored output: Score(Phrase | Sound) = -100 to 100. Combine sensor information with application or OS information using likelihoods (L):

L(Command | Sound, Context) = L(Command | Context) * L(Command | Phrase, Context) * L(Phrase | Sound)

where L(A) = F(A) / (Σ_A' F(A') - F(A)) and F(A) can be P(A) or some other scoring function.
Explicit Contexts From User Behavior Analysis Example: –Context A = a priori probabilities for “editing” commands –Context B = a priori probabilities for “viewing” commands Other Types of Explicit Contexts –Variations on Least Recently Used (LRU) –Simple Markov Models –Hidden Markov Models (HMMs) –Bayesian Networks
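The simplest of these variants, explicit contexts as per-context a priori command probabilities, might look like the following; context names and numbers are illustrative:

```python
# Each explicit context carries its own a priori command distribution,
# derived from user behavior analysis (values invented for illustration).
contexts = {
    "editing": {"cut": 0.3, "paste": 0.3, "undo": 0.3, "scroll": 0.1},
    "viewing": {"cut": 0.05, "paste": 0.05, "undo": 0.1, "scroll": 0.8},
}

def prior(context, command):
    """A priori probability of a command under an explicit context."""
    return contexts[context].get(command, 0.0)

def most_likely(context):
    """Best a priori guess before any sensor evidence arrives."""
    return max(contexts[context], key=contexts[context].get)
```

LRU variants would reweight these priors by recency; Markov models and HMMs would condition them on the preceding commands rather than on a single static context label.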
Probabilistic Context-Sensitive Events (diagram): Fuzzy Mouse Movement → Low-level Events (e.g. “90% Region X, 10% Region Y”) → Mid-level Events → High-level Events (e.g. Select “box 3”, “line 4”, and “box 10”).
Probabilistic Objects in Events Example event for the utterance “Thicken”: Type = Speech, P_Clarification = 0.6, N_Commands = 3; Command = “Thicken line 11” (L = 0.61), Command = “Thicken line 13” (L = 0.24), Command = “Quit” (L = 0.15).
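Such an event can carry ranked hypotheses instead of a single value; this is a sketch mirroring the slide's example, with field names of my own choosing:

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    command: str
    likelihood: float

@dataclass
class ProbabilisticEvent:
    """An input event carrying ranked command hypotheses rather than
    one resolved command. Field names are illustrative."""
    source: str
    p_clarification: float
    candidates: list

# The slide's "Thicken" example as data.
event = ProbabilisticEvent(
    source="speech",
    p_clarification=0.6,
    candidates=[Candidate("Thicken line 11", 0.61),
                Candidate("Thicken line 13", 0.24),
                Candidate("Quit", 0.15)],
)
best = max(event.candidates, key=lambda c: c.likelihood)
```

An application can then act on `best` immediately or, given the high `p_clarification`, ask the user to disambiguate first.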
User Clarification Consider P_Clarification, the probability that we should clarify the command with the user:

P_Clarification = [1 - L(Command_ML, Context)] * L_Reversible(Command_ML, Context) * L_Cost(Command_ML, Context)

Command_ML is the Most Likely command. L_Reversible ranges from 0 to 1 (1 means fully reversible); L_Cost ranges from 0 to 1 (a normalized version of cost). Reversibility and cost can reduce the seriousness of errors, but they may increase the total time required to finish a task! What is the relative utility of different types of clarification?
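The clarification expression translates directly into code; the thresholding policy at the end is my own assumption, not part of the slide:

```python
def p_clarification(l_command_ml, l_reversible, l_cost):
    """P_Clarification = (1 - L(Command_ML, Context))
                         * L_Reversible(Command_ML, Context)
                         * L_Cost(Command_ML, Context),
    as written on the slide, with every factor in [0, 1]."""
    return (1.0 - l_command_ml) * l_reversible * l_cost

def should_clarify(p, threshold=0.3):
    """Hypothetical policy: interrupt the user once P is high enough."""
    return p > threshold
```

For instance, a best guess with likelihood 0.61 on a fully reversible (L_Reversible = 1.0), mid-cost (L_Cost = 0.5) command yields P_Clarification = 0.39 * 1.0 * 0.5 = 0.195, below the example threshold, so the system would act without asking.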
BabySteps: Additional Factors Performance Evaluation –Error Hierarchy –New Commands –“Ambiguity is a Strength, not a Problem” “Transparency is not the best policy.” –How to get Feedback from the user? Passive/Active –Different Types of “Cancel” “Oops”, “Wrong”, “Backtrack”
Application Performance: Error Types (hierarchy): Desired Effect; Inaction; Confirmation; Minor –Undoable; Medium –Fixable (1 command) –Fixable (few commands) –Unrecoverable (many commands); Major –Exit without Save, Application Crash/Freeze.
Extended Benefits for Applications Combining speech & mouse commands –Speech: “Make these arrows red.” –Mouse: Move around arrows and other objects. (Diagram: Sound → Speech Interpreters → Command Processing → OS & Apps, with Mouse: Fuzzy Pointing also feeding the Command Processing Modules.) Ambiguity & Context = Convenience.
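One plausible fusion rule for this speech-plus-fuzzy-pointing scenario: weight each object's speech-derived likelihood by how closely the mouse path approached it. All names and numbers here are illustrative assumptions:

```python
# Speech ("Make these arrows red") is ambiguous about which objects it
# covers; fuzzy pointing supplies per-object proximity weights.
speech_l = {"arrow1": 0.5, "arrow2": 0.5, "box3": 0.4}
mouse_proximity = {"arrow1": 0.9, "arrow2": 0.8, "box3": 0.1}

def fused_targets(speech_l, mouse_proximity, threshold=0.2):
    """Objects whose combined (product) likelihood clears a threshold."""
    return sorted(o for o in speech_l
                  if speech_l[o] * mouse_proximity.get(o, 0.0) > threshold)
```

With these numbers both arrows clear the threshold (0.45 and 0.40) while the box does not (0.04), so the combined modalities resolve an ambiguity neither could resolve alone.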
Ambiguity can be a Strength Ambiguity is usually considered a problem. –If the user makes a precise command, and sensors provide perfect interpretation, then the application should know exactly what to do. Exact precision by the user may be impossible or extremely time-consuming. Consider PowerPoint: –Moving the cursor to change modes: Select Object, Move Object, Resize Object, Copy Object –Selecting objects (and groups of objects): very close and/or overlapping (esp. with invisible boundaries), from layers of different groups –Making object A identical to object B in size, shape, color, etc.
BabySteps Summary New sensors & user inputs present a family of problems –Noise, Errors, Ambiguity, Fragmentation BabySteps: Some Dialogue Management Services 1) Command Management - Command Services & Command Properties 2) Context Management - Analyze Behavior Patterns, Explicit Contexts 3) Communicate Ambiguous Information - Probabilistic, Richer Performance Evaluation –New Metrics: Total Task Time, Error Hierarchy –New Commands: Will they pass usability threshold? –Transparency vs. Communication (User Feedback & Control) –Ambiguity is a Strength
BabySteps approach to 4 Main Problems 1) Noise –Facilitate closer interaction between sensor processors & applications –Reduce impact of errors through command & context management 2) Errors –Use user behavior analysis to detect, fix, and/or override errors. –Ask user for help based on context and command properties 3) Ambiguity –Limited context-sensitive speech and mouse 4) Fragmentation –Probabilistic, temporal multimodal grammars not handled yet
Related Work Context-Handling Infrastructures –Context Toolkit: Georgia Tech Provides context widgets for reusable solutions to context handling [Salber, Dey, Abowd 1998, 1999]. Multimodal Architectures (Human-Computer Interfaces) –QuickSet: Oregon Graduate Institute First robust approach to reusable scalable architecture which integrates gesture and voice. [Cohen, Oviatt, et al. 1992, 1997, 1999]. Context Advantages for Operating Systems –File System Actions: UC Santa Cruz Uses Prediction by Partial Match (PPM) to track sequences of File System Events for a predictive cache [Kroeger 1996, 1999].
Related Work CHI-99 –“Nomadic Radio: Scaleable and Contextual Notification for Wearable Audio Messaging”: MIT Priority, Usage Level, & Conversations [Sawhney, Schmandt 1999]. –LookOut, “Principles of Mixed-Initiative User Interfaces”: MSFT Utility of Action vs. Non-action vs. Dialog [Horvitz 1999]. –“Patterns of Entry and Correction in Large Vocabulary Continuous Speech Recognition Systems”: IBM/Univ. of Michigan Compares Dragon, IBM, & L&H; speech 14 cwpm (vs. keyboard 32 cwpm) [Karat, Halverson, Horn, Karat 1999]. –“Model-based and Empirical Evaluation of Multimodal Interactive Error Correction”: CMU/Universität Karlsruhe Models multimodal error correction attempts using T_Attempt = T_Overhead + R * T_Input [Suhm, Myers, Waibel 1999].
Related Work Multimodal Grammars –Oregon Graduate Institute [Cohen, Oviatt, et al. 1992, 1997]. –CMU [Vo & Waibel 1995, 1997]. Command Management –Universal Undo [Microsoft] –Task-Based Windows UI [Microsoft] Context Management (CONTEXT-97, CONTEXT-99) –AAAI-99 Context Workshop “Operating Systems Services for Managing Context” [Tsai 1999] –AAAI-99 Mixed-Initiative Intelligence “Baby Steps Towards Dialogue Management” [Tsai 1999] Probabilistic & Labeled Information in OS –Eve [Microsoft]
Post-GUI Systems (diagram): Next-Generation Sensors/Actuators at the intersection of Operating Systems, Artificial Intelligence, and User Interfaces; serving Real People & Computer People, Special People & the General Public.
Design Issues for Post-GUI Environments Performance may be driven by mobility & ubiquity. –Hard to beat desktop performance, except for specialized tasks –But why not design good macros? Or use 2+ pointers/mice? –Even with no video screen or keyboard, use buttons (e.g. PalmPilot) –Speech and video are good for rapid acquisition of data What are new tasks for smart mobile environments? –Summarize ongoing tasks (e.g. “Car, what was I doing?”) –Real dialogue is mixed-initiative (all commands are backgrounded!) –Control of multiple applications (consider JAWS. Is this needed?) –Context-sensitive communication (“Where's the nearest pizza?”)
Possible Changes Explicit Contexts for Communication –For users, or for system services –What format for communicating events & contexts? –What command properties should applications support? Database-like Rollback/Transactions for Application Commands –In addition to Elephant File System (HotOS 1999) –Making the entire computer more bulletproof, temporal history –Support dialogue management rather than linear commands Command and Task History –How to handle? Databases? Trees? Human conversation? –Real Dialogue Management
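The database-like rollback idea for application commands can be sketched as an inverse-operation log; this is a generic transaction pattern under my own naming, not a design from the talk, and a real system would persist the log:

```python
class CommandTransaction:
    """Sketch of database-style rollback for application commands:
    each command records an inverse operation, and abort replays the
    inverses in reverse order, like a write-ahead undo log."""
    def __init__(self):
        self.log = []
    def execute(self, do, undo):
        do()                    # apply the command
        self.log.append(undo)   # remember how to reverse it
    def rollback(self):
        while self.log:
            self.log.pop()()    # undo in reverse order

doc = ["hello"]
tx = CommandTransaction()
tx.execute(lambda: doc.append("world"), lambda: doc.pop())
tx.execute(lambda: doc.append("!"), lambda: doc.pop())
tx.rollback()
# doc is restored to its pre-transaction state
```

Unlike a linear undo stack, a transaction boundary lets a whole sequence of spoken commands be reverted as one unit when the dialogue goes wrong.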
Possible Changes II “Faster is not better.” –“Courteous Computing” (Horvitz, Microsoft) –Pre-executing tasks works best in MS Outlook with a 1 sec delay –Alternative to a “Yes/No” dialog: announce the action & wait 1 sec User I/O must be buffered, filtered, & managed –Normal dialogue is a series of background commands –Speech-only output may be a queue of application output requests –Variable environment conditions: low/high bandwidth connections & video/PalmPilot displays –What if the user must switch modalities midstream? Separate SAPI & GUI may not work; need a Multimodal API
Possible Changes III Applications not designed for multiple commands. –Currently submenus & dialog box sequences help narrow context. –Procedures → GUI event loops → Post-GUI dialogue –Windows event systems aren't designed for this either. I/O not designed for rapid interactive haptic/visual systems. –1/3 sec (300 ms) responses are good for conscious responses –But not for unconscious actions: 1 ms visual tracking, 70 ms haptic responses, 150 ms visual responses Cost/Delay of sensor processors is extremely high –How to give the e-mail system priority responsiveness? Unified resource management, Soft Real-Time Systems –Governed by new Command Properties and Context Knowledge
Possible Changes IV Use Probabilistic & Multi-faceted Info throughout OS –Task-based file identification –Multiple configuration setups (NT dialup) Applications could be designed for ambiguous and context-sensitive commands Context-based Adaptive Computing, Active Networks Will a more context-aware system provide resiliency? –Rather than super-slow AI learning?
Possible Changes V How do we support the transition to real English dialogue? “Computerese” may co-exist with –natural human spoken & gestural languages –command-line & GUI computer interfaces Can other protocols learn from human languages? –Use ambiguity, synonyms –Different types of ACKs, NACKs
Future Directions If the system & algorithm people can provide X, can the UI people design good ways to use this information? If the UI or device has characteristic Y, what must the system and algorithm people provide? New sensors & user inputs present a family of problems –Noise, Errors, Ambiguity, Fragmentation User I/O may need a whole family of User Dialogue services, similar to networking, file management, or process control.