Stanford hci group / cs376 · Jeffrey Heer · 19 May 2009 · Speech & Multimodal Interfaces


1 stanford hci group / cs376 http://cs376.stanford.edu Jeffrey Heer · 19 May 2009 Speech & Multimodal Interfaces

2 Project Progress Meetings
Monday May 25, 30-minute meetings
Open times: 9am – 6pm (except 12:30-2pm)
To-do ASAP:
- Email cs376 with any times you cannot make, plus any time preferences
Submit online by 7am Friday:
- Any materials you want to discuss

3 Final Project Presentations
Tuesday June 9, 3:30-6:30pm, 104 Gates
8-minute presentations
- 6 minutes to present your research
- 2 minutes for questions
- More details on course website

4 Some hci definitions
- Multimodal generally refers to an interface that can accept input from two or more combined modes
- Multimedia generally refers to an interface that produces output in two or more modes
- The vast majority of multimodal systems have been speech + pointing (pen or mouse) input, with graphical (and sometimes voice) output

5 Canonical App: Maps
Why are maps so well-suited?
A visual artifact for computation (Hutchins)

6 What is an interface?
- Is it an interface if there’s no method for a user to tell if they’ve done something?
- What might an example be?
- Is it an interface if there’s no method for explicit user input?
- Example: health monitoring apps

7 Multimodal vs. Sensor Fusion
multimodal = multiple human channels
sensor fusion = multiple sensor channels
- Example: Tracking people (1 human channel)
- Might use: RFID + vision + keyboard activity + …
- Disagree with the Oviatt paper?
- Speech + lips: multimodality or sensor fusion?

8 What constitutes a modality?
- To some extent, it’s a matter of semantics
- Is pen a different modality than a mouse?
- Is a captured modality the same as an input modality?
- How does an audio notebook fit into this?

9 Input modalities
mouse
pen: recognized or unrecognized
speech
non-speech audio
tangible object manipulation
gaze, posture, body-tracking
- Each of these experiences has different implementing technologies

10 Output modalities
Visual displays
- Raster graphics, oscilloscope, paper printer, …
Haptics, e.g. force feedback
Audio
Smell
Taste

11 Why multimodal?
- Hands busy / eyes busy
- Mutual disambiguation
- Faster / higher bandwidth communication
- “More natural”
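
Mutual disambiguation can be illustrated with a small sketch. It is not from the lecture: it assumes each recognizer returns an n-best list of (hypothesis, probability) pairs, that the recognizers err independently, and that a hypothetical `compatible` predicate says whether a speech hypothesis and a gesture hypothesis make sense together. The map commands and probabilities are invented.

```python
from itertools import product

def mutual_disambiguation(speech_nbest, gesture_nbest, compatible):
    """Pick the best jointly consistent (speech, gesture) pair.

    speech_nbest / gesture_nbest: lists of (hypothesis, probability).
    compatible: hypothetical predicate; True if the pair makes sense together.
    """
    best_pair, best_score = None, 0.0
    for (s, ps), (g, pg) in product(speech_nbest, gesture_nbest):
        if not compatible(s, g):
            continue  # inconsistent combinations are pruned outright
        score = ps * pg  # assumes the two recognizers err independently
        if score > best_score:
            best_pair, best_score = (s, g), score
    return best_pair, best_score

# Invented n-best lists for a map application.
speech = [("zoom out", 0.5), ("zoom in here", 0.4)]
gesture = [("point", 0.7), ("stray mark", 0.3)]

def compatible(s, g):
    # Toy rule: a deictic phrase ("here") needs a pointing gesture, and vice versa.
    return ("here" in s) == (g == "point")

result = mutual_disambiguation(speech, gesture, compatible)
```

In this toy run the speech recognizer's top guess ("zoom out") loses to its second guess, because only "zoom in here" is consistent with the likely pointing gesture. Each mode corrects the other.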

12 Example Systems
- Dual-Purpose Speech [Georgia Tech]
- Active Capture [UC Berkeley]

13 System Direction of Human Action
Applications: “Directive” systems
Now SCREAM!!!
MEDIA, TRAINING, SAFETY
27 April 2004

14 Multimodal Software Architectures
- OAA: Open Agent Architecture
- AAA: Adaptive Agent Architecture
- OOPS: Organized Option Pruning System

15 How to handle error & ambiguity?
Avoidance
- Adopt strategies that reduce probability of error or simplify recognition tasks
Repetition
- Elicit new, less ambiguous input
- How to vary prompt to improve recognition?
Choice
- Let user choose from a ranked list of alternatives
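
The "Choice" strategy could be sketched as follows. This is an illustration under stated assumptions, not a system from the lecture: the recognizer is assumed to return (text, confidence) pairs, and the function names, the thresholds, and the `choose` callback are all hypothetical.

```python
def ranked_alternatives(hypotheses, k=3, min_confidence=0.05):
    """Return up to k (text, confidence) pairs, best first, dropping very unlikely ones."""
    plausible = [h for h in hypotheses if h[1] >= min_confidence]
    return sorted(plausible, key=lambda h: h[1], reverse=True)[:k]

def resolve(hypotheses, choose, accept_threshold=0.8):
    """Accept a confident top result outright; otherwise defer to the user.

    choose: hypothetical callback that shows the ranked list and returns the
    picked index, or None if nothing fits (the caller could then fall back
    to repetition).
    """
    ranked = ranked_alternatives(hypotheses)
    if not ranked:
        return None
    if ranked[0][1] >= accept_threshold:
        return ranked[0][0]
    picked = choose(ranked)
    return None if picked is None else ranked[picked][0]

# Invented recognizer output for an ambiguous utterance.
nbest = [("call Bob", 0.30), ("call Rob", 0.45), ("tall bob", 0.02), ("recall job", 0.10)]
menu = ranked_alternatives(nbest)
chosen = resolve(nbest, choose=lambda ranked: 1)  # simulated user picks the second entry
```

Only asking the user when the top hypothesis is below the acceptance threshold keeps the list from interrupting interactions the recognizer already handled well.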

16 Anticipation
- Anticipate common errors before they happen. Actively seek out and address problems before they disrupt interaction.

17 External Aids
- Use physical props or other external aids to guide actions and provide feedback.

18 Confirmation
- Explicitly query the user to ensure they are in the expected state.

19 Progressive Assistance
- Provide “successively more informative error messages which consider the probable context of misunderstanding” [Yankelovich95].
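
One way to picture progressive assistance is a prompt list that escalates with each consecutive failure. The sketch below is an assumption-laden illustration, not code from [Yankelovich95]: the prompt wording, the `ask` function, and the `recognize` callback are hypothetical.

```python
# Hypothetical prompts, ordered from terse to most informative.
PROMPTS = [
    "Which city?",
    "Sorry, I didn't catch that. Which city are you leaving from?",
    "Please say just the city name, for example 'Boston'.",
    "I'm still having trouble. Say a city such as 'Boston' or 'Denver', or say 'help'.",
]

def ask(recognize, max_attempts=4):
    """Prompt until recognize(prompt) returns a value or attempts run out.

    Returns (value, prompts_used); value is None if every attempt failed.
    """
    prompts_used = []
    for failures in range(max_attempts):
        # Each failure moves one step down the list; stay on the last prompt after that.
        prompt = PROMPTS[min(failures, len(PROMPTS) - 1)]
        prompts_used.append(prompt)
        value = recognize(prompt)
        if value is not None:
            return value, prompts_used
    return None, prompts_used

# Simulated recognizer: fails twice, then hears "Boston".
outcomes = iter([None, None, "Boston"])
value, used = ask(lambda prompt: next(outcomes))
```

Here the user hears the first three prompts in order, so the added detail arrives only after the shorter prompts have failed.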

20 Modality Shifts
- When a particular direction approach repeatedly fails, switch or augment the modalities of communication, e.g., use visual rather than auditory cues.
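
A modality shift policy might look like the sketch below, assuming the system counts consecutive failures and has an ordered list of fallback modalities. The function name, the `shift_after` parameter, and the modality labels are hypothetical. The sketch covers switching only, not augmenting.

```python
def next_modality(modalities, failures, shift_after=2):
    """Choose the modality for the next cue.

    Moves one step along `modalities` after every `shift_after` consecutive
    failures and stays on the last modality once the list is exhausted.
    """
    index = min(failures // shift_after, len(modalities) - 1)
    return modalities[index]

# Hypothetical ordering: start with audio, fall back to visual, then a physical prop.
order = ["audio", "visual", "physical prop"]
schedule = [next_modality(order, n) for n in range(6)]
```

With `shift_after=2`, the first two cues are auditory, the next two visual, and later cues use the prop.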

21 Level of Discourse
- Simplify vocabulary and language when people have difficulty understanding.

22 Backtracking
- When grounding is lost, backtrack to the last state of mutual understanding.
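
Backtracking can be sketched as a dialog that remembers which states were mutually confirmed. This is a minimal illustration with hypothetical names (`Dialog`, `advance`, `backtrack`) and invented state labels. It assumes grounding is signalled by an explicit confirmation.

```python
class Dialog:
    """Tracks the current dialog state plus the states both parties confirmed."""

    def __init__(self, initial_state):
        self.state = initial_state
        self._grounded = [initial_state]  # history of mutually understood states

    def advance(self, new_state, confirmed=False):
        """Move to new_state; record it as grounded only if the user confirmed it."""
        self.state = new_state
        if confirmed:
            self._grounded.append(new_state)

    def backtrack(self):
        """Return to the most recent state of mutual understanding."""
        self.state = self._grounded[-1]
        return self.state

dialog = Dialog("greeting")
dialog.advance("ask_destination", confirmed=True)
dialog.advance("ask_date")   # never confirmed
dialog.advance("ask_seat")   # never confirmed; grounding is lost here
restored = dialog.backtrack()
```

The two unconfirmed steps are discarded and the dialog resumes from "ask_destination", the last point both sides agreed on.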

23 Graceful Failure
- When all else fails, provide natural exits from the interaction.
JUMP!

24 Project Progress Meetings
Monday May 25, 30-minute meetings
Open times: 9am – 6pm (except 12:30-2pm)
To-do ASAP:
- Email cs376 with any times you cannot make, plus any time preferences
Submit online by 7am Friday:
- Any materials you want to discuss

