
Slide 1: Multi-Modal Dialogue in Personal Navigation Systems (Arthur Chan)

Slide 2: Introduction
- The term "multi-modal" is a general description of an application that can be operated in multiple input/output modes.
  - Example inputs: voice, pen, gesture, facial expression.
  - Example outputs: voice, graphical output.

Slide 3: Multi-modal Dialogue (MMD) in Personal Navigation Systems
- Motivation of this presentation:
  - Navigation systems give MMD an interesting scenario, and a case for why MMD is useful.
- Structure of this presentation: three system papers.
  - AT&T MATCH: speech and pen input, with pen gestures.
  - SpeechWorks walking directions system: speech and stylus input.
  - Univ. of Saarland REAL: speech and pen input; both GPS and a magnetic tracker were used.

Slide 4: Multi-modal Language Processing for Mobile Information Access

Slide 5: Overall Function
- A working city guide and navigation system:
  - Easy access to restaurant and subway information.
- Runs on a Fujitsu pen computer.
- Users are free to:
  - give speech commands
  - draw on the display with a stylus

Slide 6: Types of Inputs
- Speech input:
  - "show cheap italian restaurants in chelsea"
- Simultaneous speech and pen input:
  - Circle an area and say "show cheap italian restaurants in neighborhood" at the same time.
- Functionalities include:
  - Review
  - Subway routes

Slide 7: Input Overview
- Speech input:
  - Uses the AT&T Watson speech recognition engine.
- Pen input (electronic ink):
  - Pen gestures are allowed, and they can be complex pen inputs.
  - Special aggregation techniques are used for these gestures.
- The inputs are combined using lattice combination (a toy sketch follows below).
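The slide only names the technique, so here is a minimal Python sketch of the idea, assuming each modality delivers a flat list of scored hypotheses and that a speech hypothesis with an unfilled location slot must be completed by an area gesture. Every name and the slot syntax are hypothetical; MATCH's real integration operates on full recognition lattices rather than the n-best lists used here.

```python
from itertools import product

# Each hypothesis is (interpretation, cost), where cost is a negative
# log probability: lower is better.
speech_lattice = [
    ("show_restaurants(cuisine=italian, price=cheap, loc=AREA)", 1.2),
    ("show_restaurants(cuisine=italian, loc=chelsea)", 2.0),
]
gesture_lattice = [
    ("area(polygon_17)", 0.4),  # a region circled with the stylus
    ("point(node_5)", 1.5),     # a single deictic point
]

def compatible(speech_hyp: str, gesture_hyp: str) -> bool:
    # A speech hypothesis with an unresolved AREA slot can only be
    # completed by an area-type gesture; fully specified hypotheses
    # simply ignore the gesture.
    if "loc=AREA" in speech_hyp:
        return gesture_hyp.startswith("area(")
    return True

def combine(speech, gesture):
    # Cross the two lattices, keep compatible pairs, fill the AREA
    # slot from the gesture, and rank joint readings by total cost.
    best = {}
    for (s, s_cost), (g, g_cost) in product(speech, gesture):
        if compatible(s, g):
            interp = s.replace("AREA", g)
            best[interp] = min(s_cost + g_cost,
                               best.get(interp, float("inf")))
    return sorted(best.items(), key=lambda kv: kv[1])

for interp, cost in combine(speech_lattice, gesture_lattice):
    print(f"{cost:.1f}  {interp}")
```

Run as-is, the circled-area reading wins (cost 1.6) over the fully specified "chelsea" reading (cost 2.4), which is the point of combining the channels: the gesture disambiguates the underspecified but better-scoring speech hypothesis.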

Slide 8: Pen Gesture and Speech Input
- Example dialogue:
  - U: "How do I get to this place?"
  - S: "Where do you want to go from?"
  - U: "25th St & 3rd Avenue"

Slide 9: Summary
- Interesting aspects of the system:
  - It illustrates real-life scenarios where multi-modal inputs can be used.
  - Design issue: how should the different inputs be used together?
  - Algorithmic issue: how should the different inputs be combined?

Slide 10: Multi-modal Spoken Dialog with Wireless Devices

Slide 11: Overview
- Work by SpeechWorks:
  - Jointly conducted by speech recognition and user interface teams.
- Two distinct elements:
  - Speech recognition: in an embedded domain, which recognition paradigm should be used?
    - embedded speech recognition?
    - network speech recognition?
    - distributed speech recognition?
  - User interface: how to "situationalize" the application?

Slide 12: Overall Function
- A walking directions application:
  - Assumes the user is walking in an unfamiliar city.
  - Runs on a Compaq iPAQ 3765 PocketPC.
- Users can:
  - select a city and start/end addresses
  - display a map and control the display
  - display interactive directions as a list of steps
- Accepts speech input and stylus input, but not pen gestures.

Slide 13: Choice of Speech Recognition Paradigm
- Embedded speech recognition:
  - Only simple commands can be handled, due to the device's computational limits.
- Network speech recognition:
  - Requires bandwidth, and the network connection can sometimes be cut off.
- Distributed speech recognition (sketched below):
  - The client takes care of the front-end.
  - The server takes care of the decoding.
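To make the distributed split concrete, here is a minimal sketch under stated assumptions: a toy cepstral front end stands in for the client, and the server-side decoder is left as a comment. None of these names come from SpeechWorks; the point is only that the feature stream crossing the network is far smaller than the raw audio.

```python
import numpy as np

def extract_features(samples: np.ndarray, frame_len: int = 400,
                     hop: int = 160, n_coeffs: int = 13) -> np.ndarray:
    # Toy cepstral front end: frame the audio, take a log power
    # spectrum, and truncate its inverse FFT to a few coefficients.
    # Real DSR front ends (e.g. the ETSI standard) add pre-emphasis,
    # windowing, mel filterbanks, and channel compression.
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len, hop)]
    feats = []
    for frame in frames:
        log_spec = np.log(np.abs(np.fft.rfft(frame)) ** 2 + 1e-10)
        feats.append(np.fft.irfft(log_spec)[:n_coeffs])
    return np.array(feats)

# Client side: 1 s of 16 kHz audio (random stand-in for a microphone).
audio = np.random.randn(16000)
features = extract_features(audio)
payload = features.astype(np.float32).tobytes()
print(f"raw audio: {audio.nbytes} bytes -> features: {len(payload)} bytes")

# Server side (not implemented here) would reconstruct the features
# and run the heavyweight decoder on them:
#   feats = np.frombuffer(payload, np.float32).reshape(-1, 13)
#   hypotheses = decoder.decode(feats)
```

This split explains the trade-off on the slide: the client stays cheap enough for a PocketPC, while the transmitted payload is a few kilobytes per second rather than raw audio, easing the bandwidth problem of pure network recognition.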

Slide 14: User Interface Situationalization
- Potential scenarios:
  - Sitting at a desk.
  - Getting out of a cab, building, or subway and preparing to walk somewhere.
  - Walking somewhere with hands free.
  - Walking somewhere carrying things.
  - Driving somewhere in heavy traffic.
  - Driving somewhere in light traffic.
  - Being a passenger in a car.
  - Being in a highly noisy environment.

Slide 15: Their Conclusion
- The balance of audio and visual information can be reduced to four complementary modes:
  - Single-modal:
    1. Visual mode
    2. Audio mode
  - Multi-modal:
    3. Visual dominant
    4. Audio dominant

Slide 16: A Glance at the UI

Slide 17: Summary
- Interesting aspects:
  - A great discussion of how speech recognition can be used in an embedded domain.
  - A great discussion of how users would use the dialogue application.

Slide 18: Multi-modal Dialog in a Mobile Pedestrian Navigation System

Slide 19: Overview
- A pedestrian navigation system with two components (an assumed handover sketch follows below):
  - IRREAL: indoor navigation, using a magnetic tracker.
  - ARREAL: outdoor navigation, using GPS.
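The slides do not say how the two components hand over to each other, so the sketch below is purely an assumed illustration of one plausible rule: use GPS when it has a fix, and otherwise fall back to the magnetic tracker. All classes and the rule itself are hypothetical, not details from the REAL papers.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class GpsReceiver:
    fix: Optional[Tuple[float, float]] = None  # (lat, lon), or None indoors

@dataclass
class MagneticTracker:
    position: Tuple[float, float] = (0.0, 0.0)  # local indoor coordinates

def current_position(gps: GpsReceiver, tracker: MagneticTracker):
    # Prefer GPS (ARREAL, outdoor) when a fix exists; otherwise fall
    # back to the magnetic tracker (IRREAL, indoor).
    if gps.fix is not None:
        return "ARREAL/outdoor", gps.fix
    return "IRREAL/indoor", tracker.position

print(current_position(GpsReceiver(fix=(49.25, 7.04)), MagneticTracker()))
print(current_position(GpsReceiver(), MagneticTracker(position=(3.2, 8.1))))
```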

Slide 20: Speech Input/Output
- Speech input:
  - HTK and IBM ViaVoice Embedded; Logox was being evaluated.
- Speech output:
  - Festival

Slide 21: Visual Output
- Both 2D and 3D spatialization are supported.

Slide 22: Interesting Aspects
- The system is tailored for elderly users:
  - Speaker clustering to improve the recognition rate for elderly speakers.
  - Model selection: choose between two acoustic model sets based on likelihood (a sketch follows below).
    - Elderly models
    - Normal adult models
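A minimal sketch of likelihood-based model selection, using scikit-learn Gaussian mixtures as stand-ins for the two acoustic model sets. The model names, feature dimensions, and data here are all illustrative assumptions, not details from the paper; the technique shown is just "score the same utterance under both models and keep the better-fitting one".

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Stand-ins for acoustic models trained on each population's speech;
# 13-dimensional vectors imitate cepstral feature frames.
elderly_model = GaussianMixture(n_components=4, random_state=0)
elderly_model.fit(rng.normal(loc=-1.0, size=(500, 13)))

adult_model = GaussianMixture(n_components=4, random_state=0)
adult_model.fit(rng.normal(loc=1.0, size=(500, 13)))

def select_model(features: np.ndarray):
    # Score the utterance under both models and pick the one with the
    # higher average log-likelihood; decoding would then proceed with
    # the selected model.
    scores = {
        "elderly": elderly_model.score(features),
        "adult": adult_model.score(features),
    }
    return max(scores, key=scores.get), scores

utterance = rng.normal(loc=-1.0, size=(120, 13))  # resembles elderly data
best, scores = select_model(utterance)
print(best, scores)
```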

Slide 23: Conclusion
- Aspects of multi-modal dialogue:
  - What kinds of inputs should be used?
  - How can speech and the other inputs be combined, and how do they interact?
  - How will users use the system?
  - How should the system respond to users?

