Multi-Modal Dialogue in Personal Navigation Systems
Arthur Chan

Introduction
The term "multi-modal" is a general description of an application that can be operated through multiple input/output modes.
- Example inputs: voice, pen, gesture, facial expression
- Example outputs: voice, graphical output

Multi-modal Dialogue (MMD) in Personal Navigation Systems
Motivation of this presentation:
- Navigation systems give MMD an interesting scenario: a case for why MMD is useful
Structure of this presentation: three system papers
- AT&T MATCH: speech and pen input, with pen gestures
- SpeechWorks walking-directions system: speech and stylus input
- Univ. of Saarland REAL: speech and pen input; both GPS and a magnetic tracker were used

Multi-modal Language Processing for Mobile Information Access

Overall Function
A working city guide and navigation system:
- Easy access to restaurant and subway information
- Runs on a Fujitsu pen computer
- Users are free to give speech commands or draw on the display with a stylus

Types of Inputs
Speech input:
- "show cheap italian restaurants in chelsea"
Simultaneous speech and pen input:
- Circle an area and say "show cheap italian restaurants in neighborhood" at the same time
Functionalities also include review and subway routing.

Input Overview
Speech input:
- Uses the AT&T Watson speech recognition engine
Pen input (electronic ink):
- Allows the use of pen gestures, which can be complex pen input
- Special aggregation techniques handle these gestures
The inputs are combined using lattice combination, sketched below.
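
To make the lattice-combination idea concrete, here is a minimal Python sketch under my own assumptions (toy scored hypothesis lists and a simple frame-unification rule, not MATCH's actual finite-state machinery): each recognizer emits scored semantic hypotheses, and the integrator keeps the combinations whose frames unify, ranked by joint score.

    import itertools

    # Toy lattices: (log-probability, semantic frame) pairs from each recognizer.
    speech_lattice = [
        (-1.2, {"action": "show", "cuisine": "italian", "price": "cheap"}),
        (-2.5, {"action": "show", "cuisine": "indian", "price": "cheap"}),
    ]
    gesture_lattice = [
        (-0.8, {"area": "circled_region_1"}),
        (-3.0, {"area": "point_1"}),
    ]

    def unify(frame_a, frame_b):
        """Merge two frames unless they conflict on a shared key."""
        if any(frame_a[k] != frame_b[k] for k in frame_a.keys() & frame_b.keys()):
            return None
        return {**frame_a, **frame_b}

    # Cross-product combination, keeping only unifiable pairs.
    combined = []
    for (s_score, s_frame), (g_score, g_frame) in itertools.product(
            speech_lattice, gesture_lattice):
        merged = unify(s_frame, g_frame)
        if merged is not None:
            combined.append((s_score + g_score, merged))

    best_score, best_frame = max(combined, key=lambda c: c[0])
    print(best_score, best_frame)  # -2.0: the cheap-italian + circled-area reading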

Pen Gesture and Speech Input
For example:
- U: "How do I get to this place?"
- S: "Where do you want to go from?"
- U: "25th St & 3rd Avenue"
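
A toy slot-filling sketch of this exchange (my own illustration with hypothetical slot names, not the paper's dialogue manager): the destination arrives via the pen gesture, and the manager prompts for whichever required slot is still missing.

    # Hypothetical slot names; the destination comes from the pen gesture.
    REQUIRED_SLOTS = ("origin", "destination")

    def next_prompt(frame):
        """Ask for the first missing slot, or confirm the route."""
        for slot in REQUIRED_SLOTS:
            if slot not in frame:
                preposition = "from" if slot == "origin" else "to"
                return f"Where do you want to go {preposition}?"
        return f"Routing from {frame['origin']} to {frame['destination']}."

    frame = {"destination": "circled_region_1"}  # "this place" + gesture
    print(next_prompt(frame))                    # asks for the origin
    frame["origin"] = "25th St & 3rd Avenue"
    print(next_prompt(frame))                    # confirms the route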

Summary
Interesting aspects of the system:
- Illustrates a real-life scenario where multimodal inputs can be used
- Design issue: how should different inputs be used together?
- Algorithmic issue: how should different inputs be combined?

Multi-modal Spoken Dialog with Wireless Devices

Overview
Work by SpeechWorks:
- Jointly conducted by speech recognition and user interface teams
- Two distinct elements:
  - Speech recognition: in an embedded domain, which paradigm should be used? Embedded, network, or distributed speech recognition?
  - User interface: how to "situationalize" the application?

Overall Function
Walking-directions application:
- Assumes the user is walking in an unknown city
- Runs on a Compaq iPAQ 3765 PocketPC
- Users can: select a city and start/end addresses; display a map; control the display; display directions, including interactive directions as a list of steps
- Accepts speech input and stylus input, but not pen gestures

Choice of Speech Recognition Paradigm
Embedded speech recognition:
- Only simple commands can be supported, due to computational limits
Network speech recognition:
- Requires bandwidth
- The network connection can sometimes be cut off
Distributed speech recognition:
- The client computes the acoustic front-end
- The server performs the decoding (see the sketch below)
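
A minimal sketch of the distributed split, under my own assumptions (a stub energy feature instead of real cepstra, and in-process calls standing in for the network hop): the client turns raw audio into compact feature frames, and only those are shipped to the decoding server.

    import struct

    FRAME_SIZE = 400  # assumed: 25 ms frames at 16 kHz
    NUM_COEFFS = 13   # assumed: 13 coefficients per frame

    def client_front_end(samples):
        """Client side: raw audio -> feature frames (stub DSP)."""
        frames = [samples[i:i + FRAME_SIZE]
                  for i in range(0, len(samples) - FRAME_SIZE + 1, FRAME_SIZE)]
        # Placeholder feature: mean absolute energy, repeated NUM_COEFFS times.
        return [[sum(abs(s) for s in f) / len(f)] * NUM_COEFFS for f in frames]

    def pack_features(features):
        """Serialize the frames; this is all that crosses the network."""
        return b"".join(struct.pack(f"{NUM_COEFFS}f", *frame) for frame in features)

    def server_decode(payload):
        """Server side: unpack the frames and run the (stubbed) decoder."""
        n = len(payload) // (4 * NUM_COEFFS)
        frames = [struct.unpack_from(f"{NUM_COEFFS}f", payload, i * 4 * NUM_COEFFS)
                  for i in range(n)]
        return f"<decoded {len(frames)} frames>"  # a real server runs Viterbi search

    audio = [0] * 16000  # one second of silence at 16 kHz
    print(server_decode(pack_features(client_front_end(audio))))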

User Interface
"Situationalization": potential scenarios include
- Sitting at a desk
- Getting out of a cab, building, or subway and preparing to walk somewhere
- Walking somewhere with hands free
- Walking somewhere carrying things
- Driving somewhere in heavy traffic
- Driving somewhere in light traffic
- Being a passenger in a car
- Being in a highly noisy environment

Their Conclusion
The balance of audio and visual information can be reduced to four complementary components, as illustrated below.
Single-modal:
1. Visual mode
2. Audio mode
Multi-modal:
3. Visual dominant
4. Audio dominant
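
A hypothetical mapping from situation to component (my illustration, not the paper's actual decision logic; the situation labels are invented):

    def presentation_mode(situation):
        """Pick one of the four complementary output modes by situation."""
        table = {
            "at_desk": "visual dominant",           # screen leads, audio assists
            "driving_heavy_traffic": "audio mode",  # eyes must stay on the road
            "walking_hands_free": "audio dominant", # audio leads, glanceable screen
            "noisy_environment": "visual mode",     # the speech channel is unusable
        }
        return table.get(situation, "visual dominant")

    print(presentation_mode("driving_heavy_traffic"))  # -> audio mode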

A Glance at the UI

Summary
Interesting aspects: great discussion of
- how speech recognition can be used in an embedded domain
- how users would actually use the dialogue application

Multi-modal Dialog in a Mobile Pedestrian Navigation System

Overview
A pedestrian navigation system with two components:
- IRREAL: indoor navigation, using a magnetic tracker
- ARREAL: outdoor navigation, using GPS

Speech Input/Output
Speech input:
- HTK / IBM ViaVoice Embedded; Logox was also being evaluated
Speech output:
- Festival (see the sketch below)
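
For the output side, a minimal sketch of driving Festival from a navigation application, assuming a standard Festival install with its text2wave script on the PATH (my illustration, not the REAL system's actual integration):

    import subprocess

    def speak(text, wav_path="prompt.wav"):
        """Render a navigation prompt to a WAV file with Festival's text2wave."""
        subprocess.run(
            ["text2wave", "-o", wav_path],
            input=text.encode("utf-8"),
            check=True,
        )
        return wav_path

    speak("Turn left at the next intersection.")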

Visual Output
Both 2D and 3D spatialization are supported.

Interesting Aspects
The system is tailored to elderly users:
- Speaker clustering improves recognition rates for elderly speakers
- Model selection: choose between two acoustic models (elderly and normal adult) based on likelihood, as sketched below
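
A toy sketch of the likelihood-based selection, with a single Gaussian standing in for each acoustic model (the parameters are invented for illustration): score the incoming features under both models and keep whichever fits better.

    import math

    def log_likelihood(model, features):
        """Toy single-Gaussian scorer standing in for a real acoustic model."""
        mu, var = model["mean"], model["var"]
        return sum(-0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)
                   for x in features)

    elderly_model = {"mean": 0.8, "var": 1.5}  # hypothetical parameters
    adult_model = {"mean": 0.0, "var": 1.0}

    def select_model(features):
        """Pick the acoustic model that assigns the higher likelihood."""
        scores = {"elderly": log_likelihood(elderly_model, features),
                  "adult": log_likelihood(adult_model, features)}
        return max(scores, key=scores.get)

    print(select_model([0.9, 0.7, 1.1]))  # -> "elderly" for these features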

Conclusion
Aspects of multi-modal dialogue:
- What kinds of inputs should be used?
- How can speech and the other inputs be combined, and how do they interact?
- How will users actually use the system?
- How should the system respond to the users?