1
MUSCLE Multimodal e-team related activity
Technical University of Crete
Speech Processing and Dialog Systems Group
Presenter: Prof. Alex Potamianos
2
Goals
Develop domain-independent algorithms and tools that let non-experts rapidly develop state-of-the-art multi-modal dialogue systems
Investigate the optimal modality mix (optimal = maximizing UI efficiency and user satisfaction)
Demonstrate the synergies between modalities and build a state-of-the-art MM-UI module
3
Multi-Modal User Interface
Emphasis on synergies between modalities:
Value(s) of attributes are displayed graphically
Erroneous values can be easily corrected via the GUI
Focus (aka context) of speech modality is highlighted
Position and value ambiguity are shown (and typically resolved) via the GUI
Voice prompts are significantly shorter
GUI takes full advantage of intelligence of voice UI
Three interaction modes implemented: click-to-talk, open-mike and modality selection
4
GUI Examples
Figure label: Button Disabled
5
GUI Ambiguity Resolution
6
Click-to-Talk Examples
Figure labels: Click to Talk; Speech Interface Enabled; GUI Disabled; Beginning of Next Turn; GUI Enabled
7
Open-Mike Examples
Figure labels: Waiting for input via Speech or GUI (mouse and keyboard); Speech has been detected; Beginning of Next Turn
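The turn structure suggested by the click-to-talk and open-mike figure labels can be sketched as a small state holder. This is only an illustration, not the system's actual implementation: the class and method names are hypothetical, and locking the GUI once speech is detected in open-mike mode is an assumption that the slides do not confirm.

```python
from enum import Enum, auto


class Mode(Enum):
    CLICK_TO_TALK = auto()
    OPEN_MIKE = auto()


class TurnState:
    """Hypothetical per-turn state: which input channels accept input."""

    def __init__(self, mode):
        self.mode = mode
        self.start_turn()

    def start_turn(self):
        # Beginning of a turn: the GUI is enabled in both modes.
        self.gui_enabled = True
        # Open-mike listens from the start; click-to-talk waits for a click.
        self.speech_enabled = self.mode is Mode.OPEN_MIKE

    def click_talk_button(self):
        # Click-to-talk: enable the speech interface and disable the GUI.
        if self.mode is Mode.CLICK_TO_TALK and self.gui_enabled:
            self.speech_enabled = True
            self.gui_enabled = False

    def speech_detected(self):
        # Assumption (not stated on the slides): once speech is detected
        # in open-mike mode, the GUI is locked until the next turn.
        if self.mode is Mode.OPEN_MIKE and self.speech_enabled:
            self.gui_enabled = False

    def end_turn(self):
        # Beginning of the next turn resets both channels.
        self.start_turn()
```

In this sketch a click-to-talk turn goes from GUI-only, to speech-only after the button click, and back to GUI-only at the next turn, matching the order of the labels on the click-to-talk slide.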
8
Modality Tracking Examples
Figure labels: Click To Talk Mode; Open Mike Mode
9
Experiments
15 naïve, non-native users with varying levels of English language knowledge and accent
Application: form-filling, travel reservation (flight, hotel, car)
5 scenarios: one/two/three-leg flight, round-trip flight with car, round-trip flight with hotel
5 systems: speech, GUI, click-to-talk, open-mike, modality selection
5 x 5 = 25 runs per user
Scenarios and systems tested in random order
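The 5 x 5 design can be illustrated with a short sketch that builds a randomized run order for one user. It assumes every scenario is crossed with every system and that the 25 pairs are shuffled as a whole; the slides say only that the order was random, so the exact randomization scheme, the function name `make_schedule`, and the seeding are hypothetical.

```python
import itertools
import random

SCENARIOS = [
    "one-leg flight",
    "two-leg flight",
    "three-leg flight",
    "round-trip flight with car",
    "round-trip flight with hotel",
]
SYSTEMS = ["speech", "GUI", "click-to-talk", "open-mike", "modality selection"]


def make_schedule(user_id, seed=0):
    """Return the 25 (scenario, system) runs for one user in random order."""
    # A per-user seed keeps each user's order reproducible (an assumption).
    rng = random.Random(seed * 1000 + user_id)
    runs = list(itertools.product(SCENARIOS, SYSTEMS))
    rng.shuffle(runs)
    return runs
```

Calling `make_schedule(user_id)` for each of the 15 users would give 15 x 25 = 375 runs in total under these assumptions.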
10
Results: Objective Metrics
11
Results: Subjective Metrics
12
Conclusions
UI efficiency (task completion, task duration) and subjective metrics:
GUI-only is the most efficient mode
Speech-only is the least efficient mode
No differences in efficiency among the three multi-modal modes
Repeating experiments on PDA
Different ASR recognition rates
Different ASR recognition speed
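The two objective measures named above, task completion and task duration, could be aggregated per system roughly as follows. This is a minimal sketch: the record layout `(system, completed, duration_seconds)` and the function name `summarize` are assumptions for illustration, and no values from the actual experiments are reproduced here.

```python
from collections import defaultdict
from statistics import mean


def summarize(runs):
    """Aggregate runs given as (system, completed, duration_seconds) tuples.

    The tuple layout is hypothetical; the slides do not describe the log format.
    """
    by_system = defaultdict(list)
    for system, completed, duration in runs:
        by_system[system].append((completed, duration))

    summary = {}
    for system, rows in by_system.items():
        summary[system] = {
            # Fraction of runs in which the task was completed.
            "completion_rate": mean(1.0 if done else 0.0 for done, _ in rows),
            # Average task duration over all runs of this system.
            "mean_duration": mean(duration for _, duration in rows),
        }
    return summary
```

Comparing the resulting per-system entries is one way to rank the five systems on efficiency; whether duration should be averaged over all runs or only completed ones is a design choice the slides leave open.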