CMPUT 301: Lecture 31 Out of the Glass Box Martin Jagersand Department of Computing Science University of Alberta.

2 Overview Idea: –why only use the sense of vision in user interfaces? –increase the bandwidth of the interaction by using multiple sensory channels, instead of overloading the visual channel

3 Overview Multi-sensory systems: –use more than one sensory channel in interaction –e.g., sound, video, gestures, physical actions etc.

4 Overview Usable senses: –sight, sound, touch, taste, smell –haptics, proprioception, and accelerations –each is important on its own –together, they provide a fuller interaction with the natural world

5 Overview Usable senses: –computers rarely offer such a rich interaction –we can use sight, sound, and sometimes touch –flight simulators and some games use accelerations to create a multimodal immersive experience –we cannot (yet) use taste or smell

6 Overview Multi-modal systems: –use more than one sense in the interaction –e.g., sight and sound: a word processor that speaks the words as well as rendering them on the screen

7 Overview Multi-media systems: –use a number of different media to communicate information –e.g., a computer-based teaching system with video, animation, text, and still images

8 Speech Human speech: –natural mastery of language –instinctive, taken for granted –difficult to appreciate the complexities –potentially a useful way to extend human-computer interaction

9 Speech Structure: –phonemes (English) –40 (24 consonant and 16 vowel sounds) –basic atomic units of speech –sound slightly different depending on context …

10 Speech Structure: –allophones: –120 to 130 –all the sounds in the language –count depends on accents

11 Speech Structure: –morphemes –basic atomic units of language –part or whole words –formed into sentences using the rules of grammar

12 Speech Prosody: –variations in emphasis, stress, pauses, and pitch to impart more meaning to sentences Co-articulation: –the effect of context on the sound –transforms phonemes into allophones

13 Speech Recognition Problems: –different people speak differently (e.g., accent, stress, volume, etc.) –background noises –“ummm …” and “errr …” –speech may conflict with complex cognition

14 Speech Recognition Issues: –recognizing words is not enough –need to extract meaning –understanding a sentence requires context, such as information about the subject and the speaker

15 Speech Recognition Phonetic typewriter: –developed for Finnish (a phonetic language) –trained on one speaker, tries to generalize to others –uses a neural network that clusters similar sounds together and maps each cluster to a character –poor performance on speakers it has not been trained on –requires a large dictionary of minor variations

16 Speech Recognition Currently: –single-user, limited-vocabulary systems can work satisfactorily –no general-user, general-vocabulary systems are commercially successful yet Current commercial examples: –simple telephone-based UIs such as train schedule information systems

17 Speech Recognition Potential: –for users with physical disabilities –for lightweight, mobile devices –for when the user’s hands are already occupied with a manual task (auto mechanic, surgeon)

18 Speech Synthesis What: –computer-generated speech –natural and familiar way of receiving information

19 Speech Synthesis Problems: –humans find it difficult to adjust to monotonic, non-prosodic speech –computer needs to understand natural language and the domain –speech is transient (hard to review or browse) –produces noise in the workplace or requires headphones (intrusive)

20 Speech Synthesis Potential: –screen readers –read a textual display to a visually impaired person –warning signals –spoken information especially for aircraft pilots whose visual and haptic channels are busy

21 Speech Synthesis Virtual newscaster (Ananova)

22 Uninterpreted Speech What: –fixed, recorded speech –e.g., played back in airport announcements –e.g., attached as voice annotation to files

23 Uninterpreted Speech Digital processing: –change playback speed without changing pitch –to quickly scan phone messages –to manually transcribe voice to text –to figure out the lyrics and chords of a song –spatialization and environmental effects
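
Changing playback speed without changing pitch can be illustrated with a naive overlap-add (OLA) time stretch: short windowed frames are read from the recording at one spacing and written to the output at another, so each frame still plays at its original rate. The sketch below is only an illustration under stated assumptions; the function name `time_stretch`, the frame size, and the choice of plain OLA are not from the lecture, and practical systems typically use refinements such as WSOLA or a phase vocoder to avoid the artifacts this simple version produces.

```python
import math

def time_stretch(samples, speed, frame=256):
    """Naive overlap-add time stretch (hypothetical helper, not from the lecture).

    speed > 1 shortens the recording, speed < 1 lengthens it. Each frame is
    copied without resampling, so pitch stays roughly the same.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    hop_out = frame // 2                      # output frames overlap by 50%
    hop_in = max(1, round(hop_out * speed))   # input frames are spaced by speed
    # Periodic Hann window: two copies offset by half a frame sum to 1.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame) for i in range(frame)]
    if len(samples) < frame:
        return []
    n_frames = (len(samples) - frame) // hop_in + 1
    out = [0.0] * ((n_frames - 1) * hop_out + frame)
    for k in range(n_frames):
        src, dst = k * hop_in, k * hop_out
        for i in range(frame):
            out[dst + i] += samples[src + i] * window[i]
    return out

# Example: a one-second 440 Hz tone at an assumed 8 kHz sample rate,
# played back at double speed (roughly half as many samples).
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
fast = time_stretch(tone, speed=2.0)
same = time_stretch(tone, speed=1.0)
```

With `speed=1.0` the interior of the output reproduces the input, which is a quick sanity check that the windows overlap correctly.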

24 Non-Speech Sound What: –boings, bangs, squeaks, clicks, etc. –commonly used in user interfaces to provide warnings and alarms

25 Non-Speech Sound Why: –fewer typing mistakes with key clicks –video games harder without sound

26 Non-Speech Sound? D’oh!

27 Non-Speech Sound Dual mode displays: –information presented along two different sensory channels –e.g., sight and sound –allows for redundant presentation –user uses whichever they find easiest –allows for resolution of ambiguity in one mode through information in the other

28 Non-Speech Sound Dual mode displays: –humans can react faster to auditory than visual stimuli –sound is especially good for transient information that would otherwise clutter a visual display –sound is more language and culture independent (unlike speech)

29 Non-Speech Sound Auditory icons: –use natural sounds to represent different types of objects and actions in the user interface –e.g., breaking glass sound when deleting a file –direction and volume of sounds can indicate position and importance/size –SonicFinder –not all actions have an intuitive sound
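
The mapping from interface events to auditory icons can be sketched as a small lookup plus two scalings: horizontal screen position becomes stereo pan, and object size becomes volume. Everything named here (`SoundCue`, `auditory_icon`, the sample file names, the 0.2 volume floor, the 10 MB ceiling) is a hypothetical choice for illustration, not the SonicFinder design.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical natural sounds for a few actions; not every action has one.
NATURAL_SOUNDS = {
    "delete": "breaking_glass.wav",
    "copy": "pouring_liquid.wav",
    "select": "tap_on_wood.wav",
}

@dataclass
class SoundCue:
    sample: str    # which recorded sound to play
    volume: float  # 0.2 (small object) .. 1.0 (large object)
    pan: float     # -1.0 (left edge) .. 1.0 (right edge)

def auditory_icon(action: str, x: float, screen_width: float,
                  size_bytes: int, max_bytes: int = 10_000_000) -> Optional[SoundCue]:
    """Return a cue for the action, or None if no intuitive sound exists."""
    sample = NATURAL_SOUNDS.get(action)
    if sample is None:
        return None
    # Direction indicates position: map x in [0, width] to pan in [-1, 1].
    pan = 2.0 * min(max(x, 0.0), screen_width) / screen_width - 1.0
    # Volume indicates size: larger files sound louder, capped at max_bytes.
    volume = 0.2 + 0.8 * min(size_bytes / max_bytes, 1.0)
    return SoundCue(sample, volume, pan)

cue = auditory_icon("delete", x=500, screen_width=1000, size_bytes=5_000_000)
```

Returning `None` for an unmapped action mirrors the last point on the slide: some actions have no natural sound, and the interface needs a fallback for them.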

30 Non-Speech Sound Earcons: –synthetic sounds used to convey information –structured combinations of motives (musical notes) to provide rich information
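
One way to picture "structured combinations of motives" is to treat each motive as a short list of notes and build a compound earcon by playing an action motive followed by an object motive. The note names, durations, and the `compound_earcon` helper below are invented for illustration only.

```python
# Each motive is a short sequence of (note name, duration in seconds).
# All values here are hypothetical examples.
MOTIVES = {
    "create": [("C4", 0.2), ("E4", 0.2), ("G4", 0.4)],   # rising
    "delete": [("G4", 0.2), ("E4", 0.2), ("C4", 0.4)],   # falling
    "file":   [("A4", 0.1), ("A4", 0.1)],
    "folder": [("F3", 0.3)],
}

def compound_earcon(*names):
    """Concatenate motives, e.g. action then object, into one earcon."""
    earcon = []
    for name in names:
        earcon.extend(MOTIVES[name])
    return earcon

def duration(earcon):
    """Total playing time of an earcon in seconds."""
    return sum(length for _, length in earcon)

create_file = compound_earcon("create", "file")
delete_folder = compound_earcon("delete", "folder")
```

Because the motives are reused, a listener who has learned "create" and "file" separately can, in principle, decode "create file" without having heard that exact combination before.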

31 Non-Speech Sound Earcons:

32 Handwriting Recognition Handwriting: –text and graphic input –complex strokes and spaces –natural

33 Handwriting Recognition Problems: –variation in handwriting between users –variation from day to day and over years for a single user –variation of letters depending on nearby letters

34 Handwriting Recognition Currently: –limited success with systems trained on a few users, with separated letters –generic, multi-user, cursive text recognition systems are not accurate enough to be commercially successful Current applications: –e.g., pre-sorting of mail (but a human has to assist with failures)

35 Handwriting Recognition Newton: –printing or cursive writing recognition –dictionary of words –contextual recognition –fine tune spacing and letter shapes –fine tune recognition speed –learn handwriting over time
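
The role of a word dictionary in recognition can be sketched as a post-processing step: the raw character-level guess is replaced by the closest dictionary word, if one is close enough. This is an assumption-laden illustration using Python's standard `difflib` module, not a description of how the Newton actually worked; the word list, the `correct_word` helper, and the 0.6 cutoff are all hypothetical.

```python
import difflib

# Tiny stand-in for a real word dictionary (hypothetical contents).
DICTIONARY = ["hello", "help", "world", "word", "handwriting", "recognition"]

def correct_word(raw, dictionary=DICTIONARY, cutoff=0.6):
    """Snap a raw recognizer guess to the nearest dictionary word.

    Returns the raw guess unchanged when nothing in the dictionary is
    similar enough, so unknown words (names, codes) are not mangled.
    """
    matches = difflib.get_close_matches(raw.lower(), dictionary, n=1, cutoff=cutoff)
    return matches[0] if matches else raw

# A misread character ("1" for "l") is repaired by the dictionary lookup.
print(correct_word("he1lo"))       # hello
print(correct_word("recogniton"))  # recognition
print(correct_word("zzzz"))        # zzzz (no close match, left as-is)
```

Raising the cutoff makes the corrector more conservative, which trades fewer wrong substitutions for more uncorrected misreads.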

36 Handwriting Recognition Newton:

37 End What did I learn today? What questions do I still have?
