Multimodality, universals, natural interaction… and some other stories… Kostas Karpouzis & Stefanos Kollias ICCS/NTUA HUMAINE WP4.


going multimodal
‘multimodal’ is this decade’s main ‘affective interaction’ aspect
plethora of modalities available to capture and process
– visual, aural, haptic…
– ‘visual’ can be broken down into ‘facial expressivity’, ‘hand gesturing’, ‘body language’, etc.
– ‘aural’ into ‘prosody’, ‘linguistic content’, etc.

why multimodal?
extending unimodality…
– recognition from traditional unimodal inputs had serious limitations
– multimodal corpora are becoming available
what do we gain?
– have recognition rates improved?
– or have we just introduced more uncertain features?

essential reading
S. Oviatt, ‘Ten Myths of Multimodal Interaction’, Communications of the ACM, Nov. 1999, Vol. 42, No. 11, pp. 74–81

putting it all together
myth #6: multimodal integration involves redundancy of content between modes
you have features from a person’s
– facial expressions and body language
– speech prosody and linguistic content
– even their heart rate
so, what do you do when their face tells you something different from their …heart?
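To make the question concrete, here is a minimal Python sketch, not part of the HUMAINE toolchain; the label set, channel names and scores are invented. It simply detects when independently classified channels disagree, which is exactly the case a fusion step has to handle.

```python
# Toy sketch: each modality produces a probability distribution over the same
# label set, and we flag the cases where the channels disagree instead of
# silently averaging the conflict away.
from typing import Dict

def top_label(scores: Dict[str, float]) -> str:
    """Return the label with the highest score for one modality."""
    return max(scores, key=scores.get)

def modalities_agree(face: Dict[str, float], voice: Dict[str, float], bio: Dict[str, float]) -> bool:
    """True if all channels point to the same label."""
    return len({top_label(face), top_label(voice), top_label(bio)}) == 1

# Example: a smiling face, flat prosody and an elevated heart rate.
face  = {"neutral": 0.1, "happy": 0.7, "angry": 0.1, "sad": 0.1}
voice = {"neutral": 0.6, "happy": 0.2, "angry": 0.1, "sad": 0.1}
bio   = {"neutral": 0.2, "happy": 0.1, "angry": 0.5, "sad": 0.2}

if not modalities_agree(face, voice, bio):
    print("channels disagree; defer to context or a learned fusion model")
```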

first, look at this video

and now, listen!

but it can be good
what happens when one of the available modalities is not robust?
– better yet, when the ‘weak’ modality changes over time?
consider the ‘bartender problem’
– very little linguistic content reaches its target
– mouth shape available (visemes)
– limited vocabulary
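One common way to exploit a weak channel without being misled by it is confidence-weighted late fusion. The sketch below is illustrative only, assuming made-up vocabularies, confidences and a fuse() helper: in a bartender-like setting the noisy audio estimate is down-weighted against the more robust visemes.

```python
# Toy reliability-weighted late fusion: each modality reports a score
# distribution plus a confidence in [0, 1]; when the audio channel degrades,
# its weight drops and the visual channel dominates the fused estimate.
from typing import Dict, List, Tuple

def fuse(channels: List[Tuple[Dict[str, float], float]]) -> Dict[str, float]:
    """Weighted average of per-modality scores; weights are the confidences."""
    total = sum(conf for _, conf in channels) or 1.0
    labels = channels[0][0].keys()
    return {lab: sum(scores[lab] * conf for scores, conf in channels) / total
            for lab in labels}

lip_reading = ({"beer": 0.6, "gin": 0.3, "water": 0.1}, 0.9)   # visemes: robust
speech      = ({"beer": 0.3, "gin": 0.3, "water": 0.4}, 0.2)   # noisy bar audio
print(fuse([lip_reading, speech]))   # lip reading dominates the fused estimate
```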

but it can be good

again, why multimodal?
holy grail: assigning labels to different parts of human-human or human-computer interaction
yes, labels can be nice!
– humans do it all the time
– and so do computers (e.g., classification)
– OK, but what kind of label?
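A hypothetical sketch of the labelling task itself, assuming a feature stream and some trained classify() function (both are stand-ins invented for illustration): cut the interaction into windows and attach one label per window.

```python
# Sketch of 'assign labels to parts of an interaction': slice a feature stream
# into fixed-length windows and attach one label per window. classify() is a
# placeholder for whatever trained model is actually used.
from typing import Callable, List, Sequence, Tuple

def label_stream(features: Sequence, window: int,
                 classify: Callable[[Sequence], str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, label) triples covering the stream."""
    segments = []
    for start in range(0, len(features), window):
        chunk = features[start:start + window]
        segments.append((start, start + len(chunk), classify(chunk)))
    return segments

# Dummy usage: label every 25-frame chunk with a placeholder rule.
frames = list(range(100))
print(label_stream(frames, 25, lambda chunk: "neutral"))
```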

In the beginning…
based on the claim that ‘there are six facial expressions recognized universally across cultures’, all video databases used to contain images of sad, angry, happy or fearful people…
thus, more sad, angry, happy or fearful people appear, even when data involve HCI, and subtle emotions/additional labels are out of the picture
– can you really be afraid that often when using your computer?

the Humaine approach
so where is Humaine in all that?
– subtle emotions
– natural expressivity
– alternative emotion representations
– discussing dynamics
– classification of emotional episodes from life-like HCI and reality TV
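For illustration, two alternative ways an emotional episode could be written down in code: a categorical label with an intensity, and a Feeltrace-style activation/evaluation trace. The field names are assumptions made for this sketch, not the HUMAINE annotation schema.

```python
# Two representations of the same episode: a (possibly subtle) category with
# intensity, and a dimensional trace that keeps the temporal dynamics instead
# of collapsing everything into one label.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class CategoricalAnnotation:
    label: str        # e.g. "irritated" rather than one of the six basic emotions
    intensity: float  # 0..1, allows subtle, low-intensity states

@dataclass
class DimensionalAnnotation:
    # (time_s, activation, evaluation) samples, both dimensions in [-1, 1]
    trace: List[Tuple[float, float, float]] = field(default_factory=list)

episode_a = CategoricalAnnotation(label="irritated", intensity=0.3)
episode_b = DimensionalAnnotation(trace=[(0.0, 0.1, -0.2), (1.0, 0.4, -0.5)])
```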

Humaine WP4 results

ERMIS SAL (QUB-ICCS)
– frames/users/length: four subjects, ~2 hr of audio/video annotated with Feeltrace
– modalities present: facial expressions, speech prosody, head pose
– features extracted: FAPs per frame, acoustic features per tune, phonemes/visemes
– until now: one subject analyzed (~ frames, ~800 tunes)
– plans for 2007: extract facial and prosody features from the three remaining subjects; analyze head pose
– recognition rates: recurrent NNs: 87%; rule-based: 78.4%; possibilistic: 65.1%

EmoTV (LIMSI)
– frames/users/length: 28 clips, ~5 minutes total
– modalities present: subtle facial expressions, restricted gesturing
– features extracted: overall activation (FAPs or prosody not possible)
– until now: all clips
– plans for 2007: extract remaining expressivity features (where possible)
– recognition rates: correlation with manual annotator: κ = 0.83

EmoTaboo (LIMSI)
– frames/users/length: 2 clips, ~5 minutes
– modalities present: facial expressions, speech prosody
– features extracted: FAPs
– until now: all clips
– plans for 2007: head pose, prosody features
– recognition rates: annotation not yet available

CEICES (FAU)
– frames/users/length: 51 children, ~9 hrs recorded and annotated
– modalities present: speech prosody
– features extracted: acoustic features per turn/word
– until now: all clips
– plans for 2007: completed analysis, pending comparison of recognition schemes
– recognition rates: mean recognition rate: 55.8%

Genoa 06 corpus (Genoa)
– frames/users/length: 10 subjects, ~50 gesture repetitions each, ~1 hour
– modalities present: FAPs, gesturing, pseudolanguage
– features extracted: FAPs, gestures, speech
– until now: all clips
– plans for 2007: expressivity features from hand movement
– recognition rates: facial: 59.6%; gestures: 67.1%; speech: 70.8%; multimodal: 78.3%

GEMEP (GERG)
– frames/users/length: 1200 clips total
– modalities present: FAPs, gesturing, pseudolanguage
– features extracted: expressivity, gestures, FAPs, speech
– until now: 8 body clips, 30 face clips analyzed
– plans for 2007: analyze remaining 1200 clips
– recognition rates: few clips analyzed

HUMAINE 2010 three years from now in a galaxy (not) far, far away…

a fundamental question

OK, people may be angry or sad, or express positive/active emotions
face recognition provides a response to the ‘who?’ question
‘when?’ and ‘where?’ are usually known or irrelevant
but does anyone know ‘why?’
– context information
– semantics

a fundamental question (2)

is it me or?...

some modalities may display no clues or, worse, contradictory clues
the same expression may mean different things coming from different people
can we ‘bridge’ what we know about someone or about the interaction with what we sense?
– and can we adapt what we know based on that?
– or can we align what we sense with other sources?
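One way to ‘bridge’ person-specific knowledge with what the sensors report is to adapt a per-person baseline and interpret features relative to it. The class below is a toy sketch under that assumption; the feature names and update rate are invented, not a method from the project.

```python
# Keep a running per-person baseline for each feature and return deviations
# from it, so that the same raw expression value can mean different things
# for different people.
from collections import defaultdict
from typing import Dict

class PersonalBaseline:
    def __init__(self, rate: float = 0.05):
        self.rate = rate
        self.baseline: Dict[str, Dict[str, float]] = defaultdict(dict)

    def update_and_normalise(self, person: str, features: Dict[str, float]) -> Dict[str, float]:
        """Exponentially update this person's baseline, return deviations from it."""
        base = self.baseline[person]
        out = {}
        for name, value in features.items():
            prev = base.get(name, value)  # first sample seeds the baseline
            base[name] = (1 - self.rate) * prev + self.rate * value
            out[name] = value - base[name]
        return out

model = PersonalBaseline()
print(model.update_and_normalise("user_42", {"smile_intensity": 0.6}))
```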

another kind of language

sign language analysis poses a number of interesting problems
– image processing and understanding tasks
– syntactic analysis
– context (e.g. when referring to a third person)
– natural language processing
– vocabulary limitations

want answers? Let us try to extend some of the issues already raised!

Semantic Analysis: Semantics – Context (a peek at the future)
[block diagram] visual data → segmentation → feature extraction → classifiers (C1, C2, …, Cn) → fusion → adaptation → labelling, supported by an ontology infrastructure, context analysis, visual analysis, the Fuzzy Reasoning Engine (FiRE) and a centralised/decentralised knowledge repository
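A bare-bones sketch of the chain in the diagram, with every stage stubbed out by a placeholder; nothing here models the ontology infrastructure, the context analysis or the FiRE reasoner, and all values are invented.

```python
# Minimal pipeline skeleton: segmentation -> feature extraction ->
# classifiers C1..Cn -> fusion -> adaptation -> labelling.
from typing import Any, Callable, List

Stage = Callable[[Any], Any]

def run_pipeline(visual_data: Any, stages: List[Stage]) -> Any:
    """Feed the output of each stage into the next one."""
    result = visual_data
    for stage in stages:
        result = stage(result)
    return result

pipeline: List[Stage] = [
    lambda video: ["segment"],                                      # segmentation
    lambda segs: [{"faps": [0.1, 0.2]} for _ in segs],              # feature extraction
    lambda feats: [{"happy": 0.6, "neutral": 0.4} for _ in feats],  # classifiers
    lambda scores: scores[0],                                       # fusion (trivial here)
    lambda fused: fused,                                            # adaptation (no-op)
    lambda fused: max(fused, key=fused.get),                        # labelling
]
print(run_pipeline("clip.avi", pipeline))  # -> 'happy'
```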

Standardisation Activities
– W3C Multimedia Semantics Incubator Group
– W3C Emotion Incubator Group
Provide machine-understandable representations of available emotion modelling, analysis and synthesis theory, cues and results, to be accessed through the Web and used in all types of affective interaction.
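Purely as an illustration of what a machine-understandable representation could look like, here is a small XML fragment generated with the Python standard library; the element and attribute names loosely echo the later W3C EmotionML drafts, but this is not a validated EmotionML document.

```python
# Build a tiny machine-readable emotion annotation with the standard library.
import xml.etree.ElementTree as ET

emotion = ET.Element("emotion")
ET.SubElement(emotion, "category", name="irritation")
ET.SubElement(emotion, "dimension", name="arousal", value="0.4")
ET.SubElement(emotion, "dimension", name="valence", value="-0.3")

print(ET.tostring(emotion, encoding="unicode"))
```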