Interpreting Ambiguous Emotional Expressions
Speech Analysis and Interpretation Laboratory, ACII 2009

Motivation
Emotion expression is challenging:
– Multi-scale dependencies: time, speaker, context, mood, personality, culture
– Intentional obfuscation: frustration may be suppressed
– Inherent multimodality: contentment is expressed using both the face and the voice
– Colored by mood, culture, personality, and dialog flow
Problem statement:
– How can arbitrary emotional expressions be evaluated?
– How can interaction-level information be used to inform classification?

Operating Emotion Definitions
Prototypical emotions:
– Expressions that are consistently recognized by a set of human evaluators (e.g., rage, glee)
Nonprototypical emotions:
– Expressions that are not consistently recognized by a set of human evaluators
– Potential causes: ambiguous class definitions (e.g., frustration vs. anger), emotional subtlety, multimodal expression (e.g., sarcasm), and the natural emotional flow of a dialog

Emotion and its Complexities
Temporal variability:
– Emotion is manifested and perceived across varying time scales
Additional challenges:
– Individual variability: emotion perception varies at the individual level
– Multi-modality: emotion is expressed using speech, the face, body posture, etc.
– Representation: emotion reporting may be influenced by the representation and method of evaluation

Temporal Variability
Multi-scale Representation
Emotion is modulated across different time scales, and there is an inherent interdependency between its manifestations over the varying scales.
– Time units: phoneme, syllable, word, phrase, utterance, turn, subdialog, dialog, ...
– The style of emotion expression is non-constant over these time units.
– Segments may be highly prototypical or non-prototypical.
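As a toy illustration of pooling across nested time scales, the sketch below (hypothetical code, not from the talk) averages frame-level emotion posteriors over assumed word- and phrase-level unit boundaries; the label count, boundaries, and `aggregate_posteriors` helper are all illustrative.

```python
import numpy as np

def aggregate_posteriors(frame_post, boundaries):
    """Average frame-level emotion posteriors within each unit.

    frame_post: (n_frames, n_emotions) array of per-frame posteriors.
    boundaries: list of (start, end) frame indices, one per unit
                (e.g., words, phrases, or utterances).
    """
    return np.array([frame_post[s:e].mean(axis=0) for s, e in boundaries])

# Toy input: 200 frames of posteriors over 4 emotion classes.
frame_post = np.random.dirichlet(np.ones(4), size=200)

# The same frame stream is pooled at nested scales; word-level units
# roll up into phrase-level units, reflecting the interdependency
# between manifestations of emotion at different scales.
word_units = [(0, 40), (40, 90), (90, 200)]
phrase_units = [(0, 90), (90, 200)]
word_post = aggregate_posteriors(frame_post, word_units)
phrase_post = aggregate_posteriors(frame_post, phrase_units)
```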

Temporal Variability
Emotional Profile
Create emotional profiles to:
– Estimate the prototypical ebb and flow of emotion
– Identify "relevance sections"
An emotional profile:
– Describes the confidence of an emotional label assignment
– Is a soft label representative of the classification output
Benefits:
– Retains emotional information lost in a single hard emotion assignment
– Locates emotional tenor changes in a dialog
– Emotional profiles can also serve as features
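A minimal sketch of an emotional profile as a soft label, assuming raw per-class scores from some upstream classifier; the softmax normalization, label set, and L1-based tenor-change test are illustrative choices, not the method described in the talk.

```python
import numpy as np

EMOTIONS = ["angry", "happy", "sad", "neutral"]   # hypothetical label set

def emotional_profile(scores):
    """Turn raw per-class classifier scores into a soft label (sums to 1)."""
    e = np.exp(scores - scores.max())
    return e / e.sum()

scores = np.array([2.1, 1.9, 0.3, 1.2])   # toy scores for one segment
profile = emotional_profile(scores)
hard = EMOTIONS[int(profile.argmax())]    # "angry"
# The hard label hides that "happy" scored nearly as high; the profile
# retains that ambiguity for downstream dialog-level models.

def tenor_change(p_prev, p_curr, threshold=0.5):
    """Flag a change in emotional tenor when successive profiles diverge
    (L1 distance is an illustrative choice of divergence)."""
    return np.abs(p_prev - p_curr).sum() > threshold
```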

Temporal Variability
Interaction Modeling
Proposal: use emotional profiles to develop an emotional interaction framework.
High-level example:
[Slide diagram: four utterances in a dialog, each scored over angry, happy, sad, and neutral; the ground truth for the dialog is angry. First-level classification by majority-vote assignment labels the utterances angry, ????, angry.]
There is no evidence to suggest that the emotional content of the dialog is not angry, so the emotional tag of the dialog is assigned to "angry."
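The dialog-level decision rule can be sketched as follows; the confidence margin, the `utterance_label` helper, and the label set are hypothetical details added for illustration.

```python
from collections import Counter

def utterance_label(profile, emotions, margin=0.2):
    """Hard-label an utterance only when its profile is confident;
    return None for ambiguous (non-prototypical) utterances."""
    ranked = sorted(zip(profile, emotions), reverse=True)
    (p1, top), (p2, _) = ranked[0], ranked[1]
    return top if p1 - p2 > margin else None

def dialog_label(labels):
    """Tag the dialog by majority vote over its confident utterances,
    letting emotionally clear segments speak for ambiguous ones."""
    votes = Counter(l for l in labels if l is not None)
    return votes.most_common(1)[0][0] if votes else None

emotions = ["angry", "happy", "sad", "neutral"]
ambiguous = utterance_label([0.45, 0.35, 0.10, 0.10], emotions)  # None
print(dialog_label(["angry", ambiguous, "angry"]))               # "angry"
```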

Temporal Variability
Interaction Modeling
Dynamic dyadic interaction modeling at the dialog level:
– Captures influences existing between interlocutors: an emotion state changes as a function of the interlocutor's state
– Captures individual-specific temporal characteristics of emotion
– Temporal smoothness: an individual's emotion flow is relatively constant between two overlapping windows
– Captures individual evaluation styles

Temporal Variability
An Example of Interaction Modeling
[Slide diagram: a first-order Markov chain for the temporal dynamics of emotion state, showing the influence of speaker A's state on speaker B within a turn, the mutual influence of emotion states across turns, and the emotion states during turns t-1 and t]
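A toy rendering of such a model, assuming a discrete emotion state space and made-up transition tables (in practice these would be learned from data); the naive product-and-renormalize combination is an illustrative simplification, not the model from the talk.

```python
import numpy as np

EMOTIONS = ["angry", "happy", "sad", "neutral"]   # hypothetical state space
N = len(EMOTIONS)
rng = np.random.default_rng(0)

def random_stochastic(shape):
    """Stand-in for transition tables that would be learned from data."""
    m = rng.random(shape)
    return m / m.sum(axis=-1, keepdims=True)

within_turn = random_stochastic((N, N))    # P(B_t | A_t): A's influence on B
across_turns = random_stochastic((N, N))   # P(X_t | X_{t-1}): first-order dynamics

def predict_b(a_state_t, b_state_prev):
    """Combine both dependencies (naively, by elementwise product and
    renormalization) into a distribution over B's state at turn t."""
    p = within_turn[a_state_t] * across_turns[b_state_prev]
    return p / p.sum()

p_b = predict_b(EMOTIONS.index("angry"), EMOTIONS.index("neutral"))
print(dict(zip(EMOTIONS, p_b.round(3))))
```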

Emotion and its Complexities
Temporal variability:
– Emotion is manifested and perceived across varying time scales
Additional challenges:
– Individual variability: emotion perception varies at the individual level
– Multi-modality: emotion is expressed using speech, the face, body posture, etc.
– Representation: emotion reporting may be influenced by the representation and method of evaluation

Additional Challenges
Individual Variability: User Perception
Emotion perception is colored by:
– The emotion content of an utterance
– The semantic content of an utterance
– Context
– The mood of the evaluator
– The personality of the evaluator
– The fatigue of the evaluator
– The attention of the evaluator

Additional Challenges
Individual Variability: Explicit User Models
Goal: capture evaluation style. Create models that define:
– Perception as a function of mood
– Perception as a function of attention
– Perception as a function of alertness
These models can be used to:
– Estimate the state of the user
– Create "active-learning" environments
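One way to make such a model concrete (a purely hypothetical sketch, not the lab's actual formulation): treat the reported emotion intensity as the true stimulus intensity modulated by evaluator state, then invert the model to estimate that state. The weights and functional form are assumptions.

```python
import numpy as np

def reported_intensity(stimulus, mood, attention, w_m=0.4, w_a=0.6):
    """Hypothetical user model: the intensity an evaluator reports is the
    true stimulus intensity shifted by mood and dampened by lapses in
    attention. Weights are illustrative, not fitted."""
    return stimulus + w_m * mood - w_a * (1.0 - attention)

def estimate_mood(reports, stimuli, attention, w_m=0.4, w_a=0.6):
    """Invert the model (averaging residuals over a batch of rated clips)
    to estimate the evaluator's mood from their reports."""
    residual = (np.asarray(reports) - np.asarray(stimuli)
                + w_a * (1.0 - np.asarray(attention)))
    return residual.mean() / w_m

stimuli = [0.2, 0.5, 0.8]
attention = [1.0, 0.9, 0.8]
reports = [reported_intensity(s, mood=0.3, attention=a)
           for s, a in zip(stimuli, attention)]
print(estimate_mood(reports, stimuli, attention))   # recovers ~0.3
```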

Additional Challenges
Multi-modality of Emotion Expression
There are inherent limits to uni-modal processing: the audio information alone does not fully capture the emotion content.
– "Prototypical" angry example
– Video examples: subtle anger, hot anger, sarcasm, contentment
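As a toy illustration of why a second modality helps, the sketch below fuses per-modality emotion posteriors with a simple convex combination; the weight `alpha` and the label ordering are assumptions, and real systems use richer fusion schemes.

```python
import numpy as np

EMOTIONS = ["angry", "happy", "sad", "neutral"]

def late_fusion(audio_post, video_post, alpha=0.5):
    """Convex combination of per-modality posteriors; alpha is an
    untuned, illustrative weight."""
    fused = alpha * audio_post + (1 - alpha) * video_post
    return fused / fused.sum()

# Sarcasm-like toy case: the voice says "happy", the face says "angry".
audio_post = np.array([0.1, 0.6, 0.1, 0.2])
video_post = np.array([0.6, 0.1, 0.1, 0.2])
print(dict(zip(EMOTIONS, late_fusion(audio_post, video_post))))
# Fusion preserves the conflict instead of committing to either modality.
```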

Additional Challenges
The Effect of Representation
Reported emotion perception is dependent on the evaluation structure.
– Evaluation structure for our data: multi-modal (audio and video); clips are viewed in order
Reported emotion perception is dependent on the evaluation methodology.
– Categorical
– Dimensional

Conclusions
Goal: develop techniques to interpret emotional expressions independent of their prototypical or non-prototypical nature.
Improve dialog-level classification:
– Consider the dynamics of the acoustic features and the dynamics of the underlying classification
– Classify the emotion within the context of a dialog based on emotionally clear data (vs. ambiguous content)
– This will result in enhanced automated emotional comprehension by machines

Open Questions
– How can prototypical emotions be used to understand and interpret non-prototypical emotions?
– Is it important to be able to successfully interpret all utterances of an individual? Should a user's emotion state ever be discarded?
– How can we best make use of limited data?
– How can ambiguous emotional content be interpreted and utilized during human-machine interaction?

Questions?

Prototypical & Nonprototypical
– Prototypical expressions
– Nonprototypical majority-vote expressions
– Nonprototypical non-majority-vote expressions

Data overview: IEMOCAP database
Modalities:
– Audio, video, motion capture
Collection style:
– Dyadic interaction (mixed-gender)
– Scripted and improvisational expressions
– "Natural" emotion elicitation
Size:
– Five pairs (five men, five women)
– 12 hours

Data overview: IEMOCAP database
Evaluation:
– Twelve evaluators (overlapping subsets)
– Sequential annotation
– Categorical ratings (3+ per utterance): angry, happy, excited, sad, neutral, frustrated, surprised, disgusted, fearful, other (~25%)
– Dimensional ratings (2 per utterance): valence, activation

Data overview: IEMOCAP database
Database-specific definitions:
– Prototypical: complete evaluator agreement
– Nonprototypical majority-vote (NP MV): majority-vote agreement
– Nonprototypical non-majority-vote (NP NMV): expressions without a majority consensus
[Slide table: counts of prototypical, NP MV, and NP NMV* expressions for the categories anger, happiness/excitement, neutrality, sadness, and frustration]
* At least one evaluator tagged the expression with the given emotion; these sets are non-disjoint.
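These definitions map directly onto a small decision rule. A minimal sketch, assuming each utterance comes with a list of categorical evaluator tags (the function name and example tags are illustrative):

```python
from collections import Counter

def prototypicality(tags):
    """Categorize an utterance from its evaluator tags, following the
    database-specific definitions above."""
    counts = Counter(tags)
    label, top = counts.most_common(1)[0]
    if top == len(tags):
        return label, "prototypical"                    # complete agreement
    if top > len(tags) / 2:
        return label, "nonprototypical majority-vote"   # majority agreement
    return None, "nonprototypical non-majority-vote"    # no majority

print(prototypicality(["angry", "angry", "angry"]))
print(prototypicality(["angry", "angry", "frustrated"]))
print(prototypicality(["angry", "frustrated", "neutral"]))
```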

Emotional profiling: Sadness

Emotional profiling: Anger

Emotional profiling: Frustration