(Slides modified from D. Jurafsky) Emotion CS 3710 / ISSP 3565

Scherer's typology of affective states
Emotion: relatively brief episode of synchronized response of all or most organismic subsystems to the evaluation of an external or internal event as being of major significance (angry, sad, joyful, fearful, ashamed, proud, desperate)
Mood: diffuse affect state, most pronounced as a change in subjective feeling, of low intensity but relatively long duration, often without apparent cause (cheerful, gloomy, irritable, listless, depressed, buoyant)
Interpersonal stance: affective stance taken toward another person in a specific interaction, coloring the interpersonal exchange in that situation (distant, cold, warm, supportive, contemptuous)
Attitudes: relatively enduring, affectively colored beliefs, preferences, and predispositions toward objects or persons (liking, loving, hating, valuing, desiring)
Personality traits: emotionally laden, stable personality dispositions and behavioral tendencies, typical for a person (nervous, anxious, reckless, morose, hostile, envious, jealous)

Extracting social/interactional meaning
Emotion: annoyance in talking to dialog systems; uncertainty of students in tutoring
Mood: detecting trauma or depression
Interpersonal stance: romantic interest, flirtation, friendliness; alignment/accommodation/entrainment
Attitudes = sentiment (positive or negative; won't cover here)
Personality traits: open, conscientious, extroverted, anxious

Outline Theoretical background on emotion Extracting emotion from speech and text: case studies

Ekman’s 6 basic emotions Surprise, happiness, anger, fear, disgust, sadness

Ekman and colleagues
Hypothesis: certain basic emotions are universally recognized across cultures; emotions are evolutionarily adaptive and unlearned.
Ekman, Friesen, and Tomkins showed facial expressions of emotion to observers in 5 different countries (Argentina, US, Brazil, Chile, and Japan) and asked the observers to label each expression. Participants from all five countries showed widespread agreement on the emotion each picture depicted.
Ekman, Sorenson, and Friesen conducted a similar study with preliterate tribes of New Guinea (subjects selected the story that best described the facial expression). The tribesmen correctly labeled the emotions even though they had no prior experience with print media.
Ekman and colleagues then asked tribesmen to show on their faces what they would look like if they experienced the different emotions. They photographed these expressions and showed the photos to Americans who had never seen a tribesman; the Americans correctly labeled the tribesmen's emotions.
Ekman and Friesen conducted a study in the US and Japan in which subjects viewed highly stressful stimuli while their facial reactions were secretly videotaped. Subjects in both countries showed the same types of facial expressions at the same points in time, and these expressions corresponded to the expressions considered universal in the judgment research.

Dimensional approach (Russell, 1980, 2003)
Two dimensions, valence (displeasure to pleasure) and arousal (low to high), give four quadrants:
High arousal, displeasure (e.g., anger); high arousal, high pleasure (e.g., excitement)
Low arousal, displeasure (e.g., sadness); low arousal, high pleasure (e.g., relaxation)
Slide from Julia Braverman
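A minimal sketch of the dimensional representation (not from the slides): the four example emotions above placed in a (valence, arousal) plane. The signs follow the quadrant labels; the numeric magnitudes are arbitrary illustrative values.

# Illustrative only: coordinates are assumed, not measured.
EMOTION_COORDS = {
    "anger":      (-0.7, +0.8),   # displeasure, high arousal
    "excitement": (+0.7, +0.8),   # pleasure, high arousal
    "sadness":    (-0.7, -0.6),   # displeasure, low arousal
    "relaxation": (+0.7, -0.6),   # pleasure, low arousal
}

def quadrant(valence: float, arousal: float) -> str:
    """Return a coarse quadrant label for a point in the circumplex."""
    v = "pleasure" if valence >= 0 else "displeasure"
    a = "high arousal" if arousal >= 0 else "low arousal"
    return f"{a}, {v}"

for emo, (v, a) in EMOTION_COORDS.items():
    print(f"{emo:>10}: {quadrant(v, a)}")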

[Figure: circumplex image from Russell, 1997, with valence on the horizontal axis and arousal on the vertical axis]

Distinctive vs. dimensional approaches to emotion
Distinctive: emotions are discrete units; a limited number of basic emotions; basic emotions are innate and universal. Methodological advantage: useful in analyzing traits of personality.
Dimensional: emotions are points along dimensions; a limited number of labels but an unlimited number of emotions; emotions are culturally learned. Methodological advantage: easier to obtain reliable measures.
Slide from Julia Braverman

Emotional communication
Expressed emotion (encoder) -> cues -> emotional attribution (decoder)
Example: expressed anger? -> cues: loud voice, high pitch, frown, clenched fists, shaking -> perception of anger?
Cue types: vocal cues, facial cues, gestures, other cues
Slide from Tanja Baenziger

Implications
Two relations matter: the relation of the cues to the expressed emotion, and the relation of the cues to the perceived emotion; the two may or may not match.
Recognition (extraction systems) depends on the relation of the cues to the expressed emotion.
Generation (conversational agents) depends on the relation of the cues to the perceived emotion.
Slide from Tanja Baenziger

Four theoretical approaches to emotion
Darwinian: emotions are products of natural selection.
Jamesian: emotion is bodily experience: "we feel sorry because we cry... afraid because we tremble"; "our feeling of the ... changes as they occur IS the emotion."
Cognitive appraisal: an emotion is produced by appraising (extracting) elements of the situation (Scherer). Fear, for example, is produced by the appraisal of an event or situation as obstructive to one's central needs and goals, requiring urgent action, difficult to control through human agency, and exceeding one's power or coping potential to deal with it.
Social constructivism: emotions are cultural products (Averill); this explains gender and social group differences. Anger is elicited by the appraisal that one has been wronged intentionally and unjustifiably by another person: you don't get angry if someone yanks your arm accidentally, or if a doctor does it to reset a bone, only if they do it on purpose.

Why emotion detection from speech or text?
Detecting frustration of callers to a help line
Detecting stress in drivers or pilots
Detecting (dis)interest and (un)certainty in on-line tutors, to adjust pacing, content, and feedback
Finding "hot spots" in meeting browsers
Synthesis/generation: on-line literacy tutors in the children's storybook domain; computer games

Hard questions in emotion recognition
How do we know what emotional speech is? Acted speech vs. natural (hand-labeled) corpora
What can we classify? Distinguishing among multiple "classic" emotions; distinguishing valence (is it positive or negative?) and activation (how strongly is it felt? e.g., sad vs. despairing)
What features best predict emotions?
What techniques are best to use in classification?
Slide from Julia Hirschberg

Accuracy of facial versus vocal cues to emotion (Scherer 2001)

Data and tasks for emotion detection
Scripted speech: acted emotions, often using the 6 basic emotions; controls for words, so the focus is on acoustic/prosodic differences. Features: F0/pitch, energy, speaking rate.
Spontaneous speech: more natural, harder to control; often dialogue. Kinds of emotion focused on: frustration, annoyance, certainty/uncertainty, "activation"/hot spots.

Quick case studies
1. Acted speech: LDC's EPSaT
2. Uncertainty in natural speech: Pitt ITSPOKE
3. Annoyance/frustration in natural speech: Ang et al. (assigned reading)

Example 1: acted speech; the Emotional Prosody Speech and Transcripts corpus (EPSaT)
Recordings from the LDC; 8 actors read short dates and numbers in 15 emotional styles
Slide from Jackson Liscombe

EPSaT examples: happy, sad, angry, confident, frustrated, friendly, interested, etc.
Slide from Jackson Liscombe

Liscombe et al. (detection): automatic acoustic-prosodic features; global characterization of pitch, loudness, and speaking rate
Slide from Jackson Liscombe
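A minimal sketch of global acoustic-prosodic features of the kind listed above (pitch, loudness/energy, speaking rate). The use of librosa, the specific statistics, and the onset-based speaking-rate proxy are my assumptions; the slides do not specify an implementation.

import numpy as np
import librosa

def global_prosodic_features(wav_path: str) -> dict:
    y, sr = librosa.load(wav_path, sr=16000)

    # Pitch: frame-level F0 via pYIN, then global statistics over voiced frames.
    f0, voiced_flag, _ = librosa.pyin(y, fmin=75, fmax=500, sr=sr)
    f0_voiced = f0[~np.isnan(f0)]

    # Loudness proxy: root-mean-square energy per frame.
    rms = librosa.feature.rms(y=y)[0]

    # Crude speaking-rate proxy: acoustic onsets per second
    # (a real system would count vowels or syllables from an ASR alignment).
    onsets = librosa.onset.onset_detect(y=y, sr=sr)
    duration = len(y) / sr

    return {
        "f0_mean": float(np.mean(f0_voiced)) if len(f0_voiced) else 0.0,
        "f0_max": float(np.max(f0_voiced)) if len(f0_voiced) else 0.0,
        "f0_range": float(np.ptp(f0_voiced)) if len(f0_voiced) else 0.0,
        "rms_mean": float(np.mean(rms)),
        "rms_max": float(np.max(rms)),
        "onset_rate": len(onsets) / duration if duration > 0 else 0.0,
    }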

Global Pitch Statistics Different Valence/Different Activation Slide from Jackson Liscombe

Global Pitch Statistics Different Valence/Same Activation Slide from Jackson Liscombe

Liscombe et al. features: automatic acoustic-prosodic features, ToBI contours, spectral tilt
Slide from Jackson Liscombe

Liscombe et al. experiments: binary classification for each emotion with RIPPER, 90/10 train/test split
Results: 62% average baseline, 75% average accuracy
Most useful features:
Slide from Jackson Liscombe
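A sketch of the per-emotion binary classification setup described above. The slides used the RIPPER rule learner; scikit-learn has no RIPPER, so a decision tree is substituted here purely for illustration, and the helper name is mine.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def train_binary_detectors(X: np.ndarray, labels: list, emotions: list) -> dict:
    """Train one yes/no classifier per emotion (e.g., 'angry' vs. not 'angry')."""
    detectors = {}
    for emo in emotions:
        y = np.array([1 if lab == emo else 0 for lab in labels])
        # 90/10 split, as in the slides.
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.10, random_state=0)
        clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
        acc = accuracy_score(y_te, clf.predict(X_te))
        detectors[emo] = (clf, acc)
    return detectors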

[pr01_sess00_prob58]

Task 1: negative (confused, bored, frustrated, uncertain) vs. positive (confident, interested, encouraged) vs. neutral

Task 2: uncertainty
[ :92-113] "um I don't even think I have an idea here now.. mass isn't weight mass is the space that an object takes up is that mass?"
Slide from Jackson Liscombe

Liscombe et al.: ITSpoke experiment
Human-human corpus; AdaBoost(C4.5), 90/10 split in WEKA; classes: uncertain vs. certain vs. neutral
Results:
Features             Accuracy
Baseline             66%
Acoustic-prosodic    75%
Slide from Jackson Liscombe
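A sketch of the classifier setup named above: boosted decision trees over acoustic-prosodic features with three classes (uncertain / certain / neutral). The slides used AdaBoost over C4.5 in WEKA; scikit-learn's AdaBoost with a CART tree is used here as a rough stand-in, and the function name is illustrative.

import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

def run_uncertainty_experiment(X: np.ndarray, y: np.ndarray) -> float:
    """X: acoustic-prosodic features per student turn; y: 'uncertain'/'certain'/'neutral'."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.10, stratify=y, random_state=0)  # 90/10 split as in the slides
    clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                             n_estimators=100, random_state=0)
    clf.fit(X_tr, y_tr)
    return accuracy_score(y_te, clf.predict(X_te))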

Current ITSpoke experiment(s): human-computer corpus; (binary) uncertainty and (binary) disengagement; wizard-of-Oz and fully automated versions. Be a pilot subject!

Scherer summaries re: Prosodic features

Juslin and Laukka metastudy

Ang et al. 2002: prosody-based detection of annoyance/frustration in human-computer dialog
DARPA Communicator project travel-planning data: NIST June 2000 collection (392 dialogs, 7,515 utterances); CMU 1/2001-8/2001 data (205 dialogs, 5,619 utterances); CU 11/1999-6/2001 data (240 dialogs, 8,765 utterances)
Considers contributions of prosody, language model, and speaking style
Questions: How frequent are annoyance and frustration in Communicator dialogs? How reliably can humans label them? How well can machines detect them? What prosodic or other features are useful?
Slide from Shriberg, Ang, Stolcke

Data annotation: 5 undergraduates with different backgrounds; each dialog labeled independently by 2 or more people; a second "consensus" pass for all disagreements, done by two of the same labelers
Slide from Shriberg, Ang, Stolcke

Data labeling
Emotion: neutral, annoyed, frustrated, tired/disappointed, amused/surprised, no-speech/NA
Speaking style: hyperarticulation, perceived pausing between words or syllables, raised voice
Repeats and corrections: repeat/rephrase, repeat/rephrase with correction, correction only
Miscellaneous useful events: self-talk, noise, non-native speaker, speaker switches, etc.
Slide from Shriberg, Ang, Stolcke

Emotion samples
Neutral: "July 30", "Yes"
Disappointed/tired: "No"
Amused/surprised: "No"
Annoyed: "Yes", "Late morning" (HYP)
Frustrated: "Yes", "No", "No, I am …" (HYP), "There is no Manila..."
Slide from Shriberg, Ang, Stolcke

Emotion class distribution: to get enough data, annoyed and frustrated were grouped together and compared against everything else (with speech)
Slide from Shriberg, Ang, Stolcke

Prosodic model
Classifier: CART-style decision trees
Data downsampled to equal class priors
Automatically extracted prosodic features based on recognizer word alignments
Used 3/4 of the data for training and 1/4 for testing, with no call overlap between the sets
Slide from Shriberg, Ang, Stolcke
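A sketch of the prosodic model described above: a CART-style decision tree, classes downsampled to equal priors, and a 3/4 vs. 1/4 split that keeps all utterances from the same call on the same side (no call overlap). Function and parameter names are illustrative, not from the paper.

import numpy as np
from sklearn.model_selection import GroupShuffleSplit
from sklearn.tree import DecisionTreeClassifier

def train_prosodic_tree(X, y, call_ids, random_state=0):
    """X, y: numpy arrays of prosodic features and labels; call_ids: call ID per utterance."""
    rng = np.random.default_rng(random_state)

    # Split by call so the same call never appears in both train and test.
    splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=random_state)
    train_idx, test_idx = next(splitter.split(X, y, groups=call_ids))

    # Downsample the training set to equal class priors.
    X_tr, y_tr = X[train_idx], y[train_idx]
    classes, counts = np.unique(y_tr, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.where(y_tr == c)[0], size=n_min, replace=False) for c in classes
    ])

    tree = DecisionTreeClassifier(min_samples_leaf=20, random_state=random_state)
    tree.fit(X_tr[keep], y_tr[keep])
    accuracy = tree.score(X[test_idx], y[test_idx])
    return tree, accuracy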

Prosodic features
Duration and speaking-rate features: durations of phones, vowels, and syllables, normalized by phone/vowel means in the training data and normalized by speaker (over all utterances, or over the first 5 only); speaking rate (vowels per unit time)
Pause features: duration and count of utterance-internal pauses at various threshold durations; ratio of speech frames to total utterance-internal frames
Slide from Shriberg, Ang, Stolcke

Prosodic features (cont.)
Pitch features: an F0-fitting approach developed at SRI (Sönmez); a long-term model (LTM) of F0 estimates the speaker's F0 range; many features capture pitch range, contour shape and size, slopes, and locations of interest; normalized using LTM parameters by speaker, using either all utterances in a call or only the first 5 utterances
Slide from Shriberg, Ang, Stolcke
[Figure: LTM fitting of log F0 over time]
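A minimal sketch of speaker-based F0 normalization in the spirit of the slide above: estimate per-speaker F0 statistics, optionally from only the first 5 utterances of a call, and express each utterance's F0 relative to them. The simple z-score form is my assumption; the paper's LTM fitting is considerably more involved.

import numpy as np

def speaker_f0_norm(utt_f0_means, utt_speaker_ids, first_n=5):
    """utt_f0_means: per-utterance mean F0 (Hz); utt_speaker_ids: parallel speaker labels."""
    utt_f0_means = np.asarray(utt_f0_means, dtype=float)
    speaker_ids = np.asarray(utt_speaker_ids)
    normalized = np.empty_like(utt_f0_means)
    for spk in np.unique(speaker_ids):
        idx = np.where(speaker_ids == spk)[0]
        ref = utt_f0_means[idx[:first_n]]            # estimate range from first N utterances
        mu, sigma = ref.mean(), ref.std() + 1e-6
        normalized[idx] = (utt_f0_means[idx] - mu) / sigma
    return normalized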

Features (cont.)
Spectral tilt features: average of the 1st cepstral coefficient; average slope of a linear fit to the magnitude spectrum; difference in log energies between high and low frequency bands; extracted from the longest normalized vowel region
Slide from Shriberg, Ang, Stolcke
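A sketch of two of the spectral-tilt measures named above for a single vowel segment: the slope of a linear fit to the log magnitude spectrum, and the difference in log energy between a high and a low frequency band. The band edges are illustrative choices, not taken from the paper.

import numpy as np

def spectral_tilt_features(segment: np.ndarray, sr: int = 16000,
                           low_band=(0, 1000), high_band=(1000, 4000)) -> dict:
    spectrum = np.abs(np.fft.rfft(segment * np.hanning(len(segment))))
    freqs = np.fft.rfftfreq(len(segment), d=1.0 / sr)
    log_mag = np.log(spectrum + 1e-10)

    # Slope of a linear fit to the log magnitude spectrum (overall tilt).
    slope = np.polyfit(freqs, log_mag, deg=1)[0]

    def band_log_energy(lo, hi):
        mask = (freqs >= lo) & (freqs < hi)
        return np.log(np.sum(spectrum[mask] ** 2) + 1e-10)

    return {
        "tilt_slope": float(slope),
        "high_minus_low_log_energy": float(band_log_energy(*high_band) - band_log_energy(*low_band)),
    }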

Language model features
Train two 3-gram class-based LMs, one on frustrated/annoyed utterances and one on the others. Given a test utterance, choose the class whose LM assigns the highest likelihood (assuming equal priors). In the prosodic decision tree, the sign of the likelihood difference is used as an input feature.
Slide from Shriberg, Ang, Stolcke
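A minimal sketch of this LM feature: one trigram model per class, and the sign of the log-likelihood difference as a feature. Add-one smoothing and the helper names are illustrative simplifications, not what the paper actually used.

from collections import Counter
import math

class TrigramLM:
    def __init__(self, sentences):
        """sentences: list of token lists from one emotion class."""
        self.tri, self.bi, self.vocab = Counter(), Counter(), set()
        for sent in sentences:
            toks = ["<s>", "<s>"] + sent + ["</s>"]
            self.vocab.update(toks)
            for i in range(2, len(toks)):
                self.tri[(toks[i-2], toks[i-1], toks[i])] += 1
                self.bi[(toks[i-2], toks[i-1])] += 1

    def logprob(self, sent):
        """Add-one smoothed trigram log-likelihood of a token list."""
        toks = ["<s>", "<s>"] + sent + ["</s>"]
        V = len(self.vocab)
        return sum(
            math.log((self.tri[(toks[i-2], toks[i-1], toks[i])] + 1) /
                     (self.bi[(toks[i-2], toks[i-1])] + V))
            for i in range(2, len(toks)))

def lm_feature(utt_tokens, lm_frustrated, lm_other):
    """+1 if the frustrated-class LM scores the utterance higher, else -1."""
    diff = lm_frustrated.logprob(utt_tokens) - lm_other.logprob(utt_tokens)
    return 1 if diff > 0 else -1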

Results (cont.)
Human-human labels agree 72% of the time; human labels agree 84% with the "consensus" labels (a biased comparison)
The tree model agrees 76% with the consensus labels, better than the original labelers agreed with each other
Language model features alone (64%) are not good predictors
Slide from Shriberg, Ang, Stolcke

Prosodic predictors of annoyed/frustrated
Pitch: high maximum fitted F0 in the longest normalized vowel; high speaker-normalized (first 5 utterances) ratio of F0 rises to falls; maximum F0 close to the speaker's estimated F0 "topline"; minimum fitted F0 late in the utterance (no "?" intonation)
Duration and speaking rate: long maximum phone-normalized phone duration; long maximum phone- and speaker-normalized (first 5 utterances) vowel duration; low syllable rate (slower speech)
Slide from Shriberg, Ang, Stolcke

Ang et al. '02 conclusions: emotion labeling is a complex task; the most useful prosodic features were duration and stylized pitch; speaker normalizations help; the language model was not a good feature