
Presentation transcript:

Slide 1: Emotional Speech
L-Devillers - Plenary, 5 June 2007
Emotional Speech detection: Laurence Devillers, LIMSI-CNRS, devil@limsi.fr
Expression of emotions in Speech synthesis: Marc Schröder, DFKI, schroed@dfki.de
Humaine Plenary Meeting, 4-6 June 2007, Paris

Slide 2: Overview
Challenge: a real-time system for "real-life" emotional speech detection, in order to build an affectively competent agent
Emotion is considered in the broad sense
Real-life emotions are often shaded, blended, or masked emotions, due to social aspects

Slide 3: State of the art
Static emotion detection systems (emotional unit level: word, chunk, sentence)
Statistical approaches (such as SVM) using large amounts of data to train models
4-6 emotions detected, rarely more
Scheme of an automatic emotion recognition system: feature extraction, then emotion models that estimate P(Ei|O), where O is the observation and Ei an emotion class
Performance on realistic data (CEICES): 2 emotions > 80%, 4 emotions > 60%
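A minimal sketch of the statistical approach named on this slide: an SVM trained on prosodic feature vectors, whose probability estimates play the role of P(Ei|O). The feature values, the two class names, and the use of scikit-learn (rather than the WEKA toolkit mentioned later in the talk) are illustrative assumptions, not material from the talk.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative prosodic feature vectors (e.g. F0 mean, F0 range, energy);
# synthetic stand-ins, not real corpus data.
rng = np.random.default_rng(0)
X_anger = rng.normal(loc=[220.0, 80.0, 0.8], scale=5.0, size=(20, 3))
X_neutral = rng.normal(loc=[160.0, 30.0, 0.4], scale=5.0, size=(20, 3))
X = np.vstack([X_anger, X_neutral])
y = ["Anger"] * 20 + ["Neutral"] * 20

# An SVM with probability estimates approximates P(Ei|O) for observation O.
clf = make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))
clf.fit(X, y)

obs = [[215.0, 75.0, 0.75]]  # one new observation O, close to the Anger cluster
print(clf.predict(obs)[0])
```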

Slide 4: Automatic emotion detection
The difficulty of the detection task increases with the variability of emotional speech expression, along four dimensions:
- Speaker (dependent/independent, age, gender, health)
- Environment (transmission channel, noise)
- Number and type of emotions (primary, secondary)
- Acted vs. real-life data, and application context

Slide 5: Automatic emotion detection: research evolution
(Timeline figure, 1996 to 2007, along four axes: speakers; emotion representation; acted/WoZ/real-life data; environment and transmission.)
- 1996: speaker-dependent or pluri-speaker; primary acted emotions; positive/negative emotions; quiet room; channel-dependent
- 2003: Emotion/Unemotion (WoZ); > 4 acted emotions; WoZ and call-center data; HMI; actors, documentaries, news, fiction; phone channel
- 2007: speaker-independent, with adaptation to gender; personality, health, age, culture; 2-5 realistic emotions (children, CEICES), HMI; real-life call-center emotions; emotion in interaction; > 5 real emotions; TV clips; public places; channel-independent; voice superposition

Slide 6: Challenges with spontaneous emotions
Authenticity is present, but there is no control over the emotion
Need to find appropriate labels and measures for annotation validation
Blended emotions (Scherer: Geneva Airport Lost Luggage Study)
Annotation, and validation of the annotation:
- Expert annotation phase by several coders (10 coders here; 5 coders in CEICES; often only two)
- Control of annotation quality: intra-/inter-annotator agreement; perception tests
- Validate the annotation scheme and the annotations
- Perception of emotion mixtures (40 subjects); NEG/POS valence; importance of the context
- Gives a measure for comparing human perception with automatic detection

Slide 7: Human-Human real-life corpora (LIMSI)

Corpus | Language/Modality | Size | Speakers | Emotion classes
Stock exchange call center | French, audio | 4.5 h | 100 callers / 4 agents | Anger, Fear, Satisfaction, Excuse
Financial loan call center | French, audio | 2 h | 250 callers / 2 agents | Anger, Fear, Satisfaction, Excuse
Medical call center | French, audio | 20 h | 784 callers / 7 agents | 20 classes, 7 broad classes
EmoTaboo (emotion-induction game) | French | 7h30 | 10 speakers | > 20 classes
EmoTV (TV news) | French | < 1 h | 100 speakers | 14-35 classes, 7 macro classes
SAFE (actors, movies) | audio-visual | 7 h | 400 speakers | Fear, other Negative, Positive

Slide 8: Context-dependent emotion labels
Do the labels represent the emotion of a given task or context?
Example from real-life emotion studies (call centers): the Fear label covers different expressions of Fear arising from different contexts: callers' fear of losing money, callers' fear for their life, agents' fear of making a mistake.
The difference is not just a question of intensity/activation:
-> Primary vs. secondary fear?
-> Degree of urgency/reality of the threat?
Fear in fiction (movies): a study of many different contexts
How to generalize? Should we define labels as a function of the type of context? So far we have only defined the social role (agent/caller) as context.
See the poster by C. Clavel

Slide 9: Emotion labels
The majority of detection systems use a discrete emotion representation
This requires a sufficient amount of data per class; to that end, we use a hierarchical organization of labels (LIMSI example):

Coarse level (8 classes) | Fine-grained level (20 classes + Neutral)
Fear | Fear, Anxiety, Stress, Panic, Embarrassment
Anger | Annoyance, Impatience, ColdAnger, HotAnger
Sadness | Sadness, Dismay, Disappointment, Resignation, Despair, Hurt
Surprise | Surprise
Relief | Relief
Interest | Interest, Compassion
Other Positive | Amusement
Neutral | Neutral
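A minimal sketch of how such a hierarchy can be used in practice: pooling sparse fine-grained labels into coarse classes via a lookup table. The mapping follows the table above (with Hurt grouped under Sadness); the function name is mine, not from the talk.

```python
# Fine-to-coarse label mapping following the LIMSI hierarchy on this slide.
FINE_TO_COARSE = {
    "Fear": "Fear", "Anxiety": "Fear", "Stress": "Fear", "Panic": "Fear",
    "Embarrassment": "Fear",
    "Annoyance": "Anger", "Impatience": "Anger", "ColdAnger": "Anger",
    "HotAnger": "Anger",
    "Sadness": "Sadness", "Dismay": "Sadness", "Disappointment": "Sadness",
    "Resignation": "Sadness", "Despair": "Sadness", "Hurt": "Sadness",
    "Surprise": "Surprise",
    "Relief": "Relief",
    "Interest": "Interest", "Compassion": "Interest",
    "Amusement": "OtherPositive",
    "Neutral": "Neutral",
}

def coarsen(fine_labels):
    """Project fine-grained annotations onto the coarse 8-class level."""
    return [FINE_TO_COARSE[label] for label in fine_labels]

print(coarsen(["Stress", "Impatience", "Compassion"]))  # ['Fear', 'Anger', 'Interest']
```

Training on the coarse level multiplies the examples available per class, which is the point of the hierarchy.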

Slide 10: No bad coders, only different perceptions
Combining the annotations of different coders into a soft vector of emotions:
Labeler 1: (Major) Annoyance, (Minor) Interest
Labeler 2: (Major) Stress, (Minor) Annoyance
-> ((wM+wm)/W Annoyance, wM/W Stress, wm/W Interest)
For wM=2, wm=1, W=6 -> (0.5 Annoyance, 0.33 Stress, 0.17 Interest)
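The combination rule above can be sketched directly: each Major label contributes weight wM, each Minor label wm, and the totals are normalized by the overall weight W. The function name is mine; the weights and the example annotations are those on the slide.

```python
from collections import Counter

# Weights from the slide: wM=2 for a "Major" label, wm=1 for a "Minor" one.
W_MAJOR, W_MINOR = 2, 1

def soft_emotion_vector(annotations):
    """Combine (major, minor) annotations from several coders into a
    normalized soft vector of emotion weights."""
    scores = Counter()
    for major, minor in annotations:
        scores[major] += W_MAJOR
        scores[minor] += W_MINOR
    total = sum(scores.values())  # this is W
    return {label: round(s / total, 2) for label, s in scores.items()}

# The two labelers from the slide:
vec = soft_emotion_vector([("Annoyance", "Interest"), ("Stress", "Annoyance")])
print(vec)  # {'Annoyance': 0.5, 'Interest': 0.17, 'Stress': 0.33}
```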

Slide 11: Speech data processing (LIMSI; see the poster by L. Vidrascu)
Toolkit: WEKA (www.cs.waikato.ac.nz; Witten & Frank, 1999), used for attribute selection and classifiers (SVM, ...); Praat for acoustic analysis; ~200 cues in total
Standard features:
- Prosodic: F0 (pitch level, range), formants, energy (level, range), speaking rate
- Spectral: formants, MFCCs
Less standard features:
- Micro-prosody / voice quality: local disturbances (jitter, shimmer, ...)
- Disfluencies (pauses, filled pauses)
- Affect bursts (from the transcription)
Linguistic cues from the words: preprocessing, stemming, n-gram models, model combination
We need to automatically detect affect bursts and to add new features such as voice-quality features; the phone signal is not of sufficient quality for many existing techniques (see the Ni Chasaide poster)
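Two of the "less standard" voice-quality cues on this slide, jitter and shimmer, can be sketched as simple cycle-to-cycle statistics. The period and amplitude sequences below are synthetic stand-ins for values a pitch tracker such as Praat would extract from real speech; the exact normalization shown (mean absolute consecutive difference over the mean) is one common local variant, chosen here for illustration.

```python
import numpy as np

def local_jitter(periods):
    """Relative local jitter: mean absolute difference between
    consecutive pitch periods, divided by the mean period."""
    periods = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(periods))) / np.mean(periods)

def local_shimmer(amplitudes):
    """Same cycle-to-cycle measure applied to peak amplitudes."""
    amplitudes = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(amplitudes))) / np.mean(amplitudes)

periods_ms = [5.0, 5.1, 4.9, 5.2, 5.0]       # ~200 Hz voice, slightly irregular
amplitudes = [0.80, 0.78, 0.82, 0.79, 0.81]  # peak amplitude per cycle

print(round(local_jitter(periods_ms), 4))
print(round(local_shimmer(amplitudes), 4))
```

Higher values indicate a less stable, more "rough" or tense voice, which is why these cues are candidates for emotion detection.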

Slide 12: LIMSI results with paralinguistic cues (SVMs), from 2 to 5 emotion classes (% of correct detection)
Abbreviations: Fe: fear, Sd: sadness, Ag: anger, Ax: anxiety, St: stress, Re: relief
(Results table not reproduced in the transcript.)

Slide 13: The 25 best features for 5-emotion detection
The media channel (phone vs. microphone), the type of data (adult vs. children, realistic vs. naturalistic) and the emotion classes all have an impact on the most relevant feature set.
Among our 5 classes (Anger, Fear, Sadness, Relief, and the Neutral state), Sadness is the least well recognized without mixing the cues.
Features associated with all the classes were selected, and they differ from one class to another.
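The attribute-selection step behind a "25 best features" list (slide 11 mentions WEKA's attribute selection) can be sketched with any univariate ranking method. Here a scikit-learn stand-in keeps the 25 best of 200 synthetic cues for a 5-class problem, ranked by an ANOVA F-test; the data, the toolkit, and the scoring function are illustrative assumptions, not the method or corpus from the talk.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic stand-in: 300 utterances, 200 cues, 5 emotion classes,
# of which only 25 cues are actually informative.
X, y = make_classification(n_samples=300, n_features=200,
                           n_informative=25, n_classes=5,
                           n_clusters_per_class=1, random_state=0)

# Keep the 25 cues with the highest ANOVA F-score across the 5 classes.
selector = SelectKBest(score_func=f_classif, k=25)
X_best = selector.fit_transform(X, y)
print(X_best.shape)  # (300, 25)
```

Because the scores are computed per class separation, the selected set changes with the channel, the population, and the class inventory, which is exactly the sensitivity the slide reports.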

Slide 14: Real-life emotional systems
A system based on acted data is inadequate for real-life data detection (Batliner)
GEMEP/CEMO comparison: different emotions; first experiments show an acceptable detection score only for Anger
Real-life emotion studies are therefore necessary
Detection results on call-center data, state of the art for "realistic emotions": > 80% for 2 emotions, > 60% for 4 emotions, ~55% for 5 emotions

Slide 15: Challenges ahead
Short term (acceptable solutions for targeted applications are within reach):
- Use a dynamic model of emotion for real-time emotion detection (history memory)
- New features: information on voice quality, affect bursts and disfluencies extracted automatically from the signal, without requiring exact speech recognition; detect relaxed/tense voice (Scherer)
- Add contextual knowledge to the blind statistical model: social role, type of action, regulation (adapting emotional expression to strategic interaction goals; face theory, Goffman)
Long term:
- Dynamic emotion processing based on an appraisal model
- Combining information at several levels: acoustic/linguistic, multimodal cues, and contextual information (social role)

Slide 16: Demo (coffee break...)

Slide 17: Thanks

