1 CEICES: a “Vertical” Approach Towards Recognizing Emotion in Speech
Anton Batliner
Lehrstuhl für Mustererkennung (Informatik 5) (Chair for Pattern Recognition)
Friedrich-Alexander-Universität Erlangen-Nürnberg
HUMAINE Plenary, Paris, June 4th, 2007

2 What is CEICES?

3 CEICES Initiative
Combining Efforts for Improving automatic Classification of Emotional user States - a "forced co-operation" initiative
Partners active:
- from outside HUMAINE: TUM (Technische Universität München), FBK-irst (inside/outside)
- from within HUMAINE, WP4: FAU, UA, LIMSI, TAU/AFEKA
People:
- Anton Batliner, Stefan Steidl, Björn Schuller, Dino Seppi, Thurid Vogt, Johannes Wagner, Laurence Devillers, Laurence Vidrascu, Noam Amir, Loic Kessous, Vered Aharonson

4 Idea Behind
- different research traditions at different sites
- somehow fossilized approaches at different sites
- co-operation pays off: pooling together competence and feature sets

5 Which data do we use?

6 Database
- German corpus with recordings of 51 ten- to twelve-year-old children communicating with Sony's Aibo pet robot (9.2 hours of speech, words)
- data ± reverberated, transliterated, annotated: 5 labellers, 11 word-based "emotion" labels
- originator site (FAU) provides speech files, phonetic lexicon, definition of train and test samples, etc.
- effort for manual “pre-processing” only: ~80 k € researcher, ~80 k € students (conservative estimate)

7 A “Vertical” Approach
- segmentation, transliteration
- emotion labelling
- annotation of interaction
- manual word segmentation
- manual correction of F0
- syntactic annotation
- rule-based chunking system

8 Basics of Chosen Approach
- children: not exotic but normal
- annotation: with context (it’s speech, not sounds)
- majority voting (≥ 3 out of 5 agree)
- unit of annotation: the word, because
  - link to ASR
  - link to higher processing (syntax, dialogue, semantics)
  - smallest possible emotional unit
  - can be combined into higher units of different size
- mapping onto 4 cover classes, due to sparse data:
  - Motherese (positive valence)
  - default class Neutral
  - "pre-stage" to negative: Emphatic
  - negative (Angry) "dimension" (smearing fine-grained differences between: touchy, reprimanding, angry)
- AMEN (Angry, Motherese, Emphatic, Neutral) sub-sample
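The voting and mapping steps on this slide can be sketched as follows. This is only an illustration: the order of operations (map each labeller's word label onto a cover class first, then vote), the threshold of at least 3 agreeing labellers out of 5, the lower-case label spellings, and the names `COVER_CLASS` and `majority_label` are all assumptions. Only the labels actually named on the slide are mapped; the remaining labels of the 11-label set are not listed there and are ignored here.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical mapping from fine-grained word labels to the four cover
# classes. Only labels named on the slide are included (assumption).
COVER_CLASS = {
    "motherese": "M",
    "neutral": "N",
    "emphatic": "E",
    "touchy": "A",
    "reprimanding": "A",
    "angry": "A",
}


def majority_label(labels: List[str], min_agree: int = 3) -> Optional[str]:
    """Return the cover class chosen by at least `min_agree` labellers.

    Returns None when no cover class reaches the threshold.
    """
    mapped = [COVER_CLASS[label] for label in labels if label in COVER_CLASS]
    if not mapped:
        return None
    winner, count = Counter(mapped).most_common(1)[0]
    return winner if count >= min_agree else None
```

With five labellers and a threshold of three, at most one cover class can win, so ties need no special handling.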

9 The AMEN Sub-sample
- syntactically/semantically meaningful chunks with at least one AMEN word
- syntactic-prosodic chunking rules: IF (syntactic boundary = sentence/free phrase/between vocatives) OR (pause ≥ 500 ms at any other syntactic boundary)
- frequencies:
  - Motherese: 586
  - Neutral: 1998
  - Emphatic: 1045
  - Angry: 914
- experiments so far with 2- or 3-fold speaker-independent cross-validation, upsampling for training
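A minimal sketch of the syntactic-prosodic chunking rule on this slide, followed by the "at least one AMEN word" filter. The `Word` record, its field names, the boundary tag strings, and the reading of the pause condition as "at least 500 ms" are assumptions made for illustration; the actual rule system at FAU is not shown in the slides.

```python
from dataclasses import dataclass
from typing import List, Optional

# Boundary types that always close a chunk (tag spellings are assumed).
STRONG_BOUNDARIES = {"sentence", "free phrase", "between vocatives"}
AMEN_LABELS = {"A", "M", "E", "N"}


@dataclass
class Word:
    text: str
    label: Optional[str]     # cover class of the word, or None
    boundary: Optional[str]  # syntactic boundary after the word, or None
    pause_ms: int            # pause after the word in milliseconds


def is_chunk_boundary(word: Word, min_pause_ms: int = 500) -> bool:
    """Apply the rule: strong boundary OR (long pause at any other boundary)."""
    if word.boundary in STRONG_BOUNDARIES:
        return True
    return word.boundary is not None and word.pause_ms >= min_pause_ms


def chunk(words: List[Word]) -> List[List[Word]]:
    """Split a word sequence into chunks at every rule-triggered boundary."""
    chunks, current = [], []
    for word in words:
        current.append(word)
        if is_chunk_boundary(word):
            chunks.append(current)
            current = []
    if current:  # trailing words without a closing boundary
        chunks.append(current)
    return chunks


def amen_chunks(chunks: List[List[Word]]) -> List[List[Word]]:
    """Keep only chunks that contain at least one AMEN-labelled word."""
    return [c for c in chunks if any(w.label in AMEN_LABELS for w in c)]
```

A weak boundary such as a phrase break only closes a chunk when it coincides with a long pause, while a pause with no syntactic boundary at all never does.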

10 Type of Database
Scenario
  - acted
  - prompted
  - real-life
  + elicited/induced
  + volunteering
  + application-oriented
  - emotion-oriented
Outcome
  + spontaneous
  + natural
  + realistic
  - selected
acted - induced - natural

11 fully exploiting the state of the art: relevance of features

12 Feature Encoding Scheme (WS at FAU 12/06)
SEM S8I02102M1D4R5111L002000A C F N00X T PPOV_positive_valence
BOW S5I01413M1D4R5101L200000A C F N05X T PTUM_logTF_ILS_und
S S6I03001M1D4R5112L000000A C F N00X T Pspectral_cog_mean
E S1I00004M1D4R5112L000000A C F N00X T__________PeneMean___
BOW S5I01270M1D4R5101L200000A C F N05X T PTUM_logTF_ILS_kom
BOW S5I01344M1D4R5101L200000A C F N05X T PTUM_logTF_ILS_rum
E S8I01036M1D4R5112L000000A C F N00X T PEnMax0_0_f36_min
S S5I04027M1D4R5102L000000A C F N00X T PTUM_0_fa1_band_stdd
SEM S8I02108M1D4R5111L002000A C F N04X T PPOV_positive_valence_norm
BOW S5I01440M1D4R5101L200000A C F N05X T PTUM_logTF_ILS_wieder
E S8I01229M1D4R5112L000000A C F N00X T PEnEneAbs0_0_f29_mean
POS S5I00042M1D4R5101L020000A C F N00X T PTUM_sum_APN
POS S5I00045M1D4R5101L050000A C F N00X T PTUM_sum_PAJ
SEM S8I02106M1D4R5111L009000A C F N00X T PRES_rest
D S8I01056M1D4R5111L000000A C F N00X T PDurAbsSyl0_0_f56_min
P S8I01263M1D4R5111L000000A C F N00X T PF0RegCoeff0_0_f63_mean
S S4I00080M1D4R5111L000000A C F N00X T Pvnhr
P S4I01055M1D4R5111L000000A C F N00X T Pprctilep4A
E S4I01001M1D4R5111L000000A C F N00X T Ploud_maxval
V S4I00075M1D4R5111L000000A C F N00X T Pvshimapq3
E S8I01029M1D4R5112L000000A C F N00X T PEnEneAbs0_0_f29_min
V S4I00074M1D4R5111L000000A C F N00X T Pvshimloc
BOW S5I01217M1D4R5101L200000A C F N05X T PTUM_logTF_ILS_halt
BOW S5I01382M1D4R5101L200000A C F N05X T PTUM_logTF_ILS_sollst
C S5I03116M1D4R5102L000000A C F N00X T PTUM_MFCC10Average
C S5I05195M1D4R5102L000000A C F N00X T PTUM_0_mfcc_c12_d_cnt
E S6I01002M1D4R5112L000000A C F N00X T Penergy_max
C S6I04113M1D4R5112L000000A C F N00X T Pmfcc4_var
E S1I00006M1D4R5112L000000A C F N00X T__________PeneTau____
S S6I03006M1D4R5112L000000A C F N00X T Pspectral_cog_median
SEM S8I02103M1D4R5111L003000A C F N00X T PNEV_negative_valence
E S6I01114M1D4R5112L000000A C F N00X T Penergy_deltadelta_median

13 Zoom on Feature Encoding Scheme
linguistic encoding, acoustic encoding
SEM S8I02102M1D4R5111L002000A PPOV_positive_valence
BOW S5I01413M1D4R5101L200000A PTUM_logTF_ILS_und
S S6I03001M1D4R5112L000000A Pspectral_cog_mean
E S1I00004M1D4R5112L000000A PeneMean___
BOW S5I01270M1D4R5101L200000A PTUM_logTF_ILS_kom
BOW S5I01344M1D4R5101L200000A PTUM_logTF_ILS_rum
E S8I01036M1D4R5112L000000A PEnMax0_0_f36_min
S S5I04027M1D4R5102L000000A PTUM_0_fa1_band_stdd

14 Impact of Feature Types (SVM), Separate and Combined Classification of Chunks, F Values: Acoustic and Linguistic Features
Columns: feature types, #, all, red. (150), SFFS sep, SFFS comb (cell values not preserved in the transcript)
Feature types:
- energy
- duration
- F0
- spectral/formant
- cepstral
- voice quality
- wavelets
- bag of words
- part-of-speech
- higher semantics
- non-verbal
- disfluencies


20 Types of Approaches, SFFS, F Values
- knowledge-based and sequential (FAU, FBK: 118)*: 58.8
- knowledge-based (TAU, LIMSI: 312): 53.3
- brute-force (TUM, UA: 3304): 54.9
- all acoustic features (3714): 63.4
- all linguistic features (531): 62.6
- all together (4245): 65.5
* word-based features, using manually corrected word boundaries, combined into chunk-based features
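SFFS, the selection method named on the preceding slides, is commonly expanded as Sequential Forward Floating Search. The sketch below shows the general idea only, not the configuration used in CEICES: after each forward step that adds the best feature, features are dropped again as long as this yields a subset better than the best one previously seen at that size. The function names and the score table are hypothetical; in practice the score would be classifier performance, whereas the toy numbers here are invented purely to trigger one backward step.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def sffs(features: Sequence[str],
         score: Callable[[List[str]], float],
         k: int) -> List[str]:
    """Select k features with a basic sequential forward floating search."""
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of features")
    selected: List[str] = []
    best: Dict[int, Tuple[float, List[str]]] = {}  # best subset per size
    while len(selected) < k:
        # Forward step: add the feature that helps most.
        remaining = [f for f in features if f not in selected]
        added = max(remaining, key=lambda f: score(selected + [f]))
        selected = selected + [added]
        current = score(selected)
        if len(selected) not in best or current > best[len(selected)][0]:
            best[len(selected)] = (current, list(selected))
        # Floating step: drop features while that beats the best known
        # subset of the smaller size.
        while len(selected) > 2:
            dropped = max(
                selected,
                key=lambda f: score([g for g in selected if g != f]),
            )
            reduced = [g for g in selected if g != dropped]
            reduced_score = score(reduced)
            if reduced_score > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (reduced_score, list(reduced))
            else:
                break
    return best[k][1]


# Invented toy scores (not results from the slides): "a" is the best single
# feature, but the best triple does not contain it.
TOY_SCORES = {
    frozenset("a"): 5.0, frozenset("b"): 4.0,
    frozenset("c"): 4.0, frozenset("d"): 1.0,
    frozenset("ab"): 6.0, frozenset("ac"): 6.0, frozenset("ad"): 5.5,
    frozenset("bc"): 8.0, frozenset("bd"): 5.0, frozenset("cd"): 5.0,
    frozenset("abc"): 9.0, frozenset("abd"): 6.5,
    frozenset("acd"): 6.0, frozenset("bcd"): 10.0,
}


def toy_score(subset: List[str]) -> float:
    """Look up the invented score of a feature subset."""
    return TOY_SCORES.get(frozenset(subset), 0.0)
```

On the toy table, plain greedy forward selection would stop at `a, b, c`; the floating step removes `a` again and the search ends at `b, c, d`.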

21 beyond the state of the art: units of analysis

22 Performance for Different Chunks, Preliminary Experiments at FBK, Small Feature Set, F Values
Columns: chunk type, #, F (cell values not preserved in the transcript)
Chunk types:
- optimal, i.e. adjacent identical labels (e.g.: NNN EE AAAA NNNNN M N MMM)
- turns (pause > 1.5 sec.)
- syntactic-prosodic rule system
- words
- syntactic rule system (clauses/phrases/ …)
- prosodic rule system (pause > 0.5 sec.)
- LM2 (bi-gram language model)
- POS-LM (part-of-speech language model)
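Two of the chunk definitions on this slide are simple enough to sketch directly: "optimal" chunks as runs of adjacent identical labels, and purely pause-based chunks (turns at more than 1.5 s, prosodic chunks at more than 0.5 s). The function names and the input representation (one label per word, one following pause per word, in seconds) are assumptions for illustration.

```python
from itertools import groupby
from typing import List, Sequence


def optimal_chunks(labels: Sequence[str]) -> List[str]:
    """Group adjacent identical word labels into one chunk each."""
    return ["".join(run) for _, run in groupby(labels)]


def split_on_pause(words: Sequence[str],
                   pauses_s: Sequence[float],
                   threshold_s: float) -> List[List[str]]:
    """Close a chunk after every word followed by a pause above the threshold."""
    chunks, current = [], []
    for word, pause in zip(words, pauses_s):
        current.append(word)
        if pause > threshold_s:
            chunks.append(current)
            current = []
    if current:  # words after the last long pause
        chunks.append(current)
    return chunks


# The label sequence from the slide, written without spaces.
print(optimal_chunks("NNNEEAAAANNNNNMNMMM"))
# ['NNN', 'EE', 'AAAA', 'NNNNN', 'M', 'N', 'MMM']
```

The same `split_on_pause` helper covers both pause-based definitions; only the threshold changes (1.5 for turns, 0.5 for the prosodic rule system).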

23 Summing up our Results
- impact of acoustic feature types: energy most important, voice quality less important, other types in between (note domain-dependency!)
- impact of linguistic feature types: very high - to be checked with real Automatic Speech Recognition (ASR) output
- sequential approach promising
- chunking is the right way to go
- emotion recognition seems to be less prone to noise than comparable speech processing tasks (ICASSP 2007)
- PDA (Pitch Detection Algorithm) extraction errors deteriorate performance consistently but not detrimentally (ICPhS 2007)

24 In a Nutshell
full exploitation of state-of-the-art approaches
- > 4 k features
- knowledge-based vs. brute-force
- selection and classification
and beyond state-of-the-art
- towards new dimensions (UMUAI 2007)
- meaningful units of analysis (chunking)
- interaction/dialogue modelling
- prototyping
- personalization
- ……

25 and the Message of the Day
people stare at classification performance which is tuned explicitly by highly sophisticated classifiers and implicitly by settings not obvious to the 'normal' reader such as
- manual emotion chunking
- using only prototypes
- using acted data
- and other devices

26 Thank you for your attention

