1 WP3 speech and emotion (analysis & recognition) human language technologies.


1 1 WP3 speech and emotion (analysis & recognition) human language technologies

2 2 Databases and Annotations

3 3 UERLN: SYMPAFLY Fully automatic speech dialogue telephone system for flight reservation and booking, different system stages; 270 dialogues. Annotations: word-based emotional user states, prosodic and conversational peculiarities; dialogue (step) success. The distribution of emotional user states follows a nested Pareto (80/20) principle.

4 4 UERLN: AIBO Children's interaction (age 10-12, 51 children, 9.2 hours of speech) with SONY’s AIBO robot, Wizard-of-Oz scenario; cf. WP5 (plus English and read speech). Annotations: word-based emotional user states (holistic, 5 labellers) and prosodic peculiarities; alignment of children's utterances with AIBO's actions; manual correction of F0, labelling of voice quality. Emotional user states for the English data.

5 5 AIBO disobedient: from motherese to angry g'radeaus Aibolein ja M fein M gut M machst M du M *da M | *tz l"aufst du mal bitte nach links | stopp E Aibo stopp | nach links E umdrehen | nein M nein M nein M so M weit M *simma M noch M nicht M aufstehen M Schlafm"utze M komm M hoch M | ja M so M ist M es M guter M Hund M lauf mal jetzt nach links | nach links Aibo | Aibolein M aufstehen M *son M sonst M werd' M ich M b"ose M hoch E | nach A links A | Aibo A nach A links A | Aibolein A ganz A b"oser A Hund A jetzt A stehst A du A auf A | hoch A | dreh dich ein bisschen | ja M so ist es gut stopp Aibo stopp | *tz lauf g'radeaus |

6 6 UERLN: Different Conceptualizations
Remote control tool: Aibo straight on stop Aibo stop turn round to the left Aibo get up turn round to the left Aibo get up turn round, to the left Aibo get up get up Aibo now go left now straight on Aibo st' straight on
Pet dog: Straight on little Aibo ok great You're doing fine now please to the left stop Aibo stop turn to the left no no no we aren't that far yet get up sleepyhead get up yes that's a good dog now go left left Aibo little Aibo get up else I'm getting angry get up Aibo left little Aibo bad boy now get up turn a little ok that's fine stop Aibo stop straight on

7 7 ITC: Targhe Fully automatic speech dialogue telephone system; 15.6 hours of Italian natural speech; 9444 files (turns) -> 450 emotionally rich. Word-level: orthographic transcription and word segmentation; prosodic peculiarities annotated. Turn-level: holistic emotion labels. Sympafly (cf. UERLN) for comparison and benchmarking.

8 8 UKA: LDC2002S28 Elicited emotional speech database; native American English. Labels: 1 of 15 holistic speaker states per utterance. Used in algorithm and feature set development.

9 9 UKA: ISL Meeting Corpus 18 recordings of multi-party (mean 5.1 participants) meetings; mean duration 35 minutes; American English. Annotations: orthographic transcription; Verbmobil II and discourse-level annotations.

10 10 Assessment of Data Collection: focus on spontaneous, realistic data; important/new types of dialogues/interaction; evaluation of annotations; considerable percentage of realistic (processed and available) databases world-wide

11 11 Features & Classification

12 12 UERLN: Features Large feature vector for a context of 2 words: 95 prosodic (duration, energy, F0, pauses); 80 spectral (HNR, formant-based frequencies and energy); 24 MFCC; 30 POS; Language Models & dialogue-based features

13 13 ITC: Features Baseline feature set: 96 features, based on energy, duration, and pitch. Final feature set: 273 features (many redundant), based on energy, duration, pitch, and pauses. Different pitch extractors tried: Normalized Cross Correlation, Weighted Auto Correlation, UERLN PDA. Different subsets compared. Different tests to reduce the feature space: principal component analysis.

14 14 UKA: 133 Acoustic Features pitch, voiced/unvoiced energy: quartiles (15); voice quality: Praat metrics (11); harmonicity: quartiles (5) and Praat metrics (3); zero-crossing rate vs energy: histogram (20); correlation/regression: coefficients (36); vocal tract volume: quartiles (25); duration/timing: Verbmobil features (18)
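Several of the feature groups above are described as quartiles of a frame-level contour. As a rough illustration of what such utterance-level statistics look like, here is a small sketch; the function name `quartile_features`, the choice of five statistics plus the interquartile range, and the example values are assumptions, since the slide does not list the individual features.

```python
import statistics

def quartile_features(values):
    """Summarize a frame-level contour (e.g. F0 in Hz) by its quartiles.

    Hypothetical feature layout, not the UKA feature definition.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,
    }

# Illustrative F0 values for the voiced frames of one utterance.
f0_contour = [100, 110, 120, 130, 140]
features = quartile_features(f0_contour)
```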

15 15 Classifiers
UERLN: Linear Discriminant Analysis (LDA), Decision Trees (CARTs), Neural Networks (NN), Support Vector Machines (SVM), Gaussian Mixtures (GM), Language Models (LM)
ITC: Decision Trees (CARTs), Neural Networks (NN)
UKA: Linear, Neural Networks (NN), Support Vector Machines (SVM)

16 16 UERLN classification I: SympaFly
GM/NN, 2 classes, neutral vs. problem, l ≠ t
dialogue step success, 2 classes, SVM: CL 82.5
dialogue success, 2 classes, CART: CL 85.4
combination   CL    RR
Pros.+MFCC    74.4  74.2
HNR+Pros      74.8  76.0
HNR+MFCC      70.4  69.8
RR: overall rec. rate; CL: class-wise averaged rec. rate
LDA, 4 classes
SVM/CART, 2 classes, loo
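The two scores reported on these slides differ in how they treat unbalanced classes: RR counts all correct items, whereas CL averages the per-class recognition rates so that a rare class weighs as much as a frequent one. A minimal sketch of both, using made-up labels (the function name `recognition_rates` and the data are illustrative assumptions):

```python
from collections import Counter

def recognition_rates(reference, predicted):
    """Return (RR, CL) for two equally long label sequences.

    RR: overall recognition rate (fraction of items correct).
    CL: class-wise averaged recognition rate (mean per-class recall).
    """
    total = Counter(reference)
    correct = Counter(r for r, p in zip(reference, predicted) if r == p)
    rr = sum(correct.values()) / len(reference)
    cl = sum(correct[c] / total[c] for c in total) / len(total)
    return rr, cl

# Made-up example: 8 neutral (N) and 2 problem (P) items.
ref = ["N"] * 8 + ["P"] * 2
hyp = ["N"] * 8 + ["N", "P"]
rr, cl = recognition_rates(ref, hyp)
```

In this toy case RR is 0.9 while CL is only 0.75, because half of the rare class is missed.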

17 17 UERLN classification II: AIBO
features           CL
pros/POS           59.7
pros./POS, opt.    63.2
MFCC, frames       45.4
MFCC, words        58.3
pros/POS + MFCC    65.3
4 classes "AMEN", NN
joyful; surprised; motherese; neutral (default); rest (non-neutral); bored; helpless, hesitant; emphatic; touchy (=irritated); angry; reprimanding

18 18 ITC Classification II: final feature set of 273 (acoustic/temporal) features; 2-class problem (neutral and non-neutral)
Classifier   CART     CART       Neural Networks   Neural Networks
Database     Targhe   Sympafly   Targhe            Sympafly
RR           73.2%    73.9%      74.2%             73.5%
CL           70.7%    72.1%      69.4%             74.1%
RR = overall rec. rate; CL = class-wise averaged rec. rate
N = neutral turns; NN = non-neutral turns

19 19 UKA Classification II: 133 utterance-level prosodic features, 15 classes, acted speech, 8 speakers
Task        Classifier   Feat Selection   CL
spk-indep   linear       none             19.0%
spk-indep   linear       spk-indep        21.3%
spk-indep   linear       spk-dep          31.3%
spk-dep     linear       none             38.7%
spk-dep     SVM          none             53.0%
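The slide does not say how the speaker-independent condition was set up. One common way to realise it is leave-one-speaker-out evaluation, where no utterance of the test speaker is seen in training. The sketch below assumes that protocol; the function name, the dictionary layout, and the speaker IDs are hypothetical.

```python
def leave_one_speaker_out(utterances):
    """Yield (held_out_speaker, train, test) splits with disjoint speakers.

    Assumed protocol for illustration; each utterance is a dict with a
    "speaker" key (hypothetical data layout).
    """
    speakers = sorted({u["speaker"] for u in utterances})
    for held_out in speakers:
        train = [u for u in utterances if u["speaker"] != held_out]
        test = [u for u in utterances if u["speaker"] == held_out]
        yield held_out, train, test

# Illustrative data: three speakers, two utterances each.
data = [
    {"speaker": spk, "label": lab}
    for spk in ("s1", "s2", "s3")
    for lab in ("neutral", "angry")
]
splits = list(leave_one_speaker_out(data))
```

A speaker-dependent setup would instead split each speaker's own utterances between training and test, which is typically the easier condition.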

20 20 Assessment of Features: a pool of many different features/feature groups implemented/compared; prosodic features better (more consistent) than "spectral" features in realistic speech; combination of knowledge sources improves performance; relevance of single features (feature classes)?

21 21 Assessment of Classifications: not much difference between different classifiers in classification performance (linear classifiers highly competitive in speaker-independent classification); large differences between speaker-dependent and speaker-independent classification

22 22 Categories & Dimensions cf. also tomorrow

23 23 UKA: Meeting Annotation Meeting audio appears to be rich in non-neutral speech. Open-set holistic labeling of 5 meetings by 3 labellers

24 24 UKA: towards new Dimensions for Social Interaction in Meetings denoting conflict, building community, or skepticism etc. Power: weak to strong. Support: self to group.

25 25 Assessment of Categories & Dimensions: new categories, new dimensions, new consistency measure; prototypical "full-blown" emotions are rare; labels depend on type of data (call center, human-robot, different types of multi-party meeting); new dimensions that do not model emotions but interaction between participants in communication; new entropy-based consistency measure
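The slides mention an entropy-based consistency measure without defining it. Purely to illustrate the underlying idea, the sketch below computes the Shannon entropy of the labels that several labellers assign to one item: 0 bits when all agree, higher values for more disagreement. The function name and example labels are assumptions, and this is not claimed to be the measure used in the project.

```python
import math
from collections import Counter

def label_entropy(labels):
    """Shannon entropy (bits) of the labels given to one item.

    Illustrative stand-in only; the project's actual measure is not
    specified in the slides.
    """
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

full_agreement = label_entropy(["angry", "angry", "angry"])
partial_agreement = label_entropy(["angry", "angry", "emphatic"])
no_agreement = label_entropy(["angry", "emphatic", "neutral"])
```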

26 26 Thank you for your attention
