1 WP3 speech and emotion (analysis & recognition) human language technologies
2 Databases and Annotations
3 UERLN: SYMPAFLY Fully automatic speech dialogue telephone system for flight reservation and booking, different system stages; 270 dialogues. Annotations: word-based emotional user states, prosodic and conversational peculiarities; dialogue (step) success. The distribution of emotional user states follows a nested Pareto (80/20) principle.
4 UERLN: AIBO Children's interaction (age 10-12, 51 children, 9.2 hours of speech) with Sony's AIBO robot, Wizard-of-Oz scenario; cf. WP5 (plus English and read speech). Annotations: word-based emotional user states (holistic, 5 labellers) and prosodic peculiarities; alignment of children's utterances with AIBO's actions; manual correction of F0, labelling of voice quality. Emotional user states for the English data.
5 AIBO disobedient: from motherese to angry g'radeaus Aibolein ja M fein M gut M machst M du M *da M | *tz l"aufst du mal bitte nach links | stopp E Aibo stopp | nach links E umdrehen | nein M nein M nein M so M weit M *simma M noch M nicht M aufstehen M Schlafm"utze M komm M hoch M | ja M so M ist M es M guter M Hund M lauf mal jetzt nach links | nach links Aibo | Aibolein M aufstehen M *son M sonst M werd' M ich M b"ose M hoch E | nach A links A | Aibo A nach A links A | Aibolein A ganz A b"oser A Hund A jetzt A stehst A du A auf A | hoch A | dreh dich ein bisschen | ja M so ist es gut stopp Aibo stopp | *tz lauf g'radeaus |
6 UERLN: Different Conceptualizations
Remote control tool: Aibo straight on | stop Aibo stop | turn round to the left | Aibo get up | turn round to the left | Aibo get up | turn round, to the left | Aibo get up | get up Aibo | now go left | now straight on | Aibo st' straight on
Pet dog: Straight on little Aibo, ok great, you're doing fine | now please to the left | stop Aibo stop | turn to the left | no no no we aren't that far yet, get up sleepyhead, get up | yes that's a good dog | now go left | left Aibo | little Aibo get up, else I'm getting angry, get up | Aibo left | little Aibo bad boy, now get up | turn a little | ok that's fine | stop Aibo stop | straight on
7 ITC: Targhe Fully automatic speech dialogue telephone system; 15.6 hours of Italian natural speech; 9444 files (turns) -> 450 emotionally rich. Word level: orthographic transcription and word segmentation; prosodic peculiarities annotated. Turn level: holistic emotion labels. Sympafly (cf. UERLN) for comparison and benchmarking.
8 UKA: LDC2002S28 Elicited emotional speech database; native American English. Labels: 1 of 15 holistic speaker states per utterance. Used in algorithm and feature set development.
9 UKA: ISL Meeting Corpus 18 recordings of multi-party meetings (mean 5.1 participants, mean duration 35 minutes); American English. Annotations: orthographic transcription; Verbmobil II, and discourse-level annotations.
10 Assessment of Data Collection: focus on spontaneous, realistic data; important/new types of dialogues/interaction; evaluation of annotations; a considerable percentage of the realistic (processed and available) databases world-wide.
11 Features & Classification
12 UERLN: Features Large feature vector for a context of 2 words: 95 prosodic (duration, energy, F0, pauses); 80 spectral (HNR, formant-based frequencies and energy); 24 MFCC; 30 POS; language models and dialogue-based features.
13 ITC: Features Baseline feature set: 96 features, based on energy, duration, and pitch. Final feature set: 273 features (many redundant), based on energy, duration, pitch, and pauses. Different pitch extractors tried: Normalized Cross Correlation, Weighted Auto Correlation, UERLN PDA. Different subsets compared. Different tests to reduce the feature space: principal component analysis.
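One of the pitch extractors named on the slide above is normalized cross-correlation. The slides give no implementation details, so the following is only a minimal sketch of the general technique on a single frame. The function name `ncc_pitch`, the F0 search range (75 to 500 Hz), and the tolerance used to prefer the shortest lag are assumptions made for illustration, not the ITC extractor.

```python
import math


def ncc_pitch(frame, sample_rate, f0_min=75.0, f0_max=500.0, tolerance=1e-6):
    """Estimate F0 (Hz) of one frame by maximising the normalized cross-correlation.

    Returns (f0, score); f0 is None if no lag could be scored.
    All parameter defaults are illustrative assumptions.
    """
    lag_min = int(sample_rate / f0_max)
    lag_max = int(sample_rate / f0_min)
    window = len(frame) - lag_max  # samples compared at every candidate lag
    if window <= 0:
        raise ValueError("frame too short for the requested f0_min")

    reference = frame[:window]
    energy_ref = sum(x * x for x in reference)
    best_lag, best_score = None, float("-inf")

    for lag in range(lag_min, lag_max + 1):
        shifted = frame[lag:lag + window]
        energy_shift = sum(x * x for x in shifted)
        denom = math.sqrt(energy_ref * energy_shift)
        if denom == 0.0:
            continue  # silent segment, correlation undefined
        score = sum(a * b for a, b in zip(reference, shifted)) / denom
        # A longer lag must be clearly better to win; this keeps the first
        # period instead of one of its multiples (octave errors).
        if score > best_score + tolerance:
            best_lag, best_score = lag, score

    if best_lag is None:
        return None, 0.0
    return sample_rate / best_lag, best_score


# Synthetic check: a 200 Hz sine sampled at 8 kHz, 50 ms frame.
sample_rate = 8000
frame = [math.sin(2.0 * math.pi * 200.0 * n / sample_rate) for n in range(400)]
f0, score = ncc_pitch(frame, sample_rate)
print(f0, score)
```

On real speech the per-frame estimates would still need voicing decisions and smoothing across frames, which this sketch leaves out.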
18 ITC: Classification II Final feature set: 273 (acoustic/temporal) features; 2-class problem (neutral and non-neutral).

Classifier        Database   RR      CL
CART              Targhe     73.2%   70.7%
CART              Sympafly   73.9%   72.1%
Neural Networks   Targhe     74.2%   69.4%
Neural Networks   Sympafly   73.5%   74.1%

RR = overall recognition rate; CL = class-wise averaged recognition rate; N = neutral turns; NN = non-neutral turns
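The two figures of merit defined in the legend above differ when classes are unbalanced: RR counts every turn equally, while CL averages the per-class recognition rates. A minimal sketch of both, using invented toy labels (not data from Targhe or Sympafly) and a hypothetical helper name `recognition_rates`:

```python
from collections import Counter


def recognition_rates(reference, predicted):
    """Return (RR, CL) for parallel lists of reference and predicted labels.

    RR = fraction of all items classified correctly.
    CL = unweighted mean of the per-class recognition rates.
    """
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    totals = Counter(reference)
    hits = Counter(r for r, p in zip(reference, predicted) if r == p)
    rr = sum(hits.values()) / len(reference)
    cl = sum(hits[c] / totals[c] for c in totals) / len(totals)
    return rr, cl


# Toy example (invented): 8 neutral (N) turns, 2 non-neutral (NN) turns,
# with one NN turn misclassified as N.
reference = ["N"] * 8 + ["NN"] * 2
predicted = ["N"] * 8 + ["N", "NN"]
rr, cl = recognition_rates(reference, predicted)
print(f"RR = {rr:.3f}, CL = {cl:.3f}")  # RR = 0.900, CL = 0.750
```

With a dominant neutral class, a classifier can reach a high RR while missing many non-neutral turns, which is why both rates are reported side by side.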
20 Assessment of Features A pool of many different features/feature groups implemented and compared; prosodic features better (more consistent) than "spectral" features in realistic speech; combination of knowledge sources improves performance; relevance of single features (feature classes)?
21 Assessment of Classifications Not much difference between different classifiers in classification performance (linear classifiers highly competitive in speaker-independent classification); large differences between speaker-dependent and speaker-independent classification.
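Speaker-independent evaluation means that no speaker contributes to both training and test data. The slides do not say how the partitions were built; a leave-one-speaker-out split is one common way to do it and is sketched here under that assumption. The sample layout (dicts with a `speaker` key), the speaker IDs, and the function name are hypothetical.

```python
from collections import defaultdict


def leave_one_speaker_out(samples):
    """Yield (held_out_speaker, train, test) folds with disjoint speakers."""
    by_speaker = defaultdict(list)
    for sample in samples:
        by_speaker[sample["speaker"]].append(sample)
    for speaker in sorted(by_speaker):
        test = by_speaker[speaker]
        train = [s for other, group in by_speaker.items()
                 if other != speaker for s in group]
        yield speaker, train, test


# Toy data (invented): three speakers with word-level labels.
samples = [
    {"speaker": "child01", "label": "M"},
    {"speaker": "child01", "label": "A"},
    {"speaker": "child02", "label": "M"},
    {"speaker": "child03", "label": "E"},
    {"speaker": "child03", "label": "A"},
]
folds = list(leave_one_speaker_out(samples))
for speaker, train, test in folds:
    print(speaker, len(train), len(test))
```

A speaker-dependent setup would instead split each speaker's own data into training and test parts, so the classifier has already seen the test speaker.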
22 Categories & Dimensions cf. also tomorrow
23 UKA: Meeting Annotation Meeting audio appears to be rich in non-neutral speech. Open-set holistic labeling of 5 meetings by 3 labellers
24 UKA: Towards new Dimensions for Social Interaction in Meetings, denoting conflict, building community, skepticism, etc. Diagram axes: power (weak to strong); support (self to group).
25 Assessment of Categories & Dimensions New categories, new dimensions, new consistency measure: prototypical "full-blown" emotions are rare; labels depend on type of data (call center, human-robot, different types of multi-party meeting); new dimensions that do not model emotions but interaction between participants in communication; new entropy-based consistency measure.
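The slides mention an entropy-based consistency measure without defining it. The sketch below is therefore not the project's measure; it only illustrates the general idea, assuming that consistency for one item is one minus the Shannon entropy of the labellers' votes, normalized by the maximum entropy for K categories. The function names and the toy votes are hypothetical.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (bits) of the labels assigned to one item."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def consistency(labels, num_categories):
    """Assumed measure: 1 - H / log2(K).

    1.0 means all labellers agree; 0.0 means the votes are spread
    uniformly over all K categories.
    """
    if num_categories < 2:
        raise ValueError("need at least two categories")
    if not labels:
        raise ValueError("need at least one label")
    return 1.0 - label_entropy(labels) / math.log2(num_categories)


# Toy votes (invented) from 5 labellers over 4 possible categories.
unanimous = consistency(["M", "M", "M", "M", "M"], num_categories=4)
mixed = consistency(["M", "M", "M", "E", "A"], num_categories=4)
print(unanimous, round(mixed, 3))
```

Unlike a plain majority-vote threshold, an entropy-based score distinguishes a 3-1-1 split from a 3-2 split, since the vote distribution as a whole enters the measure.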