1
Emotional Grounding in Spoken Dialog Systems
Jackson Liscombe (jaxin@cs.columbia.edu)
Giuseppe Riccardi (dsp3@research.att.com)
Dilek Hakkani-Tür (dtur@research.att.com)
2
10.14.04 Jackson Liscombe -- CU / AT&T
The Problem: Emotion
In Spoken Dialog Systems, users can …
  … start angry.
  … get angry.
  … end angry.
3
Outline
  Previous Work
  Corpus Description
  Feature Extraction
  Classification Experiments
5
Past Work
  I. Isolated Speech
  II. Spoken Dialog Systems
6
Past Work: Isolated Speech
Acted Data
Features:
  F0/pitch
  energy
  speaking rate
Researchers (late 1990s - present):
  Aubergé, Campbell, Cowie, Douglas-Cowie, Hirschberg, Liscombe, Mozziconacci, Oudeyer, Pereira, Roach, Scherer, Schröder, Tato, Yuan, Zetterholm, …
7
Past Work: Spoken Dialog Systems (1)
Batliner, Huber, Fischer, Spilker, Nöth (2003)
system = Verbmobil (Wizard of Oz scenarios)
binary classification
features:
  prosodic
  lexical (POS tags, swear words)
  dialog acts (repeat/repair/insult)
0.1% relative improvement using dialog acts
8
Past Work: Spoken Dialog Systems (2)
Ang, Dhillon, Krupski, Shriberg, Stolcke (2002)
system = DARPA Communicator
binary classification
features:
  prosodic
  lexical (language model)
  dialog acts (repeats/repairs)
4% relative improvement using dialog acts
9
Past Work: Spoken Dialog Systems (3)
Lee, Narayanan (2004)
system = Speechworks call-center
binary classification
features:
  prosodic
  lexical (weighted mutual information)
  dialog acts (repeat/rejection)
3% improvement using dialog acts
10
Past Work: Summary
Past research has focused on acoustic data
But the field is moving toward grounding emotion in context (dialog acts)
Summer work = extend contextual features for better emotion prediction
11
Outline
  Previous Work
  Corpus Description
  Feature Extraction
  Classification Experiments
12
Corpus Description
AT&T’s “How May I Help You?℠” corpus (0300 Benchmark)
Labeled with “Voice Signature” information:
  user state (emotion)
  gender
  age
  accent type
13
Corpus Description
Statistic                      Training   Testing
number of user turns           15,013     5,000
number of dialogs              4,259      1,431
number of turns per dialog     3.5 (both sets)
number of words per turn       9.0        9.9
14
User Emotion Distribution
15
Emotion Labels
Original Set:
  Positive/Neutral
  Somewhat Frustrated
  Very Frustrated
  Somewhat Angry
  Very Angry
  Other Somewhat Negative
  Very Negative
Reduced Set:
  Positive
  Negative
16
Corpus Description: Binary User States
Statistic                                        Training   Testing
% of turns that are positive                     88.1%      73.1%
% of dialogs with at least one negative turn     24.8%      44.7%
% of negative dialogs that start negative        43.5%      59.9%
% of negative dialogs that end negative          42.4%      48.7%
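The four binary user-state statistics can be derived from per-turn labels. The following is a minimal sketch, assuming each dialog is stored as a list of "positive"/"negative" turn labels; the function name `dialog_stats` and the toy data are hypothetical and are not taken from the corpus.

```python
def dialog_stats(dialogs):
    """Summarise binary user states over a list of dialogs.

    Each dialog is a non-empty list of turn labels, "positive" or "negative".
    """
    turns = [label for dialog in dialogs for label in dialog]
    negative_dialogs = [d for d in dialogs if "negative" in d]
    n_neg = len(negative_dialogs)

    def pct(count, total):
        # Guard against empty denominators (e.g. no negative dialogs at all).
        return 100.0 * count / total if total else 0.0

    return {
        "pct_turns_positive": pct(turns.count("positive"), len(turns)),
        "pct_dialogs_with_negative": pct(n_neg, len(dialogs)),
        "pct_negative_start_negative": pct(
            sum(d[0] == "negative" for d in negative_dialogs), n_neg),
        "pct_negative_end_negative": pct(
            sum(d[-1] == "negative" for d in negative_dialogs), n_neg),
    }


# Toy data only: four short dialogs, two of which contain a negative turn.
toy = [
    ["positive", "positive", "positive"],
    ["negative", "positive"],
    ["positive", "negative", "negative"],
    ["positive"],
]
stats = dialog_stats(toy)
```

On the toy data, half of the dialogs contain a negative turn; of those, one starts negative and one ends negative.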
17
Outline
  Previous Work
  Corpus Description
  Feature Extraction
  Classification Experiments
18
Feature Set Space
[Grid: feature types (Prosodic, Lexical, Discourse) by context (turn i; turn i-1, turn i; turn i-2, turn i-1; …)]
19
Feature Set Space: Context Overview
[Grid: feature types (Prosodic, Lexical, Discourse) by context]
  turn i: Isolated
  turn i-1, turn i: Differentials, Prior Statistics
  turn i-2, turn i-1: …
20
Lexical Features
Language Model (ngrams)
Examples of words significantly correlated with negative user state (p<0.001):
  1st person pronouns: ‘I’, ‘me’
  requests for a human operator: ‘person’, ‘talk’, ‘speak’, ‘human’, ‘machine’
  billing-related words: ‘dollars’, ‘cents’
  curse words: …
21
Prosodic Features
Praat
  open source tool for speech analysis, synthesis, statistics, manipulation, …
  Paul Boersma and David Weenink, University of Amsterdam
  www.praat.org
22
Prosodic Features
Pitch (F0)
  1. overall minimum
  2. overall maximum
  3. overall median
  4. overall standard deviation
  5. mean absolute slope
  6. slope of final vowel
  7. longest vowel mean
Other
  8. local jitter over longest vowel
Energy
  9. overall minimum
  10. overall maximum
  11. overall mean
  12. overall standard deviation
  13. longest vowel mean
Speaking Rate
  14. vowels per second
  15. mean vowel length
  16. ratio voiced frames to total frames
  17. percent internal silence
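Several of the overall pitch statistics listed above can be illustrated on a plain F0 contour. This is a minimal sketch and not the Praat pipeline used in the talk: it assumes the contour is already available as one F0 value in Hz per frame, with 0.0 marking unvoiced frames, and a hypothetical frame spacing of 10 ms. The function name `pitch_summary` and the sample contour are illustrative only.

```python
import statistics


def pitch_summary(f0, frame_step=0.01):
    """Summary statistics over an F0 contour (Hz per frame, 0.0 = unvoiced).

    frame_step is the assumed spacing between frames in seconds.
    """
    voiced = [v for v in f0 if v > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    # Slope in Hz per second, taken only between adjacent voiced frames.
    slopes = [abs(b - a) / frame_step
              for a, b in zip(f0, f0[1:]) if a > 0 and b > 0]
    return {
        "min": min(voiced),
        "max": max(voiced),
        "median": statistics.median(voiced),
        "stdev": statistics.stdev(voiced),
        "mean_abs_slope": statistics.mean(slopes) if slopes else 0.0,
        "voiced_ratio": len(voiced) / len(f0),
    }


# Illustrative contour: two voiced stretches separated by unvoiced frames.
contour = [0.0, 100.0, 110.0, 120.0, 0.0, 130.0, 120.0, 0.0]
features = pitch_summary(contour)
```

The vowel-based features (final vowel slope, longest vowel mean, jitter) would additionally need a phone-level alignment, which this sketch does not attempt.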
23
Prosodic Features in Action
24
A View Of Pitch
“That amount is incorrect.”
20020508/0508095935fhf2910
[Figure: pitch of the utterance, labeled Very Frustrated and Somewhat Frustrated]
25
Pitch/Energy/Speaking Rate : Change
20020221/0221080552atf1536
Person: Transcription [Emotion]
…
Agent: I'm sorry, I didn't understand that. How may I help you?
User: please an operator please [.hst ] may I speak- [Positive/Neutral]
Agent: I'm sorry we're having trouble. Can we try one more time? Just briefly tell me how I may help you.
User: may I speak to an assistant please [Somewhat Frustrated]
Agent: Are you calling from your home phone?
User: yes [Somewhat Angry]
26
Pitch/Energy/Speaking Rate : Change
20020221/0221080552atf1536
27
Pitch/Energy/Speaking Rate : No Change
20020317/0317220739atf9051
Person: Transcription [Emotion]
Agent: Hello, I am an AT&T automated assistant. You can speak naturally to me. How may I help you?
User: I need to find out about a number that I don't recognize [Positive/Neutral]
Agent: Would you like to look up a number you don't recognize on your bill?
User: yes I would [Positive/Neutral]
Agent: Are you calling from your home phone?
User: yes I am [Positive/Neutral]
28
Pitch/Energy/Speaking Rate : No Change
20020317/0317220739atf9051
29
Feature Set Space: Baseline
[Grid: feature types (Prosodic, Lexical, Discourse) by context (turn i; turn i-1, turn i; turn i-2, turn i-1; …)]
30
Discourse Features
Dialog Acts
  greeting
  re-prompt
  confirmation
  specification
  acknowledgment
  disambiguation
  …
31
Feature Set Space: State-of-the-Art
[Grid: feature types (Prosodic, Lexical, Discourse) by context (turn i; turn i-1, turn i; turn i-2, turn i-1; …)]
32
Contextual Features
Lexical (2)
  edit distance with previous 2 turns
Discourse (10)
  turn number
  call type repetition with previous 2 turns
  dialog act repetition with previous 2 turns
Prosodic (34)
  1st and 2nd order differentials for each feature
Other (2)
  user state of previous 2 turns
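Two of the contextual features lend themselves to a short illustration: the edit distance between the current turn and a previous turn, and the differentials of a prosodic feature across turns. The sketch below rests on assumptions the slides do not settle: edit distance is computed over words rather than characters, and the second-order differential is taken as the change in the first-order differential. The function names and sample values are hypothetical.

```python
def word_edit_distance(a, b):
    """Levenshtein distance between two utterances, counted in words."""
    x, y = a.split(), b.split()
    prev = list(range(len(y) + 1))  # distances from the empty prefix of x
    for i, wx in enumerate(x, start=1):
        curr = [i]
        for j, wy in enumerate(y, start=1):
            cost = 0 if wx == wy else 1
            curr.append(min(prev[j] + 1,          # delete wx
                            curr[j - 1] + 1,      # insert wy
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def differentials(values):
    """First- and second-order differences for the latest of three turns.

    values holds one feature for turns i-2, i-1 and i, in that order.
    """
    f_prev2, f_prev1, f_curr = values[-3:]
    first = f_curr - f_prev1
    second = first - (f_prev1 - f_prev2)  # assumed definition, see lead-in
    return first, second


distance = word_edit_distance("may I speak to an operator",
                              "may I speak to an assistant please")
first, second = differentials([180.0, 200.0, 250.0])
```

A small distance between consecutive user turns suggests the user is repeating or rephrasing a request, which is the kind of context these features are meant to capture.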
33
Feature Set Space: Contextual
[Grid: feature types (Prosodic, Lexical, Discourse) by context (turn i; turn i-1, turn i; turn i-2, turn i-1; …)]
34
Outline
  Previous Work
  Corpus Description
  Feature Extraction
  Classification Experiments
35
Experimental Design
Training size = 15,013 turns
Testing size = 5,000 turns
Most frequent user state (positive) accounts for 73.1% of testing data
Learning Algorithm Used:
  BoosTexter (boosting w/ weak learners)
  continuous and discrete valued features
  2000 iterations
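To make "boosting with weak learners" concrete, here is a minimal sketch of discrete AdaBoost over threshold stumps on continuous features. It is not BoosTexter and makes no claim about BoosTexter's internals. The toy data, the function names and the three boosting rounds (versus the 2000 iterations used in the experiments) are all illustrative assumptions. Labels are +1 and -1.

```python
import math


def train_stump(X, y, w):
    """Return (error, feature, threshold, polarity) with least weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for pol in (1, -1):
                preds = [pol if row[f] > thr else -pol for row in X]
                err = sum(wi for wi, p, yi in zip(w, preds, y) if p != yi)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best


def stump_predict(stump, row):
    _, f, thr, pol = stump
    return pol if row[f] > thr else -pol


def adaboost(X, y, rounds):
    """Train a weighted ensemble of stumps; returns [(alpha, stump), ...]."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        # Raise the weight of misclassified examples, lower the rest.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def predict(ensemble, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy problem that no single threshold can solve: negatives sit in the middle.
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [1, 1, -1, -1, 1, 1]
model = adaboost(X, y, rounds=3)
predictions = [predict(model, row) for row in X]
```

No single stump separates this toy set, but the weighted vote of three stumps does, which is the property that lets boosting combine weak prosodic, lexical and discourse cues.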
36
Performance Accuracy Summary
Feature Set         Accuracy   Rel. Improv. over Baseline
Most Freq. State    73.1%      -----
Baseline            76.1%      -----
State-of-the-Art    77.0%      1.2%
Contextual          79.0%      3.8%
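The relative improvement figures follow from the accuracies, taking the 76.1% baseline as the reference. A short check, with a hypothetical helper name:

```python
def relative_improvement(accuracy, baseline):
    """Relative improvement of accuracy over baseline, in percent."""
    return 100.0 * (accuracy - baseline) / baseline


baseline = 76.1
results = {
    "State-of-the-Art": relative_improvement(77.0, baseline),
    "Contextual": relative_improvement(79.0, baseline),
}
for name, value in results.items():
    print(f"{name}: {value:.1f}%")
```

Rounded to one decimal place, this gives 1.2% and 3.8%, matching the summary.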
37
Conclusions
Baseline (prosodic and lexical features) leads to improved emotion prediction over chance
State-of-the-Art (baseline plus dialog acts) gives further improvement
Innovative contextual features improve emotion prediction even further
Towards a computational model of emotional grounding
38
Thank You