
0 Emotional Speech CS 4706 Julia Hirschberg (thanks to Jackson Liscombe and Lauren Wilcox for some slides)

1 Outline
Why study emotional speech?
Why is modeling emotional speech so difficult?
Production and perception studies
Voice Quality features: the holy grail

2 Why study emotional speech?
Recognition
  Customer-care centers
  Tutoring systems
  Automated agents (Wildfire)
Generation
  Characteristics of ‘emotional speech’ little understood, so hard to produce: …a voice that sounds friendly, sympathetic, authoritative…
  TTS systems
  Games

3 Emotion in Spoken Dialogue Systems
Batliner, Huber, Fischer, Spilker, Nöth (2003): Verbmobil (Wizard of Oz scenarios)
Ang, Dhillon, Krupski, Shriberg, Stolcke (2002): DARPA Communicator
Liscombe, Guicciardi, Tur, Gokken-Tur (2005): “How May I Help You?” call center
Lee, Narayanan (2004): Speechworks call-center
Liscombe, Hirschberg, Venditti (2005): ITSpoke Tutoring System (physics)

4 Why is emotional speech so hard to model?
Colloquial definitions of speakers and listeners ≠ technical definitions
Utterances may convey multiple emotions simultaneously
Result:
  Human consensus low
  Hard to get reliable training data

5 Spontaneous Corpora
Unconstrained
  [Campbell, 2003] [Roach, 2000] [Cowie et al., 2001]
Call centers
  [Vidrascu & Devillers, 2005] [Ang et al., 2002] [Litman and Forbes-Riley, 2004] [Batliner et al., 2003] [Lee & Narayanan, 2005]
Meetings
  [Wrede and Shriberg, 2003]

6 Acted Corpora
happy, sad, angry, confident, frustrated, friendly, interested, anxious, bored, encouraging

7 LDC Emotional Prosody and Transcripts corpus
Semantically neutral (dates and numbers)
8 actors
15 emotions

8 Are Emotions Mutually Exclusive?
User study to classify tokens from LDC Emotional Prosody corpus
10 emotions only:
  Positive: confident, encouraging, friendly, happy, interested
  Negative: angry, anxious, bored, frustrated, sad
Example

9 Emotion Intercorrelations
sad angry bored frust anxs friend conf happy inter encour 0.44 0.26 0.22 -0.27 -0.32 -0.42 -0.33 0.70 0.21 -0.41 -0.37 -0.09 0.14 -0.14 -0.28 -0.17 frustrated 0.32 -0.43 -0.47 -0.16 -0.39 anxious -0.25 friendly 0.77 0.59 0.75 confident 0.45 0.51 0.58 0.73 interested 0.62 encouraging (p < 0.001)

10 Results
Emotions are heavily correlated
  Positive with positive
  Negative with negative
Emotions are non-exclusive
Can they be clustered empirically?
  Activation
  Valence

11 Global Pitch Statistics
Different Valence/Activation

12 Different Valence/Same Activation

13 Identifying Emotions
Automatic Acoustic-prosodic [Davitz, 1964] [Huttar, 1968]
  Global characterization: pitch, loudness, speaking rate
Intonational Contours [Mozziconacci & Hermes, 1999]
Spectral Tilt [Banse & Scherer, 1996] [Ang et al., 2002]
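
A "global characterization" of pitch can be sketched as a handful of summary statistics over an utterance's F0 contour. The snippet below is only an illustration: the function name `global_pitch_stats`, the convention that unvoiced frames are stored as 0, and the contour values are all assumptions, not taken from the slides.

```python
import statistics

def global_pitch_stats(f0_hz):
    """Summarize an F0 contour (Hz) with utterance-level statistics.

    Assumption: frames with F0 <= 0 are unvoiced and are skipped.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in contour")
    return {
        "mean": statistics.mean(voiced),
        "min": min(voiced),
        "max": max(voiced),
        "range": max(voiced) - min(voiced),
        "stdev": statistics.pstdev(voiced),
    }

# Made-up contour for illustration; 0.0 marks unvoiced frames.
contour = [0.0, 180.0, 200.0, 220.0, 0.0, 190.0, 210.0]
stats = global_pitch_stats(contour)
print(stats)
```

Loudness and speaking rate could be summarized the same way once per-frame energy and syllable or word timings are available.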

14 Machine Learning Experiment
RIPPER
90/10 split
Binary classification for each emotion
Results
  62% average baseline
  75% average accuracy
  Acoustic-prosodic features for activation
  /H-L%/ for negative; /L-L%/ for positive
  Spectral tilt for valence?
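
The slides do not say how the baseline was computed. A common choice for one-emotion-vs-rest classification is the majority-class rate of each binary task, and the sketch below assumes that definition. The function name and the label list are hypothetical.

```python
from collections import Counter

def one_vs_rest_baselines(labels):
    """Majority-class accuracy for each 'emotion vs. rest' binary task.

    Assumption: each token carries exactly one label here, which is a
    simplification of the non-exclusive ratings described in the slides.
    """
    n = len(labels)
    counts = Counter(labels)
    # For each emotion, always guessing the larger of the two classes
    # (emotion, or not-emotion) gives accuracy max(c, n - c) / n.
    return {emotion: max(c, n - c) / n for emotion, c in counts.items()}

# Made-up labels for illustration only.
toy_labels = ["angry"] * 3 + ["happy"] * 5 + ["sad"] * 2
baselines = one_vs_rest_baselines(toy_labels)
print(baselines)
```

A classifier is only informative for a given emotion if its accuracy exceeds that emotion's own baseline, which is why the two figures are reported side by side.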

15 Accuracy Distinguishing One Emotion from the Rest
Emotion  Baseline  Accuracy
angry  69.32%  77.27%
confident  75.00%
happy  57.39%  80.11%
interested  69.89%  74.43%
encouraging  52.27%  72.73%
sad  61.93%
anxious  55.68%  71.59%
bored  66.48%  78.98%
friendly  59.09%  73.86%
frustrated

16 A Call Center Application
AT&T’s “How May I Help You?” system
Customers often angry and frustrated

17 HMIHY Example
Very Frustrated
Somewhat Frustrated

18 Pitch, Energy and Rate

19 Features
Automatic Acoustic-prosodic
Contextual [Cauldwell, 2000]
Lexical [Schröder, 2003] [Brennan, 1995]
Pragmatic [Ang et al., 2002] [Lee & Narayanan, 2005]

20 Results
Feature Set     Accuracy  Rel. Improv. over Baseline
Majority Class  73.1%     -----
pros+lex        76.1%
pros+lex+da     77.0%     1.2%
all             79.0%     3.8%
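
The relative-improvement figures can be checked with a short calculation. The reported 1.2% and 3.8% are reproduced when the pros+lex accuracy (76.1%) is taken as the baseline rather than the majority-class rate. That reading is an inference from the numbers, not something the slide states. The function name is hypothetical.

```python
def relative_improvement(accuracy, baseline):
    """Relative improvement of `accuracy` over `baseline`, in percent."""
    return (accuracy - baseline) / baseline * 100.0

# Assumption: the baseline is the pros+lex feature set (76.1%).
pros_lex = 76.1
for name, acc in [("pros+lex+da", 77.0), ("all", 79.0)]:
    print(f"{name}: {relative_improvement(acc, pros_lex):.1f}%")
```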

21 Tutoring Systems Should Respond to Uncertainty
SCoT [Pon-Barry et al. 2006]
Responding to uncertainty
  Active listening
  Hinting vs. paraphrasing
Features examined
  Latency
  Filled pauses
  Hedges
Performance metric
  Learning gain
But no improvement by responding to uncertainty

22 What does uncertainty sound like?

23 [pr01_sess00_prob58]

24 Uncertainty in ITSpoke
um <sigh> I don’t even think I have an idea here now .. mass isn’t weight mass is the space that an object takes up is that mass?
One ‘.’ corresponds to 0.25 seconds.
[ :92-113]
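
The pause notation can be turned into durations mechanically: each '.' stands for 0.25 seconds, so the '..' in the example marks a pause of about half a second. The sketch below assumes pauses appear as whitespace-delimited tokens made up only of dots. The function name is hypothetical.

```python
SECONDS_PER_DOT = 0.25  # from the slide: one '.' corresponds to 0.25 seconds

def pause_durations(transcript):
    """Return the duration in seconds of each pause token in a transcript.

    Assumption: a pause is a whitespace-delimited token consisting
    only of '.' characters.
    """
    return [
        len(token) * SECONDS_PER_DOT
        for token in transcript.split()
        if set(token) == {"."}
    ]

utterance = (
    "um <sigh> I don't even think I have an idea here now .. "
    "mass isn't weight mass is the space that an object takes up is that mass?"
)
pauses = pause_durations(utterance)
print(pauses)
```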

25 ITSpoke Experiment
Human-Human Corpus
AdaBoost(C4.5), 90/10 split in WEKA
Classes: Uncertain vs Certain vs Neutral
Results:
Features           Accuracy
Baseline           66%
Acoustic-prosodic  75%
+ contextual       76%
+ breath-groups    77%

26 ITSpoke Results
Emotion    Precision  Recall  F-measure
certain    0.611      0.602   0.606
uncertain  0.515      0.393   0.446
neutral    0.846      0.891   0.868
Emotion label  Classified as
               certain  uncertain  neutral
certain        80       11         42
uncertain      26       35         28
neutral        25       22         384
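
The precision, recall, and F-measure values can be recomputed from the confusion counts. The sketch below assumes the nine counts form a 3x3 matrix with rows as the annotated emotion label and columns as the classifier output, in the order certain, uncertain, neutral. Under that reading the recomputed figures match the reported ones to three decimals. The function names are hypothetical.

```python
labels = ["certain", "uncertain", "neutral"]

# Assumption: rows are the annotated label, columns the predicted class.
confusion = [
    [80, 11, 42],   # annotated certain
    [26, 35, 28],   # annotated uncertain
    [25, 22, 384],  # annotated neutral
]

def per_class_metrics(matrix, names):
    """Return {name: (precision, recall, f_measure)} from a confusion matrix."""
    metrics = {}
    for i, name in enumerate(names):
        true_pos = matrix[i][i]
        predicted_total = sum(row[i] for row in matrix)  # column sum
        actual_total = sum(matrix[i])                    # row sum
        precision = true_pos / predicted_total
        recall = true_pos / actual_total
        f_measure = 2 * precision * recall / (precision + recall)
        metrics[name] = (precision, recall, f_measure)
    return metrics

def accuracy(matrix):
    """Fraction of tokens on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total

metrics = per_class_metrics(confusion, labels)
for name, (p, r, f) in metrics.items():
    print(f"{name}: precision={p:.3f} recall={r:.3f} f={f:.3f}")
print(f"overall accuracy={accuracy(confusion):.3f}")
```

The row sums also show why neutral dominates: 431 of the 653 tokens are neutral, so always guessing neutral would be right about 66% of the time.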

27 Voice Quality and Emotion
Perceptual coloring
Derived from a variety of laryngeal and supralaryngeal features
modal, creaky, whispered, harsh, breathy, ...
Correlates with emotion
Laver ’80, Scherer ’86, Murray & Arnott ’93, Laukkanen ’96, Johnstone & Scherer ’99, Gobl & Chasaide ’03, Fernandez ’00

28 Phonation Gestures
Adductive tension: interarytenoid muscles adduct the arytenoid cartilages
Medial compression: adductive force on vocal processes; adjustment of ligamental glottis
Longitudinal tension: tension of vocal folds

29 Modal Voice
“Neutral” mode
Muscular adjustments moderate
Vibration of vocal folds periodic, full closing of glottis, no audible friction
Frequency of vibration and loudness in low to mid range for conversational speech

30 Tense Voice
Very strong tension of vocal folds, very high tension in vocal tract

31 Whispery Voice
Very low adductive tension
Medial compression moderately high
Longitudinal tension moderately high
Little or no vocal fold vibration
Turbulence generated by friction of air in and above larynx

32 Creaky Voice
Vocal fold vibration at low frequency, irregular
Low tension (only ligamental part of glottis vibrates)
The vocal folds strongly adducted
Longitudinal tension weak
Moderately high medial compression

33 Breathy Voice
Tension low
  Minimal adductive tension
  Weak medial compression
  Medium longitudinal vocal fold tension
Vocal folds do not come together completely, leading to frication

34 Estimating Voice Quality
Estimate wrt controlled neutral quality
But how do we know the control is truly “neutral”?
Must match the natural laryngeal behavior to laboratory “neutral”
Our knowledge of models of vocal fold movements may be inadequate for describing real phonation
Known relationships between acoustic signal and voice source are complex
Can only observe behavior of voicing indirectly, so prone to error
Direct source data obtained by invasive techniques, which may interfere with signal

35 Next Class
Deceptive Speech

