Acoustic Cues to Emotional Speech Julia Hirschberg (joint work with Jennifer Venditti and Jackson Liscombe) Columbia University 26 June 2003

Motivation
A speaker's emotional state conveys important and potentially useful information:
–To recognize (e.g. spoken dialogue systems, tutoring systems)
–To generate (e.g. games)
–But only if we know what emotion is and which aspects of production convey different types
Defining emotion in a multidimensional space:
–Valence (positive vs. negative): happy vs. sad
–Activation (degree of arousal): sad vs. despairing
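As a toy illustration of this two-dimensional space, the sketch below places three of the categories at invented coordinates; the values are purely illustrative, not measurements from the study:

```python
# Toy valence/activation coordinates in [-1, 1]; values are invented
# purely to illustrate the two dimensions, not taken from the study.
emotion_space = {
    "happy":      (+0.8, +0.5),   # positive valence, moderately active
    "sad":        (-0.6, -0.4),   # negative valence, low activation
    "despairing": (-0.7, +0.7),   # like sad in valence, far more activated
}

for name, (valence, activation) in emotion_space.items():
    print(f"{name:>10}: valence={valence:+.1f}, activation={activation:+.1f}")
```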

Features that might convey emotion:
–Acoustic and prosodic
–Lexical and syntactic
–Facial and gestural

Previous Research
Emotion detection in corpus studies:
–Batliner, Noeth et al.; Ang et al.: anger/frustration in dialogue systems
–Lee et al.: positive/negative emotion in call-center data
–Ringel & Hirschberg: voicemail
…and in laboratory studies:
–Forced choice among emotion categories
–Sometimes with confidence ratings

Problems
Hard to identify emotions reliably:
–Variation in 'emotional' utterances, in both production and perception
–How can we obtain better training data?
Easier to detect variation in activation than in valence
Large space of potential features:
–Which are necessary and sufficient?

New methods for eliciting judgments
Hypothesis: utterances in natural speech may evoke multiple emotions
–Elicit judgments on multiple scales
–Tokens from the LDC Emotional Prosody Speech and Transcripts corpus: professional actors reading 4-syllable dates and numbers
–Corpus categories: disgust, panic, anxiety, hot anger, cold anger, despair, sadness, elation, happiness, interest, boredom, shame, pride, contempt, neutrality

Modified category set:
–Positive: confident, encouraging, friendly, happy, interested
–Negative: angry, anxious, bored, frustrated, sad
–Neutral
For the study: 1 token of each category from each of 4 voices, plus practice tokens
Subjects participated over the internet

–40 native speakers of standard American English with no reported hearing impairment
–17 female, 23 male, all 18 or older
–4 random orders rotated among subjects

Correlations between Judgments
[Table: pairwise correlations among ratings on the ten emotion scales (sad, angry, bored, frustrated, anxious, friendly, confident, happy, interested, encouraging). The cell values did not survive the transcript, apart from a single .62 adjacent to the interested and encouraging rows.]
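A matrix like this can be computed directly from the per-token rating data. Below is a minimal sketch assuming a hypothetical table with one row per (subject, token) judgment and one column per emotion scale; the file and column names are assumptions, not the study's actual data layout:

```python
import pandas as pd

# Hypothetical layout: one row per (subject, token) judgment, one column
# per emotion scale, each cell a rating on that scale.
ratings = pd.read_csv("judgments.csv")  # hypothetical file
scales = ["sad", "angry", "bored", "frustrated", "anxious",
          "friendly", "confident", "happy", "interested", "encouraging"]

# Pairwise Pearson correlations between the ten rating scales.
corr = ratings[scales].corr()
print(corr.round(2))
```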

What acoustic features correlate with which emotion categories?
–F0: min, max, mean, 'range', stdev
–RMS: min, max, mean, range, stdev
–Voiced samples / all samples (VCD)
–Mean syllable length
–TILT: spectral tilt (2nd minus 1st harmonic over a 30ms window) of the highest-amplitude vowel and of the nuclear-stressed vowel
–Type of nuclear accent, contour, and phrasal ending
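Most of the frame-level features above can be pulled from the waveform with standard tools. The sketch below uses librosa (my choice of library, not necessarily what the study used) for the F0, energy, and voicing statistics; syllable length, spectral tilt of specific vowels, and the ToBI-style accent/contour/phrase-ending labels require segment-level annotation and are not computed here:

```python
import numpy as np
import librosa

def acoustic_features(path):
    """F0, RMS, and voicing statistics for one utterance (a sketch)."""
    y, sr = librosa.load(path, sr=None)

    # F0 track via probabilistic YIN; unvoiced frames come back as NaN.
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"),
        fmax=librosa.note_to_hz("C6"), sr=sr)
    f0v = f0[~np.isnan(f0)]

    # Frame-level RMS energy.
    rms = librosa.feature.rms(y=y)[0]

    return {
        "f0_min": f0v.min(), "f0_max": f0v.max(),
        "f0_mean": f0v.mean(), "f0_range": f0v.max() - f0v.min(),
        "f0_std": f0v.std(),
        "rms_min": rms.min(), "rms_max": rms.max(),
        "rms_mean": rms.mean(), "rms_range": rms.max() - rms.min(),
        "rms_std": rms.std(),
        # VCD: fraction of frames judged voiced.
        "vcd": float(np.mean(voiced_flag)),
    }
```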

Results
F0, RMS, and rate distinguish emotion categories by activation (act):
–+act correlates with higher F0, higher RMS, and faster rate
–these features do not distinguish valence (val)
Tilt of the highest-amplitude vowel groups +act emotions with different val into different categories (e.g. friendly, happy, encouraging vs. angry, frustrated)
Phrase accent/boundary tone also separates +val from -val

–H-L% is positively correlated with -val and negatively correlated with +val
–L-L% is positively correlated with +val but not with -val

Predicting Emotion Categories Automatically
1760 judgment/token datapoints (90%/10% training/test split)
–Ratings of 2-5 collapsed to one
Ripper machine learning algorithm
–Baseline: choose the most frequent ranking
–Mean performance over all emotions: 75% (a 22% improvement over baseline)
–Individual emotion categories:

–Happy, encouraging, sad, and anxious are predicted well
–Confident and interested show little improvement
–Which features best predict which emotion categories? (see the classifier sketch below)
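The original experiments used the Ripper rule learner, which is not readily available in today's Python stack; the sketch below substitutes a shallow scikit-learn decision tree as a stand-in and compares it against the slide's most-frequent-class baseline. The file and column names are assumptions made for illustration:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical table: one row per judgment/token datapoint, acoustic
# features as columns, and a binary label per emotion category
# (ratings 2-5 collapsed to one, as on the slide).
data = pd.read_csv("datapoints.csv")  # hypothetical file
feature_cols = ["f0_mean", "f0_range", "rms_mean", "vcd", "tilt"]  # assumed
X, y = data[feature_cols], data["happy"]  # one binary task per category

# 90%/10% train/test split, as in the study.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

# Baseline: always predict the most frequent label.
base = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)

# Stand-in for Ripper: a shallow, rule-like decision tree.
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)

print(f"baseline: {base.score(X_te, y_te):.2f}  "
      f"tree: {clf.score(X_te, y_te):.2f}")
```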

Best Performing Features

Conclusions
New features to distinguish valence: spectral tilt and prosodic endings
New understanding of relations among emotion categories:
–Judgments
–Features

Current/Future Work
Use ML to rank rather than classify (RankBoost); see the data-preparation sketch below
Eye-tracking task matching tokens to 'emotional' pictures:
–Web survey to 'norm' the pictures
–Layout issues
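Ranking learners such as RankBoost train on pairwise preferences rather than class labels. Below is a minimal sketch of turning graded emotion ratings into that pairwise form; the exact setup is an assumption for illustration, not the study's:

```python
from itertools import combinations

def to_pairwise_preferences(tokens):
    """Convert graded ratings into pairwise preferences for a ranker.

    tokens: list of (token_id, rating) pairs for one emotion scale.
    Returns (preferred, other) pairs, one per pair of tokens whose
    ratings differ -- the form a ranker like RankBoost trains on.
    """
    prefs = []
    for (id_a, r_a), (id_b, r_b) in combinations(tokens, 2):
        if r_a > r_b:
            prefs.append((id_a, id_b))
        elif r_b > r_a:
            prefs.append((id_b, id_a))
    return prefs

# Example: three tokens rated for 'happy' on a 1-5 scale (made-up values).
print(to_pairwise_preferences([("t1", 4), ("t2", 2), ("t3", 5)]))
```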