Producing Emotional Speech Thanks to Gabriel Schubiner.


2 Producing Emotional Speech Thanks to Gabriel Schubiner

3 Papers
- Generation of Affect in Synthesized Speech
- Corpus-based approach to synthesis
- Expressive visual speech using a talking head
Demos
- Affect Editor quiz/demo
- Synface demo

4 Affect in Speech
Goals
- Addition of emotion to synthetic speech
- Acoustic model: a typology of the parameters of emotional speech, with quantification
- Addresses the problem of expressiveness
Question: what benefit is gained from expressive speech?

5 Emotion Theory / Assumptions
- Emotion -> nervous system -> speech output
- Binary distinction: parasympathetic vs. sympathetic, based on physical changes
- Assumes universal emotions

6 Approaches to Affect
- Generative: emotion -> physical changes -> acoustic effects
- Descriptive: observed acoustic parameters imposed directly

7 Descriptive Framework
Four parameter groups:
- Pitch
- Timing
- Voice quality
- Articulation
Assumption of independence: how could this affect design and results?

8 Pitch / Timing Parameters
Pitch:
- Accent shape
- Average pitch
- Contour slope
- Final lowering
- Pitch range
- Reference line
Timing:
- Exaggeration (not used)
- Fluent pauses
- Hesitation pauses
- Speech rate
- Stress frequency (stressed / stressable)

9 Voice Quality / Articulation Parameters
Voice quality:
- Breathiness
- Brilliance
- Loudness
- Pause discontinuity
- Pitch discontinuity
- Tremor
- Laryngealization
Articulation:
- Precision

10 Implementation
- Each parameter has a scale ranging between negative and positive values
- Each scale is independent of the other parameters

11 Implementation
- Settings are grouped into preset conditions for each emotion, based on prior studies
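The two implementation slides above can be sketched in code. This is a minimal illustration of the Affect Editor idea of independent per-parameter scales with per-emotion presets; the -10..+10 range, the attribute names, and the preset values are assumptions for illustration, not Cahn's actual numbers.

```python
# Sketch of an Affect Editor-style parameter store.
# Assumption: each acoustic parameter is an independent integer scale
# from -10 to +10, with parameter names taken from the four groups
# (pitch, timing, voice quality, articulation) listed on earlier slides.

PARAM_GROUPS = {
    "pitch": ["accent_shape", "average_pitch", "contour_slope",
              "final_lowering", "pitch_range", "reference_line"],
    "timing": ["fluent_pauses", "hesitation_pauses", "speech_rate",
               "stress_frequency"],
    "voice_quality": ["breathiness", "brilliance", "loudness",
                      "pause_discontinuity", "pitch_discontinuity",
                      "tremor", "laryngealization"],
    "articulation": ["precision"],
}

class AffectSettings:
    """Independent -10..+10 scales, one per parameter; 0 is neutral."""
    def __init__(self, **overrides):
        self.values = {p: 0 for group in PARAM_GROUPS.values() for p in group}
        for name, value in overrides.items():
            self.set(name, value)

    def set(self, name, value):
        if name not in self.values:
            raise KeyError(name)
        if not -10 <= value <= 10:
            raise ValueError("scale is -10..+10")
        self.values[name] = value

# A preset condition for one emotion (illustrative values only).
sadness = AffectSettings(average_pitch=-3, speech_rate=-5, breathiness=4)
```

Presets like `sadness` play the role of the per-emotion preset conditions the slide mentions; untouched parameters stay at the neutral setting.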

12 Program Flow: Input
- Emotion -> parameter representation
- Utterance -> clauses (Agent, Action, Object, Locative)
- Clause and lexeme annotations
- Finds all possible locations for affect and chooses whether or not to use each

13 Program Flow
- Utterance -> tree structure -> linear phonology
- "Compiled" for a specific synthesizer, with software to simulate affects not available in hardware

15 Perception Experiment
- 30 utterances: 5 sentences x 6 affects
- Forced choice of one of six affects
- Magnitude ratings and comments

16 Elicitation Sentences
- Intro
- I’m almost finished
- I’m going to the city
- I saw your name in the paper X
- I thought you really meant it
- Look at that picture

17 Pop Quiz!!!

18 Pop Quiz Solutions
- I’m almost finished: Disgust : Surprise : Sadness : Gladness : Anger : Fear
- I’m going to the city: Surprise : Gladness : Anger : Disgust : Sadness : Fear
- I thought you really meant it: Anger : Disgust : Gladness : Sadness : Fear : Surprise
- Look at that picture: Anger : Fear : Disgust : Sadness : Gladness : Surprise

19 Results
- Approximately 50% overall recognition rate
- 91% for sadness

21 Conclusions Effective? Thoughts?

22 Corpus-based Approach to Expressive Speech Synthesis

23 Corpus
- Collect utterances in each emotion, with emotion-dependent semantics
- One speaker
- Good news, bad news, question

24 Model: Feature Vector
Features:
- Lexical stress
- Phrase-level stress
- Distance from beginning of phrase
- Distance from end of phrase
- Part of speech (POS)
- Phrase type
- End-of-syllable pitch
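A minimal sketch of the per-syllable feature vector listed on this slide. The field names, types, and encoding are assumptions for illustration, not the paper's exact representation.

```python
# Illustrative per-syllable feature vector for the corpus-based model.
from dataclasses import dataclass

@dataclass
class SyllableFeatures:
    lexical_stress: bool
    phrase_stress: bool
    dist_from_phrase_start: int   # in syllables
    dist_from_phrase_end: int     # in syllables
    pos: str                      # part of speech of the containing word
    phrase_type: str              # e.g. declarative vs. question
    end_pitch_hz: float           # end-of-syllable pitch

    def as_vector(self):
        """Flatten to a list, booleans encoded as 0/1."""
        return [int(self.lexical_stress), int(self.phrase_stress),
                self.dist_from_phrase_start, self.dist_from_phrase_end,
                self.pos, self.phrase_type, self.end_pitch_hz]
```

One such vector would be built for every syllable in the corpus and fed to the classifier described on the next slide.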

25 Model: Classification
- Predicts F0 over a 5-syllable window
- Uses the feature vector to predict an observation vector
- Observation vector: log(p), Δp, where p = end-of-syllable pitch
- Decision tree
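The observation vector and the 5-syllable context can be sketched as below. How the window is centered and padded is an assumption; the slide only states the window size and the log(p), Δp targets.

```python
import math

# Observation vectors: for each syllable, log end-of-syllable pitch and
# its delta to the previous syllable (0.0 for the first syllable, an
# assumed convention).
def observation_vectors(end_pitches_hz):
    obs, prev_log = [], None
    for p in end_pitches_hz:
        lp = math.log(p)
        delta = 0.0 if prev_log is None else lp - prev_log
        obs.append((lp, delta))
        prev_log = lp
    return obs

# Centered 5-syllable feature windows, padded with None at the edges,
# as input context for the decision tree.
def windows(features, size=5):
    pad = size // 2
    padded = [None] * pad + list(features) + [None] * pad
    return [tuple(padded[i:i + size]) for i in range(len(features))]
```

A decision tree would then map each 5-syllable feature window to the (log p, Δp) pair for the center syllable.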

26 Model: Target Duration
- Similar to predicting F0
- Build a tree whose leaves each provide a Gaussian
- Use the mean of the leaf's class as the target duration (a discretization)
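The leaf-Gaussian idea can be shown in a few lines: fit a Gaussian to the durations of the training samples reaching a leaf, then use its mean as the target. The millisecond units and function names are illustrative assumptions.

```python
from statistics import mean, stdev

# Sketch: each leaf of the duration tree summarizes its training
# samples as a Gaussian (mean, stddev).
def leaf_gaussian(durations_ms):
    mu = mean(durations_ms)
    sigma = stdev(durations_ms) if len(durations_ms) > 1 else 0.0
    return mu, sigma

# The leaf mean becomes the predicted target duration, which is what
# makes the prediction a discretization of the training data.
def target_duration(durations_ms):
    mu, _ = leaf_gaussian(durations_ms)
    return mu
```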

27 Models
- Uses an acoustic analogue of n-grams: captures a sense of context, compared to describing the full emotion as a sequence (compare to Affect Editor)
- Uses only F0 and duration (compare to Affect Editor)
- Includes information about which utterance the features are derived from: an intentional bias, but is it justified?

28 Model: Synthesis
- Data tagged with its original expression and emotion
- Expression-cost matrix
- Noted trade-off: emotional intensity vs. smoothness
- Paralinguistic events
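A minimal sketch of how an expression-cost matrix encodes the noted trade-off during unit selection: using a unit recorded in a different expression than the target incurs an expression-mismatch cost, which is balanced against a smoothness (join) cost. The matrix values and the weighting are invented for illustration.

```python
# Illustrative expression-cost matrix: cost of using a unit recorded in
# one expression when synthesizing another. Values are invented.
EXPRESSION_COST = {
    ("neutral", "neutral"): 0.0,
    ("neutral", "good_news"): 1.0,
    ("good_news", "good_news"): 0.0,
    ("good_news", "neutral"): 1.0,
}

def unit_cost(unit_expression, target_expression, join_cost, w_expr=1.0):
    """Total cost = weighted expression mismatch + smoothness (join) cost.

    Raising w_expr favors emotional intensity (matching expression);
    lowering it favors smoothness: the trade-off the slide notes.
    """
    mismatch = EXPRESSION_COST[(unit_expression, target_expression)]
    return w_expr * mismatch + join_cost
```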

29 SSML
- Compare to Cahn’s typology
- Abstraction layers

30 Perception Experiment
- Distinguish the same utterance spoken with neutral vs. affected prosody
- Is the semantic content problematic?

31 Results
- Binary decision
- Is the gain over baseline reasonable?

32 Conclusion Major contributions? Paths forward?

33 Synthesis of Expressive Visual Speech on a Talking Head

35 Synthesis Background
- Manipulation of video images
- Virtual model with deformation parameters
- Synchronized with a time-aligned transcription
- Articulatory Control Model: Cohen & Massaro (1993)

36 Data
- Single actor, given a specific emotion as instruction
- 6 emotions + neutral

37 Facial Animation Parameters
- Face-independent: FAP matrix * scaling factor + position 0
- Weighted deformations of the distance between vertices and a feature point
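The "FAP matrix * scaling factor + position 0" formula on this slide can be written out directly. Plain nested lists stand in for the real face geometry, and the single global scale factor is a simplifying assumption.

```python
# Face-independent animation sketch: a FAP displacement matrix is
# scaled to the target face and added to its neutral (position-0)
# vertex coordinates, per coordinate:
#   new_position = fap * scale + position_0
def apply_faps(fap_matrix, scale, neutral_positions):
    return [[f * scale + p for f, p in zip(fap_row, pos_row)]
            for fap_row, pos_row in zip(fap_matrix, neutral_positions)]
```

Because the displacements are face-independent, the same FAP matrix can animate any face model once the scaling factor and neutral positions for that face are known.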

38 Modeling
- Phonetic segments assigned a target parameter vector
- Temporal blending via dominance functions
- Principal components
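Dominance-function blending in the Cohen & Massaro style can be sketched as follows: each segment has a target value and a dominance that decays away from the segment's center, and the realized trajectory is the dominance-weighted average of the targets. The exponential shape is the standard form of that model, but the constants here are illustrative.

```python
import math

# Dominance of a segment at time t: decays exponentially with distance
# from the segment center (alpha = peak dominance, theta = decay rate;
# illustrative constants).
def dominance(t, center, alpha=1.0, theta=0.5):
    return alpha * math.exp(-theta * abs(t - center))

def blended_value(t, segments):
    """Dominance-weighted average of segment targets at time t.

    segments: list of (center_time, target_value) pairs.
    """
    weights = [dominance(t, c) for c, _ in segments]
    total = sum(weights)
    return sum(w * v for w, (_, v) in zip(weights, segments)) / total
```

This is what produces coarticulation-like behavior: near a segment boundary, neighboring targets both contribute, so the parameter trajectory transitions smoothly instead of jumping.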

39 Machine Learning
- Separate models for each emotion
- 6:1 training:testing ratio
- Models -> principal-component trajectories -> FAP trajectories * emotion parameter matrix

40 Results
- More extreme emotions were easier to perceive
- 73% sad, 60% angry, 40% sad

41 Synface Demo

42 Discussion
- Changes in approach from Cahn to Eide
- Production compared to detection
