High Level Prosody features: through the construction of a model for emotional speech Loic Kessous Tel Aviv University Speech, Language and Hearing

2 High Level Prosody features: through the construction of a model for emotional speech Loic Kessous Tel Aviv University Speech, Language and Hearing kessous@post.tau.ac.il

3 Speech, Music and Emotion
Prosody can be defined as 'the rhythmic and intonational aspect of language'; lexical stress can also be added to rhythm and intonation. One definition of music is 'the art or science of combining vocal or instrumental sounds (or both) to produce beauty of form, harmony, and expression of emotion'. By these definitions, prosody in speech and music are closely related.

4 Speech, music and emotion
- If one then considers emotional speech, it seems natural to look for relationships between emotional speech and music.
- Defining a common framework for music theory and linguistics, and a truly common research approach: the road is not easy.
- Basis of music theory: the concept of pitch intervals.
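Pitch intervals are the bridge between music theory and prosody, so it helps to make the unit explicit. The sketch below is illustrative and is not taken from the presentation. It assumes equal-tempered semitones, computed as 12 · log2(f2 / f1) from two F0 values in Hz, and it maps the rounded value to a conventional Western interval name. The names table and the function names are assumptions of this sketch.

```python
import math

# Conventional names for 0..12 equal-tempered semitones (assumed labelling).
INTERVAL_NAMES = [
    "unison", "minor second", "major second", "minor third", "major third",
    "perfect fourth", "tritone", "perfect fifth", "minor sixth", "major sixth",
    "minor seventh", "major seventh", "octave",
]


def semitone_interval(f1_hz, f2_hz):
    """Signed interval from f1 to f2 in (fractional) equal-tempered semitones."""
    return 12.0 * math.log2(f2_hz / f1_hz)


def nearest_interval_name(f1_hz, f2_hz):
    """Name of the nearest whole-semitone interval, with its direction."""
    n = round(abs(semitone_interval(f1_hz, f2_hz)))
    direction = "up" if f2_hz > f1_hz else "down" if f2_hz < f1_hz else ""
    name = INTERVAL_NAMES[n] if n <= 12 else f"{n} semitones"
    return f"{name} {direction}".strip()
```

For example, a step from 200 Hz to 300 Hz is about 7.02 semitones, which is close to a perfect fifth.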

5 Representation of Prosody
- It needs to be related to perception:
  - in order to arrive at a better understanding and modeling of prosody in emotional speech
  - in order to extract pertinent features
- It should be as automatic as possible (for recognition, and for extraction of patterns and 'hidden structure')
- It should be 'reversible' for expressive speech synthesis

6 Example: Prosogram (P. Mertens)
- Music-like ('piano roll'-like) visual representation
- Pitch stylization based on glissando perception and segmentation into syllabic nuclei
- Uses manual or automatic segmentation
- Accepts pitch-corrected files as input
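The glissando decision behind this kind of stylization can be sketched as follows. This is a simplified illustration, not Prosogram's code:

- It looks only at the start and end F0 of a nucleus.
- It assumes a perceptual glissando threshold of the form G = k / T² semitones per second, where T is the nucleus duration in seconds.
- It takes k = 0.32 as the default value. This choice is an assumption of the sketch.

A nucleus whose rate of pitch change exceeds G is kept as a glide. Otherwise it is flattened to a single level. All function names are hypothetical.

```python
import math


def hz_to_st(f_hz, ref_hz=100.0):
    """Convert Hz to semitones relative to ref_hz (100 Hz is an assumed reference)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def glissando_threshold(duration_s, k=0.32):
    """Perceptual glissando threshold in semitones per second: k / T^2 (k assumed)."""
    return k / duration_s ** 2


def stylize_nucleus(t_start, t_end, f0_start_hz, f0_end_hz, k=0.32):
    """Return ("glissando", st_start, st_end) or ("static", level, level)."""
    duration = t_end - t_start
    if duration <= 0:
        raise ValueError("nucleus must have a positive duration")
    st0, st1 = hz_to_st(f0_start_hz), hz_to_st(f0_end_hz)
    rate = abs(st1 - st0) / duration  # semitones per second
    if rate > glissando_threshold(duration, k):
        return ("glissando", st0, st1)
    level = (st0 + st1) / 2.0  # perceived as a level tone
    return ("static", level, level)
```

With a 0.2 s nucleus the threshold is 8 semitones/s. A 100 to 101 Hz drift (about 0.86 semitones/s) is therefore flattened, while a 100 to 150 Hz rise (about 35 semitones/s) is kept as a glide. Real stylization also handles turning points inside a nucleus, which this sketch ignores.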

7 Types of prosodic features
- Pitch
  - Perceived pitch intervals between syllables
  - 'Glissando' presence, type and properties
- Duration
  - Length of syllables, length ratio/difference between syllables, word length
  - Distance between syllables (pause length)
- Energy
  - Word energy
  - Ratio/difference of syllable energies
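Several of the duration and energy features listed above can be computed directly from a syllable segmentation. The sketch makes these assumptions:

- Syllable boundaries are given in seconds.
- The signal is mono, stored as a list of floats at sample rate `sr`.
- Energy is the mean-square amplitude in dB. This is only one possible definition.

The `Syllable` type and the function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Syllable:
    start: float  # seconds
    end: float    # seconds


def energy_db(samples, sr, start, end):
    """Mean-square energy of the samples between start and end (seconds), in dB re 1.0."""
    seg = samples[round(start * sr):round(end * sr)]
    if not seg:
        return float("-inf")
    ms = sum(x * x for x in seg) / len(seg)
    return 10.0 * math.log10(ms) if ms > 0 else float("-inf")


def word_features(sylls, samples, sr):
    """Duration and energy features for one word made of consecutive syllables."""
    durs = [s.end - s.start for s in sylls]
    energies = [energy_db(samples, sr, s.start, s.end) for s in sylls]
    return {
        "syllable_durations": durs,
        "duration_ratios": [b / a for a, b in zip(durs, durs[1:])],
        "pauses": [b.start - a.end for a, b in zip(sylls, sylls[1:])],
        "word_length": sylls[-1].end - sylls[0].start,
        "syllable_energy_db": energies,
        "energy_differences_db": [b - a for a, b in zip(energies, energies[1:])],
        "word_energy_db": energy_db(samples, sr, sylls[0].start, sylls[-1].end),
    }
```

Energy differences in dB are used here instead of raw energy ratios, since a difference in dB is the log of a ratio.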

8 Analysis method for pitch features
- Syllable segmentation
- Glissando presence decision for each syllable
  - No glissando: calculation of a perceived pitch value
  - Glissando: the pitch at the end of the syllable is used
- Others: minima of stylized pitch, direction of glissando, range of glissando, etc.
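The rules on this slide translate into a small per-syllable routine. The sketch rests on these assumptions, which are not necessarily the author's:

- The glissando decision has already been made upstream and is passed in as a boolean.
- Unvoiced frames are coded as 0 Hz.
- A static syllable's perceived pitch is the mean F0 in semitones, relative to an arbitrary 100 Hz reference.

Function names are hypothetical.

```python
import math
from statistics import mean


def hz_to_st(f_hz, ref_hz=100.0):
    """Convert a frequency in Hz to semitones relative to ref_hz (assumed 100 Hz)."""
    return 12.0 * math.log2(f_hz / ref_hz)


def voiced_st(f0_hz, ref_hz=100.0):
    """Semitone values of voiced frames only (unvoiced frames assumed coded as 0 Hz)."""
    return [hz_to_st(f, ref_hz) for f in f0_hz if f > 0]


def perceived_pitch_st(f0_hz, has_glissando, ref_hz=100.0):
    """End-of-syllable pitch if there is a glissando, else one level value.
    Using the mean semitone value as that level is an assumption."""
    st = voiced_st(f0_hz, ref_hz)
    if not st:
        return None
    return st[-1] if has_glissando else mean(st)


def glissando_shape(f0_hz, ref_hz=100.0):
    """Direction, range (semitones) and minimum of a glissando, from the voiced frames."""
    st = voiced_st(f0_hz, ref_hz)
    if not st:
        return None
    delta = st[-1] - st[0]
    direction = "rise" if delta > 0 else "fall" if delta < 0 else "level"
    return direction, abs(delta), min(st)


def syllable_intervals(syllables, ref_hz=100.0):
    """Intervals (semitones) between perceived pitches of successive syllables.
    syllables: list of (f0_hz_frames, has_glissando) pairs."""
    pitches = [perceived_pitch_st(f0, g, ref_hz) for f0, g in syllables]
    return [b - a for a, b in zip(pitches, pitches[1:])
            if a is not None and b is not None]
```

For a two-syllable word such as 'Aibo', `syllable_intervals` returns a single value: the perceived interval between the two syllables.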

9 Example
- Two-syllable word
- CEICES database
- Word 'Aibo'
- 'Expressiveness'
- User challenging the robot

10 Why this word?
- It has no strong meaning that can be etymologically related to a specific emotion
- It has no grammatical role that could make the results reflect linguistic function rather than expressiveness
- When used to challenge the robot, it can be considered on its own as a complete, finite prosodic unit that does not sound 'unfinished'
- Calling the robot by its name before telling it something can also be considered a specificity of human-robot interaction, and could eventually be imposed on the user as a convention of an application or operating system

11
Label                Nb of words  Nb of 'Aibo'  % of 'Aibo' inside label
Emphatic                    2528           245                      9.69
Motherese                   1260            89                      7.06
Reprimanding                 310           237                     76.45
Irritated (touchy)           225            48                     21.33
Joyful                       101             2
Angry                         84            32                     38.09
Neutral                    39169          5359                     13.68
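The last column is the proportion of a label's words that are the word 'Aibo', i.e. Nb of 'Aibo' / Nb of words × 100. The check below recomputes it from the counts. Two caveats apply:

- The Joyful row is read as 101 words and 2 occurrences of 'Aibo'. The slide runs these numbers together and gives no percentage.
- The slide values appear truncated rather than rounded: 32/84 = 38.095... is shown as 38.09. The sketch therefore truncates to two decimals.

The function name is hypothetical.

```python
# (number of words, number of 'Aibo' tokens) per label, as read from the slide.
COUNTS = {
    "Emphatic": (2528, 245),
    "Motherese": (1260, 89),
    "Reprimanding": (310, 237),
    "Irritated (touchy)": (225, 48),
    "Joyful": (101, 2),  # assumed reading of the run-together "1012"
    "Angry": (84, 32),
    "Neutral": (39169, 5359),
}


def aibo_share(n_words, n_aibo):
    """Percentage of a label's words that are 'Aibo', truncated to two decimals.
    Integer arithmetic avoids floating-point rounding surprises."""
    hundredths = (10000 * n_aibo) // n_words
    return f"{hundredths // 100}.{hundredths % 100:02d}"


for label, (n_words, n_aibo) in COUNTS.items():
    print(f"{label:<20} {aibo_share(n_words, n_aibo):>6}")
```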

13 Example - Pitch intervals (CEICES database) [figure; panels: Motherese, Emphatic, Neutral, Angry]


18 From 2-syllable words to sentences
- A larger number of pairs of successive pitch nuclei
- Analysis: discovering 'hidden harmonic structure' and patterns
- Synthesis: rules for completion and global pattern
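One way to start looking for 'hidden harmonic structure' in a sentence takes two steps:

1. List the intervals between successive nuclei.
2. Fold the nuclei into pitch classes (nearest semitone, modulo 12) relative to a reference.

This is a hypothetical analysis offered for illustration, not the method of the presentation. The nuclei are assumed to be already expressed in semitones. Using the lowest nucleus as the default reference is an arbitrary choice.

```python
from collections import Counter


def successive_intervals(nuclei_st):
    """Signed intervals, in semitones, between successive pitch nuclei."""
    return [b - a for a, b in zip(nuclei_st, nuclei_st[1:])]


def pitch_class_profile(nuclei_st, reference_st=None):
    """Count nuclei per pitch class: nearest semitone above the reference, modulo 12.
    The lowest nucleus as default reference is a hypothetical choice."""
    if not nuclei_st:
        return Counter()
    ref = min(nuclei_st) if reference_st is None else reference_st
    return Counter(round(st - ref) % 12 for st in nuclei_st)
```

A profile concentrated on a few classes (for example 0, 4 and 7) would suggest a chord-like organisation of the nuclei. Whether such patterns exist in emotional speech is exactly the open question of this slide.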


21 Expressive synthesis examples using MBROLA
- Diphone concatenation
- PSOLA pitch transposition and time stretching (formant preservation)
- Possibilities: definition of phoneme durations, definition of pitch points and linear interpolation between them
- Not possible: changing the energy of each diphone, changing voice quality
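MBROLA takes its input as a .pho file with one phoneme per line. Each line gives:

- the phoneme symbol,
- its duration in milliseconds,
- optional pairs of (position in % of the phoneme's duration, pitch in Hz).

MBROLA then interpolates linearly between the pitch points. The sketch below writes such lines from pitch targets given in semitones relative to a reference. The phoneme symbols in the example are placeholders, since the valid symbol set depends on the MBROLA voice database. The 100 Hz reference and the function names are also assumptions.

```python
def st_to_hz(st, ref_hz=100.0):
    """Convert semitones relative to ref_hz (assumed 100 Hz) back to Hz."""
    return ref_hz * 2.0 ** (st / 12.0)


def pho_lines(phonemes, ref_hz=100.0):
    """Build .pho text from (symbol, duration_ms, [(position_percent, pitch_st), ...])."""
    lines = []
    for symbol, dur_ms, targets in phonemes:
        fields = [symbol, str(int(dur_ms))]
        for pos, st in targets:
            fields += [str(int(pos)), str(round(st_to_hz(st, ref_hz)))]
        lines.append(" ".join(fields))
    return "\n".join(lines) + "\n"


# Placeholder phoneme symbols for 'Aibo'; "_" is used here as silence.
aibo = [
    ("_", 100, []),
    ("a", 120, [(0, 12)]),
    ("I", 80, [(100, 14)]),
    ("b", 60, []),
    ("o:", 200, [(50, 19), (100, 12)]),
    ("_", 100, []),
]
print(pho_lines(aibo), end="")
```

Because the pitch targets are expressed in semitones, the interval pattern measured at analysis time can be reused at synthesis time. This is one reading of the 'reversible' requirement stated earlier. Energy and voice quality stay out of reach, as the slide notes.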

