
HARMONIC MODEL FOR FEMALE VOICE EMOTIONAL SYNTHESIS

Anna PŘIBILOVÁ
Department of Radioelectronics, Slovak University of Technology
Ilkovičova 3, SK-812 19 Bratislava, Slovakia
E-mail: Anna.Pribilova@stuba.sk

Jiří PŘIBIL
Institute of Photonics and Electronics, Academy of Sciences of the Czech Republic
Chaberská 57, CZ-182 51 Praha 8, Czech Republic
E-mail: Jiri.Pribil@savba.sk

– Introduction
– Harmonic speech model with AR parameterization
– Spectral modifications for emotional synthesis
– Prosodic modifications for emotional synthesis
– Listening tests results
– Conclusion

Harmonic speech model with AR parameterization
[Figure; labelled element: voicing transition frequency]

Voicing transition frequency

Determination of model parameters
[Figure; labelled element: spectral flatness measure]
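The slide names the spectral flatness measure as one of the quantities used to determine the model parameters, but the transcript does not give its formula. The sketch below uses the common textbook definition (geometric mean of the power spectrum divided by its arithmetic mean), which is an assumption here and not necessarily the authors' exact formulation. The function name `spectral_flatness` and the sample spectra are illustrative only.

```python
import math

def spectral_flatness(power_spectrum):
    """Geometric mean divided by arithmetic mean of a power spectrum.

    Returns a value in (0, 1]: close to 1 for a flat, noise-like spectrum
    and close to 0 for a peaky, strongly harmonic spectrum.
    """
    if not power_spectrum or any(p <= 0 for p in power_spectrum):
        raise ValueError("power spectrum must be non-empty and strictly positive")
    n = len(power_spectrum)
    # Geometric mean computed through logarithms to avoid overflow.
    log_mean = sum(math.log(p) for p in power_spectrum) / n
    arithmetic_mean = sum(power_spectrum) / n
    return math.exp(log_mean) / arithmetic_mean

# Hypothetical spectra: one flat, one dominated by a single peak.
print(spectral_flatness([2.0] * 8))
print(spectral_flatness([100.0, 1.0, 1.0, 1.0]))
```

How this measure is mapped to the voicing transition frequency is not stated in the transcript, so that step is left out.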

F1  300 Hz  840 Hz F2  840 Hz  2400 Hz F3  2400 Hz  3840 Hz F4  3840 Hz  4800 Hz Female formant areas (+20%) Emotional influence on speech formants pleasant emotions – faucal and pharyngeal expansion, relaxation of tract walls, mouth corners retracted upward (F1 falling, resonances raised) unpleasant emotions – faucal and pharyngeal constriction, tensing of vocal tract walls, mouth corners retracted downward (F1 rising, F2 and F3 falling) pleasant emotions F1 falling, resonances raised unpleasant emotions F1 rising, F2 and F3 falling Scherer, K., R.: Vocal Communication of Emotion: A Review of Research Paradigms. Speech Communication, Vol. 40 (2003) 227-256 Male formant areas F1  250 Hz  700 Hz F2  700 Hz  2000 Hz F3  2000 Hz  3200 Hz F4  3200 Hz  4000 Hz Fant, G.: Speech Acoustics and Phonetics. Kluwer Academic Publishers, Dordrecht (2004) 700 Hz 840 Hz

Spectral modifications for emotional synthesis
– frequency scale transformation

Frequency scale transformation
– F1 (< F1,2): increased (decreased)
– F2, F3, F4 (> F1,2): decreased (increased)
[Figure: transformation curves against frequency f [kHz], dimensionless vertical axis [-]; F1,2 and fs/4 marked]

Formant ratio between emotional and neutral speech

Chosen formant ratio (for frequency after transformation)

                                         ratio 1 (214.3 Hz)   ratio 2 (2666.7 Hz)
joyous-to-neutral formant ratio (shift)  0.7  (-30 %)         1.05 (+5 %)
angry-to-neutral formant ratio (shift)   1.35 (+35 %)         0.85 (-15 %)
sad-to-neutral formant ratio (shift)     1.1  (+10 %)         0.9  (-10 %)

Mean formant ratio in formant areas

                                         F1 300 to 840 Hz    F2 840 to 2400 Hz   F3 2400 to 3840 Hz  F4 3840 to 4800 Hz
joyous-to-neutral formant ratio (shift)  0.8982 (-10.18 %)   1.0589 (+5.89 %)    1.0334 (+3.34 %)    0.9964 (-0.36 %)
angry-to-neutral formant ratio (shift)   1.1289 (+12.89 %)   0.8849 (-11.51 %)   0.8623 (-13.77 %)   0.9012 (-9.88 %)
sad-to-neutral formant ratio (shift)     1.0432 (+4.32 %)    0.9383 (-6.17 %)    0.8991 (-10.09 %)   0.9076 (-9.24 %)

[Figure: chosen and mean formant shifts for joyous, angry, and sad speech]
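Each shift in parentheses is the corresponding ratio expressed as a percentage change, that is (ratio - 1) × 100. The sketch below recomputes the shifts from the mean ratios and applies one ratio to a formant frequency. The helper names and the 500 Hz example value are hypothetical; only the ratios come from the slide.

```python
# Mean emotional-to-neutral formant ratios per formant area (F1..F4),
# copied from the slide.
MEAN_RATIOS = {
    "joyous": [0.8982, 1.0589, 1.0334, 0.9964],
    "angry":  [1.1289, 0.8849, 0.8623, 0.9012],
    "sad":    [1.0432, 0.9383, 0.8991, 0.9076],
}

def shift_percent(ratio):
    """Express an emotional-to-neutral ratio as a percentage shift."""
    return (ratio - 1.0) * 100.0

def shifted_formant(neutral_hz, ratio):
    """Formant frequency after applying an emotional-to-neutral ratio."""
    return neutral_hz * ratio

for emotion, ratios in MEAN_RATIOS.items():
    shifts = ", ".join(f"{shift_percent(r):+.2f} %" for r in ratios)
    print(f"{emotion}: {shifts}")

# Hypothetical neutral F1 of 500 Hz converted with the angry F1 ratio.
print(round(shifted_formant(500.0, MEAN_RATIOS["angry"][0]), 2))
```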

Prosody of emotional speech

Scherer, K., R.: Vocal Communication of Emotion: A Review of Research Paradigms. Speech Communication, Vol. 40 (2003) 227-256

EMOTION   F0 mean   F0 range   energy   duration
JOY       higher    higher     higher   shorter
ANGER     higher    higher     higher   shorter
SADNESS   lower     lower      lower    longer

OUR CHOICE OF EMOTIONAL-TO-NEUTRAL RATIOS

EMOTION   F0 mean   F0 range   energy   duration
JOY       1.18      1.30                0.81
ANGER     1.16      1.30       1.70     0.84
SADNESS   0.81      0.62       0.95     1.16
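The emotional-to-neutral ratios can be read as multipliers on the prosodic parameters of a neutral utterance. The sketch below applies them under one assumption that the slides do not confirm: the F0 range ratio is taken to scale the deviations of the contour from its mean. JOY is omitted because its energy entry is incomplete in the transcript. The function name and the neutral input values are hypothetical.

```python
# Emotional-to-neutral ratios from the slide. JOY is left out because its
# energy entry is not recoverable from the transcript.
RATIOS = {
    "ANGER":   {"f0_mean": 1.16, "f0_range": 1.30, "energy": 1.70, "duration": 0.84},
    "SADNESS": {"f0_mean": 0.81, "f0_range": 0.62, "energy": 0.95, "duration": 1.16},
}

def convert_prosody(f0_contour, energy, duration, emotion):
    """Scale the prosody of a neutral utterance toward the given emotion."""
    r = RATIOS[emotion]
    neutral_mean = sum(f0_contour) / len(f0_contour)
    new_mean = neutral_mean * r["f0_mean"]
    # Assumption: the range ratio scales deviations from the mean F0.
    new_f0 = [new_mean + (f - neutral_mean) * r["f0_range"] for f in f0_contour]
    return new_f0, energy * r["energy"], duration * r["duration"]

# Hypothetical neutral utterance: F0 in Hz, relative energy, duration in s.
f0, e, d = convert_prosody([180.0, 200.0, 220.0], 1.0, 2.0, "ANGER")
print([round(x, 1) for x in f0], round(e, 2), round(d, 2))
```

Scaling the mean and the range separately keeps the two ratios independent, so a contour can become higher and wider (anger) or lower and flatter (sadness).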

Linear trend of F0 at the end of sentences

EMOTION   linear trend type   linear trend start
JOY       rising              55 % from the end
ANGER     falling             35 % from the end

[Figure: panels JOY and ANGER]
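The table gives the direction of the trend and where it starts, but not its slope. The sketch below applies a linear ramp to the tail of an F0 contour; the `end_ratio` parameter (the F0 multiplier reached at the last frame) and its values 1.2 and 0.8 are assumptions chosen only for illustration, as is the function name.

```python
def apply_end_trend(f0_contour, start_from_end, end_ratio):
    """Multiply the tail of an F0 contour by a ramp from 1.0 to end_ratio.

    start_from_end: fraction of the sentence, counted from the end, where
                    the trend begins (0.55 for joy, 0.35 for anger).
    end_ratio:      hypothetical multiplier reached at the last frame
                    (> 1 gives a rising trend, < 1 a falling trend).
    """
    n = len(f0_contour)
    start = n - round(n * start_from_end)
    tail = n - start
    out = list(f0_contour)
    for i in range(start, n):
        # Position runs from 0.0 at the trend start to 1.0 at the last frame.
        pos = (i - start) / (tail - 1) if tail > 1 else 1.0
        out[i] = f0_contour[i] * (1.0 + (end_ratio - 1.0) * pos)
    return out

# Hypothetical flat neutral contour of 20 frames at 200 Hz.
neutral = [200.0] * 20
joy = apply_end_trend(neutral, 0.55, 1.2)    # rising over the last 55 %
anger = apply_end_trend(neutral, 0.35, 0.8)  # falling over the last 35 %
print([round(x, 1) for x in joy])
print([round(x, 1) for x in anger])
```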

Listening tests

“Determination of emotion type”
– 10 evaluation sets selected randomly from the testing corpus
– 60 short sentences (1 s to 3.5 s)
– from the Czech stories
– female professional actors
– 4 possibilities: “joy”, “anger”, “sadness”, “other”

20 listeners (16 Czechs and 4 Slovaks, 6 women and 14 men)

http://www.lef.um.savba.sk/Scripts/itstposl2.dll
MS ISAPI/NSAPI DLL script
– runs on server PC
– communicates with user via HTTP protocol


Listening tests results

Confusion matrix

EMOTION   JOY      ANGER    SADNESS  OTHER
JOY       59.0 %    0.5 %   16.0 %   24.5 %
ANGER      2.5 %   73.5 %    2.0 %   22.0 %
SADNESS    0.5 %    0.5 %   90.0 %    9.0 %

Successful determination of emotions (summed for all emotions)

                              correct   not classified   exchanged
best evaluated sentence *     88.1 %    11.9 %            0 %
worst evaluated sentence **   57.6 %    30.3 %           12.1 %

* “Vše co potřeboval.” (“All he needed.”)
** “Máš ho mít.” (“You ought to have it.”)
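The recognition rate of each synthesized emotion is the diagonal entry of the confusion matrix. The sketch below reads these rates off the matrix and ranks the emotions. One entry is an assumption: the SADNESS row shows a single 0.5 % for the JOY and ANGER columns in the transcript, and both are set to 0.5 % here so that the row sums to 100 %. All names in the code are illustrative.

```python
# Confusion matrix in percent: rows are synthesized emotions, columns are
# listener answers.
CONFUSION = {
    "JOY":     {"JOY": 59.0, "ANGER": 0.5,  "SADNESS": 16.0, "OTHER": 24.5},
    "ANGER":   {"JOY": 2.5,  "ANGER": 73.5, "SADNESS": 2.0,  "OTHER": 22.0},
    # Assumption: JOY and ANGER are both 0.5 % so that the row sums to 100 %.
    "SADNESS": {"JOY": 0.5,  "ANGER": 0.5,  "SADNESS": 90.0, "OTHER": 9.0},
}

def recognition_rates(matrix):
    """Percentage of correct answers for each synthesized emotion."""
    return {emotion: row[emotion] for emotion, row in matrix.items()}

rates = recognition_rates(CONFUSION)
best = max(rates, key=rates.get)
worst = min(rates, key=rates.get)
mean_rate = sum(rates.values()) / len(rates)

for emotion, row in CONFUSION.items():
    print(f"{emotion}: correct {row[emotion]:.1f} %, row sum {sum(row.values()):.1f} %")
print(f"best: {best}, worst: {worst}, mean: {mean_rate:.1f} %")
```

The ranking agrees with the conclusion of the presentation: sadness is recognized best and joy worst.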

Conclusion

Female voice emotional conversion:
– harmonic speech model with AR parameterization

Spectral modifications:
– spectral envelope: formant shift
– spectral flatness => voicing transition frequency

Prosodic modifications:
– energy, duration, F0 mean, range, linear trend at the end of sentences

Listening tests:
– best synthesized: sadness
– worst synthesized: joy

Next research:
– inclusion of microprosodic features in emotional voice conversion
– modifications of F0 linear trend at the beginning of sentences
