HARMONIC MODEL FOR FEMALE VOICE EMOTIONAL SYNTHESIS Anna PŘIBILOVÁ, Department of Radioelectronics, Slovak University of Technology, Ilkovičova 3, SK-812 Bratislava


HARMONIC MODEL FOR FEMALE VOICE EMOTIONAL SYNTHESIS

Anna PŘIBILOVÁ
Department of Radioelectronics, Slovak University of Technology
Ilkovičova 3, SK-812 Bratislava, Slovakia

Jiří PŘIBIL
Institute of Photonics and Electronics, Academy of Sciences of the Czech Republic
Chaberská 57, CZ Praha 8, Czech Republic

Outline:
- Introduction
- Harmonic speech model with AR parameterization
- Spectral modifications for emotional synthesis
- Prosodic modifications for emotional synthesis
- Listening test results
- Conclusion

Harmonic speech model with AR parameterization
- voicing transition frequency separates the harmonic (voiced) part of the spectrum from the noise part
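A minimal sketch of one frame of such harmonic-plus-noise synthesis, assuming illustrative names and parameter values (not the authors' implementation): harmonics of F0 fill the band below the voicing transition frequency, and spectrally masked noise fills the band above it.

```python
import numpy as np

def harmonic_noise_frame(f0, fvt, amps, fs=16000, n=400):
    """One synthetic frame: harmonics of f0 below the voicing
    transition frequency fvt, spectrally masked noise above it."""
    t = np.arange(n) / fs
    frame = np.zeros(n)
    # harmonic part: cosines at multiples of f0 up to fvt
    for k, a in enumerate(amps, start=1):
        fk = k * f0
        if fk >= fvt:
            break
        frame += a * np.cos(2 * np.pi * fk * t)
    # noise part: white noise with all spectral content below fvt removed
    spec = np.fft.rfft(np.random.randn(n))
    spec[np.fft.rfftfreq(n, 1 / fs) < fvt] = 0.0
    frame += 0.1 * np.fft.irfft(spec, n)
    return frame

frame = harmonic_noise_frame(f0=200.0, fvt=3000.0, amps=[1.0] * 30)
```

In a full synthesizer the per-harmonic amplitudes would come from the AR (autoregressive) spectral envelope and frames would be overlap-added; here they are constants for brevity.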

Voicing transition frequency

Determination of model parameters
- spectral flatness measure
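The spectral flatness measure used here can be sketched as a generic implementation (not the authors' code): the ratio of the geometric to the arithmetic mean of the power spectrum, close to 1 for noise-like bands and close to 0 for harmonic ones, which is what makes it usable for locating the voicing transition frequency.

```python
import numpy as np

def spectral_flatness(power_spectrum, eps=1e-12):
    """Spectral flatness measure: geometric mean over arithmetic mean
    of the power spectrum; near 1 for noise, near 0 for tonal bands."""
    p = np.asarray(power_spectrum, dtype=float) + eps
    return np.exp(np.mean(np.log(p))) / np.mean(p)

# tonal spectrum (one dominant line) vs. flat (noise-like) spectrum
tonal = np.array([1e-6] * 63 + [1.0])
flat = np.ones(64)
sfm_tonal = spectral_flatness(tonal)
sfm_flat = spectral_flatness(flat)
```

Scanning this measure across frequency bands and thresholding it is one common way to place the harmonic/noise boundary.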

F1  300 Hz  840 Hz F2  840 Hz  2400 Hz F3  2400 Hz  3840 Hz F4  3840 Hz  4800 Hz Female formant areas (+20%) Emotional influence on speech formants pleasant emotions – faucal and pharyngeal expansion, relaxation of tract walls, mouth corners retracted upward (F1 falling, resonances raised) unpleasant emotions – faucal and pharyngeal constriction, tensing of vocal tract walls, mouth corners retracted downward (F1 rising, F2 and F3 falling) pleasant emotions F1 falling, resonances raised unpleasant emotions F1 rising, F2 and F3 falling Scherer, K., R.: Vocal Communication of Emotion: A Review of Research Paradigms. Speech Communication, Vol. 40 (2003) Male formant areas F1  250 Hz  700 Hz F2  700 Hz  2000 Hz F3  2000 Hz  3200 Hz F4  3200 Hz  4000 Hz Fant, G.: Speech Acoustics and Phonetics. Kluwer Academic Publishers, Dordrecht (2004) 700 Hz 840 Hz

Spectral modifications for emotional synthesis
- frequency scale transformation

Frequency scale transformation
- frequencies below the crossover F1,2 are increased (decreased)
- F2, F3, F4 above F1,2 are decreased (increased)

(figure: frequency warping curve, transformed vs. original frequency f [kHz], up to fs/4)
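One way to realize such a transformation is a frequency-dependent scale factor that blends from one ratio below the crossover to another above it, so the two regions move in opposite directions. This is a sketch under stated assumptions: the sigmoid blend and the 200 Hz transition width are my choices, not the paper's curve; only the 840 Hz female F1/F2 boundary and the angry-style shift percentages come from the slides.

```python
import numpy as np

def warp_frequency(f, f_cross, r_low, r_high, width=200.0):
    """Frequency warp with a smoothly varying scale factor: r_low
    applies well below the crossover f_cross, r_high well above it,
    so with r_low > 1 > r_high the low band rises while the high band falls."""
    f = np.asarray(f, dtype=float)
    w = 1.0 / (1.0 + np.exp(-(f - f_cross) / width))  # 0 below crossover, 1 above
    return f * ((1.0 - w) * r_low + w * r_high)

# angry-style shifts from the slides: +35 % below the F1/F2 boundary, -15 % above
f1_warped = warp_frequency(300.0, 840.0, 1.35, 0.85)
f2_warped = warp_frequency(2400.0, 840.0, 1.35, 0.85)
```

With these ratios the mapping stays monotonic, so it remains a valid frequency axis for resampling a spectral envelope.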

Formant ratio between emotional and neutral speech

Chosen formant ratio (for frequency after transformation):

  ratio (shift)        region 1 (214.3 Hz)   region 2 (… Hz)
  joyous-to-neutral    0.7 (-30 %)           1.05 (+5 %)
  angry-to-neutral     1.35 (+35 %)          0.85 (-15 %)
  sad-to-neutral       1.1 (+10 %)           0.9 (-10 %)

Mean formant ratio in formant areas:

  ratio (shift)        F1 300–840 Hz   F2 840–2400 Hz   F3 2400–3840 Hz   F4 3840–4800 Hz
  joyous-to-neutral    …               …                …                 (-0.36 %)
  angry-to-neutral     …               …                …                 (-9.88 %)
  sad-to-neutral       …               (-6.17 %)        …                 (-9.24 %)
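Applying such formant ratios amounts to resampling the spectral envelope along a warped frequency axis. A minimal sketch, assuming a generic envelope array and an illustrative uniform +20 % warp (the table's region-dependent ratios would plug in as the warp function in the same way):

```python
import numpy as np

def shift_envelope(env, fs, warp):
    """Resample a spectral envelope along a warped frequency axis:
    the value at output frequency f is read from the input envelope
    at the inverse-warped position, so formant peaks move with the warp."""
    freqs = np.linspace(0.0, fs / 2.0, len(env))
    return np.interp(freqs, warp(freqs), env)  # warp(freqs) must be increasing

# illustrative envelope with a single formant peak at 1000 Hz
fs = 8000
freqs = np.linspace(0.0, fs / 2.0, 513)
env = np.exp(-((freqs - 1000.0) / 50.0) ** 2)
shifted = shift_envelope(env, fs, lambda f: f * 1.2)  # uniform +20 % shift
```

After resampling, the peak originally at 1000 Hz sits at about 1200 Hz, i.e. the formant has been shifted by the chosen ratio.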

Prosody of emotional speech
Scherer, K. R.: Vocal Communication of Emotion: A Review of Research Paradigms. Speech Communication, Vol. 40 (2003)

  EMOTION   F0 mean   F0 range   energy   duration
  JOY       higher    …          …        shorter
  ANGER     higher    …          …        shorter
  SADNESS   lower     …          …        longer

Our choice of emotional-to-neutral ratios:

  EMOTION   F0 mean   F0 range   energy   duration
  JOY       …         …          …        …
  ANGER     …         …          …        …
  SADNESS   …         …          …        …
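The F0-mean, F0-range, and energy ratios can be applied to a neutral contour as in this sketch. The ratio values below are illustrative placeholders, since the slide's chosen ratios did not survive the transcript, and duration modification (time-scaling) is omitted for brevity.

```python
import numpy as np

def modify_prosody(f0, energy, mean_ratio, range_ratio, energy_ratio):
    """Scale the mean and range of an F0 contour and the frame energies
    by emotional-to-neutral ratios; unvoiced frames (f0 == 0) stay 0."""
    f0 = np.asarray(f0, dtype=float)
    m = f0[f0 > 0].mean()  # mean F0 over voiced frames only
    new_f0 = np.where(f0 > 0, (f0 - m) * range_ratio + m * mean_ratio, 0.0)
    return new_f0, np.asarray(energy, dtype=float) * energy_ratio

f0 = np.array([200.0, 220.0, 180.0, 0.0, 200.0])  # 0 marks an unvoiced frame
new_f0, new_energy = modify_prosody(f0, np.ones(5), 1.2, 1.5, 0.9)
```

Subtracting the voiced mean before scaling keeps the range and mean modifications independent: the deviation from the mean is stretched by range_ratio, then re-centered on the shifted mean.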

Linear trend of F0 at the end of sentences

  EMOTION   linear trend type   linear trend start
  JOY       rising              55 % from the end
  ANGER     falling             35 % from the end

(figure: example F0 contours with the linear trend for JOY and ANGER)
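These end-of-sentence trends can be sketched as a linear ramp added over the trailing portion of the F0 contour. Only the trend direction and start point come from the slide; the ramp magnitude delta_hz is an assumed parameter.

```python
import numpy as np

def apply_end_trend(f0, start_frac, delta_hz):
    """Add a linear F0 trend over the last start_frac of the contour,
    growing from 0 to delta_hz (positive = rising, e.g. joy; negative
    = falling, e.g. anger); unvoiced frames (f0 == 0) are left at 0."""
    f0 = np.asarray(f0, dtype=float).copy()
    k = int(round(len(f0) * start_frac))  # number of trailing frames affected
    ramp = np.linspace(0.0, delta_hz, k)
    tail = f0[len(f0) - k:]
    f0[len(f0) - k:] = np.where(tail > 0, tail + ramp, 0.0)
    return f0

# joy: rising trend starting 55 % from the end of the sentence
joy_f0 = apply_end_trend(np.full(100, 200.0), 0.55, 30.0)
```

An anger-style contour would use start_frac=0.35 and a negative delta_hz.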

Listening tests
"Determination of emotion type"
- 10 evaluation sets selected randomly from the testing corpus
- 60 short sentences (1 s – 3.5 s) from Czech stories, read by female professional actors
- 4 choices: "joy", "anger", "sadness", "other"
- 20 listeners (16 Czechs and 4 Slovaks; 6 women and 14 men)

MS ISAPI/NSAPI DLL script:
- runs on a server PC
- communicates with the user via the HTTP protocol


Listening test results

Confusion matrix:

  presented \ perceived    JOY      ANGER    SADNESS   OTHER
  JOY                      59.0 %    0.5 %   16.0 %    24.5 %
  ANGER                     2.5 %   73.5 %    2.0 %    22.0 %
  SADNESS                   0.5 %    0.5 %   90.0 %     9.0 %

Successful determination of emotions (summed for all emotions):

                               correct   not classified   exchanged
  best evaluated sentence*     88.1 %    11.9 %            0 %
  worst evaluated sentence**   57.6 %    30.3 %           12.1 %

* "Vše co potřeboval." ("All he needed.")
** "Máš ho mít." ("You ought to have it.")

Conclusion

Female voice emotional conversion:
- harmonic speech model with AR parameterization

Spectral modifications:
- spectral envelope: formant shift
- spectral flatness => voicing transition frequency

Prosodic modifications:
- energy, duration, F0 mean, F0 range, linear trend at the end of sentences

Listening tests:
- best synthesized: sadness
- worst synthesized: joy

Next research:
- inclusion of microprosodic features in emotional voice conversion
- modifications of the F0 linear trend at the beginning of sentences