Speech Prosody Conversion using Sequence Generative Adversarial Nets

Speech Prosody Conversion using Sequence Generative Adversarial Nets with Continuous Wavelet Transform F0 Features
Zhaojie Luo, Tetsuya Takiguchi, and Yasuo Ariki (Kobe University)

Canonical Correlation Analysis

Overview

Background
Speech prosody conversion changes the emotion of a sentence (from Neutral to Sad, Happy, or Angry) while keeping its linguistic information unchanged. Target application: emotional robots.

Problems
1. The representation of the fundamental frequency (F0) is too simple for prosody conversion.
2. GANs can only give a score/loss for an entire sequence once it has been fully generated.

Goal
1. Apply the continuous wavelet transform (CWT) to systematically capture F0 features at different temporal scales.
2. Use a sequence GAN (SeqGAN) to model the continuity of the feature matrices across a complete sentence, so that both long-term and short-term dependencies can be trained.

Framework
Training: dataset → feature extraction → feature processing → matrix processing → seq-GANs.
Conversion: input voice → feature extraction → feature processing → matrix processing → seq-GANs.

Feature extraction
Figure: Mexican hat mother wavelet; interpolated log-normalized F0 and its wavelet transform features at scales i = 30, 24, 18, 12, 6.

Training model
Figure: Illustration of training the sentence features in the Seq-GANs.

Results
Systems compared: (a) recorded voice, (b) DBNs+LG, (c) DBNs+NNs, (d) GAN, (e) Sequence-GAN.
Figure: emotion confusion matrices, target emotion (Tar.) vs. perceived emotion (Percept.) over Angry, Sad, Happy, and Neutral.
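The feature-extraction step (interpolated, log-normalized F0 decomposed with a Mexican hat wavelet into several temporal scales) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes three things. First, unvoiced frames are marked with F0 = 0 and are filled by linear interpolation. Second, "normalized" means zero mean and unit variance of log F0. Third, the CWT is computed by direct convolution with a sampled Mexican hat (Ricker) wavelet. The poster does not state how the indices i = 30, 24, 18, 12, 6 map to wavelet scales, so the scales of 2 to 32 frames used below are purely illustrative. The function names are hypothetical.

```python
import numpy as np


def interpolate_log_normalize(f0: np.ndarray) -> np.ndarray:
    """Fill unvoiced frames (F0 <= 0) by linear interpolation, take log, z-normalize.

    Assumption: unvoiced frames are encoded as 0 in the F0 contour.
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0
    if not voiced.any():
        raise ValueError("F0 contour has no voiced frames")
    idx = np.arange(len(f0))
    # np.interp holds the first/last voiced value constant at the edges
    filled = np.interp(idx, idx[voiced], f0[voiced])
    log_f0 = np.log(filled)
    std = log_f0.std()
    scale = std if std > 1e-12 else 1.0  # guard against a flat contour
    return (log_f0 - log_f0.mean()) / scale


def mexican_hat(points: int, a: float) -> np.ndarray:
    """Mexican hat (Ricker) wavelet of width a, sampled at `points` samples."""
    t = np.arange(points) - (points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return norm * (1.0 - (t / a) ** 2) * np.exp(-(t ** 2) / (2.0 * a ** 2))


def cwt_mexican_hat(signal: np.ndarray, scales) -> np.ndarray:
    """Return a (len(scales), len(signal)) matrix of CWT coefficients."""
    n = len(signal)
    out = np.empty((len(scales), n))
    for row, a in enumerate(scales):
        points = int(min(10 * a, n))  # truncate the wavelet support to 10*a samples
        if points % 2 == 0:
            points -= 1  # odd length keeps the wavelet centred on each frame
        points = max(points, 1)
        # The wavelet is symmetric, so convolution equals correlation here
        out[row] = np.convolve(signal, mexican_hat(points, a), mode="same")
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = 400
    f0 = 120 + 20 * np.sin(np.linspace(0, 6 * np.pi, frames))  # synthetic contour in Hz
    f0[rng.random(frames) < 0.2] = 0.0  # mark roughly 20% of frames as unvoiced
    features = cwt_mexican_hat(interpolate_log_normalize(f0), scales=[2, 4, 8, 16, 32])
    print(features.shape)  # (5, 400)
```

Each row of the result is one temporal scale. Small scales follow fast, syllable-level F0 movements, and large scales follow slower, phrase-level movements. This is the multi-scale view of F0 that the poster's first goal refers to. A library routine such as PyWavelets' `pywt.cwt` with the `'mexh'` wavelet could replace the hand-written convolution.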

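The second problem on the poster (a GAN discriminator scores only a finished sequence) is the one SeqGAN addresses. Each partial sequence is completed several times by Monte Carlo rollouts, and the discriminator scores are averaged. This gives a reward for every generation step, and those rewards then weight a policy-gradient update. The sketch below shows that mechanism on a toy problem with binary tokens. It is not the poster's model, which operates on CWT F0 feature matrices. The one-parameter Bernoulli generator, the fixed scoring function that stands in for a trained discriminator, and all names and hyperparameters are hypothetical.

```python
import numpy as np


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + np.exp(-x))


def rollout_rewards(seq, sample_next, discriminator, n_rollouts, rng):
    """SeqGAN-style reward Q_t for every prefix of a complete sequence `seq`.

    For t < T, the prefix seq[:t] is completed n_rollouts times by the rollout
    policy and the discriminator scores are averaged (Monte Carlo search).
    For t == T, the discriminator scores the complete sequence directly.
    """
    T = len(seq)
    rewards = np.empty(T)
    for t in range(1, T + 1):
        if t == T:
            rewards[t - 1] = discriminator(seq)
            continue
        scores = []
        for _ in range(n_rollouts):
            completed = list(seq[:t])
            while len(completed) < T:
                completed.append(sample_next(completed, rng))
            scores.append(discriminator(completed))
        rewards[t - 1] = np.mean(scores)
    return rewards


def train_toy_generator(steps=40, batch=32, T=8, n_rollouts=4, lr=0.5, seed=0):
    """Toy REINFORCE loop for a Bernoulli 'generator' with a single logit theta."""
    rng = np.random.default_rng(seed)
    theta = 0.0  # p(token = 1) = sigmoid(theta); starts at 0.5

    def discriminator(s):
        # Hypothetical stand-in for a trained discriminator: "real" data are mostly 1s
        return float(np.mean(s))

    for _ in range(steps):
        p = sigmoid(theta)

        def sample_next(prefix, r, p=p):
            # Rollout policy: the current generator (prefix is ignored by this toy policy)
            return int(r.random() < p)

        grad = 0.0
        for _ in range(batch):
            seq = [int(rng.random() < p) for _ in range(T)]
            q = rollout_rewards(seq, sample_next, discriminator, n_rollouts, rng)
            # d/dtheta log p(y_t) = y_t - p for a Bernoulli policy with logit theta
            grad += sum((y - p) * q_t for y, q_t in zip(seq, q))
        theta += lr * grad / batch  # gradient ascent on the expected reward
    return sigmoid(theta)


if __name__ == "__main__":
    print(f"p(token = 1) after training: {train_toy_generator():.2f}")
```

Under these assumptions, the probability of emitting a 1 should rise above its initial 0.5. The per-step rollout rewards credit each 1 for raising the final score, which is exactly the step-level feedback that a whole-sequence GAN loss cannot provide.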