IBM Labs in Haifa © 2007 IBM Corporation SSW-6, Bonn, August 23rd, 2007 Maximum-Likelihood Dynamic Intonation Model for Concatenative Text to Speech System.


2 Maximum-Likelihood Dynamic Intonation Model for Concatenative Text to Speech System
Slava Shechtman, IBM Haifa Research Laboratory
SSW-6, Bonn, August 23rd, 2007

3 Outline
- CART intonation modeling
- Maximum-likelihood dynamic intonation model
  - Dynamic observations
  - Maximum-likelihood solution
- Microprosody preservation technique
- Implementation and preliminary results
- Future research directions

4 CART prosody modeling
[Diagram: language data (semantic data, syntactic data, syllable location, phonetic context) and a speech corpus with pitch data are used to grow a pitch tree and a duration tree.]

5 Basic CART intonation model
- Rough, but simple and automatic
- Extract semantic, syntactic and phonetic features from the TTS front end (per syllable)
  - POS, word stress, syllable stress
  - Sentence type, phrase type
  - Syllable location
  - Phonetic context
- 3 log-pitch observations per syllable (in the sonorant part of the syllable)
- Mean pitch values are associated with tree leaves to represent the target intonation (implicit i.i.d. assumption)
[Figure labels: Q1, Q2, Q3]

6 Basic application of CART intonation model
- Use the mean log-pitch values to estimate the target pitch for concatenated segments
- Use the distance from the target pitch as an additive cost term in the overall segment selection cost
- Optionally, use this target pitch curve for speech synthesis (after smoothing and/or combination with the actual pitch of the selected segments)
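The additive pitch cost can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the slide does not give the distance measure (absolute log-pitch difference is used), the weight, or the structure of the other cost terms, and a real unit-selection system would search over whole segment sequences rather than pick each candidate independently. The names `pitch_target_cost`, `select_segment`, `other_cost` and `f0_hz` are hypothetical.

```python
import math


def pitch_target_cost(candidate_f0_hz, target_log_pitch, weight=1.0):
    """Weighted absolute distance between a candidate's log-pitch and the target.

    The distance measure is an assumption; the slide only says "distance".
    """
    return weight * abs(math.log(candidate_f0_hz) - target_log_pitch)


def select_segment(candidates, target_log_pitch, pitch_weight=1.0):
    """Pick the candidate with the lowest total cost.

    Each candidate is a dict with hypothetical keys:
      "other_cost": sum of all non-pitch selection costs
      "f0_hz":      the candidate's own pitch in Hz
    The pitch term is simply added to the other costs.
    """
    def total(cand):
        return cand["other_cost"] + pitch_target_cost(
            cand["f0_hz"], target_log_pitch, pitch_weight
        )

    return min(candidates, key=total)


if __name__ == "__main__":
    candidates = [
        {"id": "a", "other_cost": 1.0, "f0_hz": 100.0},
        {"id": "b", "other_cost": 1.2, "f0_hz": 150.0},
    ]
    # Target taken from a CART leaf mean (log-pitch domain).
    print(select_segment(candidates, math.log(150.0), pitch_weight=2.0)["id"])
```

Raising `pitch_weight` makes the selection follow the CART target more closely at the expense of the other cost terms.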

7 Maximum-likelihood dynamic intonation model
- Model cross-syllable dynamic observations as well as intra-syllable observations
- Maximum-likelihood solution, based on the HMM synthesis approach (Tokuda et al)
  - A convenient framework for combining instantaneous and differential observations to obtain the most likely smooth parameter contour for a given clustering
- May be applied over the regular CART trees
[Figure labels: S1, S2, S3]

8 Dynamic features for CART intonation modeling
- Extend the static observation vector of the n-th syllable [formula not preserved in this transcript]
- Add four time-normalized differences of static observations
- Guarantee a non-zero time interval between the observation instants
- New observation vector: [formula not preserved in this transcript]
[Figure: pairs of observation points used for difference calculation]
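A rough sketch of how such an observation vector might be built follows. The slide's exact pairing of observation points is not preserved, so the pairing used here is an assumption: each of the four differences is taken between consecutive points, running from the last point of the previous syllable to the first point of the next one. The `scale` factor, the `min_dt` floor used to keep the time interval non-zero, and the function name are likewise hypothetical.

```python
import numpy as np


def dynamic_observation(t_prev, p_prev, t_cur, p_cur, t_next, p_next,
                        scale=1.0, min_dt=1e-3):
    """Return a 7-dim vector: 3 static log-pitch values + 4 differences.

    t_cur, p_cur   : the three observation instants (s) and log-pitch values
                     of the current syllable
    t_prev, p_prev : last instant / value of the previous syllable
    t_next, p_next : first instant / value of the next syllable
    The choice of point pairs is an assumption, not the slide's formulation.
    """
    t = np.concatenate(([t_prev], t_cur, [t_next])).astype(float)
    p = np.concatenate(([p_prev], p_cur, [p_next])).astype(float)
    # Floor the interval so the time normalization never divides by zero.
    dt = np.maximum(np.diff(t), min_dt)
    deltas = scale * np.diff(p) / dt
    return np.concatenate((np.asarray(p_cur, dtype=float), deltas))


if __name__ == "__main__":
    obs = dynamic_observation(0.0, 5.0, [0.1, 0.2, 0.3], [5.1, 5.2, 5.3], 0.4, 5.4)
    print(obs.shape, obs)
```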

9 Maximum-likelihood dynamic intonation model
- Assume the cluster sequence Q is predetermined by CART
- Each cluster is modeled by a single 7-dimensional Gaussian [parameters not preserved in this transcript]
- Concatenated observations O: [formula not preserved in this transcript]
- Concatenated static observations C: [formula not preserved in this transcript]
- Sparse (block diagonal) linear transformation: [formula not preserved in this transcript]

10 Maximum-likelihood dynamic intonation model
- The log-likelihood of the observation sequence O is given by [formula not preserved in this transcript]
- where [definitions not preserved in this transcript]
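For orientation, the log-likelihood of a sequence of independent Gaussian observations has the standard form below. This is a reconstruction under assumptions, not the slide's own formula: N syllables with one 7-dimensional Gaussian each are assumed, and the symbols M, U, \mu, \Sigma and N are notation introduced here. Only O and Q appear in the transcript.

```latex
\log P(O \mid Q)
  = -\frac{1}{2}\,(O - M)^{\top} U^{-1} (O - M)
    - \frac{1}{2}\log\lvert U\rvert
    - \frac{7N}{2}\log(2\pi),
\qquad
M = \begin{bmatrix}\mu_{q_1}^{\top} & \cdots & \mu_{q_N}^{\top}\end{bmatrix}^{\top},
\quad
U = \operatorname{blockdiag}\!\left(\Sigma_{q_1}, \ldots, \Sigma_{q_N}\right)
```

Here \mu_{q_n} and \Sigma_{q_n} stand for the mean and covariance of the cluster assigned to the n-th syllable.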

11 Maximum-likelihood dynamic intonation model
- Maximize the likelihood with respect to the static observations C
- An efficient time-recursive solution exists (Tokuda et al, 1996)
- The full-utterance pitch curve is determined jointly
- The solution depends both on the individual CART cluster models and on their sequence in the synthesized sentence
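The maximization can be sketched numerically. The sketch assumes the observations are a linear function of the static values, O = W C, so that setting the gradient of the Gaussian log-likelihood to zero gives the linear system W^T U^-1 W C = W^T U^-1 M, with M the stacked cluster means and U the stacked covariances. Several things here are assumptions rather than the presented method: the symbols W, M and U, diagonal covariances, the pairing of points used for the differences (consecutive points, including across syllable boundaries), and a dense `numpy.linalg.solve` in place of the time-recursive algorithm. With cross-syllable pairs the matrix W built below is banded rather than strictly block diagonal.

```python
import numpy as np


def build_w(times, scale=1.0, min_dt=1e-3):
    """Build W (7N x 3N) so that O = W @ C for N syllables.

    times: (N, 3) observation instants per syllable, in seconds.
    Rows per syllable: 3 static rows, then 4 time-normalized difference rows.
    The point pairing is hypothetical; rows whose neighbor falls outside the
    utterance are left as zeros and therefore do not affect the solution.
    """
    times = np.asarray(times, dtype=float)
    n_syl = times.shape[0]
    flat_t = times.reshape(-1)
    w = np.zeros((7 * n_syl, 3 * n_syl))
    for n in range(n_syl):
        base = 3 * n
        for k in range(3):
            w[7 * n + k, base + k] = 1.0
        pairs = [(base - 1, base), (base, base + 1),
                 (base + 1, base + 2), (base + 2, base + 3)]
        for j, (a, b) in enumerate(pairs):
            if a < 0 or b >= 3 * n_syl:
                continue  # no neighboring syllable at the utterance edge
            dt = max(flat_t[b] - flat_t[a], min_dt)
            w[7 * n + 3 + j, a] = -scale / dt
            w[7 * n + 3 + j, b] = scale / dt
    return w


def ml_contour(w, means, variances):
    """Solve W^T U^-1 W C = W^T U^-1 M for the static log-pitch values C.

    means, variances: (N, 7) per-syllable cluster parameters
    (diagonal covariances are an assumption of this sketch).
    """
    m = np.asarray(means, dtype=float).reshape(-1)
    prec = 1.0 / np.asarray(variances, dtype=float).reshape(-1)
    a = w.T @ (prec[:, None] * w)
    b = w.T @ (prec * m)
    return np.linalg.solve(a, b)


if __name__ == "__main__":
    times = [[0.0, 0.1, 0.2], [0.3, 0.4, 0.5]]
    means = np.zeros((2, 7))
    means[0, :3] = 5.0   # static log-pitch means of the first cluster
    means[1, :3] = 5.4   # static log-pitch means of the second cluster
    variances = np.full((2, 7), 0.01)
    print(ml_contour(build_w(times), means, variances))
```

In this toy setup the static means form a step between the two syllables, and the zero-mean difference terms pull the solved contour toward a smoother transition. Increasing the dynamic variances, or lowering `scale`, moves the result back toward the plain mean solution.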

12 Maximum-likelihood dynamic intonation model
- Smooths abrupt changes present in the mean solution
- The amount of smoothing is controlled by the scaling factor inside the dynamic observations
- Allows the use of larger CART trees for finer clustering

13 Microprosody preservation
- Improve the resolution of the rough pitch curve
- Keep the original fine pitch structure inside each contiguous portion of speech to increase naturalness, while aligning it with the target intonation curve
- Compensate for imperfections in the CART model and in feature extraction
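One simple way to picture this idea is shown below. The slide does not describe the actual alignment procedure, so the approach here is an assumption chosen for illustration: each contiguous run of original speech keeps its pitch shape and receives a single log-pitch offset so that its mean matches the mean of the target curve over the same span. The function name and the `run_ids` labeling are hypothetical.

```python
import numpy as np


def align_runs_to_target(orig_log_f0, target_log_f0, run_ids):
    """Shift each contiguous run so its mean log-pitch matches the target.

    orig_log_f0   : log-pitch of the selected segments, per frame
    target_log_f0 : target intonation curve, per frame
    run_ids       : label per frame; frames from the same contiguous portion
                    of recorded speech share a label
    A constant offset per run is an assumption of this sketch; it leaves the
    frame-to-frame pitch movement inside the run untouched.
    """
    orig = np.asarray(orig_log_f0, dtype=float)
    target = np.asarray(target_log_f0, dtype=float)
    run_ids = np.asarray(run_ids)
    out = np.empty_like(orig)
    for run in np.unique(run_ids):
        mask = run_ids == run
        out[mask] = orig[mask] + (target[mask].mean() - orig[mask].mean())
    return out


if __name__ == "__main__":
    orig = [5.0, 5.1, 5.0, 5.3, 5.2]
    target = [5.2, 5.2, 5.2, 5.0, 5.0]
    print(align_runs_to_target(orig, target, [0, 0, 0, 1, 1]))
```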

14 Mean solution vs. ML dynamic intonation model
- Mean solution
- ML dynamic solution

Pref.                                 %
No pref.                              22.5
Static, smoothed (A), all pref.       34.3
Static, smoothed (A), strong pref.    3.2
Dynamic ML (B), all pref.             43.2
Dynamic ML (B), strong pref.          9.2

15 Incorporation within CTTS system
- Applied to the embedded version of the IBM CTTS system, whose basic concatenation unit is a sub-phoneme (regularly one third of a phoneme)
- (A): CART mean solution as the target pitch; smoothed original pitch curve as the synthesis pitch
- (B): dynamic ML CART solution as the target pitch; the microprosody preservation technique combines the original and target pitches
- Subjective results from TTS experts and native speakers:

Pref.                                 %
No pref.                              37.9
(A), strong or weak pref.             27.7
(A), strong pref.                     3.2
(B), strong or weak pref.             34.4
(B), strong pref.                     6.7

16 Summary and further research directions
- A dynamic ML CART intonation model was proposed and shown to perform better than the baseline CART intonation model.
- It was successfully combined with the original pitch curve using the microprosody preservation technique.
- Further research
  - Alternative dynamic features
  - Statistical microprosody modeling for very-small-footprint voices
  - Adaptive microprosody incorporation

17 Exact formulation of dynamic features
[Equations not preserved in this transcript]

