Published by Silas Hubbard. Modified over 9 years ago.
3
Outline
Objectives
Study of Thai tones
  Characteristics of Thai tones
  Categorizations of Thai tones
Tree-based context clustering
  Construction of contextual factors
  Design of decision-tree structures
  Design of context clustering styles
Experiments
  Evaluation of overall tone correctness
  Evaluation of tone correctness for each tone type
  Evaluation of syllable duration distortion
Conclusions
4
Objectives
To implement an HMM-based speech synthesis system for the Thai language with the highest possible tone correctness.
5
Study of Thai tones: Characteristics of Thai tones
Thai is a tonal language.
Syllable structure [Nakasakul2002], with example transcriptions:
รัก r-a-k^-3 (love)
เรื่อย r-va-j^-2 (always)
เคร่ง khr-e-ng^-2 (strict)
เครียด khr-ia-t^-2 (stress)
และ l-x-3 (and)
เพลีย phl-iia-0 (exhausted)
เสีย s-iia-4 (spoil)
ปริ pr-i-1 (break)
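The transcriptions above appear to follow the pattern initial-vowel-final^-tone, with the final consonant omitted in open syllables. A minimal parsing sketch, assuming that reading of the notation (the `^` marker as a final-consonant flag, and the names `Syllable` and `parse_syllable`, are illustrative assumptions, not part of the slides):

```python
from typing import NamedTuple, Optional


class Syllable(NamedTuple):
    initial: str            # initial consonant or cluster, e.g. "khr"
    vowel: str              # vowel symbol, e.g. "iia"
    final: Optional[str]    # final consonant without the "^" marker, or None
    tone: int               # tone number 0-4 as used on the slides


def parse_syllable(label: str) -> Syllable:
    """Split a transcription such as 'r-a-k^-3' into its parts.

    Assumption: fields are separated by '-', a final consonant is
    marked with a trailing '^', and the last field is the tone number.
    """
    parts = label.split("-")
    if len(parts) == 4:
        initial, vowel, final, tone = parts
        if not final.endswith("^"):
            raise ValueError(f"final consonant not marked with '^': {label!r}")
        final = final[:-1]
    elif len(parts) == 3:
        initial, vowel, tone = parts
        final = None
    else:
        raise ValueError(f"unexpected number of fields: {label!r}")
    tone_number = int(tone)
    if tone_number not in range(5):
        raise ValueError(f"tone must be 0-4: {label!r}")
    return Syllable(initial, vowel, final, tone_number)
```

For example, `parse_syllable("khr-ia-t^-2")` yields the cluster `khr`, vowel `ia`, final `t`, and tone 2.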
6
Study of Thai tones: Characteristics of Thai tones
F0 contours of the standard Thai tones (normalized duration) [Luksaneeyanawin1992]:
สามัญ Middle (0)
เอก Low (1)
โท Falling (2)
ตรี High (3)
จัตวา Rising (4)
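"Normalized duration" means that contours from syllables of different lengths are placed on a common time axis before they are compared. One common way to do this is to resample each contour to a fixed number of points by linear interpolation. The sketch below shows that idea only; the function name and the default of 11 points are assumptions, and it is not necessarily the procedure used in [Luksaneeyanawin1992].

```python
from typing import List, Sequence


def normalize_duration(f0: Sequence[float], n_points: int = 11) -> List[float]:
    """Resample an F0 contour to n_points equally spaced samples.

    Uses linear interpolation between neighbouring frames, so the first
    and last output values equal the first and last input values.
    """
    if len(f0) < 2 or n_points < 2:
        raise ValueError("need at least two frames and two output points")
    last = len(f0) - 1
    resampled = []
    for i in range(n_points):
        pos = i * last / (n_points - 1)   # fractional frame index
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        resampled.append(f0[lo] * (1.0 - frac) + f0[hi] * frac)
    return resampled
```

After this step, a 12-frame syllable and a 30-frame syllable both become 11-point contours that can be averaged or plotted together.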
7
Study of Thai tones: Categorizations of Thai tones
Abramson divided the tones into two groups: a static group and a dynamic group.
According to the final trend of the F0 contours, the tones can also be divided into an upward trend group and a downward trend group.
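The two categorizations can be written down as simple lookup tables. In Abramson's classification the static tones are mid, low, and high, and the dynamic tones are falling and rising. The slide does not list the members of the trend groups, so the upward/downward assignment below is an assumption made for illustration.

```python
TONE_NAMES = {0: "middle", 1: "low", 2: "falling", 3: "high", 4: "rising"}

# Abramson's grouping by contour constancy.
STATIC_TONES = {0, 1, 3}
DYNAMIC_TONES = {2, 4}

# Grouping by the final trend of the F0 contour.
# Assumption: membership is not given on the slide.
UPWARD_TONES = {3, 4}
DOWNWARD_TONES = {0, 1, 2}


def constancy_group(tone: int) -> str:
    """Return 'static' or 'dynamic' for a tone number 0-4."""
    if tone in STATIC_TONES:
        return "static"
    if tone in DYNAMIC_TONES:
        return "dynamic"
    raise ValueError(f"unknown tone: {tone}")


def trend_group(tone: int) -> str:
    """Return 'upward' or 'downward' for a tone number 0-4."""
    if tone in UPWARD_TONES:
        return "upward"
    if tone in DOWNWARD_TONES:
        return "downward"
    raise ValueError(f"unknown tone: {tone}")
```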
8
HMM-based speech synthesizer
Phoneme-based speech unit modeling
Provides flexible models and efficient adaptation: speaker adaptation and speaking style conversion
In 1994, K. Tokuda et al. proposed an HMM-based speech synthesizer for Japanese.
9
Tree-based context clustering: Construction of contextual factors
Context clustering addresses the problem of limited training data.
Phoneme level: {preceding, current, succeeding} phonetic type; {preceding, current, succeeding} part of syllable structure
Syllable level: {preceding, current, succeeding} tone type; the number of phones in the {preceding, current, succeeding} syllable; current phone position in the current syllable
Word level: current syllable position in the current word; part of speech; the number of syllables in the {preceding, current, succeeding} word
Phrase level: current word position in the current phrase; the number of syllables in the {preceding, current, succeeding} phrase
Utterance level: current phrase position in the current sentence; the number of syllables in the current sentence; the number of words in the current sentence
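The five levels of contextual factors can be pictured as one record attached to each phone. The sketch below is a simplified illustration: the field names are assumptions, only the current-unit counts are kept at the word and phrase levels, and the example values for the word, phrase, and utterance levels are made up.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PhoneContext:
    # Phoneme level
    prev_phone_type: Optional[str]
    cur_phone_type: str
    next_phone_type: Optional[str]
    # Syllable level
    prev_tone: Optional[int]
    cur_tone: int
    next_tone: Optional[int]
    phones_in_syllable: int
    phone_pos_in_syllable: int
    # Word level
    syllable_pos_in_word: int
    part_of_speech: str
    syllables_in_word: int
    # Phrase level
    word_pos_in_phrase: int
    syllables_in_phrase: int
    # Utterance level
    phrase_pos_in_sentence: int
    syllables_in_sentence: int
    words_in_sentence: int


def tone_context(ctx: PhoneContext) -> Tuple[Optional[int], int, Optional[int]]:
    """Return the (preceding, current, succeeding) tone types."""
    return (ctx.prev_tone, ctx.cur_tone, ctx.next_tone)


# Hypothetical context for the vowel of r-a-k^-3 at the start of an utterance.
example = PhoneContext(
    prev_phone_type="consonant", cur_phone_type="vowel", next_phone_type="consonant",
    prev_tone=None, cur_tone=3, next_tone=2,
    phones_in_syllable=3, phone_pos_in_syllable=2,
    syllable_pos_in_word=1, part_of_speech="verb", syllables_in_word=1,
    word_pos_in_phrase=1, syllables_in_phrase=4,
    phrase_pos_in_sentence=1, syllables_in_sentence=9, words_in_sentence=6,
)
```

Because the number of distinct combinations of these fields far exceeds the number of training examples, most full contexts are seen rarely or never, which is why the contexts are clustered.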
10
Tree-based context clustering: Design of decision-tree structures
Problem of misshaped F0 contour
Figure: F0 contours of (a) synthesized speech from the single-binary-tree clustering style without tone type questions and (b) natural speech.
11
Tree-based context clustering Design of decision-tree structures
12
Tree-based context clustering: Design of 8 context clustering styles
Figure: context clustering styles (a)-(h). Styles (e), (f), (g), and (h) add tone type questions.
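The eight styles can be read as four tree structures, each used without and with tone type questions. The sketch below enumerates them and shows which tree a syllable of a given tone would be sent to. Several things here are assumptions: the assignment of letters (a)-(d) to structures, the pairing of (e)-(h) with them, the membership of the trend groups, and the simplification that each tone-separated structure uses one tree per group.

```python
from typing import Dict, List, Set, Tuple

# Tone sets handled by each separate tree, per structure (assumed layout).
TREE_STRUCTURES: Dict[str, List[Set[int]]] = {
    "single binary tree": [{0, 1, 2, 3, 4}],
    "simple tone-separated": [{0}, {1}, {2}, {3}, {4}],
    "constancy-based tone-separated": [{0, 1, 3}, {2, 4}],   # static, dynamic
    "trend-based tone-separated": [{3, 4}, {0, 1, 2}],       # upward, downward
}


def build_styles() -> Dict[str, Tuple[str, bool]]:
    """Map style letters to (tree structure, uses tone type questions)."""
    styles = {}
    for i, name in enumerate(TREE_STRUCTURES):
        styles["abcd"[i]] = (name, False)
        styles["efgh"[i]] = (name, True)
    return styles


def tree_index(structure: str, tone: int) -> int:
    """Return the index of the tree that models syllables of this tone."""
    for i, tones in enumerate(TREE_STRUCTURES[structure]):
        if tone in tones:
            return i
    raise ValueError(f"tone {tone} is not covered by {structure!r}")
```

Under these assumptions, a falling-tone syllable (tone 2) shares its tree with every other tone in the single binary tree, has its own tree in the simple tone-separated structure, and shares a tree with the rising tone in the constancy-based structure.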
13
System Preparations
1. Sentence structure analysis
2. Word structure analysis
3. Full context labeling
4. Construction of question set for context clustering
5. Feature extraction
Figure: data flow. The inputs are the VAJA speech corpus and the ORCHID text corpus (wav files, label files, XML file). Full context labeling produces full context label files (.lab). Feature extraction (mcep, f0) produces parameter files (.cmp). Both feed HMM training and synthesis, which outputs synthetic speech.
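Step 4 builds the yes/no questions that the decision trees may ask about a context. A minimal sketch follows, with contexts held as plain dictionaries. The question names, the dictionary keys, and the two non-tone questions are hypothetical; real toolkits use their own question file formats, which are not reproduced here.

```python
from typing import Callable, Dict, Mapping

Question = Callable[[Mapping[str, object]], bool]


def build_question_set(include_tone_questions: bool = True) -> Dict[str, Question]:
    """Return a mapping from question name to a predicate on a context dict."""
    questions: Dict[str, Question] = {
        # Two illustrative non-tone questions.
        "cur_phone_is_vowel": lambda c: c.get("cur_phone_type") == "vowel",
        "word_is_monosyllabic": lambda c: c.get("syllables_in_word") == 1,
    }
    if include_tone_questions:
        # One question per (position, tone) pair: 3 positions x 5 tones.
        for pos in ("prev", "cur", "next"):
            for tone in range(5):
                # Default arguments freeze pos and tone for each lambda.
                questions[f"{pos}_tone=={tone}"] = (
                    lambda c, pos=pos, tone=tone: c.get(f"{pos}_tone") == tone
                )
    return questions
```

Calling `build_question_set(include_tone_questions=False)` corresponds to a clustering style without tone type questions; the default corresponds to a style with them.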
14
Experiments Evaluation of overall tone correctness Figure 5: F0 contours of synthesized speech from 8 different clustering styles, and the F0 contour of natural speech.
15
Experiments Evaluation of overall tone correctness Figure 6: Tone error percentages of synthesized speech from 4 different clustering styles
16
Experiments Evaluation of overall tone correctness Figure 7: Tone error percentages of synthesized speech from 8 different clustering styles
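A tone error percentage can be computed by comparing the intended tone of each syllable with the tone identified in the synthesized speech. The slides do not say how the identified tones were obtained, so the sketch below simply assumes two aligned lists of tone numbers; the function name is illustrative.

```python
from typing import Sequence


def tone_error_percentage(intended: Sequence[int], identified: Sequence[int]) -> float:
    """Percentage of syllables whose identified tone differs from the intended tone."""
    if len(intended) != len(identified):
        raise ValueError("tone sequences must have the same length")
    if not intended:
        raise ValueError("no syllables to evaluate")
    errors = sum(1 for want, got in zip(intended, identified) if want != got)
    return 100.0 * errors / len(intended)
```

Running this once per clustering style on the same test sentences gives one value per style, which is the kind of comparison the figures present.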
17
Experiments Evaluation of tone correctness for each tone type Figure 8: Tone error percentages of synthesized speech from 8 different clustering styles, categorized by tone type.
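Breaking the error percentage down by tone type only requires grouping the syllables by their intended tone. A minimal sketch, again assuming aligned lists of intended and identified tone numbers (the function name is an assumption):

```python
from collections import Counter
from typing import Dict, Sequence


def tone_error_by_type(intended: Sequence[int], identified: Sequence[int]) -> Dict[int, float]:
    """Error percentage for each intended tone type that occurs in the data."""
    if len(intended) != len(identified):
        raise ValueError("tone sequences must have the same length")
    totals = Counter(intended)
    errors = Counter(want for want, got in zip(intended, identified) if want != got)
    # Counter returns 0 for tones that were never misidentified.
    return {tone: 100.0 * errors[tone] / totals[tone] for tone in sorted(totals)}
```

Tone types that do not occur in the test data are left out of the result rather than reported as zero.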
18
Experiments Evaluation of syllable duration distortion Figure 9: Scores of a paired-comparison test for natural duration among 4 different clustering styles.
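In a paired-comparison test, listeners hear two versions of the same utterance and choose the one that sounds more natural. One simple way to turn the judgments into a score is the percentage of comparisons in which each style was preferred. The slides do not state the scoring rule that was used, so this is only one plausible sketch, and the judgment format is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Judgment = Tuple[str, str, str]  # (style A, style B, preferred style)


def preference_scores(judgments: Iterable[Judgment]) -> Dict[str, float]:
    """Percentage of its comparisons in which each style was preferred."""
    wins = defaultdict(int)
    appearances = defaultdict(int)
    for style_a, style_b, preferred in judgments:
        if preferred not in (style_a, style_b):
            raise ValueError("preferred style must be one of the compared pair")
        appearances[style_a] += 1
        appearances[style_b] += 1
        wins[preferred] += 1
    return {s: 100.0 * wins[s] / appearances[s] for s in sorted(appearances)}
```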
19
Examples of synthesized speech
Female voice. Methods: HMM (corpus sizes of 100, 500, and 2500 training utterances), VAJA (Unit Selection), and Analysis-Synthesis speech. Examples 1, 2, 3.
Female voice. Method: HMM. Tree structures (a), (b), (c), (d); with the tone question set added: (e), (f), (g), (h).
20
Conclusions
This paper analyzed tree-based context clustering in an HMM-based Thai speech synthesis system. Four decision-tree structures were designed according to tone groups and tone types to improve the tone correctness of the synthesized speech. The results show that the tone-separated tree structures significantly reduce the tone error percentage of the synthesized speech compared with the single binary tree structure. Using contextual tone information at the syllable level improves tone correctness for all decision-tree structures. Some syllable duration distortion appears when the simple tone-separated tree context clustering is used with a small amount of training data, but it is relieved by using the constancy-based-tone-separated or the trend-based-tone-separated tree context clustering. Future work will study the tone correctness of the average-voice-based speech model and intonation analysis.