Modeling and Generation of Accentual Phrase F0 Contours Based on Discrete HMMs Synchronized at Mora-Unit Transitions


1 Modeling and Generation of Accentual Phrase F0 Contours Based on Discrete HMMs Synchronized at Mora-Unit Transitions
Atsuhiro Sakurai (Texas Instruments Japan, Tsukuba R&D Center)
Koji Iwano (currently with Tokyo Institute of Technology, Japan)
Keikichi Hirose (Dep. of Frontier Eng., The University of Tokyo, Japan)

2 Introduction to Corpus-Based Intonation Modeling
Traditional approach: rules derived from linguistic expertise
– Human-dependent (too complicated and not satisfactory, because the phenomena involved are not completely understood)
Corpus-based approach: modeling derived from statistical analysis of speech corpora
– Automatic (potential to improve as better speech corpora become available)

3 Background
HMMs are widely used in speech recognition, and fast learning algorithms exist
Macroscopic discrete HMMs associated with accentual phrases can store information such as accent type and prosodic structure
Morae are extremely important for describing Japanese intonation: sequences of high and low morae can characterize accent types

4 Overview of the Method
Definition of HMM and alphabet:
– Accent types modeled by discrete HMMs
– 2-code mora F0 contour alphabet used as output symbols
– State transitions synchronized with mora transitions
Classification of HMMs and training:
– HMMs classified according to linguistic attributes
– Training by the usual forward-backward (FB) algorithm
Generation of F0 contours:
– Best sequence of symbols generated by a modified Viterbi algorithm

5 The Mora-F0 Alphabet
Two codes: stylized mora F0 contours and mora-to-mora ΔF0 (34 symbols each)
Obtained by LBG clustering from a 500-sentence database (ATR continuous speech database, speaker MHT)
The entire database is labeled using the 2-code symbols.
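The LBG clustering mentioned on this slide can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `lbg`, the additive split offset `eps`, and the fixed number of refinement passes are assumptions made here for illustration, and plain binary splitting only reaches power-of-two codebook sizes, so the 34-symbol codebooks on the slide would need a variant of this procedure.

```python
def lbg(vectors, size, eps=0.01, iters=20):
    """Grow a codebook by repeated splitting and nearest-centroid refinement.

    vectors: list of equal-length lists of floats (e.g. stylized F0 shapes)
    size:    target codebook size (assumed to be a power of two here)
    """
    dim = len(vectors[0])

    def mean(vs):
        return [sum(v[d] for v in vs) / len(vs) for d in range(dim)]

    def nearest(v, codebook):
        # index of the centroid with the smallest squared Euclidean distance
        return min(range(len(codebook)),
                   key=lambda c: sum((x - y) ** 2 for x, y in zip(v, codebook[c])))

    codebook = [mean(vectors)]  # start from the global centroid
    while len(codebook) < size:
        # split every centroid into two slightly shifted copies
        codebook = [[x + s for x in c] for c in codebook for s in (eps, -eps)]
        for _ in range(iters):
            cells = [[] for _ in codebook]
            for v in vectors:
                cells[nearest(v, codebook)].append(v)
            # empty cells keep their previous centroid
            codebook = [mean(cell) if cell else c
                        for cell, c in zip(cells, codebook)]
    return codebook


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
codebook = lbg(points, size=2)
```

Once a codebook exists, each mora in the database can be labeled with the index of its nearest centroid, which is the role the 2-code symbols play here.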

6 The Accentual Phrase HMM
[Figure: an accentual phrase HMM, with state transitions aligned to mora transitions within the accentual phrase]
Accentual phrases are classified according to:
– Accent type
– Position of accentual phrase in the sentence
– (Optional: number of morae, part-of-speech, syntactic structure)

7 Example: 'Karewa Tookyookara kuru.' (He comes from Tokyo.)
Model  Accent type  Position  Label sequence
M1     1            1         [],[],[]
M2     0            2         [],[],[],[],[],[]
M3     1            3         [],[]
Label format: shape_1 ΔF0_1, shape_2 ΔF0_2

8 HMM Topologies
(a) Accent types 0 and 1
(b) Other accent types

9 Training Database
ATR Continuous Speech Database (500 sentences, speaker MHT)
Segmented into morae and accentual phrases
Mora labels using the mora-F0 alphabet: shape (stylized F0 contour), mora ΔF0
Accentual phrase labels: number of morae, position in the sentence

10 Output Code Generation
How to use the HMM for synthesis?
A) Recognition: 1 output sequence → likelihood, best path
B) Synthesis: best output sequence, best path

11 Intonation Modeling Using HMM
Viterbi Search for the Recognition Problem:
for t = 2, 3, ..., T
  for i_t = 1, 2, ..., S
    $$D_{\min}(t, i_t) = \min_{i_{t-1}} \{ D_{\min}(t-1, i_{t-1}) + [-\log a(i_t \mid i_{t-1})] + [-\log b(y(t) \mid i_t)] \}$$
    $$\psi(t, i_t) = \arg\min_{i_{t-1}} \{ D_{\min}(t-1, i_{t-1}) + [-\log a(i_t \mid i_{t-1})] + [-\log b(y(t) \mid i_t)] \}$$
  next i_t
next t
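The recognition recursion above can be written out as a short sketch. The function name `viterbi_cost`, the initial distribution `pi` (the slide starts at t = 2 and does not show the initialization), and the 0-based state and symbol indices are assumptions made for illustration.

```python
import math


def viterbi_cost(a, b, pi, y):
    """Minimum negative-log cost and best state path for observed symbols y.

    a[i][j]: transition probability from state i to state j
    b[j][k]: probability of emitting symbol k in state j
    pi[j]:   initial state probability (assumed; not shown on the slide)
    """
    def nlog(p):
        return -math.log(p) if p > 0 else math.inf

    S, T = len(a), len(y)
    D = [[math.inf] * S for _ in range(T)]   # D_min(t, i_t)
    psi = [[0] * S for _ in range(T)]        # back-pointers
    for j in range(S):
        D[0][j] = nlog(pi[j]) + nlog(b[j][y[0]])
    for t in range(1, T):
        for j in range(S):
            costs = [D[t - 1][i] + nlog(a[i][j]) for i in range(S)]
            i_best = min(range(S), key=costs.__getitem__)
            psi[t][j] = i_best
            D[t][j] = costs[i_best] + nlog(b[j][y[t]])
    # backtrack from the cheapest final state
    j = min(range(S), key=D[T - 1].__getitem__)
    total, path = D[T - 1][j], [j]
    for t in range(T - 1, 0, -1):
        j = psi[t][j]
        path.append(j)
    path.reverse()
    return total, path


a = [[0.5, 0.5], [0.0, 1.0]]   # toy left-to-right model, illustrative values
b = [[0.9, 0.1], [0.1, 0.9]]
total, path = viterbi_cost(a, b, pi=[1.0, 0.0], y=[0, 0, 1])
# path == [0, 0, 1]
```

Because the emission term does not depend on the previous state, it is added after the minimization; this gives the same result as the recursion on the slide.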

12 Intonation Modeling Using HMM
Modified Viterbi Search for the Synthesis Problem:
for t = 2, 3, ..., T
  for i_t = 1, 2, ..., S
    $$D_{\min}(t, i_t) = \min_{i_{t-1}} \{ D_{\min}(t-1, i_{t-1}) + [-\log a(i_t \mid i_{t-1})] + [-\log b(y_{\max}(t) \mid i_t)] \}$$
    $$\psi(t, i_t) = \arg\min_{i_{t-1}} \{ D_{\min}(t-1, i_{t-1}) + [-\log a(i_t \mid i_{t-1})] + [-\log b(y_{\max}(t) \mid i_t)] \}$$
  next i_t
next t
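A sketch of the synthesis variant follows, under the assumption that y_max(t) denotes the most probable output symbol of the state being entered. That reading, the function name `synthesize`, the initial distribution `pi`, and the 0-based indices are assumptions, not details confirmed by the slide. With state transitions synchronized to mora transitions, T would be the number of morae in the accentual phrase.

```python
import math


def synthesize(a, b, pi, T):
    """Best state path of length T and the symbols it emits.

    Each state contributes the cost of its most probable symbol
    (the assumed meaning of y_max on the slide).
    """
    def nlog(p):
        return -math.log(p) if p > 0 else math.inf

    S = len(a)
    # most probable symbol per state, and its negative-log cost
    y_max = [max(range(len(b[j])), key=b[j].__getitem__) for j in range(S)]
    emit = [nlog(b[j][y_max[j]]) for j in range(S)]

    D = [[math.inf] * S for _ in range(T)]
    psi = [[0] * S for _ in range(T)]
    for j in range(S):
        D[0][j] = nlog(pi[j]) + emit[j]
    for t in range(1, T):
        for j in range(S):
            costs = [D[t - 1][i] + nlog(a[i][j]) for i in range(S)]
            i_best = min(range(S), key=costs.__getitem__)
            psi[t][j] = i_best
            D[t][j] = costs[i_best] + emit[j]
    j = min(range(S), key=D[T - 1].__getitem__)
    total, path = D[T - 1][j], [j]
    for t in range(T - 1, 0, -1):
        j = psi[t][j]
        path.append(j)
    path.reverse()
    return total, path, [y_max[s] for s in path]


a = [[0.3, 0.7], [0.0, 1.0]]   # illustrative values only
b = [[0.2, 0.8], [0.7, 0.3]]
total, path, symbols = synthesize(a, b, pi=[1.0, 0.0], T=3)
# path == [0, 1, 1], symbols == [1, 0, 0]
```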

13 Use of Bigram Probabilities
for t = 2, 3, ..., T
  for i_t = 1, 2, ..., S
    $$D_{\min}(t, i_t) = \min_{i_{t-1}} \{ D_{\min}(t-1, i_{t-1}) + [-\log a(i_t \mid i_{t-1})] + \max_k \{ [-\log b(y(t) \mid y(t-1), i_t)] \} \}$$
    $$\psi(t, i_t) = \arg\min_{i_{t-1}} \{ D_{\min}(t-1, i_{t-1}) + [-\log a(i_t \mid i_{t-1})] + \max_k \{ [-\log b(y(t) \mid y(t-1), i_t)] \} \}$$
  next i_t
next t
k = 1, ..., K (dimension of y)
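One way to make output probabilities depend on the previous symbol is to search jointly over (state, symbol) pairs. The sketch below illustrates that idea only; it does not reproduce the per-step selection rule written on the slide, and the names `synthesize_bigram`, `b0` (first-step emission table), `b2` (bigram emission table) and `pi` are all hypothetical.

```python
import math


def synthesize_bigram(a, b0, b2, pi, T):
    """Joint search over (state, symbol) pairs with bigram emissions.

    a[i][j]:       transition probability from state i to state j
    b0[j][k]:      probability of symbol k in state j at the first step
    b2[j][kp][k]:  probability of symbol k in state j after symbol kp
    """
    def nlog(p):
        return -math.log(p) if p > 0 else math.inf

    S, K = len(a), len(b0[0])
    D = {(j, k): nlog(pi[j]) + nlog(b0[j][k])
         for j in range(S) for k in range(K)}
    back = []
    for _ in range(1, T):
        new_D, ptr = {}, {}
        for j in range(S):
            for k in range(K):
                # cheapest predecessor (state, symbol) pair for (j, k)
                prev, cost = min(
                    ((p, d + nlog(a[p[0]][j]) + nlog(b2[j][p[1]][k]))
                     for p, d in D.items()),
                    key=lambda pc: pc[1],
                )
                new_D[(j, k)], ptr[(j, k)] = cost, prev
        back.append(ptr)
        D = new_D
    end = min(D, key=D.get)
    total, seq = D[end], [end]
    for ptr in reversed(back):
        seq.append(ptr[seq[-1]])
    seq.reverse()
    return total, [j for j, _ in seq], [k for _, k in seq]


# single-state toy model: the bigram table favors alternating symbols
total, states, symbols = synthesize_bigram(
    a=[[1.0]], b0=[[0.6, 0.4]], b2=[[[0.1, 0.9], [0.8, 0.2]]], pi=[1.0], T=3)
# symbols == [0, 1, 0]
```

The joint search costs O(T S² K²) rather than O(T S²), which is the price of tracking the previous symbol exactly.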

14 Accent Type Modeling Using HMM

15 Phrase Boundary Level Modeling Using HMM
J-TOBI B.I.  Pause (Y/N)  Bound. Level
3            Y            1
3            N            2
2            N            3

16 The Effect of Bigrams
[Figure panels: PH1_0.original, PH1_0.bigram; PH1_1.original, PH1_1.bigram; PH1_2.original, PH1_2.bigram]

17 Comments
We presented a novel approach to intonation modeling for TTS synthesis based on discrete mora-synchronous HMMs.
From now on, more features should be included in the HMM modeling (phonetic context, part-of-speech, etc.), and the approach should be compared to rule-based methods.
Training data scarcity is a major problem to overcome (by feature clustering, an F0 contour generation model, etc.)

18 Hidden Markov Models (HMM)
A Hidden Markov Model (HMM) is a finite-state automaton in which both state transitions and outputs are stochastic. It changes to a new state each time period, generating a new vector according to the output distribution of that state.
[Figure: four-state HMM (states 1, 2, 3, 4). Symbols: 1, 2, ..., K. Transitions: a11, a12, a13, a22, a23, a33, a34, a44. Output probabilities: b(1|1)~b(K|1), b(1|2)~b(K|2), b(1|3)~b(K|3), b(1|4)~b(K|4)]
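The definition above can be made concrete by sampling from a small discrete HMM. The sketch assumes the four-state topology shown in the figure (only the transitions a11, a12, a13, a22, a23, a33, a34, a44 are non-zero); the probability values, the function name `sample_hmm`, and the 0-based indices are illustrative assumptions.

```python
import random


def sample_hmm(a, b, T, start=0, seed=None):
    """Draw T (state, symbol) pairs from a discrete HMM.

    a[i][j]: transition probability from state i to state j
    b[i][k]: probability of emitting symbol k in state i
    """
    rng = random.Random(seed)
    state, states, symbols = start, [], []
    for _ in range(T):
        states.append(state)
        # emit a symbol from the current state's output distribution
        symbols.append(rng.choices(range(len(b[state])), weights=b[state])[0])
        # then move to the next state
        state = rng.choices(range(len(a[state])), weights=a[state])[0]
    return states, symbols


# Illustrative probabilities; zeros mark transitions absent from the figure.
a = [[0.5, 0.3, 0.2, 0.0],
     [0.0, 0.6, 0.4, 0.0],
     [0.0, 0.0, 0.7, 0.3],
     [0.0, 0.0, 0.0, 1.0]]
b = [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1],
     [0.2, 0.2, 0.6],
     [0.3, 0.3, 0.4]]
states, symbols = sample_hmm(a, b, T=10, seed=0)
```

Only the symbol sequence would be observable in practice; the state sequence is the hidden part that the Viterbi search recovers.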

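19 Step 1: Database Creation
Use the ATR continuous speech database (500 sentences, speaker MHT)
Segment into mora units
Assign mora labels
Extract F0 patterns
Cluster with the LBG method
Assign cluster classes to the entire database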

20 Introduction of Bigrams
for t = 2, 3, ..., T
  for i_t = 1, 2, ..., S
    $$D_{\min}(t, i_t) = \min_{i_{t-1}} \{ D_{\min}(t-1, i_{t-1}) + [-\log a(i_t \mid i_{t-1})] + \max_k \{ [-\log b(y(t) \mid y(t-1), i_t)] \} \}$$
    $$\psi(t, i_t) = \arg\min_{i_{t-1}} \{ D_{\min}(t-1, i_{t-1}) + [-\log a(i_t \mid i_{t-1})] + \max_k \{ [-\log b(y(t) \mid y(t-1), i_t)] \} \}$$
  next i_t
next t
k = 1, ..., K (dimension of y)

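21 Discussion and Future Prospects
Training data is scarce
Further work is needed for integration into a TTS system
Take other linguistic information into account (phonemes, number of morae, part of speech, etc.)
Devise ways to overcome the data shortage (clustering, etc.)
Study how models should be connected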


