Modeling and Generation of Accentual Phrase F 0 Contours Based on Discrete HMMs Synchronized at Mora-Unit Transitions Atsuhiro Sakurai (Texas Instruments.

Modeling and Generation of Accentual Phrase F0 Contours Based on Discrete HMMs Synchronized at Mora-Unit Transitions
Atsuhiro Sakurai (Texas Instruments Japan, Tsukuba R&D Center)
Koji Iwano (currently with Tokyo Institute of Technology, Japan)
Keikichi Hirose (Dep. of Frontier Eng., The University of Tokyo, Japan)

Introduction to Corpus-Based Intonation Modeling
Traditional approach: rules derived from linguistic expertise
–Human-dependent (too complicated and not satisfactory, because the phenomena involved are not completely understood)
Corpus-based approach: modeling derived from statistical analysis of speech corpora
–Automatic (potential to improve as better speech corpora become available)

Background
HMMs are widely used in speech recognition, and fast learning algorithms exist
Macroscopic discrete HMMs associated with accentual phrases can store information such as accent type and prosodic structure
Morae are extremely important for describing Japanese intonation: sequences of high and low morae can characterize accent types

Overview of the Method
Definition of HMM and alphabet:
–Accent types modeled by discrete HMMs
–2-code mora F0 contour alphabet used as output symbols
–State transitions synchronized with mora transitions
Classification of HMMs and training:
–HMMs classified according to linguistic attributes
–Training by the usual forward-backward (FB) algorithm
Generation of F0 contours:
–Best sequence of symbols generated by a modified Viterbi algorithm

The Mora-F0 Alphabet
Two codes: stylized mora F0 contours and mora-to-mora ΔF0 (34 symbols each)
Obtained by LBG clustering from a 500-sentence database (ATR continuous speech database, speaker MHT)
The entire database is labeled using the 2-code symbols.
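The clustering step can be illustrated with a minimal sketch of LBG-style codebook training for the scalar ΔF0 code. This is not the authors' implementation: the function names, the perturbation factor `eps`, the fixed number of refinement iterations, and the choice to split one cell at a time (the one with the largest distortion, so that a size such as 34 that is not a power of two can be reached) are all assumptions made for illustration.

```python
def nearest(value, codebook):
    """Index of the codeword closest to value (squared-error distance)."""
    return min(range(len(codebook)), key=lambda k: (value - codebook[k]) ** 2)


def lloyd(values, codebook, iters=20):
    """Refine codewords by alternating assignment and centroid update."""
    for _ in range(iters):
        cells = [[] for _ in codebook]
        for v in values:
            cells[nearest(v, codebook)].append(v)
        # An empty cell keeps its previous codeword.
        codebook = [sum(c) / len(c) if c else old
                    for c, old in zip(cells, codebook)]
    return codebook


def lbg_codebook(values, size, eps=0.01):
    """Grow a scalar codebook to `size` entries by repeated splitting.

    Assumption: one cell is split per round (the one with the largest
    total distortion); classic LBG doubles the codebook instead.
    """
    codebook = [sum(values) / len(values)]
    while len(codebook) < size:
        distortion = [0.0] * len(codebook)
        for v in values:
            k = nearest(v, codebook)
            distortion[k] += (v - codebook[k]) ** 2
        worst = max(range(len(codebook)), key=distortion.__getitem__)
        c = codebook.pop(worst)
        delta = eps * max(abs(c), 1.0)
        codebook = lloyd(values, codebook + [c - delta, c + delta])
    return sorted(codebook)


# Toy ΔF0 values (made up for illustration), forming three clear groups.
data = [-2.1, -2.0, -1.9, 0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
codebook = lbg_codebook(data, 3)
label = nearest(1.2, codebook)  # symbol assigned to a new ΔF0 value
```

Labeling the database then amounts to calling `nearest` on every mora; the stylized-contour code would need the same procedure with a vector distance in place of the scalar one.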

The Accentual Phrase HMM
[Figure: an accentual phrase HMM, with state transitions aligned to the mora transitions of the accentual phrase]
Accentual phrases are classified according to:
–Accent type
–Position of accentual phrase in the sentence
–(Optional: number of morae, part-of-speech, syntactic structure)

Example: 'Karewa Tookyookara kuru.' (He comes from Tokyo.)

Model  Phrase        Accent type  Position  Label sequence
M1     Karewa        1            1         [],[],[]
M2     Tookyookara   0            2         [],[],[],[],[],[]
M3     kuru          1            3         [],[]

Each label [] holds the two codes of one mora: [shape1, ΔF01], [shape2, ΔF02], and so on.

HMM Topologies
[Figure: two HMM topologies. (a) Accent types 0 and 1; (b) other accent types]

Training Database
ATR Continuous Speech Database (500 sentences, speaker MHT)
Segmented into morae and accentual phrases
Mora labels using the mora-F0 alphabet: shape (stylized F0 contour), mora ΔF0
Accentual phrase labels: number of morae, position in the sentence

Output Code Generation
How to use the HMM for synthesis?
A) Recognition: 1 output sequence is given; the HMM yields the likelihood and the best path
B) Synthesis: no output sequence is given; the HMM yields the best output sequence and the best path

Intonation Modeling Using HMM
Viterbi search for the recognition problem:

for $t = 2, 3, \ldots, T$
  for $i_t = 1, 2, \ldots, S$
    $$D_{\min}(t, i_t) = \min_{i_{t-1}} \left\{ D_{\min}(t-1, i_{t-1}) + [-\log a(i_t \mid i_{t-1})] + [-\log b(y(t) \mid i_t)] \right\}$$
    $$\psi(t, i_t) = \arg\min_{i_{t-1}} \left\{ D_{\min}(t-1, i_{t-1}) + [-\log a(i_t \mid i_{t-1})] + [-\log b(y(t) \mid i_t)] \right\}$$
  next $i_t$
next $t$
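The recursion for the recognition problem can be sketched in Python as follows. The slide only gives the loop for t = 2, ..., T, so the initialization (the path starts in the first state) and the termination (the cheapest final state is chosen) are assumptions, as are the function names and the toy parameters.

```python
import math


def neg_log(p):
    """-log p, with an infinite cost for impossible events."""
    return -math.log(p) if p > 0 else math.inf


def viterbi(a, b, y, start=0):
    """Best state path for an observed symbol sequence y.

    a[i][j]: probability of moving from state i to state j
    b[j][k]: probability of emitting symbol k in state j
    Returns (path, accumulated cost D_min at the end of the path).
    """
    S, T = len(a), len(y)
    D = [[math.inf] * S for _ in range(T)]
    psi = [[0] * S for _ in range(T)]
    D[0][start] = neg_log(b[start][y[0]])  # assumed initialization
    for t in range(1, T):
        for j in range(S):
            prev = min(range(S), key=lambda i: D[t - 1][i] + neg_log(a[i][j]))
            psi[t][j] = prev
            D[t][j] = D[t - 1][prev] + neg_log(a[prev][j]) + neg_log(b[j][y[t]])
    # Backtrack from the cheapest final state (assumed termination rule).
    state = min(range(S), key=D[T - 1].__getitem__)
    cost = D[T - 1][state]
    path = [state]
    for t in range(T - 1, 0, -1):
        state = psi[t][state]
        path.append(state)
    return path[::-1], cost


# Toy two-state left-to-right model with two symbols (illustrative values).
a = [[0.5, 0.5], [0.0, 1.0]]
b = [[0.9, 0.1], [0.2, 0.8]]
path, cost = viterbi(a, b, [0, 0, 1, 1])
```

Working in negative log probabilities turns products into sums, which is why the recursion minimizes an accumulated distance rather than maximizing a probability.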

Intonation Modeling Using HMM
Modified Viterbi search for the synthesis problem:

for $t = 2, 3, \ldots, T$
  for $i_t = 1, 2, \ldots, S$
    $$D_{\min}(t, i_t) = \min_{i_{t-1}} \left\{ D_{\min}(t-1, i_{t-1}) + [-\log a(i_t \mid i_{t-1})] + [-\log b(y_{\max}(t) \mid i_t)] \right\}$$
    $$\psi(t, i_t) = \arg\min_{i_{t-1}} \left\{ D_{\min}(t-1, i_{t-1}) + [-\log a(i_t \mid i_{t-1})] + [-\log b(y_{\max}(t) \mid i_t)] \right\}$$
  next $i_t$
next $t$
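A minimal sketch of the synthesis search follows. The slide does not define y_max(t); here it is read as the symbol with the highest output probability in state i_t, which is an assumption. The initialization in the first state, the choice of the cheapest final state, and all names and numbers are likewise illustrative.

```python
import math


def neg_log(p):
    """-log p, with an infinite cost for impossible events."""
    return -math.log(p) if p > 0 else math.inf


def generate_symbols(a, b, T, start=0):
    """Generate T output symbols along the best state path.

    Assumption: y_max for a state is its most probable symbol, so the
    output cost depends only on the state being entered.
    Returns (symbols, path).
    """
    S = len(a)
    y_max = [max(range(len(b[j])), key=b[j].__getitem__) for j in range(S)]
    emit = [neg_log(b[j][y_max[j]]) for j in range(S)]
    D = [[math.inf] * S for _ in range(T)]
    psi = [[0] * S for _ in range(T)]
    D[0][start] = emit[start]  # assumed initialization
    for t in range(1, T):
        for j in range(S):
            prev = min(range(S), key=lambda i: D[t - 1][i] + neg_log(a[i][j]))
            psi[t][j] = prev
            D[t][j] = D[t - 1][prev] + neg_log(a[prev][j]) + emit[j]
    state = min(range(S), key=D[T - 1].__getitem__)
    path = [state]
    for t in range(T - 1, 0, -1):
        state = psi[t][state]
        path.append(state)
    path.reverse()
    return [y_max[s] for s in path], path


# Toy model: T would be the number of morae in the accentual phrase.
a = [[0.3, 0.7], [0.0, 1.0]]
b = [[0.1, 0.9], [0.6, 0.4]]
symbols, path = generate_symbols(a, b, 3)
```

Because state transitions are synchronized with mora transitions, one symbol is produced per mora, and the resulting symbol sequence can be mapped back to stylized F0 shapes through the codebook.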

Use of Bigram Probabilities

for $t = 2, 3, \ldots, T$
  for $i_t = 1, 2, \ldots, S$
    $$D_{\min}(t, i_t) = \min_{i_{t-1}} \left\{ D_{\min}(t-1, i_{t-1}) + [-\log a(i_t \mid i_{t-1})] + \max_k \{ [-\log b(y_k(t) \mid y(t-1), i_t)] \} \right\}$$
    $$\psi(t, i_t) = \arg\min_{i_{t-1}} \left\{ D_{\min}(t-1, i_{t-1}) + [-\log a(i_t \mid i_{t-1})] + \max_k \{ [-\log b(y_k(t) \mid y(t-1), i_t)] \} \right\}$$
  next $i_t$
next $t$

$k = 1, \ldots, K$ (dimension of $y$)
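One way to read the bigram recursion is sketched below. Several points are assumptions rather than facts from the slides: the inner selection over k is taken to pick the most probable symbol given the previous symbol and the current state (that is, the smallest -log b), the previous symbol y(t-1) is the one stored for the predecessor state, the first symbol comes from a separate unigram distribution `b_first`, and the path starts in the first state. All names and numbers are hypothetical.

```python
import math


def neg_log(p):
    """-log p, with an infinite cost for impossible events."""
    return -math.log(p) if p > 0 else math.inf


def best_symbol(probs):
    """Index of the most probable symbol in a distribution."""
    return max(range(len(probs)), key=probs.__getitem__)


def generate_with_bigrams(a, b_first, b2, T, start=0):
    """Symbol generation with output bigrams.

    a[i][j]      : transition probability from state i to state j
    b_first[k]   : probability of symbol k at t = 1 (assumed unigram)
    b2[j][p][k]  : probability of symbol k in state j after symbol p
    Returns (symbols, path).
    """
    S = len(a)
    D = [[math.inf] * S for _ in range(T)]
    psi = [[0] * S for _ in range(T)]
    sym = [[0] * S for _ in range(T)]
    sym[0][start] = best_symbol(b_first)
    D[0][start] = neg_log(b_first[sym[0][start]])
    for t in range(1, T):
        for j in range(S):
            for i in range(S):
                if math.isinf(D[t - 1][i]):
                    continue  # predecessor state not reachable
                dist = b2[j][sym[t - 1][i]]
                k = best_symbol(dist)
                cost = D[t - 1][i] + neg_log(a[i][j]) + neg_log(dist[k])
                if cost < D[t][j]:
                    D[t][j], psi[t][j], sym[t][j] = cost, i, k
    state = min(range(S), key=D[T - 1].__getitem__)
    path = [state]
    for t in range(T - 1, 0, -1):
        state = psi[t][state]
        path.append(state)
    path.reverse()
    return [sym[t][s] for t, s in enumerate(path)], path


# Toy model: two states, two symbols (illustrative values only).
a = [[0.5, 0.5], [0.0, 1.0]]
b_first = [0.2, 0.8]
b2 = [
    [[0.6, 0.4], [0.9, 0.1]],  # state 0, after symbol 0 / symbol 1
    [[0.3, 0.7], [0.4, 0.6]],  # state 1, after symbol 0 / symbol 1
]
symbols, path = generate_with_bigrams(a, b_first, b2, 3)
```

This search is greedy in the symbol dimension: each state keeps only one previous symbol, so it is not guaranteed to find the globally best symbol sequence under the bigram model.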

Accent Type Modeling Using HMM

Phrase Boundary Level Modeling Using HMM

J-ToBI B.I.:     3 3 2 3 3 2
Pause (Y/N):     Y N N Y N N
Boundary level:  1 2 3 1 2 3

The Effect of Bigrams
[Figure: generated F0 contours without and with bigrams. Panels: PH1_0.original, PH1_0.bigram; PH1_1.original, PH1_1.bigram; PH1_2.original, PH1_2.bigram]

Comments
We presented a novel approach to intonation modeling for TTS synthesis based on discrete mora-synchronous HMMs.
From now on, more features should be included in the HMM modeling (phonetic context, part-of-speech, etc.), and the approach should be compared to rule-based methods.
Training data scarcity is a major problem to overcome (by feature clustering, an F0 contour generation model, etc.)

Hidden Markov Models (HMM)
A Hidden Markov Model (HMM) is a finite-state automaton in which both state transitions and outputs are stochastic. It changes to a new state each time period, generating a new vector according to the output distribution of that state.
[Figure: four-state left-to-right HMM (states 1 to 4) over symbols 1, 2, ..., K, with transition probabilities a11, a12, a13, a22, a23, a33, a34, a44 and output probabilities b(1|i) ~ b(K|i) for each state i = 1, ..., 4]
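The generative behavior described above can be illustrated with a short sampling sketch. The transition structure follows the arcs shown on the slide (a11, a12, a13, a22, a23, a33, a34, a44), but the probability values, the three-symbol alphabet, the start in the first state, and the function name are invented for illustration.

```python
import random


def sample_hmm(a, b, T, seed=0, start=0):
    """Draw T (state, symbol) pairs from a discrete HMM.

    At each time period the current state emits one symbol from its
    output distribution, then the model moves to the next state.
    """
    rng = random.Random(seed)
    state, states, symbols = start, [], []
    for _ in range(T):
        states.append(state)
        symbols.append(rng.choices(range(len(b[state])), weights=b[state])[0])
        state = rng.choices(range(len(a)), weights=a[state])[0]
    return states, symbols


# Four-state left-to-right model; zeros mark arcs absent from the figure.
A = [
    [0.5, 0.3, 0.2, 0.0],  # a11, a12, a13
    [0.0, 0.6, 0.4, 0.0],  # a22, a23
    [0.0, 0.0, 0.7, 0.3],  # a33, a34
    [0.0, 0.0, 0.0, 1.0],  # a44
]
B = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.4, 0.3, 0.3],
]
states, symbols = sample_hmm(A, B, 10)
```

Only the symbol sequence would be visible to an observer; the state sequence is the hidden part that the Viterbi search tries to recover.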

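Step 1: Database Creation
Use the ATR continuous speech database (500 sentences, speaker MHT)
Segment into mora units
Assign mora labels
Extract F0 patterns
Cluster with the LBG method
Assign cluster classes to the entire database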

Introduction of Bigrams

for $t = 2, 3, \ldots, T$
  for $i_t = 1, 2, \ldots, S$
    $$D_{\min}(t, i_t) = \min_{i_{t-1}} \left\{ D_{\min}(t-1, i_{t-1}) + [-\log a(i_t \mid i_{t-1})] + \max_k \{ [-\log b(y_k(t) \mid y(t-1), i_t)] \} \right\}$$
    $$\psi(t, i_t) = \arg\min_{i_{t-1}} \left\{ D_{\min}(t-1, i_{t-1}) + [-\log a(i_t \mid i_{t-1})] + \max_k \{ [-\log b(y_k(t) \mid y(t-1), i_t)] \} \right\}$$
  next $i_t$
next $t$

$k = 1, \ldots, K$ (dimension of $y$)

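Discussion and Future Work
Training data is scarce
Further work is needed before the method can be integrated into a TTS system
Take other linguistic information into account (phonemes, number of morae, part-of-speech, etc.)
Devise ways to overcome the lack of data (clustering, etc.)
Investigate how to concatenate the models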
