
Chang Gung University, Multimedia Signal Processing Laboratory. HTK tutorial. Speaker: ricer. Date: 2005.08.26.


1 HTK tutorial. Speaker: ricer. Date: 2005.08.26

2 Outline
Data preparation
–Corpora: labels and speech data
Three models for automatic speech recognition
–Acoustic model: feature extraction, HMM (Hidden Markov Model)
–Pronunciation dictionary
–Search network: free-syllable net, large vocabulary
Recognizer evaluation

3 Data preparation
We have
–Wave files and the corresponding labels
–File names such as MD01M0P0000, whose fields encode: Mandarin, year, female, serial number
We want
–A list of all files (function: file2list)
J:/sp12/wav/1.wav, …*.lab
J:/sp12/wav/2.wav, …*.lab
–All labels in a single file, the Master Label File (function: lab2mlf)
#!MLF!#
"*/1.lab"
sia
tian
.
"*/2.lab"
sia
yu
.
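The lab2mlf step can be pictured with a short Python sketch. It is not the lab's lab2mlf function; the name `build_mlf` and the in-memory dictionary are assumptions made for illustration. It only shows the Master Label File layout: a `#!MLF!#` header, then for each utterance a quoted pattern, one label per line, and a closing period.

```python
def build_mlf(transcripts):
    """Return Master Label File text for {utterance_id: [label, ...]}.

    `transcripts` is a hypothetical in-memory stand-in for the per-file
    .lab contents; the real lab2mlf function reads them from disk.
    """
    lines = ["#!MLF!#"]
    for utt_id, labels in transcripts.items():
        lines.append(f'"*/{utt_id}.lab"')  # pattern matching any directory
        lines.extend(labels)               # one label per line
        lines.append(".")                  # a single period ends the entry
    return "\n".join(lines) + "\n"


mlf_text = build_mlf({"1": ["sia", "tian"], "2": ["sia", "yu"]})
print(mlf_text)
```

Using the `*/` pattern keeps the MLF independent of where the wave files are stored.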

4 Mandarin speech waveform: 夏天 (sia4 tieN1, "summer"), 下雨 (sia4 Y3, "it rains")
Acoustic units: 夏 sia4, 天 tieN1, 下 sia4, 雨 Y3
Transcription: xxx1000 夏天下雨 sia4_tieN1_sia4_Y3
Toneless syllables: sia sp tieN sia Y
Toneless phones: s i a sp t i e N s i a Y
Within-syllable right-context-dependent: s+i i+a a sp t+i i+e e+N N s+i i+a a Y
Within-syllable left- and right-context-dependent: s+i s-i+a i-a sp t+i t-i+e i-e+N e-N s+i s-i+a i-a Y

5 Master Label Files and model lists (function: hled)

Tonal syllable (model list: syl_tone.mod)
#!MLF!#
"*/1.lab"
sia4
tian1
.
"*/2.lab"
sia4
yu3
.

Mono-syllable (model list: syl_sp.mod)
#!MLF!#
"*/1.lab"
sia
sp
tian
.
"*/2.lab"
sia
sp
yu
.

Mono-phone (model list: phn.mod)
#!MLF!#
"*/1.lab"
s
i
a
sp
t
i
a
n
.
"*/2.lab"
s
i
a
sp
y
u
.

Tri-phone (model list: tri.mod)
#!MLF!#
"*/1.lab"
s+i
s-i+a
i-a
sp
t+i
t-i+a
i-a+n
a-n
.
"*/2.lab"
s+i
s-i+a
i-a
sp
y+u
y-u
.

Model list contents:
Tonal syllable   Mono-syllable   Mono-phone   Tri-phone
sia4             sia             s            s+i
tian1            tian            i            s-i+a
Yu3              yu              a            i-a
                 sp              t            t+i
…                …               …            …
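The mono-phone to tri-phone step follows a simple rule that can be sketched in Python. The helper names (`to_triphones`, `expand_utterance`) and the small lexicon are illustrative assumptions, not the lab's hled code. Context is kept inside each syllable: the first phone gets only a right context, the last only a left context, and a single-unit entry such as sp is left unchanged.

```python
def to_triphones(phones):
    """Expand one syllable's phones into within-syllable tri-phone labels."""
    labels = []
    last = len(phones) - 1
    for k, phone in enumerate(phones):
        left = f"{phones[k - 1]}-" if k > 0 else ""      # left context, if any
        right = f"+{phones[k + 1]}" if k < last else ""  # right context, if any
        labels.append(f"{left}{phone}{right}")
    return labels


def expand_utterance(syllables, lexicon):
    """Concatenate the tri-phone labels of each syllable in order."""
    labels = []
    for syllable in syllables:
        labels.extend(to_triphones(lexicon[syllable]))
    return labels


# Hypothetical lexicon mirroring the entries used on the slides.
lexicon = {
    "sia": ["s", "i", "a"],
    "tian": ["t", "i", "a", "n"],
    "yu": ["y", "u"],
    "sp": ["sp"],
}
print(expand_utterance(["sia", "sp", "tian"], lexicon))
```

Because no context crosses a syllable boundary, the short pause sp never appears as a context of its neighbours.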

6 Data preparation
hled converts the mono-phone MLF into the tri-phone MLF:
hled3("syl.mlf", "ex.led", "syl2tri.dic", "tri.mlf", "tri.mod")

7 Feature extraction
Mel-Frequency Cepstral Coefficients (MFCC). See vip/eda.cfg:
NATURALREADORDER = TRUE
SOURCEFORMAT = WAV
TARGETKIND = MFCC_E_D_A
TARGETRATE = 100000.0
WINDOWSIZE = 200000.0
USEHAMMING = TRUE
PREEMCOEF = 0.97
NUMCHANS = 26
NUMCEPS = 12
ENORMALISE = TRUE
DELTAWINDOW = 2
ACCWINDOW = 2
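A few quantities follow directly from this configuration, and the Python sketch below works them out. HTK expresses TARGETRATE and WINDOWSIZE in units of 100 ns, so the values above correspond to a 10 ms frame shift and a 20 ms analysis window. The helper names are assumptions, and the dimension calculation handles only the E, 0, D and A qualifiers.

```python
UNITS_PER_MS = 10_000  # HTK time values are given in units of 100 ns

# Subset of vip/eda.cfg needed for the arithmetic below.
CONFIG = {
    "TARGETKIND": "MFCC_E_D_A",
    "TARGETRATE": 100000.0,
    "WINDOWSIZE": 200000.0,
    "NUMCEPS": 12,
}


def frame_shift_ms(cfg):
    """Frame shift in milliseconds."""
    return cfg["TARGETRATE"] / UNITS_PER_MS


def window_ms(cfg):
    """Analysis window length in milliseconds."""
    return cfg["WINDOWSIZE"] / UNITS_PER_MS


def feature_dimension(cfg):
    """Vector size implied by TARGETKIND (E, 0, D and A qualifiers only)."""
    _base, *qualifiers = cfg["TARGETKIND"].split("_")
    static = cfg["NUMCEPS"] + sum(q in ("E", "0") for q in qualifiers)
    blocks = 1 + sum(q in ("D", "A") for q in qualifiers)  # static, delta, accel
    return static * blocks


print(frame_shift_ms(CONFIG), window_ms(CONFIG), feature_dimension(CONFIG))
```

The result, 13 static coefficients times three blocks, gives the 39-dimensional vectors used by the HMM prototype.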

8 Creating mono-phone HMM
See vip/3n3s1m.pro: a left-to-right prototype with three emitting states (state1, state2, state3), transitions a01, a12, a23, a34 and self-loops a11, a22, a33.
Transition matrix:
0.000e+0 1.000e+0 0.000e+0 0.000e+0 0.000e+0
0.000e+0 5.000e-1 5.000e-1 0.000e+0 0.000e+0
0.000e+0 0.000e+0 5.000e-1 5.000e-1 0.000e+0
0.000e+0 0.000e+0 0.000e+0 5.000e-1 5.000e-1
0.000e+0 0.000e+0 0.000e+0 0.000e+0 0.000e+0
Prototype definition, numeric fields only:
~o 39 5 3 13 13 13    (vector size 39; 5 states; 3 streams of width 13)
2 3 1.000000e+000 1.000000e+000 1.000000e+000    (state 2; 3 stream weights)
1 13 0 0 0 0 0 0 0 0 0 0 0 0 0    (stream 1 mean)
13 1 1 1 1 1 1 1 1 1 1 1 1 1    (stream 1 variance)
2 13 0 0 0 0 0 0 0 0 0 0 0 0 0    (stream 2 mean)
13 1 1 1 1 1 1 1 1 1 1 1 1 1    (stream 2 variance)
3 13 0 0 0 0 0 0 0 0 0 0 0 0 0    (stream 3 mean)
13 1 1 1 1 1 1 1 1 1 1 1 1 1    (stream 3 variance)
Function: hcompv
hcompv("3n3s1m.pro","phn.mod","mfc.lst","phn.mlf")
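For readers who want to see a complete prototype with its keywords, the Python sketch below generates one in HTK's text format. It is a simplified illustration and not a copy of vip/3n3s1m.pro: it assumes a single stream rather than the three streams of width 13 on the slide, and the function names are hypothetical. The transition matrix it builds is the same left-to-right pattern shown above.

```python
def left_to_right_transp(num_states):
    """Transition matrix for a left-to-right HMM without skips.

    num_states counts the non-emitting entry and exit states, so
    num_states=5 means three emitting states.
    """
    rows = []
    for i in range(num_states):
        row = [0.0] * num_states
        if i == 0:
            row[1] = 1.0          # entry state always moves to the first emitting state
        elif i < num_states - 1:
            row[i] = 0.5          # self-loop
            row[i + 1] = 0.5      # advance to the next state
        rows.append(row)          # the exit state keeps an all-zero row
    return rows


def make_proto(name, vec_size, num_states, target_kind):
    """Return a single-stream prototype with zero means and unit variances."""
    lines = [
        f"~o <VecSize> {vec_size} <{target_kind}>",
        f'~h "{name}"',
        "<BeginHMM>",
        f"<NumStates> {num_states}",
    ]
    for state in range(2, num_states):        # emitting states only
        lines.append(f"<State> {state}")
        lines.append(f"<Mean> {vec_size}")
        lines.append(" ".join(["0.0"] * vec_size))
        lines.append(f"<Variance> {vec_size}")
        lines.append(" ".join(["1.0"] * vec_size))
    lines.append(f"<TransP> {num_states}")
    for row in left_to_right_transp(num_states):
        lines.append(" ".join(f"{p:.3e}" for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"


proto_text = make_proto("proto", 39, 5, "MFCC_E_D_A")
print(proto_text)
```

The zero means and unit variances are placeholders; the hcompv step replaces them with statistics from the training data.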

9 Creating mono-phone HMM
Before re-estimation, all Gaussians (for i, a, t, s) have the same mean and variance. Re-estimation refines each Gaussian to fit its own data.
erest(0,"mfc.lst", "phn.mlf", "phn.mod", 4)
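The starting point in which every Gaussian shares one mean and variance can be illustrated with a small Python sketch of a flat start: compute the global mean and variance over all training frames and give them to every model. The function name and the toy two-dimensional frames are assumptions for illustration; the hcompv wrapper used in this tutorial works on the real 39-dimensional feature files.

```python
def global_mean_variance(frames):
    """Per-dimension mean and variance over all frames (lists of floats)."""
    count = len(frames)
    dim = len(frames[0])
    mean = [sum(frame[d] for frame in frames) / count for d in range(dim)]
    variance = [
        sum((frame[d] - mean[d]) ** 2 for frame in frames) / count
        for d in range(dim)
    ]
    return mean, variance


# Toy data: two 2-dimensional frames standing in for MFCC vectors.
frames = [[1.0, 2.0], [3.0, 6.0]]
mean, variance = global_mean_variance(frames)

# Flat start: every phone model begins with the same statistics.
models = {phone: {"mean": list(mean), "var": list(variance)}
          for phone in ["i", "a", "t", "s"]}
print(models["i"])
```

Re-estimation then moves each model away from these shared values towards the frames aligned with its own phone.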

10 Creating mono-phone HMM
Statistics file (*.sts):
No.  model name  count  state1        state2        state3
1    "A"         592    4905.791992   2986.182129   2671.728271
2    "C"         340    2879.937256   1012.802734   1034.989014
3    "E"         124    1856.104492   837.523865    670.962036
4    "G"         2082   12491.683594  13483.448242  7560.445313
5    "I"         580    5163.432617   2224.220703   2649.229248
6    "J"         358    1926.428955   990.966858    1072.944946
Edit script (ssp.hed):
AT 2 4 0.2 {sil.transP}
AT 4 2 0.2 {sil.transP}
AT 1 3 0.3 {sp.transP}
TI ssp {sil.state[3],sp.state[2]}
hhed(5, "ssp.hed", "phn_sp.mod", 6)

11 Creating tri-phone HMM
hhed converts the mono-phone HMMs into tri-phone HMMs:
hhed(10, "tri.hed", "phn_sp.mod", 11)
Mono-phone model (numeric fields only):
~h "i"
5
2 3 1.000000e+000 1.000000e+000 1.000000e+000
1 13 -1.231087e+001 -9.749413e-001 9.766034e+000 …
13 2.357146e+001 6.214857e+001 6.707030e+001 … 6.855833e+001
2 13 -4.161490e-002 -3.644128e-001 2.665605e-002 …
13 9.070483e-001 3.527379e+000 4.040065e+000 … 3.090531e+001
3 13 -2.207233e-002 2.695016e-002 -4.460607e-001 …
13 1.712317e-001 4.176200e-001 3.959162e-001 … 6.822828e+000
……
Tri-phone models, starting from the same values:
~h "s-i+a"
5
2 3 1.000000e+000 1.000000e+000 1.000000e+000
1 13 -1.231087e+001 -9.749413e-001 9.766034e+000 …
13 2.357146e+001 6.214857e+001 6.707030e+001 … 6.855833e+001
…...
~h "t-i+a"
5
2 3 1.000000e+000 1.000000e+000 1.000000e+000
1 13 -1.231087e+001 -9.749413e-001 9.766034e+000 …
13 2.357146e+001 6.214857e+001 6.707030e+001 … 6.855833e+001
…...

12 Creating tri-phone HMM
The mono-phone models i and a are copied into the tri-phone models s-i+a, t-i+a, i-a+n and i-a. Each copy starts with the Gaussian of its mono-phone (same mean and variance).

13 Creating tri-phone HMM
Before: a single Gaussian for each model (s-i+a, i-a+n, t-i+a, i-a).
After: Gaussian mixtures (two Gaussians) for each model.
hhed(15, "mix2.hed", "tri.mod", 16)
erest(16,"mfc.lst", "tri.mlf", "tri.mod", 20)
behhed(2,"hmm/hmm15/15.sts","hed/mix2.hed")
mix2.hed:
MU 2 {s-i+a.state[2].stream[1-3].mix}
MU 2 {t-i+a.state[3].stream[1-3].mix}
……
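The effect of a mix-up command such as MU 2 can be sketched in Python. The HTK Book describes the operation as splitting the component with the largest weight: the component is copied, both weights are halved, and the two means are moved apart by 0.2 standard deviations. The sketch follows that description; the dictionary layout and function names are assumptions, and the 0.2 factor should be checked against the HTK Book before relying on it.

```python
import math


def split_heaviest(mixtures, perturb=0.2):
    """Split the heaviest component into two perturbed copies."""
    idx = max(range(len(mixtures)), key=lambda i: mixtures[i]["weight"])
    source = mixtures[idx]
    half = source["weight"] / 2.0
    std = [math.sqrt(v) for v in source["var"]]
    up = {
        "weight": half,
        "mean": [m + perturb * s for m, s in zip(source["mean"], std)],
        "var": list(source["var"]),
    }
    down = {
        "weight": half,
        "mean": [m - perturb * s for m, s in zip(source["mean"], std)],
        "var": list(source["var"]),
    }
    return mixtures[:idx] + [up, down] + mixtures[idx + 1:]


def mix_up(mixtures, target_count):
    """Keep splitting until the state has target_count components."""
    while len(mixtures) < target_count:
        mixtures = split_heaviest(mixtures)
    return mixtures


single = [{"weight": 1.0, "mean": [0.0], "var": [4.0]}]
print(mix_up(single, 2))
```

The split components are identical apart from their means, so a further re-estimation pass is needed before they model different parts of the data.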

14 Creating tri-phone HMM
Figure: training data fitted by the mixture components of each tri-phone model: s+i (mix1, mix2), s-i+a (mix1, mix2), i-a (mix1, mix2, mix3).

15 Pronunciation dictionary
Syl2phn.dic:
sia   s i a
tian  t i a n
Yu    y u
sp sil
Syl2tri.dic:
sia   s+i s-i+a i-a sp
tian  t+i t-i+a i-a+n a-n sp
Yu    y+u y-u sp
sp    [] sp
sil   [] sil
Function: hdman
hdman("syl.mod", "syl2phn.dic", "syl2rcd.dic","man1.log","man2.log");
Character  Probability  Pronunciation  HMM models
一         0.16134      i2             i sp
一         0.26218      i4             i sp
一         0.57647      i1             i sp
乙         1.00000      i3             i sp
丁         1.00000      ding1          d+i d-i+n i-n+g n-g sp
七         1.00000      ci1            c+i c-i sp

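16 Searching net
Network types: linear net, tree-structured net, free Hanzi (Chinese character) net.
Figure: example networks built from the characters 台 北 市 政 府 中 縣 廳 (for example 台北市政府, Taipei City Government).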

17 Searching net
Hparse("free_syl.grm", "free_syl.net")
Grammar:
$free_syl = sia | tian | yu;
( <$free_syl> )
Network:
VERSION=1.0
N=6 L=10
I=0 W=yu
I=1 W=!NULL
I=2 W=tian
I=3 W=sia
I=4 W=!NULL
I=5 W=!NULL
J=0 S=1 E=0
J=1 S=5 E=0
J=2 S=0 E=1
J=3 S=2 E=1
J=4 S=3 E=1
J=5 S=1 E=2
J=6 S=5 E=2
J=7 S=1 E=3
J=8 S=5 E=3
J=9 S=1 E=4
Figure: network diagram with nodes I0 (yu), I1 (!NULL), I2 (tian), I3 (sia), I4 (!NULL), I5 (!NULL) joined by arcs J0 to J9.
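A free-syllable grammar of this shape can be generated from a syllable list, as in the Python sketch below. It assumes the grammar is a loop of one or more syllables, written with angle brackets in HParse notation; the function name is hypothetical, and the sketch only builds the grammar text, leaving the conversion to a network to Hparse.

```python
def free_loop_grammar(variable, words):
    """Build an HParse-style grammar that accepts one or more of `words`."""
    alternatives = " | ".join(words)
    return (
        f"${variable} = {alternatives};\n"  # variable holding the alternatives
        f"( <${variable}> )\n"              # angle brackets: one or more repetitions
    )


grammar_text = free_loop_grammar("free_syl", ["sia", "tian", "yu"])
print(grammar_text)
```

Generating the grammar from the syllable model list keeps the network in step with the models when the syllable inventory grows.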

18 Recognizer evaluation
Figure: testing data decoded with the tri-phone models s+i, s-i+a, i-a.
Vite("mfc.lst", 25, "tri.mod", "syl2tri.dic", "freesyl.net", "rec_freesyl.mlf","rec_freesyl.log" )

19 Recognizer evaluation
Figure: search over time (axis t) against the HMMs (axis HMM) of the tri-phone models s+i s-i+a i-a, t+i t-i+a i-a+n a-n, y+u y-u.

20 Recognizer evaluation
Result("rec_freesyl.mlf", "syl.mlf", "syl.mod", "tri.rec" )
syl.mlf:
#!MLF!#
"*/1.lab"
sia
tian
.
"*/2.lab"
sia
yu
.
rec_freesyl.mlf:
#!MLF!#
"*/1.lab"
sia
yu
.
"*/2.lab"
sia
yu
.
tri.rec:
Aligned transcription: I:/…
LAB: sia tian
REC: sia yu yu
Aligned transcription: I:/…
LAB: sia yu
REC: sia yu
WORD: %Correct=50 [H=1, S=1, N=2]
SYLL: %Corr=75, Acc=50 ((3-1)/4) [H=3, D=0, S=1, I=1, N=4]
H = hits, D = deletions, S = substitutions, I = insertions, N = number of reference labels.
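The two syllable-level scores can be reproduced from the counts in brackets. The Python sketch below uses the usual definitions, percent correct = H/N and accuracy = (H - I)/N, which match the (3-1)/4 shown on the slide; the function names are assumptions.

```python
def percent_correct(hits, total):
    """Share of reference labels recognized correctly, in percent."""
    return 100.0 * hits / total


def accuracy(hits, insertions, total):
    """Percent correct penalized by insertions, in percent."""
    return 100.0 * (hits - insertions) / total


# Counts from the SYLL line: H=3, D=0, S=1, I=1, N=4.
H, D, S, I, N = 3, 0, 1, 1, 4
assert H + D + S == N  # every reference label is a hit, deletion or substitution

print(percent_correct(H, N))  # 75.0
print(accuracy(H, I, N))      # 50.0
```

Accuracy is the stricter figure: with a free-syllable network the recognizer can insert extra syllables, and only accuracy penalizes them.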

21 Homework ('01 corpus)
CGU
–(tri-phone, free-syllable net)
–g1
File-name codes: Taiwanese MDXXXX, Mandarin TWXXXX, male XXM1XX, female XXM0XX
Due: in two weeks
Data: download

