Results of Tagalog Vowel Speech Recognition Using Continuous HMM. Arnel C. Fajardo, Ph.D. student (under the supervision of Professor Yoon-Joong Kim)

Presentation transcript:

1 Results of Tagalog Vowel Speech Recognition Using Continuous HMM. Arnel C. Fajardo, Ph.D. student (under the supervision of Professor Yoon-Joong Kim)

2 Basic structure of HTK

3 1. Data Preparation
Speech data for training and testing
- Data (wave files): 625 wave files, 25 sets, 5 sets per speaker
- Training data: 5 sets per speaker, 25 sets
- Test data (speaker-dependent test): Test 1: 5 sets; Test 2: 10 sets
Features of the speech data: *.wav, 16 kHz, 16-bit, linear PCM
File naming: a1001.wav => "a", e1001.wav => "e", i1001.wav => "i", o1001.wav => "o", u1001.wav => "u", ...
Variables:
- 2 tests: 5 speakers (1 set each) and 10 speakers (1 set each)
- Hmmdefs: m5 and m6

4 Compute Feature Vectors
Use: HCopy -C configs\HCopy.config -S scripts\HCopy.scp
HCopy.exe
- Computes the features from each wave file and saves them in the same folder.
- MFCC features were used.
-C configs\HCopy.config // configuration file for computing the features
-S scripts\HCopy.scp // script file listing each wave file and its feature file

5 HCopy
Number of input files: 3
- Waveform files: *.wav
- Configuration file: Hcopy.config
- Script file: Hcopy.scp
Number of output files: 1
- MFCC file: *.mfc
Create Hcopy.config in ….Configs/Hcopy.config and write:
# Coding parameters
SOURCEKIND = WAVEFORM
SOURCEFORMAT = NIST
SOURCERATE = 625
TARGETKIND = MFCC_0
TARGETRATE = 100000.0
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE = 250000.0
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
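The time-valued settings above are easier to check once converted out of HTK's 100 ns units. The short sketch below redoes that arithmetic for the 16 kHz data described on slide 3; the function names are illustrative and are not part of HTK.

```python
# HTK expresses SOURCERATE, TARGETRATE and WINDOWSIZE in units of 100 ns.
HTK_UNITS_PER_SECOND = 10_000_000


def sample_period_htk(sample_rate_hz: int) -> float:
    """Sample period in 100 ns units for a given sampling rate in Hz."""
    return HTK_UNITS_PER_SECOND / sample_rate_hz


def htk_to_ms(value: float) -> float:
    """Convert a duration in 100 ns units to milliseconds."""
    return value / 10_000.0


print(sample_period_htk(16000))  # SOURCERATE for 16 kHz audio
print(htk_to_ms(100000.0))       # TARGETRATE (frame shift) in ms
print(htk_to_ms(250000.0))       # WINDOWSIZE (analysis window) in ms
```

With these values, SOURCERATE = 625 matches 16 kHz audio, and the front end uses a 25 ms window advanced every 10 ms.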

6 Script file: Scripts/Hcopy.scp
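The contents of the script file were shown as an image and are not recoverable. As a rough sketch, an HCopy script lists one source file and one target file per line; the helper below builds such lines. The `data/` folder and the file names are hypothetical, chosen only to follow the naming pattern on slide 3.

```python
def hcopy_script_lines(wav_paths):
    """Pair each wave file with a .mfc target in the same folder."""
    lines = []
    for wav in wav_paths:
        if wav.lower().endswith(".wav"):
            mfc = wav[:-4] + ".mfc"
        else:
            mfc = wav + ".mfc"
        lines.append(f"{wav} {mfc}")
    return lines


# Hypothetical paths; the real folder layout is not given in the slides.
wavs = ["data/a1001.wav", "data/e1001.wav", "data/i1001.wav"]
for line in hcopy_script_lines(wavs):
    print(line)
```

Writing these lines to Scripts/Hcopy.scp would give HCopy the list it reads through the -S option.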

7 Prepare the Master label file Master label File- Word level transcriptions mlfs/words.mlf

8 Model list - modelList/wordList: HMM model name list

9 Generate the initial master macro file (hmmdefs)
HCompV -C configs\config -f 0.01 -m -S scripts\train.scp -M wordHmms\m0\ wordHmms\proto
HCompV.exe
Number of inputs: 3
Input 1: -C configs/config // parameters for computing features
  -f 0.01 // the variance floor macro (called vFloors) is computed as 0.01 times the global variance
  -m // the mean and the variance will be computed
Input 2: -S scripts/train.scp // list of mfc feature vector files used in training
Input 3: wordHmms/proto // the handwritten HMM prototype
Number of outputs: 1
Output 1: -M wordHmms/m0 // directory for the results
  vFloors: variance floor macro
  proto: HMM prototype with GMM values filled in
  hmmdefs: written manually from proto

10 Input 1 Configs/config script/Hcopy.config => configs/config

11 wordHmms/m0/vFloors: global constant values (variance floors) used when computing b_j(o_t)
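The formula on this slide was an image and did not survive. For reference, the usual form of the output probability in a continuous-density HMM with Gaussian mixtures is given below; this is the textbook expression and is only assumed to match what the slide displayed. The last line states how a variance floor is typically applied, with the factor 0.01 taken from the -f option on slide 9.

```latex
% Output probability of state j for observation o_t (M mixture components)
b_j(o_t) = \sum_{m=1}^{M} c_{jm}\, \mathcal{N}(o_t;\ \mu_{jm}, \Sigma_{jm})

% Multivariate Gaussian density, n = dimension of o_t
\mathcal{N}(o;\ \mu, \Sigma) =
  \frac{1}{\sqrt{(2\pi)^{n}\,|\Sigma|}}
  \exp\!\left(-\tfrac{1}{2}\,(o-\mu)^{\top}\Sigma^{-1}(o-\mu)\right)

% Variance flooring for each dimension d
\sigma_{jmd}^{2} \leftarrow \max\!\left(\sigma_{jmd}^{2},\ 0.01\,\sigma_{\mathrm{global},d}^{2}\right)
```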

12 Input 2 Scripts/train.scp

13 Input 3: general HMM model (prototype) for monophone speech: wordHmms/proto
It has 3 emitting states.
Note: NumStates is 5 because HTK also counts states 1 and 5, the non-emitting entry and exit states.
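The prototype file was shown as an image. The sketch below generates a prototype of the general shape HTK expects: 5 states, with output distributions only for states 2 to 4, since HTK's first and last states emit nothing. The vector size of 13 is an assumption derived from TARGETKIND = MFCC_0 with NUMCEPS = 12 on slide 5, and the transition probabilities are illustrative starting values, not the author's.

```python
def proto_text(vec_size=13, num_states=5, kind="MFCC_0", name="proto"):
    """Return a left-to-right prototype HMM definition as text."""
    zeros = " ".join(["0.0"] * vec_size)
    ones = " ".join(["1.0"] * vec_size)
    lines = [
        f"~o <VecSize> {vec_size} <{kind}>",
        f'~h "{name}"',
        "<BeginHMM>",
        f"<NumStates> {num_states}",
    ]
    # Only states 2 .. num_states-1 carry an output distribution.
    for state in range(2, num_states):
        lines += [
            f"<State> {state}",
            f"<Mean> {vec_size}",
            f" {zeros}",
            f"<Variance> {vec_size}",
            f" {ones}",
        ]
    lines.append(f"<TransP> {num_states}")
    for i in range(num_states):
        row = [0.0] * num_states
        if i == 0:
            row[1] = 1.0      # entry state always moves to state 2
        elif i < num_states - 1:
            row[i] = 0.6      # illustrative self-loop probability
            row[i + 1] = 0.4  # illustrative forward probability
        # the last row stays all zeros (exit state)
        lines.append(" " + " ".join(f"{p:.1f}" for p in row))
    lines.append("<EndHMM>")
    return "\n".join(lines) + "\n"


print(proto_text())
```

HCompV then replaces the zero means and unit variances with the global statistics of the training data.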

14 wordHmms/proto + global means and variances => wordHmms/m0/proto Shows the result of the command HCompV for wordHmms/m0/proto

15 Input 3 - wordHmms/m0/hmmdefs - Master Macro File (MMF)
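Slide 9 notes that hmmdefs is written by hand from the re-estimated proto. One common way to do this, sketched below, is to copy the prototype body once per model and change only the `~h` name. The model names are an assumption; the actual contents of modelList/wordList were shown as an image.

```python
def build_hmmdefs(proto, model_names):
    """Clone one prototype definition for every model name."""
    lines = proto.splitlines()
    # Everything before the ~h line is shared header (for example the ~o line).
    start = next(i for i, line in enumerate(lines) if line.startswith("~h"))
    header = lines[:start]
    body = lines[start + 1:]
    out = list(header)
    for name in model_names:
        out.append(f'~h "{name}"')
        out.extend(body)
    return "\n".join(out) + "\n"


# Tiny stand-in prototype; a real one would come from wordHmms/m0/proto.
demo_proto = '~o <VecSize> 13 <MFCC_0>\n~h "proto"\n<BeginHMM>\n<EndHMM>\n'
# Hypothetical model names.
print(build_hmmdefs(demo_proto, ["a", "e", "i", "o", "u", "sil"]))
```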

16 Step 2. Training
HERest -C configs\config -I mlfs\words.mlf -S scripts\train.scp -H wordHmms\m0\hmmdefs -M wordHmms\m1 modelList\wordList
HERest
Number of inputs: 5
-C configs/config // parameters for features
-I mlfs/words.mlf // master label file: word, speech file
modelList/wordList // word name list (HMM list)
-S scripts/train.scp // mfc file list for training
-H wordHmms/m0/hmmdefs // hmmdefs (a set of HMM prototypes) for all words
Number of outputs: 1
-M wordHmms/m1 // re-estimated hmmdefs

17 Input 1 Configs/config Configuration for wordhmms/m1

18 Input 3 modelList/wordList Input 2 mlfs/words.mlf Input 4 Scripts/train.scp

19 Input 5 wordHmms/m0/hmmdefs (MMF)

20 Output 1 HERest -C configs\config -I mlfs\words.mlf -S scripts\train.scp -H wordHmms\m0\hmmdefs -M wordHmms\m1 modelList\wordList Result: wordHmms/m1/hmmdefs

21 Re-estimate hmmdefs:
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m1/hmmdefs -M wordHmms/m2 modelList/wordList
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m2/hmmdefs -M wordHmms/m3 modelList/wordList
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m3/hmmdefs -M wordHmms/m4 modelList/wordList
HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp -H wordHmms/m4/hmmdefs -M wordHmms/m5 modelList/wordList
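The repeated re-estimation passes differ only in the model directory numbers, so they can be generated in a loop. The sketch below only builds the command strings (it does not run HTK), using the paths from the slides; the function name is illustrative.

```python
def herest_commands(first=1, last=5):
    """Build one HERest command per pass, reading m(k-1) and writing m(k)."""
    commands = []
    for k in range(first, last + 1):
        commands.append(
            "HERest -C configs/config -I mlfs/words.mlf -S scripts/train.scp "
            f"-H wordHmms/m{k - 1}/hmmdefs -M wordHmms/m{k} modelList/wordList"
        )
    return commands


for cmd in herest_commands():
    print(cmd)
```

Each output directory (m1 to m5) has to exist before the corresponding pass runs. Extending `last` to 6 gives the extra pass compared on slide 31.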

22 Step 3. Recognition Test
HVite -C configs/config -S scripts/test.scp -H wordHmms/m5/hmmdefs -w dic/tag_Net -i mlfs/recOutWordm5.mlf dic/dict modelList/wordList
HVite
Number of inputs: 5
-C configs/config // parameters for mfc
modelList/wordList // HMM name list
-S scripts/test.scp // mfc vector list for testing
-w dic/tag_Net // word network for recognition
dic/dict // pronouncing dictionary
-H wordHmms/m5/hmmdefs // a set of HMMs
Number of outputs: 1
-i mlfs/recOutWordm5.mlf // result of recognition

23 dic/dict - Writing a pronouncing dictionary
Format: Word [outsym] models
- Word: word to be recognized
- [outsym]: string to output when the word is recognized
- models: HMM model list
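As a small illustration of the `Word [outsym] models` format, the sketch below builds dictionary lines in which every word maps to a single model of the same name. The word set (five vowels plus `sil`) is an assumption based on the grammar on slide 24; the author's actual dic/dict is not shown in the transcript.

```python
def dict_line(word, models, outsym=None):
    """Format one dictionary entry: Word [outsym] models."""
    parts = [word]
    if outsym is not None:
        parts.append(f"[{outsym}]")
    parts.extend(models)
    return " ".join(parts)


def vowel_dict(words=("a", "e", "i", "o", "u", "sil")):
    """One whole-word model per entry, sorted alphabetically."""
    return sorted(dict_line(w, [w]) for w in words)


for line in vowel_dict():
    print(line)
```

The entries are sorted because HTK tools generally expect a sorted dictionary.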

24 BNF grammar rules
$ : variable
{} : zero or more repetitions
<> : one or more repetitions
[] : optional
Grammar:
$words = a | e | i | o | u;
(sil $words sil)

25 HParse -C configs/config dic/tag_v_Gram dic/tag_Net
(dic/tag_v_Gram)

26 HParse -C configs/config dic/tag_v_Gram dic/tag_Net
Result of running HParse on tag_v_Gram: dic/tag_Net
configs/config

27 HVite -C configs/config -S scripts/test.scp -H wordHmms/m5/hmmdefs -w dic/tag_Net -i mlfs/recOutWordm5.mlf dic/dict modelList/wordList
Files shown: configs/config, scripts/test.scp, modelList/wordList

28 HVite -C configs/config -S scripts/test.scp -H wordHmms/m5/hmmdefs -w dic/tag_Net -i mlfs/recOutWordm5.mlf dic/dict modelList/wordList
File shown: mlfs/recOutWordm5.mlf

29 Step 4. Recognition results
HResults -I mlfs/words.mlf modelList/wordList mlfs/recOutWordm5.mlf
First test: 5 sets (each set represents 1 speaker) => 5 speakers

30 Step 4. Recognition results
HResults -I mlfs/words.mlf modelList/wordList mlfs/recOutWordm5.mlf
Second test: 10 sets (each set represents 1 speaker) => 10 speakers
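The result tables were images and are not reproduced in the transcript. For readers interpreting HResults output, the sketch below shows how its two headline figures follow from the counts it prints (N labels, D deletions, S substitutions, I insertions). The function name is illustrative and no actual results from this experiment are implied.

```python
def htk_scores(n, deletions, substitutions, insertions):
    """Return (%Correct, Accuracy) from HResults-style counts."""
    hits = n - deletions - substitutions              # H = N - D - S
    percent_correct = 100.0 * hits / n                # %Corr = H / N
    accuracy = 100.0 * (hits - insertions) / n        # Acc = (H - I) / N
    return percent_correct, accuracy
```

Accuracy is never higher than %Correct, because insertions are subtracted only in the accuracy figure.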

31 Comparison of m5 and m6 (hmmdefs): slight difference
HERest -C configs\config -I mlfs\words.mlf -S scripts\train.scp -H wordHmms\m4\hmmdefs -M wordHmms\m5 modelList\wordList
HERest -C configs\config -I mlfs\words.mlf -S scripts\train.scp -H wordHmms\m5\hmmdefs -M wordHmms\m6 modelList\wordList
Panels: m5, m6

32 END

