Arnel Fajardo, student (“Hak Seng”)

Using HTK for Phoneme-Level Recognition of the Developed Filipino Phonetically Balanced Words
Arnel Fajardo, student (“Hak Seng”), under Professor Yoon-joong Kim (“Paksa”)

Phoneme Recognition unit
No. of phonemes = 36; recognized phonemes = 33
Set of phonemes:
- Vowels: /a/ /e/ /i/ /o/ /u/
- Consonants: /b/ /k/ /d/ /g/ /h/ /l/ /m/ /n/ /ŋ/ /p/ /r/ /s/ /t/ /w/ /y/
- Diphthongs: /iw/ /ay/ /aw/ /oy/ /ey/ /uy/
- Additions: /p:/ /b:/ /m:/ /t:/ /d:/ /n:/ /s:/ /l:/ /k:/ /g:/
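As a quick consistency check on the inventory above, the four groups can be written out and counted. This is only a sketch: the group names and the function name are ours, not part of the original project files.

```python
# Phoneme inventory as listed on the slide; the variable names are illustrative.
VOWELS = ["a", "e", "i", "o", "u"]
CONSONANTS = ["b", "k", "d", "g", "h", "l", "m", "n", "ŋ", "p",
              "r", "s", "t", "w", "y"]
DIPHTHONGS = ["iw", "ay", "aw", "oy", "ey", "uy"]
ADDITIONS = ["p:", "b:", "m:", "t:", "d:", "n:", "s:", "l:", "k:", "g:"]


def phoneme_inventory():
    """Return the full phoneme list in the order given on the slide."""
    return VOWELS + CONSONANTS + DIPHTHONGS + ADDITIONS


if __name__ == "__main__":
    inventory = phoneme_inventory()
    print(len(inventory), "phonemes")  # 5 + 15 + 6 + 10
```

The total of the four groups matches the stated number of phonemes.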

Speech Recognition Process
Recognition Process Using HTK

Preparation of the Data
Speech Data for Training and Testing
- Data (wave files): 50 sets (25 male and 25 female), 257 words (2-word list)
- Training data: 40 sets (20 male and 20 female), 212 words (3-word list)
- Test data, speaker dependent: 20 sets each for male and female
- Test data, speaker independent: 5 sets each for male and female
- Location: Test_htk_samples/Htk_phoneme_2/Data/pbw_list_2/speakernumber/PBW2/speech file
Features of the Speech Data:
- Wave files (*.wav)
- 16 kHz, 16-bit, linear PCM
- Phonetically Balanced Words (PBW)

Creating the phoneme-level MLF (Master Label File)
HLEd -n modelList/monoList -l '*' -d dic/phoneDict -i mlfs/phones.mlf scripts/mono.led mlfs/words.mlf
Inputs to HLEd: edit script (mono.led, containing EX), word-level transcription (words.mlf), dictionary (phoneDict)
Outputs of HLEd: phone-level transcription (phones.mlf), phone-level model list (monoList)

[input] Creating a model list
File: modelList/wordList
Contents:
sil aba'y agad akin aking ako'y alam alon amin aming anak anong apat araw atin ating ayaw …

[input] Creating the word-level MLF
File: mlfs/words.mlf
Contents:
#!MLF!#
"*/PBW2001.lab"
sil
aba'y
.
"*/PBW2002.lab"
agad
"*/PBW2003.lab"
akin
…
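The layout of a word-level MLF can be sketched in a few lines of Python. This is an illustration only: the function name and the input mapping are hypothetical, and the actual assignment of words to label files in this project is not fully shown on the slide. In the HTK MLF format, each entry starts with a quoted label pattern, lists one label per line, and ends with a line containing a single period.

```python
def build_word_mlf(utterances):
    """Render {label_name: [words]} as HTK Master Label File text.

    `utterances` is a hypothetical mapping such as {"PBW2002": ["agad"]}.
    """
    lines = ["#!MLF!#"]                 # required MLF header
    for label, words in utterances.items():
        lines.append(f'"*/{label}.lab"')  # pattern matches any directory
        lines.extend(words)               # one label per line
        lines.append(".")                 # a single period closes the entry
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_word_mlf({"PBW2002": ["agad"], "PBW2003": ["akin"]}), end="")
```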

[output] Phoneme model list. File: modelList/monoList
[output] Phoneme-level label file. File: mlfs/phones.mlf

Feature Parameter Extraction
HCopy -C configs/HCopy.config -S scripts/HCopy.scp

[input] scripts/HCopy.scp
[input] configs/HCopy.config
# Coding parameters
SOURCEKIND = WAVEFORM
SOURCEFORMAT = WAVE
SOURCERATE = 625
TARGETKIND = MFCC_0
TARGETRATE =
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE =
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
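HTK expresses SOURCERATE as the sample period in units of 100 ns, so the value 625 corresponds to the 16 kHz recordings described earlier. The arithmetic can be checked with a small sketch (the function and constant names are ours):

```python
HTK_UNITS_PER_SECOND = 10_000_000  # HTK time values are in 100 ns units


def sample_rate_hz(source_rate):
    """Convert an HTK SOURCERATE (sample period in 100 ns units) to Hz."""
    return HTK_UNITS_PER_SECOND / source_rate


def static_coefficient_count(num_ceps, with_c0=True):
    """Number of static coefficients: NUMCEPS plus C0 when the _0 qualifier is used."""
    return num_ceps + (1 if with_c0 else 0)


if __name__ == "__main__":
    print(sample_rate_hz(625))            # sample rate implied by SOURCERATE = 625
    print(static_coefficient_count(12))   # MFCC_0 with NUMCEPS = 12
```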

Create file list for testing. File: scripts/test.scp
Create file list for training. File: scripts/train.scp
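The script files are plain text lists. An HCopy script has one "source target" pair per line, while the training and test lists contain one feature file per line. The following is a minimal sketch of how such lists could be generated; the directory names, the .mfc extension, and the function names are assumptions, not taken from the original project.

```python
from pathlib import PurePosixPath


def hcopy_script_lines(wav_paths, feature_dir):
    """Build 'source target' lines for an HCopy script file.

    Assumes file stems are unique; recordings from several speakers with
    the same stem would need the speaker directory in the target name.
    """
    lines = []
    for wav in wav_paths:
        target = PurePosixPath(feature_dir) / (PurePosixPath(wav).stem + ".mfc")
        lines.append(f"{wav} {target}")
    return lines


def feature_list(hcopy_lines):
    """Extract the target (feature file) column, as used in train/test lists."""
    return [line.split()[1] for line in hcopy_lines]


if __name__ == "__main__":
    pairs = hcopy_script_lines(["Data/spk01/PBW2001.wav"], "mfc")  # hypothetical paths
    print("\n".join(pairs))
    print("\n".join(feature_list(pairs)))
```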

Acoustic Model Creation
Topology used: 7-state left-to-right HMM for PBW 2
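In HTK, the first and last states of a model are non-emitting entry and exit states, so a 7-state model has 5 emitting states. A left-to-right transition matrix for such a prototype can be sketched as follows. The self-loop probability of 0.6 is an assumed starting value, since the contents of the actual prototype file are not shown here.

```python
def left_to_right_transp(num_states=7, self_loop=0.6):
    """Build a left-to-right transition matrix with no skips.

    Row 0 is the non-emitting entry state, the last row is the
    non-emitting exit state (all zeros, as in HTK prototypes).
    """
    n = num_states
    matrix = [[0.0] * n for _ in range(n)]
    matrix[0][1] = 1.0                      # entry state always moves to state 2
    for i in range(1, n - 1):               # emitting states
        matrix[i][i] = self_loop            # stay in the same state
        matrix[i][i + 1] = 1.0 - self_loop  # advance to the next state
    return matrix


if __name__ == "__main__":
    for row in left_to_right_transp():
        print(" ".join(f"{p:.1f}" for p in row))
```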

General Model Generation
HCompV -C configs/config -f 0.01 -m -S scripts/train.scp -M monoHmms/m0 monoHmms/proto

[input] monoHmms/proto

[input] scripts/train.scp
[input] configs/config
Config file modification: TARGETKIND changed to MFCC_0_D_A
# Coding parameters
NONUMESCAPES = T
TARGETKIND = MFCC_0_D_A
TARGETRATE =
SAVECOMPRESSED = T
SAVEWITHCRC = T
WINDOWSIZE =
USEHAMMING = T
PREEMCOEF = 0.97
NUMCHANS = 26
CEPLIFTER = 22
NUMCEPS = 12
ENORMALISE = F
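With TARGETKIND = MFCC_0_D_A and NUMCEPS = 12, each frame holds 13 static coefficients plus their delta and acceleration coefficients. The sketch below computes the resulting vector size. It is a simplification under stated assumptions: only the _0, _E, _D and _A qualifiers are handled, and the function name is ours.

```python
def feature_vector_size(target_kind, num_ceps):
    """Vector size for a target kind such as 'MFCC_0_D_A'.

    Simplified: handles only the _0, _E, _D and _A qualifiers.
    """
    qualifiers = set(target_kind.split("_")[1:])
    # Static part: cepstra plus optional C0 and/or log energy.
    static = num_ceps + ("0" in qualifiers) + ("E" in qualifiers)
    # Deltas and accelerations each add one more copy of the static part.
    blocks = 1 + ("D" in qualifiers) + ("A" in qualifiers)
    return static * blocks


if __name__ == "__main__":
    print(feature_vector_size("MFCC_0_D_A", 12))
```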

[output] monoHmms/m0/vFloor

[output] monoHmms/m0/proto

Macro file for the phoneme-unit models (monoHmms/m0/macros): created from the contents of vFloor by adding the header line
~o <MFCC_0_D_A> <VECSIZE> 39 <MFCC_D_A_0>

Phoneme-unit models: monoHmms/m0/hmmdefs
This file defines the HMM for every phoneme unit to be recognized:
~h "a" <BEGINHMM> ... <ENDHMM>
~h "b" <BEGINHMM> ... <ENDHMM>
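One common way to build such a file is to copy the flat-start prototype once per phoneme and rename each copy. The sketch below works on the file text only. It assumes the prototype contains a single ~h "proto" definition, and the function name is hypothetical.

```python
def clone_prototype(proto_text, phonemes, proto_name="proto"):
    """Copy the prototype HMM definition once per phoneme, renaming each copy.

    Everything before the ~h line (for example a ~o header) is dropped,
    because that header belongs in the macros file.
    """
    marker = f'~h "{proto_name}"'
    start = proto_text.index(marker)          # raises ValueError if missing
    body = proto_text[start + len(marker):]   # <BEGINHMM> ... <ENDHMM>
    return "".join(f'~h "{p}"{body}' for p in phonemes)


if __name__ == "__main__":
    demo = '~o <VECSIZE> 39\n~h "proto"\n<BEGINHMM>\n<ENDHMM>\n'  # toy prototype
    print(clone_prototype(demo, ["a", "b"]), end="")
```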

Training: Embedded Training
HERest -C configs/config -I mlfs/phones.mlf -S scripts/train.scp -H monoHmms/m0/macros -H monoHmms/m0/hmmdefs -M monoHmms/m1 modelList/monoList
Inputs to HERest: training files listed in train.scp, configuration file (config), Hmm0 (m0: macros, hmmdefs), phone-level transcription (phones.mlf), HMM list (monoList)
Output of HERest: Hmm1 (m1: macros, hmmdefs)

[input] monoHmms/m0/hmmdefs
[input] scripts/train.scp
[input] monoHmms/m0/macros
[input] mlfs/phones.mlf

Training: Embedded Training
Re-estimation
HERest -C configs/config -I mlfs/phones.mlf -S scripts/train.scp -H monoHmms/m1/macros -H monoHmms/m1/hmmdefs -M monoHmms/m2 modelList/monoList
HERest -C configs/config -I mlfs/phones.mlf -S scripts/train.scp -H monoHmms/m2/macros -H monoHmms/m2/hmmdefs -M monoHmms/m3 modelList/monoList

Re-estimation (continued)
HERest -C configs/config -I mlfs/phones.mlf -S scripts/train.scp -H monoHmms/m3/macros -H monoHmms/m3/hmmdefs -M monoHmms/m4 modelList/monoList
HERest -C configs/config -I mlfs/phones.mlf -S scripts/train.scp -H monoHmms/m4/macros -H monoHmms/m4/hmmdefs -M monoHmms/m5 modelList/monoList
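The five training passes from m0 to m5 differ only in the model directory index, so the command lines can be generated in a loop. The sketch below only builds the argument lists and does not run HTK. It uses HERest's -I option for the label MLF, and the function name and defaults are ours.

```python
def herest_commands(first=0, last=5, hmm_root="monoHmms"):
    """Build one HERest argument list per re-estimation pass (m{i} -> m{i+1})."""
    commands = []
    for i in range(first, last):
        src = f"{hmm_root}/m{i}"
        dst = f"{hmm_root}/m{i + 1}"   # output directory must already exist
        commands.append([
            "HERest", "-C", "configs/config",
            "-I", "mlfs/phones.mlf",        # phone-level transcriptions
            "-S", "scripts/train.scp",      # training feature files
            "-H", f"{src}/macros", "-H", f"{src}/hmmdefs",
            "-M", dst,
            "modelList/monoList",
        ])
    return commands


if __name__ == "__main__":
    for cmd in herest_commands():
        print(" ".join(cmd))
```

Each list could be passed to subprocess.run once HTK is installed and the output directories have been created.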

Recognition Preparation for recognition
Pronunciation dictionary (dic/phoneDict)
Entry format: Word models
Word: the target word to be recognized
models: the list of HMM models for that word

Recognition Preparation for recognition
Creating a grammar file (dic/pbwGram)
Grammar rules for creating the grammar file
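The grammar rules themselves are not reproduced here. As a rough illustration, an isolated-word grammar in HParse notation defines a variable as a list of alternatives and then a network that uses it. The sketch below generates such text from a word list. The variable name and the exact shape of the network are assumptions, and words containing special characters such as the apostrophe in aba'y may need quoting.

```python
def isolated_word_grammar(words, var_name="word"):
    """Build an HParse-style grammar that accepts exactly one of `words`."""
    alternatives = " | ".join(words)        # '|' separates alternatives
    return f"${var_name} = {alternatives};\n( ${var_name} )\n"


if __name__ == "__main__":
    print(isolated_word_grammar(["agad", "akin", "aking"]), end="")
```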

Recognition: Preparation for Recognition (Recognition Network Creation)
HParse -C configs/config dic/pbwGram dic/tag_Net

Recognition [input] dic/pbwGram [input] configs/config

Recognition [output] dic/tag_Net

Recognition
HVite -C configs/config -S scripts/test_d.scp -H monoHmms/m5/hmmdefs -H monoHmms/m5/macros -w dic/tag_Net -i mlfs/pbw2_dependent_result.mlf dic/dict modelList/monoList

Recognition [input] scripts/test_d.scp [input] configs/config

Recognition [input] modelList/wordList [input] dic/dict

Recognition [output] mlfs/pbw2_dependent_result.mlf

Recognition Analysis of Recognition Results
HResults -I mlfs/words.mlf modelList/wordList mlfs/pbw2_dependent_result.mlf
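HResults reports percent correct and accuracy from the number of reference labels N, deletions D, substitutions S and insertions I. The two formulas can be sketched as below. The counts in the usage example are made up for illustration and are not results from this experiment.

```python
def percent_correct(n, deletions, substitutions):
    """%Correct = (N - D - S) / N * 100."""
    return 100.0 * (n - deletions - substitutions) / n


def percent_accuracy(n, deletions, substitutions, insertions):
    """Accuracy = (N - D - S - I) / N * 100; insertions also count as errors."""
    return 100.0 * (n - deletions - substitutions - insertions) / n


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(percent_correct(100, 2, 3))
    print(percent_accuracy(100, 2, 3, 5))
```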
