數位語音處理概論 HW#2-1 HMM Training and Testing

Slides:



Advertisements
Similar presentations
Building an ASR using HTK CS4706
Advertisements

INSTRUCTOR:Dr.Veton Kepuska STUDENT:Dileep Narayan.Koneru YES/NO RECOGNITION SYSTEM.
Discipline, Crime, and Violence October 2014 Tara K. McDaniel, M.S.
Zhang Hongyi CSCI2100B Data Structures Tutorial 2
Guide To UNIX Using Linux Third Edition
Introduction to Unix (CA263) Introduction to Shell Script Programming By Tariq Ibn Aziz.
語音辨認概論 A Tutorial Example of Using HTK 96/10/18 老師 : 廖元甫 演講者 : 蔡明峰.
專題研究 WEEK3 LANGUAGE MODEL AND DECODING Prof. Lin-Shan Lee TA. Hung-Tsung Lu,Cheng-Kuan Wei.
Programming For Nuclear Engineers Lecture 12 MATLAB (3) 1.
Chapter Seven Advanced Shell Programming. 2 Lesson A Developing a Fully Featured Program.
Operating System Program 5 I/O System DMA Device Driver.
Arthur Kunkle ECE 5525 Fall Introduction and Motivation  A Large Vocabulary Speech Recognition (LVSR) system is a system that is able to convert.
Introduction to Shell Script Programming
Nachos Project 1 Start-up and System call
CSc 2310 Principles of Programming (Java) Dr. Xiaolin Hu.
Credit Union National Association Installing and Uploading Project Zip Code.
Arnel Fajardo, student (“Hak Seng”)
Presentation by Daniel Whiteley AME department
English version of the subjects. How and where to submit the report Please submit your report using CD-R or DVD-R with a soft or hard case, which include.
Booting Ubuntu Linux Live CSCI 130 – Fall 2008 Action Lab Dr. W. Jones.
DSP homework 1 HMM Training and Testing
Essential Shell Programming by Prof. Shylaja S S Head of the Dept. Dept. of Information Science & Engineering, P.E.S Institute of Technology, Bangalore
Results of Tagalog vowel Speech recognition using Continuous HMM Arnel C. Fajardo Ph. D student (Under the supervision of Professor Yoon-Joong Kim)
VIM  This is the text editor you will use on the workstation.  You can also edit the text files under windows environment and upload it to the workstation.
Homework Assignment #1 J. H. Wang Oct. 2, 2015.
IR Homework #2 By J. H. Wang Mar. 31, Programming Exercise #2: Query Processing and Searching Goal: to search relevant documents for a given query.
IR Homework #1 By J. H. Wang Mar. 5, Programming Exercise #1: Indexing Goal: to build an index for a text collection using inverted files Input:
專題研究 (4) HDecode_live Prof. Lin-Shan Lee, TA. Yun-Chiao Li 1.
專題研究 (2) Feature Extraction, Acoustic Model Training WFST Decoding
The HTK Book (for HTK Version 3.2.1) Young et al., 2002.
Files Tutor: You will need ….
Performance Comparison of Speaker and Emotion Recognition
Speech Analysis TA : 林賢進 HW /10/28 1. Goal This homework is aimed to analyze speech from spectrogram, and try to distinguish different initials/
Digital Speech Processing HW3
 Software Development Life Cycle  Software Development Tools  High Level Programming:  Structures  Algorithms  Iteration  Pseudocode  Order of.
DYNAMIC TIME WARPING IN KEY WORD SPOTTING. OUTLINE KWS and role of DTW in it. Brief outline of DTW What is training and why is it needed? DTW training.
Statistical Models for Automatic Speech Recognition Lukáš Burget.
DISCRETE HIDDEN MARKOV MODEL IMPLEMENTATION DIGITAL SPEECH PROCESSING HOMEWORK #1 DISCRETE HIDDEN MARKOV MODEL IMPLEMENTATION Date: Oct, Revised.
IR Homework #1 By J. H. Wang Mar. 25, Programming Exercise #1: Indexing Goal: to build an index for a text collection using inverted files Input:
1 Lecture 7 Introduction to Shell Scripts COP 3353 Introduction to UNIX.
HW2-2 Speech Analysis TA: 林賢進
Unix tools Regular expressions grep sed AWK. Regular expressions Sequence of characters that define a search pattern banana matches the text banana
IR Homework #2 By J. H. Wang Apr. 13, Programming Exercise #2: Query Processing and Searching Goal: to search for relevant documents Input: a query.
Aa Scripting SPM analyses using aa Rhodri Cusack.
長庚 多媒體信號處理 實驗室 1 HTK tutorial Speaker: ricer Date:
Date: October, Revised by 李致緯
Prof. Lin-shan Lee TA. Roy Lu
Homework Assignment #1 J. H. Wang Oct. 11, 2016.
Speech Analysis TA:Chuan-Hsun Wu
Statistical Models for Automatic Speech Recognition
Prof. Lin-shan Lee TA. Lang-Chi Yu
Digital Speech Processing
Lab 2: Isolated Word Recognition
Programming Fundamentals (750113) Ch1. Problem Solving
Statistical Models for Automatic Speech Recognition
Programming in JavaScript
N-Gram Model Formulas Word sequences Chain rule of probability
Lab 3: Isolated Word Recognition
PROJ2: Building an ASR System
Prof. Lin-shan Lee TA. Po-chun, Hsu
專題研究 WEEK 5 – Deep Neural Networks in Kaldi
Presentation by Daniel Whiteley AME department
Compute System Administration Homework 2: Shell Script
Digital Speech Processing
[insert Module title here]
Programming in JavaScript
CS-171 Discussion Week3.
專題進度報告 資工四 B 洪志豪 資工四 B 林宜鴻.
Prof. Lin-shan Lee TA. Roy Lu
Presentation transcript:

數位語音處理概論 HW#2-1 HMM Training and Testing 助教:蔡政昱 指導教授:李琳山 2013/10/23

Digit Recognizer Construct a digit recognizer Free tools of HMM: HTK ling | yi | er | san | si | wu | liu | qi | ba | jiu Free tools of HMM: HTK http://htk.eng.cam.ac.uk/ build it yourself, or use the compiled version htk341_debian_x86_64.tar.gz Training data, testing data, scripts, and other resources all are available on http://speech.ee.ntu.edu.tw/courses/DSP2013Autumn/

Flowchart

Thanks to HTK!

Feature Extraction

Feature Extraction - HCopy HCopy -C lib/hcopy.cfg -S scripts/training_hcopy.scp Convert wave to 39 dimension MFCC -C lib/hcopy.cfg input and output format parameters of feature extraction Chapter 7 - Speech Signals and Front-end Processing -S scripts/training_hcopy.scp a mapping from Input file name to output file name e.g. wav -> MFCC_Z_E_D_A speechdata/training/ N110022.wav MFCC/training/ N110022.mfc

Training Flowchart

Training Flowchart x3 x6

Training Flowchart x3 x6

HCompV - Initialize

HCompV - Initialize Compute global mean and variance of features HCompV -C lib/config.fig -o hmmdef -M hmm -S scripts/training.scp lib/proto Compute global mean and variance of features -C lib/config.fig set format of input feature (MFCC_Z_E_D_A) -o hmmdef -M hmm set output name: hmm/hmmdef -S scripts/training.scp a list of training data lib/proto a description of a HMM model, HTK MMF format You can modify the Model Format here. (# states)

Initial MMF Prototype MMF: HTKBook chapter 7 ~o <VECSIZE> 39 <MFCC_Z_E_D_A> ~h "proto" <BeginHMM> <NumStates> 5 <State> 2 <Mean> 39 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 … <Variance> 39 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 … <State> 3 … <TransP> 5 0.0 1.0 0.0 0.0 0.0 0.0 0.5 0.5 0.0 0.0 0.0 0.0 0.5 0.5 0.0 0.0 0.0 0.0 0.5 0.5 0.0 0.0 0.0 0.0 0.0 <EndHMM>

Initial HMM bin/macro bin/models_1mixsil These are written in C hmm/hmmdef hmm/models bin/macro construct each HMM bin/models_1mixsil add silence HMM These are written in C But you can do these by a text editor

Training Flowchart x3 x6

HERest - Adjust HMMs Basic problem 3 for HMM Given O and an initial model λ=(A,B, π), adjust λ to maximize P(O|λ)

-H hmm/macros -H hmm/models -M hmm lib/models.lst HERest - Adjust HMMs HERest -C lib/config.cfg -S scripts/training.scp -I labels/Clean08TR.mlf -H hmm/macros -H hmm/models -M hmm lib/models.lst Adjust parameters λ to maximize P(O|λ) one iteration of EM algorithm run this command three times => three iterations –I labels/Clean08TR.mlf set label file to “labels/Clean08TR.mlf” lib/models.lst a list of word models ( liN (零), #i (一), #er (二),… jiou (九), sil )

bin/spmodel_gen hmm/models hmm/models Add SP Model bin/spmodel_gen hmm/models hmm/models Add “sp” (short pause) HMM definition to MMF file “hmm/hmmdef”

HHEd - Modify HMMs lib/sil1.hed lib/models_sp.lst HHEd -H hmm/macros -H hmm/models -M hmm lib/sil1.hed lib/models_sp.lst lib/sil1.hed a list of command to modify HMM definitions lib/models_sp.lst a new list of model ( liN (零), #i (一), #er (二),… jiou (九), sil, sp ) See HTK book 3.2.2 (p. 33)

Training Flowchart x3 x6

HERest - Adjust HMMs Again HERest -C lib/config.cfg -S scripts/training.scp -I labels/Clean08TR_sp.mlf -H hmm/macros -H hmm/models -M hmm lib/models_sp.lst

HHEd – Increase Number of Mixtures HHEd -H hmm/macros -H hmm/models -M hmm lib/mix2_10.hed lib/models_sp.lst

Modification of Models You can modify # of Gaussian mixture here. lib/mix2_10.hed MU 2 {liN.state[2-4].mix} MU 2 {#i.state[2-4].mix} MU 2 {#er.state[2-4].mix} MU 2 {san.state[2-4].mix} MU 2 {sy.state[2-4].mix} … MU 3 {sil.state[2-4].mix} MU +2 {san.state[2-9].mix} Check HTKBook 17.8 HHEd for more details This value tells HTK to change the mixture number from state 2 to state 4. If you want to change # state, check lib/proto. You can increase # Gaussian mixture here.

HERest - Adjust HMMs Again HERest -C lib/config.cfg -S scripts/training.scp -I labels/Clean08TR_sp.mlf -H hmm/macros -H hmm/models -M hmm lib/models_sp.lst

Training Flowchart Hint: Increase mixtures little by little x3 x6

Testing Flowchart

HParse - Construct Word Net HParse lib/grammar_sp lib/wdnet_sp lib/grammar_sp regular expression lib/wdnet_sp output word net

HVite - Viterbi Search -w lib/wdnet_sp -i result/result.mlf lib/dict HVite -H hmm/macros -H hmm/models -S scripts/testing.scp -C lib/config.cfg -w lib/wdnet_sp -l '*' -i result/result.mlf -p 0.0 -s 0.0 lib/dict lib/models_sp.lst -w lib/wdnet_sp input word net -i result/result.mlf output MLF file lib/dict dictionary: a mapping from word to phone sequences ling -> liN, er -> #er, … . 一 -> sic_i i, 七-> chi_i i

HResult - Compared With Answer HResults -e "???" sil -e "???" sp -I labels/answer.mlf lib/models_sp.lst result/result.mlf Longest Common Subsequence (LCS) ====================== HTK Results Analysis ======================= Date: Wed Apr 17 00:26:54 2013 Ref : labels/answer.mlf Rec : result/result.mlf ------------------------ Overall Results -------------------------- SENT: %Correct=38.54 [H=185, S=295, N=480] WORD: %Corr=96.61, Acc=74.34 [H=1679, D=13, S=46, I=387, N=1738] ==============================================================

Part 1 (40%) – Run Baseline Download HTK tools and homework package Set PATH for HTK tools set_htk_path.sh Execute (bash shell script) (00_clean_all.sh) 01_run_HCopy.sh 02_run_HCompV.sh 03_training.sh 04_testing.sh You can find accuracy in “result/accuracy” the baseline accuracy is 74.34%

Useful tips To unzip files To set path in “set_htk_path.sh” unzip XXXX.zip tar -zxvf XXXX.tar.gz To set path in “set_htk_path.sh” PATH=$PATH:“~/XXXX/XXXX” In case shell script is not permitted to run… chmod 744 XXXX.sh

Part 2 (40%) – Improve Recognition Accuracy Acc > 95% for full credit ; 90~95% for partial credit x3 x6 Increase number of states Modify the original script!

Attention(1) Executing 03_training.sh twice is different from doubling the number of training iterations. To increase the number of training iterations, please modify the script, rather than run it many times. If you executed 03_training.sh more than once, you will get some penalty.

Attention(2) Every time you modified any parameter or file, you should run 00_clean_all.sh to remove all the files that were produced before, and restart all the procedures. If not, the new settings will be performed on the previous files, and hence you will be not able to analyze the new results. (Of course, you should record your current results before starting the next experiment.)

Part 3 (20%) Write a report describing your training process and accuracy. Number of states, Gaussian mixtures, iterations, … How some changes effect the performance Other interesting discoveries Well-written report may get +10% bonus.

Submission Requirements 4 shell scripts your modified 01~04_XXXX.sh 1 accuracy file with only your best accuracy (The baseline result is not needed.) proto your modified hmm prototype mix2_10.hed your modified file which specifies the number of GMMs of each state 1 report (in PDF format) the filename should be hw2-1_bXXXXXXXX.pdf (your student ID) Compress the above 8 files into 1 zip file and upload it to Ceiba. 10% of the final score will be taken off for each day of late submission

If you have any problem… Check for hints in the shell scripts. Check the HTK book. Ask friends who are familiar with Linux commands or Cygwin. This should solve all your technical problems. Contact the TA by email. But please allow a few days to respond. 蔡政昱 r02942067@ntu.edu.tw