The HTK Book (for HTK Version 3.2.1) Young et al., 2002
Chapter 1: The Fundamentals of HTK
HTK is a toolkit for building hidden Markov models (HMMs). It is primarily used to build automatic speech recognizers (ASRs), but it can also be used for other HMM-based systems, e.g. speaker recognition, image recognition, and automatic text summarization. HTK provides tools, built on a common set of library modules, for both training and testing HMM systems.
How to Train and Test an ASR?
Things needed: a labeled speech corpus, a pronunciation dictionary, and a grammar.
Procedure:
1. Divide the corpus into training, development, and test sets.
2. Train the acoustic models on the training set.
3. Test and retrain iteratively on the development set.
4. Evaluate the final system on the test set.
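Step 1 of the procedure can be sketched in Python. This is a minimal sketch that makes several assumptions: the corpus is a list of utterance IDs (the IDs are hypothetical), the 10%/10% dev/test fractions are illustrative, and a fixed seed keeps the split reproducible.

```python
import random

def split_corpus(utterances, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle utterance IDs and split them into disjoint train/dev/test lists.

    The fractions and the seed are illustrative assumptions, not HTK settings.
    """
    ids = list(utterances)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_test = int(len(ids) * test_frac)
    n_dev = int(len(ids) * dev_frac)
    test = ids[:n_test]
    dev = ids[n_test:n_test + n_dev]
    train = ids[n_test + n_dev:]  # everything left over is used for training
    return train, dev, test

if __name__ == "__main__":
    utts = [f"S{i:04d}" for i in range(1, 101)]  # hypothetical utterance IDs
    train, dev, test = split_corpus(utts)
    print(len(train), len(dev), len(test))  # 80 10 10
```

For a speaker-independent evaluation, you would split by speaker rather than by utterance, so that no test speaker is heard during training.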
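How to Build an ASR Using HTK?
Goal: a recognizer for voice dialing. The task grammar is written in HParse notation, with $digit and $name as variables for the digits and the names to be dialed:
( SENT-START ( DIAL <$digit> | (PHONE|CALL) $name) SENT-END )
HParse compiles this grammar into a word network (wdnet).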
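The following hand-written Python acceptor makes the grammar notation concrete. It is not HParse and does not read HParse syntax. It assumes that DIAL is followed by one or more digits (<$digit> in HParse notation). The DIGITS and NAMES word lists are hypothetical stand-ins for the $digit and $name definitions. Square brackets in HParse mark an optional item, so a hypothetical name definition [ JOHN ] SMITH yields both the two-word form and the one-word form.

```python
# Hypothetical word lists standing in for $digit and $name (not from HTK).
DIGITS = {"ONE", "TWO", "THREE", "FOUR", "FIVE"}
# Each name is a tuple of words; the first name is optional, as in [ JOHN ] SMITH.
NAMES = {("JOHN", "SMITH"), ("SMITH",), ("MARY", "JONES"), ("JONES",)}

def accepts(words):
    """Return True if the word sequence is a sentence of the voice-dialing grammar."""
    if len(words) < 3 or words[0] != "SENT-START" or words[-1] != "SENT-END":
        return False
    body = words[1:-1]
    if body[0] == "DIAL":
        digits = body[1:]  # <$digit>: one or more repetitions of $digit
        return len(digits) >= 1 and all(w in DIGITS for w in digits)
    if body[0] in ("PHONE", "CALL"):
        return tuple(body[1:]) in NAMES  # exactly one $name must follow
    return False

if __name__ == "__main__":
    print(accepts("SENT-START CALL JOHN SMITH SENT-END".split()))  # True
    print(accepts("SENT-START DIAL SENT-END".split()))             # False
```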
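Creating a Dictionary
HDMan builds the task dictionary from one or more source pronunciation dictionaries and also writes out a list of the phones used. An HMM will be estimated for each of these phones.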
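Deriving the phone list from a dictionary can be sketched in Python. This is a simplified sketch, not HDMan. It assumes the HTK-style dictionary layout of one entry per line (word, optional output symbol in square brackets, then phones) and ignores other optional fields. The pronunciations in the usage example are illustrative only.

```python
def parse_dictionary(lines):
    """Map each word to a list of its pronunciations (lists of phones)."""
    prons = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        word, rest = fields[0], fields[1:]
        if rest and rest[0].startswith("[") and rest[0].endswith("]"):
            rest = rest[1:]  # drop the optional output symbol, e.g. []
        prons.setdefault(word, []).append(rest)  # a word may have several entries
    return prons

def phone_list(prons):
    """Sorted set of all phones used; one HMM is trained per phone."""
    return sorted({p for plist in prons.values() for pron in plist for p in pron})

if __name__ == "__main__":
    lines = [  # illustrative entries, not taken from a real dictionary
        "CALL k ao l sp",
        "DIAL d ay ax l sp",
        "PHONE f ow n sp",
        "SENT-END [] sil",
        "SENT-START [] sil",
    ]
    print(phone_list(parse_dictionary(lines)))
```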
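Recording the Data
The training and test utterances are recorded with HSLab (e.g. HSLab noname). HSGen generates random prompt sentences from the word network and the dictionary: HSGen wdnet dict > testprompts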
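HSGen generates random sentences from the word network. The Python sketch below illustrates that idea on a hand-coded version of the voice-dialing grammar, under these assumptions:
- The DIGITS and NAMES lists are hypothetical.
- The DIAL and PHONE/CALL branches are chosen with equal probability.
- A DIAL sentence has at most four digits.
- The prompt lines are numbered.

None of these choices is claimed to match HSGen. SENT-START and SENT-END are left out on the assumption that they map to silence in the dictionary, not to words the speaker reads.

```python
import random

# Hypothetical vocabulary; a real setup would read the word network and dictionary.
DIGITS = ["ONE", "TWO", "THREE", "FOUR", "FIVE"]
NAMES = [["JOHN", "SMITH"], ["SMITH"], ["MARY", "JONES"], ["JONES"]]

def random_sentence(rng, max_digits=4):
    """Pick one path through the grammar at random (SENT-START/END omitted)."""
    if rng.random() < 0.5:  # assumed 50/50 choice between the two branches
        return ["DIAL"] + [rng.choice(DIGITS) for _ in range(rng.randint(1, max_digits))]
    return [rng.choice(["PHONE", "CALL"])] + rng.choice(NAMES)

def make_prompts(n, seed=0):
    """Return n numbered prompt lines, one sentence per line."""
    rng = random.Random(seed)
    return [f"{i}. {' '.join(random_sentence(rng))}" for i in range(1, n + 1)]

if __name__ == "__main__":
    for line in make_prompts(5):
        print(line)
```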
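Transcribing the Data
HMM training is supervised learning, so each training utterance needs a transcription. In HTK, the word-level transcriptions are usually collected in a master label file (MLF) and expanded to phone level with HLEd.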
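The Python sketch below turns prompts into a word-level MLF. It assumes the prompts are available as (utterance ID, sentence) pairs, and the IDs used here are hypothetical. The output follows the MLF layout:
- a `#!MLF!#` header line,
- a quoted file-name pattern per utterance,
- one label per line,
- a single `.` to end each transcription.

```python
def prompts_to_mlf(prompts):
    """Build a word-level MLF from (utterance_id, sentence) pairs."""
    lines = ["#!MLF!#"]
    for utt_id, sentence in prompts:
        lines.append(f'"*/{utt_id}.lab"')  # pattern matched against label file names
        lines.extend(sentence.split())      # one word label per line
        lines.append(".")                   # terminates this utterance's labels
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(prompts_to_mlf([("S0001", "DIAL ONE TWO"), ("S0002", "CALL SMITH")]), end="")
```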
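Coding the Data
The waveforms are converted (coded) into sequences of feature vectors with HCopy. HTK supports frame-based parameterizations such as FFT-based filterbanks, LPC coefficients, MFCCs, and user-defined features.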
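All of these parameterizations start by cutting the waveform into short overlapping frames. The Python sketch below shows only that shared front end:
- framing,
- pre-emphasis,
- a Hamming window,
- a per-frame log energy.

It does not compute MFCCs or LPCs. The 25 ms window, 10 ms shift, pre-emphasis coefficient 0.97, and energy floor are assumed illustrative values, not values read from an HTK configuration.

```python
import math

def make_frames(signal, sample_rate, win_ms=25.0, shift_ms=10.0):
    """Cut the signal into overlapping fixed-length frames (assumed 25 ms / 10 ms)."""
    win = int(sample_rate * win_ms / 1000)
    shift = int(sample_rate * shift_ms / 1000)
    return [signal[s:s + win] for s in range(0, len(signal) - win + 1, shift)]

def pre_emphasize(frame, k=0.97):
    """First-order high-pass filter: y[n] = x[n] - k * x[n-1]."""
    return [frame[0]] + [frame[n] - k * frame[n - 1] for n in range(1, len(frame))]

def hamming(frame):
    """Taper the frame edges with a Hamming window (frame length must be > 1)."""
    n_len = len(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * n / (n_len - 1)))
            for n, x in enumerate(frame)]

def log_energy(frame, floor=1e-10):
    """Log of the frame energy, floored to avoid log(0) on digital silence."""
    return math.log(max(sum(x * x for x in frame), floor))

if __name__ == "__main__":
    sr = 16000
    tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(sr)]  # 1 s test tone
    feats = [log_energy(hamming(pre_emphasize(f))) for f in make_frames(tone, sr)]
    print(len(feats))  # 98 frames: 400-sample windows every 160 samples
```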
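Output Probability Specification
The most common choice is the continuous-density HMM (CDHMM), in which each state's output distribution is a mixture of Gaussians. HTK also supports discrete output probabilities, for vector-quantized (VQ) data.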
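In a CDHMM, the output probability of state j for an observation o is a weighted sum of Gaussians:

b_j(o) = Σ_m c_jm N(o; μ_jm, Σ_jm)

The Python sketch below evaluates log b_j(o) under the common assumption of diagonal covariances. It sums the mixture in the log domain (log-sum-exp) so that observations far from every mean do not underflow. The parameter values in the usage line are illustrative.

```python
import math

def log_gauss_diag(o, mean, var):
    """log N(o; mean, diag(var)) for a diagonal-covariance Gaussian."""
    d = len(o)
    return -0.5 * (d * math.log(2 * math.pi)
                   + sum(math.log(v) for v in var)
                   + sum((x - m) ** 2 / v for x, m, v in zip(o, mean, var)))

def log_output_prob(o, weights, means, variances):
    """log b_j(o) = log sum_m c_m N(o; mu_m, Sigma_m), computed with log-sum-exp.

    Mixture weights must be positive and sum to one.
    """
    terms = [math.log(c) + log_gauss_diag(o, mu, var)
             for c, mu, var in zip(weights, means, variances)]
    top = max(terms)  # factor out the largest term for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

if __name__ == "__main__":
    print(log_output_prob([0.5, -0.2], [0.7, 0.3],
                          [[0.0, 0.0], [1.0, -1.0]], [[1.0, 1.0], [0.5, 2.0]]))
```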
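Flat Start Training
Define a prototype HMM that specifies the topology, usually left-to-right with three emitting states and no skips. HCompV then gives the prototype reasonable initial parameter values: the global mean and variance of the training data. Copy the prototype for every phone and store the resulting models in a master macro file (MMF). Training then continues with HERest (embedded re-estimation over whole utterances) or HRest (re-estimation of individual models from segmented data).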
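The Python sketch below illustrates the flat-start idea. It computes the global mean and variance of all training frames, which is what HCompV does for flat-start initialization, and then gives every phone model the same left-to-right topology and those same Gaussian parameters. The models are plain Python dicts, not HTK's MMF syntax. The 0.6 self-loop probability is an assumed initial guess. The five-state layout follows HTK's convention of non-emitting entry and exit states around the emitting ones.

```python
def global_mean_var(frames):
    """Per-dimension mean and variance over all training frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[i] for f in frames) / n for i in range(dim)]
    var = [sum((f[i] - mean[i]) ** 2 for f in frames) / n for i in range(dim)]
    return mean, var

def left_to_right_transp(n_emitting=3, stay=0.6):
    """Transition matrix with no skips; states 0 and n-1 are non-emitting."""
    n = n_emitting + 2
    a = [[0.0] * n for _ in range(n)]
    a[0][1] = 1.0  # the entry state always moves to the first emitting state
    for i in range(1, n - 1):
        a[i][i] = stay            # self-loop (assumed initial guess)
        a[i][i + 1] = 1.0 - stay  # move to the next state; no skips
    return a  # the last row stays all zeros: the exit state has no outgoing arcs

def flat_start_models(phones, frames, n_emitting=3):
    """Give every phone the same topology and the same global Gaussian."""
    mean, var = global_mean_var(frames)
    return {p: {"transp": left_to_right_transp(n_emitting),
                "states": [{"mean": list(mean), "var": list(var)}
                           for _ in range(n_emitting)]}
            for p in phones}
```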
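Realigning and Creating Triphones
When words have multiple pronunciations, HVite is used as a pseudo-recognizer in forced-alignment mode. The word sequence is fixed by the transcription, and HVite picks the pronunciation of each word that best matches the acoustics. The realigned training data is then used to build context-dependent triphone models from the monophones.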
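The Python sketch below shows how a monophone transcription can be relabeled as triphones in HTK's left-right notation, l-p+r. It is a simplified illustration, not HLEd. It assumes that sil and sp are context-free and also block context, which yields word-internal triphones when sp separates words. The phone sequence in the usage example is illustrative.

```python
CONTEXT_FREE = {"sil", "sp"}  # assumption: never expanded, and they block context

def to_triphones(phones):
    """Relabel each phone as left-phone+right, omitting blocked contexts."""
    out = []
    for i, p in enumerate(phones):
        if p in CONTEXT_FREE:
            out.append(p)
            continue
        left = phones[i - 1] if i > 0 and phones[i - 1] not in CONTEXT_FREE else None
        right = (phones[i + 1]
                 if i + 1 < len(phones) and phones[i + 1] not in CONTEXT_FREE else None)
        label = p
        if left:
            label = f"{left}-{label}"
        if right:
            label = f"{label}+{right}"
        out.append(label)
    return out

if __name__ == "__main__":
    print(to_triphones(["sil", "d", "ay", "ax", "l", "sp", "w", "ah", "n", "sil"]))
```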
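Evaluation
The trained recognizer is run on the test data with HVite, and HResults compares its output with the reference transcriptions.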
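Scoring of the kind HResults reports can be sketched in Python. The reference and recognized word strings are aligned, and the alignment yields substitutions S, deletions D, and insertions I against N reference words. With H = N - D - S, the two scores are:
- percent correct = 100 H / N
- accuracy = 100 (H - I) / N

This sketch uses a unit-cost Levenshtein alignment and breaks ties toward fewer substitutions. HResults uses its own alignment weights, so counts can differ in some cases.

```python
def align_counts(ref, hyp):
    """Unit-cost Levenshtein alignment; returns (substitutions, deletions, insertions)."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = (distance, S, D, I) for ref[:i] against hyp[:j]
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, 0, i, 0)  # only deletions
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, 0, j)  # only insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d, s, dl, ins = cost[i - 1][j - 1]
            sub = int(ref[i - 1] != hyp[j - 1])
            diag = (d + sub, s + sub, dl, ins)
            d, s, dl, ins = cost[i - 1][j]
            dele = (d + 1, s, dl + 1, ins)
            d, s, dl, ins = cost[i][j - 1]
            inse = (d + 1, s, dl, ins + 1)
            cost[i][j] = min(diag, dele, inse)  # lowest distance, then fewest S
    _, S, D, I = cost[n][m]
    return S, D, I

def score(ref, hyp):
    """Return (percent correct, percent accuracy) for one reference/hypothesis pair."""
    S, D, I = align_counts(ref, hyp)
    N = len(ref)
    H = N - D - S
    return 100.0 * H / N, 100.0 * (H - I) / N

if __name__ == "__main__":
    print(score("DIAL ONE TWO THREE".split(), "DIAL ONE THREE FOUR".split()))  # (75.0, 50.0)
```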
Other Issues
Speaker adaptation: HTK supports both supervised and unsupervised speaker adaptation (HVite).
Language models: HTK supports n-gram language models.
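The Python sketch below shows the simplest n-gram case: a bigram model estimated by maximum likelihood, P(w2 | w1) = count(w1 w2) / count(w1 as a history). The `<s>` and `</s>` sentence markers are assumed names chosen for this sketch. The model applies no smoothing, so unseen bigrams get no probability. A practical language model would add discounting or back-off, which this sketch omits.

```python
from collections import Counter

def train_bigram(sentences):
    """Maximum-likelihood bigram probabilities P(w2 | w1), without smoothing."""
    history = Counter()
    bigrams = Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]  # assumed sentence-boundary markers
        for w1, w2 in zip(tokens, tokens[1:]):
            history[w1] += 1        # how often w1 is followed by anything
            bigrams[(w1, w2)] += 1
    return {(w1, w2): c / history[w1] for (w1, w2), c in bigrams.items()}

if __name__ == "__main__":
    p = train_bigram([["CALL", "SMITH"], ["CALL", "JONES"], ["DIAL", "ONE"]])
    print(p[("CALL", "SMITH")])  # 0.5
```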