The HTK Book (for HTK Version 3.2.1) Young et al., 2002.



Chapter 1: The Fundamentals of HTK
HTK is a toolkit for building hidden Markov models (HMMs). It is used primarily to build automatic speech recognizers (ASRs), but also for other HMM-based systems such as speaker recognition, image recognition and automatic text summarization. HTK provides tools (modules) for both training and testing HMM systems.

How to Train and Test an ASR?
Things needed: a labeled speech corpus and a dictionary (plus a grammar).
Procedure:
1. Divide the corpus into training, development and test sets.
2. Train the acoustic models.
3. Test, retrain, test … on the development set.
4. Test on the test data.
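Step 1 of the procedure can be sketched in a few lines of Python. This is only an illustration: the function name, the 10% dev and test fractions, and the utterance IDs are assumptions, not something the slides or HTK prescribe.

```python
import random

def split_corpus(utterance_ids, dev_fraction=0.1, test_fraction=0.1, seed=0):
    """Shuffle utterance IDs and split them into train/dev/test lists."""
    ids = sorted(utterance_ids)            # sort first so the split is reproducible
    random.Random(seed).shuffle(ids)       # fixed seed: same split on every run
    n_dev = int(len(ids) * dev_fraction)
    n_test = int(len(ids) * test_fraction)
    dev = ids[:n_dev]
    test = ids[n_dev:n_dev + n_test]
    train = ids[n_dev + n_test:]
    return train, dev, test

# Hypothetical utterance IDs, for illustration only
train, dev, test = split_corpus([f"utt{i:03d}" for i in range(100)])
print(len(train), len(dev), len(test))
```

Keeping the three sets disjoint is the point: the development set absorbs the repeated test-and-retrain cycles so that the final test set stays untouched.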

How to Build an ASR Using HTK?
Goal: a recognizer for voice dialing, with the task grammar
( SENT-START ( DIAL | (PHONE|CALL) $name) SENT-END )

Creating a Dictionary
HDMan builds the pronunciation dictionary and outputs a list of the phones it uses. An HMM will be estimated for each of these phones.
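The idea of deriving a phone list from a dictionary can be sketched as follows. This is not HDMan itself: the function name and the dictionary entries are hypothetical, and the parser assumes a simplified "WORD [OUTSYM] phone phone ..." line layout without pronunciation probabilities.

```python
def phone_list(dict_lines):
    """Collect the set of phones used in an HTK-style pronunciation dictionary."""
    phones = set()
    for line in dict_lines:
        fields = line.split()
        if not fields:
            continue                      # skip blank lines
        # fields[0] is the word; an optional [OUTSYM] field may follow it
        pron = [f for f in fields[1:] if not f.startswith("[")]
        phones.update(pron)
    return sorted(phones)

# Hypothetical dictionary entries, for illustration only
example = [
    "CALL   k ao l sp",
    "DIAL   d ay ax l sp",
    "PHONE  f ow n sp",
    "SENT-START [] sil",
]
print(phone_list(example))
```

Each name in the resulting list corresponds to one HMM to be trained.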

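Recording the Data
HSLab noname records the speech data. HSGen (wdnet dict) generates random prompt sentences from the word network and dictionary, which are saved as testprompts.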
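Prompt generation amounts to drawing random word sequences that the task grammar accepts. The sketch below hard-codes the voice-dialing grammar from the earlier slide rather than reading a word network, so it illustrates the idea only and is not how HSGen is implemented. The contents of $name are not given in the slides, so the names used here are hypothetical.

```python
import random

# Hypothetical expansion of $name; the slides do not list the actual names.
NAMES = ["ALICE", "BOB"]

def random_sentence(rng):
    """Draw one word sequence from the voice-dialing grammar."""
    if rng.random() < 0.5:
        body = ["DIAL"]                                        # first alternative
    else:
        body = [rng.choice(["PHONE", "CALL"]), rng.choice(NAMES)]  # (PHONE|CALL) $name
    return ["SENT-START"] + body + ["SENT-END"]

rng = random.Random(1)
for _ in range(3):
    print(" ".join(random_sentence(rng)))
```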

Transcribing the Data
HMM training is supervised learning, so every training utterance needs a transcription.

Coding the Data
HTK supports frame-based analysis: FFT-based, LPC, MFCC, user-defined, etc.
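"Frame-based" means the waveform is cut into short overlapping windows and one feature vector is computed per window. A minimal sketch of the framing step alone, assuming a 16 kHz sample rate, a 25 ms window and a 10 ms shift (typical values, not taken from the slides), with no windowing function or feature computation:

```python
def frame_signal(samples, sample_rate=16000, window_ms=25.0, shift_ms=10.0):
    """Split a list of samples into overlapping fixed-length frames."""
    win = int(sample_rate * window_ms / 1000)    # samples per frame
    hop = int(sample_rate * shift_ms / 1000)     # samples between frame starts
    frames = []
    start = 0
    while start + win <= len(samples):           # drop the incomplete tail
        frames.append(samples[start:start + win])
        start += hop
    return frames

one_second = [0.0] * 16000                       # one second of silence
frames = frame_signal(one_second)
print(len(frames), len(frames[0]))               # prints: 98 400
```

Each frame would then be passed to the chosen analysis (for example MFCC) to give one observation vector.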

Output Probability Specification
The most common choice is the continuous-density HMM (CDHMM). HTK also allows discrete probabilities (for vector-quantized, VQ, data).
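In a continuous-density HMM, each state's output probability is typically a mixture of Gaussians. The sketch below evaluates the log output probability for one observation vector, assuming diagonal covariances; the function names and parameter layout are illustrative and do not mirror HTK's internal code.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_output_prob(x, weights, means, variances):
    """Log output probability of x under a mixture of diagonal Gaussians."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)                 # log-sum-exp keeps the sum numerically stable
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

# Two-component mixture over 2-dimensional observations (made-up parameters)
print(log_output_prob([0.5, -0.2],
                      weights=[0.6, 0.4],
                      means=[[0.0, 0.0], [1.0, -1.0]],
                      variances=[[1.0, 1.0], [0.5, 2.0]]))
```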

Flat Start Training
Build a prototype HMM with reasonable initial guesses of its parameters (HCompV). Specify the topology, usually left-to-right with 3 states and no skips. Create a master macro file (MMF). Then use HRest or HERest for training.
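A flat start gives every state the same initial Gaussian, taken from the global statistics of the training data, inside a fixed left-to-right topology. The sketch below shows both pieces under simplifying assumptions: the function names are illustrative, the 0.6 self-loop probability is an arbitrary initial guess, and the two extra non-emitting entry and exit states follow HTK's model convention.

```python
def global_mean_var(frames):
    """Per-dimension mean and variance over all training frames."""
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return mean, var

def left_to_right_transitions(n_emitting=3, self_loop=0.6):
    """Transition matrix with non-emitting entry/exit states and no skips."""
    n = n_emitting + 2
    a = [[0.0] * n for _ in range(n)]
    a[0][1] = 1.0                       # entry state always moves to the first emitting state
    for i in range(1, n - 1):
        a[i][i] = self_loop             # stay in the same state
        a[i][i + 1] = 1.0 - self_loop   # or advance to the next one
    return a                            # last row stays zero: nothing leaves the exit state

mean, var = global_mean_var([[1.0, 2.0], [3.0, 6.0]])
print(mean, var)
for row in left_to_right_transitions():
    print(row)
```

Re-estimation then moves each state's parameters away from these identical starting values.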

Realigning and Creating Triphones
Use pseudo-recognition to force-align the training data when words have multiple pronunciations.
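Once the aligned phone-level transcriptions exist, creating triphones is a relabeling step: each phone is renamed with its left and right neighbours in HTK's l-p+r notation. A minimal sketch, assuming silence labels sil and sp stay context independent and block context from crossing them; the function name and example sequence are illustrative.

```python
def to_triphones(phones, boundary=("sil", "sp")):
    """Rewrite a monophone sequence in left-phone+right context notation."""
    out = []
    for i, p in enumerate(phones):
        if p in boundary:
            out.append(p)               # silence models stay context independent
            continue
        left = phones[i - 1] if i > 0 and phones[i - 1] not in boundary else None
        right = (phones[i + 1]
                 if i + 1 < len(phones) and phones[i + 1] not in boundary
                 else None)
        name = p
        if left:
            name = f"{left}-{name}"     # left context
        if right:
            name = f"{name}+{right}"    # right context
        out.append(name)
    return out

print(to_triphones(["sil", "k", "ao", "l", "sil"]))
# ['sil', 'k+ao', 'k-ao+l', 'ao-l', 'sil']
```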

Evaluation
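Recognition output is usually scored by aligning each hypothesis with its reference transcription and counting substitutions (S), deletions (D) and insertions (I) against the N reference words. The sketch below computes percent correct, (N - D - S) / N, and accuracy, (N - D - S - I) / N, as HResults reports them. The alignment uses unit costs, which is a simplifying assumption, so the error breakdown may differ from a tool that weights errors differently.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) from a minimum edit-distance alignment."""
    # cost[i][j] = (total errors, S, D, I) for ref[:i] against hyp[:j]
    cost = [[None] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, len(ref) + 1):
        cost[i][0] = (i, 0, i, 0)                 # all reference words deleted
    for j in range(1, len(hyp) + 1):
        cost[0][j] = (j, 0, 0, j)                 # all hypothesis words inserted
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            e, s, d, n = cost[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                diag = (e, s, d, n)               # match
            else:
                diag = (e + 1, s + 1, d, n)       # substitution
            e, s, d, n = cost[i - 1][j]
            dele = (e + 1, s, d + 1, n)           # deletion
            e, s, d, n = cost[i][j - 1]
            ins = (e + 1, s, d, n + 1)            # insertion
            cost[i][j] = min(diag, dele, ins)     # fewest total errors wins
    _, s, d, n = cost[len(ref)][len(hyp)]
    return s, d, n

def scores(ref, hyp):
    """Percent correct and accuracy for one reference/hypothesis pair."""
    s, d, i = align_counts(ref, hyp)
    n = len(ref)
    return 100.0 * (n - d - s) / n, 100.0 * (n - d - s - i) / n

print(scores(["CALL", "ALICE"], ["CALL", "BOB"]))   # (50.0, 50.0)
print(scores(["DIAL"], ["DIAL", "BOB"]))            # (100.0, 0.0)
```

Accuracy penalizes insertions while percent correct does not, which is why the second example scores 100% correct but 0% accuracy.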

Other Issues
HTK supports supervised and unsupervised speaker adaptation (HVite).
Language modeling: n-gram language models.
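An n-gram language model estimates the probability of each word given the preceding n-1 words. For n = 2 the maximum-likelihood estimate is a ratio of counts, sketched below. The function name and training sentences are illustrative, and there is no smoothing or back-off, which a practical model would need.

```python
from collections import Counter

def bigram_probs(sentences):
    """Maximum-likelihood bigram probabilities P(w2 | w1), with no smoothing."""
    pair_counts = Counter()
    history_counts = Counter()
    for words in sentences:
        for w1, w2 in zip(words, words[1:]):
            pair_counts[(w1, w2)] += 1      # how often w2 follows w1
            history_counts[w1] += 1         # how often w1 is followed by anything
    return {pair: c / history_counts[pair[0]] for pair, c in pair_counts.items()}

# Hypothetical training sentences, for illustration only
corpus = [
    ["SENT-START", "CALL", "ALICE", "SENT-END"],
    ["SENT-START", "DIAL", "SENT-END"],
]
probs = bigram_probs(corpus)
print(probs[("SENT-START", "CALL")])   # 0.5
print(probs[("CALL", "ALICE")])        # 1.0
```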