Creating Speech Recognizers Quickly Björn Bringert Department of Computer Science and Engineering Chalmers.

Creating Speech Recognizers Quickly
Björn Bringert
Department of Computer Science and Engineering
Chalmers University of Technology and Göteborg University

My goals
● For me
  – Understand how to build a simple speech recognizer.
● For others
  – Make it easier to build speech recognizer prototypes.
  – Allow quick experimentation with different options.

Existing components
● Grammatical Framework (GF)
  – GF is a high-level grammar formalism.
  – We assume that a GF grammar already describes what the recognizer should recognize.
● HTK (Hidden Markov Model Toolkit)
  – A free toolkit for building and using Hidden Markov Models.
  – General-purpose, but geared towards speech recognition.

Things you need to do
● Create a pronunciation dictionary
  – Now automatic for Swedish (results still low quality).
● Create an acoustic model
  – Previously, with HTK: lots of semi-automatic steps.
  – Now automatic, given recorded data (results still low quality).
● Create a recognition grammar
  – Can now be generated from a GF grammar.

Recording data
● Generate utterances to record
  – Automatic, given a GF grammar.
● Record the utterances
  – A simple program prompts for each utterance and records it.
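
The slides do not show how the utterances are generated from the grammar. Here is a minimal sketch that assumes a small, non-recursive grammar written as a Python dictionary. This grammar is a hypothetical stand-in, not GF syntax. Enumerating every alternative yields the full list of sentences, which are then numbered as recording prompts. The grammar contents, the `utterances` and `prompt_lines` helpers, and the `uttNNN` ID scheme are illustrative assumptions. The recording step itself is not shown.

```python
from itertools import product

# Hypothetical toy grammar (NOT GF syntax): each nonterminal maps to a list of
# alternatives, and each alternative is a sequence of symbols. Any symbol that
# is not a key of the dict is treated as a word.
GRAMMAR = {
    "S": [["Verb", "Object"], ["Verb", "Object", "Place"]],
    "Verb": [["tänd"], ["släck"]],
    "Object": [["lampan"], ["ljuset"]],
    "Place": [["i", "köket"], ["i", "hallen"]],
}


def expand(symbol, grammar):
    """Return every word sequence derivable from `symbol`.

    Assumes the grammar is non-recursive, so the set of sentences is finite.
    """
    if symbol not in grammar:
        return [[symbol]]  # a word expands to itself
    sequences = []
    for alternative in grammar[symbol]:
        # Expand each symbol separately, then combine every choice of expansions.
        parts = [expand(sym, grammar) for sym in alternative]
        for combination in product(*parts):
            sequences.append([word for part in combination for word in part])
    return sequences


def utterances(grammar, start="S"):
    """All sentences of the grammar, as space-separated strings."""
    return [" ".join(words) for words in expand(start, grammar)]


def prompt_lines(sentences):
    """Give each sentence an ID so recordings can be matched to their prompts."""
    return [f"utt{i:03d} {sentence}" for i, sentence in enumerate(sentences, start=1)]


if __name__ == "__main__":
    for line in prompt_lines(utterances(GRAMMAR)):
        print(line)
```

With this toy grammar the script prints 12 prompts, starting with `utt001 tänd lampan`.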

Write the pronunciation dictionary
● Markus Forsberg has implemented basic Swedish pronunciation rules.
  – We use these to generate pronunciations of all word forms in the grammar.
● We can also use the Lexin database together with Functional Morphology to generate better pronunciations:
  – lemma pronunciations from Lexin,
  – word forms from Functional Morphology.
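
The sketch below illustrates only the general rule-based mechanism. It is a greedy longest-match letter-to-phone converter that writes one dictionary line per word form, in the form "word phone phone ...". The rule table is a hypothetical, heavily simplified toy and is not Forsberg's Swedish rule set. Real Swedish pronunciation depends on stress, vowel length and context, all of which these rules ignore.

```python
# Hypothetical, heavily simplified letter-to-phone rules. They only illustrate
# the mechanism; they are NOT Forsberg's Swedish rules and ignore stress,
# vowel length and most context effects.
RULES = {
    "sj": ["sj"], "tj": ["tj"], "ng": ["ng"], "ck": ["k"],
    "a": ["a"], "e": ["e"], "i": ["i"], "o": ["o"], "u": ["u"], "y": ["y"],
    "å": ["aa"], "ä": ["e"], "ö": ["oe"],
    "b": ["b"], "d": ["d"], "f": ["f"], "g": ["g"], "h": ["h"], "j": ["j"],
    "k": ["k"], "l": ["l"], "m": ["m"], "n": ["n"], "p": ["p"], "r": ["r"],
    "s": ["s"], "t": ["t"], "v": ["v"],
}
MAX_GRAPHEME = max(len(g) for g in RULES)


def pronounce(word):
    """Convert a word to phones, always trying the longest matching grapheme first."""
    word = word.lower()
    phones, i = [], 0
    while i < len(word):
        for length in range(min(MAX_GRAPHEME, len(word) - i), 0, -1):
            grapheme = word[i:i + length]
            if grapheme in RULES:
                phones.extend(RULES[grapheme])
                i += length
                break
        else:  # no rule matched at position i
            raise ValueError(f"no rule for {word[i]!r} in {word!r}")
    return phones


def dictionary_lines(words):
    """One 'word phone phone ...' line per distinct word form, sorted."""
    lines = []
    for w in sorted(set(words)):
        lines.append(w + " " + " ".join(pronounce(w)))
    return lines


if __name__ == "__main__":
    for line in dictionary_lines(["tänd", "släck", "lampan", "ljuset", "köket"]):
        print(line)
```

As a design suggestion only, not something the slides describe, a lexicon lookup could be tried first, with the rules kept as a fallback for words the lexicon does not cover. The Lexin route on the slide is one such lexicon.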

Build the acoustic model
1. Transcribe the data using the dictionary.
2. Parametrize the data.
3. Train monophone models.
4. Select the pronunciations that best match the recordings (using the models), then retrain.
5. Copy the monophone models to make triphone models.
6. Train the triphone models.
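
Steps 1 and 5 are essentially relabelling operations. The sketch below assumes that each word has a single pronunciation and that `sil` marks utterance boundaries. It does three things:
● expands a word transcription into phones;
● rewrites the phones as triphone labels in the `left-phone+right` notation that HTK uses;
● initializes each triphone model as a copy of its centre monophone.

The "models" are placeholder dictionaries, not HTK model definitions. In HTK itself these steps are normally carried out with its own tools (HLEd for relabelling, HHEd for cloning models) rather than custom code.

```python
import copy


def to_phones(words, dictionary, sil="sil"):
    """Step 1: expand a word transcription into a phone sequence framed by silence.

    Assumes one pronunciation per word; step 4 would instead pick, per word,
    the variant that best matches the recording.
    """
    phones = [sil]
    for word in words:
        phones.extend(dictionary[word])
    phones.append(sil)
    return phones


def to_triphones(phones, boundaries=("sil",)):
    """Step 5 (labels): rewrite each phone as left-phone+right.

    Boundary symbols are kept as they are and are not used as context, so the
    first and last phone of an utterance become biphones such as 't+e' or 'a-n'.
    """
    labels = []
    for i, phone in enumerate(phones):
        if phone in boundaries:
            labels.append(phone)
            continue
        left = phones[i - 1] if i > 0 and phones[i - 1] not in boundaries else None
        right = phones[i + 1] if i + 1 < len(phones) and phones[i + 1] not in boundaries else None
        label = phone
        if left is not None:
            label = f"{left}-{label}"
        if right is not None:
            label = f"{label}+{right}"
        labels.append(label)
    return labels


def centre(label):
    """Centre phone of a triphone label: 'a-b+c' -> 'b'."""
    return label.split("-")[-1].split("+")[0]


def clone_models(monophone_models, triphone_labels):
    """Step 5 (models): start each triphone model as a copy of its centre monophone."""
    return {label: copy.deepcopy(monophone_models[centre(label)])
            for label in set(triphone_labels)}


if __name__ == "__main__":
    # Hypothetical two-word lexicon, for illustration only.
    lexicon = {"tänd": ["t", "e", "n", "d"], "lampan": ["l", "a", "m", "p", "a", "n"]}
    print(" ".join(to_triphones(to_phones(["tänd", "lampan"], lexicon))))
```

Running it prints `sil t+e t-e+n e-n+d n-d+l d-l+a l-a+m a-m+p m-p+a p-a+n a-n sil`. Step 6 would then re-estimate these copied models on the data.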

Create a recognition grammar
● We need a grammar to guide the recognizer.
  – Remember: “recognize speech” / “wreck a nice beach”.
● Speech recognition grammars are not fun to write.
  – They are simple context-free grammars or finite automata.
  – They can be generated from a GF grammar in several formats:
    ● GSL (Nuance)
    ● JSGF (Java Speech API)
    ● SRGS (W3C standard)
    ● SLF (HTK)
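
To show what one of these target formats looks like, the sketch below renders a small hypothetical grammar dictionary as JSGF, with the start symbol as the only public rule. The dictionary representation, the grammar name `lights` and the `to_jsgf` helper are assumptions for illustration. This is not the output of GF's own grammar conversion, which handles far more than this.

```python
# Hypothetical toy grammar (NOT GF syntax): nonterminal -> list of alternatives,
# each alternative a sequence of symbols; non-keys are literal words.
GRAMMAR = {
    "S": [["Verb", "Object"], ["Verb", "Object", "Place"]],
    "Verb": [["tänd"], ["släck"]],
    "Object": [["lampan"], ["ljuset"]],
    "Place": [["i", "köket"], ["i", "hallen"]],
}


def to_jsgf(grammar, name, start="S"):
    """Render a dict grammar as JSGF text; only the start rule is public."""

    def render(alternative):
        # Nonterminals become <rule> references; everything else is a literal word.
        return " ".join(f"<{sym}>" if sym in grammar else sym for sym in alternative)

    lines = ["#JSGF V1.0 UTF-8;", f"grammar {name};", ""]
    for lhs, alternatives in grammar.items():
        visibility = "public " if lhs == start else ""
        body = " | ".join(render(a) for a in alternatives)
        lines.append(f"{visibility}<{lhs}> = {body};")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_jsgf(GRAMMAR, "lights"))
```

The same dictionary could just as well be written out in one of the other formats listed on the slide. Only the rendering function would change.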

Evaluation
● Keep some of the recorded data for evaluation.
● Evaluation is automatic, using the transcribed test data, the recognition grammar and the pronunciation dictionary.
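
One simple way to keep data aside is a reproducible random hold-out split, assuming utterances are identified by string IDs. The `split_holdout` helper, the ID format and the seed are illustrative assumptions. The example holds out 20 utterances because the results legend mentions 20 test utterances.

```python
import random


def split_holdout(utterance_ids, n_test, seed=0):
    """Hold out n_test utterances for evaluation and return (train, test)."""
    ids = sorted(utterance_ids)      # sort first so the split ignores input order
    rng = random.Random(seed)        # a fixed seed makes the split reproducible
    test = set(rng.sample(ids, n_test))
    train = [u for u in ids if u not in test]
    return train, sorted(test)


if __name__ == "__main__":
    ids = [f"utt{i:03d}" for i in range(1, 101)]
    train, test = split_holdout(ids, n_test=20)
    print(len(train), "training /", len(test), "test utterances")
```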

Evaluation results
● L: phone string length used (1 for monophones, 3 for triphones).
● TS: number of training utterances.
● TW: total number of words in the 20 test utterances.
● CS: percentage of whole test utterances recognized correctly.
● Acc: word accuracy (CW - I).
● CW: percentage of test words recognized correctly.
● D: deletions, as a percentage of the number of test words.
● S: substitutions, as a percentage of the number of test words.
● I: insertions, as a percentage of the number of test words.
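
The percentages above follow the usual word-level scoring. Let N be the number of reference words, H the hits and I the insertions. As percentages, CW = H/N and Acc = (H - I)/N, which is CW minus the insertion rate. The sketch below counts hits, substitutions, deletions and insertions with a uniform-cost edit-distance alignment. It is an illustration, not HTK's HResults tool, which may weight the edit operations differently and so can give slightly different counts. The example sentence pairs are hypothetical.

```python
def align_counts(ref, hyp):
    """Align two word lists by minimum edit distance (all edits cost 1).

    Returns (hits, substitutions, deletions, insertions).
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diagonal, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace one optimal path back from the end, counting each operation.
    hits = subs = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                hits += 1
            else:
                subs += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return hits, subs, dels, ins


def score(pairs):
    """Score (reference, recognized) word-list pairs; returns percentages.

    Assumes at least one pair and at least one reference word.
    """
    words = hits = subs = dels = ins = correct_utterances = 0
    for ref, hyp in pairs:
        h, s, d, i = align_counts(ref, hyp)
        words += len(ref)
        hits += h
        subs += s
        dels += d
        ins += i
        correct_utterances += int(ref == hyp)
    return {
        "TW": words,
        "CS": 100 * correct_utterances / len(pairs),
        "CW": 100 * hits / words,
        "Acc": 100 * (hits - ins) / words,  # equals CW minus the insertion rate
        "D": 100 * dels / words,
        "S": 100 * subs / words,
        "I": 100 * ins / words,
    }


if __name__ == "__main__":
    pairs = [
        ("tänd lampan".split(), "tänd lampan".split()),
        ("släck ljuset i köket".split(), "släck lampan i köket nu".split()),
    ]
    print(score(pairs))
```

In this example the second pair contains one substitution (`ljuset` recognized as `lampan`) and one insertion (`nu`). Because of the insertion, Acc comes out lower than CW.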

Future work
● Try with more data.
  – How good can we make the recognizer with this simple method?
● Tweak model and recognizer parameters.
  – Automatic tweaking using evaluation and machine learning?
● Improve Swedish pronunciation generation.
● Generate more phonetically diverse utterances.
● Improve the data collection tool for larger-scale recordings.
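
The slides do not say how phonetic diversity would be achieved. One common technique is greedy coverage selection, named here as an assumption rather than as the author's plan. It repeatedly picks the candidate sentence that adds the most not-yet-covered units. In the sketch, the units are adjacent phone pairs, used as a simple proxy for phonetic context. The candidates are assumed to be already transcribed into phones, for example with a pronunciation dictionary. The helper names and example phones are hypothetical.

```python
def phone_pairs(phones):
    """Adjacent phone pairs in a phone sequence (a simple proxy for phonetic context)."""
    return set(zip(phones, phones[1:]))


def greedy_select(candidates, k):
    """Pick up to k sentences, each time taking the one that adds the most uncovered pairs.

    candidates maps sentence -> phone list. Ties go to the alphabetically first sentence.
    """
    covered, chosen = set(), []
    remaining = dict(candidates)
    while remaining and len(chosen) < k:
        best = max(sorted(remaining), key=lambda s: len(phone_pairs(remaining[s]) - covered))
        gain = phone_pairs(remaining[best]) - covered
        if not gain:
            break  # no remaining sentence adds anything new
        chosen.append(best)
        covered |= gain
        del remaining[best]
    return chosen, covered


if __name__ == "__main__":
    candidates = {
        "tänd lampan": ["t", "e", "n", "d", "l", "a", "m", "p", "a", "n"],
        "tänd": ["t", "e", "n", "d"],
        "släck ljuset": ["s", "l", "e", "k", "l", "j", "u", "s", "e", "t"],
    }
    chosen, covered = greedy_select(candidates, k=2)
    print(chosen, len(covered))
```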

Conclusions
● Creating a prototype recognizer has been reduced to:
  – writing a GF grammar,
  – recording data,
  – writing a pronunciation dictionary (automatic for Swedish).
● Quality is still low; we should try with more data.
● The approach provides a platform for efficient experimentation.