M4 speech recognition
University of Sheffield
Vincent Wan, Martin Karafiát


Slide 1: M4 speech recognition
University of Sheffield
Vincent Wan, Martin Karafiát

Slide 2: The Recogniser
First pass:
- Front end
- Best-first decoding (Ducoder) with word-internal triphone models and a trigram language model (SRILM)
- MLLR adaptation (HTK)
- n-best lattice generation and recognition output
Second pass:
- Lattice rescoring by time-synchronous decoding (HTK) with cross-word triphone models
- MLLR adaptation (HTK)
- Recognition output
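The pipeline above is HTK/SRILM-specific, but the core rescoring idea is generic: a cheap first pass produces a ranked list of hypotheses, and a second pass re-ranks them with stronger models. A minimal sketch (hypotheses, scores, and the toy second-pass model are all invented for illustration):

```python
def rescore_nbest(nbest, second_pass_lm, lm_weight=0.5):
    """Re-rank first-pass hypotheses by combining the first-pass
    acoustic log-score with a second-pass language-model log-score.
    `nbest` is a list of (word_string, acoustic_logprob) pairs."""
    rescored = []
    for words, ac_score in nbest:
        total = ac_score + lm_weight * second_pass_lm(words)
        rescored.append((total, words))
    rescored.sort(reverse=True)  # highest combined log-score first
    return [words for _, words in rescored]

# Toy example: the second-pass model penalises the second hypothesis
# enough to overturn the first-pass ranking.
nbest = [("the cat sat", -10.0), ("the cat sad", -9.5)]
toy_lm = lambda words: 0.0 if words.endswith("sat") else -5.0
best = rescore_nbest(nbest, toy_lm)[0]
```

The real system rescored lattices rather than flat n-best lists, but the score-combination step is the same in spirit.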

Slide 3: System limitations
- N-best list rescoring is not optimal
- Adaptation must be performed on two sets of acoustic models
- Many more hyper-parameters to tune manually
- SRILM is not efficient on very large language models (more than 10^9 words)
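The SRILM scaling problem comes from the per-order probability and backoff tables a backoff n-gram model must keep in memory, which grow with the training-data size. A heavily simplified backoff trigram lookup (all probabilities invented) shows the tables involved:

```python
import math

def backoff_trigram_logprob(w1, w2, w3, trigrams, bigrams, unigrams,
                            backoff=0.4):
    """Simplified backoff: use the trigram estimate if observed,
    otherwise back off (with a fixed penalty) to the bigram, then
    the unigram. Real backoff LMs use per-context backoff weights."""
    if (w1, w2, w3) in trigrams:
        return math.log(trigrams[(w1, w2, w3)])
    if (w2, w3) in bigrams:
        return math.log(backoff * bigrams[(w2, w3)])
    return math.log(backoff * backoff * unigrams.get(w3, 1e-6))

trigrams = {("the", "cat", "sat"): 0.5}
bigrams = {("cat", "sat"): 0.2}
unigrams = {"sat": 0.1}
lp_seen = backoff_trigram_logprob("the", "cat", "sat",
                                  trigrams, bigrams, unigrams)
lp_backed_off = backoff_trigram_logprob("a", "cat", "sat",
                                        trigrams, bigrams, unigrams)
```

With 10^9 training words the trigram table alone holds hundreds of millions of entries, which is where a naive in-memory representation becomes the bottleneck.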

Slide 4: Advances since last meeting
Models trained on two databases:
- SWITCHBOARD recogniser: acoustic and language models trained on 200 hours of speech
- ICSI meetings recogniser: acoustic models trained on 40 hours of speech; the language model is a combination of SWB and ICSI
Improvements mainly affect the Switchboard models. A 16 kHz sampling rate is used throughout.
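The slides do not say how the SWB and ICSI language models are combined; linear interpolation is one standard option, sketched below with made-up probabilities and an assumed mixing weight:

```python
def interpolate(p_swb, p_icsi, lam=0.7):
    """Linear interpolation of two language models:
    P(w|h) = lam * P_swb(w|h) + (1 - lam) * P_icsi(w|h).
    `lam` would normally be tuned on held-out meeting data."""
    return lam * p_swb + (1 - lam) * p_icsi

# A word that is common in meetings but rare in telephone conversations
# still gets useful probability mass from the ICSI component.
p = interpolate(p_swb=0.001, p_icsi=0.02)
```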

Slide 5: Advances since last meeting
- Adaptation of word-internal context-dependent models
- Unified the phone sets and pronunciation dictionaries:
  - Improved the pronunciation dictionary for Switchboard
  - Now using the ICSI dictionary with missing pronunciations imported from the ISIP dictionary
- Better handling of multiple pronunciations during acoustic model training
- General bug fixes
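Importing missing pronunciations from a fallback dictionary is simple to sketch; the entries below are illustrative examples, not the real ICSI/ISIP data:

```python
def merge_dictionaries(primary, fallback):
    """Keep every entry of the primary pronunciation dictionary and
    pull in entries for words it is missing from the fallback one."""
    merged = dict(fallback)
    merged.update(primary)  # primary entries win on conflict
    return merged

# Word -> list of pronunciations (phone strings), invented for the example.
icsi = {"meeting": ["m iy t ih ng"]}
isip = {"meeting": ["m iy dx ih ng"], "lattice": ["l ae t ih s"]}
merged = merge_dictionaries(icsi, isip)
```

In practice the two dictionaries would first need mapping onto the unified phone set mentioned above.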

Slide 6: Results overview (% word error rates)
Systems compared: SWB trained; ICSI trained; SWB trained + ICSI adapted; ICSI trained + ICSI adapted; ICSI trained + M4 adapted; SWB trained + M4 adapted; SWB trained + ICSI adapted + M4 adapted. Test sets: SWB, ICSI, M4.
* Results from lapel mics. † Results from beamformer.
[The numeric WER values in the table are not recoverable from the transcript.]
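For reference, word error rate is (substitutions + deletions + insertions) divided by the reference length, computed by edit distance over word sequences. A minimal implementation:

```python
def wer(ref, hyp):
    """Word error rate via Levenshtein distance over word sequences."""
    r, h = ref.split(), hyp.split()
    # dp[i][j] = edit distance between r[:i] and h[:j]
    dp = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(h) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            sub = dp[i - 1][j - 1] + (r[i - 1] != h[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(r)][len(h)] / len(r)
```

For example, one substituted word in a three-word reference gives a WER of 1/3, i.e. 33.3%.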

Slide 7: Results, adaptation vs. direct training on ICSI (% word error rates)
Systems compared: ICSI trained; SWB trained + ICSI adapted. Conditions: monophone models*; context-dependent word-internal models*; lattice rescoring (no or speaker-independent adaptation); lattice rescoring (speaker adaptation).
* Results from Ducoder using all pruning.
[The numeric WER values in the table are not recoverable from the transcript.]

Slide 8: Acoustic model adaptation issue
- Acoustic models are presently not very adaptive:
  - Better MLLR code required (next slide)
  - More training data required
- Need to make better use of the combined ICSI/SWB training data for M4.
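MLLR adapts the Gaussian means of the acoustic models with a shared affine transform, mu' = A mu + b, estimated from the adaptation data. In one dimension with one Gaussian per state this reduces to a least-squares fit of a and b; the sketch below uses invented numbers and is an analogy, not the HTK implementation:

```python
def fit_affine(means, targets):
    """Least-squares fit of targets ~= a * means + b: the 1-D analogue
    of estimating a single MLLR mean transform shared across Gaussians."""
    n = len(means)
    mx = sum(means) / n
    my = sum(targets) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(means, targets))
    var = sum((x - mx) ** 2 for x in means)
    a = cov / var
    b = my - a * mx
    return a, b

# Speaker-independent means vs. per-speaker sample means (invented data):
# here every mean is simply shifted by +0.5, and the fit recovers that.
a, b = fit_affine([0.0, 1.0, 2.0], [0.5, 1.5, 2.5])
adapted = [a * m + b for m in [0.0, 1.0, 2.0]]
```

Because one transform is shared across many Gaussians, MLLR can adapt models from very little speaker data, which is why better MLLR code matters more than simply adding training data.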

Slide 9: Other news
- The next version of HTK's adaptation code will be made available to M4 before the official public release.
- Sheffield to acquire the HTK LVCSR decoder:
  - Licensing issues to be resolved
  - May be able to make binaries available to M4 partners