Large Vocabulary Recognition Applied to Directory Assistance Services
Diamantino Caseiro and Isabel Trancoso, INESC/IST, 2000

2 Summary
SPEECHDAT telephone corpus
Training of general purpose acoustic models
Directory experiments and results
Conclusions and future work

3 Corpus Description
Multilingual telephone speech corpus
–SPEECHDAT(M): 1000 speakers (male 45%, female 55%)
–SPEECHDAT II: 4000 speakers (male 46%, female 54%)

4 Sub-Corpus Used
Train and development (80% of speakers)
–W: phonetically rich words (7 h)
–A: phonetically rich sentences (63 h)
Test (20% of speakers): directory assistance words (SpeechDat II)
–O1: spontaneous forename
–O2: spontaneous city name
–O3: read city name (set of 500)
–O7: read name and surname (set of 150)

5 Feature Extraction
MFCC (Mel-Frequency Cepstral Coefficients)
–14 cepstra + 14 Δ cepstra + energy + Δ energy
–Speech signal band-limited between 200 and 3800 Hz
–Hamming window of 25 ms every 10 ms
–Cepstral Mean Subtraction (CMS)
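The front end above is a standard MFCC pipeline. The sketch below reproduces it with librosa, which is an assumption on my part (the slides do not name the original signal-processing toolkit); the file name telephone_utt.wav is hypothetical, and the 14 static cepstra plus energy and their deltas give 30-dimensional vectors.

```python
import numpy as np
import librosa

# Hypothetical 8 kHz telephone recording.
y, sr = librosa.load("telephone_utt.wav", sr=8000)

win = int(0.025 * sr)   # 25 ms Hamming window
hop = int(0.010 * sr)   # 10 ms frame shift

# 14 cepstra, analysis band limited to 200-3800 Hz; center=False keeps the
# frame count consistent with the energy computation below.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=14,
                            n_fft=win, hop_length=hop,
                            window="hamming", fmin=200, fmax=3800,
                            center=False)

# Log energy per frame.
frames = librosa.util.frame(y, frame_length=win, hop_length=hop)
log_energy = np.log(np.sum(frames ** 2, axis=0) + 1e-10)[np.newaxis, :]

# Static stream (14 cepstra + energy) plus its delta stream: 30 dimensions.
static = np.vstack([mfcc, log_energy])
features = np.vstack([static, librosa.feature.delta(static)])

# Cepstral Mean Subtraction (CMS); applied here to every coefficient of the
# utterance for simplicity.
features -= features.mean(axis=1, keepdims=True)
```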

6 Acoustic Modeling
Left-right continuous-density HMMs (HMM: Hidden Markov Model)
–39 Portuguese phones
–Silence and filler models with forward and backward skips
–Gender-dependent models

7 Acoustic Modeling
Word-internal tied-state triphones
–Tree-based clustering
–13k triphones
–8498 shared states
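Word-internal modelling means a triphone's left and right contexts never cross a word boundary. The short sketch below illustrates the expansion; the HTK-style left-centre+right names and the biphones at word edges are an assumed convention (the slides do not show their notation), and the phone string is made up.

```python
def word_internal_triphones(phones):
    """Expand one word's phone string into word-internal context-dependent
    units, written in HTK-style left-centre+right notation."""
    if len(phones) == 1:
        return list(phones)                      # one-phone word stays a monophone
    units = [f"{phones[0]}+{phones[1]}"]         # word-initial unit: no left context
    for left, centre, right in zip(phones, phones[1:], phones[2:]):
        units.append(f"{left}-{centre}+{right}")
    units.append(f"{phones[-2]}-{phones[-1]}")   # word-final unit: no right context
    return units

# Hypothetical 4-phone word: two boundary biphones plus two full triphones.
print(word_internal_triphones(["k", "a", "z", "6"]))
# -> ['k+a', 'k-a+z', 'a-z+6', 'z-6']
```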

8 Model Topology
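The topology diagram itself is not reproduced in the transcript. As a stand-in, here is a minimal sketch of the transition structure a left-to-right phone model implies, built with hmmlearn rather than the (unnamed) original toolkit; the three emitting states and the transition probabilities are illustrative, not taken from the slides.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM  # stand-in toolkit, not the original one

# Three emitting states, strictly left-to-right: each state either loops on
# itself or advances one state (no skip transitions assumed for phone models).
n_states = 3
transmat = np.zeros((n_states, n_states))
for i in range(n_states):
    transmat[i, i] = 0.6                  # self-loop, models state duration
    if i + 1 < n_states:
        transmat[i, i + 1] = 0.4          # advance to the next state
transmat[-1, -1] = 1.0                    # model exit is handled by the decoder

phone_hmm = GaussianHMM(n_components=n_states, covariance_type="diag",
                        init_params="mc", params="mc")  # keep the topology fixed
phone_hmm.startprob_ = np.array([1.0, 0.0, 0.0])        # always enter at state 0
phone_hmm.transmat_ = transmat

# Silence and filler models (slide 6) additionally allow forward and backward
# skips, i.e. extra nonzero entries above and below the first off-diagonal.
# phone_hmm.fit(features.T) would then re-estimate the Gaussians on the
# 30-dimensional frames from the feature-extraction sketch.
```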

9 Train
Train monophones
Create triphones by cloning monophones
Train triphones to separate the distributions
Cluster triphone states using decision tree
Synthesize unseen triphones
Loop:
–Train triphones
–Increase number of mixtures
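A hedged sketch of the final loop only, using hmmlearn GMM-HMMs as a stand-in: for simplicity it refits a fresh model at each mixture count instead of splitting the Gaussians of the previous pass (as HTK-style recipes, and presumably the original procedure, do), and the training frames are synthetic.

```python
import numpy as np
from hmmlearn.hmm import GMMHMM

# Synthetic stand-in for the pooled 30-dim feature frames of one model.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))
lengths = [100] * 5                  # five example utterance segments

models = {}
for n_mix in (1, 2, 4, 8):           # "increase number of mixtures"
    hmm = GMMHMM(n_components=3, n_mix=n_mix, covariance_type="diag",
                 n_iter=10, random_state=0)
    hmm.fit(X, lengths)              # Baum-Welch re-estimation
    models[n_mix] = hmm
    print(n_mix, "mixtures, log-likelihood:", hmm.score(X, lengths))
```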

10 Train
Development set results
–2356 phonetically rich words

11 Directory Tasks
Spontaneous forename
–Recognition using a set of 750 frequent names (1)
–Recognition using 640 names from transcriptions (2)
Spontaneous city name
–Open vocabulary, 500 cities
City name
–Closed vocabulary, 500 cities
Forename and surname
–Closed vocabulary, 150 forenames and 150 surnames

12 Conclusions & Future Work
The results are promising, but further work is needed:
–Improve general purpose models
–Create task-specific models
–Fine-tune the recognition