Utterance Verification for Spontaneous Mandarin Speech Keyword Spotting Liu Xin, BinXi Wang Presenter: Kai-Wun Shih No.306, P.O. Box 1001,ZhengZhou,450002,

Utterance Verification for Spontaneous Mandarin Speech Keyword Spotting
Liu Xin, BinXi Wang
Presenter: Kai-Wun Shih
No. 306, P.O. Box 1001, ZhengZhou, 450002, Henan, P.R. China

2 Outline
Introduction
Feature Extraction and Acoustic Modeling
Keyword Recognition
Keyword Verification And Confidence Measures
Experiments and Results
Conclusions

3 Introduction (1/2)
Utterance verification is an important technology in the design of user-friendly speech recognition systems. Recognizers equipped with a keyword spotting capability let users speak naturally instead of following a rigid speaking format.

4 Introduction (2/2)
Keyword spotting systems introduce a filler model to enhance keyword detection and absorb out-of-vocabulary events. To reduce the false alarm rate, this paper incorporates two-level utterance verification, applied after a conventional Viterbi search has detected and segmented the speech into keyword hypotheses.

5 Feature Extraction and Acoustic Modeling (1/3)

6 Feature Extraction and Acoustic Modeling (2/3)
Because Chinese is a monosyllabic language, the syllable is chosen as the basic recognition unit. Except for the background silence unit, each syllable is modeled by a six-state left-to-right hidden Markov model (HMM). Each state is characterized by a Gaussian mixture observation density. Training each syllable model consists of estimating the means, covariances, and mixture weights for each state using maximum likelihood (ML) estimation.
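The syllable model structure described on this slide can be illustrated with a minimal sketch. It builds a six-state left-to-right transition matrix and evaluates a Gaussian mixture state density in the log domain. The self-loop probability, the diagonal covariances, and all function names are assumptions made for illustration; the slides do not specify them.

```python
import math

def left_to_right_transitions(n_states=6, self_loop=0.6):
    """Transition matrix in which each state either stays put or advances by one."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        if i == n_states - 1:
            A[i][i] = 1.0  # last state only self-loops in this sketch
        else:
            A[i][i] = self_loop
            A[i][i + 1] = 1.0 - self_loop
    return A

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian (assumed) at vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def log_gmm(x, weights, means, variances):
    """Log of sum_k w_k * N(x; mean_k, var_k), computed with log-sum-exp."""
    terms = [
        math.log(w) + log_gaussian_diag(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

# Toy usage: a 2-component mixture over 2-dimensional feature vectors.
A = left_to_right_transitions()
score = log_gmm(
    [0.0, 0.0],
    weights=[0.5, 0.5],
    means=[[0.0, 0.0], [1.0, 1.0]],
    variances=[[1.0, 1.0], [1.0, 1.0]],
)
```

In a real system the weights, means, and variances would come from ML training (for example Baum-Welch re-estimation) rather than being set by hand.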

7 Feature Extraction and Acoustic Modeling (3/3)
For each syllable model, an anti-syllable model was also trained. In general, the anti-syllable model for a given syllable should be trained on the data of all syllables except that syllable. Aside from the syllable and anti-syllable models, we also introduced a general acoustic filler model trained on non-keyword speech data, and a silence model trained on the non-speech segments of the signal.
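The data selection rule for anti-syllable models can be sketched as follows: for each syllable, pool the training examples of every other syllable. The function name, the toy corpus, and the utterance IDs are hypothetical; only the "all syllables except this one" rule comes from the slide.

```python
def anti_model_training_sets(data_by_syllable):
    """Map each syllable to the pooled training examples of all other syllables."""
    anti_sets = {}
    for target in data_by_syllable:
        pooled = []
        for syllable, examples in data_by_syllable.items():
            if syllable != target:
                pooled.extend(examples)
        anti_sets[target] = pooled
    return anti_sets

# Hypothetical toy corpus: syllable label -> list of utterance IDs.
corpus = {"bei": ["u1", "u2"], "jing": ["u3"], "shang": ["u4", "u5"]}
anti = anti_model_training_sets(corpus)
```

With a large syllable inventory each anti-model sees almost the whole corpus, so a practical system might subsample or restrict the pool to confusable syllables; the slides do not say whether this was done.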

8 Keyword Recognition (1/2)

9 Keyword Recognition (2/2)

10 Keyword Verification And Confidence Measures (1/9)

11 Keyword Verification And Confidence Measures (2/9)

12 Keyword Verification And Confidence Measures (3/9)

13 Keyword Verification And Confidence Measures (4/9)

14 Keyword Verification And Confidence Measures (5/9)

15 Keyword Verification And Confidence Measures (6/9)

16 Keyword Verification And Confidence Measures (7/9)

17 Keyword Verification And Confidence Measures (8/9)

18 Keyword Verification And Confidence Measures (9/9)

19 Experiments and Results
In this system, 20 city names were selected as the keywords. The system was trained on a continuous telephone-speech database composed of short spontaneous speech, syllables, words, and sentences, recorded by 70 speakers (50 male, 20 female). For testing, we also recorded 205 utterances corresponding to the 20 city names, spoken by a different group of 20 speakers (15 male, 5 female).

Table 1. Performance with several confidence measures

                   Average Detection Rate    Average False Alarm Rate
No verification    87.5%                     12.0%
                   86.5%                     8.4%
                   86.8%                     7.4%
                   87.5%                     7.0%
                   86.7%                     8.2%
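As a worked reading of Table 1, the sketch below computes the relative false alarm reduction and the change in detection rate for each verified configuration against the "No verification" baseline. The four configurations are taken in the order they appear in the table; the variable and function names are illustrative.

```python
# (detection rate %, false alarm rate %) as listed in Table 1.
baseline = (87.5, 12.0)
measures = [(86.5, 8.4), (86.8, 7.4), (87.5, 7.0), (86.7, 8.2)]

def relative_fa_reduction(baseline_fa, fa):
    """Fraction of the baseline false alarm rate removed by verification."""
    return (baseline_fa - fa) / baseline_fa

reductions = [relative_fa_reduction(baseline[1], fa) for _, fa in measures]
detection_change = [det - baseline[0] for det, _ in measures]
best = max(range(len(measures)), key=lambda i: reductions[i])

for i, (red, delta) in enumerate(zip(reductions, detection_change), start=1):
    print(f"row {i}: FA reduction {red:.1%}, detection change {delta:+.1f} points")
```

The third configuration removes about 41.7% of the baseline false alarms (12.0% to 7.0%) while keeping the detection rate at 87.5%; the other three trade 0.7 to 1.0 points of detection rate for smaller reductions.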

20 Conclusions
The spotting system adopts a two-stage strategy, with recognition followed by verification, and the basic unit of the system is the syllable. In the second stage, a keyword verification function is evaluated with four different confidence measures. Experimental results show that utterance verification with the third confidence measure outperforms the baseline system.
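The verification stage can be sketched as an accept/reject decision on each keyword hypothesis produced by recognition. The four confidence measures are not reproduced in this transcript, so the sketch assumes one common choice: the average duration-normalized log-likelihood ratio between each syllable model and its anti-syllable model. The class, the function names, the threshold, and the scores are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SyllableScore:
    log_lik_model: float  # log-likelihood under the syllable model
    log_lik_anti: float   # log-likelihood under its anti-syllable model
    n_frames: int         # number of frames aligned to this syllable

def keyword_confidence(syllables: List[SyllableScore]) -> float:
    """Mean of per-frame log-likelihood ratios over the keyword's syllables (assumed measure)."""
    ratios = [(s.log_lik_model - s.log_lik_anti) / s.n_frames for s in syllables]
    return sum(ratios) / len(ratios)

def verify(syllables: List[SyllableScore], threshold: float = 0.0) -> bool:
    """Accept the keyword hypothesis if its confidence reaches the threshold."""
    return keyword_confidence(syllables) >= threshold

# Two hypothetical keyword hypotheses from the recognition stage.
hyp_good = [SyllableScore(-100.0, -130.0, 20), SyllableScore(-90.0, -105.0, 15)]
hyp_bad = [SyllableScore(-120.0, -110.0, 20), SyllableScore(-95.0, -98.0, 15)]

accepted = [verify(h) for h in (hyp_good, hyp_bad)]
```

Raising the threshold rejects more hypotheses, which lowers the false alarm rate at the cost of detection rate; the operating point would be tuned on held-out data.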

21 END