A DYNAMIC APPROACH TO THE SELECTION OF HIGH ORDER N-GRAMS IN PHONOTACTIC LANGUAGE RECOGNITION Mikel Penagarikano, Amparo Varona, Luis Javier Rodriguez-Fuentes, German Bordel ICASSP 2011 Min-Hsuan Lai Department of Computer Science & Information Engineering National Taiwan Normal University

2 Outline
–Introduction
–Baseline SVM-based phonotactic language recognizer
–Dynamic feature selection
–Experimental setup
–Results
–Conclusions

3 Introduction The performance of each phone recognizer can be increased significantly by computing the statistics from phone lattices instead of 1-best phone strings, since lattices provide richer and more robust information. Another way to increase system performance is the use of high-order n-gram counts, which are expected to contain more discriminant (more language-specific) information.

4 Introduction Due to computational bounds, most SVM-based phonotactic language recognition systems consider only low-order n-grams (up to n = 3), thus limiting the potential performance of this approach. The huge number of possible n-grams for n ≥ 4 makes even selecting the most frequent n-grams computationally unfeasible. In this paper, we propose a new n-gram selection algorithm that allows the use of high-order n-grams (for n = 4, 5, 6, 7) to improve the performance of a baseline system based on trigram SVMs.
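To make the growth concrete, the feature dimension is the total number of distinct n-grams up to order n for a phone inventory of size |V|. The figures below are purely illustrative, assuming a hypothetical inventory of 45 phones (roughly the size of the BUT decoders' phone sets); the slides do not report these counts.

```latex
D(n) = \sum_{k=1}^{n} |V|^{k}, \qquad
D(3) = 45 + 45^{2} + 45^{3} \approx 9.3 \times 10^{4}, \qquad
D(4) \approx 4.2 \times 10^{6}
```

So moving from trigrams to 4-grams multiplies the candidate feature space by roughly a factor of 45, which is why explicit enumeration quickly becomes impractical.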

5 Baseline SVM-based phonotactic language recognizer In this work, an SVM-based phonotactic language recognizer is used as the baseline system, and the NIST LRE2007 database is used for development and evaluation. An energy-based voice activity detector is applied first, which splits the signals and removes long-duration non-speech segments.

6 Baseline SVM-based phonotactic language recognizer Temporal Patterns Neural Network (TRAPs/NN) phone decoders, developed by the Brno University of Technology (BUT) for Czech (CZ), Hungarian (HU) and Russian (RU), are applied to perform phone tokenization. The BUT recognizers are used along with HTK to produce phone lattices. In the baseline system, phone lattices are modeled by means of SVMs, whose feature vectors consist of counts of phone n-grams (up to 3-grams).
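As an illustration of the feature extraction step, the following minimal sketch builds a vector of relative n-gram frequencies from a 1-best phone string. The paper actually accumulates expected counts over phone lattices, which this sketch does not attempt; all names and the toy data are illustrative only.

```python
from collections import Counter

def ngram_counts(phones, max_order=3):
    """Count all n-grams of order 1..max_order in a decoded phone sequence."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += 1
    return counts

def feature_vector(phones, vocabulary, max_order=3):
    """Map one utterance to a fixed-length vector of relative n-gram frequencies,
    one dimension per n-gram in a predefined vocabulary (the SVM feature space)."""
    counts = ngram_counts(phones, max_order)
    total = sum(counts.values()) or 1
    return [counts.get(ng, 0) / total for ng in vocabulary]

# Toy usage: a short decoded phone string and a tiny example vocabulary.
phones = ["a", "b", "a", "c", "a", "b"]
vocab = [("a",), ("b",), ("a", "b"), ("b", "a"), ("a", "b", "a")]
print(feature_vector(phones, vocab))
```

One such vector is produced per utterance and per phone decoder (CZ, HU, RU), and the SVM is trained on these vectors.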

7 Dynamic feature selection When high-order n-grams are considered, the number of n-grams grows exponentially, leading to huge computational costs and making the baseline SVM approach impracticable. To reduce the dimensionality of the SVM feature vector, feature selection can be applied, but an exhaustive search of the optimal feature set is computationally unfeasible.

8 Dynamic feature selection In this work, we propose a new feature selection method with the following characteristics:
–Selection is performed in the target feature space, using an estimate of the feature frequency as criterion.
–The algorithm works by periodically updating a ranked list of the most frequent units, so it does not need to index all the possible n-grams but just a relatively small subset of them.
–A single parameter is required: M, the total number of units (unigrams + bigrams + ... + n-grams).
–The process involves accumulating counts until their sum exceeds K, then updating the ranked list of units by retaining only those counts higher than a given threshold τ.

9 Dynamic feature selection The proposed feature selection method (continued):
–At each update, all the counts lower than τ are implicitly set to zero; this means that the selection process is suboptimal, since many counts are discarded.
–The algorithm outputs the M leading items of the ranked list; note that K and τ must be tuned so that a sufficient number of live counts (at least M) is kept at each update. A sketch of the procedure is given below.
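The slides describe the algorithm only at a high level, so the following is a minimal sketch of one plausible reading: counts are accumulated utterance by utterance and pruned whenever the accumulated mass since the last update exceeds K. The parameter names (M, K, tau) follow the slides; everything else, including the choice of what K is compared against, is an assumption of this sketch.

```python
from collections import Counter

def dynamic_ngram_selection(utterances, max_order, M, K, tau):
    """Approximate selection of the M most frequent n-grams (orders 1..max_order).

    Counts are accumulated over a stream of decoded phone sequences; whenever the
    mass accumulated since the last update exceeds K, the ranked list is pruned by
    dropping every n-gram whose count is not higher than tau. Pruned counts are
    lost, so the result only approximates the true top-M list.
    """
    counts = Counter()
    mass_since_update = 0

    for phones in utterances:
        for n in range(1, max_order + 1):
            for i in range(len(phones) - n + 1):
                counts[tuple(phones[i:i + n])] += 1
                mass_since_update += 1

        if mass_since_update > K:
            # Periodic update: keep only the "alive" counts (strictly above tau).
            counts = Counter({ng: c for ng, c in counts.items() if c > tau})
            mass_since_update = 0

    # Output the M leading items of the ranked list.
    return [ng for ng, _ in counts.most_common(M)]
```

K and tau trade memory against accuracy: a small tau or large K keeps more candidate n-grams alive between updates, at the cost of a larger intermediate table.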

10 Experimental setup Train, development and evaluation datasets. Training and development data were limited to those distributed by NIST to all 2007 LRE participants:
–the CallFriend Corpus
–the OHSU Corpus provided by NIST for LRE05
–the development corpus provided by NIST for the 2007 LRE
For development purposes, 10 conversations per language were randomly selected, the remaining conversations being used for training. Evaluation was carried out on the 2007 LRE evaluation corpus, specifically on the 30-second, closed-set condition (the primary evaluation task).

11 Experimental setup Evaluation measures. Systems are compared in terms of the Equal Error Rate (EER), which, along with DET curves, is the most common way of comparing the performance of language recognition systems, and CLLR, an alternative performance measure used in NIST evaluations. (ref.: False Match Rate (FMR), False Non-Match Rate (FNMR))
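For concreteness, here is a minimal sketch of how an EER can be estimated from pools of target and non-target detection scores. This is generic illustration code, not the scoring tool used in the evaluation; the toy scores are made up.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: sweep the decision threshold over all observed scores and
    return the error rate where false rejections and false acceptances balance."""
    tar = np.asarray(target_scores, dtype=float)
    non = np.asarray(nontarget_scores, dtype=float)
    best_gap, eer = np.inf, None
    for t in np.sort(np.concatenate([tar, non])):
        frr = np.mean(tar < t)    # targets scored below the threshold (misses)
        far = np.mean(non >= t)   # non-targets at or above the threshold (false alarms)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer

# Toy usage with made-up scores (returns 0.25 here).
print(equal_error_rate([2.1, 1.7, 0.3, 2.5], [-1.0, 0.1, 0.4, -0.6]))
```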

12 Experimental setup Evaluation measures. We internally consider CLLR as the most relevant performance indicator, for three reasons:
–CLLR allows us to evaluate system performance globally by means of a single numerical value;
–CLLR does not depend on application costs; instead, it depends on the calibration of scores, an important feature of detection systems;
–CLLR has higher statistical significance than EER.
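The slides do not recall the formula; for reference, the standard binary-detection definition of CLLR (as used in NIST evaluations, and applied per target language and averaged in the LRE setting) is the following, where s denotes a log-likelihood-ratio score:

```latex
C_{\mathrm{LLR}} = \frac{1}{2}\left[
  \frac{1}{N_{\mathrm{tar}}}\sum_{i \in \mathrm{tar}} \log_2\!\bigl(1 + e^{-s_i}\bigr)
  + \frac{1}{N_{\mathrm{non}}}\sum_{j \in \mathrm{non}} \log_2\!\bigl(1 + e^{s_j}\bigr)
\right]
```

A perfectly calibrated and perfectly discriminating system drives CLLR towards 0, while a system that provides no information scores CLLR = 1 bit, which is what makes it a useful single-number, application-independent measure.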

13 Results

14 Results Table 1 shows the EER and CLLR performance attained with SVM systems based on the selected features. Note that, due to local effects around the EER operating point, the EER shows some oscillations; the CLLR, on the other hand, allows us to evaluate the systems globally. In particular, for M = 30000, the average vector size was reduced to 18888, still yielding 1.36% EER and a lower CLLR (relative improvements of 8.5% and 4.6%, respectively, compared to the trigram SVM system).

15 Results

16 Results Finally, the proposed dynamic selection algorithm has also been applied for n = 5, 6, 7, using the two reference values of M. Note that the best performance was obtained for n = 5, for both values of M. Moreover, performance does not degrade when increasing the n-gram order, as was the case with other selection approaches in the literature.

17 Conclusions A dynamic feature selection method has been proposed which makes it possible to perform phonotactic SVM-based language recognition with high-order n-grams. The best performance was obtained when selecting the most frequent units up to 5-grams, which yielded an 11.2% relative EER improvement with regard to using up to 3-grams. We are currently working on the evaluation of smarter selection criteria under this approach.