1 Dynamic Match Lattice Spotting Spoken Term Detection Evaluation Queensland University of Technology Roy Wallace, Robbie Vogt, Kishan Thambiratnam, Prof.

Presentation transcript:

1 Dynamic Match Lattice Spotting
Spoken Term Detection Evaluation
Queensland University of Technology
Roy Wallace, Robbie Vogt, Kishan Thambiratnam, Prof Sridha Sridharan
Presented by Roy Wallace

2 Overview
Phonetic-based index → open-vocabulary
Based on lattice-spotting technique
Two-tier database
Dynamic-match rules
Algorithmic optimisations
NOTE: Patented technology

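3 Concept
[Figure: the search term "greasy" undergoes phone decomposition (griys) and is looked up among indexed phone sequences. Phone strings shown: aenxmdow, nxrnayth, iysaxrg, …]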

4 Concept
Dynamic matching
[Figure: the target sequence (griys) is compared against the observed sequences (graxsih, thaynrnx, owdmnxae, …), with costs attached to the mismatched phones (labels shown: ax, ih).]

5 Indexing
[Figure: indexing pipeline. Blocks shown: Audio, Feature Extraction, Segmentation, Speech Recognition, Lattices, Sequence Generation, Sequence DB, Hyper-Sequence Generation, Hyper-Sequence DB.]

6 Hyper-sequence Mapping
Map individual phones to “parent” classes
– We use Vowels, Fricatives, Glides, Stops and Nasals
Simple example
– Parent classes: Vowels, Consonants
– Map each phone to parent class to create hyper-sequence
[Figure: Sequence DB → Hyper-Sequence DB]

7 Hyper-sequence Mapping
Search term: griys
Hyper-sequence: CCVCV
Hyper-sequence DB: CCVCV, VCCVC, CVCVC, …
Sequence DB: groysih, tlowpiy, nxsehray, draxbae, bfaxdaa, oybraaf, ehgriym, …
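
The two-tier lookup on slides 6 and 7 can be sketched roughly as follows, using the simple Vowel/Consonant example rather than the five classes of the real system. This is only an illustration under assumptions: the phone segmentation of the slide strings, the vowel set, the five-phone decomposition of "greasy" and the exact-match lookup of the hyper-sequence are all guesses made for the example, not details confirmed by the slides.

```python
# Assumed vowel inventory covering only the phones used in this example.
VOWELS = {"aa", "ae", "ax", "ay", "eh", "ih", "iy", "ow", "oy"}


def to_hyper_sequence(phones):
    """Map each phone to its parent class: V (vowel) or C (consonant)."""
    return "".join("V" if p in VOWELS else "C" for p in phones)


def build_hyper_index(sequences):
    """Group phone sequences by hyper-sequence (a stand-in for the hyper-sequence DB)."""
    index = {}
    for seq in sequences:
        index.setdefault(to_hyper_sequence(seq), []).append(seq)
    return index


# Sequence DB entries from slide 7; the segmentation into phones is assumed.
sequence_db = [
    tuple("g r oy s ih".split()),
    tuple("t l ow p iy".split()),
    tuple("nx s eh r ay".split()),
    tuple("d r ax b ae".split()),
    tuple("b f ax d aa".split()),
    tuple("oy b r aa f".split()),
    tuple("eh g r iy m".split()),
]
hyper_db = build_hyper_index(sequence_db)

# Assumed phone decomposition of the search term "greasy".
term = tuple("g r iy s iy".split())
term_hyper = to_hyper_sequence(term)

# Only sequences sharing the term's hyper-sequence go on to dynamic matching.
candidates = hyper_db.get(term_hyper, [])
```

Filtering on the coarse class pattern first means the more expensive phone-level comparison only runs on a subset of the sequence DB.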

8 Searching
[Figure: search pipeline. Blocks shown: Term, Phone decomp., Split long terms, Hyper-mapping, Dynamic Matching, Hyper-Sequence DB, Sequence DB, Merge long terms, Keyword Verification, Results.]

9 Dynamic Matching
Minimum Edit Distance (MED), i.e. Levenshtein distance
Insertions, deletions, substitutions
Finds minimum cost of transformation
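
As a point of reference, the textbook unit-cost Levenshtein distance over phone sequences looks roughly like this. It is a generic sketch, not the patented implementation, and the phone decomposition of "greasy" used in the example is an assumption.

```python
def min_edit_distance(target, observed):
    """Unit-cost Levenshtein distance between two phone sequences."""
    # prev[j] = cost of turning the first i-1 target phones into observed[:j]
    prev = list(range(len(observed) + 1))
    for i, t in enumerate(target, 1):
        curr = [i]  # deleting the first i target phones
        for j, o in enumerate(observed, 1):
            curr.append(min(
                prev[j] + 1,             # deletion of t
                curr[j - 1] + 1,         # insertion of o
                prev[j - 1] + (t != o),  # substitution, free on a match
            ))
        prev = curr
    return prev[-1]


target = "g r iy s iy".split()    # assumed decomposition of "greasy"
observed = "g r ax s ih".split()  # observed sequence from the Concept slide
distance = min_edit_distance(target, observed)
```

Here the two sequences differ by two substitutions (iy → ax and iy → ih), so the distance is 2.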

10 Dynamic Matching
Substitution costs
– Derived from phone confusion statistics
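
The slide does not say how the costs are derived from the confusion statistics. One plausible scheme, shown here purely as an assumption, sets the cost of substituting a target phone with an observed phone to one minus the estimated probability of that confusion. The counts below are invented for illustration.

```python
from collections import Counter


def substitution_costs(confusions):
    """Turn (reference, recognised) confusion counts into substitution costs.

    Assumed rule: cost = 1 - P(recognised | reference). Frequent confusions
    become cheap; unseen ones fall back to a default cost at lookup time.
    """
    totals = Counter()
    for (ref, _hyp), count in confusions.items():
        totals[ref] += count
    return {
        (ref, hyp): 1.0 - count / totals[ref]
        for (ref, hyp), count in confusions.items()
        if ref != hyp
    }


def weighted_med(target, observed, sub_cost, ins_cost=1.0, del_cost=1.0):
    """Minimum edit distance with phone-dependent substitution costs."""
    prev = [j * ins_cost for j in range(len(observed) + 1)]
    for i, t in enumerate(target, 1):
        curr = [i * del_cost]
        for j, o in enumerate(observed, 1):
            sub = 0.0 if t == o else sub_cost.get((t, o), 1.0)
            curr.append(min(prev[j] + del_cost,
                            curr[j - 1] + ins_cost,
                            prev[j - 1] + sub))
        prev = curr
    return prev[-1]


# Invented confusion counts for the reference phone "iy".
confusions = Counter({("iy", "iy"): 80, ("iy", "ih"): 15, ("iy", "ax"): 5})
costs = substitution_costs(confusions)
score = weighted_med("g r iy s iy".split(), "g r ax s ih".split(), costs)
```

With these made-up counts, iy → ih costs 0.85 and iy → ax costs 0.95, so the match scores about 1.8 instead of the unit-cost 2.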

11 Optimisations
Prefix sequence optimisation
Early stopping optimisation
Linearised MED search approximation
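
The slides only name these optimisations. As one possible reading of early stopping, and not necessarily what the system does, the edit-distance computation can be abandoned as soon as every entry in the current row already exceeds the acceptance threshold, because costs never decrease along an alignment path. The `max_cost` parameter is a hypothetical name.

```python
def bounded_med(target, observed, max_cost):
    """Unit-cost edit distance that gives up once max_cost cannot be met.

    Returns the distance if it is <= max_cost, otherwise None.
    """
    prev = list(range(len(observed) + 1))
    for i, t in enumerate(target, 1):
        curr = [i]
        for j, o in enumerate(observed, 1):
            curr.append(min(prev[j] + 1,
                            curr[j - 1] + 1,
                            prev[j - 1] + (t != o)))
        # Every alignment passes through this row, so its minimum is a
        # lower bound on the final distance.
        if min(curr) > max_cost:
            return None
        prev = curr
    return prev[-1] if prev[-1] <= max_cost else None


target = "g r iy s iy".split()  # assumed decomposition of "greasy"
close = bounded_med(target, "g r ax s ih".split(), max_cost=2)
far = bounded_med(target, "th ay n r nx".split(), max_cost=2)
```

Poor candidates such as the second sequence are rejected after only a few rows, which matters when most of the database does not match.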

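12 Long Term Merging
[Figure: the long term "olympic sites" is split into shorter phone sequences (owlihmp, ksayts); each is searched separately and the results are merged. Stages shown: Search, Merge, Results.]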
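
A rough sketch of splitting and merging, under several assumptions: the phone strings follow the slide text as extracted, hits are hypothetical `(start_sec, end_sec, cost)` tuples, and two partial hits are merged when the second starts within `max_gap` seconds after the first ends. None of these details are given in the slides.

```python
def split_term(phones, max_len):
    """Split a long phone sequence into consecutive chunks of at most max_len."""
    return [phones[i:i + max_len] for i in range(0, len(phones), max_len)]


def merge_hits(first_hits, second_hits, max_gap=0.1):
    """Join hits for consecutive chunks that are adjacent in time.

    Each hit is (start_sec, end_sec, cost); merged costs are summed.
    """
    merged = []
    for s1, e1, c1 in first_hits:
        for s2, e2, c2 in second_hits:
            if 0.0 <= s2 - e1 <= max_gap:
                merged.append((s1, e2, c1 + c2))
    return merged


# Phone strings as they appear on the slide; segmentation is assumed.
phones = "ow l ih m p k s ay t s".split()
chunks = split_term(phones, max_len=5)

# Hypothetical search results for each chunk.
hits_a = [(12.30, 12.62, 0.0), (95.10, 95.40, 0.85)]
hits_b = [(12.62, 13.05, 0.85), (40.00, 40.40, 0.0)]
results = merge_hits(hits_a, hits_b)
```

Only the pair of hits that abut in time survives, giving one merged occurrence from 12.30 s to 13.05 s.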

13 Keyword Verification
Acoustic
– Use acoustic score from lattice to boost occurrences with high confidence
Neural Network
– Produce a confidence score by fusing:
  MED score and Acoustic score
  Term phone length
  Term phone classes

14 Results
Source type    DevSet phone error rate
Bnews          24%
CTS            45%
Confmtg        56%
Maximum Term-Weighted Value on EvalSet terms: columns for the Primary system and the Contrastive systems (No Acous., LTS Only); the values are not present in this transcript.
Index size: 558 MB/Sh (297 MB/Sh for No Acous.)
Index speed: 18x real-time
Search speed: 3 hr searched / CPU-sec
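
For readers unfamiliar with the metric, Term-Weighted Value as defined for the NIST 2006 Spoken Term Detection evaluation can be computed roughly as below; the maximum TWV is this value at the best decision threshold. The function name and the input layout are assumptions made for this sketch, and the counts in the examples are invented.

```python
def term_weighted_value(per_term, speech_seconds, beta=999.9):
    """TWV = 1 - average over terms of (P_miss + beta * P_fa).

    per_term: list of (n_true, n_correct, n_spurious) counts, one per term.
    speech_seconds: total duration of the searched speech in seconds.
    Terms with no true occurrences are skipped, as P_miss is undefined.
    """
    losses = []
    for n_true, n_correct, n_spurious in per_term:
        if n_true == 0:
            continue
        p_miss = 1.0 - n_correct / n_true
        p_fa = n_spurious / (speech_seconds - n_true)
        losses.append(p_miss + beta * p_fa)
    if not losses:
        raise ValueError("no terms with true occurrences")
    return 1.0 - sum(losses) / len(losses)


# Invented counts: a perfect system and a system that returns nothing.
perfect = term_weighted_value([(10, 10, 0), (5, 5, 0)], speech_seconds=3600)
silent = term_weighted_value([(10, 0, 0), (5, 0, 0)], speech_seconds=3600)
```

A perfect system scores 1.0 and a system that outputs nothing scores 0.0; the large `beta` makes false alarms very costly, so negative values are possible.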

15 Conclusion
Open-vocabulary and phone-based
Patented technology utilises
– sequence and hyper-sequence databases
– optimisations for rapid searches
Advantages
– Other languages
– Economy of scale

16 Conclusion
Limitations
– Indexing speed and size
– Need to split long sequences
Future work
– Keyword Verification
  Word-level information (e.g. LVCSR)
  Acoustic features (e.g. prosody)
– Indexing/searching frameworks
– Spoken Document Retrieval and other semantic applications

17 References
1. A. J. K. Thambiratnam, “Acoustic keyword spotting in speech with applications to data mining”, Ph.D. dissertation, Queensland University of Technology, Qld, March
2. K. Thambiratnam and S. Sridharan, “Rapid Yet Accurate Speech Indexing Using Dynamic Match Lattice Spotting”, IEEE Transactions on Audio, Speech and Language Processing: accepted for future publication
3. CMU Speech group (1998). The Carnegie Mellon Pronouncing Dictionary. [Online]. Available:
4. S. J. Young, P. C. Woodland, W. J. Byrne (2002). “HTK: Hidden Markov Model Toolkit V3.2”, Cambridge University Engineering Department, Speech Group and Entropic Research Laboratories Inc.
5. V. I. Levenshtein, “Binary codes capable of correcting deletions, insertions, and reversals”, Soviet Physics Doklady, 10(8), 1966, pp