Keyword Spotting Dynamic Time Warping


Keyword Spotting Dynamic Time Warping
Ali Akbar Jabini, Alexandre Mercier-Dalphond
Spring 2006

Introduction
- Speech recognition: the computer interprets spoken input
- Needs an input device (microphone) to digitize sound
- People can speak faster than they can type
- Commercial systems have been available since the 1990s
- People still prefer physical interactions (keyboard/mouse, on/off switch)
- Accuracy is low for large vocabularies in noise (around 50%)

Introduction
- Speech recognition is increasingly used for smaller vocabulary banks:
  - Credit card systems
  - Simple switching commands
  - Directory assistance
- Cheap to implement
- High accuracy
- The system can verify its interpretation with the user
- Idea: speech recognition for household appliances

Outline
- Area of investigation
- Concrete task/goal
- Schematic
- Feature extraction
- DTW
- Training
- Evaluation metrics
- Conclusion

Area of Investigation
- Keyword spotting: a subfield of speech recognition
- Grammar-constrained
- Keyword spotting in isolated word recognition:
  - Keyword utterances are separated by silence
- The main technique is DTW

Concrete Task/Goal
Goal: develop a robust, speaker-independent keyword spotting scheme to operate household appliances.
Concrete tasks:
- Digitize the sound inputs
- Implement the scheme in MATLAB
- Train the model with the grammar
- Analyze the performance of our scheme

Schematic
Microphone -> A/D -> Feature extraction -> DTW (constrained by the grammar) -> Output
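The block diagram can be read as a simple function pipeline. The sketch below is only an illustration of that flow: the `extract_features` and `dtw_distance` callables, and the shape of the `grammar` dictionary, are assumptions standing in for the components described on the following slides, not code from the project.

```python
def recognize(samples, grammar, extract_features, dtw_distance):
    """Sketch of the schematic as a pipeline.

    samples: A/D-converted audio samples.
    grammar: dict mapping each keyword to a list of feature templates.
    extract_features, dtw_distance: stand-ins for the feature-extraction
    and DTW stages.
    """
    feats = extract_features(samples)
    # Score the input against every template of every keyword in the grammar.
    scores = {word: min(dtw_distance(feats, template) for template in templates)
              for word, templates in grammar.items()}
    # Output the keyword whose best template has the lowest DTW cost.
    return min(scores, key=scores.get)
```

With stub stages (identity features and an L1 distance), `recognize([1.0, 2.0], {"on": [[1.0, 1.0]], "off": [[5.0, 5.0]]}, ...)` picks "on", since its template is closest.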

Feature extraction
- Pre-emphasis: flattens the spectrum of the signal
- Blocking into frames: sets the length of the Fourier transform
- Windowing: applies a sample window (maybe Hamming) to each frame
- Mel-frequency cepstral coefficients (MFCCs): more reliable than LPC coefficients
- The resulting features are the input to the DTW algorithm
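The first three steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the project's MATLAB code: the pre-emphasis coefficient (0.97) and the 25 ms frame / 10 ms hop at 16 kHz are conventional choices assumed here, and a full MFCC front end would continue with an FFT, a mel filterbank, a log, and a DCT.

```python
import numpy as np

def preemphasis(signal, alpha=0.97):
    """Flatten the spectrum by boosting high frequencies: y[n] = x[n] - a*x[n-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, frame_len=400, hop=160):
    """Block the signal into overlapping frames (25 ms / 10 ms hop at 16 kHz)."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop: i * hop + frame_len] for i in range(n_frames)])

def window_frames(frames):
    """Apply a Hamming window to each frame before taking the Fourier transform."""
    return frames * np.hamming(frames.shape[1])
```

One frame of `window_frames(frame_signal(preemphasis(x)))` is then ready for the spectral analysis that yields the cepstral coefficients.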

DTW
- Idea: find the smallest distance between an input and the training bank
- Operates on cepstrum features
- Dynamic programming: the time axis is warped non-linearly to account for variation between utterances, e.g. t0 -> t0+5, t1 -> t1-2
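The dynamic-programming recurrence behind DTW can be sketched as follows. This is the textbook formulation with the standard three-way step (match, stretch, shrink) and a Euclidean local distance between feature vectors; the slides do not specify these choices, so treat them as assumptions.

```python
import numpy as np

def dtw_distance(a, b):
    """Alignment cost between two feature sequences via dynamic programming.

    a, b: arrays of shape (T, D) -- one D-dimensional feature vector
    (e.g. cepstral coefficients) per frame.
    """
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])  # local frame distance
            # Warping lets a frame match one-to-one, or stretch/shrink in time.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]
```

Because the path can repeat frames, a sequence and a time-stretched copy of it (e.g. with a doubled first frame) align at zero cost, which is exactly the non-linearity of the time axis the slide describes.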

Training
- We need to create our own grammar, with as many potential utterance variants as possible:
  - On: "onnn", "honnn", "open", "opeeenn"
  - Off: "hooofff", "hoff", "offfff", "close"
- Use this data with DTW
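The grammar above can be stored as a template bank: each keyword mapped to the feature sequences of its recorded variants. A minimal sketch, assuming an `extract_features` front end like the one described earlier (the name and the recording layout are illustrative, not from the slides):

```python
def build_template_bank(recordings, extract_features):
    """Turn recorded utterance variants into a keyword -> templates map.

    recordings: dict such as {"on": [audio_onnn, audio_honnn, ...],
    "off": [...]}. Each stored template is the feature sequence that
    DTW will be matched against at recognition time.
    """
    return {word: [extract_features(utterance) for utterance in variants]
            for word, variants in recordings.items()}
```

At recognition time, an input is assigned to the keyword whose nearest template has the smallest DTW cost, so covering many utterance variants per keyword directly widens what the system accepts.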

Evaluation metrics
- Accuracy under:
  - High noise
  - Low noise
  - Independent speakers
  - Speakers from the training data
- Target: accuracy of 80% or more
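The evaluation plan reduces to computing accuracy per test condition and checking it against the 80% target. A small sketch of that bookkeeping (the condition names and data layout are placeholders):

```python
def accuracy(results):
    """Fraction of correct decisions; results is a list of (predicted, expected) pairs."""
    correct = sum(1 for predicted, expected in results if predicted == expected)
    return correct / len(results)

def meets_target(condition_results, target=0.80):
    """Per-condition accuracy and whether it reaches the target.

    condition_results: dict mapping a condition name (e.g. "high noise",
    "independent speaker") to its list of (predicted, expected) pairs.
    """
    return {condition: (accuracy(pairs), accuracy(pairs) >= target)
            for condition, pairs in condition_results.items()}
```

Reporting the pair (accuracy, pass/fail) per condition makes it easy to see, for example, that the scheme meets the target in low noise but not in high noise.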

Conclusion
- The project is at an early stage; no code has been implemented yet
- Many challenges lie ahead, and our methodology may change slightly
- There is a large potential market for such a technique, with an influence on everyday life