Dynamic Time Warping and Minimum Distance Paths for Speech Recognition

Presentation transcript:

1 Dynamic Time Warping and Minimum Distance Paths for Speech Recognition
Isolated word recognition
Task: build an isolated 'word' recogniser, e.g. voice dialling on mobile phones
Method:
1. Record, parameterise and store a vocabulary of reference words
2. Record the test word to be recognised and parameterise it
3. Measure the distance between the test word and each reference word
4. Choose the reference word 'closest' to the test word
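The four-step method can be sketched as a nearest-reference search. This is only an illustrative sketch: the function names, the one-value-per-frame features and the vocabulary values are all made up, and the distance function is a deliberately naive placeholder rather than the method the slides go on to develop.

```python
from typing import Callable, Dict, Sequence

Frames = Sequence[float]  # hypothetical: one parameter value per frame


def recognise(test: Frames,
              references: Dict[str, Frames],
              distance: Callable[[Frames, Frames], float]) -> str:
    """Return the label of the reference word closest to the test word."""
    return min(references, key=lambda word: distance(test, references[word]))


def placeholder_distance(a: Frames, b: Frames) -> float:
    # Stand-in only: sums differences between corresponding frames and
    # silently ignores extra frames in the longer word.
    return sum(abs(x - y) for x, y in zip(a, b))


# Made-up reference vocabulary (step 1) and test word (step 2).
vocabulary = {"yes": [1.0, 4.0, 2.0], "no": [6.0, 5.0, 7.0]}
test_word = [1.5, 3.5, 2.5]

# Steps 3 and 4: measure each distance and keep the closest word.
print(recognise(test_word, vocabulary, placeholder_distance))
```

Passing the distance in as a parameter keeps the recogniser independent of how the two words are aligned.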

2 Words are parameterised on a frame-by-frame basis
–Choose a frame length over which speech remains reasonably stationary
–Overlap frames, e.g. 40ms frames, 10ms frame shift
–We want to compare frames of test and reference words, i.e. calculate distances between them
[Figure: overlapping frames, labelled 40ms and 20ms]
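A minimal sketch of cutting a signal into overlapping frames, using the 40ms frame length and 10ms shift from the slide. The 8 kHz sample rate, the function name and the choice to drop trailing samples that do not fill a whole frame are assumptions made for illustration.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=40, shift_ms=10):
    """Split samples into overlapping frames of frame_ms, advancing by shift_ms."""
    frame_len = int(sample_rate_hz * frame_ms / 1000)
    shift = int(sample_rate_hz * shift_ms / 1000)
    frames = []
    start = 0
    # Assumption: a trailing partial frame is simply discarded.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += shift
    return frames


signal = list(range(800))  # 100 ms of dummy samples at an assumed 8 kHz
frames = split_into_frames(signal, sample_rate_hz=8000)
print(len(frames), len(frames[0]))  # number of frames, samples per frame
```

Each frame would then be parameterised before any distances are calculated.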

3 Calculating Distances
Easy: sum the differences between corresponding frames
Problem: the number of frames won't always correspond

4 Solution 1: Linear Time Warping
Stretch the shorter sound
Problem? Some sounds stretch more than others
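Linear time warping can be sketched as stretching the shorter sequence uniformly and then summing frame-by-frame differences. The sketch below stretches by repeating frames evenly, which is one simple choice among several (interpolation is another); the function names and frame values are made up for illustration.

```python
def linear_stretch(frames, target_len):
    """Stretch a sequence to target_len frames by repeating frames evenly."""
    n = len(frames)
    return [frames[i * n // target_len] for i in range(target_len)]


def linear_warp_distance(a, b):
    """Stretch the shorter word to the longer one's length, then sum differences."""
    target = max(len(a), len(b))
    a_stretched = linear_stretch(a, target)
    b_stretched = linear_stretch(b, target)
    return sum(abs(x - y) for x, y in zip(a_stretched, b_stretched))


print(linear_stretch([1, 4], 4))                   # every frame repeated equally
print(linear_warp_distance([1, 2, 3, 4], [1, 4]))
```

Every frame is stretched by the same factor, which is exactly the weakness the slide points out.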

5 Solution 2: Dynamic Time Warping (DTW)
Using a dynamic alignment, make the most similar frames correspond
Find the distance between two utterances using these corresponding frames
[Figure: alignment of Test and Reference frames]

6 Digression: Dynamic Programming
The shortest route from Dublin to Limerick goes through:
–Kildare
–Monasterevin
–Portlaoise
–Mountrath
–Roscrea
–Nenagh
Now consider the shortest route from Dublin to Nenagh
–What towns does the route go through?

7 Intercity Example

8

9 Place the distance between frame r of Test and frame c of Reference in cell(r,c) of the distance matrix:

Test frame   r   c=1  c=2  c=3
    3        5    1    4    1
    7        4    3    0    3
    9        3    5    2    5
    3        2    1    4    1
    5        1    1    2    1
                  (Reference)

Compute the minimum distance to each point and place it in the mindist matrix, e.g.
mindist(5,3) = min {1 + mindist(5,2), 1 + mindist(4,2), 1 + mindist(4,3)}

Test frame   r   c=1  c=2  c=3
    3        5   11    8    5
    7        4   10    4    7
    9        3    7    4    9
    3        2    2    5    4
    5        1    1    3    4
                  (Reference)

We can also find the path through the grid that minimizes the total cost of the path
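The grid computation on this slide can be sketched as follows. The test frame values (5, 3, 9, 7, 3 from row 1 upwards) are read from the slide; the reference frame values (4, 7, 4) are an assumption, inferred because they reproduce the slide's distances under an absolute-difference frame distance. Function and variable names are illustrative, and the code uses 0-based indices where the slide uses 1-based ones.

```python
INF = float("inf")


def dtw(test, reference):
    """Return (mindist matrix, minimum-cost path) for two 1-D frame sequences."""
    rows, cols = len(test), len(reference)
    # dist[r][c]: distance between frame r of test and frame c of reference.
    dist = [[abs(t - f) for f in reference] for t in test]
    mindist = [[INF] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if r == 0 and c == 0:
                best_prev = 0
            else:
                best_prev = min(
                    mindist[r][c - 1] if c > 0 else INF,                # from the west
                    mindist[r - 1][c - 1] if r > 0 and c > 0 else INF,  # diagonal
                    mindist[r - 1][c] if r > 0 else INF,                # from below
                )
            mindist[r][c] = dist[r][c] + best_prev
    # Backtrack from the top-right cell to recover the cheapest path.
    r, c = rows - 1, cols - 1
    path = [(r, c)]
    while (r, c) != (0, 0):
        candidates = []
        if r > 0 and c > 0:
            candidates.append((mindist[r - 1][c - 1], r - 1, c - 1))
        if r > 0:
            candidates.append((mindist[r - 1][c], r - 1, c))
        if c > 0:
            candidates.append((mindist[r][c - 1], r, c - 1))
        _, r, c = min(candidates)
        path.append((r, c))
    path.reverse()
    return mindist, path


test_frames = [5, 3, 9, 7, 3]   # from the slide, row 1 first
reference_frames = [4, 7, 4]    # assumed, inferred from the slide's distances
mindist, path = dtw(test_frames, reference_frames)
print(mindist[-1][-1])                        # total cost of the best path
print([(r + 1, c + 1) for r, c in path])      # path in the slide's 1-based (r,c)
```

Each cell only needs its three already-computed neighbours, which is the same reuse of sub-route results as in the dynamic programming digression.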

10 Examples so far are uni-dimensional
Speech is multi-dimensional, e.g. two dimensions, using points (4,3) and (5,2)
Distance equation for 2 dimensions: [equation not preserved]
Distance equation for multi-dimensional: [equation not preserved]
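The distance equations did not survive in this transcript. Assuming the Euclidean distance was intended (an assumption; other frame distances are also used in practice), the two-dimensional case would be d = sqrt((x1 - x2)^2 + (y1 - y2)^2), and the multi-dimensional case sums the squared differences over all dimensions. A small sketch with a hypothetical function name:

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two frames with the same number of dimensions."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# The slide's example points: sqrt((4-5)^2 + (3-2)^2) = sqrt(2)
print(frame_distance((4, 3), (5, 2)))
```

The same function covers the uni-dimensional examples, where it reduces to the absolute difference.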

11 Constraints
Global
–Endpoint detection
–Path should be close to diagonal
Local
–Must always travel upwards or eastwards
–No jumps
–Slope weighting
–Consecutive moves upwards/eastwards

12 Global Constraints

13 Local Constraints
[Figure: mindist(r,c) reached from mindist(r,c-1), mindist(r-1,c) and mindist(r-1,c-1), with weights on the moves]
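One way the global and local constraints might be combined is sketched below. The global constraint is a simple band of allowed cells around the diagonal, and the local constraint is a weight on each of the three permitted moves, applied to the local frame distance. The slides give no band width or weight values, so the parameters, their defaults and the function name are all hypothetical; the reference frame values are also assumed.

```python
INF = float("inf")


def constrained_dtw(test, reference, band=1, weights=(1.0, 1.0, 1.0)):
    """DTW cost with a diagonal band and (upward, diagonal, eastward) move weights."""
    rows, cols = len(test), len(reference)
    w_up, w_diag, w_east = weights
    mindist = [[INF] * cols for _ in range(rows)]
    for r in range(rows):
        # Column where the straight diagonal crosses this row.
        diag_c = r * (cols - 1) / (rows - 1) if rows > 1 else 0
        for c in range(cols):
            # Global constraint: ignore cells too far from the diagonal.
            if abs(c - diag_c) > band:
                continue
            d = abs(test[r] - reference[c])
            if r == 0 and c == 0:
                mindist[r][c] = d
                continue
            # Local constraint: only upward, diagonal or eastward single steps.
            options = []
            if r > 0:
                options.append(mindist[r - 1][c] + w_up * d)
            if r > 0 and c > 0:
                options.append(mindist[r - 1][c - 1] + w_diag * d)
            if c > 0:
                options.append(mindist[r][c - 1] + w_east * d)
            mindist[r][c] = min(options)
    return mindist[rows - 1][cols - 1]


test_frames = [5, 3, 9, 7, 3]
reference_frames = [4, 7, 4]  # assumed values
print(constrained_dtw(test_frames, reference_frames, band=1))
print(constrained_dtw(test_frames, reference_frames, band=1, weights=(1.0, 2.0, 1.0)))
```

A band that is too narrow for the length difference between the two words leaves no valid path, in which case this sketch returns infinity.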

14 Points to Note
–DTW is really only suitable for small vocabularies and/or speaker-dependent recognition
–Should normalise for reference length
–Can use multiple utterances and cluster them
–Poor performance if recording environment changes
–High computation cost

15 Evaluation
–Performance of designs is only comparable by evaluation
–Use a test set
–For single word recognition we can simply quote % accuracy:
 % accuracy = 100 x (number of test words correctly recognised) / (total number of test words)
–In error analysis, it can be helpful to use a confusion matrix

16 Confusion Matrix (references vs. test tokens)

        yes   no
yes      24    2
no        3   21
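As a sketch, % accuracy can be read straight off a confusion matrix: correct recognitions lie on the diagonal. The counts below read the slide's matrix as 24, 2, 3 and 21; the digits are run together in the transcript, so this reading is an assumption. The result does not depend on which axis holds the references.

```python
# Assumed reading of the slide: outer key and inner key are the two axes.
confusion = {
    "yes": {"yes": 24, "no": 2},
    "no": {"yes": 3, "no": 21},
}

# Diagonal cells are the correctly recognised tokens.
correct = sum(confusion[word][word] for word in confusion)
total = sum(sum(row.values()) for row in confusion.values())
accuracy = 100.0 * correct / total

print(f"{correct} of {total} correct, {accuracy:.1f}% accuracy")
```

The off-diagonal cells show which words are confused with which, which a single accuracy figure hides.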