Dynamic Time Warping and Minimum Distance Paths for Speech Recognition


1 Dynamic Time Warping and Minimum Distance Paths for Speech Recognition
Isolated word recognition
Task: build an isolated ‘word’ recogniser, e.g. voice dialling on mobile phones.
Method:
1. Record, parameterise and store a vocabulary of reference words.
2. Record the test word to be recognised and parameterise it.
3. Measure the distance between the test word and each reference word.
4. Choose the reference word ‘closest’ to the test word.
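The four-step method can be sketched as a nearest-reference search. This is a minimal sketch, not the presentation's own code: the names `abs_sum_distance` and `recognise` and the toy vocabulary (one number per frame) are hypothetical, and the placeholder distance only works when both words have the same number of frames.

```python
def abs_sum_distance(test, reference):
    """Placeholder distance: sum of absolute differences between
    corresponding frames. Assumes equal frame counts."""
    if len(test) != len(reference):
        raise ValueError("frame counts differ; a time-warping distance is needed")
    return sum(abs(t - r) for t, r in zip(test, reference))


def recognise(test, references, distance=abs_sum_distance):
    """Return the label of the stored reference word closest to the test word."""
    return min(references, key=lambda label: distance(test, references[label]))


# Hypothetical stored vocabulary: label -> parameterised frames.
references = {"yes": [4, 7, 4], "no": [1, 2, 9]}
print(recognise([5, 6, 4], references))
```

Passing the distance as a parameter means a time-warping distance can replace the placeholder without changing the search.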

2 Words are parameterised on a frame-by-frame basis
Choose a frame length over which speech remains reasonably stationary.
Overlap frames, e.g. 40ms frames with a 10ms frame shift.
We want to compare frames of the test and reference words, i.e. calculate distances between them.
[Figure: overlapping frames, labelled 40ms and 20ms]
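Overlapping framing can be sketched as a sliding window over the samples. The defaults follow the 40ms frame and 10ms shift mentioned in the slide; the 8000 Hz sampling rate, the function name `split_into_frames` and the choice to drop a trailing partial frame are assumptions made for illustration.

```python
def split_into_frames(samples, sample_rate, frame_ms=40, shift_ms=10):
    """Cut a sample sequence into overlapping fixed-length frames.
    A trailing partial frame is dropped (an assumption of this sketch)."""
    frame_len = sample_rate * frame_ms // 1000  # samples per frame
    shift = sample_rate * shift_ms // 1000      # samples between frame starts
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += shift
    return frames


# Hypothetical signal: 1000 samples at an assumed 8000 Hz (125 ms of audio).
samples = list(range(1000))
frames = split_into_frames(samples, sample_rate=8000)
print(len(frames), len(frames[0]))
```

Each frame would then be parameterised (turned into a feature vector) before any distances are computed.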

3 Calculating Distances
Easy: sum the differences between corresponding frames.
Problem: the number of frames won’t always correspond.

4 Solution 1: Linear Time Warping
Stretch the shorter sound.
Problem? Some sounds stretch more than others.
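Linear time warping can be sketched as stretching the shorter sequence uniformly to the length of the longer one and then summing frame-by-frame differences. The sketch assumes one number per frame and stretches by repeating frames rather than interpolating; `linear_stretch` and `linear_warp_distance` are hypothetical names.

```python
def linear_stretch(frames, target_len):
    """Stretch a sequence to target_len by repeating frames at evenly
    spaced positions (every part of the sound is stretched equally)."""
    n = len(frames)
    return [frames[i * n // target_len] for i in range(target_len)]


def linear_warp_distance(test, reference):
    """Stretch the shorter sequence, then sum absolute frame differences."""
    target = max(len(test), len(reference))
    a = linear_stretch(test, target)
    b = linear_stretch(reference, target)
    return sum(abs(x - y) for x, y in zip(a, b))


print(linear_stretch([4, 7, 4], 5))
print(linear_warp_distance([5, 3, 9, 7, 3], [4, 7, 4]))
```

Because every frame is repeated at the same rate, this cannot model the fact that some sounds stretch more than others, which is the problem the slide raises.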

5 Solution 2: Dynamic Time Warping (DTW)
Test: 5 3 9 7 3
Reference: 4 7 4
Using a dynamic alignment, make the most similar frames correspond.
Find the distance between two utterances using these corresponding frames.

6 Digression: Dynamic Programming
The shortest route from Dublin to Limerick goes through:
– Kildare
– Monasterevin
– Portlaoise
– Mountrath
– Roscrea
– Nenagh
Now consider the shortest route from Dublin to Nenagh.
– What towns does the route go through?

7 Intercity Example

9 Place the distance between frame r of Test and frame c of Reference in cell(r,c) of the distance matrix:
Test  r |  c=1  c=2  c=3
   3  5 |    1    4    1
   7  4 |    3    0    3
   9  3 |    5    2    5
   3  2 |    1    4    1
   5  1 |    1    2    1
Reference    4    7    4
Compute the minimum distance to each point and place it in the mindist matrix, e.g.
mindist(5,3) = min {1 + mindist(5,2), 1 + mindist(4,2), 1 + mindist(4,3)}
Test  r |  c=1  c=2  c=3
   3  5 |   11    8    5
   7  4 |   10    4    7
   9  3 |    7    4    9
   3  2 |    2    5    4
   5  1 |    1    3    4
Reference    4    7    4
We can also find the path through the grid that minimises the total cost of the path.
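The grid computation can be sketched in a few lines, using the slide's test values (5 3 9 7 3) and reference values (4 7 4). This is an illustrative sketch under stated assumptions: absolute difference as the frame distance, no slope weights, and zero-based indices, so the slide's mindist(5,3) is `mindist[4][2]` here. The name `dtw` is hypothetical.

```python
def dtw(test, reference):
    """Return (mindist matrix, minimum-cost path) for two 1-D sequences.
    Rows index test frames, columns index reference frames (zero-based)."""
    rows, cols = len(test), len(reference)
    dist = [[abs(t - r) for r in reference] for t in test]
    mindist = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            candidates = []
            if c > 0:
                candidates.append(mindist[r][c - 1])      # move east
            if r > 0:
                candidates.append(mindist[r - 1][c])      # move up
            if r > 0 and c > 0:
                candidates.append(mindist[r - 1][c - 1])  # move diagonally
            mindist[r][c] = dist[r][c] + (min(candidates) if candidates else 0)

    # Trace back from the final cell, always stepping to the cheapest predecessor.
    r, c = rows - 1, cols - 1
    path = [(r, c)]
    while (r, c) != (0, 0):
        steps = []
        if r > 0 and c > 0:
            steps.append((mindist[r - 1][c - 1], r - 1, c - 1))
        if r > 0:
            steps.append((mindist[r - 1][c], r - 1, c))
        if c > 0:
            steps.append((mindist[r][c - 1], r, c - 1))
        _, r, c = min(steps)
        path.append((r, c))
    path.reverse()
    return mindist, path


mindist, path = dtw([5, 3, 9, 7, 3], [4, 7, 4])
print(mindist[-1][-1])  # total cost of the best alignment
print(path)
```

The value in the last cell is the distance between the two words; the traced path shows which test frame was matched to which reference frame.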

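10 Examples so far are uni-dimensional; speech is multi-dimensional.
E.g. two dimensions, using points (4,3) and (5,2).
[Figure: the two points plotted on a grid with both axes running from 1 to 5]
Distance equation for 2 dimensions: [equation not preserved in the transcript]
Distance equation for multi-dimensional: [equation not preserved in the transcript]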
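The slide's distance equations are not preserved in this transcript. A common choice for comparing multi-dimensional frames is the Euclidean distance, the square root of the summed squared differences over all dimensions; the sketch below assumes that choice, and `frame_distance` is a hypothetical name.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two frames of equal dimension
    (an assumed metric; the transcript does not show the slide's equation)."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# The two points from the slide: (4,3) and (5,2).
print(frame_distance((4, 3), (5, 2)))  # about 1.414
```

The same function works unchanged for any number of dimensions, since it sums over however many components each frame has.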

11 Constraints
Global
– Endpoint detection
– Path should be close to the diagonal
Local
– Must always travel upwards or eastwards
– No jumps
– Slope weighting
– Consecutive moves upwards/eastwards

12 Global Constraints
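One way to keep the path close to the diagonal is to allow only cells within a fixed-width band around it (often called a Sakoe-Chiba band). The sketch below is an assumption about how such a global constraint might be applied, not the slide's own figure; `banded_dtw_distance` and the `band` parameter (measured in reference frames) are hypothetical.

```python
import math


def banded_dtw_distance(test, reference, band=1):
    """DTW distance where cell (r, c) is allowed only if column c lies
    within `band` columns of the straight diagonal through the grid."""
    rows, cols = len(test), len(reference)
    mindist = [[math.inf] * cols for _ in range(rows)]
    for r in range(rows):
        # Column where the straight diagonal crosses row r.
        diag_c = r * (cols - 1) / (rows - 1) if rows > 1 else 0.0
        for c in range(cols):
            if abs(c - diag_c) > band:
                continue  # outside the band: cell stays unreachable
            d = abs(test[r] - reference[c])
            if r == 0 and c == 0:
                mindist[r][c] = d
                continue
            best = min(
                mindist[r][c - 1] if c > 0 else math.inf,
                mindist[r - 1][c] if r > 0 else math.inf,
                mindist[r - 1][c - 1] if r > 0 and c > 0 else math.inf,
            )
            mindist[r][c] = d + best
    return mindist[-1][-1]


# Hypothetical sequences where the unconstrained path strays far from the diagonal.
print(banded_dtw_distance([1, 1, 1, 5], [1, 5, 5, 5], band=3))  # band covers the whole grid
print(banded_dtw_distance([1, 1, 1, 5], [1, 5, 5, 5], band=0))  # diagonal only
```

A narrow band also reduces computation, since cells outside it are never evaluated; if the band is too narrow for sequences of very different lengths, no path exists and the result is infinite.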

13 Local Constraints
[Figure: cell mindist(r,c) reached from mindist(r,c-1), mindist(r-1,c) and mindist(r-1,c-1); weights 1, 1, 2]
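Slope weighting can be sketched by multiplying the local distance by a weight that depends on the move taken. The mapping assumed here (weight 1 for the moves from (r,c-1) and (r-1,c), weight 2 for the diagonal move from (r-1,c-1)) is one reading of the slide's "1 1 2" labels, and counting the start cell with weight 1 is a further assumption; `weighted_dtw_distance` is a hypothetical name.

```python
def weighted_dtw_distance(test, reference, w_east=1, w_up=1, w_diag=2):
    """DTW distance where the local distance is scaled by a per-move weight."""
    rows, cols = len(test), len(reference)
    inf = float("inf")
    mindist = [[inf] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            d = abs(test[r] - reference[c])
            if r == 0 and c == 0:
                mindist[r][c] = d  # start cell, weight 1 assumed
                continue
            candidates = []
            if c > 0:
                candidates.append(mindist[r][c - 1] + w_east * d)
            if r > 0:
                candidates.append(mindist[r - 1][c] + w_up * d)
            if r > 0 and c > 0:
                candidates.append(mindist[r - 1][c - 1] + w_diag * d)
            mindist[r][c] = min(candidates)
    return mindist[-1][-1]


test, reference = [5, 3, 9, 7, 3], [4, 7, 4]
print(weighted_dtw_distance(test, reference))           # weights 1, 1, 2
print(weighted_dtw_distance(test, reference, 1, 1, 1))  # unweighted
```

Weighting the diagonal move by 2 makes one diagonal step cost the same as an eastward step plus an upward step through equally distant cells, so the path is not rewarded simply for using fewer steps.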

14 Points to Note
DTW is really only suitable for small vocabularies and/or speaker-dependent recognition.
Should normalise for reference length.
Can use multiple utterances and cluster them.
Poor performance if the recording environment changes.
High computation cost.
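Normalising for reference length matters because a longer reference accumulates more local distances along its path. A minimal sketch, assuming normalisation means dividing each raw DTW distance by the number of frames in that reference; the function name and all numbers below are hypothetical.

```python
def pick_normalised(raw_distances, reference_lengths):
    """Divide each raw distance by its reference's frame count and
    return (best label, normalised scores)."""
    normalised = {
        label: raw_distances[label] / reference_lengths[label]
        for label in raw_distances
    }
    return min(normalised, key=normalised.get), normalised


# Hypothetical raw DTW distances and reference lengths (in frames).
raw = {"yes": 6.0, "yesterday": 8.0}
lengths = {"yes": 3, "yesterday": 8}
best, scores = pick_normalised(raw, lengths)
print(best, scores)
```

With these made-up numbers the raw distances favour the shorter word, while the per-frame scores favour the longer one, which is the bias normalisation is meant to remove.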

15 Evaluation
The performance of designs is only comparable by evaluation.
Use a test set.
For single word recognition we can simply quote % accuracy: [equation not preserved in the transcript]
In error analysis, it can be helpful to use a confusion matrix.

16 Confusion Matrix
Axes: references, test tokens
        yes   no
yes      24    2
no        3   21
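Percentage accuracy can be read off a confusion matrix as the diagonal (correct) count divided by the total number of test tokens. The sketch below uses the counts from the slide; the nested-dictionary layout and the name `accuracy` are assumptions, and the result does not depend on which axis holds the references.

```python
def accuracy(confusion):
    """Percentage of tokens on the diagonal of a confusion matrix
    given as {row_label: {column_label: count}}."""
    correct = sum(confusion[label][label] for label in confusion)
    total = sum(sum(row.values()) for row in confusion.values())
    return 100.0 * correct / total


# Counts from the slide's yes/no confusion matrix.
confusion = {
    "yes": {"yes": 24, "no": 2},
    "no": {"yes": 3, "no": 21},
}
print(accuracy(confusion))  # 90.0
```

The off-diagonal cells (2 and 3 here) show which words are mistaken for which, which is what makes the matrix useful for error analysis beyond the single accuracy figure.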

