
1 GW2003, Genoa April, 2003
GesRec3D: A real-time coded gesture-to-speech system with automatic segmentation and recognition thresholding using dissimilarity measures
Michael P. Craven, School of Engineering, University of Technology, Jamaica
K. Mervyn Curtis, Department of Mathematics and Computer Science, University of the West Indies, Jamaica
Work carried out at the University of Nottingham, School of Electrical and Electronic Engineering, in collaboration with Access to Communication and Technology, Regional Rehabilitation Centre, Oak Tree Lane Centre, Selly Oak, Birmingham. Funded by an Action Research grant.

2 Motivations and Issues
- Apply gesture recognition to severely disabled users (cerebral palsy, stroke)
  - Augmentative and Alternative Communication (AAC), e.g. gesture-to-speech
  - environmental control, e.g. opening doors, operating appliances
  - replace mouse buttons in PC applications
- Segment and recognise 'crude' gestures
  - be less reliant on fine motor control
  - maintain spatial and temporal differences
  - filter out 'spurious' movements
- Control over recognition confidence
  - robust acceptance/rejection strategy
  - reduce confusion between gestures, but avoid excessive rejection
  - may be safety critical
- Human factors
  - user fatigue: incremental training, short overall training time
  - understandability: for both disabled users and their helpers

3 GesRec3D gesture-to-speech system

4 GesRec3D: Summary
A Gesture -> Text -> Speech system
- MS Windows application running on a PC with a Soundblaster card
- Polhemus Fastrak tracker (1 to 4 sensors, 20 samples/sec)
- Up to 30 user-defined gestures linked to a user-defined (or preset) table of words/phrases, spoken by the TextAssist speech engine
- Minimising fatigue:
  - on-line segmentation for fast training and recognition
  - only 5 examples of each gesture
  - incremental acquisition (or removal) of gesture examples
- Other features:
  - speech and/or text to prompt user input
  - sensitive to differences in scale and duration, but invariant to gesture start location
  - user control over the segmentation and rejection/confusion trade-off

5 Real-time on-line segmentation
Parameters:
1. Starting speed
2. Continuation speed
3. Minimum duration
4. Time-out interval
5. Pause interval

State machine:
- RESET: remain here while the starting condition is FALSE; when the starting condition is TRUE, start the timer and move to GESTURE.
- GESTURE: remain here while the continuation condition is TRUE; return to RESET if the time-out condition becomes TRUE, or if the continuation condition is FALSE while the minimum-duration condition is still FALSE.
- END: entered when the continuation condition is FALSE and the minimum-duration condition is TRUE; the segmented gesture is recognised (or added to the training set), then after a pause the machine returns to RESET.
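The segmentation state machine above can be sketched in code. This is a minimal illustration, not the GesRec3D implementation: the parameter names (start_speed, cont_speed, min_duration, timeout) and the speed units are assumptions, and the END state's pause/action step is collapsed into returning the segmented samples.

```python
from enum import Enum, auto

class State(Enum):
    RESET = auto()
    GESTURE = auto()

class Segmenter:
    """Sketch of the two main segmenter states; END is modelled as
    returning the completed gesture. Parameter names are assumed."""
    def __init__(self, start_speed=5.0, cont_speed=2.0,
                 min_duration=0.3, timeout=3.0):
        self.start_speed = start_speed    # speed needed to begin a gesture
        self.cont_speed = cont_speed      # speed needed to keep it alive
        self.min_duration = min_duration  # shortest accepted gesture (s)
        self.timeout = timeout            # abandon over-long gestures (s)
        self.state = State.RESET
        self.t0 = None
        self.samples = []

    def feed(self, t, pos, speed):
        """Process one tracker sample; return a segmented gesture
        (list of positions) when one ends, else None."""
        if self.state is State.RESET:
            if speed >= self.start_speed:        # starting condition TRUE
                self.state = State.GESTURE
                self.t0 = t                      # [start timer]
                self.samples = [pos]
        elif self.state is State.GESTURE:
            elapsed = t - self.t0
            if elapsed > self.timeout:           # time-out condition TRUE
                self.state = State.RESET
            elif speed >= self.cont_speed:       # continuation condition TRUE
                self.samples.append(pos)
            elif elapsed >= self.min_duration:   # END: valid gesture
                self.state = State.RESET
                return self.samples
            else:                                # too short: spurious movement
                self.state = State.RESET
        return None
```

Note how a movement that stops before the minimum duration is silently discarded, which is how spurious short gestures get filtered out.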

6 On-line segmentation video

7 Training - dissimilarity measure
Compare two segmented gestures Ga(x,y,z) and Gb(x,y,z), of lengths ma and mb.

Dissimilarity measure d_ab: accumulated 'city block' distance.

1) Same length (m = ma = mb):
   d_ab = sum over i=1..m of |xa(i) - xb(i)| + |ya(i) - yb(i)| + |za(i) - zb(i)|

Different lengths (ma > mb), three options:
- dynamic time-warping (non-linear optimal match) - slowest
- linearly interpolate the shorter gesture to length ma and use 1) - faster
- pad the shorter gesture with zeros to length ma and use 1) - fastest
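The city-block measure and the two fast length-matching options can be sketched as follows. This is an illustrative reconstruction, assuming gestures are lists of (x, y, z) tuples; the function names are not from the original system, and the slower dynamic time-warping option is omitted.

```python
def city_block(ga, gb):
    """Accumulated city-block distance between two equal-length gestures,
    each a list of (x, y, z) samples."""
    assert len(ga) == len(gb)
    return sum(abs(xa - xb) + abs(ya - yb) + abs(za - zb)
               for (xa, ya, za), (xb, yb, zb) in zip(ga, gb))

def zero_pad(g, m):
    """Pad gesture g with (0,0,0) samples up to length m (fastest option)."""
    return g + [(0.0, 0.0, 0.0)] * (m - len(g))

def interpolate(g, m):
    """Linearly resample gesture g to length m (faster option)."""
    if len(g) == 1:
        return g * m
    out = []
    for i in range(m):
        t = i * (len(g) - 1) / (m - 1)       # fractional index into g
        j = min(int(t), len(g) - 2)
        f = t - j
        (x0, y0, z0), (x1, y1, z1) = g[j], g[j + 1]
        out.append((x0 + f * (x1 - x0),
                    y0 + f * (y1 - y0),
                    z0 + f * (z1 - z0)))
    return out

def dissimilarity(ga, gb, method="interp"):
    """d_ab between gestures of possibly different lengths: bring the
    shorter one up to the longer length, then accumulate city-block."""
    if len(ga) < len(gb):
        ga, gb = gb, ga                      # ga is now the longer gesture
    gb = zero_pad(gb, len(ga)) if method == "pad" else interpolate(gb, len(ga))
    return city_block(ga, gb)
```

Zero-padding is cheapest but penalises length differences heavily, since padded samples sit at the origin; interpolation compares the two shapes on a common time base, which matches its intermediate speed/accuracy position on the slide.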

8 Training - rejection threshold
- Train C gesture classes, n examples of each
- Calculate the nC x nC dissimilarity matrix, e.g. 60x60 elements for n=5, C=12 (note: cost scales with both n^2 and C^2)
- For each class:
  - find the worst match internal to the class, d_int (largest value)
  - find the best match external to the class, d_ext (smallest value)
  - calculate the rejection threshold d_th from d_int and d_ext
- Default global rejection parameter K=1 (midpoint threshold)
- Decrease K for stricter rejection
- Bounds may also be set on d_th
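A sketch of the threshold calculation. The exact d_th formula is not given in the transcript, so this code assumes d_th = d_int + K * (d_ext - d_int) / 2, which reproduces the stated behaviour: K = 1 gives the midpoint of [d_int, d_ext], and decreasing K gives a stricter (lower) threshold. The function signature is illustrative, not the original API.

```python
def rejection_thresholds(gestures, labels, dist, K=1.0):
    """Per-class rejection thresholds from the nC x nC dissimilarity
    matrix. ASSUMED formula: d_th = d_int + K * (d_ext - d_int) / 2."""
    n = len(gestures)
    # full dissimilarity matrix over all training examples
    D = [[dist(gestures[i], gestures[j]) for j in range(n)] for i in range(n)]
    thresholds = {}
    for c in set(labels):
        idx = [i for i in range(n) if labels[i] == c]
        ext = [i for i in range(n) if labels[i] != c]
        # worst match internal to the class (largest value)
        d_int = max(D[i][j] for i in idx for j in idx if i != j)
        # best match external to the class (smallest value)
        d_ext = min(D[i][j] for i in idx for j in ext)
        thresholds[c] = d_int + K * (d_ext - d_int) / 2.0
    return thresholds
```

With n = 5 examples of C = 12 classes this builds the 60x60 matrix mentioned on the slide; since the matrix grows with both n^2 and C^2, keeping n small is what keeps training fast.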

9 Recognition - algorithms
Best match between an unknown gesture and any in the training set is the minimum distance d_min.

Single sensor:
1. Acquire the gesture and compare with the training set for the best match d_min
2. Find the gesture class corresponding to d_min
3. If d_min < d_th, select that class, otherwise reject the gesture
4. Perform the action linked to the selected gesture class

Multiple sensors:
1. Find the class with d_min for each sensor
2. (optional: reject the gesture if the classes are different)
3. Find d_th for each class for all sensors
4. Add the d_min values
5. Add the d_th values
6. If the summed d_min < the summed d_th, select the class corresponding to the 'primary' sensor, otherwise reject the gesture
7. Perform the action linked to the selected gesture class
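The two procedures above can be sketched as nearest-neighbour classification with a per-class rejection threshold. This is a schematic rendering, assuming a `thresholds` mapping from class to d_th (one per sensor in the multi-sensor case) and a `dist` function; the names are not from the original system.

```python
def recognise(g, train, labels, thresholds, dist):
    """Single sensor: best match d_min over the training set,
    rejected if d_min is not below the matched class's threshold."""
    d_min, cls = min((dist(g, t), l) for t, l in zip(train, labels))
    if d_min < thresholds[cls]:
        return cls        # accepted: perform the linked action
    return None           # rejected gesture

def recognise_multi(gestures, trains, labels, thresholds, dist, primary=0):
    """Multiple sensors: sum the per-sensor d_min values and the
    per-sensor d_th values of the matched classes, then compare.
    thresholds is a list of per-sensor {class: d_th} dicts."""
    best = [min((dist(g, t), l) for t, l in zip(train, labels))
            for g, train in zip(gestures, trains)]
    classes = [l for _, l in best]
    if len(set(classes)) > 1:
        return None                        # optional early rejection
    d_min_sum = sum(d for d, _ in best)
    d_th_sum = sum(th[c] for th, c in zip(thresholds, classes))
    if d_min_sum < d_th_sum:
        return classes[primary]            # class from the 'primary' sensor
    return None
```

Because rejection is per-class rather than global, a gesture that falls near a tightly clustered class is judged more strictly than one near a loosely clustered class.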

10 Experiment 1- Shape gestures
[Figure: shape gestures captured with multiple sensors, with fast and slow examples]

11 Results - Shape gestures
- Hit rates between 82% and 96%
- 100 further arbitrary gestures were all rejected
- Spurious short gestures were rejected by the segmentation algorithm
- Fewer misses from confusion than from rejection
- Fast training: 5 minutes to input 60 gestures (5 examples x 12 classes)
- Fast 60x60 dissimilarity matrix calculation (on a Pentium 133 MHz):
  - zero padding - fastest
  - linear interpolation - 0.6 sec
  - dynamic time-warping - slowest

12 Dissimilarity data - one row

13 Experiment 2 - Greeting gestures
Figures in brackets demonstrate the use of a stricter threshold to obtain lower confusion: the global threshold parameter K was reduced by 10%.

14 Dissimilarity data - multiple sensors

15

16 Research Directions
- Design alternative algorithms for multiple sensors, e.g. incorporate an arm model
- Use dissimilarity data to suggest 'better' gestures
- Further filter out 'spurious' movements, e.g. tremor
- Design a mobile tracking device with wireless sensors
- Improve the user interface:
  - more intuitive control over recognition parameters, esp. for helpers
  - assess user motivation, esp. for children
  - investigate the memorability of gestures

