Speech Recognition and Assessment Tomer Meshorer.

1 Speech Recognition and Assessment Tomer Meshorer

2 Agenda This presentation describes the use of speech recognition for:  HCI for spastic dysarthria patients [M. Hasegawa-Johnson]: "HMM-Based and SVM-Based Recognition of the Speech of Talkers with Spastic Dysarthria"  Identifying the progression of Parkinson's disease from the speech signal [A. Tsanas]: "Enhanced Classical Dysphonia Measures and Sparse Regression for Telemonitoring of Parkinson's Disease Progression"  Auditory microswitch [G. E. Lancioni]: "Extending the evaluation of a computer system used as a microswitch for word utterances of persons with multiple disabilities"

3

4 MOTIVATION  Dysarthria.  Most common type: spastic dysarthria. Adults with cerebral palsy who find it hard to type.  Idea: replace the keyboard with ASR.  The paper studies three talkers with dysarthria and one control subject. All three have spastic dysarthria due to cerebral palsy.  The subjects tend to delete word-initial consonants. One subject exhibits a slow stutter.  Two algorithms:  Digit recognition using HMM  Digit recognition using SVM.

5 Experiment  Array of 8 microphones, of which 7 were used.  Four types of speech data:  Isolated digits  The letters of the international radio alphabet  Nineteen computer commands  A balanced read text passage (129 words) and 56 sentences (TIMIT)  Total training data: 541 words, 395 of them distinct  Intelligibility tests were performed using 40 different words selected from the TIMIT sentences.  Listeners were the author and two students

6 Results
Listener  F01    M01    M02    M03
L1        22.5%  --     90%    30%
L2        17.5%  20%    90%    27.5%
L3        17.5%  15%    97.5%  30%
Avg       19.2%  --     92.5%  29.2%

7 Listener errors  Look at consonant position  Three consonant positions: word-initial, word-medial, and word-final  Three types of consonant errors:  Deletion (“sport” heard as “port”)  Insertion (“on” heard as “coin”)  Substitution (“for” heard as “bore”)  Other errors:  Vowel substitution (“and” heard as “end”)  The number of syllables could change  The entire word could be deleted

8 Listener errors analysis

9 ASR  Four experiments: two with speaker-dependent HMMs and two with speaker-dependent SVMs  HMM:  First test:  Test data: 19 command words + 26 letters + 10 digits.  Training data: TIMIT sentences + Grandfather passage + an utterance of each digit  Second test:  Test data: digits only  Training data: same as in the first test.

10 HMM ASR Results  H: word recognition accuracy (WRA) if all microphones are recognized independently  HV: WRA if the microphones vote to determine the final system output  Word: accuracy of one SVM trained to distinguish isolated digits  WF: adds the outputs of 170 binary word-feature SVMs  WFV: like WF, but the single-microphone recognizers vote to determine the system output
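The difference between the H and HV scores can be illustrated with a small sketch. It assumes a simple majority vote over the single-microphone outputs; the slides do not say how votes or ties were actually resolved, and the recognizer outputs below are invented for illustration.

```python
from collections import Counter


def vote(per_mic_hypotheses):
    """Return the word output by the most single-microphone recognizers.

    Ties go to the word that appears first in the list (an arbitrary
    choice made for this sketch, not taken from the paper).
    """
    if not per_mic_hypotheses:
        raise ValueError("need at least one hypothesis")
    counts = Counter(per_mic_hypotheses)
    best = max(counts.values())
    for word in per_mic_hypotheses:
        if counts[word] == best:
            return word


def word_recognition_accuracy(predictions, references):
    """Fraction of predictions that equal their reference word."""
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical outputs of 7 single-microphone recognizers for 3 test words.
mic_outputs = [
    ["one", "one", "nine", "one", "one", "nine", "one"],
    ["two", "three", "two", "two", "three", "two", "two"],
    ["five", "nine", "nine", "five", "nine", "nine", "nine"],
]
references = ["one", "two", "five"]

# H-style score: every microphone output is scored on its own.
flat_preds = [w for row in mic_outputs for w in row]
flat_refs = [ref for ref, row in zip(references, mic_outputs) for _ in row]
h_wra = word_recognition_accuracy(flat_preds, flat_refs)

# HV-style score: the microphones vote, then the voted word is scored.
voted = [vote(row) for row in mic_outputs]
hv_wra = word_recognition_accuracy(voted, references)

print(f"H = {h_wra:.3f}, HV = {hv_wra:.3f}")
```

In this made-up data, voting repairs the words where most microphones agree on the right answer, but it cannot rescue a word that most microphones get wrong.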

11 SVM-based ASR  Fixed-length isolated word recognition  Tested on digits only  Two kinds of SVM were used: a 10-ary SVM and binary word-feature SVMs.

12 Conclusion  ASR can be used to recognize digits from talkers with very low intelligibility.  HMM was successful for two subjects but failed for the subject who deletes consonants.  SVM was successful for two subjects but failed for the subject with a stutter.  Hence, HMM should be used when word length fluctuates; SVM should be used when consonants are deleted.  But: a 10-word vocabulary is too small for HCI.

13

14 MOTIVATION  Parkinson’s disease (PD) is the second most common neurodegenerative disorder after Alzheimer’s.  Strong evidence has emerged linking speech degradation with PD progression.  PD progression is currently monitored with empirical tests and physical examinations, which are time-consuming and costly.  Results are mapped to the Unified Parkinson’s Disease Rating Scale (UPDRS).  Motor UPDRS: 0-108  Total UPDRS: 0-176, with 176 denoting total disability  Goal: use speech signal processing to map voice disorders to UPDRS scores

15 Data  Sustained-vowel speech recordings from 52 subjects with an idiopathic PD diagnosis  Subjects were physically assessed and given UPDRS scores at baseline, three months, and six months into the trial  Subjects took tests at home weekly using the Intel At-Home Testing Device (AHTD).  The subjects were required to sustain “ahh” for as long and as steadily as possible.  Total of 5875 signals. Signals were processed in MATLAB.  42 subjects. Mean age: 64. Motor UPDRS: 20.84, total UPDRS: 11.52

16 Intel @ home

17 Features  Total of 5875 signals. Signals were processed in MATLAB.  42 subjects. Mean age: 64. Motor UPDRS: 20.84, total UPDRS: 11.52  Dysphonia measures were calculated using Praat  Frequency perturbations (jitter)  Amplitude perturbations (shimmer)  The log of each measure was also added as a feature
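As a rough illustration of the last bullet, the feature vector can be doubled by appending the logarithm of every measure. The function name and the numeric values here are hypothetical, and the natural log is an assumption; the slides do not give the base or the exact feature layout.

```python
import math


def add_log_features(measures):
    """Return the raw measures followed by the natural log of each one.

    Assumes every measure is strictly positive, since log is undefined
    otherwise.
    """
    if any(m <= 0 for m in measures):
        raise ValueError("all measures must be positive to take logs")
    return list(measures) + [math.log(m) for m in measures]


# Hypothetical jitter and shimmer values for one phonation.
raw = [0.006, 0.03]
features = add_log_features(raw)
print(features)
```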

18 Linear regression  UPDRS values were obtained at 0, 3, and 6 months, but recordings were weekly, so linear interpolation was used to obtain weekly UPDRS values  Map the feature vector x to the UPDRS output y  In the end, Lasso regression was used.
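The interpolation step can be sketched as follows. This is a minimal piecewise-linear version that assumes the three assessments fall at weeks 0, 13, and 26; both the week numbers and the scores are invented for illustration and are not taken from the study.

```python
def interpolate_updrs(week, known_weeks, known_scores):
    """Piecewise-linear interpolation of a UPDRS score for a given week.

    known_weeks must be sorted in increasing order and pair up with
    known_scores.
    """
    if not known_weeks[0] <= week <= known_weeks[-1]:
        raise ValueError("week lies outside the assessed range")
    points = list(zip(known_weeks, known_scores))
    for (w0, s0), (w1, s1) in zip(points, points[1:]):
        if w0 <= week <= w1:
            # Move along the straight line between the two assessments.
            return s0 + (s1 - s0) * (week - w0) / (w1 - w0)


# Hypothetical clinic assessments at baseline, 3 months, and 6 months.
assessment_weeks = [0, 13, 26]
assessment_scores = [20.0, 26.5, 33.0]

weekly = [
    interpolate_updrs(w, assessment_weeks, assessment_scores)
    for w in range(27)
]
print(weekly[4], weekly[13], weekly[20])
```

Each weekly recording can then be paired with the interpolated score for its week, giving one (x, y) training pair per phonation.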

19 Results  Mapping performance was analyzed by training on 5287 phonations and testing on 588.  Used MAE (mean absolute error): the average of |Û_i − U_i| over the test phonations.  U_i is the real UPDRS value  Û_i is the predicted UPDRS value
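A minimal sketch of the MAE computation, using made-up UPDRS values rather than data from the study:

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between real and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - u) for u, p in zip(actual, predicted)) / len(actual)


# Hypothetical real (U_i) and predicted (U-hat_i) UPDRS scores.
real_updrs = [20.0, 25.0, 30.0, 35.0]
predicted_updrs = [22.0, 24.0, 33.0, 35.0]

mae = mean_absolute_error(real_updrs, predicted_updrs)
print(mae)
```

Because MAE is expressed in UPDRS points, it can be read directly against the 0-108 and 0-176 ranges of the two scales.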

20 Conclusion  Overall success in prediction (error of 6.6 for motor UPDRS and 8.4 for total UPDRS)  The paper also found that better methods exist to measure dysphonia, which would remove the need for the log transformation.  The LASSO regression here clearly shows that log-transformed classical dysphonia measures convey superior clinical information compared to the raw measures

21 G. E. Lancioni et al.

22 Motivation  Students with multiple disabilities are unable to engage in constructive activity or play a positive role in their daily context  The goal is to explore the use of verbal utterances to exert control over environmental events  Microswitches are technical tools that may help them improve their status  Main idea: build an utterance-based microswitch and test it with students.

23 Participants  Tania, Alex, and Dennis.  18, 27, and 26 years old  Severe intellectual disability  Alex and Dennis are totally blind, while Tania can discriminate light.  All of them have normal hearing and can produce a number of words / short sentences

24 Device: Auditory Microswitch  Regular PC with an audio output device  Commercially available ASR (Dragon NaturallySpeaking).  Proprietary control program that linked each target utterance emitted by a participant with the words and phrases that the commercial software matched to it over different occurrences  The emitted words and phrases were grouped into specific categories based on phonetic, structure, and length rules.  The categories served as recognition targets and as triggers for the activation of stimuli.

25 Selection of stimulus  Each stimulus is connected to a participant's target utterance.  Tania: funny story, special song  Alex: singer's hit song, person whistling  Dennis: pet voice, local music  When the computer system recognized an utterance, it produced the stimulus matching that utterance for 10-20 seconds
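The link between recognized text, utterance categories, and stimuli might be sketched as below. Everything here is hypothetical: the category contents, file names, and function names are invented for illustration, the lookup is a plain exact match rather than the study's category rules, and the real control program was proprietary.

```python
# Hypothetical categories: each target utterance is linked to the texts
# that the speech recognizer has returned for it on earlier occasions.
TARGET_CATEGORIES = {
    "story": {"story", "stormy", "store he"},
    "song": {"song", "sung", "so long"},
}

# Hypothetical stimulus files for Tania's two kinds of stimuli.
STIMULI = {
    "story": "funny_story.wav",
    "song": "special_song.wav",
}


def match_category(recognized_text):
    """Return the target category for a recognizer output, or None."""
    text = recognized_text.strip().lower()
    for category, variants in TARGET_CATEGORIES.items():
        if text in variants:
            return category
    return None


def respond(recognized_text, duration_s=15):
    """Return (stimulus, seconds) for a recognized utterance, or None.

    The default of 15 seconds is an arbitrary value inside the 10-20
    second range mentioned on the slide.
    """
    category = match_category(recognized_text)
    if category is None:
        return None
    return (STIMULI[category], duration_s)


print(respond("Sung"))
print(respond("hello"))
```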

26 Experiment  Baseline  Participants spoke samples of their target utterances. No stimuli were played.  70 times over several days  Recordings were made of the words / phrases, and reference categories were built  Intervention  Three groups of utterances: Tania (3, 2, 2 words), Alex and Dennis (4, 4, 4).  First group, baseline; second group, baseline; third group, baseline.  10-20 min sessions. Recognition was recorded.  Post-intervention  2 months after the intervention.  19-22 sessions like those occurring during the intervention

27 Result

28 Summary  About 80% of the utterances were correctly recognized by the computer system  Some of the utterances had a level of occurrence significantly higher (P < 0.01) than that expected by chance.  The computer system was an adequate microswitch for the participants’ word utterances  The use of the system can be considered a valuable strategy to increase the participants’ constructive verbal engagement and to allow their self-determination in seeking positive environmental stimulation

