
1 Using Motherese in Speech Recognition  EE516 final project  Steven Schimmel  March 13, 2003

2 What is Motherese?  The way mothers talk to their children when they are young  Example:

3 Why use Motherese in Speech Recognition?  The exaggerated speech of motherese helps infants distinguish better between phonetic categories  Examples:

4 Why use Motherese in Speech Recognition?  Mothers provide a great variety of word pronunciations, simulating many different talkers  If an infant can benefit from this, can an ASR system benefit too?

5 Presentation Outline  Data preparation  Building the recognizers  Training and testing  Performance  Distance measure  Formant analysis  Conclusions

6 Data preparation  Conversations between mothers and adults, and between mothers and infants  Keyword transcription  Keyword extraction  Screening  Dividing into training and test sets

7 Keywords  Bead, Key, Sheep, Pot, Sock, Top, Boot, Shoe, Spoon

8 Dividing data into equal sets  (Tables: the tokens of each keyword, e.g. 'bead', 'boot', 'sock', from each mother (A, B, ...) are divided evenly over four sets, 1 through 4.)

9 Building the recognizers  HTK toolkit  Isolated word recognizers  Speech coding: MFCC, filterbank with 26 channels, 13 coefficients every 10ms  HMM prototype: left-to-right, 8 states, single Gaussians
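The prototype above was defined in HTK. As a rough, purely illustrative sketch of the same topology in Python with hmmlearn (not what the project used), an 8-state left-to-right whole-word model with a single diagonal-covariance Gaussian per state over 13-dimensional MFCC vectors could look like this:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

N_STATES = 8   # emitting states, left-to-right
N_MFCC = 13    # 13 cepstral coefficients every 10 ms

def make_prototype():
    # Single diagonal Gaussian per state; means and variances are left
    # uninitialised here so they can be flat-started (next slide).
    model = GaussianHMM(n_components=N_STATES, covariance_type="diag",
                        init_params="", params="stmc", n_iter=10)
    # Left-to-right topology: start in state 0, allow only self-loops and
    # moves to the next state. Zero entries stay zero under re-estimation,
    # so the topology is preserved during training.
    startprob = np.zeros(N_STATES)
    startprob[0] = 1.0
    transmat = np.zeros((N_STATES, N_STATES))
    for i in range(N_STATES - 1):
        transmat[i, i] = 0.5
        transmat[i, i + 1] = 0.5
    transmat[-1, -1] = 1.0   # final state is absorbing
    model.startprob_ = startprob
    model.transmat_ = transmat
    return model
```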

10 Training and testing  Flat-start initialization: use global mean and variance of training data for all models, assign equal probability to all states  Repeated embedded training: use all training data to simultaneously update all models
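A minimal sketch of the flat start, continuing the hypothetical hmmlearn setup above (HTK does this with HCompV, and the repeated embedded re-estimation with HERest; neither tool is reproduced here):

```python
import numpy as np

def flat_start(models, train_features):
    """Give every state of every word model the global statistics.

    models:         dict word -> GaussianHMM prototype (as built above)
    train_features: list of (T_i, 13) MFCC arrays from the training set
    """
    frames = np.concatenate(train_features, axis=0)
    global_mean = frames.mean(axis=0)
    global_var = frames.var(axis=0)
    for model in models.values():
        n = model.n_components
        # Same mean and variance for every state of every model
        model.means_ = np.tile(global_mean, (n, 1))
        model.covars_ = np.tile(global_var, (n, 1))
    return models
```

Each word model would then be re-estimated on its own training tokens, e.g. model.fit(np.concatenate(tokens), lengths=[len(t) for t in tokens]); true embedded training, where all models are updated simultaneously from whole utterances, is an HTK feature with no one-line equivalent in this sketch.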

11 Performance  Infant-directed (ID) recognizer worse than adult-directed (AD) recognizer on adult-directed speech  But ID recognizer better on AD material than AD recognizer on ID material

Recognizer    AD train    AD test    ID train    ID test
AD trained      98.75       94.58      76.25       85.42
ID trained      88.75       90.00      96.67       83.33

12 Distance measure  Distance between HMMs λ0 and λ1  O_T is an observed sequence of length T of feature vectors generated from λ0  Ergodicity
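The distance formula itself did not survive in the transcript. A standard definition consistent with these bullets is Juang and Rabiner's probabilistic distance between HMMs, given here as an assumption about what the slide showed:

```latex
D(\lambda_0, \lambda_1) \;=\; \frac{1}{T}\Bigl[\log P(O_T \mid \lambda_0) \;-\; \log P(O_T \mid \lambda_1)\Bigr]
```

with O_T drawn from λ0. The ergodicity assumption is what lets a single long observation sequence stand in for the expectation, and averaging D(λ0, λ1) with D(λ1, λ0) gives a symmetric version.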

13 α-recursion to compute P(O_T | λ)  The joint probability, a product of many small emission terms b_j(o_t), causes underflow  Log probability:
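A minimal sketch of that α-recursion carried out in the log domain, assuming log_pi, log_A, and log_B hold the log initial, transition, and emission probabilities (these names, and the use of SciPy's logsumexp, are illustrative, not the project's code):

```python
import numpy as np
from scipy.special import logsumexp

def log_forward(log_pi, log_A, log_B):
    """Log-domain forward algorithm.

    log_pi: (N,) log initial-state probabilities
    log_A:  (N, N) log transition probabilities, log a_ij
    log_B:  (T, N) log emission probabilities, log b_j(o_t)
    Returns log P(O_T | lambda).
    """
    T, N = log_B.shape
    log_alpha = np.empty((T, N))
    log_alpha[0] = log_pi + log_B[0]
    for t in range(1, T):
        # log alpha_t(j) = log b_j(o_t) + log sum_i exp(log alpha_{t-1}(i) + log a_ij)
        log_alpha[t] = log_B[t] + logsumexp(log_alpha[t - 1][:, None] + log_A, axis=0)
    return logsumexp(log_alpha[-1])
```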

14 Log of summation  Recursively apply the log-add identity below  Make sure that the assumption b > a holds at every step of the recursion
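The identity being applied recursively is presumably the standard log-add trick:

```latex
\log(a + b) \;=\; \log b + \log\!\bigl(1 + e^{\log a - \log b}\bigr), \qquad b > a
```

Keeping b > a makes the exponent log a - log b negative, so the exponential stays below 1 and cannot overflow; the logsumexp call in the sketch above packages the same idea for a whole sum at once.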

15 Distances

16 Formant analysis  Extract vowels from the keywords using a phone-based speech recognizer  Estimate the frequencies of formants 1 and 2 (F1 and F2)
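The slides do not say how the formant frequencies were estimated. One common approach is LPC root-finding; the sketch below is illustrative only, and its function names, LPC order, and thresholds are assumptions rather than the project's actual procedure.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(signal, order):
    """LPC coefficients a_1..a_order via the autocorrelation method."""
    r = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    return solve_toeplitz((r[:order], r[:order]), r[1:order + 1])

def first_two_formants(vowel, fs, order=12):
    """Crude F1/F2 estimates (Hz) from a vowel segment sampled at fs."""
    windowed = vowel * np.hamming(len(vowel))
    a = lpc(windowed, order)
    # Roots of the prediction-error polynomial A(z) = 1 - a_1 z^-1 - ... - a_p z^-p
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]             # keep one root per conjugate pair
    freqs = np.sort(np.angle(roots) * fs / (2.0 * np.pi))
    freqs = freqs[freqs > 90]                     # discard near-DC poles
    return freqs[0], freqs[1]                     # lowest two resonances ~ F1, F2
```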

17 Formant 1 vs. formant 2

18 Conclusions  In general it is not worthwhile to train an ASR system on motherese speech  More false positives, due to the reduced selectivity of the HMMs  Motherese is not required for coverage of all speech sounds; multiple speakers can be used instead  Acquiring natural motherese speech is more difficult, since an infant has to be present

