
1 Using Motherese in Speech Recognition  EE516 final project  Steven Schimmel  March 13, 2003

2 What is Motherese?  The way mothers talk to their children when they are young  Example:

3 Why use Motherese in Speech Recognition?  The exaggerated speech of motherese helps infants distinguish better between phonetic categories  Examples:

4 Why use Motherese in Speech Recognition?  Mothers provide a great variety of word pronunciations, simulating many different talkers  If an infant can benefit from this, can an ASR system benefit too?

5 Presentation Outline  Data preparation  Building the recognizers  Training and testing  Performance  Distance measure  Formant analysis  Conclusions

6 Data preparation  Conversations between mothers and adults, and between mothers and infants  Keyword transcription  Keyword extraction  Screening  Dividing into training and test sets

7 Keywords  Bead, Key, Sheep, Pot, Sock, Top, Boot, Shoe, Spoon

8 Dividing data into equal sets  (Tables: the tokens of each keyword, e.g. 'bead', 'boot', 'sock', from each mother (A, B, ...) are divided evenly over four sets, 1 through 4.)

9 Building the recognizers  HTK toolkit  Isolated word recognizers  Speech coding: MFCC, filterbank with 26 channels, 13 coefficients every 10ms  HMM prototype: left-to-right, 8 states, single Gaussians
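The prototype above was defined in HTK. As a rough, purely illustrative sketch of the same topology in Python with hmmlearn (not what the project used), an 8-state left-to-right whole-word model with a single diagonal-covariance Gaussian per state over 13-dimensional MFCC vectors could look like this:

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

N_STATES = 8   # emitting states, left-to-right
N_MFCC = 13    # 13 cepstral coefficients every 10 ms

def make_prototype():
    # Single diagonal Gaussian per state; means and variances are left
    # uninitialised here so they can be flat-started (next slide).
    model = GaussianHMM(n_components=N_STATES, covariance_type="diag",
                        init_params="", params="stmc", n_iter=10)
    # Left-to-right topology: start in state 0, allow only self-loops and
    # moves to the next state. Zero entries stay zero under re-estimation,
    # so the topology is preserved during training.
    startprob = np.zeros(N_STATES)
    startprob[0] = 1.0
    transmat = np.zeros((N_STATES, N_STATES))
    for i in range(N_STATES - 1):
        transmat[i, i] = 0.5
        transmat[i, i + 1] = 0.5
    transmat[-1, -1] = 1.0   # final state is absorbing
    model.startprob_ = startprob
    model.transmat_ = transmat
    return model
```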

10 Training and testing  Flat-start initialization: use global mean and variance of training data for all models, assign equal probability to all states  Repeated embedded training: use all training data to simultaneously update all models
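A minimal sketch of the flat start, continuing the hypothetical hmmlearn setup above (HTK does this with HCompV, and the repeated embedded re-estimation with HERest; neither tool is reproduced here):

```python
import numpy as np

def flat_start(models, train_features):
    """Give every state of every word model the global statistics.

    models:         dict word -> GaussianHMM prototype (as built above)
    train_features: list of (T_i, 13) MFCC arrays from the training set
    """
    frames = np.concatenate(train_features, axis=0)
    global_mean = frames.mean(axis=0)
    global_var = frames.var(axis=0)
    for model in models.values():
        n = model.n_components
        # Same mean and variance for every state of every model
        model.means_ = np.tile(global_mean, (n, 1))
        model.covars_ = np.tile(global_var, (n, 1))
    return models
```

Each word model would then be re-estimated on its own training tokens, e.g. model.fit(np.concatenate(tokens), lengths=[len(t) for t in tokens]); true embedded training, where all models are updated simultaneously from whole utterances, is an HTK feature with no one-line equivalent in this sketch.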

11 Performance  Infant-directed (ID) recognizer worse than adult-directed (AD) recognizer on adult-directed speech  But ID recognizer better on AD material than AD recognizer on ID material

Recognizer    AD train    AD test    ID train    ID test
AD trained      98.75       94.58      76.25       85.42
ID trained      88.75       90.00      96.67       83.33

12 Distance measure  Distance between HMMs λ0 and λ1  O_T is an observed sequence of length T of feature vectors generated from λ0  Ergodicity
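The distance formula itself did not survive in the transcript. A standard definition consistent with these bullets is Juang and Rabiner's probabilistic distance between HMMs, given here as an assumption about what the slide showed:

```latex
D(\lambda_0, \lambda_1) \;=\; \frac{1}{T}\Bigl[\log P(O_T \mid \lambda_0) \;-\; \log P(O_T \mid \lambda_1)\Bigr]
```

with O_T drawn from λ0. The ergodicity assumption is what lets a single long observation sequence stand in for the expectation, and averaging D(λ0, λ1) with D(λ1, λ0) gives a symmetric version.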

13 α-recursion to compute P(O_T | λ)  The joint probability, a product of many small emission terms b_j(o_t), causes underflow  Log probability:
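A minimal sketch of that α-recursion carried out in the log domain, assuming log_pi, log_A, and log_B hold the log initial, transition, and emission probabilities (these names, and the use of SciPy's logsumexp, are illustrative, not the project's code):

```python
import numpy as np
from scipy.special import logsumexp

def log_forward(log_pi, log_A, log_B):
    """Log-domain forward algorithm.

    log_pi: (N,) log initial-state probabilities
    log_A:  (N, N) log transition probabilities, log a_ij
    log_B:  (T, N) log emission probabilities, log b_j(o_t)
    Returns log P(O_T | lambda).
    """
    T, N = log_B.shape
    log_alpha = np.empty((T, N))
    log_alpha[0] = log_pi + log_B[0]
    for t in range(1, T):
        # log alpha_t(j) = log b_j(o_t) + log sum_i exp(log alpha_{t-1}(i) + log a_ij)
        log_alpha[t] = log_B[t] + logsumexp(log_alpha[t - 1][:, None] + log_A, axis=0)
    return logsumexp(log_alpha[-1])
```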

14 Log of summation  Recursively apply the log-add identity below  Make sure that the assumption b > a holds at every step of the recursion
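The identity being applied recursively is presumably the standard log-add trick:

```latex
\log(a + b) \;=\; \log b + \log\!\bigl(1 + e^{\log a - \log b}\bigr), \qquad b > a
```

Keeping b > a makes the exponent log a - log b negative, so the exponential stays below 1 and cannot overflow; the logsumexp call in the sketch above packages the same idea for a whole sum at once.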

15 Distances

16 Formant analysis  Extract vowels from the keywords using a phone-based speech recognizer  Estimate the frequencies of formants 1 and 2 (F1 and F2)
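The slides do not say how the formant frequencies were estimated. One common approach is LPC root-finding; the sketch below is illustrative only, and its function names, LPC order, and thresholds are assumptions rather than the project's actual procedure.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc(signal, order):
    """LPC coefficients a_1..a_order via the autocorrelation method."""
    r = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    return solve_toeplitz((r[:order], r[:order]), r[1:order + 1])

def first_two_formants(vowel, fs, order=12):
    """Crude F1/F2 estimates (Hz) from a vowel segment sampled at fs."""
    windowed = vowel * np.hamming(len(vowel))
    a = lpc(windowed, order)
    # Roots of the prediction-error polynomial A(z) = 1 - a_1 z^-1 - ... - a_p z^-p
    roots = np.roots(np.concatenate(([1.0], -a)))
    roots = roots[np.imag(roots) > 0]             # keep one root per conjugate pair
    freqs = np.sort(np.angle(roots) * fs / (2.0 * np.pi))
    freqs = freqs[freqs > 90]                     # discard near-DC poles
    return freqs[0], freqs[1]                     # lowest two resonances ~ F1, F2
```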

17 Formant 1 vs. formant 2

18 Conclusions  In general it is not worthwhile to train an ASR system on motherese speech  More false positives, due to the reduced selectivity of the HMMs  Motherese is not required for coverage of all speech sounds; multiple speakers can be used instead  Acquiring natural motherese speech is more difficult, since an infant has to be present

