Automatic Speech Recognition A summary of contributions from multiple disciplines Mark D. Skowronski Computational Neuro-Engineering Lab Electrical and.


1 Automatic Speech Recognition A summary of contributions from multiple disciplines Mark D. Skowronski Computational Neuro-Engineering Lab Electrical and Computer Engineering University of Florida October 6, 2004

2 What is ASR? Automatic Speech Recognition is: –A system that converts a raw acoustic signal into phonetically meaningful text. –A combination of engineering, linguistics, statistics, psychoacoustics, and computer science.

3 What is ASR? Feature extraction → Classification → Language model → “seven” Psychoacousticians provide expert knowledge about human acoustic perception. Engineers provide efficient algorithms and hardware. Linguists provide language rules. Computer scientists and statisticians provide optimum modeling.

4 Feature extraction Acoustic-phonetic paradigm (pre-1980): –Holistic features (voicing and frication measures, durations, formants and bandwidths) –Difficult to construct robust classifiers Frame-based paradigm (1980 to today): –Short (20 ms) sliding analysis window; assumes each speech frame is quasi-stationary –Relies on the classifier to account for speech nonstationarity –Allows for the inclusion of expert knowledge of speech perception
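The frame-based paradigm above can be sketched as a sliding window over the sample stream. This is a minimal illustration, not the presenter's code: the function name `frame_signal` and the 10 ms hop are assumptions (the slide only states the 20 ms window length).

```python
def frame_signal(samples, sample_rate, frame_ms=20.0, hop_ms=10.0):
    """Split a sequence of samples into overlapping fixed-length frames.

    frame_ms follows the 20 ms window on the slide; hop_ms (the step
    between successive frames) is an assumed value for illustration.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    frames = []
    start = 0
    # Keep only complete frames; a trailing partial frame is dropped.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


# Usage: 0.1 s of dummy samples at an 8 kHz sampling rate.
signal = list(range(800))
frames = frame_signal(signal, sample_rate=8000)
```

Each frame is then treated as quasi-stationary and passed to the feature extractor on its own; the classifier, not the feature extractor, handles how frames change over time.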

5 Feature extraction algorithms Cepstrum (1962) Linear prediction (1967) Mel frequency cepstral coefficients (Davis & Mermelstein, 1980) Perceptual linear prediction (Hermansky, 1990) Human factor cepstral coefficients (Skowronski & Harris, 2002)

6 MFCC algorithm [Diagram: x(t) → Fourier transform → mel-scaled filter bank → log energy → DCT → cepstral domain, shown for the utterance “seven”; axes: time, filter #]
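The stages named on the MFCC slide (Fourier transform, mel-scaled filter bank, log energy, DCT) can be sketched for a single frame as follows. This is a simplified sketch under stated assumptions: the mel formula with constants 2595 and 700 is one common convention, the filter and coefficient counts are illustrative defaults, and windowing and pre-emphasis are omitted. It is not the exact Davis & Mermelstein configuration.

```python
import cmath
import math


def hz_to_mel(f):
    # One common mel-scale formula (an assumed convention).
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive O(N^2) DFT power spectrum for bins 0..N/2."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spec.append(abs(acc) ** 2)
    return spec


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (num_filters + 1))
             for i in range(num_filters + 2)]
    bin_hz = sample_rate / n_fft
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        weights = []
        for k in range(n_fft // 2 + 1):
            f = k * bin_hz
            if lo < f <= mid:
                weights.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                weights.append((hi - f) / (hi - mid))   # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def mfcc(frame, sample_rate, num_filters=20, num_coeffs=13):
    """Fourier -> mel filter bank -> log energy -> DCT-II for one frame."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    # Small floor avoids log(0) for empty filter outputs.
    log_e = [math.log(sum(w * p for w, p in zip(filt, spec)) + 1e-10)
             for filt in bank]
    return [sum(e * math.cos(math.pi * c * (m + 0.5) / num_filters)
                for m, e in enumerate(log_e))
            for c in range(num_coeffs)]


# Usage: one 20 ms frame (160 samples at 8 kHz) of a 1 kHz tone.
frame = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(160)]
coeffs = mfcc(frame, 8000)
```

A production front end would use an FFT and a tapered analysis window; the naive DFT is used here only to keep the sketch dependency-free.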

7 Classification Operates on frame-based features Accounts for time variations of speech Uses training data to transform features into symbols (phonemes, bi-/tri-phones, words) Non-parametric: Dynamic time warp (DTW) –No parameters to estimate –Computationally expensive, scaling issues Parametric: Hidden Markov model (HMM) –State-of-the-art model, complements features –Data-intensive, scales well
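The non-parametric option named above, dynamic time warping, can be sketched as a template matcher. This is a textbook formulation with a Euclidean frame distance and the basic three-way step pattern; the helper names and the toy one-dimensional templates are hypothetical. It also shows the scaling issue from the slide: every query is compared against every stored template, at a cost proportional to the product of the two sequence lengths.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Cumulative cost of the best monotonic alignment of two sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: repeat a frame of b, repeat a frame
            # of a, or advance both sequences together.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def classify(query, templates):
    """Return the label of the stored template closest to the query."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Usage with toy 1-D "feature" sequences (hypothetical data).
templates = {"rise": [[0.0], [1.0], [2.0]], "fall": [[2.0], [1.0], [0.0]]}
label = classify([[0.0], [0.0], [1.0], [1.0], [2.0]], templates)
```

There are no model parameters to estimate; the training data itself serves as the model.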

8 HMM classification A hidden Markov model is a piecewise-stationary model of a nonstationary signal. Model characteristics: states represent domains of piecewise stationarity; interstate connections define the model architecture; parameters are the pdf means and covariances.
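The three model characteristics above (states, interstate connections, pdf parameters) map directly onto the inputs of a Viterbi decoder, which finds the most likely state sequence for a run of features. The sketch below assumes one-dimensional features and a single Gaussian per state for brevity; all numeric values are made up for illustration, and practical systems use multivariate pdfs and trained parameters.

```python
import math

NEG_INF = float("-inf")


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian (the per-state pdf)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi(obs, means, variances, log_trans, log_init):
    """Most likely state path and its log score.

    log_trans[p][s] is the log probability of moving from state p to s;
    NEG_INF entries encode connections absent from the architecture.
    """
    n_states = len(means)
    score = [log_init[s] + log_gauss(obs[0], means[s], variances[s])
             for s in range(n_states)]
    back = []
    for x in obs[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + log_gauss(x, means[s], variances[s]))
        score = new_score
        back.append(ptr)
    state = max(range(n_states), key=lambda s: score[s])
    best = score[state]
    path = [state]
    for ptr in reversed(back):      # trace the back-pointers to the start
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path, best


# Usage: a two-state left-to-right model (illustrative values only).
means, variances = [0.0, 5.0], [1.0, 1.0]
log_trans = [[math.log(0.5), math.log(0.5)],
             [NEG_INF, 0.0]]
log_init = [0.0, NEG_INF]
path, score = viterbi([0.1, -0.2, 0.0, 4.9, 5.2, 5.1],
                      means, variances, log_trans, log_init)
```

Within each state the observations are modeled as draws from one fixed pdf, which is the piecewise-stationarity assumption; the switch between states carries the nonstationarity.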

9 HMM diagram [Diagram panels: time domain, state space, feature space]

10 HMM output symbols
Symbol    # Models  Positive        Negative
Word      <1000     Coarticulation  Scaling
Phoneme   40        pdf estimation  Coarticulation
Biphone   1400      TRADEOFF
Triphone  40K       Coarticulation  pdf estimation

11 Language models Considers multiple output symbol hypotheses Delays making hard decision on classifier output Uses language-based expert knowledge to predict meaningful words/phrases from classifier output N-phones/word symbols Major research topic since early 1990s with advent of large speech corpora
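One way to picture "delaying the hard decision" is to keep several classifier hypotheses and let a language model re-rank them. The sketch below uses a bigram word model for this; the bigram model, the probability floor, the `lm_weight` parameter, and all numbers are assumptions chosen for illustration, not details from the presentation.

```python
import math


def bigram_log_prob(words, bigram_probs, floor=1e-6):
    """Log probability of a word sequence under a bigram model.

    Unseen word pairs receive a small floor probability (an assumed,
    very crude form of smoothing).
    """
    padded = ["<s>"] + list(words)
    return sum(math.log(bigram_probs.get((prev, cur), floor))
               for prev, cur in zip(padded, padded[1:]))


def rescore(hypotheses, bigram_probs, lm_weight=1.0):
    """Pick the hypothesis with the best combined score.

    hypotheses: list of (words, acoustic_log_score) pairs from the
    classifier. The decision is made only after adding the language score.
    """
    best = max(hypotheses,
               key=lambda h: h[1] + lm_weight * bigram_log_prob(h[0], bigram_probs))
    return best[0]


# Usage with hypothetical scores: the acoustically best hypothesis loses
# once the language model is taken into account.
bigram_probs = {("<s>", "seven"): 0.5, ("seven", "eleven"): 0.4,
                ("<s>", "heaven"): 0.01, ("heaven", "eleven"): 0.01}
hypotheses = [(["heaven", "eleven"], -10.0),
              (["seven", "eleven"], -10.5)]
choice = rescore(hypotheses, bigram_probs)
```

Setting `lm_weight` to 0 reduces this to a hard decision on the classifier output alone.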

12 ASR Problems Test/train mismatch Speaker variations (gender, accent, mood) Weak model assumptions Noise: energetic or informational (babble) The current state of the art neither models the human brain nor functions with the accuracy or reliability of humans Most recent progress comes from faster computers, not new ideas

13 Conclusions Automatic speech recognition technology emerges from several diverse disciplines –Acousticians describe how speech is produced and perceived by humans –Computer scientists create machine learning models for signal-to-symbol conversion –Linguists provide language information –Engineers optimize the algorithms and provide the hardware, and put the pieces together

