Message Source, Linguistic Channel, Articulatory Channel, Acoustic Channel. Observable: Message, Words, Sounds, Features. Bayesian formulation for speech recognition


1 Recognition Architectures: A Communication Theoretic Approach

Channel model (diagram): Message Source, Linguistic Channel, Articulatory Channel, Acoustic Channel.
Observable: Message, Words, Sounds, Features.

Bayesian formulation for speech recognition: P(W|A) = P(A|W) P(W) / P(A)

Objective: minimize the word error rate.
Approach: maximize P(W|A) during training.
Components:
P(A|W): acoustic model (hidden Markov models, mixtures)
P(W): language model (statistical, finite state networks, etc.)

The language model typically predicts a small set of next words based on knowledge of a finite number of previous words (N-grams).
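The Bayesian formulation can be illustrated with a small sketch. Since P(A) does not depend on W, choosing the most probable word sequence amounts to maximizing log P(A|W) + log P(W). The example below assumes a bigram (N = 2) language model and a table of acoustic log-likelihoods. All probabilities, word sequences, and function names are hypothetical values invented for illustration; they are not taken from the presentation.

```python
import math

# Hypothetical bigram language model P(w_i | w_{i-1}); "<s>" marks sentence start.
# Every number here is invented for illustration only.
BIGRAM = {
    ("<s>", "recognize"): 0.6,
    ("<s>", "wreck"): 0.4,
    ("recognize", "speech"): 0.9,
    ("recognize", "beach"): 0.1,
    ("wreck", "a"): 1.0,
    ("a", "nice"): 1.0,
    ("nice", "beach"): 0.7,
    ("nice", "speech"): 0.3,
}

# Hypothetical acoustic scores log P(A|W) for two competing word sequences.
# In a real system these would come from the acoustic models.
ACOUSTIC_LOGLIK = {
    ("recognize", "speech"): -12.0,
    ("wreck", "a", "nice", "beach"): -11.5,
}


def lm_logprob(words):
    """Return log P(W) under the bigram model above."""
    logp = 0.0
    prev = "<s>"
    for word in words:
        p = BIGRAM.get((prev, word), 0.0)
        if p == 0.0:
            return float("-inf")  # unseen bigram: sequence is ruled out
        logp += math.log(p)
        prev = word
    return logp


def decode(candidates):
    """Pick argmax_W of log P(A|W) + log P(W); P(A) is constant and dropped."""
    return max(candidates, key=lambda w: ACOUSTIC_LOGLIK[w] + lm_logprob(w))


best = decode(list(ACOUSTIC_LOGLIK))
print(best)
```

With these invented numbers the acoustic score slightly favors the longer sequence, but the language model prior tips the combined score toward the shorter one, which is the interaction the formula describes.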

2 Recognition Architectures: Incorporating Multiple Knowledge Sources

Input Speech

Acoustic Front-end: The signal is converted to a sequence of feature vectors based on spectral and temporal measurements.

Acoustic Models P(A|W): Acoustic models represent sub-word units, such as phonemes, as a finite-state machine in which states model spectral structure and transitions model temporal structure.

Language Model P(W): The language model predicts the next set of words, and controls which models are hypothesized.

Search: Search is crucial to the system, since many combinations of words must be investigated to find the most probable word sequence.

Recognized Utterance
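As a rough sketch of how a finite-state acoustic model is searched, the code below runs the Viterbi algorithm over a three-state left-to-right hidden Markov model. Viterbi is a standard choice for this kind of search, but the presentation does not name a specific algorithm, so treat it as an assumption. The transition probabilities and per-frame state likelihoods are invented for illustration.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF


def viterbi(obs_loglik, log_trans, log_init):
    """Return (best state path, its log probability).

    obs_loglik[t][s] is log P(feature vector at frame t | state s),
    log_trans[p][s] is log P(next state s | current state p).
    """
    n_states = len(log_init)
    score = [log_init[s] + obs_loglik[0][s] for s in range(n_states)]
    backptrs = []
    for t in range(1, len(obs_loglik)):
        new_score, ptrs = [], []
        for s in range(n_states):
            # Best predecessor for state s at frame t.
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s]
                             + obs_loglik[t][s])
            ptrs.append(best_prev)
        score = new_score
        backptrs.append(ptrs)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    best_score = score[state]
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    path.reverse()
    return path, best_score


# Hypothetical left-to-right model: each state may repeat or advance by one.
trans = [[0.5, 0.5, 0.0],
         [0.0, 0.5, 0.5],
         [0.0, 0.0, 1.0]]
init = [1.0, 0.0, 0.0]
# Invented likelihoods P(frame | state) for four frames.
likes = [[0.9, 0.1, 0.1],
         [0.8, 0.2, 0.1],
         [0.1, 0.8, 0.2],
         [0.1, 0.2, 0.9]]

path, logp = viterbi([[safe_log(x) for x in row] for row in likes],
                     [[safe_log(x) for x in row] for row in trans],
                     [safe_log(x) for x in init])
print(path, math.exp(logp))
```

Here the state likelihoods stand in for the spectral structure and the self-loop and advance transitions stand in for the temporal structure. A full recognizer would run this kind of search over networks of such models, with the language model limiting which words are hypothesized.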

3 Acoustic Modeling: Feature Extraction

Diagram blocks: Input Speech, Fourier Transform, Cepstral Analysis, Perceptual Weighting, Time Derivative, Time Derivative.
Diagram outputs: Energy + Mel-Spaced Cepstrum; Delta Energy + Delta Cepstrum; Delta-Delta Energy + Delta-Delta Cepstrum.

Incorporate knowledge of the nature of speech sounds in measurement of the features.
Utilize rudimentary models of human perception.
Measure features 100 times per sec.
Use a 25 msec window for frequency domain analysis.
Include absolute energy and 12 spectral measurements.
Time derivatives to model spectral change.
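The framing and time-derivative steps can be sketched as follows. The sketch assumes an 8000 Hz sample rate, which the presentation does not state, and uses a simple two-point difference for the derivatives rather than whatever estimator the original system used. The spectral analysis itself (Fourier transform, mel-spaced cepstrum) is omitted, and placeholder static vectors of 13 values (energy plus 12 spectral measurements) stand in for its output.

```python
def frame_signal(samples, sample_rate, frame_rate=100, window_ms=25):
    """Cut a signal into overlapping analysis windows.

    frame_rate=100 gives one frame every 10 ms; window_ms=25 gives
    a 25 ms window, so consecutive windows overlap.
    """
    hop = sample_rate // frame_rate
    win = sample_rate * window_ms // 1000
    return [samples[start:start + win]
            for start in range(0, len(samples) - win + 1, hop)]


def deltas(vectors):
    """Two-point central difference over time; edge frames are replicated.

    This is a simplified stand-in for a time-derivative estimator.
    """
    n = len(vectors)
    out = []
    for t in range(n):
        prev_vec = vectors[max(t - 1, 0)]
        next_vec = vectors[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev_vec, next_vec)])
    return out


def stack_features(static):
    """Append delta and delta-delta features to each static vector."""
    d = deltas(static)
    dd = deltas(d)
    return [s + x + y for s, x, y in zip(static, d, dd)]


SAMPLE_RATE = 8000                  # assumed, not given in the slides
signal = [0.0] * SAMPLE_RATE        # one second of placeholder audio
frames = frame_signal(signal, SAMPLE_RATE)

# Placeholder static features: 13 values per frame (energy + 12 spectral).
static = [[float(t)] * 13 for t in range(len(frames))]
features = stack_features(static)
print(len(frames), len(frames[0]), len(features[0]))
```

Under these assumptions each frame ends up with 13 static, 13 delta, and 13 delta-delta values, for 39 features per frame.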

4 Job Submission Demo: High Bandwidth Requirements

High memory and computation requirements (LVCSR).
Models (acoustic and language) reside on the client side.
CPU intensive computation is done on the server side.
Real time applications require audio transfer from the client to the server.

5 Job Submission Demo: JSD Overview

User interface for starting jobs on the server side.
On-line job status and automatic e-mail notification.
Automatic recognition results via e-mail.

Diagram labels: Remote Job Submission, Recognition Results.

6 Job Submission Demo: JSD User Interface

A graphical user interface to launch experiments and view the results.
Ability to play audio data and view the current jobs/loads on the servers.

7 Job Submission Demo: Starting a Recognition Job

User selects the type of experiment, data and system parameters.
Provides e-mail notification and the ability to e-mail recognition results.
Ability to password protect viewing/sending of recognition results.

8 Job Submission Demo: Retrieving Recognition Results

On-line viewing of recognition results is available.
Ability to e-mail recognition results at any stage of the process.

9 Job Submission Demo: Future Work

Logging and data collection.
Handle message passing among distributed servers.
Manage progress of utterance through processing stages.
Option for storing and publishing global state for interaction.

Diagram labels: Hub, Audio Server, Speech Recognition, Spoken Language Understanding, Dialogue Management, Application Back-end, Text-to-Speech Conversion.

