Presentation transcript:

Recognition Architectures: A Communication-Theoretic Approach

Message Source → Linguistic Channel → Articulatory Channel → Acoustic Channel
Observables: Message → Words → Sounds → Features

Bayesian formulation for speech recognition:

P(W|A) = P(A|W) P(W) / P(A)

Objective: minimize the word error rate.
Approach: maximize P(W|A) during training.
Components:
- P(A|W): acoustic model (hidden Markov models, mixtures)
- P(W): language model (statistical, finite-state networks, etc.)

The language model typically predicts a small set of next words based on knowledge of a finite number of previous words (N-grams).
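As an illustration of the P(W) term, a toy bigram language model can score competing word sequences. This is only a sketch: the vocabulary and counts below are invented for illustration (and unsmoothed), not drawn from any real corpus.

```python
import math

# Toy bigram language model: P(W) is approximated as a product of
# P(w_i | w_{i-1}) terms. All counts are invented for illustration.
bigram_counts = {
    ("<s>", "recognize"): 8, ("<s>", "wreck"): 2,
    ("recognize", "speech"): 9,
    ("wreck", "a"): 10, ("a", "nice"): 10, ("nice", "beach"): 10,
}
unigram_counts = {"<s>": 10, "recognize": 10, "wreck": 10, "a": 11, "nice": 10}

def log_p_sentence(words):
    """Log P(W) under the bigram model (no smoothing, for clarity)."""
    lp = 0.0
    for prev, cur in zip(["<s>"] + words, words):
        lp += math.log(bigram_counts.get((prev, cur), 1e-12) /
                       unigram_counts.get(prev, 1))
    return lp

# Given equal acoustic scores, the recognizer would prefer the
# hypothesis whose bigrams are more probable under this model.
h1 = log_p_sentence(["recognize", "speech"])
h2 = log_p_sentence(["wreck", "a", "nice", "beach"])
best = max([(h1, "recognize speech"), (h2, "wreck a nice beach")])[1]
```

In a real decoder this log P(W) score is added to the acoustic log-likelihood log P(A|W) before the argmax; P(A) can be ignored because it is constant across hypotheses.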

Recognition Architectures: Incorporating Multiple Knowledge Sources

Input Speech → Acoustic Front-end → Acoustic Models P(A|W) + Language Model P(W) → Search → Recognized Utterance

- Acoustic front-end: the signal is converted to a sequence of feature vectors based on spectral and temporal measurements.
- Acoustic models P(A|W): represent sub-word units, such as phonemes, as finite-state machines in which states model spectral structure and transitions model temporal structure.
- Language model P(W): predicts the next set of words and controls which models are hypothesized.
- Search: crucial to the system, since many combinations of words must be investigated to find the most probable word sequence.
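A minimal sketch of how states and transitions combine during search, assuming a toy two-state unit with discrete observations and invented probabilities (a real system scores continuous feature vectors with Gaussian mixtures and searches over networks of such units):

```python
import math

# Toy HMM for one sub-word unit: states model spectral structure
# (per-state emission log-probs), transitions model temporal structure.
# All probabilities are invented for illustration.
log = math.log
trans = {(0, 0): log(0.6), (0, 1): log(0.4), (1, 1): log(0.7), (1, 2): log(0.3)}
emit = [  # emit[state][observation]
    {"a": log(0.8), "b": log(0.2)},
    {"a": log(0.3), "b": log(0.7)},
]

def viterbi(obs):
    """Most probable state path for a discrete observation sequence."""
    # delta[s] = best log-prob of any path ending in state s
    delta = {0: emit[0][obs[0]]}
    back = [{}]
    for o in obs[1:]:
        new, bp = {}, {}
        for (s, t), lp in trans.items():
            if s in delta and t < len(emit):  # skip the exit transition
                cand = delta[s] + lp + emit[t][o]
                if t not in new or cand > new[t]:
                    new[t], bp[t] = cand, s
        delta, back = new, back + [bp]
    # Backtrace from the best final state.
    s = max(delta, key=delta.get)
    path = [s]
    for bp in reversed(back[1:]):
        s = bp[s]
        path.append(s)
    return list(reversed(path))

path = viterbi(["a", "a", "b"])  # stays in state 0, then moves to state 1
```

The dynamic-programming recursion is the key point: at each frame, only the best-scoring path into each state survives, which is what makes searching the huge space of word combinations tractable.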

Acoustic Modeling: Feature Extraction

Input Speech → Fourier Transform → Cepstral Analysis → Perceptual Weighting → Time Derivatives
Outputs: Energy + Mel-Spaced Cepstrum; Delta Energy + Delta Cepstrum; Delta-Delta Energy + Delta-Delta Cepstrum

- Incorporate knowledge of the nature of speech sounds in the measurement of the features.
- Utilize rudimentary models of human perception.
- Measure features 100 times per second.
- Use a 25 msec window for frequency-domain analysis.
- Include absolute energy and 12 spectral measurements.
- Use time derivatives to model spectral change.

Job Submission Demo: High Bandwidth Requirements

- High memory and computation requirements (LVCSR).
- Models (acoustic and language) reside on the client side.
- CPU-intensive computation is done on the server side.
- Real-time applications require audio transfer from the client to the server.
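A back-of-envelope comparison of the bandwidth needed to stream raw audio versus pre-computed feature vectors; both formats are assumptions chosen for illustration, not the demo's actual wire format.

```python
# Raw audio: 8 kHz sampling, 16-bit samples (telephone-quality, assumed).
RAW_BPS = 8000 * 16          # 128,000 bits per second

# Features: 100 frames/sec, 13 coefficients, 32-bit floats (assumed).
FEAT_BPS = 100 * 13 * 32     # 41,600 bits per second

savings = RAW_BPS / FEAT_BPS  # roughly a 3x reduction
```

This is one motivation for distributed speech recognition: extracting (and quantizing) features on the client before transfer can cut the bandwidth needed well below that of raw audio.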

Job Submission Demo: JSD Overview

- User interface for starting jobs on the server side.
- On-line job status and automatic notification.
- Automatic recognition results via e-mail.

[Diagram: remote job submission → recognition results]

Job Submission Demo: JSD User Interface

- A graphical user interface to launch experiments and view the results.
- Ability to play audio data and view the current jobs/loads on the servers.

Job Submission Demo: Starting a Recognition Job

- User selects the type of experiment, data, and system parameters.
- Provides notification and the ability to e-mail recognition results.
- Ability to password-protect viewing/sending of recognition results.

Job Submission Demo: Retrieving Recognition Results

- On-line viewing of recognition results is available.
- Ability to e-mail recognition results at any stage of the process.

Job Submission Demo: Future Work

- Logging and data collection.
- Handle message passing among distributed servers.
- Manage the progress of an utterance through the processing stages.
- Option for storing and publishing global state for interaction.

[Diagram: a central Hub connecting the Speech Recognition Application Back-end, Spoken Language Understanding, Text-to-Speech Conversion, an Audio Server, and Dialogue Management]
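The hub-and-spoke interaction in the diagram might be sketched as follows; the component names, handlers, and message formats here are placeholders, not the system's actual protocol.

```python
# Minimal hub sketch: a central hub routes messages among registered
# servers (recognizer, dialogue manager, TTS, ...). All components
# below are stand-ins invented for illustration.
class Hub:
    def __init__(self):
        self.servers = {}

    def register(self, name, handler):
        """Attach a server to the hub under a symbolic name."""
        self.servers[name] = handler

    def send(self, target, message):
        """Route a message to a registered server and return its reply."""
        return self.servers[target](message)

hub = Hub()
# A fake recognizer that "decodes" any audio payload to fixed text,
# and a fake dialogue manager that turns text into an action string.
hub.register("recognizer", lambda audio: {"text": "play music"})
hub.register("dialogue", lambda msg: "ACTION:" + msg["text"])

reply = hub.send("dialogue", hub.send("recognizer", b"...pcm..."))
```

Because every component talks only to the hub, servers can be added, replaced, or distributed across machines without the others knowing, which is exactly what the message-passing and global-state items under Future Work would require.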