Automatic Speech Recognition
A summary of contributions from multiple disciplines
Mark D. Skowronski
Computational Neuro-Engineering Lab
Electrical and Computer Engineering, University of Florida
October 6, 2004

What is ASR?
Automatic Speech Recognition is:
– A system that converts a raw acoustic signal into phonetically meaningful text.
– A combination of engineering, linguistics, statistics, psychoacoustics, and computer science.

What is ASR?
[Diagram: input "seven" → Feature extraction → Classification → Language model]
– Psychoacousticians provide expert knowledge about human acoustic perception.
– Engineers provide efficient algorithms and hardware.
– Linguists provide language rules.
– Computer scientists and statisticians provide optimum modeling.

Feature extraction
Acoustic-phonetic paradigm (pre-1980):
– Holistic features (voicing and frication measures, durations, formant frequencies and bandwidths)
– Difficult to construct robust classifiers
Frame-based paradigm (1980 to today):
– Short (~20 ms) sliding analysis window; assumes each speech frame is quasi-stationary
– Relies on the classifier to account for speech nonstationarity
– Allows for the inclusion of expert knowledge of speech perception
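
To make the frame-based paradigm concrete, here is a minimal framing sketch. The 20 ms window with a 10 ms hop is an assumption (conventional values, not specified on the slide):

```python
import numpy as np

def frame_signal(x, fs, win_ms=20.0, hop_ms=10.0):
    """Slice a 1-D signal into short, overlapping frames, each treated
    as quasi-stationary. win_ms/hop_ms are assumed defaults: 20 ms
    windows with 50% overlap are typical, not mandated."""
    win = int(fs * win_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    n_frames = 1 + max(0, (len(x) - win) // hop)
    return np.stack([x[i * hop : i * hop + win] for i in range(n_frames)])

# Usage: 1 s of 8 kHz audio yields 160-sample frames every 80 samples.
x = np.random.randn(8000)           # stand-in for a recorded utterance
frames = frame_signal(x, fs=8000)   # shape: (99, 160)
```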

Feature extraction algorithms
– Cepstrum (1962)
– Linear prediction (1967)
– Mel-frequency cepstral coefficients (Davis & Mermelstein, 1980)
– Perceptual linear prediction (Hermansky, 1990)
– Human factor cepstral coefficients (Skowronski & Harris, 2002)

MFCC algorithm
x(t) → Fourier transform → Mel-scaled filter bank → Log energy → DCT → cepstral domain
[Figure: the word "seven" traced through each stage; intermediate plots use time and filter-# axes.]
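
The block diagram maps directly onto code. A minimal sketch of one frame's MFCCs, assuming conventional choices not given on the slide (Hamming window, 26 triangular mel filters, 13 coefficients):

```python
import numpy as np
from scipy.fft import dct

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, fs):
    # Triangular filters with center frequencies evenly spaced on the mel scale.
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / fs).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    return fbank

def mfcc(frame, fs, n_filters=26, n_ceps=13):
    n_fft = len(frame)
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2  # Fourier
    energies = mel_filterbank(n_filters, n_fft, fs) @ spectrum      # mel filter bank
    log_e = np.log(energies + 1e-10)                                # log energy
    return dct(log_e, type=2, norm='ortho')[:n_ceps]                # DCT -> cepstrum
```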

Classification
– Operates on frame-based features
– Accounts for time variations of speech
– Uses training data to transform features into symbols (phonemes, bi-/tri-phones, words)
Non-parametric: Dynamic time warping (DTW)
– No parameters to estimate
– Computationally expensive; scaling issues
Parametric: Hidden Markov model (HMM)
– State-of-the-art model; complements features
– Data-intensive; scales well
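
A minimal DTW sketch makes the cost concern visible: every comparison against a stored template fills an N×M matrix, so matching grows with both utterance length and vocabulary size (the Euclidean frame distance is an illustrative choice):

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW between two feature sequences of shape (frames, dims).
    O(len(a) * len(b)) time and memory per template comparison."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])   # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]
```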

HMM classification
A hidden Markov model is a piecewise-stationary model of a nonstationary signal.
Model characteristics:
– States: represent domains of piecewise stationarity
– Interstate connections: define the model architecture
– Parameters: pdf means & covariances

HMM diagram
[Figure: the same utterance viewed in the time domain, in state space, and in feature space.]
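
To ground the "piecewise stationary" idea, a sketch of the forward algorithm, which scores a frame sequence against one HMM; the log-domain formulation and the variable names are assumptions, not the lecture's notation:

```python
import numpy as np
from scipy.special import logsumexp

def forward_log_likelihood(log_A, log_pi, frame_loglik):
    """Forward algorithm for an HMM in the log domain.
    log_A: (S, S) log transition matrix; log_pi: (S,) log initial probs;
    frame_loglik: (T, S) log p(frame_t | state_s), e.g. from Gaussian
    pdfs over MFCC frames. Returns log p(all frames | model)."""
    alpha = log_pi + frame_loglik[0]
    for t in range(1, len(frame_loglik)):
        # alpha_new[j] = logsum_i (alpha[i] + log_A[i, j]) + emission at t
        alpha = logsumexp(alpha[:, None] + log_A, axis=0) + frame_loglik[t]
    return logsumexp(alpha)
```

In recognition, each candidate word or phone model is scored this way and the highest-likelihood model wins.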

HMM output symbols (a tradeoff between coarticulation modeling and pdf estimation):

Symbol     # Models   Positive         Negative
Word       <1000      Coarticulation   Scaling
Phoneme    40         pdf estimation   Coarticulation
Biphone    1400       (intermediate tradeoff)
Triphone   40K        Coarticulation   pdf estimation

Language models
– Consider multiple output-symbol hypotheses
– Delay hard decisions on classifier output
– Use language-based expert knowledge to predict meaningful words/phrases from classifier output (N-phone/word symbols)
– Major research topic since the early 1990s, with the advent of large speech corpora
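
A toy sketch of the idea: a bigram model with add-one smoothing rescores competing classifier hypotheses so the hard decision is deferred to the language level. The training phrases and smoothing choice are purely illustrative:

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Bigram LM with add-one smoothing; returns P(w2 | w1)."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        words = ["<s>"] + words + ["</s>"]
        unigrams.update(words[:-1])                  # contexts w1
        bigrams.update(zip(words[:-1], words[1:]))   # pairs (w1, w2)
    vocab = len(unigrams) + 1
    def prob(w1, w2):
        return (bigrams[(w1, w2)] + 1) / (unigrams[w1] + vocab)
    return prob

# Usage: pick between two acoustically similar classifier outputs.
p = train_bigram_lm([["call", "seven"], ["call", "heaven"], ["call", "seven"]])
print(p("call", "seven") > p("call", "heaven"))  # True: the LM prefers the likelier phrase
```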

ASR Problems
– Test/train mismatch
– Speaker variations (gender, accent, mood)
– Weak model assumptions
– Noise: energetic or informational (babble)
– The current state of the art neither models the human brain nor functions with the accuracy or reliability of humans
– Most recent progress comes from faster computers, not new ideas

Conclusions
Automatic speech recognition technology emerges from several diverse disciplines:
– Acousticians describe how speech is produced and perceived by humans
– Computer scientists create machine learning models for signal-to-symbol conversion
– Linguists provide language information
– Engineers optimize the algorithms, provide the hardware, and put the pieces together