ASR Intro: Outline ASR Research History Difficulties and Dimensions Core Technology Components 21st century ASR Research (Next two lectures)


Radio Rex “It consisted of a celluloid dog with an iron base held within its house by an electromagnet against the force of a spring. Current energizing the magnet flowed through a metal bar which was arranged to form a bridge with 2 supporting members. This bridge was sensitive to 500 cps acoustic energy which vibrated it, interrupting the current and releasing the dog. The energy around 500 cps contained in the vowel of the word Rex was sufficient to trigger the device when the dog’s name was called.”

1952 Bell Labs Digits
- First word (digit) recognizer
- Approximates energy in formants (vocal tract resonances) over the word
- Already has some robust ideas (insensitive to amplitude and timing variation)
- Worked very well
- Main weakness was technological (resistors and capacitors)

Digit Patterns (block diagram): the spoken digit is split by an HP filter (1 kHz) and an LP filter (800 Hz); each branch feeds a limiting amplifier and an axis-crossing counter, giving crude formant-frequency estimates (in Hz and kHz).

The 60’s
- Better digit recognition
- Breakthroughs: spectrum estimation (FFT, cepstra, LPC), Dynamic Time Warp (DTW), and Hidden Markov Model (HMM) theory
- 1969: Pierce letter to JASA, “Whither Speech Recognition?”

Pierce Letter (1969, JASA)
- Pierce led Bell Labs Communications Sciences Division
- Skeptical about progress in speech recognition, motives, scientific approach
- Came after two decades of research by many labs

Pierce Letter (Continued)
- ASR research was government-supported
- He asked: Is this wise? Are we getting our money’s worth?

Purpose for ASR
- Talking to machines: had “gone downhill since ……. Radio Rex”
- Main point: to really get somewhere, need intelligence, language
- Learning about speech
- Main point: need to do science, not just test “mad schemes”

ARPA Project
- Focus on Speech Understanding
- Main work at 3 sites: System Development Corporation, CMU, and BBN
- Other work at Lincoln, SRI, Berkeley
- Goal: 1000-word ASR, a few speakers, connected speech, constrained grammar, less than 10% semantic error

Results
- Only CMU Harpy fulfilled the goals - used LPC, segments, lots of high-level knowledge; learned from Dragon* (Baker)
* The CMU system done in the early ’70’s, as opposed to the company formed in the ’80’s

Achieved by 1976
- Spectral and cepstral features, LPC
- Some work with phonetic features
- Incorporating syntax and semantics
- Initial neural network approaches
- DTW-based systems (many)
- HMM-based systems (Dragon, IBM)

Automatic Speech Recognition
- Data Collection
- Pre-processing
- Feature Extraction (framewise)
- Hypothesis Generation
- Cost Estimator
- Decoding

Framewise Analysis of Speech: the signal is divided into short frames; frame 1 yields feature vector X1, frame 2 yields feature vector X2, and so on.
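Framewise analysis can be sketched in a few lines of Python; this is a minimal illustration (the function name and parameters are mine, not from the slides):

```python
def frame_signal(x, frame_len, hop):
    """Split a signal into (possibly overlapping) frames for framewise analysis."""
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frames.append(x[start:start + frame_len])
    return frames

# Each frame would then be mapped to a feature vector X_t by the front end.
```

Typical ASR front ends use frames of roughly 20-30 ms with a hop of about 10 ms, so consecutive frames overlap.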

1970’s Feature Extraction
- Filter banks - explicit, or FFT-based
- Cepstra - Fourier components of the log spectrum
- LPC - linear predictive coding (related to the acoustic tube model)
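The cepstral idea (Fourier components of the log spectrum) can be sketched directly; here is a minimal pure-Python real cepstrum using a naive DFT (function names are illustrative, and a real system would use an FFT):

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform (O(N^2); shown only for clarity)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]

def real_cepstrum(frame):
    """Cepstrum = inverse Fourier transform of the log magnitude spectrum."""
    log_mag = [math.log(abs(X) + 1e-12) for X in dft(frame)]
    N = len(log_mag)
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * n / N)
                for k in range(N)).real / N
            for n in range(N)]
```

The low-order cepstral coefficients capture the smooth spectral envelope (vocal tract shape) while higher-order ones reflect the excitation, which is why cepstra help separate envelope from pitch.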

LPC Spectrum

LPC Model Order

Spectral Estimation (comparison table): filter banks, cepstral analysis, and LPC are compared on these properties - reduced pitch effects, excitation estimate, direct access to spectra, less resolution at HF, orthogonal outputs, peak-hugging property, reduced computation. (X marks in the original table indicate which methods have each property.)

Dynamic Time Warp
- Optimal time normalization with dynamic programming
- Proposed by Sakoe and Chiba, circa 1970
- A similar proposal around the same time by Itakura
- Probably Vintsyuk was first (1968)
- Good review article by White, in Trans. ASSP, April 1976

Nonlinear Time Normalization
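The nonlinear time normalization above is solved by dynamic programming; a minimal DTW sketch follows (the absolute-difference local cost and the three-way step pattern are illustrative choices, not the only ones in the literature):

```python
def dtw(a, b, dist=lambda u, v: abs(u - v)):
    """Minimal alignment cost between sequences a and b via dynamic programming."""
    INF = float("inf")
    n, m = len(a), len(b)
    # D[i][j] = cost of the best warp of a[:i] onto b[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i][j] = dist(a[i - 1], b[j - 1]) + min(
                D[i - 1][j],      # stretch a
                D[i][j - 1],      # stretch b
                D[i - 1][j - 1],  # step both
            )
    return D[n][m]
```

In a DTW-based isolated-word recognizer, the test utterance's frame sequence is warped against each stored reference template, and the word with the lowest alignment cost wins.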

HMMs for Speech
- Math from Baum and others
- Applied to speech by Baker in the original CMU Dragon system (1974)
- Developed by IBM (Baker, Jelinek, Bahl, Mercer, …)
- Extended by others in the mid-1980’s

A Hidden Markov Model: a chain of states q1, q2, q3, with emission probabilities P(x | q) and transition probabilities P(q' | q).

Markov model: P(x1, x2 | q1, q2) ≈ P(q1) P(x1 | q1) P(q2 | q1) P(x2 | q2)
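The factorization P(x1, x2 | q1, q2) ≈ P(q1) P(x1 | q1) P(q2 | q1) P(x2 | q2) extends to any state/observation sequence; a toy sketch (the dict-based model and all its numbers are made up for illustration):

```python
def hmm_joint(states, obs, init, trans, emit):
    """P(x_1..x_T, q_1..q_T) = P(q1) P(x1|q1) * product over t of P(q_t|q_{t-1}) P(x_t|q_t)."""
    p = init[states[0]] * emit[states[0]][obs[0]]
    for t in range(1, len(states)):
        p *= trans[states[t - 1]][states[t]] * emit[states[t]][obs[t]]
    return p

# Toy two-state model (illustrative numbers only).
init = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit = {"A": {"x": 0.5, "y": 0.5}, "B": {"x": 0.1, "y": 0.9}}
```

For example, hmm_joint(["A", "B"], ["x", "y"], init, trans, emit) multiplies exactly the four factors in the slide's equation.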

Markov model (graphical form): a state chain q1 → q2 → q3 → q4, with each state qt emitting an observation xt (x1, x2, x3, x4).

HMM Training Steps
- Initialize estimators and models
- Estimate “hidden” variable probabilities
- Choose estimator parameters to maximize model likelihoods
- Assess and repeat steps as necessary
- A special case of Expectation Maximization (EM)
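The "estimate hidden variable probabilities" step rests on the forward recursion, which also yields the data likelihood that training maximizes. A minimal sketch, again with a made-up dict-based toy model (names and numbers are illustrative):

```python
def forward_likelihood(obs, states, init, trans, emit):
    """P(x_1..x_T) via the forward recursion alpha_t(q) = P(x_1..x_t, q_t = q)."""
    alpha = {q: init[q] * emit[q][obs[0]] for q in states}
    for x in obs[1:]:
        alpha = {q: emit[q][x] * sum(alpha[r] * trans[r][q] for r in states)
                 for q in states}
    return sum(alpha.values())

# Toy two-state model (illustrative numbers only).
init = {"A": 0.6, "B": 0.4}
trans = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit = {"A": {"x": 0.5, "y": 0.5}, "B": {"x": 0.1, "y": 0.9}}
```

Full Baum-Welch training combines this forward pass with a backward pass to get the hidden state posteriors (the E step), then re-estimates init, trans, and emit to raise the likelihood (the M step), repeating until convergence.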