The 1980’s Collection of large standard corpora Front ends: auditory models, dynamics Engineering: scaling to large vocabulary continuous speech Second.

Slides:



Advertisements
Similar presentations
Discriminative Training in Speech Processing Filipp Korkmazsky LORIA.
Advertisements

Robust Speech recognition V. Barreaud LORIA. Mismatch Between Training and Testing n mismatch influences scores n causes of mismatch u Speech Variation.
1990s DARPA Programmes WSJ and BN Dapo Durosinmi-Etti Bo Xu Xiaoxiao Zheng.
Frederico Rodrigues and Isabel Trancoso INESC/IST, 2000 Robust Recognition of Digits and Natural Numbers.
SRI 2001 SPINE Evaluation System Venkata Ramana Rao Gadde Andreas Stolcke Dimitra Vergyri Jing Zheng Kemal Sonmez Anand Venkataraman.
Hidden Markov Models Reading: Russell and Norvig, Chapter 15, Sections
F 鍾承道 Acoustic Features for Speech Recognition: From Mel-Frequency Cepstrum Coefficients (MFCC) to BottleNeck Features(BNF)
ASR Intro: Outline ASR Research History Difficulties and Dimensions Core Technology Components 21st century ASR Research (Next two lectures)
Signal Modeling for Robust Speech Recognition With Frequency Warping and Convex Optimization Yoon Kim March 8, 2000.
By the Novel Approaches team: Nelson Morgan, ICSI Hynek Hermansky, OGI Dan Ellis, Columbia Kemal Sonmez, SRI Mari Ostendorf, UW Hervé Bourlard, IDIAP/EPFL.
Resources Primary resources – Lexicons, structured vocabularies – Grammars (in widest sense) – Corpora – Treebanks Secondary resources – Designed for a.
Advances in WP1 and WP2 Paris Meeting – 11 febr
1 USING CLASS WEIGHTING IN INTER-CLASS MLLR Sam-Joo Doh and Richard M. Stern Department of Electrical and Computer Engineering and School of Computer Science.
Why is ASR Hard? Natural speech is continuous
Representing Acoustic Information
Introduction to Automatic Speech Recognition
EE 225D, Section I: Broad background
What do people perceive? Determine pitch Also determine location (binaural) Seemingly extract envelope (filters) Also evidence for temporal processing.
1 International Computer Science Institute Data Sampling for Acoustic Model Training Özgür Çetin International Computer Science Institute Andreas Stolcke.
1 7-Speech Recognition (Cont’d) HMM Calculating Approaches Neural Components Three Basic HMM Problems Viterbi Algorithm State Duration Modeling Training.
So far: Historical introduction Mathematical background (e.g., pattern classification, acoustics) Feature extraction for speech recognition (and some neural.
1 Robust HMM classification schemes for speaker recognition using integral decode Marie Roch Florida International University.
Classification of place of articulation in unvoiced stops with spectro-temporal surface modeling V. Karjigi , P. Rao Dept. of Electrical Engineering,
June 28th, 2004 BioSecure, SecurePhone 1 Automatic Speaker Verification : Technologies, Evaluations and Possible Future Gérard CHOLLET CNRS-LTCI, GET-ENST.
Speech and Language Processing
Automatic Speech Recognition (ASR): A Brief Overview.
A brief overview of Speech Recognition and Spoken Language Processing Advanced NLP Guest Lecture August 31 Andrew Rosenberg.
By: Meghal Bhatt.  Sphinx4 is a state of the art speaker independent, continuous speech recognition system written entirely in java programming language.
Reporter: Shih-Hsiang( 士翔 ). Introduction Speech signal carries information from many sources –Not all information is relevant or important for speech.
Minimum Mean Squared Error Time Series Classification Using an Echo State Network Prediction Model Mark Skowronski and John Harris Computational Neuro-Engineering.
Speech, Perception, & AI Artificial Intelligence CMSC March 5, 2002.
Csc Lecture 7 Recognizing speech. Geoffrey Hinton.
LML Speech Recognition Speech Recognition Introduction I E.M. Bakker.
Jun-Won Suh Intelligent Electronic Systems Human and Systems Engineering Department of Electrical and Computer Engineering Speaker Verification System.
Data Sampling & Progressive Training T. Shinozaki & M. Ostendorf University of Washington In collaboration with L. Atlas.
CMU Robust Vocabulary-Independent Speech Recognition System Hsiao-Wuen Hon and Kai-Fu Lee ICASSP 1991 Presenter: Fang-Hui CHU.
From last time …. ASR System Architecture Pronunciation Lexicon Signal Processing Probability Estimator Decoder Recognized Words “zero” “three” “two”
Speaker Recognition by Habib ur Rehman Abdul Basit CENTER FOR ADVANCED STUDIES IN ENGINERING Digital Signal Processing ( Term Project )
1 CS 430: Information Discovery Lecture 22 Non-Textual Materials: Informedia.
Authors: Sriram Ganapathy, Samuel Thomas, and Hynek Hermansky Temporal envelope compensation for robust phoneme recognition using modulation spectrum.
Speech, Perception, & AI Artificial Intelligence CMSC February 13, 2003.
SCALING UP: LEARNING LARGE-SCALE RECOGNITION METHODS FROM SMALL-SCALE RECOGNITION TASKS Nelson Morgan, Barry Y Chen, Qifeng Zhu, Andreas Stolcke International.
New Acoustic-Phonetic Correlates Sorin Dusan and Larry Rabiner Center for Advanced Information Processing Rutgers University Piscataway,
The famous “sprinkler” example (J. Pearl, Probabilistic Reasoning in Intelligent Systems, 1988)
Speech Enhancement for ASR by Hans Hwang 8/23/2000 Reference 1. Alan V. Oppenheim,etc., ” Multi-Channel Signal Separation by Decorrelation ”,IEEE Trans.
Speech Recognition with CMU Sphinx Srikar Nadipally Hareesh Lingareddy.
Performance Comparison of Speaker and Emotion Recognition
Automatic Speech Recognition A summary of contributions from multiple disciplines Mark D. Skowronski Computational Neuro-Engineering Lab Electrical and.
BY KALP SHAH Sentence Recognizer. Sphinx4 Sphinx4 is the best and versatile recognition system. Sphinx4 is a speech recognition system which is written.
RCC-Mean Subtraction Robust Feature and Compare Various Feature based Methods for Robust Speech Recognition in presence of Telephone Noise Amin Fazel Sharif.
Chapter 7 Speech Recognition Framework  7.1 The main form and application of speech recognition  7.2 The main factors of speech recognition  7.3 The.
1 Electrical and Computer Engineering Binghamton University, State University of New York Electrical and Computer Engineering Binghamton University, State.
1 Voicing Features Horacio Franco, Martin Graciarena Andreas Stolcke, Dimitra Vergyri, Jing Zheng STAR Lab. SRI International.
Message Source Linguistic Channel Articulatory Channel Acoustic Channel Observable: MessageWordsSounds Features Bayesian formulation for speech recognition:
1 7-Speech Recognition Speech Recognition Concepts Speech Recognition Approaches Recognition Theories Bayse Rule Simple Language Model P(A|W) Network Types.
Cross-Dialectal Data Transferring for Gaussian Mixture Model Training in Arabic Speech Recognition Po-Sen Huang Mark Hasegawa-Johnson University of Illinois.
Flexible Speaker Adaptation using Maximum Likelihood Linear Regression Authors: C. J. Leggetter P. C. Woodland Presenter: 陳亮宇 Proc. ARPA Spoken Language.
ASR Intro: Outline ASR Research History Difficulties and Dimensions Core Technology Components 21st century ASR Research.
Speaker Recognition UNIT -6. Introduction  Speaker recognition is the process of automatically recognizing who is speaking on the basis of information.
Spectral and Temporal Modulation Features for Phonetic Recognition Stephen A. Zahorian, Hongbing Hu, Zhengqing Chen, Jiang Wu Department of Electrical.
Hierarchical Multi-Stream Posterior Based Speech Recognition System
Speech Processing Speech Recognition
Frequency Domain Perceptual Linear Predicton (FDPLP)
EEG Recognition Using The Kaldi Speech Recognition Toolkit
Presentation for EEL6586 Automatic Speech Processing
8-Speech Recognition Speech Recognition Concepts
Digital Systems: Hardware Organization and Design
汉语连续语音识别 年1月4日访北京工业大学 973 Project 2019/4/17 汉语连续语音识别 年1月4日访北京工业大学 郑 方 清华大学 计算机科学与技术系 语音实验室
Human Speech Communication
Speaker Identification:
Presentation transcript:

The 1980’s Collection of large standard corpora Front ends: auditory models, dynamics Engineering: scaling to large vocabulary continuous speech Second major (D)ARPA ASR project HMMs become ready for prime time

Standard Corpora Collection Before 1984, chaos TIMIT RM (later WSJ) ATIS NIST, ARPA, LDC

Front Ends in the 1980’s Mel cepstrum (Bridle, Mermelstein) PLP (Hermansky) Delta cepstrum (Furui) Auditory models (Seneff, Ghitza, others)

Mel Frequency Scale

Spectral vs Temporal Processing Processing Analysis (e.g., cepstral) Processing (e.g., mean removal) Time frequency Spectral processing Temporal processing

Dynamic Speech Features temporal dynamics useful for ASR local time derivatives of cepstra “delta’’ features estimated over multiple frames (typically 5) usually augments static features can be viewed as a temporal filter

“Delta” impulse response frames

HMM’s for Continuous Speech Using dynamic programming for cts speech (Vintsyuk, Bridle, Sakoe, Ney….) Application of Baker-Jelinek ideas to continuous speech (IBM, BBN, Philips,...) Multiple groups developing major HMM systems (CMU, SRI, Lincoln, BBN, ATT) Engineering development - coping with data, fast computers

2nd (D)ARPA Project Common task Frequent evaluations Convergence to good, but similar, systems Lots of engineering development - now up to 60,000 word recognition, in real time, on a workstation, with less than 10% word error Competition inspired others not in project - Cambridge did HTK, now widely distributed

Knowledge vs. Ignorance Using acoustic-phonetic knowledge in explicit rules Ignorance represented statistically Ignorance-based approaches (HMMs) “won”, but Knowledge (e.g., segments) becoming statistical Statistics incorporating knowledge

Some 1990’s Issues Independence to long-term spectrum Adaptation Effects of spontaneous speech Information retrieval/extraction with broadcast material Query-style systems (e.g., ATIS) Applying ASR technology to related areas (language ID, speaker verification)

Where Pierce Letter Applies We still need science Need language, intelligence Acoustic robustness still poor Perceptual research, models Fundamentals of statistical pattern recognition for sequences Robustness to accent, stress, rate of speech, ……..

Progress in 25 Years From digits to 60,000 words From single speakers to many From isolated words to continuous speech From no products to many products, some systems actually saving LOTS of money

Real Uses Telephone: phone company services (collect versus credit card) Telephone: call centers for query information (e.g., stock quotes, parcel tracking) Dictation products: continuous recognition, speaker dependent/adaptive

But: Still <97% accurate on “yes” for telephone Unexpected rate of speech causes doubling or tripling of error rate Unexpected accent hurts badly Accuracy on unrestricted speech at 60% Don’t know when we know Few advances in basic understanding

Confusion Matrix for Digit Recognition

Large Vocabulary CSR Error Rate % ‘88‘89‘90‘91‘92‘93‘94 Year RM ( 1K words, PP 60) ___ WSJØ, WSJ1 (5K, 20-60K words, PP 100) Ø 1 ~ ~