Why is ASR Hard? Natural speech is continuous

Presentation transcript:

Why is ASR Hard?
- Natural speech is continuous
- Natural speech has disfluencies
- Natural speech is variable over: global rate, local rate, pronunciation within speaker, pronunciation across speakers, phonemes in different contexts

Why is ASR Hard? (continued)
- Large vocabularies are confusable
- Out-of-vocabulary words are inevitable
- Recorded speech is variable over: room acoustics, channel characteristics, background noise
- Large training times are not practical
- Users expect performance equal to or greater than “human performance”

Main Causes of Speech Variability
- Environment
  - Speech-correlated noise: reverberation, reflection
  - Uncorrelated noise: additive noise (stationary, nonstationary)
- Speaker
  - Attributes of speakers: dialect, gender, age
  - Manner of speaking: breath & lip noise, stress, Lombard effect, rate, level, pitch, cooperativeness
- Input Equipment
  - Microphone (Transmitter)
  - Distance from microphone
  - Filter
  - Transmission system: distortion, noise, echo
  - Recording equipment

ASR Dimensions
- Speaker dependent, independent
- Isolated, continuous, keywords
- Lexicon size and difficulty
- Task constraints, perplexity
- Adverse or easy conditions
- Natural or read speech

Telephone Speech
- Limited bandwidth (F vs S)
- Large speaker variability
- Large noise variability
- Channel distortion
- Different handset microphones
- Mobile and handsfree acoustics

Automatic Speech Recognition
- Data Collection
- Pre-processing
- Feature Extraction
- Hypothesis Generation
- Cost Estimator
- Decoding

Pre-processing
- Speech
- Room Acoustics
- Linear Filtering
- Sampling & Digitization
- Microphone
Issue: Effect on modeling

Feature Extraction
- Spectral Analysis
- Auditory Model/Normalizations
Issue: Design for discrimination

Representations are Important
- Speech waveform -> Network: 23% frame correct
- PLP features -> Network: 70% frame correct

Hypothesis Generation
- Unordered words: cat, dog, a, cat, not, is, a, dog
- Ordered hypothesis: “a dog is not a cat”
Issue: models of language and task

Cost Estimation
- Distances
- Negative log probabilities, from:
  - discrete distributions
  - Gaussians, mixtures
  - neural networks
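
The idea of using a negative log probability as a cost can be sketched as follows. This is a minimal illustration, assuming a single scalar feature and one-dimensional Gaussians; the function names and the toy parameters are hypothetical and do not come from the slides.

```python
import math

def gaussian_neg_log_prob(x, mean, var):
    """Cost of scalar feature x under a 1-D Gaussian: -log N(x; mean, var)."""
    return 0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def mixture_neg_log_prob(x, weights, means, variances):
    """Cost under a Gaussian mixture: -log sum_k w_k * N(x; mean_k, var_k)."""
    # Work in the log domain and use log-sum-exp to avoid underflow.
    log_terms = [
        math.log(w) - gaussian_neg_log_prob(x, m, v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(log_terms)
    return -(peak + math.log(sum(math.exp(t - peak) for t in log_terms)))

# Toy usage: a feature close to the mean is cheaper than one far from it.
near_cost = gaussian_neg_log_prob(0.1, mean=0.0, var=1.0)
far_cost = gaussian_neg_log_prob(3.0, mean=0.0, var=1.0)
```

Because the costs are negative logs, adding costs across frames corresponds to multiplying probabilities, which is why decoders typically accumulate them as path scores.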

Decoding

Pronunciation Models

Language Models
- Most likely words for largest product: P(acoustics | words) × P(words)
- P(words) = ∏ P(words | history)
- bigram: history is previous word
- trigram: history is previous 2 words
- n-gram: history is previous n-1 words
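
The bigram case of this factorization can be sketched in a few lines. This is only an illustration under stated assumptions: the two-sentence training corpus, the `<s>` start symbol, and the add-one smoothing are choices made here, not something the slides specify.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count histories and (history, word) pairs, with <s> marking sentence start."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split()
        for prev, word in zip(words, words[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return history_counts, bigram_counts

def sentence_log_prob(sentence, history_counts, bigram_counts, vocab_size):
    """log P(words) = sum of log P(word | previous word), with add-one smoothing."""
    words = ["<s>"] + sentence.split()
    total = 0.0
    for prev, word in zip(words, words[1:]):
        numerator = bigram_counts[(prev, word)] + 1
        denominator = history_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total

# Toy corpus (an assumption for illustration only).
corpus = ["a dog is not a cat", "a cat is not a dog"]
histories, bigrams = train_bigram(corpus)
vocab_size = len({w for s in corpus for w in s.split()})

ordered = sentence_log_prob("a dog is not a cat", histories, bigrams, vocab_size)
scrambled = sentence_log_prob("cat a not is dog a", histories, bigrams, vocab_size)
```

With this toy corpus the well-ordered sentence scores higher than the scrambled one, which is the role P(words) plays when it is multiplied with the acoustic score.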

System Architecture
- Speech Signal -> Signal Processing -> Cepstrum
- Cepstrum -> Probability Estimator -> Probabilities: “z” = 0.81, “th” = 0.15, “t” = 0.03
- Probabilities -> Decoder (with Grammar and Pronunciation Lexicon) -> Recognized Words: “zero”, “three”, “two”
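
As a rough, single-frame illustration of how the decoder might combine the estimator's probabilities with a lexicon and word priors, consider the sketch below. It reuses the three probabilities from the slide; the one-phone-per-word lexicon, the prior values, and the function name are hypothetical simplifications, and a real decoder searches over whole phone sequences rather than one frame.

```python
import math

# Phone probabilities as shown on the slide.
phone_probs = {"z": 0.81, "th": 0.15, "t": 0.03}

# Hypothetical lexicon: each word reduced to its first phone only.
first_phone = {"zero": "z", "three": "th", "two": "t"}

def rank_words(phone_probs, lexicon, word_priors=None):
    """Rank words by log P(phone) + log P(word); uniform prior if none is given."""
    scores = {}
    for word, phone in lexicon.items():
        prior = 1.0 if word_priors is None else word_priors[word]
        scores[word] = math.log(phone_probs[phone]) + math.log(prior)
    return sorted(scores, key=scores.get, reverse=True)

acoustic_only = rank_words(phone_probs, first_phone)

# A strong (made-up) grammar prior can override the acoustic preference.
priors = {"zero": 0.01, "three": 0.98, "two": 0.01}
with_grammar = rank_words(phone_probs, first_phone, priors)
```

Without priors the ranking follows the acoustic probabilities alone; with the made-up priors the grammar term shifts the top choice, which is the interaction between estimator, lexicon, and grammar that the architecture describes.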