Speech Recognition: Acoustic Waves

Speech Recognition: Acoustic Waves
Human speech generates a wave, like a loudspeaker moving.
A wave for the words "speech lab" looks like: [waveform figure with the phones s p ee ch l a b marked, and a zoomed view of the "l" to "a" transition]
Graphs from Simon Arnfield's web tutorial on speech, Sheffield: http://www.psyc.leeds.ac.uk/research/cogn/speech/tutorial/

Acoustic Sampling
10 ms frame (ms = millisecond = 1/1000 second)
~25 ms window around each frame [wide band] to allow/smooth signal processing – it lets you see formants
[Figure: successive 25 ms windows spaced 10 ms apart along the waveform.]
Result: acoustic feature vectors a1, a2, a3, … (after transformation, numbers in roughly R^14)
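A minimal sketch (Python/NumPy, not from the original slides) of this framing scheme: one ~25 ms analysis window every 10 ms. The 16 kHz sample rate is an assumption for the example.

```python
import numpy as np

def frame_signal(signal, sample_rate=16000, frame_ms=10, window_ms=25):
    """Slice a 1-D audio signal into overlapping analysis windows:
    one 25 ms window every 10 ms (the values used on the slide)."""
    hop = int(sample_rate * frame_ms / 1000)      # 10 ms hop  -> 160 samples at 16 kHz
    width = int(sample_rate * window_ms / 1000)   # 25 ms window -> 400 samples at 16 kHz
    frames = []
    for start in range(0, len(signal) - width + 1, hop):
        frames.append(signal[start:start + width])
    return np.array(frames)

# Example: one second of (fake) audio yields 98 windows of 400 samples each.
audio = np.random.randn(16000)
windows = frame_signal(audio)
print(windows.shape)  # (98, 400)
```

Each window would then be transformed (e.g., into cepstral coefficients) to produce the roughly 14-dimensional feature vectors mentioned above.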

Spectral Analysis
Frequency gives pitch; amplitude gives volume
Sampling at ~8 kHz for telephone speech, ~16 kHz for a microphone (kHz = 1000 cycles/sec)
Fourier transform of the wave, displayed as a spectrogram:
darkness indicates energy at each frequency
hundreds to thousands of frequency samples
[Spectrogram figure: frequency vs. time for "s p ee ch l a b", with amplitude shown as darkness.]
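A small illustrative sketch (Python/NumPy, assumptions: Hamming-windowed frames, 16 kHz sampling) of computing the per-frame Fourier magnitudes that make up a spectrogram:

```python
import numpy as np

def spectrogram(frames, sample_rate=16000):
    """Magnitude spectrum of each windowed frame (rows = time, columns = frequency bins).
    Displayed with darkness proportional to energy, this is the spectrogram described above."""
    hamming = np.hamming(frames.shape[1])            # taper each frame before the FFT
    spectra = np.abs(np.fft.rfft(frames * hamming))  # one-sided Fourier transform per frame
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sample_rate)
    return spectra, freqs

# Toy input: 98 frames of 400 samples (25 ms at 16 kHz), as produced by the framing sketch above.
frames = np.random.randn(98, 400)
spectra, freqs = spectrogram(frames)
print(spectra.shape, freqs[-1])  # (98, 201) and 8000.0 Hz (the Nyquist frequency at 16 kHz)
```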

The Speech Recognition Problem
The recognition problem as a noisy channel model: we started out with English words; they were encoded as an audio signal, and we now wish to decode.
Find the most likely sequence w of "words" given the sequence of acoustic observation vectors a.
Use Bayes' law to create a generative model and then decode:
ŵ = argmax_w P(w | a) = argmax_w P(a | w) P(w) / P(a) = argmax_w P(a | w) P(w)
Acoustic model: P(a | w)
Language model: P(w)
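A toy decoding sketch in Python, illustrative only: the candidate list and the two scoring functions (standing in for P(a|w) and P(w)) are hypothetical; note that P(a) is dropped because it is the same for every candidate.

```python
import math

def decode(acoustics, candidates, acoustic_model, language_model):
    """Noisy-channel decoding sketch: pick the word sequence w maximizing
    log P(a|w) + log P(w)."""
    best, best_score = None, -math.inf
    for words in candidates:
        score = math.log(acoustic_model(acoustics, words)) + math.log(language_model(words))
        if score > best_score:
            best, best_score = words, score
    return best

# Toy usage with made-up probabilities:
am = lambda a, w: {"speech lab": 0.2, "speech slab": 0.3}[w]
lm = lambda w: {"speech lab": 0.1, "speech slab": 0.001}[w]
print(decode(None, ["speech lab", "speech slab"], am, lm))  # -> "speech lab"
```

Even though "speech slab" scores slightly better acoustically here, the language model makes "speech lab" the overall winner, which is exactly the point of combining the two models.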

Probabilistic Language Models
Assigns probability P(w) to a word sequence w = w1, w2, …, wk
Can't directly compute the probability of a long sequence – one needs to decompose it
The chain rule provides a history-based model:
P(w1, w2, …, wk) = P(w1) P(w2 | w1) P(w3 | w1, w2) … P(wk | w1, …, wk-1)
Cluster histories to reduce the number of parameters, e.g., just based on the last word (1st-order Markov model):
P(w1, w2, …, wk) = P(w1 | <s>) P(w2 | w1) P(w3 | w2) … P(wk | wk-1)
How do we estimate these probabilities? We count in corpora, and we smooth.
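A minimal sketch (Python, not from the slides) of the "count in corpora" step for the 1st-order Markov (bigram) model; smoothing is deliberately left out here:

```python
from collections import Counter

def bigram_probs(corpus_sentences):
    """Maximum-likelihood bigram estimates from counts, with <s> as the sentence start:
    P(w2 | w1) = count(w1, w2) / count(w1).  (No smoothing yet.)"""
    unigram, bigram = Counter(), Counter()
    for sent in corpus_sentences:
        words = ["<s>"] + sent.split()
        unigram.update(words)
        bigram.update(zip(words, words[1:]))
    return {(w1, w2): c / unigram[w1] for (w1, w2), c in bigram.items()}

probs = bigram_probs(["the speech lab", "the lab"])
print(probs[("the", "speech")])  # 0.5: "the" is followed by "speech" in 1 of 2 cases
```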

N-gram Language Modeling
The n-gram assumption clusters histories based on the last n-1 words:
P(wj | w1, …, wj-1) ≈ P(wj | wj-n+1, …, wj-2, wj-1)
unigrams: ≈ P(wj)
bigrams: ≈ P(wj | wj-1)
trigrams: ≈ P(wj | wj-2, wj-1)
Trigrams are often interpolated with bigram and unigram estimates:
P(wj | wj-2, wj-1) = λ1 F(wj | wj-2, wj-1) + λ2 F(wj | wj-1) + λ3 F(wj)
The λi are typically estimated by maximum likelihood estimation on held-out data (the F(·|·) are relative frequencies).
Many other interpolations exist (another standard is a non-linear backoff).
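A short sketch (Python) of the linear interpolation above. The lambda weights and the relative-frequency tables are illustrative placeholders; per the slide, the weights would be tuned on held-out data rather than fixed by hand.

```python
def interpolated_trigram(w, u, v, tri_freq, bi_freq, uni_freq, lambdas=(0.6, 0.3, 0.1)):
    """Linearly interpolated trigram estimate:
    P(w | u, v) ~= l1*F(w | u, v) + l2*F(w | v) + l3*F(w)."""
    l1, l2, l3 = lambdas
    return (l1 * tri_freq.get((u, v, w), 0.0)
            + l2 * bi_freq.get((v, w), 0.0)
            + l3 * uni_freq.get(w, 0.0))

# Toy relative-frequency tables:
tri = {("the", "speech", "lab"): 1.0}
bi = {("speech", "lab"): 0.5}
uni = {"lab": 0.1}
print(interpolated_trigram("lab", "the", "speech", tri, bi, uni))  # 0.6 + 0.15 + 0.01 = 0.76
```

Because the lower-order terms never all vanish at once, the interpolated estimate stays nonzero even for trigrams unseen in training, which is the practical motivation for smoothing.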