74.419 Artificial Intelligence 2004 Speech & Natural Language Processing Natural Language Processing written text as input sentences (well-formed) Speech.

Slides:



Advertisements
Similar presentations
Building an ASR using HTK CS4706
Advertisements

Automatic Speech Recognition II  Hidden Markov Models  Neural Network.
Introduction The aim the project is to analyse non real time EEG (Electroencephalogram) signal using different mathematical models in Matlab to predict.
Hidden Markov Models Reading: Russell and Norvig, Chapter 15, Sections
Basic Spectrogram Lab 8. Spectrograms §Spectrograph: Produces visible patterns of acoustic energy called spectrograms §Spectrographic Analysis: l Acoustic.
Hidden Markov Models Theory By Johan Walters (SR 2003)
SPEECH RECOGNITION Kunal Shalia and Dima Smirnov.
Natural Language Processing - Speech Processing -
Artificial Intelligence Speech and Natural Language Processing.
Speech and Natural Language Processing Christel Kemke Department of Computer Science University of Manitobe Presentation for Human-Computer Interaction.
 Christel Kemke 2007/08 COMP 4060 Natural Language Processing Introduction And Overview.
Application of HMMs: Speech recognition “Noisy channel” model of speech.
Probabilistic Pronunciation + N-gram Models CSPP Artificial Intelligence February 25, 2004.
4/25/2001ECE566 Philip Felber1 Speech Recognition A report of an Isolated Word experiment. By Philip Felber Illinois Institute of Technology April 25,
The Beatbox Voice-to-Drum Synthesizer A BSTRACT The Beatbox is a real time voice-to-drum synthesizer intended primarily for the entertainment of small.
Learning, Uncertainty, and Information Big Ideas November 8, 2004.
Hidden Markov Models. Hidden Markov Model In some Markov processes, we may not be able to observe the states directly.
CS 188: Artificial Intelligence Fall 2009 Lecture 21: Speech Recognition 11/10/2009 Dan Klein – UC Berkeley TexPoint fonts used in EMF. Read the TexPoint.
1 Hidden Markov Model Instructor : Saeed Shiry  CHAPTER 13 ETHEM ALPAYDIN © The MIT Press, 2004.
COMP 4060 Natural Language Processing Speech Processing.
Why is ASR Hard? Natural speech is continuous
A PRESENTATION BY SHAMALEE DESHPANDE
Natural Language Understanding
Representing Acoustic Information
EE513 Audio Signals and Systems Statistical Pattern Classification Kevin D. Donohue Electrical and Computer Engineering University of Kentucky.
Audio Processing for Ubiquitous Computing Uichin Lee KAIST KSE.
GCT731 Fall 2014 Topics in Music Technology - Music Information Retrieval Overview of MIR Systems Audio and Music Representations (Part 1) 1.
Eng. Shady Yehia El-Mashad
Isolated-Word Speech Recognition Using Hidden Markov Models
Knowledge Base approach for spoken digit recognition Vijetha Periyavaram.
1 7-Speech Recognition (Cont’d) HMM Calculating Approaches Neural Components Three Basic HMM Problems Viterbi Algorithm State Duration Modeling Training.
Speech Signal Processing
Artificial Intelligence 2004 Speech & Natural Language Processing Natural Language Processing written text as input sentences (well-formed) Speech.
Midterm Review Spoken Language Processing Prof. Andrew Rosenberg.
Fourier Concepts ES3 © 2001 KEDMI Scientific Computing. All Rights Reserved. Square wave example: V(t)= 4/  sin(t) + 4/3  sin(3t) + 4/5  sin(5t) +
Speech and Language Processing
7-Speech Recognition Speech Recognition Concepts
1 Speech Perception 3/30/00. 2 Speech Perception How do we perceive speech? –Multifaceted process –Not fully understood –Models & theories attempt to.
International Conference on Intelligent and Advanced Systems 2007 Chee-Ming Ting Sh-Hussain Salleh Tian-Swee Tan A. K. Ariff. Jain-De,Lee.
A brief overview of Speech Recognition and Spoken Language Processing Advanced NLP Guest Lecture August 31 Andrew Rosenberg.
Machine Translation  Machine translation is of one of the earliest uses of AI  Two approaches:  Traditional approach using grammars, rewrite rules,
Introduction to CL & NLP CMSC April 1, 2003.
LML Speech Recognition Speech Recognition Introduction I E.M. Bakker.
Speaker Recognition by Habib ur Rehman Abdul Basit CENTER FOR ADVANCED STUDIES IN ENGINERING Digital Signal Processing ( Term Project )
Overview ► Recall ► What are sound features? ► Feature detection and extraction ► Features in Sphinx III.
Artificial Intelligence 2004 Speech & Natural Language Processing Speech Recognition acoustic signal as input conversion into written words Natural.
Audio processing methods on marine mammal vocalizations Xanadu Halkias Laboratory for the Recognition and Organization of Speech and Audio
Hidden Markov Models: Decoding & Training Natural Language Processing CMSC April 24, 2003.
Speech Recognition with CMU Sphinx Srikar Nadipally Hareesh Lingareddy.
Probabilistic reasoning over time Ch. 15, 17. Probabilistic reasoning over time So far, we’ve mostly dealt with episodic environments –Exceptions: games.
Performance Comparison of Speaker and Emotion Recognition
ARTIFICIAL INTELLIGENCE FOR SPEECH RECOGNITION. Introduction What is Speech Recognition?  also known as automatic speech recognition or computer speech.
CS 188: Artificial Intelligence Spring 2007 Speech Recognition 03/20/2007 Srini Narayanan – ICSI and UC Berkeley.
Statistical Models for Automatic Speech Recognition Lukáš Burget.
Automated Speach Recognotion Automated Speach Recognition By: Amichai Painsky.
Acoustic Phonetics 3/14/00.
Message Source Linguistic Channel Articulatory Channel Acoustic Channel Observable: MessageWordsSounds Features Bayesian formulation for speech recognition:
1 7-Speech Recognition Speech Recognition Concepts Speech Recognition Approaches Recognition Theories Bayse Rule Simple Language Model P(A|W) Network Types.
Probabilistic Pronunciation + N-gram Models CMSC Natural Language Processing April 15, 2003.
CS 224S / LINGUIST 285 Spoken Language Processing
Speech Recognition
ARTIFICIAL NEURAL NETWORKS
Speech Processing Speech Recognition
CS 188: Artificial Intelligence Fall 2008
EE513 Audio Signals and Systems
Digital Systems: Hardware Organization and Design
Speech recognition, machine learning
Speech Recognition: Acoustic Waves
Artificial Intelligence 2004 Speech & Natural Language Processing
Speech recognition, machine learning
Presentation transcript:

Artificial Intelligence 2004 Speech & Natural Language Processing Natural Language Processing written text as input sentences (well-formed) Speech Recognition acoustic signal as input conversion into written words Spoken Language Understanding analysis of spoken language (transcribed speech)

Speech & Natural Language Processing Areas in Natural Language Processing Morphology Grammar & Parsing (syntactic analysis) Semantics Pragamatics Discourse / Dialogue Spoken Language Understanding Areas in Speech Recognition Signal Processing Phonetics Word Recognition

Speech Production & Reception Sound and Hearing change in air pressure  sound wave reception through inner ear membrane / microphone break-up into frequency components: receptors in cochlea / mathematical frequency analysis (e.g. Fast-Fourier Transform FFT)  Frequency Spectrum perception/recognition of phonemes and subsequently words (e.g. Neural Networks, Hidden-Markov Models)

Speech Recognition Phases Speech Recognition acoustic signal as input signal analysis - spectrogram feature extraction phoneme recognition word recognition conversion into written words

Speech Signal composed of different (sinus) waves with different frequencies and amplitudes Frequency - waves/second  like pitch Amplitude - height of wave  like loudness + noise(not sinus wave) Speech Signal composite signal comprising different frequency components

Waveform (fig. 7.20) Time Amplitude/ Pressure "She just had a baby."

Waveform for Vowel ae (fig. 7.21) Time Amplitude/ Pressure Time

Speech Signal Analysis Analog-Digital Conversion of Acoustic Signal Sampling in Time Frames ( “ windows ” )  frequency = 0-crossings per time frame  e.g. 2 crossings/second is 1 Hz (1 wave)  e.g. 10kHz needs sampling rate 20kHz  measure amplitudes of signal in time frame  digitized wave form  separate different frequency components  FFT (Fast Fourier Transform)  spectrogram  other frequency based representations  LPC (linear predictive coding),  Cepstrum

Waveform and Spectrogram (figs. 7.20, 7.23)

Waveform and LPC Spectrum for Vowel ae (figs. 7.21, 7.22) Energy Frequency Formants Amplitude/ Pressure Time

Speech Signal Characteristics From Signal Representation derive, e.g.  formants - dark stripes in spectrum strong frequency components; characterize particular vowels; gender of speaker  pitch – fundamental frequency baseline for higher frequency harmonics like formants; gender characteristic  change in frequency distribution characteristic for e.g. plosives (form of articulation)

Video of glottis and speech signal in lingWAVES (from

Phoneme Recognition Recognition Process based on features extracted from spectral analysis phonological rules statistical properties of language/ pronunciation Recognition Methods Hidden Markov Models Neural Networks Pattern Classification in general

Pronunciation Networks / Word Models as Probabilistic FAs (fig 5.12)

Pronunciation Network for 'about' (fig 5.13)

Word Recognition with Probabilistic FA / Markov Chain (fig 5.14)

Viterbi-Algorithm - Overview (cf. Jurafsky Ch.5) The Viterbi Algorithm finds an optimal sequence of states in continuous Speech Recognition, given an observation sequence of phones and a probabilistic (weighted) FA (state graph). The algorithm returns the path through the automaton which has maximum probability and accepts the observation sequence. a[s,s'] is the transition probability (in the phonetic word model) from current state s to next state s', and b[s',o t ] is the observation likelihood of s' given o t. b[s',o t ] is 1 if the observation symbol matches the state, and 0 otherwise.

function VITERBI(observations of len T, state-graph) returns best-path num-states  NUM-OF-STATES(state-graph) Create a path probability matrix viterbi[num-states+2,T+2] viterbi[0,0]  1.0 for each time step t from 0 to T do for each state s from 0 to num-states do for each transition s ' from s in state-graph new-score  viterbi[s,t] * a[s,s ' ] * b[s',(o t )] if ((viterbi[s ',t+1] = 0) || (new-score > viterbi[s ',t+1])) thenviterbi[s ',t+1]  new-score back-pointer[s ',t+1]  s Backtrace from highest probability state in the final column of viterbi[] and return path Viterbi-Algorithm (fig 5.19) word model observation (speech recognizer)

The Viterbi Algorithm sets up a probability matrix, with one column for each time index t and one row for each state in the state graph.Each column has a cell for each state q i in the single combined automaton for the competing words (in the recognition process). The algorithm first creates N+2 state columns. The first column is an initial pseudo-observation, the second corresponds to the first observation-phone, the third to the second observation and so on. The final column represents again a pseudo-observation. In the first column, the probability of the Start-state is initially set to 1.0; the other probabilities are 0. Then we move to the next state. For every state in column 0, we compute the probability of moving into each state in column 1. The value viterbi[t, j] is computed by taking the maximum over the extensions of all the paths that lead to the current cell. An extension of a path at state i at time t-1 is computed by multiplying the three factors: the previous path probability from the previous cell forward[t-1,i] the transition probability a i,j from previous state i to current state j the observation likelihood b jt that current state j matches observation symbol t. b jt is 1 if the observation symbol matches the state; 0 otherwise. Viterbi-Algorithm Explanation (cf. Jurafsky Ch.5)

Phoneme Recognition: HMM, Neural Networks Phonemes Acoustic / sound wave Filtering, Sampling Spectral Analysis; FFT Frequency Spectrum Features (Phonemes; Context) Grammar or Statistics Phoneme Sequences / Words Grammar or Statistics for likely word sequences Word Sequence / Sentence Speech Recognition Signal Processing / Analysis

Speech Recognizer Architecture (fig. 7.2)

Speech Processing - Important Types and Characteristics single word vs. continuous speech unlimited vs. large vs. small vocabulary speaker-dependent vs. speaker-independent training Speech Recognition vs. Speaker Identification

Additional References Hong, X. & A. Acero & H. Hon: Spoken Language Processing. A Guide to Theory, Algorithms, and System Development. Prentice-Hall, NJ, 2001 Figures taken from: Jurafsky, D. & J. H. Martin, Speech and Language Processing, Prentice-Hall, 2000, Chapters 5 and 7. lingWAVES (from