SPEECH RECOGNITION BASED ON BAYESIAN NETWORKS WITH ENERGY AS AN AUXILIARY VARIABLE
Jaume Escofet Carmona
IDIAP, Martigny, Switzerland / UPC, Barcelona, Spain


Contents
– Bayesian Networks
– Automatic Speech Recognition using Dynamic BNs
– Auxiliary variables
– Experiments with energy as an auxiliary variable
– Conclusions

What is a Bayesian Network?
A BN is a type of graphical model composed of:
– A directed acyclic graph (DAG)
– A set of variables V = {v_1, …, v_N}
– A set of probability density functions P(v_n | parents(v_n))
Joint distribution of V: P(V) = Π_{n=1}^{N} P(v_n | parents(v_n))
Example (DAG in which v_2 is the parent of v_1 and v_3):
P(V) = P(v_1, v_2, v_3) = P(v_1 | v_2) · P(v_2) · P(v_3 | v_2)
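To make the factorization concrete, here is a minimal Python sketch (not part of the original slides) that evaluates the joint of the three-node example; the conditional probability tables are hypothetical values chosen only for illustration.

```python
# Minimal sketch of the example BN: P(v1, v2, v3) = P(v1|v2) * P(v2) * P(v3|v2).
# All CPT values below are hypothetical, picked only to illustrate the factorization.
P_v2 = {0: 0.6, 1: 0.4}                                       # P(v2)
P_v1_given_v2 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}    # P(v1 | v2)
P_v3_given_v2 = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.5, 1: 0.5}}    # P(v3 | v2)

def joint(v1, v2, v3):
    """Joint probability obtained from the BN factorization."""
    return P_v1_given_v2[v2][v1] * P_v2[v2] * P_v3_given_v2[v2][v3]

# Sanity check: the joint sums to 1 over all binary assignments.
print(sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)))  # 1.0
```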

Automatic Speech Recognition (ASR)
Feature extraction (LPC, MFCC, ...) turns the signal into a sequence X = {x_1, …, x_T}.
Statistical models {M_k} (HMM, ANN, ...), one per word: M_1: 'cat', M_2: 'dog', …, M_K: 'tiger'.
Recognition selects the most likely model:
M_j = argmax_k P(M_k | X) = argmax_k P(X | M_k) · P(M_k)
With an HMM, the acoustic likelihood factorizes as:
P(X | M_k) = Π_{t=1}^{T} p(x_t | q_t) · p(q_t | q_{t-1})
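A hedged sketch of the decision rule above: score the utterance under each word model and take the argmax of log P(X | M_k) + log P(M_k). The model interface and the dummy scores are assumptions made for this sketch, not the system from the talk; in practice log P(X | M_k) would come from the HMM forward algorithm.

```python
import math

def recognize(X, models, priors):
    """Return M_j = argmax_k P(X | M_k) * P(M_k), computed in log-space.

    `models` maps each word to a function returning log P(X | M_k)
    (e.g. the HMM forward algorithm); `priors` maps each word to P(M_k).
    """
    best_word, best_score = None, -math.inf
    for word, loglik in models.items():
        score = loglik(X) + math.log(priors[word])
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Toy usage: dummy log-likelihood functions stand in for trained word HMMs.
models = {"cat": lambda X: -10.0, "dog": lambda X: -12.5, "tiger": lambda X: -11.0}
priors = {"cat": 1 / 3, "dog": 1 / 3, "tiger": 1 / 3}
print(recognize([[0.1, 0.2]], models, priors))  # -> 'cat'
```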

ASR with Dynamic Bayesian Networks
[Figure: DBN unrolled over time slices t = 1, …, 4; in each slice a hidden phone variable q_t (/k/, /a/, /t/) emits the acoustic observation x_t.]
Equivalent to a standard HMM.

ASR with Dynamic Bayesian Networks
Transition: P(q_t | q_{t-1})
Emission: p(x_t | q_t = k) ~ N_x(μ_k, Σ_k)
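A small sketch evaluating the two per-frame quantities of this slide, the state transition P(q_t | q_{t-1}) and the Gaussian emission p(x_t | q_t = k) = N(μ_k, Σ_k). All parameter values are made up for illustration and the features are 2-dimensional only for brevity.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical parameters for one state k (2-dim features for brevity).
mu_k = np.array([0.5, -1.0])
Sigma_k = np.array([[1.0, 0.2],
                    [0.2, 0.8]])
A = np.array([[0.9, 0.1],        # A[i, j] = P(q_t = j | q_{t-1} = i)
              [0.0, 1.0]])       # left-to-right topology

x_t = np.array([0.3, -0.7])
emission = multivariate_normal.pdf(x_t, mean=mu_k, cov=Sigma_k)   # p(x_t | q_t = k)
transition = A[0, 1]                                              # P(q_t = 1 | q_{t-1} = 0)
print(emission, transition)
```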

Auxiliary information (1)
Main advantage of BNs:
– Flexibility in defining dependencies between variables
Energy can damage system performance if it is simply appended to the feature vector.
BNs allow us to use it in an alternative way:
– Conditioning the emission distributions on this auxiliary variable
– Marginalizing it out during recognition

Auxiliary information (2)
p(x_t | q_t = k, a_t = z) ~ N_x(μ_k + B_k·z, Σ_k)
The value of a_t affects the value of x_t.
[Figure: DAG in which both q_t and a_t are parents of x_t.]
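A sketch of this conditioned emission, with the state mean shifted by a regression B_k·z on the auxiliary value z; the parameters are hypothetical and the auxiliary variable is taken to be scalar (as energy is).

```python
import numpy as np
from scipy.stats import multivariate_normal

def conditioned_emission(x_t, z, mu_k, B_k, Sigma_k):
    """p(x_t | q_t = k, a_t = z): Gaussian with mean shifted by B_k * z."""
    return multivariate_normal.pdf(x_t, mean=mu_k + B_k * z, cov=Sigma_k)

# Hypothetical parameters (scalar auxiliary variable, 2-dim features).
mu_k = np.array([0.5, -1.0])
B_k = np.array([0.4, -0.1])        # regression of the mean on a_t
Sigma_k = np.eye(2)

print(conditioned_emission(np.array([0.6, -1.2]), z=1.5,
                           mu_k=mu_k, B_k=B_k, Sigma_k=Sigma_k))
```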

Auxiliary information (3)
p(a_t | q_t = k) ~ N_a(μ_ak, Σ_ak)
p(x_t | q_t = k, a_t = z) ~ N_x(μ_k + B_k·z, Σ_k)
The value of the auxiliary variable can also be influenced by the hidden state q_t.
[Figure: DAG in which q_t is a parent of both a_t and x_t, and a_t is a parent of x_t.]

Auxiliary information (4)
p(x_t, a_t | q_t = k) ~ N_xa(μ_k^xa, Σ_k^xa)
Equivalent to appending the auxiliary variable to the feature vector.
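This case can be sketched by concatenating the auxiliary value onto the feature vector and evaluating one joint Gaussian over the stacked vector; the parameter values below are placeholders.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Hypothetical joint parameters over the concatenated vector [x_t; a_t].
mu_xa = np.array([0.5, -1.0, 0.2])
Sigma_xa = np.diag([1.0, 0.8, 0.5])

x_t = np.array([0.6, -1.2])
a_t = 0.3
xa = np.concatenate([x_t, [a_t]])  # appending the auxiliary variable to the features
print(multivariate_normal.pdf(xa, mean=mu_xa, cov=Sigma_xa))  # p(x_t, a_t | q_t = k)
```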

Hiding auxiliary information
We can also marginalize out (hide) the auxiliary variable in recognition:
p(x_t | q_t) = ∫ p(x_t | q_t, a_t) · p(a_t | q_t) da_t
Useful when the auxiliary variable:
– Is noisy
– Is not accessible
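For the linear-Gaussian model of the previous slides, this integral has a standard closed form (not written out on the slide): the mean is shifted by B_k·μ_ak and the covariance inflated by B_k Σ_ak B_kᵀ. The sketch below computes it for hypothetical parameters and checks it against numerical integration of the formula above, assuming a scalar auxiliary variable.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm
from scipy.integrate import quad

# Hypothetical linear-Gaussian parameters for one state k (scalar a_t).
mu_k = np.array([0.5, -1.0])
Sigma_k = np.eye(2)
B_k = np.array([0.4, -0.1])
mu_ak, var_ak = 0.2, 0.5

x_t = np.array([0.6, -1.2])

# Closed form of p(x_t | q_t = k) after integrating out a_t.
marg_mean = mu_k + B_k * mu_ak
marg_cov = Sigma_k + var_ak * np.outer(B_k, B_k)
closed = multivariate_normal.pdf(x_t, mean=marg_mean, cov=marg_cov)

# Numerical check of p(x_t | q_t) = integral of p(x_t | q_t, a) * p(a | q_t) da.
integrand = lambda a: (multivariate_normal.pdf(x_t, mean=mu_k + B_k * a, cov=Sigma_k)
                       * norm.pdf(a, loc=mu_ak, scale=np.sqrt(var_ak)))
numeric, _ = quad(integrand, -10, 10)
print(closed, numeric)  # the two values should agree closely
```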

Experimental setup
– Isolated word recognition
– Small vocabulary (75 words)
– Feature extraction: Mel Frequency Cepstral Coefficients (MFCC)
– p(x_t | q_t) modeled with a mixture of 4 Gaussians
– p(a_t | q_t) modeled with a single Gaussian
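As a rough illustration of the emission models listed above (a 4-component Gaussian mixture for p(x_t | q_t) and a single Gaussian for p(a_t | q_t)), here is a sketch that fits such densities with scikit-learn on random placeholder data; the 13-dimensional MFCC size and the data themselves are assumptions, not the original IDIAP setup.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Placeholder data standing in for frames assigned to one HMM state.
mfcc_frames = rng.normal(size=(500, 13))   # 13-dim MFCC vectors (assumed dimensionality)
energy = rng.normal(size=(500, 1))         # auxiliary energy values for the same frames

# p(x_t | q_t): mixture of 4 Gaussians, as in the experimental setup.
gmm_x = GaussianMixture(n_components=4, covariance_type='diag').fit(mfcc_frames)

# p(a_t | q_t): a single Gaussian.
gauss_a = GaussianMixture(n_components=1).fit(energy)

print(gmm_x.score(mfcc_frames[:1]), gauss_a.score(energy[:1]))  # per-frame log-likelihoods
```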

Experiments with energy as an auxiliary variable
Frame log-energy: E = log Σ_{n=1}^{N} s²[n] · w²[n]   (s[n]: signal samples, w[n]: analysis window)
WER results (Observed Energy / Hidden Energy):
System 1: … / 5.3 %
System 2: … / 5.6 %
System 3: … / 5.9 %
Baseline: 5.9 %
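The auxiliary variable itself is the frame log-energy defined on this slide; a minimal sketch of computing it, assuming a Hamming analysis window and a random placeholder signal frame.

```python
import numpy as np

def frame_log_energy(s, w):
    """E = log sum_{n=1}^{N} s^2[n] * w^2[n] for one analysis frame."""
    return np.log(np.sum((s ** 2) * (w ** 2)))

N = 256
s = np.random.default_rng(0).normal(size=N)   # placeholder signal frame
w = np.hamming(N)                             # Hamming analysis window (assumption)
print(frame_log_energy(s, w))
```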

Conclusions
BNs are more flexible than HMMs. You can easily:
– Change the topology of the distributions
– Hide variables when necessary
Energy can improve system performance if used in a non-traditional way.

Questions?