
1 SPEECH RECOGNITION BASED ON BAYESIAN NETWORKS WITH ENERGY AS AN AUXILIARY VARIABLE
Jaume Escofet Carmona
IDIAP, Martigny, Switzerland / UPC, Barcelona, Spain

2 Contents
– Bayesian Networks
– Automatic Speech Recognition using Dynamic BNs
– Auxiliary variables
– Experiments with energy as an auxiliary variable
– Conclusions

3 What is a Bayesian Network?
A BN is a type of graphical model composed of:
– A directed acyclic graph (DAG)
– A set of variables V = {v_1, …, v_N}
– A set of probability density functions P(v_n | parents(v_n))
Joint distribution of V: P(V) = ∏_{n=1}^{N} P(v_n | parents(v_n))
Example (DAG with v_2 the parent of v_1 and v_3): P(V) = P(v_1, v_2, v_3) = P(v_1 | v_2) · P(v_2) · P(v_3 | v_2)
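
As a concrete illustration of the factorization above, here is a minimal Python sketch (not from the presentation) that evaluates the joint P(v_1, v_2, v_3) for the three-node example; the probability tables are made-up numbers for binary variables.

```python
# Minimal sketch: evaluating the BN factorization
# P(V) = prod_n P(v_n | parents(v_n)) for the three-node example,
# where v2 is the parent of both v1 and v3.  All numbers are illustrative.

P_v2 = {0: 0.4, 1: 0.6}                         # P(v2)
P_v1_given_v2 = {0: {0: 0.7, 1: 0.3},           # P(v1 | v2)
                 1: {0: 0.2, 1: 0.8}}
P_v3_given_v2 = {0: {0: 0.9, 1: 0.1},           # P(v3 | v2)
                 1: {0: 0.5, 1: 0.5}}

def joint(v1, v2, v3):
    """P(v1, v2, v3) = P(v1 | v2) * P(v2) * P(v3 | v2)."""
    return P_v1_given_v2[v2][v1] * P_v2[v2] * P_v3_given_v2[v2][v3]

# Sanity check: the joint must sum to 1 over all assignments.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(joint(1, 0, 0), total)   # total -> 1.0
```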

4 Automatic Speech Recognition (ASR)
Feature extraction (LPC, MFCC, ...) turns the signal into the observation sequence X = {x_1, …, x_T}; statistical models (HMM, ANN, ...) score the word models {M_k}, e.g. M_1: 'cat', M_2: 'dog', …, M_K: 'tiger'.
Recognized word: M_j = argmax_k P(M_k | X) = argmax_k P(X | M_k) · P(M_k)
HMM likelihood: P(X | M_k) = ∏_{t=1}^{T} p(x_t | q_t) · p(q_t | q_{t-1})
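
A hedged sketch of the recognition rule above, assuming precomputed per-frame state log-likelihoods and a Viterbi approximation of P(X | M_k); none of this code comes from the presentation, and the model structures are illustrative.

```python
# Minimal sketch: score each word model M_k with a Viterbi approximation of
# P(X | M_k) = prod_t p(x_t | q_t) p(q_t | q_{t-1}), then take the argmax.
import numpy as np

def viterbi_log_likelihood(log_emis, log_trans, log_init):
    """log_emis: (T, Q) frame/state log-likelihoods log p(x_t | q_t = j),
    log_trans: (Q, Q) log p(q_t = j | q_{t-1} = i), log_init: (Q,) initial log-probs."""
    T, Q = log_emis.shape
    delta = log_init + log_emis[0]
    for t in range(1, T):
        delta = np.max(delta[:, None] + log_trans, axis=0) + log_emis[t]
    return delta.max()

def recognize(models, log_emis_per_model, log_priors):
    """Return the index j maximizing log P(X | M_k) + log P(M_k)."""
    scores = [viterbi_log_likelihood(e, m["log_trans"], m["log_init"]) + lp
              for m, e, lp in zip(models, log_emis_per_model, log_priors)]
    return int(np.argmax(scores))

# Toy usage with two 2-state models and random per-frame scores.
rng = np.random.default_rng(0)
models = [{"log_trans": np.log(np.full((2, 2), 0.5)),
           "log_init": np.log(np.full(2, 0.5))} for _ in range(2)]
emis = [np.log(rng.uniform(0.1, 1.0, size=(10, 2))) for _ in range(2)]
print(recognize(models, emis, np.log(np.full(2, 0.5))))
```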

5 ASR with Dynamic Bayesian Networks
[Figure: DBN unrolled over frames t = 1, 2, 3, 4, with a hidden phone variable q_t (values /k/, /a/, /t/) generating the acoustics x_t at each frame.]
Equivalent to a standard HMM.

6 ASR with Dynamic Bayesian Networks
Transition distribution: P(q_t | q_{t-1})
Emission distribution: p(x_t | q_t = k) ~ N_x(μ_k, Σ_k)
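
For illustration, a small sketch (with invented parameters) of how the Gaussian emission term p(x_t | q_t = k) would be evaluated in the log domain during decoding.

```python
# Minimal sketch: log N(x_t; mu_k, Sigma_k) for one state k, full covariance.
import numpy as np

def log_gaussian(x, mu, sigma):
    """Log density of a multivariate Gaussian."""
    d = x.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

# Made-up 2-dimensional parameters for one state k.
mu_k = np.array([0.0, 1.0])
sigma_k = np.array([[1.0, 0.2], [0.2, 0.5]])
x_t = np.array([0.3, 0.8])
print(log_gaussian(x_t, mu_k, sigma_k))
```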

7 Auxiliary information (1)
Main advantage of BNs:
– Flexibility in defining dependencies between variables
Energy can damage system performance if it is simply appended to the feature vector. BNs allow us to use it in an alternative way:
– Conditioning the emission distributions upon this auxiliary variable
– Marginalizing it out during recognition

8 Auxiliary information (2)
p(x_t | q_t = k, a_t = z) ~ N_x(μ_k + B_k·z, Σ_k)
The value of a_t affects the value of x_t.
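
A minimal sketch, with invented dimensions and parameters, of the conditioned emission above: the state-dependent regression matrix B_k shifts the Gaussian mean by B_k·z, where z is the observed auxiliary value (e.g. frame energy).

```python
# Minimal sketch: p(x_t | q_t = k, a_t = z) = N(x_t; mu_k + B_k @ z, Sigma_k).
import numpy as np

def log_emission_with_aux(x, z, mu_k, B_k, sigma_k):
    mean = mu_k + B_k @ z          # mean regressed on the auxiliary variable
    diff = x - mean
    _, logdet = np.linalg.slogdet(sigma_k)
    maha = diff @ np.linalg.solve(sigma_k, diff)
    return -0.5 * (x.size * np.log(2 * np.pi) + logdet + maha)

# 2-D acoustics, scalar auxiliary variable, made-up numbers.
mu_k = np.array([0.0, 1.0])
B_k = np.array([[0.5], [0.1]])     # how strongly the auxiliary value shifts each component
sigma_k = np.eye(2)
print(log_emission_with_aux(np.array([0.4, 1.2]), np.array([0.3]), mu_k, B_k, sigma_k))
```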

9 Auxiliary information (3)
p(a_t | q_t = k) ~ N_a(μ_ak, Σ_ak)
p(x_t | q_t = k, a_t = z) ~ N_x(μ_k + B_k·z, Σ_k)
The value of the auxiliary variable can be influenced by the hidden state q_t.

10 Auxiliary information (4)
p(x_t, a_t | q_t = k) ~ N_xa(μ_k^xa, Σ_k^xa)
Equivalent to appending the auxiliary variable to the feature vector.
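
The equivalence claimed above can be checked numerically: the sketch below (illustrative numbers, using scipy) builds the joint Gaussian over the stacked vector (x_t, a_t) implied by p(a_t | q_t) and p(x_t | q_t, a_t), and compares it with the factorized product.

```python
# Minimal sketch: modelling (x_t, a_t) jointly with N_xa(mu_k^xa, Sigma_k^xa)
# equals the factorization p(a_t | q_t = k) * p(x_t | q_t = k, a_t),
# i.e. the same as appending the auxiliary variable to the feature vector.
import numpy as np
from scipy.stats import multivariate_normal as mvn

mu_x, sigma_x = np.array([0.0, 1.0]), np.eye(2)      # p(x | q=k, a) = N(mu_x + B a, sigma_x)
B = np.array([[0.5], [0.1]])
mu_a, sigma_a = np.array([0.2]), np.array([[0.3]])   # p(a | q=k)

# Joint parameters implied by the factorization.
mu_joint = np.concatenate([mu_x + B @ mu_a, mu_a])
cov_xx = sigma_x + B @ sigma_a @ B.T
cov_xa = B @ sigma_a
cov_joint = np.block([[cov_xx, cov_xa], [cov_xa.T, sigma_a]])

x, a = np.array([0.4, 1.2]), np.array([0.5])
lhs = mvn.pdf(np.concatenate([x, a]), mu_joint, cov_joint)
rhs = mvn.pdf(a, mu_a, sigma_a) * mvn.pdf(x, mu_x + B @ a, sigma_x)
print(lhs, rhs)   # the two values agree
```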

11 Hiding auxiliary information
We can also marginalize out (hide) the auxiliary variable during recognition:
p(x_t | q_t) = ∫ p(x_t | q_t, a_t) · p(a_t | q_t) da_t
Useful when:
– It is noisy
– It is not accessible
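
With the Gaussian choices of slide 9, this marginalization integral has a closed form; the sketch below (illustrative numbers, not the authors' code) checks N_x(μ_k + B_k·μ_ak, Σ_k + B_k·Σ_ak·B_k^T) against a numerical integration over a scalar a_t.

```python
# Minimal sketch: closed-form Gaussian marginal vs. numerical integration.
import numpy as np
from scipy.stats import multivariate_normal as mvn, norm

mu_k, sigma_k = np.array([0.0, 1.0]), np.eye(2)
B_k = np.array([[0.5], [0.1]])
mu_ak, var_ak = 0.2, 0.3
x = np.array([0.4, 1.2])

# Closed-form marginal p(x | q=k).
marg_mean = mu_k + B_k.ravel() * mu_ak
marg_cov = sigma_k + var_ak * np.outer(B_k, B_k)
closed = mvn.pdf(x, marg_mean, marg_cov)

# Numerical integration over a_t on a fine grid.
grid = np.linspace(-10, 10, 4001)
integrand = [mvn.pdf(x, mu_k + B_k.ravel() * a, sigma_k) * norm.pdf(a, mu_ak, np.sqrt(var_ak))
             for a in grid]
numeric = np.trapz(integrand, grid)
print(closed, numeric)   # the two values agree closely
```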

12 Experimental setup
– Isolated word recognition
– Small vocabulary (75 words)
– Feature extraction: Mel Frequency Cepstral Coefficients (MFCC)
– p(x_t | q_t) modeled with a mixture of 4 Gaussians
– p(a_t | q_t) modeled with a single Gaussian
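
A possible MFCC front end in Python using librosa; the analysis parameters are not given on the slides, so the frame length, hop and number of coefficients below are illustrative assumptions, as is the file name.

```python
# Minimal sketch (assumed parameters): MFCC extraction for one utterance.
import librosa

def extract_mfcc(wav_path, n_mfcc=13, frame_ms=25, hop_ms=10):
    signal, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(
        y=signal, sr=sr, n_mfcc=n_mfcc,
        n_fft=int(sr * frame_ms / 1000), hop_length=int(sr * hop_ms / 1000))
    return mfcc.T        # shape (T, n_mfcc): one feature vector x_t per frame

# features = extract_mfcc("word.wav")   # hypothetical file name
```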

13 Experiments with Energy as an auxiliary variable
Energy of a frame: E = log Σ_{n=1}^{N} s²[n]·w²[n]

System     WER (Observed Energy)   WER (Hidden Energy)
System 1   6.9 %                   5.3 %
System 2   6.1 %                   5.6 %
System 3   5.8 %                   5.9 %
Baseline   5.9 %

[Figure: DBN topologies of the Baseline and Systems 1, 2 and 3.]
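
A small sketch of the energy computation above, interpreting s[n] as the frame samples and w[n] as the analysis window; the slides do not specify the window or frame length, so a Hamming window and a 25 ms frame at 16 kHz are assumptions.

```python
# Minimal sketch: per-frame log energy E = log sum_n s^2[n] w^2[n].
import numpy as np

def frame_log_energy(frame, window=None):
    """Log energy of one analysis frame s[n] with window w[n] (Hamming by default)."""
    if window is None:
        window = np.hamming(len(frame))
    return np.log(np.sum((frame ** 2) * (window ** 2)) + 1e-12)  # epsilon avoids log(0)

# Example on a synthetic frame (25 ms at 16 kHz = 400 samples).
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)
print(frame_log_energy(frame))
```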

14 Conclusions
BNs are more flexible than HMMs. You can easily:
– Change the topology of the distributions
– Hide variables when necessary
Energy can improve system performance if used in a non-traditional way.

15 Questions?

