
1 SPEECH RECOGNITION BASED ON BAYESIAN NETWORKS WITH ENERGY AS AN AUXILIARY VARIABLE
Jaume Escofet Carmona
IDIAP, Martigny, Switzerland / UPC, Barcelona, Spain

2 Contents
– Bayesian Networks
– Automatic Speech Recognition using Dynamic BNs
– Auxiliary variables
– Experiments with energy as an auxiliary variable
– Conclusions

3 What is a Bayesian Network?
A BN is a type of graphical model composed of:
– A directed acyclic graph (DAG)
– A set of variables V = {v_1, …, v_N}
– A set of probability density functions P(v_n | parents(v_n))
Joint distribution of V: P(V) = ∏_{n=1}^{N} P(v_n | parents(v_n))
Example (DAG with v_2 the parent of v_1 and v_3): P(V) = P(v_1, v_2, v_3) = P(v_1 | v_2) · P(v_2) · P(v_3 | v_2)
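
As a concrete illustration of the factorization above, here is a minimal Python sketch (not from the presentation) that evaluates the joint P(v_1, v_2, v_3) for the three-node example; the probability tables are made-up numbers for binary variables.

```python
# Minimal sketch: evaluating the BN factorization
# P(V) = prod_n P(v_n | parents(v_n)) for the three-node example,
# where v2 is the parent of both v1 and v3.  All numbers are illustrative.

P_v2 = {0: 0.4, 1: 0.6}                         # P(v2)
P_v1_given_v2 = {0: {0: 0.7, 1: 0.3},           # P(v1 | v2)
                 1: {0: 0.2, 1: 0.8}}
P_v3_given_v2 = {0: {0: 0.9, 1: 0.1},           # P(v3 | v2)
                 1: {0: 0.5, 1: 0.5}}

def joint(v1, v2, v3):
    """P(v1, v2, v3) = P(v1 | v2) * P(v2) * P(v3 | v2)."""
    return P_v1_given_v2[v2][v1] * P_v2[v2] * P_v3_given_v2[v2][v3]

# Sanity check: the joint must sum to 1 over all assignments.
total = sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1))
print(joint(1, 0, 0), total)   # total -> 1.0
```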

4 Automatic Speech Recognition (ASR)
Feature extraction (LPC, MFCC, ...) turns the signal into the observation sequence X = {x_1, …, x_T}; statistical models (HMM, ANN, ...) score the word models {M_k}, e.g. M_1: 'cat', M_2: 'dog', …, M_K: 'tiger'.
Recognized word: M_j = argmax_k P(M_k | X) = argmax_k P(X | M_k) · P(M_k)
HMM likelihood: P(X | M_k) = ∏_{t=1}^{T} p(x_t | q_t) · p(q_t | q_{t-1})
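
A hedged sketch of the recognition rule above, assuming precomputed per-frame state log-likelihoods and a Viterbi approximation of P(X | M_k); none of this code comes from the presentation, and the model structures are illustrative.

```python
# Minimal sketch: score each word model M_k with a Viterbi approximation of
# P(X | M_k) = prod_t p(x_t | q_t) p(q_t | q_{t-1}), then take the argmax.
import numpy as np

def viterbi_log_likelihood(log_emis, log_trans, log_init):
    """log_emis: (T, Q) frame/state log-likelihoods log p(x_t | q_t = j),
    log_trans: (Q, Q) log p(q_t = j | q_{t-1} = i), log_init: (Q,) initial log-probs."""
    T, Q = log_emis.shape
    delta = log_init + log_emis[0]
    for t in range(1, T):
        delta = np.max(delta[:, None] + log_trans, axis=0) + log_emis[t]
    return delta.max()

def recognize(models, log_emis_per_model, log_priors):
    """Return the index j maximizing log P(X | M_k) + log P(M_k)."""
    scores = [viterbi_log_likelihood(e, m["log_trans"], m["log_init"]) + lp
              for m, e, lp in zip(models, log_emis_per_model, log_priors)]
    return int(np.argmax(scores))

# Toy usage with two 2-state models and random per-frame scores.
rng = np.random.default_rng(0)
models = [{"log_trans": np.log(np.full((2, 2), 0.5)),
           "log_init": np.log(np.full(2, 0.5))} for _ in range(2)]
emis = [np.log(rng.uniform(0.1, 1.0, size=(10, 2))) for _ in range(2)]
print(recognize(models, emis, np.log(np.full(2, 0.5))))
```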

5 ASR with Dynamic Bayesian Networks
[Figure: DBN unrolled over frames t = 1, 2, 3, 4, with a hidden phone variable q_t (values /k/, /a/, /t/) generating the acoustics x_t at each frame.]
Equivalent to a standard HMM.

6 ASR with Dynamic Bayesian Networks
Transition distribution: P(q_t | q_{t-1})
Emission distribution: p(x_t | q_t = k) ~ N_x(μ_k, Σ_k)
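
For illustration, a small sketch (with invented parameters) of how the Gaussian emission term p(x_t | q_t = k) would be evaluated in the log domain during decoding.

```python
# Minimal sketch: log N(x_t; mu_k, Sigma_k) for one state k, full covariance.
import numpy as np

def log_gaussian(x, mu, sigma):
    """Log density of a multivariate Gaussian."""
    d = x.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(sigma)
    maha = diff @ np.linalg.solve(sigma, diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

# Made-up 2-dimensional parameters for one state k.
mu_k = np.array([0.0, 1.0])
sigma_k = np.array([[1.0, 0.2], [0.2, 0.5]])
x_t = np.array([0.3, 0.8])
print(log_gaussian(x_t, mu_k, sigma_k))
```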

7 Auxiliary information (1)
Main advantage of BNs:
– Flexibility in defining dependencies between variables
Energy can damage system performance if it is simply appended to the feature vector. BNs allow us to use it in an alternative way:
– Conditioning the emission distributions upon this auxiliary variable
– Marginalizing it out during recognition

8 Auxiliary information (2)
p(x_t | q_t = k, a_t = z) ~ N_x(μ_k + B_k·z, Σ_k)
The value of a_t affects the value of x_t.
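
A minimal sketch, with invented dimensions and parameters, of the conditioned emission above: the state-dependent regression matrix B_k shifts the Gaussian mean by B_k·z, where z is the observed auxiliary value (e.g. frame energy).

```python
# Minimal sketch: p(x_t | q_t = k, a_t = z) = N(x_t; mu_k + B_k @ z, Sigma_k).
import numpy as np

def log_emission_with_aux(x, z, mu_k, B_k, sigma_k):
    mean = mu_k + B_k @ z          # mean regressed on the auxiliary variable
    diff = x - mean
    _, logdet = np.linalg.slogdet(sigma_k)
    maha = diff @ np.linalg.solve(sigma_k, diff)
    return -0.5 * (x.size * np.log(2 * np.pi) + logdet + maha)

# 2-D acoustics, scalar auxiliary variable, made-up numbers.
mu_k = np.array([0.0, 1.0])
B_k = np.array([[0.5], [0.1]])     # how strongly the auxiliary value shifts each component
sigma_k = np.eye(2)
print(log_emission_with_aux(np.array([0.4, 1.2]), np.array([0.3]), mu_k, B_k, sigma_k))
```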

9 Auxiliary information (3)
p(a_t | q_t = k) ~ N_a(μ_ak, Σ_ak)
p(x_t | q_t = k, a_t = z) ~ N_x(μ_k + B_k·z, Σ_k)
The value of the auxiliary variable can be influenced by the hidden state q_t.

10 Auxiliary information (4)
p(x_t, a_t | q_t = k) ~ N_xa(μ_k^xa, Σ_k^xa)
Equivalent to appending the auxiliary variable to the feature vector.
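
The equivalence claimed above can be checked numerically: the sketch below (illustrative numbers, using scipy) builds the joint Gaussian over the stacked vector (x_t, a_t) implied by p(a_t | q_t) and p(x_t | q_t, a_t), and compares it with the factorized product.

```python
# Minimal sketch: modelling (x_t, a_t) jointly with N_xa(mu_k^xa, Sigma_k^xa)
# equals the factorization p(a_t | q_t = k) * p(x_t | q_t = k, a_t),
# i.e. the same as appending the auxiliary variable to the feature vector.
import numpy as np
from scipy.stats import multivariate_normal as mvn

mu_x, sigma_x = np.array([0.0, 1.0]), np.eye(2)      # p(x | q=k, a) = N(mu_x + B a, sigma_x)
B = np.array([[0.5], [0.1]])
mu_a, sigma_a = np.array([0.2]), np.array([[0.3]])   # p(a | q=k)

# Joint parameters implied by the factorization.
mu_joint = np.concatenate([mu_x + B @ mu_a, mu_a])
cov_xx = sigma_x + B @ sigma_a @ B.T
cov_xa = B @ sigma_a
cov_joint = np.block([[cov_xx, cov_xa], [cov_xa.T, sigma_a]])

x, a = np.array([0.4, 1.2]), np.array([0.5])
lhs = mvn.pdf(np.concatenate([x, a]), mu_joint, cov_joint)
rhs = mvn.pdf(a, mu_a, sigma_a) * mvn.pdf(x, mu_x + B @ a, sigma_x)
print(lhs, rhs)   # the two values agree
```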

11 Hiding auxiliary information
We can also marginalize out (hide) the auxiliary variable during recognition:
p(x_t | q_t) = ∫ p(x_t | q_t, a_t) · p(a_t | q_t) da_t
Useful when:
– It is noisy
– It is not accessible
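
With the Gaussian choices of slide 9, this marginalization integral has a closed form; the sketch below (illustrative numbers, not the authors' code) checks N_x(μ_k + B_k·μ_ak, Σ_k + B_k·Σ_ak·B_k^T) against a numerical integration over a scalar a_t.

```python
# Minimal sketch: closed-form Gaussian marginal vs. numerical integration.
import numpy as np
from scipy.stats import multivariate_normal as mvn, norm

mu_k, sigma_k = np.array([0.0, 1.0]), np.eye(2)
B_k = np.array([[0.5], [0.1]])
mu_ak, var_ak = 0.2, 0.3
x = np.array([0.4, 1.2])

# Closed-form marginal p(x | q=k).
marg_mean = mu_k + B_k.ravel() * mu_ak
marg_cov = sigma_k + var_ak * np.outer(B_k, B_k)
closed = mvn.pdf(x, marg_mean, marg_cov)

# Numerical integration over a_t on a fine grid.
grid = np.linspace(-10, 10, 4001)
integrand = [mvn.pdf(x, mu_k + B_k.ravel() * a, sigma_k) * norm.pdf(a, mu_ak, np.sqrt(var_ak))
             for a in grid]
numeric = np.trapz(integrand, grid)
print(closed, numeric)   # the two values agree closely
```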

12 Experimental setup
– Isolated word recognition
– Small vocabulary (75 words)
– Feature extraction: Mel Frequency Cepstral Coefficients (MFCC)
– p(x_t | q_t) modeled with a mixture of 4 Gaussians
– p(a_t | q_t) modeled with a single Gaussian
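
A possible MFCC front end in Python using librosa; the analysis parameters are not given on the slides, so the frame length, hop and number of coefficients below are illustrative assumptions, as is the file name.

```python
# Minimal sketch (assumed parameters): MFCC extraction for one utterance.
import librosa

def extract_mfcc(wav_path, n_mfcc=13, frame_ms=25, hop_ms=10):
    signal, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(
        y=signal, sr=sr, n_mfcc=n_mfcc,
        n_fft=int(sr * frame_ms / 1000), hop_length=int(sr * hop_ms / 1000))
    return mfcc.T        # shape (T, n_mfcc): one feature vector x_t per frame

# features = extract_mfcc("word.wav")   # hypothetical file name
```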

13 Experiments with Energy as an auxiliary variable
Energy of a frame: E = log Σ_{n=1}^{N} s²[n]·w²[n]

System     WER (Observed Energy)   WER (Hidden Energy)
System 1   6.9 %                   5.3 %
System 2   6.1 %                   5.6 %
System 3   5.8 %                   5.9 %
Baseline   5.9 %

[Figure: DBN topologies of the Baseline and Systems 1, 2 and 3.]
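
A small sketch of the energy computation above, interpreting s[n] as the frame samples and w[n] as the analysis window; the slides do not specify the window or frame length, so a Hamming window and a 25 ms frame at 16 kHz are assumptions.

```python
# Minimal sketch: per-frame log energy E = log sum_n s^2[n] w^2[n].
import numpy as np

def frame_log_energy(frame, window=None):
    """Log energy of one analysis frame s[n] with window w[n] (Hamming by default)."""
    if window is None:
        window = np.hamming(len(frame))
    return np.log(np.sum((frame ** 2) * (window ** 2)) + 1e-12)  # epsilon avoids log(0)

# Example on a synthetic frame (25 ms at 16 kHz = 400 samples).
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)
print(frame_log_energy(frame))
```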

14 Conclusions
BNs are more flexible than HMMs. You can easily:
– Change the topology of the distributions
– Hide variables when necessary
Energy can improve system performance if used in a non-traditional way.

15 Questions?

