REVISED CONTEXTUAL LRT FOR VOICE ACTIVITY DETECTION Javier Ramírez, José C. Segura and J.M. Górriz, Dept. of Signal Theory, Networking and Communications.


1 REVISED CONTEXTUAL LRT FOR VOICE ACTIVITY DETECTION
Javier Ramírez, José C. Segura and J.M. Górriz
Dept. of Signal Theory, Networking and Communications, University of Granada (Spain)
Presenter: Chen, Hung-Bin
ICASSP 2007

2 Outline
– Introduction
– Multiple Observation Likelihood Ratio Test
– Analysis of the Proposed Algorithm (IEEE Signal Processing 2005)
– Revised MO-LRT (ICASSP 2007)
– Experimental Results

3 Introduction
– The paper presents a revised contextual likelihood ratio test (LRT) defined over a multiple-observation window.
– The new approach evaluates not only the two hypotheses in which all the observations are speech or all are non-speech, but also all the possible hypotheses defined over the individual observations.
– The proposed method discriminates speech from non-speech over a wide range of SNR conditions.

4 Likelihood Ratio Test
Two-hypothesis test:
– Given an observation vector to be classified, the problem reduces to selecting the class (G_0 or G_1) with the largest posterior probability.
– The likelihood ratio test (LRT) compares the ratio of the class-conditional likelihoods of the observation vector with the ratio of the a priori class probabilities.
– The observation vector is classified as G_1 if the likelihood ratio is greater than the ratio between the a priori class probabilities; otherwise it is classified as G_0.
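
The equation on this slide did not survive in the transcript. A reconstruction consistent with the wording above is sketched below; the symbols \mathbf{y} (observation vector), p(\cdot \mid G_i) (class-conditional density) and P(G_i) (prior) are notation assumed here, not taken from the slide.

```latex
L(\mathbf{y}) = \frac{p(\mathbf{y} \mid G_1)}{p(\mathbf{y} \mid G_0)}
\;\underset{G_0}{\overset{G_1}{\gtrless}}\;
\frac{P(G_0)}{P(G_1)}
```

This is the usual result of comparing the posteriors P(G_1 \mid \mathbf{y}) and P(G_0 \mid \mathbf{y}) and applying Bayes' rule to each.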

5 Multiple Observation Likelihood Ratio Test
To improve on the single-observation test, an LRT based on a Gaussian model is used to detect the presence of speech in a noisy signal:
– The multiple observation LRT (MO-LRT) considers not just a single observation vector measured at frame t, but also an N-frame neighborhood.
– The test evaluates an N-th order LRT that incorporates contextual information into the decision rule and exhibits significant improvements in speech/pause discrimination over the original LRT.
– The MO-LRT is the ratio of the joint likelihoods of all the observation vectors in the neighborhood under the speech and non-speech hypotheses.
IEEE Signal Processing 2005

6 Multiple Observation Likelihood Ratio Test
– The N-th order LRT can be evaluated in a computationally efficient way when the individual measurements are independent.
– In this case, the joint likelihood ratio factorizes into the product of the likelihood ratios of the individual observations.
– Taking logarithms gives an equivalent log-LRT: the sum of the individual log-likelihood ratios over the window.
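
A minimal sketch of the log MO-LRT under the independence assumption stated above. It assumes the per-frame log-likelihood ratios are already available; the function names, the window half-width `m`, the threshold `eta`, and the handling of edge frames (out-of-range frames are skipped) are illustrative choices, not details from the slides.

```python
def log_mo_lrt(frame_log_lrs, center, m):
    """Sum the per-frame log-likelihood ratios over [center - m, center + m].

    Frames that fall outside the signal are skipped; this edge handling
    is an assumption of this sketch.
    """
    lo = max(0, center - m)
    hi = min(len(frame_log_lrs), center + m + 1)
    return sum(frame_log_lrs[lo:hi])


def vad_decisions(frame_log_lrs, m, eta):
    """Mark a frame as speech when its windowed log-LRT exceeds eta."""
    return [log_mo_lrt(frame_log_lrs, t, m) > eta
            for t in range(len(frame_log_lrs))]
```

With independent frames, the product of likelihood ratios becomes a plain sum of logarithms, so widening the window only adds terms to the sum.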

7 Multiple Observation Likelihood Ratio Test
– The use of the MO-LRT for voice activity detection is mainly motivated by the improved robustness against the acoustic noise present in the environment that the multiple observation vector provides.
– The MO-LRT is defined over the observation vectors in the neighborhood of the current frame.
– The decision rule compares the MO-LRT with a decision threshold.

8 Multiple Observation Likelihood Ratio Test
– To evaluate the proposed MO-LRT VAD on an incoming signal, an adequate statistical model is needed for the feature vectors in the presence and absence of speech.
– The selected model is similar to one that assumes the discrete Fourier transform (DFT) coefficients of the clean speech and the noise to be asymptotically independent Gaussian random variables.
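
As an illustration of what such a Gaussian model implies, the sketch below computes a per-frame log-likelihood ratio assuming each DFT coefficient is zero-mean complex Gaussian with variance `noise_var` under non-speech and `noise_var + speech_var` under speech. The function and parameter names are hypothetical, and the speech variance is taken as given; the slides do not say how it is estimated.

```python
import math


def frame_log_lr(power_spectrum, noise_var, speech_var):
    """Log-likelihood ratio of one frame under a complex Gaussian model.

    power_spectrum[k] is |Y_k|^2 for DFT bin k. Bins are treated as
    independent, so the per-bin log ratios are summed.
    """
    total = 0.0
    for y2, lam_n, lam_s in zip(power_spectrum, noise_var, speech_var):
        xi = lam_s / lam_n      # ratio of speech to noise variance
        gamma = y2 / lam_n      # observed power relative to noise variance
        # ln[p(Y|speech) / p(Y|non-speech)] for a zero-mean complex Gaussian
        total += gamma * xi / (1.0 + xi) - math.log1p(xi)
    return total
```

A bin whose observed power is well above the noise variance contributes a positive term, while a bin at or below the noise level contributes a negative one.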

9 Analysis of the Proposed Algorithm
For this experiment:
– The AURORA subset of the Spanish SpeechDat-Car (SDC) database was used.
– This database consists of recordings from distant and close-talking microphones in car environments under different driving conditions.
– The most unfavorable scenario (i.e., distant microphone at high speed over a good road) with a 5 dB average SNR was considered.
IEEE Signal Processing 2005

10 Analysis of the Proposed Algorithm
– Fig. 1(a) shows the distributions of the LRT during the presence and absence of speech for increasing values of m.
Fig. 1(a): Speech/non-speech distributions of the multiple observation likelihood ratio for different numbers of observations m.

11 Analysis of the Proposed Algorithm
– Fig. 1(b) shows the speech, non-speech, and global detection errors as a function of m.
– The speech detection error is clearly reduced as m increases.
– The global error reaches its minimum at m = 6 frames.
Fig. 1(b): Error probabilities as a function of m.

12 Analysis of the Proposed Algorithm
– Fig. 2 shows the non-speech hit rate (HR0) versus the false alarm rate (FAR0 = 1 - HR1, where HR1 denotes the speech hit rate) for recordings from the distant microphone at an average SNR of about 5 dB.
– Increasing the number of observation vectors in the MO-LRT clearly improves the performance of the proposed VAD.
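
The quantities on the ROC axes can be computed from frame-level labels as sketched below. The function name and the 0/1 label encoding (0 for non-speech, 1 for speech) are assumptions of this example.

```python
def hit_rates(reference, decisions):
    """Return (HR0, HR1, FAR0) from per-frame reference labels and decisions.

    HR0: fraction of non-speech frames detected as non-speech.
    HR1: fraction of speech frames detected as speech.
    FAR0 = 1 - HR1: fraction of speech frames detected as non-speech.
    """
    n0 = sum(1 for r in reference if r == 0)
    n1 = sum(1 for r in reference if r == 1)
    if n0 == 0 or n1 == 0:
        raise ValueError("reference must contain both speech and non-speech frames")
    hit0 = sum(1 for r, d in zip(reference, decisions) if r == 0 and d == 0)
    hit1 = sum(1 for r, d in zip(reference, decisions) if r == 1 and d == 1)
    hr0 = hit0 / n0
    hr1 = hit1 / n1
    return hr0, hr1, 1.0 - hr1
```

Sweeping the decision threshold and recording (FAR0, HR0) for each value traces out one ROC curve.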

13 Revised MO-LRT
– The MO-LRT evaluates the probability that “all” the observations in the N-frame neighborhood of the central frame are non-speech, or that all are speech.
– This is the reason for revising the method: the revised test evaluates not just the two previous hypotheses G_0 and G_1.
– The decision is made in favor of one of the two hypotheses, G_0 or G_1.
ICASSP 2007

14 Revised MO-LRT
– The multiple observation vector is reindexed over the 2N+1 observations for convenience of presentation.
– Each hypothesis H_m can be defined in terms of a binary integer representation of m, with one bit per observation indicating speech or non-speech.
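
A small sketch of the binary representation described on this slide. The bit ordering (b_1 as the least significant bit of m) and the function names are assumptions made for illustration; the slide's own definition is not preserved in the transcript.

```python
def hypothesis_bits(m, n_frames):
    """Bits (b_1, ..., b_n) of the integer m, with b_1 the least significant.

    b_k = 1 is read as 'observation k is speech', b_k = 0 as 'non-speech'.
    """
    return [(m >> k) & 1 for k in range(n_frames)]


def all_hypotheses(N):
    """All 2**(2N+1) labelings of a window of 2N+1 observations."""
    n = 2 * N + 1
    return [hypothesis_bits(m, n) for m in range(2 ** n)]
```

For N = 1 this gives 8 hypotheses; the all-zeros and all-ones patterns are the two hypotheses that the original MO-LRT compares.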

15 Revised MO-LRT
– Each hypothesis H_m thus consists of 2N+1 individual hypotheses, one for each of the 2N+1 observations.
– The classification problem is then reformulated as assigning the current frame to speech (G_1) or non-speech (G_0) depending on the central bit b_{N+1}.
– The set of all the possible hypotheses is split into two subsets, M_1 and M_0, according to the value of the central frame bit b_{N+1}.
– The posterior probabilities of G_1 and G_0 are defined as the sums of the posterior probabilities of the hypotheses in M_1 and M_0, respectively.

16 Revised MO-LRT
– Using Bayes' rule, a revised LRT can be defined as the ratio of the summed probabilities of the hypotheses in M_1 and in M_0.
– As an approximation to the statistical test, each summation is replaced by the maximum value of the probability of the hypotheses in M_1 and M_0.
– Taking logarithms, the test is expressed in a more compact form.
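
The equations on this slide are missing from the transcript. The chain below is a reconstruction from the wording alone; the symbols \hat{\mathbf{y}} (multiple observation vector), \Lambda, and P(H_m) (prior of hypothesis H_m) are notation assumed here, and the authors' exact form may differ.

```latex
\begin{align}
\Lambda(\hat{\mathbf{y}})
  &= \frac{P(G_1 \mid \hat{\mathbf{y}})}{P(G_0 \mid \hat{\mathbf{y}})}
   = \frac{\sum_{H_m \in M_1} p(\hat{\mathbf{y}} \mid H_m)\, P(H_m)}
          {\sum_{H_m \in M_0} p(\hat{\mathbf{y}} \mid H_m)\, P(H_m)} \\
\Lambda(\hat{\mathbf{y}})
  &\approx \frac{\max_{H_m \in M_1} p(\hat{\mathbf{y}} \mid H_m)\, P(H_m)}
                {\max_{H_m \in M_0} p(\hat{\mathbf{y}} \mid H_m)\, P(H_m)} \\
\ln \Lambda(\hat{\mathbf{y}})
  &\approx \max_{H_m \in M_1} \ln\!\big[ p(\hat{\mathbf{y}} \mid H_m)\, P(H_m) \big]
         - \max_{H_m \in M_0} \ln\!\big[ p(\hat{\mathbf{y}} \mid H_m)\, P(H_m) \big]
\end{align}
```

The evidence p(\hat{\mathbf{y}}) cancels in the first ratio, which is why only likelihoods and priors remain. Deciding G_1 when \ln \Lambda > 0 corresponds to picking the larger posterior; a threshold other than zero trades off the two error types.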

17 Revised MO-LRT
– The set of possible hypotheses is restricted to those with a speech-to-non-speech or non-speech-to-speech transition in the N-frame neighborhood.
– The restricted hypotheses are represented by a Hankel matrix K.

18 Revised MO-LRT
– The matrix K can be split into two submatrices, K_0 and K_1, and the test then reduces to a comparison of the quantities L_1 and L_0 computed from them.

19 Revised MO-LRT
– As an example, the matrices K, K_0 and K_1 are written out for N = 1, and L_1 and L_0 are computed from them.
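
Since the matrices themselves are not in the transcript, the sketch below shows one construction consistent with the description: K is assumed to list, one per row, the labelings of the 2N+1 frames with at most one speech/non-speech transition, arranged so that K is a Hankel matrix. It also assumes independent frames and equal hypothesis priors. All function names are hypothetical, and the authors' K may differ in row order or content.

```python
def hankel_hypotheses(N):
    """Assumed K: 2n x n Hankel matrix (n = 2N+1) with K[i][j] = h[i + j]."""
    n = 2 * N + 1
    h = [0] * n + [1] * n + [0] * (n - 1)   # generating sequence
    return [[h[i + j] for j in range(n)] for i in range(2 * n)]


def split_K(K, N):
    """Split the rows of K by the central bit (index N, zero-based)."""
    K0 = [row for row in K if row[N] == 0]
    K1 = [row for row in K if row[N] == 1]
    return K0, K1


def revised_log_lrt(log_p0, log_p1, N):
    """L1 - L0, where each is the best hypothesis score in K1 / K0.

    log_p0[k] and log_p1[k] are the log-likelihoods of frame k under
    non-speech and speech. Equal priors are assumed, so they drop out.
    """
    K0, K1 = split_K(hankel_hypotheses(N), N)

    def score(row):
        return sum(log_p1[k] if b else log_p0[k] for k, b in enumerate(row))

    L1 = max(score(row) for row in K1)
    L0 = max(score(row) for row in K0)
    return L1 - L0
```

For N = 1 this construction gives the rows 000, 001, 011, 111, 110, 100, so K_0 holds 000, 001, 100 and K_1 holds 011, 111, 110. The central frame is then labeled speech when `revised_log_lrt` exceeds the chosen threshold.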

20 Revised MO-LRT
Finally:
– The algorithm for voice activity detection is based on the comparison of a likelihood ratio with a given threshold.

21 Experimental Results
The AURORA subset of the Spanish SpeechDat-Car (SDC) database was used:
– Fig. 1 shows an utterance of the database in clean conditions (25 dB SNR).
– Fig. 2 shows an utterance under the noisiest conditions (5 dB SNR).
ICASSP 2007

22 Experimental Results
ROC curves in quiet noise conditions (stopped car with the engine running) and close-talking microphone.

23 Experimental Results
ROC curves in high noise conditions (high speed over a good road) and distant-talking microphone.

24 Conclusion
– The new approach evaluates not only the two hypotheses in which all the observations are speech or all are non-speech, but also all the possible hypotheses defined over the individual observations.
– A Hankel matrix was introduced into the revised statistical test, which smooths the decision variable and reduces its variance.


26 Revised MO-LRT
– The algorithm is adaptive and suitable for non-stationary noise environments, since the statistical properties of the noise are updated whenever a frame is classified as non-speech.
– In this way, the variance of the noise is re-estimated on each frame classified as non-speech.
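
The update formula on this slide is not preserved in the transcript. The sketch below uses first-order recursive smoothing, a common choice for this kind of noise tracking, purely as a stand-in; the smoothing factor `alpha` and the function name are hypothetical, and the authors' rule may differ.

```python
def update_noise_variance(noise_var, power_spectrum, is_speech, alpha=0.95):
    """Return the per-bin noise variance after observing one frame.

    The estimate is frozen during speech and moved toward the observed
    power |Y_k|^2 when the frame was classified as non-speech.
    Recursive smoothing with factor alpha is an assumed update rule.
    """
    if is_speech:
        return list(noise_var)
    return [alpha * lam + (1.0 - alpha) * y2
            for lam, y2 in zip(noise_var, power_spectrum)]
```

Freezing the estimate during speech keeps speech energy from leaking into the noise model, which is what ties the update to the VAD decision.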

