
REVISED CONTEXTUAL LRT FOR VOICE ACTIVITY DETECTION
Javier Ramírez, José C. Segura and J.M. Górriz
Dept. of Signal Theory Networking and Communications, University of Granada (SPAIN)
Presenter: Chen, Hung-Bin
ICASSP 2007

2 Outline
Introduction
Multiple Observation Likelihood Ratio Test
Analysis Of The Proposed Algorithm (IEEE SIGNAL PROCESSING 2005)
Revised MO-LRT (ICASSP 2007)
Experimental Results

3 Introduction
The paper presents a revised contextual likelihood ratio test (LRT) defined over a multiple-observation window.
The new approach evaluates not only the two hypotheses in which all the observations are speech or all are non-speech, but also all the possible hypotheses defined over the individual observations.
The proposed method showed speech/non-speech discrimination over a wide range of SNR conditions.

4 Likelihood Ratio Test
Two-hypothesis test
– Given an observation vector to be classified, the problem is reduced to selecting the class (G_0 or G_1) with the largest posterior probability.
– The likelihood ratio test (LRT) compares the ratio of the class-conditional likelihoods of the observation vector with a threshold.
– The observation vector is classified as G_1 if the likelihood ratio is greater than the ratio between the a priori class probabilities; otherwise it is classified as G_0.
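The equation on this slide is not present in the transcript. A standard form consistent with the text is sketched below. The symbol \mathbf{x} for the observation vector is an assumption introduced here, and G_0 and G_1 are taken to be the non-speech and speech classes named on later slides.

```latex
% Choosing the class with the largest posterior,
%   P(G_1 | x) > P(G_0 | x),
% is equivalent, by the Bayes rule, to comparing the likelihood ratio
% with the ratio of the a priori class probabilities:
L(\mathbf{x}) \;=\; \frac{p(\mathbf{x} \mid G_1)}{p(\mathbf{x} \mid G_0)}
\;\underset{G_0}{\overset{G_1}{\gtrless}}\;
\frac{P(G_0)}{P(G_1)}
```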

5 Multiple Observation Likelihood Ratio Test
The starting point is an LRT for detecting the presence of speech in a noisy signal, based on a Gaussian model. To improve on it:
– The multiple observation LRT (MO-LRT) considers not just a single observation vector measured at a frame t, but also an N-frame neighborhood.
– The test involves the evaluation of an N-th order LRT that incorporates contextual information into the decision rule and exhibits significant improvements in speech/pause discrimination over the original LRT.
– The multiple observation LRT (MO-LRT) is defined as:
IEEE SIGNAL PROCESSING 2005

6 Multiple Observation Likelihood Ratio Test
The N-th order LRT can be evaluated efficiently when the individual measurements are independent: in this case the joint likelihood ratio factorizes into a product of per-frame likelihood ratios.
An equivalent log-LRT can be defined by taking logarithms.
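A minimal sketch of why independence makes the test cheap: the log of the joint likelihood ratio becomes a sum of per-frame log-likelihood ratios. The scalar zero-mean Gaussian model, the function names, and the variance values are illustrative assumptions, not the model used in the slides.

```python
import math

def log_gauss(x, mean, var):
    """Log-density of a scalar Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def toy_frame_log_lr(x, var_speech, var_noise):
    """Per-frame log-likelihood ratio log p(x|G1) - log p(x|G0) (toy model)."""
    return log_gauss(x, 0.0, var_speech) - log_gauss(x, 0.0, var_noise)

def mo_log_lrt(window, var_speech, var_noise):
    """Multiple-observation log-LRT: sum of per-frame log-likelihood ratios."""
    return sum(toy_frame_log_lr(x, var_speech, var_noise) for x in window)

# Hypothetical window of observations around the current frame.
window = [0.1, -2.3, 1.9, 0.4, -1.2]
score = mo_log_lrt(window, var_speech=4.0, var_noise=1.0)
```

The cost grows linearly with the window length, because each frame contributes one term that can be cached and reused as the window slides.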

7 Multiple Observation Likelihood Ratio Test
The use of the MO-LRT for voice activity detection is mainly motivated by the improved robustness against the acoustic noise present in the environment that a multiple observation vector provides.
The MO-LRT is defined over the observation vectors:
The decision rule is defined by:

8 Multiple Observation Likelihood Ratio Test
In order to evaluate the proposed MO-LRT VAD on an incoming signal, an adequate statistical model for the feature vectors in the presence and absence of speech is needed.
The model selected is similar to one that assumes the discrete Fourier transform (DFT) coefficients of the clean speech and the noise to be asymptotically independent Gaussian random variables.
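Under this kind of model, the per-frame log-likelihood ratio has a closed form. The sketch below assumes zero-mean complex Gaussian DFT coefficients with per-bin noise variance `noise_var` and speech variance `speech_var`; these names and the function name are introduced here for illustration and do not come from the slides.

```python
import math

def gaussian_dft_log_lr(power_spectrum, noise_var, speech_var):
    """Log-likelihood ratio of one frame, summed over independent DFT bins.

    Non-speech: each bin ~ complex Gaussian with variance noise_var[k].
    Speech:     each bin ~ complex Gaussian with variance noise_var[k] + speech_var[k].
    power_spectrum[k] is the squared magnitude of the k-th DFT coefficient.
    """
    total = 0.0
    for p, lam_n, lam_s in zip(power_spectrum, noise_var, speech_var):
        xi = lam_s / lam_n      # a priori SNR of the bin
        gamma = p / lam_n       # a posteriori SNR of the bin
        total += gamma * xi / (1.0 + xi) - math.log1p(xi)
    return total

# Hypothetical two-bin frame.
llr = gaussian_dft_log_lr([4.0, 0.5], noise_var=[1.0, 1.0], speech_var=[3.0, 3.0])
```

Each bin's term is the difference of the two complex Gaussian log-densities, so a frame whose power is well above the noise floor yields a positive contribution.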

9 Analysis Of The Proposed Algorithm
For this experiment:
– The AURORA subset of the Spanish SpeechDat-Car (SDC) database was used.
– This database consists of recordings from distant and close-talking microphones in car environments under different driving conditions.
– The most unfavorable scenario (i.e., distant microphone at high speed over good road), with a 5 dB average SNR, was considered.

10 Analysis Of The Proposed Algorithm
Fig. 1(a) shows the distributions of the LRT in the presence and absence of speech for increasing values of m.
Fig. 1(a): Speech/non-speech distributions of the multiple observation likelihood ratio for different numbers of observations m.

11 Analysis Of The Proposed Algorithm
Fig. 1(b) shows the speech, non-speech, and global detection errors as a function of m.
Note that the speech detection error is clearly reduced when the value of m is increased.
The global error exhibits a minimum value for m = 6 frames.
Fig. 1(b): Error probabilities as a function of m.

12 Analysis Of The Proposed Algorithm
Fig. 2 shows the non-speech hit rate (HR0) versus the false alarm rate (FAR0 = 1 - HR1, where HR1 denotes the speech hit rate) for recordings from the distant microphone at an average SNR of about 5 dB.
It is clearly shown that increasing the number of observation vectors in the MO-LRT improves the performance of the proposed VAD.
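The quantities on the axes of this plot can be computed from frame-level reference labels and VAD decisions. This is a minimal sketch: the function name and the 0/1 label encoding (0 for non-speech, 1 for speech) are assumptions made for illustration.

```python
def hit_rates(reference, decisions):
    """Return (HR0, HR1, FAR0) from per-frame labels: 0 = non-speech, 1 = speech."""
    pairs = list(zip(reference, decisions))
    n_nonspeech = sum(1 for r, _ in pairs if r == 0)
    n_speech = sum(1 for r, _ in pairs if r == 1)
    # HR0: fraction of non-speech frames correctly detected as non-speech.
    hr0 = sum(1 for r, d in pairs if r == 0 and d == 0) / n_nonspeech
    # HR1: fraction of speech frames correctly detected as speech.
    hr1 = sum(1 for r, d in pairs if r == 1 and d == 1) / n_speech
    # FAR0: speech frames wrongly labelled as non-speech, i.e. 1 - HR1.
    far0 = 1.0 - hr1
    return hr0, hr1, far0

# Hypothetical ten-frame example.
reference = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
decisions = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
hr0, hr1, far0 = hit_rates(reference, decisions)
```

Sweeping the decision threshold and recording one (FAR0, HR0) pair per threshold value traces out a curve of this kind.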

13 Revised MO-LRT
The MO-LRT evaluates the probability that “all” the observations in the N-frame neighborhood of the central frame are non-speech or speech.
This is the reason to revise the method: to evaluate not just the two previous hypotheses G_0 and G_1.
– The decision is made in favor of one of the two hypotheses:
ICASSP 2007

14 Revised MO-LRT
The multiple observation vector is reindexed for convenience of the presentation.
Each hypothesis H_m can be defined in terms of a binary integer representation:
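One way to picture the binary representation is to read the index m as a (2N+1)-bit integer in which each bit labels one frame as speech (1) or non-speech (0). The sketch below assumes most-significant-bit-first ordering; the slide's exact bit convention is not available in the transcript, and the function name is hypothetical.

```python
def hypothesis_bits(m, n):
    """Bits b_1 .. b_{2N+1} of hypothesis index m, most significant bit first.

    The bit ordering is an assumption made for illustration.
    """
    width = 2 * n + 1
    return [(m >> (width - 1 - j)) & 1 for j in range(width)]

N = 1
# All 2^(2N+1) hypotheses over the 2N+1 frames of the window.
hypotheses = [hypothesis_bits(m, N) for m in range(2 ** (2 * N + 1))]
# The central frame corresponds to bit b_{N+1}, i.e. list index N.
central_speech = [h for h in hypotheses if h[N] == 1]
central_nonspeech = [h for h in hypotheses if h[N] == 0]
```

For N = 1 this gives eight hypotheses, four with the central frame labelled speech and four with it labelled non-speech.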

15 Revised MO-LRT
Thus, each hypothesis H_m consists of 2N+1 individual hypotheses involving the 2N+1 observations.
The classification problem is then reformulated as assigning the current frame to speech (G_1) or non-speech (G_0) depending on the bit b_{N+1}.
If the set of all the possible hypotheses is split into M_0 and M_1 depending on the value of the central frame bit b_{N+1}, the posterior probabilities of G_0 and G_1 are obtained by summing over the hypotheses in each subset.

16 Revised MO-LRT
Using the Bayes rule, a revised LRT can be defined as:
An approximation to the statistical test replaces the summation by the maximum value of the probability of the hypotheses in M_1 and M_0:
By taking logarithms, this test is expressed in a more compact form:
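The equations of this slide are not present in the transcript. The chain of steps it describes can be sketched as follows, as a reconstruction under assumptions rather than the authors' exact expressions: \mathbf{Y} denotes the multiple observation vector, \eta a decision threshold, and the hypothesis priors P(H_m) are kept explicit because the transcript does not say how they are handled.

```latex
% Posterior of each class as a sum over the hypotheses in its subset
P(G_i \mid \mathbf{Y}) \;=\; \sum_{m \in M_i} P(H_m \mid \mathbf{Y}), \qquad i \in \{0, 1\}

% Bayes rule applied to every term gives a ratio of sums
\Lambda(\mathbf{Y}) \;=\;
\frac{\sum_{m \in M_1} p(\mathbf{Y} \mid H_m)\, P(H_m)}
     {\sum_{m \in M_0} p(\mathbf{Y} \mid H_m)\, P(H_m)}

% Approximation: replace each summation by its largest term
\Lambda(\mathbf{Y}) \;\approx\;
\frac{\max_{m \in M_1} p(\mathbf{Y} \mid H_m)\, P(H_m)}
     {\max_{m \in M_0} p(\mathbf{Y} \mid H_m)\, P(H_m)}

% Taking logarithms
L_i \;=\; \max_{m \in M_i} \bigl[ \log p(\mathbf{Y} \mid H_m) + \log P(H_m) \bigr],
\qquad
L_1 - L_0 \;\underset{G_0}{\overset{G_1}{\gtrless}}\; \eta
```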

17 Revised MO-LRT
Restrict the possible hypotheses:
– A speech to non-speech or non-speech to speech transition in the N-frame neighborhood
– K is the Hankel matrix:

18 Revised MO-LRT
The matrix K can be split into two submatrices K_0 and K_1; the test is then easily reduced to:

19 Revised MO-LRT
As an example, for N = 1 the matrices K, K_0 and K_1 are defined to be:
and L_1 and L_0 are computed by:
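The matrices themselves are not present in the transcript. The sketch below builds one Hankel arrangement that is consistent with the restriction described on the earlier slide: each row is a hypothesis with at most one transition inside the window. It assumes independent frames and equal hypothesis priors, so each hypothesis is scored by the sum of the per-frame log-likelihood ratios of the frames it labels as speech. Function names and the row ordering are illustrative assumptions.

```python
def hankel_hypotheses(n):
    """Rows are the bit patterns over 2N+1 frames with at most one transition.

    Entry (i, j) depends only on i + j, so the matrix is Hankel.
    """
    width = 2 * n + 1
    h = [0] * width + [1] * width + [0] * (width - 1)
    return [[h[i + j] for j in range(width)] for i in range(2 * width)]

def revised_log_lrt(frame_log_lrs, n):
    """Return L1 - L0, with each L_i the best score among rows whose central bit is i."""
    rows = hankel_hypotheses(n)
    scores = [(row[n], sum(b * l for b, l in zip(row, frame_log_lrs))) for row in rows]
    l1 = max(score for central, score in scores if central == 1)
    l0 = max(score for central, score in scores if central == 0)
    return l1 - l0

K = hankel_hypotheses(1)
# Hypothetical per-frame log-likelihood ratios for a 3-frame window.
statistic = revised_log_lrt([2.0, -0.1, 1.0], n=1)
```

In this example the central frame alone has a negative log-likelihood ratio, yet the statistic is positive because both neighbours support speech, which illustrates the smoothing effect of the restricted hypothesis set.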

20 Revised MO-LRT
Finally:
– The algorithm for voice activity detection is based on a comparison of a likelihood ratio to a given threshold:

21 Experimental Results
The AURORA subset of the Spanish SpeechDat-Car (SDC) database was used.
– Fig. 1 shows an utterance of the database in clean conditions (25 dB SNR).
– Fig. 2 shows an utterance under the noisiest conditions (5 dB SNR).

22 Experimental Results
ROC curves in quiet noise conditions (stopped car and engine running) and close-talking microphone.

23 Experimental Results
ROC curves in high noise conditions (high speed over a good road) and distant-talking microphone.

24 Conclusion
The new approach evaluates not only the two hypotheses in which all the observations are speech or non-speech, but also all the possible hypotheses defined over the individual observations.
A Hankel matrix was introduced into the revised statistical test, which smooths the decision variable and reduces its variance.

26 Revised MO-LRT
The algorithm is adaptive and suitable for non-stationary noise environments, since the statistical properties are updated when the frame is classified as a non-speech frame.
In this way, the variance of the noise is updated as:
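The update equation is not present in the transcript. A common choice for this kind of adaptation is a first-order recursive average applied only on frames classified as non-speech. The sketch below uses that technique as a stand-in; the smoothing factor `alpha`, the function name, and the update rule itself are assumptions, not the authors' formula.

```python
def update_noise_variance(noise_var, power_spectrum, is_speech, alpha=0.95):
    """Recursively average the noise variance per DFT bin on non-speech frames.

    Assumed rule (not taken from the slides):
        noise_var[k] <- alpha * noise_var[k] + (1 - alpha) * power_spectrum[k]
    Frames classified as speech leave the estimate unchanged.
    """
    if is_speech:
        return list(noise_var)
    return [alpha * lam + (1.0 - alpha) * p
            for lam, p in zip(noise_var, power_spectrum)]

# Hypothetical two-bin example: the estimate moves toward the new frame's power.
updated = update_noise_variance([1.0, 2.0], [3.0, 2.0], is_speech=False, alpha=0.5)
frozen = update_noise_variance([1.0, 2.0], [3.0, 2.0], is_speech=True, alpha=0.5)
```

A value of `alpha` close to 1 tracks slowly varying noise with low variance, while a smaller value reacts faster to non-stationary noise at the cost of a noisier estimate.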