Voice Activity Detection by Lip Shape Tracking Using EBGM


Masaki AOKI, Ken MASUDA, Hiroyoshi MATSUDA, Tetsuya TAKIGUCHI, Yasuo ARIKI (Kobe University, Japan)

Purpose

We would like to detect only the driver's utterances in a car, for use in a car navigation system. However, when only the acoustic signal is processed, it is difficult to judge whether a detected voice is the driver's, because of car noise, music, and voices other than the driver's. We therefore integrate visual and acoustic information to deal with this problem. We propose a new method that tracks the driver's lip movement and calculates the dynamics of the lip aspect ratio.

Approach

Voice activity detection (VAD)
・ Driver's voice
・ Other's voice
・ Lip movement without voice
・ Noise
Voice activity detection of the target speaker

Processing blocks
・ Voice activity detection (Speech GMM)
・ Lip shape extraction (EBGM)
・ Lip movement (Aspect Ratio)

Elastic Bunch Graph Matching (EBGM) is employed to extract the detailed lip shape.

If a voice section extracted by the speech GMM and a voice section extracted by EBGM overlap in time, that time section is regarded as the proposed voice section.

[Figure: correct voice section, VAD from speech GMM only, VAD from EBGM, proposed voice section]

Speech GMM

The voice likelihood ratio is calculated as follows.

[Equation not preserved in the transcript]

If this voice likelihood ratio exceeds a threshold, the section is regarded as a voice section.

What is EBGM ?

EBGM is built from three components: Gabor wavelets, the face graph, and the bunch graph.

Gabor wavelets
Gabor wavelets can extract global and local features by changing the spatial frequency, and can extract features related to the wavelet's orientation.

[Figure: Gabor wavelets (real part, imaginary part) over orientations and frequencies (low to high); original image, convolution result, magnitude]

Jet
A jet is a set of convolution coefficients obtained by applying Gabor kernels with different frequencies and orientations to a point in an image.

[Figure: jet; jet mapped to a disk]

Face graph
This graph is used for grasping the lip shape and for reducing computation time, among other things.

Bunch graph (bunch of jets)
Using a bunch graph, the locations of facial feature points can be searched for under various conditions.

Lip shape tracking using EBGM
・ Input test data.
・ The bunch graph is pasted.
・ Coarse search to determine the initial facial feature points.
・ Local search starts, using a greedy-like method.
・ The face graph is extracted.

Lip movement
・ The difference of the aspect ratio between consecutive frames is computed as the lip movement.

[Figure: variation of aspect ratio, voice section, correct voice section]

Test data

・ One Japanese male
・ One Japanese female

100 Japanese city names were uttered in the car under idling conditions in the daytime. The acoustic signal included the driver's voice and car noise, but no other person's voice. Simulated voices of 100 words were therefore manually inserted into the intervals between the driver's voice sections. An infrared camera is used to cope with changes in the lighting environment. The thresholds are manually specified.

Experimental results

Column headings on the poster: Detect all, Detect true, Recall (%), Precision (%). The transcript preserves three values per row, listed here in their original order.

Proposed: Male              106   100   94
Proposed: Female            116   100   85
Difference of lip region    133   100   75
Speech GMM only: Male       200   100   50
Speech GMM only: Female     200   100   50

Voice sections are split or merged by utterances with small lip movement and by noise.

Since the performance of VAD by lip shape tracking alone is poor, we integrated lip movement and acoustic processing.

Future work

・ Further experiments on larger data sets.
・ Realization of dynamic thresholding that adapts to changes in the aspect ratio.
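The lip-movement measure and the integration rule described under Approach can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the frame-index representation of sections, the example values, and the threshold of 0.05 are all hypothetical, and reading "overlap in time" as a plain interval intersection is an assumption about the poster's rule.

```python
def lip_movement(aspect_ratios):
    """Absolute difference of the lip aspect ratio between consecutive frames."""
    return [abs(b - a) for a, b in zip(aspect_ratios, aspect_ratios[1:])]


def frames_to_sections(active):
    """Group consecutive True frames into (start, end) sections, end exclusive."""
    sections = []
    start = None
    for i, flag in enumerate(active):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            sections.append((start, i))
            start = None
    if start is not None:
        sections.append((start, len(active)))
    return sections


def intersect_sections(a, b):
    """Keep only the time spans covered by both sorted section lists."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever section finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


# Hypothetical per-frame aspect ratios of the tracked lip shape.
ratios = [0.30, 0.30, 0.42, 0.55, 0.40, 0.31, 0.30, 0.30]
THRESHOLD = 0.05  # assumed value; the poster only says thresholds are set manually

visual = frames_to_sections([m > THRESHOLD for m in lip_movement(ratios)])
audio = [(0, 3), (4, 8)]  # hypothetical sections from the speech GMM, in frames
proposed = intersect_sections(visual, audio)
print(visual, proposed)
```

With these made-up inputs the visual detector yields one section, and only the parts of the audio sections that coincide with it survive, which is how audio-only detections without lip movement would be rejected under this reading.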

