Voice Activity Detection by Lip Shape Tracking Using EBGM

Presentation transcript:

Voice Activity Detection by Lip Shape Tracking Using EBGM
Masaki AOKI, Ken MASUDA, Hiroyoshi MATSUDA, Tetsuya TAKIGUCHI, Yasuo ARIKI (Kobe University, Japan)
Outline: Purpose, Approach, What is EBGM?, Experimental results, Future work

Purpose
We would like to detect only the driver's utterances in a car, for use in a car navigation system. When only the acoustic signal is processed, it is difficult to judge whether a detected voice is the driver's, because of car noise, music, and voices other than the driver's. We therefore integrate visual and acoustic information to deal with this problem, and propose a new method that tracks the driver's lip movement and calculates the dynamics of the lip aspect ratio.

Approach
The input contains the driver's voice, other people's voices, lip movement without voice, and noise; the goal is voice activity detection (VAD) of the target speaker. Three components are combined:
・ Voice activity detection (speech GMM)
・ Lip shape extraction (EBGM)
・ Lip movement (aspect ratio)
If a voice section extracted by the speech GMM and a voice section extracted by EBGM lip tracking exist at the same time, that time section is regarded as the proposed voice section.
(Figure: correct voice section, VAD from speech GMM only, VAD from EBGM, and the proposed voice section on a common timeline.)

What is EBGM?
Elastic Bunch Graph Matching (EBGM) is employed to extract the detailed lip shape. Its building blocks are Gabor wavelets, jets, the bunch graph, and the face graph.
・ Gabor wavelets can extract global and local features by changing the spatial frequency, and can extract features related to the wavelet's orientation.
(Figure: original image, Gabor wavelets (real part), convolution result, magnitude.)
・ A jet is a set of convolution coefficients obtained by applying Gabor kernels with different frequencies and orientations to a single point in an image.
(Figure: real and imaginary parts of a jet over orientations and frequencies from low to high; jet map.)
・ A bunch graph attaches a collection of jets (a "bunch of jets") to each facial feature point. This graph is used for grasping the lip shape and for reducing the computational time; using a bunch graph, the locations of the facial feature points can be searched for under several conditions.

Lip shape tracking using EBGM
・ A coarse search determines the initial facial feature points.
・ The bunch graph is pasted onto the input test image.
・ A local search starts, using a greedy-like method.
・ The face graph is extracted.

Lip movement
・ The difference of the aspect ratio between consecutive frames is computed as the lip movement.

Speech GMM
The voice likelihood ratio is calculated for each frame; if this voice likelihood ratio exceeds a threshold, the frame is regarded as part of a voice section.

Test data
・ One Japanese male and one Japanese female speaker.
・ 100 Japanese city names were uttered in the car under the idling condition in the daytime.
・ The acoustic signal included the driver's voice and car noise, but no other person's voice, so simulated voices for 100 words were manually inserted into the intervals between the driver's voice sections.
・ An infrared camera is used to cope with changes in the lighting environment.
・ The thresholds are manually specified.

Experimental results
(Figure: variation of the aspect ratio, with the detected voice sections and the correct voice sections.)
(Table: detect all, detect true, recall (%) and precision (%) for Proposed (male, female), difference of lip region, and speech GMM only (male, female).)
・ Voice sections are sometimes split or merged by utterances with small lip movement and by noise.
・ Since the performance of VAD by lip shape tracking alone is poor, lip movement and acoustic processing were integrated.

Future work
・ Further experiments on larger data sets.
・ Realization of dynamic thresholding to the change of the aspect ratio.
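
The transcript describes a jet as the convolution coefficients of Gabor kernels with several frequencies and orientations at one image point. The sketch below illustrates that idea; the kernel size, wavelengths, number of orientations, and sigma are illustrative assumptions, not values from the presentation.

```python
# Minimal sketch of computing a Gabor "jet" at a single point of a grayscale image.
# Parameter values (5 wavelengths x 8 orientations, 31x31 kernels) are assumptions.
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel: a plane wave modulated by an isotropic Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    carrier = np.exp(1j * 2.0 * np.pi * x_rot / wavelength)
    return envelope * carrier

def jet_at(image, row, col, wavelengths=(4, 6, 8, 12, 16), n_orientations=8, size=31):
    """Return one complex coefficient per kernel at (row, col).

    The point must lie at least size // 2 pixels away from the image border.
    """
    half = size // 2
    patch = image[row - half:row + half + 1, col - half:col + half + 1].astype(float)
    coeffs = []
    for lam in wavelengths:
        for k in range(n_orientations):
            theta = np.pi * k / n_orientations
            kernel = gabor_kernel(size, lam, theta, sigma=lam / 2.0)
            # Inner product with the conjugate kernel = convolution evaluated at this point.
            coeffs.append(np.sum(patch * np.conj(kernel)))
    return np.array(coeffs)  # magnitudes and phases of these coefficients form the jet
```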
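
The tracking slide mentions a coarse search for the initial feature points followed by a local search "using the greedy-like method". The following sketch shows one plausible reading of that local step for a single feature point: move to the neighbouring pixel whose jet best matches the stored bunch, and stop when no move improves the score. The magnitude-based similarity function and the 4-neighbour move set are assumptions; the presentation does not give these details.

```python
# Minimal sketch of a greedy local search over jets, assuming a jet_fn such as the
# jet_at function from the previous sketch and a "bunch" of stored example jets.
import numpy as np

def jet_similarity(jet_a, jet_b):
    """Normalized dot product of jet magnitudes (an assumed similarity measure)."""
    a, b = np.abs(jet_a), np.abs(jet_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def bunch_similarity(jet, bunch):
    """Similarity to the best-matching stored jet in the bunch."""
    return max(jet_similarity(jet, stored) for stored in bunch)

def greedy_local_search(image, start, bunch, jet_fn, max_steps=20):
    """Move one feature point to a local maximum of bunch similarity."""
    row, col = start
    best = bunch_similarity(jet_fn(image, row, col), bunch)
    for _ in range(max_steps):
        moved = False
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            score = bunch_similarity(jet_fn(image, row + dr, col + dc), bunch)
            if score > best:
                best, row, col, moved = score, row + dr, col + dc, True
                break
        if not moved:
            break
    return (row, col), best
```

Repeating this for every node of the pasted bunch graph yields the extracted face graph around the lips.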
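
The lip-movement feature is stated directly on the slide: the frame-to-frame difference of the lip aspect ratio. A small sketch, assuming the aspect ratio is height over width of the mouth taken from four tracked lip points (the exact point definition is not given in the transcript):

```python
# Minimal sketch of the lip-movement feature from tracked lip points.
import numpy as np

def lip_aspect_ratio(points):
    """points: dict with 'top', 'bottom', 'left', 'right' lip points as (row, col)."""
    height = abs(points['bottom'][0] - points['top'][0])
    width = abs(points['right'][1] - points['left'][1])
    return height / max(width, 1e-6)

def lip_movement(ratios):
    """Absolute difference of the aspect ratio between consecutive frames."""
    ratios = np.asarray(ratios, dtype=float)
    return np.abs(np.diff(ratios))
```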
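
The acoustic side thresholds a voice likelihood ratio computed with a speech GMM. The exact formula from the slide is not reproduced in the transcript; a common form, assumed here, is the per-frame log-likelihood under a speech GMM minus that under a noise GMM, compared against a threshold.

```python
# Minimal sketch of frame-wise GMM-based voice activity detection (assumed
# speech-vs-noise log-likelihood ratio; feature frames such as MFCCs are assumed).
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmms(speech_feats, noise_feats, n_components=8):
    """Fit one GMM on speech feature frames and one on noise feature frames."""
    speech_gmm = GaussianMixture(n_components).fit(speech_feats)
    noise_gmm = GaussianMixture(n_components).fit(noise_feats)
    return speech_gmm, noise_gmm

def acoustic_vad(feats, speech_gmm, noise_gmm, threshold=0.0):
    """Return a boolean voice / non-voice decision for each frame."""
    llr = speech_gmm.score_samples(feats) - noise_gmm.score_samples(feats)
    return llr > threshold
```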
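
Finally, the fusion rule on the approach slide keeps a time section only when the speech-GMM section and the EBGM-based lip-movement section coincide. A sketch of that frame-wise intersection, assuming audio and video decisions have already been aligned to a common frame rate and the lip-movement threshold is set manually as in the experiments:

```python
# Minimal sketch of the proposed voice section as the intersection of the
# acoustic (speech GMM) and visual (lip movement) decisions per frame.
import numpy as np

def visual_vad(movement, threshold):
    """Lip-movement decision: aspect-ratio change above a manually set threshold."""
    return np.asarray(movement) > threshold

def fuse_sections(acoustic_decisions, visual_decisions):
    """Proposed voice section: frames where both modalities detect voice."""
    a = np.asarray(acoustic_decisions, dtype=bool)
    v = np.asarray(visual_decisions, dtype=bool)
    n = min(len(a), len(v))
    return a[:n] & v[:n]
```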