Age and Gender Classification using Modulation Cepstrum. Jitendra Ajmera (presented by Christian Müller), Speaker Odyssey 2008.

1 Previous work
Characteristic acoustic features: jitter and shimmer (C. Müller et al.), phonetic cues (S. Schoetz), cepstral coefficients.
Motivation and intuition behind this work: features such as cepstral coefficients characterize the exact content of the signal. Much of this information is not useful for age/gender classification; for example, we can identify age and gender from speech in a foreign language that we do not understand. Therefore, features that characterize the slowly varying temporal envelope should be more advantageous.

2 Mel Cepstrum Modulation Spectrum (MCMS) features (V. Tyagi et al.)
n: time instant; k: cepstral coefficient index; q: modulation frequency index; P: context window width (11 frames).
[Block diagram: input speech -> cepstrum computation -> P-frame context window -> MCMS features]
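A minimal sketch of how MCMS-style features could be computed from a matrix of mel-cepstral coefficients. The magnitude of a windowed DFT along each cepstral trajectory is used here as a stand-in for the modulation filters of Tyagi et al.; the function name, the Hanning window, and the edge padding are assumptions, and only the P = 11 frame context window comes from the slide.

```python
import numpy as np

def mcms_features(mfcc: np.ndarray, P: int = 11, num_mod_freqs: int = 3) -> np.ndarray:
    """Sketch of MCMS-style features from a (num_frames, num_ceps) cepstral matrix.

    Returns a (num_frames, num_ceps * num_mod_freqs) array: for every frame, the
    magnitudes of the first few non-DC modulation-frequency bins of each
    cepstral-coefficient trajectory over a P-frame context window.
    """
    num_frames, num_ceps = mfcc.shape
    half = P // 2
    # Pad the trajectories at both ends so every frame has a full context window.
    padded = np.pad(mfcc, ((half, half), (0, 0)), mode="edge")
    window = np.hanning(P)
    feats = np.zeros((num_frames, num_ceps * num_mod_freqs))
    for n in range(num_frames):
        segment = padded[n:n + P, :] * window[:, None]   # (P, num_ceps) windowed context
        spectrum = np.fft.rfft(segment, axis=0)          # DFT along the time axis
        mags = np.abs(spectrum[1:1 + num_mod_freqs, :])  # keep low modulation bins, drop DC
        feats[n, :] = mags.T.reshape(-1)
    return feats
```

With, say, 7 cepstral coefficients and 3 modulation frequencies this would give 21-dimensional feature vectors (the dimension quoted on the Performance slide), though the paper's exact filter shapes and configuration may differ.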

3 Experimental Setup: Task
7 target classes:
Children (<= 13 years)
Young Male (>13, <=20 years)
Young Female (>13, <=20 years)
Adult Male (>20, <=65 years)
Adult Female (>20, <=65 years)
Senior Male (>65 years)
Senior Female (>65 years)
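A small illustration of the class taxonomy above as a lookup function; the label strings and the gender encoding are placeholders, only the age boundaries come from the slide.

```python
def target_class(age: int, gender: str) -> str:
    """Map an (age, gender) pair to one of the 7 target classes (gender: 'm' or 'f')."""
    if age <= 13:
        return "child"
    if age <= 20:
        return "young_male" if gender == "m" else "young_female"
    if age <= 65:
        return "adult_male" if gender == "m" else "adult_female"
    return "senior_male" if gender == "m" else "senior_female"
```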

4 Experimental Setup: Dataset
German SpeechDat corpus: 4000 native German speakers.
80 speakers of each class were used for training, ~44 utterances each.
20 speakers of each class were used for testing.
Data from a different domain (VoiceClass, 660 utterances) was also used for testing.
In total, ~6000 utterances were used for testing.
A human-labelling experiment on a subset of the test data yielded ~55% overall classification accuracy.

5 Performance
Both systems have feature vectors of equal dimension (21) and hence the same number of parameters.
Both systems use a GMM (Gaussian Mixture Model) acoustic model and a maximum-likelihood classifier.
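A minimal sketch of a GMM-based maximum-likelihood classifier of the kind described above, assuming one GMM per target class trained on frame-level feature vectors; the class labels, mixture count, and the use of scikit-learn are assumptions, not details from the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

CLASSES = ["child", "young_m", "young_f", "adult_m", "adult_f", "senior_m", "senior_f"]

def train_gmms(train_feats: dict, n_components: int = 32) -> dict:
    """train_feats maps class name -> (num_frames, feat_dim) array of training features."""
    gmms = {}
    for cls in CLASSES:
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag")
        gmm.fit(train_feats[cls])
        gmms[cls] = gmm
    return gmms

def classify(utterance_feats: np.ndarray, gmms: dict) -> str:
    """Pick the class whose GMM gives the highest average log-likelihood for the utterance."""
    scores = {cls: gmm.score(utterance_feats) for cls, gmm in gmms.items()}
    return max(scores, key=scores.get)
```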

6 Analysis
[Figure: performance of MCMS features as a function of utterance duration and in-/out-of-domain data.]
Classification accuracy saturates at 3 modulation frequencies (3-14 Hz) and starts dropping after 4 modulation frequencies. This also explains why MFCC features perform worse than MCMS features.
[Figure: modulation frequency response of the first 3 MCMS filters.]
These 3 filters provide complementary information. For speech recognition, 7 filters (3-22 Hz) provide the best performance.

7 Questions? Thank You.