
Vineel Pratap Girish Govind Abhilash Veeragouni

Human listeners are capable of extracting information from the acoustic signal beyond just the linguistic message: the speaker's personality, emotional state, gender, age, dialect, and state of health. Gender classification is useful in speech and speaker recognition: better performance has been reported when gender-dependent acoustic-phonetic models are used, decreasing the word error rate of a baseline speech recognition system by 1.6%.

No! Several factors limit gender classification, chief among them our inability to identify acoustic features that are sensitive to the task and yet robust enough to accommodate differences in speaker articulation, vocal tract shape, and prosodic variation. The selected features should be time-invariant, phoneme-independent, and identity-independent for speakers of the same gender. And there is always some noise.

Physiological cues: the vocal tract length of females is less than that of males, and these differences in physiological parameters lead to differences in acoustic parameters. Acoustic cues: pitch, formant frequencies, zero-crossing rate, and others.

The vocal tract can be modeled as a resonance tube whose shape varies according to the phoneme being uttered. The fundamental frequency is inversely proportional to the length of the tube, so the pitch of a female voice is higher than the pitch of a male voice.
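
As a sketch of how this pitch difference could be measured, the snippet below estimates F0 from the peak of the autocorrelation function on synthetic tones at typical male and female pitches (the function name, sampling rate, and search band are illustrative assumptions, not part of the original slides):

```python
import numpy as np

def estimate_f0(signal, fs, fmin=50.0, fmax=400.0):
    """Estimate fundamental frequency from the autocorrelation peak."""
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    lo, hi = int(fs / fmax), int(fs / fmin)   # lags covering 50-400 Hz
    lag = lo + np.argmax(ac[lo:hi + 1])       # strongest periodicity in band
    return fs / lag

fs = 8000
t = np.arange(fs) / fs
male = np.sin(2 * np.pi * 120 * t)    # ~120 Hz, a typical adult male F0
female = np.sin(2 * np.pi * 220 * t)  # ~220 Hz, a typical adult female F0
print(estimate_f0(male, fs), estimate_f0(female, fs))
```

On these synthetic tones the estimates come out near 120 Hz and 220 Hz, reflecting the roughly octave gap between typical male and female pitch.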

The perception of voice gender relies primarily on the fundamental frequency (F0), which is on average higher by roughly an octave in female than in male voices; yet pitch overlaps considerably between male and female voices. Although voice pitch and gender are linked, other information is also used to recognise an individual's gender from his or her voice.

The techniques used to process speech signals can be classified as time-domain analysis and frequency-domain analysis. In time-domain analysis, the measurements are performed directly on the speech waveform to extract information. In frequency-domain analysis, the information is extracted from the frequency content of the signal, which forms its spectrum.
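
The contrast between the two domains can be sketched as follows: a time-domain measurement (signal energy) is computed directly on the waveform, while a frequency-domain measurement (the spectral peak) is read off the magnitude spectrum. The test tone and variable names are illustrative assumptions:

```python
import numpy as np

fs = 8000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 200 * t)   # 200 Hz test tone, one second long

# Time domain: measured directly on the waveform
energy = float(np.sum(x ** 2))

# Frequency domain: measured on the magnitude spectrum
spectrum = np.abs(np.fft.rfft(x))
freqs = np.fft.rfftfreq(len(x), d=1 / fs)
peak_hz = float(freqs[np.argmax(spectrum)])

print(energy, peak_hz)   # the spectral peak sits at the tone's frequency
```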

Formant features: the distinction between men and women has been represented by the locations in the frequency domain of the first three formants of vowels. Hence the set of formant features comprises the statistics of the four formant frequency contours: the mean, minimum, maximum, and variance values of the first four formants. Other features: LPC coefficients and cepstral coefficients.

The LPC coefficients make up a model of the vocal tract shape that produced the original speech signal; an order of 13 is good enough to represent the speech spectrum. The all-pole model is

H(z) = 1 / (1 − Σ_{k=1}^{n} a_k z^{−k})

where H(z) is the transfer function in the z-domain (evaluated at z = e^{jω} for frequency ω), n is the order, and the a_k are the LPC coefficients.
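
A minimal sketch of computing LPC coefficients with the standard Levinson-Durbin recursion (the autocorrelation method); the function name is illustrative, and the sign convention here returns the inverse-filter polynomial A(z) = 1 + Σ a_k z^{−k}, so the model above corresponds to H(z) = 1/A(z) with the signs of a_1..a_n flipped:

```python
import numpy as np

def lpc(x, order):
    """LPC via Levinson-Durbin; returns [1, a_1, ..., a_order] of A(z)."""
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]                              # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a[1:i] += k * a[i - 1:0:-1]         # update previous coefficients
        a[i] = k
        err *= 1.0 - k * k
    return a

# Sanity check: a pure sine obeys x[n] = 2*cos(w)*x[n-1] - x[n-2], so an
# order-2 fit should recover a_1 ≈ -2*cos(w) and a_2 ≈ 1.
fs, f = 8000, 200
w = 2 * np.pi * f / fs
x = np.sin(w * np.arange(2000))
a = lpc(x, 2)
print(a)
```

For real speech one would apply this per analysis frame with order 13, as the slide suggests.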

[Figure: short-time energy (STE) plots for a male and a female voice]

[Figure: zero-crossing rate (ZCR) plots for a female and a male voice]

[Figure: short-time autocorrelation (STAC) plots for a female and a male voice]
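
The three short-time measures shown in the plots above could be computed frame by frame roughly as follows (frame length, hop size, and function names are illustrative assumptions):

```python
import numpy as np

def frames(x, flen=400, hop=160):
    """Split a signal into overlapping short-time analysis frames."""
    n = 1 + (len(x) - flen) // hop
    return np.stack([x[i * hop:i * hop + flen] for i in range(n)])

def short_time_energy(x):
    """STE: sum of squared samples per frame."""
    return np.sum(frames(x) ** 2, axis=1)

def zero_crossing_rate(x):
    """ZCR: fraction of adjacent sample pairs that change sign."""
    f = frames(x)
    return np.mean(np.abs(np.diff(np.sign(f), axis=1)) > 0, axis=1)

def short_time_autocorr(x, lag):
    """STAC: autocorrelation at a given lag, per frame."""
    f = frames(x)
    return np.sum(f[:, lag:] * f[:, :-lag], axis=1)

fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 120 * t)   # low-pitched test tone
print(short_time_energy(x)[0], zero_crossing_rate(x)[0])
```

A higher-pitched voice crosses zero more often per frame, which is why the ZCR plots separate male and female samples.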

These differences in the parameters obtained by short-time analysis of male and female voice samples are used as the working principle of a gender classifier, which predicts the gender of the speaker in a voice signal. The output of a gender classifier is a prediction (label) of the actual speaker's gender.

Using the parameters obtained from the short-time analysis of the speech signal, classifiers can be implemented using the following approaches: Naïve Bayes classifier, probabilistic neural networks (PNNs), support vector machines (SVMs), k-NN classifiers, and GMM-based classifiers.
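
As one concrete instance of these approaches, the following is a minimal k-NN classifier with Euclidean distance, applied to synthetic two-dimensional features (mean F0 and ZCR); the data, cluster parameters, and function names are illustrative assumptions, not the actual features or dataset used in the study:

```python
import numpy as np

def knn_predict(train_X, train_y, X, k=3):
    """k-NN with Euclidean distance, majority vote over the k neighbours."""
    preds = []
    for x in X:
        d = np.linalg.norm(train_X - x, axis=1)   # distance to every sample
        votes = train_y[np.argsort(d)[:k]]        # labels of k nearest
        preds.append(np.bincount(votes).argmax()) # majority vote
    return np.array(preds)

rng = np.random.default_rng(0)
# Synthetic features [mean F0 in Hz, ZCR]; label 0 = male, 1 = female
male = np.column_stack([rng.normal(120, 15, 50), rng.normal(0.03, 0.01, 50)])
female = np.column_stack([rng.normal(210, 20, 50), rng.normal(0.05, 0.01, 50)])
X = np.vstack([male, female])
y = np.array([0] * 50 + [1] * 50)

test = np.array([[115.0, 0.03], [215.0, 0.05]])
print(knn_predict(X, y, test))
```

Swapping the distance function here would give the city-block, cosine, or correlation k-NN variants compared below.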

A total of 11 classifiers were tested: 1) Naïve Bayes; 2) PNN; SVMs with five different kernels, namely 3) Gaussian RBF (SVM1), 4) multilayer perceptron (SVM2), 5) quadratic (SVM3), 6) linear (SVM4), and 7) cubic polynomial (SVM5); and four k-NNs with different distance functions, namely 8) Euclidean (KNN1), 9) city block, i.e. the sum of absolute differences (KNN2), 10) cosine-based, i.e. one minus the cosine of the angle between patterns (KNN3), and 11) correlation-based, i.e. one minus the sample correlation between patterns (KNN4). These classifiers were tested on the English Language Speech Database for Speaker Recognition (ELSDSR).

The figure (next slide) shows the correct gender classification rates for the different classifiers on the ELSDSR database when the test set is 20% of the total 232 utterances. For each classifier, the columns "Total", "Male", and "Female" give the total correct gender classification rate and the rates of correct matches between the actual gender and the one predicted by the classifier for utterances by male speakers and by female speakers, respectively. The arrows indicate the best rates.
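
The three per-classifier rates described above could be computed from a list of actual and predicted labels as follows (a sketch on toy data; the label encoding and function name are illustrative assumptions):

```python
import numpy as np

def gender_rates(actual, predicted):
    """Return (total, male, female) correct-classification rates."""
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    total = float(np.mean(actual == predicted))
    male = float(np.mean(predicted[actual == 0] == 0))    # rate on male utterances
    female = float(np.mean(predicted[actual == 1] == 1))  # rate on female utterances
    return total, male, female

# Toy example: 0 = male, 1 = female
actual    = [0, 0, 0, 0, 1, 1, 1, 1]
predicted = [0, 0, 1, 0, 1, 1, 1, 0]
print(gender_rates(actual, predicted))  # (0.75, 0.75, 0.75)
```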

At the level of speech analysis, short-time analysis is the most basic approach to obtaining the parameters required for the gender classification problem. The differences in these parameters are used as the working principle of a gender classifier, which can be implemented with any of the approaches mentioned. The SVM with a suitable kernel (SVM1, Gaussian RBF) was demonstrated to yield the most accurate results for gender classification, with an accuracy of more than 90%. The main challenge faced by gender classifiers is high-frequency noise in the speech signal, which leads to confusion in the gender prediction.


Thank You!