Presentation on Artificial Neural Network Based Pathological Voice Classification Using MFCC Features Presenter: Subash Chandra Pakhrin 072MSI616 MSc in Information and Comm. Engineering Pulchowk Campus

Abstract The analysis of pathological voice is a challenging and important area of research in speech processing. This work presents a method for the identification and classification of pathological voice using an Artificial Neural Network (ANN). Three classifiers are used: the Multilayer Perceptron Neural Network (MLPNN), the General Regression Neural Network (GRNN) and the Probabilistic Neural Network (PNN). Mel-Frequency Cepstral Coefficient (MFCC) features extracted from audio recordings are used for this purpose.

Introduction The speech signal is produced by time-varying excitation of the time-varying vocal tract. Speech pathology is the field that deals with the evaluation of speech, language, and voice disorders. Traditional acoustic parameters such as pitch, jitter and shimmer are based on the fundamental frequency. In recent years, the Mel-Frequency Cepstral Coefficient (MFCC) has proven to be a trustworthy parameter for pathological voice detection. Diagnosis of pathological voice is one of the most important issues in biomedical applications of speech technology.

Related Works The classification of pathological voice from normal voice has been implemented using a Support Vector Machine (SVM). One proposed technique focuses on the acoustic features of speech, using Mel-Frequency Cepstral Coefficients (MFCCs) and a Gaussian Mixture Model. MFCCs combined with an SVM have also been used for the diagnosis of neurological disorders from the voice signal. Hidden Markov models have been applied to classify speech into two classes, normal and pathological: two models are trained, one per class, and the trained models are then used to classify the dataset. Pathological voice here means voice signals from patients who have diseases of the vocal folds.

Scope of the Work Classification of pathological voices from normal voices using MFCC features and an Artificial Neural Network. Mel-frequency cepstral coefficients are widely used features for characterizing voice signals: the cepstral representation allows the vocal tract to be characterized as a source-filter model, and the Mel frequency scale reflects the human auditory system. The MFCC feature vectors are fed to an Artificial Neural Network to identify and classify pathological voices.

Proposed Work The speech samples for extracting the MFCCs were obtained from audio recordings of people with normal and pathological voices. The MFCC features are obtained by standard short-term speech analysis and combined with frame-level pitch to form the feature vectors. Artificial Neural Network classifiers (MLPNN, GRNN and PNN) are evaluated on these feature vectors. The architecture of the proposed system is shown in the accompanying figure.

Feature Extraction Features play a major role in identifying the voice: the classification decision depends heavily on how successfully useful information is extracted from the voice in a way that enables the system to differentiate between voices and identify them according to their features. Fig: MFCC feature extraction

Feature Extraction
Pre-emphasis: boosts the amount of energy in the high frequencies. Boosting the high-frequency energy makes information from the higher formants available to the acoustic model.
Framing: segments the speech samples obtained from analog-to-digital conversion (ADC) into small frames a few tens of milliseconds long. This turns the non-stationary speech signal into a sequence of quasi-stationary frames, to which the Fourier Transform can be applied.
Windowing: tapers each individual frame in order to minimize the signal discontinuities at its beginning and end.
Fast Fourier Transform: converts the convolution of the glottal pulse and the vocal tract impulse response in the time domain into a multiplication in the frequency domain.
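The pre-emphasis, framing, windowing and FFT steps above can be sketched in a few lines of NumPy. This is a minimal illustration on a synthetic tone; the sampling rate, frame length (25 ms) and hop size (10 ms) are assumed values, not taken from the slides.

```python
import numpy as np

def preemphasis(signal, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1] boosts high-frequency energy
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, frame_len, hop):
    # slice into overlapping frames; trailing samples that do not
    # fill a whole frame are dropped
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

fs = 16000                                   # assumed sampling rate
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 440 * t)              # 1 s synthetic tone stands in for speech

frames = frame_signal(preemphasis(x), frame_len=400, hop=160)   # 25 ms / 10 ms
windowed = frames * np.hamming(400)          # taper frame edges
spectra = np.abs(np.fft.rfft(windowed, n=512))   # magnitude spectrum per frame
print(frames.shape, spectra.shape)
```

Each row of `spectra` is then passed to the Mel filter bank described on the next slide.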

Feature Extraction
Mel-Scaled Filter Bank: the Mel-frequency scale is linear below 1000 Hz and logarithmic above 1000 Hz. The Mel value for a given frequency f in Hz can be computed as: Mel(f) = 2595 log10(1 + (f / 700)). Fig: Mel filter banks
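The slide's Mel formula, together with its inverse (used to place filter-bank center frequencies), translates directly to code:

```python
import math

def hz_to_mel(f):
    # Mel(f) = 2595 * log10(1 + f / 700), as given on the slide
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # inverse mapping: f = 700 * (10^(m / 2595) - 1)
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

print(hz_to_mel(1000))   # close to 1000 mel by construction of the scale
```

Note that 1000 Hz maps to roughly 1000 mel, which is the anchor point at which the scale changes from linear to logarithmic.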

Feature Extraction
Logarithm: has the effect of changing multiplication into addition, so this step converts the multiplication of magnitudes in the Fourier transform into a sum.
Discrete Cosine Transform (DCT): the IFFT needs complex arithmetic while the DCT does not; by taking advantage of the redundancy in a real signal, the DCT implements the equivalent function more efficiently. The MFCCs are then calculated from the DCT output.
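The logarithm-plus-DCT step can be sketched with a naive DCT-II in NumPy. The filter-bank outputs below are random stand-ins, and 26 filters with 13 retained coefficients are common choices, not values taken from the slides.

```python
import numpy as np

def dct2(x):
    # naive DCT-II along the last axis: C[k] = sum_n x[n] * cos(pi*k*(2n+1)/(2N))
    N = x.shape[-1]
    n = np.arange(N)
    basis = np.cos(np.pi * np.outer(np.arange(N), 2 * n + 1) / (2 * N))
    return x @ basis.T

rng = np.random.default_rng(0)
mel_energies = rng.random((98, 26)) + 1e-3   # stand-in filter-bank outputs (frames x filters)
log_mel = np.log(mel_energies)               # logarithm: multiplication becomes addition
mfcc = dct2(log_mel)[:, :13]                 # keep the first 13 cepstral coefficients
print(mfcc.shape)
```

Discarding the higher-order coefficients keeps the smooth spectral-envelope information while dropping fine detail, which is why only the first dozen or so MFCCs are normally used.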

Multilayer Perceptron Neural Network (MLPNN) An MLPNN is composed of three kinds of layers: an input layer, one or more hidden layers, and an output layer. The number of neurons in the hidden layer depends on the size of the input vector; the output layer has one neuron. MFCC features from the normal and pathological speech samples are input to the ANN for training. The input samples are propagated forward on a layer-by-layer basis: the network computes its output pattern, and if there is an error, the weights are adjusted to reduce it. Back-propagation learning has two phases. First, a training input pattern is presented to the input layer and propagated from layer to layer until the output layer generates the output pattern. If this pattern differs from the desired output, an error is calculated and propagated backwards through the network from the output layer to the input layer, with the weights modified as the error propagates.

Multilayer Perceptron Neural Network Process of Back-Propagation:
Step 1: Initialization - set all the weights and threshold levels of the network.
Step 2: Activation - activate the back-propagation network by applying the inputs and observing the desired outputs; calculate the actual outputs of the neurons in the hidden layer and the output layer.
Step 3: Weight training - update the weights by propagating backward the errors associated with the output neurons.
Step 4: Iteration - increase the iteration count, go back to Step 2, and repeat the process until the selected error criterion is satisfied.
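The four steps above can be sketched as a small NumPy training loop on synthetic two-dimensional data standing in for MFCC vectors. The layer sizes, learning rate and epoch count here are illustrative assumptions, not the thesis's actual settings.

```python
import numpy as np

rng = np.random.default_rng(1)
# toy stand-in for MFCC feature vectors: two well-separated Gaussian blobs
X = np.vstack([rng.normal(-1, 0.3, (50, 2)), rng.normal(1, 0.3, (50, 2))])
y = np.array([0] * 50 + [1] * 50).reshape(-1, 1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Step 1: initialization - small random weights, zero thresholds
W1, b1 = rng.normal(0, 0.5, (2, 5)), np.zeros(5)
W2, b2 = rng.normal(0, 0.5, (5, 1)), np.zeros(1)

lr = 0.5
for epoch in range(2000):             # Step 4: iterate until the error is low enough
    h = sigmoid(X @ W1 + b1)          # Step 2: activation - forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = (out - y) * out * (1 - out)        # Step 3: error propagated backward
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= lr * h.T @ d_out / len(X)
    b2 -= lr * d_out.mean(axis=0)
    W1 -= lr * X.T @ d_h / len(X)
    b1 -= lr * d_h.mean(axis=0)

accuracy = ((out > 0.5) == y).mean()
print(accuracy)
```

On data this well separated the loop drives the training accuracy close to 1, mirroring the error criterion in Step 4.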

General Regression Neural Network A GRNN is a three-layer network that contains one hidden neuron for each training pattern. A smoothing factor, applied when the network is presented with new data, determines how tightly the network matches its predictions to the data in the training patterns. Input layer: one neuron for each predictor variable. Hidden layer: when presented with the input vector x, each hidden neuron computes the Euclidean distance of the test case from the neuron's center point and then applies the RBF kernel function using the sigma value(s). Decision layer: the weighted value coming out of a hidden neuron is fed only to the pattern neuron that corresponds to the hidden neuron's category.
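A minimal sketch of the GRNN prediction rule in NumPy: one hidden neuron per training pattern, an RBF kernel on the Euclidean distance, and a kernel-weighted average of the training targets. The data and the smoothing factor sigma are illustrative assumptions.

```python
import numpy as np

def grnn_predict(X_train, y_train, x, sigma=0.5):
    # weighted average of training targets; weights are RBF kernels of the
    # distance from x to each training pattern (one hidden neuron per pattern)
    d2 = np.sum((X_train - x) ** 2, axis=1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    return np.sum(w * y_train) / np.sum(w)

X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([0.0, 0.0, 1.0, 1.0])
print(grnn_predict(X_train, y_train, np.array([2.9])))  # close to 1
```

A small sigma makes the prediction follow the nearest training pattern closely; a large sigma smooths the output toward the overall mean of the targets.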

Probabilistic Neural Network (PNN) A PNN is a feed-forward neural network used for classification problems. When an input is presented, the first layer computes the distances from the input vector to the training input vectors, producing a vector whose elements indicate how close the input is to each training input. The second layer sums the contributions for each class of inputs and produces a vector of class probabilities as its net output. Finally, a competitive transfer function on the output of the second layer picks the maximum of these probabilities, producing a 1 (positive identification) for that class and a 0 (negative identification) for the non-targeted classes.
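The two PNN layers plus the final winner-take-all step can be sketched in NumPy as follows (the toy data and sigma value are illustrative assumptions):

```python
import numpy as np

def pnn_classify(X_train, y_train, x, sigma=0.3):
    # layer 1: Gaussian kernel on the distance to every training vector
    d2 = np.sum((X_train - x) ** 2, axis=1)
    k = np.exp(-d2 / (2 * sigma ** 2))
    # layer 2: sum the kernel contributions per class
    classes = np.unique(y_train)
    scores = np.array([k[y_train == c].sum() for c in classes])
    # competitive output: 1 for the winning class, 0 for the others
    return classes[np.argmax(scores)]

X = np.array([[0, 0], [0, 1], [5, 5], [5, 6]], dtype=float)
y = np.array([0, 0, 1, 1])
print(pnn_classify(X, y, np.array([5.2, 5.4])))  # class 1
```

Unlike the MLPNN, no iterative weight training is needed: the training vectors themselves are the first-layer "weights", which is why PNNs train essentially instantly but grow with the dataset.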

Feature Extraction 20 samples are taken for analysis (10 for training and 10 for testing). MFCC features are extracted from the samples after applying pre-emphasis, framing, windowing, FFT, Mel filter banks, log and DCT. The speech signals and the extracted features are shown in the figures below. Figs: Normal voice; feature extraction for normal voice; pathological voice; feature extraction for pathological voice

Classification The MFCC feature vectors derived from the speech samples (both normal and pathological voices) are used as input to the Neural Network for training and testing. There are two hidden layers in the network: the first hidden layer has 10 outputs and the second has 7. The output is either 1 (for pathological voice) or 0 (for normal voice).
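As an illustration, a network with this topology (two hidden layers of 10 and 7 neurons and a binary output) can be sketched with scikit-learn's MLPClassifier on synthetic 13-dimensional stand-ins for MFCC vectors. The data and training settings are assumptions, not the thesis's actual setup.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# stand-in 13-dim MFCC vectors: label 0 = "normal", 1 = "pathological"
X = np.vstack([rng.normal(-1, 0.5, (40, 13)), rng.normal(1, 0.5, (40, 13))])
y = np.array([0] * 40 + [1] * 40)

# two hidden layers of 10 and 7 neurons, matching the slide's topology
clf = MLPClassifier(hidden_layer_sizes=(10, 7), max_iter=2000, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```

In practice the training and test sets would be kept separate, as the slides do with their 10/10 split, so that the reported accuracy reflects generalization rather than memorization.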

Performance of MLPNN Mean Square Error vs. Number of Epochs for training, validation, testing and best fit for the Multilayer Perceptron Neural Network (MLPNN)

Performance of GRNN Mean Square Error vs. Number of Epochs for training, validation, testing and best fit for the General Regression Neural Network (GRNN)

Performance of PNN Mean Square Error vs. Number of Epochs for training, validation, testing and best fit for the Probabilistic Neural Network (PNN)

Performance of MLPNN, GRNN and PNN The Mean Square Error vs. Number of Epochs curves for training, validation, testing and best fit for MLPNN, GRNN and PNN are shown in the preceding slides. It is observed that GRNN and PNN behave in a similar manner, while the MLPNN shows better results than both. The classification accuracy of the MLPNN in identifying pathological voice is nearly 100 %, higher than that of GRNN and PNN.

Conclusion This work aims at developing an automated pathological voice recognition system. Mel-Frequency Cepstral Coefficients are extracted from the audio recordings, and these MFCC features are used as input to several Neural Network models for classification. The behavior of MLPNN, GRNN and PNN in classifying pathological voices from normal voices has been analyzed. The PNN behaves similarly to the GRNN, and the MLPNN with MFCC features is found to perform better than both PNN and GRNN in the classification of pathological voices. The number of features selected for training is justified by the resulting accuracy of nearly 100 % achieved by the Multilayer Perceptron Neural Network (MLPNN).

References
C. M. Vikram and K. Umarani, "Pathological Voice Analysis to Detect Neurological Disorders Using MFCC & SVM"
Cheolwoo Jo, "Source Analysis of Pathological Voice"
D. Pravena, S. Dhivya, and A. Durga Devi, "Pathological Voice Recognition for Vocal Fold Disease"

Thank You!!!