Performance Comparison of Speaker and Emotion Recognition

Abstract This paper focuses on the Hidden Markov Model Toolkit (HTK), used here to recognize speech, speakers, and emotional speech from MFCC features. The HTK-based technique gives good results compared to other techniques for voice recognition and emotional speech recognition, and is also evaluated on noisy test speech.

The speech signal carries information such as the age, gender, social status, accent, and emotional state of the speaker. The main challenge is to recognize the speaker, the speech content, and the emotion from the same utterances. If the user expresses a negative emotion, the system should either adapt itself to the user's needs or pass control to a human agent who can give a more suitable response.

Introduction This work uses MFCC features and HTK modelling to recognize speech, speaker, and emotion from a speech database. Volvo, white, and F16 noises are considered in order to evaluate the emotion-independent and speaker-independent noisy emotional speech recognition systems.

Better results can be obtained by adding a preprocessing stage based on adaptive RLS (recursive least squares) filters before the conventional preprocessing stages. Noisy data are evaluated by combining short-time energy and zero-crossing rate parameters to develop a technique that reduces the effect of noise on the speech.
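
As an illustration, the two parameters can be computed per frame as in the following Python sketch (the frame length, hop size, and the speech/noise decision rule are assumptions, not values given in the paper):

    import numpy as np

    def frame_signal(x, frame_len=400, hop=160):
        """Split a signal into overlapping frames (25 ms / 10 ms at 16 kHz)."""
        n_frames = 1 + max(0, (len(x) - frame_len) // hop)
        return np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])

    def short_time_energy(frames):
        """Sum of squared samples in each frame."""
        return np.sum(frames.astype(np.float64) ** 2, axis=1)

    def zero_crossing_rate(frames):
        """Fraction of adjacent sample pairs that change sign in each frame."""
        signs = np.sign(frames)
        return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

    # Frames with both low energy and low ZCR are typically noise-only
    # and can be discarded before recognition.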

Features based on CEPSTRUM The short-time spectrum of a voiced speech sound has two components: 1) harmonic peaks due to the periodicity of voiced speech, and 2) a spectral envelope shaped by the glottal pulse and the vocal tract. The excitation source determines the periodicity of voiced speech and reflects the characteristics of the speaker. The spectral envelope is shaped by the formants, which reflect the resonances of the vocal tract.
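
These two components can be separated in the cepstral domain. A minimal sketch of the real cepstrum (the window choice and the small flooring constant are assumptions):

    import numpy as np

    def real_cepstrum(frame):
        """Real cepstrum: inverse FFT of the log magnitude spectrum."""
        spectrum = np.fft.rfft(frame * np.hamming(len(frame)))
        return np.fft.irfft(np.log(np.abs(spectrum) + 1e-10))

    # Low-quefrency coefficients capture the spectral envelope (vocal tract);
    # a peak at higher quefrency marks the pitch period of the excitation source.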

MFCC features represent the source characteristics of the speech signal and are based on the known variation of the human ear's critical bandwidth with frequency: filters spaced linearly at low frequencies and logarithmically at high frequencies are preferred for extracting the phonetically important characteristics of speech.
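
In the paper, MFCC extraction is done with HTK's HCopy tool (described below); purely for illustration, an equivalent extraction in Python with the librosa package might look like this (the file name and frame settings are placeholders):

    import librosa

    # Hypothetical test utterance; 13 MFCCs per 25 ms frame with a 10 ms hop.
    y, sr = librosa.load("utterance.wav", sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=400, hop_length=160)
    print(mfcc.shape)  # (13, n_frames)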

Characteristics of Emotional Speech The structural part of speech carries linguistic information, which reveals how utterances are pronounced according to the rules of the language. Paralinguistic information refers to messages about the internal state of the speaker, such as emotional state. Emotions such as anger, fear, and happiness are accompanied by physiological changes in the speaker, such as elevated blood pressure and heart rate.

The speech signal is split into frames and frequency analysis is performed on each frame. A dominant frequency per frame is obtained by choosing the frequency bin with the highest spectral energy. Emotions such as anger, fear, and happiness show a larger number of frames with high-frequency energy, while sadness shows very few such frames.
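
A sketch of this per-frame analysis (the 1 kHz threshold separating "high-frequency" frames is a hypothetical value, not one given in the paper):

    import numpy as np

    def dominant_frequencies(frames, sr=16000):
        """Frequency of the highest-energy spectral bin in each frame."""
        windowed = frames * np.hamming(frames.shape[1])
        spectra = np.abs(np.fft.rfft(windowed, axis=1))
        freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sr)
        return freqs[np.argmax(spectra, axis=1)]

    def count_high_freq_frames(frames, sr=16000, threshold_hz=1000):
        """Number of frames whose dominant frequency exceeds the threshold."""
        return int(np.sum(dominant_frequencies(frames, sr) > threshold_hz))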

[Figure: frequency distribution of speech in different emotions]

SPEECH RECOGNITION USING HTK Utterances are chosen from ten different actors and ten different texts. Emotional utterances are collected from five male and five female speakers aged from 21 to 35 years. Each speaker utters ten different sentences in seven different emotions (anger, boredom, disgust, fear, happiness, neutral, and sadness), as in the Berlin emotional speech database. A speech recognition system generally treats the speech signal as a message encoded as a sequence of one or more symbols.

Then the MFCC features are extracted. For each training model corresponding to continuous speech, a training set of K utterances is used, where each utterance constitutes an observation sequence in some appropriate spectral or temporal representation.
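
The paper performs this training with HTK; purely for illustration, equivalent training of one model in Python with the hmmlearn package could look like this (the number of states, utterance count, and the random data standing in for real MFCC sequences are all assumptions):

    import numpy as np
    from hmmlearn import hmm

    # K = 20 hypothetical utterances, each an (n_frames, 13) array of MFCC vectors.
    rng = np.random.default_rng(0)
    utterances = [rng.standard_normal((rng.integers(80, 120), 13)) for _ in range(20)]
    X = np.vstack(utterances)               # concatenated observation sequences
    lengths = [len(u) for u in utterances]  # per-utterance frame counts

    # In practice, one HMM would be trained per word (or emotion) class.
    model = hmm.GaussianHMM(n_components=5, covariance_type="diag", n_iter=20)
    model.fit(X, lengths)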

HCopy is the HTK tool used for MFCC extraction. HCompV computes the overall mean and variance and generates a prototype HMM. HInit reads all the bootstrapped training data and initializes a single hidden Markov model using a segmental K-means algorithm.
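
The corresponding invocations, wrapped in Python for illustration (the flags are standard HTK usage, but the config, script, label, and model file names here are placeholders, not taken from the paper):

    import subprocess

    # Extract MFCCs for every file listed in the .scp script file.
    subprocess.run(["HCopy", "-C", "config", "-S", "codetrain.scp"], check=True)
    # Compute the global mean/variance and emit a flat-start prototype HMM.
    subprocess.run(["HCompV", "-C", "config", "-f", "0.01", "-m",
                    "-S", "train.scp", "-M", "hmm0", "proto"], check=True)
    # Initialize one HMM per class with segmental K-means.
    subprocess.run(["HInit", "-C", "config", "-S", "train.scp", "-M", "hmm1",
                    "-H", "hmm0/proto", "-l", "anger", "anger"], check=True)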

HVite is a general-purpose speech recognizer. It matches a speech file against a network of HMMs and outputs a transcription for each speech file. The HTK recognizer requires a network to be defined in the HTK lattice format, which describes the word-to-word transitions.

HResults is the HTK performance analysis tool; it reads a set of label files and compares them with the corresponding reference transcriptions.
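
For illustration, the recognition and scoring steps might be invoked as follows (standard HTK flags; the network, dictionary, and list file names are placeholders):

    import subprocess

    # Decode each test file against the HMM set through the word network.
    subprocess.run(["HVite", "-H", "hmm/macros", "-H", "hmm/hmmdefs",
                    "-S", "test.scp", "-i", "recout.mlf",
                    "-w", "wdnet", "dict", "hmmlist"], check=True)
    # Compare the recognized labels with the reference transcriptions.
    subprocess.run(["HResults", "-I", "testref.mlf", "hmmlist", "recout.mlf"],
                   check=True)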

Noise Characteristics [Figures: white noise signal and frequency distribution of white noise]

Emotion independent speaker recognition

NOISY SPEECH RECOGNITION To evaluate the performance of the noisy speech recognition system, Volvo, white, and F16 noises are taken from the "Noisex-92" database and added to the test speech. Noise-reduced speech closely resembling the clean test speech is obtained. Then conventional preprocessing techniques are applied to the noise-reduced speech and features are extracted.
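
The paper does not spell out the RLS filter configuration; a minimal sketch of an RLS adaptive noise canceller (assuming a separate noise reference signal, and hypothetical filter order and forgetting factor) is shown below:

    import numpy as np

    def rls_denoise(noisy, noise_ref, order=8, lam=0.99, delta=0.01):
        """RLS adaptive noise canceller: predict the noise component in `noisy`
        from a reference noise signal and subtract it, leaving the speech."""
        w = np.zeros(order)               # adaptive filter weights
        P = np.eye(order) / delta         # inverse correlation matrix
        out = np.zeros(len(noisy))
        for n in range(order, len(noisy)):
            u = noise_ref[n - order:n][::-1]      # reference tap vector
            k = P @ u / (lam + u @ P @ u)         # RLS gain
            e = noisy[n] - w @ u                  # error = denoised sample
            w = w + k * e                         # weight update
            P = (P - np.outer(k, u @ P)) / lam    # inverse-correlation update
            out[n] = e
        return out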

RESULTS AND DISCUSSION Here, the performance of emotion-independent speech recognition is evaluated by considering ten utterances spoken by ten actors. The overall accuracies for emotion-independent speech recognition and speaker-independent speech recognition are 91% and 87% respectively when F16 noise is added to the test speeches.

CONCLUSION The performance of the speech and speaker recognition systems is found to be good, and is slightly lower for emotion recognition. This is probably due to the use of the same set of speeches from the same set of speakers in different emotions. The variance of the noise-reduced speech is almost the same as that of the clean speech. Features extracted from the noise-reduced speeches are applied to models trained on clean data, and performance is measured in terms of recognition accuracy.

Thank you!