Entropy and Dynamism Criteria for Voice Quality Classification Applications
Authors: Peter D. Kukharchik, Igor E. Kheidorov, Hanna M. Lukashevich, Denis L. Mitrofanov
Belarusian State University, Radiophysics Department, Minsk, Belarus
VOICE QUALITY: FUNCTIONS, ANALYSIS AND SYNTHESIS
ISCA Tutorial and Research Workshop, International Speech Communication Association
Geneva, August 27-29, 2003

Voice Quality Classification Applications
- Introduction
- System design
- Experiment
- Conclusion

Introduction
Audio is a large and extremely variable data class. The range of sounds is wide, from music genres to animal cries to synthesizer samples. Any of the above can and will occur in combination.

Existing Approaches
Signal Processing Techniques
- Spectrum
- Modulation spectrum
- Temporal information
Decision Making
- Bayesian Information Criterion (BIC)
- Log Likelihood Ratio
- Hidden Markov Model (HMM)

Block diagram of the proposed system
Processing blocks: Feature vector extraction, Neural network, Entropy & Dynamism, HMM
Data passed between blocks: Input Data (Wave file), Vectors (Mel Cepstra), Probability of Russian phonemes, Entropy and Dynamism, Segments

Definitions
Entropy and averaged entropy
Entropy is a measure of the uncertainty or disorder in a given distribution.
We use N=40.
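
The formulas on this slide did not survive in the transcript. The following is a minimal Python sketch of one common way to compute such quantities, not necessarily the authors' exact definition. It assumes the entropy is taken over the per-frame phoneme probabilities produced by the neural network, and that N is the number of frames in the averaging window. The function names and the trailing-window choice are also assumptions.

```python
import math

def frame_entropy(posteriors):
    """Entropy, in bits, of one frame's class-probability distribution."""
    # Zero-probability classes contribute nothing (the limit of p*log p is 0).
    return -sum(p * math.log2(p) for p in posteriors if p > 0.0)

def averaged_entropy(frames, n=40):
    """Average the per-frame entropy over a trailing window of up to n frames."""
    entropies = [frame_entropy(f) for f in frames]
    averaged = []
    for t in range(len(entropies)):
        window = entropies[max(0, t - n + 1): t + 1]
        averaged.append(sum(window) / len(window))
    return averaged
```

A flat distribution over the classes gives the maximum entropy, while a distribution concentrated on a single class gives zero.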

Definitions
Dynamism and average dynamism
Dynamism is a measure of the rate of change of a quantity.
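
The dynamism formula is also missing from the transcript. As a hedged illustration, the sketch below measures the rate of change as the squared difference between the class probabilities of consecutive frames, averaged over a trailing window. This particular form, the window length default and the function names are assumptions rather than something the slides state.

```python
def frame_dynamism(previous, current):
    """Squared change in class probabilities between two consecutive frames."""
    return sum((c - p) ** 2 for p, c in zip(previous, current))

def averaged_dynamism(frames, n=40):
    """Average the frame-to-frame dynamism over a trailing window of up to n values."""
    changes = [frame_dynamism(a, b) for a, b in zip(frames, frames[1:])]
    averaged = []
    for t in range(len(changes)):
        window = changes[max(0, t - n + 1): t + 1]
        averaged.append(sum(window) / len(window))
    return averaged
```

Under this definition, a signal whose class probabilities switch quickly from frame to frame scores high, and a signal with slowly varying probabilities scores close to zero.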

Feature Vector Extraction
We use 12 Mel cepstral coefficients computed in a 30 ms window with a 10 ms frame shift, on 4-15 min wave files of Russian speech, non-Russian speech and music.
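
To make the framing arithmetic concrete, here is a small Python sketch that cuts a signal into 30 ms windows with a 10 ms shift. The 16 kHz sample rate in the usage note is an assumption (the slides do not give one), as are the function names. The Mel cepstrum computation itself is not shown.

```python
def frame_count(num_samples, sample_rate, window_ms=30, shift_ms=10):
    """Number of complete analysis windows that fit in the signal."""
    window = sample_rate * window_ms // 1000
    shift = sample_rate * shift_ms // 1000
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // shift

def split_frames(samples, sample_rate, window_ms=30, shift_ms=10):
    """Cut the signal into overlapping windows; each would yield one feature vector."""
    window = sample_rate * window_ms // 1000
    shift = sample_rate * shift_ms // 1000
    return [samples[i:i + window]
            for i in range(0, len(samples) - window + 1, shift)]
```

At an assumed 16 kHz, one second of audio gives 98 frames of 480 samples each, and a 4-minute file gives 23,998 frames, each of which would be reduced to 12 coefficients.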

Hidden Markov Model
HMM states: S0, S1, S2, S3, S4, S5, S6
- Define an HMM for the signal, with one HMM state for every segment type we want to find
- Perform a Viterbi search for the optimal path using the probabilities from the previous step
- Take the moments at which the HMM state changes as the segment boundaries
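
The three steps above can be sketched in Python as a standard log-domain Viterbi search followed by boundary detection. The slides do not give the HMM topology, transition probabilities or emission model, so every number in the usage note is made up for illustration, and the function names are hypothetical.

```python
import math

def to_log(rows):
    """Convert a table of strictly positive probabilities to log probabilities."""
    return [[math.log(p) for p in row] for row in rows]

def viterbi(log_emis, log_trans, log_init):
    """Most likely state sequence; log_emis[t][s] is the log score of state s at frame t."""
    n_states = len(log_init)
    score = [log_init[s] + log_emis[0][s] for s in range(n_states)]
    back = []
    for t in range(1, len(log_emis)):
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor of state s at frame t.
            best = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            pointers.append(best)
            new_score.append(score[best] + log_trans[best][s] + log_emis[t][s])
        score = new_score
        back.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def boundaries(path):
    """Frame indices at which the decoded state changes."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]
```

For example, with two "sticky" states (self-transition 0.9), uniform initial probabilities, and per-frame scores [0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9], the decoded path is [0, 0, 1, 1] and the only boundary is at frame 2.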

Neural Network
Neural network for probability generation: rationale
- Neural networks can model probability distributions with high accuracy because they can approximate a large variety of functions
- If training does not get stuck in a local minimum, the outputs can be treated as class probabilities

Multilayer Perceptron
Neural network for probability generation: structure
Fully connected multilayer perceptron
- Input layer size equals the feature vector size
- Output layer size equals the number of phonemes (one output probability per phoneme)
- The number and sizes of hidden layers vary
- Tangent activation for hidden neurons
- Softmax activation for output neurons
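
The structure listed above can be illustrated with a forward pass in plain Python. This is a sketch under stated assumptions: "tangent" is read as the hyperbolic tangent, the layer sizes in the usage note are arbitrary, the weights would in practice come from training, and the function names are hypothetical.

```python
import math

def softmax(z):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def dense(x, weights, biases):
    """Fully connected layer: one weight row and one bias per output neuron."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def mlp_posteriors(x, hidden_layers, output_layer):
    """Forward pass: tanh hidden layers, softmax output layer."""
    h = x
    for weights, biases in hidden_layers:
        h = [math.tanh(v) for v in dense(h, weights, biases)]
    weights, biases = output_layer
    return softmax(dense(h, weights, biases))
```

With a 12-dimensional input (one value per Mel cepstral coefficient), any number of hidden layers can be passed as a list of (weights, biases) pairs. The output has one probability per class, which is the kind of per-frame distribution the entropy and dynamism features are computed from.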

Results - Music
Entropy histogram

Results - Russian Speech

Results - Foreign

Results - Russian and Foreign
Blue is Russian, pink is French

Results
Two Russian speakers (blue and brown) and Music (others)
Russian speaker (blue) and Music (pink)

Results - Pure Russian & “Czech” Russian
There is some difference even between native Russian speech and Russian spoken with a Czech accent.

Results
Entropy histograms of “normal” (brown) and “rough” (blue) French speech

Results
Entropy histograms for “normal” (brown), “rough” (blue) and “lips” (lips) French speech

Conclusion
Experiments show that Entropy and Dynamism features can be used successfully for automatic signal segmentation. Further research in this area can lead to better practical results.
Further research
- Parameter vectors, their size, and the number of context frames
- Specialized HMM structures for particular types of speech signal