Entropy and Dynamism Criteria for Voice Quality Classification Applications Authors: Peter D. Kukharchik, Igor E. Kheidorov, Hanna M. Lukashevich, Denis.


1 Entropy and Dynamism Criteria for Voice Quality Classification Applications. Authors: Peter D. Kukharchik, Igor E. Kheidorov, Hanna M. Lukashevich, Denis L. Mitrofanov (Belarusian State University, Radiophysics Department, Minsk, Belarus). Presented at Voice Quality: Functions, Analysis and Synthesis, ISCA Tutorial and Research Workshop (International Speech Communication Association), Geneva, August 27-29, 2003.

2 Voice Quality Classification Applications. Outline: Introduction; System design; Experiment; Conclusion.

3 Introduction. Audio is a large and extremely variable data class. The range of sounds is large, from music genres to animal cries to synthesizer samples. Any of the above can and will occur in combination.

4 Existing Approaches. Signal processing techniques: spectrum; modulation spectrum; temporal information. Decision making: Bayesian Information Criterion (BIC); log likelihood ratio; Hidden Markov Model (HMM).

5 Block diagram of the proposed system. Processing blocks: Feature vector extraction -> Neural network -> Entropy & Dynamism -> HMM. Data passed between blocks: Input data (wave file) -> Vectors (Mel cepstra) -> Probability of Russian phonemes -> Entropy and Dynamism -> Segments.

6 Definitions: entropy and averaged entropy. Entropy is a measure of the uncertainty or disorder in a given distribution. We use N=40.
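The formulas on this slide did not survive in the transcript. The following is a minimal sketch under a commonly used formulation, which is an assumption here and not confirmed by the slide. The instantaneous entropy of a frame is taken as h = -sum_k p_k log2 p_k over the phoneme probabilities p_k produced by the neural network, and the averaged entropy is the mean of h over a window of N frames. Reading N=40 as that window length is also an assumption, and the function names are hypothetical.

```python
import numpy as np

def frame_entropy(posteriors, eps=1e-12):
    """Entropy in bits of each row of a (frames, classes) probability matrix."""
    # Clip to avoid log(0); eps is an illustrative choice, not from the slides.
    p = np.clip(np.asarray(posteriors, dtype=float), eps, 1.0)
    return -np.sum(p * np.log2(p), axis=1)

def averaged_entropy(posteriors, n=40):
    """Mean entropy over a window of at most n frames around each frame."""
    h = frame_entropy(posteriors)
    out = np.empty_like(h)
    for i in range(len(h)):
        lo = max(0, i - n // 2)
        hi = min(len(h), lo + n)
        out[i] = h[lo:hi].mean()
    return out
```

A uniform distribution over 4 classes gives 2 bits per frame, while a distribution concentrated on one class gives an entropy close to 0.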

7 Definitions: dynamism and average dynamism. Dynamism is a measure of the rate of change of a quantity.
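The formula for dynamism is also missing from the transcript. As a hedged sketch, one common way to measure the rate of change of the phoneme probabilities is the squared difference between the probability vectors of consecutive frames, averaged over N frames. This formulation, the window length n=40, and the function names are assumptions for illustration only.

```python
import numpy as np

def frame_dynamism(posteriors):
    """Squared change of the probability vector between consecutive frames.

    Returns one value per frame transition: len(posteriors) - 1 entries.
    """
    p = np.asarray(posteriors, dtype=float)
    return np.sum(np.diff(p, axis=0) ** 2, axis=1)

def averaged_dynamism(posteriors, n=40):
    """Moving average of the dynamism over windows of n transitions."""
    d = frame_dynamism(posteriors)
    if d.size == 0:
        return d
    n = min(n, d.size)  # shrink the window for very short inputs
    return np.convolve(d, np.ones(n) / n, mode="valid")
```

Probabilities that stay constant from frame to frame give a dynamism of 0, and probabilities that flip completely between two classes give 2 per transition.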

8 Feature vector extraction. We use 12 Mel cepstral coefficients computed in a 30 ms window with a 10 ms frame shift, for 4-15 min wave files of Russian speech, non-Russian speech and music.
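The windowing described on this slide can be sketched as follows. Only the 30 ms window and the 10 ms shift come from the slide. The sampling rate is not stated, so the 16 kHz value in the usage note is purely an assumption, and `frame_signal` is a hypothetical helper. Computing the Mel cepstral coefficients from each frame is not shown.

```python
import numpy as np

def frame_signal(signal, sample_rate, window_ms=30, shift_ms=10):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    signal = np.asarray(signal, dtype=float)
    win = int(round(sample_rate * window_ms / 1000))
    hop = int(round(sample_rate * shift_ms / 1000))
    if len(signal) < win:
        return np.empty((0, win))
    # Number of complete windows that fit into the signal.
    count = 1 + (len(signal) - win) // hop
    # Row i holds samples i*hop .. i*hop + win - 1.
    idx = np.arange(win)[None, :] + hop * np.arange(count)[:, None]
    return signal[idx]
```

With an assumed 16 kHz sampling rate, each frame holds 480 samples and consecutive frames start 160 samples apart, so one second of audio yields 98 complete frames.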

9 Hidden Markov Model (diagram: HMM with states S0 to S6). Define an HMM for the signal, with one HMM state for every segment we want to find. Perform a Viterbi search for the optimal path using the probabilities from the previous step. Determine segment boundaries as the moments at which the HMM state changes.
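The Viterbi search and the boundary detection described on this slide can be sketched as below. This is a generic textbook Viterbi decoder in the log domain, not the authors' implementation. The emission, transition and initial scores are placeholders that would have to come from the earlier stages of the system, and it expects at least one frame.

```python
import numpy as np

def viterbi(log_emission, log_transition, log_initial):
    """Most likely state sequence for (frames, states) log emission scores."""
    n_frames, n_states = log_emission.shape
    score = log_initial + log_emission[0]
    back = np.zeros((n_frames, n_states), dtype=int)
    for t in range(1, n_frames):
        # candidate[i, j]: best score in state i at t-1, then a move to j.
        candidate = score[:, None] + log_transition
        back[t] = np.argmax(candidate, axis=0)
        score = candidate[back[t], np.arange(n_states)] + log_emission[t]
    # Trace the best path backwards from the best final state.
    path = np.empty(n_frames, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

def segment_boundaries(path):
    """Frame indices at which the decoded state changes."""
    path = np.asarray(path)
    return (np.flatnonzero(path[1:] != path[:-1]) + 1).tolist()
```

Transition probabilities that favor staying in the current state discourage very short segments, which is one reason an HMM is often preferred over frame-by-frame thresholding.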

10 Neural Network. Neural network for probability generation: grounds. Neural networks can model probability distributions with high accuracy due to their ability to approximate a large variety of functions. If the training of the neural network does not stop in a local minimum, the outputs can be considered as class probabilities.

11 Multilayer Perceptron. Neural network for probability generation: structure. Fully connected multilayer perceptron: input layer size equals the feature vector size; output layer size equals the number of phonemes (one probability per phoneme); the number and sizes of hidden layers vary; tangent activation for hidden neurons; softmax activation for output neurons.
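The structure on this slide can be illustrated with a small forward pass. This is a sketch only: "tangent activation" is read here as the hyperbolic tangent, which is an assumption, and the random initialization and the function names are hypothetical. Training is not shown.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - np.max(z, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def init_mlp(layer_sizes, seed=0):
    """Random weights and zero biases for a fully connected network."""
    rng = np.random.default_rng(seed)
    pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))
    weights = [rng.normal(0.0, 0.1, size=(a, b)) for a, b in pairs]
    biases = [np.zeros(b) for _, b in pairs]
    return weights, biases

def mlp_forward(x, weights, biases):
    """tanh on every hidden layer, softmax on the output layer."""
    h = np.asarray(x, dtype=float)
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ w + b)
    return softmax(h @ weights[-1] + biases[-1])
```

For example, `init_mlp([12, 32, 40])` builds a network with 12 inputs (the 12 Mel cepstral coefficients), one hidden layer of 32 neurons and 40 outputs. The hidden size and the output count are illustrative values, not taken from the slides. Each output row of `mlp_forward` is non-negative and sums to 1, so it can be passed directly to entropy and dynamism computations.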

12 Results: Music entropy histogram

13 Results - Russian Speech

14 Results - Foreign

15 Results - Russian and Foreign. Blue is Russian, pink is French.

16 Results. Two Russian speakers (blue and brown) and Music (others). Russian speaker (blue) and Music (pink).

17 Results: Pure Russian & “Czech” Russian. There is some difference even between native speech and Russian with a Czech accent.

18 Results: Entropy histograms of “normal” (brown) and “rough” (blue) French speech

19 Results: Entropy histograms for “normal” (brown), “rough” (blue) and “lips” (lips) French speech

20 Conclusion. Further research: parameter vectors, their size, and the number of context frames; specialized HMM structures for certain types of speech signals. Conclusion: Entropy and Dynamism features, as experiments show, can be successfully used for automatic signal segmentation. Further research in this area can lead to better practical results.

