Emotional Speech Analysis using Artificial Neural Networks
IMCSIT-AAIA, October 18-20, 2010, Wisla, Poland
Jana Tuckova & Martin Sramka
Department of Circuit Theory, CTU - FEE in Prague, Laboratory of Artificial Neural Network Applications
Overview
Acknowledgment: This work was supported by the Czech Science Foundation grant 102/09/0989.
- Introduction
- Method
  - Patterns based on time and frequency characteristics
  - Patterns based on musical theory
  - Combination of both previous approaches
- Experiments and Results
- Conclusion and future work
Introduction
Our aim: a classification of speech emotions.
Why ANN?
- The robustness of ANN solutions in real-world applications is a great advantage, for example in the processing of noisy signals.
- It is possible to treat various types of input data concurrently.
Introduction: Which way?
By a description of speech signals formulated by:
- standard speech processing methods
- music theory
- a combination of both methods
By an ANN approach:
- MLNN (multilayer neural network)
- KSOM (Kohonen self-organizing map)
Introduction
MLNN
- one hidden layer
- the input layer is given by the key linguistic parameters
- the outputs are the various classes of emotions
- the training algorithm: Scaled Conjugate Gradient, with a superlinear convergence rate
KSOM - SSOM
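The MLNN structure listed on this slide (an input vector of parameters, one hidden layer, one output per emotion class) can be sketched as a forward pass. This is only an illustration under stated assumptions: the layer sizes, the tanh and softmax activations, and the random weights are hypothetical and not taken from the slides, and Scaled Conjugate Gradient training is not shown.

```python
import math
import random

EMOTIONS = ["anger", "boredom", "pleasure", "sadness"]  # classes named in the slides

def init_layer(n_in, n_out, rng):
    """Return (weights, biases) with small random weights (illustrative only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def affine(weights, biases, x):
    """Compute W x + b using plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]

def softmax(z):
    """Numerically stable softmax, so the outputs sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, hidden, output):
    """One hidden tanh layer followed by a softmax output layer."""
    h = [math.tanh(v) for v in affine(*hidden, x)]
    return softmax(affine(*output, h))

rng = random.Random(0)
n_inputs, n_hidden = 12, 8  # assumed sizes, not taken from the slides
hidden = init_layer(n_inputs, n_hidden, rng)
output = init_layer(n_hidden, len(EMOTIONS), rng)

pattern = [rng.uniform(-1.0, 1.0) for _ in range(n_inputs)]  # stand-in input pattern
probs = forward(pattern, hidden, output)
predicted = EMOTIONS[probs.index(max(probs))]
```

With untrained weights the predicted class is arbitrary; the sketch only shows how one input pattern maps to four class scores.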
Introduction: KSOM - SSOM
- combines aspects of the VQ (vector quantization) method with the topology-preserving ordering of the quantization vectors
- only for well-known input data
- for well-known classes of input data
The database for ANN:
- 216 patterns for training
- 72 patterns for validation
- 72 patterns for test
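The 216 / 72 / 72 partition adds up to 360 patterns, a 60 / 20 / 20 split. A minimal sketch of such a partition follows; the slides do not say how the patterns were assigned to the subsets, so the seeded random shuffle and the function name `split_patterns` are assumptions.

```python
import random

def split_patterns(patterns, n_train=216, n_val=72, n_test=72, seed=0):
    """Shuffle the patterns and cut them into training, validation and test sets."""
    if n_train + n_val + n_test != len(patterns):
        raise ValueError("subset sizes must add up to the number of patterns")
    shuffled = list(patterns)              # copy, so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # assumed: plain random assignment
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

# Stand-in pattern identifiers; real patterns would be feature vectors with labels.
train, val, test = split_patterns(range(360))
```

A stratified split that keeps the four emotions balanced in every subset would be a natural refinement, but nothing in the slides indicates whether that was done.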
Corpus creation: Database of Utterances

Words (in Czech) | Translation
Jé. | Whoah.
Má? | Got it?
Nevím. | I don't know.
Vidíš? | Do you see?
Povídej! | Tell me!
Poezie. | Poetry.

Sentences (in Czech) | Translation
To mi nevadí. | I don't mind.
Neumím si to vysvětlit. | I can't explain this.
To bude světový rekord. | It will be a world record.
Jak se ti to líbí? | How do you like it?
Podívej se na nebe! | Look up at the heavens!
Až přijdeš, uvidíš. | When you come, you'll see.
Corpus creation
- The recorded emotional speech was subjectively evaluated by 4 persons.
- The final database contained 720 patterns: 360 patterns for one-word sentences and 360 patterns for multiword sentences.
- Emotions: 1 - anger (H), 2 - boredom (N), 3 - pleasure (R), 4 - sadness (S).
- The sentences were read by professional actors (2 female + 1 male).
- Speech recording: in a professional recording studio, format "wav", sampling frequency 44.1 kHz, 24 bit.
Method: The Patterns Based on Music Theory
The method is based on the idea of the musical interval: the frequency relation between a specific n-th tone and a reference tone.
Example: the quint (fifth) is the frequency ratio of the fifth tone to the first tone.
[Table: intervals (Int.) 1st to 8th, variants (Var.) Min / Maj, frequency ratios (FR); the numeric values are not recoverable.]
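Comparing a measured frequency ratio with musical intervals can be sketched as follows. The slide's own table of ratios did not survive, so this sketch assumes twelve-tone equal temperament, where each half tone is a factor of 2^(1/12); the authors may have used different reference ratios. The names `equal_tempered_ratio` and `nearest_interval` are hypothetical.

```python
import math

# Interval names for 0..12 half tones above the reference tone.
INTERVALS = [
    "unison", "minor second", "major second", "minor third", "major third",
    "perfect fourth", "tritone", "perfect fifth", "minor sixth", "major sixth",
    "minor seventh", "major seventh", "octave",
]

def equal_tempered_ratio(half_tones):
    """Frequency ratio of an interval spanning the given number of half tones."""
    return 2.0 ** (half_tones / 12.0)

def nearest_interval(f, f_ref):
    """Map the ratio f / f_ref to the closest interval on a logarithmic scale."""
    half_tones = 12.0 * math.log2(f / f_ref)
    # Clamp to one octave above the reference; wider or lower ratios are not handled.
    index = min(max(round(half_tones), 0), 12)
    return INTERVALS[index], half_tones

# A tone at 180 Hz over a 120 Hz reference has the ratio 3/2.
name, half_tones = nearest_interval(180.0, 120.0)
```

The distance is measured in half tones rather than in hertz because equal musical steps correspond to equal frequency ratios, not equal frequency differences.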
Method: The Patterns Based on Musical Theory
- The reference frequency (F0) is chosen from the features of each utterance.
- The frequency ratios are compared with the musical intervals.
- Circle of fifths: fifth = f3/f2, a geometric series.
- Tone affinity: decreases from n=1 to n=7, increases from n=8 to n=13.
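The circle of fifths as a geometric series can be illustrated by repeatedly multiplying a ratio by 3/2 and folding the result back into one octave. This is a sketch under assumptions: the pure fifth of 3/2, the octave folding, and the function name `circle_of_fifths` are not taken from the slides, and the tone-affinity measure itself is not modelled.

```python
def circle_of_fifths(n_tones=13, fifth=3.0 / 2.0):
    """Ratios of tones n = 1..n_tones built by stacking fifths,
    each folded back into the octave range [1, 2)."""
    ratios = []
    ratio = 1.0
    for _ in range(n_tones):
        ratios.append(ratio)
        ratio *= fifth          # next term of the geometric series
        while ratio >= 2.0:     # fold back into a single octave
            ratio /= 2.0
    return ratios

ratios = circle_of_fifths()
```

After twelve fifths the series returns close to, but not exactly to, the starting tone: the thirteenth ratio is 3^12 / 2^19, about 1.0136, which is the Pythagorean comma.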
Experimental Results
[Figure: U-matrices for one-word sentences and multi-word sentences. Class labels: H - anger, N - boredom, R - pleasure, S - sadness.]
Conclusion: music theory
Comparison to some publications, classification success:
- 54-64% standard classifier
- 81% ANN; hight note versus 12 half tones; Korean language
Our results, classification success:
- 74% (MLNN)
- QE / TE (SSOM) for one-word and multiword sentences: [values not recoverable]
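QE and TE are read here as the quantization error and topographic error commonly reported for self-organizing maps. The sketch below shows one common way to compute them; the tiny codebook, the sample values, and the adjacency rule (grid units count as neighbours when they differ by at most one step in each grid direction) are assumptions, not the authors' setup.

```python
import math

def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def two_best_units(sample, codebook):
    """Grid coordinates of the best and second-best matching units."""
    ranked = sorted(codebook, key=lambda unit: distance(sample, codebook[unit]))
    return ranked[0], ranked[1]

def quantization_error(samples, codebook):
    """Mean distance between each sample and its best matching unit."""
    total = 0.0
    for sample in samples:
        best, _ = two_best_units(sample, codebook)
        total += distance(sample, codebook[best])
    return total / len(samples)

def topographic_error(samples, codebook):
    """Fraction of samples whose two best units are not grid neighbours."""
    errors = 0
    for sample in samples:
        (r1, c1), (r2, c2) = two_best_units(sample, codebook)
        if max(abs(r1 - r2), abs(c1 - c2)) > 1:
            errors += 1
    return errors / len(samples)

# Hypothetical 1 x 3 map whose weights are deliberately out of order.
codebook = {(0, 0): [0.0], (0, 1): [2.0], (0, 2): [1.0]}
samples = [[0.4], [1.9]]
qe = quantization_error(samples, codebook)
te = topographic_error(samples, codebook)
```

A low QE means the map units lie close to the data; a low TE means that units which are close in the input space are also close on the map grid.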
Conclusion: future work
Our effort in future work:
- ANN application in prosody modelling: we want to apply the results of the described experiments with emotional speech to improve the naturalness of synthetic speech.
- ANN application in the analysis of children's disordered speech (developmental dysphasia).
Conclusion: future work
These different application domains influence the database creation:
- Multiword sentences are more suitable for prosody modelling.
- One-word sentences are suitable for the analysis of children's disordered speech. Why? A speech disorder often manifests itself as an inability to pronounce whole sentences.
Thank you for your attention.