Dr. O. Dakkak & Dr. N. Ghneim: HIAST M. Abu-Zleikha & S. Al-Moubyed: IT fac., Damascus U. Prosodic Feature Introduction and Emotion Incorporation in an.

2 Dr. O. Dakkak & Dr. N. Ghneim: HIAST. M. Abu-Zleikha & S. Al-Moubyed: IT Faculty, Damascus University. Prosodic Feature Introduction and Emotion Incorporation in an Arabic TTS. Presented by Dr. O. Al Dakkak.

3 Outline: Arabic TTS; Why Prosody Generation?; Prosody Analysis and Rule Extraction; Emotion Inclusion; Results; Conclusion

4 Arabic Text-to-Speech System –Arabic Text-to-Phonemes (ATOPH), including the open /E/ and /O/ phonemes and emphatic vowels –MBROLA diphone units are used to synthesize speech until our semi-syllables are ready (the corpus is currently being recorded) –Prosody generation and emotion inclusion

5 Arabic Text-to-Speech System –MBROLA synthesizes phonemes with control over duration and the F0 contour (given as a set of segments); we implemented a tool to control the amplitude. –Absent phonemes are replaced by the nearest available phonemes. –This makes it possible to generate and test prosody.
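As an illustration of the duration and F0 control mentioned on this slide, here is a minimal sketch that writes MBROLA input in the .pho format, where each line holds a phoneme, its duration in milliseconds, and optional (position %, F0 Hz) pairs. The helper names `pho_line` and `to_pho` and the phoneme symbols in the example are hypothetical; the actual symbols depend on the MBROLA voice database in use.

```python
def pho_line(phoneme, duration_ms, pitch_points=()):
    """Format one .pho line: phoneme, duration (ms), then (position %, F0 Hz) pairs."""
    fields = [phoneme, str(int(round(duration_ms)))]
    for position_pct, f0_hz in pitch_points:
        fields += [str(int(round(position_pct))), str(int(round(f0_hz)))]
    return " ".join(fields)


def to_pho(segments):
    """Join (phoneme, duration_ms[, pitch_points]) tuples into .pho file text."""
    return "\n".join(pho_line(*segment) for segment in segments) + "\n"


# Hypothetical segment list: "_" is MBROLA's silence symbol,
# the other symbols are placeholders for illustration only.
example = to_pho([
    ("_", 100),
    ("m", 70, [(50, 120)]),
    ("a", 110, [(0, 125), (100, 110)]),
    ("_", 100),
])
```

Each pitch pair places an F0 target at a percentage of the phoneme's duration, which is how a stylized contour can be handed to the synthesizer segment by segment.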

6 Why Prosody Generation? Increases intelligibility and expressiveness. Provides the context in which speech is interpreted. Signals speaker intentions (special aids). Man-machine communication (airports, ...). Dubbing.

7 Methodology Based on the punctuation marks (‘,’, ‘.’, ‘?’ and ‘!’), we classify sentences into continuous affirmation, long affirmation, interrogative and exclamation, respectively. We record a corpus and analyze its sentences to produce F0 and intensity curves. A statistical study of the curves, followed by rule extraction, allows the curves to be generated automatically.
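The punctuation-based classification can be sketched as follows. This is only an illustration of the mapping stated on the slide, not the authors' implementation; treating the Arabic comma (،) and question mark (؟) as equivalents of ‘,’ and ‘?’ is an assumption, and the function name `split_and_classify` is hypothetical.

```python
import re

# Mapping taken from the slide: closing punctuation mark -> sentence type.
SENTENCE_TYPES = {
    ",": "continuous affirmation",
    ".": "long affirmation",
    "?": "interrogative",
    "!": "exclamation",
}

# Assumption: Arabic punctuation is folded onto the four marks above.
ARABIC_EQUIVALENTS = {"،": ",", "؟": "?"}


def split_and_classify(text):
    """Split text at the four marks and label each chunk by its closing mark."""
    normalized = "".join(ARABIC_EQUIVALENTS.get(ch, ch) for ch in text)
    units = []
    for match in re.finditer(r"([^,.?!]+)([,.?!])", normalized):
        chunk = match.group(1).strip()
        if chunk:
            units.append((chunk, SENTENCE_TYPES[match.group(2)]))
    return units
```

Text that is not followed by one of the four marks is ignored in this sketch; a real front end would need a policy for such trailing fragments.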

8 The Corpus A pre-recorded corpus of 12 short sentences for each type, from 5 speakers (4 male, 1 female); each sentence has at most 14 phonemes. A further 10 sentences of variable length were recorded, pronounced by 3 speakers: –short: 4-20 phonemes –medium: 20-40 phonemes –long: more than 40 phonemes. The F0 and intensity curves were available for the pre-recorded corpus and were computed for the additional recordings.

9 Rule Extraction Redefinition of the length concept using fuzzy sets (figure not preserved in this transcript).
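The figure that defined the fuzzy sets is not available here, so the following is only a sketch of what a fuzzy redefinition of sentence length could look like. The crisp boundaries at 20 and 40 phonemes come from the corpus slide; the overlap width of 5 phonemes on each side and the linear membership shape are assumptions made for illustration.

```python
def rising(x, start, end):
    """Membership that is 0 up to `start`, rises linearly, and is 1 from `end`."""
    if x <= start:
        return 0.0
    if x >= end:
        return 1.0
    return (x - start) / (end - start)


def falling(x, start, end):
    """Mirror of `rising`: 1 up to `start`, falls linearly to 0 at `end`."""
    return 1.0 - rising(x, start, end)


def length_memberships(n_phonemes, overlap=5):
    """Degrees of membership in short / medium / long (overlap is an assumed value)."""
    low_start, low_end = 20 - overlap, 20 + overlap      # short <-> medium transition
    high_start, high_end = 40 - overlap, 40 + overlap    # medium <-> long transition
    return {
        "short": falling(n_phonemes, low_start, low_end),
        "medium": min(rising(n_phonemes, low_start, low_end),
                      falling(n_phonemes, high_start, high_end)),
        "long": rising(n_phonemes, high_start, high_end),
    }
```

With these assumed breakpoints, a 20-phoneme sentence is half short and half medium, so prosody rules for both classes could be blended instead of switching abruptly at the boundary.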

10 Rule Extraction Curve stylization after stochastic analysis (example figure not preserved in this transcript).

11 Emotion Inclusion Recording a corpus of sentences for 5 emotions (joy, anger, sadness, fear and surprise) together with their emotionless versions (20 sentences per emotion). Measuring the prosodic features F0, duration and intensity, and their variations, with Praat. Extracting rules to automatically produce emotion in synthetic speech. Validating the rules.

12 أَهُوَ ذَنْبِي أَنْ أَتَحَمَّلَ أَنَا ذلك؟ “Is it my fault to bear it?” Pitch: variation of F0. Range: difference between F0max and F0min. F0 average: mean value. Contour slope: shape of the contour slope (range variation). Variability: degree of variability (high, low, ...). Jitter: irregularities between successive glottal pulses.
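The range, average and jitter definitions on this slide can be made concrete with a short sketch. The talk used Praat for its measurements; the code below is not Praat's implementation, only one common way to compute these quantities. Coding unvoiced frames as 0 Hz and defining jitter as the mean absolute difference between consecutive glottal periods divided by the mean period are assumptions of this example.

```python
from statistics import mean


def f0_features(f0_hz):
    """Mean and range of an F0 track; frames with F0 <= 0 are treated as unvoiced."""
    voiced = [f for f in f0_hz if f > 0]
    return {
        "mean": mean(voiced),
        "range": max(voiced) - min(voiced),  # F0max - F0min, as on the slide
    }


def local_jitter(periods_s):
    """Mean absolute difference of consecutive periods, relative to the mean period."""
    diffs = [abs(b - a) for a, b in zip(periods_s, periods_s[1:])]
    return mean(diffs) / mean(periods_s)
```

A perfectly regular pulse train gives a jitter of 0, and the value grows as successive periods become more irregular.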

13 Example: Anger Emotion F0 mean: +40% to 75%; F0 range: +50% to 100%; F0 at vowels and semi-vowels: +30%; F0 slope: +; Speech rate: +; Silence rate: -; Duration of vowels and semi-vowels: +; Intensity mean: +; Intensity monotonic with F0; Others: F0 variability: +, F0 jitter: +

14 Analysis & Rule Extraction: Anger (figure panels: with emotion, emotionless)

15 Emotion Synthesis: Anger F0 mean: +30%; F0 range: +30%; F0 at vowels and semi-vowels: +100%; Speech rate: +75%-80%; Duration of vowels and semi-vowels: +30%; Duration of fricatives: +20%
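To show how percentage rules of this kind might be applied to a neutral utterance, here is a minimal sketch covering the F0 mean, F0 range and segment duration rules for anger. It is not the EmoGen implementation: the function names, the phoneme class sets, and the choice to expand the range around the mean are all assumptions, and the vowel-specific F0 rule and the speech-rate rule are left out because the slide does not say how they combine with the others.

```python
from statistics import mean

# Hypothetical phoneme classes, for illustration only.
VOWELS_AND_SEMIVOWELS = {"a", "i", "u", "w", "j"}
FRICATIVES = {"f", "s", "z", "S", "h"}


def scale_f0(f0_hz, mean_gain=0.30, range_gain=0.30):
    """Raise the F0 mean and widen the excursions around it by the given gains."""
    old_mean = mean(f0_hz)
    new_mean = old_mean * (1 + mean_gain)
    return [new_mean + (f - old_mean) * (1 + range_gain) for f in f0_hz]


def scale_duration(phoneme, duration_ms):
    """Lengthen vowels/semi-vowels by 30% and fricatives by 20% (slide values)."""
    if phoneme in VOWELS_AND_SEMIVOWELS:
        return duration_ms * 1.30
    if phoneme in FRICATIVES:
        return duration_ms * 1.20
    return duration_ms
```

When both gains are equal, as they are for anger on this slide, the transformation reduces to multiplying every F0 value by 1.3; keeping the two gains separate lets the same function serve rules where mean and range change by different amounts.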

16 Synthetic Examples (each in an emotionless and a with-emotion version) -Anger: “Who do you think you are?” -Joy: “No more clouds in the sky” -Sadness: “I’m so sad today” -Fear: “What a scary scene!” -Surprise: “What a beautiful scene!”

17 EmoGen (diagram labels) Normal Text to MBROLA Text Converter (NTMTC); Prosody Generator; Emotion Generator; Mbrola Player interface; Input Text; Voice Interface; Text Editor; Speech and emotion properties

18 Results Five sentences for each emotion were synthesized and listened to by 10 people. Each listener reported the perceived emotion for each sentence (we did not provide our list of emotions).
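A listening test of this kind is typically summarized by tallying, for each intended emotion, which emotions the listeners reported. The sketch below shows one way to do that; the function names are hypothetical, and lower-casing the free-form answers is an assumed normalization step, since listeners were not given a fixed list of labels.

```python
from collections import Counter, defaultdict


def confusion_counts(responses):
    """Tally (intended, perceived) pairs into a table of perceived-label counts."""
    table = defaultdict(Counter)
    for intended, perceived in responses:
        table[intended][perceived.strip().lower()] += 1
    return table


def recognition_rate(table, emotion):
    """Fraction of responses for `emotion` whose perceived label matches it."""
    total = sum(table[emotion].values())
    return table[emotion][emotion] / total if total else 0.0
```

With 5 sentences per emotion and 10 listeners, each intended emotion would contribute 50 (intended, perceived) pairs to the table.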

19 Results

20 Conclusion An automated tool for emotional Arabic synthesis has been developed. The prosodic model proposed and tested in this work proved successful, especially in conversational contexts. Further work will follow to include other emotions: disgust, annoyance,…

