Prosody Control of a Speech Synthesizer (Řízení prozodie řečového syntezátoru)
Václav Šebesta, Ústav informatiky AV ČR, v.v.i., vasek@cs.cas.cz
MFF UK, 11.12.2009

Content of contribution
1. Introduction
2. Speech synthesizer
3. Phonetic and phonological features of speech units
4. GUHA method
5. Sensitivity approach
6. Weight approach
7. Combination of the above-mentioned approaches
8. Experimental results
9. Listening tests

Introduction

The text-to-speech synthesizer for the Czech language is based on the concatenation of elementary speech units (phonemes, diphones or triphones). The speech output of such a synthesizer, described by the fundamental frequency F0 and the duration D of each speech unit, is monotonous and sounds very synthetic. The naturalness of speech depends considerably on the implementation of the prosodic features. This contribution presents several possible ways of selecting suitable parameters for the training of F0 and D, and compares them.

Figure: block diagram of the speech synthesizer. Blocks: written text, phonetic transcription, prosody control, search of F0 in database, database of speech units, speech production, speech. Signal labels: F FUN, F0 PROS, duration.

Figure: Training of ANN. Blocks: speech signal database, segmentation to speech units, F0 and D (target values); text database, segmentation, transcription, phonetic and phonological properties, input data for ANN; input text, segmentation, transcription, ANN, F0, D, synthesizer, speech.

Figure: Prosody control by ANN. Blocks: speech signal database, segmentation to speech units, F0 and D (target values); text database, segmentation, transcription, phonetic and phonological properties, input data for ANN; input text processing, segmentation, transcription, ANN, F0, D, synthesizer, speech.

Phonetic and phonological properties of speech units and the influence of their context

Parameters      Property
P1, P11, P21    Silent pause identification
P2, P12, P22    Stress unit identification
P3, P13, P23    Syllable nucleus identification
P4, P14, P24    Punctuation mark identification
P5, P15, P25    Phoneme identification
P6, P16, P26    Height of vowel
P7, P17, P27    Length of vowel
P8, P18, P28    Voicing of consonant
P9, P19, P29    Creation mode (manner of articulation) of consonant
P10, P20, P30   Number of speech units in the word

Pruning according to the GUHA method

The GUHA method may be used to determine relations in experimental data. The processed data form a rectangular binary matrix of attributes. The rows correspond to the different objects (speech units). The columns correspond to the investigated parameters (phonetic and phonological parameters, e.g. type of phoneme, length of word, pause identification, stress unit identification, etc.).

Four-fold table of two attributes A and S:

          S        Non S
A         a        b        a + b
Non A     c        d        c + d
          a + c    b + d    n = a + b + c + d
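The counts a, b, c and d of the four-fold table can be obtained directly from two binary columns of the attribute matrix. The following is a minimal sketch, not the author's implementation; the function name `fourfold_table` and the toy columns `is_stressed` and `high_f0` are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def fourfold_table(col_a: Sequence[int], col_s: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (a, b, c, d) for binary columns A and S over the same objects."""
    if len(col_a) != len(col_s):
        raise ValueError("both columns must describe the same objects")
    a = b = c = d = 0
    for x, y in zip(col_a, col_s):
        if x and y:
            a += 1      # A and S
        elif x:
            b += 1      # A and not S
        elif y:
            c += 1      # not A and S
        else:
            d += 1      # neither A nor S
    return a, b, c, d


# Hypothetical toy data: 8 speech units, two binary attributes.
is_stressed = [1, 1, 1, 0, 0, 1, 0, 0]
high_f0 = [1, 1, 0, 0, 1, 1, 0, 0]
table = fourfold_table(is_stressed, high_f0)
print(table)
```

The marginal sums a + b, c + d, a + c and b + d follow from the returned tuple, and n is simply the number of rows.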

GUHA method

Quantifier FIMPLE (Found almost implication quantifier) is valid iff

$$\frac{a}{a+b} \ge \mathrm{FBOUND}.$$

Quantifier LIMPLE (Lower critical implication quantifier) is valid iff

$$\sum_{i=a}^{a+b} \binom{a+b}{i}\, p^{i} (1-p)^{a+b-i} < \mathrm{LBOUND}.$$

Quantifier FISCHER (Fisher test) is valid iff $a \ge a_{\min}$, $a\,d > b\,c$ and

$$\sum_{i=a}^{\min(a+b,\,a+c)} \frac{\binom{a+c}{i}\binom{b+d}{a+b-i}}{\binom{n}{a+b}} \le \alpha_{\mathrm{Fisch}}.$$
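As a rough illustration, the three quantifiers can be evaluated from the counts (a, b, c, d) with the Python standard library alone. The sketch assumes that the comparisons read a/(a+b) >= FBOUND, binomial tail < LBOUND and hypergeometric tail <= alpha_Fisch; the function names, the default `a_min=1` and the example counts are hypothetical and not taken from the talk.

```python
from math import comb


def fimple(a: int, b: int, fbound: float) -> bool:
    """FIMPLE: the share of A-objects that also have S reaches FBOUND."""
    return (a + b) > 0 and a / (a + b) >= fbound


def limple(a: int, b: int, p: float, lbound: float) -> bool:
    """LIMPLE: binomial upper tail P(X >= a), X ~ Bin(a + b, p), is below LBOUND."""
    r = a + b
    tail = sum(comb(r, i) * p**i * (1 - p) ** (r - i) for i in range(a, r + 1))
    return tail < lbound


def fisher(a: int, b: int, c: int, d: int, alpha: float, a_min: int = 1) -> bool:
    """FISCHER: a >= a_min, a*d > b*c and the hypergeometric tail is at most alpha."""
    if a < a_min or a * d <= b * c:
        return False
    n, r, k = a + b + c + d, a + b, a + c
    tail = sum(comb(k, i) * comb(n - k, r - i) for i in range(a, min(r, k) + 1)) / comb(n, r)
    return tail <= alpha


# Hypothetical four-fold table: a=9, b=1, c=1, d=9.
print(fimple(9, 1, fbound=0.8))
print(limple(9, 1, p=0.8, lbound=0.5))
print(fisher(9, 1, 1, 9, alpha=0.01))
```

Exact integer binomial coefficients from `math.comb` avoid overflow in the sums; only the final ratio is converted to a float.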

GUHA method

It is possible to define different levels of hypotheses, e.g.:
- "weak hypotheses": p = 0.8, LBOUND = 0.5, $\alpha_{\mathrm{Fisch}} = 10^{-20}$
- "stronger hypotheses": p = 0.9, LBOUND = 0.1, $\alpha_{\mathrm{Fisch}} = 10^{-25}$
- "strong hypotheses": p = 0.95, LBOUND = 0.01, $\alpha_{\mathrm{Fisch}} = 10^{-30}$
- "extremely strong hypotheses": p = 0.99, LBOUND = 0.01, $\alpha_{\mathrm{Fisch}} = 10^{-35}$

Pruning according to the sensitivity approach

The input parameters that can be omitted are determined by comparing the sums of the absolute values of the derivatives of the output signals.
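One possible way to compute such sensitivities is sketched below. It assumes a small feed-forward net with one tanh hidden layer and linear outputs, and it approximates the derivatives of the outputs with respect to each input by central finite differences; the architecture, the weights, the samples and all function names are hypothetical, and the exact formula used in the talk is not reproduced here.

```python
import math
from typing import List

Matrix = List[List[float]]


def mlp(x: List[float], w_hidden: Matrix, w_out: Matrix) -> List[float]:
    """Forward pass: tanh hidden layer followed by a linear output layer."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]


def input_sensitivities(samples: Matrix, w_hidden: Matrix, w_out: Matrix,
                        eps: float = 1e-5) -> List[float]:
    """Sum of |d output / d input_i| over all samples and outputs, per input i."""
    sens = [0.0] * len(samples[0])
    for x in samples:
        for i in range(len(x)):
            plus, minus = list(x), list(x)
            plus[i] += eps
            minus[i] -= eps
            y_plus = mlp(plus, w_hidden, w_out)
            y_minus = mlp(minus, w_hidden, w_out)
            sens[i] += sum(abs(p - m) / (2 * eps) for p, m in zip(y_plus, y_minus))
    return sens


# Hypothetical net: 3 inputs, 2 hidden neurons, 2 outputs (e.g. F0 and D).
# The third input has zero weights, so the outputs do not depend on it.
w_hidden = [[0.9, -0.4, 0.0], [0.3, 0.8, 0.0]]
w_out = [[1.0, -0.5], [0.2, 0.7]]
samples = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]

sens = input_sensitivities(samples, w_hidden, w_out)
candidate = min(range(len(sens)), key=lambda i: sens[i])
print(sens, "-> input with the lowest sensitivity:", candidate)
```

Inputs with the smallest sums are the natural candidates for omission; where to place the threshold is a separate decision that this sketch does not make.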

Standard pruning approach according to the weights

The input parameters that can be omitted are determined by comparing the sums of the absolute values of the weights, taken over all inputs of a neuron.
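A minimal sketch of a weight-based ranking follows. It assumes that the relevant quantity for input i is the sum of the absolute values of the weights connecting that input to all first-layer neurons; this reading, the function name and the weight values are assumptions made for illustration, not the formula from the slide.

```python
from typing import List


def input_weight_sums(w_hidden: List[List[float]]) -> List[float]:
    """For each input i, sum |w[j][i]| over all first-layer neurons j."""
    n_inputs = len(w_hidden[0])
    return [sum(abs(row[i]) for row in w_hidden) for i in range(n_inputs)]


# Hypothetical first-layer weights: 3 neurons (rows) x 3 inputs (columns).
w_hidden = [
    [0.9, -0.4, 0.01],
    [0.3, 0.8, -0.02],
    [-0.5, 0.1, 0.0],
]

sums = input_weight_sums(w_hidden)
# Inputs ordered from the weakest to the strongest total connection.
ranking = sorted(range(len(sums)), key=lambda i: sums[i])
print(sums, ranking)
```

Unlike the sensitivity criterion, this ranking needs no training data, only the trained weights, which is why the two criteria can disagree.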

Combination of the three above-mentioned approaches

If at least two of the three approaches recommend pruning an input parameter of the proposed ANN (for F0 and D), that input parameter is pruned.

An ANN can be pruned by many known methods:
- Mozer, Smolensky (1989): "sensitivity calculation"
- Karnin (1990): "sensitivity of error function"
- Le Cun et al. (1990): "saliency"
- Chauvin (1989): "penalty terms"
- Weigend, Rumelhart, Huberman (1991)
- Sietsma, Dow (1991): "heuristics"
- Šebesta (1993): "statistical derivatives"
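The two-out-of-three rule for combining the GUHA, sensitivity and weight criteria amounts to a simple vote. The sketch below assumes each criterion yields a set of parameter names recommended for pruning; the function name and the placeholder names "A" to "D" are hypothetical and do not correspond to the results of the talk.

```python
from typing import Iterable, List


def combine_votes(guha: Iterable[str], sensitivity: Iterable[str],
                  weights: Iterable[str], min_votes: int = 2) -> List[str]:
    """Return the parameters recommended for pruning by at least min_votes criteria."""
    votes = [set(guha), set(sensitivity), set(weights)]
    candidates = set().union(*votes)
    return sorted(p for p in candidates if sum(p in v for v in votes) >= min_votes)


# Hypothetical recommendations of the three criteria.
pruned = combine_votes(
    guha={"A", "B"},
    sensitivity={"B", "C"},
    weights={"B", "C", "D"},
)
print(pruned)
```

Here "B" collects three votes and "C" two, so both are pruned, while "A" and "D" are kept because only one criterion recommends each.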

Parameters P1 to P10: pruning recommendations by criterion (GUHA, Sensitivity, Weights)
Legend: X = F0, + = D, [symbol missing] = can be omitted

#     Parameter                          Marks (column positions not preserved)
P1    Silent pause identification        +
P2    Stress unit identification         X +
P3    Syllable nucleus identification    X +
P4    Punctuation mark identification    + X +
P5    Phoneme identification             X +
P6    Height of vowel                    X +
P7    Length of vowel                    X
P8    Voice of consonant                 X +
P9    Creation mode for consonant        X + +
P10   Number of phonemes                 X +

Parameters P11 to P20: pruning recommendations by criterion (GUHA, Sensitivity, Weights)
Legend: X = F0, + = D, * = common, [symbol missing] = can be omitted

#     Parameter                          Marks (column positions not preserved)
P11   Silent pause identification        X +
P12   Stress unit identification         X + X +
P13   Syllable nucleus identification    X + + +
P14   Punctuation mark identification    X +
P15   Phoneme identification             (none)
P16   Height of vowel                    X +
P17   Length of vowel                    X X X +
P18   Voice of consonant                 (none)
P19   Creation mode for consonant        X + X +
P20   Number of phonemes                 (none)

Parameters P21 to P30: pruning recommendations by criterion (GUHA, Sensitivity, Weights)
Legend: X = F0, + = D, * = common, [symbol missing] = can be omitted

#     Parameter                          Marks (column positions not preserved)
P21   Silent pause identification        X +
P22   Stress unit identification         (none)
P23   Syllable nucleus identification    + +
P24   Punctuation mark identification    X + X + X +
P25   Phoneme identification             (none)
P26   Height of vowel                    X X
P27   Length of vowel                    X + X +
P28   Voice of consonant                 + X +
P29   Creation mode for consonant        X +
P30   Number of phonemes                 X +


Histogram of the number of sentences with the best prosody as a function of the number of input neurons

Sentence: Repeat the entire operation once again! (test sentence)

Sentence: Do you really want to finish? (training sentence)


Listening test of synthetic speech
1. Prosody controlled by the non-pruned neural net.
2. Prosody controlled by the neural net pruned according to the weights.
3. Prosody controlled by the neural net pruned according to the GUHA method.
4. Prosody controlled by the neural net pruned according to the sensitivities.
5. Prosody controlled by the optimal values of F0 and durations.
6. Direct speech.

Test sentences (variants 1 to 6):
1. Weather forecast for the night and tomorrow. (Předpověď počasí na noc a zítřek.)
2. Eastern wind, two to five meters per second. (Východní vítr dva až pět metrů za sekundu.)
3. Pressure tendency: weak increase, weak decrease in the afternoon. (Tlaková tendence: slabý vzestup, odpoledne slabý pokles.)
4. The maximal value of the UV index is five point eight. (Maximální hodnota UV-indexu pět celých osm.)
5. That was the news of Czech Radio 1, Radiožurnál. (To byly zprávy Českého rozhlasu 1 – Radiožurnálu.)
