Data-driven approach to rapid prototyping Xhosa speech synthesis Albert Visagie Justus Roux Centre for Language and Speech Technology Stellenbosch University.

2 Data-driven approach to rapid prototyping Xhosa speech synthesis Albert Visagie Justus Roux Centre for Language and Speech Technology Stellenbosch University South Africa

3 Introduction Japan-South African Intergovernmental Science and Technology Cooperation Programme. Goals: –Understand what is needed from a linguistic and technology standpoint. –Build a text-analysis front-end. –Experimental platform.

4 Outline Xhosa: –orthography, –phonetics, –tone Approach: –Text analysis, –HTS.

5 Xhosa Xhosa is spoken in South Africa by about 8 million people. One of the official languages of South Africa. Writing system is relatively young, and based on English letters. Many dialects. Borrowed clicks from Khoisan.

6 Xhosa: Orthography Agglutinative language. Nouns: –15 classes (including plural & singular). –Nouns affixed for diminutive. Verbs: –Verbs affixed according to subject, tense, negative etc. Examples: teach: -fund-; preacher (teacher): umfundisi = u + m(u) + fund + is + i; small preacher: umfundisana = u + m(u) + fund + is + ana; He/she will teach them: uzakubafundisa = u + za + ku + ba + fund + is + a
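
The decompositions on this slide can be checked mechanically. The following is a minimal Python sketch, not part of the original presentation: the function name `compose` is hypothetical, and it assumes that material in parentheses, such as the (u) in m(u), is elided in the written word.

```python
import re


def compose(decomposition: str) -> str:
    """Join a '+'-separated morpheme string into a surface word.

    Assumption: anything in parentheses, e.g. the (u) in m(u),
    is dropped in the surface form.
    """
    morphemes = [m.strip() for m in decomposition.split("+")]
    surface = [re.sub(r"\([^)]*\)", "", m) for m in morphemes]
    return "".join(surface)


# The three examples from the slide: surface word -> decomposition.
slide_examples = {
    "umfundisi": "u + m(u) + fund + is + i",
    "umfundisana": "u + m(u) + fund + is + ana",
    "uzakubafundisa": "u + za + ku + ba + fund + is + a",
}

for word, parts in slide_examples.items():
    print(word, compose(parts) == word)
```

Going the other way, from a surface word to its morphemes, is the hard and ambiguous direction, and this sketch does not attempt it.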

7 Xhosa: Phonetics Consonants: Implosive /b/ Ejectives and aspirated versions of stops. 15 Clicks Vowels Five basic vowels, including long versions.

8 Xhosa: Tone According to the literature, it’s a tone language. High, Low, and Falling tones. Recent dictionary: has tone marked for root morphemes, rules can be constructed to predict movement under morphological composition. Recent work: –Downing, Roux, argue for accent. –Kuun: Statistical experiment suggests highly regular structure. Observed regularity on pitch rises and duration increase gives a simple method to use in a first prototype.

9 Approach Focus on language dependent components: –Build the text analyser, –use an existing synthesiser. Choice: HTS 2.0 –Model driven, trainable synthesiser. –Contains language independent F0 and duration models –Good use of synthesis database by predicting spectrum, F0 and segment duration separately.

10 HTS

11 HTS: Symbolic Features Each segment of audio (HMM state) is labelled according to its linguistic context Examples: Phonetic context: labels of preceding and following phones. Parts-of-speech. Stress or canonical tone. Counting.
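
To make the idea of context labelling concrete, here is a small Python sketch that attaches the preceding phone, the following phone, a part-of-speech tag and a position count to each phone. The label layout, the `sil` padding symbol and the function name `context_labels` are illustrative assumptions; this is not the actual HTS label format.

```python
def context_labels(phones, pos_tag="unknown"):
    """Build one context-dependent label per phone.

    Each label carries the previous and next phone, a part-of-speech
    tag, and the phone's position within the word (counting).
    The 'prev-cur+next/pos:.../idx:i_n' layout is a made-up format
    for illustration only.
    """
    padded = ["sil"] + list(phones) + ["sil"]  # pad with silence at both ends
    total = len(phones)
    labels = []
    for i, phone in enumerate(phones):
        prev_phone = padded[i]      # phone before the current one
        next_phone = padded[i + 2]  # phone after the current one
        labels.append(
            f"{prev_phone}-{phone}+{next_phone}/pos:{pos_tag}/idx:{i + 1}_{total}"
        )
    return labels


for label in context_labels(["f", "u", "n", "d", "a"], pos_tag="verb"):
    print(label)
```

A trainable synthesiser can then cluster models by asking questions about these fields, which is why richer labels from the text analyser translate directly into better-informed models.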

12 Text Analyser Components Components: –Orthographic to phonetic –Morphological analysis –Parts-of-speech –Canonical tone marks

13 Orthographic to Phonetic The orthography is very young, and highly consistent with the pronunciation. Hand-written letter-to-sound rewrite rules. Lexicon for loan words.
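
A letter-to-sound component of this kind can be sketched as a longest-match rewrite over a rule table, with a lexicon consulted first for loan words. The Python below is an illustration under assumptions: the rule table is a tiny hypothetical subset, the output symbols are placeholders rather than the authors' phone set, and the loan-word lexicon is left empty.

```python
# Hypothetical rule table: grapheme sequence -> placeholder phone symbol.
RULES = {
    "xh": "||h",  # aspirated lateral click
    "x": "||",    # lateral click
    "c": "|",     # dental click
    "q": "!",     # (post)alveolar click
    "sh": "S",
    "hl": "K",
}

# Loan words would be listed here with hand-specified pronunciations.
LOAN_LEXICON = {}


def letter_to_sound(word, rules=RULES, lexicon=LOAN_LEXICON):
    """Convert a word to phone symbols by greedy longest-match rewriting."""
    word = word.lower()
    if word in lexicon:  # the lexicon overrides the rules
        return list(lexicon[word])
    max_len = max(len(grapheme) for grapheme in rules)
    phones = []
    i = 0
    while i < len(word):
        # Try the longest grapheme first so that "xh" wins over "x".
        for length in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + length]
            if chunk in rules:
                phones.append(rules[chunk])
                i += length
                break
        else:
            # No rule matched: pass the letter through unchanged.
            phones.append(word[i])
            i += 1
    return phones


print(letter_to_sound("xhosa"))
```

Because the orthography is highly consistent with pronunciation, a short table of this shape can cover most native words, leaving the lexicon to absorb the exceptions.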

14 Morphology Specially bootstrapped from a Zulu version for this project. Requires a lexicon of root morphemes. Works with isolated words. Ambiguous! Ideal: root morpheme boundaries, affix types, POS tagger for disambiguation. Implemented: None

15 Parts-of-Speech Morphological analysis. Ideal: POS tagger. Implemented: Exhaustive lists of closed sets – pronouns, conjunctions, prepositions, etc.
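
The implemented approach amounts to a table lookup. A minimal Python sketch follows; the word lists are a tiny illustrative sample chosen for this example, not the exhaustive lists used in the project, and the tag names and the `open` fallback are assumptions.

```python
# Tiny illustrative sample of closed-class words (assumed, not exhaustive).
CLOSED_CLASSES = {
    "pronoun": {"mna", "wena", "yena"},
    "conjunction": {"kodwa", "okanye"},
}


def tag_closed_class(words, closed_classes=CLOSED_CLASSES):
    """Tag each word with its closed class, or 'open' if it is in none."""
    lookup = {
        word: tag
        for tag, members in closed_classes.items()
        for word in members
    }
    return [(word, lookup.get(word.lower(), "open")) for word in words]


print(tag_closed_class(["Kodwa", "mna", "ndifunda"]))
```

Everything that falls through to `open` (nouns, verbs and so on) is where morphological analysis or a trained POS tagger would be needed.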

16 Tone A printed dictionary with canonical tone markings for root morphemes is available. Rules can be constructed to determine movement of at least High tones, under morphological composition. Highly regular structure: 3rd-from-last syllable starts high pitch excursion, 2nd-from-last syllable lengthened. Ideal: Exhaustive specification of set tones Implemented: Word-level syllable counts (3-1, 2-2, 1-3)
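
One reading of the implemented feature is that each syllable is labelled with its position counted from the end of the word and from the start, so a three-syllable word yields 3-1, 2-2, 1-3. The Python sketch below follows that reading, which is an assumption. The syllabifier is deliberately naive: it closes a syllable at every vowel letter and ignores syllabic nasals and long vowels.

```python
VOWELS = set("aeiou")


def syllabify(word):
    """Naive syllabifier: every vowel letter closes a syllable."""
    syllables = []
    current = ""
    for ch in word.lower():
        current += ch
        if ch in VOWELS:
            syllables.append(current)
            current = ""
    if current:  # trailing consonants join the last syllable
        if syllables:
            syllables[-1] += current
        else:
            syllables.append(current)
    return syllables


def syllable_features(word):
    """Label each syllable with '<from end>-<from start>' counts.

    Also flags the 3rd-from-last syllable (start of the pitch rise)
    and the 2nd-from-last syllable (lengthened).
    """
    syllables = syllabify(word)
    n = len(syllables)
    features = []
    for i, syllable in enumerate(syllables):
        from_end = n - i
        from_start = i + 1
        features.append({
            "syllable": syllable,
            "count": f"{from_end}-{from_start}",
            "pitch_rise_start": from_end == 3,
            "lengthened": from_end == 2,
        })
    return features


for feature in syllable_features("ukuba"):
    print(feature)
```

Passing these counts to the synthesiser as context features lets the trained F0 and duration models pick up the penultimate lengthening and antepenultimate pitch rise from data, without any explicit tone rules.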

17 Tests Basic intelligibility test: Listeners asked to transcribe what they hear. –Incomplete phrases. –Two versions of the question set, and natural utterances (recoded) –Mother-tongue and second language speakers. Impressions: –“He’s from the townships.” –“That’s perfect, there’s nothing wrong with that.” –Also frowns and repeats.
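
The slides do not say how the transcriptions were scored. One common choice for transcription-based intelligibility tests is word error rate, sketched here in Python as an assumption rather than as the method the authors used. The function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            cur.append(min(
                prev[j] + 1,         # deletion (reference word missed)
                cur[j - 1] + 1,      # insertion (extra word transcribed)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = cur
    return prev[-1] / len(ref)


reference = "waqalisa ukunqwenela ukuba nomzi"
transcript = "waqalisa ukuba nomzi"  # listener missed one word
print(word_error_rate(reference, transcript))
```

For an agglutinative language, where one long word carries many morphemes, a character-level or morpheme-level variant of the same measure may be more informative than whole-word matching.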

18 Next Steps Comprehension test? Impressions. Baseline comparative/preference test. Improvements –Question phrases. –Information from morphological analysis. –Canonical tone markings. Zulu

19 Conclusion The system worked very well, considering the bare minimum of knowledge currently incorporated. Data-driven approach with HTS is well suited to bootstrapping a new language. An experimental platform is now in place.

20 Demos “Ubangele amadoda amaninzi kule lali,” –Natural: –Synthesised: “waqalisa ukunqwenela ukuba nomzi.” –Natural: –Synthesised: Click song:

