CS 4705 Lecture 4: Sound Systems and Text-to-Speech


2 CS 4705 Lecture 4: Sound Systems and Text-to-Speech

3 Sound Systems of Language
Phonetics
–The sounds (phones) of the world’s languages, the phonemes they map to, and how they are produced
Phonology
–Rules that govern how phones are realized differently in different contexts
Technologies:
–Automatic Speech Recognition (ASR) systems take sounds as input and output word hypotheses
–Text-to-Speech (TTS) systems take text as input and produce speech

4 Letters and Sounds
same spelling = different sounds
–o: comb, tomb, bomb
–oo: blood, food, good
–c: court, center, cheese
–s: reason, surreal, shy
same sound = different spellings
–[i]: sea, see, scene, receive, thief
–[s]: cereal, same, miss
–[u]: true, few, choose, lieu, do
–[ay]: prime, buy, rhyme, lie
combination of letters = single sound
–ch: child, beach
–th: that, bathe
–oo: good, foot
–gh: laugh
single letter = combination of sounds
–x: exit, Texas
–u: use, music
‘silent’ letters
–k: knife, know
–p: psycho, pterodactyl
–e: moose, bone
–gh: through
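The many-to-many relation between letters and sounds can be made concrete with a small lookup table. This is a minimal sketch in Python; the ARPAbet transcriptions and the helper names are assumptions added for illustration (common American English pronunciations), not material from the slides.

```python
# Hypothetical mini-lexicon: spelling -> ARPAbet phones (assumed transcriptions).
LEXICON = {
    "comb": ["k", "ow", "m"],
    "tomb": ["t", "uw", "m"],
    "bomb": ["b", "aa", "m"],
    "sea": ["s", "iy"],
    "see": ["s", "iy"],
    "knife": ["n", "ay", "f"],
}

VOWELS = {"ow", "uw", "aa", "iy", "ay"}


def vowel_of(word):
    """Return the first vowel phone of a word in the mini-lexicon."""
    return next(p for p in LEXICON[word] if p in VOWELS)


def homophones(word):
    """Return the other lexicon words with exactly the same phones."""
    target = LEXICON[word]
    return sorted(w for w, p in LEXICON.items() if p == target and w != word)
```

Here the letter "o" in comb, tomb, and bomb maps to three different vowels, while "sea" and "see" share one phone sequence despite different spellings.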

5 Articulators
[Figure labels: lips, teeth, alveolar ridge, velum, uvula, pharyngeal, vocal folds: glottis, larynx, trachea, palate]

6 Articulators in action “Why did Ken set the soggy net on top of his deck?” (Sample from the Queen’s University / ATR Labs X-ray Film Database)

7 Vocal fold vibration [UCLA Phonetics Lab demo]

8 Places of articulation
http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html
labial, dental, alveolar, post-alveolar/palatal, velar, uvular, pharyngeal, laryngeal/glottal

9 Articulatory parameters for English consonants (in ARPAbet)
[Table: consonants by MANNER OF ARTICULATION; VOICING marked as voiced or voiceless]

10 American English vowel space
[Figure: vowel chart with axes FRONT to BACK and HIGH to LOW; vowels shown: ey, ow, aw, oy, ay, iy, ih, eh, ae, aa, ao, uw, uh, ah, ax, ix, ux]

11 Acoustic landmarks
“Patricia and Patsy and Sally”
[Figure labels: [p] [t] [p] [t] [p] [t] [l] [sh] [s] [n] [ix] [ih] [ax] [ae] [iy] [ae]]

12 Syllables
Syllabification important for
–pronunciation: deny/denim
–speaking rate calculation: syllables per second
–word recognition in ASR
(onset) + nucleus + (coda):
–c a t
–a
–a t
–t o
Lexical stress: primary, secondary, tertiary
–telephone
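The (onset) + nucleus + (coda) template and the syllables-per-second rate can be sketched as follows. This is an illustrative sketch, not the lecture's method: the function names are hypothetical, the vowel set is the ARPAbet inventory from the vowel-space slide, and each input is assumed to be a single syllable already transcribed as phones.

```python
# ARPAbet vowels from the vowel-space slide; any other phone counts as a consonant.
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ao", "uw", "uh", "ah",
          "ax", "ix", "ux", "ey", "ow", "aw", "oy", "ay"}


def split_syllable(phones):
    """Split one syllable into (onset, nucleus, coda).

    Onset and coda are optional, so either list may be empty.
    """
    vowel_positions = [i for i, p in enumerate(phones) if p in VOWELS]
    if len(vowel_positions) != 1:
        raise ValueError("expected exactly one vowel (the nucleus)")
    i = vowel_positions[0]
    return phones[:i], phones[i], phones[i + 1:]


def count_syllables(phones):
    """Approximate the syllable count as the number of vowel phones."""
    return sum(1 for p in phones if p in VOWELS)


def speaking_rate(phones, seconds):
    """Speaking rate in syllables per second."""
    return count_syllables(phones) / seconds
```

With these assumptions, "cat" as k ae t splits into onset [k], nucleus ae, coda [t], while "a" has a nucleus only.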

13 Phonological Rules
Not all instances of a given phone [x] sound/look alike
Phoneme /x/ may have many allophones
Phonological rules map phonemes in context to allophones, e.g.
–simple rules: /{t,d}/ --> [dx] (flap) / V’ _ V
–FSA’s, FST’s
–declarative constraints: t:dx in the context V’ _ V
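A context-sensitive rule of the form /{t,d}/ --> flap / V’ _ V can be applied to a phone sequence directly. The sketch below is one possible rendering under stated assumptions: the flap is written as ARPAbet dx, stress is marked with a trailing digit on vowels (1 = primary stress, a convention borrowed from pronouncing dictionaries and not stated on the slide), and the function names are hypothetical.

```python
VOWELS = {"iy", "ih", "eh", "ae", "aa", "ao", "uw", "uh", "ah",
          "ax", "ix", "ux", "ey", "ow", "aw", "oy", "ay"}


def is_vowel(phone):
    """True if the phone is a vowel, ignoring any trailing stress digit."""
    return phone.rstrip("012") in VOWELS


def is_stressed_vowel(phone):
    """True for a vowel carrying primary stress (assumed marker: '1')."""
    return is_vowel(phone) and phone.endswith("1")


def flap(phones):
    """Rewrite t or d as dx between a stressed vowel and a following vowel."""
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if (phones[i] in ("t", "d")
                and is_stressed_vowel(phones[i - 1])
                and is_vowel(phones[i + 1])):
            out[i] = "dx"
    return out
```

For example, "city" as s ih1 t iy0 becomes s ih1 dx iy0, while the /t/ in "attack" (ax0 t ae1 k) is left alone because the preceding vowel is unstressed.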

14 Allophones of /t/ What we would consider a single ‘sound’ can be pronounced differently depending on the phonetic context. For example, the phoneme /t/: Figure 4.8: Jurafsky & Martin (2000), page 104.

15 Application: Word Pronunciation for TTS
Pronouncing dictionaries (the: [‘dhax],[‘dhiy])
Problems:
–Homographs (bass/bass, wind/wind, desert/desert)
–Abbreviation (dr., st.)
–Numbers (2125551212)
–Acronyms (NAACL, IDIAP)
–Morphological variation (unrelentingly)
–Proper names and unknown words
rules + dictionaries / dictionaries + rules
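Two of the problems listed above, abbreviations and numbers, are usually handled by normalizing text before dictionary lookup. The following is a rough sketch only: the table contents, the function names, and the heuristic (an abbreviation followed by a capitalized word is read as a title, otherwise as a street suffix) are all assumptions for illustration, and the digit string is simply read one digit at a time, as for a phone number.

```python
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

# Hypothetical table: abbreviation -> (reading before a name, reading elsewhere).
ABBREVIATIONS = {
    "dr.": ("doctor", "drive"),
    "st.": ("saint", "street"),
}


def read_digits(digits):
    """Read a digit string one digit at a time."""
    return " ".join(DIGITS[int(ch)] for ch in digits)


def normalize(tokens):
    """Expand abbreviations and digit strings in a list of tokens."""
    out = []
    for i, tok in enumerate(tokens):
        key = tok.lower()
        if key in ABBREVIATIONS:
            # Assumed heuristic: a following capitalized token signals a title.
            before_name = i + 1 < len(tokens) and tokens[i + 1][:1].isupper()
            as_title, otherwise = ABBREVIATIONS[key]
            out.append(as_title if before_name else otherwise)
        elif tok.isdigit():
            out.append(read_digits(tok))
        else:
            out.append(tok)
    return out
```

A real system would need more context than this; the same digits could also be a quantity or a year, which is part of why numbers appear on the problem list.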

16 Hybrid model:
–FSTs model individual word pronunciation in lexicon (e.g. reg-noun-stem entry c:k a:ae t:t)
–FSAs model morphology (e.g. reg-noun-stem + s)
–FSTs for pronunciation rules (e.g. s --> z)
–special rules to model name and acronym pronunciation
–default letter-to-sound rules for other words
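The layering in the hybrid model (lexicon first, then morphology plus a pronunciation rule, then a letter-to-sound default) can be imitated with plain Python dictionaries. Note that this swaps in simple lookups for the finite-state transducers named on the slide; the lexicon entries, the voiceless set, and the one-letter-one-phone fallback table are all hypothetical simplifications.

```python
# Hypothetical stem lexicon (stands in for the FST lexicon, e.g. c:k a:ae t:t).
LEXICON = {
    "cat": ["k", "ae", "t"],
    "dog": ["d", "ao", "g"],
}

VOICELESS = {"p", "t", "k", "f", "th"}

# Crude default letter-to-sound table; real rules are context-sensitive.
LETTER2SOUND = {
    "a": "ae", "b": "b", "c": "k", "d": "d", "e": "eh", "f": "f",
    "g": "g", "i": "ih", "k": "k", "l": "l", "m": "m", "n": "n",
    "o": "aa", "p": "p", "r": "r", "s": "s", "t": "t", "u": "ah", "z": "z",
}


def add_plural(stem_phones):
    """Append plural s, then apply s --> z after a voiced sound.

    Stems ending in sibilants (which take an extra vowel) are not handled.
    """
    phones = list(stem_phones) + ["s"]
    if phones[-2] not in VOICELESS:
        phones[-1] = "z"
    return phones


def pronounce(word):
    """Try the lexicon, then stem + s, then default letter-to-sound."""
    if word in LEXICON:
        return list(LEXICON[word])
    if word.endswith("s") and word[:-1] in LEXICON:
        return add_plural(LEXICON[word[:-1]])
    return [LETTER2SOUND[ch] for ch in word if ch in LETTER2SOUND]
```

Under these assumptions "cats" comes out as k ae t s and "dogs" as d ao g z, with the final sound chosen by the rule and not stored in the lexicon.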

17 Inventive (and sometimes useful) Approaches for Pronouncing Unknown Words
Rhyming analogy: varoom/room, todo/dodo
Linguistic origin: Infiniti, vingt, Perez
Abbreviation expansion:
–spacious living/dining rm w/frplc --> dining room with fireplace
–pls?
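Rhyming analogy can be approximated by finding the known word that shares the longest spelling suffix with the unknown word and borrowing its pronunciation for that part. The sketch below shows only the matching step; the two-word lexicon, its transcriptions, and the function name are assumptions for illustration.

```python
# Hypothetical lexicon of known words with assumed ARPAbet transcriptions.
LEXICON = {
    "room": ["r", "uw", "m"],
    "dodo": ["d", "ow", "d", "ow"],
}


def rhyme_match(word):
    """Return (known_word, shared_suffix_length) for the best spelling match.

    Returns (None, 0) if no known word shares even the final letter.
    """
    best, best_len = None, 0
    for known in LEXICON:
        n = 0
        # Compare letters from the end of both words while they agree.
        while n < min(len(word), len(known)) and word[-1 - n] == known[-1 - n]:
            n += 1
        if n > best_len:
            best, best_len = known, n
    return best, best_len
```

For "varoom" the best match is "room" with four shared letters, so the ending could be pronounced like "room" and only the leftover "va" would need letter-to-sound rules.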

18 Summary
Phones realize phonemes in different contexts
–Different places and manners of articulation result in acoustic differences that can be detected by ASR systems as well as people
Versatile FSTs can model phonological as well as morphological and spelling systems
Many creative approaches toward pronunciation modeling for TTS
Next time: Read Ch 5

