Adaptive Design of Speech Sound Systems Randy Diehl In collaboration with Björn Lindblom, Carl Creeger, Lori Holt, and Andrew Lotto.

1 Adaptive Design of Speech Sound Systems Randy Diehl In collaboration with Björn Lindblom, Carl Creeger, Lori Holt, and Andrew Lotto

2 Two observations: Sound systems of natural languages underexploit the sound producing capabilities of humans. The sounds that are used in natural languages vary in frequency of occurrence.  /a/, /i/, /u/ are common; /š/, /õ/ are rare.  /p/, /t/ are common; />/, /q/ are rare.

3 Why are certain speech sounds favored? Possibilities: They are easy to hear (i.e., to distinguish from other sounds). They are easy to produce. They are easy to learn.

4 The role of auditory distinctiveness in the design of vowel inventories Liljencrants & Lindblom (1972) Diehl, Lindblom, & Creeger (2002, 2003)

5 Liljencrants & Lindblom (1972) Possible vowel sound: Any vowel-like output of a computational model of the human vocal tract (Lindblom & Sundberg, 1971). Auditory distance: Euclidean distance between any two vowel sounds i and j in a space defined by the frequencies of the first several formants: $D_{ij} = \left((\Delta M_1)^2 + (\Delta M'_2)^2\right)^{1/2}$. Selection criterion: For any given inventory size, select those vowels whose pairwise distances, $D_{ij}$, are maximal.
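
A minimal Python sketch of the selection idea on this slide. The candidate coordinates are made-up illustrative values, not output of the Lindblom & Sundberg vocal tract model, and the objective (minimizing the sum of inverse squared distances, so that crowded pairs are penalized most) is one assumed way to make "maximal pairwise distances" concrete; the slide itself does not state the exact objective.

```python
from itertools import combinations
from math import hypot

# Hypothetical candidate vowels as (M1, M2') coordinates; values are illustrative only.
CANDIDATES = {
    "i": (2.5, 15.0),
    "e": (4.5, 14.0),
    "a": (8.0, 11.0),
    "o": (5.0, 8.0),
    "u": (3.0, 6.0),
}

def distance(p, q):
    """Euclidean distance D_ij between two points in the (M1, M2') plane."""
    return hypot(p[0] - q[0], p[1] - q[1])

def crowding(inventory, space=CANDIDATES):
    """Assumed objective: sum of 1 / D_ij^2 over all pairs (lower = better dispersed)."""
    return sum(1.0 / distance(space[a], space[b]) ** 2
               for a, b in combinations(inventory, 2))

def best_inventory(size, space=CANDIDATES):
    """Exhaustively search every inventory of the given size."""
    return min(combinations(space, size), key=lambda inv: crowding(inv, space))

print(best_inventory(3))
```

With these toy values the three-vowel search selects the corner vowels i, a, and u. Exhaustive search is only practical for small candidate sets; a larger space would need an iterative optimizer.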

6 Predicted vowel systems (Liljencrants & Lindblom, 1972)

7 A problem: Too many high vowels

8 These simulations were unrealistic in at least two ways: Acoustic distance (based on formant frequencies) is probably not a good proxy for auditory distance. Vowel sounds do not naturally occur in conditions of total quiet.

9 Improving the realism of the simulations (Diehl, Lindblom, and Creeger, 2002) Define a notion of ‘auditory distance’ based on plausible auditory representations of vowel sounds. Model vowel systems as they would have emerged under natural conditions of background noise.

10 From acoustic to auditory representations [Figure: input and output spectra; processing stages: Hz-to-Bark conversion, auditory filtering]
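
For illustration, the Hz-to-Bark step can be sketched with Traunmüller's (1990) closed-form approximation. The slides do not say which conversion formula the authors used, so this choice is an assumption, and the function name is hypothetical.

```python
def hz_to_bark(freq_hz):
    """Convert frequency in Hz to Bark (Traunmueller 1990 approximation).

    Assumption: the slides do not specify the conversion formula;
    this is one widely used closed form.
    """
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53

# Example: map a few frequencies onto the Bark scale.
for f in (250.0, 500.0, 1000.0, 2000.0, 4000.0):
    print(f, round(hz_to_bark(f), 2))
```

The mapping is roughly linear at low frequencies and compressive at high frequencies, which is why equal steps in Hz do not correspond to equal auditory steps.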

11 Computing distances among auditory spectra 1. At each point along the Bark dimension, calculate the difference in Phons/Bark between any vowel pair. 2. Square these differences. 3. Sum the squares. 4. Take the square root of the sum. This is a measure of the Euclidean auditory distance between two vowels.
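
The four steps above can be sketched in Python as follows. The sketch assumes each auditory spectrum is already available as a list of Phons/Bark values sampled at the same Bark points; the function name is hypothetical.

```python
from math import sqrt

def auditory_distance(spectrum_a, spectrum_b):
    """Euclidean distance between two auditory spectra.

    Assumption: both spectra are sequences of Phons/Bark values
    sampled at the same points along the Bark dimension.
    """
    if len(spectrum_a) != len(spectrum_b):
        raise ValueError("spectra must be sampled at the same Bark points")
    # Steps 1 and 2: pointwise differences, squared.
    squared = [(a - b) ** 2 for a, b in zip(spectrum_a, spectrum_b)]
    # Steps 3 and 4: sum the squares, then take the square root.
    return sqrt(sum(squared))
```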

12 Effects of the auditory transform [Figure panels: Auditory-based system; Formant-based system]

13 Effects of the auditory transform [Figure panels: Auditory-based system; Formant-based system]

14 Effects of the auditory transform The problem of excessive high vowels is reduced—but not eliminated.

15 Effects of adding background noise Hypothesis: Vowel systems have evolved to be perceptually robust even at unfavorable signal/noise ratios.

16 Method We used noise whose spectral shape mimicked the long-term average for speech (-6 dB/octave). We computed auditory distances among vowels at 8 different S/N ratios, ranging from 10 dB to -7.5 dB. We then averaged these distances to determine the optimal vowel systems.
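
A rough Python sketch of the averaging procedure described in this method. Even spacing of the 8 S/N ratios (2.5 dB steps) is an assumption; the slide gives only the endpoints and the count. The 1000 Hz reference frequency and the `distance_at_snr` callable, which stands in for the full auditory model, are hypothetical.

```python
from math import log2
from statistics import mean

def snr_levels(high_db=10.0, low_db=-7.5, count=8):
    """S/N ratios from high_db down to low_db (assumed evenly spaced)."""
    step = (high_db - low_db) / (count - 1)
    return [high_db - k * step for k in range(count)]

def noise_level_db(freq_hz, ref_hz=1000.0, slope_db_per_octave=-6.0):
    """Relative level of speech-shaped noise at freq_hz.

    ref_hz is an arbitrary reference chosen for this sketch.
    """
    return slope_db_per_octave * log2(freq_hz / ref_hz)

def mean_distance(distance_at_snr, levels=None):
    """Average an auditory distance over all S/N ratios.

    distance_at_snr is a placeholder callable: S/N ratio (dB) -> distance.
    """
    levels = snr_levels() if levels is None else levels
    return mean([distance_at_snr(snr) for snr in levels])

print(snr_levels())
```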

17 Effects of adding background noise (3 vowel system) [Figure panels: Quiet; Noise]

18 Effects of adding background noise (5 vowel system) [Figure panels: Quiet; Noise]

19 Effects of adding background noise (7 vowel system) [Figure panels: Quiet; Noise]

20 Effects of adding background noise (9 vowel system) [Figure panels: Quiet; Noise]

21 Comparisons with actual vowel inventories The reduction in the number of high vowels (relative to the Liljencrants & Lindblom simulations) yields a much better fit with actual vowel systems. Some fronting/unrounding of the high, back vowel /u/ also appears to be common among the world’s languages (e.g., Japanese and many other 5-vowel systems, American English).

22 Why does background noise reduce the number of high vowels? First formant information tends to be more noise-resistant than higher formant information. This warps the auditory-distance space for vowels: the front-back dimension contracts relative to the open-close dimension. This, in turn, leaves less room for high vowels.
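
A toy Python illustration of this warping argument. The vowel coordinates and the 0.5 weight on the front-back dimension are invented for the example; they are not values from the simulations. The point is only that down-weighting the front-back dimension shrinks the distance between the high vowels /i/ and /u/ much more than the distance between /i/ and /a/.

```python
from math import sqrt

def warped_distance(p, q, front_back_weight=1.0):
    """Distance in an (open-close, front-back) plane.

    front_back_weight < 1 mimics noise masking higher-formant information
    (hypothetical weighting scheme, for illustration only).
    """
    d_open_close = p[0] - q[0]
    d_front_back = front_back_weight * (p[1] - q[1])
    return sqrt(d_open_close ** 2 + d_front_back ** 2)

# Hypothetical coordinates: (open-close, front-back).
I, U, A = (2.5, 15.0), (3.0, 6.0), (8.0, 11.0)

for label, p, q in (("i-u", I, U), ("i-a", I, A)):
    quiet = warped_distance(p, q, 1.0)
    noisy = warped_distance(p, q, 0.5)
    print(label, round(noisy / quiet, 2))
```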

23 More recent modeling (Diehl, Lindblom, and Creeger, 2003) Further improving the realism of our auditory model, by incorporating temporal (phase-locking) information as well as spectral (excitation-pattern) information, yields predicted vowel systems that fairly closely match observed systems even without background noise.

24 Preferred vowel inventories are reasonably well predicted on the basis of a principle of maximal auditory contrast. What about preferred consonant inventories?

25 Voice distinctions Many languages distinguish certain consonants (e.g., /b/ vs /p/, /d/ vs /t/) based on the differences in voice onset time (VOT). This is the interval between the opening of the vocal tract and the onset of vocal fold vibration (voicing).

26 Voice categories across languages (Lisker & Abramson 1964)

27 Why do languages select from these three categories of VOT? One possibility: aerodynamic and biomechanical factors. Another possibility: enhanced discriminability at -20 ms VOT and +20 ms VOT yields robust perceptual distinctions between the three categories. Evidence: human infants, chinchillas, nonspeech analogs of VOT
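
As a small sketch of how the two boundaries partition the VOT continuum, the Python below computes VOT from two time stamps and assigns one of three categories. The labels "voicing lead", "short lag", and "long lag" follow standard terminology from Lisker & Abramson rather than the slide text, and the treatment of values exactly at a boundary is an arbitrary choice made for this example.

```python
def vot_ms(release_ms, voicing_onset_ms):
    """VOT: voicing onset time minus release time (negative = voicing lead)."""
    return voicing_onset_ms - release_ms

def vot_category(vot, low_boundary_ms=-20.0, high_boundary_ms=20.0):
    """Assign a VOT value to one of three categories.

    Boundaries default to the -20 ms and +20 ms values named on the slide;
    values exactly at a boundary are assigned to "short lag" (assumption).
    """
    if vot < low_boundary_ms:
        return "voicing lead"
    if vot <= high_boundary_ms:
        return "short lag"
    return "long lag"

# Example: voicing begins 40 ms before the release.
print(vot_category(vot_ms(release_ms=100.0, voicing_onset_ms=60.0)))
```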

28 Voice onset time and tone onset time [Figure: panels A and B, frequency vs. time (ms); onset times of -50 ms and +50 ms]

29 Discriminability of TOT stimuli

30 Are TOT categories that are consistent with the natural boundaries more learnable? (Holt, Lotto, and Diehl, JASA, 2004)

31 Summary of VOT results: Preferred voice categories are more discriminable than other possible voice categories. The results of Holt, Lotto, and Diehl (2004) suggest that they are also more learnable.

32 Conclusion Cross-language preferences in speech sound systems appear to reflect performance constraints on talkers, listeners, and language learners.

33 Unsolved problems: Measuring articulatory energy costs. Weighting contributions of auditory distinctiveness, least effort, and learnability. Predicting variability.

