Pitch and Formants 1. Harmonics (giving pitch) produced by vocal cord vibration 2. Formant frequencies: resonances of the vocal tract 3. Formant frequencies change as you change the shape of your vocal tract
Sex change Me (m) Higher pitch Shorter vocal-tract (higher formants) Both (-> f)
Prosody vs Segments Segmental: consonants / vowels -> words Prosodic: pitch contour, stress. Emphasis, pragmatics. “I thought she was married?” & "….!" “I thought she was married.” “I thought she was married!” “I thought she was married!” NB in tone languages pitch used segmentally. Are bird/mammal animal systems like human prosody? generally use different pitch contours yes!
Mynah bird speech Klatt &Stefanski (1974) How does a mynah bird imitate human speech? J Acoust Soc Amer, 55, 822-832.
Mynah / Grey parrot Mynah produces "formants" but probably through changing syrinx resonances, not through changing vocal tract shape. (Klatt & Stefanski, 1974, J Acoust Soc Amer) Grey parrot has a longer vocal tract and may use changes in its shape to produce formant variation (more like human speech). (Warren, Patterson, Pepperburg, 1996, Auk)
Characteristics of speech No gaps between words Smoothly changing sound from one speech sound to the next So you can’t just shuffle the acoustic “words” Only silence is /g/ of “ago” Narrow-band spectrogram
Speech is more like semaphore than like music Music: discrete targets giving discrete acoustic events Semaphore: discrete targets with transitions between targets Speech: articulatory transitions between targets
Formants in a wide-band spectrogram “w e g o” <-- F1 <-- F 2 <-- F 3 Burst -->
time "bag" Where are the segments? F1 F2 F3 /bæg/ Where are the segments?
Speech is more like speech than like semaphore Speech does not have invariant acoustic targets: consonants change with the vowel. Compare /s/ in /si/ with /s/ in /su/ This is due to co-articulation.
Different transition - same consonant dee da 1400 Hz Liberman et al. (1967) Perception of the speech code. Psych Rev 74, 431-461
Co-articulation Arises because (mainly) consonant gestures don’t involve all the articulators: eg /b/ is lips only, tongue free to take up position for next vowel. /d/ and /s/ just involve the tongue tip, touching the alveolar ridge, tongue body and lips free to take up position for next vowel - viz. /si/ /su/.
Same noise - different consonant pea ka 1400 Hz F 2 F 1 Burst --> Liberman et al. (1967) Perception of the speech code. Psych Rev 74, 431-461
Two articulatory systems Öhman suggested that articulation can be decomposed into two semi-independent systems: Slow movement from one vowel target to next eg /i/ -> /u/ Rapid consonantal movement superimposed eg /b/ /d/ So the /b/ in /ibu/ is not the same as in /ibi/
Co-articulation Advantages 1. information about different segments is spread across time (Hockett’s squashed eggs). You know that a /u/ is coming because of the type of /s/ you have heard.
Co-articulation 2. Liberman thought that this spreading across time makes it easier to transmit information at a fast rate. Liberman et al (1967) Psych Rev 74, 431-461
The disadvantage of co-articulation for perception is that there are no constant acoustic targets in speech. The same phoneme can be represented as different sounds in different contexts (/s/ before /u/ or /i/. Conversely, the same sound, can be heard as different consonants in different contexts (eg as /p/ before /i/ and /a/ but as /k/ before /u/). Co-articulation - 2
Speech Code Articulatory movement Co-articulation Rapid speech /djewonega?at/ Different vocal-tract sizes: men 15% longer than women Different dialects ---> Factors that make it hard (for machines) to recognise speech