Spectrogram & its reading by Tae-Yeoub Jang
What is a spectrogram?
- In use since the 1940s
- Another representation of frequency-domain analysis
- The most popular way of representing spectral information
- A three-dimensional representation: the X-axis is time, the Y-axis is frequency, and darkness (or color) is energy
Reviving Sonus
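The three dimensions (time, frequency, energy) can be made concrete with a minimal sketch that uses only the Python standard library. The function name `spectrogram`, the Hann window, the 8000 Hz sample rate and the 1000 Hz test tone are all assumptions chosen for illustration; they do not come from the slides.

```python
import cmath
import math


def spectrogram(samples, frame_len, hop):
    """Return a list of frames (time axis); each frame is a list of
    energies in dB, one per frequency bin (frequency axis)."""
    # Hann window: tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        energies = []
        # Bins from 0 Hz up to the Nyquist frequency.
        for k in range(frame_len // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            # Energy in dB; the small constant avoids log10(0).
            energies.append(10 * math.log10(abs(coeff) ** 2 + 1e-12))
        frames.append(energies)
    return frames


# Assumed test signal: a 1000 Hz tone sampled at 8000 Hz.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(512)]
spec = spectrogram(tone, frame_len=64, hop=32)

# Bin k corresponds to k * SAMPLE_RATE / frame_len Hz, so bin 8 is 1000 Hz.
peak_bins = [frame.index(max(frame)) for frame in spec]
```

Plotting `spec` with time on the X-axis, bin index on the Y-axis and the dB value as darkness would give the familiar picture; for this steady tone every frame has its darkest point at the same bin.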
Spectrogram example (color rendering of the word “compute”)
Spectrogram example (grayscale rendering of the word “compute”)
Wideband vs. narrowband spectrograms of the question "Is Pat sad, or mad?" The 5th, 10th and 15th harmonics are marked by white squares in two of the vowels.
Types of spectrogram
- Wideband spectrogram: better time resolution, e.g. 15 msec window, 1 msec shift, 125 Hz bandwidth
- Narrowband spectrogram: better frequency resolution, e.g. 50 msec window, 1 msec shift, 40 Hz bandwidth
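A rough sketch of what these settings mean in practice. The window, shift and bandwidth figures are the ones quoted on the slide; the 16000 Hz sample rate, the 100 Hz pitch and the rule of thumb that harmonics are resolved only when the bandwidth is narrower than F0 are assumptions added for illustration.

```python
def frame_settings(window_ms, shift_ms, sample_rate):
    """Convert window and shift durations in msec to sample counts."""
    window = round(window_ms * sample_rate / 1000)
    shift = round(shift_ms * sample_rate / 1000)
    return window, shift


def resolves_harmonics(bandwidth_hz, f0_hz):
    """Rule of thumb (an assumption): harmonics appear as separate
    horizontal lines only when the analysis bandwidth is narrower
    than their spacing, which equals F0."""
    return bandwidth_hz < f0_hz


SAMPLE_RATE = 16000  # assumed sample rate in Hz
F0 = 100             # assumed pitch of the speaker in Hz

wideband = frame_settings(15, 1, SAMPLE_RATE)    # (240, 16) samples
narrowband = frame_settings(50, 1, SAMPLE_RATE)  # (800, 16) samples

wide_shows_harmonics = resolves_harmonics(125, F0)   # False
narrow_shows_harmonics = resolves_harmonics(40, F0)  # True
```

Under these assumptions only the narrowband analysis separates individual harmonics, while the shorter wideband window follows rapid events in time more closely.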
Advantages & Disadvantages
- Time alignment
- Disadvantages: less reliable than the waveform
Vowel Spectrogram
- Formant frequencies are critical cues for vowel distinction
- F1: height (high vowels have low F1)
- F2: backness (back vowels have low F2)
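The two relations can be sketched as a simple lookup. The cutoff values below are hypothetical round numbers picked for illustration, not thresholds given in the slides, and the (F1, F2) pairs are read column-wise from the F1 and F2 rows of the deck's formant table, which is itself an assumption about how those rows align.

```python
HIGH_F1_MAX_HZ = 500   # hypothetical cutoff: F1 below this counts as "high"
BACK_F2_MAX_HZ = 1400  # hypothetical cutoff: F2 below this counts as "back"


def describe_vowel(f1_hz, f2_hz):
    """Map F1 to vowel height and F2 to vowel backness."""
    height = "high" if f1_hz < HIGH_F1_MAX_HZ else "low"
    backness = "back" if f2_hz < BACK_F2_MAX_HZ else "front"
    return height, backness


# (F1, F2) pairs in Hz, assumed to belong to the same vowel.
pairs = [(280, 2250), (690, 1660), (710, 1100), (310, 870)]
labels = [describe_vowel(f1, f2) for f1, f2 in pairs]
```

A two-way split is a deliberate simplification; real vowel systems also have mid and central vowels that a binary cutoff cannot represent.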
Example formant frequencies of English monophthongs (Hz)
F3  2900 2550 2490 2640 2380 2300 2500 2390
F2  2250 1900 1770 1660 1100 1030 870 1500 1190
F1  280 400 550 690 710 450 310 900 640
"heed, hid, head, had, hod, hawed, hood, who'd" (a male speaker, American English)
Consonant Spectrogram
- General: acoustic structure is more complicated than that of vowels
- Adjacent sounds (especially vowels) convey important information: the locus
- High-frequency characteristics matter especially for fricatives and affricates
What is LOCUS?
- Information carried by formant transitions from vowels into obstruents or from obstruents into vowels
- The target frequency that each formant transition is heading toward as an obstruction is made, or the frequency the transition comes from as the obstruction is released
- Characteristic of the consonant's place and manner, and roughly the same in different vowel contexts
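One simple way to picture "the frequency the transition comes from" is to extend the start of a measured formant track back to the moment of release. The sketch below does this with a straight line through the first two track points. Linear extrapolation, the function name `extrapolate_locus` and the sample F2 values are all assumptions for illustration; real transitions are curved and this is not a method taken from the slides.

```python
def extrapolate_locus(track, release_time_ms):
    """Extend the straight line through the first two (time_ms, freq_hz)
    points of a formant track back to the release time."""
    (t0, f0), (t1, f1) = track[0], track[1]
    slope = (f1 - f0) / (t1 - t0)  # Hz per msec
    return f0 + slope * (release_time_ms - t0)


# Hypothetical F2 measurements after a stop release at 0 msec.
f2_track = [(20, 1400), (40, 1200), (60, 1150)]
locus = extrapolate_locus(f2_track, release_time_ms=0)  # 1600.0 Hz
```

Repeating the estimate for the same consonant before different vowels is one way to check the claim that the locus stays roughly constant across vowel contexts.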
Stops
- General: fairly distinct locus for each place
- Burst
- Silence during the closure (only at syllable onset position)
- Virtually no difference during the closure
Stops (cntd.)
- Voicing distinction
  - Voiced: vertical striations, less abrupt burst, frequently weakened to resemble fricatives or approximants
  - Voiceless: generally abrupt burst in the higher frequency area
Stops (cntd.)
- Place distinction
  - Bilabial: relatively low F2, F3 locus; formants rising into and falling out of the vowel; weak and spread vertical lines
  - Alveolar: F2 locus about 1800 Hz; strong vertical lines
  - Velar: velar pinch (the vowel's F2 and F3 merging); often a double burst; long formant transitions
Stops (cntd.) Manner distinction Silence duration, VOT, vowel F0 aspirated short long high tense lax med low
Examples: “a bab, a dad, a gag”
Place-dependent loci
Fricatives
- General: random noise pattern, especially in high-frequency regions
- Place distinction
  - Labiodental [f, v]: rising locus into the following vowel
  - Dental [θ, ð]: major energy above 6000 Hz
  - Alveolar [s, z]: major energy above 4000 Hz
  - Alveopalatal [š, ž]: major energy above about 2500 Hz, lower than for [s, z]
  - Glottal [h]: the trace of formant frequencies of neighbouring vowels
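"Major energy above N Hz" can be checked numerically by asking what share of the noise energy lies above a cutoff. The sketch below assumes an averaged spectrum is already available as (frequency, power) pairs; the function name `energy_fraction_above` and the sample values are made up for illustration.

```python
def energy_fraction_above(spectrum, cutoff_hz):
    """Share of total power above cutoff_hz.

    spectrum is a list of (frequency_hz, power) pairs."""
    total = sum(power for _, power in spectrum)
    high = sum(power for freq, power in spectrum if freq > cutoff_hz)
    return high / total if total else 0.0


# Hypothetical averaged noise spectrum of one fricative (made-up values).
noise = [(1000, 1.0), (3000, 2.0), (5000, 9.0), (7000, 8.0)]
share = energy_fraction_above(noise, cutoff_hz=4000)  # 17 / 20
```

A share close to 1 at a 4000 Hz cutoff would be consistent with the alveolar pattern described above; how large the share must be to count as "major" is a judgment call this sketch does not settle.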
Fricatives (cntd.)
- Weak vs. strong
  - Strong [s, z, š, ž]: darker bands
  - Weak [f, v, θ, ð]: spread and fainter
  - Voiced [v, ð]: often so weak that they are confused with nasals or approximants
- Cue to tell [θ] from [f]: higher formants of [θ] fall into adjacent vowels
Example: “fie, thigh, sigh, shy”
Example: “ever, weather, fizzer, pleasure”
Nasals
- General: formants similar to vowels but fainter
- Very low F1 (about 250 Hz), with F2 at about 2500 Hz and F3 at about 3250 Hz
- Place distinction
  - Bilabial [m]: downward F2, F3 locus
  - Alveolar [n]: smaller F2 transition
  - Velar [ŋ]: velar pinch
Examples: “a Pam, a tan, a kang”
Liquids & Approximants
- General: formants similar to vowels but fainter (especially in high-frequency regions)
- Approximately F1 250 Hz, F2 1200 Hz, F3 2400 Hz
- Change in formant structure
Liquids & Approximants (cntd.)
- Phone-specific properties
  - Labial glide [w]: very low F1 and F2 (600-1000 Hz), which come very close to each other; relatively low F3; rapid falloff of spectral amplitude
  - Palatal glide [y]: extremely low F1; extremely high F2 and F3
Liquids & Approximants (cntd.)
- Phone-specific properties (cntd.)
  - Flap [ɾ]: soft burst, short duration
  - Retroflex [r]: F3 dipping down close to F2; general lowering of F3 and F4
  - Lateral [l]: low F1 and F2 (approx. F1 250 Hz, F2 1200 Hz); usually substantial energy in the high-frequency region
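The formant cues for [w], [y], [r] and [l] can be combined into a toy decision rule. This is only a sketch: the 500 Hz gap and the 2000 Hz cutoff are hypothetical thresholds chosen to mirror the qualitative descriptions above, the 1000 Hz cutoff is the upper end of the [w] F2 range quoted on the slide, and the function name `guess_approximant` is invented for this example.

```python
R_MAX_F3_F2_GAP_HZ = 500  # hypothetical: "F3 dipping down close to F2"
W_MAX_F2_HZ = 1000        # upper end of the slide's [w] F2 range
Y_MIN_F2_HZ = 2000        # hypothetical: "extremely high F2"


def guess_approximant(f1_hz, f2_hz, f3_hz):
    """Guess [r], [w], [y] or [l] from formant values in Hz.

    f1_hz is accepted for completeness but unused, since all four
    sounds are described as having a low F1."""
    if f3_hz - f2_hz < R_MAX_F3_F2_GAP_HZ:
        return "r"
    if f2_hz <= W_MAX_F2_HZ:
        return "w"
    if f2_hz >= Y_MIN_F2_HZ:
        return "y"
    return "l"


# Hypothetical measurements (F1, F2, F3) in Hz.
guesses = [
    guess_approximant(300, 1300, 1600),
    guess_approximant(300, 800, 2200),
    guess_approximant(250, 2200, 3000),
    guess_approximant(250, 1200, 2400),
]
```

Fixed thresholds like these would not survive speaker variation; the point is only to show how F2 and the F3-F2 gap separate the four sounds in principle.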
Example: “led, red, wed, yell”
Final remarks
- The spectrogram is not the only cue for acoustic distinction of speech sounds
- Very often, the waveform is more reliable