# ACOUSTICAL THEORY OF SPEECH PRODUCTION

Robert A. Prosek, Ph.D. CSD 301

Acoustical Theory
- There is nothing more practical than a good theory
- The linear source-filter theory is one of the best in our field
- Based on Gunnar Fant’s “Acoustic Theory of Speech Production”
- The theory expresses articulatory-acoustic relationships

Acoustical Theory
- The source is vocal fold vibration
  - for some consonants, the source is more complex
  - it can be in the vocal tract, or a combination of vocal fold and vocal tract sources
- The filter is the vocal tract, extending from the vocal folds to the lips or nares
  - like all filters, the vocal tract is frequency dependent

Acoustical Theory
- The source and the filter are assumed to be independent
  - this is an assumption made for convenience
  - it implies that you can change the output of the vocal folds without changing the vocal tract, and vice versa

Vowels
- Modeled as a tube closed at one end and open at the other
  - the closure is a membrane with a slit in it
  - the tube has uniform cross-sectional area
  - the membrane represents the source of energy (vocal folds)
  - the energy travels through the tube
  - the tube generates no energy on its own
  - the tube represents an important class of resonators
  - odd quarter-wavelength relationship: $F_n = \frac{(2n-1)\,c}{4l}$, where $c$ is the speed of sound, $l$ is the tube length, and $n = 1, 2, 3, \ldots$

Vowels (2)
- There are an infinite number of resonances for this tube
  - we need only consider the first three or four
  - the model is valid only up to about 5 kHz
- The model was developed by Chiba and Kajiyama in 1941
  - based on pipe organs, about which a great deal was known

Vowels (3)
- If c = 35,000 cm/s and l = 17.5 cm, what are the first three resonances?
- The simple tube closed at one end and open at the other, with the above length, is a reasonable approximation of /ʌ/ produced by a male talker
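
The question on this slide can be worked through with the odd quarter-wavelength formula given earlier, Fn = (2n-1)c/4l. The following is a minimal sketch; the function name `quarter_wave_resonances` is made up for illustration and is not part of the original slides.

```python
def quarter_wave_resonances(c, l, count=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    c is the speed of sound in cm/s and l is the tube length in cm,
    following Fn = (2n - 1) * c / (4 * l).
    """
    return [(2 * n - 1) * c / (4 * l) for n in range(1, count + 1)]


# Values from the slide: c = 35000 cm/s, l = 17.5 cm
print(quarter_wave_resonances(35000, 17.5))  # [500.0, 1500.0, 2500.0]
```

With these values the first three resonances fall at 500, 1500, and 2500 Hz.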

Vowels (4)
Some points to note:
- A curved tube (vocal tract) and a straight tube (model) behave identically acoustically out to about 5 kHz
  - this is because the curve only begins to affect acoustic signals with short wavelengths (high frequencies)
- The resonances are equally spaced if the tube has uniform cross-sectional area
- Remember: all of the energy comes from the source (vocal fold vibration for vowels)
- Changing the length of the tube changes the resonance frequencies
  - influenced by age and sex
  - l = 14.5 cm for females
  - l = 8.75 cm for children
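
To see how tube length shifts the resonances, the sketch below applies the same quarter-wavelength formula to the three lengths mentioned in the slides. It assumes the slides' value c = 35000 cm/s for every talker; the names `tube_resonances` and `TRACT_LENGTHS_CM` are illustrative only.

```python
SPEED_OF_SOUND = 35000.0  # cm/s, the value used on the slides


def tube_resonances(length_cm, count=3, c=SPEED_OF_SOUND):
    """First `count` resonances (Hz) of a uniform closed-open tube."""
    return [(2 * n - 1) * c / (4 * length_cm) for n in range(1, count + 1)]


# Tube lengths quoted in the slides (cm)
TRACT_LENGTHS_CM = {"male": 17.5, "female": 14.5, "child": 8.75}

for talker, length in TRACT_LENGTHS_CM.items():
    resonances = tube_resonances(length)
    spacing = resonances[1] - resonances[0]  # equal spacing of c / (2 * l)
    listed = ", ".join(f"{f:.0f} Hz" for f in resonances)
    print(f"{talker:>6}: {listed} (spacing {spacing:.0f} Hz)")
```

Shorter tubes give higher and more widely spaced resonances, which is the length effect the slide describes.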

Vowels (5)
- A one-vowel model isn’t very useful
- Different vowels are modeled, acoustically, by different vocal tract shapes
- Phonetically, how are vowels distinguished?
- If we place a constriction in the tube (vocal tract), the resonances change
  - if you change the articulation, you change the vocal tract shape and, with it, the resonance frequencies, amplitudes, and bandwidths

Vowels (6)
- The output energy of a vowel is the product of
  - the source energy
  - the size and shape of the resonator
  - the radiation characteristic
- Glottal source characteristics for vowels
  - vocal fold vibration is periodic
    - what does this imply for the spectrum?
  - f0 or F0 is used to indicate the vocal fundamental frequency
  - the amplitude of the harmonics decreases by 12 dB/octave (a spectral slope of -12 dB/octave)
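
A periodic source has energy only at whole-number multiples of the fundamental, so its spectrum is a set of harmonics. The sketch below lists those harmonics with levels that fall 12 dB for each doubling of frequency. It is an idealized illustration: the 100 Hz fundamental and the function name `harmonic_levels` are assumptions, not values from the slides.

```python
import math


def harmonic_levels(f0, count, slope_db_per_octave=-12.0):
    """Return (frequency in Hz, level in dB re the fundamental) per harmonic."""
    levels = []
    for k in range(1, count + 1):
        octaves_above_f0 = math.log2(k)
        # Adding 0.0 turns a negative zero into 0.0 for tidy printing
        level_db = slope_db_per_octave * octaves_above_f0 + 0.0
        levels.append((k * f0, level_db))
    return levels


# Hypothetical fundamental frequency of 100 Hz
for frequency, level in harmonic_levels(100.0, 8):
    print(f"{frequency:6.0f} Hz  {level:6.1f} dB")
```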

Vowels (7) Filter characteristics for vowels
- the vocal tract is a dynamic filter
- it is frequency dependent
- it has, theoretically, an infinite number of resonances
- each resonance has a center frequency, an amplitude, and a bandwidth
- for speech, these resonances are called formants
- formants are numbered in succession from the lowest
  - center frequencies: F1, F2, F3, etc.
  - amplitudes: A1, A2, A3, etc.
  - bandwidths: B1, B2, B3, etc.
- the formants together form the transfer function
  - the input-output relationship
- formants become physically evident only when energized

Vowels (8)
- Radiation characteristic
  - the acoustic effect when a sound leaves a small area and enters a large one
  - the effect is to raise the slope of the spectrum by 6 dB/octave
- Acoustic-Phonetic Relationships for Vowels
  - F1 is inversely related to tongue height
  - F2 is directly related to tongue advancement
  - Lip rounding lowers all formant frequencies
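
Because the vowel output is described as a product of source, resonator, and radiation terms, the contributions add when expressed in decibels. The sketch below combines only the -12 dB/octave source slope with the +6 dB/octave radiation effect and deliberately leaves the vocal tract filter out; the function name and the 100 Hz reference frequency are illustrative assumptions.

```python
import math

SOURCE_SLOPE_DB_PER_OCTAVE = -12.0    # glottal source slope, from the slides
RADIATION_SLOPE_DB_PER_OCTAVE = 6.0   # radiation characteristic, from the slides


def radiated_source_level_db(frequency_hz, reference_hz):
    """Level (dB re the reference frequency) of the source after radiation.

    The vocal tract filter is ignored here, so this is only the combined
    slope of the source and the radiation characteristic.
    """
    octaves = math.log2(frequency_hz / reference_hz)
    source_db = SOURCE_SLOPE_DB_PER_OCTAVE * octaves
    radiation_db = RADIATION_SLOPE_DB_PER_OCTAVE * octaves
    return source_db + radiation_db  # a product of terms becomes a sum in dB


# Hypothetical reference of 100 Hz; each octave up costs a net 6 dB
for frequency in (200.0, 400.0, 800.0):
    print(frequency, radiated_source_level_db(frequency, 100.0))
```

Under these assumptions the net slope is -6 dB/octave before the formants shape the spectrum.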

Vowels (9) Perturbation Theory
- Volume velocity variations reflect the way air particles vibrate at a particular point in the vocal tract
- At some points, vibration is minimal (nodes); at others, maximal (antinodes)
- For F1, the antinode is at the open end and the node is at the closed end
- For F2, there are two antinodes and two nodes
- For F3, there are three antinodes and three nodes
- etc.
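
The node and antinode counts above can be located along the tube with a simple standing-wave sketch. It assumes an idealized uniform closed-open tube in which the volume velocity of resonance n varies as sin((2n-1)πx/2l), with x measured from the closed (glottal) end; that expression and the function name are assumptions introduced here, not taken from the slides.

```python
def volume_velocity_extrema(n, length_cm):
    """Positions (cm from the closed end) of volume velocity nodes and antinodes.

    Assumes the idealized pattern sin((2n - 1) * pi * x / (2 * l)) for
    resonance n of a uniform tube closed at x = 0 and open at x = l.
    """
    nodes = [2 * k * length_cm / (2 * n - 1) for k in range(n)]
    antinodes = [(2 * k + 1) * length_cm / (2 * n - 1) for k in range(n)]
    return nodes, antinodes


# Tube length from the slides: 17.5 cm
for n in (1, 2, 3):
    nodes, antinodes = volume_velocity_extrema(n, 17.5)
    print(f"F{n}: nodes at {nodes}, antinodes at {antinodes}")
```

Each resonance n has n nodes and n antinodes, with a node always at the closed end and an antinode always at the open end.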

Vowels (10) Perturbation Theory (continued)
- if a change in cross-sectional area (a perturbation) is applied, the acoustic effect depends on its proximity to a node or an antinode
  - a constriction near an antinode lowers the formant frequency
  - a constriction near a node raises the formant frequency
- lip constrictions lower all formant frequencies
- laryngeal constrictions raise all formant frequencies

Vowels (11) Amplitude relationships
- amplitudes depend on formant frequencies
  - if F1 is lowered (raised), A1 lowers (rises)
  - if two formant frequencies move closer together, both peaks increase in amplitude
- how do you raise or lower formant frequencies?
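
These amplitude relationships can be illustrated with a simple all-pole model in which each formant is one resonance and the overall transfer function is the product of the resonances. This is a sketch under stated assumptions rather than the method of the slides: the resonance formula is a common textbook approximation, the fixed 80 Hz bandwidth is hypothetical, and the function names are made up for illustration.

```python
import math


def resonance_gain(f, formant_hz, bandwidth_hz):
    """Magnitude response of one formant (a conjugate pole pair), unity at 0 Hz."""
    half_bw = bandwidth_hz / 2.0
    numerator = formant_hz ** 2 + half_bw ** 2
    denominator = math.sqrt(
        ((f - formant_hz) ** 2 + half_bw ** 2)
        * ((f + formant_hz) ** 2 + half_bw ** 2)
    )
    return numerator / denominator


def level_db(f, formants_hz, bandwidth_hz=80.0):
    """Level (dB) at frequency f for a cascade of formant resonances."""
    gain = 1.0
    for formant in formants_hz:
        gain *= resonance_gain(f, formant, bandwidth_hz)
    return 20.0 * math.log10(gain)


neutral = [500.0, 1500.0, 2500.0]    # uniform-tube resonances
f2_lowered = [500.0, 900.0, 2500.0]  # hypothetical: F2 moved toward F1

print("A1, A2 neutral:   ", level_db(500.0, neutral), level_db(1500.0, neutral))
print("A1, A2 F2 lowered:", level_db(500.0, f2_lowered), level_db(900.0, f2_lowered))
```

In this model, bringing F2 closer to F1 raises the level at both peaks, and lowering F1 on its own lowers the level at the F1 peak.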

Vowels (12) Source-Filter Interactions
- Some vocal tract shapes may affect vocal fold vibration
- Singer's formant
- High-impedance constrictions require greater subglottal air pressure
- Vocal tract - vocal fold coupling during the open phase of the vibratory cycle

Consonants (1)
- The linear source-filter theory can be used to describe the acoustics of consonants as well as vowels
- For consonants, however, the source is not always at the level of the vocal folds
  - some sources are in the vocal tract
  - these sources are aperiodic
  - durations and amplitudes also differ from those of vowels
- Nonetheless, source-filter theory gives us a series of expectations for the acoustic characteristics of consonants

Consonants (2) Fricatives
- Modeled as a tube with a very severe constriction
- The air exiting the constriction is turbulent
- The Reynolds number gives the conditions for turbulence
  - $Re = \frac{vh}{\nu}$, where $v$ is the flow velocity, $h$ is the characteristic dimension of the constriction, and $\nu$ is the kinematic viscosity of air
  - Notice that turbulence can be generated in two ways
- Zeros or antiformants can be found in the spectrum
- Because of the turbulence, there is no periodicity unless accompanied by voicing
- What does an aperiodic spectrum look like?
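
The Reynolds number relationship can be tried out numerically. In the sketch below, the viscosity of air (about 0.15 cm²/s), the turbulence threshold of 1800, and the example velocities and dimensions are all assumptions chosen for illustration; none of them come from the slides. One reading of the formula is that the number grows when either the velocity or the characteristic dimension grows.

```python
AIR_KINEMATIC_VISCOSITY = 0.15  # cm^2/s, approximate value for air (assumption)
ASSUMED_CRITICAL_REYNOLDS = 1800.0  # illustrative threshold, not from the slides


def reynolds_number(velocity_cm_s, dimension_cm, viscosity=AIR_KINEMATIC_VISCOSITY):
    """Reynolds number Re = v * h / nu for flow through a constriction."""
    return velocity_cm_s * dimension_cm / viscosity


def is_turbulent(velocity_cm_s, dimension_cm, threshold=ASSUMED_CRITICAL_REYNOLDS):
    """True when Re exceeds the assumed critical value."""
    return reynolds_number(velocity_cm_s, dimension_cm) > threshold


# Hypothetical flow conditions: (velocity in cm/s, dimension in cm)
for velocity, dimension in [(500.0, 0.3), (1000.0, 0.3), (500.0, 0.6)]:
    re = reynolds_number(velocity, dimension)
    print(f"v={velocity:.0f} cm/s, h={dimension} cm -> Re={re:.0f}, "
          f"turbulent={is_turbulent(velocity, dimension)}")
```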

Consonants (3)
- When a fricative constriction is tapered
  - the back cavity is involved
  - this resembles a tube closed at both ends: $F_n = \frac{nc}{2l}$
  - such a situation occurs primarily in articulation disorders
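
For comparison with the vowel tube, the half-wavelength formula Fn = nc/2l can be evaluated in the same way. The sketch below uses the slides' value c = 35000 cm/s; the 10 cm back cavity length and the function name are hypothetical and serve only as an example.

```python
def half_wave_resonances(c, l, count=3):
    """Resonances (Hz) of a uniform tube closed at both ends: Fn = n * c / (2 * l)."""
    return [n * c / (2 * l) for n in range(1, count + 1)]


# Hypothetical back cavity length of 10 cm, with c = 35000 cm/s from the slides
print(half_wave_resonances(35000, 10.0))  # [1750.0, 3500.0, 5250.0]
```

Unlike the closed-open tube, these resonances are whole-number multiples of the lowest one.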

Consonants (4) Nasal consonants
- The velopharyngeal port is open and the oral cavity is completely blocked at some point
- The side-branch resonator produces antiformants (zeros)
- The overall vocal tract is longer than for vowels
  - What effect does this have on the spectrum?
- Oral formants, nasal formants, nasal antiformants
- Nasal murmur

Consonants (5) Stops
- The tube model is not altered very much for stops
- However, the time domain becomes critical
  - There is a complete closure of the vocal tract somewhere
  - Pressure builds up behind the closure
  - Rapid release
- The articulation results in a burst and transitions

Consonants (6)
- Other consonants are variations of these
  - Affricates
  - Liquids
  - Glides
  - Diphthongs