Structure of Spoken Language


1 Structure of Spoken Language
CS 551/651: Structure of Spoken Language
Lecture 9: The Source-Filter Model of Speech Production
John-Paul Hosom
Fall 2008

2 The Source-Filter Model
One more model of speech: proposed in 1848 by Johannes Müller and developed by Gunnar Fant (circa …). It is also called the “Acoustic Theory of Speech Production”. The Source-Filter Model provides a static description of speech; speech dynamics are dealt with in models of coarticulation. According to this model, speech is defined by three parts:
• a sound source: vibration of the vocal folds, air turbulence, or plosion
• a tube through which the source passes: the vocal tract
• radiation of sound from the mouth
These three components are assumed to be independent, so we will discuss them separately.
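Because the three components are assumed to be independent (and, in the usual treatment, linear), their effects combine by convolution in time, or equivalently by multiplication in frequency. A sketch of that relationship, with u for the source, h for the vocal-tract impulse response and r for radiation (symbol names chosen here for illustration, not taken from the slides):

```latex
s(t) = u(t) * h(t) * r(t)
\qquad\Longleftrightarrow\qquad
S(f) = U(f)\,H(f)\,R(f)
```

On a dB scale the product becomes a sum, 20 log10|S| = 20 log10|U| + 20 log10|H| + 20 log10|R|, which is why the source slope, the formant envelope, and the radiation tilt can be discussed one at a time.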

3 The Source-Filter Model: Sound Source
Voiced Sound Source: produced by vibration of the vocal folds. Several models exist that describe the flow of air through the vocal folds. Each model describes the increase in air flow as the glottis opens, the decrease in air flow as it closes, and the absence of air flow while the glottis remains closed during pressure buildup. In the spectral domain, the shape is approximately flat at very low frequencies and has a –12 dB/octave slope at higher frequencies. Models: Rosenberg, Fant (LF model), Fujisaki (FL model), Klatt.
[Figure: glottal airflow over time, marking glottis opening, glottal closure, and the next glottis opening; axes: air pressure (Pa) vs. time (msec)]
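The "flat, then –12 dB/octave" spectral shape can be checked with a simple stand-in: two coincident real poles at a low corner frequency fc give a response that is flat well below fc and falls by 20·log10(4) ≈ 12 dB per doubling of frequency well above it. The 100 Hz corner below is an assumed value for illustration, not a parameter of any of the named glottal models.

```python
import math

def two_pole_lowpass_gain_db(f_hz, fc_hz):
    """Gain (dB) of two coincident real poles at fc: |H| = 1 / (1 + (f/fc)^2)."""
    return -20.0 * math.log10(1.0 + (f_hz / fc_hz) ** 2)

def slope_per_octave_db(f_hz, fc_hz):
    """Change in gain (dB) going from f to 2f."""
    return two_pole_lowpass_gain_db(2.0 * f_hz, fc_hz) - two_pole_lowpass_gain_db(f_hz, fc_hz)

FC = 100.0  # hypothetical corner frequency in Hz
low = slope_per_octave_db(5.0, FC)      # far below fc: nearly flat
high = slope_per_octave_db(2000.0, FC)  # far above fc: close to -12 dB/octave
print(f"slope at 5 Hz: {low:.3f} dB/oct, at 2 kHz: {high:.3f} dB/oct")
```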

4 The Source-Filter Model: Sound Source
Voiced Sound Source: models are of “glottal flow”.
• glottal flow is the same as volume velocity, V, in units of m³/s
• volume velocity per unit area, V/area, is in units of m/s and is called the particle velocity, v
• acoustic pressure, p, in pascals, equals impedance Z (in Pa·s/m) times v: p = Z v
• impedance is constant for a given glottis and vocal tract
Therefore, acoustic pressure is directly proportional to glottal flow, and so the vertical axis of these models can be read as glottal flow, volume velocity, or acoustic pressure (in micropascals).
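A small numeric sketch of the chain V → v → p. It assumes plane-wave propagation, where Z is the specific acoustic impedance of air, ρc ≈ 413 Pa·s/m at about 20 °C; the flow and area values are hypothetical round numbers. The last line of the usage shows the proportionality the slide relies on: doubling the flow doubles the pressure.

```python
RHO_C = 413.0  # assumed specific acoustic impedance of air (rho * c), Pa*s/m

def particle_velocity(volume_velocity_m3s, area_m2):
    """v = V / area, in m/s."""
    return volume_velocity_m3s / area_m2

def acoustic_pressure(v_ms, impedance=RHO_C):
    """p = Z * v, in Pa (plane-wave assumption)."""
    return impedance * v_ms

V = 250e-6   # hypothetical glottal flow: 250 cm^3/s written in m^3/s
AREA = 5e-4  # hypothetical cross-sectional area: 5 cm^2 written in m^2
v = particle_velocity(V, AREA)
p = acoustic_pressure(v)
p_doubled = acoustic_pressure(particle_velocity(2 * V, AREA))
print(f"v = {v:.3f} m/s, p = {p:.1f} Pa, doubled flow -> {p_doubled:.1f} Pa")
```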

5 The Source-Filter Model: Sound Source
All models have the following parameters:
• pitch period T0 = 1/F0
• open quotient (OQ)
• skew (SK)
These three parameters are used in a function that describes how the sound pressure changes over time within one pitch period. OQ is measured relative to T0; SK is measured relative to OQ.
[Figure: glottal pulses marking glottis opening and glottal closure, with T0, OQ, and SK labeled]
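To make the parameters concrete, here is one way to turn F0, OQ, and SK into phase durations. The slides only say that OQ is relative to T0 and SK relative to OQ, so the sketch assumes OQ = open-phase duration / T0 and SK = opening-phase duration / open-phase duration. Other papers define skew differently (for example, as the ratio of opening time to closing time), so check the convention of whichever model you use. The function name is hypothetical.

```python
def glottal_phase_durations(f0_hz, oq, sk):
    """Split one pitch period into opening, closing, and closed phases (seconds).

    Assumed conventions (not taken from the slides):
      oq = open-phase duration / T0
      sk = opening-phase duration / open-phase duration
    """
    if f0_hz <= 0:
        raise ValueError("F0 must be positive")
    if not (0.0 < oq <= 1.0 and 0.0 < sk < 1.0):
        raise ValueError("expected 0 < OQ <= 1 and 0 < SK < 1")
    t0 = 1.0 / f0_hz             # pitch period T0 = 1/F0
    t_open = oq * t0             # open phase, relative to T0
    t_opening = sk * t_open      # opening part, relative to the open phase
    t_closing = t_open - t_opening
    t_closed = t0 - t_open       # glottis closed, pressure builds up
    return t0, t_opening, t_closing, t_closed

t0, t_opening, t_closing, t_closed = glottal_phase_durations(100.0, 0.5, 0.6)
print([round(x * 1000, 3) for x in (t0, t_opening, t_closing, t_closed)])  # in ms
```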

6 The Source-Filter Model: Sound Source
The Rosenberg model: gR(t) is a glottal pulse with amplitude A and duration T (one pitch period). gR(t) has three phases: an opening phase of duration TO, a closing phase of duration TC, and a closed phase of length T − (TO + TC).
[Figure: Rosenberg pulse with TO, TC, and T marked] (from …)
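The slide's formula for gR(t) did not survive in this transcript. A commonly quoted trigonometric form of Rosenberg's pulse uses half of a raised cosine for the opening phase and a quarter cosine for the closing phase; the sketch below samples that form. Treat it as one standard variant rather than the exact expression on the slide; the sampling rate and durations in the usage line are illustrative.

```python
import math

def rosenberg_pulse(amplitude, t_open, t_close, period, fs):
    """One period of a trigonometric Rosenberg-style glottal pulse, sampled at fs Hz."""
    if t_open + t_close > period:
        raise ValueError("TO + TC must not exceed the period T")
    n_samples = int(round(period * fs))
    pulse = []
    for n in range(n_samples):
        t = n / fs
        if t <= t_open:
            # opening phase: rises smoothly from 0 to A
            g = 0.5 * amplitude * (1.0 - math.cos(math.pi * t / t_open))
        elif t <= t_open + t_close:
            # closing phase: falls from A back to 0 (quarter cosine)
            g = amplitude * math.cos(math.pi * (t - t_open) / (2.0 * t_close))
        else:
            # closed phase: no flow
            g = 0.0
        pulse.append(g)
    return pulse

# T = 10 ms (F0 = 100 Hz), TO = 4 ms, TC = 2 ms, sampled at 10 kHz
pulse = rosenberg_pulse(1.0, 0.004, 0.002, 0.01, 10000)
```

The abrupt closing (a quarter cosine is steeper than the half raised cosine used for opening) is what gives the pulse its skew toward closure.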

7 The Source-Filter Model: Sound Source
The Liljencrants-Fant (LF) Model:
• uses sin() and exp() functions to create a smooth trajectory
• many parameters allow detailed control of the shape
[Figure: LF waveform with the parameters Ei, Ti, Tp, Te, Tc, Ta marked]
The Fujisaki-Ljungqvist (FL) Model:
• similar to LF, but allows negative flow during the closed phase
• uses simpler polynomial functions
(from …)
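The transcript keeps only the LF parameter labels. For reference, the LF model is usually stated for the glottal flow derivative E(t) rather than the flow itself: an exponentially growing sinusoid up to the main excitation at Te, then an exponential return phase whose time constant is set by Ta. This is the commonly published form, not a reconstruction of the slide's figure; E0 and α are normally solved numerically so that E(Te) = −Ee and the derivative integrates to zero over the period.

```latex
\begin{aligned}
E(t) &= E_0\, e^{\alpha t}\sin(\omega_g t), && 0 \le t \le T_e,\quad \omega_g = \pi / T_p \\
E(t) &= -\frac{E_e}{\varepsilon T_a}\left[e^{-\varepsilon (t - T_e)} - e^{-\varepsilon (T_c - T_e)}\right], && T_e < t \le T_c \\
\varepsilon T_a &= 1 - e^{-\varepsilon (T_c - T_e)}
\end{aligned}
```

The last line guarantees continuity: at t = Te the return phase starts at −Ee, and it reaches zero at t = Tc.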

8 The Source-Filter Model: Sound Source
Unvoiced Sound Source: produced by pushing air through a constriction in the mouth. A simple model: noise whose spectrum decreases at –6 dB/octave.
Plosive Sound Source: produced by pressure buildup, then release of the constriction. A very simple model: approximately a step function.
[Figure: step function; axes: amplitude vs. time]
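A minimal sketch of both sources, assuming a first-order leaky integrator 1/(1 − a·z⁻¹) as the spectral shaper for the noise (a = 0.99 and fs = 16 kHz are illustrative choices). Above its very low corner frequency the integrator rolls off at roughly 6 dB/octave; in the sampled domain the slope drifts slightly away from 6 dB as frequency approaches half the sampling rate. The plosive source is a unit step at a hypothetical release sample.

```python
import math
import random

def leaky_integrator_gain_db(f_hz, fs, a=0.99):
    """Gain (dB) of H(z) = 1 / (1 - a z^-1) at frequency f_hz."""
    w = 2.0 * math.pi * f_hz / fs
    mag_sq = 1.0 - 2.0 * a * math.cos(w) + a * a   # |1 - a e^{-jw}|^2
    return -10.0 * math.log10(mag_sq)

def frication_noise(n, a=0.99, seed=0):
    """White Gaussian noise through a leaky integrator: spectrum falls ~6 dB/octave."""
    rng = random.Random(seed)
    out, prev = [], 0.0
    for _ in range(n):
        prev = rng.gauss(0.0, 1.0) + a * prev
        out.append(prev)
    return out

def plosion_step(n, onset):
    """Very simple plosive source: 0 before the release, 1 from the release onward."""
    return [0.0 if i < onset else 1.0 for i in range(n)]

drop = leaky_integrator_gain_db(1000.0, 16000.0) - leaky_integrator_gain_db(2000.0, 16000.0)
print(f"noise level drop from 1 kHz to 2 kHz: {drop:.2f} dB")
```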

9 The Source-Filter Model: Vocal Tract Filter
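The vocal tract can be modeled as a series of connected tubes with different lengths and diameters:
[Figure: vocal tract approximated by tube sections with areas A1 through A6; section 4 labeled with diameter d4 and length l4]
Life can be made much simpler if we start with only two tubes for approximating different vowels:
[Figure: two-tube configurations (areas A1, A2) for /iy/, /uw/, /aa/, and /ah/]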

10 The Source-Filter Model: Vocal Tract Filter
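An electrical-engineering analogy can be drawn between the tubes and a transmission line. From this analogy, for two tubes (a back tube of length l1 and area A1 at the closed glottis, and a front tube of length l2 and area A2 at the open lips), the formant frequencies (frequencies of standing waves) occur when
$$\tan(\beta l_1)\,\tan(\beta l_2) = \frac{A_2}{A_1}$$
where β = 2πf/c and c is the speed of sound (from Flanagan, p. …).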
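The two-tube resonance condition tan(βl1)·tan(βl2) = A2/A1 (lossless tubes, closed at the glottis, open at the lips) can be solved numerically. The sketch below multiplies the condition through by the cosines so it has no poles, scans for sign changes on a 1 Hz grid, and refines each root by bisection. The function name, the 1 Hz step, c = 340 m/s, and the example dimensions are choices made here for illustration.

```python
import math

def two_tube_formants(l1, a1, l2, a2, c=340.0, f_max=5000.0, step=1.0):
    """Resonances (Hz) of a lossless two-tube model, closed at glottis, open at lips.

    Roots of F(f) = A2 cos(b l1) cos(b l2) - A1 sin(b l1) sin(b l2), b = 2 pi f / c,
    i.e. tan(b l1) tan(b l2) = A2 / A1 written without poles.
    """
    def F(f):
        b = 2.0 * math.pi * f / c
        return a2 * math.cos(b * l1) * math.cos(b * l2) - a1 * math.sin(b * l1) * math.sin(b * l2)

    roots = []
    f_lo, v_lo = step, F(step)
    while f_lo < f_max:
        f_hi = f_lo + step
        v_hi = F(f_hi)
        if v_lo == 0.0:                      # root exactly on the grid
            roots.append(f_lo)
        elif v_lo * v_hi < 0.0:              # sign change: root in (f_lo, f_hi)
            lo, hi, v_a = f_lo, f_hi, v_lo
            for _ in range(60):              # bisection
                mid = 0.5 * (lo + hi)
                v_mid = F(mid)
                if v_a * v_mid <= 0.0:
                    hi = mid
                else:
                    lo, v_a = mid, v_mid
            roots.append(0.5 * (lo + hi))
        f_lo, v_lo = f_hi, v_hi
    return roots

print(two_tube_formants(0.085, 1.0, 0.085, 1.0)[:3])   # equal areas: a uniform 17 cm tube
print(two_tube_formants(0.09, 1.0, 0.08, 7.0)[:3])     # hypothetical narrow-back, wide-front tube
```

With equal areas the condition reduces to cos(β(l1 + l2)) = 0, so any split of a 17 cm tube gives the uniform-tube formants; unequal areas shift them.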

11 The Source-Filter Model: Vocal Tract Filter
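In the simplest case of a single tube (closed at the glottis, open at the lips), the formants are located at
$$F_n = \frac{(2n-1)\,c}{4l}, \qquad n = 1, 2, 3, \ldots$$
and if l = 17 cm (the typical length of the male vocal tract) and c ≈ 340 m/s, then
$$F_1 = \frac{340}{4 \times 0.17} = 500\ \text{Hz}, \qquad F_2 = 1500\ \text{Hz}, \qquad F_3 = 2500\ \text{Hz}, \ldots$$
So, for a neutral vowel (no constriction in the vocal tract), formants occur at 500, 1500, 2500, … Hz.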
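The single-tube quarter-wavelength formula is simple enough to tabulate directly. The sketch assumes c = 340 m/s, the value that reproduces the 500 Hz figure for a 17 cm tract; the function name is hypothetical.

```python
def uniform_tube_formants(length_m, n_formants=3, c=340.0):
    """F_n = (2n - 1) c / (4 l) for a uniform tube closed at one end, open at the other."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]

print(uniform_tube_formants(0.17))    # ≈ [500, 1500, 2500] Hz
print(uniform_tube_formants(0.145))   # a shorter tube: every formant moves up
```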

12 The Source-Filter Model: Vocal Tract Filter

13 The Source-Filter Model: Vocal Tract Filter

14 The Source-Filter Model: Vocal Tract Filter
The two-tube model can be expanded to multiple tubes; the math becomes ugly, but the results are more realistic.
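One standard way to handle many tubes without writing out the algebra is the chain (ABCD) matrix method: each lossless section of length l and area A is a 2×2 matrix relating pressure and volume velocity at its two ends, the section matrices are multiplied from glottis to lips, and with zero pressure at the open lips the resonances are the zeros of the lower-right entry M22. This is a sketch of that technique, not the slide's derivation; the helper names, c = 340 m/s, and the section dimensions are hypothetical.

```python
import math

def tube_chain_m22(f, sections, c=340.0, rho_c=1.0):
    """M22 of the chain matrix for lossless (length, area) sections, glottis to lips.

    rho_c cancels out of M22, so its value does not affect the resonances.
    """
    k = 2.0 * math.pi * f / c
    m = [[1.0 + 0j, 0j], [0j, 1.0 + 0j]]
    for length, area in sections:
        cs, sn = math.cos(k * length), math.sin(k * length)
        z = rho_c / area                          # characteristic impedance of this section
        sec = [[cs, 1j * z * sn], [1j * sn / z, cs]]
        m = [[m[0][0] * sec[0][0] + m[0][1] * sec[1][0], m[0][0] * sec[0][1] + m[0][1] * sec[1][1]],
             [m[1][0] * sec[0][0] + m[1][1] * sec[1][0], m[1][0] * sec[0][1] + m[1][1] * sec[1][1]]]
    return m[1][1].real                           # imaginary part is zero for lossless tubes

def multi_tube_formants(sections, c=340.0, f_max=5000.0, step=1.0):
    """Zeros of M22 below f_max: sign-change scan plus bisection."""
    def g(f):
        return tube_chain_m22(f, sections, c)

    roots = []
    f_lo, v_lo = step, g(step)
    while f_lo < f_max:
        f_hi = f_lo + step
        v_hi = g(f_hi)
        if v_lo == 0.0:
            roots.append(f_lo)
        elif v_lo * v_hi < 0.0:
            lo, hi, v_a = f_lo, f_hi, v_lo
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                v_mid = g(mid)
                if v_a * v_mid <= 0.0:
                    hi = mid
                else:
                    lo, v_a = mid, v_mid
            roots.append(0.5 * (lo + hi))
        f_lo, v_lo = f_hi, v_hi
    return roots

print(multi_tube_formants([(0.0425, 3.0)] * 4)[:3])   # four equal sections = uniform tube
```

For two sections, M22 works out to cos(βl1)cos(βl2) − (A1/A2)sin(βl1)sin(βl2), which is equivalent to tan(βl1)tan(βl2) = A2/A1; adding sections adds matrix factors rather than new algebra.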

15 The Source-Filter Model: Bandwidths
In these cases, it has been assumed that the tubes have hard, lossless walls, so each resonance (formant) has energy only at its exact center frequency: energy is put into the system via the source, but no energy is lost. In reality, resonant energy decays over time, because energy is absorbed by:
• viscosity (friction of air against the vocal-tract walls)
• heat conduction at the vocal-tract walls
• the soft surfaces of the vocal-tract walls
These effects give each formant a nonzero bandwidth and cause bandwidth to increase with frequency.

16 The Source-Filter Model: Radiation
A final effect of the speech-production process is radiation of sound from the lips. As sound radiates from a source, its energy decreases. The decrease in energy is not the same for all frequencies; this effect can be modeled as a +6 dB/octave increase in energy:
$$R(z) = 1 - z^{-1}$$
which, coincidentally, is the same equation as pre-emphasis, y[n] = x[n] − a·x[n−1], with a = 1.0, and also corresponds to a differentiation operation.
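A small check of the radiation model, assuming a 16 kHz sampling rate: the first difference y[n] = x[n] − x[n−1] (pre-emphasis with a = 1.0) has magnitude 2·sin(ω/2), which roughly doubles per octave well below half the sampling rate, i.e., about +6 dB/octave. Closer to fs/2 the slope flattens. Function names are hypothetical.

```python
import math

def pre_emphasis(x, a=1.0):
    """y[n] = x[n] - a*x[n-1]; a = 1.0 gives the first difference (radiation model)."""
    return [x[n] - (a * x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

def first_difference_gain_db(f_hz, fs):
    """|1 - e^{-jw}| = 2 sin(w/2) in dB, with w = 2 pi f / fs."""
    w = 2.0 * math.pi * f_hz / fs
    return 20.0 * math.log10(2.0 * math.sin(w / 2.0))

rise = first_difference_gain_db(1000.0, 16000.0) - first_difference_gain_db(500.0, 16000.0)
print(f"gain increase from 500 Hz to 1 kHz: {rise:.2f} dB")
print(pre_emphasis([1.0, 1.0, 1.0]))   # a constant (DC) input is removed after the first sample
```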

17 The Source-Filter Model: Radiation
The derivative effect of radiation from the lips can be moved to the glottal-source model:
[Figure: glottal flow (T0, OQ, SK marked) and the corresponding glottal flow derivative]
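Moving the derivative is legitimate because each stage is treated as a linear, time-invariant filter, and such filters commute: differentiating the output of the vocal tract gives the same signal as feeding the vocal tract a differentiated source. The sketch checks this numerically, with a first difference standing in for the derivative; both sequences are made-up toy values.

```python
def first_difference(x):
    """Discrete stand-in for differentiation: y[n] = x[n] - x[n-1], with x[-1] = 0."""
    return [x[n] - (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

def convolve(x, h):
    """Direct linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

# Hypothetical flow pulse; it ends at zero, so the truncated difference is exact.
glottal_flow = [0.0, 0.2, 0.6, 1.0, 0.7, 0.3, 0.0, 0.0]
vocal_tract = [1.0, 0.5, -0.3, 0.1]   # hypothetical vocal-tract impulse response

radiation_last = first_difference(convolve(glottal_flow, vocal_tract))
radiation_first = convolve(first_difference(glottal_flow), vocal_tract)
print(max(abs(p - q) for p, q in zip(radiation_last, radiation_first)))
```

The differentiated flow also shows the characteristic negative excursion at glottal closure, which is where the main excitation of the vocal tract occurs.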

18 The Source-Filter Model: Radiation
The derivative effect of radiation from the lips can also be moved to the models of frication and plosion:
Unvoiced Sound Source: a very simple model is random (white) noise.
Plosive Sound Source: a very simple model is an impulse function.
[Figure: impulse; axes: amplitude vs. time]
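The same commutation argument applied to the simple source models: the first difference of a step is a unit impulse, and the first difference of integrated noise is the original white noise. The sketch uses a pure integrator (a = 1) so that the inversion is exact; with a leaky integrator the correspondence is only approximate. Signal lengths, the release sample, and the random seed are arbitrary.

```python
import random

def first_difference(x):
    """y[n] = x[n] - x[n-1], with x[-1] = 0."""
    return [x[n] - (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

def integrate(x):
    """Running sum (pure integrator), the inverse of the first difference."""
    out, acc = [], 0.0
    for v in x:
        acc += v
        out.append(acc)
    return out

step = [0.0] * 3 + [1.0] * 5       # plosion modeled as a step, release at n = 3
impulse = first_difference(step)   # becomes a unit impulse at n = 3

rng = random.Random(1)
white = [rng.gauss(0.0, 1.0) for _ in range(8)]
recovered = first_difference(integrate(white))   # differentiation undoes integration
print(impulse)
```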

19 The Source-Filter Model: Complete Picture
[Figure: spectra of the glottal source (harmonics), the radiation characteristic, the vocal tract filter (envelope), and the final speech signal, plotted on a log scale]
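The whole chain can be sketched as a tiny cascade synthesizer: an impulse train at F0, two one-pole low-pass filters as a stand-in for the glottal pulse shape (roughly –12 dB/octave), a cascade of two-pole resonators for the formants, and a first difference for radiation. The resonator coefficients follow Klatt's (1980) form; the filter constant 0.97, F0 = 100 Hz, fs = 16 kHz, and the bandwidths are illustrative assumptions, and the formants are the neutral-tube values 500, 1500, 2500 Hz.

```python
import math

def impulse_train(f0, fs, n):
    """Unit impulses every round(fs / f0) samples: a flat-spectrum harmonic source."""
    period = int(round(fs / f0))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def one_pole_lowpass(x, a):
    """y[n] = (1 - a) x[n] + a y[n-1]: unity DC gain, about -6 dB/octave above its corner."""
    y, prev = [], 0.0
    for v in x:
        prev = (1.0 - a) * v + a * prev
        y.append(prev)
    return y

def resonator(x, freq, bw, fs):
    """Two-pole resonator y[n] = A x[n] + B y[n-1] + C y[n-2], Klatt-style coefficients."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw * t)
    b = 2.0 * math.exp(-math.pi * bw * t) * math.cos(2.0 * math.pi * freq * t)
    a = 1.0 - b - c
    y, y1, y2 = [], 0.0, 0.0
    for v in x:
        out = a * v + b * y1 + c * y2
        y.append(out)
        y1, y2 = out, y1
    return y

def synthesize_vowel(formants, bandwidths, f0=100.0, fs=16000.0, n=1600):
    """Source -> vocal-tract filter -> radiation, each stage applied independently."""
    source = impulse_train(f0, fs, n)
    # Hypothetical glottal shaping: two one-pole low-passes, roughly -12 dB/octave overall.
    glottal = one_pole_lowpass(one_pole_lowpass(source, 0.97), 0.97)
    tract = glottal
    for freq, bw in zip(formants, bandwidths):   # cascade of formant resonators
        tract = resonator(tract, freq, bw, fs)
    # Radiation from the lips: first difference, roughly +6 dB/octave.
    return [tract[i] - (tract[i - 1] if i > 0 else 0.0) for i in range(n)]

# Neutral-tube formants; the bandwidths are illustrative guesses.
speech = synthesize_vowel([500.0, 1500.0, 2500.0], [60.0, 90.0, 120.0])
```

Because every stage is linear, the order of the three stages could be changed without changing the output, which is the independence assumption of the model in code form.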

20 The Source-Filter Model: Estimating Parameters
The vocal-tract parameters (formants) can be estimated using LPC analysis, with the order of LPC analysis equal to 2×NF, where NF is the expected number of formants (each formant corresponds to one pair of complex-conjugate poles). In practice, LPC estimation of formants is not very accurate because of the overall slope of the spectrum and irregularities in the spectrum. Once the formants are determined, they can be inverted, and the original signal filtered with the inverted formants to obtain the source + radiation signal (the first derivative of the glottal flow). This is called inverse filtering.
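A self-contained sketch of these steps in pure Python: autocorrelation-method LPC solved with the Levinson-Durbin recursion, conversion of a complex pole to a formant frequency and bandwidth, and inverse filtering with the estimated predictor. It is tested on a synthetic signal with a single resonance (NF = 1, so the order is 2×NF = 2); the coefficients 1.3 and −0.8, the 10 kHz rate, and the helper names are made up for illustration. Real speech usually also needs windowing, pre-emphasis, and general polynomial root-finding for orders above 2.

```python
import cmath
import math
import random

def autocorrelation(x, max_lag):
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Predictor a[1..p] with x[n] ~ sum_k a[k] x[n-k]; returns (a, prediction error)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coeff.
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err

def inverse_filter(x, a):
    """e[n] = x[n] - sum_k a[k] x[n-k]: removes the estimated resonances."""
    return [x[n] - sum(a[k - 1] * x[n - k] for k in range(1, len(a) + 1) if n - k >= 0)
            for n in range(len(x))]

def order2_poles(a1, a2):
    """Roots of z^2 - a1 z - a2 = 0, the poles of 1 / (1 - a1 z^-1 - a2 z^-2)."""
    disc = cmath.sqrt(a1 * a1 + 4.0 * a2)
    return (a1 + disc) / 2.0, (a1 - disc) / 2.0

def pole_to_formant(pole, fs):
    """Pole r e^{j theta} -> (frequency, bandwidth) in Hz."""
    freq = abs(cmath.phase(pole)) * fs / (2.0 * math.pi)
    bw = -math.log(abs(pole)) * fs / math.pi
    return freq, bw

FS = 10000.0
TRUE_A = [1.3, -0.8]                  # hypothetical one-resonance "vocal tract"
rng = random.Random(0)
x, y1, y2 = [], 0.0, 0.0
for _ in range(20000):                # white-noise source through the resonance
    y = rng.gauss(0.0, 1.0) + TRUE_A[0] * y1 + TRUE_A[1] * y2
    x.append(y)
    y1, y2 = y, y1

est_a, _ = levinson_durbin(autocorrelation(x, 2), 2)
residual = inverse_filter(x, est_a)   # should be close to the white source
f_est, bw_est = pole_to_formant(order2_poles(*est_a)[0], FS)
f_true, bw_true = pole_to_formant(order2_poles(*TRUE_A)[0], FS)
print(f"estimated formant {f_est:.0f} Hz (true {f_true:.0f} Hz), bandwidth {bw_est:.0f} Hz")
```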

21 The Source-Filter Model: Filtering
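Formants can be modeled by a “damped sinusoid”, which has the following representations:
$$s(t) = A\,e^{-\alpha t}\sin(2\pi f_c t), \qquad t \ge 0$$
$$S(f) = \frac{2\pi f_c\,A}{(\alpha + j2\pi f)^2 + (2\pi f_c)^2}$$
where S(f) is the spectrum at frequency value f, A is the overall amplitude, fc is the center frequency of the damped sine wave, and α is a damping factor [Olive, p. 48, 58]. Or, given the formant frequency F, its bandwidth BW, and the sampling period T, compute the IIR filter coefficients of the resonator y[n] = A x[n] + B y[n−1] + C y[n−2], where this A is the resonator's gain coefficient rather than the amplitude above (from Klatt, 1980):
$$C = -e^{-2\pi\,BW\,T}, \qquad B = 2e^{-\pi\,BW\,T}\cos(2\pi F T), \qquad A = 1 - B - C$$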
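A sketch of Klatt's two-pole resonator, y[n] = A x[n] + B y[n−1] + C y[n−2] with C = −e^(−2π·BW·T), B = 2e^(−π·BW·T)·cos(2πFT), A = 1 − B − C. The pole radius e^(−π·BW·T) is the per-sample decay of a damped sinusoid with α = π·BW, which ties the two representations together, and A = 1 − B − C normalizes the gain at 0 Hz to one. F = 1000 Hz, BW = 100 Hz, and fs = 10 kHz are illustrative values; the helper names are hypothetical.

```python
import cmath
import math

def klatt_resonator_coeffs(freq_hz, bw_hz, fs):
    """Coefficients (A, B, C) of y[n] = A x[n] + B y[n-1] + C y[n-2]."""
    t = 1.0 / fs
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    return a, b, c

def resonator_gain(a, b, c, f_hz, fs):
    """|H| on the unit circle for H(z) = A / (1 - B z^-1 - C z^-2)."""
    z1 = cmath.exp(-2j * math.pi * f_hz / fs)   # z^-1 evaluated at frequency f_hz
    return abs(a / (1.0 - b * z1 - c * z1 * z1))

FS = 10000.0
a, b, c = klatt_resonator_coeffs(1000.0, 100.0, FS)
peak_hz = max(range(1, 5000), key=lambda f: resonator_gain(a, b, c, f, FS))
print(f"DC gain {resonator_gain(a, b, c, 0.0, FS):.3f}, peak near {peak_hz} Hz")
```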

22 The Source-Filter Model
A course project that studies the source-filter model might be interesting, for example:
• Implement LPC, and extract formant values and bandwidths of different vowels. How do the envelope and formant values change with different orders of LPC (values of p)?
• Do LPC analysis, then inverse filter the signal to extract the glottal source waveform. Does it look the way it should?
• Construct two-tube models and predict the formant frequencies of all vowels.
(Suitable if you're more comfortable with programming, signal processing, etc.)

