Structure of Spoken Language


CS 551/651: Structure of Spoken Language
Lecture 9: The Source-Filter Model of Speech Production
John-Paul Hosom, Fall 2008

The Source-Filter Model

One more model of speech, proposed in 1848 by Johannes Müller and developed by Gunnar Fant circa 1970. It is also called the “Acoustic Theory of Speech Production”. The Source-Filter Model provides a static description of speech; speech dynamics are dealt with in models of coarticulation.

According to this model, speech is defined by three parts:
• a sound source: vibration of the vocal folds, air turbulence, or plosion
• a tube through which the source passes: the vocal tract
• radiation of sound from the mouth

These three components are assumed to be independent. We will discuss them separately.

The Source-Filter Model: Sound Source

Voiced Sound Source: produced by vibration of the vocal folds.
• Several models exist that describe the flow of air through the vocal folds.
• Each model describes the increase in air flow as the glottis opens, the decrease in air flow as it closes, and no air flow while the glottis remains closed during pressure buildup.
• In the spectral domain, the shape is approximately flat at very low frequencies and has a -12 dB/octave slope at higher frequencies.
• Models: Rosenberg, Fant (LF model), Fujisaki (FL model), Klatt.

[Figure: glottal waveform, air pressure (Pa) versus time (msec), with glottis opening and glottal closure marked.]

The Source-Filter Model: Sound Source

Voiced Sound Source: the models are of “glottal flow”.
• Glottal flow is the same as volume velocity, V, in units of m³/s.
• Volume velocity per unit area, V/unit area, is in units of m/s and is called the point velocity, v.
• Acoustic pressure, p, in Pascals, equals impedance Z times v: p = Z v.
• Impedance is constant for a given glottis and vocal tract.
• Therefore acoustic pressure is directly proportional to glottal flow, and so the vertical axis of these models can be read as glottal flow, volume velocity, or acoustic pressure (in micro Pascals).

The Source-Filter Model: Sound Source

All models have the following parameters:
• pitch period = 1/F0 = T0
• open quotient (OQ)
• skew (SK)
These three parameters are used in a function that describes how the sound pressure changes over time within one pitch period. OQ is measured relative to T0; SK is measured relative to OQ.

[Figure: glottal pulse with T0, OQ, and SK marked, along with glottis opening and glottal closure.]

The Source-Filter Model: Sound Source

The Rosenberg model: gR(t) is a glottal pulse with amplitude A and duration T. gR(t) has three phases: the opening phase, of length TO; the closing phase, of length TC; and the closed phase, of length T-(TO+TC).

[Figure: Rosenberg glottal pulse with TO, TC, and T marked (from http://www.physik3.gwdg.de/~micha/aachen98/aachen98.html)]
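A minimal sketch of one Rosenberg-style pulse may help make the three phases concrete. It assumes the commonly quoted trigonometric form (raised cosine while opening, quarter cosine while closing), and it treats TO and TC as phase durations; the function names and the default values are illustrative assumptions, not taken from the slide.

```python
import math

def rosenberg_pulse(t, A=1.0, TO=0.004, TC=0.002):
    """Glottal flow at time t (seconds) within one pitch period.

    Assumed trigonometric Rosenberg form:
      opening (0 <= t <= TO):      A/2 * (1 - cos(pi * t / TO))
      closing (TO < t <= TO + TC): A * cos(pi * (t - TO) / (2 * TC))
      closed  (otherwise):         0
    """
    if 0.0 <= t <= TO:
        return 0.5 * A * (1.0 - math.cos(math.pi * t / TO))
    if TO < t <= TO + TC:
        return A * math.cos(math.pi * (t - TO) / (2.0 * TC))
    return 0.0

def one_period(f0=100.0, fs=16000, A=1.0, TO=0.004, TC=0.002):
    """Sample one pitch period T = 1/f0 at sampling rate fs."""
    n_samples = int(round(fs / f0))
    return [rosenberg_pulse(k / fs, A, TO, TC) for k in range(n_samples)]

# Illustrative values: F0 = 100 Hz, so T = 10 ms and the closed phase lasts 4 ms.
pulse = one_period()
```

With these values the flow rises to A at t = TO, falls back to zero at t = TO + TC, and stays at zero for the remaining T-(TO+TC) of the period.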

The Source-Filter Model: Sound Source

The Liljencrants-Fant (LF) Model:
• uses sin() and exp() functions to create a smooth trajectory
• many parameters allow detailed control of shape

The Fujisaki-Ljungqvist (FL) Model:
• similar to LF, but allows negative flow during the closed phase
• simpler polynomial functions

[Figure: LF model waveform with parameters Ei, Ti, Tp, Te, Tc, Ta marked (from http://www.ims.uni-stuttgart.de/phonetik/EGG/page13.htm)]

The Source-Filter Model: Sound Source

Unvoiced Sound Source: produced by pushing air through a constriction in the mouth.
• a simple model: noise that decreases at -6 dB/octave

Plosive Sound Source: produced by pressure buildup, then release of a constriction.
• a very simple model: approximately a step function

[Figure: step function, amplitude versus time.]

The Source-Filter Model: Vocal Tract Filter

The vocal tract can be modeled as a series of connected tubes with different lengths and diameters.

[Figure: six connected tubes with cross-sectional areas A1 to A6; length l4 and diameter d4 marked on the fourth tube.]

Life becomes much simpler if we start with only two tubes for approximating different vowels.

[Figure: two-tube models, each with areas A1 and A2, for /iy/, /uw/, /aa/, and /ah/.]

The Source-Filter Model: Vocal Tract Filter

An electrical-engineering analogy can be drawn between the tubes and a transmission line. From this analogy, the formant frequencies (frequencies of standing waves) occur when [equation not shown] where [equation not shown] (from Flanagan, p. 70-71).
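As a rough illustration of the two-tube idea, the sketch below assumes the standard lossless resonance condition for a back tube (length l1, area A1, closed at the glottis) joined to a front tube (length l2, area A2, open at the lips): tan(βl1)·tan(βl2) = A2/A1 with β = 2πf/c. That condition, the speed of sound c = 340 m/s, and the function name are assumptions for the example; the slide's own notation may differ. The condition is solved in the equivalent pole-free form A1·sin(βl1)·sin(βl2) - A2·cos(βl1)·cos(βl2) = 0 by scanning for sign changes and bisecting.

```python
import math

def two_tube_formants(l1, a1, l2, a2, c=340.0, fmax=5000.0, step=1.0):
    """Resonances (Hz) of a lossless two-tube model, found numerically.

    l1, l2 are tube lengths in metres; a1, a2 are areas (only the ratio matters).
    """
    def g(f):
        beta = 2.0 * math.pi * f / c
        return (a1 * math.sin(beta * l1) * math.sin(beta * l2)
                - a2 * math.cos(beta * l1) * math.cos(beta * l2))

    formants = []
    for i in range(int(fmax / step)):
        lo, hi = i * step, (i + 1) * step
        g_lo, g_hi = g(lo), g(hi)
        # A sign change inside (lo, hi] brackets exactly one root on this grid.
        if (g_lo < 0.0 <= g_hi) or (g_lo > 0.0 >= g_hi):
            for _ in range(60):  # plain bisection
                mid = 0.5 * (lo + hi)
                if g_lo * g(mid) > 0.0:
                    lo = mid
                else:
                    hi = mid
            formants.append(0.5 * (lo + hi))
    return formants

# Equal areas reduce to a single 17 cm tube.
neutral = two_tube_formants(0.085, 3.0, 0.085, 3.0)
# Narrow back tube, wide front tube: roughly an /aa/-like shape (illustrative areas).
aa_like = two_tube_formants(0.085, 1.0, 0.085, 7.0)
```

With equal areas the roots fall at the single-tube values; narrowing the back tube raises the first resonance and lowers the second, which is the qualitative pattern expected for an open back vowel.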

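The Source-Filter Model: Vocal Tract Filter

In the simplest case of a single tube, the formants are located at $F_n = \frac{(2n-1)\,c}{4l}$, $n = 1, 2, 3, \ldots$, where $c$ is the speed of sound and $l$ is the length of the tube. If l = 17 cm (the typical length of the male vocal tract) and c ≈ 340 m/s, then $F_1 = 500$ Hz, $F_2 = 1500$ Hz, $F_3 = 2500$ Hz, etc. So, for a neutral vowel (no constriction in the vocal tract), formants occur at 500, 1500, 2500, … Hz.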
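The neutral-vowel numbers can be checked with a few lines of arithmetic. This sketch assumes the quarter-wavelength formula for a uniform tube closed at one end and open at the other, and a speed of sound of 340 m/s (the value that reproduces 500 Hz for a 17 cm tube); the function name is illustrative.

```python
def uniform_tube_formants(length_m=0.17, c=340.0, n_formants=3):
    """Resonances F_n = (2n - 1) * c / (4 * l) of a uniform lossless tube."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_formants + 1)]

male_tract = uniform_tube_formants()          # approximately 500, 1500, 2500 Hz
shorter_tract = uniform_tube_formants(0.145)  # a shorter tube shifts every formant up
```

Because every resonance scales with 1/l, a shorter vocal tract raises all formants by the same factor.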

The Source-Filter Model: Vocal Tract Filter

The two-tube model can be expanded to multiple tubes; the math becomes ugly, but the results are more realistic.

The Source-Filter Model: Bandwidths

In these cases, it has been assumed that the tubes have hard surfaces, which causes the resonant frequencies (formants) to have strong energy only at their center frequencies (energy is put into the system via the source, but no energy is lost).

In reality, the resonant energies decay over time; energy is absorbed by:
• viscosity (caused by friction of air against the vocal-tract walls)
• heat conduction (at the vocal-tract walls)
• soft surfaces of the vocal-tract walls
These effects cause bandwidth to increase with frequency.

The Source-Filter Model: Radiation

A final effect of the speech-production process is the radiation of sound from the lips. As sound radiates from a source, its energy decreases. The decrease in energy is not the same for all frequencies; this effect can be modeled as a +6 dB/octave increase in energy:

$y[n] = x[n] - x[n-1]$

which, coincidentally, is the same equation as pre-emphasis, $y[n] = x[n] - a\,x[n-1]$, with a = 1.0, and also corresponds to a differentiation operation.
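To see where the +6 dB/octave figure comes from, the sketch below assumes the radiation effect is the first difference y[n] = x[n] - a·x[n-1] with a = 1.0 (the pre-emphasis form) and evaluates its gain at two frequencies one octave apart. The function names and the 16 kHz sampling rate are illustrative assumptions.

```python
import cmath
import math

def first_difference(x, a=1.0):
    """y[n] = x[n] - a * x[n-1], taking x[-1] = 0."""
    y, prev = [], 0.0
    for value in x:
        y.append(value - a * prev)
        prev = value
    return y

def gain_db(freq_hz, fs, a=1.0):
    """Magnitude response of H(z) = 1 - a * z^-1 at freq_hz, in dB."""
    w = 2.0 * math.pi * freq_hz / fs
    return 20.0 * math.log10(abs(1.0 - a * cmath.exp(-1j * w)))

# Gain difference across one octave at low frequency: close to +6 dB.
octave_rise_low = gain_db(200.0, 16000.0) - gain_db(100.0, 16000.0)
# Near half the sampling rate the slope flattens, so the rise is smaller.
octave_rise_high = gain_db(4000.0, 16000.0) - gain_db(2000.0, 16000.0)
```

The +6 dB/octave description is therefore a low-frequency approximation of the discrete differentiator; the exact gain is 2·sin(ω/2).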

The Source-Filter Model: Radiation

The derivative effect of radiation from the lips can be moved to the glottal-source model.

[Figure: glottal flow (with T0, OQ, and SK marked) and the corresponding glottal flow derivative.]

The Source-Filter Model: Radiation

The derivative effect of radiation from the lips can also be moved to the models of frication and plosion:

Unvoiced Sound Source:
• a very simple model: random (white) noise

Plosive Sound Source:
• a very simple model: an impulse function

[Figure: impulse, amplitude versus time.]
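A small numerical check of why the sources simplify this way, under two assumptions made for the example: the radiation effect is a first difference, and noise falling at about -6 dB/octave is approximated by a running sum of white noise. Differencing a step then leaves an impulse, and differencing the summed noise returns the white noise.

```python
import random

def first_difference(x):
    """y[n] = x[n] - x[n-1], taking x[-1] = 0."""
    return [x[n] - (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]

def running_sum(x):
    """Discrete integrator: y[n] = y[n-1] + x[n]."""
    total, out = 0.0, []
    for value in x:
        total += value
        out.append(total)
    return out

# Plosive source: step function -> impulse after the derivative.
step = [0.0] * 5 + [1.0] * 5
impulse = first_difference(step)

# Unvoiced source: integrated (sloped) noise -> white noise after the derivative.
rng = random.Random(0)
white = [rng.gauss(0.0, 1.0) for _ in range(1000)]
sloped = running_sum(white)
recovered = first_difference(sloped)
```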

The Source-Filter Model: Complete Picture

[Figure: glottal source (harmonics), radiation (log scale), vocal tract filter (envelope), and final speech signal.]

The Source-Filter Model: Estimating Parameters

The vocal-tract parameters (formants) can be estimated using LPC analysis, with the LPC order equal to 2×NF, where NF is the expected number of formants. In practice, LPC estimation of formants is not very accurate because of the slope of the spectrum and irregularities in the spectrum.

Once the formants are determined, they can be inverted, and the original signal filtered with the inverted formants to obtain the source + radiation signal (the first derivative of glottal flow). This is called inverse filtering.
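The following toy sketch walks through the two steps on a signal with one known resonance (NF = 1, so LPC order 2). It assumes autocorrelation-method LPC solved with the Levinson-Durbin recursion, which is one common choice and not necessarily the one intended in the lecture; all function names are hypothetical. With more formants, the pole frequencies would have to come from the roots of A(z), which is not shown here.

```python
import math

def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def lpc_autocorrelation(x, order):
    """Levinson-Durbin recursion; returns [1, a1, ..., ap] of the
    inverse filter A(z) = 1 + a1 z^-1 + ... + ap z^-p."""
    r = autocorrelation(x, order)
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a

def inverse_filter(x, a):
    """e[n] = sum_k a[k] * x[n - k], with x[n] = 0 for n < 0."""
    return [sum(a[k] * x[n - k] for k in range(len(a)) if n - k >= 0)
            for n in range(len(x))]

def second_order_formant(a, fs):
    """Centre frequency and bandwidth (Hz) of the complex pole pair of a 2nd-order A(z)."""
    theta = math.acos(-a[1] / (2.0 * math.sqrt(a[2])))
    return theta * fs / (2.0 * math.pi), -math.log(a[2]) * fs / (2.0 * math.pi)

# Toy signal: an impulse passed through one resonance at 500 Hz, 100 Hz bandwidth.
fs = 8000.0
radius = math.exp(-math.pi * 100.0 / fs)
b1 = 2.0 * radius * math.cos(2.0 * math.pi * 500.0 / fs)
b2 = -radius * radius
signal = []
for n in range(800):
    y = 1.0 if n == 0 else 0.0
    if n >= 1:
        y += b1 * signal[n - 1]
    if n >= 2:
        y += b2 * signal[n - 2]
    signal.append(y)

a = lpc_autocorrelation(signal, 2)
formant_hz, bandwidth_hz = second_order_formant(a, fs)
residual = inverse_filter(signal, a)  # estimate of the source: close to the original impulse
```

On real speech the residual would not be this clean: the spectral slope of the source and radiation is partly absorbed into the LPC fit, which is one reason the formant estimates are imperfect.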

The Source-Filter Model: Filtering

Formants can be modeled by a “damped sinusoid”, which has the following representations: [equations not shown] where S(f) is the spectrum at frequency value f, A is the overall amplitude, fc is the center frequency of the damped sine wave, and α is a damping factor [Olive, p. 48, 58].

Or, given the formant and sampling frequencies, compute IIR filter coefficients: [equations not shown] (from Klatt, 1980)
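For reference, here is a sketch of a second-order digital resonator in the form usually attributed to Klatt (1980): y[n] = A·x[n] + B·y[n-1] + C·y[n-2], with C = -exp(-2π·BW·T), B = 2·exp(-π·BW·T)·cos(2π·F·T), A = 1 - B - C, and T = 1/fs. Whether the slide writes the coefficients in exactly this way is an assumption; the function names and the example values (F = 500 Hz, BW = 60 Hz, fs = 10 kHz) are illustrative.

```python
import cmath
import math

def klatt_resonator_coeffs(f_hz, bw_hz, fs):
    """Coefficients (A, B, C) for y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    T = 1.0 / fs
    C = -math.exp(-2.0 * math.pi * bw_hz * T)
    B = 2.0 * math.exp(-math.pi * bw_hz * T) * math.cos(2.0 * math.pi * f_hz * T)
    A = 1.0 - B - C  # normalises the gain at 0 Hz to 1
    return A, B, C

def resonate(x, A, B, C):
    """Run the input samples x through the resonator."""
    y1 = y2 = 0.0
    out = []
    for value in x:
        y = A * value + B * y1 + C * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def magnitude(freq_hz, fs, A, B, C):
    """|H| at freq_hz for H(z) = A / (1 - B z^-1 - C z^-2)."""
    z1 = cmath.exp(-2j * math.pi * freq_hz / fs)
    return abs(A / (1.0 - B * z1 - C * z1 * z1))

A, B, C = klatt_resonator_coeffs(500.0, 60.0, 10000.0)
impulse_response = resonate([1.0] + [0.0] * 999, A, B, C)  # a damped sinusoid near 500 Hz
```

The impulse response of this filter is itself a damped sinusoid, which ties the two descriptions on the slide together: the bandwidth sets the decay rate and the formant frequency sets the oscillation rate.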

The Source-Filter Model

A course project that studies the source-filter model might be interesting:
• Implement LPC, and extract formant values and bandwidths of different vowels. How do the envelope and formant values change with different orders of LPC (values of p)?
• Do LPC analysis, then inverse filter the signal to extract the glottal source waveform. Does it look the way it should?
• Construct two-tube models and predict the formant frequencies of all vowels.
If you’re more comfortable with programming, signal processing, etc.