Speech Science COMD 6305 Weds/Fri 1:00pm - 2:15pm

William F. Katz, Ph.D. UTD

Chapter 1. The Nature of Sound
INCLUDING: Definition of Speech Science; Basic physics concepts; Overview of sound; Resonance

What is ‘speech science’?
“Speech Science is the experimental study of speech communication, involving speech production and speech perception as well as the analysis and processing of the acoustic speech signal.”
Speech Science asks questions like:
How is speech planned and executed by the vocal system?
How do the acoustic properties of sounds relate to their articulation?
How and why do speech sounds vary from one context to another?
How do listeners recover the linguistic code from auditory sensations?
How do infants learn to produce and perceive speech?
How and why do speech sounds vary between speakers?
How and why do speech sounds vary across speaking styles or emotions?
(-M. Huckvale, UCL)

Physics Principles
Energy: ability to do work*
*(force required to move some mass some distance in some amount of time)
Common units: Mass = kilograms (kg); Distance = meters (m); Time = seconds (sec)
Matter is made of particles, which have mass
Mass and spacing of particles determine density

Physics Principles - cont’
Opposing forces affect objects in motion and limit their movement
3 types! Friction, Inertia, Elasticity
a) Friction (or resistance): opposes movement, e.g., rubbing hands together or an eraser across a table

Friction Impedes or opposes movement (not much friction here….)

In order to vibrate, an object must possess
Inertia Elasticity *Any object that can vibrate can produce sound

Inertia “Tendency for mass at rest to remain at rest” or “object in motion to remain in motion”
b) Inertia: amount of inertia is related to mass; increased mass needs increased force to change velocity

Elasticity: Restoring force
Ability of a substance to recover its original shape and size after distortion
All solids are elastic; gases also behave as if elastic
Force applied to an object will deform it to some degree; the object returns to its initial form by action of the restoring force
A stiff spring is very elastic: it DOESN’T WANT TO MOVE, so it easily returns to normal
Stiffness: often used as a measure of one of the resistive forces that oppose setting an object into motion
Elasticity: resists compression

Hooke’s law – describes elasticity - “restoring force is proportional to the distance of the displacement and acts in the opposite direction” (if not stretched beyond its elastic limit)
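Hooke's law as stated above can be sketched in a few lines. This is only an illustration: the function name, the stiffness of 200 N/m, and the 5 cm displacement are hypothetical values, not taken from the slides.

```python
def restoring_force(stiffness_n_per_m: float, displacement_m: float) -> float:
    """Hooke's law, F = -k * x, valid only below the elastic limit."""
    # The minus sign makes the force point opposite to the displacement.
    return -stiffness_n_per_m * displacement_m


# Hypothetical spring (k = 200 N/m) stretched 0.05 m from its rest position.
force = restoring_force(200.0, 0.05)
print(f"{force:.1f} N")  # negative: the force pulls back toward rest
```

Doubling the displacement doubles the restoring force, which is what "proportional to the distance of the displacement" means.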

Density (= mass per unit of volume)
Lower Density Higher Density

Pressure
Pressure = force per unit area, measured in pascals (Pa)
Pa = N/m², or dyne/cm² in older literature
N = kg × m/s² (where N = newton, a measure of force) See later slide for other measures of pressure!
Newton: force required to accelerate 1 kg of mass at 1 meter per second, per second
“per second, per second” = acceleration: how rate (velocity) changes over time

Measurement of air pressure
Dynes* per square centimeter: dyne/cm²
Pounds per square inch: psi
Microbar: μbar
Pascal: Pa
Centimeters of water: cm H2O
Millimeters of mercury: mm Hg
*The force required to accelerate a mass of one gram at a rate of one cm/sec²

Air pressure can also be measured in cm of H2O or mm of mercury (Hg)

Two Units of Measure (pressure)
N/m², the “pascal”* (MKS system: Meters/Kilograms/Secs)
dynes/cm², the “microbar” (CGS system: Centimeters/Grams/Secs)
*(For speech, micropascal is more commonly used)
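Moving between these pressure units is a matter of fixed conversion factors. The sketch below is an illustration rather than part of the lecture: the table and function names are hypothetical, and the factors are standard rounded values (1 dyne/cm² = 1 microbar = 0.1 Pa).

```python
# Pascals per one unit of each pressure measure (rounded standard values).
PA_PER_UNIT = {
    "Pa": 1.0,
    "dyne/cm2": 0.1,   # CGS unit; identical to one microbar
    "microbar": 0.1,
    "cmH2O": 98.0665,
    "mmHg": 133.322,
    "psi": 6894.76,
}


def convert_pressure(value: float, from_unit: str, to_unit: str) -> float:
    """Convert a pressure by passing through pascals."""
    pascals = value * PA_PER_UNIT[from_unit]
    return pascals / PA_PER_UNIT[to_unit]


# The 20 micropascal reference used for speech, expressed in CGS units.
print(f"{convert_pressure(20e-6, 'Pa', 'dyne/cm2'):.4f} dyne/cm2")
```

Routing every conversion through pascals keeps the table to one factor per unit instead of one per pair of units.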

Pressure at different locations may vary
P atmos P oral P trach P alveolar

Air
Composed of gas molecules (nitrogen, oxygen, etc.)
Molecules are always in motion and try to keep their distance from each other: Brownian motion
Has pressure (force per unit area), with high-pressure vs. low-pressure areas
Pressure is related to the speed of Brownian motion: faster motion and higher density = higher force and higher pressure
Atmospheric air pressure: changes gradually with altitude and weather
Sound pressure: changes are smaller and more rapid
“rapid changes in relatively static atmospheric pressure we call sound pressure or sound energy”

Air Movement
Driving pressure (difference in pressure): high pressure FLOWS to low pressure
Volume velocity: rate of flow, e.g., liters/sec
Laminar flow: in a parallel manner
Turbulent flow: in a non-parallel manner (flows around an object, forming eddies)

Flow

Air Pressure, Volume, Density
Volume: amount of space in three dimensions Density: amount of mass per unit of volume Boyle’s law - as volume decreases, pressure increases (assuming constant temperature)
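Boyle's law can be written as P1 × V1 = P2 × V2 at constant temperature. The sketch below works one case; the function name and the numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def pressure_after_volume_change(p1: float, v1: float, v2: float) -> float:
    """Boyle's law at constant temperature: P2 = P1 * V1 / V2."""
    if v1 <= 0 or v2 <= 0:
        raise ValueError("volumes must be positive")
    return p1 * v1 / v2


# Hypothetical case: a 2-liter space at 100 pressure units squeezed to 1 liter.
print(pressure_after_volume_change(100.0, 2.0, 1.0))  # halving volume doubles pressure
```

Any consistent pressure unit works, because only the ratio of the two volumes matters.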

What is “sound?” Form of energy
Waves produced by vibration of an object Transmission through a medium (gas, liquid, solid) Has no mass or weight Travels in longitudinal wave text pg. 7

Transverse vs. Longitudinal Wave
Longitudinal – SOUND! In sound waves, also known as acoustic waves, the local oscillations always move in the same direction as the wave. Waves like this are called longitudinal waves. Unlike acoustic waves, radio waves or guitar-string vibrations are transverse waves; that is, the local oscillations are always perpendicular to the direction the wave is moving. Transverse waves can be set up in, say, a skipping rope or a washing line. Other examples include an audience wave, waves of a ribbon or string, and water waves. Cool!

Air Pressure Changes from Sound
Compression Rarefaction

Propagation of Sound
DEMO: Longitudinal Wave (C = Condensation, R = Rarefaction): C R C R C R C

Periodic waves Simple (sine; sinusoid)
Complex (actually a composite of many overlapping simple waves)

Sinusoid waves Simple periodic motion from perfectly oscillating bodies Found in nature (e.g., swinging pendulum, sidewinder snake trail, airflow when you whistle) Sinusoids sound ‘cold’ (e.g. flute)

Simple waves - key properties
Frequency = cycles per sec (cps) = Hz Amplitude – measured in decibels (dB), 1/10 of a bel (Note: dB is on a log scale, increases by powers of 10)

Frequency and intensity are independent!
Frequency = cycles per second, or “cps” or “Hertz (Hz)”
Frequency of vibration determines frequency of the sound wave
Higher frequency = higher pitch
Intensity = magnitude (amplitude) of vibration; amount of change in pressure

Amplitude (intensity)
More force is applied and this increases pressure at peaks Peaks (condensations) = increases in pressure DRAW SEPARATELY Troughs (rarefactions) = decreases in pressure

Frequency - Tones

Quickie Quiz! Q: What is the frequency of this wave ?
HINT: It repeats 2.5x in 10 msec

Answer: 250 Hz! (2.5 cycles in .01 sec = 250 cps)
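The quiz arithmetic generalizes to any waveform: divide the number of cycles by the time window in seconds. A minimal sketch, with hypothetical function names:

```python
def frequency_hz(cycles: float, duration_s: float) -> float:
    """Frequency is the number of cycles completed per second."""
    return cycles / duration_s


def period_s(freq_hz: float) -> float:
    """Period is the time taken by one cycle, the reciprocal of frequency."""
    return 1.0 / freq_hz


# The quiz waveform: 2.5 cycles in 10 msec (0.010 sec).
print(f"{frequency_hz(2.5, 0.010):.0f} Hz")        # 250 Hz
print(f"{period_s(250.0) * 1000:.0f} msec/cycle")  # 4 msec/cycle
```

The second line is a cross-check: at 4 msec per cycle, 10 msec holds 2.5 cycles.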

Periodic waves A complex wave is periodic, but not sinusoidal
Panels A and B are periodic waves, though Panel B is complex periodic

Complex periodic waves
Results from imperfectly oscillating bodies Demonstrate simple harmonic motion Examples - a vibrating string, the vocal folds

Complex periodic waves – cont’d
Consists of a fundamental (F0) and harmonics Harmonics (“overtones”) consist of energy at integer multiples of the fundamental (x2, x3, x4 etc…)
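Since harmonics sit at integer multiples of the fundamental, the series can be listed directly. The function name is hypothetical; the 100 Hz fundamental is just an example value.

```python
def harmonic_series(f0_hz: float, count: int) -> list:
    """Return the first `count` harmonics; harmonic 1 is the fundamental (F0)."""
    return [n * f0_hz for n in range(1, count + 1)]


print(harmonic_series(100, 3))  # [100, 200, 300]
```

Note the numbering convention used here: the fundamental counts as the first harmonic, so the "second harmonic" is 2 × F0.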

Harmonic series Imagine you pluck a guitar string and could look at it with a really precise strobe light Here is what its vibration will look like

From simple waves → complex (showing power spectra)
Also known as a “line spectrum” At bottom is a complex wave with an F0 of 100 Hz Note energy at 200 Hz (second harmonic) and 300 Hz (third harmonic)

Fourier’s Theorem: Fourier Synthesis, Fourier Analysis
Complex Sounds: Addition of Sinusoids: The top sinusoid in this slide depicts a 100-Hz sine wave, the next sinusoid down a 200-Hz sine wave, and the third sinusoid from the top a 300-Hz sine wave. The red waveform on the bottom is the time-point-by-time-point sum of the amplitudes of the three blue sine waves (100 Hz, 200 Hz, 300 Hz). Thus, the summed red waveform starts as a 100-Hz sine wave and becomes more complex as the 200-Hz and then the 300-Hz components are added. This slide is an animation of Figure 4.2 in the Textbook.
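The point-by-point addition described here (Fourier synthesis), and the reverse step of recovering each component (Fourier analysis), can be sketched with the standard library alone. Everything beyond the three frequencies is an assumption for illustration: equal amplitudes of 1.0, an 8000 Hz sample rate, a 10 msec window, and the function names.

```python
import math


def synthesize(freqs_hz, amps, sample_rate_hz, duration_s):
    """Fourier synthesis: add the sinusoids sample by sample."""
    n_samples = round(sample_rate_hz * duration_s)
    return [
        sum(a * math.sin(2 * math.pi * f * n / sample_rate_hz)
            for f, a in zip(freqs_hz, amps))
        for n in range(n_samples)
    ]


def amplitude_at(signal, freq_hz, sample_rate_hz):
    """Fourier analysis at one frequency: correlate with a cosine and a sine."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate_hz)
             for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
             for i, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


# One full period of the 100 Hz fundamental (10 msec at 8000 samples/sec).
wave = synthesize([100, 200, 300], [1.0, 1.0, 1.0], 8000, 0.010)
for f in (100, 200, 300, 400):
    print(f, "Hz:", round(amplitude_at(wave, f, 8000), 3))
```

The analysis finds energy at 100, 200, and 300 Hz and essentially none at 400 Hz, which is the line spectrum of the complex wave. The window covers a whole number of cycles of every component, which keeps the single-frequency estimate clean.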

Frequency - Male Vowels

Frequency - Female Vowels

Periodic and aperiodic waveforms
Spectra of periodic and aperiodic sounds

Transient vs. continuous signal

Continuous spectrum Energy is present across a continuum of frequencies (noise) (not necessarily equal energy at all frequencies!)
[Figure: spectrum with frequency (Hz) on the horizontal axis and amplitude on the vertical axis]

Decibels Unit used to express intensity or pressure of sound
Named after Alexander Graham Bell, the decibel, or dB, is a unit of measurement of intensity in the field of acoustics
The amplitude of vibration, or the extent of particle displacement, determines the intensity of sound
Expressed in terms of a specified reference value, so it is a relative unit: not an absolute value, but a ratio of 2 numbers!
Not linear: the scale is logarithmic (exponential)
Increasing sound pressure 10-fold creates only a 20 dB increase
Increasing sound pressure 100 times creates a 40 dB increase
Reference for the log ratio differs depending on the application

Linear vs. Logarithmic Scale
Linear scale: a ruler, for example; begins at zero, each increment is equal to the next, and units can be summed by addition
  1,000   2,000   3,000   (incremental amount between units: 1,000 each time)
Logarithmic scale: based on exponents of a given number; increments NOT EQUAL
  10^2   10^3   10^4   10^5   (incremental amounts between units: 900; 9,000; 90,000)
For dB, the base is 10 and the scale consists of incremental amounts that are not equal and become increasingly large numerical differences

Why use log scale for sound?
Large dynamic range between pressure of smallest audible sound and largest tolerable sound
Difficult to use with interval scale, so use a ratio scale
Sound power = energy transferred over time (work done), measured in watts; sound intensity is power per unit area, so its unit is watts per meter squared
Sound can also be described by how much pressure it exerts (Pa or N/m²)
The human ear is sensitive to a large intensity range: that is, the human ear has a large dynamic range
Ex: 20 micropascals to 20,000,000 micropascals
For example: 1 intensity unit

Decibel Logarithm of the ratio of 2 intensities
Ratio of power or pressure measured, divided by a reference power or pressure (usually 20 μPa)
The logarithm of this ratio is labeled a “bel”; the bel is still too large, so 1/10th of a bel is used: the decibel
Threshold of hearing = “softest sound of a particular frequency a pair of human ears can hear 50% of the time under ideal listening conditions”
Human hearing is relatively insensitive to low bass (below 100 Hz), and also compresses at higher sound levels
Typically tested with a sound level meter
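The definition above can be made concrete with the two standard decibel formulas: 10 × log10 of an intensity (power) ratio, and 20 × log10 of a pressure ratio, since intensity is proportional to pressure squared. The sketch assumes the usual 20 μPa reference; the function names are hypothetical.

```python
import math

REFERENCE_PRESSURE_PA = 20e-6  # 20 micropascals, the usual reference for dB SPL


def db_from_pressure(pressure_pa: float,
                     reference_pa: float = REFERENCE_PRESSURE_PA) -> float:
    """Sound pressure level: 20 * log10 of the pressure ratio."""
    return 20 * math.log10(pressure_pa / reference_pa)


def db_from_intensity(intensity: float, reference: float) -> float:
    """Intensity level: 10 * log10 of the intensity (power) ratio."""
    return 10 * math.log10(intensity / reference)


print(f"{db_from_pressure(20e-6):.1f} dB")   # at the reference: 0.0 dB
print(f"{db_from_pressure(200e-6):.1f} dB")  # 10x the pressure: 20.0 dB
print(f"{db_from_pressure(20.0):.1f} dB")    # 20,000,000 micropascals: 120.0 dB
print(f"{db_from_pressure(10e-6):.1f} dB")   # below the reference: negative
```

A pressure below the reference gives a negative value, which is why a very quiet room can measure below 0 dB.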

dB – cont’d The ‘speech banana’
Midrange frequencies can be perceived when they are less intense ..whereas a very low or high frequency sound must have more intensity to be audible Why you need to boost the bass on your stereo system if you turn the volume down low -- otherwise the lows become inaudible. The horizontal axis of the chart contains information about frequencies a person can hear, from high to low. The vertical axis provides details about decibel levels the person's ears can detect, charted with the lowest levels at the top and the highest at the bottom. The speech banana is located around one third of the way down the chart. Someone who has difficulty hearing normal human speech can have difficulty learning to talk, understanding conversations, and learning material that is presented in spoken format. People with mild hearing loss may not realize that they cannot hear normally. Children with hearing loss often struggle in class because they cannot hear the teacher talking, but they don't realize their hearing is impaired because they can hear sounds in other ranges. Regular hearing tests are designed to identify hearing loss early so that interventions can be provided. When a hearing test is conducted, the results can be charted to show the range of sounds the subject of the test can hear. People with good hearing will have results that are located above the speech banana, meaning that they can hear sounds at both lower and higher frequencies than normal human speech, and lower decibel levels than normal human speech. If hearing test results fall below or within the speech banana, it means that the person may have difficulty hearing people talk. One approach to hearing loss is to fit the person with hearing aids. Hearing aids are adjusted and tested to generate a new audiogram showing what the person can hear while wearing the hearing aids. The goal is to provide the person with the ability to hear the normal range of human speech to facilitate communication. 
In other cases, hearing loss may be profound enough that hearing aids cannot provide people with a range of hearing that falls inside the speech banana. These individuals may need people to talk more loudly around them, or they may consider using modes of communication like sign language to facilitate conversation.

More dB facts
Many prefer A-weighting (dBA), which adjusts measurements to approximate the frequency sensitivity of human hearing
Typically tested with a sound level meter
FACT: There can be a negative dB value! Orfield Labs, Minneapolis: -9.4 dBA

Physical vs. perceptual
PHYSICAL → PERCEPTUAL
Fundamental frequency (F0) → “Pitch”
Amplitude/Intensity → “Loudness”
Duration → “Length”

Phon Scale Psychoacoustic scale for intensity
If a given sound is perceived to be as loud as a 60 dB sound at 1000 Hz, then it is said to have a “loudness of 60 phons”: 60 phons means “as loud as a 60 dB, 1000 Hz tone”
The phon scale uses a 1000 Hz pure tone as the reference frequency
Each curve represents the equivalent loudness level of a 1000 Hz tone. For example, every point along the 30-phon line sounds as loud as a 1000 Hz tone at 30 dB

Bark scale (pitch) (1961) A psychoacoustical scale for pitch
Basically log-linear Ranges from 1 to 24 and corresponds to the first 24 critical bands of hearing. Similar to the mel scale

Phase A measure of the position along the sinusoidal vibration
These two waveforms are slightly out of phase (approx. 90° difference) Important for sound localization

Phase examples

Damping - example (Loss of vibration due to friction)

Review of source characteristics
Simple waves are a good way to learn about basic properties of frequency, amplitude, and phase. Examples include whistling; not really found much in speech Complex waves are found in nature for oscillating bodies that show simple harmonic motion (e.g., the vocal folds)

Source-filter theory Vocal tract model demo

Let’s look at the filter
In speech, the filter is the supralaryngeal vocal tract (SLVT) The shape of the oral/pharyngeal cavity determines vowel quality (via resonance) SLVT shape is chiefly determined by tongue movement, but lips, velum and (indirectly) jaw also play a role

Resonance Resonance = reinforcement or shaping of frequencies as a function of the boundary conditions through which sound is passed To get a basic idea of resonance, try producing a vowel with and without a paper towel roll placed over your mouth! The ‘extra tube’ changes the resonance properties.

Resonance – cont’d The SLVT can be modeled as a kind of bottle with different shapes… as sound passes through this chamber it achieves different sound qualities The resonances of speech that relate to vowel quality are called formants. Thus, R1 = F1 (“first formant”), R2 = F2, etc. F1 and F2 are critical determinants of vowel quality

Resonators

Resonance - concepts Natural frequency (Resonant frequency RF)
Mechanical (vibrating body, e.g. vocal folds) Acoustic (air-filled space)

Resonance
A child’s swing behaves like a pendulum and can help us understand resonance (from Alison Behrman’s Speech and Voice Science)
Restorative & displacement forces, inertia & equilibrium
The oscillation of the swing behaves in the same way as the earlier example of the mass/spring system. When the swing hangs straight down, it is at equilibrium. Displacing the swing builds up a restorative force, and inertia carries the swing past equilibrium. If there is no one to push the child on the swing, and the child does not “pump” her legs, then friction will overcome the displacement and restorative forces. The arc (amplitude) of the swing will slowly decrease and eventually the swing will come to rest.
Two ways to inject energy into the oscillation:
An adult can come along and push the child.
The child can pump her legs out and lean back as she moves from the equilibrium position forward to the top of the arc. Then, as the restorative force overcomes the swing and it accelerates back toward equilibrium, the child can tuck her legs in and lean forward a bit.
In both cases, an outside force (the adult’s push or the child’s pumping movements) results in a large increase in the amplitude of the vibration.
The critical factor is the timing of the outside force. If energy is supplied to the swing with appropriate timing, the amplitude of its movement will increase: energy supplied to a vibration at the same frequency as the vibration increases its amplitude. If the adult tries to push the swing while it is still moving backwards, the swing may come to a stop. In that case energy was imparted to the swing at a frequency other than its resonant frequency.
*The amplitude is independent of the frequency*: pushing the swing harder will not make the swing travel faster, only in a larger arc.
The phenomenon of matching energy input to a system and the frequency of vibration of the system is called resonance.

Resonance Resonance: large increase in vibration when a force is applied at a natural frequency of the medium Set an object into vibration (hit, drop, etc) and the rate at which it vibrates ~ natural frequency (or set of frequencies).

Vibration Free vibration – no additional force is applied after an object is set into motion. Forced vibration – when forces “drive” the motion to vibrate at its natural frequency. (e.g., guitar string).

Forced Vibration The Tacoma-Narrows Bridge twisting in sympathetic resonance with the wind (left) and stretching to the point of breaking (right). In 1940, a bridge was constructed across Puget Sound in Washington State. It was a suspension bridge with 2800 feet between two suspension pillars. Very narrow, only 40 feet wide. Only a few months after completion, strong winds caused standing wave oscillations of the bridge. The amplitude of the oscillations became so large that the bridge actually collapsed. Contributing factors include narrowness & flexibility of the bridge & unusual wind patterns that circled around the bridge at the resonant frequency of the bridge.

Resonant Freq of Glass Breaking glass with sinusoid sound
Freaky deaky guy actually breaks glass with voice (Mythbusters) Typical case of faking it

Why is resonance important?
In the vocal tract, resonance (caused by articulator movements and vocal constriction) gives rise to differences in speech acoustics, allowing some frequencies to be magnified over others. In the auditory system (ear canal), it allows certain sounds to be amplified compared to others.

Acoustic Resonators / Bandwidth

Narrow vs. Broad Filter (regular resonator) (irregular resonator)

Passband Resonator

Input → SLVT → speech output
Application: Sound spectrograms, speech analysis

F1 through F3 Closed-tube model of the VT, showing first three resonances (formants)
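A common way to put numbers on this picture is the quarter-wavelength model: a uniform tube closed at one end (the glottis) and open at the other (the lips) resonates at F_n = (2n - 1) × c / (4 × L). The formula is the standard textbook approximation rather than something stated on the slide, and the tube length of 17.5 cm and speed of sound of 350 m/s are assumed illustrative values.

```python
def tube_resonances(length_m: float, speed_of_sound_m_s: float, count: int) -> list:
    """Resonances of a uniform tube closed at one end and open at the other.

    Quarter-wavelength model: F_n = (2n - 1) * c / (4 * L).
    """
    return [
        (2 * n - 1) * speed_of_sound_m_s / (4 * length_m)
        for n in range(1, count + 1)
    ]


# Assumed values: a 17.5 cm vocal tract and sound traveling at 350 m/s.
for n, f in enumerate(tube_resonances(0.175, 350.0, 3), start=1):
    print(f"F{n} = {f:.0f} Hz")
```

With these assumptions the first three resonances fall near 500, 1500, and 2500 Hz, roughly the formant pattern of a neutral, schwa-like vowel. A shorter tube raises every resonance and a longer tube lowers them.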

Formants for three GAE vowels

Formants – ‘HAR’ ✓ H: Height relates inversely to F1.
✓ A: Advancement relates to F2. ✓ R: Rounding is a function of lip protrusion and lowers all formants - through lengthening of the vocal tract by approximately 2 to 2.5 cm.

Peterson & Barney, 1952

To be continued… We have described resonance and formant frequencies with respect to vowels Also important for consonants Because consonants require more spatial and temporal precision, these details will be covered later…
