Basic Hearing Issues Parts of the ear. Their contribution to the hearing process. What is loudness? What is intensity? How loud can you hear? How quiet can you hear? How high is high? How low is low? What do two ears do for us?
Fundamental Divisions of the Ear Outer ear: –Head and shoulders –Pinna –Ear Canal Middle Ear –Eardrum –3 bones and spaces Inner ear –Cochlea Organ of Corti –Basilar Membrane –Tectoral Membrane –Inner Hair Cells –Outer Hair Cells
Functions of the Outer Ear In a word, HRTFs. HRTF means head related transfer functions, which are defined as the transfer functions of the body, head, and pinna as a function of direction. Sometimes people refer to HRIRs, or head related impulse responses, which are the same information expressed in the time, as opposed to frequency, domain. HRTFs provide information on directionality above and beyond that of binaural hearing. HRTFs also provide disambiguation of front/back, up/down sensations along the cone of confusion. The ear canal resonates somewhere between 1 and 4 kHz, resulting in some increased sensitivity at the point of resonance. This resonance is about 1 octave wide, give or take.
Middle Ear The middle ear acts primarily as a highpass filter (at about 700 Hz) followed by an impedance-matching mechanical transformer. The middle ear is affected by muscle activity, and can also provide some level clamping and protection at high levels. –You dont want to be exposed to sound at that kind of level.
Inner Ear (Cochlea) In addition to the balance mechanism, the inner ear is where most sound is transduced into neural information. The inner ear is a mechanical filterbank, implementing a filter whose center-frequency tuning goes from high to low as one goes farther into the cochlea. –The bandpass nature is actually due to coupled tuning of two highpass filters, along with detectors (inner hair cells) that detect the difference between the two highpass (HP) filters.
Critical Bandwidths The bandwidth of a filter is referred to as the Critical Band or Equivalent Rectangular Bandwidth (ERB). ERBs and Critical Bands (measured in units of Barks, after Barkhausen) are reported as slightly different. ERBs are narrower at all frequencies. ERBs are probably closer to the right bandwidths, note the narrowing of the filters on the Bark scale in the previous slide at high Barks (i.e. high frequencies). I will use the term Critical Band in this talk, by habit. None the less, I encourage the use of a decent ERB scale. Bear in mind that both Critical Band(widths) and ERBs are useful, valid measures, and that you may wish to use one or the other, depending on your task. There is no established ERB scale to date, rather researchers disagree quite strongly, especially at low frequencies. It is likely that leading-edge effects as well as filter bandwidths lead to these differences. The physics suggests that the lowest critical bands or ERBs are not as narrow as the literature suggests.
What are the main points? The cochlea functions as a mechanical time/frequency analyzer. –Center frequency changes as a function of the distance from the entrance end of the cochlea. High frequencies are closest to the entrance. –At higher center frequencies, the filters are roughly a constant fraction of an octave bandwidth. –At lower center frequencies, the filters are close to uniform bandwidth. –The filter bandwidth, and therefore the filter time response length varies by a factor of about 40:1.
What happens as a function of Level? As level rises, the ear desensitizes itself by many dB. As level rises, the filter responses in the ear shift slightly in position vs. frequency. The ear, with a basic 30dB SNR (1000^.5) in the detector, can range over at least 120dB of level.
What does this mean? The internal experience, called Loudness, is a highly nonlinear function of level, spectrum, and signal timing. The external measurement in the atmosphere, called Intensity, behaves according to physics, and is somewhat close to linear. The moral? Theres nothing linear involved.
Some points on the SPL Scale (x is very loud pipe organ, o is threshold of pain/damage, + is moderate barometric pressure change
Edge effects and the Eardrum The eardrums HP filter desensitizes the ear below 700Hz or so. The exact frequency varies by individual. This means that we are not deafened by the loudness of weather patterns, for instance. At both ends of the cochlea, edge effects lessen the compression mechanisms in the ear. The results of these two effects, coupled with the ears compression characteristics, results in the kind of response shown in the next slide.
Fletcher and Munsons famous equal loudness curves. The best one-picture summary of hearing in existence.
Whats quiet, and whats loud? As seen from the previous graph, the ear can hear to something below 0dB SPL at the ear canal resonance. As we approach 120dB, the filter responses of the ear start to broaden, and precise frequency analysis becomes difficult. As we approach 120dB SPL, we also approach the level at which near-instantaneous injury to the cochlea occurs. Air is made of discrete molecules. As a result, the noise floor of the atmosphere at STP is approximately 6dB SPL white noise in the range of 20Hz-20kHz. This noise may JUST be audible at the point of ear canal resonance. Remember that the audibility of such noise must be calculated inside of an ERB or critical band, not broadband.
So, whats high and whats low in frequency? First, the ear is increasingly insensitive to low frequencies, as shown in the Fletcher curve set. This is due to both basilar membrane and eardrum effects. 20Hz is usually mentioned as the lowest frequency detected by the hearing apparatus. Low frequencies at high levels are easily perceived by skin, chest, and abdominal areas as well as the hearing apparatus.
At higher frequencies, all of the detection ability above kHz lies in the very first small section of the basilar membrane. While some young folks have been said to show supra- 20kHz hearing ability (and this is likely true due to their smaller ear, ear canal, and lack of exposure damage), in the modern world, this first section of the basilar membrane appears to be damaged very quickly by environmental noise. At very high levels, high (and ultrasonic) signals can be perceived by the skin. You probably dont want to be exposed to that kind of level.
What about binaural issues? Binaurally, with broadband signals, we can distinguish 10 microsecond shifts in left vs. right stimulii of the right characteristics. While this has implications in block- processed algorithms with pre-echo, it does not generally relate substantially to ADC and DAC hardware that is properly clocked.
The results? For presentation (NOT capture, certainly not processing), a range of 6dB SPL (flat noise floor) to 120dB (maximum you should hear, also maximum most systems can achieve) should be more than sufficient. This is about 19 bits. An input signal range of 20Hz to 20kHz is probably enough, there are, however, some filter issues that will be raised later, that may affect your choice of sampling frequency.
What do Analog and Digital really mean? The domain we commonly refer to as analog is a time- continuous (at least to mortal eyes) domain, with continuous level resolution limited by physical properties of material. The level resolution and time resolution are never exact due to basic physics. The digital domain is a sampled, quantized domain. That means that we only know the value of the signal at specified time, and that the level of the signal occupies one of a set of discrete levels. The set of levels, and the times that the signal has a value, however, are exact in the digital framework. (Although there may, of course, be errors in acquisition.)
Properties of the analog domain The time domain is continuous. This means that any frequency limits come from physical processes, not from mathematical restrictions. However, physics places some very strong constraints on such signals: –All signals have finite energy –All signals have finite bandwidth –All signals have finite duration –All signals have a finite noise floor –All four of the points above are very important!
A reminder about Duality in the Fourier domain Multiplying two signals means that you convolve their (full, complex) spectra. Convolving two signals means that you multiply their (full, complex) spectra. These two properties of Fourier Analysis (other commonly used transforms obey them as well) are very important. Remember them even if you dont know anything about Fourier Analysis.
Fourier Domain Properties Please remember the properties. I dont want or expect anyone to understand all the details, but please remember the PROPERTIES. Fourier analysis is valid on all finite energy, finite-bandwidth signals. That describes all real- world audio signals that we care to deal with. (The only counterexamples occur in astrophysics and particle physics, neither of which a listener can be comfortably seated near.)
So, lets sample that analog signal. Sampling means capturing the value of the signal at a periodic rate. –This means that we MULTIPLY the signal by the specific impulse at a regular interval. Quite obviously, thats not what actual hardware does. Most use a track/hold, or other capture method. The result, in the sampled domain, is the same. –That means that we CONVOLVE the signal spectrum and the sampling spectrum. This means that the spectrum repeats at every multiple of the sampling frequency. –Hence, we have the Nyquist criterion, later proven by Shannon.
Three examples 2*bfs In each case, top is signal spectrum (same for all 3) middle is sampling spectrum and bottom is the result
The Nyquist Criterion Simply put, if we wish to sample a signal of bandwidth B, we must sample it at least at 2B sampling rate. –If you think about this briefly, that follows from the previous slide, where the spectrum (extending to +-B from DC) will overlap if you sample it at a lower frequency –This overlap is called aliasing.
A Graphical Example Top to bottom: Sampling train, spectrum of sampling train, Sine wave below half the sampling rate and resulting samples, their spectra, sine wave as far above half the sampling rate and resulting samples, their spectra
What would I hear if there was a demo of aliasing? Aliasing and imaging (imaging, as we will see shortly, is the reconstruction version of aliasing) sound awful. Aliasing in general is anharmonic, and remarkably annoying. THEREFORE: Filtering is a requirement. Its not an option. The presence of a filter has consequences that we will discuss later. This leads us to the sampling theorem.
Hence, the Sampling Theorem We must limit the bandwidth of the signal to fs/2, where fs is the sampling frequency. (This is saying the same thing as the Nyquist conjecture, restated in terms of the data to be captured, rather than the sampling rate.) While this does not mean dc to fs/2, thats what we do in audio, since we want signals close to dc. (There are other sampling methods that sample other regions of frequency.) mustThis means that we must band limit the signal into the sampler. An anti-alias filter is not just a good idea, its mathematically necessary.
Consequences of the Sampling Theorem We must band limit the signal in order to avoid aliasing. Any out-of-band signals will alias back into the base band. That ^^ has consequences far beyond the initial sampling of the material. Well get to that later when we talk about things like clipping and nonlinearities, or jitter. This means that we know what times the samples were taken, so we can reconstruct that periodicity later, without an error that always grows with time, distance, number of copies, etc.
Ok, but we represent those samples as binary values, right? Yes, we do. Thats called quantization. Thats the other necessary process in digitization of a signal. Quantization is why digital signals can be saved and re-saved without degradation in terms of level. Its also why digital PCM signals have a fixed, unchanging noise floor. (There are other possiblities, well talk about those later.)
So, quantization is like rounding, right? Well, Lets see! Using rounding only: Original, quantized, error, and spectrum of original and error.
To drive the point even farther home: Thats right, we have to dither quantizers. Its not just a good idea.
Dither? Whats that? As the spectra (and error waveform of the second slide) show, the error of an undithered quantizer is highly correlated to the original signal. Dither consists of adding some random function BEFORE the quantization so that the noise is decorrelated. The first kind of dither people tried was called uniform. Lets see how that works out.
Uniform +- ½ step-size dither. Not bad, but notice the noise level coming and going around the zero crossing?
Hence, Triangular PDF Dither: Now, the noise stays constant over all amplitudes.
To recap: Notice, even in this very mild case, where harmonics do not alias over each other, in addition to eliminating tonal components and preserving information, dither RAISES the noise floor, and lowers the PERCIEVED noise floor.
To summarize: A digital signal is sampled and quantized. Sampling requires anti-aliasing filters. Quantization requires TPD. Dither and Anti-aliasing are not options!
What about reconstruction? Yes, that convolution theorem applies again, this time usually convolving a square pulse with the digital signal. This leads to a form of signal images. While images and aliases come about by mathematically similar processes, people persist in having different names for them. Some (many in the high end) omit the anti- imaging filter, and imagine that there is a beating problem. If you havent heard about this problem yet, you will at some point. The next few graphs show why it isnt so.
Basically, the same thing happens. Top Line: Sine wave Below fs/2 Next line Sine wave plus first alias pair (blue) and just the aliases (red) Each line adds another pair, except the last, which adds the first 100 pairs. The gain of the red waveform is greatly increased in order to make it visible. Notice that after 100 alias pairs are added in the original waveform has the familiar stairstep
Notice, at the bottom, the beating that some audio enthusiasts complain about. Notice that beating only happens when aliases are added.
Low frequency reconstruction example.
Elements of reconstruction: In reconstruction, filtering is also necessary to remove the image signals that originate in the same fashion as aliases arise in sampling. In reconstruction, the waveform is sometimes a step rather than an impulse, so other compensation is sometimes necessary to get a flat frequency response. Why? –Again, using the step in time (convolving) means that you multiply the signal by the frequency response of the step in the frequency domain, leading to a rolloff like sin(x)/x. –This rolloff can be as much as -3.92dB at fs/2, and can cause audible softness if not corrected somehow. Modern converters of the delta-sigma variety do not use a step at the final sampling frequency (although they certainly use a step its at a much higher frequency). Their design, however, introduces other issues. That discussion comes later.
How to Build converters A quick survey of methods. No real commercial converter is Described.
Baseband Conversion Audio Input Filter A to D converter This is the basic block diagram for any PCM converter. In this converter, the filter is outside the converter, and the quantizer is part of the sampling mechanism. This method is not very common any more, but we will discuss its properties before moving on to oversampling converters. Sampling Clock
Spectrum of Signal and Noise (note, to make scaling easier, I will use 7/8/9 bit quantization) Original Spectrum of Original Quantized and Dithered Spectrum of Quantized, Dithered Signal Red 8 bits, green 9 bits
Things to notice. –The noise floor is flat. If you sum up all of the energy in the noise floor, you will wind up with the SNR you expect. Notice that practically all of the noise is IN BAND when there is no oversampling. –Its really hard to see quantization at even very noisy levels in a waveform plot. Each bit of quantization is worth 6.02dB of signal to noise ratio. 1 more bit will drop the noise floor by 6 dB. 1 less bit raises the noise floor by 6dB. NOTICE THAT NOISE SPREADS OVER THE ENTIRE OUTPUT SPECTRUM.
Why so much emphasis on how the noise spectrum spreads out? Therein lies the beginnings of oversampling.
Sine wave, original sampling rate Spectrum of 8 bit quantization Sine wave, 4x sampling rate Spectrum of 7 bit quantization, shown in same bandwidth! 4x oversampled. Full spectrum of 7 bit quantized signal at 4x sample rate. Notice that the noise has 4x the bandwidth, but ¼ of it falls in the original passband
That demonstrates the most trivial form of oversampling. This trivial form of oversampling provides the equivalent of 1 bit, in-band, for every 4x the sampling frequency, i.e. 3db per doubling. Now we move on to more sophisticated forms of oversampling, with the noise spectrum shaped as well.
Noise Shaping + - H(s) Quantizer Output Bits Quantized Signal Error Signal This is the basic form of a noise-shaper. Im not going to do a full mathematical analysis for the sake of time. What H(s) does is shape the noise floor. This can be done with or without oversampling. Two examples will follow. The output bits of this system are PCM. ALL OVERSAMPLED SYSTEMS ARE PCM SYSTEMS AT THEIR HEART!!!!
Adding this H(s) introduces some Issues, of course. The values of H(s) must be carefully controlled in order to ensure stability. H(s) has storage in it, so quantization noise gets stored. This means that you have more noise than just the basic quantization noise. So, there is a penalty, especially if there is a lot of storage or memory in the noise shaper, as well as a gain. The shape of the noise is closely related to the inverse of H(s). I wont try to present a full analysis, thats for the hardware engineer and chip designer.
An example of noise shaping with no oversampling: NOTE: This is an example, nobody uses this particular H(z), and in fact Ive not even tested it for stability!!!! The point is simple, you CAN do noise shaping even with no oversampling, and some DACs do it, to attempt to match zero loudness curves. We can discuss the utility of that in Q/A.
Whats the point? Within limits, using a noise-shaping system, you can move the noise around in frequency. You can, for instance, push lots of the noise up to high frequencies. –That is one of the reasons for oversampling.
Whats another reason for Oversampling? You get to control the response of the initial anti- aliasing/anti-imaging filter digitally. –As most everyone knows by now, high-order analog filters have a variety of problems: They are hard to manufacture –That means expensive Their long-term performance is very hard to assure. –That means that they tend to annoy the customer Since they are IIR filters, they have startling phase problems near the transition and stop bands. –That annoys the customer, too
What sort of oversampling does the filter issue lead us to? 4x Oversampling, Digital FIR filter 5 th order Analog filter This is shown as an example
The results? The 13 th order analog filter (with horrible phase response) is replaced by a 5 th order analog filter. The first, sharp antialiasing filter is now a digital filter, with deterministic behavior and performance. All it takes is MIPS. –Nowdays, MIPS are cheap. –The filter is trustworthy. It wont drift, oscillate, distort, etc, if its designed and implemented properly. Its characteristics are exactly known.
Remember the Filters in the ear? Your ear is a time analyzer. It will analyze the filter taps, etc, AS THEY ARRIVE, it doesnt wait for the whole filter to arrive. If the filter has substantial energy that leads the main peak, this may be able to affect the auditory system. –In Codecs this is a known, classic problem, and one that is hard to solve. –In some older rate converters, the pre-echo was quite audible. The point? We have to consider how the ear will analyze any anti-aliasing filter. Two examples follow.
An example of a filter with passband ripple and barely enough stop band rejection.
An example of a good, longer filter, with less passband ripple.
An interesting result Trying to use the shortest possible filter (i.e. minimizing MIPS) results in a worse time response from the point of view of the auditory system. –Passband ripple means that there are tails on the filter.
Another interesting result Sharper filters have more ringing, and may have more auditory problems: –The main lobe of a filter cutting off in 2.05 kHz must necessarily have a wider main lobe than the narrowest (in time) cochlear filter. df * dt >=1. –The main lobe of a filter cutting off over 4kHz will have a main lobe a bit smaller than the narrowest cochlear filter. –This suggests that for higher sampling rates, we do not want the fastest filter, rather a filter with a wider transition band, and narrower time response.
Is this audible? Thats a good question. Since we are stuck, in general, with the filters our ADCs and DACs use, its dreadfully hard to actually run this listening test. How would I do that? –Get a DAC with a SLOW rolloff running at 4x (192K). –Make a DC to 20 K Gaussian pulse at 192kHz. –Downsample by zeroing 3 of every 4 samples and multiplying the others by 4. –Generate a third signal with a TIGHT filter. –Compare the three signals in a listening test.
Is this 4x oversampling what people do? Not generally. Thats what they did for a while, until MIPS got even cheaper. What they did was go to more oversampling, a LOT more oversampling? Yes. –Uses more digital, less analog –There are a whole variety of circuitry and linearity reasons, almost all of them point toward much more oversampling and less analog hardware.
Massive oversampling: Remember: One gets 3dB per doubling of Fs from oversampling with a flat noise floor. If we also put a single integrator with its zero at 20kHz into H(s), we will see that the increased SNR available is db/doubling of Fs. There will be some cost in the form of a constant negative term to this SNR, which is overcome by very moderate levels of oversampling. Each additional order of integration adds another 6dB/doubling of the sampling frequency. On the next page are some curves:
Noise shape Vs. Order, integrator pole At w= (Note: Curves as examples only. Real-world circuit considerations limit these curves)
Right. So what does that do for me, anyhow? Remember, nearly the same amount of noise is being shaped in each case. –As there is more space under the curve at high frequencies, more of the noise moves to high frequencies. –That means there is LESS noise at low frequencies. –Therefore, if we FILTER OUT the high frequencies, we wind up with a lower sampling rate signal with a higher SNR.
Some Examples (low order) Order 1Order 2 1x 2x 4x 8x 16x 32x 64x 128x Original SNR 0dB. Base before upsampling = 48kHz
SNR vs. order vs downsampling rate for that (ideal) system. Order 1 Order 2 Order 3 Order 4 Order 5 Order 6 1x0 (dB) x ,8 4x x x x x x
Real converters IC designers have found that having 4-bit flash converters (16 levels, 24dB SNR) inside a delta- sigma converter is often the cheapeast way to achieve the required results with present-day digital circuitry. –The sampling rate can be lower, so the circuitry and power run slower. –The filters can be shorter. –The flash converter takes some space, but less power and space than additional DSP circuitry.
Details about those examples All of the examples have n integrators with a knee at 20kHz. This is not necessarily the optimum solution, it is used for example The examples are theoretically calculated, there is no component or electrical error involved. –Nothing is ever this good in the real world. Are you surprised?
Auditory system characteristics Everything must be considered within the relevant cochlear filter bandwidth. 0dB SPL is slightly below atmospheric noise level. 120dB SPL is a good maximum, even that level is very dangerous for hearing. High frequency issues may be due to actual hearing, to filter time response issues, or both. Gradual filters are safer than steep filters.
Quantization and Sampling Antialiasing and antiimaging filters are not just a good idea, its a requirement. Dithering is not just a good idea, its a requirement. There are many ways to quantize and sample.
converter Technology converter technology exists to do proper, clean conversions that operate over the advisable part of the human hearing range, both in frequency and level. There is no basic mathematical difference in the result of SAR vs. a Delta-Sigma converter in terms of what it delivers to the PCM system, the differences are due to circuitry and cost issues. Capturing the direct delta-sigma waveform (single or multibit) can be done. One group of high-rate proponents sells such a system. The only things this results in, practically, are the removal of the sharp anti-aliasing filter and the retention of the high-frequency noise. –This also makes it hard to process or capture the signal –For instance, an IIR filter that does bass boost might take 128 bits of arithmetic width to impliement for a 1-bit data input. –Most electronics then require a filter to protect them from the HF noise. Tweeters and power amps in particular do not like this kind of input at all.
How to test converters Noise in the presence of low signal Noise in the presence of maximum signal Single tone source Broadband (i.e. something like the room correction noise) stimulii Multitones that test for aliasing. One could do another 2 hours on how to test converters. Is there a demand?