Frequency response adaptation in binaural hearing David Griesinger Cambridge MA USA www.DavidGriesinger.com
Introduction This paper poses fundamental questions about the properties of human hearing. –A new model for up/down and front/back localization is presented that takes into account observed data and time constants. –This model explains why recording and reproduction of binaural material has generally been unsuccessful. –This paper will discuss methods that allow most listeners to accurately reproduce binaural recordings without head tracking. We will go on to discuss: –Are the methods used for up/down and front/back localization different for different individuals? Can binaural recordings made for one individual be made to work for another individual without head tracking? –Given the extremely non-uniform transfer of sound pressure from the sound field to a human eardrum, how can we accurately perceive a frequency balance as flat? –Does frequency-balanced pink noise from a frontal loudspeaker sound balanced in frequency? –If not, are commercial recordings, which are equalized using loudspeakers, actually frequency balanced? –If not, in what ways are they biased?
Better binaural technology To answer these questions the author constructed an accurate physical model of his own hearing, including the ear canal and eardrum. The eardrum impedance is modeled with a resistance tube. The pinna compliance is modeled by cutting away the inside of the pinna casting. Tiny probe microphones with very soft tips were also built. These allow binaural recording of performances at the author's eardrums, correct headphone calibration, and verification of the accuracy of the dummy head model.
A perplexing discrepancy Recordings made with this technology provide excellent localization accuracy. –But, at least initially, the timbre of the playback through carefully calibrated headphones seems incorrect. –The frequencies around 3 kHz seem too strong, and the bass is usually weaker than my memory of the performance. Checking and re-checking the calibrations has convinced me that the recordings and the playback are correct. –It is my memory of the performance that is flawed. –The most reasonable explanation is that we continuously adapt to the frequency balance of the sounds around us. We remember the timbre after this adaptation has taken place.
A simple model of human hearing Over a long period of time the brain builds spectral maps of the features in HRTFs that define up/down and back/front information. When a sound is heard these features are compared to the maps, and a localization is found.
A simple model of human hearing-2 When a match has been found, the perceptible features of the particular HRTF are removed, again using a fixed spectral map. But this spectrum is altered by an adaptive equalizer, which acts to make all frequency bands perceived as equally loud. The time constant of this mechanism for the author is about 5 minutes; it may be shorter for some individuals. Given sufficient time, this equalizer can correct gross errors in timbre.
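The adaptive equalizer can be caricatured as a slow multiband AGC. The sketch below is a hypothetical illustration, not the author's model: the function and parameter names are invented, band levels are supplied frame by frame in dB, `tau_s` is the adaptation time constant (300 s, matching the roughly 5 minutes quoted above), and each band's gain is nudged toward the cross-band average so that every band is eventually heard at the same loudness.

```python
import math

def adapt_band_gains(band_levels_db, dt_s, tau_s=300.0, gains_db=None):
    """Hypothetical slow multiband AGC.

    band_levels_db: list of frames; each frame lists the level (dB) in every band.
    dt_s: time between frames in seconds.
    tau_s: adaptation time constant in seconds (assumed ~5 minutes).
    Returns the per-band gains (dB) after the last frame.
    """
    n_bands = len(band_levels_db[0])
    gains = list(gains_db) if gains_db is not None else [0.0] * n_bands
    # One-pole smoothing coefficient for a first-order time constant tau_s.
    alpha = 1.0 - math.exp(-dt_s / tau_s)
    for frame in band_levels_db:
        perceived = [lvl + g for lvl, g in zip(frame, gains)]
        mean = sum(perceived) / n_bands
        # Move each band's gain a small step toward equal loudness across bands.
        gains = [g + alpha * (mean - p) for g, p in zip(gains, perceived)]
    return gains
```

With a constant input in which one band is 10 dB hotter than its neighbours (as the 3 kHz region is at the eardrum), the remaining imbalance decays as exp(-t/tau_s): after 300 s about 37% of it is left, and after several time constants the bands are perceived as balanced.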
An example The features of this model can be verified through the time constants for the various sections. –The match to a particular direction is very fast. –The correction for the HRTF timbre takes longer – a fraction of a second. The author once noticed a gliding whistle while walking under an overhead ventilator slot that emitted broadband noise. –Walking rapidly (~3.5mph) under that noise source produced a gliding whistle, somewhat like a Doppler shift. –This is the uncorrected sound of the vertical HRTFs –In spite of the lack of timbre correction the sound was correctly localized – even at much higher speeds. No timbre shift was perceived when walking slowly under the slot (<2mph). –When there is sufficient time our brains correct the timbre – but this correction takes time.
Headphone listening When we listen to binaural recordings with headphones, the whole process breaks down. Headphones match individuals very poorly (as we will see). None of the spectral features match the fixed HRTF maps. The brain is confused, and the subject perceives the sound inside the head. But the adaptive equalizer is still active, and after a while the sound is perceived as frequency balanced.
Consequences of adaptation for sound engineers Tonmeisters talk about being familiar with a particular loudspeaker or studio. –They claim they can make an accurately balanced recording with these tools. A logical conclusion is that the timbre of loudspeakers or playback equipment is irrelevant. –As long as you are familiar with it, everything is fine. But the conclusion is clearly false. –A recent book by Floyd Toole details the changes in the frequency content of popular records as fashions in monitor loudspeakers changed. –All sound reinforcement engineers know how much intelligibility can increase when a sound system is equalized. This typically involves a treble boost above 1000 Hz. –Absolute frequency balance matters.
Upward Masking Sound enters the cochlea at the oval window. High frequencies excite the basilar membrane near the entrance, passing through it and exiting through the second (round) window below. Low frequencies travel further along the spiral until they excite the membrane and pass through. Strong low frequencies disturb the high-frequency portion of the membrane, causing the well-known phenomenon of upward masking. Upward masking is a purely mechanical effect, and it cannot be compensated by adaptive equalization: the high frequencies are simply not detected. Intelligibility is frequently low in acoustic spaces because there is little low-frequency absorption, and the LF acoustic power is boosted. We adapt to the frequency imbalance and say the sound is OK, but unintelligible.
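The asymmetry of masking can be illustrated with a classic textbook approximation. The sketch below is not taken from this paper: it uses the Zwicker and Terhardt Bark-scale formula and the Schroeder, Atal and Hall (1979) spreading function, whose upper slope (about 10 dB per Bark) is much shallower than its lower slope (about 25 dB per Bark). The function names are hypothetical, the threshold ignores the usual masker-to-threshold offset, and real upward spread grows with masker level, which this level-independent function does not capture.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker and Terhardt approximation of the Bark (critical-band) scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def schroeder_spreading_db(dz):
    """Schroeder/Atal/Hall spreading function in dB.

    dz = Bark(maskee) - Bark(masker); positive dz means the maskee lies ABOVE
    the masker in frequency, where the function falls off more slowly.
    """
    x = dz + 0.474
    return 15.81 + 7.5 * x - 17.5 * math.sqrt(1.0 + x * x)

def masking_threshold_db(masker_hz, masker_db, probe_hz):
    """Rough level (dB) a probe must exceed to be heard over the masker."""
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    return masker_db + schroeder_spreading_db(dz)

# A 500 Hz masker raises the threshold at 1 kHz far more than a 1 kHz masker
# raises the threshold at 500 Hz: masking spreads upward.
up = masking_threshold_db(500.0, 80.0, 1000.0)
down = masking_threshold_db(1000.0, 80.0, 500.0)
```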
Upward masking and mixing A consequence of upward masking is that elements in a mix that are audible in one studio or over one set of loudspeakers may be masked in another. Recordings mixed over headphones can be seriously in error. –Most headphones boost the treble, raising the apparent clarity. –As an engineer I learned early to mistrust the balance between direct sound and reverberation over headphones. –The best I could do was make the recording much drier than I, or my clients, preferred, and hope for the best. One can always make the recording more reverberant; making it drier is much more difficult! Can we find a way to correct headphone errors?
Accurate binaural recordings If safe, comfortable probe microphones are available, it is possible to make accurate binaural recordings. First we measure the headphone response at the eardrum: response $H$. We can then record with the same probe microphones. If we equalize the recording with the inverse of $H$, $H^{-1}$, the recording will play back with perfect fidelity. (Note that measuring the headphone at the eardrum eliminates the second instance of the ear canal.)
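The cancellation can be checked bin by bin with complex transfer functions. In this hypothetical sketch (all names are illustrative), `source` is the sound field, `ear` the listener's transfer from sound field to eardrum, `probe` the probe microphone, and `headphone` the headphone-to-eardrum response $H$. The probe appears both in the recording and in the measurement of $H$, so it drops out, and the eardrum receives `source * ear`, just as it did at the performance.

```python
def headphone_playback(source, ear, probe, headphone):
    """Eardrum pressure in one frequency bin when an eardrum recording is
    equalized by the inverse of the probe-measured headphone response."""
    recording = source * ear * probe      # probe microphone at the eardrum
    measured_h = headphone * probe        # H, measured through the same probe
    equalized = recording / measured_h    # apply the inverse of the measured H
    return equalized * headphone          # headphone drives the eardrum
```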
Playback of binaural over speakers If we want to play back the binaural recording over speakers, or if we want to play loudspeaker music over headphones, we need to measure the spectrum of a carefully equalized loudspeaker at the eardrums of the listener. This is the spectrum $S$. We then equalize the binaural recording with $S^{-1}$, and we can play it over speakers. Equalizing the phones with $H^{-1}S$ allows playback of both binaural and loudspeaker-mixed music. $H^{-1}S$ is the inverse of the free-field earphone response.
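The same bin-by-bin bookkeeping covers loudspeaker playback. This is a sketch under two stated assumptions: the playback listener is the person whose ears, $H$ and $S$ were measured, and the loudspeaker is frontal. A recording equalized by the inverse of $S$ regains the ear's frontal response when it passes through that ear again, and the headphone filter $S/H$ (that is, $H^{-1}S$, the inverse of the free-field earphone response $H/S$) makes the headphones deliver what the loudspeaker would. The function names are hypothetical.

```python
def speaker_ready(recording, measured_s):
    """Equalize an eardrum recording by the inverse of S (S measured via the probe)."""
    return recording / measured_s

def headphone_eq(measured_h, measured_s):
    """Headphone filter S/H: the inverse of the free-field headphone response H/S."""
    return measured_s / measured_h

source, ear, probe = 0.9 + 0.2j, 1.5 - 0.7j, 0.6 + 0.1j
s, h = 2.0 + 0.5j, 0.7 - 0.3j           # loudspeaker and headphone at the eardrum
rec = source * ear * probe              # eardrum recording through the probe
ms, mh = s * probe, h * probe           # S and H as measured through the probe
over_speaker = speaker_ready(rec, ms) * s                        # recording via loudspeaker
over_phones = speaker_ready(rec, ms) * headphone_eq(mh, ms) * h  # same recording via headphones
```

Both paths reproduce `source * ear` at the eardrum, and loudspeaker-mixed music `m` played through the filtered headphones arrives as `m * s`, exactly as if it came from the loudspeaker.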
Binaural equalization in practice Note that the two previous slides made no attempt to equalize the probe microphone(s). –With those schemes, the response of the probe cancels in the final result. In practice, the probe response is complicated and difficult to invert. –The author carefully measures the impulse response of the probes, using a B&K 4133 as a reference. –The responses are inverted in the frequency domain with Matlab. With care, little pre-echo is produced. All measurements with the probes are first convolved with this inverse function. –Second-order parametric filters are combined to produce the other equalization filters. –Parametric filters can be easily inverted, and to the author they sound better than mathematical inverse filters.
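One common way to build a second-order parametric section is the peaking filter from Robert Bristow-Johnson's Audio EQ Cookbook. The paper does not say which design the author used, so treat this choice as an assumption. With this design, negating the gain swaps the numerator and denominator coefficients, which is why such a section can be inverted exactly and trivially.

```python
import cmath
import math

def peaking_biquad(f0_hz, gain_db, q, fs_hz):
    """Second-order peaking EQ (Audio EQ Cookbook form); returns (b, a) with a[0] == 1."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * A, -2.0 * math.cos(w0), 1.0 - alpha * A]
    a = [1.0 + alpha / A, -2.0 * math.cos(w0), 1.0 - alpha / A]
    return [x / a[0] for x in b], [x / a[0] for x in a]

def response(b, a, f_hz, fs_hz):
    """Complex frequency response of the biquad at f_hz."""
    z1 = cmath.exp(-2j * math.pi * f_hz / fs_hz)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return num / den

fs = 48000.0
boost = peaking_biquad(3000.0, 6.0, 1.4, fs)    # +6 dB at 3 kHz
cut = peaking_biquad(3000.0, -6.0, 1.4, fs)     # its exact inverse
```

Cascading `boost` and `cut` gives unity gain at every frequency, not just at the centre frequency.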
Probe Equalization This graph shows the frequency response and time response of the digital inverse of the two probes as measured against a B&K 4133 microphone. Matlab is used to construct the precise digital inverse of the probe response, both in frequency and in time. The resulting probe response is flat from ~25Hz to 17kHz. In general, I prefer NOT to use a mathematical inverse response, as these frequently contain audible artifacts. I minimized these artifacts here by carefully truncating the measured response as a function of frequency.
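For comparison, a widely used alternative to truncating the response as a function of frequency is a regularized frequency-domain inverse (Kirkeby-style), combined with a modeling delay. This is a swapped-in technique, not the author's Matlab procedure. The regularization constant `beta` limits the boost where the probe response is weak, trading accuracy for fewer ringing artifacts, and the delay shifts the acausal part of the inverse into the causal range instead of letting it wrap to the end of the buffer. `beta`, `delay` and the transform length are hypothetical choices, and a plain DFT is used only to keep the sketch dependency-free.

```python
import cmath
import math

def dft(x, n):
    """Plain O(n^2) DFT of x, zero-padded to length n."""
    x = list(x) + [0.0] * (n - len(x))
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(spectrum):
    """Inverse DFT keeping the real part (input assumed conjugate-symmetric)."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]

def regularized_inverse(h, n=256, beta=1e-3, delay=64):
    """Inverse filter g such that h convolved with g is close to a unit
    impulse at sample `delay`."""
    H = dft(h, n)
    G = []
    for k, hk in enumerate(H):
        shift = cmath.exp(-2j * math.pi * k * delay / n)  # modeling delay
        G.append(shift * hk.conjugate() / (abs(hk) ** 2 + beta))
    return idft_real(G)

h = [1.0, 0.5, 0.25]          # toy probe impulse response, not a measurement
g = regularized_inverse(h)
# Linear convolution of the probe with its inverse, truncated to len(g).
y = [sum(h[j] * g[t - j] for j in range(len(h)) if t - j >= 0) for t in range(len(g))]
```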
Adaptive Timbre: how do we perceive pink noise as flat? Pink noise sounds plausibly pink even on this sound system. Let's add a single reflection and listen for a few minutes without other sounds: –At first the result sounds colored, with an identifiable pitch component. –The pitch component gradually becomes less loud. But now play the unaltered noise again. –The unaltered noise now has a pitch, complementary to the pitch from the reflection.
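A single reflection turns the flat spectrum into a comb filter, which is where the pitch comes from. In this sketch the delay and gain are arbitrary illustration values, not those used in the demo: direct sound plus one reflection of gain g and delay τ has the response 1 + g·exp(-j2πfτ), with peaks at multiples of 1/τ and dips halfway between, and the perceived pitch of such a comb is commonly associated with 1/τ. The function names are hypothetical.

```python
import cmath
import math

def reflection_gain_db(f_hz, delay_s, gain):
    """Magnitude (dB) of direct sound plus one reflection at frequency f_hz."""
    h = 1.0 + gain * cmath.exp(-2j * math.pi * f_hz * delay_s)
    return 20.0 * math.log10(abs(h))

def comb_peaks_hz(delay_s, f_max_hz):
    """Frequencies of the comb's maxima (multiples of 1/delay) up to f_max_hz."""
    step = 1.0 / delay_s
    return [k * step for k in range(1, int(f_max_hz // step) + 1)]

# A 2 ms reflection at 0.7 of the direct level: peaks every 500 Hz,
# about +4.6 dB at the peaks and about -10.5 dB at the dips.
peaks = comb_peaks_hz(0.002, 2000.0)
```

The adaptive equalizer described earlier would slowly flatten these peaks, and the complementary coloration heard afterwards in the unaltered noise is what such a correction predicts.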
Some demos of eardrum recordings These recordings have been equalized for loudspeaker reproduction. You may be able to judge clarity and intelligibility over near-field loudspeakers. –Accurate headphone reproduction requires headphone equalization. –If probes are available, the method described here will work. –A method that uses equal loudness curves will be described later in this paper. Opera, balcony 2, seat 11 –Moderate intelligibility, reverberant sound. Opera, balcony 3, seat 12 –Poor intelligibility, very reverberant. Opera, standing room –Deep under balcony 2, good intelligibility. A concert hall, row 8 (quite close) –Very good sound; not so good further back.
The need for eardrum measurements Almost all current binaural research uses HRTFs and headphone responses measured with a blocked or partially blocked ear canal. –There is an assumption (without proof) that such measurements accurately reproduce the sound pressure at the eardrum. –Dummy head models such as KEMAR or the B&K HATS make the additional assumption that a standard microphone coupler duplicates the impedance of an ear canal. These assumptions are false. To quote Hammershøi and Møller: –"The most immediate observation is that the variation [in sound transmission from the entrance of the ear canal to the eardrum] from subject to subject is rather high…The presence of individual differences has the consequence that for a certain frequency the transmission differs as much as 20dB between subjects." –20 dB is a significant difference in response! In spite of the data, Hammershøi and Møller recommend using measurements at the entrance to the ear canal! –The recommendation can be disproved by a single subject…
HRTFs from blocked ear canals Here are pictures of a partially blocked canal and a fully blocked canal. The following data applies to the fully blocked measurements, but the partially blocked measurements are similar.
Blocked measurements vs eardrum To compare the two measurement methods, I equalize the blocked measurement of a single HRTF to the same HRTF measured at the eardrum. I chose the HRTF at azimuth 15 degrees left, and 0 degrees elevation. The needed equalization requires at least 3 parametric sections. –Red is the right ear, blue is the left ear
HRTF differences blocked to eardrum Twenty different HRTFs were measured with a blocked canal and equalized by the above EQ, and the differences between them and the open-ear-canal measurements are plotted. These data support Hammershøi and Møller's contention that the directional properties of the measured HRTFs are preserved by the blocked measurement, at least up to a frequency of ~7 kHz. Note that the vertical scale is ±30 dB. The errors at 7–10 kHz are significant.
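The bookkeeping behind these difference plots can be sketched as follows. This is a hypothetical illustration with invented function names: magnitudes are in dB on a shared frequency grid, and the correction is taken as the exact dB difference at the reference direction (azimuth 15° left, elevation 0°), whereas the author approximated it with parametric sections. The error for every other direction is whatever remains after that single correction.

```python
def reference_eq_db(eardrum_ref_db, blocked_ref_db):
    """Correction (dB) mapping the blocked-canal reference HRTF onto the eardrum one."""
    return [e - b for e, b in zip(eardrum_ref_db, blocked_ref_db)]

def blocked_error_db(eardrum_db, blocked_db, eq_db):
    """Residual error (dB) of one blocked-canal HRTF after the common correction."""
    return [e - (b + q) for e, b, q in zip(eardrum_db, blocked_db, eq_db)]

def worst_error_db(freqs_hz, error_db, f_lo_hz, f_hi_hz):
    """Largest absolute error within a frequency range, e.g. 7-10 kHz."""
    return max(abs(err) for f, err in zip(freqs_hz, error_db)
               if f_lo_hz <= f <= f_hi_hz)

# Toy numbers for illustration only; they are not measurements.
freqs = [1000.0, 4000.0, 8000.0]
eq = reference_eq_db([0.0, 15.0, 5.0], [-2.0, 10.0, 1.0])
err = blocked_error_db([3.0, 12.0, 9.0], [1.0, 7.0, -3.0], eq)
```

By construction the reference direction has zero error, so any residual in the other nineteen directions measures how far the blocked measurement departs from the eardrum pressure.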
Headphone response differences Using the same method, I measured three headphones. Blue is the AKG 701, red is the AKG 240, and cyan is the Sennheiser 250. The curves plot the difference between the blocked and unblocked measurements, with the measured HRTF at azimuth 15°, elevation 0° as a reference. The vertical scale is ±30 dB. Errors of at least 10 dB exist in the midband.
More headphones Blue: an old but excellent noise-protection earphone by Sharp. Red: iPod earbuds. The errors in the blocked measurements are large enough to prevent accurate localization of binaural recordings.
Analysis The previous curves are NOT the frequency responses of the headphones under test. They show the ERRORS that occur when a blocked ear canal measurement is used instead of the eardrum pressure. Because the scale of the plots is ±30 dB, the difference curves look better than they really are. Errors of 10 dB in frequency ranges vital for timbre are present in almost all the examples shown. We can conclude that it is possible to use recordings from dummy heads that lack accurate ear canals IF AND ONLY IF it is possible to equalize them, either by comparison to a reference with ear canals or by equalizing them to approximately flat for a frontal sound source. If this is done, we must also equalize the headphones at the eardrum for the same source. We can conclude with more assurance that it is NOT possible to equalize headphones with a measurement system that does NOT include an accurate ear canal model. –Neither KEMAR nor the B&K HATS qualifies. Measurement systems with true ear canals are a very good thing. –In addition, I have found that for many earphones it is vital to have a pinna model with the same compliance as a human ear. –On-ear headphones in particular alter the concha volume, and drastic changes in the frequency response can result if the compliance is not accurate. –Pinnae are complex structures with variable compliance, so this is tricky!
Headphone calibration through equal loudness contours There is a non-invasive method of calibrating headphones to an individual. IEC publication 268-7 and German standard DIN 45-619 recommend loudness comparison using 1/3-octave noise instead of physical measurement for headphones. These recommendations were superseded by diffuse-field measurements as suggested by Theile. Should these methods be revived? I believe the answer is yes.
Equal Loudness Top: ISO equal loudness curves for 80 dB and 60 dB SPL. These are averages over many individuals, so features in them are broadened. Bottom (blue/red): averaged frontal response over a ±5° cone in front of the author, measured at the eardrums. The loudspeaker was equalized to 200 Hz. Bottom (black/cyan): the same measurement for the author's dummy head with no equalization. The difference in eardrum impedance above 8 kHz boosts the response of the dummy, but this can be removed by equalization.
Equal Loudness 2 We can measure equal loudness curves because the ear does not adapt when the stimulus is narrow band, either noise or tone. The differences between the top and bottom curves in the previous slide can be attributed to the properties of the middle ear and the inner ear. Thus equal loudness curves are a method of measuring the effective frequency response of an individual's hearing system in the absence of short-term adaptation to the environment. They represent our sensitivity to timbre in a quiet environment, or before adaptation takes place. Their extreme lack of flatness is proof of the existence, and effectiveness, of adaptation.
Loudness matching experiments The author wrote a Windows program that presents a subject with alternating bands of 1/3-octave noise, one at 500 Hz and the other at a test frequency. –The subject matches the loudness of the two bands by adjusting the test band up and down. –In use, the equal loudness curves from 500 Hz to 12 kHz for a carefully equalized frontal loudspeaker are first obtained for the subject. –The subject then repeats the experiment with a pair of headphones over a frequency range of 30 Hz to 12 kHz. In this case the balance between the two ears is also tested and corrected. –The difference between the loudspeaker and headphone measurements becomes the ideal headphone correction for this individual. This program can be used to test the variation in response of a particular headphone over a wide range of individuals. –Subjects report that the resulting equalization is very pleasant, and binaural recordings made with the author's ears reproduce well without head tracking. –Music recorded for loudspeakers is judged identical in timbre over the headphones and the loudspeaker. –The equalized headphones are also identical in timbre to a large high-quality stereo sound system.
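The arithmetic of the correction is simple. The sketch below is hypothetical and is not the author's Windows program: each run is a mapping from 1/3-octave band centre to the level, in dB relative to the 500 Hz reference band, at which the subject judged the test band as loud as the reference. If a band needed g_spk dB with the loudspeaker and g_hp dB with the headphone, the headphone is boosted by g_hp - g_spk; in linear terms this is the free-field correction S/H, because the listener's own ear sensitivity cancels between the two runs. Band centres are spaced by 2^(1/3) from the 500 Hz reference, which is an assumption (standards use base-10 nominal values), and each ear would be processed separately.

```python
import math

def third_octave_centres(f_lo_hz, f_hi_hz, ref_hz=500.0):
    """Band centres spaced 2**(1/3) apart, anchored at the reference band."""
    k_lo = math.ceil(3.0 * math.log2(f_lo_hz / ref_hz) - 1e-9)
    k_hi = math.floor(3.0 * math.log2(f_hi_hz / ref_hz) + 1e-9)
    return [ref_hz * 2.0 ** (k / 3.0) for k in range(k_lo, k_hi + 1)]

def headphone_correction_db(speaker_match_db, headphone_match_db):
    """Per-band headphone boost (dB): headphone match minus loudspeaker match,
    for the bands present in both runs."""
    common = sorted(set(speaker_match_db) & set(headphone_match_db))
    return {f: headphone_match_db[f] - speaker_match_db[f] for f in common}

# Toy matches for one ear; illustrative numbers, not data from the study.
spk = {500.0: 0.0, 3150.0: -8.0, 8000.0: 2.0}
hp = {500.0: 0.0, 3150.0: -3.0, 8000.0: 6.0, 12000.0: 1.0}
corr = headphone_correction_db(spk, hp)
```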
Results for ~10 individuals About 10 students from Helsinki University participated in the test. The top-left graph shows the equal loudness contours from the loudspeaker for each subject. The other curves show the difference between this curve and the equal loudness curves for four different headphones. It was hoped that the Stax 303 phones would show less individual variation. This was not the case. (Blue = left ear, red = right ear, cyan = author's left ear.) The Philips phones were an insert type. These also showed large variation among individuals.
The dip at 3kHz for all subjects All subjects show a dip in the loudspeaker equal loudness curve at 3kHz. This corresponds to a universal peak in the response of the concha and ear canal at this frequency. It is this ear sensitivity peak that causes the most trouble with our memory of timbre. When we first play an accurately calibrated binaural recording – particularly of a speaking voice or a chorus – this peak in the loudness is highly noticeable and unpleasant. –Once we adapt, everything is OK again.
Comments on these results The experiment is equivalent to equalizing headphones for a frontal, free-field response. –This is at variance with the current standard for diffuse-field equalization. –In the author's experience, free-field equalization is far more useful than diffuse-field equalization, and gives better results on loudspeaker-recorded music. –Loudspeaker-equalized recordings are intended to be heard in a room where the direct sound is frontal and dominant. After the experiment, the subjects were given the opportunity to listen to music both with the frontal equalization and with their own equal loudness equalization (the speaker curves were not subtracted). –The author's binaural recordings were perceived with better localization with the free-field equalization. (These recordings were equalized for free-field reproduction.) –Many subjects preferred their own equal loudness equalization for other material. With this equalization, a recording that has an accurately flat frequency response requires no adaptation. The sound can be quite seductive.
Some Speculation Equal loudness curves have two prominent features: the increase in sensitivity around 3 kHz, and the decrease in sensitivity at low frequencies. Music that has been recorded with frequency-linear microphones and not post-processed often seems lacking in bass and harsh in the midrange, both on loudspeakers and on eardrum-equalized headphones. The author speculates that an unconscious collusion between loudspeaker designers and recording engineers routinely boosts the bass and tweaks the 3 kHz region on commonly available recordings. –It is common to boost the bass 10 dB at 60 Hz in automobiles. Floyd Toole's finding that the loudspeakers closest to frequency-linear are preferred in blind listening tests may be biased by the choice of recordings used in the tests. –The spectrum of choral music in the author's unprocessed recordings shows a ~3 dB peak around 3 kHz. –This peak is generally absent in vocalists on pop music. Perhaps they use a different singing technique, and perhaps the equalization has been adjusted closer to an equal loudness curve.
Conclusions Experiments and observation suggest that human hearing uses a combination of fixed spectral maps to perceive the localization of a sound, and then corrects the HRTF timbre with a similar map. –The time constant for the directional match is milliseconds; the correction of timbre takes a fraction of a second. These fixed maps are combined with a multiband AGC system that tends to equalize loudness across frequency bands. –This process takes time, typically minutes. This hearing model explains why accurate reproduction of timbre at the eardrum is an essential part of binaural recording and playback. –When timbre is reproduced correctly, head tracking is unnecessary. The existence of equal loudness curves shows that for narrow-band signals, long-time-constant adaptation of frequency response does not take place. When a broadband signal is first heard, a listener will perceive a timbre that reflects their individual equal loudness curve. But this timbre is replaced in a short time with a more balanced timbre, and it is this balanced timbre that is remembered. –This process explains why concert halls of widely different quality can provide a satisfying experience, and why some outrageous loudspeakers get good reviews. It is likely that, given the opportunity to equalize a recording to their own taste using loudspeakers with a flat frequency response, recording engineers will be tempted to move toward their own equal loudness curve. –One's own equal loudness curve often sounds smoother and more natural than realistic reproduction. –The temptation is dangerous, but probably harmless. Individual loudness curves can be rather different, particularly at low frequencies. But adaptation will continue to work when the recording is played back, and if the response does not match that of the listener, they will soon stop noticing the difference.