The Physics and Psycho-Acoustics of Surround Recording Part 2

David Griesinger, Lexicon
dgriesinger@lexicon.com
www.world.std.com/~griesngr

Introduction
We all know how to make a good recording. We need good music, a very good performance, and a satisfactory balance between the soloists and the instruments. But we want to make a great recording. How do we do it? How do we know when a recording is great? We must learn to hear the technical quality of a great recording, and learn how to achieve the best result. The talk is based on classical music, but the techniques and perceptions apply to all recordings.

The recording space is very important!
It is much easier to achieve a great result in a large hall, but large halls with great acoustics are rare. Our job is to make a great result in the hall we have available (usually a small one). This talk will show you how to do it, and help you hear the difference. We will not talk about issues such as instrumental balance or the differences between microphones or sample rates. We will talk about basic sound properties: the clarity and localization of the direct sound; the perceived distance between the sound source and the listener (depth); and the recording and reproduction of the sound of the hall.

Major Goals
To review the physical and psychoacoustic properties that make a great recording (or a great performance space): the clarity of the direct sound (the absence of muddiness); the creation of a large listening area and a stable front image, using three front speakers in a 5.1 recording; the blending of the different instruments into a whole acoustic scene through early reflections; and the re-creation of the acoustic space of the performance through late reflections and envelopment. To show how muddiness occurs when there are too many early reflections. To show how we perceive muddiness through our perception of pitch. To show how the loudspeaker positions in the playback room influence envelopment at low frequencies. To play as many musical examples as possible!

Localization: a stable front image over a large listening area
In a high-quality recording the front image does not change greatly when a listener moves away from the sweet spot. Image stability requires using the center channel speaker in a 5.1 recording. Even without the center speaker, some two-channel recordings are more stable than others, and popular music recordings are often better than classical recordings in image stability. The secret is amplitude panning, which is almost universally used in popular music recording.
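The amplitude panning described above can be sketched as follows. This minimal sketch assumes the common sine/cosine (constant-power) pan law, because the talk does not say which law is used. The function name `constant_power_pan` is hypothetical. Only the relative level of the two loudspeaker feeds changes, and both feeds carry the signal at the same time.

```python
import math


def constant_power_pan(position):
    """Return (left_gain, right_gain) for a pan position in [-1, 1].

    -1 is hard left, 0 is centre, +1 is hard right. The sine/cosine law keeps
    left_gain**2 + right_gain**2 == 1, so total power is constant as the
    source moves. No time delay is applied: only the levels differ.
    """
    if not -1.0 <= position <= 1.0:
        raise ValueError("position must be in [-1, 1]")
    angle = (position + 1.0) * math.pi / 4.0  # map [-1, 1] onto [0, pi/2]
    return math.cos(angle), math.sin(angle)


for pos in (-1.0, -0.5, 0.0, 0.5, 1.0):
    left, right = constant_power_pan(pos)
    print(f"position {pos:+.1f}: left {left:.3f}, right {right:.3f}")
```

With three front speakers, one common approach (an assumption here, not something the talk specifies) is to apply the same law between the adjacent pair, L-C or C-R, that brackets the desired position.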

Time delay panning
Many engineers attempt to record a broad sound source with closely spaced microphones. Omni microphones are often used in a so-called “Decca Tree”, and cardioid microphones are often used in the “ORTF” configuration. Both techniques rely on time delay differences to spread the front image. Time delay spreading only works when the listener is in the sweet spot, so the front image is not stable over a large area.

Training to hear localization
The importance of ignoring the sweet spot: most research tests of localization use a single listener who is strictly restricted to the sweet spot. Your customers will not listen this way! How do you know if a recording has a stable front image? Move laterally in front of the loudspeakers. Does the sound image stay wide and fixed to the loudspeakers, or does it follow you? Do the soloists in the center follow you left or right? If they do, they are recorded with too much phantom center. Since most five-channel recording methods are derived from stereo techniques, almost all have too much phantom center. A center image that follows a listener who moves laterally out of the sweet spot is the most common failing of even the best five-channel recordings. (Play examples.)

Example: time delay panning outside the sweet spot
Record the orchestra with a “Decca Tree”: three omni microphones separated by one meter. A source on the left will be picked up at nearly equal level in all three microphones, but the arrival times will differ by up to about ±3 ms (one meter at 343 m/s is about 2.9 ms). On playback, a listener on the far right will hear this instrument coming from the right loudspeaker, because the sound from that nearer loudspeaker reaches the listener first. This listener will hear every instrument coming from the right.

Amplitude panning outside the sweet spot
If you record with three widely spaced microphones, an instrument on the left will have a high amplitude in the left microphone, and its sound will also reach the left microphone well before it reaches the others. A listener on the far right will still hear the instrument on the left. Now the orchestra spreads out across the entire loudspeaker basis, even when the listener is not in the sweet spot.
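The sketch below compares the time-delay and amplitude cases. It computes the arrival delay and level of one instrument at each microphone for a 1 m Decca tree and for a widely spaced array. The geometry is hypothetical. Levels follow the free-field inverse-distance law, and the 343 m/s speed of sound is assumed. Room reflections are ignored.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 C)


def arrivals(source, mics):
    """Return (delay_ms, level_db) per microphone, relative to the nearest one.

    Positions are (x, y) in metres. Level uses the free-field 1/r law
    (-6 dB per doubling of distance); room reflections are ignored.
    """
    dists = [math.dist(source, m) for m in mics]
    nearest = min(dists)
    return [((d - nearest) / SPEED_OF_SOUND * 1000.0, 20.0 * math.log10(nearest / d))
            for d in dists]


# Hypothetical geometry: an instrument 6 m to the left and 6 m in front of the array.
source = (-6.0, 6.0)
decca_tree = [(-1.0, 0.0), (0.0, 0.0), (1.0, 0.0)]   # 1 m spacing
wide_spaced = [(-6.0, 0.0), (0.0, 0.0), (6.0, 0.0)]  # 6 m spacing

for name, mics in (("Decca tree", decca_tree), ("wide spaced", wide_spaced)):
    for label, (delay, level) in zip(("L", "C", "R"), arrivals(source, mics)):
        print(f"{name:12s} {label}: {delay:5.2f} ms, {level:6.2f} dB")
```

In this geometry the Decca tree's three levels differ by less than 1.5 dB, so the image depends on delays of a few milliseconds. In the widely spaced array the far microphone is about 7 dB down, so the level differences carry the image.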

WARNING!!!
In the author’s experience, a front image that is not stable when you walk in front of the speakers will never make a great recording, regardless of how beautiful it is in the sweet spot. This is my FIRST test of a recording, either two-channel or surround.

Summary of acoustic perceptions in a recording
1. Clarity (the lack of muddiness). Clarity is perceived through the direct sound: sound that travels directly from the instrument to the microphone. A clear direct sound requires that the microphone be relatively close to the instrument!
2. Blend and depth. Blend and depth are perceived through early reflections that arrive from all around the listener. The total energy in these early reflections must be less than the energy in the direct sound! In a surround recording these reflections should come equally from all the loudspeakers (except the center), and they must be decorrelated (different in each loudspeaker).
3. Envelopment (reverberation). Envelopment is perceived through late reflected energy that arrives from all around the listener (not just from the rear!). The energy must be decorrelated in each loudspeaker.

Clarity
To an acoustician, clarity is determined through intelligibility: the ability to understand speech or a musical line. For this talk I will use a different meaning: for me, clarity is the perception that the sound source is acoustically close to the listener. While this definition may seem vague, almost everyone agrees on the optimal acoustic distance for a recorded sound source. We can demonstrate this perception:

Muddiness: dry speech + 40ms reflections
Mono speech: the sound is clear, but much too close to the loudspeaker.
Speech with ~40ms allpass reflections and no direct sound (mono and stereo versions).
Note that both the mono and the stereo versions sound muddy and distant. There is no phantom image in the stereo version.

Reflections used in these experiments
The reflections used in these experiments form a decaying burst that peaks about 25ms after the direct sound and has largely decayed away by 50ms. The reflections are different in the two channels and have a flat frequency response.
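Below is a rough sketch of a two-channel burst like the one described. It uses Hann-windowed Gaussian white noise over 50 ms, which puts the envelope peak at 25 ms. This is a swapped-in approximation: the talk's reflections have a flat frequency response, while a short noise burst is only flat on average. A different random seed per channel gives different, decorrelated left/right patterns. The function names and the 48 kHz rate are hypothetical.

```python
import math
import random

FS = 48000  # sample rate in Hz (assumed)


def reflection_burst(seed, length_ms=50.0, fs=FS):
    """Hann-windowed white noise: zero at 0 ms, envelope peak at length_ms / 2."""
    rng = random.Random(seed)
    n = int(fs * length_ms / 1000.0)
    return [rng.gauss(0.0, 1.0) * math.sin(math.pi * i / n) ** 2 for i in range(n)]


def correlation(a, b):
    """Zero-lag normalised cross-correlation of two equal-length sequences."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den


left = reflection_burst(seed=1)
right = reflection_burst(seed=2)  # different seed -> different, decorrelated pattern
print(f"{len(left)} samples per channel, L/R correlation {correlation(left, right):+.3f}")
```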

Optimum level for early reflections
Recorded sound consists of a mix of direct sound and reflections. With too many reflections, muddiness results, but reflections also add a sense of blend and depth, so an optimum mix must be found. The optimum level for early reflections is -4 to -6dB relative to the direct sound, and this level is preferred by almost every listener. In a surround recording the reflections should come equally from all directions (except the center) and be decorrelated. The perceived result is independent of the precise delay time and pattern of the reflections: it is the total energy that determines the perception.
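Since the total energy is what matters, a reflection pattern can be brought to a target level by normalising its summed energy. This minimal sketch assumes the direct sound is a unit impulse (energy 1) and that the reflections are given as sample values. The function names and the two example patterns are hypothetical.

```python
import math


def energy_db(samples, reference_energy=1.0):
    """Total energy of a sequence in dB relative to reference_energy."""
    return 10.0 * math.log10(sum(x * x for x in samples) / reference_energy)


def scale_to_level(reflections, target_db, direct_energy=1.0):
    """Scale a reflection pattern so its TOTAL energy is target_db re the direct sound."""
    current = sum(x * x for x in reflections)
    target = direct_energy * 10.0 ** (target_db / 10.0)
    gain = math.sqrt(target / current)
    return [gain * x for x in reflections]


# Two hypothetical reflection patterns with very different timing.
sparse = [0.0] * 20 + [0.9] + [0.0] * 30 + [0.5]
dense = [0.3 if i % 3 == 0 else -0.2 for i in range(60)]

for name, pattern in (("sparse", sparse), ("dense", dense)):
    scaled = scale_to_level(pattern, -5.0)  # direct sound = unit impulse, energy 1
    print(f"{name}: {energy_db(scaled):.2f} dB re direct")
```

Both patterns end up at -5 dB. By the claim above, they should then give a similar sense of depth even though their timing differs.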

Depth without muddiness
Dry speech: note that the sound is uncomfortably close.
Mix of dry speech with early reflections at -5dB: the mix has distance (depth) and is not muddy. Note that there is no apparent reverberation, just depth.
The same, but with the reflections delayed 20ms at -5dB: with the additional delay the reflections begin to be heard as discrete echoes, but the apparent distance remains the same.
The same, but with the reflections delayed 50ms at -3dB: now the sound is becoming garbled. These reflections are undesirable! If the speech were faster it would be difficult to understand.
The same, but with reflections delayed 150ms at -12dB. I also added a few reflections between 20 and 80ms at a level of -8dB to smooth the decay. Note the strong sense of the hall, and the lack of muddiness.

The ideal mix
The previous slide shows that the ideal acoustic mix has three independent perceptual requirements:
1. The direct sound exceeds the total reflected energy by at least 4dB.
2. There are early reflections that add blend, distance, and depth to the sound. In a surround recording these should come equally from all directions, and they should avoid adding energy in the 50ms to 100ms time region.
3. There are reflections (reverberation) with time delays greater than 150ms to provide the impression of the hall.
To make a great recording we must capture all three separately!
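The three requirements can be checked on an impulse response by summing energy in time windows. In this sketch the window edges are assumptions drawn from the talk's numbers: a direct window of 0-5 ms (that value is assumed), early reflections up to 50 ms, a medial region of 50-150 ms, and late sound after that. The synthetic test response and all function names are hypothetical.

```python
import math
import random

FS = 8000  # sample rate in Hz (assumed; only energy windows are needed)


def window_energy(ir, start_ms, end_ms=None, fs=FS):
    """Sum of squared samples in [start_ms, end_ms) of an impulse response."""
    a = round(fs * start_ms / 1000)
    b = len(ir) if end_ms is None else round(fs * end_ms / 1000)
    return sum(x * x for x in ir[a:b])


def to_db(ratio):
    return 10.0 * math.log10(ratio) if ratio > 0 else float("-inf")


def ideal_mix_report(ir, fs=FS):
    """Energy per time window in dB re the direct sound (first 5 ms, assumed)."""
    direct = window_energy(ir, 0, 5, fs)
    early = window_energy(ir, 5, 50, fs)
    medial = window_energy(ir, 50, 150, fs)
    late = window_energy(ir, 150, None, fs)
    return {
        "early_db": to_db(early / direct),
        "medial_db": to_db(medial / direct),
        "late_db": to_db(late / direct),
        "direct_over_reflections_db": to_db(direct / (early + medial + late)),
    }


def add_segment(ir, start_ms, end_ms, level_db, envelope, rng, fs=FS):
    """Write envelope-shaped noise into ir, scaled to level_db re a unit direct impulse."""
    a, b = round(fs * start_ms / 1000), round(fs * end_ms / 1000)
    seg = [rng.gauss(0.0, 1.0) * envelope((i - a) / (b - a)) for i in range(a, b)]
    gain = math.sqrt(10.0 ** (level_db / 10.0) / sum(x * x for x in seg))
    ir[a:b] = [gain * x for x in seg]


# Synthetic response: unit direct sound, early reflections 20-50 ms at -5 dB,
# nothing from 50-150 ms, hall sound after 150 ms at -12 dB.
rng = random.Random(0)
ir = [0.0] * FS  # one second
ir[0] = 1.0
add_segment(ir, 20, 50, -5.0, lambda u: math.sin(math.pi * u) ** 2, rng)
add_segment(ir, 150, 1000, -12.0, lambda u: math.exp(-6.9 * u), rng)

report = ideal_mix_report(ir)
for key, value in report.items():
    print(f"{key}: {value:.2f}")
print("direct dominates by >= 4 dB:", report["direct_over_reflections_db"] >= 4.0)
```

With early reflections at -5 dB and late sound at -12 dB, the direct sound exceeds the total reflected energy by about 4.2 dB. Raising the late level to -8 dB would drop this margin below 4 dB.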

Direction of early reflections
It is not possible to detect whether reflections come from the front or the rear when they arrive between 20ms and 50ms after the end of a sound. But the result is more natural if they come from both the front and the rear, and using all four outer speakers also gives the largest sweet spot. (Demo.)

Muddiness is hard to avoid in small spaces!
We are attempting to show that the optimum total energy for all reflections is at least 4dB less than the direct sound. (The total reflected energy does not include the floor reflection; I will explain why later if there is time.) The direct sound must dominate the total sound picture. The reverberation radius of a small hall or church is usually below 2m, and may be as low as 1m. Every microphone used in the recording picks up both direct sound and reverberation, but only the microphone closest to the sound source picks up true direct sound. Direct sound arriving at all the other microphones is perceived as a reflection, and adds to the potential distance and muddiness.
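The reverberation radius can be estimated from room volume and reverberation time. This sketch uses the standard diffuse-field (Sabine) model, in which the radius is the distance where direct and reverberant energy are equal. Real rooms, and especially stage houses, deviate from this model. The room dimensions in the usage lines are hypothetical.

```python
import math


def reverberation_radius(volume_m3, rt60_s, directivity_q=1.0):
    """Diffuse-field (Sabine) estimate of the reverberation radius in metres.

    A = 0.161 * V / T is the equivalent absorption area; the radius is where
    direct and reverberant energy are equal: r = sqrt(Q * A / (16 * pi)).
    """
    absorption_m2 = 0.161 * volume_m3 / rt60_s
    return math.sqrt(directivity_q * absorption_m2 / (16.0 * math.pi))


def direct_to_reverberant_db(distance_m, radius_m):
    """Direct/reverberant ratio: 0 dB at the radius, -6 dB per doubling beyond it."""
    return 20.0 * math.log10(radius_m / distance_m)


# Hypothetical small church: 1500 m^3, RT 2.0 s.
r = reverberation_radius(1500.0, 2.0)
print(f"reverberation radius ~ {r:.2f} m")
for d in (0.5, 1.0, 2.0, 3.0 * r):
    print(f"{d:5.2f} m: D/R {direct_to_reverberant_db(d, r):+6.1f} dB")
```

In this model a listener three reverberation radii away hears the direct sound about 9.5 dB below the reverberation (20·log10(1/3)).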

Muddiness also comes from the playback room!
In this room there is no absorption in the front, so the reverberation radius is small, perhaps as low as 2.5m. The distance from the front loudspeakers to the listeners is greater than the reverberation radius, so the reverberation will be stronger than the direct sound. We are trying to keep the direct sound stronger than the reflections by 4dB. This goal is probably not achievable in this room, except at frequencies above 1000Hz, where the side curtains begin to be absorptive. Always mix your recordings in an absorbent space!

Boston Cantata Singers: Cantata #76, Die Himmel erzählen die Ehre Gottes
Performance in Jordan Hall, January 23, 2004. The reverberation time in Jordan Hall is ~1.4 seconds at 1000Hz, similar to the Semperoper Dresden. The typical audience member is ~3 reverberation radii from this singer, and the dramatic consequences are highly audible. Although Jordan Hall is beloved as a chamber music hall, its stage house is deep and reverberant. When the hall is full, the sound in the audience can be dry and muddy. The recording engineer must overcome these obstacles.

Cantata Singers, Bach BWV 76
Multimiked recording: note the clarity of the vocal timbre (low sonic distance).
Recording simulating the sound in the hall: note the timbre coloration and the sense of distance to the performers. With the picture, and after adaptation, the performance is quite enjoyable.

The ideal reverberation
The ideal reverberation has reflections from 20ms to 50ms with a total energy of -4dB to -6dB, and relatively little energy from 50 to 150ms.

Most small rooms (including playback rooms) have an exponential decay
If we pick up enough late reflections to hear the hall, we will also get too many early reflections, resulting in coloration and poor intelligibility.

Example of a small recording space: Swedenborg Chapel, Cambridge

Oriana Consort in Swedenborg Chapel

Oriana Setup

Recording in Swedenborg Chapel, Cambridge
The chapel holds perhaps 200 people, but when it is empty the RT is ~1.8 seconds and the reverberation radius is ~1.5m. The picture shows four supercardioid microphones about 1m from the chorus; these provide the direct sound. With the supercardioid pattern we have a 6dB direct/reverberant ratio, so the reverberation is about 6dB below the direct sound. In this space we must add hall sound and early reflections very carefully, or the sound will become muddy! In addition, the early reflections and reverberation arrive soon after the direct sound, so the sound seems small and cramped, with no sense of space around the direct sound. The chorus microphones are as close to the chorus as they can be without creating balance problems, so we cannot exclude the early reverberation by moving the mikes closer.

Main microphones in Swedenborg Chapel
The picture also shows two variable-pattern microphones about 2m from the chorus, which I put there for an experiment. The sound is not very good. The problem with a “main microphone” pair in this space is that it must be placed too far from the singers: a main pair must be at least 2.5m away or there will be balance problems. This distance is beyond the reverberation radius, and the sound will be muddy.

Hall sound in Swedenborg Chapel
The chapel is reverberant, with a high reverberation level, but the reverberation is too strong in the 10-150ms time range. Using cardioid microphones pointing away from the sound source reduces the early reverberant energy and maximizes the late energy. The hall sounds larger and better.

Distance perception and MUD
Reflections during the sound event and up to 150ms after it ends create the perception of distance, but there is a price to pay. Reflections from 10-50ms do not impair intelligibility; the fluctuations they produce are perceived as an acoustic “halo” or “air” around the original sound stream (ESI). Reflections from 50-150ms also contribute to the perception of distance, but they degrade both timbre and intelligibility, producing the perception of sonic MUD. We will have many examples of mud in this talk!

Training to hear MUD
Mud occurs when the reverberant decay of the recording venue has too much reflected energy in the 10-150ms region of the decay curve. This is true of nearly all sound stages, small auditoria, and churches. If you are recording in such a space with a relatively large ensemble, you are in trouble. The perception of mud can be tricky, because our hearing adapts to a muddy environment, and the sonic degradation becomes inaudible after about 10 minutes. It is easy to convince yourself that the recording is excellent when you have been listening to it all day. (This adaptation is also why we can enjoy a concert even when we are sitting far from the instruments.) You MUST compare your recording to a reference recording in a short A/B test.

Example: John Eargle at Skywalker Ranch
John Eargle has made wonderful recordings, particularly those with the Dallas Symphony on Delos Records. But even he can be fooled by a small space: as I said, you adapt quickly to such a space and no longer hear the mud it produces. John Eargle recently made a 5.1-channel DVD-Audio recording at Skywalker Ranch in Marin County, California. He was very excited by it, but listen to it and compare it to the Dallas recordings. Skywalker is a large sound stage with controllable acoustics; it is not a concert hall. As a consequence the reverberation radius is relatively short: by my estimate (without having seen it) the radius is less than 3.5 meters. It is very easy to record mud in such a space. Many instruments are beyond the reverberation radius, and adding more microphones only increases the reverberant pickup.

Recording in a large space is much easier!
Covenant Church is a very large space, holding more than 1000 people. It is damped by pew cushions and acoustic treatment on the walls, yielding an RT of 2.5 seconds and a large reverberation radius, probably above 3m. The microphones can be quite distant without picking up early reflections or reverberation. It is a very good place to record! (And it is exceptionally beautiful visually.)

Example: depth perspective through mike technique
When the reverberation radius is large enough, we can use an extra pair of microphones to create a single early reflection. This can provide the needed perspective and depth. Audio examples: direct sound; early reflection; late reverberation; direct + early at -5dB; direct + early + late at -8dB.
[Figure labels: Mike, 480L, Mike, 480L]

The depth impression is greatly improved in surround
I will run the same experiment, but use all five speakers. The early reflections will come equally from the front and the rear, but a different delay pattern will be used for each speaker, so the reflections are decorrelated. The late (hall) reflections will also come equally, but decorrelated, from the front and rear speakers. This creates a large and uniform sweet spot for the acoustics.

The Polyhymnia Pentangle
The Polyhymnia engineers employ a surround array of spaced omni microphones, at a spacing similar to the ITU playback array. The technique works well in spaces where the reverberation radius is equal to or greater than the microphone spacing! In that case the direct sound picked up by the rear microphones is perceived as an early lateral reflection and adds distance to the front image. Caution!! In a small hall this array will be TOO MUDDY!!! In practice the Polyhymnia engineers often pick up the direct sound with accent microphones, and the front microphones then provide a first reflection to the front speakers. The center microphone is also often moved closer to the sound sources, so that it picks up mostly direct sound.

Boston Symphony Hall

Boston Symphony Hall
2631 seats, 662,000 ft³ (18,700 m³), RT 1.9 s. It’s enormous! It is one of the greatest concert halls in the world, maybe the best. Recording here is almost too easy! Working here is a rare privilege, so rare that I do not do it (it’s a union shop). The recording in this talk is courtesy of Alan McClellan of WGBH Boston (mixed from 16 tracks by the presenter). The reverberation radius is >20 ft (>6 m) even on stage. The stage house is enormous and NOT reverberant: with the orchestra in place, the stage house RT is ~1 s.

Boston Symphony Hall, occupied, stage to front of balcony, 1000Hz
This picture compares favorably to our picture of the ideal reverberation for a recording. But this is what an audience member hears 100 feet from the stage!

Boston Symphony Orchestra in Symphony Hall

Boston Cantata Singers in Symphony Hall. March 17, 2002

Microphone Array (WGBH)

Beware the “main microphone” array
Nearly all engineers will provide a “main microphone”, usually a “Decca Tree” or a pair of omni or cardioid microphones. Almost always, the sound from this array is acceptable only for instruments close to the microphones. Most of the instruments are far beyond the reverberation radius, and the more distant instruments must be spot-miked. A cardioid pair (ORTF) has too much phantom center for an acceptable surround recording (it is a two-channel technique only). Very frequently, time delay panning (from a Decca Tree or spaced omnis) makes the sound unusable in a high-quality mix. Time delay panning makes the front image unstable, and closely spaced microphones yield high correlation at low frequencies, which degrades the sense of space. It is better simply to turn off the main microphone (even if your instructor insists you install one). In our Boston Symphony Hall recording, a pair of B&K omnis spaced ~25cm was hung behind the conductor by the WGBH engineer.
Audio examples: front pair; front pair LF.

Correlation in the “main microphone”: two omnis spaced by ~25cm, just behind the conductor
[Figure: measured correlation (solid line) vs. correlation calculated assuming d = 25cm (dashed line)]
The high correlation in this pair makes the sound unusable in a stereo or surround mix. It sounds unpleasant even in this lecture room, as the audio demo makes clear.
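The calculated curve is presumably the textbook diffuse-field result for two omnis, sin(kd)/(kd), where k = 2πf/c and d is the spacing. We assume that here, since the slide does not state the formula. The sketch below evaluates it for the 25 cm pair and, for comparison, a 2 m spacing. The 343 m/s speed of sound is assumed.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def diffuse_field_correlation(freq_hz, spacing_m, c=SPEED_OF_SOUND):
    """Correlation of two omni mics in an ideal diffuse field: sin(kd) / (kd)."""
    kd = 2.0 * math.pi * freq_hz * spacing_m / c
    return 1.0 if kd == 0.0 else math.sin(kd) / kd


for f in (50, 100, 200, 400, 686, 1000):
    print(f"{f:5d} Hz: d=0.25 m -> {diffuse_field_correlation(f, 0.25):+.2f}, "
          f"d=2 m -> {diffuse_field_correlation(f, 2.0):+.2f}")
```

For d = 25 cm the first zero falls at c/(2d) ≈ 686 Hz, and below about 200 Hz the correlation stays above 0.85. In a diffuse field, such a pair is nearly mono in the bass.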

Beware the exclusive use of spaced front microphones
In our recording, the wide front orchestra pickup is fine for the first row of the strings, but nearly all of the orchestra is beyond the reverberation radius for these microphones. If we want good balance and clarity, we must use additional microphones over the orchestra and treat these microphones as part of our “main” array. Using cardioid microphones in front will help a lot: the cardioid is 4.7 dB less sensitive to reverberation than an omni, which lets it pick out more distant instruments with clarity. Using supercardioid microphones will help a little more, but if the stage house is reverberant the improvement is minimal. The author greatly prefers to use (equalized) directional microphones for orchestra and chorus pickup. After equalization the bass performance is adequate, there is better control of leakage, and there is less MUD.
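The reverberation advantage of a directional microphone can be estimated from its polar pattern. This sketch assumes ideal first-order patterns, a + (1 - a)·cos θ, and a perfectly diffuse reverberant field. The cardioid then comes out at 4.77 dB, consistent with the figure quoted above, and the supercardioid at about 5.7 dB. The `a` values are the commonly quoted ones. Real microphones deviate, especially at high frequencies.

```python
import math

# Commonly quoted first-order patterns: sensitivity = a + (1 - a) * cos(theta).
PATTERNS = {
    "omni": 1.0,
    "cardioid": 0.5,
    "supercardioid": 0.366,
    "hypercardioid": 0.25,
}


def random_energy_efficiency(a):
    """Diffuse-field energy picked up relative to an omni: a^2 + (1 - a)^2 / 3."""
    return a * a + (1.0 - a) ** 2 / 3.0


def directivity_index_db(a):
    """dB less reverberant energy than an omni, for equal on-axis sensitivity."""
    return -10.0 * math.log10(random_energy_efficiency(a))


def distance_factor(a):
    """How much farther the mic can be placed for the same direct/reverberant ratio."""
    return 1.0 / math.sqrt(random_energy_efficiency(a))


for name, a in PATTERNS.items():
    print(f"{name:14s} DI {directivity_index_db(a):4.2f} dB, "
          f"distance factor {distance_factor(a):.2f}")
```

The distance factor is the practical consequence. A cardioid at about 1.7 times the distance of an omni gets the same direct/reverberant ratio.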

Balance and distance come first
In any recording, the balance between the musical forces should reflect the needs of the music. In this recording, even with 120 singers the chorus is nearly inaudible in the hall, so we must use the chorus accent microphones heavily. In the final mix MOST of the energy in the recording will come from these; in practice, they are our MAIN microphones! However, if we use the chorus microphones heavily, the chorus will sound too close to the loudspeakers and in front of the orchestra. To correct this distance problem we MUST use electronic early reflections. There is no other possible solution. (Play example.)

Let’s build the hall sound
We need decorrelated reverberation at equal level in both the front and the rear. First test just the hall microphones to see whether the reverberation is enveloping and uniform, then add the front microphones for the direct sound. Where the hall balance is not correct, you MUST augment the natural reverberation with electronics. In this recording the orchestra is much stronger than the chorus (even with 120 singers), and there is too little chorus in the natural reverberation!! When we add the accent microphones, the chorus will sound as if it is in a smaller space. So we add electronic reverberation from the chorus, equally in all four outer speakers, from the surround reverberator.

Final mix
The final mix uses the three omni microphones over the chorus as the main microphones; they are simply patched to left, center, and right. The spot microphones for the soloists are mostly mixed to the center, with some panning to the left or right (no divergence was used). The orchestra is a combination of two widely spaced omnis patched to left front and right front, augmented by spot microphones over the woodwinds and the more distant strings. The center channel was provided automatically through leakage from the soloists’ microphones. The rear channels come from a widely spaced pair of omnis about 20 feet behind the conductor, extensively augmented by electronic early reflections and late reverberation.

Hall sound: decorrelation at low frequencies
It is widely believed that localization is impossible below 100Hz, so a single subwoofer has become the standard for reproducing low frequencies. Although localization below 100Hz is difficult in a small room, there is a large difference between a single subwoofer and an independently driven pair. We have turned off the subwoofer in this room and are running the other speakers full-range. A great recording will easily demonstrate the difference between a single subwoofer and full-range discrete speakers. As a consequence, you must make sure the hall sound in your recordings is decorrelated at low frequencies, both in the front and in the rear of a surround recording. Most single-microphone-array surround techniques fail for this reason.
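To check whether hall channels are decorrelated in the bass, one rough approach is to low-pass both channels and compute their correlation. In this minimal sketch a first-order low-pass at 100 Hz stands in for a proper crossover, and Gaussian noise stands in for hall sound. The function names, cutoff, and sample rate are all hypothetical choices.

```python
import math
import random

FS = 8000  # sample rate in Hz (assumed)


def one_pole_lowpass(signal, cutoff_hz, fs=FS):
    """Simple first-order low-pass filter (a crude stand-in for a crossover)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    out, state = [], 0.0
    for x in signal:
        state += alpha * (x - state)
        out.append(state)
    return out


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def low_frequency_correlation(left, right, cutoff_hz=100.0):
    """Correlation of the two channels after low-pass filtering."""
    return pearson(one_pole_lowpass(left, cutoff_hz), one_pole_lowpass(right, cutoff_hz))


rng = random.Random(42)
n = 2 * FS  # two seconds of noise standing in for hall sound
hall_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
hall_b = [rng.gauss(0.0, 1.0) for _ in range(n)]

print("same signal in both channels:", round(low_frequency_correlation(hall_a, hall_a), 3))
print("independent channels:        ", round(low_frequency_correlation(hall_a, hall_b), 3))
```

Identical channels, as with a single subwoofer feed, give a correlation of 1. Independent hall signals give a value near 0.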

Conclusions
A great recording:
Has a stable front image over a large listening area.
Has direct sound stronger than the early reflections, microphone leakage, and reverberation, so it is not MUDDY!
Has decorrelated early reflections in both the front and the rear speakers. These provide a sense of blend and depth to the recording. (But be sure to mix in an absorbent space!)
Has decorrelated late reverberation in both the front and the rear speakers. The decorrelation must extend to low frequencies.
It is possible to make a great recording in a small space. But if the group is physically larger than the reverberation radius, electronic early reflections and reverberation will probably be necessary.

Medial reflections: the detection of muddiness
Medial reflections can cause clear differences in quality, and we can measure medial energy through an analysis of pitch. Pitch information is available in each critical band, even in bands above the frequency limit of auditory phase-locking. Here is an example of speech filtered into a 1000Hz 1/3-octave band: the waveform appears as a series of decaying tone bursts, repeating at the fundamental frequency. When this signal is rectified, there is substantial energy at the fundamental frequency.
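The effect of rectification can be shown with a simplified stand-in for the band-filtered speech. Instead of decaying tone bursts, this sketch uses a 1 kHz carrier with a raised-cosine envelope repeating at 125 Hz. Before rectification its spectrum holds only 875, 1000, and 1125 Hz. The sample rate, frequencies, and function names are hypothetical.

```python
import cmath
import math

FS = 16000  # sample rate in Hz (assumed)
N = 1536    # 96 ms: a whole number of cycles of 125, 875, 1000 and 1125 Hz


def amplitude_at(signal, freq_hz, fs=FS):
    """Single-sided amplitude of one DFT bin (freq_hz must fall exactly on a bin)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / fs) for n, x in enumerate(signal))
    return 2.0 * abs(acc) / len(signal)


f0, fc = 125.0, 1000.0  # hypothetical fundamental and band centre
# Band-limited "voiced" signal: a 1 kHz carrier whose envelope repeats at f0.
# Its spectrum holds only 875, 1000 and 1125 Hz, with nothing at f0 itself.
band = [(1.0 + math.cos(2 * math.pi * f0 * n / FS)) * math.sin(2 * math.pi * fc * n / FS)
        for n in range(N)]
rectified = [abs(x) for x in band]  # full-wave rectification

print(f"amplitude at {f0:.0f} Hz before rectification: {amplitude_at(band, f0):.2e}")
print(f"amplitude at {f0:.0f} Hz after rectification:  {amplitude_at(rectified, f0):.3f}")
```

Before rectification the 125 Hz bin is empty to within rounding error. After rectification it holds a component of about 0.63, carried by the envelope.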

Waveform of speech formants
The waveform of the word “five” in the 2kHz 1/3-octave band. The same, but convolved with a 20ms windowed burst of white noise, simulating a diffuse reflection or the sound of a small reverberant room. Non-reverberant speech has a clear repeating pattern in the waveform; reverberant speech does not. We can devise a measurement system around this difference.

The plus/minus pitch detector. The pitch detector operates separately on each third-octave band. Each band is rectified and low-pass filtered. The output is delayed, and then added to and subtracted from the undelayed signal. The logs of the “plus” signal and the “minus” signal are then subtracted from each other. The result is highly sensitive to the fundamental pitch.
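A minimal single-frame sketch of the plus/minus detector follows, assuming its input is one 1/3 octave band that has already been extracted. Several details are assumptions rather than taken from the slide: full-wave rectification, a 1ms moving-average low-pass, a search over candidate delays for fundamentals between 80Hz and 500Hz, and a comparison of the mean-square energies of the plus and minus signals over the frame. When the delay equals the pitch period, the minus signal nearly cancels, so the log difference peaks.

```python
import numpy as np

def plus_minus_detector(band, fs, min_f0=80.0, max_f0=500.0, smooth_s=0.001):
    """Sketch of a plus/minus pitch detector for one 1/3 octave band frame.

    Returns (candidate_delays_s, score_db); the score peaks where the
    delay matches the fundamental period.
    """
    # Rectify (full-wave here) and low-pass with a crude moving average.
    rect = np.abs(np.asarray(band, dtype=float))
    n_box = max(1, int(round(smooth_s * fs)))
    env = np.convolve(rect, np.ones(n_box) / n_box, mode="valid")
    # For each candidate period, add and subtract a delayed copy.
    delays = np.arange(int(fs / max_f0), int(fs / min_f0) + 1)
    score = np.empty(len(delays))
    eps = 1e-12
    for i, d in enumerate(delays):
        now, past = env[d:], env[:-d]
        plus = np.mean((now + past) ** 2)
        minus = np.mean((now - past) ** 2)
        # Difference of the logs (in dB) of the "plus" and "minus" energies.
        score[i] = 10 * np.log10(plus + eps) - 10 * np.log10(minus + eps)
    return delays / fs, score

# Synthetic 2500Hz-band signal: decaying bursts repeating at 125Hz.
fs, fc = 16000, 2500.0
idx = np.arange(fs // 2)
band = np.exp(-(idx % 128) / fs / 0.002) * np.sin(2 * np.pi * fc * idx / fs)
lags, score = plus_minus_detector(band, fs)
print(1.0 / lags[np.argmax(score)])  # estimated fundamental in Hz
```

Running it over successive short frames gives a time-varying output similar in spirit to the detector plots, though not necessarily identical to the author's implementation.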

Example: “one, two” in the 2500Hz 1/3 octave band. Pitch detector output with dry speech, the syllables “one, two” with no added reverberation. Note the high accuracy of the fundamental extraction and the >15dB S/N.

Same, but convolved with 20ms of white noise. Convolving with white noise changes neither the intelligibility nor the C80, but it dramatically changes the sound and the pitch coherence. By chance the second syllable is not seriously degraded, but the first one is, at least in this 1/3 octave band. The sound quality is markedly degraded. We need a measure for this perception.
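C80 compares the energy arriving in the first 80ms after the direct sound with the energy arriving later. A 20ms smear only moves energy that lies close to the 80ms boundary, so it leaves C80 almost unchanged; this is why C80 does not capture the degradation heard here. The minimal sketch below assumes the direct sound arrives at a known sample index; the function name is hypothetical.

```python
import numpy as np

def c80(ir, fs, t0_index=0):
    """Clarity C80 in dB: energy in the first 80 ms after t0 versus the rest.

    t0_index is assumed to mark the arrival of the direct sound.
    """
    energy = np.asarray(ir, dtype=float)[t0_index:] ** 2
    split = int(round(0.080 * fs))
    early = energy[:split].sum()
    late = energy[split:].sum()
    if late == 0.0:
        return float("inf")
    return 10.0 * np.log10(early / late)

# Direct sound plus one reflection at 100 ms with half the amplitude (-6 dB)
fs = 16000
ir = np.zeros(fs // 2)
ir[0] = 1.0
ir[int(0.100 * fs)] = 0.5
print(round(c80(ir, fs), 2))  # 6.02
```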

“one, two” in the 2500Hz band: an equal mix of direct sound and one diffuse reflection at 30ms. The high pitch coherence and high direct/reverberant ratio in the first 30ms are easily seen at the start of each syllable.
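The direct-plus-reflection test conditions (0dB here, -4dB later) can be sketched as an impulse response with a unit direct sound and a single diffuse reflection. Here the reflection is assumed to be a 20ms Hann-windowed noise burst centered at 30ms, with its total energy set relative to the direct sound. The shape, the centering and the function name are assumptions for illustration.

```python
import numpy as np

def direct_plus_diffuse(fs, level_db, delay_s=0.030, width_s=0.020, seed=None):
    """Impulse response: unit direct sound plus one diffuse reflection.

    The reflection is a Hann-windowed white-noise burst centered at delay_s
    whose total energy is level_db relative to the direct sound.
    """
    rng = np.random.default_rng(seed)
    n_burst = int(round(width_s * fs))
    burst = rng.standard_normal(n_burst) * np.hanning(n_burst)
    burst *= np.sqrt(10 ** (level_db / 10) / np.sum(burst ** 2))
    start = int(round(delay_s * fs)) - n_burst // 2
    if start < 1:
        raise ValueError("reflection would overlap the direct sound")
    ir = np.zeros(start + n_burst)
    ir[0] = 1.0
    ir[start:start + n_burst] += burst
    return ir

ir_equal = direct_plus_diffuse(16000, 0.0, seed=0)     # "equal mix" condition
ir_optimum = direct_plus_diffuse(16000, -4.0, seed=0)  # the -4dB condition
```

Convolving dry speech with one of these responses (for example with `np.convolve`) produces test material of the same kind as the conditions analyzed in these slides.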

Segment of opera: old Bolshoi. Segments from the old Bolshoi and from the new Bolshoi (I was unable to produce a similar plot for the new Bolshoi). Segment of Verdi: pitch coherence of the 2500Hz 1/3 octave band. F, F, glide to A. Recording from the back of the first balcony. There is no obvious gap before reflections arrive, and the pitch coherence appears relatively high.

Sound examples: the syllables “one, two, three” with no reverberation, in the 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, and 3.2kHz 1/3 octave bands. Note that the height and frequency of the pitch coherence peaks are (almost) uniform across all bands.
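The band labels used throughout (1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, 3.2kHz) are rounded names for base-2 1/3 octave bands. Their centers are spaced by a factor of 2^(1/3), and each band's edges lie a factor of 2^(1/6) below and above its center. This sketch lists the exact values; the function name is hypothetical.

```python
def third_octave_bands(n_min=0, n_max=5, ref_hz=1000.0):
    """(lower, center, upper) in Hz for base-2 1/3 octave bands, fc = ref * 2**(n/3)."""
    bands = []
    for n in range(n_min, n_max + 1):
        fc = ref_hz * 2 ** (n / 3)
        bands.append((fc * 2 ** (-1 / 6), fc, fc * 2 ** (1 / 6)))
    return bands

for lo, fc, hi in third_octave_bands():
    print(f"{lo:7.1f} {fc:7.1f} {hi:7.1f}")
```

The exact center of the top band is about 3175Hz. Standard nominal labels round it to 3.15kHz, and these slides round it to 3.2kHz.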

Maximum pitch coherence vs 1/3 octave band for non-reverberant speech. The syllables “one two three four five six seven” are analyzed. Note that the maximum pitch coherence is relatively constant across all 1/3 octave bands, although the value depends on the particular vowel.

“one, two, three” convolved with 20ms noise, in the 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, and 3.2kHz bands. Note that most of the pitch coherence has been eliminated.

Maximum pitch coherence vs 1/3 octave bands for speech convolved with 20ms noise. The syllables “one two three four five six seven” are analyzed. Note that the pitch coherence is low and not constant across third-octave bands.

Pitch coherence of speech with a diffuse reflection at a level of 0dB, in the 1kHz, 1.25kHz, 1.6kHz, 2kHz, and 2.5kHz bands. Note the low pitch coherence for some of the syllables in several bands.

Maximum pitch coherence vs 1/3 octave bands for direct + reverb at 0dB. Analysis of the syllables “one two three four five six seven.” Note the low and noise-like coherence for most of the syllables.

Pitch coherence of speech with a diffuse reflection at a level of -4dB (optimum), in the 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, and 3.2kHz bands. Note the high pitch coherence on most syllables in most bands. This reflection level is usually chosen as optimum.

Max pitch coherence vs 1/3 octave band for direct and reflected at -4dB. Analysis of the syllables “one two three four five six seven.” Note that the pitch coherence is both high and uniform across 1/3 octave bands.

Teatro Alla Scala, Milan. Echograms from La Scala (from Hidaka and Beranek) illustrate these profiles. Top curve: 2kHz octave band, 0-200ms. At 2kHz, note the high direct sound and the low level of reflections in the 50-150ms range. Bottom curve: 500Hz octave band, 0-200ms. Note the high reverberation level and the short critical distance.

Let’s listen to Alla Scala! Matlab can be used to read these printed impulse responses and convert them into real impulse responses.
1. First we read the .bmp file from a scan, and convert the peaks in the file to delta functions with the same time delay and an amplitude equivalent to the peak height. All the direct sound energy is combined into a single delta function, and the level of the direct sound is normalized (relative to the rest of the decay), so the 2kHz and 500Hz impulses can be accurately combined.
2. We then apply a random offset of approximately ±5ms to the delay times to correct for the quantization in the scan.
3. We then extend the echogram to later times by tacking on an exponentially decaying segment of white noise, with a decay rate equal to the published data for the hall.
4. We then filter the 2kHz echogram with a 1kHz high-pass filter, and combine it with the 500Hz echogram low-pass filtered at 1kHz.
5. If desired, we can create a “right channel” and a “left channel” reverberation by using a different set of random variables in steps 2 and 3.
6. We convolve a segment of dry sound with the new impulse response. The result is sonically quite convincing!
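The procedure above can be sketched in Python (the original used Matlab). Step 1, scanning the .bmp and picking peaks, is assumed to be done already, producing (time, amplitude) pairs with the direct sound as a single normalized entry at t = 0. Several other choices are assumptions, not details from the slide: uniform ±5ms jitter that is not applied to the direct sound, a noise tail whose starting energy density matches the last 50ms of peaks, and brick-wall FFT masks at 1kHz in place of real crossover filters. All function names are hypothetical.

```python
import numpy as np

def echogram_to_ir(peaks, fs, rt60_s, length_s, seed=None,
                   jitter_s=0.005, tail_window_s=0.050):
    """Steps 2-3: place jittered deltas, then extend with a decaying noise tail.

    peaks: (time_s, amplitude) pairs from a scanned echogram (step 1 assumed
    done); the direct sound is a single normalized entry at t = 0.
    """
    rng = np.random.default_rng(seed)
    n = int(round(length_s * fs))
    ir = np.zeros(n)
    last = 0
    for t, a in peaks:
        if t > 0:
            # Step 2: random offset of up to +/- jitter_s hides scan quantization
            t = max(t + rng.uniform(-jitter_s, jitter_s), 1.0 / fs)
        k = int(round(t * fs))
        if k < n:
            ir[k] += a
            last = max(last, k)
    # Step 3: white-noise tail starting at the energy density of the last
    # tail_window_s of reflections, decaying 60 dB per rt60_s
    w = int(round(tail_window_s * fs))
    density = np.sum(ir[max(1, last - w):last + 1] ** 2) / w
    tail_t = np.arange(n - last - 1) / fs
    decay = 10.0 ** (-3.0 * tail_t / rt60_s)
    ir[last + 1:] = np.sqrt(density) * decay * rng.standard_normal(len(tail_t))
    return ir

def crossover(ir_low, ir_high, fs, fc=1000.0):
    """Step 4: low-pass the 500Hz IR and high-pass the 2kHz IR at fc, then sum.

    Brick-wall FFT masks are used here for brevity, not real filters.
    """
    n = max(len(ir_low), len(ir_high))
    f = np.fft.rfftfreq(n, 1.0 / fs)
    lo = np.fft.rfft(ir_low, n)
    hi = np.fft.rfft(ir_high, n)
    lo[f > fc] = 0.0
    hi[f <= fc] = 0.0
    return np.fft.irfft(lo + hi, n)

def stereo_hall(dry, peaks_500, peaks_2k, fs, rt60_500, rt60_2k, length_s=2.0):
    """Steps 5-6: separate random draws per channel, then convolve the dry sound."""
    channels = []
    for seed in (1, 2):  # arbitrary seeds for left and right
        low = echogram_to_ir(peaks_500, fs, rt60_500, length_s, seed=seed)
        high = echogram_to_ir(peaks_2k, fs, rt60_2k, length_s, seed=seed + 10)
        channels.append(np.convolve(dry, crossover(low, high, fs)))
    return np.stack(channels)
```

The peak lists and reverberation times would come from the scanned plots and the published hall data; the values in any test call are placeholders, not measurements of La Scala.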

Alla Scala at 500Hz: reading the plot. Top curve: 500Hz measured impulse response as given by Beranek (JASA Vol. 107 #1, Jan 2000, pp. 356-367). Bottom curve: impulse response as regenerated from delta functions, passed through a 500Hz 6th-order 1-octave filter. Note that the correspondence is more than plausible.

Alla Scala 500Hz: randomizing and extending. Top graph: Alla Scala published data. Bottom graph: regenerated impulse response after randomization and extension.

Pitch coherence of speech in La Scala, in the 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, and 3.2kHz bands. Note the excellent sharpness of the pitch peaks and the good consistency across bands.

Maximum coherence vs 1/3 octave bands, La Scala, Milan. Pitch coherence is similar to our example where the direct/reverberant ratio is approximately 4dB. While not as clear as in some examples, the fundamental pitch is easily extracted using this simple detector.

Listen to Alla Scala, NNT Tokyo, Semperoper. Impulse responses (2kHz, 500Hz, and 2kHz and 500Hz combined) from La Scala Milan, the NNT Theater Tokyo, and the Semperoper Dresden (all data from Hidaka and Beranek), plus the original sound.

Pitch Coherence: NNT opera house, Tokyo, in the 1kHz, 1.25kHz, 1.6kHz, 2kHz, 2.5kHz, and 3.2kHz bands. Note that the peaks, where they exist, are very broad, indicating inexact pitch extraction. In most bands, pitch is not extracted for all of the syllables.

Maximum coherence vs 1/3 octave band, NNT Opera Theater, Tokyo. The fundamental pitch is not extractable using this simple detector.

Binaural Examples in Opera Houses. It is very difficult to study opera acoustics, as the sound changes drastically depending on the set design, the position of the singers (actors), the presence of the audience, and the presence of the orchestra. Binaural recordings made during performances give us the only clues. Here is a sound bite from a famous German opera house. Note the excessive distance of the singers and the low intelligibility. This is MUD in action! And here is an example from another famous German opera house. Note the increased intelligibility, the reduced distance, and the improved dramatic connection between the singer and the audience.

Synthetic Opera House Study. We can use MC12 Logic 7 to separate the orchestra from the singers on commercial recordings, and test different theories of balance and reverberation. Example from Elektra (Barenboim); the balance in Barenboim's original is OK. Sound examples: original; orchestra left and right; vocals; downmix with no reverb on the singers; reverb from the orchestra; reverb from the singers; downmix with reverb on the singers.

Muddiness: Dry Speech + 20ms noise. A mono speech signal, and the same signal convolved with noise (diffuse reflections) in mono and in stereo. Note that the reflections increase muddiness and distance. The stereo version is more natural than the mono, but equally distant.

Recorded speech in Covenant. A voice segment recorded at 1.5m with a supercardioid mike, and the same segment with the reflections shown below added. Note that the muddiness increases dramatically. The reflection pattern is frequency-flat, with peak energy about 30ms after the direct sound.

Demo 1: Clarity. Demonstrate the dry sound. Demonstrate a muddy sound by adding reflections in mono. Note that adding only very early reflections does not decrease the intelligibility, but it does increase the perceived distance of the source. Demonstrate adding the reflections in surround. Note that adding the reflections in surround increases the perceived distance more effectively: less reflected energy is needed, and the direct sound remains clear. The optimum early energy is between -4dB and -6dB.