Front-end Audio Processing: Reflections on Issues, Requirements, and Solutions Tomas Gaensler mh acoustics www.mhacoustics.com Summit NJ/Burlington VT.


Front-end Audio Processing: processing to enhance perceived and/or measured sound quality in communication and recording devices.

Not So Famous Quotes (Acoustic Jewelry/Bluetooth Headset). Gary Elko (mh/Bell Labs colleague) at IWAENC 1995: “Acoustic Echo cancellation will not be needed in the future when people wear acoustic jewelry.” Arno Penzias (1978 Nobel Prize laureate): “No one would want acoustic jewelry because people would think the users talking to themselves are crazy.” I’m glad the success of Bluetooth headsets shows that both were completely wrong!

Classical Front-end Architectures – POTS. Carbon microphone with an expansion effect that reduces noise. Large coupling loss in handset mode. Switch loss in telephones that support speakerphone mode.

Classical Front-end Architectures – Cellphone 1995

Classical Front-end Architectures – Cellphone 2005 - 2010

Cellphones and Handsfree. Common problems: the far-end listener does not hear the near-end talker, and the near-end listener does not understand the far-end talker. Why? Form factor (size), and limited understanding of physics and acoustics(?).

RX/TX Levels, Coupling and Doubletalk. Far-end: 95–100 dBSPL at the loudspeaker, 85–90 dBSPL at the mic. Near-end talker: 55–70 dBSPL at the mic. The echo is louder than the near-end speech. Linear AEC ERLE ≈ 20–30 dB. After cancellation, the Residual Echo to Near-end Ratio (RENR) is RENR ≈ 90 - 20 - 70 = 0 dB. Hence >20 dB of residual echo suppression is required, and duplexness suffers.
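The RENR arithmetic can be checked in a few lines of Python. The function name `renr_db` is hypothetical; the levels are the slide's, and the two extra cases simply combine end points of the quoted ranges.

```python
# Level budget for residual echo after a linear AEC, using the slide's dB SPL figures.

def renr_db(echo_at_mic_db: float, erle_db: float, near_end_db: float) -> float:
    """Residual Echo to Near-end Ratio (dB) = echo at mic - ERLE - near-end level."""
    residual_echo_db = echo_at_mic_db - erle_db
    return residual_echo_db - near_end_db


# Case worked on the slide: 90 dB SPL echo, 20 dB ERLE, 70 dB SPL talker.
print(renr_db(90.0, 20.0, 70.0))  # 0.0
# Same echo and ERLE, but a quiet 55 dB SPL talker: the residual echo dominates.
print(renr_db(90.0, 20.0, 55.0))  # 15.0
# Most favourable corner of the ranges: 85 dB SPL echo, 30 dB ERLE, 70 dB SPL talker.
print(renr_db(85.0, 30.0, 70.0))  # -15.0
```

Across the quoted ranges the residual echo lands anywhere from about 15 dB below to 15 dB above the near-end talker, which is why linear cancellation alone is not enough.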

TX: Dynamic Range and Noise. Echo level: 90 dBSPL, so the peak echo reaches 105–110 dB, and the echo must not saturate in the TX path. Near-end speech level: 70 dBSPL. The actual speech-to-room-noise ratio is only about 27 dB at best. Gain is required to get a loud enough output; the perceived noise level is ~20 dB above the normal room noise level.
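A rough headroom budget makes the problem concrete. This sketch assumes (hypothetically) that the converter's full scale is placed at the 110 dB peak echo level so that echo never clips, and uses the ideal full-scale-sine quantization SNR of 6.02·N + 1.76 dB for an N-bit converter, which is only an approximation for speech. The 16-bit word length and the helper names are illustrative.

```python
# Hypothetical assumption: digital full scale corresponds to the peak echo level,
# so the echo never saturates the TX path.
PEAK_ECHO_DB_SPL = 110.0      # upper end of the slide's 105-110 dB peak echo
NEAR_END_DB_SPL = 70.0        # near-end speech level from the slide


def ideal_adc_snr_db(bits: int) -> float:
    """Ideal quantization SNR of a full-scale sine for an N-bit converter."""
    return 6.02 * bits + 1.76


def near_end_headroom_db(peak_db: float, speech_db: float) -> float:
    """How far below digital full scale the near-end speech ends up, in dB."""
    return peak_db - speech_db


headroom = near_end_headroom_db(PEAK_ECHO_DB_SPL, NEAR_END_DB_SPL)
print(headroom)                                  # 40.0
# Assumed 16-bit converter: SNR left for near-end speech against quantization noise.
print(f"{ideal_adc_snr_db(16) - headroom:.1f}")  # 58.1
```

Under these assumptions the near-end speech sits 40 dB below full scale, so digital gain is needed afterwards, and that gain lifts the room and converter noise together with the speech.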

TX: Fixed-point Processing and Quantization Noise. Q-noise increases by 6·log2(N) dB! For N = 64, Q-noise increases by 36 dB, so double precision is “required”.
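The 36 dB figure follows directly from the rule of thumb. Reading N as the block or transform length of a fixed-point block-processing stage is an assumption (the slide only gives N = 64), and the helper names are hypothetical.

```python
import math


def q_noise_increase_db(n: int) -> float:
    """Slide's rule of thumb: quantization noise grows ~6 dB per doubling of N."""
    return 6.0 * math.log2(n)


def extra_bits_needed(n: int) -> int:
    """Word-length bits needed to win that noise back, at ~6.02 dB per bit."""
    return math.ceil(q_noise_increase_db(n) / 6.02)


print(q_noise_increase_db(64))              # 36.0
print(round(20.0 * math.log10(64), 2))      # 36.12: the same growth written as 20*log10(N)
print(extra_bits_needed(64))                # 6
```

Six extra bits on top of a 16-bit data path is why single precision becomes marginal and double precision, block scaling, or floating point is recommended.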

RX: Dynamic Range and Distortion. Small loudspeakers have a rather high cut-off frequency (high-pass), so EQ is often required to get an acceptable “sound” (frequency response). However, EQ means loss of signal loudness and dynamic range, and increased (analog) distortion. Many manufacturers compensate for the loss of signal level with excessive digital gain and therefore get (digital) saturation. [Diagram labels: to AEC, digital gain, analog gain]

What Can or Should be Done? Minimize acoustical coupling by good physical design. TX: use noise suppression, but not excessively; use double precision, block scaling, or floating point. RX: use compression instead of fixed gain; 10% or less loudspeaker/driver THD is desired.
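The RX advice (compression instead of fixed gain) can be illustrated with a minimal feed-forward compressor. The threshold, ratio, release coefficient and test tone are hypothetical, and the peak follower (instant attack, exponential release) is only one simple way to build the level detector.

```python
import numpy as np


def fixed_gain(x: np.ndarray, gain_db: float) -> np.ndarray:
    """Fixed digital gain followed by saturation at digital full scale (+/-1.0)."""
    g = 10.0 ** (gain_db / 20.0)
    return np.clip(g * x, -1.0, 1.0)


def compressor(x: np.ndarray, makeup_db: float, threshold_db: float = -6.0,
               ratio: float = 4.0, release: float = 0.999) -> np.ndarray:
    """Feed-forward compressor: make-up gain, then gain reduction above threshold.

    The level detector is a peak follower with instant attack and exponential
    release (coefficient `release` per sample).
    """
    y = 10.0 ** (makeup_db / 20.0) * np.asarray(x, dtype=float)
    out = np.empty_like(y)
    env = 0.0
    for n, s in enumerate(y):
        env = max(abs(s), release * env)
        level_db = 20.0 * np.log10(max(env, 1e-12))
        over_db = max(level_db - threshold_db, 0.0)
        # Above threshold, the output level rises only 1/ratio dB per input dB.
        out[n] = s * 10.0 ** (-over_db * (1.0 - 1.0 / ratio) / 20.0)
    return out


fs = 8000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 440 * t)   # hypothetical test tone, about -6 dBFS
hard = fixed_gain(x, 12.0)
soft = compressor(x, 12.0)
print(np.any(np.abs(hard) >= 1.0))      # True: the fixed gain saturates
print(np.max(np.abs(soft)) < 1.0)       # True: compressed peaks stay below full scale
```

Both paths apply the same 12 dB make-up gain; the compressor trades some loudness on peaks for the absence of digital clipping.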

What about Non-linear AEC Algorithms? An interesting problem, proposed and worked on for many years, but not practical in most AEC applications since: the model is complicated (gain, and therefore saturation, possibly in both the TX and RX paths); it adds complexity and system cost; convergence is often slow; and it is difficult to fine-tune in the field. Even when non-linear cancellation works perfectly, the user still perceives a distorted loudspeaker signal!

Classical Front-end Architectures – Cellphone 2005 - 2010. Why RX NS? Why TX NS?

Single Channel Noise Suppression. The basic single channel noise suppressor is an extremely successful signal processing invention by Manfred Schroeder in the 1960s. Musical tones – is it a (solved) problem? How do we evaluate and improve quality? How about convergence rate?

Background to Single Channel Noise Suppressors. Block processing; frequency domain model; linear time-varying filter; Wiener filter. [Block diagram: speech + noise → NS → “enhanced” speech]
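The slide's equations did not survive extraction. The standard textbook formulation matching its labels (block processing, frequency domain model, linear time-varying filter, Wiener filter) is sketched below; the symbols (block index m, frequency bin k, power spectra Φ) are assumed notation, not necessarily the author's. The last equality assumes speech and noise are uncorrelated, so that Φ_yy = Φ_ss + Φ_vv.

$$
\begin{aligned}
y(n) &= s(n) + v(n) && \text{noisy input: speech } s \text{ plus noise } v \\
Y(k,m) &= S(k,m) + V(k,m) && \text{frequency domain model, bin } k \text{, block } m \\
\hat{S}(k,m) &= G(k,m)\,Y(k,m) && \text{linear time-varying filter} \\
G(k,m) &= \frac{\Phi_{ss}(k,m)}{\Phi_{ss}(k,m) + \Phi_{vv}(k,m)} = 1 - \frac{\Phi_{vv}(k,m)}{\Phi_{yy}(k,m)} && \text{Wiener filter}
\end{aligned}
$$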

Background to Single Channel Noise Suppressors. Estimation of spectra is often done recursively, combined with frequency smoothing; the noise spectrum is updated when speech is “not” present.
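A minimal sketch of such a suppressor, assuming a precomputed STFT, first-order recursive smoothing with a hypothetical coefficient `alpha`, 3-tap frequency smoothing, and an externally supplied speech-presence flag (the slide does not say how speech presence is detected). The gain is the Wiener form G = 1 - Φ_vv/Φ_yy; the gain floor of 0.25 (about -12 dB) is a hypothetical choice echoing the 12 dB of suppression discussed on the PESQ slides. Function names are illustrative.

```python
import numpy as np


def smooth_across_frequency(power: np.ndarray) -> np.ndarray:
    """3-tap [0.25, 0.5, 0.25] smoothing across bins (edge values repeated)."""
    padded = np.pad(power, 1, mode="edge")
    return 0.25 * padded[:-2] + 0.5 * padded[1:-1] + 0.25 * padded[2:]


def wiener_suppress(Y, alpha=0.9, speech_present=None, gain_floor=0.25):
    """Wiener-type suppression of an STFT Y with shape (frames, bins).

    phi_yy (noisy power) is updated every frame; phi_vv (noise power) only in
    frames where speech_present is False (or always, if no flags are given).
    """
    Y = np.asarray(Y, dtype=complex)
    p0 = smooth_across_frequency(np.abs(Y[0]) ** 2)
    phi_yy = p0.copy()
    phi_vv = p0.copy()          # assumption: the first frame is noise only
    S_hat = np.empty_like(Y)
    for m in range(Y.shape[0]):
        p = smooth_across_frequency(np.abs(Y[m]) ** 2)
        phi_yy = alpha * phi_yy + (1.0 - alpha) * p
        if speech_present is None or not speech_present[m]:
            phi_vv = alpha * phi_vv + (1.0 - alpha) * p
        G = 1.0 - phi_vv / np.maximum(phi_yy, 1e-12)
        S_hat[m] = np.clip(G, gain_floor, 1.0) * Y[m]
    return S_hat
```

On noise-only input the two estimates coincide, so the gain sits at the floor; once frames are flagged as speech the noise estimate freezes while Φ_yy rises, and the gain opens toward 1.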

Musical Tones – Is it a (Solved) Problem? Audio examples: original (“Sally Sievers’ reel, June-Sept. 1964” by Manfred Schroeder and Mohan Sondhi at Bell Labs); original + noise (iSNR ~ 6 dB); Schroeder (1960s); “generic spectral subtraction” (Boll, 1979); IS-127 (1995). It is “a problem of last century”, now only a constraint in the design: controlling the variance of the suppression gains. Any NS algorithm should be constrained not to produce musical tones, and the constraint must only have a small impact on voice quality.
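One common way to control gain variance, shown here as a generic illustration rather than any of the algorithms in the audio examples, is to smooth the noisy power over time and apply a gain floor. The sketch feeds pure complex Gaussian noise (known noise PSD of 2) through a Wiener-type gain 1 - noise PSD / smoothed noisy power, with and without these measures; all parameters are hypothetical.

```python
import numpy as np


def suppression_gains(noisy_power, noise_psd, alpha, floor):
    """Gains max(1 - noise_psd / smoothed power, floor), one row per frame.

    alpha = 0 uses the raw per-frame power (no temporal smoothing).
    """
    gains = np.empty_like(noisy_power)
    smoothed = noisy_power[0].copy()
    for m, p in enumerate(noisy_power):
        smoothed = alpha * smoothed + (1.0 - alpha) * p
        gains[m] = np.maximum(1.0 - noise_psd / smoothed, floor)
    return gains


rng = np.random.default_rng(0)
frames, bins = 500, 129
noise = rng.normal(size=(frames, bins)) + 1j * rng.normal(size=(frames, bins))
power = np.abs(noise) ** 2   # noise only: |Y|^2 is exponentially distributed, mean 2
noise_psd = 2.0              # true noise PSD for unit-variance real and imaginary parts

raw = suppression_gains(power, noise_psd, alpha=0.0, floor=0.0)
tamed = suppression_gains(power, noise_psd, alpha=0.9, floor=0.25)
print(raw.var() > tamed.var())  # True
```

The raw gains jump randomly between 0 and nearly 1 in isolated bins, which is what produces musical tones; the smoothed, floored gains stay close to a constant attenuation.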

Quality Metrics. Most importantly: listen! SNR: total, segmental, during speech. Distortion metrics: ISD (Itakura-Saito distance). ITU-T P.862: PESQ/MOS-LQO.
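The SNR variants differ only in how energy is pooled. A minimal sketch, assuming time-aligned clean and processed signals; the 256-sample frame, the [-10, 35] dB per-frame clamp (a common convention, not from the slide) and the optional per-frame speech mask for "during speech" are hypothetical choices.

```python
import numpy as np

EPS = 1e-12


def snr_db(clean: np.ndarray, processed: np.ndarray) -> float:
    """Total SNR: clean-signal energy over error energy for the whole file."""
    err = processed - clean
    return float(10.0 * np.log10((np.sum(clean ** 2) + EPS) / (np.sum(err ** 2) + EPS)))


def segmental_snr_db(clean, processed, frame_len=256, speech_frames=None,
                     lo_db=-10.0, hi_db=35.0):
    """Mean of per-frame SNRs, each clamped to [lo_db, hi_db].

    speech_frames: optional boolean mask with one entry per frame; if given,
    only those frames are averaged (SNR "during speech").
    """
    n_frames = len(clean) // frame_len
    values = []
    for i in range(n_frames):
        if speech_frames is not None and not speech_frames[i]:
            continue
        seg = slice(i * frame_len, (i + 1) * frame_len)
        s = clean[seg]
        e = processed[seg] - s
        v = 10.0 * np.log10((np.sum(s ** 2) + EPS) / (np.sum(e ** 2) + EPS))
        values.append(min(max(v, lo_db), hi_db))
    return float(np.mean(values)) if values else float("nan")
```

Total SNR is dominated by loud passages, while the segmental version weights every frame equally, which is why both are worth reporting.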

Quality Metric – P.862 (PESQ/MOS-LQO). MOS-LQO: MOS Listening Quality Objective. [Figure: Alg-1/2 – Wiener methods with 12 dB noise suppression] What can the best noise suppressor achieve?

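Quality Metric – “My Rule of Thumb”. The ideal MOS (PESQ) performance bound is given by shifting the unprocessed PESQ curve (PESQ versus input SNR) to the left by the amount of suppression, since removing X dB of noise without distorting the speech is equivalent to an X dB higher input SNR. Example: for 12 dB suppression, the curve is shifted 12 dB to the left.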
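The rule of thumb amounts to reading the unprocessed curve at a higher input SNR. A minimal sketch, assuming the unprocessed PESQ scores have already been measured on an increasing grid of input SNRs (no data is reproduced here); `ideal_mos_bound` is a hypothetical helper.

```python
import numpy as np


def ideal_mos_bound(input_snr_db, mos_unprocessed, suppression_db):
    """Rule-of-thumb MOS (PESQ) bound for a suppressor giving `suppression_db`.

    input_snr_db: increasing input SNRs at which the unprocessed signal was scored.
    mos_unprocessed: PESQ/MOS-LQO of the unprocessed (noisy) signal at those SNRs.
    Returns the unprocessed curve shifted left by `suppression_db`, i.e. the
    unprocessed score read off at input SNR + suppression. Beyond the last
    measured SNR, np.interp holds the last value (no extrapolation).
    """
    snr = np.asarray(input_snr_db, dtype=float)
    mos = np.asarray(mos_unprocessed, dtype=float)
    return np.interp(snr + suppression_db, snr, mos)
```

Plotting a real suppressor's measured MOS-LQO next to this bound shows how much of the theoretical improvement its speech distortion gives away.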

Convergence Rate. An important performance criterion under non-stationary noise conditions and frame loss. Main objective: maximize convergence rate while maintaining speech quality.
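For recursive spectrum estimators, convergence speed is set by the smoothing coefficient. A minimal sketch, assuming a first-order recursion Φ(m) = αΦ(m-1) + (1-α)P and a noise level that steps abruptly by 20 dB; `frames_to_converge`, the 1 dB tolerance and the two α values are hypothetical.

```python
import math


def frames_to_converge(p_old, p_new, alpha, tol_db=1.0, max_frames=100_000):
    """Frames until a first-order recursive power estimate is within tol_db of p_new.

    The estimate starts at p_old (the converged old noise level) and is updated as
    est = alpha * est + (1 - alpha) * p_new every frame after the step.
    Returns None if it has not converged within max_frames.
    """
    est = p_old
    for m in range(1, max_frames + 1):
        est = alpha * est + (1.0 - alpha) * p_new
        if abs(10.0 * math.log10(est / p_new)) <= tol_db:
            return m
    return None


# 20 dB noise step up and down, for a slower and a faster smoothing constant.
for alpha in (0.98, 0.9):
    up = frames_to_converge(1.0, 100.0, alpha)
    down = frames_to_converge(100.0, 1.0, alpha)
    print(alpha, up, down)
```

With these settings the 20 dB rise is tracked within 1 dB after 78 frames (α = 0.98) or 15 frames (α = 0.9), while the 20 dB drop takes 295 and 57 frames: in dB terms a linear-domain recursion recovers much more slowly from a fall in noise level than from a rise.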

Convergence Rate – A Useful Test. [Figure panels: (a) input sequence; (b) IS-127; (c) Wiener based; (d) a spectral subtraction m-script retrieved from the internet]

Convergence Rate and MOS-LQO. [Figure panels: (a) “Normal”; (b) “Fast”; (c) MOS-LQO]

Current Applications and Drivers of NS Technology. Where is NS going in industry now? Beyond “12 dB” of suppression, toward multi-microphone solutions: two- or more-channel suppressors and linear beamforming. Applications: mobile phones (a few two-microphone models have reached the market) and Bluetooth headsets, a great "new" application for signal processing (Ericsson BT headset 2000).

Background to Linear Beamforming. N: number of microphones. Broadside linear beamforming (e.g. delay-sum): directional gain 10log(N), White Noise Gain (WNG) > 0, practical size “large” (~30 cm). Endfire differential beamforming: directional gain 20log(N), WNG < 0, practical size “small” (1.5–5 cm). Differential beamformers are more suitable for small form factors.
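The sign of the WNG for the two array types can be checked numerically from WNG = |wᴴd|² / (wᴴw), where w are the beamformer weights and d is the look-direction steering vector. This sketch assumes a speed of sound of 343 m/s, a two-microphone first-order cardioid built as X1 - e^{-jωτ}X2 with τ equal to the acoustic travel time across the spacing, and hypothetical frequencies and spacings; the helper names are illustrative.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed, room temperature)


def wng_db(weights, steering):
    """White noise gain |w^H d|^2 / (w^H w) in dB (output y = w^H x)."""
    w = np.asarray(weights, dtype=complex)
    d = np.asarray(steering, dtype=complex)
    return float(10.0 * np.log10(np.abs(np.vdot(w, d)) ** 2 / np.real(np.vdot(w, w))))


def delay_sum_broadside_wng_db(n_mics: int) -> float:
    """Delay-and-sum steered broadside: the plane wave reaches all mics in phase."""
    return wng_db(np.ones(n_mics) / n_mics, np.ones(n_mics))


def first_order_cardioid_wng_db(freq_hz: float, spacing_m: float) -> float:
    """Endfire first-order differential (cardioid) pair, front mic first."""
    omega = 2.0 * np.pi * freq_hz
    tau = spacing_m / C                               # internal delay -> rear null
    w = np.array([1.0, -np.exp(1j * omega * tau)])    # w^H x = X1 - e^{-j omega tau} X2
    d = np.array([1.0, np.exp(-1j * omega * spacing_m / C)])  # wave from the front
    return wng_db(w, d)


print(round(delay_sum_broadside_wng_db(4), 2))             # 6.02, i.e. 10*log10(4) > 0
print(first_order_cardioid_wng_db(1000.0, 0.015) < 0.0)    # True: sensor noise is amplified
```

For the cardioid pair the WNG works out to 2·sin²(ωd/c), so it drops further below 0 dB at low frequencies and small spacings, which is the sensor-noise cost of small differential arrays.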

Background to Linear Beamforming. What do we gain? Less reverberation (increased intelligibility), less (environmental) noise, no (or low) distortion on axis, and possible interference rejection by spatial zero(s). Some issues: performance is given by the critical distance! Increase in sensor noise (WNG, differential beamforming).

Beamforming: Critical Distance. The critical distance (reverberation radius) is the distance at which the reverberant-to-direct path energy ratio is 0 dB. DI = Directivity Index: the gain in direct-to-reverberant energy over an omnidirectional microphone. Order: the order of finite differences used (1st: 2 mics, 2nd: 3 mics, etc.).

Order   DI [dB]   Critical distance (relative to omni)
0       0         1.0
1       6         2.0
2       9.5       3.0
3       12        4.0
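The table follows from two standard relations: the maximum directivity factor of an order-n differential array is (n+1)², so its maximum DI is 20·log10(n+1) dB, and the critical distance scales with the square root of the directivity factor, i.e. by 10^(DI/20) relative to an omni. A minimal sketch with hypothetical helper names:

```python
import math


def max_di_db(order: int) -> float:
    """Maximum DI of an order-n differential array: directivity factor (n+1)^2."""
    return 20.0 * math.log10(order + 1)


def critical_distance_ratio(di_db: float) -> float:
    """Critical distance relative to an omni mic: scales as sqrt(Q) = 10^(DI/20)."""
    return 10.0 ** (di_db / 20.0)


for order in range(4):
    di = max_di_db(order)
    print(order, round(di, 1), round(critical_distance_ratio(di), 1))
# 0 0.0 1.0
# 1 6.0 2.0
# 2 9.5 3.0
# 3 12.0 4.0
```

Each added order buys a smaller DI increment in dB but a constant step in critical distance.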

First-Order Differential Beamforming

Classical First-Order Beamformer Responses: cardioid, hypercardioid, dipole.
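The three classical responses are members of the first-order family E(θ) = α + (1-α)·cos θ. A minimal sketch of their directivity index and null angle, using the standard diffuse-field result Q = 1 / (α² + (1-α)²/3); the dictionary of α values and the helper names are illustrative.

```python
import math

# alpha for E(theta) = alpha + (1 - alpha) * cos(theta)
PATTERNS = {"dipole": 0.0, "cardioid": 0.5, "hypercardioid": 0.25}


def first_order_response(alpha: float, theta: float) -> float:
    """Far-field response of a first-order differential beamformer."""
    return alpha + (1.0 - alpha) * math.cos(theta)


def first_order_di_db(alpha: float) -> float:
    """DI = 10*log10(Q), with Q = 1 / (alpha^2 + (1 - alpha)^2 / 3)."""
    return 10.0 * math.log10(1.0 / (alpha ** 2 + (1.0 - alpha) ** 2 / 3.0))


def null_angle_deg(alpha: float):
    """Angle where E(theta) = 0, i.e. cos(theta) = -alpha / (1 - alpha); None if none."""
    c = -alpha / (1.0 - alpha)
    return math.degrees(math.acos(c)) if abs(c) <= 1.0 else None


for name, alpha in PATTERNS.items():
    print(name, round(first_order_di_db(alpha), 2), round(null_angle_deg(alpha), 1))
# dipole 4.77 90.0
# cardioid 4.77 180.0
# hypercardioid 6.02 109.5
```

The hypercardioid (α = 0.25) minimizes α² + (1-α)²/3 and therefore attains the 6 dB first-order entry in the critical distance table.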

Beamforming Demo: DEWIND processing
