Front-end Audio Processing: Reflections on Issues, Requirements, and Solutions Tomas Gaensler mh acoustics www.mhacoustics.com Summit NJ/Burlington VT.

Presentation transcript:

Front-end Audio Processing: Reflections on Issues, Requirements, and Solutions Tomas Gaensler mh acoustics Summit NJ/Burlington VT USA

Front-end Audio Processing
- Processing to enhance perceived and/or measured sound quality in communication and recording devices

Not So Famous Quotes (Acoustic Jewelry/Bluetooth Headset)
- Gary Elko (mh/Bell Labs colleague), at IWAENC 1995: “Acoustic Echo cancellation will not be needed in the future when people wear acoustic jewelry”
- Arno Penzias (1978 Nobel Prize laureate): “No one would want acoustic jewelry because people would think the users talking to themselves are crazy”
- I’m glad the success of Bluetooth headsets shows that both were completely wrong!

Classical Front-end Architectures – POTS
- Carbon microphone with expansion effect that reduces noise
- Large coupling loss in handset mode
- Switch loss in telephones that support speakerphone mode

Classical Front-end Architectures – Cellphone 1995

Classical Front-end Architectures – Cellphone

Cellphones and Handsfree
Common problems:
- Far-end listener does not hear near-end talker
- Near-end listener does not understand far-end talker
Why?
- Form factor: size
- Limited understanding of physics and acoustics(?)

RX/TX Levels, Coupling and Doubletalk
- Far-end → 95–100 dBSPL at loudspeaker → 85–90 dBSPL at mic
- Near-end talker → 55–70 dBSPL at mic
- Echo louder than near-end: linear AEC ERLE ≈ [value missing in source] dB
- After cancellation, Residual Echo to Near-end Ratio (RENR): RENR ≈ 0 dB
- >20 dB of residual echo suppression required
- Duplexness suffers
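The level arithmetic on this slide can be sketched as follows. This is a minimal illustration, not the author's calculation: the function name is hypothetical, and the 20 dB ERLE is an assumed value chosen only because the slide's own ERLE figure did not survive in the transcript.

```python
def residual_echo_to_near_end_db(echo_at_mic_dbspl, near_end_at_mic_dbspl, erle_db):
    """RENR in dB: echo level left after linear cancellation, relative to near-end speech.

    All three inputs are in dB; erle_db is the attenuation achieved by the linear AEC.
    """
    residual_echo_dbspl = echo_at_mic_dbspl - erle_db
    return residual_echo_dbspl - near_end_at_mic_dbspl


# Slide levels: echo up to 90 dBSPL at the mic, near-end talker up to 70 dBSPL.
# ASSUMPTION: 20 dB ERLE, picked for illustration only.
renr_db = residual_echo_to_near_end_db(90.0, 70.0, erle_db=20.0)
```

With these numbers the residual echo is as loud as the near-end talker (RENR of 0 dB), which is why further residual echo suppression is needed and why full-duplex behavior suffers.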

TX: Dynamic Range and Noise
- Echo level: 90 dBSPL → peak echo ≈ [value missing in source] dB
- No saturation of echo in TX path
- Near-end speech level: 70 dBSPL
- Actual speech-to-room-noise ratio is only about 27 dB at best
- Gain is required to get loud enough output
- Perceived noise level is ~20 dB above normal room noise level

TX: Fixed-point Processing and Quantization Noise
- Q-noise increases by 6·log2(N) dB!
- N = 64 → Q-noise increases by 36 dB
- Double-precision “required”
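The slide's rule can be checked with a few lines of Python. This is only a sketch of the arithmetic: the slide does not define N in the transcript, so it is treated here as an unspecified length parameter, and the "bits" helper relies on the usual rule of thumb of roughly 6 dB per bit.

```python
import math


def q_noise_increase_db(n):
    """Quantization-noise growth per the slide's rule: 6 * log2(N) dB."""
    return 6.0 * math.log2(n)


def extra_bits_needed(n):
    """Extra word length that offsets the growth, assuming ~6 dB per bit."""
    return math.ceil(q_noise_increase_db(n) / 6.0)


increase_for_64 = q_noise_increase_db(64)
```

For N = 64 this gives 36 dB, or about 6 bits of lost precision, which is consistent with the slide's remark that double precision is effectively "required".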

RX: Dynamic Range and Distortion
- Small loudspeakers have a rather high cut-off frequency (high-pass)
- EQ is often required to get acceptable “sound” (frequency response). However, EQ means:
  - Loss of signal loudness and dynamic range
  - Increased (analog) distortion
- Many manufacturers compensate for the loss of signal level with excessive digital gain and therefore get (digital) saturation
[Figure labels: To AEC, Digital gain, Analog gain]

What Can or Should be Done?
- Minimize acoustical coupling by good physical design
- TX:
  - Use noise suppression but not excessively
  - Double-precision, block scaling, or floating-point
- RX:
  - Compression instead of fixed gain
  - 10% or less loudspeaker/driver THD is desired
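To illustrate "compression instead of fixed gain" on the RX path, here is a minimal sketch of a static hard-knee compressor working on levels in dB relative to digital full scale. The threshold, ratio, and make-up gain are illustrative assumptions, not values from the talk, and a real compressor would also need attack and release smoothing.

```python
def compressor_gain_db(level_dbfs, threshold_dbfs=-12.0, ratio=4.0):
    """Static gain (dB) of a hard-knee compressor; 0 dB below the threshold."""
    if level_dbfs <= threshold_dbfs:
        return 0.0
    # Above the threshold the output rises by only 1/ratio dB per input dB.
    return (threshold_dbfs - level_dbfs) * (1.0 - 1.0 / ratio)


def output_level_dbfs(level_dbfs, makeup_db=6.0, threshold_dbfs=-12.0, ratio=4.0):
    """Output level after compression plus make-up gain."""
    return level_dbfs + compressor_gain_db(level_dbfs, threshold_dbfs, ratio) + makeup_db


quiet_out = output_level_dbfs(-20.0)   # below threshold: plain +6 dB boost
loud_out = output_level_dbfs(0.0)      # full-scale input stays below 0 dBFS
```

A fixed +6 dB gain would push the full-scale input to +6 dBFS and clip it, while the compressed path boosts quiet passages by the same 6 dB and still keeps the loud input under full scale.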

What about Non-linear AEC Algorithms?
- Interesting problem proposed and worked on for many years
- Not practical in most AEC applications since:
  - Complicated model → gain and therefore saturation possibly in both TX and RX paths
  - Added complexity and system cost
  - Often slow convergence
  - Difficult to fine-tune in field
- Even when non-linear cancellation works perfectly, the user still perceives a distorted loudspeaker signal!

Classical Front-end Architectures – Cellphone
- Why RX NS?
- Why TX NS?

Single Channel Noise Suppression
- Basic single channel noise suppressor
- An extremely successful signal processing invention by Manfred Schroeder in the 1960s
- Musical tones: is it a (solved) problem?
- How do we evaluate and improve quality?
- How about convergence rate?

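Background to Single Channel Noise Suppressors
- Block processing: [equation missing in source]
- Frequency domain model: [equation missing in source]
- Linear time-varying filter: [equation missing in source]
- Wiener filter: [equation missing in source]
[Block diagram: speech + noise → NS → “enhanced” speech]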
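Since the slide's equations are not preserved, the following is a sketch of the textbook form of a Wiener-style suppression gain, not necessarily the notation or variant used in the talk. It assumes per-bin power spectra for the noisy input (Py) and the noise estimate (Pn), computes H = 1 - Pn/Py, and floors the gain at a maximum suppression depth; the 12 dB default mirrors the suppression figure quoted elsewhere in the deck. All names are hypothetical.

```python
def wiener_gain(noisy_psd, noise_psd, max_suppression_db=12.0):
    """Per-bin amplitude gain H = 1 - Pn/Py, limited to max_suppression_db of attenuation."""
    floor = 10.0 ** (-max_suppression_db / 20.0)
    gains = []
    for py, pn in zip(noisy_psd, noise_psd):
        # Guard against empty bins; treat them as noise-only.
        h = 1.0 - pn / py if py > 0.0 else floor
        gains.append(max(h, floor))
    return gains


# First bin: speech dominates (Py = 4 * Pn); second bin: noise only (Py = Pn).
example_gains = wiener_gain([4.0, 1.0], [1.0, 1.0])
```

The "enhanced" spectrum is then the noisy spectrum multiplied bin by bin with these gains, which is the linear time-varying filter the slide refers to.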

Background to Single Channel Noise Suppressors
- Estimation of spectra is often done recursively: [equation missing in source]
- Frequency smoothing: [equation missing in source], when speech is “not” present
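Recursive spectral estimation is commonly written as a first-order (exponential) average per frequency bin. The sketch below assumes that common form, P(m) = λ·P(m-1) + (1-λ)·|Y(m)|², with a hypothetical forgetting factor λ and a hypothetical speech-presence flag supplied by some detector; the talk's exact update rule is not preserved in the transcript.

```python
def recursive_psd(prev_psd, frame_power, forgetting=0.9):
    """Exponentially averaged power spectrum, one value per frequency bin."""
    return [forgetting * p + (1.0 - forgetting) * y
            for p, y in zip(prev_psd, frame_power)]


def update_noise_psd(noise_psd, frame_power, speech_present, forgetting=0.9):
    """Update the noise estimate only when speech is judged to be absent."""
    if speech_present:
        return list(noise_psd)  # freeze the estimate during speech
    return recursive_psd(noise_psd, frame_power, forgetting)


frozen = update_noise_psd([1.0], [3.0], speech_present=True, forgetting=0.5)
updated = update_noise_psd([1.0], [3.0], speech_present=False, forgetting=0.5)
```

A forgetting factor closer to 1 gives a smoother estimate (lower gain variance, fewer musical tones) at the cost of slower tracking, which is the trade-off behind the convergence-rate discussion later in the deck.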

Musical Tones – Is it a (Solved) Problem?
Examples:
- Original (“Sally Sievers’ reel, June-Sept. 1964” by Manfred Schroeder and Mohan Sondhi at Bell Labs)
- Original + noise (iSNR ~ 6 dB)
- Schroeder – 1960s
- “Generic spectral subtraction” – Boll 1979
- IS-127 – 1995
“A problem of last century”, only a constraint in design:
- Controlling variance of suppression gains
- Any NS algorithm should be constrained not to have musical tones
- Must only have a small impact on voice quality

Quality Metrics
- Most importantly: Listen!
- SNR:
  - Total
  - Segmental
  - During speech
- Distortion metrics: ISD (Itakura-Saito distance)
- ITU-T P.862: PESQ/MOS-LQO
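The difference between total and segmental SNR can be sketched as below. This is an illustrative implementation under assumptions not stated in the talk: time-aligned clean and processed signals as Python lists, non-overlapping 160-sample frames, and per-frame clamping to [-10, 35] dB, which is a common convention rather than anything taken from the slides.

```python
import math


def snr_db(clean, processed):
    """Total SNR in dB, or None when the signal or the error energy is zero."""
    signal = sum(s * s for s in clean)
    error = sum((s - p) ** 2 for s, p in zip(clean, processed))
    if signal == 0.0 or error == 0.0:
        return None
    return 10.0 * math.log10(signal / error)


def segmental_snr_db(clean, processed, frame_len=160, lo_db=-10.0, hi_db=35.0):
    """Mean of per-frame SNRs, each clamped to [lo_db, hi_db]."""
    values = []
    for start in range(0, len(clean) - frame_len + 1, frame_len):
        frame_snr = snr_db(clean[start:start + frame_len],
                           processed[start:start + frame_len])
        if frame_snr is not None:
            values.append(min(hi_db, max(lo_db, frame_snr)))
    return sum(values) / len(values) if values else None


clean_signal = [1.0] * 320
processed_signal = [0.9] * 320
total = snr_db(clean_signal, processed_signal)
segmental = segmental_snr_db(clean_signal, processed_signal)
```

An "SNR during speech" variant would restrict the frame loop to frames flagged as speech by a voice activity detector, which is not shown here.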

Quality Metric – P.862 (PESQ/MOS-LQO)
- MOS-LQO (MOS Listening Quality Objective)
- Alg-1/2: Wiener methods with 12 dB noise suppression
- What can the best noise suppressor achieve?

Quality Metric – “My Rule of Thumb”
- Ideal MOS (PESQ) performance bound is given by shifting the unprocessed PESQ-curve to the left
- Example for 12 dB suppression: 12 dB shift to the left
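The rule of thumb amounts to evaluating the unprocessed curve at a higher input SNR. The sketch below assumes the curve is MOS-LQO as a function of input SNR in dB; the toy curve is a made-up placeholder for illustration only and is not measured PESQ data.

```python
def ideal_bound(unprocessed_mos, suppression_db=12.0):
    """Return the bound curve: the unprocessed curve shifted left by suppression_db."""
    def bound(input_snr_db):
        return unprocessed_mos(input_snr_db + suppression_db)
    return bound


def toy_unprocessed_mos(input_snr_db):
    """PLACEHOLDER curve, not real data: linear in SNR, clipped to [1.0, 4.5]."""
    return min(4.5, max(1.0, 1.0 + 0.1 * input_snr_db))


bound_12db = ideal_bound(toy_unprocessed_mos, 12.0)
```

In words: a suppressor that removes 12 dB of noise can at best score, at a given input SNR, what the unprocessed signal scores at an input SNR 12 dB higher.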

Convergence Rate
- Important performance criterion:
  - Non-stationary noise conditions
  - Frame loss
- Main objective: maximize convergence rate while maintaining speech quality

Convergence Rate – A Useful Test
a) Input sequence
b) IS-127
c) Wiener based
d) A spectral subtraction m-script retrieved from the internet

Convergence Rate and MOS-LQO
a) “Normal”
b) “Fast”
c) MOS-LQO

Current Applications and Drivers of NS Technology
Where is NS going in industry now?
- Beyond “12 dB” of suppression
- Multi-microphone solutions:
  - Two- or more channel suppressors
  - Linear beamforming
Applications:
- Mobile phones (a few two-microphone models have reached the market)
- Bluetooth headsets: great "new" application for signal processing (Ericsson BT headset 2000)

Background to Linear Beamforming
N: number of microphones
Broadside linear beamforming (e.g. delay-sum):
- Directional gain: 10log(N)
- White Noise Gain (WNG) > 0
- Practical size: “large” (~30 cm)
Endfire differential beamforming:
- Directional gain: 20log(N)
- WNG < 0
- Practical size: “small” (1.5–5 cm)
→ Differential beamformers more suitable for small form-factors
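The two directional-gain rules on this slide can be compared numerically. This sketch simply evaluates the slide's formulas, assuming base-10 logarithms and treating both figures as best-case gains; the function names are hypothetical.

```python
import math


def delay_sum_gain_db(num_mics):
    """Broadside delay-sum directional gain per the slide: 10*log10(N)."""
    return 10.0 * math.log10(num_mics)


def differential_gain_db(num_mics):
    """Endfire differential directional gain per the slide: 20*log10(N)."""
    return 20.0 * math.log10(num_mics)


two_mic_delay_sum = delay_sum_gain_db(2)
two_mic_differential = differential_gain_db(2)
```

With two microphones the delay-sum array offers about 3 dB and the differential array about 6 dB, which, together with the much smaller spacing, is the case for differential designs in small devices. The price is the negative white noise gain noted on the slide.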

Background to Linear Beamforming
What do we gain?
- Less reverberation (increased intelligibility)
- Less (environmental) noise
- No (or low) distortion on axis
- Possible interference rejection by spatial zero(s)
Some issues:
- Performance is given by critical distance!
- Increase in sensor noise (WNG, differential beamforming)

Beamforming: Critical Distance
- Critical distance (reverberation radius): the distance at which the reverberant-to-direct path energy ratio is 0 dB: [equation missing in source]
- DI = Directivity Index: gain of direct-to-reverberant energy over an omnidirectional microphone
- Order: order of finite differences used (1st: 2 mics, 2nd: 3 mics, etc.)
[Table: Order vs. DI [dB]; values missing in source]
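The slide's formula is not preserved, so the sketch below uses the common Sabine-based approximation for the critical distance of an omnidirectional source, r_c ≈ 0.057·sqrt(V/T60), and scales it by 10^(DI/20) for a directional microphone. This is a standard room-acoustics approximation and an assumption here, not necessarily the expression shown in the talk; the room values in the example are arbitrary.

```python
import math


def critical_distance_m(volume_m3, t60_s, di_db=0.0):
    """Approximate critical distance in metres for a microphone with directivity index di_db."""
    omni_distance = 0.057 * math.sqrt(volume_m3 / t60_s)
    return omni_distance * 10.0 ** (di_db / 20.0)


# Arbitrary example room: 100 m^3 with a reverberation time of 0.5 s.
omni = critical_distance_m(100.0, 0.5)
first_order = critical_distance_m(100.0, 0.5, di_db=20.0 * math.log10(2.0))
```

A DI of about 6 dB doubles the critical distance, so the talker can be roughly twice as far from the microphone for the same direct-to-reverberant ratio.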

First-Order Differential Beamforming

Classical First-Order Beamformer Responses
[Figure panels: Cardioid, Hypercardioid, Dipole]
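The three named patterns belong to the standard first-order family E(θ) = α + (1 - α)·cos θ. The sketch below assumes that textbook parameterization (α = 0.5 for cardioid, 0.25 for hypercardioid, 0 for dipole) and the closed-form directivity index for a spherically isotropic noise field; the talk's own plots and notation are not preserved in the transcript.

```python
import math

# Textbook alpha values for the classical first-order patterns.
PATTERNS = {"cardioid": 0.5, "hypercardioid": 0.25, "dipole": 0.0}


def response(alpha, theta_rad):
    """First-order directional response E(theta) = alpha + (1 - alpha) * cos(theta)."""
    return alpha + (1.0 - alpha) * math.cos(theta_rad)


def directivity_index_db(alpha):
    """DI in a spherically isotropic field: 10*log10(1 / (alpha^2 + (1 - alpha)^2 / 3))."""
    return 10.0 * math.log10(1.0 / (alpha ** 2 + (1.0 - alpha) ** 2 / 3.0))


di_by_pattern = {name: directivity_index_db(a) for name, a in PATTERNS.items()}
```

The hypercardioid reaches the first-order maximum of about 6 dB, matching the 20log(N) figure for two microphones, while the cardioid and dipole both come in at about 4.8 dB. The cardioid places its null at 180 degrees and the dipole at 90 degrees.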

Beamforming Demo: DEWIND® processing