
Background Noise
Definition: an unwanted sound, or an unwanted perturbation to a wanted signal.
Examples:
- Clicks from microphone synchronization
- Ambient noise level (background noise)
- Roadway noise
- Machinery
- Additional speakers
- Background activities: TV, radio, dog barks, etc.
Classifications:
- Stationary: does not change with time (e.g., a fan)
- Non-stationary: changes with time (e.g., a door closing, a TV)

Noise Spectra
Power measured as a function of frequency f:
- White: constant over the range of f
- Pink: decreases 3 dB per octave (power ∝ 1/f); perceived as equally loud across f
- Brown(ian): power decreases proportional to 1/f²
- Red: decreases with f (either pink or brown)
- Blue: power increases proportional to f
- Violet: power increases proportional to f²
- Gray: follows a psycho-acoustical equal-loudness curve
- Orange: bands of zero power centered on musical notes
- Green: the noise of the world; pink with a bump near 500 Hz
- Black: zero power everywhere except for spikes; power ∝ 1/f^β with β > 2
- Colored: any noise that is not white
Audio samples: http://en.wikipedia.org/wiki/Colors_of_noise
Signal Processing Information Base: http://spib.rice.edu/spib.html
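Several of these spectra can be synthesized by shaping white noise in the frequency domain. A minimal NumPy sketch (the function name and the normalization are our own illustrative choices):

```python
import numpy as np

def colored_noise(n, beta, seed=None):
    """Generate n samples of 1/f^beta noise by spectrally shaping white noise.
    beta=0 gives white, beta=1 pink (-3 dB/octave), beta=2 brown(ian)."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]                    # avoid dividing by zero at DC
    spectrum *= freqs ** (-beta / 2.0)     # amplitude ~ f^(-beta/2), power ~ 1/f^beta
    noise = np.fft.irfft(spectrum, n)
    return noise / np.max(np.abs(noise))   # normalize to [-1, 1]
```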

Applications
- ASR: prevent significant degradation in noisy environments
  Goal: minimize recognition degradation when noise is present
- Sound editing and archival: improve the intelligibility of audio recordings
  Goals: eliminate noise that is perceptible; recover audio from old wax recordings
- Mobile telephony: transmission of audio in high-noise environments
  Goal: reduce transmission requirements
- Comparing audio signals: a variety of digital signal processing applications
  Goal: normalize audio signals for ease of comparison

Signal-to-Noise Ratio (SNR)
Definition: the power ratio between a signal and the noise that interferes with it.
Standard equation in decibels:
SNR_dB = 10 log10(A_signal/A_noise)² = 20 log10(A_signal/A_noise)
For digitized speech, per frame f:
SNR_f = 10 log10( ∑_{n=0..N-1} s_f(n)² / ∑_{n=0..N-1} n_f(n)² )
where s_f is an array holding the signal samples of frame f and n_f is an array of noise samples.
Note: if s_f(n) = n_f(n) for all n, then SNR_f = 0 dB.
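The per-frame computation can be written directly from the definition; a small sketch (the function name is ours):

```python
import numpy as np

def frame_snr_db(signal_frame, noise_frame):
    """Frame SNR in dB: 10 * log10(sum of s^2 / sum of n^2)."""
    s = np.asarray(signal_frame, dtype=float)
    n = np.asarray(noise_frame, dtype=float)
    return 10.0 * np.log10(np.sum(s ** 2) / np.sum(n ** 2))
```

A signal with ten times the noise amplitude gives 20 dB, and identical signal and noise frames give 0 dB.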

Stationary Noise Suppression
Requirements:
- Low residual noise
- Low signal distortion
- Low complexity
Problem: there is a tradeoff between removing noise and distorting the signal; more noise removal also distorts the signal more.
Popular approaches:
- Time domain: moving-average filter (distorts the frequency domain)
- Frequency domain: spectral subtraction
- Time domain: Wiener filter (autoregressive)
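The moving-average filter listed above is the simplest of the three; a pure-Python sketch (the window length n is an arbitrary choice):

```python
def moving_average(x, n=5):
    """Average each sample with its n-1 predecessors; smooths high-frequency
    noise but also low-passes (and so distorts) the signal itself."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - n + 1):i + 1]   # shorter windows at the start
        out.append(sum(window) / len(window))
    return out
```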

Autoregression
Definition: an autoregressive process is one in which a value can be determined by a linear combination of previous values.
Formula: x_t = c + ∑_{i=1..P} a_i x_{t−i} + n_t
This is none other than linear prediction; the noise is the residual.
Thought: perhaps iterative linear prediction could eventually leave only noise in the residual.
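The AR coefficients can be estimated and the residual extracted in a few lines. A sketch (ignoring the constant c, and using a plain least-squares fit rather than the usual Levinson-Durbin recursion):

```python
import numpy as np

def lp_residual(x, order):
    """Fit x[t] ~ sum_i a_i * x[t-i] by least squares; return (a, residual)."""
    x = np.asarray(x, dtype=float)
    # Each row holds the previous `order` samples, most recent first.
    A = np.array([x[t - order:t][::-1] for t in range(order, len(x))])
    b = x[order:]
    a, *_ = np.linalg.lstsq(A, b, rcond=None)
    return a, b - A @ a
```

For a noiseless AR process the residual is numerically zero; for a noisy one it approximates the noise term n_t.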

Spectral Subtraction
Noisy signal: y_t = s_t + n_t, where s_t is the clean signal and n_t is additive noise.
Therefore s_t = y_t − n_t, and the estimated signal is s'_t = y_t − n'_t.
In the power spectrum: |S'(f)|² = |Y(f)|² − |N'(f)|²
so |S'(f)| = (|Y(f)|² − |N'(f)|²)^½
Generalize to: |S'(f)| = (|Y(f)|^a − |N'(f)|^a)^(1/a)
or equivalently: S'(f) = Y(f)(1 − (|N'(f)|/|Y(f)|)^a)^(1/a)
Finally, perform an inverse transform back into the time domain.
S. F. Boll, "Suppression of acoustic noise in speech using spectral subtraction," IEEE Trans. Acoustics, Speech, Signal Processing, vol. ASSP-27, Apr. 1979.
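A single-frame sketch of the generalized rule (negative magnitudes are floored at zero and the noisy phase is reused; a=2 recovers the power-spectrum form):

```python
import numpy as np

def spectral_subtract(noisy_frame, noise_mag, a=2.0):
    """Subtract an estimated noise magnitude spectrum from one frame."""
    Y = np.fft.rfft(noisy_frame)
    mag = np.maximum(np.abs(Y) ** a - noise_mag ** a, 0.0) ** (1.0 / a)
    S = mag * np.exp(1j * np.angle(Y))     # keep the noisy signal's phase
    return np.fft.irfft(S, len(noisy_frame))
```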

Spectral Subtraction Block Diagram
Note: "gain" refers to the multiplicative factor applied to each frequency bin.

Assumptions
- The noise is relatively stationary within each segment of speech
- The estimate taken from non-speech segments is a valid predictor
- Phase differences between the noise signal and the speech signal can be ignored
- The noise combines linearly with the signal
- There is no correlation between the noise and speech signals
- There is no correlation between noise in the current sample and noise in previous samples

Implementation Issues
Question: How do we estimate the noise?
Answer: Use the frequency distribution during times when no voice is present.
Question: How do we know when voice is present?
Answer: Use a Voice Activity Detection (VAD) algorithm.
Question: Even if we know the noise amplitudes, what about phase differences between the clean and noisy signals?
Answer: Since human hearing largely ignores phase differences, assume the phase of the noisy signal.
Question: Is the noise independent of the signal?
Answer: We assume that it is.
Question: Are noise distributions really stationary?
Answer: We assume that they are.

Voice Activity Detector (VAD)
Many VAD algorithms exist (next set of slides).
General approach:
- Compare the current frame energy to the current noise estimate
- Apply rules for the temporal structure of speech
General principle: it is better to misclassify noise as speech than to misclassify speech as noise.
Example: the Adaptive Multi-Rate (AMR) GSM coder: "Digital cellular telecommunications system (Phase 2+); Voice Activity Detector (VAD) for Adaptive Multi-Rate (AMR) speech traffic channels; General description (GSM 06.94 version 7.1.0 Release 1998)," 1998.
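An energy-threshold VAD in the spirit of the general approach above (the initialization length, threshold factor, and smoothing constant are illustrative choices, not taken from any standard):

```python
def simple_vad(frame_energies, init_frames=10, factor=3.0):
    """Flag frames whose energy exceeds factor * the noise estimate as speech.
    The noise estimate adapts only during non-speech, so speech frames
    cannot inflate it; borderline frames lean toward speech."""
    noise = sum(frame_energies[:init_frames]) / init_frames
    flags = []
    for e in frame_energies:
        if e > factor * noise:
            flags.append(True)                 # speech: freeze the noise estimate
        else:
            flags.append(False)
            noise = 0.9 * noise + 0.1 * e      # non-speech: adapt estimate slowly
    return flags
```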

Phase Distortions
Problem: We don't know how much of the phase in an FFT comes from noise and how much from speech.
Assumption: The algorithm assumes the phase of both is the same (that of the noisy signal).
Result: When the SNR approaches 0 dB, the audio takes on a hoarse-sounding voice.
Why? The phase assumption means that the expected noise magnitude is incorrectly calculated.
Conclusion: There is a limit to the utility of spectral subtraction when the SNR is close to zero.

Echoes
The signal is typically framed with a 50% overlap.
Rectangular windows lead to significant echoes in the noise-reduced signal.
Solution: overlapping frames by 50% using Bartlett (triangular), Hanning, Hamming, or Blackman windows reduces this effect.
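One reason 50%-overlapped tapered windows work well is that periodic Hann windows at half-frame hops sum to a constant, so windowed frames overlap-add back to the original signal. A quick check of that property:

```python
import numpy as np

def periodic_hann(n):
    return 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n) / n)

def ola_sum(frame_len, hop, n_frames=8):
    """Overlap-add copies of the window itself; interior samples should sum to 1."""
    w = periodic_hann(frame_len)
    out = np.zeros(hop * (n_frames - 1) + frame_len)
    for i in range(n_frames):
        out[i * hop:i * hop + frame_len] += w
    return out
```

With frame_len=512 and hop=256, every sample away from the two edges sums to exactly 1.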

Musical Noise
Definition: random, isolated tone bursts across frequency.
Why? Most implementations set a frequency bin's magnitude to zero if noise subtraction would cause it to become negative.
[Figure legend: green dashes: noisy signal; solid line: noise estimate; black dots: projected clean signal]

Evaluation
Advantages:
- Easy to understand and implement
Disadvantages:
- The noise estimate is not exact: when too high, portions of speech are lost; when too low, some noise remains
- When a noise bin's magnitude exceeds the noisy signal's magnitude, a negative result is produced
- The assumptions are not strictly correct: negligible at large SNR values, but significant at small SNR values

Ad Hoc Enhancements
Eliminate negative magnitudes with a spectral floor t:
  S'(f) = Y(f) · max{(1 − (|N'(f)|/|Y(f)|)^a)^(1/a), t}
  Result: this flooring is a source of musical noise.
Reduce (scale) the noise estimate with an oversubtraction factor b:
  S'(f) = Y(f) · max{(1 − b(|N'(f)|/|Y(f)|)^a)^(1/a), t}
Use different constants a, b, t in different frequency bands.
Turn to psycho-acoustical methods.
Maximum likelihood: S'(f) = Y(f) · max{½ + ½(1 − (|N'(f)|/|Y(f)|)^a)^(1/a), t}
Smooth the spectral subtraction gain over frames: G_S(p) = λ_F G_S(p−1) + (1−λ_F) G(p)
Exponentially average the noise estimate over frames:
  |W(m,p)|² = λ_N |W(m,p−1)|² + (1−λ_N)|X(m,p)|², m = 0, …, M−1
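The oversubtraction-with-floor variant reduces to a per-bin gain; a sketch (the defaults b=2 and floor t=0.05 are illustrative):

```python
import numpy as np

def oversubtract_gain(noisy_mag, noise_mag, a=2.0, b=2.0, floor=0.05):
    """Per-bin gain for S'(f) = Y(f) * max{(1 - b(|N'|/|Y|)^a)^(1/a), t}."""
    ratio = noise_mag / np.maximum(noisy_mag, 1e-12)   # guard against divide-by-zero
    gain = np.maximum(1.0 - b * ratio ** a, 0.0) ** (1.0 / a)
    return np.maximum(gain, floor)                     # spectral floor t
```

The floor keeps a little residual noise in every bin, which helps mask the isolated bursts that cause musical noise.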

Acoustic Noise Suppression
- Take advantage of the masking properties of human hearing
- Preserve only the relevant portions of the speech signal
- Don't attempt to remove all noise, only that which is audible
- Utilize the Mel or Bark scales, perhaps with overlapping filter banks
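One common Mel-scale mapping (O'Shaughnessy's 2595·log10(1 + f/700); other variants exist) can be used to space the filter banks perceptually:

```python
import math

def hz_to_mel(f_hz):
    """Map frequency in Hz to the Mel (perceptual pitch) scale."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)
```

By construction 1000 Hz maps to roughly 1000 mel, and equal Mel steps pack filters densely at low frequencies, where hearing is more selective.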

Threshold of Hearing
The limit set by the internal noise of the auditory system:
T_q(f) = 3.64(f/1000)^−0.8 − 6.5 e^(−0.6(f/1000 − 3.3)²) + 10⁻³(f/1000)⁴ (dB SPL)
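The formula (often attributed to Terhardt) is easy to evaluate directly; the term centered at 3.3 kHz places the dip of the curve near the ear's most sensitive region:

```python
import math

def hearing_threshold_spl(f_hz):
    """Approximate absolute threshold of hearing in dB SPL."""
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)
```

Sounds falling below this curve are inaudible even in silence, so a suppressor need not remove noise lying under it.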

Masking

Acoustical Effects
Characteristic Frequency (CF): the frequency that causes the maximum response at a given point on the basilar membrane.
Neurons exhibit a maximum response for about 20 ms and then decrease to a steady state, recovering a short time after the stimulus is removed.
Masking effects can be simultaneous or temporal:
- Simultaneous: one signal drowns out another occurring at the same time
- Temporal: one signal masks signals adjacent to it in time
  - Forward: masking persists after the masker is removed (5 ms–150 ms)
  - Backward: a weak signal is masked by a strong one that follows it (about 5 ms)

Non-Stationary Noise
Example: a door slamming, a clap.
Characterized by sudden, rapid changes:
- Time domain: a spike in signal energy
- Frequency domain: large amplitudes outside the normal frequency range
- Short duration in time
Possible solutions: compare the energy, correlation, and frequency content of the current frame to those of previous frames.
Example: a cocktail party (background voices).
What would likely happen in the frequency domain? How about in the time domain? Any ideas?