Speech Enhancement Summer 2009

Speech Enhancement, Summer 2009
Pham Van Tuan
Electronic & Telecommunication Engineering, Danang University of Technology

ACKNOWLEDGMENT
1. Signal Processing & Speech Communication Institute, Graz University of Technology, Austria — Speech Communication 1, 2
2. Dept. E.E./ESAT, K.U.Leuven — Noise Reduction, Marc Moonen/Ann Spriet
3. Peter Vary, Digital Speech Transmission, 2008
4. Tuan V. Pham, Wavelet Analysis for Robust Speech Processing and Applications, 2008

Introduction
Aims:
• Improve the intelligibility of speech for human listeners.
• Improve the quality of speech so that it is more acceptable to human listeners.
• Modify the speech to improve the performance of automatic speech or speaker recognition systems.
• Modify the speech so that it can be encoded more effectively for storage or transmission.
Noise types:
• Additive acoustic noise
• Acoustic reverberation
• Convolutive channel effects
• Electrical interference
• Codec distortion

General Scheme
• The signal is first transformed into another domain to obtain a better representation of the speech signal.
• The noise level is estimated by a noise-estimation algorithm.
• The noise component is removed from the noisy speech signal by a gain function.
• The gain function is designed based on different linear and non-linear estimators.

Noise Estimation
• Noise estimation is one of the most difficult parts of a noise-reduction algorithm, especially for non-stationary and non-white noise, whose characteristics change over time and across frequency bands.
• Single-channel noise-reduction methods can be divided into:
  • Methods exploiting the periodicity of voiced speech.
  • Auditory-model-based systems.
  • Optimal linear estimators.
  • Statistical-model-based systems using optimal non-linear estimators.
• Due to the spectral overlap between the speech and noise signals, the denoised speech obtained from single-channel methods exhibits more speech distortion; however, single-channel processing has low cost and small size.

Additive Noise Model
• Microphone signal: y[k] = s[k] + n[k], the sum of a desired-signal contribution and a noise contribution.
• Goal: estimate s[k] based on y[k].
• Applications: speech enhancement in conferencing, hands-free telephony, hearing aids, digital audio restoration, speech recognition, and speech-based technology.
• In the speech applications considered here, s[k] is the speech signal; the noise can be stationary, non-stationary, narrowband, or broadband, and interfering speakers may also be considered.
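The additive model above can be sketched as follows. This is a toy illustration (a 440 Hz sinusoid standing in for speech, white Gaussian noise, and an assumed 8 kHz sampling rate), not material from the slides:

```python
import math
import random

random.seed(0)

# Additive noise model: the microphone observes y[k] = s[k] + n[k]
fs = 8000  # sampling rate in Hz (assumed for this example)
s = [math.sin(2 * math.pi * 440 * k / fs) for k in range(fs)]   # "speech" stand-in
n = [random.gauss(0.0, 0.1) for _ in range(fs)]                 # additive noise
y = [sk + nk for sk, nk in zip(s, n)]                           # observed signal

# Input SNR in dB: 10 * log10( signal energy / noise energy )
snr_db = 10 * math.log10(sum(v * v for v in s) / sum(v * v for v in n))
```

With signal power 0.5 and noise variance 0.01, the input SNR lands near 17 dB; the enhancement task is to recover s[k] given only y[k].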

Additive Noise Model
• Strictly speaking, the estimation of statistical quantities via time averaging is only admissible when the signal is stationary and ergodic.
• The signal is chopped into frames (e.g. 10–20 ms); for each frame i a frequency-domain representation Yi(ω) is computed, where the spectral components are short-time spectra of the time-domain signal frames (obtained by windowing with a window w).
• However, speech is an on/off (time-varying) signal, hence some frames contain speech plus noise while others contain noise only.
• A speech-detection algorithm, or Voice Activity Detector (VAD), is needed to distinguish between these two types of frames (based on statistical features).
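The framing-and-windowing step can be sketched like this (a minimal pure-Python version; the 20 ms frame length, 50% hop, and Hann window are assumptions for illustration, not prescribed by the slides):

```python
import math

def frames(signal, frame_len, hop):
    """Split a signal into overlapping frames and apply a Hann window.
    Each returned frame is ready for a DFT to give the short-time
    spectrum Y_i(omega) of frame i."""
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
           for n in range(frame_len)]
    out = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        out.append([x * w for x, w in zip(frame, win)])
    return out

# 160-sample frames (20 ms at 8 kHz) with 50% overlap
sig = [math.sin(2 * math.pi * 440 * k / 8000) for k in range(800)]
f = frames(sig, 160, 80)
```

In a full system, each windowed frame would then be classified speech-plus-noise vs. noise-only by a VAD before spectral processing.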

SE in DFT domain

Observation
Magnitude-squared DFT coefficients of a noisy, voiced speech sound, and the estimated noise power spectral density (PSD) of the noisy speech.

Estimation
• Definition: N̄(ω) = average amplitude of the noise spectrum.
• Assumption: the noise characteristics change slowly, hence N̄(ω) can be estimated by (long-time) averaging over M noise-only frames.
• The clean speech spectrum Si(ω) is estimated by applying a gain function Gi(ω), built from the corrupted speech spectrum Yi(ω) and the estimated N̄(ω):
  Ŝi(ω) = Gi(ω) · Yi(ω)
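The two operations on this slide, averaging over M noise-only frames and applying a per-bin gain, can be sketched as follows (toy magnitude spectra, made-up numbers):

```python
# Estimate the average noise amplitude spectrum N̄(ω) over M noise-only
# frames, then apply a gain G_i(ω) per frequency bin: Ŝ_i(ω) = G_i(ω)·Y_i(ω).
def average_noise_spectrum(noise_frames_mag):
    """noise_frames_mag: list of M magnitude spectra (one list of bins per frame)."""
    m = len(noise_frames_mag)
    bins = len(noise_frames_mag[0])
    return [sum(frame[b] for frame in noise_frames_mag) / m for b in range(bins)]

def apply_gain(y_mag, gain):
    """Per-bin multiplication of the noisy magnitude spectrum by the gain."""
    return [g * y for g, y in zip(gain, y_mag)]

# Toy magnitude spectra for M = 3 noise-only frames, 4 bins each
noise_frames = [[0.2, 0.1, 0.3, 0.2],
                [0.4, 0.1, 0.1, 0.2],
                [0.3, 0.1, 0.2, 0.2]]
n_bar = average_noise_spectrum(noise_frames)   # per-bin average N̄(ω)
```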

Gain Functions (the most frequently used in practice):
• Magnitude Subtraction
• Spectral Subtraction
• Wiener Estimation
• Maximum Likelihood
• Non-linear Estimation
• Ephraim–Malah Suppression Rule

Spectral Subtraction: Magnitude Subtraction
• Signal model: Yi(ω) = Si(ω) + Ni(ω)
• Estimation of the clean speech spectrum:
  |Ŝi(ω)| = |Yi(ω)| − N̄(ω)
• PS: half-wave rectification (negative values are set to zero).

Spectral Subtraction: Power Spectral Subtraction
• Signal model: |Yi(ω)|² ≈ |Si(ω)|² + N̄(ω)²
• Estimation of the clean speech spectrum:
  |Ŝi(ω)|² = |Yi(ω)|² − N̄(ω)²
• PS: half-wave rectification (negative values are set to zero).
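Power spectral subtraction with half-wave rectification can be sketched in a few lines (toy per-bin values chosen so one bin goes negative and is floored):

```python
import math

def power_spectral_subtraction(y_mag, n_bar):
    """|Ŝ(ω)|² = max(|Y(ω)|² − N̄(ω)², 0)  (half-wave rectification),
    returning the estimated clean magnitude spectrum |Ŝ(ω)| per bin."""
    out = []
    for y, n in zip(y_mag, n_bar):
        p = y * y - n * n
        out.append(math.sqrt(p) if p > 0 else 0.0)  # negative power -> 0
    return out

# The third bin is noise-dominated and gets floored to zero
s_hat = power_spectral_subtraction([1.0, 0.5, 0.2], [0.6, 0.3, 0.4])
```

The bins that get floored to zero in random frames are exactly the source of the "musical noise" artifact discussed on the Interpretation slide.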

Wiener Filter: Wiener Estimation
• Goal: find the linear filter Gi(ω) such that the MSE between Si(ω) and Gi(ω)·Yi(ω) is minimized.
• Solution: Gi(ω) = Φsy,i(ω) / Φyy,i(ω), with Φsy,i(ω) the cross-correlation and Φyy,i(ω) the auto-correlation in the i-th frame.
• Assuming speech s[k] and noise n[k] are uncorrelated:
  Gi(ω) = Φss,i(ω) / (Φss,i(ω) + Φnn,i(ω))
• PS: half-wave rectification (as done so far).
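Under the uncorrelated-speech-and-noise assumption, the Wiener gain per frequency bin is just a ratio of power spectral densities; a minimal sketch (made-up PSD values):

```python
def wiener_gain(phi_ss, phi_nn):
    """G(ω) = Φss(ω) / (Φss(ω) + Φnn(ω)), assuming speech and noise are
    uncorrelated. The gain lies in [0, 1]: close to 1 where speech
    dominates, close to 0 where noise dominates."""
    return [s / (s + n) for s, n in zip(phi_ss, phi_nn)]

# Three bins: speech-dominated, equal-power, noise-only
g = wiener_gain([0.9, 0.1, 0.0], [0.1, 0.1, 0.2])  # ≈ [0.9, 0.5, 0.0]
```

Unlike spectral subtraction, this gain never goes negative, so no half-wave rectification is needed for the uncorrelated form.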

Generalized Formula
• Generalized magnitude-squared spectral gain function.
• Practical heuristic form of the spectral subtraction rule.
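The slide's equations are not reproduced in this transcript, so the following is a sketch of one common parameterization of the generalized rule; the over-subtraction factor α, the exponent γ, and the default values are assumptions for illustration:

```python
def generalized_gain(y_mag, n_bar, alpha=2.0, gamma=2.0):
    """Heuristic generalized spectral-subtraction gain (one common form):
    G(ω) = max(1 − α · (N̄(ω)/|Y(ω)|)^γ, 0) ** (1/γ)
    γ = 1 recovers magnitude subtraction, γ = 2 power subtraction;
    α > 1 over-subtracts to reduce residual noise."""
    gains = []
    for y, n in zip(y_mag, n_bar):
        base = 1.0 - alpha * (n / y) ** gamma
        gains.append(max(base, 0.0) ** (1.0 / gamma))
    return gains

# High-SNR bin keeps a gain near 1; low-SNR bin is fully suppressed
g = generalized_gain([1.0, 0.5], [0.2, 0.4])
```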

Suppressing Behaviors

Interpretation
• The Power Spectral Subtraction method can be interpreted as a time-variant filter with magnitude frequency response Gi(ω).
• The short-time energy spectrum |Yi(ω)|² of the noisy speech signal is calculated directly.
• The noise level N̄(ω)² is estimated by averaging over many non-speech frames, where the background noise is assumed to be stationary.
• Negative values resulting from the spectral subtraction are replaced by zero. This results in "musical noise": a succession of randomly spaced spectral peaks emerges in the frequency bands, i.e. a residual noise composed of narrow-band components located at random frequencies that turn on and off randomly in each short-time frame.

Magnitude subtraction

Solutions
• Flooring factor
• Over-subtraction factor
• SNR-dependent subtraction factor
• Averaging the estimated noise level over K frames
• Reducing the noise variance at each frequency: apply a simple recursive first-order low-pass filter (with smoothing coefficient p controlling the bandwidth and time constant of the LP filter)
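The last item, recursive first-order low-pass smoothing of the per-bin noise estimate, can be sketched as follows (the value p = 0.9 is an assumption for illustration):

```python
def smooth_noise_estimate(prev, current, p=0.9):
    """First-order recursive low-pass filter applied per frequency bin:
    N̂_i(ω) = p · N̂_{i-1}(ω) + (1 − p) · |Y_i(ω)|.
    The smoothing coefficient p sets the bandwidth and time constant."""
    return [p * a + (1.0 - p) * b for a, b in zip(prev, current)]

# A single outlier frame barely moves the smoothed estimate
est = [0.2, 0.2]
est = smooth_noise_estimate(est, [1.0, 0.2], p=0.9)   # ≈ [0.28, 0.2]
```

A larger p gives a slower, lower-variance estimate; a smaller p tracks non-stationary noise faster at the cost of more variance.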

Solutions (continued)
• Magnitude averaging: replace Yi(ω) in the calculation of Gi(ω) by a local average over frames.
• EMSR (p7).
• Augment Gi(ω) with a soft-decision VAD: Gi(ω) ← P(H1 | Yi(ω)) · Gi(ω), where P(H1 | Yi(ω)) is the instantaneous average probability that speech is present, given the observation.

MMSE Estimation
• Ephraim–Malah Suppression Rule (EMSR): an MMSE estimate of the short-time spectral amplitude. Its gain function involves modified Bessel functions, and the a-priori SNR is estimated using the previous frame.

Wavelet Denoising
• Additive noise model in the wavelet domain: the wavelet coefficients of the noisy signal are the sum of the clean-signal coefficients and the noise coefficients.
• Thresholding rules: Hard-Thresholding, Soft-Thresholding, Shrinking.
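The first two thresholding rules, detailed on the following slides, can be sketched as simple per-coefficient functions (toy coefficient values; the threshold 0.1 is an assumption for illustration):

```python
import math

def hard_threshold(w, t):
    """Hard thresholding: keep coefficients whose magnitude exceeds t,
    zero the rest."""
    return [x if abs(x) > t else 0.0 for x in w]

def soft_threshold(w, t):
    """Soft thresholding: additionally shrink surviving coefficients
    toward zero by t (sign preserved)."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in w]

coeffs = [0.05, -0.3, 1.2, -0.08]
hard = hard_threshold(coeffs, 0.1)   # small coefficients are zeroed
soft = soft_threshold(coeffs, 0.1)   # survivors are also shrunk by 0.1
```

Hard thresholding preserves large coefficients exactly but creates discontinuities at the threshold; soft thresholding is continuous but biases large coefficients, which motivates the intermediate "shrinking" rules.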

Hard Thresholding

Soft Thresholding

Optimal Shrinking

Noise Reduction for ASR

Short-time Stationary Processes
Strictly speaking, the estimation of statistical quantities via time averaging is only admissible when the signal is stationary and ergodic.

Short-time Stationary Processes

Power Spectral Density