Speech Enhancement Summer 2009

1 Speech Enhancement Summer 2009
Pham Van Tuan, Electronic & Telecommunication Engineering, Danang University of Technology

2 ACKNOWLEDGMENT
1. Signal Processing & Speech Communication Institute, Graz University of Technology, Austria: Speech Communication 1, 2
2. Dept. E.E./ESAT, K.U.Leuven: Noise Reduction, Marc Moonen/Ann Spriet
3. Peter Vary, Digital Speech Transmission, 2008
4. Tuan V. Pham, Wavelet Analysis For Robust Speech Processing and Applications, 2008

3 Introduction
Aims:
• Improve the intelligibility of speech to human listeners.
• Improve the quality of speech so that it is more acceptable to human listeners.
• Modify the speech so that automatic speech or speaker recognition systems perform better.
• Modify the speech so that it can be encoded more effectively for storage or transmission.
Noise Types:
• Additive acoustic noise
• Acoustic reverberation
• Convolutive channel effects
• Electrical interference
• Codec distortion

4 General Scheme
• The signal is first transformed into another domain to obtain a better representation of the speech signal.
• The noise level is estimated by a noise estimator.
• The noise component is removed from the noisy speech signal by a gain function.
• The gain function is designed from one of several linear or non-linear estimators.

5 Noise Estimation
Noise estimation is the most difficult part of a noise reduction algorithm, especially for non-stationary and non-white noise (whose characteristics change over time and across frequency bands).
Single-channel noise reduction methods can be divided into:
• Methods exploiting the periodicity of voiced speech.
• Auditory-model-based systems.
• Optimal linear estimators.
• Statistical-model-based systems using optimal non-linear estimators.
Because speech and noise overlap spectrally, the denoised speech obtained from single-channel methods exhibits more speech distortion. However, single-channel methods are low in cost and small in size.

6 Additive Noise Model
Microphone signal is $y[k] = s[k] + n[k]$, where $s[k]$ is the desired signal contribution and $n[k]$ is the noise contribution.
Goal: estimate s[k] based on y[k].
Applications: speech enhancement (SE) in conferencing, hands-free telephony, hearing aids, digital audio restoration, speech recognition, and speech-based technology.
Will consider speech applications: s[k] = speech signal.
The noise can be stationary or non-stationary, narrowband or broadband. Interfering speakers are also considered.
Figure: y[k], the sum of the desired signal contribution and the noise contribution from the noise signal(s), is processed by a block marked "?" to produce the desired signal estimate.
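As a minimal sketch of the additive model, the Python snippet below builds y[k] = s[k] + n[k] at a chosen signal-to-noise ratio. The function names, the 200 Hz tone standing in for speech, and the white Gaussian noise are illustrative assumptions, not material from the slides.

```python
import math
import random


def mix_at_snr(speech, noise, target_snr_db):
    """Return y[k] = s[k] + g * n[k], with g scaled to reach the target SNR (dB)."""
    p_s = sum(x * x for x in speech) / len(speech)
    p_n = sum(x * x for x in noise) / len(noise)
    g = math.sqrt(p_s / (p_n * 10 ** (target_snr_db / 10)))
    return [s + g * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """SNR in dB, treating (noisy - speech) as the noise contribution."""
    p_s = sum(x * x for x in speech)
    p_e = sum((y - s) ** 2 for s, y in zip(speech, noisy))
    return 10 * math.log10(p_s / p_e)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
fs = 8000               # assumed sampling rate in Hz
s = [math.sin(2 * math.pi * 200 * k / fs) for k in range(fs)]  # stand-in for speech
n = [rng.gauss(0.0, 1.0) for _ in range(fs)]                   # white Gaussian noise
y = mix_at_snr(s, n, 10.0)
```

An enhancement algorithm only sees y; s and n are available here because the mixture is synthetic, which is what makes the SNR measurable.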

7 Additive Noise Model
Strictly speaking, the estimation of statistical quantities via time averaging is only admissible when the signal is stationary and ergodic.
The signal is therefore chopped into `frames' (e.g. msec). For each frame i, a frequency-domain representation is $Y_i(\omega) = S_i(\omega) + N_i(\omega)$, where the spectral components are short-time spectra of the time-domain signal frames (obtained by windowing with a window w).
However, speech is an on/off (time-varying) signal, so some frames contain speech + noise and others contain noise only. A speech detection algorithm, or Voice Activity Detection (VAD), is needed to distinguish between these two types of frames (based on statistical features).
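The framing, windowing and frame classification steps can be sketched as follows. This is only an illustration under stated assumptions: the frame length (64 samples), hop (32 samples), periodic Hann window, naive DFT and fixed energy threshold are all hypothetical choices, and a practical VAD would rely on statistical features rather than a fixed threshold.

```python
import cmath
import math


def frames(signal, frame_len, hop):
    """Split a signal into overlapping frames; an incomplete tail is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def dft(x):
    """Naive O(N^2) DFT, adequate for short frames in a sketch."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * m * k / n) for k in range(n))
            for m in range(n)]


def short_time_spectra(signal, frame_len=64, hop=32):
    """Windowed DFT of every frame: one spectrum Y_i per frame i."""
    w = hann(frame_len)
    return [dft([wk * xk for wk, xk in zip(w, f)])
            for f in frames(signal, frame_len, hop)]


def energy_vad(signal, frame_len=64, hop=32, threshold=0.01):
    """Flag a frame as active when its mean energy exceeds a fixed threshold."""
    return [sum(x * x for x in f) / frame_len > threshold
            for f in frames(signal, frame_len, hop)]


fs = 8000  # assumed sampling rate in Hz
# 256 silent samples followed by 256 samples of a 1 kHz tone
signal = [0.0] * 256 + [math.sin(2 * math.pi * 1000 * k / fs) for k in range(256, 512)]
spectra = short_time_spectra(signal)
vad = energy_vad(signal)
# strongest bin of the last frame, searched over the non-negative frequencies
peak_bin = max(range(33), key=lambda m: abs(spectra[-1][m]))
```

With these settings the 1 kHz tone falls exactly on DFT bin 8 (1000 / 8000 × 64), so the peak search gives a quick sanity check of the transform.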

8 SE in DFT domain

9 Observation
Magnitude-squared DFT coefficients of a noisy, voiced speech sound, and the estimated noise power spectral density (PSD) of the noisy speech.

10 Estimation
Definition: $\hat{\mu}(\omega)$ = average amplitude of noise spectrum.
Assumption: noise characteristics change slowly, hence estimate $\hat{\mu}(\omega)$ by (long-time) averaging over M noise-only frames.
Estimate the clean speech spectrum $S_i(\omega)$ by applying a gain function $G_i(\omega)$, computed from the corrupted speech spectrum $Y_i(\omega)$ and the estimated $\hat{\mu}(\omega)$: $\hat{S}_i(\omega) = G_i(\omega)\, Y_i(\omega)$.
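A minimal sketch of these two steps, assuming the magnitude spectra and a per-frame speech/noise decision are already available. The function names and the toy two-bin data are hypothetical.

```python
def estimate_noise_magnitude(mag_frames, is_speech):
    """Average |Y_i(w)| per frequency bin over the frames flagged as noise-only."""
    noise_frames = [f for f, speech in zip(mag_frames, is_speech) if not speech]
    if not noise_frames:
        raise ValueError("need at least one noise-only frame")
    m = len(noise_frames)
    return [sum(column) / m for column in zip(*noise_frames)]


def apply_gain(spectrum, gain):
    """Clean-speech estimate per bin: gain times noisy spectrum."""
    return [g * y for g, y in zip(gain, spectrum)]


# toy data: three frames with two bins each; the last frame contains speech
mag_frames = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0]]
is_speech = [False, False, True]
noise_mag = estimate_noise_magnitude(mag_frames, is_speech)

# a real-valued gain scales the magnitude and leaves the noisy phase untouched
s_hat = apply_gain([2 + 2j, 4j], [0.5, 0.0])
```

Because the gain is real, the phase of the estimate is the phase of the noisy observation; only the magnitude is modified.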

11 Gain functions
• Magnitude Subtraction
• Spectral Subtraction
• Wiener Estimation
• Maximum Likelihood
• Non-linear Estimation
• Ephraim-Malah Suppression Rule
= most frequently used in practice

12 Spectral Subtraction: Magnitude Subtraction
Signal model:
Estimation of clean speech spectrum:
PS: half-wave rectification
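For reference, a common textbook form of magnitude subtraction can be written with the symbols used in this deck. The noise-magnitude symbol $\hat{\mu}(\omega)$ and the phase symbol $\theta_{y,i}(\omega)$ are assumptions made for this sketch. The second line is the rule before rectification, and the third line is the gain after half-wave rectification.

```latex
\begin{align}
Y_i(\omega) &= S_i(\omega) + N_i(\omega) = |Y_i(\omega)|\, e^{j\theta_{y,i}(\omega)} \\
\hat{S}_i(\omega) &= \bigl[\,|Y_i(\omega)| - \hat{\mu}(\omega)\bigr]\, e^{j\theta_{y,i}(\omega)}
  = \left(1 - \frac{\hat{\mu}(\omega)}{|Y_i(\omega)|}\right) Y_i(\omega) \\
G_i(\omega) &= \max\!\left(0,\; 1 - \frac{\hat{\mu}(\omega)}{|Y_i(\omega)|}\right)
\end{align}
```

The noisy phase is reused unchanged; only the magnitude is reduced by the estimated noise level.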

13 Spectral Subtraction: Power Spectral Subtraction
Signal model:
Estimation of clean speech spectrum:
PS: half-wave rectification
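A common textbook form of power spectral subtraction, again written with the assumed symbol $\hat{\mu}(\omega)^2$ for the estimated noise power, is sketched below. The first line holds when speech and noise are uncorrelated, and the last line includes the half-wave rectification.

```latex
\begin{align}
E\{|Y_i(\omega)|^2\} &= E\{|S_i(\omega)|^2\} + E\{|N_i(\omega)|^2\} \\
|\hat{S}_i(\omega)|^2 &= |Y_i(\omega)|^2 - \hat{\mu}(\omega)^2 \\
G_i(\omega) &= \sqrt{\max\!\left(0,\; 1 - \frac{\hat{\mu}(\omega)^2}{|Y_i(\omega)|^2}\right)}
\end{align}
```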

14 Wiener Filter: Wiener Estimation
Goal: find the linear filter $G_i(\omega)$ such that the mean squared error (MSE) is minimized.
Solution (how?): a ratio whose numerator is the cross-correlation in the i-th frame and whose denominator is the auto-correlation in the i-th frame.
Assume speech s[k] and noise n[k] are uncorrelated, then...
PS: half-wave rectification (as done so far)
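One way to answer the "how?" is the standard derivation sketched below. The spectral-density symbols $P_{sy,i}$, $P_{yy,i}$, $P_{ss,i}$ and $P_{nn,i}$ are notation assumed for this sketch, and the final approximation (replacing $P_{yy,i}$ by $|Y_i(\omega)|^2$ and $P_{nn,i}$ by the estimated noise power $\hat{\mu}(\omega)^2$) is a common practical choice rather than something confirmed by the slides.

```latex
\begin{align}
G_i(\omega) &= \arg\min_{G}\; E\bigl\{\,|S_i(\omega) - G\,Y_i(\omega)|^2\bigr\}
  = \frac{P_{sy,i}(\omega)}{P_{yy,i}(\omega)} \\
P_{sy,i}(\omega) &= E\{S_i(\omega)\,Y_i^{*}(\omega)\}, \qquad
P_{yy,i}(\omega) = E\{|Y_i(\omega)|^2\} \\
\text{uncorrelated } s, n:\quad
P_{sy,i} &= P_{ss,i}, \qquad P_{yy,i} = P_{ss,i} + P_{nn,i} \\
G_i(\omega) &= \frac{P_{ss,i}(\omega)}{P_{ss,i}(\omega) + P_{nn,i}(\omega)}
  = 1 - \frac{P_{nn,i}(\omega)}{P_{yy,i}(\omega)}
  \approx \max\!\left(0,\; 1 - \frac{\hat{\mu}(\omega)^2}{|Y_i(\omega)|^2}\right)
\end{align}
```

The first line follows from setting the derivative of the MSE with respect to $G^{*}$ to zero, which gives $E\{(S_i - G\,Y_i)\,Y_i^{*}\} = 0$.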

15 Generalized Formula
Generalized magnitude-squared spectral gain function.
Practical heuristic form of the spectral subtraction rule:
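One commonly used parametric family, G = max(0, 1 - k (noise / |Y|)^a)^b, reproduces the three rules discussed so far. It is shown here as an assumption and is not necessarily the exact expression on the slide; the parameter names a, b and k are hypothetical.

```python
import math


def generalized_gain(y_mag, noise_mag, a=2.0, b=0.5, k=1.0):
    """Parametric spectral gain max(0, 1 - k * (noise/|Y|)**a) ** b for one bin.

    a=1, b=1   -> magnitude subtraction
    a=2, b=0.5 -> power spectral subtraction
    a=2, b=1   -> Wiener-type gain
    The max(0, .) implements half-wave rectification.
    """
    if y_mag <= 0.0:
        return 0.0
    return max(0.0, 1.0 - k * (noise_mag / y_mag) ** a) ** b


# one bin with |Y| = 2 and estimated noise magnitude 1
g_magnitude = generalized_gain(2.0, 1.0, a=1.0, b=1.0)
g_power = generalized_gain(2.0, 1.0, a=2.0, b=0.5)
g_wiener = generalized_gain(2.0, 1.0, a=2.0, b=1.0)

# a bin that lies below the noise estimate is fully suppressed by every rule
g_below_noise = generalized_gain(0.5, 1.0, a=2.0, b=0.5)
```

For the same bin, magnitude subtraction attenuates the most and power subtraction the least, which is the kind of difference a plot of suppression behaviour makes visible.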

16 Suppressing Behaviors

17 Interpretation
The power spectral subtraction method can be interpreted as a time-variant filter with magnitude frequency response $|G_i(\omega)| = \sqrt{\max\left(0,\; 1 - \hat{\mu}(\omega)^2 / |Y_i(\omega)|^2\right)}$:
• The short-time energy spectrum $|Y_i(\omega)|^2$ of the noisy speech signal is calculated directly.
• The noise level $\hat{\mu}(\omega)^2$ is estimated by averaging over many non-speech frames, in which the background noise is assumed to be stationary.
• Negative values resulting from spectral subtraction are replaced by zero.
This results in "musical noise": a succession of randomly spaced spectral peaks emerges in the frequency bands. The residual noise is composed of narrow-band components located at random frequencies that turn on and off randomly in each short-time frame.
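The origin of musical noise can be illustrated with a small simulation of noise-only frames. The sketch assumes that the per-bin noise power is exponentially distributed (which holds for complex Gaussian DFT coefficients); the bin count, frame count and seed are arbitrary.

```python
import random


def residual_mask(power_frame, noise_power):
    """True for bins that stay non-zero after subtraction and rectification."""
    return [p > n for p, n in zip(power_frame, noise_power)]


rng = random.Random(1)  # fixed seed so the sketch is repeatable
n_bins, n_frames = 64, 200

# noise-only frames: |N_i(w)|^2 drawn independently per bin and per frame
frames = [[rng.expovariate(1.0) for _ in range(n_bins)] for _ in range(n_frames)]

# long-time average noise power per bin
noise_power = [sum(column) / n_frames for column in zip(*frames)]

masks = [residual_mask(f, noise_power) for f in frames]
surviving_fraction = sum(sum(m) for m in masks) / (n_bins * n_frames)
```

Even though no speech is present, a sizeable fraction of bins exceeds the average in every frame, and the set of surviving bins changes from frame to frame. Those isolated, flickering peaks are what is heard as musical noise.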

18 Magnitude Subtraction

19 Solutions
• Flooring factor
• Over-subtraction factor
• SNR-dependent subtraction factor
• Averaging the estimated noise level over K frames
• Reducing the noise variance at each frequency: apply a simple recursive first-order low-pass filter (the smoothing coefficient p controls the bandwidth and time constant of the LP filter)
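The remedies listed above can be sketched in a few lines. The parameter names (alpha for over-subtraction, beta for the floor, p for smoothing), their default values and the linear SNR-to-alpha mapping are illustrative assumptions, not values taken from the slides.

```python
import math


def gain_oversub_floor(y_power, noise_power, alpha=2.0, beta=0.01):
    """Power-subtraction gain with over-subtraction (alpha) and a spectral floor (beta).

    The output power never drops below beta times the noise power, so the
    deep spectral valleys between residual peaks are filled in.
    """
    if y_power <= 0.0:
        return 0.0
    ratio = noise_power / y_power
    return math.sqrt(max(1.0 - alpha * ratio, beta * ratio))


def snr_dependent_alpha(snr_db, alpha_max=5.0, alpha_min=1.0, snr_lo=-5.0, snr_hi=20.0):
    """Hypothetical linear mapping: subtract more at low SNR, less at high SNR."""
    t = (snr_db - snr_lo) / (snr_hi - snr_lo)
    t = min(1.0, max(0.0, t))
    return alpha_max - t * (alpha_max - alpha_min)


def smooth_recursive(values, p=0.9):
    """First-order recursive low-pass: state = p * state + (1 - p) * value."""
    out, state = [], values[0]
    for v in values:
        state = p * state + (1.0 - p) * v
        out.append(state)
    return out


g_strong = gain_oversub_floor(4.0, 1.0)   # bin well above the noise estimate
g_floor = gain_oversub_floor(1.0, 1.0)    # subtraction goes negative, floor applies
alpha_mid = snr_dependent_alpha(7.5)
smoothed = smooth_recursive([0.0, 2.0] * 100)  # strongly fluctuating input
```

A larger p gives a longer time constant and a smoother estimate, at the cost of reacting more slowly when the noise level changes.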

20 Solutions
• Magnitude averaging: replace the instantaneous $Y_i(\omega)$ in the calculation of $G_i(\omega)$ by a local average over frames.
• EMSR (p7)
• Augment $G_i(\omega)$ with a soft-decision VAD: replace $G_i(\omega)$ by $P(H_1 \mid Y_i(\omega)) \cdot G_i(\omega)$, where $P(H_1 \mid Y_i(\omega))$ is the probability that speech is present, given the observation.
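Both ideas reduce to simple per-bin operations. The sketch below assumes that magnitude spectra and a per-bin speech-presence probability are already available; the function names and the symmetric averaging window are hypothetical choices.

```python
def local_average(mag_frames, i, half_width=1):
    """Per-bin average of |Y| over the frames around frame i (clipped at the ends)."""
    lo = max(0, i - half_width)
    hi = min(len(mag_frames), i + half_width + 1)
    window = mag_frames[lo:hi]
    return [sum(column) / len(window) for column in zip(*window)]


def soft_decision_gain(gain, p_speech):
    """Scale each gain by the probability that speech is present in that bin."""
    return [p * g for p, g in zip(p_speech, gain)]


mag_frames = [[1.0, 1.0], [2.0, 4.0], [3.0, 7.0]]
avg_middle = local_average(mag_frames, 1)   # uses frames 0, 1 and 2
avg_first = local_average(mag_frames, 0)    # uses frames 0 and 1 only

weighted = soft_decision_gain([0.8, 0.5], [1.0, 0.0])
```

A bin judged certainly noise-only (probability 0) is muted regardless of its instantaneous gain, while a bin judged certainly speech keeps its gain unchanged.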

21 MMSE Estimation
Ephraim-Malah Suppression Rule (EMSR), with:
• modified Bessel functions
• a dependence on the previous frame
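For reference, the Ephraim-Malah short-time spectral amplitude gain is usually written as below. The symbols $\gamma_i$ (a posteriori SNR), $\xi_i$ (a priori SNR), $\lambda_n(\omega)$ (noise power) and the smoothing constant $\alpha$ are notation assumed for this sketch; $I_0$ and $I_1$ are the modified Bessel functions of order zero and one. The last line is the decision-directed estimate of the a priori SNR, which is where the previous frame enters.

```latex
\begin{align}
G_i(\omega) &= \frac{\sqrt{\pi}}{2}\,\frac{\sqrt{v_i}}{\gamma_i}\,
  e^{-v_i/2}\Bigl[(1+v_i)\,I_0\!\left(\tfrac{v_i}{2}\right)
  + v_i\,I_1\!\left(\tfrac{v_i}{2}\right)\Bigr] \\
v_i &= \frac{\xi_i}{1+\xi_i}\,\gamma_i, \qquad
\gamma_i = \frac{|Y_i(\omega)|^2}{\lambda_n(\omega)} \\
\hat{\xi}_i &= \alpha\,\frac{|\hat{S}_{i-1}(\omega)|^2}{\lambda_n(\omega)}
  + (1-\alpha)\,\max(\gamma_i - 1,\; 0), \qquad 0 \le \alpha < 1
\end{align}
```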

22 Wavelet Denoising
Additive noise model in the wavelet domain:
• Hard thresholding
• Soft thresholding
• Shrinking
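Hard and soft thresholding of wavelet coefficients can be sketched as follows. The coefficient values are made up, and the universal threshold sigma * sqrt(2 ln N) is included only as one well-known way to pick the threshold, not as the choice made in the slides.

```python
import math


def hard_threshold(coeffs, t):
    """Keep coefficients whose magnitude exceeds t; zero the rest."""
    return [c if abs(c) > t else 0.0 for c in coeffs]


def soft_threshold(coeffs, t):
    """Zero small coefficients and shrink the remaining ones towards zero by t."""
    out = []
    for c in coeffs:
        if abs(c) > t:
            out.append((abs(c) - t) * (1.0 if c > 0 else -1.0))
        else:
            out.append(0.0)
    return out


def universal_threshold(sigma, n):
    """Universal threshold sigma * sqrt(2 * ln(n)) for n coefficients."""
    return sigma * math.sqrt(2.0 * math.log(n))


coeffs = [-3.0, -0.5, 0.2, 1.5]   # made-up detail coefficients
hard = hard_threshold(coeffs, 1.0)
soft = soft_threshold(coeffs, 1.0)
t_univ = universal_threshold(1.0, 1024)
```

Hard thresholding leaves the retained coefficients untouched, which preserves amplitude but creates a discontinuity at the threshold. Soft thresholding is continuous but biases every retained coefficient by t.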

23 Hard Thresholding

24 Soft Thresholding

25 Optimal Shrinking

26 Noise Reduction for ASR

30 Short-time Stationary Processes
Strictly speaking: the estimation of statistical quantities via time averaging is only admissible when the signal is stationary and ergodic.

31 Short-time Stationary Processes

32 Power Spectral Density

