Speech Enhancement Using Spectral Subtraction

Speech Enhancement Using Spectral Subtraction
Presentation by Sevakula Rahul Kumar (5464), under the guidance of Dr. Kishore Kumar

GOALS
Study of speech enhancement using spectral subtraction.
Simulation of the algorithm's noise removal, using the magnitude-averaging and half-wave-rectification modifications to reduce the spectral error.
Study of how to implement the algorithm practically in hardware.

Assumptions:
The background noise is added to the speech acoustically or digitally (additive noise).
The background noise environment remains locally stationary, to the degree that the expected value of its spectral magnitude just prior to speech activity equals its expected value during speech activity.

Theory
Additive noise model: $x(k) = s(k) + n(k)$. Taking the Fourier transform gives $X(e^{j\omega}) = S(e^{j\omega}) + N(e^{j\omega})$, where $x(k) \leftrightarrow X(e^{j\omega})$.
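To make the model concrete, here is a minimal sketch (toy signals, not from the slides) showing that the additive model carries over to the DFT by linearity; the sampling rate, tone frequency, and noise level are all assumed.

```python
import numpy as np

fs = 8000                                    # assumed sampling rate (Hz)
k = np.arange(256)
s = 0.5 * np.sin(2 * np.pi * 440 * k / fs)   # toy "speech": a 440 Hz tone
n = 0.05 * np.random.randn(k.size)           # additive background noise
x = s + n                                    # noisy observation x(k) = s(k) + n(k)

# The DFT is linear, so X(e^jw) = S(e^jw) + N(e^jw) holds exactly.
X, S, N = np.fft.rfft(x), np.fft.rfft(s), np.fft.rfft(n)
assert np.allclose(X, S + N)
```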

Spectral Subtraction Estimator: The spectral subtraction filter $H(e^{j\omega})$ is obtained by replacing the noise spectrum $N(e^{j\omega})$ with spectra that can be readily measured. The magnitude $|N(e^{j\omega})|$ of $N(e^{j\omega})$ is replaced by its average value $\mu(e^{j\omega})$, taken during non-speech activity, and the phase of the noise is replaced by the phase of the noisy signal. These substitutions result in the spectral subtraction estimator $\hat{S}(e^{j\omega}) = H(e^{j\omega})\,X(e^{j\omega})$, with $H(e^{j\omega}) = 1 - \mu(e^{j\omega})/|X(e^{j\omega})|$.
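A minimal sketch of this estimator, assuming $\mu$ has already been measured as the average noise magnitude over non-speech frames; the function name and the small division floor are my own, not from the slides.

```python
import numpy as np

def spectral_subtract_frame(x_frame, mu):
    """Estimate the clean-speech spectrum of one windowed noisy frame.

    x_frame : time-domain samples of the noisy frame
    mu      : average noise magnitude spectrum from non-speech frames
    """
    X = np.fft.rfft(x_frame)
    mag, phase = np.abs(X), np.angle(X)
    # Spectral subtraction filter H(e^jw) = 1 - mu(e^jw) / |X(e^jw)|.
    H = 1.0 - mu / np.maximum(mag, 1e-12)    # small floor avoids division by zero
    S_hat = H * mag * np.exp(1j * phase)     # subtract the magnitude, keep the noisy phase
    return np.fft.irfft(S_hat, n=x_frame.size)
```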

Spectral Error: Modifications to reduce the auditory effects of the spectral error include:
a) Magnitude averaging: local averaging of spectral magnitudes can be used to reduce the error, replacing $|X(e^{j\omega})|$ with $\overline{|X(e^{j\omega})|} = \frac{1}{M}\sum_{i=0}^{M-1}|X_i(e^{j\omega})|$, where $M$ is the number of frames over which the averaging is done and $X_i(e^{j\omega})$ is the spectrum of the $i$-th frame. The sample mean of $|N(e^{j\omega})|$ converges to $\mu(e^{j\omega})$ as a longer average is taken.
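A sketch of the local magnitude averaging; the symmetric window around frame $i$ and the helper name are assumptions.

```python
import numpy as np

def averaged_magnitude(frames, i, M):
    """Mean |X(e^jw)| over (up to) M frames centred on frame i."""
    lo = max(0, i - M // 2)
    hi = min(len(frames), lo + M)
    mags = [np.abs(np.fft.rfft(f)) for f in frames[lo:hi]]
    return np.mean(mags, axis=0)             # use this in place of |X_i(e^jw)|
```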

b) Half-wave rectification: wherever the signal spectrum magnitude $|X(e^{j\omega})|$ is less than the average noise spectrum magnitude $\mu(e^{j\omega})$, the output is set to zero. This modification can be implemented simply by half-wave rectifying $H(e^{j\omega})$; the estimator then becomes $H_R(e^{j\omega}) = \frac{H(e^{j\omega}) + |H(e^{j\omega})|}{2}$. The advantage of half-wave rectification is that the noise floor is reduced by $\mu(e^{j\omega})$. The disadvantage exhibits itself where the sum of the noise plus speech at a frequency $\omega$ is less than $\mu(e^{j\omega})$: the speech information at that frequency is then removed as well.
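A one-line sketch of the rectified gain; applying it in place of $H$ in the earlier subtraction sketch gives the half-wave-rectified estimator.

```python
import numpy as np

def half_wave_rectify(H):
    """H_R(e^jw) = (H + |H|) / 2: negative gains (where |X| < mu) become exactly zero."""
    return (H + np.abs(H)) / 2.0
```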

c) Residual noise reduction: in the absence of speech activity the difference $N_R(e^{j\omega}) = N(e^{j\omega}) - \mu(e^{j\omega})e^{j\theta_N(e^{j\omega})}$, which shall be called the noise residual, remains after subtraction. The residual noise reduction scheme replaces the estimate in frame $i$ with the minimum magnitude taken over the adjacent frames whenever it falls below the maximum noise residual:
$|\hat{S}_i(e^{j\omega})| = \min\{|\hat{S}_j(e^{j\omega})| : j = i-1, i, i+1\}$ if $|\hat{S}_i(e^{j\omega})| < \max|N_R(e^{j\omega})|$,
where $\max|N_R(e^{j\omega})|$ is the maximum value of the noise residual measured during non-speech activity.
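A sketch of this scheme in the spirit of the slide: where a bin of frame $i$ falls below the maximum noise residual, its value is replaced by the adjacent-frame minimum. The array layout and boundary handling are my own assumptions.

```python
import numpy as np

def reduce_residual_noise(S_hat, max_residual):
    """S_hat: (num_frames, num_bins) complex spectra; max_residual: (num_bins,)."""
    out = S_hat.copy()
    mag = np.abs(S_hat)
    for i in range(1, len(S_hat) - 1):
        below = mag[i] < max_residual                    # bins dominated by the noise residual
        stacked = np.stack([mag[i - 1], mag[i], mag[i + 1]])
        j = stacked.argmin(axis=0)                       # adjacent frame with the smallest magnitude
        picked = np.choose(j, [S_hat[i - 1], S_hat[i], S_hat[i + 1]])
        out[i, below] = picked[below]
    return out
```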

d) Additional signal attenuation during non-speech activity: the energy content of $\hat{S}(e^{j\omega})$ relative to $\mu(e^{j\omega})$ provides an accurate indicator of the presence of speech activity within a given analysis frame. Empirically, it is determined that during non-speech activity this (before versus after) power ratio is down at least 12 dB, which gives a measure for detecting the absence of speech: $T = 20\log_{10}\!\left[\frac{1}{2\pi}\int_{-\pi}^{\pi}\left|\frac{\hat{S}(e^{j\omega})}{\mu(e^{j\omega})}\right| d\omega\right] < -12\ \text{dB}$. During the absence of speech activity there are at least three options prior to resynthesis: do nothing, attenuate the output by a fixed factor, or set the output to zero. The output spectral estimate, including output attenuation during non-speech activity, is therefore $\hat{S}(e^{j\omega}) = c\,X(e^{j\omega})$ when $T < -12$ dB (with $c$ a fixed attenuation factor), and the spectral subtraction estimate otherwise.
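A sketch of one plausible realization of this rule; only the -12 dB threshold comes from the slide, while the -30 dB attenuation factor and the exact form of the power-ratio measure are assumptions.

```python
import numpy as np

def attenuate_non_speech(X, S_hat, mu, threshold_db=-12.0, atten_db=-30.0):
    """X, S_hat, mu: spectra of one frame (noisy input, estimate, noise average)."""
    # Energy of the estimate relative to the average noise, in dB.
    num = np.mean(np.abs(S_hat) ** 2)
    den = np.mean(mu ** 2) + 1e-12
    ratio_db = 10.0 * np.log10(num / den + 1e-12)
    if ratio_db < threshold_db:                 # frame judged to contain no speech
        return 10.0 ** (atten_db / 20.0) * X    # keep a faint, attenuated copy of the input
    return S_hat
```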

Algorithm Implementation
Input-Output Data Buffering:
Voice Activity Detection: To obtain the noise characteristics it is essential to find the pauses in speech activity. The method used here makes the decision from a compound parameter built from several per-frame measures. In each speech frame the frame energy E, the linear prediction error normalized with respect to the signal energy, LPE, and the zero-crossing rate, ZCR, are calculated. In general, frames that contain speech have more energy than those that do not; however, distinguishing frames on energy alone fails at low SNR, where the noise energy is comparable to the signal energy. The zero-crossing rate, on the other hand, is typically much higher for noise than for speech. Using these three parameters, a compound parameter D is calculated for each frame, and the value of D is compared against thresholds to decide whether the frame contains speech activity. The threshold values have to be obtained empirically for the input signal. The frames are thus classified as speech and non-speech frames. (A sketch of the per-frame features follows.)
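The slide's exact formula for D is not reproduced in the transcript, so the product used below is purely a placeholder assumption, as are the function name and LPC order; only the three features themselves are named on the slide.

```python
import numpy as np

def frame_features(frame, lpc_order=10):
    """Per-frame energy E, zero-crossing rate ZCR, normalized LP error LPE."""
    E = np.sum(frame ** 2)
    ZCR = np.mean(np.abs(np.diff(np.sign(frame)))) / 2.0
    # Normalized linear-prediction error via Levinson-Durbin on the autocorrelation.
    r = np.correlate(frame, frame, mode="full")[frame.size - 1:][:lpc_order + 1]
    err = r[0] + 1e-12
    a = np.zeros(lpc_order + 1)
    a[0] = 1.0
    for m in range(1, lpc_order + 1):
        k = -(r[m] + np.dot(a[1:m], r[m - 1:0:-1])) / err
        a[1:m + 1] = a[1:m + 1] + k * a[m - 1::-1][:m]
        err *= (1.0 - k * k)
    LPE = err / (r[0] + 1e-12)                 # prediction error normalized by frame energy
    D = E * ZCR * LPE                          # placeholder compound parameter (assumed form)
    return E, ZCR, LPE, D
```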

Spectral Error Reduction: In this step we apply the magnitude averaging, bias estimation and removal, half-wave rectification, residual noise reduction, and additional noise suppression during non-speech activity described above.
Synthesis: The enhanced spectra are converted back into a time-domain signal.
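The synthesis step is not detailed in the transcript; a standard choice for frame-based spectral subtraction is an inverse FFT of each modified frame followed by overlap-add. The hop size and the absence of an explicit synthesis window in this sketch are assumptions.

```python
import numpy as np

def overlap_add(frames, hop):
    """Reassemble time-domain frames of shape (num_frames, frame_len) by overlap-add."""
    num_frames, frame_len = frames.shape
    out = np.zeros(hop * (num_frames - 1) + frame_len)
    for i, f in enumerate(frames):
        out[i * hop:i * hop + frame_len] += f  # each frame is an inverse FFT of the estimate
    return out
```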

Conclusions
It can be concluded that this method improves the intelligibility of noisy signals even at low SNR. However, the presence of musical noise is intolerable to the human auditory system; this necessitates the development of a better algorithm that is able to mask the musical noise.
Suggestions for future work
Speech enhancement using better algorithms, such as the signal subspace approach, the energy-constrained signal subspace approach, and the signal/noise KLT approach. A problem common to the approaches to speech enhancement developed in this project, and to speech enhancement in general, is the non-stationary behaviour of the energy of the residual noise, i.e., the non-uniformity of the residual noise from frame to frame.

Questions ???