Advanced Speech Enhancement in Noisy Environments Qiming Zhu Supervisor: Prof. John Soraghan Centre for excellence in Signal and Image Processing Dept.


2 Advanced Speech Enhancement in Noisy Environments
Qiming Zhu
Supervisor: Prof. John Soraghan
Centre for Excellence in Signal and Image Processing, Dept. of Electronic and Electrical Engineering

3 Presentation structure
Introduction
Speech Enhancement
– Improved Minima Controlled Recursive Averaging (IMCRA)
Robust Voice Activity Detection (VAD)
– 1-D Local Binary Pattern (LBP)
– 1-D LBP of energy based VAD
– Performance Evaluation
Improved IMCRA
– Performance Evaluation
Discussion & Conclusion

4 Introduction
Automatic speech recognition (ASR)
– A speech recognition system aims to create intelligent machines that can ‘hear’, ‘understand’ and ‘comply’ with speech input.
– Speech enhancement and VAD are integral parts of an ASR system.
Aim of current research
– Improve recognition system performance in babble-noise backgrounds.

5 IMCRA
[Figure: IMCRA processing]
* Israel Cohen, ‘Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging’, IEEE Transactions on Speech and Audio Processing, 2003
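The slide only names IMCRA, so the following is a minimal sketch of one piece of it: the recursive-averaging noise update, in which the smoothing factor is pulled towards 1 when speech is likely present. It deliberately omits the minima tracking and the speech presence probability estimation. The function name, argument names and the default `alpha_d` value are illustrative assumptions, not taken from the talk.

```python
def update_noise_psd(noise_psd, noisy_power, speech_prob, alpha_d=0.85):
    """One recursive-averaging step of a noise spectrum estimate.

    noise_psd   -- previous noise estimate, one value per frequency bin
    noisy_power -- squared magnitude of the current noisy frame, per bin
    speech_prob -- speech presence probability per bin, in [0, 1]
                   (assumed to be supplied by some other estimator)
    alpha_d     -- base smoothing factor (illustrative default)
    """
    updated = []
    for lam, y2, p in zip(noise_psd, noisy_power, speech_prob):
        # Time-varying smoothing: when speech is likely (p near 1) the
        # factor approaches 1, so the old noise estimate is kept.
        alpha = alpha_d + (1.0 - alpha_d) * p
        updated.append(alpha * lam + (1.0 - alpha) * y2)
    return updated
```

With `speech_prob` equal to 1 in a bin the estimate is left unchanged; with 0 it moves towards the current frame power at the rate set by `alpha_d`.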

6 IMCRA Performance: IMCRA with babble noise
– Clean signal
– Noisy signal at 0 dB
– Enhanced by IMCRA

7 1-D LBP

8 1-D LBP code
The 1-D LBP calculates the LBP code by thresholding the neighbouring samples against the centre sample.
[Figure: LBP code calculation for p=8]
* Navin Chatlani et al., ‘Local binary patterns for 1-D signal processing’, EUSIPCO 2010
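The thresholding step can be sketched as follows. This is an illustrative reading of the 1-D LBP idea, assuming that each of the p neighbours (p/2 on either side of the centre sample) contributes a 1 bit when it is greater than or equal to the centre; the function name and the bit ordering are assumptions rather than details given on the slide.

```python
def lbp_1d(signal, i, p=8):
    """Return the 1-D LBP code of signal[i] using p neighbours (p even)."""
    half = p // 2
    if i < half or i + half >= len(signal):
        raise IndexError("centre sample needs p/2 neighbours on each side")
    centre = signal[i]
    code = 0
    for r in range(half):
        left = signal[i + r - half]   # neighbours before the centre
        right = signal[i + r + 1]     # neighbours after the centre
        # Threshold each neighbour against the centre sample.
        if left >= centre:
            code |= 1 << r
        if right >= centre:
            code |= 1 << (r + half)
    return code
```

For p=8 the code lies in the range 0 to 255, so a rising ramp, a falling ramp and a flat segment each map to a different, fixed code.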

9 1-D LBP histogram
The 1-D LBP codes of the windowed data are collected into a histogram.
[Figure: Overview of the 1-D LBP procedure on a histogram]
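A minimal sketch of the histogram step, assuming the codes are computed for every sample in the window that has p/2 neighbours on each side and then counted into 2^p bins. The helper names are hypothetical, and the thresholding rule (neighbour greater than or equal to centre gives a 1 bit) is an assumption.

```python
def lbp_codes(signal, p=8):
    """1-D LBP code for every sample that has p/2 neighbours on both sides."""
    half = p // 2
    codes = []
    for i in range(half, len(signal) - half):
        # Neighbours in left-to-right order, skipping the centre sample.
        neighbours = list(signal[i - half:i]) + list(signal[i + 1:i + half + 1])
        code = 0
        for bit, value in enumerate(neighbours):
            if value >= signal[i]:
                code |= 1 << bit
        codes.append(code)
    return codes


def lbp_histogram(window, p=8):
    """Count how often each of the 2**p LBP codes occurs in one window."""
    hist = [0] * (2 ** p)
    for code in lbp_codes(window, p):
        hist[code] += 1
    return hist
```

The histogram, rather than the raw code sequence, then serves as the feature for the window.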

10 1-D LBP of energy: short-time energy and the histogram
[Figure: Speech signals and the short-time energy: a) energy of clean speech signal, b) energy of noisy speech signal, c) histogram of clean speech energy, d) histogram of noisy speech energy]
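The short-time energy on which the LBP is computed can be sketched as below. This assumes non-overlapping rectangular frames and energy defined as the sum of squared samples; the slides do not state the windowing details, so the framing, the function name and the defaults are assumptions (the 16 kHz and 5 ms defaults mirror the experimental settings quoted later in the talk).

```python
def short_time_energy(signal, fs=16000, frame_ms=5.0):
    """Energy of consecutive non-overlapping frames of the signal."""
    frame_len = int(round(fs * frame_ms / 1000.0))
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    energies = []
    # Trailing samples that do not fill a whole frame are ignored.
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies
```

At 16 kHz a 5 ms frame holds 80 samples, so one second of speech yields 200 energy values.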

11 1-D LBP of energy with offset value

12 1-D LBP of energy based VAD
[Figure: VAD system block diagram]

13 VAD performance

15 Improved IMCRA
Experimental background
– 198 speech samples from the VoxForge database, spoken by 9 people: 6 males and 3 females. Sampling frequency is 16 kHz.
– Babble noise from the NOISEX-92 database added at SNRs from -10 dB to 10 dB.
– Energy window size set to 5 ms, p=2, histogram window size set to 30 ms.
– Segmental SNR and weighted spectral slope (WSS) are used to compare performance.
* Klatt, ‘Prediction of perceived phonetic distance from critical-band spectra’, IEEE International Conference on Acoustics, Speech, and Signal Processing, 1982
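Adding noise at a chosen SNR can be sketched as follows. This assumes the SNR is defined over the whole utterance as the ratio of average speech power to average noise power; the talk does not say how the mixing was done, so the function name and this definition are assumptions.

```python
import math


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so the speech-to-noise power ratio equals snr_db, then add it."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as the speech")
    noise = noise[:len(speech)]
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise has zero power")
    # Gain g such that p_speech / (g**2 * p_noise) == 10**(snr_db / 10).
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return [s + gain * n for s, n in zip(speech, noise)]
```

Sweeping `snr_db` from -10 to 10 reproduces the range of test conditions listed above.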

16 Performance: Improved IMCRA with babble noise
– Clean signal
– Noisy signal (SNR at 0 dB)
– IMCRA
– Improved IMCRA

17 Performance: Improved IMCRA with babble noise
[Figure: Segmental SNR]
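The segmental SNR measure can be sketched as the average of per-frame SNRs between the clean and the processed signal. The frame length (320 samples, 20 ms at 16 kHz) and the clamping of each frame to the range -10 dB to 35 dB are common conventions assumed here, not values given in the talk; the function name is hypothetical.

```python
import math


def segmental_snr(clean, processed, frame_len=320, floor_db=-10.0, ceil_db=35.0):
    """Mean of per-frame SNRs in dB, each clamped to [floor_db, ceil_db]."""
    n = min(len(clean), len(processed))
    eps = 1e-12  # avoids division by zero and log of zero
    values = []
    for start in range(0, n - frame_len + 1, frame_len):
        c = clean[start:start + frame_len]
        p = processed[start:start + frame_len]
        signal_energy = sum(x * x for x in c)
        error_energy = sum((x - y) ** 2 for x, y in zip(c, p))
        snr = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        values.append(min(max(snr, floor_db), ceil_db))
    if not values:
        raise ValueError("signals are shorter than one frame")
    return sum(values) / len(values)
```

Clamping keeps silent frames and near-perfect frames from dominating the average.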

18 Performance: Improved IMCRA with babble noise
[Figure: Weighted spectral slope]

19 Discussion
Conclusions from the results
– 1-D LBP in the energy domain can distinguish the voiced and unvoiced components of noisy speech signals.
– LBP in the energy domain is shown to be superior to the G.729 VAD and Navin’s LBP VAD.
– Improved IMCRA is superior to IMCRA, with enhanced segmental SNR and higher likelihood.
Future work
– Apply this algorithm as the pre-processing stage of an ASR system.

20 Acknowledgements
Thanks to Prof. John Soraghan for the idea of babble noise reduction.
Thanks to Paul and Navin for their previous work on 1-D LBP.

21 Thank you! Any questions?

