Advanced Speech Enhancement in Noisy Environments

Name: Advanced Speech Enhancement in Noisy Environments
Uploaded: 2017-07-13T11:53:27+00:00
Duration: PTM8S8
Channel: Colten Henson
Description: Advanced Speech Enhancement in Noisy Environments

Advanced Speech Enhancement in Noisy Environments
Qiming Zhu
Supervisor: Prof. John Soraghan
Centre for Excellence in Signal and Image Processing, Dept. of Electronic and Electrical Engineering

Presentation structure
Introduction
Speech Enhancement
Improved Minima Controlled Recursive Averaging (IMCRA)
Robust Voice Activity Detection (VAD)
1-D Local Binary Pattern (LBP)
1-D LBP of energy based VAD
Performance Evaluation
Improved IMCRA
Discussion & Conclusion

Introduction Automatic speech recognition (ASR)
A speech recognition system aims to create intelligent machines that can ‘hear’, ‘understand’ and ‘comply with’ speech input. Speech enhancement and VAD are integral parts of an ASR system.
Aim of current research: improve recognition system performance against a babble noise background.

IMCRA
Figure: IMCRA processing.
* Israel Cohen, ‘Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging’, IEEE Trans. on Speech and Audio Processing, 2003

IMCRA with babble noise
IMCRA performance. Examples shown: clean signal; noisy signal at 0 dB; signal enhanced by IMCRA.

1-D LBP
2-D LBP: extensively used in 2-D image processing.
1-D LBP: used for 1-D signal processing (Navin Chatlani, EUSIPCO 2010; Qiming Zhu, EUSIPCO 2012) and for onset detection of myoelectric signals (Paul McCool, EUSIPCO 2012).
LBP code calculation:
$$LBP_P(x[i]) = \sum_{r=0}^{P/2-1} \left\{ S\left[x[i+r-P/2] - x[i]\right] 2^{r} + S\left[x[i+r+1] - x[i]\right] 2^{r+P/2} \right\}$$
where P is the number of neighbouring samples used. The sign function S[·] is:
$$S[x] = \begin{cases} 1, & x \ge 0 \\ 0, & x < 0 \end{cases}$$
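The code calculation above can be sketched in a few lines of Python. This is only an illustration of the formula as written on the slide: the function names `sign` and `lbp_code` and the boundary check are assumptions made here, not part of the original presentation.

```python
def sign(v):
    """Sign function S[v] from the slide: 1 when v >= 0, otherwise 0."""
    return 1 if v >= 0 else 0


def lbp_code(x, i, P=8):
    """1-D LBP code of sample x[i] using P neighbours (P/2 on each side).

    Hypothetical helper: x is a sequence of numbers, i a 0-based index.
    """
    half = P // 2
    if i < half or i + half >= len(x):
        raise IndexError("x[i] needs P/2 neighbours on each side")
    code = 0
    for r in range(half):
        # Left neighbours x[i-P/2] .. x[i-1] set bits 0 .. P/2-1.
        code += sign(x[i + r - half] - x[i]) << r
        # Right neighbours x[i+1] .. x[i+P/2] set bits P/2 .. P-1.
        code += sign(x[i + r + 1] - x[i]) << (r + half)
    return code


samples = [3, 1, 4, 1, 5, 9, 2, 6, 5]
print(lbp_code(samples, 4, P=8))  # centre sample is 5
```

Because S[0] = 1, a perfectly flat segment yields the all-ones code 2^P - 1.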

LBP code calculation for P=8
1-D LBP code: the LBP code is calculated after thresholding the neighbouring samples against the centre sample.
Figure: LBP code calculation for P=8.
* Navin Chatlani et al., ‘Local binary patterns for 1-D signal processing’, EUSIPCO 2010

1-D LBP histogram
The distribution of the LBP codes forms a histogram that describes the signal x[i] over a window of N samples:
$$H_b = \sum_{P/2 \le i \le N - P/2} \delta\left(LBP_P(x[i]),\, b\right)$$
where b = 0, 1, ..., B and B is the number of histogram bins. δ(i, j) is the Kronecker delta function.
Figure: overview of the 1-D LBP procedure, from windowed data to histogram.
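A minimal sketch of the histogram step, assuming 0-based indexing (so the centre index runs from P/2 to N - P/2 - 1) and one bin per possible code, i.e. 2^P bins. Both choices are assumptions made for this illustration, as is the function name `lbp_histogram`.

```python
def lbp_histogram(x, P=8):
    """Histogram of 1-D LBP codes over one window x of N samples.

    Returns a list hist where hist[b] counts the samples whose code equals b.
    Assumed here: 2**P bins, one per possible code.
    """
    half = P // 2
    hist = [0] * (2 ** P)
    # Only samples with P/2 neighbours on both sides get a code.
    for i in range(half, len(x) - half):
        code = 0
        for r in range(half):
            code += (1 if x[i + r - half] - x[i] >= 0 else 0) << r
            code += (1 if x[i + r + 1] - x[i] >= 0 else 0) << (r + half)
        hist[code] += 1  # Kronecker delta: increment the bin that matches
    return hist


window = [0, 1, 0, 1, 0, 1]
print(lbp_histogram(window, P=2))
```

In this alternating toy window, local peaks fall in bin 0 and local troughs in bin 3.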

Speech Signals and the Short-time Energy
1-D LBP of energy: short-time energy and the histogram.
Figure: a) energy of clean speech signal, b) energy of noisy speech signal, c) histogram of clean speech energy, d) histogram of noisy speech energy.

1-D LBP of energy with offset value
LBP code with offset value α:
$$LBP'_P(E[m]) = \sum_{r=0}^{P/2-1} \left\{ S\left[E[m+r-P/2] - E[m] - \alpha\right] 2^{r} + S\left[E[m+r+1] - E[m] - \alpha\right] 2^{r+P/2} \right\}$$
Figure: H_0 of the energy with different offset values α. a) E[m] of noisy signal, b) H_0 with α=0.01, c) H_0 with α=0.02, d) H_0 with α=0.03, e) H_0 with α=0.04, f) H_0 with α=0.05.
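The offset variant can be sketched as follows. The slide does not define the short-time energy, so the mean squared amplitude over non-overlapping frames used here is an assumption, and the helper names `short_time_energy`, `lbp_offset_code` and `h0` are hypothetical.

```python
def short_time_energy(signal, frame_len):
    """Assumed definition: mean squared amplitude of non-overlapping frames."""
    n_frames = len(signal) // frame_len
    return [
        sum(s * s for s in signal[m * frame_len:(m + 1) * frame_len]) / frame_len
        for m in range(n_frames)
    ]


def lbp_offset_code(E, m, alpha, P=2):
    """Offset LBP code of energy sample E[m]: neighbours must exceed E[m] + alpha."""
    half = P // 2
    code = 0
    for r in range(half):
        code += (1 if E[m + r - half] - E[m] - alpha >= 0 else 0) << r
        code += (1 if E[m + r + 1] - E[m] - alpha >= 0 else 0) << (r + half)
    return code


def h0(E, alpha, P=2):
    """Histogram bin 0: number of energy samples whose offset code is 0."""
    half = P // 2
    return sum(
        1 for m in range(half, len(E) - half)
        if lbp_offset_code(E, m, alpha, P) == 0
    )


flat = [0.10] * 5
print(h0(flat, alpha=0.0), h0(flat, alpha=0.03))
```

With α = 0 a flat energy contour produces the all-ones code, since S[0] = 1. With any positive α the same contour produces code 0, so H_0 counts the samples that no neighbour exceeds by at least α.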

1-D LBP of energy based VAD
Figure: VAD system block diagram.

VAD performance: experimental background
Test speech sampling frequency is 16 kHz. The total length of the test set is 73 seconds. The speech is mixed with babble noise at SNRs from 0 to 20 dB. α is set to 0.03.
VAD 1: 1-D LBP of energy based VAD.
VAD 0: VAD proposed by Navin Chatlani.
G.729: G.729 B standard VAD.
HR0, speech absence hit-rate:
$$HR_0 = \frac{N_{0,0}}{N_0^{ref}}$$
FAR0, speech absence false alarm rate:
$$FAR_0 = 1 - \frac{N_{1,1}}{N_1^{ref}}$$
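The two rates can be computed from frame-level labels as sketched below. The slide does not spell out the counts, so this sketch assumes the usual reading: N_{0,0} is the number of reference non-speech frames detected as non-speech, N_{1,1} the number of reference speech frames detected as speech, and N_0^{ref}, N_1^{ref} the reference totals. The function name `vad_rates` is hypothetical.

```python
def vad_rates(reference, decision):
    """Return (HR0, FAR0) from per-frame labels, 0 = non-speech, 1 = speech."""
    if len(reference) != len(decision):
        raise ValueError("reference and decision must have the same length")
    n0_ref = sum(1 for r in reference if r == 0)
    n1_ref = sum(1 for r in reference if r == 1)
    if n0_ref == 0 or n1_ref == 0:
        raise ValueError("reference needs both speech and non-speech frames")
    # Frames where the detector agrees with the reference, per class.
    n00 = sum(1 for r, d in zip(reference, decision) if r == 0 and d == 0)
    n11 = sum(1 for r, d in zip(reference, decision) if r == 1 and d == 1)
    hr0 = n00 / n0_ref
    far0 = 1 - n11 / n1_ref
    return hr0, far0


ref = [0, 0, 0, 0, 1, 1, 1, 1]
det = [0, 0, 0, 1, 1, 1, 0, 0]
print(vad_rates(ref, det))
```

A good detector pushes HR0 towards 1 while keeping FAR0 near 0, so the two values are best read together.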

VAD performance (result figures)

Improved IMCRA: experimental background
198 samples from the VoxForge database, covering 9 speakers: 6 male and 3 female. Sampling frequency is 16 kHz.
Babble noise from the NOISEX-92 database is added at SNRs from -10 dB to 10 dB.
Energy window size is set to 5 ms, P=2, and histogram size is set to 30 ms.
Segmental SNR and weighted spectrum slope (WSS) are used to compare the performance.
* Klatt et al, ‘Prediction of perceived phonetic distance from critical-band spectra’, IEEE ICASSP, 1982
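The slides do not give the segmental SNR formula, so the following is a sketch of one commonly used definition: the per-frame SNR in dB between the clean and the enhanced signal, averaged over frames. The clamping limits of -10 dB and 35 dB, the small constant `eps`, and the function name `segmental_snr` are assumptions made here rather than values from the presentation.

```python
import math


def segmental_snr(clean, enhanced, frame_len, lo=-10.0, hi=35.0, eps=1e-10):
    """Average per-frame SNR in dB, with each frame clamped to [lo, hi]."""
    n_frames = min(len(clean), len(enhanced)) // frame_len
    if n_frames == 0:
        raise ValueError("signals are shorter than one frame")
    total = 0.0
    for m in range(n_frames):
        s = clean[m * frame_len:(m + 1) * frame_len]
        y = enhanced[m * frame_len:(m + 1) * frame_len]
        signal_energy = sum(v * v for v in s)
        error_energy = sum((a - b) ** 2 for a, b in zip(s, y))
        # eps avoids division by zero and log of zero on silent or perfect frames.
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        total += min(max(snr_db, lo), hi)
    return total / n_frames


clean = [1.0, 1.0, 1.0, 1.0]
enhanced = [0.9, 0.9, 0.9, 0.9]
print(round(segmental_snr(clean, enhanced, frame_len=2), 3))
```

Clamping keeps silent or perfectly reconstructed frames from dominating the average. At 16 kHz, a 20 ms frame would correspond to `frame_len=320`; that frame length is only an illustrative choice.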

Improved IMCRA with babble noise
Performance. Examples shown: clean signal; noisy signal (SNR at 0 dB); IMCRA; improved IMCRA.

Performance Segmental SNR

Performance Weighted spectrum slope

Discussion
Conclusions from the results
1-D LBP in the energy domain can distinguish the voiced and unvoiced components of noisy speech signals.
LBP in the energy domain is shown to be superior to the G.729 VAD and Navin’s LBP VAD.
Improved IMCRA is superior to IMCRA, with enhanced segmental SNR and higher likelihood.
Future work
Apply this algorithm as the pre-processing stage of an ASR system.

Acknowledgements
Thanks to Prof. John Soraghan for the idea of babble noise reduction. Thanks to Paul and Navin for their previous work on 1-D LBP.

Thank you! Any questions?
