
3 INTRODUCTION
• Sibilant speech is aperiodic.
• The sibilants are the fricatives /s/, /ʃ/, /z/ and /ʒ/ and the affricates /tʃ/ and /dʒ/.
• We present a sibilant detection algorithm that is robust to high levels of noise.

4 Gaussian model for the noisy speech signal
• $X_{k,i}$ = power
• $k$ = frequency
• $i$ = time frame
• $\mu_{k,i}$ = mean power

5 PSD for /ʃ/

6 Log-likelihood
• $\mu_{k,N_1} = \mu_{k,N_2} = a_k$
• $\mu_{k,S} = a_k + b_k$

7 Maximizing the log-likelihood
• 74% of sibilants have a duration between 60 and 130 ms.
• |t| < 30 ms: high probability of being inside the sibilant.
• |t| > 65 ms: high probability of being outside the sibilant.
• The contribution of the transition region, 30 ms < |t| < 65 ms, is reduced.

8 Maximizing the log-likelihood


11 Estimated noise and sibilant

12 Estimated sibilant mean power

13 Maximum filter
• W = 30
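A maximum filter replaces each sample with the largest value inside a sliding window. The sketch below is only an illustration of that operation: the slide gives W = 30 without units, so treating W as a window length in frames, centering the window, and shortening it at the edges are all assumptions, and `maximum_filter` is a hypothetical helper name.

```python
def maximum_filter(values, width):
    """Centered sliding-window maximum.

    Assumption: `width` is a window length in frames; near the edges the
    window is truncated to the samples that exist.
    """
    if width < 1:
        raise ValueError("width must be >= 1")
    half_left = (width - 1) // 2   # samples taken before the current one
    half_right = width // 2        # samples taken after the current one
    n = len(values)
    return [
        max(values[max(0, i - half_left): min(n, i + half_right + 1)])
        for i in range(n)
    ]


# Toy per-frame track with a single peak at frame 50.
track = [0.0] * 100
track[50] = 1.0
smoothed = maximum_filter(track, 30)  # the peak is spread over 30 frames
```

This direct version costs O(n·W) operations, which is fine for a short per-frame track; a deque-based sliding maximum would be the usual choice for long signals.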

14 Normalization
• Makes the estimate independent of the overall speech level.

15 Gaussian Mixture Model
• Each frame is evaluated with two Gaussian mixture models (GMMs):
• one trained on non-sibilant speech,
• and the other trained on sibilant speech.
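The two-model decision can be sketched as a log-likelihood ratio between the sibilant GMM and the non-sibilant GMM. Everything specific here is an assumption made for illustration: diagonal covariances, the `(weight, mean, variance)` tuple layout, the zero threshold, and the toy two-dimensional models, which are not trained on any data.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (assumed form)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(x, gmm):
    """gmm is a list of (weight, mean, var) tuples; uses log-sum-exp."""
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def is_sibilant(x, sibilant_gmm, non_sibilant_gmm, threshold=0.0):
    """Return (decision, log-likelihood ratio) for one feature vector."""
    llr = gmm_log_likelihood(x, sibilant_gmm) - gmm_log_likelihood(x, non_sibilant_gmm)
    return llr > threshold, llr


# Toy, hand-written models (hypothetical values, not trained).
sibilant_gmm = [(1.0, [5.0, 5.0], [1.0, 1.0])]
non_sibilant_gmm = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])]

decision, llr = is_sibilant([4.8, 5.1], sibilant_gmm, non_sibilant_gmm)
```

Moving `threshold` away from zero trades missed sibilants against false alarms.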

16 EXPERIMENTS
• Filter from 1.5 kHz to 8 kHz.
• The weighting function used three Hamming windows.

17 GMMs
• The input to the GMMs was a 14-component vector containing the estimated sibilant power spectrum from 1.5 kHz to 8 kHz, sampled every 500 Hz.
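Sampling 1.5 kHz to 8 kHz in 500 Hz steps gives exactly 14 frequencies, which matches the vector size on the slide. The sketch below builds that grid and reads the nearest bin of a power spectrum at each frequency. The nearest-bin lookup, the 512-point FFT, and both function names are assumptions for illustration; the 16 kHz sampling rate is that of TIMIT.

```python
def feature_frequencies(f_low_hz=1500, f_high_hz=8000, step_hz=500):
    """Frequencies at which the spectrum is sampled: 1.5, 2.0, ..., 8.0 kHz."""
    return list(range(f_low_hz, f_high_hz + 1, step_hz))


def sample_spectrum(psd, bin_width_hz, freqs_hz):
    """Pick the PSD bin nearest to each requested frequency (assumed lookup)."""
    return [psd[round(f / bin_width_hz)] for f in freqs_hz]


freqs = feature_frequencies()

# Hypothetical setup: 16 kHz audio, 512-point FFT, so 257 bins of 31.25 Hz.
sample_rate_hz = 16000
fft_size = 512
bin_width_hz = sample_rate_hz / fft_size
psd = [float(b) for b in range(fft_size // 2 + 1)]  # placeholder spectrum

features = sample_spectrum(psd, bin_width_hz, freqs)  # 14-component vector
```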

18 Result
• White Gaussian noise was added to the speech files.
• It is more difficult to detect sibilants in white noise than in other typical stationary noise.
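Adding white Gaussian noise at a chosen SNR can be sketched as follows. The slides do not say how the SNR was measured, so this version assumes the plain ratio of mean signal power to mean noise power over the whole file; `add_white_noise` and the sine-wave stand-in for speech are hypothetical.

```python
import math
import random


def add_white_noise(signal, snr_db, rng=None):
    """Add white Gaussian noise so that the mean-power SNR is about snr_db.

    Assumption: SNR is the ratio of mean signal power to mean noise power
    over the whole signal.
    """
    rng = rng or random.Random()
    signal_power = sum(s * s for s in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in signal]


# Stand-in for a speech file: a 440 Hz tone sampled at 16 kHz.
clean = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(20000)]
noisy = add_white_noise(clean, snr_db=10, rng=random.Random(0))

# Measure the SNR that was actually obtained.
noise = [y - x for x, y in zip(clean, noisy)]
measured_snr_db = 10 * math.log10(
    sum(x * x for x in clean) / sum(e * e for e in noise)
)
```

Passing a seeded `random.Random` keeps the noise reproducible between runs.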

19 Result
• $P_{\text{miss}}$ = miss probability
• $P_{\text{fa}}$ = false alarm probability
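The two error rates can be computed from reference labels and detector decisions as sketched below. Scoring per frame, and the name `detection_error_rates`, are assumptions; the slides do not state the scoring unit.

```python
def detection_error_rates(truth, decisions):
    """Return (p_miss, p_fa) from per-frame labels (assumed scoring unit).

    p_miss: fraction of true sibilant frames the detector rejected.
    p_fa:   fraction of non-sibilant frames the detector accepted.
    """
    if len(truth) != len(decisions):
        raise ValueError("truth and decisions must have the same length")
    misses = sum(1 for t, d in zip(truth, decisions) if t and not d)
    false_alarms = sum(1 for t, d in zip(truth, decisions) if not t and d)
    n_sibilant = sum(1 for t in truth if t)
    n_other = len(truth) - n_sibilant
    p_miss = misses / n_sibilant if n_sibilant else 0.0
    p_fa = false_alarms / n_other if n_other else 0.0
    return p_miss, p_fa


# Toy labels: 4 sibilant frames followed by 6 non-sibilant frames.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
decisions = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
p_miss, p_fa = detection_error_rates(truth, decisions)
```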

20 Result


22 CONCLUSIONS
• We have presented a sibilant detection algorithm for noisy speech.
• It combines a sibilant mean power estimation stage with the likelihood ratio of two GMMs.
• Tested on TIMIT.
• 80% classification accuracy for positive SNRs.

23 Future work
• The classification accuracy could possibly be improved further by applying temporal constraints to the classification decisions.

24 Thank you

