IIT Bombay 17th National Conference on Communications, 28-30 Jan. 2011, Bangalore, India Sp Pr. 1, P3 1/21 Detection of Burst Onset Landmarks in Speech.


2 Detection of Burst Onset Landmarks in Speech Using Rate of Change of Spectral Moments
A. R. Jayan, P. S. Rajath Bhat, P. C. Pandey
{arjayan, rajathbhat, pcpandey}@ee.iitb.ac.in
EE Dept, IIT Bombay
30th January, 2011

3 PRESENTATION OUTLINE
1. Introduction
● Speech landmarks
● Landmark detection
● Clear speech
● Automated speech intelligibility enhancement
2. Methodology
● Band energy parameters
● Spectral moments
● Rate of change function
3. Evaluation and results
● VCV utterances
● Sentences
4. Conclusion

4 1. INTRODUCTION
Speech landmarks
Regions, associated with spectral transitions, containing important information for speech perception
Landmarks and related events [Park, 2008]
Segment type   Landmark       Description
Vowel          Vowel (V)      Vowel nucleus
Glide          Glide (G)      Slow formant transitions
Consonant      Glottis (g)    Vocal fold vibration
Consonant      Sonorant (s)   Nasal closure / release
Consonant      Burst (b)      Turbulence noise

5 Landmark detection
Processing
● Extraction of parameters characterizing the landmark
● Computation of the rate of change (ROC) of parameters
● Locating the landmark using ROC(s)
Applications
● Intelligibility enhancement
● Speech recognition
● Vocal tract shape estimation

6 Clear speech
● Speech produced with clear articulation when talking to a hearing-impaired listener, or in a noisy environment
More intelligible for
▪ Hearing-impaired listeners (~17% higher, Picheny et al., 1985)
▪ Listeners in noisy environments (Payton et al., 1994)
▪ Non-native listeners (Bradlow and Bent, 2002)
▪ Children with learning disabilities (Bradlow et al., 2003)
● Pronounced acoustic landmarks

7 [Figure: conversational (Conv.) vs. clear speech. Example: 'The book tells a story' (recordings from http://www.acoustics.org/press/145th/clr-spch-tab.htm)]

8 Automated speech intelligibility enhancement
Automated detection of landmarks
● High detection rate with low false detections
● Good temporal accuracy (5-10 ms)
● Computational efficiency
Modification of speech characteristics
Intensity / duration / spectral modifications around landmarks with minimal perceptual distortions of the acoustic cues in the speech signal

9 Problems in stop consonant perception
● Transient sound with low intensity
● Severely affected by noise / hearing impairment
Stop landmarks:
● Closure
● Burst onset
● Onset of voicing
Example: /apa/

10 Some of the earlier landmark detection techniques
● Liu (1996): Rate-of-rise measures of parameters from a set of fixed spectral bands (speech recognition; g, s, b landmarks; 80 TIMIT sentences; detection rate: 84% at 20-30 ms, 50% at 5-10 ms)
● Salomon et al. (2002): Temporal parameters related to periodicity, envelope, spectral fine structure (speech recognition; onsets and offsets of vowels, sonorants, and consonants; 120 TIMIT sentences; detection rate: 90% at 20 ms)
● Sainath and Hazan (2006): Sinusoidal model parameters (speech segmentation; 453 TIMIT sentences; word error rate: 20%)
● Niyogi & Sondhi (2002): Stop landmark detection using total energy, energy above 3 kHz, and Wiener entropy (speech recognition; stop consonants; 320 TIMIT sentences; detection rate: 90% at 20 ms)
● Jayan & Pandey (2009): Stop landmark detection using GMM parameters (speech enhancement; 50 TIMIT sentences; detection rate: 73% at 5 ms)

11 Improving landmark detection
● Parameters
  ▪ Capturing spectral transitions
  ▪ Adaptation to speech variability
● Rate of change measure
  ▪ Range of parameter variations
  ▪ Correlation among parameters
● Adaptive time steps
  ▪ Small time step for abrupt variations
  ▪ Large time step for slow variations
Objective of the present investigation
Detection of burst landmarks for automated intelligibility enhancement

12 2. METHODOLOGY
Band energy parameters
Log of spectral peaks in three bands
▪ b1: 1.2-2.0 kHz
▪ b2: 2.0-3.5 kHz
▪ b3: 3.5-5.0 kHz
● Magnitude spectrum (10 kHz sampling) computed using a 512-point DFT with a 6 ms Hanning window at 1 frame per ms, and smoothed by a 20-point moving average.
● Smoothed magnitude spectrum X(n, k) used for calculating the log of the spectral peak in band i (n = time index, k = frequency index)
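The band energy computation on this slide can be sketched roughly as follows. The sampling rate, DFT size, window length, band edges, and 20-point average come from the slide; everything else is an assumption made for illustration: the function names are hypothetical, the moving average is taken along the time index, and the "log of spectral peak" is expressed as 20*log10 of the largest magnitude in the band, since the slide's own formula did not survive in the transcript.

```python
import cmath
import math

FS = 10_000      # sampling rate in Hz (from the slide)
N_DFT = 512      # DFT size (from the slide)
WIN_LEN = 60     # 6 ms Hanning window at 10 kHz (from the slide)
BANDS_HZ = {"b1": (1200, 2000), "b2": (2000, 3500), "b3": (3500, 5000)}


def hann(length):
    """Symmetric Hann (Hanning) window of the given length."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (length - 1))
            for i in range(length)]


def magnitude_spectrum(frame):
    """Magnitudes of the zero-padded N_DFT-point DFT, bins 0..N_DFT/2."""
    x = [s * w for s, w in zip(frame, hann(len(frame)))]
    mags = []
    for k in range(N_DFT // 2 + 1):
        acc = sum(x[i] * cmath.exp(-2j * math.pi * k * i / N_DFT)
                  for i in range(len(x)))
        mags.append(abs(acc))
    return mags


def smooth_over_time(spectra, span=20):
    """Trailing moving average of each bin over up to `span` frames.

    Assumption: the slide does not say which axis is smoothed; the
    time axis is used here.
    """
    smoothed = []
    for n in range(len(spectra)):
        block = spectra[max(0, n - span + 1):n + 1]
        smoothed.append([sum(col) / len(block) for col in zip(*block)])
    return smoothed


def band_peaks_db(mags):
    """Log of the spectral peak in each band, in dB (assumed form)."""
    peaks = {}
    for name, (lo, hi) in BANDS_HZ.items():
        k_lo = math.ceil(lo * N_DFT / FS)
        k_hi = min(math.floor(hi * N_DFT / FS), N_DFT // 2)
        peak = max(mags[k_lo:k_hi + 1])
        peaks[name] = 20 * math.log10(peak + 1e-12)  # guard against log(0)
    return peaks
```

For a full utterance, one frame would be taken every 10 samples (1 frame per ms), the spectra smoothed with `smooth_over_time`, and `band_peaks_db` applied to each smoothed frame. The direct DFT loop is slow but keeps the sketch free of external dependencies; an FFT routine would normally replace it.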

13 [Figure: example of band energy parameters for /aga/. Panels: (a) speech waveform, (b) band energies. Horizontal axis: time (ms).]

14 Spectral moments
Normalized spectrum (n = time index, k = frequency index, N = DFT size)
● Centroid: frequency of energy concentration
● Variance: spread of energy around the centroid
● Skewness: measure of spectral symmetry
● Kurtosis: measure of spectral peakiness
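The four moments listed above can be illustrated with the textbook definitions, treating the normalized magnitude spectrum as a probability distribution over frequency. This is only a sketch under assumptions: the slide's equations are missing from the transcript, so the normalization (magnitude rather than power), the standardized form of skewness and kurtosis, and the function name `spectral_moments` are illustrative choices rather than the authors' stated formulas.

```python
import math


def spectral_moments(mags, fs=10_000, n_dft=512):
    """Return (centroid, variance, skewness, kurtosis) of one spectrum.

    `mags` holds magnitudes for bins 0..len(mags)-1 of an n_dft-point DFT.
    Assumed definitions: moments of the normalized spectrum p(k).
    """
    total = sum(mags)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    p = [m / total for m in mags]                      # normalized spectrum
    freqs = [k * fs / n_dft for k in range(len(mags))]  # bin frequencies, Hz

    centroid = sum(f * pk for f, pk in zip(freqs, p))
    variance = sum((f - centroid) ** 2 * pk for f, pk in zip(freqs, p))
    sigma = math.sqrt(variance)
    if sigma == 0:
        # All energy in a single bin: higher moments are undefined,
        # so report zeros rather than dividing by zero.
        return centroid, variance, 0.0, 0.0

    skewness = sum((f - centroid) ** 3 * pk
                   for f, pk in zip(freqs, p)) / sigma ** 3
    kurtosis = sum((f - centroid) ** 4 * pk
                   for f, pk in zip(freqs, p)) / sigma ** 4
    return centroid, variance, skewness, kurtosis
```

A spectrum that is symmetric about its centroid gives zero skewness, while a burst with energy concentrated at high frequencies shifts the centroid upward, which is why these parameters respond to a burst onset.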

15 [Figure: example of band energy parameters and spectral moments for /aga/. Panels: (a) waveform, (b), (c), (d). Horizontal axis: time (ms).]

16 Measures of rate of change
● First difference based rate of change (ROC)
  K = time step
● Mahalanobis distance based rate of change (ROC-MD)
  A single measure indicative of the overall variation, taking care of parameter range and correlation effects
  y(n) = parameter set at time n
  K = time step
  Σ = covariance matrix, pre-calculated using the parameter set from segments with energy above a threshold
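The two measures can be sketched as below. The slide's formulas are not preserved in the transcript, so this assumes a centered difference d(n) = y(n + K) - y(n - K) for both measures, and takes ROC-MD(n) as the Mahalanobis length sqrt(d(n)^T Σ^-1 d(n)). The helper names (`covariance`, `solve`, `roc`, `roc_md`) are hypothetical, and frames closer than K to either end are simply left at zero.

```python
import math


def covariance(rows):
    """Sample covariance matrix of a list of parameter vectors."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(dim)]
    return [[sum((r[i] - mean[i]) * (r[j] - mean[j]) for r in rows) / (n - 1)
             for j in range(dim)] for i in range(dim)]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def roc(x, k):
    """First-difference ROC of one parameter track (assumed centered)."""
    out = [0.0] * len(x)
    for n in range(k, len(x) - k):
        out[n] = x[n + k] - x[n - k]
    return out


def roc_md(y, k, cov):
    """Mahalanobis-distance ROC of a sequence of parameter vectors."""
    out = [0.0] * len(y)
    for n in range(k, len(y) - k):
        d = [a - b for a, b in zip(y[n + k], y[n - k])]
        z = solve(cov, d)                      # z = cov^-1 d
        out[n] = math.sqrt(max(0.0, sum(di * zi for di, zi in zip(d, z))))
    return out
```

Dividing by the covariance is what lets parameters with very different ranges (band energies in dB, centroid in Hz) contribute on a common scale, and it discounts changes that merely reflect correlation between parameters.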

17 Detection of voicing offset and onset
▪ Band energy in 0-400 Hz
▪ ROC(n) computed with time step 50 ms
▪ Voicing offset [g-]: ROC(n) below -12 dB
▪ Voicing onset [g+]: ROC(n) above +12 dB
Burst onset landmark detection
Most prominent peak in ROC-MD(n) between g- and g+
[Figure: example for /aga/. Panels: (a) waveform, (b) ROC-MD, (c) ROC. Horizontal axis: time (ms).]
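A minimal sketch of this search is given below, assuming the low-band ROC (in dB) and the ROC-MD track have already been computed at one value per frame. The 12 dB threshold comes from the slide. Taking the first threshold crossing as g- and g+, and reading "most prominent peak" as the largest ROC-MD value strictly between them, are simplifying assumptions, and `locate_burst` is a hypothetical name.

```python
def locate_burst(roc_low_db, roc_md_track, threshold_db=12.0):
    """Return (g_minus, burst, g_plus) frame indices, or None if not found.

    roc_low_db   : ROC of the 0-400 Hz band energy, in dB, one per frame
    roc_md_track : ROC-MD values, one per frame
    """
    # Voicing offset: first frame where low-band energy falls sharply.
    g_minus = next((n for n, v in enumerate(roc_low_db)
                    if v <= -threshold_db), None)
    if g_minus is None:
        return None

    # Voicing onset: first later frame where low-band energy rises sharply.
    g_plus = next((n for n in range(g_minus + 1, len(roc_low_db))
                   if roc_low_db[n] >= threshold_db), None)
    if g_plus is None or g_plus - g_minus < 2:
        return None  # no frames strictly between offset and onset

    # Burst onset: largest ROC-MD value strictly inside the closure interval.
    burst = max(range(g_minus + 1, g_plus), key=lambda n: roc_md_track[n])
    return g_minus, burst, g_plus
```

Restricting the search to the interval between g- and g+ keeps vowel-internal spectral changes from being picked up as bursts.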

18 3. EVALUATION & RESULTS
Effects of rate of change functions & parameters on burst detection
ROC and parameters
1) ROC(BE): Sum of normalized ROCs of [E_b1, E_b2, E_b3]
2) ROC-MD(BE): ROC-MD of [E_b1, E_b2, E_b3]
3) ROC-MD(SM): ROC-MD of [F_c, F_σ, F_k, F_s]
4) ROC-MD(BE,SM): ROC-MD of [E_b1, E_b2, E_b3, F_c, F_σ, F_k, F_s]
Material: VCV utterances, TIMIT sentences
Time steps: 3, 6 ms
Temporal accuracies: 3, 5, 10, 15, 20 ms
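Scoring at several temporal accuracies can be illustrated with the sketch below. The tolerances 3, 5, 10, 15, and 20 ms are from the slide; the matching rule (each reference burst may claim the nearest unused detection within the tolerance) and the name `detection_rate` are assumptions, since the slide does not describe how detections were matched to reference labels. The landmark times in the usage lines are made-up illustration values, not data from the study.

```python
def detection_rate(reference_ms, detected_ms, tolerance_ms):
    """Percentage of reference landmarks matched within +/- tolerance_ms.

    Each detection can be matched to at most one reference landmark.
    """
    if not reference_ms:
        raise ValueError("need at least one reference landmark")
    unused = list(detected_ms)
    hits = 0
    for ref in reference_ms:
        candidates = [d for d in unused if abs(d - ref) <= tolerance_ms]
        if candidates:
            # Claim the closest detection so it cannot be counted twice.
            unused.remove(min(candidates, key=lambda d: abs(d - ref)))
            hits += 1
    return 100.0 * hits / len(reference_ms)


# Hypothetical landmark times in ms, for illustration only.
reference = [100.0, 250.0, 400.0]
detected = [102.0, 262.0, 900.0]
for tol in (3, 5, 10, 15, 20):
    print(f"{tol:>2} ms: {detection_rate(reference, detected, tol):.1f} %")
```

Detections left unmatched at the end of the loop correspond to insertions, which the deck reports separately by error type.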

19 VCV utterances
▪ 6 stop consonants (b, d, g, p, t, k)
▪ 3 vowel contexts (a, i, u)
▪ 10 speakers (5 M, 5 F)
▪ 180 tokens

20 TIMIT Sentences
▪ 5 speakers (2 M, 3 F)
▪ 10 sentences from each speaker
▪ 238 tokens
Insertion rates (%) by error type
Error type               ROC(BE)   ROC-MD(BE)   ROC-MD(SM)   ROC-MD(BE,SM)
Vowel / semivowel        13        11           13           11
Frication                5         11           10           9
Glottal stops / clicks   4         3            3            4

21 4. CONCLUSION
● Increasing the time step reduced detection accuracy.
● Mahalanobis distance based ROC was more effective than first-difference based ROC.
● Spectral moments were useful as additional parameters in improving burst-onset detection.

22 Thank you

