1.130 - Wavelets, Filter Banks and Applications: Wavelet-Based Feature Extraction for Phoneme Recognition and Classification. Ghinwa Choueiter.


1 1.130 - Wavelets, Filter Banks and Applications: Wavelet-Based Feature Extraction for Phoneme Recognition and Classification. Ghinwa Choueiter

2 Outline: Introduction (what are wavelets/phonemes); Problem specification; Motivation; Experimental setup; Wavelet-based feature extractor architecture; Results; Conclusions; References

3 What are Wavelets: A wavelet is a function that is well localised in both the time and frequency domains. Wavelet analysis was proposed as an alternative that overcomes the resolution problem of the STFT when analyzing nonstationary signals. It uses a constant-Q analysis to represent the signal in a time-scale plane. It has shown potential in speech applications such as speech analysis, pitch detection, and speech compression.

4 What are Wavelets (2): Daubechies 4-tap filter; wavelet equation; scaling equation
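The equations on this slide did not survive in the transcript. As a minimal sketch, assuming the usual normalization in which the low-pass coefficients sum to √2, the two-scale relations read φ(t) = √2 Σ_k h0[k] φ(2t − k) (scaling equation) and w(t) = √2 Σ_k h1[k] φ(2t − k) (wavelet equation). The code below builds the standard Daubechies 4-tap coefficients and checks the orthonormality conditions numerically; the variable names are illustrative only.

```python
import math

# Daubechies 4-tap low-pass coefficients, normalized to sum to sqrt(2).
s3 = math.sqrt(3.0)
norm = 4.0 * math.sqrt(2.0)
h0 = [(1 + s3) / norm, (3 + s3) / norm, (3 - s3) / norm, (1 - s3) / norm]

# High-pass filter by the alternating flip: h1[n] = (-1)^n * h0[N - 1 - n].
N = len(h0)
h1 = [(-1) ** n * h0[N - 1 - n] for n in range(N)]


def dot(a, b):
    """Inner product of two equal-length coefficient lists."""
    return sum(x * y for x, y in zip(a, b))


checks = {
    "sum_h0": sum(h0),                     # should equal sqrt(2)
    "energy_h0": dot(h0, h0),              # should equal 1 (unit norm)
    "double_shift": dot(h0[2:], h0[:2]),   # should equal 0 (orthogonal to its shift by 2)
    "sum_h1": sum(h1),                     # should equal 0 (zero response at DC)
    "first_moment_h1": sum(n * c for n, c in enumerate(h1)),  # 0: two vanishing moments
    "cross": dot(h0, h1),                  # should equal 0 (low-pass orthogonal to high-pass)
}

for name, value in checks.items():
    print(f"{name}: {value:+.12f}")
```

The two vanishing moments of the high-pass filter are what distinguish the 4-tap Daubechies filter from the 2-tap Haar filter, which has only one.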

5 What are Wavelets (3): Discrete-time wavelet transforms and magnitude responses of wavelet filters at 3 different scales. Source: P. P. Vaidyanathan, “Lossless systems in wavelet transforms”. IEEE International Symposium on Circuits and Systems, 1991.
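To make the "3 different scales" and the constant-Q property concrete, here is a rough sketch that lists the ideal (brick-wall) band edges of a 3-level dyadic decomposition. The 16 kHz sampling rate is the one quoted in the experimental setup; the function names and the idealized band edges are assumptions for illustration, since real wavelet filters overlap.

```python
def dyadic_bands(fs_hz, levels):
    """Ideal band edges (name, low, high) in Hz for a `levels`-deep dyadic DWT."""
    bands = []
    high = fs_hz / 2.0                      # start at the Nyquist frequency
    for level in range(1, levels + 1):
        low = high / 2.0                    # each detail band is the upper half
        bands.append((f"detail {level}", low, high))
        high = low                          # keep splitting the lower half
    bands.append((f"approx {levels}", 0.0, high))
    return bands


def quality_factor(low, high):
    """Q = centre frequency / bandwidth."""
    return ((low + high) / 2.0) / (high - low)


bands = dyadic_bands(fs_hz=16000.0, levels=3)
detail_q = [quality_factor(lo, hi) for name, lo, hi in bands
            if name.startswith("detail")]

for name, lo, hi in bands:
    print(f"{name}: {lo:.0f}-{hi:.0f} Hz")
print("Q of detail bands:", detail_q)
```

Every detail band spans one octave, so the ratio of centre frequency to bandwidth is the same at each scale, which is what "constant-Q" refers to.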

6 What are Phonemes: Phonemes are the smallest units in the sound system of a language that allow distinguishing between the meanings of words. Phoneme categories:
1. Vowels are produced with periodic excitation and are thus characterized by resonance frequencies (200 Hz-3500 Hz).
2. Fricatives are generated by turbulence at a narrow constriction and are characterized by a noisy broad spectrum.
3. Plosives are produced by a complete closure of the vocal tract followed by its sudden release. Their spectral content is usually weak in energy.

7 Problem Specification: Mel-frequency cepstral coefficients (MFCC) are the most widely used speech features in speech recognition. The mel-scaled filterbank is a series of triangular band-pass filters designed to simulate the human auditory system.
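As an illustration of how a mel-scaled bank of triangular filters can be laid out, the sketch below places filter centres at equal spacing on the mel axis. The slides do not say which mel formula or how many filters were used; the 2595·log10(1 + f/700) mapping, the 20 filters, and the 0-8000 Hz range are assumptions chosen for the example.

```python
import math


def hz_to_mel(f_hz):
    """Common mel mapping (assumed here; the deck does not specify one)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_low_hz, f_high_hz):
    """Centre frequencies (Hz) of n_filters triangles equally spaced in mel."""
    m_low, m_high = hz_to_mel(f_low_hz), hz_to_mel(f_high_hz)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + i * step) for i in range(1, n_filters + 1)]


def triangle_weight(f_hz, left, center, right):
    """Weight of a triangular band-pass filter at frequency f_hz."""
    if left < f_hz <= center:
        return (f_hz - left) / (center - left)
    if center < f_hz < right:
        return (right - f_hz) / (right - center)
    return 0.0


# Hypothetical configuration: 20 filters between 0 Hz and 8000 Hz.
centers = mel_center_frequencies(n_filters=20, f_low_hz=0.0, f_high_hz=8000.0)
spacings = [b - a for a, b in zip(centers, centers[1:])]
print([round(c) for c in centers])
```

The spacing between neighbouring centres grows with frequency, so the bank resolves low frequencies finely and high frequencies coarsely, with a layout that is fixed in advance rather than adapted per phoneme class.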

8 Problem Specification (2): In this work we attempt to extract features based on wavelet analysis, using the flexibility it provides in trading time resolution against frequency resolution to design appropriate classifiers for the different types of signals.

9 Problem Specification (3): Perform phoneme classification among the three categories (vowels, fricatives, plosives), and perform phoneme recognition within each category:
1. Vowels: ‘ae’/bat, ‘aa’/Bob, ‘iy’/beat, ‘uw’/boot
2. Fricatives: ‘sh’/she, ‘v’/vowel, ‘s’/see, ‘dh’/thee
3. Plosives (stops): ‘b’/bob, ‘p’/poop, ‘d’/dot, ‘k’/cot

10 Motivation: Sample vowel spectrograms, showing low-frequency formants

11 Motivation (2): Sample fricative spectrograms, showing strong high-frequency content

12 Motivation (3): Sample plosive spectrograms, showing weak overall frequency content

13 The Experimental Setup: TIMIT speech database. Speech signals sampled at 16 kHz. Phonemes extracted from 200 training utterances and 150 test utterances.
Phoneme class    Training data    Test data
Vowels           322              299
Stops            370              381
Fricatives       396              347

14 The Experimental Setup (2):
Vowels        Training data    Test data
‘ae’          100              86
‘iy’          100              100
‘aa’          100              93
‘uw’          22               20
Fricatives    Training data    Test data
‘sh’          96               52
‘v’           100              97
‘s’           100              100
‘dh’          100              98
Stops         Training data    Test data
‘b’           70               81
‘p’           100              100
‘d’
‘k’

15 The Experimental Setup (3): Features extracted:
1. 13-dimensional MFCC vectors
2. Wavelet-DCT vectors whose dimension depends on the phoneme class
ML and MAP classifiers are used with Gaussian mixture models with 4 mixture components.
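The difference between the ML and MAP decisions can be sketched as follows: ML picks the class whose mixture model gives the highest likelihood, while MAP adds the log of the class prior before taking the maximum. The one-dimensional models, their parameters, the class names, and the priors below are all made up for illustration; only the use of 4 mixture components follows the slide.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """log p(x | class) for a 1-D Gaussian mixture, via log-sum-exp."""
    terms = [
        math.log(w) - 0.5 * math.log(2.0 * math.pi * v) - (x - m) ** 2 / (2.0 * v)
        for w, m, v in zip(weights, means, variances)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


def classify(x, models, priors=None):
    """ML decision if priors is None, otherwise MAP (adds the log prior)."""
    scores = {}
    for label, (w, m, v) in models.items():
        score = gmm_log_likelihood(x, w, m, v)
        if priors is not None:
            score += math.log(priors[label])
        scores[label] = score
    return max(scores, key=scores.get)


# Toy models with hypothetical parameters: (weights, means, variances).
models = {
    "rare":   ([0.25] * 4, [-0.5, 0.0, 0.5, 1.0], [1.0] * 4),
    "common": ([0.25] * 4, [1.0, 1.5, 2.0, 2.5], [1.0] * 4),
}
priors = {"rare": 0.1, "common": 0.9}   # hypothetical class priors

x = 0.9
ml_label = classify(x, models)            # likelihood only
map_label = classify(x, models, priors)   # likelihood times prior
print(ml_label, map_label)
```

With equal priors the two rules coincide, so any gap between ML and MAP results reflects how unevenly the classes are represented.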

16 Previous Work: Mel wavelet cepstral coefficients; applying wavelet analysis to speech segmentation and classification; mel-scaled discrete wavelet coefficients; applying sampled continuous wavelet transform in phoneme recognition; symmetric octave filter bank

17 A Basic Feature Extractor Architecture: Provides us with three degrees of freedom: the wavelet type, the fractional moment k, and the decomposition.
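The architecture diagram itself is not part of this transcript, so the following is only one plausible reading of how the three degrees of freedom could fit together, not the author's implementation. It assumes a pipeline of subband decomposition, a k-th absolute moment per subband, a logarithm, and a DCT (suggested by the "Wavelet-DCT" feature name). The Haar wavelet, the plain dyadic tree, and all function names are assumptions made for the example.

```python
import math


def haar_step(signal):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    r = math.sqrt(2.0)
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / r for a, b in pairs]
    detail = [(a - b) / r for a, b in pairs]
    return approx, detail


def dyadic_subbands(signal, levels):
    """Subbands of a dyadic (octave) decomposition, finest detail first."""
    bands, approx = [], list(signal)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        bands.append(detail)
    bands.append(approx)
    return bands


def fractional_moment(coeffs, k):
    """Mean of |c|^k over one subband (k = 2 would give average energy)."""
    return sum(abs(c) ** k for c in coeffs) / len(coeffs)


def dct2(values):
    """Unnormalized DCT-II of a short list."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * (i + 0.5) * j / n) for i, v in enumerate(values))
        for j in range(n)
    ]


def wavelet_dct_features(signal, levels=3, k=1.0, floor=1e-12):
    """Hypothetical feature vector: DCT of log subband moments."""
    bands = dyadic_subbands(signal, levels)
    log_moments = [math.log(fractional_moment(b, k) + floor) for b in bands]
    return dct2(log_moments)


# Synthetic 64-sample frame (a sinusoid), standing in for a speech frame.
frame = [math.sin(2.0 * math.pi * 5.0 * n / 64.0) for n in range(64)]
features = wavelet_dct_features(frame, levels=3, k=1.0)
print(features)
```

In this reading, changing the decomposition changes how many subbands there are and therefore the feature dimension, which would be consistent with the variable-dimension Wavelet-DCT vectors mentioned in the experimental setup.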

18 The Vowel-Fricative/Stop Feature Extraction: wavelet type ‘sym4’; k = 1; decomposition: (figure)

19 The Plosive-Fricative Feature Extractor: wavelet type ‘haar’; k = 1; decomposition: (figure)

20 The Vowel Feature Extractor: wavelet type ‘sym4’; k = 1; decomposition: (figure)

21 The Fricative Feature Extractor: wavelet type ‘sym6’; k = 1; decomposition: (figure)

22 The Plosive Feature Extractor: wavelet type ‘haar’; k = 0.85; decomposition: (figure)

23 The Complete Classifier Architecture

24 Preliminaries: Consistency

25 Preliminaries: Discrimination (figure panels labeled ‘v’ and ‘s’)

26 Preliminaries: Behavior

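27 Results for Vowels: Confusion matrices and overall accuracy for the Wavelet-DCT and MFCC features under the Maximum Likelihood and Maximum A Posteriori classifiers, in the order they appear in the transcript (rows: true phoneme; columns: assigned phoneme).
      ae   iy   aa   uw
ae    70    4   11    1
iy     1   96    0    3
aa    11    0   78    4
uw     1    2    2   15
Accuracy: 86.622 %
      ae   iy   aa   uw
ae    73    5    8    0
iy     7   92    1    0
aa    10    0   82    1
uw     2    2    5   11
Accuracy: 86.288 %
      ae   iy   aa   uw
ae    68    7    7    4
iy     9   94    0    2
aa     7    0   82    4
uw     1    1    5   13
Accuracy: 85.953 %
      ae   iy   aa   uw
ae    67   10    9    0
iy     5   93    1    1
aa    13    0   77    3
uw     1    1    7   11
Accuracy: 82.943 %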
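The accuracy figures on this slide can be checked against the confusion matrices: overall accuracy is the sum of the diagonal divided by the total number of test tokens. The sketch below does this for the first vowel matrix, reading its rows as 70 4 11 1 / 1 96 0 3 / 11 0 78 4 / 1 2 2 15. That digit grouping is an assumption, chosen so that the rows sum to the vowel test counts (86, 100, 93, 20) given in the experimental setup.

```python
def accuracy(confusion):
    """Overall accuracy = correctly classified tokens / all tokens."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def per_class_recall(confusion):
    """Fraction of each true class (row) that was classified correctly."""
    return [row[i] / sum(row) for i, row in enumerate(confusion)]


labels = ["ae", "iy", "aa", "uw"]
# Rows: true phoneme; columns: assigned phoneme (assumed digit grouping).
vowel_confusion = [
    [70, 4, 11, 1],
    [1, 96, 0, 3],
    [11, 0, 78, 4],
    [1, 2, 2, 15],
]

acc = accuracy(vowel_confusion)
recalls = per_class_recall(vowel_confusion)
print(f"{100 * acc:.3f} %")  # 86.622 %
for label, r in zip(labels, recalls):
    print(f"{label}: {100 * r:.1f} %")
```

The per-class values show that ‘uw’, which has only 22 training tokens, is recognized noticeably less reliably than the other vowels, something the single overall figure hides.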

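28 Results for Fricatives: Confusion matrices and overall accuracy for the Wavelet-DCT and MFCC features under the Maximum Likelihood and Maximum A Posteriori classifiers, in the order they appear in the transcript (rows: true phoneme; columns: assigned phoneme).
      sh    v    s   dh
sh    42    1    9    0
v      0   75    1   21
s     20    2   78    0
dh     0   16    0   82
Accuracy: 79.827 %
      sh    v    s   dh
sh    42    0    9    1
v      0   64    2   31
s     21    1   77    1
dh     0   16    1   81
Accuracy: 76.081 %
      sh    v    s   dh
sh    42    0   10    0
v      0   68    0   29
s     20    0   80    0
dh     0   34    1   63
Accuracy: 72.911 %
      sh    v    s   dh
sh    44    1    7    0
v      0   77    0   20
s     17    0   82    1
dh     1   26    0   71
Accuracy: 78.963 %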

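29 Results for Plosives: Confusion matrices and overall accuracy for the Wavelet-DCT and MFCC features under the Maximum Likelihood and Maximum A Posteriori classifiers, in the order they appear in the transcript (rows: true phoneme; columns: assigned phoneme).
       b    p    d    k
b     50   10    8   13
p     11   54   12   23
d     --    7   52   18
k      5   27   17   51
Accuracy: 54.331 %
       b    p    d    k
b     50   13    9    9
p     10   57    9   24
d     19   11   53   17
k      3   37   14   46
Accuracy: 54.068 %
       b    p    d    k
b     46   15   18    2
p     11   64    8   17
d     23    9   52   16
k      4   28   17   51
Accuracy: 55.906 %
       b    p    d    k
b     48   17   11    5
p      6   68   10   16
d     23   11   53   13
k      2   27   16   55
Accuracy: 52.000 %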

30 Results of Category Classification Wavelets perform considerably better than MFCC in discriminating between vowels on one side and fricatives (95% vs. 90%) or plosives (98% vs. 95%) on the other. For classifying between fricatives and plosives, wavelets fall only marginally behind MFCC (90% vs. 91%).

31 Conclusions and Future Work: The results obtained from the wavelet-based feature extraction are quite promising. Future work: design specific wavelets optimized for the task at hand; consider an algorithm that selects the optimum decomposition for a family of signals; incorporate confidence scoring; investigate the fractional moments further.

32 References:
G. Strang and T. Nguyen, Wavelets and Filter Banks. Wellesley-Cambridge Press, 1997.
P. P. Vaidyanathan, “Lossless systems in wavelet transforms”. IEEE International Symposium on Circuits and Systems, 1991.
K. Kim, D. H. Youn and C. Lee, “Evaluation of wavelet filters for speech recognition”. IEEE International Conference on Systems, Man, and Cybernetics, vol. 4, pp. 2891-2894, 2000.
Z. Tufekci and J. N. Gowdy, “Feature extraction using discrete wavelet transform for speech recognition”. Proceedings of the IEEE Southeastcon 2000, pp. 116-123, 2000.
B. T. Tan, M. Fu and A. Spray, “The use of wavelet transforms in phoneme recognition”. Proceedings of the Fourth International Conference on Spoken Language, ICSLP 96, vol. 4, pp. 2431-2434, Oct 3-6, 1996.
B. T. Tan, R. Lang, H. Schroder, A. Spray, and P. Dermody, “Applying wavelet analysis to speech segmentation and classification”. Wavelet Applications, Harold H. Szu, Editor, Proc. SPIE 2242, pp. 750-761, 1994.

