Age and Gender Classification using Modulation Cepstrum
Jitendra Ajmera (presented by Christian Müller)
Speaker Odyssey 2008

1 Previous work
Characteristic acoustic features: jitter and shimmer (C. Müller et al.), phonetic cues (S. Schoetz), cepstral coefficients.

Motivation and intuition behind this work
Features such as cepstral coefficients characterize the exact content of the signal. Much of this information is not useful for age/gender classification; for example, we can identify a speaker's age and gender from speech in a foreign language that we do not understand. Therefore, features that characterize the slowly varying temporal envelope should be more advantageous.

2 Mel Cepstrum Modulation Spectrum (MCMS) features (V. Tyagi et al.)
n: time instant (frame index)
k: cepstral coefficient index
q: modulation frequency index
P: context window width (11 frames)
[Figure: block diagram. Input speech → cepstrum computation → window of P frames (time n × cepstral index k) → MCMS features]
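
The MCMS equation from this slide did not survive extraction. As a rough illustration only, the sketch below computes a simplified stand-in: for each frame n and cepstral index k, it takes the P-frame cepstral trajectory centred on n and returns the DFT magnitude at modulation indices q = 1..3. This is an assumption and not the MCMS definition of Tyagi et al., which uses designed modulation filters; the function name mcms_features and its parameters are hypothetical.

```python
import math


def mcms_features(cepstra, num_mod=3, context=11):
    """Hypothetical, simplified MCMS-like features (NOT the published definition).

    cepstra: list of frames, each a list of cepstral coefficients c[n][k].
    Returns feats[i][k][q-1] for every frame n that has a full P-frame
    context: the magnitude of the q-th DFT coefficient (q = 1..num_mod) of
    the trajectory c[n - P//2 .. n + P//2][k].
    """
    P = context
    half = P // 2
    feats = []
    for n in range(half, len(cepstra) - half):
        frame_feats = []
        for k in range(len(cepstra[n])):
            # Cepstral trajectory of coefficient k over the P-frame window.
            traj = [cepstra[n - half + p][k] for p in range(P)]
            row = []
            for q in range(1, num_mod + 1):
                # X(q) = sum_p x[p] * exp(-j*2*pi*q*p/P), split into re/im parts.
                re = sum(x * math.cos(2 * math.pi * q * p / P) for p, x in enumerate(traj))
                im = -sum(x * math.sin(2 * math.pi * q * p / P) for p, x in enumerate(traj))
                row.append(math.hypot(re, im))
            frame_feats.append(row)
        feats.append(frame_feats)
    return feats
```

Frames without a full P-frame context at either edge are simply skipped here; a real front end would pad the signal instead.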

3 Experimental Setup: Task
7 target classes:
Children (<= 13 years)
Young Male (>13, <=20 years)
Young Female (>13, <=20 years)
Adult Male (>20, <=65 years)
Adult Female (>20, <=65 years)
Senior Male (> 65 years)
Senior Female (> 65 years)
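
The class definitions above amount to a simple decision rule over age and gender. A minimal sketch, assuming ages in years and a hypothetical gender encoding as the strings "male"/"female"; the function name age_gender_class is also hypothetical. Children form a single class regardless of gender.

```python
def age_gender_class(age, gender):
    """Map (age in years, gender) to one of the 7 target classes.

    The "male"/"female" encoding is a hypothetical choice for illustration.
    """
    if age <= 13:
        return "Children"  # children are not split by gender
    if gender not in ("male", "female"):
        raise ValueError(f"unknown gender: {gender!r}")
    g = "Male" if gender == "male" else "Female"
    if age <= 20:
        return f"Young {g}"
    if age <= 65:
        return f"Adult {g}"
    return f"Senior {g}"


print(age_gender_class(21, "female"))  # prints: Adult Female
```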

4 Experimental Setup: Dataset
German SpeechDat corpus: 4000 native German speakers.
80 speakers of each class were used for training, ~44 utterances each.
20 speakers of each class were used for testing.
Data from a different domain (VoiceClass, 660 utterances) was also used for testing.
In total, ~6000 utterances were used for testing.
A human-labelling experiment on a subset of the test data yielded ~55% overall classification accuracy.

5 Performance
Both systems (MFCC and MCMS) have feature vectors of equal dimension (21) and hence the same number of parameters. Both systems are based on a GMM (Gaussian Mixture Model) acoustic model and a maximum-likelihood classifier.
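
The slide gives no further detail on the classifier. A minimal sketch of GMM-based maximum-likelihood classification, assuming one diagonal-covariance GMM per target class whose parameters are already trained (EM training is omitted) and frames scored independently. All function names and the toy 2-dimensional models are hypothetical; the systems above use 21-dimensional features.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x, gmm):
    """log p(x) for a GMM given as a list of (weight, mean, var) components."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))  # log-sum-exp


def classify(frames, class_gmms):
    """ML decision: sum frame log-likelihoods per class and pick the best class."""
    scores = {label: sum(gmm_log_likelihood(x, gmm) for x in frames)
              for label, gmm in class_gmms.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy models for two of the seven classes.
toy = {
    "Adult Male": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "Adult Female": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
label, scores = classify([[0.2, 0.1], [0.9, 1.2]], toy)
print(label)  # prints: Adult Male
```

Summing log-likelihoods over frames treats them as independent; with equal class priors, this maximum-likelihood decision coincides with the maximum a posteriori one.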

6 Analysis
[Figure: performance of MCMS features as a function of duration and for in-domain/out-of-domain data]
Classification accuracy saturates at 3 modulation frequencies (3-14 Hz) and starts dropping after 4 modulation frequencies. This also explains why MFCC features perform worse than MCMS features.
[Figure: modulation frequency response of the first 3 MCMS filters]
These 3 filters provide complementary information. For speech recognition, by contrast, 7 filters (3-22 Hz) provide the best performance.
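
The response plot itself is not reproduced here. A minimal sketch of how a modulation-frequency response can be evaluated, assuming simplified DFT-bin filters over an 11-frame window (an illustrative stand-in, not the actual MCMS filter bank) and a frame rate of 100 frames per second, i.e. a 10 ms frame shift, which is also an assumption. Both function names are hypothetical.

```python
import cmath
import math


def dft_bin_center_hz(q, context=11, frame_rate_hz=100.0):
    """Centre modulation frequency (Hz) of DFT bin q over a P-frame window."""
    return q * frame_rate_hz / context


def modulation_response(q, freq_hz, context=11, frame_rate_hz=100.0):
    """Magnitude response of the hypothetical DFT-bin-q modulation filter.

    A complex modulation exp(j*2*pi*f*p/fr) of a cepstral trajectory is passed
    through the P-tap filter exp(-j*2*pi*q*p/P); the result is |sum over p|.
    """
    total = 0j
    for p in range(context):
        phase = 2 * math.pi * (freq_hz / frame_rate_hz - q / context) * p
        total += cmath.exp(1j * phase)
    return abs(total)


for q in (1, 2, 3):
    fc = dft_bin_center_hz(q)
    print(f"q={q}: centre {fc:.1f} Hz, gain at centre {modulation_response(q, fc):.1f}, "
          f"gain at 3 Hz {modulation_response(q, 3.0):.2f}")
```

Under these assumptions the bin centres fall at about 9.1, 18.2 and 27.3 Hz, which does not match the 3-14 Hz range reported above; the actual MCMS filters must therefore be shaped differently, and the sketch only illustrates how such a response is evaluated.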

7 Questions? Thank You.
