Modeling Prosodic Sequences with K-Means and Dirichlet Process GMMs Andrew Rosenberg Queens College / CUNY Interspeech 2013 August 26, 2013.

2 Modeling Prosodic Sequences with K-Means and Dirichlet Process GMMs Andrew Rosenberg Queens College / CUNY Interspeech 2013 August 26, 2013

3 Prosody
Prosody: pitch, intensity, rhythm, silence.
Prosody carries information about a speaker’s intent and identity.
Here: prosodic recognition of speaking style, nativeness, and speaker.

4 Approach
Unsupervised clustering of acoustic/prosodic features.
Sequence modeling of cluster identities.

5 K-Means
K-means is a simple distance-based clustering algorithm.
It is iterative and non-deterministic (sensitive to initialization).
K must be specified. We evaluate K between 2 and 100.
The optimal value is chosen by cross-validation for each task.
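
A minimal sketch of the clustering step, assuming plain Lloyd's-algorithm K-means over per-syllable feature tuples. The function names, the random-sample initialization, and the `seed` parameter are illustrative assumptions, not details given in the slides.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels).

    The result depends on the random initialization, which is the
    sensitivity the slide refers to.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: initialize from k data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels
```

Choosing K would then amount to running this for each candidate K and keeping the value that scores best on held-out folds for the task at hand.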

6 Dirichlet Process GMMs
Non-parametric infinite mixture model.
Needs a prior over π (the Dirichlet process) and a prior over N (a zero-mean Gaussian).
The hyperparameters α and G0 still need to be set.
Stick-breaking and Chinese Restaurant metaphors: “rich get richer”.
Variational inference (Blei and Jordan 2005).
Plate notation from M. Jordan, 2005 NIPS tutorial.
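
The "rich get richer" behaviour of the Chinese Restaurant metaphor can be sketched by simulating the prior over cluster sizes. This only samples the prior; it is not the variational inference used to fit the DPGMM, and the function name and `seed` parameter are illustrative assumptions.

```python
import random

def chinese_restaurant_process(n_customers, alpha, seed=0):
    """Sample table (cluster) sizes from a CRP with concentration alpha > 0.

    Customer i (0-based) opens a new table with probability alpha / (i + alpha)
    and otherwise joins an existing table with probability proportional to
    that table's current size.
    """
    rng = random.Random(seed)
    table_sizes = []
    for i in range(n_customers):
        r = rng.random() * (i + alpha)
        if r < alpha:
            table_sizes.append(1)  # open a new table
            continue
        r -= alpha
        for t, size in enumerate(table_sizes):
            if r < size:
                table_sizes[t] += 1  # larger tables are more likely to grow
                break
            r -= size
        else:
            table_sizes[-1] += 1  # guard against floating-point rounding
    return table_sizes

sizes = chinese_restaurant_process(1000, alpha=0.25)
print(sorted(sizes, reverse=True))
```

With a small α, most customers tend to end up at a few large tables, which is one way to picture why a single dominant cluster can emerge.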

7 DPGMM
“Rich get richer”
Artificially omit the largest cluster.
α = 0.25

8 Prosodic Event Distribution
ToBI prosodic labels: pitch accents, phrase accent/boundary tones.
[Figures: Accent Type Distribution; Phrase Ending Distribution]

9 Sequence Modeling
SRILM 3-gram model.
Backoff and Good-Turing (GT) smoothing.
Clusters are learned over all material.
Sequence models are trained over the training sets.
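
A rough stand-in for the sequence model, assuming cluster identities are treated as tokens. It uses a trigram model with add-one smoothing in place of SRILM's backoff and Good-Turing smoothing, so the numbers will differ from SRILM's. All names here are illustrative.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"

def train_trigram(sequences):
    """Count trigrams and their two-token histories over cluster-ID sequences."""
    tri, hist, vocab = Counter(), Counter(), {EOS}
    for seq in sequences:
        toks = [BOS, BOS] + list(seq) + [EOS]
        vocab.update(seq)
        for a, b, c in zip(toks, toks[1:], toks[2:]):
            tri[(a, b, c)] += 1
            hist[(a, b)] += 1
    return tri, hist, vocab

def perplexity(model, seq):
    """Per-token perplexity of one sequence under an add-one-smoothed trigram model."""
    tri, hist, vocab = model
    v = len(vocab) + 1  # +1 reserves probability mass for unseen cluster IDs
    toks = [BOS, BOS] + list(seq) + [EOS]
    log_prob, n = 0.0, 0
    for a, b, c in zip(toks, toks[1:], toks[2:]):
        p = (tri[(a, b, c)] + 1) / (hist[(a, b)] + v)
        log_prob += math.log(p)
        n += 1
    return math.exp(-log_prob / n)

model = train_trigram([[0, 1, 0, 1, 0, 1], [0, 1, 0, 1]])
print(perplexity(model, [0, 1, 0, 1]), perplexity(model, [1, 1, 1, 1]))
```

A sequence that resembles the training material should receive lower perplexity than one that does not, which is the property the later classification and outlier-detection steps rely on.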

10 Experiments
Speaking style, nativeness, and speaker recognition.
Evaluation: 500 samples of 10-100 syllables (~2-20 seconds).
ToBI, K-means, DPGMM, DPGMM’ (removing the largest cluster).
5-fold cross-validation to learn hyperparameters.
Classification: train one SRILM model per class; classify by lowest perplexity.
Outlier detection: train a single model; the classifier learns a perplexity threshold.
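
The two decision rules can be sketched as follows, assuming perplexities have already been computed by the sequence models. The slides do not say how the threshold is learned; the exhaustive search for the most accurate cut point below is an assumption, as are the function names.

```python
def classify(perplexities):
    """Return the class whose model gives the sample the lowest perplexity.

    `perplexities` maps class name -> perplexity of the sample under that
    class's sequence model.
    """
    return min(perplexities, key=perplexities.get)

def learn_threshold(inlier_ppl, outlier_ppl):
    """Pick the perplexity threshold with the best training accuracy.

    Samples with perplexity above the threshold are flagged as outliers.
    Candidate thresholds are the midpoints between adjacent observed scores,
    plus one value below and one above the observed range.
    """
    scores = sorted(set(inlier_ppl) | set(outlier_ppl))
    candidates = (
        [scores[0] - 1.0]
        + [(a + b) / 2 for a, b in zip(scores, scores[1:])]
        + [scores[-1] + 1.0]
    )

    def accuracy(t):
        correct = sum(p <= t for p in inlier_ppl) + sum(p > t for p in outlier_ppl)
        return correct / (len(inlier_ppl) + len(outlier_ppl))

    return max(candidates, key=accuracy)
```

For example, `classify({"READ": 40.0, "SPON": 25.5})` picks `"SPON"`, and a threshold learned from inlier and outlier perplexities on the training folds would be applied unchanged to the test fold.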

11 Data
Boston Directions Corpus: READ, SPONTANEOUS; 4 speakers (used for speaker classification).
Boston University Radio News Corpus: BROADCAST NEWS; 6 speakers.
Columbia Games Corpus: SPONTANEOUS DIALOG; 13 speakers.
Native Mandarin Chinese speakers reading BURNC stories: 4 speakers.
All ToBI labeled.

12 Features
Villing (2004) pseudosyllabification.
Syllables with mean intensity below 10 dB are considered “silent”.
7 features:
  Mean range-normalized intensity
  Mean range-normalized delta intensity
  Mean z-score-normalized log f0
  Mean z-score-normalized delta log f0
  Syllable duration
  Duration of previous silence (if any)
  Duration of following silence (if any)
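
A sketch of the first four features, under several assumptions the slide does not settle: the inputs are frame-level intensity and f0 tracks, f0 is positive in every frame (voiced frames only), normalization is computed over whatever frames are passed in, and deltas are taken before normalization. Function names and the `(start, end)` frame-index boundaries are illustrative.

```python
import math
from statistics import mean, pstdev

def range_normalize(values):
    """Scale values to [0, 1] using their minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_normalize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0] * len(values)
    return [(v - mu) / sigma for v in values]

def deltas(values):
    """First differences between consecutive frames; the first delta is 0."""
    return [0.0] + [b - a for a, b in zip(values, values[1:])]

def syllable_features(intensity, f0, boundaries):
    """Per-syllable means of the four normalized tracks.

    `boundaries` is a list of (start, end) frame indices, end exclusive.
    """
    log_f0 = [math.log(f) for f in f0]  # assumes f0 > 0 in every frame
    tracks = (
        range_normalize(intensity),
        range_normalize(deltas(intensity)),
        z_normalize(log_f0),
        z_normalize(deltas(log_f0)),
    )
    return [
        tuple(mean(track[start:end]) for track in tracks)
        for start, end in boundaries
    ]
```

The remaining three features (syllable duration and the durations of adjacent silences) would come directly from the syllable and silence boundaries.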

13 Consistency with ToBI labels
V-Measure between ToBI accent types and clusters, and between ToBI intonational phrase-ending tones and clusters.
K-means: solid line. DPGMM: gray line, for reference (doesn’t vary by more than 0.001).
[Figures: Accenting; Phrasing]
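
For reference, V-Measure is the harmonic mean of homogeneity and completeness, both defined through conditional entropies of the class and cluster labelings. A minimal sketch with equal weighting of the two terms follows; the function names are illustrative.

```python
import math
from collections import Counter

def entropy(counts, n):
    """Entropy (in nats) of a distribution given by raw counts summing to n."""
    return -sum(c / n * math.log(c / n) for c in counts if c)

def v_measure(classes, clusters):
    """V-Measure between reference classes (e.g. ToBI labels) and cluster IDs."""
    n = len(classes)
    joint = Counter(zip(classes, clusters))
    class_counts, cluster_counts = Counter(classes), Counter(clusters)
    h_c = entropy(class_counts.values(), n)
    h_k = entropy(cluster_counts.values(), n)
    # Conditional entropies H(C|K) and H(K|C) from the joint counts.
    h_c_given_k = -sum(m / n * math.log(m / cluster_counts[k]) for (c, k), m in joint.items())
    h_k_given_c = -sum(m / n * math.log(m / class_counts[c]) for (c, k), m in joint.items())
    homogeneity = 1.0 if h_c == 0 else 1.0 - h_c_given_k / h_c
    completeness = 1.0 if h_k == 0 else 1.0 - h_k_given_c / h_k
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)
```

The score does not depend on how cluster IDs are numbered, so it can compare unsupervised clusters against ToBI labels without first mapping one onto the other.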

14 Speaking Style Recognition
4 styles: READ, SPON, BN, DIALOG.
Single speaker for evaluation.
[Figures: Classification; Outlier Detection - Dialog]

15 Nativeness Recognition
Native (BURNC) vs. non-native.
Single speaker for evaluation.
[Figures: Classification; Outlier Detection - Native]

16 Speaker Recognition
Classification: 4 BDC speakers; 6 tasks for training, 3 for testing.
Outlier detection: 6 BURNC speakers; detect f2b vs. others.
[Figures: Classification; Outlier Detection]

17 Conclusions
K-means works well to represent prosodic information.
DPGMM does not work so well out of the box. Despite being non-parametric, hyperparameter setting is still critically important.
Future work:
  Larger acoustic/prosodic feature set (requires pre-processing).
  Evaluating the universality of prosodic representations.
  Integration of K-means and DPGMM: use one to seed the other.

18 Thank you
andrew@cs.qc.cuny.edu
http://speech.cs.qc.cuny.edu

