Language modeling for speaker recognition Dan Gillick January 20, 2004.


1 Language modeling for speaker recognition Dan Gillick January 20, 2004

2 Outline
Author identification
Trying to beat Doddington's "idiolect" modeling strategy (speaker recognition)
My next project

3 Author ID (undergrad. thesis)
Problem:
–train models for each of k authors
–given some test text written by 1 of those authors, identify the correct author
Variations:
–different kinds of models
–different size test samples
–different k
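The slide describes the task only in prose; here is a minimal sketch of the decision rule, where `score` stands in for whatever language-model scoring function is used (a stand-in, not anything from the thesis):

```python
def identify_author(test_text, author_models, score):
    """Pick the author whose trained model scores the test text highest.
    `author_models` maps author name -> model; `score(model, text)` is a
    stand-in for any language-model scoring function (e.g. log-likelihood)."""
    return max(author_models, key=lambda name: score(author_models[name], test_text))
```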

4 Character n-gram models
What?
–27 tokens: the letters a-z plus the space character
–some text generated from such a trigram model: "you orthad gool of anythilly uncand or prafecaustiont and to hing that put ably"
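Not the thesis code, just a minimal sketch of a character trigram model over this 27-token alphabet, with a sampler like the one that produced the text above. Function names and the unsmoothed fallback for unseen histories are my own:

```python
import random
from collections import Counter, defaultdict

def train_char_trigram(text):
    """Count character trigrams over the 27-token alphabet (a-z plus space)."""
    clean = "".join(c if "a" <= c <= "z" else " " for c in text.lower())
    counts = defaultdict(Counter)
    for i in range(len(clean) - 2):
        counts[clean[i:i + 2]][clean[i + 2]] += 1
    return counts

def generate(counts, length=80, history="th"):
    """Sample one character at a time from P(c | previous two characters)."""
    out = history
    for _ in range(length):
        # Unseen histories fall back to a random one (no smoothing here).
        dist = counts.get(out[-2:]) or random.choice(list(counts.values()))
        chars, weights = zip(*dist.items())
        out += random.choices(chars, weights=weights)[0]
    return out
```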

5 Character n-gram models
Why?
–very simple
–data sparseness is less troublesome than with word n-grams
–supposed to be state-of-the-art, or at least close to it (Khmelev, D. and Tweedie, F. J. "Using Markov Chains for the Identification of Writers." Literary and Linguistic Computing, 16(4): 299-307, 2001.)

6 Character n-grams: Setup
task: pick the correct author from 10 possible authors
training data: 3 novels for each author
test data: text from a held-out novel
jack-knifing: 4 novels for each of 20 authors

7 Character n-grams: Results
task: picking 1 author from 10 possible authors
training data size: 3 novels

8 Character n-gram models
Why does it work?
–captures some word choice information
–picks up word endings (-ing, -tion, -ly, etc.)
–not hurt much by data sparseness issues

9 Key-list models
Incentive:
–ought to be able to beat character n-grams
–develop a new modeling method focused more on what differentiates authors (characters and words are both useful for topic recognition, but that doesn't mean they are best for author recognition)

10 Key-list models
Idea:
–convert the text stream into a stream of only authorship-relevant symbols (I called these lists of symbols key-lists)
–each symbol is a regular expression, to allow for broad definitions (e.g. /\w*tion/ captures any nominalization)
–text not accounted for by the key-list is represented by special placeholder markers
–build n-gram models from these new streams

11 Key-list models
Sample key-list:

Regular Expression                         Description
(\w)(,)(\s)                                comma
(\w)(\.)(\s)                               period
(\b)(of|for|to|around|after| … )(\b)       common prepositions
(\b)(was|were \w*ed)(\b)                   passive voice
(\b)(is|was|will|are|were|am)(\b)          conjugations of "to be"
(\b)(\w*ing)(\b)                           ends in -ing
(\b)(\w*ly)(\b)                            adverb
(\b)(and|but|or|not|if|then|else)(\b)      logical connectives
(\b)(as)(\b)                               as
(\b)(would|should|could)(\b)               modal verbs
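A sketch (not the thesis implementation) of how a key-list like the one above might convert text into a symbol stream; the catch-all marker name OTHER is hypothetical, standing in for the placeholder markers mentioned earlier:

```python
import re

# A few rows from the key-list above; OTHER stands in for the
# catch-all placeholder marker (hypothetical name).
KEY_LIST = [
    ("comma",       re.compile(r"\w,\s")),
    ("period",      re.compile(r"\w\.\s")),
    ("preposition", re.compile(r"\b(of|for|to|around|after)\b")),
    ("ing",         re.compile(r"\b\w*ing\b")),
    ("adverb",      re.compile(r"\b\w*ly\b")),
    ("modal",       re.compile(r"\b(would|should|could)\b")),
]

def to_symbol_stream(text):
    """Scan left to right; emit the first key-list symbol that matches
    at each position, and collapse unmatched text into OTHER markers."""
    symbols, i = [], 0
    while i < len(text):
        for name, pattern in KEY_LIST:
            match = pattern.match(text, i)
            if match:
                symbols.append(name)
                i = match.end()
                break
        else:
            if not symbols or symbols[-1] != "OTHER":
                symbols.append("OTHER")
            i += 1
    return symbols  # feed these streams to an ordinary n-gram trainer
```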

12 Key-list models: Results
task: picking 1 author from 10 possible authors
training data size: 3 novels

13 Key-list models: Results
Some other interesting results:
–key-lists with just punctuation (plus the placeholder markers) performed almost as well as the best key-lists
–all key-lists were outperformed by the best n-letter model when test data size < 10,000 chars., but all key-list models eventually surpassed the n-letter models

14 Key-list models
Things I didn't do:
–vary amount of training data
–spend a long time trying different key-lists
–combine key-list results with each other or with the character results
–a lot of other stuff
The thesis is available on the web: http://www.dgillick.com/resource/thesis.pdf

15 Outline
Author identification
Trying to beat Doddington's "idiolect" modeling strategy (speaker recognition)
My next project

16 G. Doddington's LM strategy
–create LMs with a limited vocabulary: the 2000 most commonly occurring bigrams
–to smooth out zeroes, boost each bigram prob. by 0.001
–score by calculating: logprob(test|target) - logprob(test|bkg)
–logprobs are joint probabilities: logprob(AB) = logprob(A) + logprob(B|A)
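The slide gives no code; here is a minimal sketch of that scoring rule. The dictionary-based LM interface and the fallback for out-of-vocabulary bigrams (they get only the boost) are my assumptions, not details from the talk:

```python
import math

def doddington_score(test_bigrams, target_lm, bkg_lm, boost=0.001):
    """Log-likelihood ratio for one test conversation side:
    logprob(test|target) - logprob(test|bkg). Each LM maps a bigram
    from the 2000-bigram vocabulary to its probability; the 0.001
    boost smooths out zero probabilities, as described above."""
    score = 0.0
    for bigram in test_bigrams:
        # Bigrams outside the vocabulary get only the boost (an assumption).
        score += math.log(target_lm.get(bigram, 0.0) + boost)
        score -= math.log(bkg_lm.get(bigram, 0.0) + boost)
    return score
```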

17 G. Doddington's LM: Setup
Switchboard 1 data:
–collected in the early '90s from all over the US
–2,400 (~5 min.) conversations among 543 speakers
–corpus divided into 6 splits and tested using jack-knifing through the splits
–manual transcripts provided by Mississippi State
Task:
–8 conversation sides used as training data to build models for each target speaker
–1 conversation side used as test data
–background model built from 3 splits of held-out data
–jack-knifing allowed for almost 10,000 trials

18 G. Doddington's LM: Results
Notes:
–these results are my own attempt to replicate the original experiments
–SRI reported EER = 8.65% for this same experiment

19 Adapted bigram models
Incentive:
–adapting target models from a much larger background model should yield better estimates of the probabilities in the language models
Specifically:
–use the same 2000-bigram vocabulary
–target probabilities are a mixture of training probabilities and background probabilities
–mixture weight is 2:1 target data : bkg. data
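A sketch of one plausible reading of this adaptation; interpreting the "2:1" mixture as a fixed 2/3 interpolation weight on the target-data estimates is my assumption, since the slide doesn't give the estimation details:

```python
def adapt_probs(target_counts, bkg_probs, target_weight=2/3):
    """Interpolate maximum-likelihood target estimates with background
    probabilities over the shared 2000-bigram vocabulary. Reading the
    2:1 mixture as a fixed 2/3 weight on target data is an assumption."""
    total = sum(target_counts.values()) or 1  # guard against empty data
    return {
        bigram: target_weight * (target_counts.get(bigram, 0) / total)
        + (1 - target_weight) * p_bkg
        for bigram, p_bkg in bkg_probs.items()
    }
```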

20 Adapted bigram models: Results
Notes:
–nearly identical performance
–combination of the 2 systems yields almost no improvement
–why isn't the adapted version better?

21 Can anything improve on 8.68?
Trigrams?
–use the same count threshold to make a list of the top 700 trigrams ("a lot of" and "I don't know" were among the most common)
Character models?
–worked well for authorship…
–included all character combinations (no limited vocabulary)
–tried bigram and trigram models

22 Scores and combinations

System                                EER
GD bigrams                            8.68%
adapt. word bigrams                   8.89%
adapt. word trigrams                  11.88%
adapt. char. bigrams                  13.73%
adapt. char. trigrams                 17.92%
adapted words                         8.46%
adapted characters                    13.24%
adapted words + adapted characters    7.89%
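The slides don't say how the systems were combined; a common baseline, sketched here under that assumption, is a weighted sum of the two systems' per-trial scores:

```python
def fuse_scores(scores_a, scores_b, weight=0.5):
    """Weighted-sum fusion of two systems' per-trial scores; assumes the
    scores have already been normalized to comparable ranges."""
    return [weight * a + (1 - weight) * b for a, b in zip(scores_a, scores_b)]
```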

23 Dan Gillick (23)January 20, 2004Language modeling for speaker recognition Final Comparison

24 What about less training data?
1 conversation-side training:
–character models might provide more of an advantage with less data?
–not so: GD EER = 22.5%, adapted character EER = 30%, adapted word EER = 20%
–maybe these character models pick up on the topic of that 1 conversation
–haven't tried any other training data sizes

25 Outline
Author identification
Trying to beat GD's result
My next project

26 Key-lists for speaker recognition
key-list n-grams picked up on phrasing (comma and period were valuable tokens)
–automatic transcripts don't have punctuation, but they do have pause and duration information
–use reg. exps. and duration info. to capture idiosyncratic speaker phrasing
–capture other speech information in key-lists? (energy, f0, etc.)

27 Acknowledgements
Thanks to:
–Anand and Luciana at SRI for trying to help me replicate their results
–Barbara for providing advice
–Barry and Kofi for helping with computers and stuff
–George

