
Published by Katelynn Lascelles. Modified about 1 year ago.

1
Acoustic Vector Resampling for GMMSVM-Based Speaker Verification
Man-Wai MAK and Wei RAO
The Hong Kong Polytechnic University

2
Outline
- GMM-UBM for Speaker Verification
- GMM-SVM for Speaker Verification
- Data-Imbalance Problem in GMM-SVM
- Utterance Partitioning for GMM-SVM
- Experiments on NIST SRE

3
Speaker Verification
To verify the identity of a claimant based on his/her own voice.
[Figure: a claimant says "I am Mary"; the system asks "Is this Mary's voice?"]

4
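Verification Process
[Figure: a claimant says "I'm John". Feature extraction feeds John's model (John's "voiceprint") and an impostor model (impostors' "voiceprints"); their scores are combined (+/-), and score normalization and decision making compare the result with a decision threshold to accept or reject the claim.]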

5
Acoustic Features
- Speech is produced by a continuously evolving vocal tract.
- We therefore need to extract a sequence of spectra, or a sequence of spectral coefficients.
- Use a sliding window: 25 ms window, 10 ms shift.
[Figure: the pipeline from log|X(ω)| through the DCT to the MFCCs.]
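The framing step on this slide can be sketched in Python. This is a minimal illustration that assumes a 16 kHz sampling rate and a Hamming window, neither of which is stated on the slide. It also omits the mel filterbank, so `cepstrum` returns plain cepstral coefficients rather than true MFCCs. The function names are hypothetical.

```python
import math

def frame_signal(signal, sample_rate=16000, win_ms=25.0, shift_ms=10.0):
    """Split a 1-D signal into overlapping frames (25 ms window, 10 ms shift)."""
    win = int(sample_rate * win_ms / 1000)    # 400 samples at 16 kHz (assumed rate)
    hop = int(sample_rate * shift_ms / 1000)  # 160 samples at 16 kHz
    if len(signal) < win:
        return []
    n_frames = 1 + (len(signal) - win) // hop
    return [signal[i * hop:i * hop + win] for i in range(n_frames)]

def cepstrum(frame, n_coeffs=12):
    """Simplified cepstrum: Hamming window -> log|X(w)| -> DCT-II.
    NOTE: no mel filterbank is applied, so these are not full MFCCs."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    n_bins = n // 2 + 1
    log_mag = []
    for k in range(n_bins):  # naive DFT, one frequency bin at a time
        real = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        imag = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        log_mag.append(math.log(math.hypot(real, imag) + 1e-10))
    # DCT-II of the log-magnitude spectrum; keep coefficients 1..n_coeffs (drop c0)
    return [sum(v * math.cos(math.pi * q * (k + 0.5) / n_bins) for k, v in enumerate(log_mag))
            for q in range(1, n_coeffs + 1)]
```

For one second of 16 kHz audio, the 25 ms / 10 ms setting gives 1 + (16000 - 400) // 160 = 98 frames.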

6
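GMM-UBM for Speaker Verification
The acoustic vectors (MFCCs) of speaker s are modeled by a probability density function parameterized by a Gaussian mixture model (GMM) for speaker s:
$p(\mathbf{x}|\Lambda^{(s)}) = \sum_{j=1}^{M} \omega_j^{(s)} \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_j^{(s)}, \boldsymbol{\Sigma}_j^{(s)})$, where $\Lambda^{(s)} = \{\omega_j^{(s)}, \boldsymbol{\mu}_j^{(s)}, \boldsymbol{\Sigma}_j^{(s)}\}_{j=1}^{M}$ are the parameters of the speaker model.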
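Evaluating this density for one acoustic vector can be sketched in Python as below, assuming diagonal covariance matrices (a common choice that the slide does not state). The mixture sum is taken in the log domain with the log-sum-exp trick to avoid underflow; `gmm_log_density` and `log_gauss_diag` are hypothetical names.

```python
import math

def log_gauss_diag(x, mean, var):
    """log N(x; mean, diag(var)) for one D-dimensional vector."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def gmm_log_density(x, weights, means, variances):
    """log p(x | Lambda) = log sum_j w_j N(x; mu_j, Sigma_j), via log-sum-exp."""
    terms = [math.log(w) + log_gauss_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))
```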

7
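GMM-UBM for Speaker Verification
The acoustic vectors of a general population are modeled by another GMM, called the universal background model (UBM):
$p(\mathbf{x}|\Lambda^{(b)}) = \sum_{j=1}^{M} \omega_j^{(b)} \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}_j^{(b)}, \boldsymbol{\Sigma}_j^{(b)})$, where $\Lambda^{(b)} = \{\omega_j^{(b)}, \boldsymbol{\mu}_j^{(b)}, \boldsymbol{\Sigma}_j^{(b)}\}_{j=1}^{M}$ are the parameters of the UBM.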

8
GMM-UBM for Speaker Verification
[Figure: the enrollment utterance $X^{(s)}$ of the client speaker is used to MAP-adapt the universal background model, producing the client speaker model.]
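The MAP step can be sketched as mean-only adaptation in the style of Reynolds et al. (listed in the references), using the relevance factor of 16 given in the experimental setup. Diagonal covariances, adapting only the means (the usual choice when building GMM-supervectors), and the helper names are all assumptions of this sketch.

```python
import math

def log_gauss_diag(x, mean, var):
    """log N(x; mean, diag(var))."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance UBM."""
    M, D = len(weights), len(means[0])
    n = [0.0] * M                      # zeroth-order stats n_j
    f = [[0.0] * D for _ in range(M)]  # first-order stats sum_t Pr(j|x_t) x_t
    for x in frames:
        logs = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        post = [math.exp(l - top) for l in logs]
        total = sum(post)
        for j in range(M):
            gamma = post[j] / total    # responsibility Pr(j | x_t)
            n[j] += gamma
            for d in range(D):
                f[j][d] += gamma * x[d]
    adapted = []
    for j in range(M):
        alpha = n[j] / (n[j] + relevance)  # data-dependent adaptation coefficient
        ex = [f[j][d] / n[j] if n[j] > 0 else means[j][d] for d in range(D)]
        adapted.append([alpha * ex[d] + (1 - alpha) * means[j][d] for d in range(D)])
    return adapted
```

A component that sees many frames moves toward the data mean; one that sees few frames stays close to the UBM mean.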

9
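GMM-UBM Scoring
Two-class hypothesis problem:
- H0: MFCC sequence $X^{(c)}$ comes from the true speaker
- H1: MFCC sequence $X^{(c)}$ comes from an impostor
The verification score is a likelihood ratio, computed in the log domain:
$S_{\text{GMM-UBM}}(X^{(c)}) = \frac{1}{T}\sum_{t=1}^{T}\left[\log p(\mathbf{x}_t|\Lambda^{(s)}) - \log p(\mathbf{x}_t|\Lambda^{(b)})\right]$,
where $X^{(c)} = \{\mathbf{x}_1, \ldots, \mathbf{x}_T\}$, and $\Lambda^{(s)}$ and $\Lambda^{(b)}$ are the speaker model and the background model (UBM).
[Figure: feature extraction feeds the speaker model (+) and the background model (-); the resulting score drives the decision.]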
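The average frame-level log-likelihood ratio can be sketched as follows, again assuming diagonal-covariance GMMs, here passed as hypothetical `(weights, means, variances)` tuples.

```python
import math

def gmm_loglik(x, gmm):
    """log p(x | gmm) for a diagonal-covariance GMM given as (weights, means, variances)."""
    weights, means, variances = gmm
    terms = [math.log(w) - 0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                                     for xi, m, v in zip(x, mu, var))
             for w, mu, var in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def gmm_ubm_score(frames, speaker_gmm, ubm):
    """Average frame-level log-likelihood ratio; positive values favour H0."""
    return sum(gmm_loglik(x, speaker_gmm) - gmm_loglik(x, ubm) for x in frames) / len(frames)
```

A claim is accepted when this score exceeds a decision threshold; the threshold value is not given on these slides.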

10
Outline
- GMM-UBM for Speaker Verification
- GMM-SVM for Speaker Verification
- Data-Imbalance Problem in GMM-SVM
- Acoustic Vector Resampling for GMM-SVM
- Results on NIST SRE

11
GMM-SVM for Speaker Verification
[Figure: feature extraction, MAP adaptation of the UBM, and mean stacking map an utterance to a GMM-supervector.]
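Mean stacking can be sketched as below. The slides only say the supervectors are "normalized"; this sketch assumes the scaling used in the KL-divergence-based GMM-supervector kernel of Campbell et al. (listed in the references), where component j is scaled by $\sqrt{\omega_j}\,\Sigma_j^{-1/2}$ using the UBM weights and diagonal covariances. The function name is hypothetical.

```python
import math

def gmm_supervector(adapted_means, ubm_weights, ubm_variances):
    """Stack MAP-adapted means into one M*D vector, scaling component j
    element-wise by sqrt(w_j) / sqrt(var_j) (diagonal UBM covariances assumed)."""
    sv = []
    for mu, w, var in zip(adapted_means, ubm_weights, ubm_variances):
        sv.extend(math.sqrt(w) * m / math.sqrt(v) for m, v in zip(mu, var))
    return sv
```

Whatever the utterance length, the output always has M x D entries, which is what lets a variable-length utterance be fed to an SVM.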

12
GMM-SVM Scoring
[Figure: using the UBM, feature extraction and supervector computation produce the GMM-supervector of target speaker s and the GMM-supervectors of the background speakers, which are used to train an SVM; the GMM-supervector of claimant c is then passed to SVM scoring.]

13
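GMM-UBM Scoring vs. GMM-SVM Scoring
GMM-UBM: $S_{\text{GMM-UBM}}(X^{(c)}) = \frac{1}{T}\sum_{t=1}^{T}\left[\log p(\mathbf{x}_t|\Lambda^{(s)}) - \log p(\mathbf{x}_t|\Lambda^{(b)})\right]$
GMM-SVM: $S_{\text{GMM-SVM}}(X^{(c)}) = \alpha^{(s)} K(X^{(c)}, X^{(s)}) - \sum_{i \in S_{\text{bkg}}} \alpha_i^{(b)} K(X^{(c)}, X^{(b_i)}) + d$, with $K(X^{(c)}, X^{(s)}) = (\vec{m}^{(c)})^{\mathsf{T}} \vec{m}^{(s)}$,
where $\vec{m}^{(c)}$ is the normalized GMM-supervector of the claimant's utterance, $\vec{m}^{(s)}$ is the normalized GMM-supervector of the target speaker's utterance, $S_{\text{bkg}}$ indexes the background-speaker support vectors, the $\alpha$'s are the SVM's Lagrange multipliers, and $d$ is the bias.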
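With a linear kernel on normalized supervectors, the GMM-SVM score is the target-speaker kernel term minus the weighted background kernel terms plus a bias. The sketch below assumes the multipliers and bias have already been obtained by SVM training (not shown); all names are hypothetical.

```python
def linear_kernel(a, b):
    """K(X, Y) = m_X^T m_Y on normalized GMM-supervectors."""
    return sum(x * y for x, y in zip(a, b))

def gmm_svm_score(claimant_sv, target_sv, alpha_target, bkg_svs, alpha_bkg, bias):
    """SVM output: alpha_s K(c, s) - sum_i alpha_i K(c, b_i) + d.
    The target speaker is the positive class; background speakers are negative."""
    score = alpha_target * linear_kernel(claimant_sv, target_sv)
    score -= sum(a * linear_kernel(claimant_sv, b) for a, b in zip(alpha_bkg, bkg_svs))
    return score + bias
```

Because the kernel is linear, the weighted sum of target and background supervectors can be precomputed into a single weight vector, so scoring a claimant costs one dot product.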

14
Outline
- GMM-UBM for Speaker Verification
- GMM-SVM for Speaker Verification
- Data-Imbalance Problem in GMM-SVM
- Utterance Partitioning for GMM-SVM
- Results on NIST SRE

15
Data Imbalance in GMM-SVM
- For each target speaker, we have only one utterance (one GMM-supervector) from the target speaker but many utterances from the background speakers.
- So, we have a highly imbalanced learning problem: only one training vector comes from the target speaker.

16
Data Imbalance in GMM-SVM
The orientation of the decision boundary depends mainly on the impostor-class data.

17
Data Imbalance in GMM-SVM
[Figure: a 3-dimensional two-class problem illustrating that the SVM decision plane is largely governed by the impostor-class supervectors. Labels: impostor class; speaker class; the region in which the target-speaker vector can be located without changing the orientation of the decision plane.]

18
Outline
- GMM-UBM for Speaker Verification
- GMM-SVM for Speaker Verification
- Data-Imbalance Problem in GMM-SVM
- Utterance Partitioning for GMM-SVM
- Results on NIST SRE

19
Utterance Partitioning
Partition an enrollment utterance of a target speaker into a number of sub-utterances, with each sub-utterance producing one GMM-supervector.

20
Utterance Partitioning
[Figure: the target speaker's enrollment utterance and the background speakers' utterances go through feature extraction, then MAP adaptation of the UBM and mean stacking; the resulting supervectors are used in SVM training to produce the SVM of target speaker s.]

21
Length-Representation Trade-off
- When the number of partitions increases, the length of each sub-utterance decreases.
- If the sub-utterances are too short, their supervectors will be almost the same as the supervector corresponding to the UBM.
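The trade-off follows from the MAP adaptation coefficient $n_j/(n_j + r)$, where $n_j$ is the soft count of frames assigned to mixture j and r is the relevance factor (16 in the experiments). The numbers below are hypothetical: one component is assumed to receive 200 frames from the full utterance, split evenly across N partitions.

```python
def adaptation_coefficient(n_frames_for_component, relevance=16.0):
    """MAP coefficient n_j / (n_j + r): fraction of the adapted mean taken
    from the data rather than from the UBM mean."""
    return n_frames_for_component / (n_frames_for_component + relevance)

# Hypothetical: one mixture component "owns" 200 frames of the full utterance,
# and partitioning into N equal segments leaves about 200 / N frames per segment.
for n_parts in (1, 2, 4, 8, 16):
    print(n_parts, round(adaptation_coefficient(200 / n_parts), 3))
```

With these assumed counts, the coefficient falls from about 0.93 (N = 1) to about 0.44 (N = 16), so the adapted means of short sub-utterances stay close to the UBM means.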

22
Utterance Partitioning with Acoustic Vector Resampling (UP-AVR)
Goal: Increase the number of sub-utterances without compromising their representation power.
Procedure of UP-AVR:
1. Randomly rearrange the sequence of acoustic vectors in an utterance;
2. Partition the acoustic vectors of the utterance into N segments;
3. Repeat Steps 1 and 2 R times. Together with the supervector of the full utterance, this yields RN+1 target-speaker supervectors.
[Figure: the MFCC sequence before and after randomization.]
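The three steps can be sketched in Python as follows. The function returns frame subsets rather than supervectors; each subset would then go through MAP adaptation and mean stacking. Giving the leftover frames to the last segment is an assumption, as is the name `up_avr`.

```python
import random

def up_avr(frames, n_partitions, n_resamplings, seed=None):
    """Return the R*N + 1 frame subsets used to build target-speaker supervectors:
    the full utterance plus N partitions from each of R random shuffles."""
    rng = random.Random(seed)
    subsets = [list(frames)]                      # the full utterance (the "+1")
    for _ in range(n_resamplings):
        idx = list(range(len(frames)))
        rng.shuffle(idx)                          # Step 1: randomize frame order
        seg = len(idx) // n_partitions
        for k in range(n_partitions):             # Step 2: cut into N segments
            end = (k + 1) * seg if k < n_partitions - 1 else len(idx)
            subsets.append([frames[i] for i in idx[k * seg:end]])
    return subsets
```

Shuffling does not change the full-utterance supervector, because MAP statistics do not depend on frame order; it only changes which frames land in each sub-utterance, which is what lets R resamplings produce different sub-utterance supervectors.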

23
Utterance Partitioning with Acoustic Vector Resampling (UP-AVR)
[Figure: UP-AVR with N = 4. The target speaker's enrollment utterance undergoes feature extraction and index randomization, giving the full-utterance features $X_0^{(s)}$ and the sub-utterance features $X_1^{(s)}, \ldots, X_4^{(s)}$; each background speaker's utterance $utt^{(b_i)}$, $i = 1, \ldots, B$, likewise yields $X_0^{(b_i)}, \ldots, X_4^{(b_i)}$. MAP adaptation of the UBM and mean stacking convert these into the supervectors $\{\vec{m}_0^{(s)}, \ldots, \vec{m}_4^{(s)}, \vec{m}_0^{(b_1)}, \ldots, \vec{m}_4^{(b_B)}\}$, which are used in SVM training to produce the SVM of target speaker s.]

24
Utterance Partitioning with Acoustic Vector Resampling (UP-AVR)
Characteristics of the supervectors (SVs) created by UP-AVR:
- The average pairwise distance between sub-utterance SVs is larger than the average pairwise distance between the sub-utterance SVs and the full-utterance SV.
- The average pairwise distance between the speaker class's sub-utterance SVs and the impostor class's SVs is smaller than the average pairwise distance between the speaker class's full-utterance SV and the impostor class's SVs.
[Figure: impostor-class and speaker-class supervectors, distinguishing sub-utterance supervectors from the full-utterance supervector.]
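Both comparisons can be computed with a pair of helpers like these. The slide does not name the distance measure; Euclidean distance is assumed here, and the function names are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def avg_within(vectors):
    """Average distance over all unordered pairs within one set."""
    pairs = list(combinations(vectors, 2))
    return sum(euclidean(a, b) for a, b in pairs) / len(pairs)

def avg_between(set_a, set_b):
    """Average distance over all pairs (a, b) with a in set_a and b in set_b."""
    return sum(euclidean(a, b) for a in set_a for b in set_b) / (len(set_a) * len(set_b))
```

For example, the first property compares `avg_within(sub_svs)` with `avg_between(sub_svs, [full_sv])`.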

25
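Nuisance Attribute Projection (NAP) [Solomonoff et al., ICASSP2005]
Goal: To reduce the effect of session variability.
- Recall the GMM-supervector kernel: $K(X, Y) = \vec{m}_X^{\mathsf{T}} \vec{m}_Y$.
- Define the session- and speaker-dependent supervector as $\vec{m} = \vec{s} + \vec{h}$, where $\vec{s}$ depends on the speaker and $\vec{h}$ on the session.
- Remove the session-dependent part ($\vec{h}$) by removing the subspace that causes the session variability: $\tilde{\vec{m}} = (\mathbf{I} - \mathbf{V}\mathbf{V}^{\mathsf{T}})\vec{m}$, where the orthonormal columns of $\mathbf{V}$ span the session-variability subspace.
- The new kernel becomes $K(X, Y) = \vec{m}_X^{\mathsf{T}}(\mathbf{I} - \mathbf{V}\mathbf{V}^{\mathsf{T}})\vec{m}_Y$.
[Figure: the subspace representing session variability, defined by $\mathbf{V}$.]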
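The projection and the modified kernel can be sketched with plain lists, assuming V is given as a list of orthonormal vectors (one per column). Estimating V is not shown; a common approach is to take the leading eigenvectors of a session-variability (within-speaker) covariance matrix, which is an assumption here. The function names are hypothetical.

```python
def nap_project(m, V):
    """Return (I - V V^T) m, where V is a list of orthonormal column vectors
    spanning the session-variability subspace."""
    out = list(m)
    for v in V:
        coeff = sum(vi * mi for vi, mi in zip(v, m))   # v^T m (valid since V is orthonormal)
        out = [o - coeff * vi for o, vi in zip(out, v)]
    return out

def nap_kernel(mx, my, V):
    """K(X, Y) = m_X^T (I - V V^T) m_Y."""
    return sum(a * b for a, b in zip(nap_project(mx, V), my))
```

Because $\mathbf{I} - \mathbf{V}\mathbf{V}^{\mathsf{T}}$ is symmetric and idempotent, projecting only one argument gives the same kernel value as projecting both.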

26
Nuisance Attribute Projection (NAP) [Solomonoff et al., ICASSP2005]
[Figure: the subspace representing session variability, defined by V.]

27
Enrollment Process of GMM-SVM with UP-AVR
[Figure: MFCCs of an utterance from target speaker s, then resampling/partitioning, then MAP adaptation of the UBM and mean stacking (session-dependent supervectors), then NAP (session-independent supervectors), then SVM training, giving the SVM of target speaker s.]

28
Verification Process of GMM-SVM with UP-AVR
[Figure: MFCCs of a test utterance from claimant c, then MAP adaptation of the UBM and mean stacking (session-dependent supervector), then NAP (session-independent supervector), then SVM scoring with the SVM of target speaker s, then T-norm using the T-norm models, giving the normalized score.]

29
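T-Norm (Auckenthaler, 2000)
Goal: To shift and scale the verification scores so that a global decision threshold can be used for all speakers.
$\tilde{S}(X^{(c)}) = \dfrac{S(X^{(c)}) - \mu_T(X^{(c)})}{\sigma_T(X^{(c)})}$, where $\mu_T$ and $\sigma_T$ are the mean and standard deviation of the scores obtained by scoring the test utterance against the T-norm SVMs.
[Figure: the test utterance is scored by T-norm SVMs 1 to R; the mean and standard deviation of these scores are used to normalize the target speaker's SVM score.]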
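T-norm itself is a z-score against cohort scores. A minimal sketch, assuming the cohort scores are the same test utterance's scores against the T-norm SVMs (200 in the experiments); using the population standard deviation is an arbitrary choice here, and `t_norm` is a hypothetical name.

```python
import statistics

def t_norm(score, cohort_scores):
    """Normalize a target-model score with the mean and standard deviation of
    the same test utterance scored against the T-norm (cohort) models."""
    mu = statistics.mean(cohort_scores)
    sigma = statistics.pstdev(cohort_scores)
    return (score - mu) / sigma
```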

30
Outline
- GMM-UBM for Speaker Verification
- GMM-SVM for Speaker Verification
- Data-Imbalance Problem in GMM-SVM
- Utterance Partitioning for GMM-SVM
- Experiments on NIST SRE

31
Experiments: Speech Data
- Evaluations on NIST SRE 2002 and 2004
- NIST SRE 2002:
  - NIST'01 used for computing the UBMs, impostor-class supervectors of SVMs, T-norm models, and NAP parameters
  - 2983 true-speaker trials and impostor attempts
  - 2-min utterances for training and about 1-min utterances for testing
- NIST SRE 2004:
  - Fisher corpus used for computing the UBMs, impostor-class supervectors of SVMs, and T-norm models
  - NIST'99 and NIST'00 used for computing NAP parameters
  - 2386 true-speaker trials and impostor attempts
  - 5-min utterances for training and testing

32
Experiments: Features and Models
- 12 MFCCs + 12 ΔMFCCs with feature warping
- 1024-mixture GMMs for GMM-UBM
- 256-mixture GMMs for GMM-SVM
- MAP relevance factor = 16
- 300 impostor-class supervectors for GMM-SVM
- 200 T-norm models
- 64-dim session variability subspace (NAP corank, i.e., rank of V)
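These settings fix the size of each GMM-supervector. A quick check of the arithmetic, assuming the full 24-dimensional feature vector (12 MFCCs plus 12 ΔMFCCs) is adapted in every mixture:

```python
n_mixtures = 256        # GMM-SVM mixtures
feat_dim = 12 + 12      # 12 MFCCs + 12 delta-MFCCs
nap_corank = 64         # dimensions of the nuisance subspace removed by NAP

sv_dim = n_mixtures * feat_dim
print(sv_dim)                    # 6144
print(sv_dim - nap_corank)       # 6080: dimension of the subspace left after NAP
```

NAP does not shorten the vectors: the projected supervectors are still 6144-dimensional but lie in a 6080-dimensional subspace. Without UP-AVR, each target-speaker SVM is therefore trained in this space with one speaker-class vector against 300 impostor-class vectors.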

33
Results: No. of Mixtures in GMM-SVM (NIST'02)
[Figure: normalized feature variances; a large number of features have small variance. Annotation: the threshold below which the variance of a feature is deemed too small.]

34
Results: Effects of NAP on Different NIST SREs
[Figure: NAP eigenvalues for different NIST SREs; large eigenvalues mean large session variation.]

35
Results: Effect of NAP Corank on Performance
[Figure: performance vs. NAP corank, with the no-NAP case marked.]

36
Results: Comparing the Discriminative Power of GMM-SVM and GMM-SVM with UP-AVR
Fig. 4: Scores produced by SVMs that use one or more speaker-class supervectors (SVs) and 250 background SVs for training. The horizontal axis represents the training/testing SVs. Values inside the square brackets are the mean differences between speaker scores and impostor scores.

37
Results: EER and MinDCF vs. No. of Target-Speaker Supervectors (NIST'02)
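EER and minDCF can be computed by sweeping a decision threshold over the scores. This sketch uses the unnormalized detection cost with C_miss = 10, C_fa = 1, and P_target = 0.01, the parameters NIST used for its SRE evaluations of this period; treat these defaults as an assumption. The EER is approximated at the threshold where the miss and false-alarm rates are closest, and the function name is hypothetical.

```python
def eer_and_min_dcf(target_scores, impostor_scores, c_miss=10.0, c_fa=1.0, p_target=0.01):
    """Sweep the threshold over all observed scores (plus +inf); return (EER, minDCF)."""
    thresholds = sorted(set(target_scores) | set(impostor_scores)) + [float("inf")]
    n_t, n_i = len(target_scores), len(impostor_scores)
    best_gap, eer, min_dcf = float("inf"), None, float("inf")
    for th in thresholds:
        p_miss = sum(s < th for s in target_scores) / n_t    # true speakers rejected
        p_fa = sum(s >= th for s in impostor_scores) / n_i   # impostors accepted
        if abs(p_miss - p_fa) < best_gap:
            best_gap, eer = abs(p_miss - p_fa), (p_miss + p_fa) / 2
        dcf = c_miss * p_miss * p_target + c_fa * p_fa * (1 - p_target)
        min_dcf = min(min_dcf, dcf)
    return eer, min_dcf
```

The sweep is quadratic in the number of trials, which is fine for illustration; a sorted single pass is the usual choice for full evaluation sets.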

38
Results: Varying the Number of Resamplings (R) and the Number of Partitions (N) (NIST'02)

39
Table 1: Results on NIST'02 and NIST'04

40
Experiments and Results: Performance on NIST'02
[Figure: performance plot annotated with EER = 9.05%, 9.39%, and 8.16%.]

41
Experiments and Results: Performance on NIST'04
[Figure: performance of GMM-UBM, GMM-SVM, and GMM-SVM w/ UP-AVR, annotated with EER = 9.46%, 10.42%, and 16.05%.]

42
References
1. S.X. Zhang and M.W. Mak, "Optimized Discriminative Kernel for SVM Scoring and its Application to Speaker Verification," IEEE Trans. on Neural Networks, to appear.
2. M.W. Mak and W. Rao, "Utterance Partitioning with Acoustic Vector Resampling for GMM-SVM Speaker Verification," Speech Communication, vol. 53 (1), Jan. 2011.
3. M.W. Mak and W. Rao, "Acoustic Vector Resampling for GMMSVM-Based Speaker Verification," Interspeech, Sept. 2010, Makuhari, Japan.
4. S.Y. Kung, M.W. Mak, and S.H. Lin, Biometric Authentication: A Machine Learning Approach, Prentice Hall.
5. W.M. Campbell, D.E. Sturim, and D.A. Reynolds, "Support vector machines using GMM supervectors for speaker verification," IEEE Signal Processing Letters, vol. 13, pp. 308–311.
6. D.A. Reynolds, T.F. Quatieri, and R.B. Dunn, "Speaker verification using adapted Gaussian mixture models," Digital Signal Processing, vol. 10, pp. 19–41.
