
1 Regularized Adaptation for Discriminative Classifiers — Xiao Li and Jeff Bilmes, University of Washington, Seattle

2 This work…
- Investigates links between a number of discriminative classifiers
- Presents a general adaptation strategy: "regularized adaptation"

3 Adaptation for generative models
- In adaptation, the target sample distribution differs from the training distribution
- Adaptation has long been studied in speech recognition for generative models:
  - Maximum likelihood linear regression
  - Maximum a posteriori
  - Eigenvoice

4 Discriminative classifiers
- Directly model the conditional relation of a label given features
- Often yield more robust classification performance than generative models
- Popular examples:
  - Support vector machines (SVMs)
  - Multi-layer perceptrons (MLPs)
  - Conditional maximum entropy models

5 Existing discriminative adaptation strategies
- SVMs:
  - Combine SVs with selected adaptation data (Matic 93)
  - Combine selected SVs with adaptation data (Li 05)
- MLPs:
  - Linear input network (Neto 95, Abrash 97)
  - Retrain both layers from the unadapted model (Neto 95)
  - Retrain part of the last layer (Stadermann 05)
  - Retrain the first layer
- Conditional MaxEnt:
  - Gaussian prior (Chelba 04)

6 SVMs and MLPs – links
- Binary classification: training pairs (x_t, y_t), with labels y_t ∈ {−1, +1}
- Discriminant function, with nonlinear transform Φ_θ:
  f(x) = w^T Φ_θ(x) + b
- Accuracy-regularization objective (empirical risk + regularizer; see the sketch below):
  min_w (1/T) Σ_t Q(y_t f(x_t)) + λ ||w||²
- The regularizer corresponds to:
  - SVM: maximum margin
  - MLP: weight decay
  - MaxEnt: Gaussian smoothing
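A minimal numpy sketch of this shared objective (not from the slides; all function and variable names are illustrative). `hinge` gives the SVM instantiation of Q and `log_loss` the MLP/MaxEnt one:

```python
import numpy as np

def hinge(z):
    """SVM loss: max(0, 1 - z), applied elementwise to z = y * f(x)."""
    return np.maximum(0.0, 1.0 - z)

def log_loss(z):
    """MLP/MaxEnt loss: log(1 + exp(-z)) for z = y * f(x)."""
    return np.log1p(np.exp(-z))

def objective(w, b, Phi, y, Q, lam):
    """Accuracy-regularization objective from slide 6:
    (1/T) * sum_t Q(y_t * f(x_t)) + lam * ||w||^2,
    with f(x) = w . Phi_theta(x) + b.
    Phi: (T, d) matrix of transformed features Phi_theta(x_t);
    y:   (T,) labels in {-1, +1}."""
    f = Phi @ w + b                              # discriminant function values
    return Q(y * f).mean() + lam * np.dot(w, w)  # empirical risk + regularizer
```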

7 SVMs and MLPs – differences

        | Nonlinear transform Φ_θ | Typical loss func. Q | Typical training
  SVMs  | Reproducing kernel      | Hinge loss           | Quadratic prog.
  MLPs  | Input-to-hidden layer   | Log loss             | Gradient descent
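To make the "typical training" column concrete, here is a hedged (sub)gradient-descent sketch for the shared objective of slide 6, the usual route for MLPs with log loss (an SVM with hinge loss would typically be solved by quadratic programming instead); names are illustrative:

```python
import numpy as np

def train_gd(Phi, y, Q_grad, lam, lr=0.1, steps=500):
    """(Sub)gradient descent on the accuracy-regularization objective.
    Q_grad(z) returns dQ/dz, e.g.
      hinge:    lambda z: -(z < 1).astype(float)
      log loss: lambda z: -1.0 / (1.0 + np.exp(z))"""
    T, d = Phi.shape
    w, b = np.zeros(d), 0.0
    for _ in range(steps):
        z = y * (Phi @ w + b)
        g = Q_grad(z) * y                        # chain rule through z = y * f(x)
        w -= lr * (Phi.T @ g / T + 2 * lam * w)  # risk gradient + regularizer gradient
        b -= lr * g.mean()
    return w, b
```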

8 Adaptation
- Adaptation data
  - May be available only in small amounts
  - May be unbalanced across classes
- We intend to utilize
  - The unadapted model w⁰
  - The adaptation data (x_t, y_t), t = 1…T

9 Regularized adaptation
- Generalized objective w.r.t. the adaptation data: the empirical risk on the adaptation data plus a penalty on deviation from the unadapted model w⁰ (see the sketch below):
  min_w Σ_t Q(y_t f(x_t)) + λ ||w − w⁰||²
- Special cases recover existing SVM adaptation algorithms:
  - Hinge loss: retraining the SVM
  - Hard boosting on margin errors (Matic 93)
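A sketch of the generalized adaptation objective as reconstructed above, assuming a squared-distance penalty to the unadapted weights w⁰ (names are illustrative, not the authors' code):

```python
import numpy as np

def adapt_objective(w, b, Phi, y, Q, lam, w0):
    """Regularized adaptation objective: empirical risk on the
    adaptation data (x_t, y_t), t = 1..T, plus a penalty on the
    deviation of w from the unadapted model w0.
    With Q = hinge and lam -> 0, this reduces to plain SVM
    retraining on the adaptation data alone."""
    f = Phi @ w + b
    dev = w - w0
    return Q(y * f).mean() + lam * np.dot(dev, dev)
```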

10 New regularized adaptation for SVMs
- Soft boosting: combine margin errors on the adaptation data
- [Figure: decision boundaries of the unadapted model (distance d₀) and of a decision function trained on the adaptation data only]

11 Regularized adaptation for SVMs (cont.)
- Theorem, for linear SVMs
- In practice, we use α = 1

12 Regularized adaptation for MLPs
- Extend the objective to a two-layer MLP, with penalty weights μ and ν on the two layers (a sketch follows this list)
- Relations with existing MLP adaptation algorithms:
  - Linear input network: μ → ∞
  - Retrain from the SI model: μ = 0, ν = 0
  - Retrain last layer: μ = 0, ν → ∞
  - Retrain first layer: μ → ∞, ν = 0
  - Regularized: choose μ, ν on a dev set
- This also relates to MaxEnt adaptation using Gaussian priors
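An illustrative numpy sketch of the two-layer objective, assuming (to match the limiting cases listed above) that μ penalizes the last, hidden-to-output layer and ν the first, input-to-hidden layer; all names are assumptions, not the authors' code:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_adapt_objective(W1, W2, X, y, W1_0, W2_0, mu, nu):
    """Two-layer MLP adaptation objective (illustrative):
    log loss on the adaptation data
      + mu * ||W2 - W2_0||^2   (last, hidden-to-output layer)
      + nu * ||W1 - W1_0||^2   (first, input-to-hidden layer).
    Limiting cases from the slide:
      mu -> inf          freezes the last layer (retrain first layer / LIN)
      mu = 0, nu -> inf  freezes the first layer (retrain last layer)
      mu = nu = 0        plain retraining from the SI model."""
    h = sigmoid(X @ W1)                     # hidden layer = learned transform Phi_theta
    f = h @ W2                              # output score f(x)
    risk = np.log1p(np.exp(-y * f)).mean()  # log loss, as in slide 7
    return (risk
            + mu * np.sum((W2 - W2_0) ** 2)
            + nu * np.sum((W1 - W1_0) ** 2))
```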

13 Experiments – vowel classification
- Application: the Vocal Joystick
  - A voice-based computer interface for individuals with motor impairments
  - Vowel quality → angle
- Data set (extended)
  - Train/dev/eval: 21/4/10 speakers
  - 6-fold cross-validation
- MLP configuration (input construction sketched below)
  - 7 frames of MFCCs + deltas
  - 50 hidden nodes
- Metric: frame-level classification error rate
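For concreteness, a hedged sketch of the input construction described above: stacking a 7-frame context window of MFCC + delta vectors into one MLP input (the frame count is from the slide; the code and its edge handling are illustrative):

```python
import numpy as np

def stack_frames(feats, context=3):
    """Stack each frame with `context` frames on either side into one
    MLP input vector (2*3 + 1 = 7 frames, as on the slide).
    feats: (num_frames, dim) array of MFCC + delta features.
    Edges are padded by repeating the first/last frame."""
    T = feats.shape[0]
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    return np.stack([padded[t:t + 2 * context + 1].ravel() for t in range(T)])
```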

14 Varying adaptation time

  Err%   4-class                  8-class
  SI     7.60 ± 0.08              32.02 ± 0.31

         1s      2s      3s       1s      2s      3s
         1.16    0.41    0.34     13.52   11.81   11.96
         1.63    0.21    0.53     12.15   9.64    7.88
         2.93    1.66    1.91     15.45   13.32   11.40
         0.79    0.23    0.12     11.56   9.12    7.35
         0.22    0.19    0.12     11.56   8.16    7.30

  [Row labels for the five adaptation methods did not survive the transcript.]

15–16 Varying # vowels in adaptation (3s each)
[Figure: error rate vs. number of vowel classes in the adaptation data; SI baseline: 32%]

17–18 Varying # vowels in adaptation (3s total)
[Figure: error rate vs. number of vowel classes in the adaptation data; SI baseline: 32%]

19 Summary
- Drew links between a number of discriminative classifiers
- Presented a general notion of "regularized adaptation" for discriminative classifiers
  - Natural adaptation strategies for SVMs and MLPs, justified by a maximum-margin argument
  - A unified view of different adaptation algorithms
- MLP experiments show superior performance, especially on class-skewed adaptation data

