Variational Infinite Hidden Conditional Random Fields with Coupled Dirichlet Process Mixtures K. Bousmalis, S. Zafeiriou, L.-P. Morency, M. Pantic, Z.

1 Variational Infinite Hidden Conditional Random Fields with Coupled Dirichlet Process Mixtures K. Bousmalis, S. Zafeiriou, L.-P. Morency, M. Pantic, Z. Ghahramani

2 Hidden Conditional Random Field
[Figure: an HCRF over a sequence of observations $X_1, \dots, X_5$ (per-frame features such as Head Nod, Head Shake, Shoulder Shrug, Hand Wag, Hand Scissor, …, F0, Energy), hidden-state variables $s_1, \dots, s_5$ taking values in the hidden states $h_a, h_b, h_c$, and a sequence label $y$.]
P(y = 'Agreement' | X) = ?
P(y = 'Disagreement' | X) = ?

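3 Learned HCRF Model
Weights and equivalent potentials for each relationship:
– hidden states and labels: weights $\theta_y$, potential $\exp\{\theta_y\}$
– features and hidden states: weights $\theta_x$, potential $\exp\{\sum_t f_t\,\theta_x\}$
– transitions among hidden states and labels: weights $\theta_e$, potential $\exp\{\theta_e\}$
[Figure: the HCRF graph with observations $X_1, \dots, X_5$, hidden states $s_1, \dots, s_5$ taking values in $\{h_a, h_b, h_c\}$, and label $y$.]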
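
To make the three potentials concrete, here is a minimal Python sketch of how a finite HCRF scores a sequence. It assumes the standard HCRF form in which the log-potential Ψ(y, s, X) adds f_t·θ_x[s_t] and θ_y[y][s_t] at every frame plus θ_e[y][s_{t-1}][s_t] for every transition, and P(y | X) sums exp Ψ over all hidden-state sequences. The array layout, the helper names (`node_score`, `log_label_score`, `hcrf_posterior`) and the random toy parameters are illustrative assumptions, not the authors' code; the forward recursion is checked against brute-force enumeration.

```python
import math
import random
from itertools import product


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def node_score(theta_x, theta_y, y, h, f_t):
    """Log node potential: sum_d f_t[d] * theta_x[h][d] + theta_y[y][h]."""
    return sum(f * w for f, w in zip(f_t, theta_x[h])) + theta_y[y][h]


def log_label_score(theta_x, theta_y, theta_e, y, X):
    """log of the sum over all hidden-state sequences s of exp(Psi(y, s, X)),
    computed with the forward recursion in log space."""
    H = len(theta_x)
    alpha = [node_score(theta_x, theta_y, y, h, X[0]) for h in range(H)]
    for f_t in X[1:]:
        alpha = [
            node_score(theta_x, theta_y, y, h, f_t)
            + logsumexp([alpha[a] + theta_e[y][a][h] for a in range(H)])
            for h in range(H)
        ]
    return logsumexp(alpha)


def hcrf_posterior(theta_x, theta_y, theta_e, X):
    """P(y | X) for every label y, normalizing the per-label scores."""
    scores = [log_label_score(theta_x, theta_y, theta_e, y, X)
              for y in range(len(theta_y))]
    log_z = logsumexp(scores)
    return [math.exp(s - log_z) for s in scores]


def brute_force_posterior(theta_x, theta_y, theta_e, X):
    """Same quantity by enumerating all H**T sequences (small problems only)."""
    H, T = len(theta_x), len(X)
    scores = []
    for y in range(len(theta_y)):
        seq_scores = []
        for s in product(range(H), repeat=T):
            psi = sum(node_score(theta_x, theta_y, y, s[t], X[t]) for t in range(T))
            psi += sum(theta_e[y][s[t - 1]][s[t]] for t in range(1, T))
            seq_scores.append(psi)
        scores.append(logsumexp(seq_scores))
    log_z = logsumexp(scores)
    return [math.exp(s - log_z) for s in scores]


# Toy problem: 2 labels (e.g. Agreement / Disagreement), 3 hidden states,
# 2 features per frame, 5 frames; all values are random placeholders.
rng = random.Random(0)
H, Y, D, T = 3, 2, 2, 5
theta_x = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(H)]
theta_y = [[rng.gauss(0, 1) for _ in range(H)] for _ in range(Y)]
theta_e = [[[rng.gauss(0, 1) for _ in range(H)] for _ in range(H)] for _ in range(Y)]
X = [[rng.random() for _ in range(D)] for _ in range(T)]

posterior = hcrf_posterior(theta_x, theta_y, theta_e, X)
print(posterior)
```

The forward recursion costs O(T·H²) per label instead of the O(H^T) of enumeration, which is why the brute-force version is only useful as a correctness check.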

4 HCRF Problems
– The number of hidden states is not intuitive for behavior problems
– Cross-validating the number of hidden states is computationally expensive
Solution: allow for a potentially infinite number of hidden states
[Figure: the HCRF graph with observations $X_1, \dots, X_5$, hidden states $s_1, \dots, s_5$ taking values in $\{h_a, h_b, h_c\}$, and label $y$.]

5 Motivation and Novelty
Previous work introduced infinite-state HCRFs learned with an efficient MCMC sampling approach (IHCRF-MCMC).
This work proposes a model that generalizes:
– finite HCRFs, in its ability to determine its hidden structure automatically, without cross-validation
– IHCRF-MCMC, in its ability to handle continuous input gracefully
We present a novel variational inference method for learning:
– a deterministic alternative to MCMC
– a precise learning stopping criterion

6 Our Framework
No a priori bound on the number of hidden states, achieved by introducing a set of random variables (the π-variables).
These are drawn by distinct processes that allow the number of hidden states to grow with the data…
…and are incorporated in our potential:

7 The HCRF-DPM Model
In our model, the π-variables are driven by coupled DPs. According to the stick-breaking properties:
where $\omega_\mu = \{h_k, y\}$.
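
The stick-breaking equation itself is missing from this transcript. As a generic reference point, the sketch below draws the first few weights of a single DP with the textbook GEM(α) construction, β_k ~ Beta(1, α), π_k = β_k ∏_{j<k}(1 − β_j). How the paper couples several such processes through ω_μ is not reproduced here; `alpha`, `num_sticks` and the seed are arbitrary illustrative choices.

```python
import random


def stick_breaking(alpha, num_sticks, rng):
    """First `num_sticks` weights of pi ~ GEM(alpha).

    beta_k ~ Beta(1, alpha) and pi_k = beta_k * prod_{j<k} (1 - beta_j).
    Also returns the length of stick that has not been broken off yet.
    """
    weights = []
    remaining = 1.0  # stick length left after the previous breaks
    for _ in range(num_sticks):
        beta_k = rng.betavariate(1.0, alpha)
        weights.append(beta_k * remaining)
        remaining *= 1.0 - beta_k
    return weights, remaining


rng = random.Random(1)
weights, leftover = stick_breaking(alpha=2.0, num_sticks=20, rng=rng)
print([round(w, 3) for w in weights[:5]], round(leftover, 4))
```

Larger α makes each β_k smaller on average, so the mass is spread over more sticks, which is what lets the number of effectively used hidden states adapt.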

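8 The HCRF-DPM Model
[Figure: graphical model with observations $X$, label $y$, hidden states $s_1, s_2, \dots, s_T$, hyperparameters $\alpha_x, \alpha_y, \alpha_e$, and π-variables $\pi_x, \pi_y, \pi_e$, with ∞ marking the unbounded number of hidden states.]
Actual Joint Distribution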

9 Variational Approximation
We approximate all π-variables (variational parameters τ) with a truncated stick-breaking representation, which approximates the infinite number of hidden states with a finite number L.
[Figure: truncated stick-breaking with L = 5.]
In practice, L is chosen large enough that the remaining mass of the π-variables beyond the L-th stick is very small.
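
A minimal sketch of the truncation, assuming the common variational convention of fixing the L-th stick proportion to 1 so that the L retained weights sum to one. The second function gives the prior mean of the mass that an untruncated GEM(α) draw would place beyond the first L sticks, (α/(1+α))^L, which shrinks geometrically with L. The function names and the values of α and L are illustrative.

```python
def truncated_stick_weights(betas):
    """Weights pi_1..pi_L from stick proportions beta_1..beta_{L-1}.

    The L-th proportion is fixed to 1, so all remaining stick goes to pi_L
    and pi_k = 0 for every k > L.
    """
    weights, remaining = [], 1.0
    for beta_k in list(betas) + [1.0]:
        weights.append(beta_k * remaining)
        remaining *= 1.0 - beta_k
    return weights


def expected_leftover_mass(alpha, L):
    """Prior mean of the untruncated mass beyond the first L sticks,
    E[prod_{k=1}^{L} (1 - beta_k)] with beta_k ~ Beta(1, alpha) i.i.d."""
    return (alpha / (1.0 + alpha)) ** L


print(truncated_stick_weights([0.5, 0.5, 0.5, 0.5]))  # L = 5
for L in (5, 10, 50):
    print(L, expected_leftover_mass(1.0, L))
```

With α = 1 the expected leftover mass is 0.5^L: about 0.031 for L = 5 and below 0.001 for L = 10.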

10 Model Training
Objective: find the parameters {θ, τ} that minimize KL[q||p].
We alternate, until convergence, between:
– a coordinate descent method for finding τ
– an HCRF-like gradient ascent method for finding θ
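
The alternation can be sketched as block-coordinate optimization with a stopping rule on the change in the objective. In this hedged Python sketch, the quadratic `objective` and its closed-form block updates are a toy stand-in for KL[q||p] and for the actual τ and θ updates, which this slide does not spell out.

```python
def alternate(update_tau, update_theta, objective, tau, theta,
              tol=1e-10, max_iters=1000):
    """Alternate a tau-step and a theta-step until the objective stops
    decreasing by more than `tol` (or max_iters is reached)."""
    previous = objective(tau, theta)
    current = previous
    for itr in range(1, max_iters + 1):
        tau = update_tau(theta)      # block update of tau with theta fixed
        theta = update_theta(tau)    # block update of theta with tau fixed
        current = objective(tau, theta)
        if previous - current < tol:  # stopping criterion on the objective
            break
        previous = current
    return tau, theta, current, itr


# Toy convex objective standing in for KL[q||p]; each block minimizer is closed-form.
def objective(tau, theta):
    return (tau - theta) ** 2 + (theta - 3.0) ** 2 + 0.5 * tau ** 2


tau, theta, value, iters = alternate(
    update_tau=lambda th: 2.0 * th / 3.0,     # argmin over tau for fixed theta
    update_theta=lambda ta: (ta + 3.0) / 2.0,  # argmin over theta for fixed tau
    objective=objective, tau=0.0, theta=0.0)
print(round(tau, 4), round(theta, 4), round(value, 4), iters)
```

Because each block step can only lower the objective, the decrease per sweep is non-negative, and a small threshold on it gives a well-defined stopping point.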

11 Experiments: Human Behavior
Classification performance (F1) on:
1. Agreement vs. Disagreement (ADA2)
2. Agreement vs. Disagreement vs. Neutral (ADA3)
3. Extreme Pain vs. No Pain (PAIN2)
4. Extreme vs. Moderate vs. No Pain (PAIN3)
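
The slides report F1 without stating how it is averaged over classes. Assuming per-class F1 = 2TP/(2TP + FP + FN) followed by an unweighted (macro) mean, the computation looks like this; the labels and predictions below are made up for illustration.

```python
def f1_scores(y_true, y_pred, labels):
    """Per-class F1 = 2TP / (2TP + FP + FN) and their unweighted (macro) mean."""
    per_class = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    macro = sum(per_class.values()) / len(labels)
    return per_class, macro


# Hypothetical ADA2-style predictions (A = agreement, D = disagreement).
y_true = ["A", "A", "D", "D", "D"]
y_pred = ["A", "D", "D", "D", "A"]
per_class, macro = f1_scores(y_true, y_pred, labels=["A", "D"])
print(per_class, macro)
```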

12 Agreement and Disagreement
Canal 9 Dataset of Political Debates
– Ground truth based ONLY on verbal content
– 11 debates, 28 distinct individuals
– 53 episodes of agreement
– 94 episodes of disagreement
– 130 neutral episodes
Binary visual features: presence per frame of 8 gestures
Continuous auditory features: F0, Energy

13 UNBC Dataset of Pain
Different levels of elicited shoulder pain in 200 sequences from 25 subjects
Annotations of 12 pain-related facial action units (AUs)
2 classification problems:
– Extreme pain vs. minimal pain
– The same problem, also including moderate pain

14 Classification Performance
10 different random initializations
HCRFs cross-validated for:
– 2, 3, 4 and 5 hidden states
– regularization factors of 1, 10 and 100
HCRF-DPM: L = 10
[Chart: F1 scores.]
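
Counting the finite-HCRF grid shows why cross-validating the number of hidden states is costly. Assuming every configuration is trained from each of the 10 random initializations, the grid on this slide amounts to the following number of training runs per training/validation split:

```python
from itertools import product

hidden_states = [2, 3, 4, 5]
regularization = [1, 10, 100]
num_initializations = 10

# Every (number of hidden states, regularization factor) pair is one HCRF configuration.
configurations = list(product(hidden_states, regularization))
# Assumption: each configuration is trained once per random initialization.
training_runs = len(configurations) * num_initializations
print(len(configurations), training_runs)  # 12 120
```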

15 No Overfitting
[Figure: HCRF-DPM performance on the Canal 9 validation set.]

16 Sparsity
[Figure, two panels: "Node Features: HCRF-DPM, L = 50" and "Node Features: finite HCRF, K = 50"; horizontal axis: Hidden States (10, 20, 30, 40, 50).]

17 Future Avenues
– More datasets
– Using HDPs and Pitman-Yor processes
– Infinite Latent Dynamic CRFs

18 Thank you! Poster Stand #46 for more details

19 Dirichlet Process Mixture
A DPM model: a hierarchical Bayesian model that uses a DP as a nonparametric prior.
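
One way to see why a DP prior lets the number of components grow with the data is the Chinese restaurant process, the sequential predictive view of a DP. This Python sketch samples CRP assignments and compares the number of occupied components with its prior mean Σ_{i<n} α/(α+i), which grows roughly like α log n. It illustrates the prior only, not the paper's inference; `alpha` and the seed are arbitrary.

```python
import random


def chinese_restaurant_process(n, alpha, rng):
    """Assign n items to components under a CRP(alpha) prior.

    Item i (0-based) joins existing component k with probability
    n_k / (i + alpha) and opens a new component with probability
    alpha / (i + alpha).
    """
    counts, assignments = [], []
    for i in range(n):
        r = rng.random() * (i + alpha)
        cumulative = 0.0
        for k, n_k in enumerate(counts):
            cumulative += n_k
            if r < cumulative:
                counts[k] += 1
                assignments.append(k)
                break
        else:  # r fell in the alpha-sized slice: open a new component
            counts.append(1)
            assignments.append(len(counts) - 1)
    return assignments, counts


def expected_num_components(alpha, n):
    """Prior mean number of occupied components: sum_{i<n} alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))


rng = random.Random(0)
for n in (10, 100, 1000):
    _, counts = chinese_restaurant_process(n, alpha=1.0, rng=rng)
    print(n, len(counts), round(expected_num_components(1.0, n), 2))
```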

20 HCRF-DPM π-sticks
Although the π-variables are drawn by distinct processes, they are coupled together by a common latent variable assignment.
[Figure: π-sticks for Feature 1, Feature 2, Label 1, Label 2, Prev. State 1, Prev. State 2 and Prev. State 3.]

21 Variational Approximation
Actual Joint Distribution
Approximate Joint Distribution

22 Model Training for Variational HCRF-DPM
Initialize α_x, α_y, α_e, θ, τ
Initialize nbItrs, nbVarItrs
itr = 0
converged = FALSE
while (not converged) and (itr < nbItrs) do
    varItr = 0
    varConverged = FALSE
    while (not varConverged) and (varItr < nbVarItrs) do
        Compute q(s_t = h_k | i), q(s_t = h_k | y), q(s_t = h_k, y, s_{t-1} = h_a), i.e. the approximate marginals, by using the forward-backward algorithm
        Hyperparameter posterior sampling for α_x, α_y, α_e
        Calculate KL[q||p](varItr)
        Update τ
        varItr = varItr + 1
    end while
    Gradient ascent to find θ(itr) by using a quasi-Newton method and an Armijo backtracking line search with projected gradients to keep θ non-negative
    itr = itr + 1
end while
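
The θ-step combines a quasi-Newton direction with an Armijo backtracking line search and a projection that keeps θ non-negative. The sketch below keeps only the projection and the Armijo backtracking, using the plain gradient as the search direction instead of a quasi-Newton one; the function names and the toy objective are assumptions for illustration.

```python
def project_nonneg(theta):
    """Euclidean projection onto the non-negative orthant."""
    return [max(0.0, v) for v in theta]


def projected_gradient_ascent(objective, gradient, theta, step0=1.0,
                              shrink=0.5, sigma=1e-4, min_step=1e-12,
                              tol=1e-12, max_iters=500):
    """Maximize objective(theta) subject to theta >= 0.

    Each iteration backtracks from step0 until the projected Armijo
    condition f(P(theta + s*g)) >= f(theta) + sigma * g.(P(theta + s*g) - theta)
    holds, then moves to the projected point.
    """
    theta = project_nonneg(theta)
    value = objective(theta)
    for _ in range(max_iters):
        grad = gradient(theta)
        step, accepted = step0, False
        while step >= min_step:
            candidate = project_nonneg([t + step * g for t, g in zip(theta, grad)])
            gain = sum(g * (c - t) for g, c, t in zip(grad, candidate, theta))
            candidate_value = objective(candidate)
            if candidate_value >= value + sigma * gain:
                accepted = True
                break
            step *= shrink  # backtrack
        if not accepted:
            break
        improvement = candidate_value - value
        theta, value = candidate, candidate_value
        if improvement < tol:
            break
    return theta, value


# Toy concave objective whose unconstrained maximum (2, -1) violates theta >= 0;
# the constrained maximizer is (2, 0).
f = lambda th: -(th[0] - 2.0) ** 2 - (th[1] + 1.0) ** 2
grad_f = lambda th: [-2.0 * (th[0] - 2.0), -2.0 * (th[1] + 1.0)]
theta_hat, f_hat = projected_gradient_ascent(f, grad_f, [0.5, 0.5])
print(theta_hat, f_hat)  # [2.0, 0.0] -1.0
```

Measuring the Armijo gain along the projected step, rather than along the raw gradient, is what keeps the sufficient-increase test meaningful once coordinates hit the θ ≥ 0 boundary.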

23 Performance Evaluation
Classification of Agreement and Disagreement
– Leave-2-debates-out for testing (5 folds)
– Optimal parameter choice based on 3 debates
Classification of Pain Levels
– Leave-1-subject-out for testing (25 folds)
– Optimal parameter choice based on 7 subjects
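
A minimal sketch of leave-groups-out splitting, shown for the pain setting (one subject held out per fold, 25 folds). The subject ids are hypothetical, and the sketch does not attempt to reproduce how the 11 debates are arranged into 5 folds or how the validation debates and subjects are chosen, since the slide does not specify it.

```python
def leave_groups_out(groups, k):
    """Split sample indices into folds that each hold out k distinct groups.

    groups[i] is the group id (e.g. subject) of sample i; group ids are taken
    in sorted order and chunked into consecutive blocks of size k.
    """
    ids = sorted(set(groups))
    folds = []
    for start in range(0, len(ids), k):
        held_out = set(ids[start:start + k])
        test = [i for i, g in enumerate(groups) if g in held_out]
        train = [i for i, g in enumerate(groups) if g not in held_out]
        folds.append((sorted(held_out), train, test))
    return folds


# Hypothetical subject ids for 200 sequences from 25 subjects.
subjects = [i % 25 for i in range(200)]
folds = leave_groups_out(subjects, k=1)
print(len(folds))  # 25
```

Splitting by subject rather than by sequence keeps every sequence of a test subject out of training, so the reported F1 reflects generalization to unseen people.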

24 Synthetic Dataset
[Figure: the synthetic data-generating model, with its components annotated by probabilities of 0.4, 0.7 and 0.1.]

