Latent Variables Naman Agarwal Michael Nute May 1, 2013.

Latent Variables: Contents
Definition & Example of Latent Variables
EM Algorithm Refresher
Structured SVM with Latent Variables
Learning under semi-supervision or indirect supervision
–CoDL
–Posterior Regularization
–Indirect Supervision

Latent Variables: General Definition & Examples
A latent variable in a machine learning algorithm is one that is assumed to exist (or have null value) but that is not observed and is instead inferred from other observed variables.
It generally corresponds to some meaningful element of the problem for which direct supervision is intractable.
Latent variable methods often treat the variable as part of the input/feature space (e.g. PCA, factor analysis) or as part of the output space (e.g. EM).
–This distinction is only illustrative and can be blurred, as we will see with indirect supervision.
Latent Input Variables: [diagram label: (unobserved)]
Latent Output Variables: When we think of a latent variable as part of the output space, the method becomes an exercise in unsupervised or semi-supervised learning. [diagram labels: (unobserved), (observed)]

Example: Paraphrase Identification
Problem: Given sentences A and B, determine whether they are paraphrases of each other.
Note that if they are paraphrases, then there will exist a mapping between the named entities and predicates in the two sentences. The mapping is not directly observed, but it is a latent variable in the decision problem of determining whether the sentences say the same thing.
A: Druce will face murder charges, Conte said.
B: Conte said Druce will be charged with murder.
(The mapping between A and B is latent.)
Revised Problem: Given sentences A and B, determine the mapping of semantic elements between A and B.
Now we are trying to learn the mapping itself, so we can use the Boolean question from the previous problem as a latent variable. In practice, the Boolean question is easy to answer, so we can use it to guide the semi-supervised task of mapping semantic elements. This is called indirect supervision (more on that later).
1 Example taken from a talk by D. Roth, "Constraints Driven Structured Learning with Indirect Supervision", Language Technologies Institute Colloquium, Carnegie Mellon University, Pittsburgh, PA, April 2010.

The EM Algorithm: Refresher
In practice, many algorithms that use latent variables have a structure similar to the Expectation-Maximization (EM) algorithm, even though EM is not discriminative and some of the others are. So let's review:

The EM Algorithm: Hard EM vs. Soft EM
(repeat until convergence)
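
The contrast between hard and soft EM can be sketched on a toy problem. The example below is only an illustration and is not taken from the slides: it assumes a 1-D mixture of two Gaussians with unit variance and equal weights, and the function name `em_two_means` and its parameters are hypothetical. The only difference between the two variants is the E-step: soft EM keeps the posterior responsibilities, while hard EM rounds them to a single assignment.

```python
import math


def em_two_means(xs, mu_init, hard, iters=50):
    """EM for a 1-D mixture of two unit-variance, equal-weight Gaussians.

    hard=True  -> hard EM: each point is assigned to its most likely component.
    hard=False -> soft EM: each point is shared according to its posterior.
    """
    mu0, mu1 = mu_init
    for _ in range(iters):  # fixed iteration count stands in for a convergence test
        # E-step: posterior probability that each point came from component 1.
        resp = []
        for x in xs:
            # d = log p(x | component 0) - log p(x | component 1)
            d = 0.5 * ((x - mu1) ** 2 - (x - mu0) ** 2)
            r1 = 1.0 / (1.0 + math.exp(d))
            if hard:
                r1 = 1.0 if r1 >= 0.5 else 0.0
            resp.append(r1)
        # M-step: re-estimate each mean as a responsibility-weighted average.
        w1 = sum(resp)
        w0 = len(xs) - w1
        if w0 > 0:
            mu0 = sum((1.0 - r) * x for r, x in zip(resp, xs)) / w0
        if w1 > 0:
            mu1 = sum(r * x for r, x in zip(resp, xs)) / w1
    return mu0, mu1


if __name__ == "__main__":
    data = [-2.1, -1.9, -2.0, 2.0, 2.1, 1.9]
    print("hard EM:", em_two_means(data, (-1.0, 1.0), hard=True))
    print("soft EM:", em_two_means(data, (-1.0, 1.0), hard=False))
```

On well-separated data the two variants land on nearly the same means; soft EM pulls each mean very slightly toward the other cluster because every point keeps a small responsibility under both components.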

Yu & Joachims: Learning Structural SVMs with Latent Variables
Model Formulation
The problem is now the difference of two convex functions, so we can solve it using the concave-convex procedure (CCCP).
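
For reference, the objective this slide refers to can be sketched as follows. This is a reconstruction based on the formulation in Yu & Joachims (2009), not on the slide itself, so the notation is an assumption: w is the weight vector, Φ the joint feature map, Δ the loss, C the regularization constant, and h the latent variable.

```latex
\min_{w}\;
\underbrace{\frac{1}{2}\lVert w\rVert^{2}
  + C\sum_{i=1}^{n}\max_{(\hat{y},\hat{h})}
    \Big[\, w\cdot\Phi(x_i,\hat{y},\hat{h}) + \Delta(y_i,\hat{y},\hat{h}) \,\Big]}_{\text{convex in } w}
\;-\;
\underbrace{C\sum_{i=1}^{n}\max_{h}\; w\cdot\Phi(x_i,y_i,h)}_{\text{convex in } w}

% CCCP alternates two steps until convergence:
% (1) impute the latent variables with the current weights
h_i^{*} = \arg\max_{h}\; w_t\cdot\Phi(x_i,y_i,h)
% (2) solve the resulting convex structural SVM with h_i^{*} held fixed
w_{t+1} = \arg\min_{w}\;
  \frac{1}{2}\lVert w\rVert^{2}
  + C\sum_{i=1}^{n}\max_{(\hat{y},\hat{h})}
    \Big[\, w\cdot\Phi(x_i,\hat{y},\hat{h}) + \Delta(y_i,\hat{y},\hat{h}) \,\Big]
  - C\sum_{i=1}^{n} w\cdot\Phi(x_i,y_i,h_i^{*})
```

Step (1) plays the role of a hard E-step and step (2) the role of an M-step, which is the EM-like structure mentioned earlier.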

Yu & Joachims: Learning Structural SVMs with Latent Variables
Optimization Methodology & Notes

Learning under semi-supervision
A labeled dataset is hard to obtain.
We generally have a small labeled dataset and a large unlabeled dataset.
Naïve algorithm (a kind of EM):
1. Train on the labeled dataset [Initialization]
2. Make inference on the unlabeled set [Expectation]
3. Include the predictions in your training set [Maximization]
4. Repeat
Can we do better?
–Indirect supervision
–Constraints
–Binary decision problems
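
The naïve algorithm above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the presenters' code: the data are 1-D points, the model is a nearest-centroid classifier, and the names `fit_centroids`, `predict`, and `self_train` are hypothetical.

```python
def fit_centroids(points, labels):
    """Train: compute the mean of the points carrying each label."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Inference: return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def self_train(labeled, unlabeled, rounds=5):
    """Naive semi-supervised loop: train, label the unlabeled set, retrain."""
    xs = [x for x, _ in labeled]
    ys = [y for _, y in labeled]
    model = fit_centroids(xs, ys)  # initialization on labeled data only
    for _ in range(rounds):
        guessed = [predict(model, x) for x in unlabeled]      # "expectation"
        model = fit_centroids(xs + unlabeled, ys + guessed)   # "maximization"
    return model


if __name__ == "__main__":
    model = self_train([(0.0, "a"), (10.0, "b")], [1.0, 2.0, 8.0, 9.0])
    print(model)
```

Because every prediction on the unlabeled set is trusted equally, early mistakes can reinforce themselves, which is the weakness the constraint-based and indirect-supervision methods try to address.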

Constraint Driven Learning
Proposed by Chang et al. [2007]
Uses constraints obtained from domain knowledge so as to streamline semi-supervision
Constraints are pretty general
Incorporates soft constraints

Why are constraints useful?
Correct segmentation:
[AUTHOR Lars Ole Anderson. ] [TITLE Program Analysis and specification for the C programming language. ] [TECH-REPORT PhD thesis, ] [INSTITUTION DIKU, University of Copenhagen, ] [DATE May 1994.]
An HMM trained on 30 data sets produces:
[AUTHOR Lars Ole Anderson. Program Analysis and ] [TITLE specification for the ] [EDITOR C ] [BOOKTITLE programming language. ] [TECH-REPORT PhD thesis, ] [INSTITUTION DIKU, University of Copenhagen, May ] [DATE 1994.]
This leads to noisy predictions. A simple constraint, that state transitions occur only on punctuation marks, produces the correct output.
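
The punctuation constraint can be sketched as a violation counter that penalizes a candidate labeling. This is only an illustration: the function names, the candidate scores, and the penalty weight `rho` are hypothetical, and subtracting `rho` times the number of violations is one assumed way to treat the constraint as soft rather than hard.

```python
PUNCT = (".", ",", ";", ":", "!", "?")


def transition_violations(tokens, labels):
    """Count label changes that do not follow a token ending in punctuation."""
    return sum(
        1
        for i in range(1, len(tokens))
        if labels[i] != labels[i - 1] and not tokens[i - 1].endswith(PUNCT)
    )


def penalized_best(tokens, candidates, rho=10.0):
    """Pick the labeling that maximizes model_score - rho * violations.

    candidates is a list of (model_score, labels) pairs, e.g. the top-K
    outputs of a sequence model; the scores used here are made up.
    """
    best = max(
        candidates,
        key=lambda c: c[0] - rho * transition_violations(tokens, c[1]),
    )
    return best[1]


if __name__ == "__main__":
    tokens = ["Lars", "Ole", "Anderson.", "Program", "Analysis", "and", "specification"]
    good = ["AUTHOR"] * 3 + ["TITLE"] * 4   # switches right after "Anderson."
    bad = ["AUTHOR"] * 6 + ["TITLE"]        # switches after "and"
    print(penalized_best(tokens, [(5.0, bad), (3.0, good)]))
```

Even though the unconstrained model prefers the second labeling in this made-up example, the penalty makes the labeling that switches fields only after punctuation win.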

CoDL Framework

CoDL Objective

Learning Algorithm

Learning Algorithm (cont'd)

Posterior Regularization [Ganchev et al. 09]
–Posterior distribution of the latent variables
–Constraint specified in terms of an expectation over q
–Set of all posterior distributions

The PR Algorithm
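
As a reminder of what the posterior regularization slides describe, the standard formulation from Ganchev et al. can be sketched as below. The notation is an assumption rather than the slides' own: x is the observed data, z the latent variables, φ the constraint features, b their bounds, and L(θ) the marginal log-likelihood.

```latex
% Constraint set: posteriors over z whose expected constraint features are bounded
\mathcal{Q} = \big\{\, q(z) \;:\; \mathbb{E}_{q}\!\left[\phi(x,z)\right] \le b \,\big\}

% Regularized objective: likelihood minus the distance from the model posterior to Q
J_{\mathcal{Q}}(\theta) = \mathcal{L}(\theta)
  - \min_{q\in\mathcal{Q}} \mathrm{KL}\!\big(q(z)\,\big\|\,p_{\theta}(z\mid x)\big)

% Modified E-step: project the current model posterior onto Q
q^{t+1} = \arg\min_{q\in\mathcal{Q}} \mathrm{KL}\!\big(q(z)\,\big\|\,p_{\theta^{t}}(z\mid x)\big)

% M-step: unchanged from ordinary EM, but using the projected posterior
\theta^{t+1} = \arg\max_{\theta}\; \mathbb{E}_{q^{t+1}}\!\left[\log p_{\theta}(x,z)\right]
```

Only the E-step differs from ordinary EM: instead of using the model posterior directly, it uses the closest distribution that satisfies the expectation constraints.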

Indirect Supervision: Motivation
Paraphrase Identification
S1: Druce will face murder charges, Conte said.
S2: Conte said Druce will be charged with murder.
There exists some latent structure H between S1 and S2.
H acts as a justification for the binary decision.
It can be used as an intermediate step in learning the model.

Supervision through Binary Problems
Now we ask the previous question in the reverse direction: given answers to the binary problem, can we improve our latent structure identification?
Example:
–Structured prediction problem: field identification in advertisements (size, rent, etc.)
–Companion binary problem: whether the text is a well-formed advertisement (a labeled dataset is easy to obtain)

The Model [Chang et al. 2010]
–Negative examples: the weight vector scores all structures badly
–Positive examples: the weight vector scores some structure well

Loss Function
Structured prediction over the labeled dataset

Indirect Supervision: Model Specification
Setup:
–Fully-labeled training data
–Binary-labeled training data
Two conditions imposed on the weight vector:
–For the negative examples, there is no good predicted structure.
–For the positive examples, there is at least one good predicted structure.
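
The two conditions above amount to deciding the binary label from the best-scoring latent structure. The sketch below illustrates that reading under stated assumptions: scores are linear, a structure counts as "good" when its score is at least zero, and the function names and feature vectors are hypothetical.

```python
def dot(w, phi):
    """Linear score of one candidate structure."""
    return sum(wi * pi for wi, pi in zip(w, phi))


def best_structure(w, candidates):
    """Return (score, h) for the highest-scoring latent structure.

    candidates maps each structure name h to its feature vector.
    """
    return max((dot(w, phi), h) for h, phi in candidates.items())


def binary_label(w, candidates):
    """Positive if at least one structure scores well, negative if none does."""
    score, _ = best_structure(w, candidates)
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    w = [1.0, -1.0]
    positive = {"h1": [2.0, 1.0], "h2": [0.0, 3.0]}   # h1 scores 1.0
    negative = {"h1": [0.0, 1.0], "h2": [1.0, 2.0]}   # every structure scores -1.0
    print(binary_label(w, positive), best_structure(w, positive))
    print(binary_label(w, negative))
```

Training then has to push every structure of a negative example below the threshold while lifting at least one structure of each positive example above it, which is what ties the binary labels to the structured predictor.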

Latent Variables in NLP: Overview of Three Methods

Method: Structural SVM [1]
2-Second Description: Structured SVM with latent variables & EM-like training
Latent Variable: Separate and independent from the output variable
Key Advantage: Enables Structured SVM learned with latent variable

Method: CoDL [2]
2-Second Description: Train on labeled data, generate K best structures of unlabeled data and train on that. Average the two.
Latent Variable: Output variable for unlabeled training examples
EM Analogue: Soft-EM with uniform distribution on top-K predicted outputs
Key Advantage: Efficient semi-supervised learning when constraints are difficult to guarantee for predictions but easy to evaluate

Method: Indirect Supervision [3]
2-Second Description: Get a small number of labeled examples & many where we know if a label exists or not. Train a model on both at the same time.
Latent Variable: 1. Companion binary-decision variable 2. Output structure on positive, unlabeled examples
EM Analogue: Hard EM where the label is applied only to examples where the binary classifier is positive
Key Advantage: Combines information gain from indirect supervision (on lots of data) with direct supervision

[1] Learning Structural SVMs with Latent Variables, Chun-Nam John Yu and T. Joachims, ICML 2009.
[2] Guiding Semi-Supervision with Constraint-Driven Learning, M. Chang, L. Ratinov and D. Roth, ACL 2007.
[3] Structured Output Learning with Indirect Supervision, M. Chang, V. Srikumar, D. Goldwasser and D. Roth, ICML 2010.