Latent Variables Naman Agarwal Michael Nute May 1, 2013.

Presentation transcript:

Latent Variables Naman Agarwal Michael Nute May 1, 2013

Latent Variables Contents
Definition & Example of Latent Variables
EM Algorithm Refresher
Structured SVM with Latent Variables
Learning under semi-supervision or indirect supervision
– CoDL
– Posterior Regularization
– Indirect Supervision

Latent Variables General Definition & Examples
A latent variable in a machine learning algorithm is one which is assumed to exist (or to have a null value) but which is not observed directly and is inferred from other observed variables. It generally corresponds to some meaningful element of the problem for which direct supervision is intractable.
Latent variable methods often imagine the variable as part of the input/feature space (e.g. PCA, factor analysis) or as part of the output space (e.g. EM).
– This distinction is only illustrative, though, and can be blurred, as we will see with indirect supervision.
Latent Input Variables: the latent variable is an unobserved part of the feature space.
Latent Output Variables: when we think of a latent variable as part of the output space, the method becomes an exercise in unsupervised or semi-supervised learning; the latent output is unobserved while the inputs (and any remaining outputs) are observed.

Example: Paraphrase Identification
Problem: Given sentences A and B, determine whether they are paraphrases of each other. Note that if they are paraphrases, then there will exist a mapping between named entities and predicates in the two sentences. The mapping is not directly observed, but is a latent variable in the decision problem of determining whether the sentences say the same thing.
A: Druce will face murder charges, Conte said.
B: Conte said Druce will be charged with murder.
(The mapping between the two sentences is latent.)
Revised Problem: Given sentences A and B, determine the mapping of semantic elements between A and B. Now we are trying to learn specifically the mapping between them, so we can use the Boolean question in the previous problem as a latent variable. In practice, the Boolean question is easy to answer, so we can use it to guide the semi-supervised task of mapping semantic elements. This is called indirect supervision (more on that later).
1. Example taken from a talk by D. Roth: Constraints Driven Structured Learning with Indirect Supervision, Language Technologies Institute Colloquium, Carnegie Mellon University, Pittsburgh, PA, April 2010.

The EM Algorithm Refresher In practice, many algorithms that use latent variables have a structure similar to the Expectation-Maximization algorithm (even though EM is not discriminative and some of the others are). So let's review:

The EM Algorithm: Hard EM vs. Soft EM (both alternate an E-step and an M-step, repeated until convergence).
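The E-step and M-step formulas on this slide are images in the original deck. As a concrete illustration only (not from the deck), here is a minimal Python sketch contrasting the two variants on an invented toy 1-D two-component Gaussian mixture: soft EM weights each point by its posterior responsibility, while hard EM commits to the single most likely component before re-estimating parameters.

```python
import numpy as np

def em_gmm_1d(x, n_iter=50, hard=False, seed=0):
    """Fit a 2-component 1-D Gaussian mixture with soft or hard EM."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, size=2, replace=False)       # initial means
    sigma = np.array([x.std(), x.std()])            # initial std devs
    pi = np.array([0.5, 0.5])                       # initial mixing weights

    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each point
        dens = np.stack([
            pi[k] * np.exp(-0.5 * ((x - mu[k]) / sigma[k]) ** 2) / sigma[k]
            for k in range(2)
        ], axis=1)
        resp = dens / dens.sum(axis=1, keepdims=True)

        if hard:
            # Hard EM: collapse the posterior to its argmax (0/1 responsibilities)
            resp = np.eye(2)[resp.argmax(axis=1)]

        # M-step: re-estimate parameters from the (possibly hard) responsibilities
        nk = resp.sum(axis=0) + 1e-12
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
        pi = nk / len(x)
    return mu, sigma, pi

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])
print("soft EM means:", em_gmm_1d(x, hard=False)[0])
print("hard EM means:", em_gmm_1d(x, hard=True)[0])
```

Hard EM tends to separate the components more aggressively, since each point contributes to only one component per iteration; soft EM keeps the full posterior and is the standard maximum-likelihood EM.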

Yu & Joachims: Learning Structural SVMs with Latent Variables – Model Formulation. The training problem becomes the difference of two convex functions, so we can solve it using the concave-convex procedure (CCCP).
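The slide's formula is an image; for reference, the objective from Yu & Joachims (2009) has, up to notation, the following form. The first max (loss-augmented inference over both the output and the latent variable) is convex in w, and so is the second (completing the latent variable for the observed label), so the objective is a difference of convex functions.

```latex
\min_{w}\ \frac{1}{2}\|w\|^2
\;+\; C \sum_{i=1}^{n}\Big[\,
      \underbrace{\max_{\hat{y},\hat{h}}\big(w\cdot\Phi(x_i,\hat{y},\hat{h})
                  + \Delta(y_i,\hat{y},\hat{h})\big)}_{\text{convex in }w}
      \;-\;
      \underbrace{\max_{h}\ w\cdot\Phi(x_i,y_i,h)}_{\text{convex in }w}
  \,\Big]
```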

Yu & Joachims: Learning Structural SVMs with Latent Variables – Optimization Methodology & Notes
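A small, self-contained Python sketch of the CCCP alternation on an invented toy problem (the feature map, the enumeration-based inference, and the plain subgradient inner solver are illustrative stand-ins; the paper itself uses a cutting-plane structural SVM solver). Each round first imputes the latent variable under the current weights, then solves the resulting standard structural SVM.

```python
import numpy as np

# Toy latent structural SVM trained with CCCP.  Inputs x in R^4, labels y in {0,1},
# latent variable h in {0,1} (vacuous here; the point is the alternation mechanics).
D, Y, H = 4, 2, 2

def phi(x, y, h):
    """Joint feature map: copy x into the block indexed by (y, h)."""
    f = np.zeros(D * Y * H)
    f[(y * H + h) * D:(y * H + h + 1) * D] = x
    return f

def argmax_yh(w, x, loss_y=None):
    """Plain or loss-augmented argmax over (y, h), by enumeration."""
    best, best_s = None, -np.inf
    for y in range(Y):
        for h in range(H):
            s = w @ phi(x, y, h) + (0.0 if loss_y is None else float(y != loss_y))
            if s > best_s:
                best, best_s = (y, h), s
    return best

def argmax_h(w, x, y):
    return max(range(H), key=lambda h: w @ phi(x, y, h))

def cccp_train(data, C=1.0, rounds=5, inner_steps=100, lr=0.001):
    w = np.zeros(D * Y * H)
    for _ in range(rounds):
        # Step 1 (linearize the concave part): impute h* for every labeled example.
        completed = [(x, y, argmax_h(w, x, y)) for x, y in data]
        # Step 2: solve the resulting standard structural SVM; a plain subgradient
        # loop stands in for the paper's cutting-plane solver.
        for _ in range(inner_steps):
            g = w.copy()
            for x, y, h_star in completed:
                yhat, hhat = argmax_yh(w, x, loss_y=y)
                g += C * (phi(x, yhat, hhat) - phi(x, y, h_star))
            w -= lr * g
    return w

rng = np.random.default_rng(0)
data = [(rng.normal(0, 1, D) + (2 * y - 1) * np.array([3.0, 0, 0, 0]), y)
        for y in (0, 1) for _ in range(20)]
w = cccp_train(data)
print("train accuracy:", np.mean([argmax_yh(w, x)[0] == y for x, y in data]))
```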

Learning under semi-supervision
A labeled dataset is hard to obtain; we generally have a small labeled dataset and a large unlabeled dataset.
Naïve Algorithm [a kind of EM], sketched below:
– Train on the labeled dataset [Initialization]
– Make inferences on the unlabeled set [Expectation]
– Include them in your training [Maximization]
– Repeat
Can we do better? Three ideas: constraints, indirect supervision, and binary decision problems.
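A minimal, runnable illustration of that naïve self-training loop (the choice of classifier and the toy blobs are placeholders, not part of the slides), using scikit-learn's LogisticRegression for the fit/predict steps:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def naive_semi_supervised(X_lab, y_lab, X_unlab, n_rounds=5):
    """Naive EM-like self-training: train, label the unlabeled pool, retrain."""
    model = LogisticRegression().fit(X_lab, y_lab)           # initialization
    for _ in range(n_rounds):
        y_guess = model.predict(X_unlab)                      # "E-step": infer labels
        X_all = np.vstack([X_lab, X_unlab])                   # include guesses in training
        y_all = np.concatenate([y_lab, y_guess])
        model = LogisticRegression().fit(X_all, y_all)        # "M-step": retrain
    return model

# Toy data: 10 labeled and 200 unlabeled points from two Gaussian blobs.
rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(-2, 1, (5, 2)), rng.normal(2, 1, (5, 2))])
y_lab = np.array([0] * 5 + [1] * 5)
X_unlab = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
model = naive_semi_supervised(X_lab, y_lab, X_unlab)
```

The weakness of this naïve loop is that early mistakes on the unlabeled pool get reinforced; the methods that follow (CoDL, posterior regularization, indirect supervision) constrain or supervise the inference step to avoid that.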

Constraint-Driven Learning (CoDL)
Proposed by Chang et al. [2007].
– Uses constraints obtained from domain knowledge to streamline semi-supervision.
– The constraints are quite general.
– Incorporates soft constraints.

Why are constraints useful?
Desired segmentation of a citation:
[AUTHOR Lars Ole Anderson.] [TITLE Program Analysis and specification for the C programming language.] [TECH-REPORT PhD thesis,] [INSTITUTION DIKU, University of Copenhagen,] [DATE May 1994.]
An HMM trained on 30 data sets produces:
[AUTHOR Lars Ole Anderson. Program Analysis and] [TITLE specification for the] [EDITOR C] [BOOKTITLE programming language.] [TECH-REPORT PhD thesis,] [INSTITUTION DIKU, University of Copenhagen, May] [DATE 1994.]
This leads to noisy predictions. A simple constraint that state transitions can occur only at punctuation marks produces the correct output.

CoDL Framework

CoDL Objective
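The formula on this slide is an image; as a reconstruction (up to notation) of the soft-constrained inference objective used by Chang et al. (2007), where d(y, 1_{C_k(x)}) measures how far the assignment y is from satisfying constraint C_k and ρ_k is the penalty for violating it:

```latex
\hat{y} \;=\; \arg\max_{y}\ \ w\cdot\Phi(x,y) \;-\; \sum_{k}\rho_k\, d\big(y,\ 1_{C_k(x)}\big)
```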

Learning Algorithm

Learning Algorithm (cont'd)
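The algorithm boxes on these two slides are images in the original deck. Below is a small, self-contained sketch of the CoDL loop as described in the paper: train on the labeled data, infer the K best constraint-satisfying structures for each unlabeled example, retrain on those pseudo-labels, and interpolate with the supervised model. The toy sequence-labeling task, its features, and the count-based base learner are invented for illustration and are not from Chang et al.

```python
import numpy as np

# Toy CoDL sketch.  Task: label each token of a sequence with field 0 or field 1.
# Constraint: labels are non-decreasing within a sequence (field 0, then field 1),
# i.e. at most one 0 -> 1 transition.  A count-based token/label model stands in
# for the paper's HMM.

V = 20                                                  # token-id vocabulary size

def learn(pairs):
    """Estimate per-(label, token) scores from (token, label) pairs."""
    w = np.zeros((2, V))
    for t, y in pairs:
        w[y, t] += 1.0
    return w / max(len(pairs), 1)

def constrained_top_k(w, seq, K):
    """Enumerate all constraint-satisfying labelings and keep the K best-scoring."""
    cands = []
    for cut in range(len(seq) + 1):                     # position of the 0 -> 1 switch
        labels = [0] * cut + [1] * (len(seq) - cut)
        score = sum(w[y, t] for t, y in zip(seq, labels))
        cands.append((score, labels))
    cands.sort(key=lambda c: -c[0])
    return [labels for _, labels in cands[:K]]

def codl(labeled, unlabeled, K=3, gamma=0.9, rounds=5):
    flat = lambda data: [(t, y) for seq, ys in data for t, y in zip(seq, ys)]
    w0 = learn(flat(labeled))                           # supervised model
    w = w0
    for _ in range(rounds):
        pseudo = [(seq, ys) for seq in unlabeled
                  for ys in constrained_top_k(w, seq, K)]    # top-K completions
        w = gamma * w0 + (1 - gamma) * learn(flat(pseudo))   # keep supervised weight
    return w

# Tiny synthetic data: field-0 tokens have ids 0-9, field-1 tokens have ids 10-19.
rng = np.random.default_rng(0)
def make_seq(n=6):
    cut = int(rng.integers(2, 5))
    seq = list(rng.integers(0, 10, cut)) + list(rng.integers(10, 20, n - cut))
    return seq, [0] * cut + [1] * (n - cut)

labeled = [make_seq() for _ in range(2)]
unlabeled = [make_seq()[0] for _ in range(20)]
w = codl(labeled, unlabeled)
```

Keeping only the top-K constraint-satisfying structures, each weighted uniformly, is what gives CoDL its soft-EM flavor, and interpolating with the supervised weights keeps the small labeled set from being washed out.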

Posterior Regularization [Ganchev et al. 2009]
– q: a posterior distribution over the latent variables
– Constraints are specified in terms of expectations over q
– Q: the set of posterior distributions satisfying the constraints

The PR Algorithm
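The boxed algorithm on this slide is an image; up to notation, the modified EM iteration of posterior regularization is the following, where Q = {q : E_q[φ(x, y)] ≤ b} is the constraint set:

```latex
\text{E-step:}\quad q^{t+1} \;=\; \arg\min_{q\in Q}\ \mathrm{KL}\big(q(y)\,\|\,P(y\mid x;\,w^{t})\big)
\qquad
\text{M-step:}\quad w^{t+1} \;=\; \arg\max_{w}\ \mathbb{E}_{q^{t+1}}\big[\log P(x,y;\,w)\big]
```

Standard EM is recovered when Q contains all distributions; hard EM is recovered when Q contains only point masses.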

Indirect Supervision – Motivation
Paraphrase Identification:
S1: Druce will face murder charges, Conte said.
S2: Conte said Druce will be charged with murder.
There exists some latent structure H between S1 and S2. H acts as a justification for the binary decision and can be used as an intermediate step in learning the model.

Supervision through Binary Problems
Now we ask the previous question in the reverse direction: given answers to the binary problem, can we improve our identification of the latent structure?
Example:
– Structured prediction problem: field identification in advertisements (size, rent, etc.)
– Companion binary problem: is the text a well-formed advertisement? A labeled dataset for this is easy to obtain.

The Model [Chang et al. 2010]
– For a negative binary example, the weight vector should score all structures badly.
– For a positive binary example, the weight vector should score some structure well.
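In symbols (a reconstruction up to notation, with z_i ∈ {−1, +1} the binary label and h ranging over candidate structures), the two requirements can be written as margin conditions:

```latex
z_i = -1:\ \ \forall h,\ \ w\cdot\Phi(x_i,h) \le -1
\qquad\qquad
z_i = +1:\ \ \exists h,\ \ w\cdot\Phi(x_i,h) \ge 1
```

Equivalently, both cases can be summarized as z_i · max_h w·Φ(x_i, h) ≥ 1.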

Loss Function – structured prediction over the labeled dataset

Indirect Supervision Model Specification
Setup:
– Fully-labeled training data: examples with complete structures.
– Binary-labeled training data: examples annotated only with whether a good structure exists.
Two conditions imposed on the weight vector:
– For negative examples, there is no good predicted structure.
– For positive examples, there is at least one good predicted structure.
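Putting the pieces together, the training objective of Chang et al. (2010) has, up to notation, the following shape: a structured loss L_S over the fully-labeled set S and a binary loss L_B over the binary-labeled set B, where the binary loss acts on z_i · max_h w·Φ(x_i, h):

```latex
\min_{w}\ \ \frac{\|w\|^2}{2}
\;+\; C_1 \sum_{i\in S} L_S(x_i, y_i;\, w)
\;+\; C_2 \sum_{i\in B} L_B\!\Big(z_i \cdot \max_{h}\ w\cdot\Phi(x_i,h)\Big)
```

Here L_S is the usual structured hinge loss and L_B is a hinge-type binary loss; the max over h makes the positive-example terms non-convex, which is handled with a latent-SVM / CCCP-style alternation.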

Latent Variables in NLP: Overview of Three Methods
For each method: a 2-second description, the latent variable, the EM analogue, and the key advantage.
Structural SVM [1]
– 2-second description: structured SVM with latent variables and EM-like training.
– Latent variable: separate and independent from the output variable.
– Key advantage: enables a structured SVM to be learned with a latent variable.
CoDL [2]
– 2-second description: train on labeled data, generate the K best structures for the unlabeled data and train on those; average the two models.
– Latent variable: the output variable for the unlabeled training examples.
– EM analogue: soft EM with a uniform distribution on the top-K predicted outputs.
– Key advantage: efficient semi-supervised learning when constraints are difficult to guarantee for predictions but easy to evaluate.
Indirect Supervision [3]
– 2-second description: get a small number of labeled examples and many examples where we only know whether a label exists; train a model on both at the same time.
– Latent variables: (1) the companion binary decision variable; (2) the output structure on positive, unlabeled examples.
– EM analogue: hard EM where a label is applied only to examples where the binary classifier is positive.
– Key advantage: combines the information gained from indirect supervision (on lots of data) with direct supervision.
[1] Learning Structural SVMs with Latent Variables, C.-N. J. Yu and T. Joachims, ICML 2009.
[2] Guiding Semi-Supervision with Constraint-Driven Learning, M. Chang, L. Ratinov and D. Roth, ACL 2007.
[3] Structured Output Learning with Indirect Supervision, M. Chang, V. Srikumar, D. Goldwasser and D. Roth, ICML 2010.