Unified Expectation Maximization (UEM): A Framework for Tuning Posterior Entropy
Rajhans Samdani (1), Ming-Wei Chang (2), and Dan Roth (1)
(1) University of Illinois at Urbana-Champaign, (2) Microsoft Research

Expectation Maximization and Variations
- EM is the most popular algorithm for unsupervised and semi-supervised learning.
- The E-step is an inference step: it infers a posterior distribution over the output variables.
- Hard EM infers the single most likely assignment of the output variables at each iteration.
- Constrained EM: Posterior Regularization (PR) and Constraint-Driven Learning (CoDL).
- The different variations of EM change the E-step (i.e., the inference step).

Standard EM / Posterior Regularization (Ganchev et al., 10)
- E-step: q = argmin_q KL(q(y), P(y|x; w)) s.t. E_q[Uy] ≤ b (without the constraint this reduces to q = P(y|x; w), i.e., standard EM)
- M-step: w = argmax_w E_q[log P(x, y; w)]
- Infers a posterior distribution spread over all outputs.

Hard EM / Constraint-Driven Learning (Chang et al., 07)
- E-step: y* = argmax_y P(y|x; w) s.t. Uy ≤ b
- M-step: w = argmax_w E_q[log P(x, y; w)], with q a point mass on y*
- Infers a posterior distribution peaked on just one output.

It is not clear which version to use!

Tuning the Posterior Entropy: Unified Expectation Maximization (UEM)
- UEM is a framework for explicitly tuning the entropy of the posterior distribution during the E-step (the inference step). It minimizes a modified KL divergence, KL(q, P(y|x; w); γ), where the parameter γ changes the entropy of the posterior:
  KL(q, p; γ) = Σ_y [ γ q(y) log q(y) − q(y) log p(y) ]
- E-step: q = argmin_q KL(q, P(y|x; w); γ) s.t. E_q[Uy] ≤ b
- Different γ values → different EM algorithms: γ controls the "hardness" of inference as a means to better adapt learning to the underlying distribution, data, initialization, constraints, etc.
- The effect of changing γ on the posterior q: large γ flattens q toward uniform, γ = 1 recovers q = P(y|x; w), and γ ≤ 0 peaks q on a single output.
  [Figure: a conditional distribution p and the resulting posteriors q for γ = ∞, γ = 1, γ = 0, and γ = −1.]
- Unification: particular γ values recover existing EM algorithms:

  γ                    No constraints    With constraints
  −1 (hard)            Hard EM           CoDL
  0 < γ < 1 (varied)   Deterministic Annealing (Smith and Eisner, 04; Hofmann, 99)
  1                    EM                PR
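To make the γ-tuned E-step concrete, here is a minimal Python sketch of the unconstrained case (no expectation constraints E_q[Uy] ≤ b). Minimizing KL(q, p; γ) over the probability simplex gives q(y) ∝ p(y)^(1/γ) for γ > 0 and a point mass on argmax_y p(y) for γ ≤ 0; the function name and the toy distribution are illustrative, not from the poster.

```python
import numpy as np

def uem_e_step(log_p, gamma):
    """Unconstrained UEM E-step: argmin_q KL(q, p; gamma) over the simplex.

    For gamma > 0 the minimizer is q(y) proportional to p(y)^(1/gamma);
    for gamma <= 0 the objective is linear or concave in q, so the minimum
    sits at a vertex of the simplex: a point mass on argmax_y p(y) (hard EM).
    """
    if gamma <= 0.0:
        q = np.zeros_like(log_p)
        q[np.argmax(log_p)] = 1.0
        return q
    scaled = log_p / gamma
    scaled -= scaled.max()          # subtract the max for numerical stability
    q = np.exp(scaled)
    return q / q.sum()

# The effect of gamma on a toy posterior p = (0.6, 0.3, 0.1):
log_p = np.log(np.array([0.6, 0.3, 0.1]))
for gamma in (np.inf, 1.0, 0.5, 0.0, -1.0):
    print(gamma, uem_e_step(log_p, gamma))
# gamma = inf -> uniform; gamma = 1 -> p itself; gamma = 0.5 -> sharper
# than p; gamma = 0 or -1 -> a point mass on the most likely output.
```

With constraints, the E-step for γ > 0 becomes a small convex program over the constrained set; this sketch deliberately leaves that machinery out.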
Experimental Evidence for Tuning Posterior Entropy
- Test whether tuning the posterior entropy via γ improves performance over the baselines:
  - EM or Posterior Regularization (PR), which corresponds to γ = 1.0.
  - Hard EM or Constraint-Driven Learning (CoDL), which corresponds to γ = −1.
- Study the relation between the quality of initialization and γ (the "hardness" of inference).
- In almost all of our experiments, the best UEM algorithm corresponds to a γ somewhere between 0 and 1, which we discovered through the UEM framework (a runnable toy sketch of this tuning follows at the end of this transcript).
- Food for thought: why and how does the posterior entropy affect learning so much?

Experiments: Unsupervised POS Tagging
- Model POS tagging as a first-order HMM.
- Try initializations of varying quality:
  - Uniform initialization: initialize all states with equal probability.
  - Supervised initialization: initialize with parameters trained on varying amounts of labeled data (5, 10, 20, or 40-80 examples).
- Observe: better-quality initialization ⇒ harder inference does better.
  [Figure: performance relative to EM as a function of γ, with Hard EM and EM marking the two ends of the axis, for uniform initialization and for initialization with 5, 10, 20, and 40-80 labeled examples.]

Experiments: Entity-Relation Extraction
- Extract entity types (e.g., Loc, Org, Per) and relation types (e.g., Lives-in, Org-based-in, Killed) between pairs of entities.
- Add constraints: type constraints between entities and relations, and expected-count constraints to regularize the counts of the 'None' relation.
- Semi-supervised learning with a small amount of labeled data.
  [Figure: macro-F1 scores as a function of the percentage of labeled data.]

Experiments: Word Alignment
- Word alignment from a language S to a language T; we try En-Fr and En-Es pairs.
- We use an HMM-based model with agreement constraints for word alignment.
- By tuning γ in UEM, we reduce the alignment error rate over PR by 20% and over CoDL by 40%.
  [Figure: ES-EN word alignment error rate as a function of the number of unlabeled sentences.]

Supported by the Army Research Laboratory (ARL), the Defense Advanced Research Projects Agency (DARPA), and the Office of Naval Research (ONR).
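The tuning sketch referenced above: a self-contained toy illustrating how γ changes learning and how settings would be compared. The two-coin binomial mixture, the data, and the name train_uem are illustrative assumptions, not the poster's models; the sketch reuses uem_e_step from the earlier block and compares EM (γ = 1), an intermediate UEM setting (γ = 0.5), and hard EM (γ = −1). In practice, γ would be selected on held-out data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (illustrative): two biased coins; each row is 20 flips of one coin.
true_bias = np.array([0.2, 0.8])
z = rng.integers(0, 2, size=200)                 # hidden coin identity per row
heads = rng.binomial(20, true_bias[z])           # heads out of 20 per row

def train_uem(heads, n, gamma, theta0, n_iters=50):
    """UEM for a two-component binomial mixture: the E-step posterior over
    components is tempered by gamma via uem_e_step (defined earlier)."""
    theta = theta0.copy()
    for _ in range(n_iters):
        # Per-example log-likelihood under each coin (uniform mixing prior).
        log_lik = (heads[:, None] * np.log(theta)
                   + (n - heads[:, None]) * np.log1p(-theta))
        # Gamma-tuned E-step: one posterior over the two coins per example.
        q = np.array([uem_e_step(row, gamma) for row in log_lik])
        # M-step: weighted maximum-likelihood estimate of each coin's bias.
        theta = (q * heads[:, None]).sum(0) / (q.sum(0) * n)
        theta = np.clip(theta, 1e-6, 1 - 1e-6)
    return theta

for gamma in (1.0, 0.5, -1.0):                   # EM, intermediate UEM, hard EM
    theta = train_uem(heads, 20, gamma, np.array([0.4, 0.6]))
    print(gamma, theta, np.abs(np.sort(theta) - true_bias).max())
```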

