Transformation-based error-driven learning (TBL) LING 572 Fei Xia 1/19/06

Outline
– Basic concepts and properties
– Relation between DT, DL, and TBL
– Case study

Basic concepts and properties

TBL overview
Introduced by Eric Brill (1992)
Intuition:
– Start with some simple solution to the problem
– Then apply a sequence of transformations
Applications:
– Classification problems
– Other kinds of problems: e.g., parsing

TBL flowchart

Transformations
A transformation has two components:
– A triggering environment: e.g., the previous tag is DT
– A rewrite rule: e.g., change the current tag from MD to N

If (prev_tag == DT) then MD → N

A transformation is similar to a rule in a decision tree, but the rewrite rule can be complicated (e.g., it can change a parse tree) → a transformation list is a processor and not (just) a classifier.
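
As a concrete sketch (hypothetical helper names, not the lecture's code), a transformation can be represented as a trigger predicate plus a rewrite rule and applied across a tag sequence:

```python
# A minimal sketch of a transformation (illustrative names).
def make_transformation(trigger, from_tag, to_tag):
    """trigger(tags, i) -> bool; the rewrite rule changes tags[i] from from_tag to to_tag."""
    def apply(tags):
        # Applied to every position using the input tags only
        # (results are not immediately visible to other positions).
        return [to_tag if tags[i] == from_tag and trigger(tags, i) else tags[i]
                for i in range(len(tags))]
    return apply

# Triggering environment: the previous tag is DT; rewrite rule: MD -> N.
prev_is_dt = lambda tags, i: i > 0 and tags[i - 1] == "DT"
rule = make_transformation(prev_is_dt, "MD", "N")
print(rule(["DT", "MD", "VB"]))   # ['DT', 'N', 'VB']
```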

Learning transformations
1. Initialize each example in the training data with an initial classifier (the initial state-annotator).
2. Consider all the possible transformations, and choose the one with the highest score.
3. Append it to the transformation list and apply it to the training corpus to obtain a "new" corpus.
4. Repeat steps 2-3.
⇒ Steps 2-3 can be expensive; various methods try to reduce this cost.
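
A minimal sketch of this greedy loop, assuming each example carries a single label and that `initial_annotator` and `candidate_transformations` are supplied by the caller (all names are illustrative, not from the lecture):

```python
def learn_tbl(examples, gold, initial_annotator, candidate_transformations, max_rules=50):
    """Greedy TBL learning: repeatedly pick the transformation that most reduces errors."""
    current = [initial_annotator(x) for x in examples]        # step 1: initial labels
    rules = []
    for _ in range(max_rules):
        n_errors = sum(c != g for c, g in zip(current, gold))
        best_rule, best_gain = None, 0
        for rule in candidate_transformations(examples, current, gold):   # step 2
            relabeled = [rule(x, c) for x, c in zip(examples, current)]
            gain = n_errors - sum(r != g for r, g in zip(relabeled, gold))
            if gain > best_gain:
                best_rule, best_gain = rule, gain
        if best_rule is None:                                 # no rule reduces errors: stop
            break
        rules.append(best_rule)                               # step 3: append to the list
        current = [best_rule(x, c) for x, c in zip(examples, current)]   # re-annotate corpus
    return rules
```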

Using TBL
The initial state-annotator
The space of allowable transformations:
– Rewrite rules
– Triggering environments
The objective function: minimize the error rate directly
– for comparing the corpus to the truth
– for choosing a transformation

Using TBL (cont)
Two more parameters:
– Whether the effect of a transformation is visible to following transformations
– If so, in what order are transformations applied to the corpus? left-to-right or right-to-left

The order matters
Transformation: if prev_label == A, then change cur_label from A to B.
Input: A A A A A A
Output:
– Results not immediately visible: A B B B B B
– Immediate results, left-to-right: A B A B A B
– Immediate results, right-to-left: A B B B B B
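
A quick sketch (illustrative code, not from the lecture) that reproduces the three outputs above:

```python
def apply_not_immediate(labels):
    # Decisions are based on the original labels only.
    return [("B" if i > 0 and labels[i - 1] == "A" and labels[i] == "A" else labels[i])
            for i in range(len(labels))]

def apply_immediate(labels, order):
    # Decisions see the effect of earlier rewrites in the given order.
    out = list(labels)
    positions = range(len(out)) if order == "left-to-right" else range(len(out) - 1, -1, -1)
    for i in positions:
        if i > 0 and out[i - 1] == "A" and out[i] == "A":
            out[i] = "B"
    return out

labels = ["A"] * 6
print(apply_not_immediate(labels))                  # A B B B B B
print(apply_immediate(labels, "left-to-right"))     # A B A B A B
print(apply_immediate(labels, "right-to-left"))     # A B B B B B
```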

Relation between DT, DL, and TBL

DT and TBL
Claim: DT is a subset of TBL.
(Proof, by induction on depth) When depth(DT) = 1:
1. Label with S
2. If X then S → A
3. S → B

DT is a subset of TBL (inductive step)
Depth = n: subtree T1 is handled by list L1 = [Label with S'; L1'], and subtree T2 by list L2 = [Label with S''; L2'].
Depth = n+1 (root test X, subtrees T1 and T2):
1. Label with S
2. If X then S → S'
3. S → S''
4. L1'
5. L2'

DT is a subset of TBL (inductive step, with renaming)
1. Label with S
2. If X then S → S'
3. S → S''
4. L1' (renaming X with X')
5. L2' (renaming X with X'')
6. X' → X
7. X'' → X

DT is a proper subset of TBL
There exists a problem that can be solved by TBL but not by a DT, for a fixed set of primitive queries.
Ex: given a sequence of characters,
– Classify each char based on its position: if pos % 4 == 0 then "yes" else "no"
– Input attributes available: the previous two chars

Transformation list:
1. Label with S: A/S A/S A/S A/S A/S A/S A/S
2. If there is no previous character, then S → F: A/F A/S A/S A/S A/S A/S A/S
3. If the char two to the left is labeled with F, then S → F: A/F A/S A/F A/S A/F A/S A/F
4. If the char two to the left is labeled with F, then F → S: A/F A/S A/S A/S A/F A/S A/S
5. F → yes
6. S → no
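
A small sketch (illustrative) that checks this transformation list, applying rules 3 and 4 left-to-right with immediate results:

```python
def tbl_mod4(chars):
    labels = ["S"] * len(chars)                      # rule 1: label with S
    if labels:
        labels[0] = "F"                              # rule 2: no previous character, S -> F
    for i in range(len(labels)):                     # rule 3: two to the left is F, S -> F
        if i >= 2 and labels[i - 2] == "F" and labels[i] == "S":
            labels[i] = "F"
    for i in range(len(labels)):                     # rule 4: two to the left is F, F -> S
        if i >= 2 and labels[i - 2] == "F" and labels[i] == "F":
            labels[i] = "S"
    return ["yes" if l == "F" else "no" for l in labels]   # rules 5 and 6

print(tbl_mod4(list("AAAAAAA")))
# ['yes', 'no', 'no', 'no', 'yes', 'no', 'no']  -> "yes" exactly where pos % 4 == 0
```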

DT and TBL
TBL is more powerful than DT. The extra power of TBL comes from:
– Transformations are applied in sequence
– Results of previous transformations are visible to following transformations

DL and TBL
DL is a proper subset of TBL.
In two-class TBL:
– (if q then y' → y) is equivalent to (if q then y)
– If multiple transformations apply to an example, only the last one matters

Two-class TBL  DL ? Two-class TBL  DL –Replace “if q then y’  y” with “if q then y” –Reverse the rule order DL  two-class TBL –Replace “if q then y” with “if q then y’  y” –Reverse the rule order  does not hold for “dynamic” problems: –Dynamic problem: the anwers to questions are not static: –Ex: in POS tagging, when the tag of a word is changed, it changes the answers to questions for nearby words.

DT, DL, and TBL (summary)
k-DT is a proper subset of k-DL. DL is a proper subset of TBL.
The extra power of TBL comes from:
– Transformations are applied in sequence
– Results of previous transformations are visible to following transformations
TBL transforms the training data; it does not split the training data.
TBL is a processor, not just a classifier.

Case study

TBL for POS tagging
The initial state-annotator: the most common tag for each word.
The space of allowable transformations:
– Rewrite rules: change cur_tag from X to Y
– Triggering environments (feature types): unlexicalized or lexicalized

Unlexicalized features
– t-1 is z
– t-1 or t-2 is z
– t-1 or t-2 or t-3 is z
– t-1 is z and t+1 is w
– …

Lexicalized features
– w0 is w
– w-1 is w
– w-1 or w-2 is w
– t-1 is z and w0 is w
– …

TBL for POS tagging (cont)
The objective function: tagging accuracy
– for comparing the corpus to the truth
– for choosing a transformation: choose the one that results in the greatest error reduction
The order of applying transformations: left-to-right.
The results of applying transformations are not visible to other transformations.
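
A minimal sketch of this setup (the toy dictionary and the NN → VB rule are illustrative examples, not the lecture's data): annotate each word with its most common tag, then apply one transformation whose results are not visible to later positions:

```python
# Toy most-common-tag dictionary (illustrative).
most_common_tag = {"to": "TO", "race": "NN", "the": "DT", "will": "MD"}

def initial_annotate(words):
    # Initial state-annotator: most common tag for each word (NN for unknown words here).
    return [most_common_tag.get(w, "NN") for w in words]

def apply_rule(tags, from_tag, to_tag, prev_tag):
    # Triggering environment: the previous tag; decisions use the input tags only,
    # so the results of this transformation are not visible to later positions.
    return [to_tag if t == from_tag and i > 0 and tags[i - 1] == prev_tag else t
            for i, t in enumerate(tags)]

words = ["to", "race"]
tags = initial_annotate(words)              # ['TO', 'NN']
tags = apply_rule(tags, "NN", "VB", "TO")   # a Brill-style rule: NN -> VB after TO
print(list(zip(words, tags)))               # [('to', 'TO'), ('race', 'VB')]
```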

Learned transformations

Experiments

Uncovered issues
Efficient learning algorithms
Probabilistic TBL:
– Top-N hypotheses
– Confidence measure

TBL Summary
TBL is more powerful than DL and DT:
– Transformations are applied in sequence
– It can handle dynamic problems well
TBL is more than a classifier:
– Classification problems: POS tagging
– Other problems: e.g., parsing
TBL performs well because it minimizes errors directly.
Learning can be expensive → various methods address this.