
1 Report on Semi-supervised Training for Statistical Parsing Zhang Hao 2002-12-18

2 Brief Introduction
 Why semi-supervised training?
 Co-training framework and applications
 Can parsing fit in this framework?
 How?
 Conclusion

3 Why Semi-supervised Training
 A compromise between supervised and unsupervised training
 Pay-offs:
– Minimize the need for labeled data
– Maximize the value of unlabeled data
– Easy portability

4 Co-training Scenario
 Idea: two different students learn from each other, incrementally and mutually improving ("二人行必有我师": of two walking together, one must be my teacher)
 Difference (the motive) → mutual learning (the optimization) → agreement (the objective)
 Task: optimize the objective function of agreement
 Heuristic selection is important: what to learn?

5 [Blum & Mitchell, 98] Co-training Assumptions
 Classification problem
 Feature redundancy
– Allows different views of the data
– Each view is sufficient for classification
 Conditional independence of the views, given the class
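The independence assumption can be written out explicitly (a standard rendering of the Blum & Mitchell condition, not shown on the slide): given the class label, the two views carry no information about each other.

```latex
% Conditional independence of the two views x_1, x_2 given the class y:
P(x_1, x_2 \mid y) = P(x_1 \mid y)\, P(x_2 \mid y)
```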

6 [Blum & Mitchell, 98] Co-training Example
 "Course home page" classification (yes/no)
 Two views: the page's content text / the anchor text of links pointing to it (an even more clear-cut example of two views: the two sides of a coin)
 Two naïve Bayes classifiers, one per view: they should agree

7 [Blum & Mitchell, 98] Co-training Algorithm
Given:
– a set L of labeled training examples
– a set U of unlabeled examples
Create a pool U' of examples by choosing u examples at random from U.
Loop for k iterations:
– use L to train a classifier h1 that considers only the x1 portion of x
– use L to train a classifier h2 that considers only the x2 portion of x
– allow h1 to label p positive and n negative examples from U'
– allow h2 to label p positive and n negative examples from U'
– add these self-labeled examples to L
– randomly choose 2p+2n examples from U to replenish U'
Notes: n:p matches the ratio of negative to positive examples; the examples selected are the "most confidently" labeled ones, i.e., heuristic selection.
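To make the loop concrete, here is a minimal Python sketch (an illustration, not the authors' code): it uses scikit-learn's GaussianNB for the two per-view classifiers and assumes each view is a dense NumPy feature matrix with 0/1 labels; all names and default parameter values are ours.

```python
# Minimal sketch of the Blum & Mitchell co-training loop.
# Assumptions: X1/X2 are the two views of the labeled data (NumPy arrays),
# X1_u/X2_u are the views of the unlabeled data, y holds 0/1 labels.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def cotrain(X1, X2, y, X1_u, X2_u, k=30, u=75, p=1, n=3, seed=0):
    rng = np.random.default_rng(seed)
    L1, L2, yl = list(X1), list(X2), list(y)                # labeled set L
    rest = list(rng.permutation(len(X1_u)))                 # indices into U
    pool = [rest.pop() for _ in range(min(u, len(rest)))]   # pool U'
    h1 = h2 = None
    for _ in range(k):
        if not pool:
            break
        h1 = GaussianNB().fit(np.array(L1), yl)             # view-1 classifier
        h2 = GaussianNB().fit(np.array(L2), yl)             # view-2 classifier
        picked = {}                                         # index -> self-label
        for h, view in ((h1, X1_u), (h2, X2_u)):
            proba = h.predict_proba(view[pool])             # confidence on U'
            for label, count in ((1, p), (0, n)):           # p positives, n negatives
                for idx in np.argsort(-proba[:, label])[:count]:
                    picked.setdefault(pool[int(idx)], label)
        for i, lab in picked.items():                       # grow L with self-labels
            L1.append(X1_u[i]); L2.append(X2_u[i]); yl.append(lab)
        pool = [i for i in pool if i not in picked]
        while len(pool) < u and rest:                       # replenish U' from U
            pool.append(rest.pop())
    return h1, h2
```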

8 Family of Algorithms Related to Co-training [Nigam & Ghani 2000]

Method        With Feature Split   Without Feature Split
Incremental   Co-training          Self-training
Iterative     Co-EM                EM

9 Parsing as Supertagging and Attaching [Sarkar 2001]
 How parsing differs from other NLP applications (WSD, WBPC, TC, NEI):
– a tree vs. a label
– composite vs. monolithic
– a large parameter space vs. a small one
 LTAG:
– each word is tagged with a lexicalized elementary tree (supertagging)
– parsing is a process of substitution and adjoining of elementary trees
– the supertagger does a very large part of the job a traditional parser must do

10 A glimpse of Supertags

11 Two Models to Co-train
 H1: selects elementary trees based on the preceding context (the tagging probability model)
 H2: computes attachments between trees and returns the best parse (the parsing probability model)

12 [Sarkar 2000] Co-training Algorithm
1. Input: labeled and unlabeled data
2. Update the cache: randomly select sentences from the unlabeled data and refill the cache; if the cache is empty, exit
3. Train models H1 and H2 on the labeled data
4. Apply H1 and H2 to the cache
5. Pick the n most probable sentences from H1 (run through H2) and add them to the labeled data
6. Pick the n most probable sentences from H2 and add them to the labeled data
7. n = n + k; go to step 2

13 JHU SW2002 Tasks
 Co-train the Collins CFG parser with the Sarkar LTAG parser
 Co-train re-rankers
 Co-train CCG supertaggers and parsers

14 Co-training: The Algorithm
 Requires:
– two learners with different views of the task
– a Cache Manager (CM) to interface with the disparate learners
– a small set of labeled seed data and a larger pool of unlabeled data
 Pseudo-code (see the interface sketch after this slide):
– Init: train both learners on the labeled seed data
– Loop: the CM picks unlabeled data to add to the cache; both learners label the cache; the CM selects newly labeled data to add to the learners' respective training sets; the learners re-train
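One way these pieces could fit together in code is sketched below. The Learner interface, the CacheManager class, and all parameter names are invented for illustration; the slides specify only the roles.

```python
# Skeleton of the cache-managed co-training architecture: a CacheManager
# mediates between two learners with different views of the task.
# All interface and parameter names here are hypothetical.
import random
from typing import List, Protocol, Sequence, Tuple

Example = Tuple[str, object]            # (sentence, analysis)
Scored = Tuple[str, object, float]      # (sentence, analysis, confidence)

class Learner(Protocol):
    def train(self, labeled: Sequence[Example]) -> None: ...
    def label(self, sentences: Sequence[str]) -> List[Scored]: ...

class CacheManager:
    def __init__(self, unlabeled: Sequence[str], cache_size=50, select_size=20):
        self.unlabeled = list(unlabeled)
        self.cache_size, self.select_size = cache_size, select_size

    def fill_cache(self) -> List[str]:
        random.shuffle(self.unlabeled)
        cache = self.unlabeled[:self.cache_size]
        del self.unlabeled[:self.cache_size]
        return cache

    def select(self, scored: List[Scored]) -> List[Example]:
        # one possible selection policy: keep the highest-confidence labelings
        best = sorted(scored, key=lambda s: s[2], reverse=True)[:self.select_size]
        return [(sent, analysis) for sent, analysis, _ in best]

def cotrain(l1: Learner, l2: Learner, seed_data: List[Example],
            cm: CacheManager, rounds=10) -> None:
    t1, t2 = list(seed_data), list(seed_data)   # per-learner training sets
    l1.train(t1); l2.train(t2)                  # init on the labeled seed data
    for _ in range(rounds):
        cache = cm.fill_cache()
        if not cache:
            break
        t2 += cm.select(l1.label(cache))        # each learner feeds the other
        t1 += cm.select(l2.label(cache))
        l1.train(t1); l2.train(t2)              # re-train on the grown sets
```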

15 Novel Methods: Parse Selection
 Goal: select training examples for one parser (the student) that were labeled by the other (the teacher), so as to minimize noise and maximize training utility. Each policy below is sketched in code after this list.
– Top-n: choose the n examples to which the teacher assigned the highest scores.
– Difference: choose the examples to which the teacher assigned a higher score than the student did, by some threshold.
– Intersection: choose the examples that received high scores from the teacher but low scores from the student.
– Disagreement: choose the examples for which the two parsers produced different analyses and the teacher assigned a higher score than the student.
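The four policies are simple enough to state as code. In the sketch below each candidate is assumed to be a tuple of (sentence, teacher parse, teacher score, student parse, student score); the tuple layout and the threshold arguments are our assumptions, not from the slides.

```python
# Sketches of the four parse-selection heuristics. A candidate c is
# (sentence, teacher_parse, teacher_score, student_parse, student_score);
# the tuple layout and the threshold arguments are illustrative.

def top_n(cands, n):
    """Top-n: the n examples the teacher scored highest."""
    return sorted(cands, key=lambda c: c[2], reverse=True)[:n]

def difference(cands, threshold):
    """Difference: teacher score exceeds the student score by a threshold."""
    return [c for c in cands if c[2] - c[4] > threshold]

def intersection(cands, hi, lo):
    """Intersection: high teacher score but low student score."""
    return [c for c in cands if c[2] >= hi and c[4] <= lo]

def disagreement(cands):
    """Disagreement: different analyses, and the teacher is more confident."""
    return [c for c in cands if c[1] != c[3] and c[2] > c[4]]
```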

16 Effect of Parse Selection

17 CFG-LTAG Co-training

18 Re-rankers Co-training
 What is re-ranking? (a minimal inference sketch follows this slide)
– A re-ranker reorders the output of an n-best (probabilistic) parser based on features of each parse
– While parsers use local features to make decisions, re-rankers can use features that span the entire tree
– Instead of co-training parsers, co-train different re-rankers
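In code, re-ranking inference amounts to scoring each tree in the n-best list with global features and taking the argmax. The linear scoring form and the feature-dict representation below are illustrative; the slides do not fix a model.

```python
# Minimal sketch of n-best re-ranking with a linear model over global
# (whole-tree) features. The feature-dict representation is hypothetical.
from typing import Dict, List, Tuple

Features = Dict[str, float]

def score(w: Features, f: Features) -> float:
    """Linear score w . f(tree) over sparse feature dicts."""
    return sum(w.get(k, 0.0) * v for k, v in f.items())

def rerank(nbest: List[Tuple[object, Features]], w: Features):
    """Pick the tree whose global features score highest under w."""
    return max(nbest, key=lambda tf: score(w, tf[1]))[0]
```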

19 Re-rankers Co-training
 Motivation: why re-rankers?
– Speed: the data is parsed once but can be re-ordered many times
– Objective function: the lower runtime of re-rankers allows us to explicitly maximize agreement between parses

20 Re-rankers Co-training
 Motivation: why re-rankers? (continued)
– Accuracy: re-rankers can improve the performance of existing parsers; Collins '00 reports a 13 percent reduction in error rate from re-ranking
– Task closer to classification: a re-ranker can be seen as a binary classifier (either a parse is the best one for its sentence or it isn't), and classification is the original domain co-training was intended for

21 Re-rankers Co-training
 Status: experimental, with much still to be explored. Remember: a re-ranker is easier to develop. (A perceptron training sketch follows this slide.)
– Re-ranker 1: a log-linear model
– Re-ranker 2: a linear perceptron model
– Room for improvement: the current best parser scores 89.7, while an oracle picking the best parse from the top 50 scores 95+
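For the perceptron re-ranker, the standard (Collins-style) training update can be sketched as follows, reusing `score` from the inference sketch above: when the model picks the wrong tree from an n-best list, promote the gold tree's features and demote the predicted tree's. The data layout is again our assumption, not the JHU code.

```python
# Sketch of perceptron training for a re-ranker. Each training item is
# (nbest, gold_idx), where nbest is a list of (tree, feature-dict) pairs
# and gold_idx marks the best tree in the list. Illustrative only.
def train_perceptron(data, w, epochs=5):
    for _ in range(epochs):
        for nbest, gold_idx in data:
            pred_idx = max(range(len(nbest)), key=lambda i: score(w, nbest[i][1]))
            if pred_idx != gold_idx:                    # the wrong tree won
                for k, v in nbest[gold_idx][1].items(): # promote gold features
                    w[k] = w.get(k, 0.0) + v
                for k, v in nbest[pred_idx][1].items(): # demote predicted features
                    w[k] = w.get(k, 0.0) - v
    return w
```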

22 JHU SW2002 Conclusions
– The largest experimental study to date on the use of unlabeled data for improving parser performance.
– Co-training improves performance for parsers and taggers trained on small amounts (500 to 10,000 sentences) of labeled data.
– Co-training can be used to port parsers trained on one genre to another genre without any new human-labeled data at all, improving on the state of the art for this task.
– Even tiny amounts of human-labeled data for the target genre improve porting via co-training.
– New methods for parse selection have been developed, and they play a crucial role.

23 How to Improve Our Parser?
 Similar setting: limited labeled data (Penn CTB) and a large amount of unlabeled data from a somewhat different domain (PKU People's Daily)
 To try:
– Re-rankers: their development cycle is much shorter, so they are worth trying, and many ML techniques can be brought to bear
– Re-rankers' agreement is still an open question


