
Slide 1: Structured Perceptron (Alice Lai and Shi Zhi)

Slide 2: Presentation Outline
- Introduction to Structured Perceptron
- ILP-CRF Model
- Averaged Perceptron
- Latent Variable Perceptron

Slide 3: Motivation
- An algorithm to learn weights for structured prediction
- Alternative to POS tagging with MEMM and CRF (Collins 2002)
- Convergence guarantees under certain conditions, even for inseparable data
- Generalizes to new examples and to other sequence labeling problems

Slide 4: POS Tagging Example
- Example sentence: "the man saw dog"
- (figure: tag lattice with candidate tags D, N, V, A at each position)

Slide 5: MEMM Approach
- Conditional model: probability of the current state given the previous state and the current observation
- For the tagging problem, define local features for each tag in context
- Features are often indicator functions
- Learn the parameter vector α with Generalized Iterative Scaling or gradient descent
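The indicator local features mentioned above might be sketched as follows; the feature names and the example words are illustrative assumptions, not taken from the slides.

```python
def local_features(prev_tag, tag, words, i):
    """Indicator local features for assigning `tag` at position i,
    conditioned on the previous tag and the current observation."""
    return {
        f"trans:{prev_tag}->{tag}": 1.0,   # tag-bigram indicator
        f"emit:{tag}:{words[i]}": 1.0,     # tag-word indicator
    }

print(local_features("D", "N", ["the", "man", "saw", "dog"], 1))
# → {'trans:D->N': 1.0, 'emit:N:man': 1.0}
```

Each feature fires (value 1) only when its specific tag/word configuration occurs, which is what "indicator function" means here.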

Slide 6: Global Features
- Local features are defined only for a single label
- Global features are defined for an observed sequence and a possible label sequence
- Simple version: global features are local features summed over an observation-label sequence pair
- Compared to the original perceptron algorithm, we predict a vector of labels instead of a single label
- Which of the possible incorrect label vectors do we use as the negative example in training?
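The "simple version" above, global features as local features summed over the whole sequence, can be sketched in Python; the feature templates and start symbol are illustrative assumptions.

```python
from collections import Counter

def global_features(words, tags, start="<s>"):
    """Global feature vector: local indicator features summed over
    the whole observation-label sequence pair."""
    phi = Counter()
    prev = start
    for word, tag in zip(words, tags):
        phi[f"trans:{prev}->{tag}"] += 1   # tag-bigram count
        phi[f"emit:{tag}:{word}"] += 1     # tag-word count
        prev = tag
    return phi

print(global_features(["the", "man"], ["D", "N"]))
```

Summing locals turns per-position indicators into sequence-level counts, which is the vector the structured perceptron scores with a dot product.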

Slide 7: Structured Perceptron Algorithm
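A minimal sketch of Collins' training loop: predict the highest-scoring label sequence under the current weights, and on a mistake add the gold sequence's global features and subtract the predicted sequence's. The brute-force argmax and the tiny feature set are illustrative stand-ins for Viterbi decoding and real feature templates.

```python
import itertools
from collections import Counter

def phi(words, tags):
    """Global features: tag-bigram and tag-word indicators summed over the sequence."""
    f = Counter()
    prev = "<s>"
    for w, t in zip(words, tags):
        f[f"trans:{prev}->{t}"] += 1
        f[f"emit:{t}:{w}"] += 1
        prev = t
    return f

def predict(weights, words, tagset):
    """Argmax over all tag sequences (brute force here; Viterbi in practice)."""
    return max(itertools.product(tagset, repeat=len(words)),
               key=lambda tags: sum(weights[k] * v for k, v in phi(words, tags).items()))

def train(data, tagset, epochs=5):
    """Structured perceptron: additive update on mistakes only."""
    w = Counter()
    for _ in range(epochs):
        for words, gold in data:
            pred = predict(w, words, tagset)
            if list(pred) != list(gold):
                for k, v in phi(words, gold).items():
                    w[k] += v              # promote gold-sequence features
                for k, v in phi(words, pred).items():
                    w[k] -= v              # demote predicted-sequence features
    return w

w = train([(["the", "man", "saw", "dog"], ["D", "N", "V", "N"])], ["D", "N", "V", "A"])
print(predict(w, ["the", "man", "saw", "dog"], ["D", "N", "V", "A"]))
# → ('D', 'N', 'V', 'N')
```

This also answers the question on the previous slide: the negative example is the model's own current highest-scoring (incorrect) label vector.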

Slide 8: Properties

Slide 9: Global vs. Local Learning
- Global learning (IBT): constraints are used during training
- Local learning (L+I): classifiers are trained without constraints; constraints are applied later to produce the global output
- Example: ILP-CRF model [Roth and Yih 2005]

Slide 10: Perceptron IBT

Slide 11: Perceptron L+I

Slide 12: ILP-CRF Introduction [Roth and Yih 2005]
- ILP-CRF model for Semantic Role Labeling as a sequence labeling problem
- Viterbi inference for CRFs can include constraints, but cannot handle long-range or general constraints
- Viterbi is a shortest-path problem that can be solved with ILP
- Use integer linear programming to express general constraints during inference
- Allows expressive constraints, including long-range constraints between distant tokens that Viterbi cannot handle
- (figure: Viterbi trellis as a path from source s to sink t through label choices A, B, C at each position)
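The best-path view that the ILP formulation generalizes can be made concrete with standard Viterbi dynamic programming. The additive transition/emission score tables below are illustrative assumptions, not the paper's actual model.

```python
def viterbi(words, tags, trans, emit):
    """Highest-scoring tag path under additive transition + emission scores.
    trans[(prev, tag)] and emit[(tag, word)] default to 0 for unseen pairs."""
    # Scores for the first position, from a virtual start state <s>.
    V = [{t: trans.get(("<s>", t), 0.0) + emit.get((t, words[0]), 0.0) for t in tags}]
    back = []
    for i in range(1, len(words)):
        scores, ptrs = {}, {}
        for t in tags:
            # Best predecessor for tag t at position i.
            p = max(tags, key=lambda p: V[i - 1][p] + trans.get((p, t), 0.0))
            ptrs[t] = p
            scores[t] = V[i - 1][p] + trans.get((p, t), 0.0) + emit.get((t, words[i]), 0.0)
        V.append(scores)
        back.append(ptrs)
    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: V[-1][t])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]

trans = {("<s>", "D"): 1.0, ("D", "N"): 1.0, ("N", "V"): 1.0}
emit = {("D", "the"): 1.0, ("N", "man"): 1.0, ("V", "saw"): 1.0}
print(viterbi(["the", "man", "saw"], ["D", "N", "V"], trans, emit))
# → ['D', 'N', 'V']
```

Each position-tag pair is a trellis node and each max is an edge choice, which is exactly why decoding can be re-expressed as a (shortest/longest) path ILP with extra constraints added.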

Slide 13: ILP-CRF Models
- CRF trained with max log-likelihood
- CRF trained with voted perceptron
- L+I and IBT variants
- Local training (L+I): perceptron, winnow, voted perceptron, voted winnow

Slide 14: ILP-CRF Results
- (results figure comparing sequential models with local, L+I, and IBT training)

Slide 15: ILP-CRF Conclusions
- Local learning models perform poorly on their own, but performance improves dramatically when constraints are added at evaluation
- Performance is then comparable to IBT methods: the best models for global and local training show comparable results
- L+I vs. IBT: L+I requires fewer training examples, is more efficient, and outperforms IBT in most situations (unless the local problems are difficult to solve) [Punyakanok et al., IJCAI 2005]

Slide 16: Variations: Voted Perceptron
- For iteration t = 1, …, T and example i = 1, …, n: given the current parameters, get the label sequence for the example by Viterbi decoding
- Each example thus defines a set of tagging sequences, one per iteration
- The voted perceptron outputs the most frequently occurring output in this set

Slide 17: Variations: Voted Perceptron
- Averaged algorithm (Collins '02): an approximation of the voted method that uses the averaged parameters instead of the final parameters
- Performance: higher F-measure, lower error rate
- Greater stability: less variance in its scores
- Variation: a modified averaged algorithm for the latent perceptron
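The averaging step can be sketched as taking the mean of the parameter snapshots recorded after each update; real implementations keep a running sum instead of storing snapshots. This version is illustrative.

```python
from collections import Counter

def average(snapshots):
    """Averaged perceptron (Collins '02): return the mean of the parameter
    vectors seen over training, approximating the voted perceptron."""
    total = Counter()
    for w in snapshots:
        total.update(w)          # element-wise sum of parameter dicts
    n = len(snapshots)
    return {k: v / n for k, v in total.items()}

print(average([{"a": 1.0}, {"a": 3.0, "b": 2.0}]))
# → {'a': 2.0, 'b': 1.0}
```

Averaging damps the influence of the last few (possibly noisy) updates, which is where the stability gain mentioned above comes from.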

Slide 18: Variations: Latent Structure Perceptron
- Model definition: w is the parameter vector of the perceptron; Φ(x, y, h) is the feature encoding function mapping to a feature vector
- In the NER task, x is the word sequence, y is the named-entity type sequence, and h is the hidden latent variable sequence
- Features: unigram and bigram features over words, POS, and orthography (prefix, upper/lower case)
- Why latent variables? To capture latent dependencies (i.e., hidden sub-structure)

Slide 19: Variations: Latent Structure Perceptron
- Purely latent structure perceptron (Connor's): training is structured perceptron with margin, where C is the margin and alpha is the learning rate
- Variation: modified parameter-averaging method (Sun's): re-initialize the parameters with the averaged parameters every k iterations
- Advantage: reduces overfitting of the latent perceptron
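The perceptron-with-margin update described above (C the margin, alpha the learning rate) can be sketched as: update whenever the gold structure's score fails to exceed the prediction's by at least C. The feature dictionaries and default values are illustrative assumptions.

```python
def margin_update(w, phi_gold, phi_pred, C=1.0, alpha=0.5):
    """Perceptron-with-margin step: apply the additive update unless the
    gold score already beats the predicted score by at least margin C."""
    score = lambda f: sum(w.get(k, 0.0) * v for k, v in f.items())
    if score(phi_gold) - score(phi_pred) < C:
        for k, v in phi_gold.items():
            w[k] = w.get(k, 0.0) + alpha * v   # promote gold features
        for k, v in phi_pred.items():
            w[k] = w.get(k, 0.0) - alpha * v   # demote predicted features
    return w

print(margin_update({}, {"a": 1.0}, {"b": 1.0}))
# → {'a': 0.5, 'b': -0.5}
```

Unlike the plain perceptron, this also updates on "barely correct" predictions, pushing the decision boundary C away from the training examples.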

Slide 20: Variations: Latent Structure Perceptron
- Disadvantage of the purely latent perceptron: h* is found and then forgotten for each x
- Solution: online latent classifier (Connor's)
- Two classifiers: a latent classifier with parameters u, and a label classifier with parameters w

Slide 21: Variations: Latent Structure Perceptron
- Online latent classifier training (Connor's)

Slide 22: Variations: Latent Structure Perceptron
- Experiments: Bio-NER with the purely latent perceptron
- (results table; legend: cc = cut-off, Odr = order of dependency; reports training time and F-measure, including higher-order models)

Slide 23: Variations: Latent Structure Perceptron
- Experiments: Semantic Role Labeling with argument/predicate as the latent structure
- X: "She likes yellow flowers" (sentence)
- Y: agent, predicate, (none), patient (roles)
- H: predicate (exactly one), arguments (at least one) (latent structure)
- Optimization for (h*, y*): search over all possible argument/predicate structures; for more complex data, other methods are needed
- (test-set results table)

Slide 24: Summary
- Structured perceptron: definition and motivation
- IBT vs. L+I
- Variations of the structured perceptron
References:
- M. Collins. Discriminative Training for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms. EMNLP 2002.
- X. Sun, T. Matsuzaki, D. Okanohara, and J. Tsujii. Latent Variable Perceptron Algorithm for Structured Classification. IJCAI 2009.
- D. Roth and W. Yih. Integer Linear Programming Inference for Conditional Random Fields. ICML 2005.
- M. Connor, C. Fisher, and D. Roth. Online Latent Structure Training for Language Acquisition. IJCAI 2011.
