1
Course Summary
LING 572
Fei Xia
03/06/07

2
Outline
Problem description
General approach
ML algorithms
Important concepts
Assignments
What’s next?

3
Problem descriptions

4
Two types of problems
Classification problem
Sequence labeling problem
In both cases:
  – A predefined set of labels: C = {c_1, c_2, …, c_n}
  – Training data: {(x_i, y_i)}, where y_i ∈ C, and y_i is known or unknown.
  – Test data

5
NLP tasks
Classification problems:
  – Document classification
  – Spam detection
  – Sentiment analysis
  – …
Sequence labeling problems:
  – POS tagging
  – Word segmentation
  – Sentence segmentation
  – NE detection
  – Parsing
  – IGT detection
  – …

6
General approach

7
Step 1: Preprocessing
Converting the NLP task to a classification or sequence labeling problem
Creating the attribute-value table:
  – Define feature templates
  – Instantiate feature templates and select features
  – Decide what kind of feature values to use (e.g., binarizing features or not)
  – Converting a multi-class problem to a binary problem (optional)

8
Feature selection
Dimensionality reduction
  – Feature selection
    Wrapping methods
    Filtering methods: mutual info, χ², information gain, …
  – Feature extraction
    Term clustering: Latent semantic indexing (LSI)
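
As an illustration of a filtering method, the χ² score of one feature against one class can be computed from a 2x2 table of document counts. This is a minimal sketch; the function name and argument layout are assumptions made for the example, not the course's code.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 feature/class contingency table.

    n11: docs in the class that contain the feature
    n10: docs outside the class that contain the feature
    n01: docs in the class that lack the feature
    n00: docs outside the class that lack the feature
    """
    n = n11 + n10 + n01 + n00
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    # An empty row or column makes the statistic undefined; treat it as 0.
    return numerator / denominator if denominator else 0.0
```

A feature that is distributed the same way inside and outside the class scores 0; higher scores indicate stronger association, so features can be ranked by score and the top ones kept.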

9
Multiclass → Binary
One-vs-all
All-pairs
Error-correcting Output Codes (ECOC)
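
The first two reductions can be sketched as label rewrites. The function names and the +1/-1 encoding below are assumptions chosen for illustration.

```python
def one_vs_all(labels):
    """One binary problem per class: +1 for that class, -1 for all others."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else -1 for y in labels] for c in classes}


def all_pairs(labels):
    """One binary problem per class pair, using only instances of those two classes.

    Each problem is a list of (instance_index, binary_label) pairs.
    """
    classes = sorted(set(labels))
    problems = {}
    for i, a in enumerate(classes):
        for b in classes[i + 1:]:
            problems[(a, b)] = [(idx, 1 if y == a else -1)
                                for idx, y in enumerate(labels) if y in (a, b)]
    return problems
```

With n classes, one-vs-all trains n binary classifiers on all the data, while all-pairs trains n(n-1)/2 classifiers, each on a subset.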

10
Step 2: Training and decoding
Choose an ML learner
Train and test on the development set, with different settings of non-model parameters
Choose the best setting for the development set
Run the learner on the test data with the best setting

11
Step 3: Post-processing
Label sequence → the output we want
System combination
  – Voting: majority voting, weighted voting
  – More sophisticated models
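
Majority and weighted voting can share one routine: majority voting is the special case where every system has weight 1. A minimal sketch, with a hypothetical function name and alphabetical tie-breaking as an assumption:

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Combine one predicted label per system into a single label.

    predictions: list of labels, one per system
    weights: optional list of per-system weights (defaults to 1.0 each)
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    # Iterate labels in sorted order so ties are broken deterministically.
    return max(sorted(scores), key=lambda label: scores[label])
```

The same weighted sum is what bagging (equal weights) and boosting (learned classifier weights) compute at test time.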

12
Supervised algorithms

13
Main ideas
kNN and Rocchio: finding the nearest neighbors / prototypes
DT and DL: finding the right group
NB, MaxEnt: calculating P(y | x)
Bagging: reducing the instability
Boosting: forming a committee
TBL: improving the current guess

14
ML learners
Modeling
Training
Testing (a.k.a. decoding)

15
Modeling
NB: assuming features are conditionally independent.
MaxEnt:
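
For reference, the two models are usually written as follows. These are the standard textbook forms, not formulas recovered from the slide; the symbols λ_j (feature weights) and f_j (feature functions) are notation assumed here.

```latex
% Naive Bayes: features f_1..f_m are conditionally independent given the class c
P(c \mid x) \propto P(c) \prod_{j=1}^{m} P(f_j \mid c)

% MaxEnt: weighted feature functions f_j(x, y), normalized over all labels y'
P(y \mid x) = \frac{\exp\left(\sum_j \lambda_j f_j(x, y)\right)}
                   {\sum_{y'} \exp\left(\sum_j \lambda_j f_j(x, y')\right)}
```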

16
Training
kNN: no training
Rocchio: calculate prototypes
DT: build a decision tree
  – Choose a feature and then split data
DL: build a decision list
  – Choose a decision rule and then split data
TBL: build a transformation list
  – Choose a transformation and then update the current label field

17
Training (cont)
NB: calculate P(c_i) and P(f_j | c_i) by simple counting.
MaxEnt: calculate the weights of feature functions by iteration.
Bagging: create bootstrap samples and learn base classifiers.
Boosting: learn base classifiers and their weights.
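
NB training by counting can be sketched in a few lines. The sketch assumes binary features (each instance is a set of the features that fire) and add-alpha smoothing; both choices, and the function name, are assumptions for the example.

```python
from collections import Counter, defaultdict


def train_nb(instances, alpha=1.0):
    """Estimate P(c) and P(f | c) from (feature_set, label) pairs by counting."""
    class_counts = Counter(label for _, label in instances)
    feat_counts = defaultdict(Counter)
    for feats, label in instances:
        feat_counts[label].update(feats)  # each feature counted once per instance
    total = len(instances)
    prior = {c: class_counts[c] / total for c in class_counts}
    vocab = {f for feats, _ in instances for f in feats}
    # Bernoulli estimate with add-alpha smoothing (an assumption, not from the slides).
    cond = {c: {f: (feat_counts[c][f] + alpha) / (class_counts[c] + 2 * alpha)
                for f in vocab}
            for c in class_counts}
    return prior, cond
```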

18
Testing
kNN: calculate distances between x and x_i, find the closest neighbors.
Rocchio: calculate distances between x and prototypes.
DT: traverse the tree.
DL: find the first matched decision rule.
TBL: apply transformations one by one.
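
The kNN test step can be sketched as below. Euclidean distance and unweighted voting among the neighbors are assumptions; other distance or similarity functions fit the same pattern.

```python
import math
from collections import Counter


def knn_classify(x, training, k=3):
    """Label x by majority vote among its k nearest training instances.

    training: list of (vector, label) pairs; vectors have the same length as x.
    """
    neighbors = sorted(training, key=lambda pair: math.dist(x, pair[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

All of the work happens at test time, which is why kNN has no training step.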

19
Testing (cont)
NB: calc
MaxEnt: calc
Bagging: run the base classifiers and choose the class with highest votes.
Boosting: run the base classifiers and calc the weighted sum.

20
Sequence labeling problems
With classification algorithms:
  – Having features that refer to previous tags
  – Using beam search to find good sequences
With sequence labeling algorithms:
  – HMM
  – TBL
  – MEMM
  – CRF
  – …
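
Beam search over a classifier's scores can be sketched as follows. The `score` callback stands in for any classifier that returns a log-probability for a tag given the words and the previously chosen tags; its signature, and the toy scorer used to exercise it, are hypothetical.

```python
import math


def beam_search(words, labels, score, beam_size=3):
    """Return the best tag sequence found while keeping beam_size hypotheses per step.

    score(words, i, prev_tags, tag) -> log-probability of tag at position i.
    """
    beam = [(0.0, [])]  # (cumulative log-probability, tags so far)
    for i in range(len(words)):
        candidates = []
        for logp, tags in beam:
            for tag in labels:
                candidates.append((logp + score(words, i, tags, tag), tags + [tag]))
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]  # prune to the top hypotheses
    return beam[0][1]


def toy_score(words, i, prev_tags, tag):
    """Toy scorer: capitalized words are probably NAME, everything else O."""
    agrees = words[i][0].isupper() == (tag == "NAME")
    return math.log(0.9 if agrees else 0.1)
```

Unlike Viterbi, beam search does not guarantee the globally best sequence, but it lets features refer to arbitrary previous tags.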

21
Semi-supervised algorithms
Self-training
Co-training
…
Adding some unlabeled data to the labeled data

22
Unsupervised algorithms
MLE
EM:
  – General algorithm: E-step, M-step
  – EM for PM models
    Forward-backward for HMM
    Inside-outside for PCFG
    IBM models for MT

23
Important concepts

24
Concepts
Attribute-value table
Feature templates vs. features
Weights:
  – Feature weights
  – Classifier weights
  – Instance weights
  – Feature values

25
Concepts (cont)
Maximum entropy vs. maximum likelihood
Maximize likelihood vs. minimize training error
Training time vs. test time
Training error vs. test error
Greedy algorithm vs. iterative approach

26
Concepts (cont)
Local optima vs. global optima
Beam search vs. Viterbi algorithm
Sample vs. resample
Model parameters vs. non-model parameters

27
Assignments

28
Read code:
  – NB: binary features?
  – DT: difference between DT and C4.5
  – Boosting: AdaBoost and AdaBoostM2
  – MaxEnt: binary features?
Write code:
  – Info2Vectors
  – BinVectors
  – χ²
Complete two projects

29
Projects
Steps:
  – Preprocessing
  – Training and testing
  – Postprocessing
Two projects:
  – Project 1: Document classification
  – Project 2: IGT detection

30
Project 1: Document classification
A typical classification problem
Data are already prepared
  – Feature template: word appears in the doc
  – Feature value: word frequency
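
The feature template and feature value above amount to a bag-of-words vector. A minimal sketch, assuming whitespace tokenization and lowercasing (neither is specified in the slides); `binarize` shows the alternative of binary feature values.

```python
from collections import Counter


def doc_to_vector(text):
    """Instantiate the 'word appears in the doc' template with word-frequency values."""
    return dict(Counter(text.lower().split()))


def binarize(vector):
    """Replace every frequency with 1, keeping only presence information."""
    return {word: 1 for word in vector}
```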

31
Project 2: IGT detection
Can be framed as a sequence labeling problem
  – Preprocessing: Define label set
  – Postprocessing: Tag sequence → spans
Sequence labeling problem using a classification algorithm with beam search
To use classification algorithms:
  – Preprocessing:
    Define features
    Choose feature values
    …

32
Project 2 (cont)
Preprocessing:
  – Define label set
  – Define feature templates
  – Decide on feature values
Training and decoding
  – Write beam search
Postprocessing
  – Convert label sequence → spans
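
The postprocessing step depends on the label set chosen in preprocessing. As one possible illustration, the sketch below assumes a hypothetical B/I/O scheme (B begins a span, I continues it, O is outside); the project leaves the actual label set to the student.

```python
def labels_to_spans(labels):
    """Convert per-line B/I/O labels to (start, end) spans, with end exclusive."""
    spans = []
    start = None
    for i, label in enumerate(labels):
        if label == "B" or (label == "I" and start is None):
            # A B always opens a new span; a stray I with no open span does too.
            if start is not None:
                spans.append((start, i))
            start = i
        elif label == "O" and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans
```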

33
Project 2 (cont)
Presentation
Final report
A typical conference paper:
  – Introduction
  – Previous work
  – Methodology
  – Experiments
  – Discussion
  – Conclusion

34
Using Mallet
Difficulties:
  – Java
  – A large package
Benefits:
  – Java
  – A large package
  – Many learning algorithms: comparing the implementation with “standard” algorithms

35
Bugs in Mallet?
In Hw9, include a new section:
  – Bugs
  – Complaints
  – Things you like about Mallet

36
Course summary
9 weeks: 18 sessions
2 kinds of problems
9 supervised algorithms
1 semi-supervised algorithm
1 unsupervised algorithm
4 related issues: feature selection, multiclass → binary, system combination, beam search
2 projects
1 well-known package
9 assignments, including 1 presentation and 1 final report
N papers

37
What’s next?
Learn more about the algorithms covered in class.
Learn new algorithms:
  – SVM, CRF, regression algorithms, graphical models, …
Try new tasks:
  – Parsing, spam filtering, reference resolution, …

38
Misc
Hw7: due tomorrow 11pm
Hw8: due Thursday 11pm
Hw9: due 3/13 11pm
Presentation: No more than 15+5 minutes

39
What must be included in the presentation?
Label set
Feature templates
Effect of beam search
3+ ways to improve the system and results on dev data (test_data/)
Best system: results on dev data and the setting
Results on test data (more_test_data/)

40
Grades, etc.
9 assignments + class participation
Hw1-Hw6:
  – Total: 740
  – Max: 696.56
  – Min: 346.52
  – Ave: 548.74
  – Median: 559.08
