Course Summary LING 572 Fei Xia 03/06/07. Outline Problem description General approach ML algorithms Important concepts Assignments What’s next?

1 Course Summary LING 572 Fei Xia 03/06/07

2 Outline Problem description General approach ML algorithms Important concepts Assignments What’s next?

3 Problem descriptions

4 Two types of problems Classification problem Sequence Labeling problem In both cases: –A predefined set of labels: C = {c_1, c_2, …, c_n} –Training data: {(x_i, y_i)}, where y_i ∈ C, and y_i is known or unknown. –Test data

5 NLP tasks Classification problems: –Document classification –Spam detection –Sentiment analysis –… Sequence labeling problems: –POS tagging –Word segmentation –Sentence segmentation –NE detection –Parsing –IGT detection –…

6 General approach

7 Step 1: Preprocessing Converting the NLP task to a classification or sequence labeling problem Creating the attribute-value table: –Define feature templates –Instantiate feature templates and select features –Decide what kind of feature values to use (e.g., binarizing features or not) –Converting a multi-class problem to a binary problem (optional)
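
As a rough illustration of instantiating feature templates and building an attribute-value table, here is a minimal Python sketch. The three templates (`curW`, `prevW`, `nextW`), the boundary symbols, and the function names are hypothetical choices for illustration, not the ones used in the course; feature values are binarized.

```python
def instantiate_templates(tokens, i):
    """Instantiate three hypothetical feature templates at position i."""
    prev_word = tokens[i - 1] if i > 0 else "<s>"
    next_word = tokens[i + 1] if i + 1 < len(tokens) else "</s>"
    return [
        f"curW={tokens[i]}",
        f"prevW={prev_word}",
        f"nextW={next_word}",
    ]


def build_table(sentences):
    """Return (feature_names, rows); each row is a binary feature vector."""
    instances = [
        instantiate_templates(sent, i)
        for sent in sentences
        for i in range(len(sent))
    ]
    # The columns of the table are all features seen in the data.
    vocab = sorted({f for inst in instances for f in inst})
    index = {f: j for j, f in enumerate(vocab)}
    rows = []
    for inst in instances:
        row = [0] * len(vocab)
        for f in inst:
            row[index[f]] = 1  # binarized feature value
        rows.append(row)
    return vocab, rows
```

Each template expands into many concrete features (one per observed word), which is the distinction between feature templates and features.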

8 Feature selection Dimensionality reduction –Feature selection Wrapping methods Filtering methods: –Mutual info, χ², Information gain, … –Feature extraction Term clustering: Latent semantic indexing (LSI)
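
A filtering method scores each feature independently and keeps the top-scoring ones. The sketch below uses the common 2×2 contingency-table form of χ² for one feature and one class; the function names and the count layout `(a, b, c, d)` are assumptions made for illustration.

```python
def chi_square(a, b, c, d):
    """Chi-square score from a 2x2 contingency table.

    a: instances in the class that contain the feature
    b: instances outside the class that contain the feature
    c: instances in the class without the feature
    d: instances outside the class without the feature
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: the feature carries no information
    return n * (a * d - b * c) ** 2 / denom


def select_top_k(counts, k):
    """counts maps feature -> (a, b, c, d); return the k best features."""
    ranked = sorted(counts, key=lambda f: chi_square(*counts[f]), reverse=True)
    return ranked[:k]
```

A feature distributed identically inside and outside the class scores 0, while a feature that perfectly separates the class gets the maximum score for that table size.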

9 Multiclass → Binary One-vs-all All-pairs Error-correcting Output Codes (ECOC)
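
The first two conversions can be sketched as relabelings of the training data. This is a minimal illustration with hypothetical function names; it covers only how the binary training sets are built, not how the binary classifiers' outputs are combined at test time.

```python
from itertools import combinations


def one_vs_all(labels, classes):
    """One binary problem per class: 1 if the instance has that class, else 0."""
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def all_pairs(labels, classes):
    """One binary problem per pair of classes.

    Only instances belonging to one of the two classes are kept.
    Returns {(c1, c2): [(instance_index, binary_label), ...]},
    where binary_label is 1 for c1 and 0 for c2.
    """
    problems = {}
    for c1, c2 in combinations(classes, 2):
        problems[(c1, c2)] = [
            (i, 1 if y == c1 else 0)
            for i, y in enumerate(labels)
            if y in (c1, c2)
        ]
    return problems
```

With n classes, one-vs-all yields n binary problems over all the data, while all-pairs yields n(n-1)/2 smaller problems.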

10 Step 2: Training and decoding Choose a ML learner Train and test on development set, with different settings of non-model parameters Choose the best setting for the development set Run the learner on the test data with the best setting

11 Step 3: Post-processing Label sequence → the output we want System combination –Voting: majority voting, weighted voting –More sophisticated models
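
Majority and weighted voting can share one implementation, since majority voting is weighted voting with equal weights. The sketch below is illustrative; the function name is hypothetical, and ties are broken in favor of the label seen first, which is an arbitrary choice.

```python
from collections import defaultdict


def weighted_vote(predictions, weights=None):
    """Combine one predicted label per system into a single label.

    predictions: list of labels, one per system
    weights: optional list of system weights (defaults to 1.0 each,
             which gives plain majority voting)
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    scores = defaultdict(float)
    for label, w in zip(predictions, weights):
        scores[label] += w
    # max() returns the first label with the highest total weight.
    return max(scores, key=scores.get)
```

For example, `weighted_vote(["N", "V", "N"])` picks the majority label, while supplying weights lets one strong system outvote two weak ones.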

12 Supervised algorithms

13 Main ideas kNN and Rocchio: finding the nearest neighbors / prototypes DT and DL: finding the right group NB, MaxEnt: calculating P(y | x) Bagging: Reducing the instability Boosting: Forming a committee TBL: Improving the current guess

14 ML learners Modeling Training Testing (a.k.a. decoding)

15 Modeling NB: assuming features are conditionally independent. MaxEnt:
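
The formulas on this slide did not survive in the transcript. For reference, the standard textbook forms of the two models are given below; the notation (features f_j, weights λ_j, normalizer Z(x)) is assumed here and is not recovered from the slide.

```latex
% Naive Bayes: features are conditionally independent given the class.
P(x, c) = P(c) \prod_{j} P(f_j \mid c)

% MaxEnt: a log-linear model over feature functions f_j(x, y).
P(y \mid x) = \frac{\exp\left( \sum_{j} \lambda_j f_j(x, y) \right)}{Z(x)},
\qquad
Z(x) = \sum_{y'} \exp\left( \sum_{j} \lambda_j f_j(x, y') \right)
```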

16 Training kNN: no training Rocchio: calculate prototypes DT: build a decision tree –Choose a feature and then split data DL: build a decision list: –Choose a decision rule and then split data TBL: build a transformation list: –Choose a transformation and then update the current label field

17 Training (cont) NB: calculate P(c i ) and P(f j | c i ) by simple counting. MaxEnt: calculate the weights of feature functions by iteration. Bagging: create bootstrap samples and learn base classifiers. Boosting: learn base classifiers and their weights.
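
Training NB "by simple counting" can be sketched as follows. This is a minimal illustration, not the course's implementation: the function names are hypothetical, only features present in an instance are scored, and add-one smoothing is an added assumption to avoid zero probabilities.

```python
import math
from collections import Counter, defaultdict


def train_nb(data):
    """data: list of (features, label). Collect the counts behind P(c) and P(f | c)."""
    class_counts = Counter()
    feat_counts = defaultdict(Counter)
    vocab = set()
    for feats, label in data:
        class_counts[label] += 1
        for f in feats:
            feat_counts[label][f] += 1
            vocab.add(f)
    return class_counts, feat_counts, vocab


def classify_nb(model, feats):
    """Return the class maximizing log P(c) + sum_j log P(f_j | c)."""
    class_counts, feat_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)
        # Add-one smoothing (an assumption of this sketch).
        denom = sum(feat_counts[c].values()) + len(vocab)
        for f in feats:
            if f in vocab:  # features never seen in training are ignored
                score += math.log((feat_counts[c][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = c, score
    return best_label
```

Working in log space avoids underflow when many small probabilities are multiplied.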

18 Testing kNN: calculate distances between x and x i, find the closest neighbors. Rocchio: calculate distances between x and prototypes. DT: traverse the tree DL: find the first matched decision rule. TBL: apply transformations one by one.
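
The kNN test-time procedure can be sketched in a few lines. Euclidean distance and majority voting among the neighbors are assumptions of this sketch; other distance or similarity measures (for example, cosine) are equally possible.

```python
import math
from collections import Counter


def knn_classify(train, x, k=3):
    """Classify x by majority vote among its k nearest training instances.

    train: list of (vector, label) pairs; x: a vector of the same length.
    """
    # Sort training instances by Euclidean distance to x and keep the k closest.
    neighbors = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```

Because there is no training step, all of the cost is paid here: each test instance is compared against every training instance.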

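19 Testing (cont) NB: calculate P(c) ∏_j P(f_j | c) for each class c and choose the class with the highest value. MaxEnt: calculate P(y | x) for each class y and choose the class with the highest probability. Bagging: run the base classifiers and choose the class with the most votes. Boosting: run the base classifiers and calculate the weighted sum.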

20 Sequence labeling problems With classification algorithms: –Having features that refer to previous tags –Using beam search to find good sequences With sequence labeling algorithms: –HMM –TBL –MEMM –CRF –…
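
Using a classifier for sequence labeling with beam search can be sketched as below. The `score` callback stands in for any classifier that returns a log probability for a tag given the position and the previous tag; its signature, the `"<s>"` start symbol, and the fixed-size beam are assumptions of this sketch.

```python
def beam_search(words, tags, score, beam_size=3):
    """Return the best tag sequence found by beam search.

    score(words, i, prev_tag, tag) must return log P(tag | context).
    """
    beam = [(0.0, [])]  # each hypothesis: (log probability, tag sequence so far)
    for i in range(len(words)):
        candidates = []
        for logp, seq in beam:
            prev = seq[-1] if seq else "<s>"
            # Extend each surviving hypothesis with every possible tag.
            for t in tags:
                candidates.append((logp + score(words, i, prev, t), seq + [t]))
        # Prune: keep only the beam_size highest-scoring hypotheses.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beam = candidates[:beam_size]
    return beam[0][1]
```

Unlike the Viterbi algorithm, beam search is not guaranteed to find the highest-scoring sequence, but it places no restriction on which earlier tags the features may refer to.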

21 Semi-supervised algorithms Self-training Co-training … → Adding some unlabeled data to the labeled data

22 Unsupervised algorithms MLE EM: –General algorithm: E-step, M-step –EM for PM models Forward-backward for HMM Inside-outside for PCFG IBM models for MT

23 Important concepts

24 Concepts Attribute-value table Feature templates vs. features Weights: –Feature weights –Classifier weights –Instance weights –Feature values

25 Concepts (cont) Maximum entropy vs. Maximum likelihood Maximize likelihood vs. minimize training error Training time vs. test time Training error vs. test error Greedy algorithm vs. iterative approach

26 Concepts (cont) Local optima vs. global optima Beam search vs. Viterbi algorithm Sample vs. resample Model parameters vs. non-model parameters

27 Assignments

28 Read code: –NB: binary features? –DT: difference between DT and C4.5 –Boosting: AdaBoost and AdaBoostM2 –MaxEnt: binary features? Write code: –Info2Vectors –BinVectors –χ² Complete two projects

29 Projects Steps: –Preprocessing –Training and testing –Postprocessing Two projects: –Project 1: Document classification –Project 2: IGT detection

30 Project 1: Document classification A typical classification problem Data are prepared already –Feature template: word appeared in the doc –Feature value: word frequency

31 Project 2: IGT detection Can be framed as a sequence labeling problem –Preprocessing: Define label set –Postprocessing: Tag sequence → spans Sequence labeling problem → using classification algorithm with beam search To use classifiers: –Preprocessing: Define features Choose feature values …

32 Project 2 (cont) Preprocessing: –Define label set –Define feature templates –Decide on feature values Training and decoding –Write beam search Postprocessing –Convert label sequence → spans
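
The postprocessing step depends on the label set chosen in preprocessing. The sketch below assumes a B/I/O label set (begin, inside, outside), which is one common choice and not necessarily the one used in the project; the function name is hypothetical.

```python
def labels_to_spans(labels):
    """Convert a B/I/O label sequence into (start, end) spans, end exclusive."""
    spans = []
    start = None  # start index of the span currently open, if any
    for i, label in enumerate(labels):
        if label == "B" or (label == "I" and start is None):
            # "B" always opens a new span; an "I" with no open span is
            # treated leniently as a span start.
            if start is not None:
                spans.append((start, i))
            start = i
        elif label == "O":
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(labels)))  # close a span that runs to the end
    return spans
```

For example, `labels_to_spans(["O", "B", "I", "I", "O", "B"])` yields two spans, one covering positions 1 to 3 and one covering position 5.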

33 Project 2 (cont) Presentation Final report A typical conference paper: –Introduction –Previous work –Methodology –Experiments –Discussion –Conclusion

34 Using Mallet Difficulties: –Java –A large package Benefits: –Java –A large package –Many learning algorithms: comparing the implementation with “standard” algorithms

35 Bugs in Mallet? In Hw9, include a new section: –Bugs –Complaints –Things you like about Mallet

36 Course summary 9 weeks: 18 sessions 2 kinds of problems 9 supervised algorithms 1 semi-supervised algorithm 1 unsupervised algorithm 4 related issues: feature selection, multiclass → binary, system combination, beam search 2 projects 1 well-known package 9 assignments, including 1 presentation and 1 final report N papers

37 What’s next? Learn more about the algorithms covered in class. Learn new algorithms: –SVM, CRF, regression algorithms, graphical models, … Try new tasks: –Parsing, spam filtering, reference resolution, …

38 Misc Hw7: due tomorrow 11pm Hw8: due Thursday 11pm Hw9: due 3/13 11pm Presentation: No more than 15+5 minutes

39 What must be included in the presentation? Label set Feature templates Effect of beam search 3+ ways to improve the system and results on dev data (test_data/) Best system: results on dev data and the setting Results on test data (more_test_data/)

40 Grades, etc. 9 assignments + class participation Hw1-Hw6: –Total: 740 –Max: 696.56 –Min: 346.52 –Ave: 548.74 –Median: 559.08
