1
Efficient Text Categorization with a Large Number of Categories
Rayid Ghani
KDD Project Proposal
2
Text Categorization
Numerous applications: search engines/portals, customer service, ...
Domains: topics, genres, languages
$$$ making
3
How do people deal with a large number of classes?
Use fast multiclass algorithms (Naïve Bayes), which build one model per class
Use binary classification algorithms (SVMs) and break an n-class problem into n binary problems
What happens with a 1000-class problem? Can we do better?
4
ECOC to the Rescue!
An n-class problem can be solved by solving log₂(n) problems
More efficient than one-per-class
Does it actually perform better?
5
What is ECOC?
ECOC (Error-Correcting Output Codes) solves multiclass problems by decomposing them into multiple binary problems
A learner is then used to learn each of the binary problems
6
Training ECOC
Code matrix (rows = classes A-D, columns = binary functions f1-f5):
     f1  f2  f3  f4  f5
A     0   0   1   1   0
B     1   0   1   0   0
C     0   1   1   1   0
D     0   1   0   0   1
Testing ECOC: a test instance X gets the predicted bit string 0 0 1 1 1
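This maps directly to code. Below is a minimal sketch of ECOC training and testing, using the 4-class, 5-bit code matrix reconstructed above; the data (X, y) and the choice of scikit-learn's logistic regression as the binary learner are illustrative assumptions, not part of the proposal.

```python
# Minimal ECOC sketch: train one binary classifier per code-matrix column,
# decode a test point to the class with the nearest codeword (Hamming distance).
import numpy as np
from sklearn.linear_model import LogisticRegression  # illustrative binary learner

CODE = np.array([[0, 0, 1, 1, 0],   # class A
                 [1, 0, 1, 0, 0],   # class B
                 [0, 1, 1, 1, 0],   # class C
                 [0, 1, 0, 0, 1]])  # class D

def train_ecoc(X, y):
    """y holds class indices 0..3; one binary problem per bit f_i."""
    y = np.asarray(y)
    classifiers = []
    for bit in range(CODE.shape[1]):
        relabeled = CODE[y, bit]              # each example gets its class's bit
        classifiers.append(LogisticRegression().fit(X, relabeled))
    return classifiers

def predict_ecoc(classifiers, X):
    """Concatenate the binary predictions, then pick the nearest codeword."""
    bits = np.column_stack([clf.predict(X) for clf in classifiers])
    dists = (bits[:, None, :] != CODE[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)               # row index = predicted class
```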
7
ECOC - Picture
[diagram: documents from classes A, B, C, D feeding the binary classifiers f1-f5 according to the code matrix above]
10
ECOC - Picture
[diagram: as above, now with a test instance X whose predicted bit string is 1 1 1 1 0]
11
Preliminary Results / This Proposal
[chart: classification performance vs. efficiency, comparing NB and ECOC]
ECOC reduces the error of the Naïve Bayes classifier by 66% with no increase in computational cost
12
Proposed Solutions
Design codewords that minimize cost and maximize "performance"
Investigate the assignment of codewords to classes
Learn the decoding function
Incorporate unlabeled data into ECOC
13
Use Unlabeled Data
Current learning algorithms that use unlabeled data (EM, Co-Training) don't work well with a large number of categories
ECOC works great with a large number of classes, but there is no framework for using unlabeled data
14
Use Unlabeled Data
ECOC decomposes multiclass problems into binary problems
Co-Training works great with binary problems
ECOC + Co-Training = learn each binary problem in ECOC with Co-Training (and variants of Co-Training such as Co-EM); see the sketch below
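As a concrete illustration, here is a hedged sketch of Co-Training applied to one binary ECOC problem. The two feature "views" (e.g., two disjoint vocabulary halves), the use of MultinomialNB, the conflict-resolution rule, and all parameter values are assumptions for illustration; the proposal does not fix these choices.

```python
# Simplified Co-Training for a single binary ECOC problem.
# X1/X2: labeled examples under the two views (dense count matrices);
# U1/U2: the unlabeled pool under the same two views; y: binary labels.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_train_binary(X1, X2, y, U1, U2, rounds=10, per_view=5):
    for _ in range(rounds):
        if len(U1) == 0:
            break
        clf1 = MultinomialNB().fit(X1, y)
        clf2 = MultinomialNB().fit(X2, y)
        # Each view picks the unlabeled examples it is most confident about;
        # the picked examples (with predicted labels) then teach BOTH views.
        p1, p2 = clf1.predict_proba(U1), clf2.predict_proba(U2)
        pick1 = np.argsort(-p1.max(axis=1))[:per_view]
        pick2 = np.argsort(-p2.max(axis=1))[:per_view]
        picked = np.unique(np.concatenate([pick1, pick2]))
        # Disagreements are resolved by whichever view is more confident.
        labels = np.where(p1[picked].max(axis=1) >= p2[picked].max(axis=1),
                          clf1.predict(U1[picked]), clf2.predict(U2[picked]))
        X1, X2 = np.vstack([X1, U1[picked]]), np.vstack([X2, U2[picked]])
        y = np.concatenate([y, labels])
        keep = np.setdiff1d(np.arange(len(U1)), picked)
        U1, U2 = U1[keep], U2[keep]
    return MultinomialNB().fit(X1, y), MultinomialNB().fit(X2, y)
```

Running this once per column of the code matrix would give the ECOC + Co-Training combination sketched on the slide.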
15
Summary
16
Testing ECOC
To test a new instance:
Apply each of the trained binary classifiers to the new instance
Combine the predictions to obtain a binary string (codeword) for the new point
Classify to the class with the nearest codeword (usually Hamming distance is used as the distance measure)
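A minimal sketch of these three steps, assuming `classifiers` is the list of trained binary classifiers and reusing the (reconstructed) codewords from the earlier example slide:

```python
# Test-time ECOC: predict each bit, then nearest-codeword (Hamming) decoding.
CODEWORDS = {"A": "00110", "B": "10100", "C": "01110", "D": "01001"}

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def classify(x, classifiers):
    # 1. Apply each binary classifier to the new instance x.
    # 2. Combine the predictions into a binary string (the observed codeword).
    observed = "".join(str(int(clf.predict([x])[0])) for clf in classifiers)
    # 3. Classify to the class with the nearest codeword.
    return min(CODEWORDS, key=lambda c: hamming(CODEWORDS[c], observed))
```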
17
The Decoding Step
Standard: map to the nearest codeword according to Hamming distance
Can we do better?
18
The Real Question
Tradeoff between the "learnability" of the individual binary problems and the error-correcting power of the code
19
Codeword Assignment
Standard procedure: assign codewords to classes randomly
Can we do better?
20
Goal of Current Research
Improve classification performance without increasing cost
Design short codes that perform well
Develop algorithms that increase performance without affecting code length
21
Previous Results
Performance increases with the length of the code
ECOC gives the same percentage increase in performance over NB regardless of training set size
BCH codes > random codes > hand-constructed codes
22
Others have shown that ECOC:
Works great with arbitrarily long codes
Longer codes = more error-correcting power = better performance
Longer codes = more computational cost
23
ECOC to the Rescue!
An n-class problem can be solved by solving log₂(n) problems
More efficient than one-per-class
Does it actually perform better?
24
Previous Results
Industry Sector Data Set:
Naïve Bayes: 66.1%
Shrinkage (1): 76%
ME (maximum entropy) (2): 79%
ME w/ Prior (3): 81.1%
ECOC 63-bit: 88.5%
ECOC reduces the error of the Naïve Bayes classifier by 66% with no increase in computational cost
1. (McCallum et al. 1998)  2, 3. (Nigam et al. 1999)  ECOC: (Ghani 2000)
25
Design Codewords
Maximize performance (accuracy, precision, recall, F1?)
Minimize the length of the codes
Search the space of codewords via gradient descent on the objective G = Error + Code_Length (see the sketch below)
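A hedged sketch of what this search could look like. Because the code matrix is discrete, plain gradient descent does not apply directly, so this stand-in uses greedy local search (bit flips and column deletions) over the slide's objective G = Error + Code_Length; the helper `evaluate_error` (train ECOC with a candidate code and return held-out error) and the length weight `lam` are assumptions.

```python
# Greedy local search over binary code matrices, minimizing G = Error + Code_Length.
import numpy as np

def search_code(n_classes, evaluate_error, init_bits=15, iters=200, lam=0.01, seed=0):
    rng = np.random.default_rng(seed)
    best = rng.integers(0, 2, size=(n_classes, init_bits))
    def G(c):
        return evaluate_error(c) + lam * c.shape[1]   # error plus code-length penalty
    best_g = G(best)
    for _ in range(iters):
        cand = best.copy()
        if rng.random() < 0.5 or cand.shape[1] <= 2:
            cand[rng.integers(n_classes), rng.integers(cand.shape[1])] ^= 1  # flip one bit
        else:
            cand = np.delete(cand, rng.integers(cand.shape[1]), axis=1)      # drop a column
        g = G(cand)
        if g < best_g:                                 # keep only improving moves
            best, best_g = cand, g
    return best, best_g
```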
26
Codeword Assignment
Generate the confusion matrix and use it to assign the most confusable classes the codewords that are farthest apart (see the sketch below)
Pros: focusing more on the confusable classes can help
Cons: the individual binary problems can become very hard
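One possible greedy realization, as a sketch: walk through class pairs from most to least confused and give each still-unassigned class the free codeword farthest (in minimum Hamming distance) from the codewords already taken. The `confusion` matrix (counts from a held-out run of a baseline classifier) and the greedy strategy itself are illustrative assumptions.

```python
# Confusion-driven codeword assignment (greedy sketch).
import numpy as np

def assign_codewords(confusion, codewords):
    """confusion: n x n counts; codewords: m x L binary matrix with m >= n rows."""
    n = confusion.shape[0]
    assigned = {}                          # class index -> row index into codewords
    free = list(range(len(codewords)))
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    # Most-confused pairs first (sum both off-diagonal directions).
    pairs.sort(key=lambda p: -(confusion[p[0], p[1]] + confusion[p[1], p[0]]))
    def dist_to_taken(r):                  # min Hamming distance to taken codewords
        if not assigned:
            return 0
        return min(int(np.sum(codewords[r] != codewords[a])) for a in assigned.values())
    for i, j in pairs:
        for c in (i, j):
            if c not in assigned and free:
                pick = max(free, key=dist_to_taken)
                assigned[c] = pick
                free.remove(pick)
    return assigned
```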
27
The Decoding Step
Weight the individual classifiers according to their training accuracies and do weighted majority decoding (see the sketch below)
Pose the decoding as a separate learning problem and use regression or a neural network to map predicted bit strings (e.g., 10011001) to codewords (e.g., 11011101)
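A minimal sketch of the first idea, assuming `accuracies` holds each binary classifier's training accuracy; scoring classes by accuracy-weighted agreement is one plausible reading of "weighted majority decoding". The second idea would replace this fixed rule with a learned decoder (regression or a neural network) mapping predicted bit strings to classes.

```python
# Accuracy-weighted decoding: trust the bits of accurate classifiers more.
import numpy as np

def weighted_decode(pred_bits, code, accuracies):
    """pred_bits: length-L 0/1 vector; code: n_classes x L; accuracies: length-L."""
    w = np.asarray(accuracies)
    agree = (code == np.asarray(pred_bits))        # where each codeword matches
    return int((agree * w).sum(axis=1).argmax())   # class with highest weighted agreement
```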