1 Improving Text Classification by Shrinkage in a Hierarchy of Classes
Andrew McCallum (Just Research & CMU), Tom Mitchell (CMU), Roni Rosenfeld (CMU), Andrew Y. Ng (MIT AI Lab)

2 The Task: Document Classification (also "Document Categorization", "Routing" or "Tagging")
Automatically placing documents in their correct categories.
[Figure: a flat set of categories (Magnetism, Relativity, Evolution, Botany, Irrigation, Crops), each with a list of training-data words such as "corn wheat silo farm grow..." for Crops; the test document "grow corn tractor…" is assigned to the category Crops.]

3 The Idea: "Shrinkage" / "Deleted Interpolation"
We can improve the parameter estimates in a leaf by averaging them with the estimates in its ancestors.
[Figure: the same categories arranged in a hierarchy, with Science at the root and Physics (Magnetism, Relativity), Biology (Evolution, Botany), and Agriculture (Irrigation, Crops) below it; the test document "corn grow tractor…" is again assigned to Crops.]

4 A Probabilistic Approach to Document Classification
Naïve Bayes: assign document d to the class c_j that maximizes Pr(c_j | d) ∝ Pr(c_j) ∏_i Pr(w_{d_i} | c_j), where c_j is a class, d is a document, and w_{d_i} is the i-th word of document d.
Maximum a posteriori estimate of Pr(w|c), with a Dirichlet prior, α = 1 (a.k.a. Laplace smoothing):
Pr(w | c_j) = (1 + Σ_{d ∈ c_j} N(w,d)) / (|V| + Σ_{w'} Σ_{d ∈ c_j} N(w',d)),
where N(w,d) is the number of times word w occurs in document d and |V| is the vocabulary size.
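To make the two estimates above concrete, here is a minimal Python sketch of naïve Bayes with Laplace smoothing (α = 1). The function names and the list-of-token-lists input format are illustrative assumptions, not part of the original slides or paper.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Estimate Pr(c) and Pr(w|c) with a Dirichlet prior, alpha = 1 (Laplace smoothing).
    docs: list of token lists; labels: parallel list of class names."""
    vocab = {w for d in docs for w in d}
    class_counts = Counter(labels)                      # for the class prior Pr(c)
    word_counts = defaultdict(Counter)                  # sum of N(w, d) over documents d in class c
    for d, c in zip(docs, labels):
        word_counts[c].update(d)
    priors = {c: n / len(docs) for c, n in class_counts.items()}
    cond = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        # MAP estimate: (1 + count) / (|V| + total count in class), i.e. Laplace smoothing
        cond[c] = {w: (1 + word_counts[c][w]) / (len(vocab) + total) for w in vocab}
    return priors, cond, vocab

def classify(doc, priors, cond, vocab):
    """Pick argmax_c of log Pr(c) + sum_i log Pr(w_i | c), skipping out-of-vocabulary words."""
    scores = {c: math.log(priors[c]) +
                 sum(math.log(cond[c][w]) for w in doc if w in vocab)
              for c in priors}
    return max(scores, key=scores.get)

# Toy example:
# priors, cond, vocab = train_naive_bayes([["corn", "grow"], ["darwin", "dna"]], ["Crops", "Evolution"])
# classify(["grow", "corn", "tractor"], priors, cond, vocab)   # -> "Crops"
```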

5 "Shrinkage" / "Deleted Interpolation"  [James and Stein, 1961] / [Jelinek and Mercer, 1980]
The shrinkage estimate for a leaf class is a weighted average of the estimates along its path to the root, plus a uniform distribution:
P_shrink(w | c) = λ_1 P(w | c) + λ_2 P(w | parent(c)) + … + λ_k P(w | root) + λ_{k+1} · 1/|V|,  with Σ_a λ_a = 1.
[Figure: the Science hierarchy (Physics: Magnetism, Relativity; Biology: Evolution, Botany; Agriculture: Irrigation, Crops), with a (Uniform) node drawn above the root.]
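A minimal sketch of this interpolation, assuming each node's estimate is stored as a word-to-probability dictionary and the nodes are ordered leaf first (both representation choices are assumptions for illustration):

```python
def shrinkage_estimate(word, path_estimates, lambdas, vocab_size):
    """Interpolate Pr(word|class) estimates along the path leaf -> ... -> root,
    plus a final uniform component 1/|V|.

    path_estimates: list of {word: probability} dicts, ordered leaf first.
    lambdas: one weight per node plus one for the uniform distribution; they sum to 1."""
    assert len(lambdas) == len(path_estimates) + 1
    p = sum(lam * est.get(word, 0.0) for lam, est in zip(lambdas, path_estimates))
    return p + lambdas[-1] / vocab_size          # uniform term above the root
```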

6 Learning Mixture Weights
Learn the λ's via EM, performing the E-step with leave-one-out cross-validation.
E-step: use the current λ's to estimate the degree to which each node was likely to have generated the words in held-out documents.
M-step: use those estimates to recalculate new values for the λ's.
[Figure: the path Uniform → Science → Agriculture → Crops, with a held-out Crops document ("corn wheat silo farm grow...") contributing to each node's weight.]

7 Learning Mixture Weights: E-step and M-step
[Figure: the E-step and M-step update equations for the λ's; see the sketch below.]
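The update equations on this slide appear only as images. The sketch below is a standard deleted-interpolation EM loop consistent with the description on the previous slide. For brevity it fixes the node distributions in advance and runs EM on a set of held-out words, omitting the leave-one-out refinement, so it is an approximation rather than the authors' exact algorithm.

```python
def em_mixture_weights(heldout_words, path_estimates, vocab_size, n_iters=20):
    """Learn the mixture weights (lambdas) for one leaf class via EM.

    heldout_words: word tokens from held-out documents of this class.
    path_estimates: per-node {word: probability} dicts, ordered leaf -> root.
    A uniform distribution 1/|V| is appended as the last mixture component."""
    k = len(path_estimates) + 1                       # ancestors + uniform component
    lambdas = [1.0 / k] * k                           # start from equal weights
    for _ in range(n_iters):
        expected = [0.0] * k
        for w in heldout_words:
            # E-step: posterior probability that each component generated w
            probs = [est.get(w, 0.0) for est in path_estimates] + [1.0 / vocab_size]
            weighted = [lam * p for lam, p in zip(lambdas, probs)]
            total = sum(weighted)
            if total == 0.0:
                continue
            for a in range(k):
                expected[a] += weighted[a] / total
        # M-step: new lambdas are the normalized expected counts
        norm = sum(expected)
        lambdas = [e / norm for e in expected] if norm > 0 else lambdas
    return lambdas
```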

8 Newsgroups Data Set
15 classes, 15k documents, 1.7 million words, 52k-word vocabulary (subset of Ken Lang's 20 Newsgroups set).
[Figure: two-level hierarchy with parents computers, religion, sport, politics, and motor, and leaves mac, ibm, graphics, windows, X, guns, mideast, auto, motorcycle, atheism, christian, baseball, hockey, and two misc groups.]

9 Newsgroups Hierarchy Mixture Weights

10 Newsgroups Hierarchy Mixture Weights: 235 training documents (15/class) vs. 7497 training documents (~500/class)

11 Industry Sector Data Set (www.marketguide.com)
71 classes, 6.5k documents, 1.2 million words, 30k-word vocabulary.
[Figure: part of the 11-parent hierarchy, with parents such as transportation, utilities, consumer, energy, and services, and leaves such as water, air, railroad, trucking, coal, oil&gas, film, communication, electric, gas, appliance, furniture, and integrated.]

12 Industry Sector Classification Accuracy

13 Newsgroups Classification Accuracy

14 Yahoo Science Data Set (www.yahoo.com/Science)
264 classes, 14k documents, 3 million words, 76k-word vocabulary.
[Figure: part of the 30-parent hierarchy, with parents such as agriculture, biology, physics, CS, and space, and leaves such as dairy, crops, agronomy, forestry, AI, HCI, craft, missions, botany, evolution, cell, magnetism, relativity, and courses.]

15 Yahoo Science Classification Accuracy

16 Pruning the tree for computational efficiency
[Figure: the Industry Sector hierarchy (www.marketguide.com) again, used to illustrate where the tree can be pruned.]
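The slides do not show the pruning procedure itself. As one illustration of how a class hierarchy can reduce classification cost, the sketch below classifies by greedy top-down descent, scoring only the children of the current best node at each level instead of every leaf; this strategy and the node representation are assumptions, not necessarily the authors' pruning method.

```python
def greedy_descend(doc, root, score_fn):
    """Walk down the hierarchy greedily: at each level, score only the children of the
    current best node rather than all leaves in the tree.

    root: a dict {'name': str, 'children': list of nodes, ...}.
    score_fn(doc, node) -> log-probability of the document under that node's word model."""
    node = root
    while node['children']:
        node = max(node['children'], key=lambda child: score_fn(doc, child))
    return node['name']
```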

17 Related Work
Shrinkage in statistics: [Stein 1955], [James & Stein 1961]
Deleted interpolation in language modeling: [Jelinek & Mercer 1980], [Seymore & Rosenfeld 1997]
Bayesian hierarchical modeling for n-grams: [MacKay & Peto 1994]
Class hierarchies for text classification: [Koller & Sahami 1997]
Using EM to set mixture weights in a hierarchical clustering model for unsupervised learning: [Hofmann & Puzicha 1998]

18 Conclusions
Shrinkage in a hierarchy of classes can dramatically improve classification accuracy (29%).
Shrinkage helps especially when training data is sparse.
In models more complex than naïve Bayes, it should be even more helpful.
[The hierarchy can be pruned for an exponential reduction in the computation needed for classification, with only minimal loss of accuracy.]

19 Future Work
Learning hierarchies that aid classification.
Using more complex generative models:
– Capturing word dependencies
– Clustering words in each ancestor

