Large Scale Multi-Label Classification via MetaLabeler Lei Tang Arizona State University Suju Rajan and Vijay K. Narayanan Yahoo! Data Mining & Research.

1 Large Scale Multi-Label Classification via MetaLabeler Lei Tang Arizona State University Suju Rajan and Vijay K. Narayanan Yahoo! Data Mining & Research

2 Yahoo! Data Mining & Research
Large Scale Multi-Label Classification
Huge number of instances and categories
Common for online content
–Web Page Classification
–Query Categorization
–Video Annotation/Organization
–Social Bookmark/Tag Recommendation

3 Challenges
Multi-Class: thousands of categories
Multi-Label: each instance can have more than one label
Large Scale: huge number of instances and categories
–Our query categorization problem: 1.5M queries, 7K categories
–Yahoo! Directory: 792K docs, 246K categories in Liu et al. 05
Most existing multi-label methods do not scale
–structural SVM, mixture model, collective inference, maximum-entropy model, etc.
The simplest One-vs-Rest SVM is still widely used

4 One-vs-Rest SVM
Training instances and their labels:
x1: C1, C3
x2: C1, C2, C4
x3: C2
x4: C2, C4
One binary SVM per category, with each instance relabeled as positive (+) or negative (-):
SVM1 (C1): x1 +, x2 +, x3 -, x4 -
SVM2 (C2): x1 -, x2 +, x3 +, x4 +
SVM3 (C3): x1 +, x2 -, x3 -, x4 -
SVM4 (C4): x1 -, x2 +, x3 -, x4 +
Predict: SVM1 to SVM4 each score their own category C1 to C4
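
The relabeling on this slide can be sketched in a few lines of Python. This is only an illustration of how the per-category binary targets are derived from the toy data; the function name `one_vs_rest_targets` is a hypothetical helper, and no SVM is actually trained here.

```python
# Toy data from the slide: each instance maps to its set of true labels.
labels = {
    "x1": {"C1", "C3"},
    "x2": {"C1", "C2", "C4"},
    "x3": {"C2"},
    "x4": {"C2", "C4"},
}
categories = ["C1", "C2", "C3", "C4"]


def one_vs_rest_targets(labels, categories):
    """Return {category: {instance: +1 or -1}}, one binary problem per category.

    Hypothetical helper for illustration: an instance is positive for a
    category if that category is among its labels, negative otherwise.
    """
    return {
        c: {x: (1 if c in ys else -1) for x, ys in labels.items()}
        for c in categories
    }


targets = one_vs_rest_targets(labels, categories)
for c in categories:
    print(c, targets[c])
```

Each entry of `targets` would be the training signal for one binary classifier, which is why the per-label problems can be trained independently.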

5 One-vs-Rest SVM
Pros:
–Simple, fast, scalable
–Each label trained independently, easy to parallelize
Cons:
–Highly skewed class distribution (few +, many -)
–Biased prediction scores
Outputs a reasonably good ranking (Rifkin and Klautau 04)
–e.g. 4 categories C1, C2, C3, C4
–True labels for x1: C1, C3
–Prediction scores: {s1, s3} > {s2, s4}
Predict the number of labels?

6 MetaLabeler Algorithm
1. Obtain a ranking of class membership for each instance
–Any generic ranking algorithm can be applied
–Use One-vs-Rest SVM
2. Build a Meta Model to predict the number of top classes
–Construct Meta Label
–Construct Meta Feature
–Build Meta Model

7 Meta Model – Training
Q1 = affordable cocktail dress; Labels: Formal wear, Women Clothing
Q2 = cotton children jeans; Labels: Children clothing
Q3 = leather fashion in 1990s; Labels: Fashion, Women Clothing, Leather Clothing
Meta data (Query: #labels): Q1: 2, Q2: 1, Q3: 3
[Diagram: the meta data feeds the Meta-Model; the One-vs-Rest SVM covers the categories Clothing, Women Clothing, Formal wear, Fashion, Children Clothing, Leather clothing]
Regression: how to handle predictions like 2.5 labels?
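
As a rough sketch of the meta-label construction, the snippet below counts the labels of each training query and then recasts the count as a set of one-vs-rest targets, in line with the later scalability slides that describe the meta model as additional binary SVMs. Treating the count as a class sidesteps fractional outputs such as 2.5. The helper names are hypothetical, and the classifier itself is left out.

```python
# Training queries and labels as listed on the slide.
training_queries = {
    "affordable cocktail dress": ["Formal wear", "Women Clothing"],
    "cotton children jeans": ["Children clothing"],
    "leather fashion in 1990s": ["Fashion", "Women Clothing", "Leather Clothing"],
}


def build_meta_labels(training_queries):
    """Meta label of a query = number of labels it carries."""
    return {q: len(labels) for q, labels in training_queries.items()}


def meta_one_vs_rest_targets(meta_labels):
    """One binary problem per distinct label count (hypothetical helper).

    Returns {count: {query: +1 or -1}} so the count is predicted as a class
    rather than as a real-valued regression output.
    """
    counts = sorted(set(meta_labels.values()))
    return {
        k: {q: (1 if n == k else -1) for q, n in meta_labels.items()}
        for k in counts
    }


meta_labels = build_meta_labels(training_queries)
meta_targets = meta_one_vs_rest_targets(meta_labels)
print(meta_labels)
```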

8 Meta Feature Construction
Content-Based
–Use raw data
–Raw data contains all the info
Score-Based
–Use prediction scores
–Bias with scores might be learned
Rank-Based
–Use sorted prediction scores
[Figure: prediction scores for C1, C2, C3, C4 turned into a meta feature, once in category order and once sorted]
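
The difference between the score-based and rank-based meta features can be illustrated as follows. The score values are made up for the example, and both function names are hypothetical; the content-based variant simply reuses the raw input features and therefore needs no transformation.

```python
def score_based_feature(scores, categories):
    """Prediction scores in a fixed category order (position i = category i)."""
    return [scores[c] for c in categories]


def rank_based_feature(scores):
    """Prediction scores sorted from highest to lowest (position i = rank i)."""
    return sorted(scores.values(), reverse=True)


categories = ["C1", "C2", "C3", "C4"]
# Illustrative scores only; they do not come from the presentation.
scores = {"C1": 0.9, "C2": -0.3, "C3": 0.4, "C4": -1.2}

print(score_based_feature(scores, categories))
print(rank_based_feature(scores))
```

In the score-based vector each position is tied to a category, so a per-category bias could in principle be learned; in the rank-based vector each position is tied to a rank, so only the shape of the sorted scores is visible to the meta model.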

9 MetaLabeler Prediction
Given one instance:
–Obtain the rankings for all labels
–Use the meta model to predict the number of labels
–Pick the top-ranking labels
MetaLabeler
–Easy to implement
–Use existing SVM package/software directly
–Can be combined with a hierarchical structure easily
Simply build a Meta Model at each internal node
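
The three prediction steps can be sketched as below. The meta model is represented by a caller-supplied function `predict_num_labels`, which is an assumption of this sketch (here a stub that always returns 2), and the scores are illustrative.

```python
def metalabeler_predict(scores, predict_num_labels, meta_feature):
    """Pick the top-k labels, where k comes from the meta model.

    scores: {category: ranking score} from any ranking method.
    predict_num_labels: callable standing in for the trained meta model
        (hypothetical interface for this sketch).
    meta_feature: whatever input the meta model expects.
    """
    k = predict_num_labels(meta_feature)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


# Illustrative scores; the stub meta model always predicts 2 labels.
scores = {"C1": 0.8, "C2": -0.5, "C3": 0.1, "C4": -0.9}
predicted = metalabeler_predict(scores, lambda feature: 2, meta_feature=None)
print(predicted)
```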

10 Baseline Methods
Existing thresholding methods (Yang 2001)
–Rank-based Cut (RCut)
Output a fixed number of top-ranking labels for each instance
–Proportion-based Cut
For each label, choose a portion of test instances as positive
Not applicable for online prediction
–Score-based Cut (SCut, a.k.a. threshold tuning)
For each label, determine a threshold based on cross-validation
Tends to overfit and is not very stable
MetaLabeler: a local RCut method
–Customize the number of labels for each instance
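
For comparison, here is a minimal sketch of two of the baselines named above, RCut and SCut. The scores and thresholds are invented for illustration; in practice the SCut thresholds would be tuned per label by cross-validation, which this sketch does not do.

```python
def rcut(scores, k):
    """Rank-based cut: the same fixed number k of top labels for every instance."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def scut(scores, thresholds):
    """Score-based cut: a label is predicted if its score reaches its own threshold."""
    return [c for c in scores if scores[c] >= thresholds[c]]


# Illustrative values only.
scores = {"C1": 0.8, "C2": -0.5, "C3": 0.1, "C4": -0.9}
thresholds = {"C1": 0.0, "C2": -0.6, "C3": 0.5, "C4": 0.0}

print(rcut(scores, 1))
print(scut(scores, thresholds))
```

MetaLabeler differs from `rcut` only in that `k` is predicted per instance instead of being fixed globally.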

11 Publicly Available Benchmark Data
Yahoo! Web Page Classification
–11 data sets: each constructed from a top-level category
2nd-level topics are the categories
–16-32k instances, 6-15k features, categories – labels per instance, maximum 17 labels
–Each label has at least 100 instances
RCV1:
–A large scale text corpus
–101 categories, 3.2 labels per instance
–For evaluation purposes, use 3000 for training, 3000 for testing
–Highly skewed distribution (some labels have only 3-4 instances)

12 MetaLabeler of Different Meta Features
Which type of meta feature is more predictive?
Content-based MetaLabeler outperforms other meta features

13 Performance Comparison
MetaLabeler tends to outperform other methods

14 Bias with MetaLabeler
The distribution of the number of labels is imbalanced
–Most instances have a small number of labels
–A small portion of instances have many more labels
Imbalanced distribution leads to bias in MetaLabeler
–Prefers to predict fewer labels
–Only predicts many labels with strong confidence

15 Scalability Study
Threshold tuning requires cross-validation, otherwise it overfits
MetaLabeler simply adds some meta labels and learns One-vs-Rest SVMs

16 Scalability Study (cont.)
Threshold tuning: linearly increasing with the number of categories in the data
–E.g. 6000 categories -> 6000 thresholds to be tuned
MetaLabeler: upper bounded by the maximum number of labels of one instance
–E.g. 6000 categories
–but one instance has at most 15 labels
–Just need to learn 15 additional binary SVMs
Meta Model is “independent” of the number of categories
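
The counting argument on this slide can be made concrete with a small sketch. It assumes one tuned threshold per category for threshold tuning and one binary meta classifier per distinct label count seen in training; both function names and the sample label counts are hypothetical.

```python
def thresholds_to_tune(num_categories):
    """Threshold tuning: one threshold per category."""
    return num_categories


def meta_classifiers_needed(label_counts):
    """MetaLabeler: one binary classifier per distinct label count observed.

    With at least one label per instance, this is bounded above by the
    maximum number of labels on any single instance.
    """
    return len(set(label_counts))


# Made-up per-instance label counts with a maximum of 15.
label_counts = [1, 1, 2, 3, 1, 15, 2, 1]

print(thresholds_to_tune(6000))
print(meta_classifiers_needed(label_counts), "<=", max(label_counts))
```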

17 Application to Large Scale Query Categorization
Query categorization problem:
–1.5 million unique queries: 1M for training, 0.5M for testing
–120k features
–An 8-level taxonomy of 6433 categories
Multiple labels
–e.g. 0% interest credit card no transfer fee
Financial Services/Credit, Loans and Debt/Credit/Credit Card/Balance Transfer
Financial Services/Credit, Loans and Debt/Credit/Credit Card/Low Interest Card
Financial Services/Credit, Loans and Debt/Credit/Credit Card/Low-No-fee Card
–1.23 labels on average
–At most 26 labels

18 Flat Model
Flat Model: does not leverage the hierarchical structure
–Threshold tuning on training data alone takes 40 hours to finish, while MetaLabeler costs 2 hours.

19 Hierarchical Model - Training
[Diagram: Root, node N, training data, new training data, “Other”]
Step 1: Generate Training Data
Step 2: Roll up labels
Step 3: Create “Other” Category
Step 4: Train One vs. Rest SVM
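
One possible reading of steps 1 to 3 is sketched below. The slide does not spell out the details, so the following is an assumption: labels are taxonomy paths, the training data for a node consists of the instances labeled inside its subtree, each label is rolled up to the child of that node on its path, and an instance labeled exactly at the node goes to "Other". The function name and the toy paths are hypothetical.

```python
def build_node_training_data(instances, node):
    """Build (text, targets) pairs for the classifiers at one internal node.

    instances: list of (text, [label_path, ...]); a path is a tuple of
        category names starting below the root.
    node: tuple path of the internal node; () denotes the root.
    Assumed behavior (not confirmed by the slides): labels deeper than the
    node roll up to its child; labels equal to the node become "Other".
    """
    depth = len(node)
    data = []
    for text, paths in instances:
        targets = set()
        for path in paths:
            if path[:depth] != node:
                continue  # label lies outside this node's subtree
            if len(path) > depth:
                targets.add(path[depth])  # roll up to the child of this node
            else:
                targets.add("Other")  # labeled at the node itself
        if targets:
            data.append((text, sorted(targets)))
    return data


# Toy instances with shortened, illustrative category paths.
instances = [
    ("q1", [("Financial Services", "Credit", "Credit Card")]),
    ("q2", [("Financial Services",)]),
    ("q3", [("Apparel", "Women")]),
]

root_data = build_node_training_data(instances, ())
node_data = build_node_training_data(instances, ("Financial Services",))
print(root_data)
print(node_data)
```

Step 4 would then train a One-vs-Rest SVM on the output for each node.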

20 Hierarchical Model - Prediction
Predict query q using the SVMs trained at the root level
Stop if reaching a leaf node or the “Other” category
[Figure: taxonomy tree with Root, nodes m1 to m4, nodes c1 to c3, and “Other”, with the points where query q stops]
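
The top-down procedure can be sketched as a simple traversal. The per-node classifiers are replaced by a caller-supplied function `predict_children`, and the tree, the stub predictions, and the choice to report the current node when "Other" is predicted are all assumptions made for this illustration.

```python
def predict_top_down(query, tree, predict_children, root="Root"):
    """Descend from the root, stopping at leaves or when "Other" is predicted.

    tree: {node: [child, ...]}; nodes without children are leaves.
    predict_children: callable (node, query) -> list of predicted children,
        possibly including "Other" (hypothetical stand-in for the SVMs and
        meta model trained at that node).
    """
    results = []
    frontier = [root]
    while frontier:
        node = frontier.pop()
        for child in predict_children(node, query):
            if child == "Other":
                results.append(node)  # stop: keep the current node
            elif not tree.get(child):
                results.append(child)  # stop: reached a leaf
            else:
                frontier.append(child)  # keep descending
    return sorted(results)


# Toy taxonomy and fixed stub predictions, for illustration only.
tree = {"Root": ["m1", "m2"], "m1": ["c1", "c2"], "m2": []}
stub = {"Root": ["m1", "m2"], "m1": ["c1", "Other"]}
result = predict_top_down("some query", tree, lambda node, q: stub.get(node, []))
print(result)
```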

21 Hierarchical Model + MetaLabeler
Precision decreases by 1-2%, but recall improves by 10% at deeper levels.

22 Features in MetaLabeler
Feature: Related Categories
Overstock.com
–Mass Merchants/…/discount department stores
–Apparel & Jewelry
–Electronics & Appliances
–Home & Garden
–Books-Movies-Music-Tickets
Blizard
–Toys & Hobbies/…/Video Game
–Computing/…/Computer Game Software
–Entertainment & Social Event/…/Fast Food Restaurant
–Reference/News/Weather Information
Threading
–Books-Movies-Music-Tickets/…/Computing Books
–Computing/…/Programming
–Health and Beauty/…/Unwanted Hair
–Toys and Hobbies/…/Sewing

23 Conclusions & Future Work
MetaLabeler is promising for large-scale multi-label classification
–Core idea: learn a meta model to predict the number of labels
–Simple, efficient and scalable
–Use existing SVM software directly
–Easy for practical deployment
Future work
–How to optimize MetaLabeler for desired performance? E.g. > 95% precision
–Application to social networking related tasks

24 Questions?

25 References
Liu, T., Yang, Y., Wan, H., Zeng, H., Chen, Z., and Ma, W. 2005. Support vector machines classification with a very large-scale taxonomy. SIGKDD Explor. Newsl. 7, 1 (Jun. 2005).
Rifkin, R. and Klautau, A. 2004. In Defense of One-Vs-All Classification. J. Mach. Learn. Res. 5 (Dec. 2004).
Yang, Y. 2001. A study of thresholding strategies for text categorization. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (New Orleans, Louisiana, United States). SIGIR '01. ACM, New York, NY.

26 Hierarchical vs. Flat Model
Flat model
–Build a one-vs-rest SVM for all the labels
–No taxonomy information during training
Hierarchical model has about 5% higher recall at deeper levels.

