
1
Multi-label Classification without Multi-label Cost – Multi-label Random Decision Tree Classifier
Authors: Xiatian Zhang, Quan Yuan, Shiwan Zhao, Wei Fan, Wentao Zheng, Zhong Wang
Affiliations: 1. IBM Research – China; 2. IBM T.J. Watson Research Center
Presenter: Xiatian Zhang (xiatianz@cn.ibm.com)

2
Multi-label Classification
Classical Classification (Single-label Classification)
–The classes are exclusive: if an example belongs to one class, it cannot belong to any other
Multi-label Classification
–A picture, video, or article may belong to several compatible categories
–A gene segment can control several biological functions
(Example image tags: Tree, Lake, Ice, Winter, Park)

3
Existing Multi-label Classification Methods
Grigorios Tsoumakas et al. [2007] summarize the existing methods for multi-label classification
Two Strategies
–Problem Transformation: transform the multi-label classification problem into one or more single-label classification problems
–Algorithm Adaptation: adapt single-label classifiers to solve the multi-label classification problem directly, typically with high complexity

4
Problem Transformation Approaches
Label Powerset (LP)
–Label Powerset treats each unique subset of labels that occurs in the multi-label dataset as a single class (e.g., the subset {L1, L2, L3} becomes one class for a single classifier)
Binary Relevance (BR)
–Binary Relevance learns one binary classifier per label: Classifier1 for L1+/L1-, Classifier2 for L2+/L2-, Classifier3 for L3+/L3-
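The two transformation strategies above can be sketched in a few lines of Python. The tiny dataset, label names, and variable names below are illustrative, not from the paper.

```python
# Tiny illustrative multi-label dataset (values and names are made up).
X = [[0.1, 0.2], [0.5, 0.4], [0.9, 0.8]]   # feature vectors
Y = [{"L1", "L2"}, {"L2"}, {"L1", "L3"}]   # label set of each example
labels = ["L1", "L2", "L3"]

# Binary Relevance: one 0/1 target vector per label -> |L| binary problems.
br_targets = {l: [1 if l in y else 0 for y in Y] for l in labels}

# Label Powerset: each distinct label subset becomes one class of a
# single multi-class problem.
lp_classes = [frozenset(y) for y in Y]
```

Note how BR's cost grows with the number of labels (one classifier each), while LP fragments the training data across label subsets.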

5
Large Number of Labels Problem
Hundreds of labels or more
–Text categorization
–Protein function classification
–Semantic annotation of multimedia
The Impact on Multi-label Classification Methods
–Label Powerset: the number of training examples for each particular label subset becomes much smaller
–Binary Relevance: the computational complexity is linear in the number of labels
–Algorithm Adaptation: even worse than Binary Relevance

6
HOMER for the Large Number of Labels Problem
HOMER (Hierarchy Of Multilabel classifERs) was developed by Grigorios Tsoumakas et al., 2008. The HOMER algorithm constructs a hierarchy of multi-label classifiers, each one dealing with a much smaller set of labels.

7
Our Method – Without Label Cost
Without Label Cost
–Training time is almost independent of the number of labels |L|
But with Reliable Quality
–The classification quality is comparable to mainstream methods over different data sets
How is this achieved?

8
Our Method – Without Label Cost cont.
Binary Relevance based on Random Decision Trees
Random Decision Tree [Fan et al., 2003]
–The training process is independent of label information
–Random construction with very low cost
–Stable quality in many applications

9
Random Decision Tree – Tree Construction
At each node, an unused feature is chosen randomly
–A discrete feature is unused if it has never been chosen previously on the decision path from the root to the current node
–A continuous feature can be chosen multiple times on the same decision path, but each time a different threshold value is chosen
Construction stops when one of the following happens:
–A node becomes too small (<= 4 examples)
–The total height of the tree exceeds some limit, such as the total number of features
The construction process is independent of label information
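A minimal sketch of the construction rule above, assuming all features are continuous (the bookkeeping for discrete features being used at most once per path is omitted); the function and variable names are my own, not from the paper.

```python
import random

def build_rdt(X, depth=0, max_depth=None, min_size=4):
    """Grow one random decision tree; construction ignores labels entirely.
    X: list of feature vectors (all continuous, for simplicity)."""
    n_features = len(X[0])
    if max_depth is None:
        max_depth = n_features            # height limit, e.g. feature count
    if len(X) <= min_size or depth >= max_depth:
        return {"leaf": True, "rows": X}  # label statistics filled in later
    f = random.randrange(n_features)      # pick a feature at random
    thr = random.choice([row[f] for row in X])  # pick a random threshold
    left = [r for r in X if r[f] < thr]
    right = [r for r in X if r[f] >= thr]
    if not left or not right:             # degenerate split -> make a leaf
        return {"leaf": True, "rows": X}
    return {"leaf": False, "feature": f, "thr": thr,
            "left": build_rdt(left, depth + 1, max_depth),
            "right": build_rdt(right, depth + 1, max_depth)}
```

Because no label ever influences a split, the same trees can serve any number of labels, which is the basis of the "without label cost" claim.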

10
Random Decision Tree – Node Statistics
Classification and probability estimation:
–Each node of the tree keeps the number of examples belonging to each class
The node-statistics step costs little computational resource
(Example tree: internal tests F1<0.5, F2>0.7, F3>0.3; one leaf holds +:200 / -:10, another +:30 / -:70)

11
Random Decision Tree – Classification
During classification, each tree outputs a posterior probability from the leaf that the instance reaches:
P(+|x) = 30/100 = 0.3
(Example: the path through tests F1<0.5, F2>0.7, F3>0.3 ends at a leaf holding +:30 / -:70)

12
Random Decision Tree – Ensemble
For an instance x, average the estimated probabilities over all trees and take the average as the predicted probability for x:
P(+|x) = 30/100 = 0.3, P’(+|x) = 30/50 = 0.6
(P(+|x) + P’(+|x))/2 = 0.45
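The ensemble step can be sketched directly from the slide's numbers; the helper name is mine.

```python
def leaf_posterior(pos, neg):
    """Posterior estimate from the class counts stored at a reached leaf."""
    return pos / (pos + neg)

# Tree 1 routes x to a leaf holding +:30 / -:70 examples.
p1 = leaf_posterior(30, 70)      # 0.3
# Tree 2 routes x to a leaf holding +:30 / -:20 examples.
p2 = leaf_posterior(30, 20)      # 0.6
ensemble_p = (p1 + p2) / 2       # 0.45, as on the slide
```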

13
Multi-label Random Decision Tree
Each leaf stores counts for every label, so a single traversal yields posteriors for all labels
Example with two trees (tests F1<0.5, F2>0.7, F3>0.3 and F3>0.5, F2<0.7, F1>0.7):
–Tree 1 leaf: L1+:30 / L1-:70, L2+:50 / L2-:50
–Tree 2 leaf: L1+:30 / L1-:20, L2+:20 / L2-:80
P(L1+|x) = 30/100 = 0.3, P’(L1+|x) = 30/50 = 0.6
P(L2+|x) = 50/100 = 0.5, P’(L2+|x) = 20/100 = 0.2
(P(L1+|x) + P’(L1+|x))/2 = 0.45
(P(L2+|x) + P’(L2+|x))/2 = 0.35
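The multi-label extension only changes the leaf statistics: each leaf keeps counts for every label, so one traversal per tree produces posteriors for all labels at once. A sketch using the slide's counts (the helper and variable names are mine):

```python
def leaf_posteriors(counts):
    """counts: {label: (positives, negatives)} stored at the reached leaf."""
    return {l: p / (p + n) for l, (p, n) in counts.items()}

# Per-label counts at the leaves reached in each tree (from the slide).
leaf_t1 = {"L1": (30, 70), "L2": (50, 50)}
leaf_t2 = {"L1": (30, 20), "L2": (20, 80)}

p1 = leaf_posteriors(leaf_t1)            # {"L1": 0.3, "L2": 0.5}
p2 = leaf_posteriors(leaf_t2)            # {"L1": 0.6, "L2": 0.2}
avg = {l: (p1[l] + p2[l]) / 2 for l in p1}   # L1 -> 0.45, L2 -> 0.35
```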

14
Why Does RDT Work?
Ensemble Learning View
–Our Analysis
–Other Explanations
Non-Parametric Estimation

15
Complexity of Multi-label Random Decision Tree
Training Complexity:
–m is the number of trees, and n is the number of instances
–t is the average number of labels on the leaf nodes, with t << n and t << |L|
–The training cost is independent of the number of labels |L|
–Complexity of C4.5: Vi is the number of distinct values of the i-th attribute
–Complexity of HOMER:
Test Complexity:
–q is the average depth of the tree branches
–It is also independent of the number of labels |L|

16
Experiment – Metrics and Datasets Quality Metrics: Datasets:

17
Experiment - Quality

18
Experiment – Computational Cost

19
Experiment – Computational Cost cont.

20
Experiment – Computational Cost cont.

21
Future Work
–Leverage the relationships among labels
–Apply ML-RDT to recommendation
–Parallelization and streaming implementation
