

Multi-label Classification without Multi-label Cost – Multi-label Random Decision Tree Classifier (1. IBM Research – China; 2. IBM T.J. Watson Research Center)




1 Multi-label Classification without Multi-label Cost – Multi-label Random Decision Tree Classifier. Authors: Xiatian Zhang, Quan Yuan, Shiwan Zhao, Wei Fan, Wentao Zheng, Zhong Wang (1. IBM Research – China; 2. IBM T.J. Watson Research Center). Presenter: Xiatian Zhang (xiatianz@cn.ibm.com)

2 Multi-label Classification  Classical classification (single-label classification) –The classes are exclusive: if an example belongs to one class, it cannot belong to the others  Multi-label classification –A picture, video, or article may belong to several compatible categories (e.g. a photo tagged Tree, Lake, Ice, Winter, Park) –A gene segment can control several biological functions

3 Existing Multi-label Classification Methods  Grigorios Tsoumakas et al. [2007] summarize the existing methods for multi-label classification  Two strategies –Problem Transformation: transform the multi-label classification problem into single-label classification problems –Algorithm Adaptation: adapt single-label classifiers to solve the multi-label classification problem directly, usually with high complexity

4 Problem Transformation Approaches  Label Powerset (LP) –Label Powerset treats each unique subset of labels that occurs in the multi-label dataset as a single class of one multi-class problem  Binary Relevance (BR) –Binary Relevance learns one binary classifier per label (Classifier1 for L1+/L1-, Classifier2 for L2+/L2-, and so on); a sketch of both transformations follows below
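The following is a minimal sketch of the two transformation strategies on a toy dataset; the feature values and label assignments are invented for illustration. Binary Relevance produces one binary target vector per label, while Label Powerset maps each distinct label subset to one class of a single multi-class problem.

```python
# Toy multi-label dataset: Y[i][j] = 1 if example i carries label j (L1, L2, L3).
X = [[0.2, 1.5], [0.9, 0.3], [0.4, 0.8]]
Y = [[1, 0, 1], [0, 1, 1], [1, 1, 0]]

# Binary Relevance: one binary target vector per label -> |L| independent problems.
binary_relevance_targets = [[row[j] for row in Y] for j in range(len(Y[0]))]

# Label Powerset: each distinct label subset becomes one class of a single
# multi-class problem.
label_powerset_targets = [tuple(row) for row in Y]

print(binary_relevance_targets)     # [[1, 0, 1], [0, 1, 1], [1, 1, 0]]
print(set(label_powerset_targets))  # {(1, 0, 1), (0, 1, 1), (1, 1, 0)}
```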

5 Large Number of Labels Problem  Hundreds of labels or more –Text categorization –Protein function classification –Semantic annotation of multimedia  Impact on multi-label classification methods –Label Powerset: the number of training examples for each particular label subset becomes much smaller –Binary Relevance: the computational complexity is linear in the number of labels –Algorithm Adaptation: even worse than Binary Relevance

6 HOMER for the Large Number of Labels Problem  HOMER (Hierarchy Of Multilabel classifERs) was developed by Grigorios Tsoumakas et al., 2008.  The HOMER algorithm constructs a hierarchy of multi-label classifiers, each one dealing with a much smaller set of labels.

7 Our Method – Without Label Cost  Without label cost –Training time is almost independent of the number of labels |L|  But with reliable quality –The classification quality is comparable to mainstream methods over different datasets  How do we achieve this?

8 Our Method – Without Label Cost cont.  Binary Relevance method based on Random Decision Trees  Random Decision Tree [Fan et al., 2003] –The training process is independent of label information –Random construction with very low cost –Stable quality on many applications

9 Random Decision Tree – Tree Construction  At each node, an unused feature is chosen randomly –A discrete feature is unused if it has never been chosen previously on the decision path from the root to the current node –A continuous feature can be chosen multiple times on the same decision path, but each time a different threshold value is chosen  Construction stops when one of the following happens: –A node becomes too small (<= 4 examples) –The total height of the tree exceeds some limit, such as the total number of features  The construction process is independent of label information; a sketch follows below
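A minimal sketch of the construction procedure described above, assuming for simplicity that every feature is continuous (each split draws a random feature and a random threshold within its observed range); the class and function names are ours, not from the paper. Note that labels never appear in this step.

```python
import random

class Node:
    def __init__(self, feature=None, threshold=None):
        self.feature = feature      # index of the feature tested here (None for a leaf)
        self.threshold = threshold  # split threshold for that feature
        self.left = None            # subtree for x[feature] <= threshold
        self.right = None           # subtree for x[feature] >  threshold
        self.counts = {}            # class/label statistics, filled in a later pass

def build_random_tree(X, depth=0, min_size=4):
    """Grow one random tree; X is a list of feature vectors, labels are not used."""
    if len(X) <= min_size:          # node too small: make a leaf
        return Node()
    n_features = len(X[0])
    if depth >= n_features:         # height limit, e.g. the number of features
        return Node()
    f = random.randrange(n_features)
    lo, hi = min(x[f] for x in X), max(x[f] for x in X)
    if lo == hi:                    # cannot split on a constant column
        return Node()
    node = Node(feature=f, threshold=random.uniform(lo, hi))
    node.left = build_random_tree([x for x in X if x[f] <= node.threshold], depth + 1, min_size)
    node.right = build_random_tree([x for x in X if x[f] > node.threshold], depth + 1, min_size)
    return node
```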

10 Random Decision Tree – Node Statistics  Classification and probability estimation: –Each node of the tree keeps the number of examples belonging to each class  Collecting these node statistics costs little computation [Figure: a tree with tests F1<0.5, F2>0.7, F3>0.3, whose leaves store counts such as +:200/-:10 and +:30/-:70]

11 Random Decision Tree – Classification  During classification, each tree outputs a posterior probability, e.g. P(+|x) = 30/100 = 0.3 at a leaf with counts +:30/-:70 [Figure: the same tree as above, showing the decision path followed by x]

12 Random Decision Tree – Ensemble  For an instance x, average the estimated probabilities over all trees and take that average as the predicted probability for x: e.g. P(+|x) = 30/100 = 0.3 on one tree and P'(+|x) = 30/50 = 0.6 on another give (P(+|x) + P'(+|x))/2 = 0.45 [Figure: two random trees with different tests and leaf counts]
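Continuing the construction sketch above (and reusing its hypothetical Node class, whose leaf counts are assumed to have been filled from the training data), this is roughly how leaf statistics turn into an ensemble prediction: each tree reports the positive fraction at the leaf reached by x, and the final score is the plain average over trees.

```python
def leaf_for(node, x):
    # Follow the decision path until a leaf (a node with no test) is reached.
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node

def predict_proba(trees, x):
    # Average the per-tree posterior estimates P(+|x) read from leaf counts.
    probs = []
    for root in trees:
        counts = leaf_for(root, x).counts       # e.g. {'+': 30, '-': 70}
        total = sum(counts.values())
        probs.append(counts.get('+', 0) / total if total else 0.5)
    return sum(probs) / len(probs)

# With the slide's numbers: one tree gives 30/100 = 0.3, another 30/50 = 0.6,
# so the ensemble outputs (0.3 + 0.6) / 2 = 0.45.
```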

13 Multi-label Random Decision Tree [Figure: two random trees whose leaves keep per-label counts, e.g. L1+:30/L1-:70 and L2+:50/L2-:50] Each tree estimates a posterior per label: P(L1+|x) = 30/100 = 0.3 and P'(L1+|x) = 30/50 = 0.6 give (P(L1+|x) + P'(L1+|x))/2 = 0.45; P(L2+|x) = 50/100 = 0.5 and P'(L2+|x) = 20/100 = 0.2 give (P(L2+|x) + P'(L2+|x))/2 = 0.35
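Building on the previous sketches (leaf_for and the hypothetical Node class), the multi-label variant only changes what the leaves store: instead of a single positive/negative pair, each leaf keeps one [pos, neg] count per label, filled in a single pass over the training data, so the tree-growing cost does not depend on |L|. The helper names below are ours.

```python
def fill_label_counts(root, X, Y):
    # One pass over the training data: every leaf records, for each label j,
    # how many of its examples carry the label (pos) and how many do not (neg).
    n_labels = len(Y[0])
    for x, y in zip(X, Y):
        leaf = leaf_for(root, x)
        if not leaf.counts:
            leaf.counts = {j: [0, 0] for j in range(n_labels)}  # [pos, neg]
        for j in range(n_labels):
            leaf.counts[j][0 if y[j] else 1] += 1

def predict_label_proba(trees, x, label):
    # Same averaging as the single-label case, read from the chosen label's counts.
    probs = []
    for root in trees:
        pos, neg = leaf_for(root, x).counts.get(label, [0, 0])
        total = pos + neg
        probs.append(pos / total if total else 0.5)
    return sum(probs) / len(probs)
```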

14 Why Does RDT Work?  Ensemble Learning View –Our Analysis –Other Explanations  Non-Parametric Estimation

15 Complexity of Multi-label Random Decision Tree  Training complexity: –m is the number of trees, and n is the number of instances –t is the average number of labels per leaf node, with t << n and t << |L| –It is independent of the number of labels |L| –Complexity of C4.5: V_i is the number of values of the i-th attribute –Complexity of HOMER:  Test complexity: –q is the average depth of the tree branches –It is also independent of the number of labels |L|

16 Experiment – Metrics and Datasets  Quality Metrics:  Datasets:

17 Experiment – Quality

18 Experiment – Computational Cost

19 Experiment – Computational Cost cont.

20 Experiment – Computational Cost cont.

21 Future Work  Leverage the relationships between labels  Apply ML-RDT to recommendation  Parallelization and streaming implementation




