
1 Xiangnan Kong, Philip S. Yu: Multi-Label Feature Selection for Graph Classification. Department of Computer Science, University of Illinois at Chicago

2 Outline
- Introduction
- Multi-Label Feature Selection for Graph Classification
- Experiments
- Conclusion

3 Introduction: Graph Data
Examples: program flows, XML documents, chemical compounds.
- Conventional data mining and machine learning approaches assume data are represented as feature vectors, e.g. (x_1, x_2, ..., x_d) -> y.
- In many real applications, data are not directly represented as feature vectors, but as graphs with complex structures, e.g. G = (V, E, l) -> y.

4 Introduction: Graph Classification
[Figure: training graphs labeled + and -, and an unlabeled testing graph]
- Graph classification: construct a classification model for graph data.
- Example: drug activity prediction
  - Given a set of chemical compounds labeled with activity against one type of disease or virus
  - Predict active / inactive for a testing compound

5 Graph Classification Using Subgraph Features
[Figure: graph objects G_1, G_2 are encoded as binary feature vectors x_1, x_2 over subgraph patterns g_1, g_2, g_3, which are then fed to a classifier]
Each graph object is mapped to a binary feature vector indicating which subgraph patterns it contains; a conventional classifier is then trained on these vectors.
Key question: how to find a set of subgraph features in order to effectively perform graph classification?
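A minimal sketch of this encoding step (not the authors' code; `contains` is a hypothetical subgraph-isomorphism test standing in for whatever graph-mining toolkit is in use):

```python
import numpy as np

def encode(graphs, patterns, contains):
    """Binary feature matrix: X[i, k] = 1 iff graphs[i] contains patterns[k].

    `contains(graph, pattern)` is a placeholder for a subgraph-isomorphism
    test supplied by the graph library.
    """
    X = np.zeros((len(graphs), len(patterns)), dtype=int)
    for i, G in enumerate(graphs):
        for k, g in enumerate(patterns):
            X[i, k] = int(contains(G, g))
    return X
```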

6 Existing Methods for Subgraph Feature Selection
[Figure: labeled graphs (+/-) mapped to a set of useful subgraphs; each graph carries a single label, e.g. "+ Lung Cancer"]
- Feature selection for graph classification: find a set of useful subgraph features for classification.
- Existing methods:
  - Select discriminative subgraph features
  - Focus on single-label settings
  - Assume one graph can only have one label

7 Multi-Label Graphs
In many real applications, one graph can have multiple labels.
[Figure: anti-cancer drug prediction; one compound graph with the labels "- Lung Cancer", "+ Melanoma", "+ Breast Cancer"]

8 Multi-Label Graphs
Other applications:
- XML document classification (one document -> multiple tags)
- Program flow error detection (one program -> multiple types of errors)
- Kinase inhibitor discovery (one chemical -> multiple types of kinases)
- ...

9 Multi-Label Feature Selection for Graph Classification
Goal: find useful subgraph features for graphs with multiple labels.
[Figure: multi-label graphs -> subgraph features (scored by an evaluation criterion) -> multi-label classification]

10 Two Key Questions to Address
- Evaluation: how to evaluate a set of subgraph features using the multiple labels of the graphs? (effectiveness)
- Search space pruning: how to prune the subgraph search space using the multiple labels of the graphs? (efficiency)

11 What Is a Good Feature?
- Dependence maximization: maximize the dependence between the features and the multiple labels of the graphs.
- Assumption: graphs with similar label sets should have similar features.
[Figure: two example graphs whose similar label sets correspond to similar subgraph features]

12 Dependence Measure
Hilbert-Schmidt Independence Criterion (HSIC) [Gretton et al. 05]: evaluates the dependence between input features and label vectors in kernel space. The empirical estimate is easy to calculate:

    HSIC(S) = (1 / (n - 1)^2) tr(K_S H L H)

- K_S: kernel matrix for graphs; K_S[i, j] measures the similarity between graph i and graph j on the common subgraph features (in S) that they contain.
- L: kernel matrix for label vectors in {0, 1}^Q; L[i, j] measures the similarity between the label sets of graph i and graph j.
- H = I - 11^T / n: centering matrix.
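A minimal numpy sketch of the empirical estimate above, assuming linear kernels on the binary subgraph-indicator vectors and on the {0, 1}^Q label vectors (the kernel choices are illustrative, not fixed by the slides):

```python
import numpy as np

def hsic(X_S, Y):
    """Empirical HSIC between subgraph features and label sets.

    X_S : (n, m) binary matrix; X_S[i, k] = 1 if graph i contains the
          k-th subgraph in S (so K_S = X_S X_S^T counts shared features).
    Y   : (n, Q) binary label matrix; Y[i, q] = 1 if graph i has label q.
    """
    n = X_S.shape[0]
    K_S = X_S @ X_S.T                     # similarity on common subgraph features
    L = Y @ Y.T                           # similarity between label sets
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix H = I - 11^T / n
    return np.trace(K_S @ H @ L @ H) / (n - 1) ** 2
```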

13 gHSIC Score
- Objective: maximize the dependence (HSIC), i.e. the sum of per-feature scores over all selected features.
- Optimization -> the gHSIC criterion: with M = H L H, the i-th subgraph feature gets the score

    q(g_i) = f_{g_i}^T M f_{g_i}

  where f_{g_i} in {0, 1}^n represents the i-th subgraph feature: its indicator vector over the n graphs.
[Figure: a subgraph with a high gHSIC score ("good") vs. one with a low score ("bad")]
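Continuing the sketch above: with the linear kernel, K_S = Σ_g f_g f_g^T, so the trace decomposes into one score per feature, which can be computed for all candidate subgraphs at once:

```python
import numpy as np

def ghsic_scores(X_S, Y):
    """gHSIC score q(g) = f_g^T M f_g for every column f_g of X_S."""
    n = X_S.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    M = H @ (Y @ Y.T) @ H                          # M = H L H, shared by all features
    return np.einsum('ik,ij,jk->k', X_S, M, X_S)   # one score per subgraph
```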

14 Two Key Questions to Address
- How to evaluate a set of subgraph features using the multiple labels of the graphs? (effectiveness)
- How to prune the subgraph search space using the multiple labels of the graphs? (efficiency)

15 Finding a Needle in a Haystack
[Figure: pattern search tree growing from the empty pattern through 0-edge, 1-edge, 2-edge, ... patterns; infrequent branches are cut off]
- gSpan [Yan & Han, ICDM'02]: an efficient algorithm to enumerate all frequent subgraph patterns (frequency >= min_support).
- There are still too many frequent subgraph patterns; we want to find the most useful one(s) using the multiple labels, i.e. the best node(s) in this tree.
- How can we find the best node(s) in this tree without searching all the nodes? (Branch and bound to prune the search space.)

16 gHSIC Upper Bound
With q(g_i) = f_{g_i}^T M f_{g_i} as above, an upper bound of gHSIC:

    gHSIC-UB(g) = Σ_{i,j : G_i and G_j contain g} max(0, M[i, j])

- gHSIC-UB(g) upper-bounds the gHSIC scores of all supergraphs of g: any supergraph of g appears in a subset of the graphs containing g, and keeping only the non-negative entries of M means the sum can only shrink as the support shrinks.
- The bound is anti-monotonic with subgraph frequency ----> pruning.
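A sketch of the bound (same M = H L H as in the earlier sketches); the max(0, ·) truncation is what guarantees the sum can only decrease as the support shrinks:

```python
import numpy as np

def ghsic_upper_bound(f_g, M):
    """Upper bound on the gHSIC score of every supergraph of g.

    A supergraph of g can only appear in graphs that already contain g,
    so summing the non-negative entries of M over g's support bounds
    q(g') = f_{g'}^T M f_{g'} for every descendant g' in the tree.
    """
    support = np.flatnonzero(f_g)            # indices of graphs containing g
    M_sub = M[np.ix_(support, support)]
    return float(np.maximum(M_sub, 0).sum())
```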

17 Pruning Principle
[Figure: pattern search tree; the best subgraph so far has the best gHSIC score, while the current node has a current score and an upper bound covering its entire sub-tree]
If the best score so far >= the upper bound at the current node, we can prune the entire sub-tree.
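A minimal branch-and-bound sketch over the pattern search tree; `children` (gSpan-style pattern extensions) and `indicator` (the binary vector f_g over the graphs) are hypothetical hooks into the miner, and ghsic_upper_bound is the function from the previous sketch:

```python
import numpy as np

def branch_and_bound(root, M, children, indicator):
    """Best-scoring subgraph pattern via depth-first search with pruning."""
    best_node, best_score = None, float('-inf')
    stack = [root]
    while stack:
        node = stack.pop()
        f_g = indicator(node)
        score = float(f_g @ M @ f_g)        # current gHSIC score q(g)
        if score > best_score:
            best_node, best_score = node, score
        # prune: if best score so far >= upper bound, skip the whole sub-tree
        if ghsic_upper_bound(f_g, M) > best_score:
            stack.extend(children(node))
    return best_node, best_score
```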

18 Experiment Setup
Four methods are compared:
- Multi-label feature selection + multi-label classification: gMLC [this paper] + BoosTexter [Schapire & Singer 00]
- Multi-label feature selection + binary classification: gMLC [this paper] + BR-SVM [Boutell et al. 04] (binary relevance)
- Single-label feature selection + binary classification: BR (binary relevance) + information gain + SVM
- Top-k frequent subgraphs + multi-label classification: gSpan [Yan & Han 02] + BoosTexter [Schapire & Singer 00]

19 Data Sets
Three multi-label graph classification tasks:
- Anti-cancer activity prediction
- Toxicology prediction of chemical compounds
- Kinase inhibitor prediction

20 Evaluation
Multi-label metrics [Elisseeff & Weston NIPS'02]:
- Ranking loss (↓): average fraction of label pairs that are ranked incorrectly; the smaller the better.
- Average precision (↑): average fraction of correct labels among the top-ranked labels; the larger the better.
Protocol: 10 times 10-fold cross-validation.
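A sketch of the ranking-loss metric as described above, counting a (relevant, irrelevant) label pair as incorrectly ranked when the irrelevant label scores at least as high:

```python
import numpy as np

def ranking_loss(Y, scores):
    """Average fraction of (relevant, irrelevant) label pairs ranked incorrectly.

    Y      : (n, Q) binary ground-truth label matrix.
    scores : (n, Q) real-valued label scores from the classifier.
    """
    losses = []
    for y, s in zip(Y, scores):
        rel = np.flatnonzero(y == 1)
        irr = np.flatnonzero(y == 0)
        if len(rel) == 0 or len(irr) == 0:
            continue                      # undefined for this instance
        bad = sum(s[i] <= s[j] for i in rel for j in irr)
        losses.append(bad / (len(rel) * len(irr)))
    return float(np.mean(losses))
```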

21 Experiment Results
[Figure: ranking loss and 1 - AvePrec on the Anti-Cancer, Kinase Inhibition, and PTC datasets]

22 Experiment Results
[Figure: ranking loss (lower is better) vs. # selected features on the Anti-Cancer dataset, comparing multi-label FS + multi-label classifier, multi-label FS + single-label classifiers, single-label FS + single-label classifiers, and unsupervised FS + multi-label classifier]
Our approach with a multi-label classifier performed best on the NCI and PTC datasets.

23 Pruning Results
[Figure: running time and # subgraphs explored]

24 Pruning Results
[Figure: running time in seconds (lower is better) on the anti-cancer dataset, with vs. without gHSIC pruning]

25 Pruning Results
[Figure: # subgraphs explored (lower is better) on the anti-cancer dataset, with vs. without gHSIC pruning]

26 Conclusions
Multi-label feature selection for graph classification:
- Evaluating subgraph features using the multiple labels of the graphs (effective)
- Branch-and-bound pruning of the search space using the multiple labels of the graphs (efficient)
Thank you!

