Multi-Label Feature Selection for Graph Classification
Xiangnan Kong, Philip S. Yu
Department of Computer Science, University of Illinois at Chicago

2 Outline
• Introduction
• Multi-Label Feature Selection for Graph Classification
• Experiments
• Conclusion

3 Introduction: Graph Data
(Figure: examples of graph data: program flows, XML documents, chemical compounds)
• Conventional data mining and machine learning approaches assume data are represented as feature vectors, e.g. (x_1, x_2, …, x_d) → y.
• In many real applications, data are not directly represented as feature vectors, but as graphs with complex structures, e.g. G(V, E, l) → y.

4 Introduction: Graph Classification
(Figure: training graphs labeled +/− and an unlabeled testing graph)
• Graph classification: construct a classification model for graph data.
• Example: drug activity prediction
  • Given a set of chemical compounds labeled with their activity against one type of disease or virus
  • Predict active / inactive for a testing compound

5 Graph Classification using Subgraph Features
(Figure: graph objects G_1, G_2 are mapped, via subgraph patterns g_1, g_2, g_3, to binary feature vectors x_1, x_2, which are fed to a classifier)
• Pipeline: graph objects → subgraph-pattern feature vectors → classifier.
• Key question: how to find a set of subgraph features in order to effectively perform graph classification? (A toy sketch of the feature mapping follows below.)
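For illustration only, here is a minimal sketch of the feature mapping from slide 5 in Python, using networkx for the subgraph-containment test. The graphs, patterns, and the `contains` / `to_feature_vector` helpers are hypothetical stand-ins, not the paper's implementation.

```python
import networkx as nx
from networkx.algorithms import isomorphism

def contains(graph, pattern):
    """True if `pattern` occurs in `graph` as a label-preserving subgraph."""
    matcher = isomorphism.GraphMatcher(
        graph, pattern,
        node_match=isomorphism.categorical_node_match("label", None))
    return matcher.subgraph_is_isomorphic()

def to_feature_vector(graph, patterns):
    """Binary vector: entry k is 1 iff pattern g_k is contained in the graph."""
    return [int(contains(graph, p)) for p in patterns]
```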

6 Existing Methods for Subgraph Feature Selection
(Figure: graphs, each with a single label such as "+ Lung Cancer", mapped to useful subgraphs)
• Feature selection for graph classification: find a set of useful subgraph features for classification.
• Existing methods:
  • select discriminative subgraph features;
  • are focused on single-label settings;
  • assume one graph can only have one label.

7 Multi-Label Graphs
• In many real applications, one graph can have multiple labels.
• Example: anti-cancer drug prediction (Figure: one compound graph with the labels "− Lung Cancer, + Melanoma, + Breast Cancer")

8 Multi-Label Graphs
Other applications:
• XML document classification (one document → multiple tags)
• Program flow error detection (one program → multiple types of errors)
• Kinase inhibitor discovery (one chemical → multiple types of kinases)
• …

9 Multi-Label Feature Selection for Graph Classification
(Figure: multi-label graphs → subgraph features, scored by an evaluation criterion F(p) → multi-label classification)
• Goal: find useful subgraph features for graphs with multiple labels.

10 Two Key Questions to Address
• Evaluation: how to evaluate a set of subgraph features using the multiple labels of the graphs? (effectiveness)
• Search-space pruning: how to prune the subgraph search space using the multiple labels of the graphs? (efficiency)

11 What Is a Good Feature?
• Dependence maximization: maximize the dependence between the features and the multiple labels of the graphs.
• Assumption: graphs with similar label sets should have similar features.
(Figure: two graphs with similar label sets sharing common subgraph features)

12 Dependence Measure
• Hilbert-Schmidt Independence Criterion (HSIC) [Gretton et al. 05]: evaluates the dependence between input features and label vectors in kernel space.
• The empirical estimate is easy to calculate:
  HSIC(S) = tr(K_S H L H) / (n − 1)²
  • K_S: kernel matrix for the graphs; K_S[i, j] measures the similarity between graphs i and j on the common subgraph features (in S) that they contain.
  • L: kernel matrix for the label vectors in {0,1}^Q; L[i, j] measures the similarity between the label sets of graphs i and j.
  • H = I − 11ᵀ/n: the centering matrix.
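For concreteness, a minimal sketch of this empirical estimate, assuming a binary subgraph-feature matrix X (n graphs × |S| features), a binary label matrix Y (n graphs × Q labels), and linear kernels for both; these choices are illustrative assumptions, not the paper's fixed setup.

```python
import numpy as np

def hsic(X, Y):
    """Empirical HSIC estimate tr(K_S H L H) / (n - 1)^2 with linear kernels."""
    n = X.shape[0]
    K = X @ X.T                          # K_S: graph similarity on features in S
    L = Y @ Y.T                          # L: similarity between label sets
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix H = I - 11^T / n
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2
```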

13 gHSIC Score
• Objective: maximize dependence (HSIC) over the set of selected features; with a linear kernel, the trace decomposes into a sum over the selected features.
• Optimization → gHSIC criterion. The gHSIC score of the i-th subgraph feature is
  h(g_i) = f_iᵀ H L H f_i
  where f_i ∈ {0,1}ⁿ indicates which graphs contain g_i.
(Figure: a subgraph pattern with a high gHSIC score, "good", vs. one with a low score, "bad")
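A sketch of this per-feature score under the same toy setup as above, where column i of X is the indicator vector f_i of subgraph g_i; the factorization M = H L H follows directly from the formula, but the function name is mine.

```python
import numpy as np

def ghsic_scores(X, Y):
    """gHSIC score h(g_i) = f_i^T M f_i for each feature column f_i of X."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    M = H @ (Y @ Y.T) @ H                # M = H L H
    return np.array([f @ M @ f for f in X.T])
```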

14 Two Key Questions to Address
• How to evaluate a set of subgraph features using the multiple labels of the graphs? (effectiveness)
• How to prune the subgraph search space using the multiple labels of the graphs? (efficiency)

15 Finding a Needle in a Haystack
(Figure: pattern search tree growing from the empty pattern ⊥ through 0-edge, 1-edge, 2-edge, … patterns; infrequent branches are cut off)
• gSpan [Yan et al., ICDM'02]: an efficient algorithm to enumerate all frequent subgraph patterns (frequency ≥ min_support).
• There are too many frequent subgraph patterns → find the most useful one(s) using the multiple labels, i.e. the best node(s) in this tree.
• How can we find the best node(s) in this tree without searching all the nodes? (Branch and bound to prune the search space.)

16 gHSIC Upper Bound
• gHSIC score of the i-th subgraph feature: h(g_i) = f_iᵀ M f_i, i.e. the sum of the entries of M = H L H over the graphs containing g_i.
• An upper bound of gHSIC: gHSIC-UB(g_i) sums only the non-negative entries of M over g_i's support, and upper-bounds the gHSIC scores of all supergraphs of g_i.
• The bound is anti-monotonic with subgraph frequency → pruning.
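A sketch of that bound, assuming the M = H L H and indicator-vector notation from above: any supergraph of g occurs only in a subset of the graphs containing g, so keeping just the non-negative entries of M on g's support bounds every supergraph's score.

```python
import numpy as np

def ghsic_upper_bound(f, M):
    """Upper bound on h(g') = f'^T M f' for every supergraph g' of g."""
    idx = np.flatnonzero(f)           # graphs that contain subgraph g
    Msub = M[np.ix_(idx, idx)]        # restrict M to g's support set
    return np.maximum(Msub, 0).sum()  # drop negative entries -> an upper bound
```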

17 Pruning Principle
(Figure: pattern search tree; the best subgraph found so far, with the best gHSIC score so far, vs. the current node, with the upper bound over its sub-tree)
• If (best score so far) ≥ (upper bound of the current node), we can prune the entire sub-tree; a schematic sketch follows below.
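A schematic branch-and-bound over the pattern search tree, reusing ghsic_upper_bound from above. Here children(node) (gSpan-style pattern extensions) and indicator(node) (the node's 0/1 support vector as a numpy array) are hypothetical hooks; only the pruning test reflects the slide.

```python
def branch_and_bound(root, M, children, indicator):
    """Find the best subgraph node by gHSIC score, pruning hopeless sub-trees."""
    best_node, best_score = None, float("-inf")
    stack = [root]
    while stack:
        node = stack.pop()
        f = indicator(node)
        h = f @ M @ f                    # gHSIC score of the current node
        if h > best_score:
            best_node, best_score = node, h
        # Pruning principle: if the best score so far already reaches this
        # node's upper bound, no supergraph in the sub-tree can beat it.
        if best_score >= ghsic_upper_bound(f, M):
            continue                     # prune the entire sub-tree
        stack.extend(children(node))
    return best_node, best_score
```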

18 Experiment Setup
Four methods are compared:
• Multi-label feature selection + multi-label classification: gMLC [this paper] + BoosTexter [Schapire & Singer 00]
• Multi-label feature selection + binary classification: gMLC [this paper] + BR-SVM [Boutell et al. 04] (binary relevance)
• Single-label feature selection + binary classification: BR (binary relevance) + information gain + SVM
• Top-k frequent subgraphs + multi-label classification: gSpan [Yan & Han 02] + BoosTexter [Schapire & Singer 00]

19 Data Sets
Three multi-label graph classification tasks:
• Anti-cancer activity prediction
• Toxicology prediction of chemical compounds
• Kinase inhibitor prediction

20 Evaluation
Multi-label metrics [Elisseeff & Weston, NIPS'02]:
• Ranking Loss ↓: average fraction of label pairs that are ranked incorrectly; the smaller the better.
• Average Precision ↑: average fraction of correct labels among the top-ranked labels; the larger the better.
Protocol: 10 times 10-fold cross-validation. (A sketch of the two metrics follows below.)
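A minimal sketch of the two metrics for a single example, assuming binary ground-truth labels y and real-valued label scores s as numpy arrays; tie-breaking and the averaging over examples are simplified.

```python
import numpy as np

def ranking_loss(y, s):
    """Fraction of (relevant, irrelevant) label pairs ranked incorrectly."""
    pos, neg = s[y == 1], s[y == 0]
    bad = sum(p <= q for p in pos for q in neg)
    return bad / (len(pos) * len(neg))

def average_precision(y, s):
    """Mean, over relevant labels, of the fraction of relevant labels
    ranked at or above each one."""
    ranked = y[np.argsort(-s)]               # labels from best to worst score
    hits = np.cumsum(ranked)
    ranks = np.flatnonzero(ranked == 1) + 1  # 1-based ranks of relevant labels
    return (hits[ranked == 1] / ranks).mean()
```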

21 Experiment Results
(Figures: Ranking Loss and 1 − Average Precision on the Anti-Cancer, Kinase Inhibition, and PTC datasets)

22 Experiment Results
(Figure: Ranking Loss (lower is better) vs. # selected features on the Anti-Cancer dataset, comparing multi-label FS + multi-label classifier, multi-label FS + single-label classifiers, single-label FS + single-label classifiers, and unsupervised FS + multi-label classifier)
• Our approach with a multi-label classifier performed best on the NCI and PTC datasets.

23 Pruning Results
(Figures: running time and # subgraphs explored)

24 Pruning Results
(Figure: running time in seconds (lower is better) on the anti-cancer dataset, with vs. without gHSIC pruning)

25 Pruning Results
(Figure: # subgraphs explored (lower is better) on the anti-cancer dataset, with vs. without gHSIC pruning)

26 Conclusions
Multi-label feature selection for graph classification:
• evaluating subgraph features using the multiple labels of the graphs (effective);
• branch-and-bound pruning of the search space using the multiple labels of the graphs (efficient).
Thank you!