Graph Classification

Classification Outline: Introduction, Overview; Classification using Graphs; Graph classification – Direct Product Kernel; Predictive Toxicology example dataset; Vertex classification – Laplacian Kernel; WEBKB example dataset; Related Works

Example: Molecular Structures
(Figure: example molecular graphs, some with known labels, toxic or non-toxic, and some unknown.)
Task: predict whether molecules are toxic, given a set of known examples.

Solution: Machine Learning. Computationally discover and/or predict properties of interest of a set of data. Two flavors:
- Unsupervised: discover discriminating properties among groups of data (example: clustering)
- Supervised: given data with known properties, categorize data with unknown properties (example: classification)
(Figure: unsupervised flow: data, property discovery / partitioning, clusters; supervised flow: training data, model building, then prediction on test data.)

Classification: the task of assigning class labels from a discrete class label set Y to input instances in an input space X. Example: Y = {toxic, non-toxic}, X = {valid molecular structures}.
(Figure: the classification model is trained on the training data, then the unknown (test) data instances are assigned to class labels using the model; misclassified test instances illustrate test error.)

Classification Outline: Introduction, Overview; Classification using Graphs; Graph classification – Direct Product Kernel; Predictive Toxicology example dataset; Vertex classification – Laplacian Kernel; WEBKB example dataset; Related Works

Classification with Graph Structures
- Graph classification (between-graph): each full graph is assigned a class label. Example: molecular graphs labeled toxic or non-toxic.
- Vertex classification (within-graph): within a single graph, each vertex is assigned a class label. Example: webpage (vertex) / hyperlink (edge) graphs, e.g. labeling pages in the NCSU domain as faculty, course, or student.

Relating Graph Structures to Classes?
- Frequent Subgraph Mining (Chapter 7): associate frequently occurring subgraphs with classes
- Anomaly Detection (Chapter 11): associate anomalous graph features with classes
- Kernel-based methods (Chapter 4): devise a kernel function capturing graph similarity, then use vector-based classification via the kernel trick

Relating Graph Structures to Classes? This chapter focuses on kernel-based classification, a two-step process:
1. Devise a kernel that captures the property of interest.
2. Apply a kernelized classification algorithm using that kernel function.
Two types of graph classification are covered: classification of graphs (Direct Product Kernel) and classification of vertices (Laplacian Kernel). See the supplemental slides for support vector machines (SVM), one of the better-known kernelized classification techniques.

Walk-based Similarity (Kernels Chapter). Intuition: two graphs are similar if they exhibit similar patterns when performing random walks.
(Figure: three example graphs. In the first, random-walk vertices are heavily distributed towards A, B, D, E; in the second, heavily distributed towards H, I, K with a slight bias towards L; these two graphs are marked Similar. In the third, random-walk vertices are evenly distributed, and it is marked Not Similar.)

Classification Outline: Introduction, Overview; Classification using Graphs; Graph classification – Direct Product Kernel; Predictive Toxicology example dataset; Vertex classification – Laplacian Kernel; WEBKB example dataset; Related Works

Direct Product Graph – Formal Definition
Input graphs: $G_1 = (V_1, E_1)$ and $G_2 = (V_2, E_2)$; notation: $G_X = G_1 \times G_2$.
Direct product vertex set: $V(G_X) = \{(a, b) \in V_1 \times V_2\}$
Direct product edge set: $E(G_X) = \{((a,b),(c,d)) \mid (a,c) \in E_1 \text{ and } (b,d) \in E_2\}$
Intuition: the vertex set pairs each vertex of $V_1$ with every vertex of $V_2$; an edge exists between two such pairs only if both corresponding pairs of vertices have an edge in their respective input graphs.
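To make the definition concrete, here is a minimal R sketch that builds the adjacency matrix of the direct product graph from two input adjacency matrices (directProductAdjacency is a hypothetical helper name, not part of the chapter's package); it relies on the fact that the direct product's adjacency matrix is the Kronecker product of the two inputs.

# Sketch: adjacency matrix of the direct product graph G_X = G_1 x G_2.
# A1, A2 are 0/1 adjacency matrices of G_1 and G_2.
# Vertex (a,b) connects to (c,d) iff (a,c) is an edge of G_1 and (b,d) is an edge of G_2,
# which is exactly the Kronecker product of the two adjacency matrices.
directProductAdjacency <- function(A1, A2) {
  kronecker(A1, A2)
}

# Example: a 2-vertex path and a 3-vertex path
A1 <- matrix(c(0, 1,
               1, 0), nrow = 2, byrow = TRUE)
A2 <- matrix(c(0, 1, 0,
               1, 0, 1,
               0, 1, 0), nrow = 3, byrow = TRUE)
Ax <- directProductAdjacency(A1, A2)   # 6 x 6 matrix indexed by vertex pairs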

Direct Product Graph – Example
(Figure: two input graphs, Type-A with vertices A-D and Type-B with vertices A-E.)

Direct Product Graph Example
(Figure: the adjacency matrix of the direct product of Type-A and Type-B; rows and columns are indexed by vertex pairs, with one copy of Type-B's adjacency block wherever Type-A has an edge.)
Intuition: multiply each entry of Type-A's adjacency matrix by the entire adjacency matrix of Type-B.

Direct Product Kernel (see Kernels Chapter)
1. Compute the direct product graph $G_X$.
2. Compute the maximum in- and out-degrees of $G_X$, $d_i$ and $d_o$.
3. Compute the decay constant $\gamma < 1 / \min(d_i, d_o)$.
4. Compute the infinite weighted geometric series of walks (array $A$).
5. Sum over all vertex pairs.
(Figure: the direct product graph of Type-A and Type-B.)
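A minimal R sketch of these steps, assuming the directProductAdjacency() helper from the earlier sketch (directProductKernel is likewise a hypothetical name, not the chapter's API); the closed form (I - gamma * Ax)^(-1) stands in for the infinite geometric series.

# Sketch of the direct product kernel for two graphs, following the steps above.
directProductKernel <- function(A1, A2) {
  Ax <- directProductAdjacency(A1, A2)     # step 1: direct product graph
  d_out <- max(rowSums(Ax))                # step 2: maximum out-degree
  d_in  <- max(colSums(Ax))                #         maximum in-degree
  gamma <- 1 / (min(d_in, d_out) + 1)      # step 3: decay constant gamma < 1/min(d_i, d_o)
  n <- nrow(Ax)
  # step 4: infinite weighted geometric series of walk counts,
  #         sum_k gamma^k Ax^k = (I - gamma * Ax)^(-1) in closed form
  W <- solve(diag(n) - gamma * Ax)
  sum(W)                                   # step 5: sum over all vertex pairs
}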

Kernel Matrix
$$K = \begin{bmatrix} K(G_1, G_1) & K(G_1, G_2) & \cdots & K(G_1, G_n) \\ K(G_2, G_1) & K(G_2, G_2) & \cdots & K(G_2, G_n) \\ \vdots & \vdots & \ddots & \vdots \\ K(G_n, G_1) & K(G_n, G_2) & \cdots & K(G_n, G_n) \end{bmatrix}$$
Compute the direct product kernel for all pairs of graphs in the set of known examples. This matrix is used as input to the SVM function to create the classification model (or to any other kernelized data mining method).
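A hedged sketch of how such a matrix could be assembled in R, assuming a list of adjacency matrices and the directProductKernel() sketch above (the chapter's generateKernelMatrix function presumably plays this role):

# Sketch: n x n kernel matrix over a list of graphs given as adjacency matrices.
buildKernelMatrix <- function(graphs) {
  n <- length(graphs)
  K <- matrix(0, n, n)
  for (i in 1:n) {
    for (j in i:n) {
      K[i, j] <- directProductKernel(graphs[[i]], graphs[[j]])
      K[j, i] <- K[i, j]   # kernel matrices are symmetric
    }
  }
  K
}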

Classification Outline: Introduction, Overview; Classification using Graphs; Graph classification – Direct Product Kernel; Predictive Toxicology example dataset; Vertex classification – Laplacian Kernel; WEBKB example dataset; Related Works

Predictive Toxicology (PTC) dataset
The PTC dataset is a collection of molecules that have been tested positive or negative for toxicity.

# R code to create the SVM model
data("PTCData")    # graph data
data("PTCLabels")  # toxicity information
# select 5 molecules to build the model on
sTrain = sample(1:length(PTCData), 5)
PTCDataSmall <- PTCData[sTrain]
PTCLabelsSmall <- PTCLabels[sTrain]
# generate kernel matrix
K = generateKernelMatrix(PTCDataSmall, PTCDataSmall)
# create SVM model
model = ksvm(K, PTCLabelsSmall, kernel="matrix")

Classification Outline: Introduction, Overview; Classification using Graphs; Graph classification – Direct Product Kernel; Predictive Toxicology example dataset; Vertex classification – Laplacian Kernel; WEBKB example dataset; Related Works

Kernels for Vertex Classification
von Neumann kernel (Chapter 6): $K = \sum_{i=1}^{\infty} \gamma^{i-1} (B^T B)^i$
Regularized Laplacian (this chapter): $K = \sum_{i=1}^{\infty} \gamma^i (-L)^i$

Example: Hypergraphs
A hypergraph is a generalization of a graph in which an edge can connect any number of vertices, i.e., each edge is a subset of the vertex set. Example: a word-webpage graph, where each vertex is a webpage and each edge is the set of pages containing the same word.
(Figure: a hypergraph with vertices v1-v8 and hyperedges e1-e4.)

"Flattening" a Hypergraph
Given the hypergraph incidence matrix $A$, $A A^T$ represents a "similarity matrix": rows and columns represent vertices, and the $(i, j)$ entry is the number of hyperedges incident on both vertex $i$ and vertex $j$.
Problem: some neighborhood information is lost (e.g., vertices 1 and 3 appear just as "similar" as vertices 1 and 2).
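As a small illustration in R (a made-up three-vertex, three-hyperedge incidence matrix, not the chapter's data):

# Sketch: "flatten" a hypergraph given its vertex-by-hyperedge incidence matrix A.
# Entry (i, j) of S counts the hyperedges incident on both vertex i and vertex j.
A <- matrix(c(1, 1, 0,    # vertex 1 lies in hyperedges e1, e2
              1, 0, 1,    # vertex 2 lies in hyperedges e1, e3
              0, 1, 1),   # vertex 3 lies in hyperedges e2, e3
            nrow = 3, byrow = TRUE)
S <- A %*% t(A)           # similarity matrix A A^T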

Laplacian Matrix
In the mathematical field of graph theory, the Laplacian matrix L is a matrix representation of a graph: L = D - M, where M is the adjacency matrix of the graph (e.g., A A^T from hypergraph flattening) and D is the degree matrix (a diagonal matrix whose (i, i) entry is vertex i's [weighted] degree). The Laplacian is used in many contexts (e.g., spectral graph theory).

Normalized Laplacian Matrix
Normalizing the matrix helps eliminate the bias toward high-degree vertices:
$$L_{i,j} := \begin{cases} 1 & \text{if } i = j \text{ and } \deg(v_i) \neq 0 \\ \dfrac{-1}{\sqrt{\deg(v_i)\,\deg(v_j)}} & \text{if } i \neq j \text{ and } v_i \text{ is adjacent to } v_j \\ 0 & \text{otherwise} \end{cases}$$
(Figure: the original L and the regularized L for an example graph.)

Laplacian Kernel
$K = \sum_{i=1}^{\infty} \gamma^i (-L)^i$: the same walk-based geometric series, but applied to the regularized Laplacian matrix. The decay constant is NOT degree-based; instead it is a tunable parameter $< 1$. In closed form, $K = (I + \gamma L)^{-1}$ with the regularized $L$.
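Combining the last few slides, here is a minimal R sketch (laplacianKernel is a hypothetical helper, not the chapter's API) that computes the normalized Laplacian from a symmetric similarity matrix and then the closed-form kernel:

# Sketch: regularized Laplacian kernel for vertex classification.
# M: symmetric adjacency/similarity matrix (e.g., A %*% t(A) from hypergraph flattening);
# gamma: tunable decay parameter < 1.
laplacianKernel <- function(M, gamma = 0.5) {
  diag(M) <- 0                             # ignore self-similarity
  deg <- rowSums(M)                        # (weighted) vertex degrees
  Dinv <- diag(ifelse(deg > 0, 1 / sqrt(deg), 0))
  # normalized Laplacian: 1 on the diagonal (for non-isolated vertices),
  # -M_ij / sqrt(deg_i * deg_j) off the diagonal
  L <- diag(as.numeric(deg > 0)) - Dinv %*% M %*% Dinv
  solve(diag(nrow(M)) + gamma * L)         # closed form K = (I + gamma * L)^(-1)
}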

Classification Outline: Introduction, Overview; Classification using Graphs; Graph classification – Direct Product Kernel; Predictive Toxicology example dataset; Vertex classification – Laplacian Kernel; WEBKB example dataset; Related Works

WEBKB dataset
The WEBKB dataset is a collection of web pages sampled from the websites of four universities. The web pages are assigned to five distinct classes according to their contents: course, faculty, student, project, and staff. The pages are searched for the most commonly used words; there are 1073 words that occur with a frequency of at least 10.
(Figure: a word-webpage hypergraph with vertices v1-v8 and hyperedges word 1 through word 4.)

# R code to create the SVM model
data(WEBKB)
# generate kernel matrix
K = generateKernelMatrixWithinGraph(WEBKB)
# create sample set for testing
holdout <- sample(1:ncol(K), 20)
# create SVM model
model = ksvm(K[-holdout, -holdout], y, kernel="matrix")

Classification Outline: Introduction, Overview; Classification using Graphs; Graph classification – Direct Product Kernel; Predictive Toxicology example dataset; Vertex classification – Laplacian Kernel; WEBKB example dataset; Kernel-based vector classification – Support Vector Machines; Related Works

Related Work – Classification on Graphs
- Graph mining chapters: Frequent Subgraph Mining (Ch. 7) and Anomaly Detection (Ch. 11)
- Kernels chapter (Ch. 4): discusses in detail alternatives to the direct product kernel and other "walk-based" kernels
- gBoost: an extension of "boosting" for graphs; progressively collects "informative" frequent patterns to use as features for classification / regression. Also considered a frequent subgraph mining technique (similar to gSpan in the Frequent Subgraph Mining chapter).
- Tree kernels: similarity of graphs that are trees

Related Work – Traditional Classification
- Decision trees: the classification model is a tree of conditionals on variables, where leaves represent class labels; the input space is typically a set of discrete variables
- Bayesian belief networks: produce a directed acyclic graph structure, using Bayesian inference to generate the edges; each vertex (a variable/class) is associated with a probability table indicating the likelihood of an event or value occurring, given the values of the variables it depends on
- Support vector machines: traditionally used for classification of real-valued vector data; see the Kernels chapter for kernel functions that work on vectors

Related Work – Ensemble Classification
Ensemble learning: algorithms that build multiple models to enhance stability and reduce selection bias. Some examples (a small bagging sketch in R follows this list):
- Bagging: generate multiple models from samples of the input set (drawn with replacement), then evaluate by averaging / voting across the models
- Boosting: generate multiple weak models, weighting each model's vote by some measure of its accuracy
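For example, a minimal bagging sketch in R using rpart decision trees as the base learner (illustrative only, not the chapter's code; the data frames in the usage comment are hypothetical):

# Sketch: bagging with decision trees and majority voting.
library(rpart)

bagging_predict <- function(formula, train, test, n_models = 25) {
  votes <- sapply(1:n_models, function(b) {
    boot <- train[sample(nrow(train), replace = TRUE), ]   # bootstrap sample (with replacement)
    fit <- rpart(formula, data = boot, method = "class")
    as.character(predict(fit, test, type = "class"))
  })
  # majority vote across the n_models predictions for each test row
  apply(votes, 1, function(v) names(which.max(table(v))))
}

# Example use (illustrative): preds <- bagging_predict(label ~ ., train_df, test_df)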

Related Work – Evaluating, Comparing Classifiers
This is the subject of Chapter 12, Performance Metrics. A very brief, "typical" classification workflow:
1. Partition the data into training and test sets.
2. Build the classification model using only the training set.
3. Evaluate the accuracy of the model using only the test set.
Modifications to the basic workflow: multiple rounds of training and testing (cross-validation); multiple classification models built (bagging, boosting); more sophisticated sampling (all).
