Ning Jin, Wei Wang ICDE 2011 LTS: Discriminative Subgraph Mining by Learning from Search History.

Presentation transcript:


Outline
Motivation
Objectives
Methodology
Experiments
Conclusions

Motivation
Complex structures in many scientific applications can be represented as graphs. Many data mining and database problems on graph databases, such as graph indexing and graph classification, need discriminative subgraph patterns. Discriminative subgraphs can be used to characterize complex graphs, construct graph classifiers, and generate graph indices.

Objectives
Discriminative subgraph pattern mining has been solved in one of two ways: a greedy approach or a branch-and-bound approach.
The greedy approach attempts to reach a locally optimal subgraph as fast as possible.
The branch-and-bound approach prunes the search space using an estimated upper bound of the scores.
The LTS (Learn To Search) algorithm integrates both approaches with novel probing and pruning techniques.

Methodology
Fast-probe algorithm
Upper-bound estimation algorithm
LTS algorithm

Definitions
Graph: a graph is denoted as g = (V, E), where V is a set of nodes and E is a set of edges.
Positive and negative graph sets: the input consists of a positive graph set and a negative graph set. We assume that the positive set is the interesting set and the negative set is the decoy.

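Definitions (cont.)
Subgraph isomorphism: every node and every edge carries a label. For two graphs g and g', if there is an injection from the nodes of g to the nodes of g' that preserves node labels and maps every edge of g to an edge of g' with the same label, then g is a subgraph of g', and g' is a supergraph of g, or g' supports g.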
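The support test defined above can be illustrated with a brute-force check. This is a minimal sketch for very small graphs, not the matching procedure used in the paper. The graph representation (a dict of node labels plus a dict of undirected edge labels) and the names `is_subgraph`, `molecule`, `c_o_single` and `c_o_double` are assumptions made for the example.

```python
from itertools import permutations

def is_subgraph(pattern, graph):
    """Return True if `graph` supports `pattern` (brute force, tiny graphs only).

    ASSUMED representation: a graph is a pair (node_labels, edge_labels) where
    node_labels maps node -> label and edge_labels maps frozenset({u, v}) -> label.
    """
    p_nodes, p_edges = pattern
    g_nodes, g_edges = graph
    p_list = list(p_nodes)
    # Try every injective mapping of pattern nodes onto graph nodes.
    for image in permutations(list(g_nodes), len(p_list)):
        f = dict(zip(p_list, image))
        # Node labels must be preserved by the mapping.
        if any(p_nodes[u] != g_nodes[f[u]] for u in p_list):
            continue
        # Every pattern edge must map to a graph edge with the same label.
        if all(g_edges.get(frozenset(f[n] for n in edge)) == label
               for edge, label in p_edges.items()):
            return True
    return False

# Toy data: a C-C-O chain with single bonds.
molecule = ({1: "C", 2: "C", 3: "O"},
            {frozenset({1, 2}): "single", frozenset({2, 3}): "single"})
c_o_single = ({"x": "C", "y": "O"}, {frozenset({"x", "y"}): "single"})
c_o_double = ({"x": "C", "y": "O"}, {frozenset({"x", "y"}): "double"})

print(is_subgraph(c_o_single, molecule))
print(is_subgraph(c_o_double, molecule))
```

The enumeration is exponential in the pattern size, so it only serves to make the definition concrete.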

Definitions (cont.)
Frequency: given a graph set, the frequency of a subgraph pattern is the fraction of graphs in the set that support the pattern.

Definitions (cont.)
Discrimination score: the more discriminative the pattern, the larger its discrimination score. The score is computed from the pattern's frequencies in the positive set and the negative set.
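A small numeric sketch may help fix the two quantities. It rests on two loud assumptions: graphs are simplified to sets of edge labels, so that "supports" becomes a subset test, and the score is taken to be the difference between positive and negative frequency. The score formula is not preserved in this transcript, so that choice is only a stand-in, and the names `frequency`, `score`, `positives` and `negatives` are hypothetical.

```python
def frequency(pattern, graphs):
    """Fraction of graphs that support the pattern.

    ASSUMPTION: graphs and patterns are frozensets of edge labels, so support
    is a subset test (a toy replacement for subgraph isomorphism).
    """
    if not graphs:
        return 0.0
    return sum(1 for g in graphs if pattern <= g) / len(graphs)

def score(pattern, positives, negatives):
    # ASSUMPTION: frequency difference is used as a stand-in score;
    # the slide's actual formula is not available here.
    return frequency(pattern, positives) - frequency(pattern, negatives)

positives = [frozenset("ab"), frozenset("abc"), frozenset("bc"), frozenset("ad")]
negatives = [frozenset("ac"), frozenset("bd"), frozenset("ad"), frozenset("b")]

for name in ("a", "ab", "d"):
    p = frozenset(name)
    print(name, frequency(p, positives), frequency(p, negatives),
          score(p, positives, negatives))
```

Under this stand-in, the two-edge pattern `ab` scores higher than either of its single edges because no negative graph contains both.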

Definitions (cont.)
Lineage: the lineage of a pattern is a sequence of patterns in which each pattern can be directly extended from the one before it.
Score record: the score record of a pattern is the sequence of scores of the patterns in its lineage.
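As an illustration of these two definitions, the sketch below turns a lineage into its score record. Patterns are again simplified to sets of edge labels, "directly extended" is read as "one more edge", and the scores come from a made-up lookup table. The names `score_record`, `toy_scores` and `lineage` are hypothetical.

```python
def score_record(lineage, score_of):
    """Return the tuple of scores along a lineage.

    ASSUMPTION: patterns are frozensets of edge labels, and a direct extension
    adds exactly one edge to the previous pattern.
    """
    for parent, child in zip(lineage, lineage[1:]):
        if not (parent < child and len(child) == len(parent) + 1):
            raise ValueError("each pattern must extend its predecessor by one edge")
    return tuple(score_of(p) for p in lineage)

# Made-up scores, for illustration only.
toy_scores = {frozenset("a"): 0.25, frozenset("ab"): 0.5, frozenset("abc"): 0.25}
lineage = [frozenset("a"), frozenset("ab"), frozenset("abc")]
record = score_record(lineage, toy_scores.__getitem__)
print(record)
```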

Fast-probe algorithm
Fast-probe maintains a list of candidate subgraph patterns. Its goal is to generate a good sample of discriminative subgraphs to facilitate the subsequent branch-and-bound search.
The candidate list is initialized with all single-edge subgraph patterns in the positive set.
For each candidate and each positive graph that supports it, fast-probe updates the optimal pattern and optimal score for that graph.
Each candidate is then extended by one more edge and the process repeats; it terminates when the candidate list becomes empty.
The output is the optimal pattern for each positive graph.
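The loop described above can be sketched as follows. This is a toy under stated assumptions, not the paper's implementation: graphs are sets of edge labels, the score is a frequency difference used as a stand-in, and a candidate is extended only if it improved the optimum of at least one positive graph. That last rule is an assumed greedy criterion that the slide does not spell out. The function also returns the score records it collects, since later steps learn from them.

```python
def frequency(pattern, graphs):
    # ASSUMPTION: support is a subset test on sets of edge labels.
    return sum(1 for g in graphs if pattern <= g) / len(graphs)

def fast_probe(positives, negatives):
    def score(p):
        # ASSUMPTION: frequency difference stands in for the discrimination score.
        return frequency(p, positives) - frequency(p, negatives)

    best = [(None, float("-inf"))] * len(positives)  # optimum per positive graph
    records = {}                                     # pattern -> score record
    # Start from every single-edge pattern that occurs in the positive set.
    candidates = {frozenset([e]): () for g in positives for e in sorted(g)}
    while candidates:
        extended = {}
        for p, parent_record in candidates.items():
            s = score(p)
            records[p] = parent_record + (s,)
            improved = False
            for i, g in enumerate(positives):
                if p <= g and s > best[i][1]:
                    best[i] = (p, s)
                    improved = True
            # ASSUMED greedy rule: only improving candidates are grown further.
            if improved:
                for g in positives:
                    if p <= g:
                        for e in sorted(g - p):
                            extended.setdefault(p | {e}, records[p])
        candidates = extended
    return best, records

positives = [frozenset("ab"), frozenset("abc"), frozenset("bc"), frozenset("ad")]
negatives = [frozenset("ac"), frozenset("bd"), frozenset("ad"), frozenset("b")]
best, records = fast_probe(positives, negatives)
for graph, (pattern, s) in zip(positives, best):
    print(sorted(graph), sorted(pattern), s)
```

Because patterns grow by one edge per round and cannot exceed the largest positive graph, the candidate list is guaranteed to empty out.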


Upper-bound estimation algorithm
The discriminative subgraph mining process generates many score records, which can be organized into a prefix tree called the prediction tree.
The root node is labelled 0.0. Each tree node is also associated with the maximum score in the subtree rooted at that node.
The maximum score at each tree node serves as an estimated upper bound for the corresponding part of the search space.
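A prefix tree with subtree maxima, as described above, might look like the sketch below. Several details are assumptions made for the example: scores are rounded to two decimals so that records share prefixes, and a prefix never seen in the history yields an infinite bound, meaning no pruning. The class name `PredictionTree` and its methods are hypothetical.

```python
class PredictionTree:
    """Prefix tree over score records; each node keeps the max score below it."""

    def __init__(self):
        # The root is labelled 0.0, as on the slide.
        self.root = {"score": 0.0, "max": 0.0, "children": {}}

    def insert(self, record):
        node = self.root
        path = [node]
        for s in record:
            key = round(s, 2)  # ASSUMPTION: discretize scores to two decimals
            node = node["children"].setdefault(
                key, {"score": key, "max": key, "children": {}})
            path.append(node)
        # Propagate subtree maxima from the deepest node back to the root.
        running = float("-inf")
        for n in reversed(path):
            running = max(running, n["max"])
            n["max"] = running

    def estimate(self, prefix):
        """Estimated upper bound for any record that starts with `prefix`."""
        node = self.root
        for s in prefix:
            node = node["children"].get(round(s, 2))
            if node is None:
                return float("inf")  # ASSUMPTION: unseen prefix, do not prune
        return node["max"]

tree = PredictionTree()
for rec in [(0.25, 0.5), (0.25, 0.0), (-0.25,)]:
    tree.insert(rec)
print(tree.estimate((0.25,)))
print(tree.estimate((-0.25,)))
```

The bound is an estimate learned from past searches, not a guarantee, which is why the fallback for unseen prefixes is deliberately conservative.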


LTS algorithm
LTS first uses fast-probe to collect score records and build the search history, which includes a prediction tree of score records.
LTS uses a vector F to keep track of the optimal pattern for each positive graph: F[i] stores the optimal pattern for the i-th positive graph.
The candidate list is initialized with the optimal pattern found by fast-probe for each positive graph.
For a candidate pattern p, LTS updates F[i] if the i-th positive graph supports p and the score of p is greater than the score of F[i].
The search terminates when the candidate list becomes empty, and LTS returns the optimal patterns in F.
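The branch-and-bound phase can be sketched in the same toy setting. Everything here is an assumption layered on the slide text: graphs are sets of edge labels, the score is a frequency difference, the bound comes from a caller-supplied `estimate` function, and a candidate is pruned when its estimated bound cannot beat the current optimum of any positive graph that supports it. For simplicity the example seeds the search with single-edge patterns. The names `lts_search`, `seeds` and `empty_f` are hypothetical.

```python
def frequency(pattern, graphs):
    # ASSUMPTION: support is a subset test on sets of edge labels.
    return sum(1 for g in graphs if pattern <= g) / len(graphs)

def score(pattern, positives, negatives):
    # ASSUMPTION: frequency difference stands in for the discrimination score.
    return frequency(pattern, positives) - frequency(pattern, negatives)

def lts_search(positives, negatives, candidates, best, estimate):
    """candidates: pattern -> score record; best: the vector F as (pattern, score)."""
    best = list(best)
    frontier = dict(candidates)
    seen = set(frontier)
    while frontier:
        p, record = frontier.popitem()
        s = record[-1]
        supporters = [i for i, g in enumerate(positives) if p <= g]
        for i in supporters:
            if s > best[i][1]:
                best[i] = (p, s)
        # ASSUMED pruning rule: skip extensions that cannot help any supporter.
        bound = estimate(record)
        if all(bound <= best[i][1] for i in supporters):
            continue
        for i in supporters:
            for e in positives[i] - p:
                child = p | {e}
                if child not in seen:
                    seen.add(child)
                    frontier[child] = record + (score(child, positives, negatives),)
    return best

positives = [frozenset("ab"), frozenset("abc"), frozenset("bc"), frozenset("ad")]
negatives = [frozenset("ac"), frozenset("bd"), frozenset("ad"), frozenset("b")]
seeds = {frozenset(e): (score(frozenset(e), positives, negatives),) for e in "abcd"}
empty_f = [(None, float("-inf"))] * len(positives)

# A trivial bound of 1.0 never prunes, so the search is exhaustive.
exhaustive = lts_search(positives, negatives, seeds, empty_f, lambda record: 1.0)
# An overly tight bound prunes every extension and misses better patterns.
too_tight = lts_search(positives, negatives, seeds, empty_f, lambda record: record[-1])
print([s for _, s in exhaustive])
print([s for _, s in too_tight])
```

Comparing the two runs shows the trade-off: the quality of the learned bound decides how much of the search space is skipped and whether better patterns are lost.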


Experiments

Conclusions
This paper investigates the feasibility of estimating upper bounds for the discrimination scores of subgraph patterns in discriminative subgraph mining by learning from search history. On the more complex protein datasets, the branch-and-bound search that follows fast-probe lets LTS significantly improve classification accuracy.