Graph Indexing: A Frequent Structure-based Approach. Alicia Cosenza, November 26th, 2007.



Presentation Outline
– Introduction
– Frequent Fragment
– Discriminative Fragment
– gIndex
– Experimental Result

Introduction
Graphs are used to model complicated structures such as proteins, circuits, images, and XML documents.
The current indexing approach is path-based.
– Example: GraphGrep
– Advantages
  Paths are easier to handle.
  The index space is predefined: all paths up to length maxL are selected.
– Disadvantages
  A path is too simple a feature to capture graph structure.
  There are too many paths, and they produce too many false positives.
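To make the "too many paths" point concrete, here is a minimal sketch of path-based feature extraction in the spirit of GraphGrep. It is not GraphGrep's actual implementation: the function name label_paths, the adjacency-list representation, and the restriction to simple paths are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def label_paths(labels: Dict[int, str], adj: Dict[int, List[int]], max_len: int) -> Counter:
    """Count label sequences of all simple paths with at most max_len edges.

    labels maps a vertex id to its label; adj maps a vertex id to its neighbours.
    Each undirected path is seen once per direction, so "A-B" and "B-A" are
    counted as two separate label sequences.
    """
    counts: Counter = Counter()

    def extend(path: List[int]) -> None:
        # Record the label sequence of the current path.
        counts[tuple(labels[v] for v in path)] += 1
        if len(path) - 1 == max_len:
            return  # the path already has max_len edges
        for nxt in adj[path[-1]]:
            if nxt not in path:  # keep the path simple (no repeated vertex)
                extend(path + [nxt])

    for start in labels:
        extend([start])
    return counts


# Hypothetical three-vertex chain A - B - C.
chain_labels = {0: "A", 1: "B", 2: "C"}
chain_adj = {0: [1], 1: [0, 2], 2: [1]}
paths = label_paths(chain_labels, chain_adj, max_len=2)
```

Even this three-vertex chain yields nine distinct label paths, which hints at how quickly the feature set grows on larger graphs.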

Introduction
“Can we use a graph structure instead of a path as the basic index feature?”
– gIndex
  Indexes only “frequent subgraphs”
  Creates a smaller index
  Improves query times


Frequent Fragment
Key concepts
– Fragment: a small subgraph
– minsup: the minimum support threshold
– A fragment is frequent if its support, the number of graphs in the database that contain it, is at least minsup.
Only frequent fragments are indexed.
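The support test can be sketched as follows. This is an illustration only: graphs are modelled as sets of label pairs and toy_contains is a subset check standing in for a real subgraph isomorphism test, both of which are assumptions. The sketch also treats a fragment as frequent when its support is at least minsup, which is an assumption about the boundary case.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Edge = Tuple[str, str]        # hypothetical edge: a pair of vertex labels
ToyGraph = FrozenSet[Edge]    # hypothetical graph: a set of such edges


def toy_contains(graph: ToyGraph, fragment: ToyGraph) -> bool:
    """Stand-in for subgraph isomorphism: every fragment edge appears in the graph."""
    return fragment <= graph


def support(fragment: ToyGraph, database: Iterable[ToyGraph],
            contains: Callable[[ToyGraph, ToyGraph], bool] = toy_contains) -> int:
    """Number of database graphs that contain the fragment."""
    return sum(1 for graph in database if contains(graph, fragment))


def is_frequent(fragment: ToyGraph, database: Iterable[ToyGraph], minsup: int,
                contains: Callable[[ToyGraph, ToyGraph], bool] = toy_contains) -> bool:
    """A fragment is treated as frequent when its support reaches minsup."""
    return support(fragment, database, contains) >= minsup


toy_database = [
    frozenset({("C", "O"), ("C", "N")}),
    frozenset({("C", "O")}),
    frozenset({("N", "O")}),
]
co_fragment = frozenset({("C", "O")})
```

Note that support counts each database graph at most once, regardless of how many times the fragment occurs inside it.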

Frequent Fragment
Low minimum support on small fragments (for effectiveness)
– We want to index many of the small subgraphs.
High minimum support on large fragments (for compactness)
– We only want to index a large fragment if it appears often.
  Otherwise it is covered by its smaller indexed subgraphs.
Problem: there could still be a lot of frequent fragments!
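One way to realize "low support for small fragments, high support for large ones" is a threshold that grows with fragment size. The sketch below is a hypothetical example of such a function, not the one used in the talk: the linear ramp, the name size_increasing_minsup, and the parameters small_size, max_size, and max_support are all assumptions.

```python
import math


def size_increasing_minsup(size: int, max_size: int, max_support: int,
                           small_size: int = 3) -> int:
    """Minimum support required of a fragment with `size` edges.

    Fragments with at most small_size edges need support 1, so all of them
    are indexed. Larger fragments need support that grows linearly with
    size, reaching max_support at max_size edges and staying there.
    """
    if size < 1:
        raise ValueError("fragment size must be at least 1")
    if size <= small_size:
        return 1
    capped = min(size, max_size)
    return max(1, math.ceil(max_support * capped / max_size))
```

With max_size=10 and max_support=100, a 2-edge fragment needs support 1 while a 5-edge fragment needs support 50. The only property the index relies on is that the threshold never decreases as the fragment grows.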


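Discriminative Fragment
Notation: D_x is the set of database graphs that contain fragment x, and F is the set of features already selected.
Definition (Redundant Fragment)
– Fragment x is redundant with respect to feature set F if D_x ≈ ⋂ D_f, taken over all f ∈ F with f ⊆ x.
Definition (Discriminative Fragment)
– Fragment x is discriminative with respect to F if D_x ≪ ⋂ D_f, taken over all f ∈ F with f ⊆ x.
Fragments that are not redundant are called discriminative.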
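The comparison in these definitions can be turned into a number: how much smaller the fragment's own id list is than the intersection of the id lists of its already indexed subgraphs. The sketch below is an illustration under assumptions: the function names, the threshold name gamma_min, and its value of 1.5 are hypothetical, and the id lists are made up.

```python
from typing import Iterable, Set


def discriminative_ratio(d_x: Set[int], d_subfeatures: Iterable[Set[int]],
                         all_ids: Set[int]) -> float:
    """Size of the intersection of the sub-feature id lists, divided by |D_x|.

    d_x           -- ids of graphs containing the fragment x
    d_subfeatures -- id lists of indexed features that are subgraphs of x
    all_ids       -- ids of every graph in the database (used when x has no
                     indexed subgraph yet)
    """
    if not d_x:
        raise ValueError("the fragment must occur in at least one graph")
    bound = set(all_ids)
    for d_f in d_subfeatures:
        bound &= d_f
    return len(bound) / len(d_x)


def is_discriminative(d_x: Set[int], d_subfeatures: Iterable[Set[int]],
                      all_ids: Set[int], gamma_min: float = 1.5) -> bool:
    """Keep x only if it prunes noticeably more than its indexed subgraphs do."""
    return discriminative_ratio(d_x, d_subfeatures, all_ids) >= gamma_min


all_graph_ids = set(range(1, 9))
sub_lists = [{1, 2, 3, 4, 5, 6}, {1, 2, 3, 4, 7}]   # intersection is {1, 2, 3, 4}
```

A ratio close to 1 means the fragment adds no pruning power beyond what its subgraphs already provide, so it is redundant.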


gIndex - Construction
First generates all frequent fragments while discarding redundant ones.
Translates fragments into sequences and stores them in a prefix tree.
– Each fragment has an id list: the ids of the graphs containing the fragment.
– Graph sequentialization (DFS code)
  A labeled edge is a 5-tuple (i, j, l_i, l_(i,j), l_j).
  Described in another paper.

gIndex - Construction
gIndex Tree
– Each fragment can be mapped to an edge sequence (its DFS code).
– The edge sequences of discriminative fragments are inserted into a prefix tree called the gIndex tree.

gIndex - Construction
gIndex Tree
– Implemented using a hash table
  Both black and white nodes are included in the table.
  The tree is still an important concept, since it determines which white nodes are included.
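The prefix tree stored in a hash table can be sketched as below. This is an illustration, not the presenter's implementation: the class name GIndexTable and its methods are hypothetical, and the sketch assumes that black nodes are the indexed (discriminative) fragments carrying id lists while white nodes are intermediate prefixes without one.

```python
from typing import Dict, Iterable, Optional, Sequence, Set, Tuple

Edge = Tuple[int, int, str, str, str]   # (i, j, l_i, l_(i,j), l_j)
Code = Tuple[Edge, ...]                 # an edge sequence (DFS code)


class GIndexTable:
    """Prefix tree of edge sequences, kept as a hash table keyed by sequence."""

    def __init__(self) -> None:
        # A set value marks a black node with its id list; None marks a white node.
        self.nodes: Dict[Code, Optional[Set[int]]] = {}

    def insert(self, code: Sequence[Edge], graph_ids: Iterable[int]) -> None:
        """Insert an indexed fragment and add any missing prefixes as white nodes."""
        key = tuple(code)
        for k in range(1, len(key)):
            self.nodes.setdefault(key[:k], None)   # keeps an existing black node
        self.nodes[key] = set(graph_ids)

    def in_tree(self, code: Sequence[Edge]) -> bool:
        """True for both black and white nodes."""
        return tuple(code) in self.nodes

    def is_indexed(self, code: Sequence[Edge]) -> bool:
        """True only for black nodes."""
        return self.nodes.get(tuple(code)) is not None

    def id_list(self, code: Sequence[Edge]) -> Optional[Set[int]]:
        """Id list of a black node, or None for white or missing nodes."""
        return self.nodes.get(tuple(code))


# Hypothetical two-edge fragment C-O-N; "-" stands for a single-bond label.
e1: Edge = (0, 1, "C", "-", "O")
e2: Edge = (1, 2, "O", "-", "N")
table = GIndexTable()
table.insert([e1, e2], {1, 2})
```

After this insert, the one-edge prefix [e1] is in the table as a white node: it can be found during search, but it carries no id list until it is inserted as a fragment in its own right.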

gIndex - Search

Optimization
Apriori Pruning
– If a fragment is not in the gIndex tree, we need not check its supergraphs.

gIndex - Search
Verification
– After obtaining the candidate answer set, we have to verify that the graphs in the set really contain the query graph.
– This is done by performing a subgraph isomorphism test on each candidate graph, one by one.
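The two search phases, filtering and verification, can be sketched as follows. This is a simplified illustration: candidate_set and search are hypothetical names, the caller is assumed to have already looked up the id lists of the indexed fragments found in the query, and toy_contains is a subset check standing in for a real subgraph isomorphism test.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set, Tuple

ToyGraph = FrozenSet[Tuple[str, str]]   # hypothetical graph: a set of label pairs


def toy_contains(graph: ToyGraph, query: ToyGraph) -> bool:
    """Stand-in for the subgraph isomorphism test."""
    return query <= graph


def candidate_set(all_ids: Iterable[int], id_lists: List[Set[int]]) -> Set[int]:
    """Intersect the id lists of the indexed fragments contained in the query."""
    candidates = set(all_ids)
    for ids in sorted(id_lists, key=len):   # smallest lists first shrink the set fastest
        candidates &= ids
        if not candidates:
            break
    return candidates


def search(query: ToyGraph, database: Dict[int, ToyGraph], id_lists: List[Set[int]],
           contains: Callable[[ToyGraph, ToyGraph], bool] = toy_contains) -> Set[int]:
    """Filter with the index, then verify each surviving candidate one by one."""
    candidates = candidate_set(database.keys(), id_lists)
    return {gid for gid in candidates if contains(database[gid], query)}


toy_db = {
    1: frozenset({("C", "O"), ("C", "N")}),
    2: frozenset({("C", "O"), ("N", "O")}),
    3: frozenset({("C", "N")}),
}
toy_query = frozenset({("C", "O"), ("C", "N")})
```

If only the fragment with id list {1, 2} is indexed, graph 2 survives filtering as a false positive and is removed during verification. The fewer false positives the index lets through, the fewer expensive verification tests are needed.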

gIndex - Maintenance


The index of gIndex is more than 10 times smaller than that of GraphGrep.
gIndex outperforms GraphGrep by a factor of 3 to 10 across various query loads.
The index produced by the incremental maintenance algorithm is effective: it performs as well as an index computed from scratch, provided the data distribution does not change much.

Experimental Result
The data comes from an AIDS antiviral screen dataset.

Experimental Result

The End