Substructure Similarity Search in Graph Databases
Xifeng Yan, Philip S. Yu, Jiawei Han
SIGMOD 2005


Outline
- Motivation
- Objectives
- Methodology
  - Grafil
  - Feature-based structural filtering
  - Feature set selection
- Experiments
- Conclusions

Motivation
Exact matching is often too restrictive, so similarity search over complex structures becomes a vital operation. Substructure similarity computation is very expensive, and in practice it is not affordable on a large database.

Objectives
- Perform substructure similarity search using indexed features.
- Introduce a technique, called Grafil, that transforms the edge relaxation ratio of a query graph into the maximum number of features the query is allowed to miss.
- Filter out many graphs without performing pairwise similarity computations.

Search Categories
- Full structure search: find structures that are exactly the same as the query graph.
- Substructure search: find structures that contain the query graph.
- Full structure similarity search: find structures that are similar to the query graph.

Methodology

Grafil (Graph Similarity Filtering)

Example: Relaxation Ratio

Feature-based structural filtering

No matter which edge is relaxed, the relaxed query graph should still have at least two occurrences of these features. Equivalently, the relaxed query graph may miss at most four occurrences of these features compared with the original query graph, which has six occurrences: one fa, one fb, and four fc. This maximum is the upper bound on feature misses. We can therefore discard any graph that does not contain at least two occurrences of these features.
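The bound in this example can be checked by brute force. The sketch below is illustrative only: the mapping from each feature occurrence to the query edges it uses is a hypothetical assignment chosen to reproduce the numbers on this slide (the original figure is not available here), and `max_feature_misses` is a name introduced for this example. Trying every edge is fine for a tiny query, but it is not presented as the paper's estimation algorithm.

```python
from itertools import combinations

# HYPOTHETICAL data: which query edges each feature occurrence uses.
# Chosen so the totals match the slide: one fa, one fb, four fc (six in all).
OCCURRENCES = {
    "fa": [{"e1", "e2"}],
    "fb": [{"e1", "e3"}],
    "fc": [{"e1"}, {"e1"}, {"e2"}, {"e3"}],
}
EDGES = ["e1", "e2", "e3"]


def max_feature_misses(occurrences, edges, k):
    """Worst-case number of feature occurrences lost when any k edges are relaxed."""
    all_occurrences = [
        edge_set for occ_list in occurrences.values() for edge_set in occ_list
    ]
    worst = 0
    for relaxed in combinations(edges, k):
        relaxed_set = set(relaxed)
        # An occurrence is lost if it uses at least one relaxed edge.
        missed = sum(1 for edge_set in all_occurrences if edge_set & relaxed_set)
        worst = max(worst, missed)
    return worst


total = sum(len(occ_list) for occ_list in OCCURRENCES.values())
bound = max_feature_misses(OCCURRENCES, EDGES, k=1)
print(bound)          # 4: at most four occurrences can be missed
print(total - bound)  # 2: at least two occurrences must remain
```

With this assignment, relaxing e1 is the worst case: it destroys fa, fb, and two fc occurrences, which gives the bound of four.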

Feature-Graph Matrix Index
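The figure for this slide did not survive in the transcript. As a rough sketch, assume the index stores, for each indexed feature and each database graph, how many times the feature occurs in that graph. The graph names G1 to G3, their counts, and the function `candidate_graphs` are hypothetical; the query counts and the miss bound of four are taken from the example with fa, fb, and fc.

```python
# HYPOTHETICAL feature-graph matrix: one row per indexed feature,
# one column per database graph, each entry an occurrence count.
FEATURE_GRAPH_MATRIX = {
    "fa": {"G1": 1, "G2": 0, "G3": 0},
    "fb": {"G1": 1, "G2": 0, "G3": 0},
    "fc": {"G1": 3, "G2": 2, "G3": 1},
}

# Occurrences of each feature in the query graph (from the slide example).
QUERY_COUNTS = {"fa": 1, "fb": 1, "fc": 4}


def candidate_graphs(matrix, query_counts, max_misses):
    """Return ids of graphs that miss no more than max_misses query feature occurrences."""
    graph_ids = sorted({g for row in matrix.values() for g in row})
    survivors = []
    for g in graph_ids:
        # Count how many query feature occurrences the graph lacks.
        misses = sum(
            max(0, needed - matrix.get(feature, {}).get(g, 0))
            for feature, needed in query_counts.items()
        )
        if misses <= max_misses:
            survivors.append(g)
    return survivors


print(candidate_graphs(FEATURE_GRAPH_MATRIX, QUERY_COUNTS, max_misses=4))
```

Only the surviving graphs would need the expensive pairwise similarity check; here G3 misses five occurrences and is filtered out using counts alone.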

Substructure similarity search is divided into four parts

Feature Set Selection

Experiments

Conclusions
- We explored a filtering algorithm that uses indexed structural patterns and avoids costly structure comparisons.
- We identified criteria for forming effective feature sets for filtering; combining features of similar size and selectivity can significantly improve filtering and search performance.