Hierarchical Cellular Tree: An Efficient Indexing Scheme for Content-Based Retrieval on Multimedia Databases
Serkan Kiranyaz and Moncef Gabbouj


Objective
- Present the Hierarchical Cellular Tree (HCT) as an indexing scheme for content-based retrieval on multimedia databases

Why is this technique important?
- Hardware and network improvements
- Daily usage of the Internet
- The technique reduces costly I/O operations

HCT Overview
- Is a MAM (Metric Access Method) technique
- Based on the M-tree
- Is a dynamic, cell-based, hierarchically structured indexing method
- Items are partitioned by distance and stored in cells according to their similarity proximity
- Self-organized tree implemented via genetic programming principles

Indexing Technique Categories
- SAM (spatial access method)
  o (Dis-)similarity distance is measured only through Euclidean distance
  o Not suited for deep spanning trees
- MAM (metric access method)
  o Supports a black-box approach to (dis-)similarity distance
  o Allows for deep trees
  o Does not support dynamic changes*

*M-tree Similarities
- Is a dynamic MAM
- Has a hierarchical structure based on the mitosis of a cell
  o The tree grows one level upward whenever a split occurs at the top level
- Each cell is represented by a nucleus (except the topmost cell)

M-tree Problems
- Achieves a balanced tree with low I/O cost on large datasets
  o Problem: multimedia databases are seldom balanced
  o HCT: cells are unbalanced and can vary in size
- The size of the database entries/cells (capacity M) must be known before building
  o Problem: all M-tree structures can hit upper limits (size is not dynamic)
  o HCT: removes the limit on cell size as long as cells keep a definite "compactness" measure

M-tree Problems
- M-tree compactness is measured only by the distance from the nucleus to the furthest object (the covering radius)
  o Problem: determining compactness this way does not allow for dynamic sizing of cells
  o HCT: uses all cell items and their minimum distances to other items in the cell (instead of a single nucleus item alone); compactness is constantly updated
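The covering radius described above can be sketched in a few lines. This is only an illustration: the function name `covering_radius`, the 2-D sample points, and the use of Euclidean distance are assumptions, not taken from the slides.

```python
import math

def covering_radius(nucleus, items, dist=math.dist):
    """Largest distance from the nucleus to any item in the cell."""
    return max(dist(nucleus, x) for x in items)

# Hypothetical cell: three nearby items and one far-away item.
cell = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
radius = covering_radius(cell[0], cell)
print(radius)
```

Because the measure depends on a single nucleus-to-item distance, one distant item determines the whole value, which is the limitation the slide points at.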

Related Work in Multimedia Databases (SAM trees)
- KD-trees
  o Hierarchical tree structure
  o Use space-partitioning methods to divide the feature space along predefined hyperplanes
- R-trees
  o Feature space is divided according to the distribution of database items
  o Region overlap may occur

Related Work in Multimedia Databases (SAM trees)
- R*-trees
  o Improve the node splitting of the R-tree by taking overlapping areas into consideration
- TV-tree
  o Uses telescope vectors
  o The authors refer to these only as "so-called telescope vectors"
  o A Google search turns up nothing meaningful for telescope vectors

Related Work in Multimedia Databases (SAM trees)
- X-tree
  o Avoids overlapping of region bounding boxes by using a new organization of the directory
  o Boxes can still intersect at higher levels in the tree
  o The paper does not go into detail on what a bounding box is (assumption: bounding box = cell)
- SS-tree
  o Uses minimum bounding spheres instead of boxes
  o Fewer intersections at higher levels

Related Work in Multimedia Databases (MAM trees)
- vp-tree (vantage point)
  o Organizes feature vectors (data points) into two groups according to their similarity distances with respect to a single point (the vantage point)
- mvp-tree (multiple vantage point)
  o Assigns multiple vantage points instead of one
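A minimal sketch of one vp-tree partitioning step follows. The slides only say that points are split into two groups by distance to the vantage point; using the median distance as the threshold is an assumption here, as are the name `vp_split` and the sample points.

```python
import math
import statistics

def vp_split(points, vantage, dist=math.dist):
    """Split points into an inner and an outer group around one vantage point."""
    others = [p for p in points if p != vantage]
    # Assumed threshold: the median distance to the vantage point.
    mu = statistics.median(dist(vantage, p) for p in others)
    inner = [p for p in others if dist(vantage, p) <= mu]
    outer = [p for p in others if dist(vantage, p) > mu]
    return mu, inner, outer

points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0), (9.0, 0.0)]
mu, inner, outer = vp_split(points, vantage=(0.0, 0.0))
print(mu, inner, outer)
```

Only the distance function is consulted, never the coordinates themselves, which is what lets a metric access method treat the similarity measure as a black box.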

HCT Structure - Cell Structure
- Basic container in which similar database items are stored
- Ground-level cells contain all of the database items
- Cells carry an MST (Minimum Spanning Tree)
  o Holds the minimum (dis-)similarity distance of each item to the other items within the cell
  o Used to determine when mitosis should occur; splits occur at the longest branch
  o Very similar to the mvp-tree, except that every cell is treated as a vantage point, which gives a better idea of the similarity proximity of an item
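To make the cell MST concrete, here is a sketch that builds one with Prim's algorithm over the pairwise distances of the items in a cell. The choice of Prim's algorithm, the name `cell_mst`, the `(i, j, distance)` edge format, and the sample items are all assumptions for illustration; the slides do not say how the MST is built or maintained.

```python
import math

def cell_mst(items, dist=math.dist):
    """Return MST edges (i, j, d) over the items of one cell (Prim's algorithm)."""
    n = len(items)
    if n < 2:
        return []
    # best[j] = (distance, i): cheapest known link from the tree to item j.
    best = {j: (dist(items[0], items[j]), 0) for j in range(1, n)}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        d, i = best.pop(j)
        edges.append((i, j, d))
        # Item j is now in the tree; see if it offers cheaper links.
        for k in best:
            dk = dist(items[j], items[k])
            if dk < best[k][0]:
                best[k] = (dk, j)
    return edges

items = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
edges = cell_mst(items)
longest = max(edges, key=lambda e: e[2])
print(edges, longest)
```

In this example the longest branch is the one linking the distant fourth item, which is where a split would occur.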

HCT Structure - Cell Structure
- Cells cannot undergo mitosis before reaching a specific level of maturity
  o This works like real cells
  o The reason for it is not like real cells
- Nucleus
  o Represents its owner cell at the higher level
  o Found through the MST: the item with the maximum number of branches
  o Updated with every operation performed; the M-tree does not do this
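The nucleus rule above (the item with the most MST branches) can be sketched as a degree count over the MST edges. The name `pick_nucleus`, the `(i, j, distance)` edge format, and the tie-break toward the lowest item index are assumptions; the slides do not say how ties are resolved.

```python
from collections import Counter

def pick_nucleus(mst_edges):
    """Return the index of the item with the most MST branches."""
    if not mst_edges:
        return 0  # a single-item cell: that item is the nucleus
    degree = Counter()
    for i, j, _ in mst_edges:
        degree[i] += 1
        degree[j] += 1
    # Assumed tie-break: prefer the lowest index.
    return max(degree, key=lambda k: (degree[k], -k))

# Hypothetical MST in which item 1 is linked to three other items.
edges = [(0, 1, 1.0), (1, 2, 1.0), (1, 3, 2.5)]
nucleus = pick_nucleus(edges)
print(nucleus)
```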

HCT Structure - Cell Structure
- Cell Compactness
  o How tightly clustered the items within the cell are
  o High variations are eliminated by using more than a single item (vantage point)

HCT Structure - Cell Structure
- Cell Mitosis
  o Two conditions for mitosis
    - Maturity ($N_c > N_m$), where $N_c$ = number of items in the cell and $N_m$ = minimum maturity limit
    - Cell Compactness ($CF_c > CThr_L$), where $CF_c$ = compactness feature of the cell and $CThr_L$ = compactness threshold of the current level
  o Cell mitosis has no cost, as the cell is simply split by breaking the longest branch
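The mitosis test and the split can be sketched as follows. This assumes that both conditions must hold together, and that the MST is available as a list of `(i, j, distance)` edges; the function names and sample values are hypothetical, and the compactness value is simply passed in because the slides do not give its formula.

```python
def should_split(n_items, compactness, maturity_min, compactness_thr):
    """Assumed rule: mature (N_c > N_m) and not compact enough (CF_c > CThr_L)."""
    return n_items > maturity_min and compactness > compactness_thr

def split_on_longest_branch(n_items, mst_edges):
    """Break the longest MST edge and return the two resulting item groups."""
    longest = max(mst_edges, key=lambda e: e[2])
    adj = {i: [] for i in range(n_items)}
    for e in mst_edges:
        if e is not longest:
            adj[e[0]].append(e[1])
            adj[e[1]].append(e[0])
    # Collect everything still reachable from one end of the broken edge.
    side, stack = {longest[0]}, [longest[0]]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in side:
                side.add(v)
                stack.append(v)
    return sorted(side), sorted(set(range(n_items)) - side)

edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 8.0)]
if should_split(n_items=4, compactness=8.0, maturity_min=3, compactness_thr=5.0):
    left, right = split_on_longest_branch(4, edges)
    print(left, right)
```

Removing one edge from a tree always leaves exactly two connected groups, so no further clustering step is needed to form the child cells.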

HCT Structure - Cell Structure

HCT Structure - Level Structure
- The top level is always a single cell
  o If mitosis occurs at the top level, a new top level is created to preserve the single-cell top level
- Each level attempts to dynamically maximize the compactness of its cells

HCT Structure - HCT Operations
- Three operations
  o Cell mitosis
  o Item insertion
  o Item removal
- As stated before, all three operations cause a recalculation of compactness

HCT Structure - HCT Operations
- Insert
  o First performs the pre-emptive cell search
    - Recursively descends the HCT from the top to the target level
  o Once the target is located, inserts the item into the target cell
  o Performs a post-processing check
    - Check for mitosis
    - Recalculate compactness for single or multiple cells
  o If mitosis was performed
    - Remove the old nucleus item from the higher level
    - Consecutively call Insert for the new nucleus
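The first two steps of Insert can be sketched as a top-down descent followed by an append. This is a deliberate simplification: it greedily follows the closest nucleus at each level, which is not necessarily how the pre-emptive cell search chooses its path, and it leaves out the post-processing (mitosis check, compactness update, nucleus replacement). The `Cell` class and function names are hypothetical.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Cell:
    nucleus: tuple
    items: list = field(default_factory=list)     # ground-level payload
    children: list = field(default_factory=list)  # cells one level below

def find_target_cell(cell, item, dist=math.dist):
    """Descend from the top cell, following the closest nucleus (assumed rule)."""
    while cell.children:
        cell = min(cell.children, key=lambda c: dist(c.nucleus, item))
    return cell

def insert(root, item, dist=math.dist):
    """Locate the target ground-level cell and add the item to it."""
    target = find_target_cell(root, item, dist)
    target.items.append(item)
    return target  # a full version would run the post-processing check here

near = Cell(nucleus=(0.0, 0.0), items=[(0.0, 0.0)])
far = Cell(nucleus=(10.0, 10.0), items=[(10.0, 10.0)])
root = Cell(nucleus=(0.0, 0.0), children=[near, far])
target = insert(root, (9.0, 9.0))
print(target.nucleus, target.items)
```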

HCT Structure - HCT Indexing
- HCT can index using any set of available features
  o Must have a fusion mechanism
  o Must have a similarity measure
- Consists of two operations
  o Incremental construction
  o Optional periodic fitness check

HCT Structure - HCT Indexing
- HCT Incremental Construction
  o Takes a database D and appends all new items contained in an array
  o If an HCT does not already exist for database D
    - All current items of D are inserted into the array
    - A new HCT body is constructed from D
  o Else, if an HCT does exist for database D
    - The HCT body is first loaded
    - The HCT body is updated with the contents of the array
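The control flow of incremental construction can be sketched as below. The HCT body is stood in for by a plain list and the per-item insertion by a callback, both of which are placeholders for illustration only; loading and saving an existing body are not modelled.

```python
def incremental_construction(hct_body, database_items, new_items, insert):
    """Build or update an index; hct_body is None when no HCT exists yet."""
    pending = list(new_items)
    if hct_body is None:
        # No HCT yet: index everything already in D, then the new items.
        pending = list(database_items) + pending
        hct_body = []  # placeholder for an empty HCT body
    for item in pending:
        insert(hct_body, item)
    return hct_body

# list.append stands in for the real item insertion.
fresh = incremental_construction(None, ["a", "b"], ["c"], list.append)
updated = incremental_construction(["a", "b"], ["a", "b"], ["c"], list.append)
print(fresh, updated)
```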

HCT Structure - HCT Indexing
- HCT Fitness Check
  o Aims to minimize corruption that can happen during construction of the HCT body
    - Corruption happens because the order in which items are inserted is not controlled
  o Outliers Check
    - Reduces the "crowd effect" by removing redundant minority cells
    - Minority cells are cells that contain only one or a few items
    - The items of all minority cells are reintroduced into the system to see whether they fit into another cell

HCT Structure - HCT Indexing
  o Cell Merging
    - If a cell merge is later deemed not to meet the cell compactness requirements, the merged cell can be split again

HCT - Examples

Q&A