Hierarchical Cellular Tree: An Efficient Indexing Scheme for Content-Based Retrieval on Multimedia Databases Serkan Kiranyaz and Moncef Gabbouj
Objective To present the technique of using a Hierarchical Cellular Tree (HCT) as an indexing scheme for content-based retrieval on multimedia databases.
Why is this technique important? Improvements in hardware and network technology. Daily use of the Internet. The technique reduces costly I/O operations.
HCT Overview Is a MAM (metric access method) technique. Based on the M-tree. Is a dynamic, cell-based, hierarchically structured indexing method. Items are partitioned based on distances and stored within cells according to their similarity proximity. Self-organized tree implemented via genetic programming principles.
Indexing Technique Categories SAM (spatial access method) (Dis-)similarity distance is measured only through Euclidean distance. o Not suited for deep spanning trees MAM (metric access method) Supports a black-box approach to (dis-)similarity distance. o Allows for deep trees Do not support dynamic changes*
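The black-box approach can be illustrated with a minimal sketch: the search code only ever calls a distance function and never inspects the items themselves. The names `nearest` and `hamming` are hypothetical, chosen for this illustration, and the linear scan stands in for a real index.

```python
import math
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def nearest(query: T, items: Sequence[T], dist: Callable[[T, T], float]) -> T:
    """Return the item closest to the query under an arbitrary distance function."""
    return min(items, key=lambda item: dist(query, item))


def hamming(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


# The same search works on bit strings with Hamming distance ...
closest_code = nearest("10110", ["00000", "10111", "01001"], hamming)
# ... and on points with Euclidean distance (math.dist, Python 3.8+).
closest_point = nearest((0, 0), [(3, 4), (1, 1)], math.dist)
```

A SAM, by contrast, would need the items to be coordinate vectors, so the bit-string case would not fit without an embedding step.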
*M-tree Similarities Is a dynamic MAM. Has a hierarchical structure based on the mitosis of a cell o Tree grows one level upwards whenever a split occurs at the top level Each cell is represented by a nucleus (except the topmost cell)
M-tree Problems Achieves a balanced tree with low I/O cost in large datasets o Problem: Multimedia databases are seldom balanced. o HCT: Cells are unbalanced and can vary in size. Must know the size of the database entries/cells before building (capacity M) o Problem: All M-tree structures can hit upper limits (size is not dynamic). o HCT: Removes the limit on cell size as long as cells keep a definite "compactness" measure.
M-tree Problems M-tree compactness is measured only by the distance from the nucleus to the furthest object (covering radius) o Problem: Determining compactness this way does not allow for dynamic sizing of cells. o HCT: Uses all cell items and their minimum distances to the cell (instead of a single nucleus item alone); compactness is constantly updated.
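As a rough illustration of why a single-item measure can mislead, the covering radius can be sketched as the largest distance from the nucleus to any item in the cell. The function name and the sample points are assumptions made for this example.

```python
import math


def covering_radius(nucleus, items, dist=math.dist):
    """Largest distance from the nucleus to any item in the cell."""
    return max(dist(nucleus, item) for item in items)


# A tight cluster around the nucleus.
tight = covering_radius((0, 0), [(1, 0), (0, 1)])
# The same cluster plus one distant item: the radius is set by that item alone.
with_outlier = covering_radius((0, 0), [(1, 0), (0, 1), (6, 8)])
```

One distant item determines the whole value, regardless of how tightly the remaining items are clustered.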
Related Work in Multimedia Databases (SAM trees) KD-trees o Hierarchical tree structure o Use space-partitioning methods to divide the feature space along predefined hyperplanes R-trees o Feature space divided according to the distribution of database items o Region overlapping may occur
Related Work in Multimedia Databases (SAM trees) R*-trees o Improve the node splitting of the R-tree by taking overlapping areas into consideration TV-tree o Uses telescope vectors o The authors refer to these only as "so called telescope vectors" o A Google search does not turn up anything meaningful for telescope vectors
Related Work in Multimedia Databases (SAM trees) X-tree o Avoids overlapping of region bounding boxes by using a new organization of the directory o Boxes can still intersect at higher levels in the tree o The paper does not go into detail on what a bounding box is (assumption: bounding box = cell) SS-tree o Uses minimum bounding spheres instead of boxes o Fewer intersections at higher levels
Related Work in Multimedia Databases (MAM trees) vp-tree (vantage point) o Organizes feature vectors (data points) into two groups according to their similarity distances with respect to a single point (the vantage point) mvp-tree (multiple vantage point) o Assigns multiple vantage points instead of one
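The two-group split of a vp-tree node can be sketched as follows. Using the median distance as the cut-off is the common textbook choice and is an assumption here, since the slide does not say how the groups are separated; `vp_split` is a hypothetical name.

```python
import math
from statistics import median


def vp_split(points, vantage, dist=math.dist):
    """Split points into a near and a far group around one vantage point.

    Assumption: the cut-off is the median distance to the vantage point.
    """
    distances = [dist(vantage, p) for p in points]
    cutoff = median(distances)
    near = [p for p, d in zip(points, distances) if d <= cutoff]
    far = [p for p, d in zip(points, distances) if d > cutoff]
    return cutoff, near, far


cutoff, near, far = vp_split([(1, 0), (2, 0), (5, 0), (9, 0)], vantage=(0, 0))
```

A full vp-tree would apply the same split recursively to each group with a new vantage point; an mvp-tree would use more than one vantage point per node.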
HCT Structure - Cell Structure Basic container in which similar database items are stored. Ground-level cells contain all the database items. Cells carry an MST (minimum spanning tree) o Holds the minimum (dis-)similarity distance of each item to the other items within the cell. o Used to determine when mitosis should occur. Splits occur at the longest branch. o This is actually very similar to the mvp-tree, except that every cell item is treated as a vantage point. This gives a better idea of the similarity proximity of an item.
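A minimal sketch of the two MST steps named above: building the tree over the items of one cell, and splitting the cell at the longest branch. Prim's algorithm is used here as one standard way to build an MST; the slides do not say which construction HCT uses, and both function names are hypothetical.

```python
import math


def build_mst(items, dist):
    """Prim's algorithm. Returns MST edges as (i, j, weight) over item indices."""
    if len(items) < 2:
        return []
    # best[j] = (weight, i): cheapest known edge linking item j to the tree so far
    best = {j: (dist(items[0], items[j]), 0) for j in range(1, len(items))}
    edges = []
    while best:
        j = min(best, key=lambda k: best[k][0])
        weight, i = best.pop(j)
        edges.append((i, j, weight))
        # Item j is now in the tree; see if it offers cheaper links to the rest.
        for k in best:
            d = dist(items[j], items[k])
            if d < best[k][0]:
                best[k] = (d, j)
    return edges


def split_at_longest_branch(n_items, edges):
    """Remove the heaviest MST edge and return the two resulting index groups."""
    longest = max(edges, key=lambda e: e[2])
    adjacent = {i: [] for i in range(n_items)}
    for edge in edges:
        if edge is not longest:
            adjacent[edge[0]].append(edge[1])
            adjacent[edge[1]].append(edge[0])
    # Flood fill from one endpoint of the removed edge.
    side, stack = set(), [longest[0]]
    while stack:
        node = stack.pop()
        if node not in side:
            side.add(node)
            stack.extend(adjacent[node])
    return sorted(side), sorted(set(range(n_items)) - side)
```

For example, with items `[(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]` and `math.dist`, the longest branch joins the third and fourth items, so the split separates the first three items from the last two.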
HCT Structure - Cell Structure Cells cannot undergo mitosis before reaching a specific level of maturity o Real cells behave the same way o The reason for it, however, differs from that of real cells Nucleus o Represents the owner cell on the higher level o Nucleus is found through the MST: the item with the maximum number of branches o Nucleus is updated with every operation performed; the M-tree does not do this
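Picking the nucleus as the item with the most MST branches can be sketched like this. The edge format `(i, j, weight)` and the tie-break on the lowest index are assumptions made for the example; the slides do not say how ties are resolved.

```python
from collections import Counter


def find_nucleus(edges):
    """Return the index of the item with the most MST branches.

    Assumptions: edges are (i, j, weight) tuples over item indices, ties go
    to the lowest index, and a single-item cell uses item 0 as its nucleus.
    """
    if not edges:
        return 0
    degree = Counter()
    for i, j, _ in edges:
        degree[i] += 1
        degree[j] += 1
    return min(degree, key=lambda node: (-degree[node], node))


# Item 1 has three branches, every other item has one.
nucleus = find_nucleus([(0, 1, 1.0), (1, 2, 1.0), (1, 3, 2.0)])
```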
HCT Structure - Cell Structure Cell Compactness o How tightly the items within the cell are clustered o High variations are eliminated by using more than a single item (vantage point)
HCT Structure - Cell Structure Cell Mitosis o Two conditions for mitosis Maturity ($N_C > N_M$), where $N_C$ = number of items in cell $C$ and $N_M$ = minimum maturity size. Cell Compactness ($CF_C > CThr_L$), where $CF_C$ = compactness feature of cell $C$ and $CThr_L$ = compactness threshold of the current level $L$. o Cell mitosis has no cost, as the cell is simply split by breaking the longest branch of its MST
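The two conditions can be read as a single predicate. This sketch assumes both must hold at once, which follows from the earlier statement that a cell cannot undergo mitosis before it is mature; the function and parameter names are hypothetical, and the compactness value is taken as given because the slides do not spell out its formula.

```python
def needs_mitosis(n_items, compactness, maturity_min, compactness_threshold):
    """True when a cell is both mature and past the compactness threshold.

    Assumption: both conditions are required together.
    """
    is_mature = n_items > maturity_min
    exceeds_threshold = compactness > compactness_threshold
    return is_mature and exceeds_threshold
```

Under this reading, a small cell with a high compactness value is left alone until it grows past the maturity limit.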
HCT Structure - Level Structure Top level is always a single cell o If mitosis occurs on the top level, a new top level is created to preserve the single-cell top level. Each level attempts to dynamically maximize the compactness of its cells
HCT Structure - HCT Operations Three operations o Cell mitosis o Item insertion o Item removal As stated before, all three operations cause a recalculation of compactness
HCT Structure - HCT Operations Insert o First performs the pre-emptive cell search, which recursively descends the HCT from the top level to the target level o Once the target cell is located, inserts the item into it o Performs a post-processing check: checks for mitosis and recalculates compactness for single or multiple cells o If mitosis was performed: removes the old nucleus item from the higher level and consecutively calls Insert for each new nucleus
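The descent and insertion steps can be sketched under a strong simplification: at each level the search greedily follows the single closest nucleus. This greedy rule is a stand-in, not the pre-emptive cell search itself, whose details the slides do not give. The `Cell` layout is also an assumption, and the post-processing step (mitosis check, compactness update, nucleus replacement) is left to the caller.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cell:
    # Items stored in this cell. Above ground level they are nuclei of lower cells.
    items: list = field(default_factory=list)
    # children[i] is the lower-level cell owned by items[i]; empty at ground level.
    children: list = field(default_factory=list)


def find_target_cell(top, item, dist):
    """Descend from the top cell to a ground-level cell (greedy stand-in)."""
    cell = top
    while cell.children:
        closest = min(range(len(cell.items)), key=lambda i: dist(item, cell.items[i]))
        cell = cell.children[closest]
    return cell


def insert(top, item, dist):
    """Insert the item into its target cell and return that cell.

    The caller would then run the post-processing check on the returned cell.
    """
    target = find_target_cell(top, item, dist)
    target.items.append(item)
    return target
```

With two ground cells whose nuclei are `(0, 0)` and `(10, 10)`, inserting `(9, 9)` under `math.dist` lands in the second cell.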
HCT Structure - HCT Indexing HCT can index using any set of available features o Must have fusion mechanism o Must have similarity measure Consists of two operations o Incremental construction o Optional periodic fitness check
HCT Structure - HCT Indexing HCT Incremental Construction o Takes a database D and appends all new items contained in an array o If an HCT does not already exist for database D: all current items of D are inserted into the array, and a new HCT body is constructed from D o Else, if an HCT does exist for database D: the HCT body is first loaded, then updated with the contents of the array
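The branching in incremental construction can be sketched as below. Everything concrete here is an assumption for illustration: the HCT body is modelled as a plain list, `insert_item` is a caller-supplied stand-in for the HCT Insert operation, and existing database items are placed ahead of the new ones in the array.

```python
def incremental_construction(saved_body, database_items, new_items, insert_item):
    """Build a new body or update a loaded one with the pending array.

    saved_body is None when no HCT exists yet for the database.
    """
    pending = list(new_items)
    if saved_body is None:
        # No HCT yet: index the current database items as well.
        pending = list(database_items) + pending
        body = []  # stand-in for an empty HCT body
    else:
        # HCT exists: the current items are already indexed in the loaded body.
        body = saved_body
    for item in pending:
        insert_item(body, item)
    return body


fresh = incremental_construction(None, ["a", "b"], ["c"], list.append)
updated = incremental_construction(["a", "b"], ["a", "b"], ["c"], list.append)
```

Both calls end with the same three items indexed, but the second one inserts only the new item.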
HCT Structure - HCT Indexing HCT Fitness Check o Aims to minimize the corruption that can happen during construction of the HCT body. Corruption happens because the order in which items are inserted is not controlled o Outliers Check Reduces the "crowd effect" by removing redundant minority cells. Minority cells are cells with only one or a few items. The items of all minority cells are reintroduced into the system to see if they fit into another cell
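A rough sketch of the outliers check follows. Cells are modelled as plain lists, and the re-insertion is replaced by a stand-in that adds each orphaned item to the cell holding its closest neighbour; a real HCT would call its own Insert operation, which may also leave an item in a cell of its own. The function name and the size limit parameter are hypothetical.

```python
import math


def outliers_check(cells, max_minority_size, dist=math.dist):
    """Dissolve minority cells and re-insert their items into the remaining cells."""
    kept = [list(cell) for cell in cells if len(cell) > max_minority_size]
    if not kept:
        # Nothing to merge into, so leave the cells as they are.
        return [list(cell) for cell in cells]
    orphans = [item for cell in cells if len(cell) <= max_minority_size for item in cell]
    for item in orphans:
        # Stand-in for HCT Insert: join the cell that holds the closest item.
        target = min(kept, key=lambda cell: min(dist(item, other) for other in cell))
        target.append(item)
    return kept
```

For example, a one-item cell holding `(0.5, 0.5)` that sits between three items near the origin and three items near `(10, 10)` is dissolved, and its item joins the cell near the origin.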
HCT Structure - HCT Indexing o Cell Merging If a cell merge occurs and the merged cell is later deemed not to meet the requirements of cell compactness, it can be split again.