Multimedia DBs

Multimedia dbs
  A multimedia database stores text, strings and images.
  Similarity queries (content-based retrieval): given an image, find the images in the database that are similar (or you can "describe" the query image).
  Extract features, index in feature space, and answer similarity queries using GEMINI.
  Again, average values help! (Used in QBIC, IBM Almaden.)

Image Features
  Features extracted from an image are based on:
    color distribution
    shapes and structure
    ...

Images - color
  Q: What is an image?
  A: a 2-d RGB array

Images - color
  Color histograms, and distance function

Images - color
  Mathematically, the distance function between a vector x and a query q is:
  $D(x, q) = (x - q)^T A (x - q) = \sum_{i,j} a_{ij} (x_i - q_i)(x_j - q_j)$
  A = I?
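To make the quadratic-form distance concrete, here is a minimal sketch in plain Python. The three-bin histograms and the matrix `similar` are made-up illustrative values, not QBIC's actual data; with A = I the formula reduces to the squared Euclidean distance, while off-diagonal entries add the cross terms.

```python
def quadratic_form_distance(x, q, A):
    """Return (x-q)^T A (x-q) for equal-length sequences x, q and a square matrix A."""
    diff = [xi - qi for xi, qi in zip(x, q)]
    n = len(diff)
    return sum(A[i][j] * diff[i] * diff[j] for i in range(n) for j in range(n))

# Hypothetical three-bin colour histograms.
x = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

# A = I: no interaction between bins.
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# Assumed similarity matrix: off-diagonal entries model perceptually similar bins.
similar = [[1.0, 0.5, 0.2], [0.5, 1.0, 0.5], [0.2, 0.5, 1.0]]

d_identity = quadratic_form_distance(x, q, identity)  # squared Euclidean distance
d_similar = quadratic_form_distance(x, q, similar)    # includes the cross terms a_ij, i != j
```

In this toy case the cross terms shrink the distance, because mass that moved between bins the matrix treats as similar counts for less.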

Images - color
  Problem: 'cross-talk'. Features are not orthogonal, so SAMs will not work properly.
  Q: What to do?
  A: It is a feature-extraction question.

Images - color
  Possible answer: avg red, avg green, avg blue.
  It turns out that this lower-bounds the histogram distance, so there is no cross-talk and SAMs are applicable.
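The lower-bounding idea can be sketched as a filter-and-refine search. The example below is a toy stand-in under stated assumptions, not the QBIC bound itself: objects are plain numeric vectors, the expensive distance is Euclidean, and the cheap feature is the scaled average `sum(x) / sqrt(n)`, whose difference never exceeds the Euclidean distance (by the Cauchy-Schwarz inequality). All names are hypothetical.

```python
import math

def euclidean(x, q):
    """The 'expensive' distance on full objects."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, q)))

def avg_feature(x):
    """Cheap one-number feature: sqrt(n) * mean(x)."""
    return sum(x) / math.sqrt(len(x))

def range_query(db, q, eps):
    """Filter on the cheap feature, then refine with the real distance."""
    fq = avg_feature(q)
    # Filter: because the feature distance lower-bounds the real one,
    # no true answer is discarded here (no false dismissals).
    candidates = [x for x in db if abs(avg_feature(x) - fq) <= eps]
    # Refine: remove false alarms with the real distance.
    hits = [x for x in candidates if euclidean(x, q) <= eps]
    return hits, len(candidates)

db = [[1, 1, 1, 1], [1, 2, 1, 2], [5, 5, 5, 5], [0, 2, 0, 2]]
hits, n_candidates = range_query(db, [1, 1, 1, 1], eps=1.5)
```

Here `[0, 2, 0, 2]` survives the filter (same average as the query) but is rejected in the refinement step, which is the expected behaviour of a lower-bounding filter.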

Images - color
  Performance: [plot of time vs. selectivity, comparing "w/ avg RGB" with "seq scan"]

Images - shapes
  Distance function: Euclidean, on the area, perimeter, and 20 'moments'
  Q: How to normalize them?
  A: Divide by standard deviation
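A minimal sketch of that normalization, assuming each shape is a row of raw feature values and that the population standard deviation is used (the slide does not say which variant). The feature values and function name are invented for illustration.

```python
import statistics

def normalize_columns(rows):
    """Divide every feature (column) by its standard deviation."""
    columns = list(zip(*rows))
    stds = [statistics.pstdev(c) for c in columns]
    # A constant feature has std 0; map it to 0.0 instead of dividing by zero.
    return [[v / s if s > 0 else 0.0 for v, s in zip(row, stds)] for row in rows]

# Hypothetical (area, perimeter) pairs on very different scales.
shapes = [[100.0, 1.0], [200.0, 3.0], [300.0, 5.0]]
normalized = normalize_columns(shapes)
```

After this step every feature has unit spread, so no single feature dominates the Euclidean distance merely because of its units.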

Images - shapes
  Distance function: Euclidean, on the area, perimeter, and 20 'moments'
  Q: Other 'features' / distance functions?
  A1: turning angle
  A2: dilations/erosions
  A3: ...

Images - shapes
  Distance function: Euclidean, on the area, perimeter, and 20 'moments'
  Q: How to do dim. reduction?
  A: Karhunen-Loeve (= centered PCA/SVD)
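A rough sketch of the idea in plain Python. It uses power iteration, a deliberately simple swapped-in technique that recovers only the first principal direction; a real K-L transform would use a full eigendecomposition or SVD and keep several directions. The function names and toy data are hypothetical, and the fixed start vector is an assumption that fails if it happens to be orthogonal to the principal direction.

```python
import math

def first_principal_component(rows, iterations=200):
    """Centre the data, build the covariance matrix, and power-iterate."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # assumed start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]
    return means, v

def project(row, means, v):
    """Coordinate of a (centred) point along the principal direction."""
    return sum((x - m) * c for x, m, c in zip(row, means, v))

# Toy 2-d features that lie on the line y = 2x.
data = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
means, direction = first_principal_component(data)
reduced = [project(row, means, direction) for row in data]  # 2-d -> 1-d
```

Because the toy data lie exactly on a line, the single kept coordinate preserves all pairwise distances; on real features some distance is lost, and the reduced distance lower-bounds the original one.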

Images - shapes
  Performance: ~10x faster
  [plot of log(# of I/Os) vs. # of features kept, with "all kept" as the reference]

Dimensionality Reduction
  Many problems (like time-series and image similarity) can be expressed as proximity problems in a high-dimensional space.
  Given a query point, we try to find the points that are close...
  But in high-dimensional spaces things are different!

Effects of High-dimensionality
  Assume a uniformly distributed set of points in high dimensions, $[0,1]^d$.
  Take a query with side length 0.1 in each dimension: the query selectivity in 100-d is $0.1^{100} = 10^{-100}$.
  If we want constant selectivity (0.1), the length of the side must be ~1!
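The arithmetic behind this slide is short enough to check directly. The sketch below assumes uniformly distributed points in the unit hypercube and a hypercube-shaped query, as stated on the slide; the function names are illustrative.

```python
def selectivity(side, d):
    """Fraction of [0,1]^d covered by a hypercube query of the given side."""
    return side ** d

def side_for_selectivity(sel, d):
    """Side length needed for a hypercube query to cover fraction `sel`."""
    return sel ** (1.0 / d)

tiny = selectivity(0.1, 100)              # about 1e-100
needed = side_for_selectivity(0.1, 100)   # about 0.977, i.e. almost the full range
```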

Effects of High-dimensionality
  Surface is everything!
  Probability that a point is closer than 0.1 to a (d-1)-dimensional surface:
    d = 10: ~1
    d = 100: ~1
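One way to reproduce these figures, assuming uniform points in $[0,1]^d$: a point is farther than 0.1 from every face only if all d coordinates fall in [0.1, 0.9], which has probability $0.8^d$. The helper name below is hypothetical.

```python
def prob_near_surface(d, eps=0.1):
    """Probability that a uniform point in [0,1]^d is within eps of the boundary."""
    return 1.0 - (1.0 - 2.0 * eps) ** d

p2 = prob_near_surface(2)      # 1 - 0.8**2
p10 = prob_near_surface(10)    # 1 - 0.8**10
p100 = prob_near_surface(100)  # 1 - 0.8**100
```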

Effects of High-dimensionality
  Number of grid cells and surfaces
    Number of k-dimensional surfaces in a d-dimensional hypercube
    Binary partitioning: $2^d$ cells
  Indexing in high dimensions is extremely difficult: the "curse of dimensionality"
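For reference, the standard count of k-dimensional faces of a d-dimensional hypercube is $\binom{d}{k} 2^{d-k}$ (choose the k free dimensions, then fix each remaining one at 0 or 1). The small sketch below evaluates it next to the $2^d$ cell count; the function names are illustrative.

```python
from math import comb

def k_faces(d, k):
    """Number of k-dimensional faces of a d-dimensional hypercube."""
    return comb(d, k) * 2 ** (d - k)

def binary_partition_cells(d):
    """Cells produced by splitting every dimension once."""
    return 2 ** d

cube = [k_faces(3, k) for k in range(4)]  # vertices, edges, faces, the cube itself
cells_100 = binary_partition_cells(100)   # far more cells than any data set has points
```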

X-tree
  Performance is impacted by the amount of overlap between index nodes
    Need to follow different paths
    Overlap, multi-overlap, weighted overlap
  R*-tree when overlap is small
  Sequential access when overlap is large
  When an overflow occurs
    Split into two nodes if overlap is small
    Otherwise create a super-node with twice the capacity
  Tradeoffs made locally over different regions of data space
  No performance comparisons with linear scan!

Pyramid Tree
  Designed for range queries
  Map each d-dimensional point to a 1-d value
  Build a B+-tree on the 1-d values
  A range query is transformed into a set of 1-d ranges
  More efficient than X-tree, Hilbert order, and sequential scan

Pyramid transformation
  2d pyramids with top at the center of the data space
  Points in different pyramids are ordered based on pyramid id
  Points within a pyramid are ordered based on height
  value(v) = pyramid(v) + height(v)
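A small sketch of this mapping, following my reading of the Pyramid-Technique paper and assuming points in $[0,1]^d$ with the pyramid tops at (0.5, ..., 0.5). Pyramids are numbered 0 .. 2d-1: the dimension with the largest deviation from 0.5 picks the pair of pyramids, and the side of 0.5 on which the point falls picks one of the two. The function name is illustrative.

```python
def pyramid_value(v):
    """Map a point in [0,1]^d to the 1-d key pyramid(v) + height(v)."""
    d = len(v)
    # Dimension in which the point is farthest from the centre.
    j_max = max(range(d), key=lambda j: abs(0.5 - v[j]))
    height = abs(0.5 - v[j_max])  # always in [0, 0.5]
    # Lower half of that dimension -> pyramid j_max, upper half -> j_max + d.
    pyramid = j_max if v[j_max] < 0.5 else j_max + d
    return pyramid + height

key_a = pyramid_value((0.1, 0.5))  # pyramid 0, height 0.4
key_b = pyramid_value((0.5, 0.9))  # pyramid 3, height 0.4
key_c = pyramid_value((0.6, 0.3))  # pyramid 1, height 0.2
```

Since heights never exceed 0.5, keys from different pyramids cannot collide, and the resulting values can be stored in an ordinary B+-tree.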

Vector Approximation (VA) file
  Tile the d-dimensional data space uniformly
    A fixed number of bits in each dimension (8)
    256 partitions along each dimension
    $256^d$ tiles
  Approximate each point by the corresponding tile
    size of approximation = 8d bits = d bytes
    size of each point = 4d bytes (assuming a word per dimension)
  2-step approach, the first step using the VA file
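As an illustration, the sketch below quantizes a point in $[0,1]^d$ to its tile and derives lower and upper bounds on the Euclidean distance from a query to anything inside that tile. It assumes a uniform grid and Euclidean distance; the function names are invented here and are not taken from the VA-file paper.

```python
import math

def approximate(point, bits=8):
    """Tile index per dimension: `bits` bits each, 2**bits partitions."""
    cells = 1 << bits
    return [min(int(x * cells), cells - 1) for x in point]

def distance_bounds(q, approx, bits=8):
    """(lb, ub) on the distance from q to any point inside the tile `approx`."""
    cells = 1 << bits
    lb2 = ub2 = 0.0
    for qi, a in zip(q, approx):
        lo, hi = a / cells, (a + 1) / cells       # tile extent in this dimension
        near = max(lo - qi, qi - hi, 0.0)         # 0 if q lies inside the extent
        far = max(abs(qi - lo), abs(qi - hi))     # farthest tile edge
        lb2 += near * near
        ub2 += far * far
    return math.sqrt(lb2), math.sqrt(ub2)

point = (0.3, 0.7)
query = (0.9, 0.1)
tile = approximate(point)                 # d bytes instead of 4d bytes
lb, ub = distance_bounds(query, tile)
true_distance = math.dist(query, point)
```

The true distance always lies between the two bounds, which is what lets the first step discard most points without reading their full vectors.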

Simple NN searching
  δ = distance to kth NN so far
  For each approximation a_i
    If lb(q, a_i) < δ then
      Compute r = distance(q, v_i)
      If r < δ then
        Add point i to the set of NNs
        Update δ
  Performance based on ordering of vectors and their approximations
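The loop above can be sketched in Python as follows. This is an illustration under assumptions, not the original implementation: points live in $[0,1]^d$, the grid uses only 4 partitions per dimension to keep the toy readable, and `fetched` counts how many full vectors had to be read.

```python
import math

CELLS = 4  # assumed: 2 bits per dimension for this toy example

def approximate(v):
    return tuple(min(int(x * CELLS), CELLS - 1) for x in v)

def lb(q, a):
    """Lower bound on the distance from q to any point in tile a."""
    total = 0.0
    for qi, ai in zip(q, a):
        lo, hi = ai / CELLS, (ai + 1) / CELLS
        gap = max(lo - qi, qi - hi, 0.0)
        total += gap * gap
    return math.sqrt(total)

def simple_knn(q, vectors, approximations, k):
    """Scan all approximations; read the real vector only if lb < delta."""
    best = []            # sorted (distance, index) pairs, at most k of them
    delta = math.inf     # distance to the kth NN so far
    fetched = 0
    for i, a in enumerate(approximations):
        if lb(q, a) < delta:
            fetched += 1
            r = math.dist(q, vectors[i])
            if r < delta:
                best.append((r, i))
                best.sort()
                del best[k:]
                if len(best) == k:
                    delta = best[-1][0]
    return best, fetched

vectors = [(0.1, 0.1), (0.9, 0.9), (0.15, 0.2), (0.8, 0.1), (0.5, 0.5)]
approximations = [approximate(v) for v in vectors]
neighbours, fetched = simple_knn((0.12, 0.12), vectors, approximations, k=2)
```

In this run the far-away vector at index 1 is still fetched because it is scanned before δ has tightened, which illustrates the slide's remark that performance depends on the ordering.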

Near-optimal NN searching
  δ = kth smallest ub(q, a) so far
  For each approximation a_i
    Compute lb(q, a_i) and ub(q, a_i)
    If lb(q, a_i) <= δ then
      If ub(q, a_i) < δ then
        Add point i to the set of NNs
        Update δ
      InsertHeap(Heap, lb(q, a_i), i)

Near-optimal NN searching (2)
  δ = distance to kth NN so far
  Repeat
    Examine the next entry (l_i, i) from the heap
    If δ < l_i then break
    Else
      Compute r = distance(q, v_i)
      If r < δ then
        Add point i to the set of NNs
        Update δ
  Forever
  Sub-linear (log n) number of vectors left after the first phase
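The two phases can be put together in a short sketch. As before, this assumes points in $[0,1]^d$, Euclidean distance, and a coarse 4-partition grid for readability; the names are hypothetical and the code is an illustration rather than the original algorithm's implementation.

```python
import heapq
import math

CELLS = 4  # assumed: 2 bits per dimension for this toy example

def approximate(v):
    return tuple(min(int(x * CELLS), CELLS - 1) for x in v)

def bounds(q, a):
    """(lb, ub) on the distance from q to any point in tile a."""
    lb2 = ub2 = 0.0
    for qi, ai in zip(q, a):
        lo, hi = ai / CELLS, (ai + 1) / CELLS
        near = max(lo - qi, qi - hi, 0.0)
        far = max(abs(qi - lo), abs(qi - hi))
        lb2 += near * near
        ub2 += far * far
    return math.sqrt(lb2), math.sqrt(ub2)

def near_optimal_knn(q, vectors, approximations, k):
    # Phase 1: approximations only. delta = kth smallest upper bound so far.
    smallest_ubs = []          # max-heap (negated values) of the k smallest ubs
    delta = math.inf
    heap = []                  # candidates ordered by lower bound
    for i, a in enumerate(approximations):
        l, u = bounds(q, a)
        if l <= delta:
            if u < delta:
                heapq.heappush(smallest_ubs, -u)
                if len(smallest_ubs) > k:
                    heapq.heappop(smallest_ubs)   # drop the largest ub
                if len(smallest_ubs) == k:
                    delta = -smallest_ubs[0]
            heapq.heappush(heap, (l, i))
    # Phase 2: visit candidates by increasing lower bound, reading real vectors.
    best, delta, fetched = [], math.inf, 0
    while heap:
        l, i = heapq.heappop(heap)
        if delta < l:
            break              # no remaining candidate can beat the kth NN
        fetched += 1
        r = math.dist(q, vectors[i])
        if r < delta:
            best.append((r, i))
            best.sort()
            del best[k:]
            if len(best) == k:
                delta = best[-1][0]
    return best, fetched

vectors = [(0.1, 0.1), (0.9, 0.9), (0.15, 0.2), (0.8, 0.1), (0.5, 0.5)]
approximations = [approximate(v) for v in vectors]
neighbours, fetched = near_optimal_knn((0.12, 0.12), vectors, approximations, k=2)
```

On this toy input only the two true neighbours are read in the second phase, since the next candidate's lower bound already exceeds δ.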

SS-tree and SR-tree
  SS-tree: use spheres for index nodes
    Higher fanout since storage cost is reduced
  SR-tree: use rectangles and spheres for index nodes
    Index node defined by the intersection of two volumes
    More accurate representation of data
    Higher storage cost

Metric Tree (M-tree)
  Definition of a metric
    d(x,y) >= 0
    d(x,y) = d(y,x)
    d(x,y) + d(y,z) >= d(x,z)
    d(x,x) = 0
  Non-vector spaces
    Edit distance
    $d(u,v) = \sqrt{(u-v)^T A (u-v)}$, used in QBIC
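As an example of a metric on a non-vector space, here is a minimal sketch of the edit (Levenshtein) distance on strings, with unit costs for insertion, deletion and substitution assumed. The word list is arbitrary; the checks below exercise the metric axioms on it rather than prove them.

```python
from itertools import product

def edit_distance(u, v):
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(v) + 1))          # distances from "" to prefixes of v
    for i, cu in enumerate(u, start=1):
        cur = [i]                           # distance from u[:i] to ""
        for j, cv in enumerate(v, start=1):
            cur.append(min(prev[j] + 1,                 # delete cu
                           cur[j - 1] + 1,              # insert cv
                           prev[j - 1] + (cu != cv)))   # substitute or match
        prev = cur
    return prev[-1]

words = ["kitten", "sitting", "mitten", "sit", ""]
symmetric = all(edit_distance(a, b) == edit_distance(b, a)
                for a, b in product(words, repeat=2))
triangle = all(edit_distance(a, b) + edit_distance(b, c) >= edit_distance(a, c)
               for a, b, c in product(words, repeat=3))
```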

Basic idea
  Index entry = (routing object, distance to parent, covering radius)
  Example: parent p has the entries (x, d(x,p), r(x)) and (y, d(y,p), r(y)).
  All objects in a subtree are within a distance of "covering radius" from its routing object: for an object z in the subtree of y, d(y,z) <= r(y).

Range queries
  Query q with range t; z is an object in the subtree of y.
  d(q,z) >= d(q,y) - d(y,z)
  d(y,z) <= r(y)
  So, d(q,z) >= d(q,y) - r(y)
  If d(q,y) - r(y) > t then d(q,z) > t
  Prune subtree y if d(q,y) - r(y) > t   (C1)

Range queries
  Query q with range t; p is the parent of y.
  d(q,y) >= d(q,p) - d(p,y)
  d(q,y) >= d(p,y) - d(q,p)
  So, d(q,y) >= |d(q,p) - d(p,y)|
  If |d(q,p) - d(p,y)| - r(y) > t then d(q,y) - r(y) > t
  Prune subtree y if |d(q,p) - d(p,y)| - r(y) > t   (C2)
  Prune subtree y if d(q,y) - r(y) > t   (C1)

Range query algorithm
  RQ(q, t, Root, Subtrees S1, S2, ...)
  For each subtree Si
    prune if condition C2 holds
    otherwise compute distance to root of Si and prune if condition C1 holds
    otherwise search the children of Si
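The range query with conditions C1 and C2 can be sketched on a hand-built toy tree. Everything here is an assumption made for illustration: objects are numbers with d(a, b) = |a - b|, the `Entry` class and the tree are invented, leaf entries carry covering radius 0, and `stats` counts distance computations. A real M-tree also handles insertion and node splits, which are omitted.

```python
from dataclasses import dataclass, field

@dataclass
class Entry:
    obj: float                     # routing object (or data object in a leaf)
    dist_to_parent: float          # d(obj, parent routing object)
    radius: float = 0.0            # covering radius; 0 for leaf entries
    children: list = field(default_factory=list)

def d(a, b):
    return abs(a - b)

def range_query(q, t, entries, d_q_parent, stats):
    results = []
    for e in entries:
        # C2: uses only stored distances, so no new distance computation.
        if d_q_parent is not None and abs(d_q_parent - e.dist_to_parent) - e.radius > t:
            continue
        dist = d(q, e.obj)
        stats["computed"] += 1
        # C1: the whole subtree (or the leaf object) is out of range.
        if dist - e.radius > t:
            continue
        if e.children:
            results += range_query(q, t, e.children, dist, stats)
        else:
            results.append(e.obj)
    return results

tree = [
    Entry(2.0, 0.0, 2.0, [Entry(0.5, 1.5), Entry(2.0, 0.0), Entry(4.0, 2.0)]),
    Entry(10.0, 0.0, 1.0, [Entry(9.0, 1.0), Entry(10.0, 0.0), Entry(11.0, 1.0)]),
]
stats = {"computed": 0}
found = range_query(3.8, 0.5, tree, None, stats)
```

For this query the second subtree is pruned by C1 and one leaf entry by C2, so only 4 of the 8 possible distance computations are performed.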

Nearest neighbor query
  Maintain a priority list of k NN distances
  Minimum distance to a subtree with root x
    d_min(q,x) = max(d(q,x) - r(x), 0)
    |d(q,p) - d(p,x)| - r(x) <= d(q,x) - r(x), so d(q,x) may not need to be computed
  Maximum distance to a subtree with root x
    d_max(q,x) = d(q,x) + r(x)
  For any object z in the subtree of x:
    d(q,z) + r(x) >= d(q,x), so d(q,z) >= d(q,x) - r(x)
    d(q,z) <= d(q,x) + r(x)

Nearest neighbor query
  Maintain an estimate d_p of the kth smallest maximum distance
  Prune a subtree x if d_min(q,x) >= d_p
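A simplified best-first sketch of this search, again on an invented toy tree with numeric objects and d(a, b) = |a - b|. One simplification should be noted plainly: here d_p is tightened only from actual distances to leaf objects, whereas the slide's estimate also exploits d_max of unexplored subtrees. The `Entry` class and the function name are hypothetical.

```python
import heapq
import math
from dataclasses import dataclass, field

@dataclass
class Entry:
    obj: float
    dist_to_parent: float
    radius: float = 0.0
    children: list = field(default_factory=list)

def knn(q, k, root_entries):
    best = []                      # sorted (distance, object), at most k
    d_p = math.inf                 # current bound on the kth NN distance
    tie = 0                        # tie-breaker so heap tuples stay comparable
    pending = [(0.0, tie, None, root_entries)]   # (d_min, tie, d(q,parent), entries)
    while pending:
        d_min, _, d_q_parent, entries = heapq.heappop(pending)
        if d_min >= d_p:
            break                  # every remaining subtree is at least this far
        for e in entries:
            # Cheap bound from stored distances; may avoid computing d(q, e.obj).
            if d_q_parent is not None and abs(d_q_parent - e.dist_to_parent) - e.radius >= d_p:
                continue
            dist = abs(q - e.obj)
            if e.children:
                e_min = max(dist - e.radius, 0.0)
                if e_min < d_p:
                    tie += 1
                    heapq.heappush(pending, (e_min, tie, dist, e.children))
            elif dist < d_p:
                best.append((dist, e.obj))
                best.sort()
                del best[k:]
                if len(best) == k:
                    d_p = best[-1][0]
    return best

tree = [
    Entry(2.0, 0.0, 2.0, [Entry(0.5, 1.5), Entry(2.0, 0.0), Entry(4.0, 2.0)]),
    Entry(10.0, 0.0, 1.0, [Entry(9.0, 1.0), Entry(10.0, 0.0), Entry(11.0, 1.0)]),
]
nearest = knn(3.8, 2, tree)
```

The subtree rooted at 10.0 is never opened here: its d_min of about 5.2 exceeds d_p once two neighbours have been found in the first subtree.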

References
  Christos Faloutsos, Ron Barber, Myron Flickner, Jim Hafner, Wayne Niblack, Dragutin Petkovic, William Equitz: Efficient and Effective Querying by Image Content. JIIS 3(3/4) (1994)
  Stefan Berchtold, Daniel A. Keim, Hans-Peter Kriegel: The X-tree: An Index Structure for High-Dimensional Data. VLDB 1996: 28-39
  Stefan Berchtold, Christian Böhm, Hans-Peter Kriegel: The Pyramid-Technique: Towards Breaking the Curse of Dimensionality. SIGMOD Conference 1998
  Roger Weber, Hans-Jörg Schek, Stephen Blott: A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces. VLDB 1998
  Paolo Ciaccia, Marco Patella, Pavel Zezula: M-tree: An Efficient Access Method for Similarity Search in Metric Spaces. VLDB 1997