Multimedia DBs.


PAA and APCA: Another approach is to segment the time series into equal parts and store the average value for each part. Use an index to store the averages and the segment end points.

Feature Spaces: A time series X and its approximation X' under three transforms: DFT (Agrawal, Faloutsos, Swami 1993), DWT (Chan & Fu 1999) and SVD (Korn, Jagadish, Faloutsos 1997). [Figure: X and X' for each transform, time axis ticks 20 to 140, with the basis functions used: eigenwaves 0 to 7 for SVD and Haar 0 to 7 for DWT.]

Piecewise Aggregate Approximation (PAA): The original time series is an n-dimensional vector S = {s1, s2, …, sn}. Its n'-segment PAA representation is the n'-dimensional vector {sv1, sv2, …, svn'}. The PAA representation satisfies the lower bounding lemma (Keogh, Chakrabarti, Mehrotra and Pazzani, 2000; Yi and Faloutsos 2000). [Figure: a time series on time and value axes with its 8-segment PAA, sv1 to sv8.]
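A minimal Python sketch of PAA and of the lower-bounding property mentioned on this slide. The function names (`paa`, `paa_distance`, `euclidean`) and the sample data are illustrative, and the sketch assumes n is a multiple of the number of segments.

```python
def paa(series, n_segments):
    """Mean of each equal-length segment (assumes len(series) % n_segments == 0)."""
    n = len(series)
    if n % n_segments != 0:
        raise ValueError("this sketch assumes n is a multiple of n_segments")
    width = n // n_segments
    return [sum(series[i:i + width]) / width for i in range(0, n, width)]


def paa_distance(paa_q, paa_s, n):
    """Distance between two PAA vectors, scaled by the segment width."""
    width = n // len(paa_q)
    return (width * sum((a - b) ** 2 for a, b in zip(paa_q, paa_s))) ** 0.5


def euclidean(q, s):
    """Exact Euclidean distance between two equal-length series."""
    return sum((a - b) ** 2 for a, b in zip(q, s)) ** 0.5


s = [1.0, 3.0, 2.0, 4.0, 8.0, 6.0, 7.0, 5.0]
q = [2.0, 2.0, 3.0, 5.0, 6.0, 6.0, 9.0, 5.0]
lower = paa_distance(paa(q, 4), paa(s, 4), len(s))
exact = euclidean(q, s)
print(paa(s, 4), lower, exact)
```

On this data the PAA-space distance (about 2.45) stays below the exact distance (about 3.46), which is what lets an index prune candidates without false dismissals.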

Can we improve upon PAA? The n'-segment PAA representation is the n'-d vector {sv1, sv2, …, svn'}. Adaptive Piecewise Constant Approximation (APCA) stores a value svi and an end point sri for each segment. The n'/2-segment APCA representation is the n'-d vector {sv1, sr1, sv2, sr2, …, svM, srM}, where M = n'/2 is the number of segments. [Figure: an 8-segment PAA (sv1 to sv8) next to a 4-segment APCA (sv1 to sv4, sr1 to sr4).]
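To make the {sv1, sr1, …, svM, srM} layout concrete, here is a small Python sketch that builds an M-segment piecewise-constant approximation by greedy bottom-up merging. This is a stand-in technique chosen for brevity, not the algorithm the slides refer to, and it is neither optimal nor O(n log n). The names `segment_cost` and `apca_greedy` are illustrative, and sr is taken to be the 1-based right end point of a segment.

```python
def segment_cost(values):
    """Sum of squared deviations from the segment mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def apca_greedy(series, m):
    """Return [(sv_1, sr_1), ..., (sv_m, sr_m)] via greedy adjacent merges."""
    # Start with one segment per point; each segment is a half-open range [lo, hi).
    bounds = [(i, i + 1) for i in range(len(series))]
    while len(bounds) > m:
        best_j, best_increase = 0, float("inf")
        for j in range(len(bounds) - 1):
            lo, mid, hi = bounds[j][0], bounds[j][1], bounds[j + 1][1]
            increase = (segment_cost(series[lo:hi])
                        - segment_cost(series[lo:mid])
                        - segment_cost(series[mid:hi]))
            if increase < best_increase:
                best_j, best_increase = j, increase
        # Merge the adjacent pair whose union adds the least squared error.
        bounds[best_j:best_j + 2] = [(bounds[best_j][0], bounds[best_j + 1][1])]
    return [(sum(series[lo:hi]) / (hi - lo), hi) for lo, hi in bounds]


series = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 9.0, 9.0]
print(apca_greedy(series, 2))
```

For this series a 2-segment PAA would cut at the midpoint and store the means 1.0 and 5.0, while the adaptive segmentation places the cut after the sixth point and reconstructs the series exactly.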

APCA approximates the original signal better than PAA. [Figure: reconstruction error of PAA vs. reconstruction error of APCA; improvement factors shown: 1.69, 3.77, 1.21, 1.03, 3.02, 1.75.]

The APCA representation can be computed efficiently: a near-optimal representation can be computed in O(n log n) time, and the optimal representation can be computed in O(n^2 M) time (Koudas et al.).

Distance Measure: The exact (Euclidean) distance is D(Q,S). The lower bounding distance is DLB(Q',S). [Figure: Q, Q' and S, with D(Q,S) and DLB(Q',S) marked.]
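The formulas on this slide did not survive in the transcript. The sketch below shows one way a lower-bounding distance of this kind can be computed, assuming that Q' is the query averaged over the segments of S and that each squared difference is weighted by the segment length. The function names and the sample data are illustrative.

```python
def euclidean(q, s):
    """Exact distance D(Q, S)."""
    return sum((a - b) ** 2 for a, b in zip(q, s)) ** 0.5


def lower_bound_distance(query, apca):
    """D_LB(Q', S): average the query over S's own segments, then compare means.

    `apca` is [(sv_1, sr_1), ...] with sv_i the exact segment mean and sr_i the
    1-based right end point of segment i (an assumption of this sketch).
    """
    total, start = 0.0, 0
    for sv, sr in apca:
        length = sr - start
        qv = sum(query[start:sr]) / length   # value of Q' on this segment
        total += length * (qv - sv) ** 2
        start = sr
    return total ** 0.5


s = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 9.0, 9.0]
s_apca = [(1.0, 6), (9.0, 8)]
q = [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 7.0, 9.0]
print(lower_bound_distance(q, s_apca), euclidean(q, s))
```

Because each segment mean minimizes the squared error within its segment, the weighted distance between means cannot exceed the exact distance. Here it is about 2.83 against an exact distance of about 3.16.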

Index on 2M-dimensional APCA space: Any feature-based index structure can be used (e.g., R-tree, X-tree, Hybrid Tree). [Figure: points S1 to S9 in 2M-dimensional APCA space, grouped under rectangles R1 to R4.]

k-nearest neighbor Algorithm: The k-nearest neighbor search traverses the nodes of the multidimensional index structure in order of their distance from the query. We define the node distance as MINDIST, which is the minimum distance from the query to any point within the node boundary. In the case of Query Point Movement, MINDIST is computed as the distance between the centroid of the relevant points and the point P, which is defined by an equation [not preserved]. For any node U of the index structure with MBR R, MINDIST(Q,R) ≤ D(Q,S) for any data item S under U. [Figure: query Q and rectangles R1 to R4 containing S1 to S9, with MINDIST(Q,R2), MINDIST(Q,R3) and MINDIST(Q,R4) marked.]
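The traversal order described above can be sketched as a best-first search over a priority queue. This is a simplified illustration on ordinary points and rectangles with a single level of nodes, not the APCA index itself. The names `mindist`, `dist` and `knn` and the node layout `(low, high, points)` are assumptions of the sketch.

```python
import heapq


def dist(q, p):
    """Exact Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(q, p)) ** 0.5


def mindist(q, low, high):
    """Smallest possible distance from q to any point in the rectangle [low, high]."""
    total = 0.0
    for x, lo, hi in zip(q, low, high):
        if x < lo:
            total += (lo - x) ** 2
        elif x > hi:
            total += (x - hi) ** 2
    return total ** 0.5


def knn(q, nodes, k):
    """Visit nodes and points in increasing distance order; return k nearest points."""
    heap, counter = [], 0            # counter breaks ties so payloads are never compared
    for low, high, points in nodes:
        heap.append((mindist(q, low, high), counter, "node", points))
        counter += 1
    heapq.heapify(heap)
    result = []
    while heap and len(result) < k:
        d, _, kind, payload = heapq.heappop(heap)
        if kind == "point":
            result.append((d, payload))   # nothing left in the queue can be closer
        else:
            for p in payload:
                heapq.heappush(heap, (dist(q, p), counter, "point", p))
                counter += 1
    return result


nodes = [
    ((0, 0), (2, 2), [(1, 1), (2, 2)]),
    ((5, 5), (7, 7), [(5, 5), (7, 6)]),
    ((0, 6), (1, 8), [(0, 6), (1, 8)]),
]
nearest = knn((2.5, 2.5), nodes, 2)
print([p for _, p in nearest])
```

The search is correct only because MINDIST never exceeds the distance to any point under the node. In this run the second and third rectangles are never opened.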

Index Modification for MINDIST Computation: An APCA point is S = {sv1, sr1, sv2, sr2, …, svM, srM}. An APCA rectangle is S = (L,H), where L = {smin1, sr1, smin2, sr2, …, sminM, srM} and H = {smax1, sr1, smax2, sr2, …, smaxM, srM}. [Figure: a 4-segment series with sv1 to sv4, sr1 to sr4 and the bounds smin1 to smin4 and smax1 to smax4, next to the index rectangles R1 to R4 holding S1 to S9.]

MBR Representation in time-value space: We can view the MBR R = (L,H) of any node U as two APCA representations, L = {l1, l2, …, l(N-1), lN} and H = {h1, h2, …, h(N-1), hN}. [Figure: example with L = {l1, l2, l3, l4, l5, l6} and H = {h1, h2, h3, h4, h5, h6} drawn on time and value axes, forming REGION 1, REGION 2 and REGION 3.]

Regions: M regions are associated with each MBR. The boundaries of the ith region are l(2i-1) and h(2i-1) on the value axis, and l(2i-2)+1 and h(2i) on the time axis. [Figure: REGION 1, REGION 2 and REGION 3 on time and value axes, bounded by l1 to l6 and h1 to h6.]

Regions: The ith region is active at time instant t if it spans across t. The value st of any time series S under node U at time instant t must lie in one of the regions active at t (Lemma 2). [Figure: time instants t1 and t2 marked against REGION 1, REGION 2 and REGION 3.]

MINDIST Computation: For time instant t, MINDIST(Q,R,t) = min over regions G active at t of MINDIST(Q,G,t). Example at t1: MINDIST(Q,R,t1) = min(MINDIST(Q,Region1,t1), MINDIST(Q,Region2,t1)) = min((q_t1 - h1)^2, (q_t1 - h3)^2) = (q_t1 - h1)^2. Over the whole query, MINDIST(Q,R) = sqrt(sum over t = 1..n of MINDIST(Q,R,t)). Lemma 3: MINDIST(Q,R) ≤ D(Q,C) for any time series C under node U. [Figure: REGION 1, REGION 2 and REGION 3 with time instant t1 marked.]
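A small Python sketch of the region-based MINDIST computation. It assumes the layout from the earlier slides: odd entries of L and H bound values, even entries bound end points, and region i spans times l(2i-2)+1 to h(2i), with l0 taken as 0. The final square root over the per-instant terms is also an assumption of the sketch, and the function names are illustrative.

```python
def regions(L, H):
    """Region i as (value_low, value_high, t_start, t_end), times 1-based inclusive."""
    out = []
    for j in range(len(L) // 2):                 # j = i - 1 in the slide's notation
        prev_end = L[2 * j - 1] if j > 0 else 0  # l_(2i-2), with l_0 = 0
        out.append((L[2 * j], H[2 * j], prev_end + 1, H[2 * j + 1]))
    return out


def point_to_region(q_t, region):
    """MINDIST(Q, G, t): squared gap between q_t and the region's value range."""
    low, high = region[0], region[1]
    if q_t < low:
        return (low - q_t) ** 2
    if q_t > high:
        return (q_t - high) ** 2
    return 0.0


def mindist(query, L, H):
    """MINDIST(Q, R): per-instant minimum over active regions, summed, then rooted."""
    regs = regions(L, H)
    total = 0.0
    for t, q_t in enumerate(query, start=1):
        active = [g for g in regs if g[2] <= t <= g[3]]
        total += min(point_to_region(q_t, g) for g in active)
    return total ** 0.5


# Two segments over n = 8: values in [1, 2] then [5, 7]; first end point between 3 and 5.
L = [1.0, 3, 5.0, 8]
H = [2.0, 5, 7.0, 8]
query = [0.0, 0.0, 0.0, 3.0, 3.0, 8.0, 8.0, 8.0]
print(regions(L, H), mindist(query, L, H))
```

At t = 4 and t = 5 both regions are active, so the smaller of the two gaps is used, mirroring the t1 example on the slide.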

Approximate Search: A simpler definition of the distance in the feature space is the following: [formula not preserved]. But there is one problem… what? [Figure label: DLB(Q',S).]

Multimedia dbs: A multimedia database also stores images. Again: similarity queries (content-based retrieval). Extract features, index in feature space, and answer similarity queries using GEMINI. Again, average values help!
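The extract, filter and refine steps can be sketched in a few lines of Python. The feature used here is a scaled average, chosen because its distance never exceeds the exact Euclidean distance. The function names and the data are illustrative, and a real system would look up candidates in a spatial index rather than scan a list.

```python
import math


def euclidean(a, b):
    """Euclidean distance, used both on raw objects and on feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_feature(obj):
    """One-dimensional feature: the average, scaled by sqrt(n) so that
    feature distance is a lower bound on the exact distance."""
    return [math.sqrt(len(obj)) * sum(obj) / len(obj)]


def range_query(query, objects, eps, feature, dist):
    """Filter in feature space, then refine with the exact distance."""
    fq = feature(query)
    candidates = [o for o in objects if dist(fq, feature(o)) <= eps]   # cheap filter
    answers = [o for o in candidates if dist(query, o) <= eps]         # exact check
    return candidates, answers


objects = [[1, 2, 3, 4], [4, 3, 2, 1], [10, 10, 10, 10], [2, 2, 3, 3]]
candidates, answers = range_query([1, 2, 3, 5], objects, 2.0, mean_feature, euclidean)
print(len(candidates), answers)
```

The filter discards one object and passes two false alarms, which the refinement step removes. No true answer can be lost, because the feature distance underestimates the exact one.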

Images - color: Q: what is an image? A: 2-d array

Images - color: Color histograms, and distance function

Images - color: Mathematically, the distance function is: [formula not preserved]
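The formula on this slide is missing from the transcript. A common distance for color histograms is the quadratic form (x - y)^T A (x - y), where A holds color-to-color similarities. The sketch below assumes that form; the function name, the 3-bin histograms and the matrix values are made up for illustration.

```python
def histogram_distance_sq(x, y, A):
    """Squared quadratic-form distance (x - y)^T A (x - y) between two histograms."""
    diff = [a - b for a, b in zip(x, y)]
    k = len(diff)
    return sum(A[i][j] * diff[i] * diff[j] for i in range(k) for j in range(k))


# Bins: red, orange, blue. Red and orange are treated as similar (0.8).
A = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
all_red = [1.0, 0.0, 0.0]
all_orange = [0.0, 1.0, 0.0]
all_blue = [0.0, 0.0, 1.0]
print(histogram_distance_sq(all_red, all_orange, A))
print(histogram_distance_sq(all_red, all_blue, A))
```

The off-diagonal entries of A are what make a red image closer to an orange one than to a blue one. They are also the source of the 'cross-talk' between features discussed on the next slide.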

Images - color: Problem: 'cross-talk'. Features are not orthogonal, so SAMs will not work properly. Q: what to do? A: this is a feature-extraction question.

Images - color: Possible answer: avg red, avg green, avg blue. It turns out that this lower-bounds the histogram distance, so there is no cross-talk and SAMs are applicable.

Images - color: Performance. [Plot: time vs. selectivity, comparing sequential scan with search using avg RGB.]

Images - shapes: Distance function: Euclidean, on the area, perimeter, and 20 'moments'. Q: how to normalize them? A: divide by standard deviation.
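A short Python sketch of the normalization step: each feature is divided by its standard deviation across all shapes, so that a large-valued feature such as area does not dominate the Euclidean distance. The function name and the sample values are illustrative, and the population standard deviation is an assumption of the sketch.

```python
def normalize_by_std(rows):
    """Divide every feature (column) by its standard deviation across all rows."""
    n, dims = len(rows), len(rows[0])
    stds = []
    for j in range(dims):
        column = [r[j] for r in rows]
        mean = sum(column) / n
        variance = sum((v - mean) ** 2 for v in column) / n
        stds.append(variance ** 0.5 or 1.0)   # constant features are left unchanged
    return [[r[j] / stds[j] for j in range(dims)] for r in rows]


def column_std(rows, j):
    """Population standard deviation of column j, used here to check the result."""
    column = [r[j] for r in rows]
    mean = sum(column) / len(column)
    return (sum((v - mean) ** 2 for v in column) / len(column)) ** 0.5


# Columns: area, perimeter.
shapes = [[100.0, 40.0], [400.0, 80.0], [900.0, 120.0]]
normalized = normalize_by_std(shapes)
print(normalized, column_std(normalized, 0), column_std(normalized, 1))
```

After normalization both columns have unit standard deviation, so each contributes on a comparable scale to the distance.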

Images - shapes: Distance function: Euclidean, on the area, perimeter, and 20 'moments'. Q: other 'features' / distance functions? A1: turning angle. A2: dilations/erosions. A3: ...

Images - shapes: Distance function: Euclidean, on the area, perimeter, and 20 'moments'. Q: how to do dim. reduction? A: Karhunen-Loeve (= centered PCA/SVD).
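A dependency-free Python sketch of the Karhunen-Loeve idea: center the feature vectors, find the k directions of largest variance, and project onto them. Power iteration with deflation stands in for a proper SVD routine here, which is a simplification for illustration. The function name, the starting vector and the data are assumptions of the sketch.

```python
def karhunen_loeve(rows, k, iterations=200):
    """Return (axes, projected): top-k principal axes and the projected, centered data."""
    n, dims = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    centered = [[r[j] - means[j] for j in range(dims)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / n for b in range(dims)]
           for a in range(dims)]
    axes = []
    for _ in range(k):
        # Asymmetric start vector, to reduce the chance of starting orthogonal
        # to the dominant eigenvector.
        v = [1.0 / (j + 1) for j in range(dims)]
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(cov[a][b] * v[b] for b in range(dims)) for a in range(dims)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:          # no variance left in any direction
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[a] * cov[a][b] * v[b]
                         for a in range(dims) for b in range(dims))
        axes.append(v)
        # Deflation: remove the component just found before looking for the next one.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(dims)]
               for a in range(dims)]
    projected = [[sum(c[j] * axis[j] for j in range(dims)) for axis in axes]
                 for c in centered]
    return axes, projected


# Points on the line y = 2x: one axis captures all of the variance.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
axes, projected = karhunen_loeve(data, 1)
print(axes, projected)
```

Keeping only the first few projected coordinates gives the reduced feature vectors that go into the index.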

Images - shapes: Performance: ~10x faster. [Plot: log(# of I/Os) vs. # of features kept, with 'all kept' as the reference.]