Multimedia DBs.


1 Multimedia DBs

2 PAA and APCA
Another approach: segment the time series into parts and store the average value of each part (PAA uses equal-length parts; APCA, introduced below, lets the part lengths vary). Use an index to store the averages and the segment end points.
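A minimal Python sketch of the equal-length (PAA) case, assuming the series length is a multiple of the number of segments; `paa` is a hypothetical helper name, not taken from the slides.

```python
from typing import List, Sequence


def paa(series: Sequence[float], n_segments: int) -> List[float]:
    """Return the n_segments-dimensional PAA vector [sv1, ..., svn'].

    Assumes len(series) is a positive multiple of n_segments, so every
    segment has the same length w = n / n'.
    """
    n = len(series)
    if n_segments <= 0 or n == 0 or n % n_segments != 0:
        raise ValueError("len(series) must be a positive multiple of n_segments")
    w = n // n_segments  # points per segment
    # Average value of each equal-length segment.
    return [sum(series[i * w:(i + 1) * w]) / w for i in range(n_segments)]
```

For example, `paa([1, 3, 2, 4, 6, 8, 7, 5], 4)` returns `[2.0, 3.0, 7.0, 6.0]`.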

3 Feature Spaces
[Figure: a time series X and its approximation X' in three feature spaces, time axis 0 to 140: DFT (Agrawal, Faloutsos, Swami 1993); DWT with Haar basis functions Haar 0 to Haar 7 (Chan & Fu 1999); SVD with eigenwaves 0 to 7 (Korn, Jagadish, Faloutsos 1997)]
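As an illustration of the DFT feature space, the sketch below keeps the first k coefficients of an orthonormal DFT (a plain O(n^2) loop rather than an FFT, to stay self-contained). Because the orthonormal DFT preserves Euclidean distance (Parseval's theorem), truncating it gives a feature distance that never exceeds the exact one. Keeping the first k coefficients and the helper names are assumptions made for this example.

```python
import cmath
import math
from typing import List, Sequence


def dft_features(x: Sequence[float], k: int) -> List[complex]:
    """First k coefficients of the orthonormal DFT of x.

    The 1/sqrt(n) scaling makes the transform distance-preserving
    (Parseval), so dropping coefficients can only shrink distances.
    """
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) / math.sqrt(n)
        for f in range(k)
    ]


def feature_distance(a: Sequence[complex], b: Sequence[complex]) -> float:
    """Euclidean distance between two vectors of complex coefficients."""
    return math.sqrt(sum(abs(u - v) ** 2 for u, v in zip(a, b)))


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact Euclidean distance between two equal-length series."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(x, y)))
```

With all n coefficients kept, `feature_distance` equals `euclidean` up to rounding; with fewer it is a lower bound.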

4 Piecewise Aggregate Approximation (PAA)
[Figure: an original time series on value/time axes and its 8-segment PAA approximation sv1 to sv8]
Original time series (n-dimensional vector): $S = \{s_1, s_2, \ldots, s_n\}$
n'-segment PAA representation (n'-d vector): $S = \{sv_1, sv_2, \ldots, sv_{n'}\}$
The PAA representation satisfies the lower bounding lemma (Keogh, Chakrabarti, Mehrotra and Pazzani, 2000; Yi and Faloutsos, 2000).
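In the form usually stated for PAA (assumed here; see the cited papers for the exact statement), the lemma reads: with n' equal segments of length n/n', $\sqrt{n/n'}\,\sqrt{\sum_{i=1}^{n'} (qv_i - sv_i)^2} \le D(Q,S)$. It holds because the squared distance over a segment is at least the segment length times the squared difference of the segment means. The sketch checks this numerically; all function names are hypothetical.

```python
import math
from typing import List, Sequence


def paa(series: Sequence[float], n_segments: int) -> List[float]:
    """Equal-length segment means (assumes len(series) % n_segments == 0)."""
    w = len(series) // n_segments
    return [sum(series[i * w:(i + 1) * w]) / w for i in range(n_segments)]


def paa_lower_bound(q_paa: Sequence[float], s_paa: Sequence[float], n: int) -> float:
    """sqrt(n / n') * sqrt(sum_i (qv_i - sv_i)^2), with n' = len(q_paa)."""
    n_prime = len(q_paa)
    return math.sqrt(n / n_prime) * math.sqrt(
        sum((a - b) ** 2 for a, b in zip(q_paa, s_paa))
    )


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact Euclidean distance D(Q, S)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

When n' = n every segment holds one point and the bound becomes the exact distance.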

5 Can we improve upon PAA?
n'-segment PAA representation (n'-d vector): $S = \{sv_1, sv_2, \ldots, sv_{n'}\}$
Adaptive Piecewise Constant Approximation (APCA), n'/2-segment APCA representation (n'-d vector): $S = \{sv_1, sr_1, sv_2, sr_2, \ldots, sv_M, sr_M\}$, where M = n'/2 is the number of segments, $sv_i$ is the mean value of segment i and $sr_i$ is its right endpoint
[Figure: the same series approximated by 8 PAA segments (sv1 to sv8) and by 4 APCA segments (sv1 to sv4, right endpoints sr1 to sr4)]
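A sketch of how an APCA vector could be built once the segment boundaries are known, assuming 1-based inclusive right endpoints $sr_i$ (segment i covers positions $sr_{i-1}+1$ to $sr_i$, with $sr_0 = 0$). Choosing the boundaries is a separate problem; both function names are hypothetical.

```python
from typing import List, Sequence


def apca_from_breakpoints(series: Sequence[float], right_ends: Sequence[int]) -> List[float]:
    """Flat APCA vector [sv1, sr1, ..., svM, srM] for given segment boundaries.

    right_ends holds 1-based inclusive right endpoints sr_1 < ... < sr_M = n;
    segment i covers positions sr_{i-1}+1 .. sr_i, with sr_0 = 0.
    """
    if not right_ends or right_ends[-1] != len(series):
        raise ValueError("the last right endpoint must equal len(series)")
    out: List[float] = []
    start = 0  # 0-based start of the current segment (= sr_{i-1})
    for r in right_ends:
        if r <= start:
            raise ValueError("right endpoints must be strictly increasing")
        seg = series[start:r]
        out += [sum(seg) / len(seg), float(r)]  # sv_i = segment mean, then sr_i
        start = r
    return out


def apca_reconstruct(apca: Sequence[float]) -> List[float]:
    """Expand [sv1, sr1, ..., svM, srM] back into a piecewise constant series."""
    out: List[float] = []
    start = 0
    for i in range(0, len(apca), 2):
        sv, sr = apca[i], int(apca[i + 1])
        out += [sv] * (sr - start)
        start = sr
    return out
```

For `[1, 1, 1, 5, 5, 9, 9, 9]` with right endpoints `[3, 5, 8]`, three APCA segments (six numbers) reconstruct the series exactly, whereas three equal-length PAA segments could not even divide its length.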

6 APCA approximates the original signal better than PAA
[Figure: PAA and APCA reconstructions of six example time series]
Improvement factor = (reconstruction error of PAA) / (reconstruction error of APCA): 1.69, 3.77, 1.21, 1.03, 3.02, 1.75

7 The APCA representation can be computed efficiently
A near-optimal representation can be computed in $O(n \log n)$ time.
The optimal representation can be computed in $O(n^2 M)$ time (Koudas et al.).
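The $O(n^2 M)$ bound matches a textbook dynamic program over segment boundaries. The sketch below is that generic DP, minimizing the total squared reconstruction error of an M-segment piecewise constant approximation; it is not necessarily the exact algorithm of Koudas et al., and `optimal_segmentation` is a hypothetical name.

```python
from typing import List, Sequence, Tuple


def optimal_segmentation(series: Sequence[float], m: int) -> Tuple[float, List[int]]:
    """Minimum total squared error of an m-segment piecewise constant
    approximation, plus the 1-based inclusive right endpoints.
    O(n^2 * m) time using prefix sums for O(1) segment costs."""
    n = len(series)
    if not 1 <= m <= n:
        raise ValueError("need 1 <= m <= len(series)")
    ps = [0.0] * (n + 1)   # prefix sums
    ps2 = [0.0] * (n + 1)  # prefix sums of squares
    for i, x in enumerate(series):
        ps[i + 1] = ps[i] + x
        ps2[i + 1] = ps2[i] + x * x

    def sse(a: int, b: int) -> float:
        # Squared error of series[a:b] around its own mean.
        s = ps[b] - ps[a]
        return (ps2[b] - ps2[a]) - s * s / (b - a)

    inf = float("inf")
    # cost[k][j]: best error for series[:j] using exactly k segments.
    cost = [[inf] * (n + 1) for _ in range(m + 1)]
    back = [[0] * (n + 1) for _ in range(m + 1)]
    cost[0][0] = 0.0
    for k in range(1, m + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # last segment is series[i:j]
                c = cost[k - 1][i] + sse(i, j)
                if c < cost[k][j]:
                    cost[k][j], back[k][j] = c, i
    # Walk the back pointers to recover the right endpoints.
    ends, j = [], n
    for k in range(m, 0, -1):
        ends.append(j)
        j = back[k][j]
    return cost[m][n], ends[::-1]
```

Taking the segment means over the returned boundaries then gives the optimal APCA vector.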

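8 Distance Measure
Exact (Euclidean) distance: D(Q,S)
Lower bounding distance: DLB(Q',S)
[Figure: query Q and data series S with their exact distance D(Q,S); Q', the query approximated on the segments of S, and the lower bounding distance DLB(Q',S)]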
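A sketch of DLB(Q',S), assuming the form used in the APCA literature: Q' is Q averaged over the segments of S, and $D_{LB}(Q',S) = \sqrt{\sum_{i=1}^{M} (sr_i - sr_{i-1})(qv_i - sv_i)^2}$ with $sr_0 = 0$. Since each $sv_i$ is the exact mean of S over its segment, the squared distance over segment i is at least the segment length times $(qv_i - sv_i)^2$, which makes this a lower bound. The 1-based endpoint convention and the function names are assumptions.

```python
import math
from typing import List, Sequence


def apca_from_breakpoints(series: Sequence[float], right_ends: Sequence[int]) -> List[float]:
    """[sv1, sr1, ..., svM, srM] with 1-based inclusive right endpoints."""
    out: List[float] = []
    start = 0
    for r in right_ends:
        seg = series[start:r]
        out += [sum(seg) / len(seg), float(r)]
        start = r
    return out


def dlb(query: Sequence[float], apca: Sequence[float]) -> float:
    """D_LB(Q', S) = sqrt(sum_i (sr_i - sr_{i-1}) * (qv_i - sv_i)^2).

    qv_i is the mean of the query over S's i-th segment, i.e. Q projected
    onto the segmentation of S (Q').
    """
    total, start = 0.0, 0
    for i in range(0, len(apca), 2):
        sv, sr = apca[i], int(apca[i + 1])
        seg = query[start:sr]
        qv = sum(seg) / len(seg)
        total += (sr - start) * (qv - sv) ** 2
        start = sr
    return math.sqrt(total)


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact Euclidean distance D(Q, S)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```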

9 Index on 2M-dimensional APCA space
[Figure: APCA points S1 to S9 in the 2M-dimensional APCA space, grouped into the minimum bounding rectangles R1 to R4 of a multidimensional index]
Any feature-based index structure can be used (e.g., R-tree, X-tree, Hybrid Tree).

10 k-nearest neighbor Algorithm
[Figure: query Q, index nodes R1 to R4 over the items S1 to S9, and the distances MINDIST(Q,R2), MINDIST(Q,R3), MINDIST(Q,R4)]
The k-nearest neighbor search traverses the nodes of the multidimensional index structure in order of their distance from the query. The node distance is defined as MINDIST, the minimum distance from the query to any point in the node boundary. In the case of Query Point Movement, MINDIST is computed as the distance between the centroid of the relevant points and the point P defined in this equation. For any node U of the index structure with MBR R, $\mathrm{MINDIST}(Q,R) \le D(Q,S)$ for any data item S under U.
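The traversal above can be sketched as a best-first search with a priority queue keyed on MINDIST (for nodes) and on the exact distance (for data points). The toy `Node` class, which derives its MBR from its contents, is a hypothetical stand-in for a real R-tree, X-tree or Hybrid Tree, and the points are plain Euclidean vectors rather than APCA rectangles.

```python
import heapq
import itertools
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    """Toy index node: a leaf holds points, an internal node holds children."""
    points: List[Point] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)

    def mbr(self) -> Tuple[Point, Point]:
        """Minimum bounding rectangle (low corner, high corner) of the contents."""
        corners = self.points or [c for ch in self.children for c in ch.mbr()]
        dims = range(len(corners[0]))
        low = tuple(min(p[d] for p in corners) for d in dims)
        high = tuple(max(p[d] for p in corners) for d in dims)
        return low, high


def dist(q: Sequence[float], p: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, p)))


def mindist(q: Sequence[float], low: Sequence[float], high: Sequence[float]) -> float:
    """Minimum distance from q to any point of the box [low, high]."""
    total = 0.0
    for x, lo, hi in zip(q, low, high):
        if x < lo:
            total += (lo - x) ** 2
        elif x > hi:
            total += (x - hi) ** 2
    return math.sqrt(total)


def knn(root: Node, q: Sequence[float], k: int) -> List[Tuple[float, Point]]:
    """Best-first k-NN: always expand the queue entry closest to q."""
    tie = itertools.count()  # tie-breaker so heapq never compares nodes or points
    heap: list = [(0.0, next(tie), root)]
    result: List[Tuple[float, Point]] = []
    while heap and len(result) < k:
        d, _, item = heapq.heappop(heap)
        if isinstance(item, Node):
            for p in item.points:
                heapq.heappush(heap, (dist(q, p), next(tie), p))
            for ch in item.children:
                heapq.heappush(heap, (mindist(q, *ch.mbr()), next(tie), ch))
        else:
            # A point popped before all remaining entries is the next nearest,
            # because MINDIST never exceeds the distance to anything under a node.
            result.append((d, item))
    return result
```

Correctness rests exactly on the property stated on the slide: MINDIST(Q,R) never exceeds the distance to any item under R, so no node can hide a closer point than one already popped.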

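11 Index Modification for MINDIST Computation
APCA point: $S = \{sv_1, sr_1, sv_2, sr_2, \ldots, sv_M, sr_M\}$
APCA rectangle: $S = (L, H)$, where $L = \{smin_1, sr_1, smin_2, sr_2, \ldots, smin_M, sr_M\}$ and $H = \{smax_1, sr_1, smax_2, sr_2, \ldots, smax_M, sr_M\}$; $smin_i$ and $smax_i$ are the minimum and maximum values of S within segment i
[Figure: a time series with 4 APCA segments (sv1 to sv4, right endpoints sr1 to sr4) and the per-segment minima and maxima smin1 to smin4, smax1 to smax4; index nodes R1 to R4 over the items S1 to S9]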
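A sketch of building the APCA rectangle (L, H) of a single series, and of the MBR of a node as the componentwise minimum of the L vectors and maximum of the H vectors (standard R-tree-style bounding, assumed here). Right endpoints are 1-based and inclusive; all names are hypothetical.

```python
from typing import List, Sequence, Tuple

Rect = Tuple[List[float], List[float]]


def apca_rectangle(series: Sequence[float], right_ends: Sequence[int]) -> Rect:
    """APCA rectangle (L, H) of one series.

    L = [smin1, sr1, ..., sminM, srM] and H = [smax1, sr1, ..., smaxM, srM],
    where smin_i / smax_i are the min / max of the series in segment i.
    """
    low: List[float] = []
    high: List[float] = []
    start = 0
    for r in right_ends:
        seg = series[start:r]
        low += [min(seg), float(r)]
        high += [max(seg), float(r)]
        start = r
    return low, high


def node_mbr(rects: Sequence[Rect]) -> Rect:
    """MBR R = (L, H) of a node: componentwise min of the L vectors and max
    of the H vectors (all rectangles must have the same number of segments)."""
    low = [min(vals) for vals in zip(*(lo for lo, _ in rects))]
    high = [max(vals) for vals in zip(*(hi for _, hi in rects))]
    return low, high
```

In the node MBR the even (endpoint) entries of L and H become the earliest and latest right endpoint of each segment across the series under the node.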

12 MBR Representation in time-value space
We can view the MBR R = (L, H) of any node U as two APCA representations, $L = \{l_1, l_2, \ldots, l_{N-1}, l_N\}$ and $H = \{h_1, h_2, \ldots, h_{N-1}, h_N\}$, where N = 2M.
[Figure: an example MBR with $L = \{l_1, \ldots, l_6\}$ and $H = \{h_1, \ldots, h_6\}$ drawn in time-value space as REGION 1, REGION 2 and REGION 3]

13 Regions
M regions are associated with each MBR. The boundaries of the i-th region are:
time: from $l_{2i-2}+1$ to $h_{2i}$
value: from $l_{2i-1}$ to $h_{2i-1}$
[Figure: REGION 1 to REGION 3 of the MBR with L = {l1, ..., l6} and H = {h1, ..., h6} in time-value space]
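The region boundaries translate directly into code. The sketch stores each region as a hypothetical `(t_start, t_end, v_low, v_high)` tuple, indexes L and H from 1 as on the slide, and, as an assumption for region 1, takes $l_0 = 0$ so that it starts at time 1.

```python
from typing import List, Sequence, Tuple

# Region as (t_start, t_end, v_low, v_high); time instants are 1-based.
Region = Tuple[float, float, float, float]


def regions(low: Sequence[float], high: Sequence[float]) -> List[Region]:
    """The M regions of an MBR R = (L, H), each vector having N = 2M entries.

    Region i: time from l_{2i-2} + 1 to h_{2i}, value from l_{2i-1} to h_{2i-1}.
    Python index k-1 holds the slide's l_k / h_k; l_0 is assumed to be 0.
    """
    m = len(low) // 2
    out: List[Region] = []
    for i in range(1, m + 1):
        prev_end = low[2 * i - 3] if i > 1 else 0  # l_{2i-2}
        out.append((prev_end + 1, high[2 * i - 1], low[2 * i - 2], high[2 * i - 2]))
    return out
```

For example, the MBR L = [1, 2, 0, 5], H = [4, 3, 8, 5] yields the regions (1, 3, 1, 4) and (3, 5, 0, 8).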

14 Regions
The i-th region is active at time instant t if it spans across t.
The value $s_t$ of any time series S under node U at time instant t must lie in one of the regions active at t (Lemma 2).
[Figure: REGION 1 to REGION 3 in time-value space, with two time instants t1 and t2 marked]
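Lemma 2 can be checked on concrete data. Regions are hypothetical `(t_start, t_end, v_low, v_high)` tuples with 1-based time instants, and both function names are illustrative.

```python
from typing import List, Sequence, Tuple

# Region as (t_start, t_end, v_low, v_high); time instants are 1-based.
Region = Tuple[float, float, float, float]


def active_regions(regs: Sequence[Region], t: int) -> List[Region]:
    """Regions that span across time instant t."""
    return [g for g in regs if g[0] <= t <= g[1]]


def satisfies_lemma2(series: Sequence[float], regs: Sequence[Region]) -> bool:
    """True if every value s_t lies in the value range of some region active at t."""
    return all(
        any(g[2] <= s_t <= g[3] for g in active_regions(regs, t))
        for t, s_t in enumerate(series, start=1)
    )
```

With the regions (1, 3, 1, 4) and (3, 5, 0, 8), both are active at t = 3, and the series [1, 3, 2, 8, 6] satisfies the check, while [9, 3, 2, 8, 6] does not (9 lies outside region 1, the only region active at t = 1).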

15 MINDIST Computation
For time instant t: $\mathrm{MINDIST}(Q, R, t) = \min_{\text{region } G \text{ active at } t} \mathrm{MINDIST}(Q, G, t)$
Example at $t_1$: $\mathrm{MINDIST}(Q, R, t_1) = \min(\mathrm{MINDIST}(Q, \mathrm{Region}_1, t_1), \mathrm{MINDIST}(Q, \mathrm{Region}_2, t_1)) = \min((q_{t_1} - h_1)^2, (q_{t_1} - h_3)^2) = (q_{t_1} - h_1)^2$
$\mathrm{MINDIST}(Q, R) = \sqrt{\sum_{t=1}^{n} \mathrm{MINDIST}(Q, R, t)}$
[Figure: REGION 1 to REGION 3 of an MBR and the query value at time instant t1]
Lemma 3: $\mathrm{MINDIST}(Q, R) \le D(Q, C)$ for any time series C under node U
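A sketch of the MINDIST computation, assuming MINDIST(Q, G, t) is the squared gap between $q_t$ and G's value range (zero when $q_t$ lies inside it), as the $t_1$ example suggests. Regions are hypothetical `(t_start, t_end, v_low, v_high)` tuples with 1-based time instants, and the regions of an MBR are assumed to cover every time instant.

```python
import math
from typing import Sequence, Tuple

# Region as (t_start, t_end, v_low, v_high); time instants are 1-based.
Region = Tuple[float, float, float, float]


def mindist_region(q_t: float, region: Region) -> float:
    """MINDIST(Q, G, t): squared gap between q_t and G's value range."""
    _, _, lo, hi = region
    if q_t > hi:
        return (q_t - hi) ** 2
    if q_t < lo:
        return (lo - q_t) ** 2
    return 0.0


def mindist(query: Sequence[float], regs: Sequence[Region]) -> float:
    """MINDIST(Q, R) = sqrt(sum over t of min over active G of MINDIST(Q, G, t))."""
    total = 0.0
    for t, q_t in enumerate(query, start=1):
        active = [g for g in regs if g[0] <= t <= g[1]]
        total += min(mindist_region(q_t, g) for g in active)
    return math.sqrt(total)


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact Euclidean distance D(Q, C)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

With the regions (1, 3, 1, 4) and (3, 5, 0, 8) and Q = [10, 0, 5, -1, 3], MINDIST is $\sqrt{38} \approx 6.16$, below both $D(Q, C) = \sqrt{189}$ for C = [1, 3, 2, 8, 6] and $\sqrt{58}$ for C = [4, 4, 4, 0, 1], as Lemma 3 requires.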

16 Approximate Search
A simpler definition of the distance in the feature space is the following: DLB(Q',S)
But there is one problem… what?

17 Multimedia DBs
A multimedia database also stores images.
Again, similarity queries (content-based retrieval).
Extract features, index them in feature space, and answer similarity queries using GEMINI.
Again, average values help!
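The GEMINI recipe (extract features, search in feature space, then verify) can be sketched as filter-and-refine. This sketch scans the features linearly where a real system would query a SAM, and the feature choice in the usage note is an illustrative assumption; all names are hypothetical.

```python
import math
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
F = TypeVar("F")


def gemini_range_query(
    query: T,
    database: Sequence[T],
    eps: float,
    feature: Callable[[T], F],
    feature_dist: Callable[[F, F], float],
    exact_dist: Callable[[T, T], float],
) -> List[T]:
    """Filter-and-refine range search in the GEMINI style.

    If feature_dist(feature(a), feature(b)) <= exact_dist(a, b) for all a, b,
    the filter step causes no false dismissals.
    """
    fq = feature(query)
    # Filter: cheap distance in feature space (a SAM would answer this in practice).
    candidates = [obj for obj in database if feature_dist(fq, feature(obj)) <= eps]
    # Refine: discard false alarms with the exact distance.
    return [obj for obj in candidates if exact_dist(query, obj) <= eps]


def mean(x: Sequence[float]) -> float:
    return sum(x) / len(x)


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
```

As an "average value" feature, the mean of a length-n series with feature distance $\sqrt{n}\,|\bar{q} - \bar{s}|$ works: it is PAA with a single segment, so it lower-bounds the Euclidean distance, and the query returns exactly what a sequential scan would.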

18 Images - color
What is an image?
A: a 2-d array

19 Images - color
Color histograms, and a distance function

20 Images - color
Mathematically, the distance function is:
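A common choice for the color-histogram distance, assumed in this sketch, is the quadratic form $d^2(x, y) = (x - y)^T A (x - y)$, where $a_{ij}$ is the similarity between colors i and j. Its off-diagonal terms couple different histogram bins. The function name is hypothetical.

```python
import math
from typing import Sequence


def quadratic_form_distance(
    x: Sequence[float], y: Sequence[float], a: Sequence[Sequence[float]]
) -> float:
    """sqrt((x - y)^T A (x - y)) for histograms x, y and a color-similarity
    matrix A (assumed symmetric positive semi-definite)."""
    z = [xi - yi for xi, yi in zip(x, y)]
    q = sum(z[i] * a[i][j] * z[j] for i in range(len(z)) for j in range(len(z)))
    return math.sqrt(max(q, 0.0))  # clamp tiny negative rounding error
```

With A the identity this is the plain Euclidean distance between histograms; with $a_{01} = a_{10} = 0.5$, the histograms [1, 0, 0] and [0, 1, 0] are at distance 1.0 instead of $\sqrt{2}$, because the two colors are treated as similar.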

21 Images - color
Problem: ‘cross-talk’:
Features are not orthogonal -> SAMs (spatial access methods) will not work properly
Q: what to do?
A: a feature-extraction question

22 Images - color
Possible answer: avg red, avg green, avg blue
It turns out that this lower-bounds the histogram distance -> no cross-talk, and SAMs are applicable
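A sketch of the 3-d avg RGB feature, assuming an image is given as a 2-d array (list of rows) of (r, g, b) tuples. It only computes the features and their Euclidean distance; it does not demonstrate the lower-bounding property. The function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Pixel = Tuple[float, float, float]


def avg_rgb(image: Sequence[Sequence[Pixel]]) -> Tuple[float, ...]:
    """3-d feature vector: average red, green and blue over all pixels."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def avg_rgb_distance(f1: Sequence[float], f2: Sequence[float]) -> float:
    """Euclidean distance in the 3-d avg RGB space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f1, f2)))
```

For a 2x2 image that is half pure red and half pure blue, `avg_rgb` gives (127.5, 0.0, 127.5). The 3-d vectors are small enough to index with any SAM.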

23 Images - color
Time performance: [Figure: response time vs. selectivity for a sequential scan and for search with avg RGB]

25 Images - shapes
Distance function: Euclidean, on the area, perimeter, and 20 ‘moments’
(Q: how to normalize them? A: divide by the standard deviation)
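A sketch of the normalization: each feature (area, perimeter, each moment) is divided by its standard deviation over the whole collection before the Euclidean distance is taken. The population standard deviation is an assumption, and `normalize_features` is a hypothetical name. Subtracting the mean is unnecessary for this purpose, since Euclidean distance is unaffected by translation.

```python
from statistics import pstdev
from typing import List, Sequence


def normalize_features(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Divide every feature (column) by its standard deviation over the collection.

    Uses the population standard deviation; a constant column is left unscaled.
    """
    sds = [pstdev(col) or 1.0 for col in zip(*vectors)]
    return [[v / s for v, s in zip(vec, sds)] for vec in vectors]
```

After scaling, a feature measured in large units (e.g. area) no longer dominates the distance just because of its units.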

27 Images - shapes
Distance function: Euclidean, on the area, perimeter, and 20 ‘moments’
(Q: other ‘features’ / distance functions?
A1: turning angle
A2: dilations/erosions
A3: ... )

29 Images - shapes
Distance function: Euclidean, on the area, perimeter, and 20 ‘moments’
Q: how to do dim. reduction?
A: Karhunen-Loeve (= centered PCA/SVD)
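A NumPy sketch of the Karhunen-Loeve transform as centered PCA via SVD, keeping the top k directions. Projecting onto orthonormal directions can only shrink distances, so distances in the reduced space still lower-bound the original ones; `karhunen_loeve` is a hypothetical name.

```python
import numpy as np


def karhunen_loeve(x: np.ndarray, k: int):
    """Project the rows of x (objects x features) onto the top-k principal
    directions of the centered data. Returns (reduced, basis, mean)."""
    mean = x.mean(axis=0)
    centered = x - mean
    # Rows of vt are orthonormal principal directions, sorted by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]  # k x d
    return centered @ basis.T, basis, mean
```

A new query vector `q` is mapped the same way, as `(q - mean) @ basis.T`, before searching the reduced space.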

30 Images - shapes
Performance: ~10x faster
[Figure: log(# of I/Os) vs. # of features kept, compared with keeping all features]

