
Similarity Search on Bregman Divergence: Towards Non-Metric Indexing. Zhenjie Zhang, Beng Chin Ooi, Srinivasan Parthasarathy, Anthony K. H. Tung.

Presentation transcript:

1 Similarity Search on Bregman Divergence: Towards Non-Metric Indexing
Zhenjie Zhang, Beng Chin Ooi, Srinivasan Parthasarathy, Anthony K. H. Tung

2 Metric vs. Non-Metric
- Euclidean distance dominates DB queries
- Similarity in human perception
- Metric distance is not enough!

3 Outline
- Bregman Divergence
- Solution
  - Basic solution
  - Better pruning bounds
  - Query distribution
- Experiments
- Conclusion

4 Bregman Divergence
[Figure labels: p, q, Euclidean dist., convex function f(x), Bregman divergence D_f(p,q), (q, f(q)), (p, f(p)), h]

5 Bregman Divergence
Mathematical Interpretation
- The distance between p and q is defined as the difference between f(p) and the first-order Taylor expansion of f at q, evaluated at p
[Figure labels: original f(x); first-order Taylor expansion of f(x) at q]
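
Written out, the verbal definition on this slide is D_f(p,q) = f(p) - [f(q) + <grad f(q), p - q>]. The following is a minimal Python sketch of that formula, not code from the talk. The function names (`bregman`, `sq_norm`, `sq_norm_grad`) are hypothetical, and the squared Euclidean norm is used only because it is the simplest convex function to check by hand.

```python
def bregman(f, grad_f, p, q):
    """D_f(p, q) = f(p) minus the first-order Taylor expansion of f at q, evaluated at p."""
    taylor_at_p = f(q) + sum(g * (pi - qi) for g, pi, qi in zip(grad_f(q), p, q))
    return f(p) - taylor_at_p


def sq_norm(x):
    # Assumed convex generator: f(x) = sum of x_i^2
    return sum(xi * xi for xi in x)


def sq_norm_grad(x):
    # Gradient of the assumed generator: 2 * x
    return [2.0 * xi for xi in x]


# With this generator the divergence reduces to the squared Euclidean distance.
p, q = [0.0, 1.0], [0.5, 0.5]
d = bregman(sq_norm, sq_norm_grad, p, q)
```

Swapping in another convex `f` and its gradient gives the corresponding divergence without changing `bregman` itself.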

6 Bregman Divergence
General Properties
- Uniqueness: a function f(x) uniquely determines D_f(p,q)
- Non-negativity: D_f(p,q) ≥ 0 for any p, q
- Identity: D_f(p,p) = 0 for any p
- Symmetry and triangle inequality: do NOT hold any more
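
The last property can be checked numerically with two of the one-dimensional divergences from the examples table. This is a small illustrative sketch with hand-picked values; the function names are hypothetical.

```python
import math


def itakura_saito(p, q):
    # Scalar Itakura-Saito distance: p/q - log(p/q) - 1, for p, q > 0
    return p / q - math.log(p / q) - 1.0


def squared_euclidean(p, q):
    # Scalar squared Euclidean distance: (p - q)^2
    return (p - q) ** 2


# Symmetry fails for Itakura-Saito: D(1, 2) differs from D(2, 1).
asymmetric = not math.isclose(itakura_saito(1.0, 2.0), itakura_saito(2.0, 1.0))

# The triangle inequality fails for squared Euclidean: D(0, 2) > D(0, 1) + D(1, 2).
triangle_violated = squared_euclidean(0.0, 2.0) > (
    squared_euclidean(0.0, 1.0) + squared_euclidean(1.0, 2.0)
)
```

Squared Euclidean is symmetric, so it only breaks the triangle inequality, while Itakura-Saito already breaks symmetry.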

7 Examples

| Distance | f(x) | D_f(p,q) | Usage |
|---|---|---|---|
| KL-Divergence | x log x | p log(p/q) | distribution, color histogram |
| Itakura-Saito Distance | -log x | p/q - log(p/q) - 1 | signal, speech |
| Squared Euclidean | x^2 | (p-q)^2 | traditional queries |
| Von Neumann Entropy | tr(X log X - X) | tr(X log X - X log Y - X + Y) | symmetric matrix |
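
The scalar rows of the table can be reproduced from the definition D_f(p,q) = f(p) - f(q) - f'(q)(p - q), summed over coordinates. The sketch below assumes separable generators; the names `GENERATORS`, `bregman_1d` and `divergence` are hypothetical. For f(x) = x log x the definition gives p log(p/q) - p + q per coordinate, which matches the table's p log(p/q) when p and q are probability distributions, because the extra terms then sum to zero.

```python
import math

# name -> (f, f') for the scalar generators in the table
GENERATORS = {
    "kl": (lambda x: x * math.log(x), lambda x: math.log(x) + 1.0),
    "itakura_saito": (lambda x: -math.log(x), lambda x: -1.0 / x),
    "squared_euclidean": (lambda x: x * x, lambda x: 2.0 * x),
}


def bregman_1d(name, p, q):
    """Scalar divergence f(p) - f(q) - f'(q) * (p - q)."""
    f, df = GENERATORS[name]
    return f(p) - f(q) - df(q) * (p - q)


def divergence(name, p, q):
    """Separable divergence: sum of the scalar divergences per coordinate."""
    return sum(bregman_1d(name, pi, qi) for pi, qi in zip(p, q))


# Two probability distributions (entries are positive and sum to 1).
p, q = [0.2, 0.8], [0.5, 0.5]
kl = divergence("kl", p, q)
```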

8 Why in DB system?
Database applications
- Retrieval of similar images, speech signals, or time series
- Optimization on matrices in machine learning
- Efficiency is important!
Query types
- Nearest Neighbor Query
- Range Query
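
For reference, both query types can be answered without any index by a linear scan, which is also one of the baselines (LS) in the experiments. The sketch below is an illustration under assumptions: the function names are hypothetical, the data points are made up, and the range condition is taken to be D_f(p,q) ≤ θ.

```python
import heapq


def squared_euclidean(p, q):
    # Stand-in divergence; any Bregman divergence with the same signature works.
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def knn_linear_scan(points, q, k, div):
    """Return the k points with the smallest divergence D(p, q)."""
    return heapq.nsmallest(k, points, key=lambda p: div(p, q))


def range_linear_scan(points, q, theta, div):
    """Return every point p with D(p, q) <= theta (assumed range condition)."""
    return [p for p in points if div(p, q) <= theta]


points = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.8)]
query = (0.4, 0.6)
nearest = knn_linear_scan(points, query, 1, squared_euclidean)
in_range = range_linear_scan(points, query, 0.35, squared_euclidean)
```

Because the divergence is not symmetric in general, the argument order `div(p, q)` matters: the data point comes first and the query second, as in D_f(p,q).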

9 Euclidean Space
How to answer the queries
- R-Tree

10 Euclidean Space
How to answer the queries
- VA File

11 Our goal
Re-use the infrastructure of existing DB systems to support Bregman divergence
- Storage management
- Indexing structures
- Query processing algorithms

12 Outline
- Bregman Divergence
- Solution
  - Basic solution
  - Better pruning bounds
  - Query distribution
- Experiments
- Conclusion

13 Basic Solution
Extended Space
- Convex function f(x) = x^2

Original points:

| point | A1 | A2 |
|---|---|---|
| p | 0 | 1 |
| q | 0.5 | |
| r | 1 | 0.8 |
| t | | |

Extended points:

| point | A1 | A2 | A3 |
|---|---|---|---|
| p+ | 0 | 1 | 1 |
| q+ | 0.5 | | |
| r+ | | | |
| t+ | | | |
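
The extension step can be sketched as appending f(p) to each point as one extra coordinate. This is an illustrative sketch, assuming the generator is the sum of squared coordinates; `extend` and `sq_norm` are hypothetical names.

```python
def sq_norm(x):
    # Assumed convex generator: f(x) = sum of x_i^2
    return sum(xi * xi for xi in x)


def extend(point, f):
    """Map a d-dimensional point p to the (d+1)-dimensional point (p, f(p))."""
    return list(point) + [f(point)]


# The point p = (0, 1) from the slide becomes p+ = (0, 1, 1).
p_plus = extend([0.0, 1.0], sq_norm)
```

The extended points are ordinary vectors, so they can be stored in an existing multidimensional index without changing its construction.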

14 Basic Solution
After the extension
- Index extended points with R-Tree or VA File
- Re-use existing algorithms with new lower and upper bound computation
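
The idea of keeping the search algorithm and swapping only the bound computation can be sketched with a simple filter-and-refine k-NN search. Everything here is an assumption made for illustration: groups of points stand in for index pages (R-Tree leaves or VA File cells), squared Euclidean stands in for the divergence, and the function names are hypothetical. The search visits groups in order of their lower bound and stops once no remaining group can improve the result.

```python
import heapq


def div_1d(x, q):
    # Stand-in scalar divergence (squared Euclidean)
    return (x - q) ** 2


def divergence(p, q):
    return sum(div_1d(a, b) for a, b in zip(p, q))


def mbr(points):
    """Minimum bounding rectangle of a group of points."""
    lo = [min(c) for c in zip(*points)]
    hi = [max(c) for c in zip(*points)]
    return lo, hi


def lower_bound(lo, hi, q):
    # Smallest divergence from any point inside the rectangle to q
    return sum(div_1d(min(max(qi, l), h), qi) for l, h, qi in zip(lo, hi, q))


def knn_with_pruning(groups, q, k):
    """Return (sorted (divergence, point) pairs, number of groups visited)."""
    order = sorted(groups, key=lambda g: lower_bound(*mbr(g), q))
    best = []  # max-heap of the k best so far, stored as (-divergence, point)
    visited = 0
    for group in order:
        if len(best) == k and lower_bound(*mbr(group), q) >= -best[0][0]:
            break  # no later group can contain a closer point
        visited += 1
        for p in group:
            d = divergence(p, q)
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
    return sorted((-nd, p) for nd, p in best), visited


groups = [
    [(0.0, 1.0), (0.1, 0.9)],
    [(0.5, 0.5), (0.4, 0.7)],
    [(1.0, 0.8), (0.9, 0.9)],
]
result, visited = knn_with_pruning(groups, (0.4, 0.6), 1)
```

Only `div_1d` and `lower_bound` depend on the divergence; the traversal logic is unchanged when another divergence is plugged in.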

15 How to improve?
- Reformulation of Bregman divergence
- Tighter bounds are derived
- No change to index construction or query processing algorithms

16 A New Formulation
[Figure labels: q, p, D_f(p,q)+Δ, query vector v_q, D*_f(p,q), h, h']

17 Math. Interpretation
Reformulation of similarity search queries
- k-NN query: query q, data set P, divergence D_f. Find the point p, minimizing
- Range query: query q, threshold θ, data set P. Return any point p that
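
The formulas on this slide are not preserved in the transcript. One way to obtain a reformulation consistent with the labels D_f(p,q)+Δ, v_q and D*_f(p,q) is to start from the definition of the divergence and separate the terms that depend only on q. The sign convention for Δ and the exact form of v_q below are assumptions of this derivation, not taken from the slides.

```latex
\begin{align*}
D_f(p,q) &= f(p) - f(q) - \langle \nabla f(q),\, p - q \rangle \\
         &= \bigl[\, f(p) - \langle \nabla f(q),\, p \rangle \,\bigr]
            - \bigl[\, f(q) - \langle \nabla f(q),\, q \rangle \,\bigr] \\
\Delta   &:= f(q) - \langle \nabla f(q),\, q \rangle
            \qquad \text{(depends on } q \text{ only)} \\
D^{*}_f(p,q) &:= D_f(p,q) + \Delta
            = f(p) - \langle \nabla f(q),\, p \rangle
            = \langle p^{+},\, v_q \rangle, \\
         &\qquad p^{+} = (p,\, f(p)), \quad v_q = (-\nabla f(q),\, 1) \\
\text{k-NN:} \quad & \arg\min_{p \in P} D_f(p,q) = \arg\min_{p \in P} D^{*}_f(p,q) \\
\text{Range:} \quad & D_f(p,q) \le \theta \iff D^{*}_f(p,q) \le \theta + \Delta
\end{align*}
```

Under these assumptions D*_f is a linear function of the extended point p+, which is what makes bounds over bounding rectangles in the extended space easy to compute.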

18 Naïve Bounds
- Check the corners of the bounding rectangles
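
A sketch of the corner check, assuming the quantity to bound is a linear function <x, v> of the extended point x (for example with v = (-grad f(q), 1)): a linear function over an axis-aligned box attains its minimum and maximum at corners, so enumerating the corners gives valid bounds. The function names and the numbers are hypothetical, and the generator is assumed to be the sum of squared coordinates.

```python
from itertools import product


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def corner_bounds(lo, hi, v):
    """Min and max of <x, v> over all corners of the box [lo, hi]."""
    values = [dot(corner, v) for corner in product(*zip(lo, hi))]
    return min(values), max(values)


# Two extended points (p, f(p)) with f = sum of squares, and their bounding box.
extended = [(0.0, 1.0, 1.0), (0.5, 0.5, 0.5)]
lo = [min(c) for c in zip(*extended)]
hi = [max(c) for c in zip(*extended)]

# Query q = (0.4, 0.6): grad f(q) = (0.8, 1.2), so v = (-0.8, -1.2, 1).
v = (-0.8, -1.2, 1.0)
lb, ub = corner_bounds(lo, hi, v)
```

Enumerating corners costs 2^(d+1) evaluations; since the function is linear, the same bounds can also be accumulated one dimension at a time.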

19 Tighter Bounds
- Take the curve f(x) into consideration
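
One way to take the curve into account, sketched here under the assumption that f is separable (a sum of scalar convex functions), is to bound each dimension on the original rectangle directly. For a fixed q the scalar divergence is convex in its first argument with minimum 0 at x = q, so on an interval its minimum is at q clamped to the interval and its maximum is at an endpoint. This is an illustration of the idea, not necessarily the bound derived in the talk; the names are hypothetical.

```python
def div_1d(x, q):
    # Stand-in scalar divergence (squared Euclidean)
    return (x - q) ** 2


def tight_bounds(lo, hi, q, div):
    """Exact min and max of a separable divergence over the box [lo, hi]."""
    lower = upper = 0.0
    for l, h, qi in zip(lo, hi, q):
        nearest = min(max(qi, l), h)               # q clamped into [l, h]
        lower += div(nearest, qi)                  # minimum of a convex function
        upper += max(div(l, qi), div(h, qi))       # maximum is at an endpoint
    return lower, upper


# Bounding rectangle of the points (0, 1) and (0.5, 0.5), query (0.4, 0.6).
lower, upper = tight_bounds([0.0, 0.5], [0.5, 1.0], (0.4, 0.6), div_1d)
```

Here the query lies inside the rectangle, so the lower bound is 0, whereas a bound computed only from corners of the extended rectangle can be negative and therefore looser.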

20 Query distribution
Distortion of rectangles
- The difference between the maximum and minimum distances from points inside the rectangle to the query

21 Can we improve it more?
When building an R-Tree in Euclidean space
- Minimize the volume/edge length of MBRs
- Does it remain valid?

22 Query distribution
Distortion of bounding rectangles
- Invariant in Euclidean space (triangle inequality)
- Query-dependent for Bregman divergence
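
A one-dimensional example makes the contrast concrete. The sketch below computes the distortion (maximum minus minimum distance) of the interval [1, 2] for two queries outside it, once with the Euclidean distance and once with the Itakura-Saito distance. The interval, the queries and the function names are hypothetical choices for illustration.

```python
import math


def euclidean(x, q):
    return abs(x - q)


def itakura_saito(x, q):
    return x / q - math.log(x / q) - 1.0


def distortion(lo, hi, q, dist):
    """Max minus min of dist(x, q) for x in [lo, hi] (1-D, dist minimal at x = q)."""
    nearest = min(max(q, lo), hi)
    d_min = dist(nearest, q)
    d_max = max(dist(lo, q), dist(hi, q))
    return d_max - d_min


queries = (0.5, 4.0)
euclid = [distortion(1.0, 2.0, q, euclidean) for q in queries]
bregman = [distortion(1.0, 2.0, q, itakura_saito) for q in queries]
```

In this example the Euclidean distortion equals the interval width for both queries, while the Itakura-Saito distortion changes with the position of the query.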

23 Utilize Query Distribution
- Summarize the query distribution with O(d) real numbers
- Estimate the expected distortion of any bounding rectangle in O(d) time
- Allows a better index to be constructed for both R-Tree and VA File

24 Outline
- Bregman Divergence
- Solution
  - Basic solution
  - Better pruning bounds
  - Query distribution
- Experiments
- Conclusion

25 Experiments
Data Sets
- KDD'99 data: network data, the proportion of packets in 72 different TCP/IP connection types
- DBLP data: use the co-authorship graph to generate the probabilities of the authors being related to 8 different areas

26 Experiments
Data Sets
- Uniform synthetic data: generated with a uniform distribution
- Clustered synthetic data: generated with a Gaussian mixture model
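
The slides do not give the generation parameters. As a rough sketch of what the two synthetic data sets could look like, the code below draws uniform points from the unit cube and clustered points from an equal-weight spherical Gaussian mixture. The sizes, the number of clusters, the standard deviation and the function names are all hypothetical.

```python
import random


def uniform_points(n, d, rng):
    """n points drawn uniformly from the unit cube [0, 1)^d."""
    return [[rng.random() for _ in range(d)] for _ in range(n)]


def clustered_points(n, d, n_clusters, sigma, rng):
    """n points from an equal-weight spherical Gaussian mixture (assumed form)."""
    centers = uniform_points(n_clusters, d, rng)
    points = []
    for _ in range(n):
        center = rng.choice(centers)  # pick a mixture component
        points.append([rng.gauss(mu, sigma) for mu in center])
    return points


rng = random.Random(0)  # fixed seed so the data set is reproducible
uniform = uniform_points(1000, 8, rng)
clustered = clustered_points(1000, 8, n_clusters=5, sigma=0.05, rng=rng)
```

Gaussian samples can fall outside the unit cube and can be negative, so divergences that require positive coordinates (KL, Itakura-Saito) would need the samples to be shifted or rejected first.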

27 Experiments
Methods to compare

| | Basic | Improved Bounds | Query Distribution |
|---|---|---|---|
| R-Tree | R | R-B | R-BQ |
| VA File | V | V-B | V-BQ |
| Linear Scan | LS | | |
| BB-Tree | BBT | | |

28 Experiments
Index Construction Time

29 Experiments
Varying dimensionality

30 Experiments
Varying dimensionality (cont.)

31 Experiments
Varying k for nearest neighbor query

32 Conclusion
- A general technique for similarity search on Bregman divergence
- All techniques are based on the existing infrastructure of commercial databases
- Extensive experiments compare the performance of R-Tree and VA File with different optimizations

33 Acknowledgment
- Zhenjie Zhang, Anthony K. H. Tung and Beng Chin Ooi were supported by Singapore NRF grant R
- Srinivasan Parthasarathy was supported by NSF IIS (CAREER) and NSF CCF

34 Q & A

