15-826: Multimedia Databases and Data Mining

Slides:



Advertisements
Similar presentations
CMU SCS : Multimedia Databases and Data Mining Lecture #17: Text - part IV (LSI) C. Faloutsos.
Advertisements

CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - IV Grid files, dim. curse C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #4: Multi-key and Spatial Access Methods - I C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #19: SVD - part II (case studies) C. Faloutsos.
Nearest Neighbor Search
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - Metric trees C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals - case studies - I C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture#5: Multi-key and Spatial Access Methods - II C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture#2: Primary key indexing – B-trees Christos Faloutsos - CMU
        iDistance -- Indexing the Distance An Efficient Approach to KNN Indexing C. Yu, B. C. Ooi, K.-L. Tan, H.V. Jagadish. Indexing the distance:
Searching on Multi-Dimensional Data
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
15-826: Multimedia Databases and Data Mining
Multimedia DBs. Multimedia dbs A multimedia database stores text, strings and images Similarity queries (content based retrieval) Given an image find.
CMU SCS : Multimedia Databases and Data Mining Lecture #16: Text - part III: Vector space model and clustering C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals - case studies Part III (regions, quadtrees, knn queries) C. Faloutsos.
Multidimensional Data. Many applications of databases are "geographic" = 2­dimensional data. Others involve large numbers of dimensions. Example: data.
Spatio-Temporal Databases. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases …..
Spatio-Temporal Databases. Introduction Spatiotemporal Databases: manage spatial data whose geometry changes over time Geometry: position and/or extent.
10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos.
R-tree Analysis. R-trees - performance analysis How many disk (=node) accesses we’ll need for range nn spatial joins why does it matter?
Spatial Indexing. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
CMU SCS : Multimedia Databases and Data Mining Lecture #30: Conclusions C. Faloutsos.
Introduction to Fractals and Fractal Dimension Christos Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #8: Fractals - introduction C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
Indexing for Multidimensional Data An Introduction.
Multidimensional Indexes Applications: geographical databases, data cubes. Types of queries: –partial match (give only a subset of the dimensions) –range.
CMU SCS : Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Spatial Access Methods - z-ordering.
CMU SCS : Multimedia Databases and Data Mining Lecture #18: SVD - part I (definitions) C. Faloutsos.
R-trees: An Average Case Analysis. R-trees - performance analysis How many disk (=node) accesses we ’ ll need for range nn spatial joins why does it matter?
Spatio-Temporal Databases
The Power-Method: A Comprehensive Estimation Technique for Multi- Dimensional Queries Yufei Tao U. Hong Kong Christos Faloutsos CMU Dimitris Papadias Hong.
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - Metric trees C. Faloutsos.
Carnegie Mellon Data Mining – Research Directions C. Faloutsos CMU
Similarity Search without Tears: the OMNI- Family of All-Purpose Access Methods Michael Kelleher Kiyotaka Iwataki The Department of Computer and Information.
Dense-Region Based Compact Data Cube
Indexing Multidimensional Data
Spatial Data Management
Strategies for Spatial Joins
15-826: Multimedia Databases and Data Mining
Next Generation Data Mining Tools: SVD and Fractals
Fast nearest neighbor searches in high dimensions Sami Sieranoja
Spatial Indexing.
Mean Shift Segmentation
Indexing and Data Mining in Multimedia Databases
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
Multidimensional Indexes
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
CS5112: Algorithms and Data Structures for Applications
Lecture 03: K-nearest neighbors
15-826: Multimedia Databases and Data Mining
R-trees: An Average Case Analysis
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
C. Faloutsos Spatial Access Methods - R-trees
Presentation transcript:

15-826: Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos

Copyright: C. Faloutsos (2017) Must-read Material Alberto Belussi and Christos Faloutsos, Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension Proc. of VLDB, p. 299-310, 1995 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Optional Material Optional, but very useful: Manfred Schroeder Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991 15-826 Copyright: C. Faloutsos (2017)

Outline Goal: ‘Find similar / interesting things’ Intro to DB C. Faloutsos 15-826 Optional Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Data Mining Optional = NOT in exam (but useful as mental drill!) 15-826 Copyright: C. Faloutsos (2017)

Indexing - Detailed outline Optional Indexing - Detailed outline primary key indexing secondary key / multi-key indexing spatial access methods z-ordering R-trees misc fractals intro applications text ... 15-826 Copyright: C. Faloutsos (2017)

Indexing - Detailed outline Optional Indexing - Detailed outline fractals intro applications disk accesses for R-trees (range queries) dimensionality reduction dim. curse revisited quad-tree analysis [Gaede+] nn queries [Belussi+] ✔ ✔ ✔ 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees Problem: how many quadtree nodes will we need, to store a region in some level of approximation? [Gaede+96] Faloutsos, C. and V. Gaede (Sept. 1996). Analysis of the z-ordering Method Using the Hausdorff Fractal Dimension. VLDB, Bombay, India. 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees I.e.: 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees I.e.: ? # of quadtree ‘blocks’ (= # gray nodes) level of quadtree 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees Datasets: Brain Atlas Franconia 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees Hint: assume that the boundary is self-similar, with a given fd how will the quad-tree (oct-tree) look like? 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees white gray black 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees Let pg(i) the prob. to find a gray node at level i. If self-similar, what can we say for pg(i) ? 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees Let pg(i) the prob. to find a gray node at level i. If self-similar, what can we say for pg(i) ? A: pg(i) = pg= constant 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees Assume only ‘gray’ and ‘white’ nodes (ie., no volume’) Assume that pg is given - how many gray nodes at level i? 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees Assume only ‘gray’ and ‘white’ nodes (ie., no volume’) Assume that pg is given - how many gray nodes at level i? A: 1 at level 0; 4*pg (4*pg)* (4*pg) ... (4*pg ) i … 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees I.e.: ? (4*pg) i # of quadtree ‘blocks’ level of quadtree (‘i’) 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees I.e.: ? log[(4*pg) i] log(# of quadtree ‘blocks’) level of quadtree 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees Conclusion: Self-similarity leads to easy and accurate estimation log2(#blocks) level 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees Conclusion: Self-similarity leads to easy and accurate estimation log2(#blocks) level 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees log(#blocks) level 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees Final observation: relationship between pg and fractal dimension? 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees Final observation: relationship between pg and fractal dimension? A: very close: (4*pg)i = # of gray nodes at level i = # of Hausdorff grid-cells of side (1/2)i = r Eventually: DH = 2 + log2( pg ) and, for E-d spaces: DH = E + log2( pg ) 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees for E-d spaces: DH = E + log2( pg ) Sanity check: - point in 2-d: DH = 0 pg = ?? line in 2-d: DH = 1 pg = ?? plane in 2-d: DH=2 pg = ?? point in 3-d: DH = 0 pg = ?? 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees for E-d spaces: DH = E + log2( pg ) Sanity check: - point in 2-d: DH = 0 pg = 1/4 - line in 2-d: DH = 1 pg = 1/2 plane in 2-d: DH=2 pg= 1 point in 3-d: DH = 0 pg = 1/8 15-826 Copyright: C. Faloutsos (2017)

Fractals and Quadtrees Optional Fractals and Quadtrees Final conclusions: self-similarity leads to estimates for # of z-values = # of quadtree/oct-tree blocks close dependence on the Hausdorff fractal dimension of the boundary 15-826 Copyright: C. Faloutsos (2017)

Indexing - Detailed outline fractals intro applications disk accesses for R-trees (range queries) dimensionality reduction dim. curse revisited quad-tree analysis [Gaede+] nn queries [Belussi+] ✔ ✔ ✔ ✔ 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) NN queries Q: in NN queries, what is the effect of the shape of the query region? [Belussi+95] L2 Linf r L1 15-826 Copyright: C. Faloutsos (2017)

NN queries Q: in NN queries, what is the effect of the shape of the query region? that is, for L2, and self-similar data: log(#pairs-within(<=d)) D2 L2 r log(d) 15-826 Copyright: C. Faloutsos (2017)

NN queries Q: What about L1, Linf? log(#pairs-within(<=d)) D2 L2 r log(d) 15-826 Copyright: C. Faloutsos (2017)

NN queries Q: What about L1, Linf? A: Same slope, different intercept log(#pairs-within(<=d)) D2 L2 r log(d) 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) NN queries Q: What about L1, Linf? A: Same slope, different intercept log(#neighbors) log(d) 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Optional SKIP NN queries Q: what about the intercept? Ie., what can we say about N2 and Ninf Ninf neighbors N2 neighbors Linf r L2 r volume: V2 volume: Vinf 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Optional SKIP NN queries Consider sphere with volume Vinf and r’ radius Ninf neighbors N2 neighbors r’ Linf L2 r r volume: V2 volume: Vinf 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Optional SKIP NN queries Consider sphere with volume Vinf and r’ radius (r/r’)^E = V2 / Vinf (r/r’)^D2 = N2 / N2’ N2’ = Ninf (since shape does not matter) and finally: 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) Optional SKIP NN queries ( N2 / Ninf ) ^ 1/D2 = (V2 / Vinf) ^ 1/E 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) NN queries Conclusions: for self-similar datasets Avg # neighbors: grows like (distance)^D2 , regardless of query shape (circle, diamond, square, e.t.c. ) 15-826 Copyright: C. Faloutsos (2017)

Indexing - Detailed outline fractals intro applications disk accesses for R-trees (range queries) dimensionality reduction dim. curse revisited quad-tree analysis [Gaede+] nn queries [Belussi+] Conclusions 15-826 Copyright: C. Faloutsos (2017)

Fractals - overall conclusions self-similar datasets: appear often powerful tools: correlation integral, NCDF, rank-frequency plot intrinsic/fractal dimension helps in estimations (selectivities, quadtrees, etc) dim. reduction / dim. curse (later: can help in image compression...) 15-826 Copyright: C. Faloutsos (2017)

Copyright: C. Faloutsos (2017) 15-826 References Belussi, A. and C. Faloutsos (Sept. 1995). Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension. Proc. of VLDB, Zurich, Switzerland. Faloutsos, C. and V. Gaede (Sept. 1996). Analysis of the z-ordering Method Using the Hausdorff Fractal Dimension. VLDB, Bombay, India. Proietti, G. and C. Faloutsos (March 23-26, 1999). I/O complexity for range queries on region data stored using an R-tree. International Conference on Data Engineering (ICDE), Sydney, Australia. 15-826 Copyright: C. Faloutsos (2017)