CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #11: Fractals - case studies Part III (regions, quadtrees, knn queries) C. Faloutsos.

Slides:



Advertisements
Similar presentations
CMU SCS : Multimedia Databases and Data Mining Lecture #17: Text - part IV (LSI) C. Faloutsos.
Advertisements

CMU SCS : Multimedia Databases and Data Mining Lecture #13: Power laws Potential causes and explanations C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - IV Grid files, dim. curse C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #4: Multi-key and Spatial Access Methods - I C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #19: SVD - part II (case studies) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - Metric trees C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals - case studies - I C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture#5: Multi-key and Spatial Access Methods - II C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture#2: Primary key indexing – B-trees Christos Faloutsos - CMU
DIMENSIONALITY REDUCTION: FEATURE EXTRACTION & FEATURE SELECTION Principle Component Analysis.
Deepayan ChakrabartiCIKM F4: Large Scale Automated Forecasting Using Fractals -Deepayan Chakrabarti -Christos Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture#3: Primary key indexing – hashing C. Faloutsos.
Indexing and Data Mining in Multimedia Databases Christos Faloutsos CMU
CMU SCS : Multimedia Databases and Data Mining Lecture #25: Multimedia indexing C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
15-826: Multimedia Databases and Data Mining
CMU SCS : Multimedia Databases and Data Mining Lecture #16: Text - part III: Vector space model and clustering C. Faloutsos.
Is It Live, or Is It Fractal? Bergren Forum September 3, 2009 Addison Frey, Presenter.
CMU SCS Data Mining Meets Systems: Tools and Case Studies Christos Faloutsos SCS CMU.
Indexing and Data Mining in Multimedia Databases Christos Faloutsos CMU
CMU SCS Graph and stream mining Christos Faloutsos CMU.
R-tree Analysis. R-trees - performance analysis How many disk (=node) accesses we’ll need for range nn spatial joins why does it matter?
School of Computer Science Carnegie Mellon Boston U., 2005C. Faloutsos1 Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University.
10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos.
Carnegie Mellon Powerful Tools for Data Mining Fractals, Power laws, SVD C. Faloutsos Carnegie Mellon University.
R-tree Analysis. R-trees - performance analysis How many disk (=node) accesses we’ll need for range nn spatial joins why does it matter?
Spatial Indexing. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
CMU SCS : Multimedia Databases and Data Mining Lecture #30: Conclusions C. Faloutsos.
Introduction to Fractals and Fractal Dimension Christos Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #8: Fractals - introduction C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #8: Fractals - introduction C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
School of Computer Science Carnegie Mellon UIUC 04C. Faloutsos1 Advanced Data Mining Tools: Fractals and Power Laws for Graphs, Streams and Traditional.
CMU SCS : Multimedia Databases and Data Mining Lecture #9: Fractals – examples & algo’s C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #26: Compression - JPEG, MPEG, fractal C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos.
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Spatial Access Methods - z-ordering.
CMU SCS : Multimedia Databases and Data Mining Lecture #30: Data Mining - assoc. rules C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #18: SVD - part I (definitions) C. Faloutsos.
R-trees: An Average Case Analysis. R-trees - performance analysis How many disk (=node) accesses we ’ ll need for range nn spatial joins why does it matter?
The Power-Method: A Comprehensive Estimation Technique for Multi- Dimensional Queries Yufei Tao U. Hong Kong Christos Faloutsos CMU Dimitris Papadias Hong.
Presenters: Amool Gupta Amit Sharma. MOTIVATION Basic problem that it addresses?(Why) Other techniques to solve same problem and how this one is step.
School of Computer Science Carnegie Mellon WRIGHT, 2005C. Faloutsos1 Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University.
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - Metric trees C. Faloutsos.
Carnegie Mellon Data Mining – Research Directions C. Faloutsos CMU
Similarity Search without Tears: the OMNI- Family of All-Purpose Access Methods Michael Kelleher Kiyotaka Iwataki The Department of Computer and Information.
15-826: Multimedia Databases and Data Mining
Next Generation Data Mining Tools: SVD and Fractals
Spatial Indexing.
Indexing and Data Mining in Multimedia Databases
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
Lecture 03: K-nearest neighbors
15-826: Multimedia Databases and Data Mining
R-trees: An Average Case Analysis
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
Presentation transcript:

CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals - case studies Part III (regions, quadtrees, knn queries) C. Faloutsos

CMU SCS Copyright: C. Faloutsos (2013)2 Must-read Material Alberto Belussi and Christos Faloutsos, Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension Proc. of VLDB, p , 1995 Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension

CMU SCS Copyright: C. Faloutsos (2013)3 Optional Material Optional, but very useful: Manfred Schroeder Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991

CMU SCS Copyright: C. Faloutsos (2013)4 Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Data Mining Optional

CMU SCS Copyright: C. Faloutsos (2013)5 Indexing - Detailed outline primary key indexing secondary key / multi-key indexing spatial access methods –z-ordering –R-trees –misc fractals –intro –applications text... Optional

CMU SCS Copyright: C. Faloutsos (2013)6 Indexing - Detailed outline fractals –intro –applications disk accesses for R-trees (range queries) dimensionality reduction selectivity in M-trees dim. curse revisited “fat fractals” quad-tree analysis [Gaede+] nn queries [Belussi+] ✔ ✔ ✔ ✔

CMU SCS Copyright: C. Faloutsos (2013)7 ‘Fat’ fractals & R-tree performance on region data Problem [Proietti+,’99] Given –N (# of data regions ) estimate how many of them will qualify for the average range query (q1 x q2 x... qE) Of course, we need more info Q: what? Optional

CMU SCS Copyright: C. Faloutsos (2013)8 R-tree performance on region data A: the distributions of their sizes Q: do we also need some info about the locations? Optional

CMU SCS Copyright: C. Faloutsos (2013)9 R-tree performance on region data A: the distributions of their sizes Q: do we also need some info about the locations? A: no (not for range queries) Optional

CMU SCS Copyright: C. Faloutsos (2013)10 R-tree performance on region data A: the distributions of their sizes Q: what exactly would we need? Optional

CMU SCS Copyright: C. Faloutsos (2013)11 R-tree performance on region data A: the distributions of their sizes Q: what exactly would we need? A: for self-similar regions (~ ‘fat’ fractals), we just need the slope of the Korcak law! (and the total area) [Proietti+] Optional

CMU SCS Copyright: C. Faloutsos (2013)12 More power laws: areas – Korcak’s law Scandinavian lakes Any pattern? Optional

CMU SCS Copyright: C. Faloutsos (2013)13 More power laws: areas – Korcak’s law Scandinavian lakes area vs complementary cumulative count (log-log axes) log(count( >= area)) log(area) B (patchiness exponent) Optional

CMU SCS Copyright: C. Faloutsos (2013)14 R-tree performance on regions Once we know ‘B’ (and the total area) we can second-guess the individual sizes and then apply the [Pagel+93] formula Bottom line: Optional

CMU SCS Copyright: C. Faloutsos (2013)15 R-tree performance on regions DatasetNAB LAKES81675, ISLANDS470136, REGIONS757190, Optional

CMU SCS Copyright: C. Faloutsos (2013)16 R-tree performance on regions query side sel. error Optional

CMU SCS Copyright: C. Faloutsos (2013)17 ‘Fat’ fractals - observation B: patchiness exp; d: embedding dim D H : Hausdorff of periphery B= D H / d Optional

CMU SCS Copyright: C. Faloutsos (2013)18 ‘Fat’ fractals - observation Optional

CMU SCS Copyright: C. Faloutsos (2013)19 ‘Fat’ fractals intuition behind B = D H / d ? Optional

CMU SCS Copyright: C. Faloutsos (2013)20 ‘Fat’ fractals intuition behind B = D H / d ? A: consider ‘flooding’: Optional

CMU SCS Copyright: C. Faloutsos (2013)21 ‘Fat’ fractals intuition behind B = D H / d ? A: consider ‘flooding’: Optional

CMU SCS Copyright: C. Faloutsos (2013)22 Conclusions ‘Fat’ fractals model regions well patchiness exp.: B = D H / d can help us estimate selectivities Optional

CMU SCS Copyright: C. Faloutsos (2013)23 Indexing - Detailed outline fractals –intro –applications disk accesses for R-trees (range queries) dimensionality reduction selectivity in M-trees dim. curse revisited “fat fractals” quad-tree analysis [Gaede+] nn queries [Belussi+] Optional ✔ ✔ ✔ ✔ ✔

CMU SCS Copyright: C. Faloutsos (2013)24 Fractals and Quadtrees Problem: how many quadtree nodes will we need, to store a region in some level of approximation? [Gaede+96] Optional

CMU SCS Copyright: C. Faloutsos (2013)25 Fractals and Quadtrees I.e.: Optional

CMU SCS Copyright: C. Faloutsos (2013)26 Fractals and Quadtrees I.e.: level of quadtree # of quadtree ‘blocks’ (= # gray nodes) ? Optional

CMU SCS Copyright: C. Faloutsos (2013)27 Fractals and Quadtrees Datasets: Franconia Brain Atlas Optional

CMU SCS Copyright: C. Faloutsos (2013)28 Fractals and Quadtrees Hint: –assume that the boundary is self-similar, with a given fd –how will the quad-tree (oct-tree) look like? Optional

CMU SCS Copyright: C. Faloutsos (2013)29 Fractals and Quadtrees white black gray Optional

CMU SCS Copyright: C. Faloutsos (2013)30 Fractals and Quadtrees Let p g (i) the prob. to find a gray node at level i. If self-similar, what can we say for p g (i) ? Optional

CMU SCS Copyright: C. Faloutsos (2013)31 Fractals and Quadtrees Let p g (i) the prob. to find a gray node at level i. If self-similar, what can we say for p g (i) ? A: p g (i) = p g = constant Optional

CMU SCS Copyright: C. Faloutsos (2013)32 Fractals and Quadtrees Assume only ‘gray’ and ‘white’ nodes (ie., no volume’) Assume that p g is given - how many gray nodes at level i? Optional

CMU SCS Copyright: C. Faloutsos (2013)33 Fractals and Quadtrees Assume only ‘gray’ and ‘white’ nodes (ie., no volume’) Assume that p g is given - how many gray nodes at level i? A: 1 at level 0; 4*p g (4*p g )* (4*p g )... (4*p g ) i … Optional

CMU SCS Copyright: C. Faloutsos (2013)34 Fractals and Quadtrees I.e.: level of quadtree (‘i’) # of quadtree ‘blocks’ ? (4*p g ) i Optional

CMU SCS Copyright: C. Faloutsos (2013)35 Fractals and Quadtrees I.e.: level of quadtree log(# of quadtree ‘blocks’) ? log[(4*p g ) i ] Optional

CMU SCS Copyright: C. Faloutsos (2013)36 Fractals and Quadtrees Conclusion: Self-similarity leads to easy and accurate estimation level log 2 (#blocks) Optional

CMU SCS Copyright: C. Faloutsos (2013)37 Fractals and Quadtrees Conclusion: Self-similarity leads to easy and accurate estimation level log 2 (#blocks) Optional

CMU SCS Copyright: C. Faloutsos (2013)38 Fractals and Quadtrees Optional

CMU SCS Copyright: C. Faloutsos (2013)39 Fractals and Quadtrees level log(#blocks) Optional

CMU SCS Copyright: C. Faloutsos (2013)40 Fractals and Quadtrees Final observation: relationship between p g and fractal dimension? Optional

CMU SCS Copyright: C. Faloutsos (2013)41 Fractals and Quadtrees Final observation: relationship between p g and fractal dimension? A: very close: (4*p g ) i = # of gray nodes at level i = # of Hausdorff grid-cells of side (1/2) i = r Eventually: D H = 2 + log 2 ( p g ) and, for E-d spaces: D H = E + log 2 ( p g ) Optional

CMU SCS Copyright: C. Faloutsos (2013)42 Fractals and Quadtrees for E-d spaces: D H = E + log 2 ( p g ) Sanity check: - point in 2-d: D H = 0 p g = ?? -line in 2-d: D H = 1 p g = ?? -plane in 2-d: D H =2 p g = ?? -point in 3-d: D H = 0 p g = ?? Optional

CMU SCS Copyright: C. Faloutsos (2013)43 Fractals and Quadtrees for E-d spaces: D H = E + log 2 ( p g ) Sanity check: - point in 2-d: D H = 0 p g = 1/4 - line in 2-d: D H = 1 p g = 1/2 -plane in 2-d: D H =2 p g = 1 -point in 3-d: D H = 0 p g = 1/8 Optional

CMU SCS Copyright: C. Faloutsos (2013)44 Fractals and Quadtrees Final conclusions: self-similarity leads to estimates for # of z- values = # of quadtree/oct-tree blocks close dependence on the Hausdorff fractal dimension of the boundary Optional

CMU SCS Copyright: C. Faloutsos (2013)45 Indexing - Detailed outline fractals –intro –applications disk accesses for R-trees (range queries) dimensionality reduction selectivity in M-trees dim. curse revisited “fat fractals” quad-tree analysis [Gaede+] nn queries [Belussi+] Optional ✔ ✔ ✔ ✔ ✔ ✔

CMU SCS Copyright: C. Faloutsos (2013)46 NN queries Q: in NN queries, what is the effect of the shape of the query region? [Belussi+95] r L inf L2L2 L1L1 Optional

CMU SCS Copyright: C. Faloutsos (2013)47 NN queries Q: in NN queries, what is the effect of the shape of the query region? that is, for L2, and self-similar data: log(d) log(#pairs-within(<=d)) r L2L2 D2D2

CMU SCS Copyright: C. Faloutsos (2013)48 NN queries Q: What about L 1, L inf ? log(d) log(#pairs-within(<=d)) r L2L2 D2D2

CMU SCS Copyright: C. Faloutsos (2013)49 NN queries Q: What about L 1, L inf ? A: Same slope, different intercept log(d) log(#pairs-within(<=d)) r L2L2 D2D2

CMU SCS Copyright: C. Faloutsos (2013)50 NN queries Q: What about L 1, L inf ? A: Same slope, different intercept log(d) log(#neighbors)

CMU SCS Copyright: C. Faloutsos (2013)51 NN queries Q: what about the intercept? Ie., what can we say about N 2 and N inf r L2L2 N 2 neighbors r L inf N inf neighbors volume: V 2 volume: V inf SKIP Optional

CMU SCS Copyright: C. Faloutsos (2013)52 NN queries Consider sphere with volume V inf and r’ radius r L2L2 N 2 neighbors r L inf N inf neighbors volume: V 2 volume: V inf r’ SKIP Optional

CMU SCS Copyright: C. Faloutsos (2013)53 NN queries Consider sphere with volume V inf and r’ radius (r/r’)^E = V 2 / V inf (r/r’)^D 2 = N 2 / N 2 ’ N 2 ’ = N inf (since shape does not matter) and finally: SKIP Optional

CMU SCS Copyright: C. Faloutsos (2013)54 NN queries ( N 2 / N inf ) ^ 1/D 2 = (V 2 / V inf ) ^ 1/E SKIP Optional

CMU SCS Copyright: C. Faloutsos (2013)55 NN queries Conclusions: for self-similar datasets Avg # neighbors: grows like (distance)^D 2, regardless of query shape (circle, diamond, square, e.t.c. ) Optional

CMU SCS Copyright: C. Faloutsos (2013)56 Indexing - Detailed outline fractals –intro –applications disk accesses for R-trees (range queries) dimensionality reduction selectivity in M-trees dim. curse revisited “fat fractals” quad-tree analysis [Gaede+] nn queries [Belussi+] –Conclusions Optional

CMU SCS Copyright: C. Faloutsos (2013)57 Fractals - overall conclusions self-similar datasets: appear often powerful tools: correlation integral, NCDF, rank-frequency plot intrinsic/fractal dimension helps in –estimations (selectivities, quadtrees, etc) –dim. reduction / dim. curse (later: can help in image compression...)

CMU SCS Copyright: C. Faloutsos (2013)58 References 1. Belussi, A. and C. Faloutsos (Sept. 1995). Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension. Proc. of VLDB, Zurich, Switzerland. 2. Faloutsos, C. and V. Gaede (Sept. 1996). Analysis of the z- ordering Method Using the Hausdorff Fractal Dimension. VLDB, Bombay, India. 3. Proietti, G. and C. Faloutsos (March 23-26, 1999). I/O complexity for range queries on region data stored using an R- tree. International Conference on Data Engineering (ICDE), Sydney, Australia.