Introduction to Fractals and Fractal Dimension
Christos Faloutsos

Multi DB and D.M. Copyright: C. Faloutsos (2001)

Intro to Fractals - Outline
–Motivation: 3 problems / case studies
–Definition of fractals and power laws
–Solutions to posed problems
–More examples
–Discussion: putting fractals to work!
–Conclusions: practitioner’s guide
–Appendix: gory details (box-counting plots)

Problem #1: GIS - points
Road end-points of Montgomery county. Q: distribution?
–not uniform
–not Gaussian
–??? no rules/pattern at all!?

Problem #2: spatial d.m.
Galaxies (Sloan Digital Sky Survey, w/ B. Nichol): ‘spiral’ and ‘elliptical’ galaxies (stores and households...)
–patterns?
–attraction/repulsion?
–how many ‘spi’ within r from an ‘ell’?

Problem #3: traffic
Disk trace (from HP - J. Wilkes); Web traffic. [Plot: #bytes vs time; label: Poisson]
–how many explosions to expect?
–queue length distr.?

Common answer for all three
Fractals / self-similarities / power laws. Seminal works from Hilbert, Minkowski, Cantor, Mandelbrot (Hausdorff, Lyapunov, Ken Wilson, …)

Definitions (cont’d)
A simple 2-D figure has finite perimeter and finite area
–area ≈ perimeter^2
Do all 2-D figures have finite perimeter, finite area, and area ≈ perimeter^2?

What is a fractal?
= self-similar point set, e.g., the Sierpinski triangle: zero area; infinite length!

Definitions (cont’d)
Paradox: infinite perimeter; zero area! ‘Dimensionality’: between 1 and 2; actually log(3)/log(2) = 1.58…

Definitions (cont’d)
“Self similar”: pieces look like the whole, independent of scale.
“Dimension”: dim = log(# self-similar pieces) / log(scale)
–dim(line) = log(2)/log(2) = 1
–dim(square) = log(4)/log(2) = 2
–dim(cube) = log(8)/log(2) = 3
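
The dimension formula on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the function name similarity_dimension and the dictionary of examples are made up here, and the Sierpinski entry uses the 3 pieces at scale 2 quoted earlier.

```python
import math

def similarity_dimension(pieces: int, scale: float) -> float:
    """dim = log(# self-similar pieces) / log(scale)."""
    return math.log(pieces) / math.log(scale)

# Halving the side (scale = 2) splits each shape into this many copies of itself.
examples = {"line": 2, "square": 4, "cube": 8, "Sierpinski triangle": 3}
dims = {name: similarity_dimension(n, 2) for name, n in examples.items()}

for name, d in dims.items():
    print(f"{name}: {d:.2f}")
```

The three Euclidean shapes come out as integers, while the Sierpinski triangle lands between 1 and 2.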

Intrinsic (‘fractal’) dimension
Q: fractal dimension of a line? A: 1 (= log(2)/log(2))

Intrinsic (‘fractal’) dimension
Q: dfn for a given set of points? [Figure: set of points on x-y axes]

Intrinsic (‘fractal’) dimension
Q: fractal dimension of a line? A: nn(<= r) ~ r^1 (‘power law’: y = x^a)
Q: fd of a plane? A: nn(<= r) ~ r^2
Fd = slope of log(nn) vs log(r), where nn(<= r) is the number of neighbors within distance r.

Intrinsic (‘fractal’) dimension
Algorithm, to estimate it? Notice: avg nn(<= r) is exactly tot#pairs(<= r) / N when ‘mirror’ pairs are included (each pair counted in both directions), so the two quantities have the same slope on log-log axes.
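
A minimal sketch of this pair-counting estimate, assuming a brute-force O(N^2) scan and a plain least-squares fit on log-log axes. The helper names (pair_counts, loglog_slope), the radii and the two synthetic point sets are illustrative choices, not taken from the slides.

```python
import math
import random

def pair_counts(points, radii):
    """For each radius r, count unordered pairs of points at distance <= r."""
    counts = [0] * len(radii)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            for k, r in enumerate(radii):
                if d <= r:
                    counts[k] += 1
    return counts

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) versus log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

rng = random.Random(0)
radii = [0.02, 0.04, 0.08, 0.16]
line = [(t, t) for t in (rng.random() for _ in range(400))]    # a line embedded in 2-d
plane = [(rng.random(), rng.random()) for _ in range(400)]     # uniform in the unit square

fd_line = loglog_slope(radii, pair_counts(line, radii))
fd_plane = loglog_slope(radii, pair_counts(plane, radii))
print(f"line: {fd_line:.2f}, plane: {fd_plane:.2f}")
```

Counting unordered or ordered pairs only shifts the curve vertically, so the slope is unaffected. The plane estimate tends to fall slightly below 2 because points near the border of the square have fewer neighbors.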

Dfn of fd
ONLY for a perfectly self-similar point set: fd = log(n)/log(f), e.g., log(3)/log(2) = 1.58 for the Sierpinski triangle (zero area; infinite length!)

Sierpinski triangle
[Plot: log(#pairs within <= r) vs log(r), i.e., the ‘correlation integral’; slope 1.58]

Observations
Euclidean objects have integer fractal dimensions
–point: 0
–lines and smooth curves: 1
–smooth surfaces: 2
fractal dimension -> roughness of the periphery

Important properties
–fd = embedding dimension -> uniform point set
–a point set may have several fd, depending on scale

Problem #1: GIS points
Cross-roads of Montgomery county: any rules?

Solution #1
A: self-similarity -> fractals / scale-free / power-laws (y = x^a, F = C*r^(-2))
avg#neighbors(<= r) ~ r^D; for Montgomery county, avg#neighbors(<= r) ~ r^(1.51)
[Plot: log(#pairs(within <= r)) vs log(r); slope 1.51]

Examples: MG county
Montgomery County of MD (road end-points)

Examples: LB county
Long Beach county of CA (road end-points)

Solution #2: spatial d.m.
Galaxies (‘BOPS’ plot - [sigmod2000]). [Plot: log(#pairs) vs log(r)]

Solution #2: spatial d.m.
[Plot: log(#pairs within <= r) vs log(r); curves: spi-spi, spi-ell, ell-ell]
Annotations on the plot:
–slope
–plateau!
–repulsion!!
–duplicates

spatial d.m.
Heuristic on choosing # of clusters. [Figure: radii r1 and r2]

Solution #3: traffic
Disk traces: self-similar. [Plot: #bytes vs time]

Solution #3: traffic
Disk traces (80-20 ‘law’ = ‘multifractal’). [Plot: #bytes vs time; labels: 20%, 80%]
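
One common way to produce a bursty trace that obeys an 80-20 ‘law’ at every scale is a recursive split. The sketch below assumes that each time interval is cut into two equal halves and that the half receiving the 80% share is picked at random. The function name b_model and its parameters are illustrative, and the slide does not say that this is the exact generator behind the plot.

```python
import random

def b_model(total, levels, bias=0.8, seed=0):
    """Split `total` recursively: one half gets `bias`, the other 1 - bias."""
    rng = random.Random(seed)
    trace = [float(total)]
    for _ in range(levels):
        next_trace = []
        for volume in trace:
            heavy, light = volume * bias, volume * (1 - bias)
            # Pick at random which half receives the larger share.
            next_trace.extend((heavy, light) if rng.random() < 0.5 else (light, heavy))
        trace = next_trace
    return trace

# 2**10 time slots sharing one million bytes (made-up numbers).
trace = b_model(total=1_000_000, levels=10)
print(len(trace), round(max(trace)), round(sum(trace)))
```

The total volume is preserved at every level, while a handful of slots end up holding most of the bytes.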

Solution #3: traffic
Clarification:
–fractal: a set of points that is self-similar
–multifractal: a probability density function that is self-similar
Many other time-sequences are bursty/clustered: (such as?)

Tape accesses
# tapes needed, to retrieve n records? (# days down, due to failures / hurricanes / communication noise...) [Figure: accesses over time, Tape#1 to Tape#N]

Tape accesses
[Plot: # tapes retrieved vs # qual. records; curves: Poisson, real]

Fast estimation of fd(s)
How, for the (correlation) fractal dimension? A: Box-counting plot: log(sum(pi^2)) vs log(r). [Figure: grid of side r over the points; pi marks one cell]

Definitions
–pi: the percentage (or count) of points in the i-th cell
–r: the side of the grid

Fast estimation of fd(s)
Compute sum(pi^2) for another grid side, r’.

Fast estimation of fd(s)
etc.; if the resulting plot has a linear part, its slope is the correlation fractal dimension D2. [Plot: log(sum(pi^2)) vs log(r)]
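
The box-counting recipe can be sketched as follows, assuming pi is taken as the fraction of points in cell i and the slope is fitted by least squares over a handful of grid sides. The Sierpinski triangle is generated with the ‘chaos game’, which is a choice made for this example; all function names are illustrative.

```python
import math
import random
from collections import Counter

def sierpinski(n, seed=0):
    """Chaos game: repeatedly jump halfway towards a randomly chosen corner."""
    rng = random.Random(seed)
    corners = [(0.0, 0.0), (1.0, 0.0), (0.5, 1.0)]
    x, y = 0.0, 0.0
    points = []
    for _ in range(n):
        cx, cy = rng.choice(corners)
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points

def sum_pi_squared(points, r):
    """Overlay a grid of side r; return sum(pi^2), pi = fraction of points in cell i."""
    cells = Counter((math.floor(x / r), math.floor(y / r)) for x, y in points)
    n = len(points)
    return sum((c / n) ** 2 for c in cells.values())

def slope(xs, ys):
    """Least-squares slope of ys versus xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return num / sum((a - mx) ** 2 for a in xs)

points = sierpinski(20000)
sides = [1 / 2, 1 / 4, 1 / 8, 1 / 16, 1 / 32]
log_r = [math.log(r) for r in sides]
log_s = [math.log(sum_pi_squared(points, r)) for r in sides]
d2 = slope(log_r, log_s)
print(f"estimated D2 = {d2:.2f}")
```

Each grid side needs a single pass over the points, which is where the speed advantage over counting all pairs comes from. The estimate should come out close to log(3)/log(2).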

Hausdorff or box-counting fd
Box-counting plot: log(N(r)) vs log(r)
–r: grid side
–N(r): count of non-empty cells
The (Hausdorff) fractal dimension D0 is read off the slope of the linear part of this plot (in absolute value, since N(r) grows as r shrinks).

Definitions (cont’d)
Hausdorff fd. [Plot: log(#non-empty cells) vs log(r); slope labelled D0]
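
A small sketch of the non-empty-cell count, assuming a densely sampled unit circle as the test set (a smooth curve, so the expected value is 1). The sample size, the grid sides and the function names are arbitrary choices for illustration.

```python
import math

def count_nonempty_cells(points, r):
    """N(r): number of grid cells of side r that contain at least one point."""
    return len({(math.floor(x / r), math.floor(y / r)) for x, y in points})

def slope(xs, ys):
    """Least-squares slope of ys versus xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    return num / sum((a - mx) ** 2 for a in xs)

n = 20000
circle = [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)) for k in range(n)]

sides = [1 / 4, 1 / 8, 1 / 16, 1 / 32, 1 / 64]
log_r = [math.log(r) for r in sides]
log_n = [math.log(count_nonempty_cells(circle, r)) for r in sides]

# N(r) grows as r shrinks, so the fitted slope is negative; D0 is its magnitude.
d0 = -slope(log_r, log_n)
print(f"estimated D0 = {d0:.2f}")
```

Unlike sum(pi^2), this count ignores how many points fall in each cell, which is why D0 and D2 can differ on data that are not perfectly self-similar.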

Observations
–q=0: Hausdorff fractal dimension
–q=2: Correlation fractal dimension (identical to the exponent of the number of neighbors vs radius)
–q=1: Information fractal dimension

Observations, cont’d
In general, the Dq’s take similar, but not identical, values, except for perfectly self-similar point sets, where Dq = Dq’ for any q, q’.

Summary So Far
–many fractal dimensions, with nearby values
–can be computed quickly (O(N) or O(N log(N)))
–(code: on the web)

Dimensionality Reduction with FD
Problem definition: ‘Feature selection’ [Traina+,SBBD’00]
–given N points, with E dimensions
–keep the k most ‘informative’ dimensions

Dim. reduction - w/ fractals
[Figure: example of an attribute marked ‘not informative’]

Dim. reduction
Re-phrased: spot and drop attributes with strong (non-)linear correlations.
Q: how do we do that?
A: Hint: correlated attributes do not affect the intrinsic/fractal dimension, e.g., if y = f(x,z,w) we can drop y
(hence: ‘partial fd’ (PFD) of a set of attributes = the fd of the dataset, when projected on those attributes)

Dim. reduction - w/ fractals
[Figures: datasets with global FD = 1, with the PFD of individual attributes marked: PFD ~ 0 and PFD = 1; PFD = 1; PFD ~ 1]

Dim. reduction - w/ fractals
(problem: given N points in E-d, choose k best dimensions)
Q: Algorithm?
A: e.g., greedy - forward selection:
–keep the attribute with highest partial fd
–add the one that causes the highest increase in pfd
–etc., until we are within epsilon from the full f.d.
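
The greedy forward selection can be sketched as below. This is not the implementation from [Traina+,SBBD’00]: the fd estimator is a plain box-counting D2 over four grid sides, the data are synthetic, and fractal_dim, forward_select and the default epsilon are names and values invented for the example.

```python
import math
import random
from collections import Counter

def fractal_dim(rows, attrs, sides=(1 / 2, 1 / 4, 1 / 8, 1 / 16)):
    """Correlation fd (D2) of `rows` projected on `attrs`, via box counting."""
    n = len(rows)
    log_r, log_s = [], []
    for r in sides:
        cells = Counter(tuple(math.floor(row[a] / r) for a in attrs) for row in rows)
        log_r.append(math.log(r))
        log_s.append(math.log(sum((c / n) ** 2 for c in cells.values())))
    mr, ms = sum(log_r) / len(log_r), sum(log_s) / len(log_s)
    num = sum((a - mr) * (b - ms) for a, b in zip(log_r, log_s))
    return num / sum((a - mr) ** 2 for a in log_r)

def forward_select(rows, attrs, epsilon=0.15):
    """Greedily add the attribute that raises the partial fd the most."""
    full_fd = fractal_dim(rows, attrs)
    selected, remaining = [], list(attrs)
    while remaining:
        best = max(remaining, key=lambda a: fractal_dim(rows, selected + [a]))
        selected.append(best)
        remaining.remove(best)
        # Stop once the partial fd is within epsilon of the full fd.
        if full_fd - fractal_dim(rows, selected) <= epsilon:
            break
    return selected

rng = random.Random(0)
rows = []
for _ in range(2000):
    a, c = rng.random(), rng.random()
    rows.append({"a": a, "b": a * a, "c": c})   # b is a non-linear function of a

selected = forward_select(rows, ["a", "b", "c"])
print(sorted(selected))
```

In this toy dataset b carries no information beyond a, so the selection should settle on a and c and leave b out, even though the dependence between a and b is not linear.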

Dim. reduction - w/ fractals
(backward elimination: ~ reverse)
–drop the attribute with least impact on the p.f.d.
–repeat
–until we are epsilon below the full f.d.

Dim. reduction - w/ fractals
Q: what is the smallest # of attributes we should keep?
A: we should keep at least as many as the f.d. (and probably, a few more)

Dim. reduction - w/ fractals
Results: e.g., on the ‘currency’ dataset (daily exchange rates for USD, HKD, BP, FRF, DEM, JPY, i.e., 6-d vectors, one per day; base currency: CAD). [Plot: e.g., USD vs FRF]

E.g., on the ‘currency’ dataset
[Plot: correlation integral, log(#pairs(<= r)) vs log(r); slope 1.98]

E.g., on the ‘currency’ dataset
[Plot: comparison with the ‘unif + indep.’ case]

E.g., on the eigenface dataset
16-d vectors, one for each of ~1K faces. [Plot: slope 4.25]

Dim. reduction - w/ fractals
Conclusion: can do non-linear dim. reduction. [Figure: PFD ~ 1, global FD = 1]

More tools
–Zipf’s law
–Korcak’s law / “fat fractals”

A famous power law: Zipf’s law
Q: vocabulary word frequency in a document - any pattern? [Plot: freq. per vocabulary word, from “aaron” to “zoo”]

A famous power law: Zipf’s law
Bible - rank vs frequency (log-log). [Plot: log(freq) vs log(rank); labelled points: “a”, “the”]
Similarly, in many other languages; for customers and sales volume; city populations, etc.

A famous power law: Zipf’s law
–Zipf distr: freq = 1 / rank
–generalized Zipf: freq = 1 / (rank)^a
[Plot: log(freq) vs log(rank)]
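
A rank-frequency plot needs only a word count and a sort. The sketch below also fits the exponent by least squares on log-log axes, which is one simple choice among several. The function names, the toy sentence and the synthetic ‘ideal’ frequencies are made up for illustration.

```python
import math
from collections import Counter

def rank_frequency(tokens):
    """Frequencies sorted from most to least frequent (rank 1, 2, ...)."""
    return sorted(Counter(tokens).values(), reverse=True)

def zipf_exponent(freqs):
    """Least-squares slope of log(freq) vs log(rank); generalized Zipf gives -a."""
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Toy document, only to show the counting step.
freqs = rank_frequency("the cat and the dog and the bird".split())

# Synthetic frequencies that follow freq = C / rank exactly.
ideal = [1000 / rank for rank in range(1, 51)]
slope_ideal = zipf_exponent(ideal)
print(freqs, round(slope_ideal, 3))
```

On real text the points scatter around the line, especially at the lowest ranks and in the long tail, so the fitted slope is only a rough summary.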

Olympic medals (Sydney)
[Plot: log(#medals) vs rank]

More power laws: areas – Korcak’s law
Scandinavian lakes. Any pattern?

More power laws: areas – Korcak’s law
Scandinavian lakes: area vs complementary cumulative count (log-log axes). [Plot: log(count(>= area)) vs log(area)]
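
The complementary cumulative count is easy to compute from a sorted list. In this sketch the function name ccdf and the list of areas are invented for illustration; they are not the lake data.

```python
def ccdf(values):
    """Return (value, count of elements >= value) pairs, by increasing value."""
    ordered = sorted(values)
    n = len(ordered)
    result = []
    for i, v in enumerate(ordered):
        # Emit one pair per distinct value, at its first position in sorted order.
        if i == 0 or v != ordered[i - 1]:
            result.append((v, n - i))
    return result

areas = [1, 1, 2, 3, 5, 8, 8, 40]   # made-up areas
points = ccdf(areas)
print(points)
```

Plotting the logarithm of both columns gives the kind of plot described on the slide; a roughly straight line would suggest a power law.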

More power laws: Korcak
Japan islands: area vs cumulative count (log-log axes). [Plot: log(count(>= area)) vs log(area)]

(Korcak’s law: Aegean islands)

Korcak’s law & “fat fractals”
Q: How to generate such regions?
A: recursively, from a single region

Conclusions
–tool#1: (for points) ‘correlation integral’: (#pairs within <= r) vs (distance r)
–tool#2: (for categorical values) rank-frequency plot (à la Zipf)
–tool#3: (for numerical values) CCDF: complementary cumulative distr. function (# of elements with value >= a)

Fractals - overall conclusions
–Real data often disobey textbook assumptions (not Gaussian, not Poisson, not uniform, not independent)
–self-similar datasets: appear often
–powerful tools: correlation integral, rank-frequency plot, ...
–intrinsic/fractal dimension helps: find patterns; dimensionality reduction

Practitioner’s guide
tool#1: #pairs vs distance, for a set of objects, with a distance function (slope = intrinsic dimensionality)
[Plots: internet: log(#pairs) vs log(hops), slope 2.8; MG county: log(#pairs(within <= r)) vs log(r), slope 1.51]

Practitioner’s guide
tool#2: rank-frequency plot (for categorical attributes)
[Plots: internet domains: log(degree) vs log(rank); Bible: log(freq) vs log(rank)]

Practitioner’s guide
tool#3: CCDF, for (skewed) numerical attributes, e.g., areas of islands/lakes, UNIX jobs...
[Plot: Scandinavian lakes: log(count(>= area)) vs log(area)]

Resources: Software for fractal dimension –

Books
Strongly recommended intro book:
–Manfred Schroeder, Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise, W.H. Freeman and Company, 1991
Classic book on fractals:
–B. Mandelbrot, Fractal Geometry of Nature, W.H. Freeman, 1977

References
–Belussi, A. and C. Faloutsos (Sept. 1995). Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension. Proc. of VLDB, Zurich, Switzerland.
–Faloutsos, C. and V. Gaede (Sept. 1996). Analysis of the z-ordering Method Using the Hausdorff Fractal Dimension. VLDB, Bombay, India.
–Proietti, G. and C. Faloutsos (March 23-26, 1999). I/O complexity for range queries on region data stored using an R-tree. International Conference on Data Engineering (ICDE), Sydney, Australia.

NN queries
Q: in NN queries, what is the effect of the shape of the query region? [Belussi+95] [Figure: query regions of radius r under Linf, L2 and L1]

NN queries
That is, for L2, and self-similar data: [Plot: log(#pairs-within(<= d)) vs log(d); slope D2]

NN queries
Q: What about L1, Linf?
A: Same slope, different intercept

NN queries
Q: what about the intercept? I.e., what can we say about N2 and Ninf?
[Figure: L2 query of radius r: N2 neighbors, volume V2; Linf query of radius r: Ninf neighbors, volume Vinf]

NN queries
Consider a sphere with volume Vinf and radius r’. Then:
–(r/r’)^E = V2 / Vinf
–(r/r’)^D2 = N2 / N2’
–N2’ = Ninf (since shape does not matter)
and finally:

NN queries
(N2 / Ninf)^(1/D2) = (V2 / Vinf)^(1/E)
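
As a worked example of this relation, the sketch below solves it for N2 / Ninf in E = 2 dimensions, where an L2 query of radius r is a circle of area pi*r^2 and an Linf query is a square of area (2r)^2. The function name neighbor_ratio is illustrative; the value D2 = 1.51 is the one quoted earlier for Montgomery county.

```python
import math

def neighbor_ratio(volume_ratio, d2, e):
    """N2 / Ninf = (V2 / Vinf) ** (D2 / E), rearranged from the relation above."""
    return volume_ratio ** (d2 / e)

# Same radius r in E = 2: circle area pi*r^2 over square area (2r)^2.
v_ratio = math.pi / 4

ratio_uniform = neighbor_ratio(v_ratio, d2=2.0, e=2)   # uniform data: D2 = E
ratio_mg = neighbor_ratio(v_ratio, d2=1.51, e=2)       # D2 quoted for MG county
print(f"uniform: {ratio_uniform:.3f}, D2=1.51: {ratio_mg:.3f}")
```

For uniform data the ratio of neighbor counts equals the ratio of volumes; with a lower D2 the two query shapes return more similar counts.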

NN queries
Conclusions: for self-similar datasets, the avg # neighbors grows like (distance)^D2, regardless of query shape (circle, diamond, square, etc.)

Internet topology
Internet routers: how many neighbors within h hops? Reachability function: number of neighbors within r hops, vs r (log-log). Mbone routers, 1995. [Plot: log(#pairs) vs log(hops); slope 2.8]

More power laws on the Internet
Degree vs rank, for Internet domains (log-log) [sigcomm99]. [Plot: log(degree) vs log(rank); slope -0.82]

More power laws - internet
pdf of degrees. [Plot: log(count) vs log(degree); slope -2.2]

Even more power laws on the Internet
Scree plot for Internet domains (log-log) [sigcomm99]. [Plot: log(i-th eigenvalue) vs log(i); slope label 0.47]

More apps: Brain scans
Oct-trees; brain-scans. [Plot: log(#octants) vs octree levels; slope 2.63 = fd]

More apps: Medical images
[Burdett et al, SPIE ‘93]:
–benign tumors: fd ~ 2.37
–malignant: fd ~ 2.56

More fractals
–cardiovascular system: 3 (!)
–stock prices (LYCOS) - random walks: 1.5 [Plots: 1 year, 2 years]
–Coastlines: (Norway!)

More power laws
–duration of UNIX jobs [Harchol-Balter]
–Energy of earthquakes (Gutenberg-Richter law) [simscience.org]
[Plots: log(freq) vs magnitude; amplitude vs day]

Even more power laws
–publication counts (Lotka’s law)
–Distribution of UNIX file sizes
–Income distribution (Pareto’s law)
–web hit counts [Huberman]

Power laws, cont’d
–In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER]
–length of file transfers [Bestavros+]
–Click-stream data (w/ A. Montgomery (CMU-GSIA) + MediaMetrix)

References
–[ieeeTN94] W. E. Leland, M. S. Taqqu, W. Willinger, D. V. Wilson, On the Self-Similar Nature of Ethernet Traffic, IEEE Transactions on Networking, 2, 1, pp. 1-15, Feb
–[pods94] Christos Faloutsos and Ibrahim Kamel, Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension, Proc. ACM SIGACT-SIGMOD-SIGART PODS, Minneapolis, MN, May 24-26, 1994, pp. 4-13
–[vldb95] Alberto Belussi and Christos Faloutsos, Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension, Proc. of VLDB, 1995
–[vldb96] Christos Faloutsos, Yossi Matias and Avi Silberschatz, Modeling Skewed Distributions Using Multifractals and the `80-20 Law’, Conf. on Very Large Data Bases (VLDB), Bombay, India, Sept
–[vldb96] Christos Faloutsos and Volker Gaede, Analysis of the Z-Ordering Method Using the Hausdorff Fractal Dimension, VLDB, Bombay, India, Sept
–[sigcomm99] Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos, What does the Internet look like? Empirical Laws of the Internet Topology, SIGCOMM 1999
–[icde99] Guido Proietti and Christos Faloutsos, I/O complexity for range queries on region data stored using an R-tree, International Conference on Data Engineering (ICDE), Sydney, Australia, March 23-26, 1999
–[sigmod2000] Christos Faloutsos, Bernhard Seeger, Agma J. M. Traina and Caetano Traina Jr., Spatial Join Selectivity Using Power Laws, SIGMOD 2000
–[Traina+, SBBD’00] Traina, C., A. Traina, et al. (2000). Fast feature selection using the fractal dimension. XV Brazilian Symposium on Databases (SBBD), Paraiba, Brazil.