CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #8: Fractals - introduction C. Faloutsos.

Slides:



Advertisements
Similar presentations
CMU SCS : Multimedia Databases and Data Mining Lecture #17: Text - part IV (LSI) C. Faloutsos.
Advertisements

CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - IV Grid files, dim. curse C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #4: Multi-key and Spatial Access Methods - I C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #19: SVD - part II (case studies) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - Metric trees C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals - case studies - I C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture#5: Multi-key and Spatial Access Methods - II C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture#2: Primary key indexing – B-trees Christos Faloutsos - CMU
Indexing and Data Mining in Multimedia Databases Christos Faloutsos CMU
CMU SCS : Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
15-826: Multimedia Databases and Data Mining
CMU SCS : Multimedia Databases and Data Mining Lecture#1: Introduction Christos Faloutsos CMU
CMU SCS : Multimedia Databases and Data Mining Lecture #16: Text - part III: Vector space model and clustering C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals - case studies Part III (regions, quadtrees, knn queries) C. Faloutsos.
Chapter 9: Recursive Methods and Fractals E. Angel and D. Shreiner: Interactive Computer Graphics 6E © Addison-Wesley Mohan Sridharan Based on Slides.
School of Computer Science Carnegie Mellon Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University.
CMU SCS Data Mining Meets Systems: Tools and Case Studies Christos Faloutsos SCS CMU.
Indexing and Data Mining in Multimedia Databases Christos Faloutsos CMU
On Power-Law Relationships of the Internet Topology CSCI 780, Fall 2005.
Analysis of the Internet Topology Michalis Faloutsos, U.C. Riverside (PI) Christos Faloutsos, CMU (sub- contract, co-PI) DARPA NMS, no
CMU SCS Graph and stream mining Christos Faloutsos CMU.
School of Computer Science Carnegie Mellon Boston U., 2005C. Faloutsos1 Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University.
10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos.
Carnegie Mellon Powerful Tools for Data Mining Fractals, Power laws, SVD C. Faloutsos Carnegie Mellon University.
R-tree Analysis. R-trees - performance analysis How many disk (=node) accesses we’ll need for range nn spatial joins why does it matter?
Spatial Indexing. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
Data Mining using Fractals and Power laws
CMU SCS : Multimedia Databases and Data Mining Lecture #30: Conclusions C. Faloutsos.
CMU SCS Data Mining in Streams and Graphs Christos Faloutsos CMU.
Introduction to Fractals and Fractal Dimension Christos Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #8: Fractals - introduction C. Faloutsos.
1 Chapters 9 Self-SimilarTraffic. Chapter 9 – Self-Similar Traffic 2 Introduction- Motivation Validity of the queuing models we have studied depends on.
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
Information Networks Power Laws and Network Models Lecture 3.
School of Computer Science Carnegie Mellon UIUC 04C. Faloutsos1 Advanced Data Mining Tools: Fractals and Power Laws for Graphs, Streams and Traditional.
CMU SCS : Multimedia Databases and Data Mining Lecture #9: Fractals – examples & algo’s C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos.
School of Computer Science Carnegie Mellon Data Mining using Fractals (fractals for fun and profit) Christos Faloutsos Carnegie Mellon University.
1 Self Similar Traffic. 2 Self Similarity The idea is that something looks the same when viewed from different degrees of “magnification” or different.
School of Computer Science Carnegie Mellon Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University.
RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School.
CMU SCS : Multimedia Databases and Data Mining Lecture #30: Data Mining - assoc. rules C. Faloutsos.
School of Computer Science Carnegie Mellon Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University.
CMU SCS : Multimedia Databases and Data Mining Lecture #18: SVD - part I (definitions) C. Faloutsos.
R-trees: An Average Case Analysis. R-trees - performance analysis How many disk (=node) accesses we ’ ll need for range nn spatial joins why does it matter?
The Power-Method: A Comprehensive Estimation Technique for Multi- Dimensional Queries Yufei Tao U. Hong Kong Christos Faloutsos CMU Dimitris Papadias Hong.
School of Computer Science Carnegie Mellon WRIGHT, 2005C. Faloutsos1 Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University.
CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P9-1 Large Graph Mining: Power Tools and a Practitioner’s guide Christos Faloutsos Gary Miller Charalampos.
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - Metric trees C. Faloutsos.
Carnegie Mellon Data Mining – Research Directions C. Faloutsos CMU
SCS-CMU Data Mining Tools A crash course C. Faloutsos.
15-826: Multimedia Databases and Data Mining
Next Generation Data Mining Tools: SVD and Fractals
Spatial Indexing.
Indexing and Data Mining in Multimedia Databases
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
Graph and Tensor Mining for fun and profit
15-826: Multimedia Databases and Data Mining
Data Mining using Fractals and Power laws
R-trees: An Average Case Analysis
15-826: Multimedia Databases and Data Mining
Presentation transcript:

CMU SCS : Multimedia Databases and Data Mining Lecture #8: Fractals - introduction C. Faloutsos

CMU SCS Copyright: C. Faloutsos (2014)2 Must-read Material Christos Faloutsos and Ibrahim Kamel, Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension, Proc. ACM SIGACT- SIGMOD-SIGART PODS, May 1994, pp. 4-13, Minneapolis, MN. Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension

CMU SCS Copyright: C. Faloutsos (2014)3 Recommended Material optional, but very useful: Manfred Schroeder Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991 – Chapter 10: boxcounting method – Chapter 1: Sierpinski triangle

CMU SCS Copyright: C. Faloutsos (2014)4 Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Data Mining

CMU SCS Copyright: C. Faloutsos (2014)5 Indexing - Detailed outline primary key indexing secondary key / multi-key indexing spatial access methods –z-ordering –R-trees –misc fractals –intro –applications text

CMU SCS Copyright: C. Faloutsos (2014)6 Intro to fractals - outline Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More examples and tools Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting plots

CMU SCS Copyright: C. Faloutsos (2014)7 Road end-points of Montgomery county: Q1: how many d.a. for an R-tree? Q2 : distribution? not uniform not Gaussian no rules?? Problem #1: GIS - points

CMU SCS Copyright: C. Faloutsos (2014)8 Problem #2 - spatial d.m. Galaxies (Sloan Digital Sky Survey w/ B. Nichol) - ‘spiral’ and ‘elliptical’ galaxies (stores and households...) - patterns? - attraction/repulsion? - how many ‘spi’ within r from an ‘ell’?

CMU SCS Copyright: C. Faloutsos (2014)9 Problem #3: traffic disk trace (from HP - J. Wilkes); Web traffic - fit a model how many explosions to expect? queue length distr.? time # bytes

CMU SCS Copyright: C. Faloutsos (2014)10 Problem #3: traffic time # bytes Poisson indep., ident. distr

CMU SCS Copyright: C. Faloutsos (2014)11 Problem #3: traffic time # bytes Poisson indep., ident. distr

CMU SCS Copyright: C. Faloutsos (2014)12 Problem #3: traffic time # bytes Poisson indep., ident. distr Q: Then, how to generate such bursty traffic?

CMU SCS Copyright: C. Faloutsos (2014)13 Common answer: Fractals / self-similarities / power laws Seminal works from Hilbert, Minkowski, Cantor, Mandelbrot, (Hausdorff, Lyapunov, Ken Wilson, …)

CMU SCS Copyright: C. Faloutsos (2014)14 Road map Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More examples and tools Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting plots

CMU SCS Copyright: C. Faloutsos (2014)15 What is a fractal? = self-similar point set, e.g., Sierpinski triangle:... zero area; infinite length! Dimensionality??

CMU SCS Copyright: C. Faloutsos (2014)16 Definitions (cont’d) Paradox: Infinite perimeter ; Zero area! ‘dimensionality’: between 1 and 2 actually: Log(3)/Log(2) =

CMU SCS Copyright: C. Faloutsos (2014)17 Dfn of fd: ONLY for a perfectly self-similar point set: =log(n)/log(f) = log(3)/log(2) = zero area; infinite length!

CMU SCS Copyright: C. Faloutsos (2014)18 Intrinsic (‘fractal’) dimension Q: fractal dimension of a line? A: 1 (= log(2)/log(2)!)

CMU SCS Copyright: C. Faloutsos (2014)19 Intrinsic (‘fractal’) dimension Q: fractal dimension of a line? A: 1 (= log(2)/log(2)!)

CMU SCS Copyright: C. Faloutsos (2014)20 Intrinsic (‘fractal’) dimension Q: dfn for a given set of points? yx

CMU SCS Copyright: C. Faloutsos (2014)21 Intrinsic (‘fractal’) dimension Q: fractal dimension of a line? A: nn ( <= r ) ~ r^1 (‘power law’: y=x^a) Q: fd of a plane? A: nn ( <= r ) ~ r^2 fd== slope of (log(nn) vs log(r) )

CMU SCS Copyright: C. Faloutsos (2014) Intrinsic (‘fractal’) dimension Local fractal dimension of point ‘P’? A: nn P ( <= r ) ~ r^1 If this equation holds for several values of r, Then, the local fractal dimension of point P: Local fd = exp = 1 P EXPLANATIONS 22

CMU SCS Copyright: C. Faloutsos (2014) Intrinsic (‘fractal’) dimension Local fractal dimension of point ‘A’? A: nn P ( <= r ) ~ r^1 If this is true for all points of the cloud Then the exponent is the global f.d. Or simply the f.d. P EXPLANATIONS 23

CMU SCS Copyright: C. Faloutsos (2014) Intrinsic (‘fractal’) dimension Global fractal dimension? A: if sum all_P [ nn P ( <= r ) ] ~ r^1 Then: exp = global f.d. If this is true for all points of the cloud Then the exponent is the global f.d. Or simply the f.d. A EXPLANATIONS 24

CMU SCS Copyright: C. Faloutsos (2014)25 Intrinsic (‘fractal’) dimension Algorithm, to estimate it? Notice Sum all_P [ nn P (<=r) ] is exactly tot#pairs(<=r) including ‘mirror’ pairs EXPLANATIONS

CMU SCS Copyright: C. Faloutsos (2014)26 Sierpinsky triangle log( r ) log(#pairs within <=r ) 1.58 == ‘correlation integral’

CMU SCS Copyright: C. Faloutsos (2014)27 Observations: Euclidean objects have integer fractal dimensions –point: 0 –lines and smooth curves: 1 –smooth surfaces: 2 fractal dimension -> roughness of the periphery

CMU SCS Copyright: C. Faloutsos (2014)28 Important properties fd = embedding dimension -> uniform pointset a point set may have several fd, depending on scale

CMU SCS Copyright: C. Faloutsos (2014)29 Important properties fd = embedding dimension -> uniform pointset a point set may have several fd, depending on scale 2-d

CMU SCS Copyright: C. Faloutsos (2014)30 Important properties fd = embedding dimension -> uniform pointset a point set may have several fd, depending on scale 1-d

CMU SCS Copyright: C. Faloutsos (2014)31 Important properties 0-d

CMU SCS Copyright: C. Faloutsos (2014)32 Road map Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More examples and tools Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting plots

CMU SCS Copyright: C. Faloutsos (2014)33 Cross-roads of Montgomery county: any rules? Problem #1: GIS points

CMU SCS Copyright: C. Faloutsos (2014)34 Solution #1 A: self-similarity -> fractals scale-free power-laws (y=x^a, F=C*r^(-2)) avg#neighbors(<= r ) = r^D log( r ) log(#pairs(within <= r)) 1.51

CMU SCS Copyright: C. Faloutsos (2014)35 Solution #1 A: self-similarity avg#neighbors(<= r ) ~ r^(1.51) log( r ) log(#pairs(within <= r)) 1.51

CMU SCS Copyright: C. Faloutsos (2014)36 Examples:MG county Montgomery County of MD (road end- points)

CMU SCS Copyright: C. Faloutsos (2014)37 Examples:LB county Long Beach county of CA (road end-points)

CMU SCS Copyright: C. Faloutsos (2014)38 Solution#2: spatial d.m. Galaxies ( ‘BOPS’ plot - [sigmod2000]) log(#pairs) log(r)

CMU SCS Copyright: C. Faloutsos (2014)39 Solution#2: spatial d.m. log(r) log(#pairs within <=r ) spi-spi spi-ell ell-ell slope - plateau! - repulsion!

CMU SCS Copyright: C. Faloutsos (2014)40 Spatial d.m. log(r) log(#pairs within <=r ) spi-spi spi-ell ell-ell slope - plateau! - repulsion!

CMU SCS Copyright: C. Faloutsos (2014)41 Spatial d.m. r1r2 r1 r2 Heuristic on choosing # of clusters

CMU SCS Copyright: C. Faloutsos (2014)42 Spatial d.m. log(r) log(#pairs within <=r ) spi-spi spi-ell ell-ell slope - plateau! - repulsion!

CMU SCS Copyright: C. Faloutsos (2014)43 Spatial d.m. log(r) log(#pairs within <=r ) spi-spi spi-ell ell-ell slope - plateau! -repulsion!! -duplicates

CMU SCS Copyright: C. Faloutsos (2014)44 Solution #3: traffic disk traces: self-similar: time #bytes

CMU SCS Copyright: C. Faloutsos (2014)45 Solution #3: traffic disk traces (80-20 ‘law’ = ‘multifractal’) time #bytes 20% 80%

CMU SCS Copyright: C. Faloutsos (2014) / multifractals 20 80

CMU SCS Copyright: C. Faloutsos (2014) / multifractals 20 p ; (1-p) in general yes, there are dependencies 80

CMU SCS Copyright: C. Faloutsos (2014)48 More on 80/20: PQRS Part of ‘self-* storage’ project [Wang+’02] time cylinder#

CMU SCS Copyright: C. Faloutsos (2014)49 More on 80/20: PQRS Part of ‘self-* storage’ project [Wang+’02] pq rs q rs

CMU SCS Copyright: C. Faloutsos (2014)50 Solution#3: traffic Clarification: fractal: a set of points that is self-similar multifractal: a probability density function that is self-similar Many other time-sequences are bursty/clustered: (such as?)

CMU SCS Copyright: C. Faloutsos (2014)51 Example: network traffic

CMU SCS Copyright: C. Faloutsos (2014)52 Web traffic [Crovella Bestavros, SIGMETRICS’96] 1000 sec; 100sec 10sec; 1sec

CMU SCS Copyright: C. Faloutsos (2014)53 Tape accesses time Tape#1 Tape# N # tapes needed, to retrieve n records? (# days down, due to failures / hurricanes / communication noise...)

CMU SCS Copyright: C. Faloutsos (2014)54 Tape accesses time Tape#1 Tape# N # tapes retrieved # qual. records = Poisson real

CMU SCS Copyright: C. Faloutsos (2014)55 Road map Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More tools and examples Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting plots

CMU SCS Copyright: C. Faloutsos (2014)56 A counter-intuitive example avg degree is, say 3.3 pick a node at random – guess its degree, exactly (-> “mode”) degree count avg: 3.3 ?

CMU SCS Copyright: C. Faloutsos (2014)57 A counter-intuitive example avg degree is, say 3.3 pick a node at random – guess its degree, exactly (-> “mode”) A: 1!! degree count avg: 3.3

CMU SCS Copyright: C. Faloutsos (2014)58 A counter-intuitive example avg degree is, say 3.3 pick a node at random - what is the degree you expect it to have? A: 1!! A’: very skewed distr. Corollary: the mean is meaningless! (and std -> infinity (!)) degree count avg: 3.3

CMU SCS Copyright: C. Faloutsos (2014)59 Rank exponent R Power law in the degree distribution [SIGCOMM99] internet domains log(rank) log(degree) att.com ibm.com

CMU SCS Copyright: C. Faloutsos (2014)60 More tools Zipf’s law Korcak’s law / “fat fractals”

CMU SCS Copyright: C. Faloutsos (2014)61 A famous power law: Zipf’s law Q: vocabulary word frequency in a document - any pattern? aaronzoo freq.

CMU SCS Copyright: C. Faloutsos (2014)62 A famous power law: Zipf’s law Bible - rank vs frequency (log-log) log(rank) log(freq) “a” “the”

CMU SCS Copyright: C. Faloutsos (2014)63 A famous power law: Zipf’s law Bible - rank vs frequency (log-log) similarly, in many other languages; for customers and sales volume; city populations etc etc log(rank) log(freq)

CMU SCS Copyright: C. Faloutsos (2014)64 A famous power law: Zipf’s law Zipf distr: freq = 1/ rank generalized Zipf: freq = 1 / (rank)^a log(rank) log(freq)

CMU SCS Copyright: C. Faloutsos (2014)65 Olympic medals (Sydney): rank log(#medals)

CMU SCS Copyright: C. Faloutsos (2014)66 Olympic medals (Sydney’00, Athens’04): log( rank) log(#medals)

CMU SCS Copyright: C. Faloutsos (2014)67 TELCO data # of service units count of customers ‘best customer’

CMU SCS Copyright: C. Faloutsos (2014)68 SALES data – store#96 # units sold count of products “aspirin”

CMU SCS Copyright: C. Faloutsos (2014)69 More power laws: areas – Korcak’s law Scandinavian lakes Any pattern?

CMU SCS Copyright: C. Faloutsos (2014)70 More power laws: areas – Korcak’s law Scandinavian lakes area vs complementary cumulative count (log-log axes) log(count( >= area)) log(area)

CMU SCS Copyright: C. Faloutsos (2014)71 More power laws: Korcak Japan islands; area vs cumulative count (log-log axes) log(area) log(count( >= area))

CMU SCS Copyright: C. Faloutsos (2014)72 (Korcak’s law: Aegean islands)

CMU SCS Copyright: C. Faloutsos (2014)73 Korcak’s law & “fat fractals” How to generate such regions?

CMU SCS Copyright: C. Faloutsos (2014)74 Korcak’s law & “fat fractals” Q: How to generate such regions? A: recursively, from a single region...

CMU SCS Copyright: C. Faloutsos (2014)75 so far we’ve seen: concepts: –fractals, multifractals and fat fractals tools: –correlation integral (= pair-count plot) –rank/frequency plot (Zipf’s law) –CCDF (Korcak’s law)

CMU SCS Copyright: C. Faloutsos (2014)76 so far we’ve seen: concepts: –fractals, multifractals and fat fractals tools: –correlation integral (= pair-count plot) –rank/frequency plot (Zipf’s law) –CCDF (Korcak’s law) same info

CMU SCS Next: More examples / applications Practitioner’s guide Box-counting: fast estimation of correlation integral Copyright: C. Faloutsos (2014)77