CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos.

Slides:



Advertisements
Similar presentations
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - IV Grid files, dim. curse C. Faloutsos.
Advertisements

CMU SCS : Multimedia Databases and Data Mining Lecture #4: Multi-key and Spatial Access Methods - I C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #19: SVD - part II (case studies) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - Metric trees C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals - case studies - I C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture#5: Multi-key and Spatial Access Methods - II C. Faloutsos.
Indexing and Data Mining in Multimedia Databases Christos Faloutsos CMU
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
15-826: Multimedia Databases and Data Mining
CMU SCS : Multimedia Databases and Data Mining Lecture #16: Text - part III: Vector space model and clustering C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals - case studies Part III (regions, quadtrees, knn queries) C. Faloutsos.
Social Networks and Graph Mining Christos Faloutsos CMU - MLD.
School of Computer Science Carnegie Mellon Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University.
1 Complex systems Made of many non-identical elements connected by diverse interactions. NETWORK New York Times Slides: thanks to A-L Barabasi.
Web as Graph – Empirical Studies The Structure and Dynamics of Networks.
CMU SCS Data Mining Meets Systems: Tools and Case Studies Christos Faloutsos SCS CMU.
Indexing and Data Mining in Multimedia Databases Christos Faloutsos CMU
On Power-Law Relationships of the Internet Topology CSCI 780, Fall 2005.
Analysis of the Internet Topology Michalis Faloutsos, U.C. Riverside (PI) Christos Faloutsos, CMU (sub- contract, co-PI) DARPA NMS, no
CMU SCS Graph and stream mining Christos Faloutsos CMU.
School of Computer Science Carnegie Mellon Boston U., 2005C. Faloutsos1 Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University.
10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos.
Multimedia Databases Text II. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases Text databases Image and video.
Carnegie Mellon Powerful Tools for Data Mining Fractals, Power laws, SVD C. Faloutsos Carnegie Mellon University.
Spatial Indexing. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
Data Mining using Fractals and Power laws
CMU SCS : Multimedia Databases and Data Mining Lecture #30: Conclusions C. Faloutsos.
CMU SCS Data Mining in Streams and Graphs Christos Faloutsos CMU.
Introduction to Fractals and Fractal Dimension Christos Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #8: Fractals - introduction C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #8: Fractals - introduction C. Faloutsos.
1 Chapters 9 Self-SimilarTraffic. Chapter 9 – Self-Similar Traffic 2 Introduction- Motivation Validity of the queuing models we have studied depends on.
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
School of Computer Science Carnegie Mellon UIUC 04C. Faloutsos1 Advanced Data Mining Tools: Fractals and Power Laws for Graphs, Streams and Traditional.
CMU SCS : Multimedia Databases and Data Mining Lecture #9: Fractals – examples & algo’s C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos.
School of Computer Science Carnegie Mellon Data Mining using Fractals (fractals for fun and profit) Christos Faloutsos Carnegie Mellon University.
School of Computer Science Carnegie Mellon Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University.
Carnegie Mellon Finding patterns in large, real networks Christos Faloutsos CMU
RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School.
CMU SCS : Multimedia Databases and Data Mining Lecture #30: Data Mining - assoc. rules C. Faloutsos.
School of Computer Science Carnegie Mellon Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University.
CMU SCS : Multimedia Databases and Data Mining Lecture #18: SVD - part I (definitions) C. Faloutsos.
R-trees: An Average Case Analysis. R-trees - performance analysis How many disk (=node) accesses we ’ ll need for range nn spatial joins why does it matter?
The Power-Method: A Comprehensive Estimation Technique for Multi- Dimensional Queries Yufei Tao U. Hong Kong Christos Faloutsos CMU Dimitris Papadias Hong.
School of Computer Science Carnegie Mellon WRIGHT, 2005C. Faloutsos1 Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University.
Notices of the AMS, September Internet traffic Standard Poisson models don’t capture long-range correlations. Poisson Measured “bursty” on all time.
CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P9-1 Large Graph Mining: Power Tools and a Practitioner’s guide Christos Faloutsos Gary Miller Charalampos.
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - Metric trees C. Faloutsos.
Carnegie Mellon Data Mining – Research Directions C. Faloutsos CMU
SCS-CMU Data Mining Tools A crash course C. Faloutsos.
15-826: Multimedia Databases and Data Mining
Next Generation Data Mining Tools: SVD and Fractals
Indexing and Data Mining in Multimedia Databases
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
Graph and Tensor Mining for fun and profit
Feifei Li, Ching Chang, George Kollios, Azer Bestavros
15-826: Multimedia Databases and Data Mining
Data Mining using Fractals and Power laws
R-trees: An Average Case Analysis
Presentation transcript:

CMU SCS : Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos

CMU SCS Copyright: C. Faloutsos (2012)2 Must-read Material Christos Faloutsos and Ibrahim Kamel, Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension, Proc. ACM SIGACT- SIGMOD-SIGART PODS, May 1994, pp. 4-13, Minneapolis, MN. Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension

CMU SCS Copyright: C. Faloutsos (2012)3 Recommended Material optional, but very useful: Manfred Schroeder Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991 (on reserve in the library) – Chapter 10: boxcounting method – Chapter 1: Sierpinski triangle

CMU SCS Copyright: C. Faloutsos (2012)4 Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Data Mining

CMU SCS Copyright: C. Faloutsos (2012)5 Indexing - Detailed outline primary key indexing secondary key / multi-key indexing spatial access methods –z-ordering –R-trees –misc fractals –intro –applications text

CMU SCS Copyright: C. Faloutsos (2012)6 Intro to fractals - outline Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More examples and tools Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting plots

CMU SCS Copyright: C. Faloutsos (2012)7 Road end-points of Montgomery county: Q1: how many d.a. for an R-tree? Q2 : distribution? not uniform not Gaussian no rules?? Problem #1: GIS - points

CMU SCS Copyright: C. Faloutsos (2012)8 Problem #2 - spatial d.m. Galaxies (Sloan Digital Sky Survey w/ B. Nichol) - ‘spiral’ and ‘elliptical’ galaxies (stores and households...) - patterns? - attraction/repulsion? - how many ‘spi’ within r from an ‘ell’?

CMU SCS Copyright: C. Faloutsos (2012)9 Problem #3: traffic disk trace (from HP - J. Wilkes); Web traffic - fit a model how many explosions to expect? queue length distr.? time # bytes

CMU SCS Copyright: C. Faloutsos (2012)10 Problem #3: traffic time # bytes Poisson indep., ident. distr

CMU SCS Copyright: C. Faloutsos (2012)11 Problem #3: traffic time # bytes Poisson indep., ident. distr

CMU SCS Copyright: C. Faloutsos (2012)12 Problem #3: traffic time # bytes Poisson indep., ident. distr Q: Then, how to generate such bursty traffic?

CMU SCS Copyright: C. Faloutsos (2012)13 Common answer: Fractals / self-similarities / power laws Seminal works from Hilbert, Minkowski, Cantor, Mandelbrot, (Hausdorff, Lyapunov, Ken Wilson, …)

CMU SCS Copyright: C. Faloutsos (2012)14 Road map Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More examples and tools Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting plots

CMU SCS Copyright: C. Faloutsos (2012)15 What is a fractal? = self-similar point set, e.g., Sierpinski triangle:... zero area; infinite length! Dimensionality??

CMU SCS Copyright: C. Faloutsos (2012)16 Definitions (cont’d) Paradox: Infinite perimeter ; Zero area! ‘dimensionality’: between 1 and 2 actually: Log(3)/Log(2) =

CMU SCS Copyright: C. Faloutsos (2012)17 Dfn of fd: ONLY for a perfectly self-similar point set: =log(n)/log(f) = log(3)/log(2) = zero area; infinite length!

CMU SCS Copyright: C. Faloutsos (2012)18 Intrinsic (‘fractal’) dimension Q: fractal dimension of a line? A: 1 (= log(2)/log(2)!)

CMU SCS Copyright: C. Faloutsos (2012)19 Intrinsic (‘fractal’) dimension Q: fractal dimension of a line? A: 1 (= log(2)/log(2)!)

CMU SCS Copyright: C. Faloutsos (2012)20 Intrinsic (‘fractal’) dimension Q: dfn for a given set of points? yx

CMU SCS Copyright: C. Faloutsos (2012)21 Intrinsic (‘fractal’) dimension Q: fractal dimension of a line? A: nn ( <= r ) ~ r^1 (‘power law’: y=x^a) Q: fd of a plane? A: nn ( <= r ) ~ r^2 fd== slope of (log(nn) vs log(r) )

CMU SCS Copyright: C. Faloutsos (2012)22 Intrinsic (‘fractal’) dimension Algorithm, to estimate it? Notice avg nn(<=r) is exactly tot#pairs(<=r) / N including ‘mirror’ pairs

CMU SCS Copyright: C. Faloutsos (2012)23 Sierpinsky triangle log( r ) log(#pairs within <=r ) 1.58 == ‘correlation integral’

CMU SCS Copyright: C. Faloutsos (2012)24 Observations: Euclidean objects have integer fractal dimensions –point: 0 –lines and smooth curves: 1 –smooth surfaces: 2 fractal dimension -> roughness of the periphery

CMU SCS Copyright: C. Faloutsos (2012)25 Important properties fd = embedding dimension -> uniform pointset a point set may have several fd, depending on scale

CMU SCS Copyright: C. Faloutsos (2012)26 Important properties fd = embedding dimension -> uniform pointset a point set may have several fd, depending on scale 2-d

CMU SCS Copyright: C. Faloutsos (2012)27 Important properties fd = embedding dimension -> uniform pointset a point set may have several fd, depending on scale 1-d

CMU SCS Copyright: C. Faloutsos (2012)28 Important properties 0-d

CMU SCS Copyright: C. Faloutsos (2012)29 Road map Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More examples and tools Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting plots

CMU SCS Copyright: C. Faloutsos (2012)30 Cross-roads of Montgomery county: any rules? Problem #1: GIS points

CMU SCS Copyright: C. Faloutsos (2012)31 Solution #1 A: self-similarity -> fractals scale-free power-laws (y=x^a, F=C*r^(-2)) avg#neighbors(<= r ) = r^D log( r ) log(#pairs(within <= r)) 1.51

CMU SCS Copyright: C. Faloutsos (2012)32 Solution #1 A: self-similarity avg#neighbors(<= r ) ~ r^(1.51) log( r ) log(#pairs(within <= r)) 1.51

CMU SCS Copyright: C. Faloutsos (2012)33 Examples:MG county Montgomery County of MD (road end- points)

CMU SCS Copyright: C. Faloutsos (2012)34 Examples:LB county Long Beach county of CA (road end-points)

CMU SCS Copyright: C. Faloutsos (2012)35 Solution#2: spatial d.m. Galaxies ( ‘BOPS’ plot - [sigmod2000]) log(#pairs) log(r)

CMU SCS Copyright: C. Faloutsos (2012)36 Solution#2: spatial d.m. log(r) log(#pairs within <=r ) spi-spi spi-ell ell-ell slope - plateau! - repulsion!

CMU SCS Copyright: C. Faloutsos (2012)37 Spatial d.m. log(r) log(#pairs within <=r ) spi-spi spi-ell ell-ell slope - plateau! - repulsion!

CMU SCS Copyright: C. Faloutsos (2012)38 Spatial d.m. r1r2 r1 r2 Heuristic on choosing # of clusters

CMU SCS Copyright: C. Faloutsos (2012)39 Spatial d.m. log(r) log(#pairs within <=r ) spi-spi spi-ell ell-ell slope - plateau! - repulsion!

CMU SCS Copyright: C. Faloutsos (2012)40 Spatial d.m. log(r) log(#pairs within <=r ) spi-spi spi-ell ell-ell slope - plateau! -repulsion!! -duplicates

CMU SCS Copyright: C. Faloutsos (2012)41 Solution #3: traffic disk traces: self-similar: time #bytes

CMU SCS Copyright: C. Faloutsos (2012)42 Solution #3: traffic disk traces (80-20 ‘law’ = ‘multifractal’) time #bytes 20% 80%

CMU SCS Copyright: C. Faloutsos (2012) / multifractals 20 80

CMU SCS Copyright: C. Faloutsos (2012) / multifractals 20 p ; (1-p) in general yes, there are dependencies 80

CMU SCS Copyright: C. Faloutsos (2012)45 More on 80/20: PQRS Part of ‘self-* storage’ project [Wang+’02] time cylinder#

CMU SCS Copyright: C. Faloutsos (2012)46 More on 80/20: PQRS Part of ‘self-* storage’ project [Wang+’02] pq rs q rs

CMU SCS Copyright: C. Faloutsos (2012)47 Solution#3: traffic Clarification: fractal: a set of points that is self-similar multifractal: a probability density function that is self-similar Many other time-sequences are bursty/clustered: (such as?)

CMU SCS Copyright: C. Faloutsos (2012)48 Example: network traffic

CMU SCS Copyright: C. Faloutsos (2012)49 Web traffic [Crovella Bestavros, SIGMETRICS’96] 1000 sec; 100sec 10sec; 1sec

CMU SCS Copyright: C. Faloutsos (2012)50 Tape accesses time Tape#1 Tape# N # tapes needed, to retrieve n records? (# days down, due to failures / hurricanes / communication noise...)

CMU SCS Copyright: C. Faloutsos (2012)51 Tape accesses time Tape#1 Tape# N # tapes retrieved # qual. records = Poisson real

CMU SCS Copyright: C. Faloutsos (2012)52 Road map Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More tools and examples Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting plots

CMU SCS Copyright: C. Faloutsos (2012)53 A counter-intuitive example avg degree is, say 3.3 pick a node at random – guess its degree, exactly (-> “mode”) degree count avg: 3.3 ?

CMU SCS Copyright: C. Faloutsos (2012)54 A counter-intuitive example avg degree is, say 3.3 pick a node at random – guess its degree, exactly (-> “mode”) A: 1!! degree count avg: 3.3

CMU SCS Copyright: C. Faloutsos (2012)55 A counter-intuitive example avg degree is, say 3.3 pick a node at random - what is the degree you expect it to have? A: 1!! A’: very skewed distr. Corollary: the mean is meaningless! (and std -> infinity (!)) degree count avg: 3.3

CMU SCS Copyright: C. Faloutsos (2012)56 Rank exponent R Power law in the degree distribution [SIGCOMM99] internet domains log(rank) log(degree) att.com ibm.com

CMU SCS Copyright: C. Faloutsos (2012)57 More tools Zipf’s law Korcak’s law / “fat fractals”

CMU SCS Copyright: C. Faloutsos (2012)58 A famous power law: Zipf’s law Q: vocabulary word frequency in a document - any pattern? aaronzoo freq.

CMU SCS Copyright: C. Faloutsos (2012)59 A famous power law: Zipf’s law Bible - rank vs frequency (log-log) log(rank) log(freq) “a” “the”

CMU SCS Copyright: C. Faloutsos (2012)60 A famous power law: Zipf’s law Bible - rank vs frequency (log-log) similarly, in many other languages; for customers and sales volume; city populations etc etc log(rank) log(freq)

CMU SCS Copyright: C. Faloutsos (2012)61 A famous power law: Zipf’s law Zipf distr: freq = 1/ rank generalized Zipf: freq = 1 / (rank)^a log(rank) log(freq)

CMU SCS Copyright: C. Faloutsos (2012)62 Olympic medals (Sidney): rank log(#medals)

CMU SCS Copyright: C. Faloutsos (2012)63 Olympic medals (Sidney’00, Athens’04): log( rank) log(#medals)

CMU SCS Copyright: C. Faloutsos (2012)64 TELCO data # of service units count of customers ‘best customer’

CMU SCS Copyright: C. Faloutsos (2012)65 SALES data – store#96 # units sold count of products “aspirin”

CMU SCS Copyright: C. Faloutsos (2012)66 More power laws: areas – Korcak’s law Scandinavian lakes Any pattern?

CMU SCS Copyright: C. Faloutsos (2012)67 More power laws: areas – Korcak’s law Scandinavian lakes area vs complementary cumulative count (log-log axes) log(count( >= area)) log(area)

CMU SCS Copyright: C. Faloutsos (2012)68 More power laws: Korcak Japan islands; area vs cumulative count (log-log axes) log(area) log(count( >= area))

CMU SCS Copyright: C. Faloutsos (2012)69 (Korcak’s law: Aegean islands)

CMU SCS Copyright: C. Faloutsos (2012)70 Korcak’s law & “fat fractals” How to generate such regions?

CMU SCS Copyright: C. Faloutsos (2012)71 Korcak’s law & “fat fractals” Q: How to generate such regions? A: recursively, from a single region...

CMU SCS Copyright: C. Faloutsos (2012)72 so far we’ve seen: concepts: –fractals, multifractals and fat fractals tools: –correlation integral (= pair-count plot) –rank/frequency plot (Zipf’s law) –CCDF (Korcak’s law)

CMU SCS Copyright: C. Faloutsos (2012)73 so far we’ve seen: concepts: –fractals, multifractals and fat fractals tools: –correlation integral (= pair-count plot) –rank/frequency plot (Zipf’s law) –CCDF (Korcak’s law) same info

CMU SCS Copyright: C. Faloutsos (2012)74 Road map Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More tools and examples Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting plots

CMU SCS Copyright: C. Faloutsos (2012)75 Other applications: Internet How does the internet look like? CMU

CMU SCS Copyright: C. Faloutsos (2012)76 Other applications: Internet How does the internet look like? Internet routers: how many neighbors within h hops? CMU

CMU SCS Copyright: C. Faloutsos (2012)77 (reminder: our tool-box:) concepts: –fractals, multifractals and fat fractals tools: –correlation integral (= pair-count plot) –rank/frequency plot (Zipf’s law) –CCDF (Korcak’s law)

CMU SCS Copyright: C. Faloutsos (2012)78 Internet topology Internet routers: how many neighbors within h hops? Reachability function: number of neighbors within r hops, vs r (log- log). Mbone routers, 1995 log(hops) log(#pairs) 2.8

CMU SCS Copyright: C. Faloutsos (2012)79 More power laws on the Internet degree vs rank, for Internet domains (log-log) [sigcomm99] log(rank) log(degree) -0.82

CMU SCS Copyright: C. Faloutsos (2012)80 More power laws - internet pdf of degrees: (slope: 2.2 ) Log(count) Log(degree) -2.2

CMU SCS Copyright: C. Faloutsos (2012)81 Even more power laws on the Internet Scree plot for Internet domains (log- log) [sigcomm99] log(i) log( i-th eigenvalue) 0.47

CMU SCS Copyright: C. Faloutsos (2012)82 Fractals & power laws: appear in numerous settings: medical geographical / geological social computer-system related

CMU SCS Copyright: C. Faloutsos (2012)83 More apps: Brain scans Oct-trees; brain-scans octree levels Log(#octants)

CMU SCS Copyright: C. Faloutsos (2012)84 More apps: Brain scans Oct-trees; brain-scans octree levels Log(#octants) 2.63 = fd

CMU SCS Copyright: C. Faloutsos (2012)85 More apps: Medical images [Burdett et al, SPIE ‘93]: benign tumors: fd ~ 2.37 malignant: fd ~ 2.56

CMU SCS Copyright: C. Faloutsos (2012)86 More fractals: cardiovascular system: 3 (!) lungs: 2.9

CMU SCS Copyright: C. Faloutsos (2012)87 Fractals & power laws: appear in numerous settings: medical geographical / geological social computer-system related

CMU SCS Copyright: C. Faloutsos (2012)88 More fractals: Coastlines:

CMU SCS Copyright: C. Faloutsos (2012)89

CMU SCS Copyright: C. Faloutsos (2012)90 More fractals: the fractal dimension for the Amazon river is 1.85 (Nile: 1.4) [ems.gphys.unc.edu/nonlinear/fractals/examples.html]

CMU SCS Copyright: C. Faloutsos (2012)91 More fractals: the fractal dimension for the Amazon river is 1.85 (Nile: 1.4) [ems.gphys.unc.edu/nonlinear/fractals/examples.html]

CMU SCS Copyright: C. Faloutsos (2012)92 More power laws Energy of earthquakes (Gutenberg-Richter law) [simscience.org] log(freq) magnitudeday amplitude

CMU SCS Copyright: C. Faloutsos (2012)93 Fractals & power laws: appear in numerous settings: medical geographical / geological social computer-system related

CMU SCS Copyright: C. Faloutsos (2012)94 More fractals: stock prices (LYCOS) - random walks: year2 years

CMU SCS Copyright: C. Faloutsos (2012)95 Even more power laws: Income distribution (Pareto’s law) size of firms publication counts (Lotka’s law)

CMU SCS Copyright: C. Faloutsos (2012)96 Even more power laws: library science (Lotka’s law of publication count); and citation counts: (citeseer.nj.nec.com 6/2001) log(#citations) log(count) Ullman

CMU SCS Copyright: C. Faloutsos (2012)97 Even more power laws: web hit counts [w/ A. Montgomery] Web Site Traffic log(freq) log(count) Zipf “yahoo.com”

CMU SCS Copyright: C. Faloutsos (2012)98 Fractals & power laws: appear in numerous settings: medical geographical / geological social computer-system related

CMU SCS Copyright: C. Faloutsos (2012)99 Power laws, cont’d In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER] log indegree - log(freq) from [Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins ]

CMU SCS Copyright: C. Faloutsos (2012)100 Power laws, cont’d In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER] log indegree log(freq) from [Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins ]

CMU SCS Copyright: C. Faloutsos (2012)101 “Foiled by power law” [Broder+, WWW’00] (log) in-degree (log) count

CMU SCS Copyright: C. Faloutsos (2012)102 “Foiled by power law” [Broder+, WWW’00] “The anomalous bump at 120 on the x-axis is due a large clique formed by a single spammer” (log) in-degree (log) count

CMU SCS Copyright: C. Faloutsos (2012)103 Power laws, cont’d In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER] length of file transfers [Crovella+Bestavros ‘96] duration of UNIX jobs [Harchol-Balter]

CMU SCS Copyright: C. Faloutsos (2012)104 Even more power laws: Distribution of UNIX file sizes web hit counts [Huberman]

CMU SCS Copyright: C. Faloutsos (2012)105 Road map Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More examples and tools Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting plots

CMU SCS Copyright: C. Faloutsos (2012)106 What else can they solve? separability [KDD’02] forecasting [CIKM’02] dimensionality reduction [SBBD’00] non-linear axis scaling [KDD’02] disk trace modeling [Wang+’02] selectivity of spatial/multimedia queries [PODS’94, VLDB’95, ICDE’00]...

CMU SCS Copyright: C. Faloutsos (2012)107 Settings for fractals: Points; areas (-> fat fractals), eg:

CMU SCS Copyright: C. Faloutsos (2012)108 Settings for fractals: Points; areas, eg: cities/stores/hospitals, over earth’s surface time-stamps of events (customer arrivals, packet losses, criminal actions) over time regions (sales areas, islands, patches of habitats) over space

CMU SCS Copyright: C. Faloutsos (2012)109 Settings for fractals: customer feature vectors (age, income, frequency of visits, amount of sales per visit) ‘good’ customers ‘bad’ customers

CMU SCS Copyright: C. Faloutsos (2012)110 Some uses of fractals: Detect non-existence of rules (if points are uniform) Detect non-homogeneous regions (eg., legal login time-stamps may have different fd than intruders’) Estimate number of neighbors / customers / competitors within a radius

CMU SCS Copyright: C. Faloutsos (2012)111 Multi-Fractals Setting: points or objects, w/ some value, eg: –cities w/ populations –positions on earth and amount of gold/water/oil underneath –product ids and sales per product –people and their salaries –months and count of accidents

CMU SCS Copyright: C. Faloutsos (2012)112 Use of multifractals: Estimate tape/disk accesses – how many of the 100 tapes contain my 50 phonecall records? – how many days without an accident? time Tape#1 Tape# N

CMU SCS Copyright: C. Faloutsos (2012)113 Use of multifractals how often do we exceed the threshold? time #bytes Poisson

CMU SCS Copyright: C. Faloutsos (2012)114 Use of multifractals cont’d Extrapolations for/from samples time #bytes

CMU SCS Copyright: C. Faloutsos (2012)115 Use of multifractals cont’d How many distinct products account for 90% of the sales? 20% 80%

CMU SCS Copyright: C. Faloutsos (2012)116 Road map Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More examples and tools Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting plots

CMU SCS Copyright: C. Faloutsos (2012)117 Conclusions Real data often disobey textbook assumptions (Gaussian, Poisson, uniformity, independence)

CMU SCS Copyright: C. Faloutsos (2012)118 Conclusions - cont’d Self-similarity & power laws: appear in many cases Bad news: lead to skewed distributions (no Gaussian, Poisson, uniformity, independence, mean, variance)

CMU SCS Copyright: C. Faloutsos (2012)119 Conclusions - cont’d Self-similarity & power laws: appear in many cases Bad news: lead to skewed distributions (no Gaussian, Poisson, uniformity, independence, mean, variance) Good news: ‘correlation integral’ for separability rank/frequency plots (multifractals) (Hurst exponent, strange attractors, renormalization theory, ++) X

CMU SCS Copyright: C. Faloutsos (2012)120 Conclusions tool#1: (for points) ‘correlation integral’: (#pairs within <= r) vs (distance r) tool#2: (for categorical values) rank- frequency plot (a’la Zipf) tool#3: (for numerical values) CCDF: Complementary cumulative distr. function (#of elements with value >= a )

CMU SCS Copyright: C. Faloutsos (2012)121 Practitioner’s guide: tool#1: #pairs vs distance, for a set of objects, with a distance function (slope = intrinsic dimensionality) log(hops) log(#pairs) 2.8 log( r ) log(#pairs(within <= r)) 1.51 internet MGcounty

CMU SCS Copyright: C. Faloutsos (2012)122 Practitioner’s guide: tool#2: rank-frequency plot (for categorical attributes) log(rank) log(degree) internet domains Bible log(freq) log(rank)

CMU SCS Copyright: C. Faloutsos (2012)123 Practitioner’s guide: tool#3: CCDF, for (skewed) numerical attributes, eg. areas of islands/lakes, UNIX jobs...) log(count( >= area)) log(area) scandinavian lakes

CMU SCS Copyright: C. Faloutsos (2012)124 Resources: Software for fractal dimension – –And specifically ‘fdnq_h’: – Also, in ‘R’: ‘fdim’ package

CMU SCS Copyright: C. Faloutsos (2012)125 Books Strongly recommended intro book: –Manfred Schroeder Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991 Classic book on fractals: –B. Mandelbrot Fractal Geometry of Nature, W.H. Freeman, 1977

CMU SCS Copyright: C. Faloutsos (2012)126 References [vldb95] Alberto Belussi and Christos Faloutsos, Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension Proc. of VLDB, p , 1995 [Broder+’00] Andrei Broder, Ravi Kumar, Farzin Maghoul1, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, Janet Wiener, Graph structure in the web, WWW’00 M. Crovella and A. Bestavros, Self similarity in World wide web traffic: Evidence and possible causes, SIGMETRICS ’96.

CMU SCS Copyright: C. Faloutsos (2012)127 References –[ieeeTN94] W. E. Leland, M.S. Taqqu, W. Willinger, D.V. Wilson, On the Self-Similar Nature of Ethernet Traffic, IEEE Transactions on Networking, 2, 1, pp 1- 15, Feb – [pods94] Christos Faloutsos and Ibrahim Kamel, Beyond Uniformity and Independence: Analysis of R- trees Using the Concept of Fractal Dimension, PODS, Minneapolis, MN, May 24-26, 1994, pp. 4-13

CMU SCS Copyright: C. Faloutsos (2012)128 References –[vldb96] Christos Faloutsos, Yossi Matias and Avi Silberschatz, Modeling Skewed Distributions Using Multifractals and the `80-20 Law’ Conf. on Very Large Data Bases (VLDB), Bombay, India, Sept

CMU SCS Copyright: C. Faloutsos (2012)129 References –[vldb96] Christos Faloutsos and Volker Gaede Analysis of the Z-Ordering Method Using the Hausdorff Fractal Dimension VLD, Bombay, India, Sept –[sigcomm99] Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos, What does the Internet look like? Empirical Laws of the Internet Topology, SIGCOMM 1999

CMU SCS Copyright: C. Faloutsos (2012)130 References –[icde99] Guido Proietti and Christos Faloutsos, I/O complexity for range queries on region data stored using an R-tree International Conference on Data Engineering (ICDE), Sydney, Australia, March 23-26, 1999 –[sigmod2000] Christos Faloutsos, Bernhard Seeger, Agma J. M. Traina and Caetano Traina Jr., Spatial Join Selectivity Using Power Laws, SIGMOD 2000

CMU SCS Copyright: C. Faloutsos (2012)131 References - [Wang+’02] Mengzhi Wang, Anastassia Ailamaki and Christos Faloutsos, Capturing the spatio-temporal behavior of real traffic data Performance 2002 (IFIP Int. Symp. on Computer Performance Modeling, Measurement and Evaluation), Rome, Italy, Sept Capturing the spatio-temporal behavior of real traffic data

CMU SCS Copyright: C. Faloutsos (2012)132 Appendix - Gory details Bad news: There are more than one fractal dimensions –Minkowski fd; Hausdorff fd; Correlation fd; Information fd Great news: –they can all be computed fast! –they usually have nearby values

CMU SCS Copyright: C. Faloutsos (2012)133 Fast estimation of fd(s): How, for the (correlation) fractal dimension? A: Box-counting plot: log( r ) r pi log(sum(pi ^2))

CMU SCS Copyright: C. Faloutsos (2012)134 Definitions pi : the percentage (or count) of points in the i-th cell r: the side of the grid

CMU SCS Copyright: C. Faloutsos (2012)135 Fast estimation of fd(s): compute sum(pi^2) for another grid side, r’ log( r ) r’ pi’ log(sum(pi ^2))

CMU SCS Copyright: C. Faloutsos (2012)136 Fast estimation of fd(s): etc; if the resulting plot has a linear part, its slope is the correlation fractal dimension D2 log( r ) log(sum(pi ^2))

CMU SCS Copyright: C. Faloutsos (2012)137 Definitions (cont’d) Many more fractal dimensions Dq (related to Renyi entropies):

CMU SCS Copyright: C. Faloutsos (2012)138 Hausdorff or box-counting fd: Box counting plot: Log( N ( r ) ) vs Log ( r) r: grid side N (r ): count of non-empty cells (Hausdorff) fractal dimension D0:

CMU SCS Copyright: C. Faloutsos (2012)139 Definitions (cont’d) Hausdorff fd: r log(r) log(#non-empty cells) D0D0

CMU SCS Copyright: C. Faloutsos (2012)140 Observations q=0: Hausdorff fractal dimension q=2: Correlation fractal dimension (identical to the exponent of the number of neighbors vs radius) q=1: Information fractal dimension

CMU SCS Copyright: C. Faloutsos (2012)141 Observations, cont’d in general, the Dq’s take similar, but not identical, values. except for perfectly self-similar point-sets, where Dq=Dq’ for any q, q’

CMU SCS Copyright: C. Faloutsos (2012)142 Examples:MG county Montgomery County of MD (road end- points)

CMU SCS Copyright: C. Faloutsos (2012)143 Examples:LB county Long Beach county of CA (road end-points)

CMU SCS Copyright: C. Faloutsos (2012)144 Conclusions many fractal dimensions, with nearby values can be computed quickly (O(N) or O(N log(N)) (code: on the web: – –Or `R’ (‘fdim’ package)