CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #9: Fractals – examples & algo’s C. Faloutsos.

Slides:



Advertisements
Similar presentations
CMU SCS : Multimedia Databases and Data Mining Lecture #17: Text - part IV (LSI) C. Faloutsos.
Advertisements

CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - IV Grid files, dim. curse C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #4: Multi-key and Spatial Access Methods - I C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #19: SVD - part II (case studies) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - Metric trees C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals - case studies - I C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture#5: Multi-key and Spatial Access Methods - II C. Faloutsos.
Indexing and Data Mining in Multimedia Databases Christos Faloutsos CMU
CMU SCS : Multimedia Databases and Data Mining Lecture #9: Fractals - introduction C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
15-826: Multimedia Databases and Data Mining
CMU SCS : Multimedia Databases and Data Mining Lecture #16: Text - part III: Vector space model and clustering C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #11: Fractals - case studies Part III (regions, quadtrees, knn queries) C. Faloutsos.
School of Computer Science Carnegie Mellon Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University.
1 Complex systems Made of many non-identical elements connected by diverse interactions. NETWORK New York Times Slides: thanks to A-L Barabasi.
Web as Graph – Empirical Studies The Structure and Dynamics of Networks.
CMU SCS Data Mining Meets Systems: Tools and Case Studies Christos Faloutsos SCS CMU.
Indexing and Data Mining in Multimedia Databases Christos Faloutsos CMU
Analysis of the Internet Topology Michalis Faloutsos, U.C. Riverside (PI) Christos Faloutsos, CMU (sub- contract, co-PI) DARPA NMS, no
CMU SCS Graph and stream mining Christos Faloutsos CMU.
1 ISI’02 Multidimensional Databases Challenge: representation for efficient storage, indexing & querying Examples (time-series, images) New multidimensional.
School of Computer Science Carnegie Mellon Boston U., 2005C. Faloutsos1 Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University.
10-603/15-826A: Multimedia Databases and Data Mining SVD - part I (definitions) C. Faloutsos.
Multimedia Databases Text II. Outline Spatial Databases Temporal Databases Spatio-temporal Databases Multimedia Databases Text databases Image and video.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 7 May 14, 2006
Carnegie Mellon Powerful Tools for Data Mining Fractals, Power laws, SVD C. Faloutsos Carnegie Mellon University.
Self-Similar through High-Variability: Statistical Analysis of Ethernet LAN Traffic at the Source Level Walter Willinger, Murad S. Taqqu, Robert Sherman,
Spatial Indexing. Spatial Queries Given a collection of geometric objects (points, lines, polygons,...) organize them on disk, to answer point queries.
Data Mining using Fractals and Power laws
CMU SCS : Multimedia Databases and Data Mining Lecture #30: Conclusions C. Faloutsos.
CMU SCS Data Mining in Streams and Graphs Christos Faloutsos CMU.
Introduction to Fractals and Fractal Dimension Christos Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #8: Fractals - introduction C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #8: Fractals - introduction C. Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Lecture #10: Fractals: M-trees and dim. curse (case studies – Part II) C. Faloutsos.
School of Computer Science Carnegie Mellon UIUC 04C. Faloutsos1 Advanced Data Mining Tools: Fractals and Power Laws for Graphs, Streams and Traditional.
CMU SCS : Multimedia Databases and Data Mining Lecture #12: Fractals - case studies Part III (quadtrees, knn queries) C. Faloutsos.
School of Computer Science Carnegie Mellon Data Mining using Fractals (fractals for fun and profit) Christos Faloutsos Carnegie Mellon University.
School of Computer Science Carnegie Mellon Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University.
CMU SCS : Multimedia Databases and Data Mining Lecture #30: Data Mining - assoc. rules C. Faloutsos.
School of Computer Science Carnegie Mellon Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University.
CMU SCS : Multimedia Databases and Data Mining Lecture #18: SVD - part I (definitions) C. Faloutsos.
R-trees: An Average Case Analysis. R-trees - performance analysis How many disk (=node) accesses we ’ ll need for range nn spatial joins why does it matter?
The Power-Method: A Comprehensive Estimation Technique for Multi- Dimensional Queries Yufei Tao U. Hong Kong Christos Faloutsos CMU Dimitris Papadias Hong.
23 1 Christian Böhm 1, Florian Krebs 2, and Hans-Peter Kriegel 2 1 University for Health Informatics and Technology, Innsbruck 2 University of Munich Optimal.
School of Computer Science Carnegie Mellon WRIGHT, 2005C. Faloutsos1 Data Mining using Fractals and Power laws Christos Faloutsos Carnegie Mellon University.
CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P9-1 Large Graph Mining: Power Tools and a Practitioner’s guide Christos Faloutsos Gary Miller Charalampos.
CMU SCS : Multimedia Databases and Data Mining Lecture #7: Spatial Access Methods - Metric trees C. Faloutsos.
Carnegie Mellon Data Mining – Research Directions C. Faloutsos CMU
SCS-CMU Data Mining Tools A crash course C. Faloutsos.
Next Generation Data Mining Tools: SVD and Fractals
Indexing and Data Mining in Multimedia Databases
15-826: Multimedia Databases and Data Mining
Finding patterns in large, real networks
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
Mark E. Crovella and Azer Bestavros Computer Science Dept,
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
15-826: Multimedia Databases and Data Mining
Feifei Li, Ching Chang, George Kollios, Azer Bestavros
15-826: Multimedia Databases and Data Mining
Data Mining using Fractals and Power laws
R-trees: An Average Case Analysis
15-826: Multimedia Databases and Data Mining
Presentation transcript:

CMU SCS : Multimedia Databases and Data Mining Lecture #9: Fractals – examples & algo’s C. Faloutsos

CMU SCS Copyright: C. Faloutsos (2014)2 Must-read Material Christos Faloutsos and Ibrahim Kamel, Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension, Proc. ACM SIGACT- SIGMOD-SIGART PODS, May 1994, pp. 4-13, Minneapolis, MN. Beyond Uniformity and Independence: Analysis of R-trees Using the Concept of Fractal Dimension

CMU SCS Copyright: C. Faloutsos (2014)3 Recommended Material optional, but very useful: Manfred Schroeder Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991 – Chapter 10: boxcounting method – Chapter 1: Sierpinski triangle

CMU SCS Copyright: C. Faloutsos (2014)4 Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Data Mining

CMU SCS Copyright: C. Faloutsos (2014)5 Indexing - Detailed outline primary key indexing secondary key / multi-key indexing spatial access methods –z-ordering –R-trees –misc fractals –intro –applications text

CMU SCS Copyright: C. Faloutsos (2014)6 Road map Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More tools and examples Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting plots

CMU SCS Copyright: C. Faloutsos (2014)7 More power laws on the Internet degree vs rank, for Internet domains (log-log) [sigcomm99] log(rank) log(degree) -0.82

CMU SCS Copyright: C. Faloutsos (2014)8 More power laws - internet pdf of degrees: (slope: 2.2 ) Log(count) Log(degree) -2.2

CMU SCS Copyright: C. Faloutsos (2014)9 Even more power laws on the Internet Scree plot for Internet domains (log- log) [sigcomm99] log(i) log( i-th eigenvalue) 0.47

CMU SCS Copyright: C. Faloutsos (2014)10 Fractals & power laws: appear in numerous settings: medical geographical / geological social computer-system related

CMU SCS Copyright: C. Faloutsos (2014)11 More apps: Brain scans Oct-trees; brain-scans octree levels Log(#octants)

CMU SCS Copyright: C. Faloutsos (2014)12 More apps: Brain scans Oct-trees; brain-scans octree levels Log(#octants) 2.63 = fd

CMU SCS Copyright: C. Faloutsos (2014)13 More apps: Medical images [Burdett et al, SPIE ‘93]: benign tumors: fd ~ 2.37 malignant: fd ~ 2.56

CMU SCS Copyright: C. Faloutsos (2014)14 More fractals: cardiovascular system: 3 (!) lungs: 2.9

CMU SCS Copyright: C. Faloutsos (2014)15 Fractals & power laws: appear in numerous settings: medical geographical / geological social computer-system related

CMU SCS Copyright: C. Faloutsos (2014)16 More fractals: Coastlines:

CMU SCS Copyright: C. Faloutsos (2014)17

CMU SCS Copyright: C. Faloutsos (2014)18 More fractals: the fractal dimension for the Amazon river is 1.85 (Nile: 1.4) [ems.gphys.unc.edu/nonlinear/fractals/examples.html]

CMU SCS Copyright: C. Faloutsos (2014)19 More fractals: the fractal dimension for the Amazon river is 1.85 (Nile: 1.4) [ems.gphys.unc.edu/nonlinear/fractals/examples.html]

CMU SCS Copyright: C. Faloutsos (2014)20 More power laws Energy of earthquakes (Gutenberg-Richter law) [simscience.org] log(freq) magnitudeday amplitude

CMU SCS Copyright: C. Faloutsos (2014)21 Fractals & power laws: appear in numerous settings: medical geographical / geological social computer-system related

CMU SCS Copyright: C. Faloutsos (2014)22 More fractals: stock prices (LYCOS) - random walks: year2 years

CMU SCS Copyright: C. Faloutsos (2014)23 Even more power laws: Income distribution (Pareto’s law) size of firms publication counts (Lotka’s law)

CMU SCS Copyright: C. Faloutsos (2014)24 Even more power laws: web hit counts [w/ A. Montgomery] Web Site Traffic log(freq) log(count) Zipf “yahoo.com”

CMU SCS Copyright: C. Faloutsos (2014)25 Fractals & power laws: appear in numerous settings: medical geographical / geological social computer-system related

CMU SCS Copyright: C. Faloutsos (2014)26 Power laws, cont’d In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER] log indegree - log(freq) from [Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins ]

CMU SCS Copyright: C. Faloutsos (2014)27 Power laws, cont’d In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER] log indegree log(freq) from [Ravi Kumar, Prabhakar Raghavan, Sridhar Rajagopalan, Andrew Tomkins ]

CMU SCS Copyright: C. Faloutsos (2014)28 “Foiled by power law” [Broder+, WWW’00] (log) in-degree (log) count

CMU SCS Copyright: C. Faloutsos (2014)29 “Foiled by power law” [Broder+, WWW’00] “The anomalous bump at 120 on the x-axis is due a large clique formed by a single spammer” (log) in-degree (log) count

CMU SCS Copyright: C. Faloutsos (2014)30 Power laws, cont’d In- and out-degree distribution of web sites [Barabasi], [IBM-CLEVER] length of file transfers [Crovella+Bestavros ‘96] duration of UNIX jobs [Harchol-Balter]

CMU SCS Copyright: C. Faloutsos (2014)31 Even more power laws: Distribution of UNIX file sizes web hit counts [Huberman]

CMU SCS Copyright: C. Faloutsos (2014)32 Road map Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More examples and tools Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting plots

CMU SCS Copyright: C. Faloutsos (2014)33 What else can they solve? separability [KDD’02] forecasting [CIKM’02] dimensionality reduction [SBBD’00] non-linear axis scaling [KDD’02] disk trace modeling [Wang+’02] selectivity of spatial/multimedia queries [PODS’94, VLDB’95, ICDE’00]... time #bytes

CMU SCS Copyright: C. Faloutsos (2014)34 Settings for fractals: Points; areas (-> fat fractals), eg:

CMU SCS Copyright: C. Faloutsos (2014)35 Settings for fractals: Points; areas, eg: cities/stores/hospitals, over earth’s surface time-stamps of events (customer arrivals, packet losses, criminal actions) over time regions (sales areas, islands, patches of habitats) over space

CMU SCS Copyright: C. Faloutsos (2014)36 Settings for fractals: customer feature vectors (age, income, frequency of visits, amount of sales per visit) ‘good’ customers ‘bad’ customers

CMU SCS Copyright: C. Faloutsos (2014)37 Some uses of fractals: Detect non-existence of rules (if points are uniform) Detect non-homogeneous regions (eg., legal login time-stamps may have different fd than intruders’) Estimate number of neighbors / customers / competitors within a radius

CMU SCS Copyright: C. Faloutsos (2014)38 Multi-Fractals Setting: points or objects, w/ some value, eg: –cities w/ populations –positions on earth and amount of gold/water/oil underneath –product ids and sales per product –people and their salaries –months and count of accidents

CMU SCS Copyright: C. Faloutsos (2014)39 Use of multifractals: Estimate tape/disk accesses – how many of the 100 tapes contain my 50 phonecall records? – how many days without an accident? time Tape#1 Tape# N

CMU SCS Copyright: C. Faloutsos (2014)40 Use of multifractals how often do we exceed the threshold? time #bytes Poisson

CMU SCS Copyright: C. Faloutsos (2014)41 Use of multifractals cont’d Extrapolations for/from samples time #bytes

CMU SCS Copyright: C. Faloutsos (2014)42 Use of multifractals cont’d How many distinct products account for 90% of the sales? 20% 80%

CMU SCS Copyright: C. Faloutsos (2014)43 Road map Motivation – 3 problems / case studies Definition of fractals and power laws Solutions to posed problems More examples and tools Discussion - putting fractals to work! Conclusions – practitioner’s guide Appendix: gory details - boxcounting plots

CMU SCS Copyright: C. Faloutsos (2014)44 Conclusions Real data often disobey textbook assumptions (Gaussian, Poisson, uniformity, independence)

CMU SCS Copyright: C. Faloutsos (2014)45 Conclusions Real data often disobey textbook assumptions (Gaussian, Poisson, uniformity, independence)

CMU SCS Copyright: C. Faloutsos (2014)46 Conclusions - cont’d Self-similarity & power laws: appear in many cases Bad news: lead to skewed distributions (no Gaussian, Poisson, uniformity, independence, mean, variance)

CMU SCS Copyright: C. Faloutsos (2014)47 Conclusions - cont’d Self-similarity & power laws: appear in many cases Bad news: lead to skewed distributions (no Gaussian, Poisson, uniformity, independence, mean, variance) Good news: ‘correlation integral’ for separability rank/frequency plots (multifractals) (Hurst exponent, strange attractors, renormalization theory, ++)

CMU SCS Copyright: C. Faloutsos (2014)48 Conclusions tool#1: (for points) ‘correlation integral’: (#pairs within <= r) vs (distance r) tool#2: (for categorical values) rank- frequency plot (a’la Zipf) tool#3: (for numerical values) CCDF: Complementary cumulative distr. function (#of elements with value >= a )

CMU SCS Copyright: C. Faloutsos (2014)49 Practitioner’s guide: tool#1: #pairs vs distance, for a set of objects, with a distance function (slope = intrinsic dimensionality) log(hops) log(#pairs) 2.8 log( r ) log(#pairs(within <= r)) 1.51 internet MGcounty

CMU SCS Copyright: C. Faloutsos (2014)50 Practitioner’s guide: tool#2: rank-frequency plot (for categorical attributes) log(rank) log(degree) internet domains Bible log(freq) log(rank)

CMU SCS Copyright: C. Faloutsos (2014)51 Practitioner’s guide: tool#3: CCDF, for (skewed) numerical attributes, eg. areas of islands/lakes, UNIX jobs...) log(count( >= area)) log(area) scandinavian lakes

CMU SCS Copyright: C. Faloutsos (2014)52 Resources: Software for fractal dimension – –And specifically ‘fdnq_h’: – Also, in ‘R’: ‘fdim’ package

CMU SCS Copyright: C. Faloutsos (2014)53 Books Strongly recommended intro book: –Manfred Schroeder Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise W.H. Freeman and Company, 1991 Classic book on fractals: –B. Mandelbrot Fractal Geometry of Nature, W.H. Freeman, 1977

CMU SCS Copyright: C. Faloutsos (2014)54 References [vldb95] Alberto Belussi and Christos Faloutsos, Estimating the Selectivity of Spatial Queries Using the `Correlation' Fractal Dimension Proc. of VLDB, p , 1995 [Broder+’00] Andrei Broder, Ravi Kumar, Farzin Maghoul1, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, Janet Wiener, Graph structure in the web, WWW’00 M. Crovella and A. Bestavros, Self similarity in World wide web traffic: Evidence and possible causes, SIGMETRICS ’96.

CMU SCS Copyright: C. Faloutsos (2014)55 References –[ieeeTN94] W. E. Leland, M.S. Taqqu, W. Willinger, D.V. Wilson, On the Self-Similar Nature of Ethernet Traffic, IEEE Transactions on Networking, 2, 1, pp 1- 15, Feb – [pods94] Christos Faloutsos and Ibrahim Kamel, Beyond Uniformity and Independence: Analysis of R- trees Using the Concept of Fractal Dimension, PODS, Minneapolis, MN, May 24-26, 1994, pp. 4-13

CMU SCS Copyright: C. Faloutsos (2014)56 References –[vldb96] Christos Faloutsos, Yossi Matias and Avi Silberschatz, Modeling Skewed Distributions Using Multifractals and the `80-20 Law’ Conf. on Very Large Data Bases (VLDB), Bombay, India, Sept

CMU SCS Copyright: C. Faloutsos (2014)57 References –[vldb96] Christos Faloutsos and Volker Gaede Analysis of the Z-Ordering Method Using the Hausdorff Fractal Dimension VLD, Bombay, India, Sept –[sigcomm99] Michalis Faloutsos, Petros Faloutsos and Christos Faloutsos, What does the Internet look like? Empirical Laws of the Internet Topology, SIGCOMM 1999

CMU SCS Copyright: C. Faloutsos (2014)58 References –[icde99] Guido Proietti and Christos Faloutsos, I/O complexity for range queries on region data stored using an R-tree International Conference on Data Engineering (ICDE), Sydney, Australia, March 23-26, 1999 –[sigmod2000] Christos Faloutsos, Bernhard Seeger, Agma J. M. Traina and Caetano Traina Jr., Spatial Join Selectivity Using Power Laws, SIGMOD 2000

CMU SCS Copyright: C. Faloutsos (2014)59 References - [Wang+’02] Mengzhi Wang, Anastassia Ailamaki and Christos Faloutsos, Capturing the spatio-temporal behavior of real traffic data Performance 2002 (IFIP Int. Symp. on Computer Performance Modeling, Measurement and Evaluation), Rome, Italy, Sept Capturing the spatio-temporal behavior of real traffic data

CMU SCS Copyright: C. Faloutsos (2014)60 Appendix - Gory details Bad news: There are more than one fractal dimensions –Minkowski fd; Hausdorff fd; Correlation fd; Information fd Great news: –they can all be computed fast! –they usually have nearby values

CMU SCS Copyright: C. Faloutsos (2014)61 Fast estimation of fd(s): How, for the (correlation) fractal dimension? A: Box-counting plot: log( r ) r pi log(sum(pi ^2))

CMU SCS Copyright: C. Faloutsos (2014)62 Definitions pi : the percentage (or count) of points in the i-th cell r: the side of the grid

CMU SCS Copyright: C. Faloutsos (2014)63 Fast estimation of fd(s): compute sum(pi^2) for another grid side, r’ log( r ) r’ pi’ log(sum(pi ^2))

CMU SCS Copyright: C. Faloutsos (2014)64 Fast estimation of fd(s): etc; if the resulting plot has a linear part, its slope is the correlation fractal dimension D2 log( r ) log(sum(pi ^2))

CMU SCS Copyright: C. Faloutsos (2014)65 Definitions (cont’d) Many more fractal dimensions Dq (related to Renyi entropies):

CMU SCS Copyright: C. Faloutsos (2014)66 Hausdorff or box-counting fd: Box counting plot: Log( N ( r ) ) vs Log ( r) r: grid side N (r ): count of non-empty cells (Hausdorff) fractal dimension D0:

CMU SCS Copyright: C. Faloutsos (2014)67 Definitions (cont’d) Hausdorff fd: r log(r) log(#non-empty cells) D0D0

CMU SCS Copyright: C. Faloutsos (2014)68 Observations q=0: Hausdorff fractal dimension q=2: Correlation fractal dimension (identical to the exponent of the number of neighbors vs radius) q=1: Information fractal dimension

CMU SCS Copyright: C. Faloutsos (2014)69 Observations, cont’d in general, the Dq’s take similar, but not identical, values. except for perfectly self-similar point-sets, where Dq=Dq’ for any q, q’

CMU SCS Copyright: C. Faloutsos (2014)70 Examples:MG county Montgomery County of MD (road end- points)

CMU SCS Copyright: C. Faloutsos (2014)71 Examples:LB county Long Beach county of CA (road end-points)

CMU SCS Copyright: C. Faloutsos (2014)72 Conclusions many fractal dimensions, with nearby values can be computed quickly (O(N) or O(N log(N)) (code: on the web: – –Or `R’ (‘fdim’ package)