
1 CMU SCS: Graph and stream mining. Christos Faloutsos, CMU

2 CMU SCS, CALD IC 2004, © C. Faloutsos (2004). CONGRATULATIONS!

3 Outline: Problem definition / Motivation; Graphs and power laws; Streams and forecasting; Conclusions

4 Motivation: data mining ≈ finding patterns. What do real graphs look like? What do (numerical) streams look like?

5 Joint work with Deepayan Chakrabarti (CMU, CALD)

6 Graphs: why should we care? Examples (figures): Internet Map [lumeta.com], Food Web [Martinez ’91], Protein Interactions [genomebiology.com], Friendship Network [Moody ’01].

7 Problem #1: network and graph mining. What does the Internet look like? What does the web look like? What constitutes a ‘normal’ social network? What is the ‘network value’ of a customer? Which gene/species affects the others the most?

8 Problem #1 (cont.): Given a graph, which node should we market to / defend / immunize first? Are there unnatural subgraphs (criminal rings or terrorist cells)? How do P2P networks evolve?

9 Solution #1: A1: Power law in the degree distribution [SIGCOMM99]. [Plot: internet domains, log(degree) vs. log(rank), slope -0.82; att.com and ibm.com labeled]
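A minimal sketch of how a rank exponent such as -0.82 can be measured. It assumes (the slide does not say) an ordinary least-squares fit of log(degree) against log(rank), with rank 1 being the highest-degree node. The function names are hypothetical, and the synthetic degrees are built to follow the power law exactly; they are not the internet data.

```python
import math
from collections import Counter


def degrees_from_edges(edges):
    """Undirected degree of every node in an edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


def rank_exponent(degrees):
    """Slope of log(degree) vs. log(rank); rank 1 = highest-degree node."""
    d = sorted((x for x in degrees if x > 0), reverse=True)
    return loglog_slope(range(1, len(d) + 1), d)


# Synthetic check: degree = 100 * rank^-0.82 exactly.
synthetic = [100 * r ** -0.82 for r in range(1, 51)]
print(round(rank_exponent(synthetic), 4))  # -0.82
```

On a real edge list one would call `rank_exponent(degrees_from_edges(edges).values())`.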

10 Solution #1’: Eigen Exponent E: a power law in the eigenvalues of the adjacency matrix, E = -0.48 (exponent = slope). [Plot: eigenvalue vs. rank of decreasing eigenvalue, log-log; May 2001]
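The slide does not say how E is estimated. As a hedged sketch, assume E is the least-squares slope of log(eigenvalue) against log(rank) over the k largest eigenvalues of the symmetric adjacency matrix. The dependency-free cyclic Jacobi routine below is only practical for small toy graphs (a real internet map would need a sparse eigensolver), and all function names are hypothetical.

```python
import math


def jacobi_eigenvalues(a, tol=1e-18, max_sweeps=50):
    """All eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(a)
    a = [[float(x) for x in row] for row in a]  # work on a float copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- J^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
                a[p][q] = a[q][p] = 0.0  # zero in exact arithmetic
    return [a[i][i] for i in range(n)]


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    return num / sum((u - mx) ** 2 for u in lx)


def eigen_exponent(adj, k):
    """Slope of log(eigenvalue) vs. log(rank) over the k largest eigenvalues."""
    top = sorted(jacobi_eigenvalues(adj), reverse=True)[:k]
    if top[-1] <= 0:
        raise ValueError("the top-k eigenvalues must be positive for a log-log fit")
    return loglog_slope(range(1, k + 1), top)


def disjoint_stars(leaf_counts):
    """Toy graph: disjoint stars K_{1,m}; each has largest eigenvalue sqrt(m)."""
    n = sum(m + 1 for m in leaf_counts)
    adj = [[0.0] * n for _ in range(n)]
    hub = 0
    for m in leaf_counts:
        for leaf in range(hub + 1, hub + m + 1):
            adj[hub][leaf] = adj[leaf][hub] = 1.0
        hub += m + 1
    return adj


adj = disjoint_stars([16, 4, 1])
top = sorted(jacobi_eigenvalues(adj), reverse=True)[:3]
print([round(x, 6) for x in top])  # [4.0, 2.0, 1.0]
print(eigen_exponent(adj, 3) < 0)  # True
```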

11 But: Q1: What about graphs from other domains? Q2: What about temporal evolution?

12 More power laws: citation counts (citeseer.nj.nec.com, 6/2001). [Plot, log-log axes: #citations and count; Ullman labeled]

13 More power laws: web hit counts [w/ A. Montgomery]. [Plot: Web Site Traffic, log-log axes: freq and count; Zipf; users, sites]

14 The Peer-to-Peer Topology: frequency vs. degree. The number of adjacent peers follows a power law [Jovanovic+].

15 epinions.com who-trusts-whom [Richardson + Domingos, KDD 2001]. [Plot: count vs. (out-)degree]
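The count-vs-degree plots on these slides (citations, web hits, P2P peers, epinions out-degree) can be reproduced by tallying how many nodes have each degree. This is a minimal sketch. It assumes a least-squares fit on log-log axes, uses hypothetical function names, and needs at least two distinct degree values. The toy out-degrees are made up so that the slope is exactly -2.

```python
import math
from collections import Counter


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return num / sum((a - mx) ** 2 for a in lx)


def degree_frequency(degrees):
    """Map each positive degree d to the number of nodes with degree d."""
    return Counter(d for d in degrees if d > 0)


def frequency_exponent(degrees):
    """Slope of log(count) vs. log(degree); needs two or more distinct degrees."""
    freq = degree_frequency(degrees)
    ds = sorted(freq)
    return loglog_slope(ds, [freq[d] for d in ds])


# Toy out-degrees: 64 nodes of degree 1, 16 of degree 2, 4 of degree 4, 1 of degree 8.
out_degrees = [1] * 64 + [2] * 16 + [4] * 4 + [8]
print(round(frequency_exponent(out_degrees), 4))  # -2.0
```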

16 More power laws: they also hold for other web graphs [Barabasi+], [Tomkins+], with additional ‘rules’ (bipartite cores follow power laws).


18 A famous power law: Zipf’s law. Bible: rank vs. frequency (log-log). Similarly in many other languages, for customers and sales volume, city populations, etc. [Plot: log(freq) vs. log(rank)]
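A rank-frequency plot like the Bible one takes a few lines to produce: count each word, sort the counts in decreasing order, and fit a line on log-log axes. This is a hedged sketch. The tokenizer (lowercase runs of letters and apostrophes) and the least-squares fit are simplifying assumptions, and the function names are hypothetical. The toy corpus is constructed so that frequency = 60 / rank, which gives a slope of exactly -1.

```python
import math
import re
from collections import Counter


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return num / sum((a - mx) ** 2 for a in lx)


def rank_frequency(text):
    """(rank, frequency) pairs for the words in text, most frequent word first."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return list(enumerate(sorted(counts.values(), reverse=True), start=1))


def zipf_exponent(text):
    """Slope of log(frequency) vs. log(rank); Zipf's law predicts roughly -1."""
    pairs = rank_frequency(text)
    return loglog_slope([r for r, _ in pairs], [f for _, f in pairs])


# Toy corpus whose word counts are exactly 60 / rank.
toy = " ".join(["the"] * 60 + ["and"] * 30 + ["of"] * 20 + ["to"] * 15 + ["in"] * 12 + ["that"] * 10)
print(rank_frequency(toy)[:3])  # [(1, 60), (2, 30), (3, 20)]
print(round(zipf_exponent(toy), 4))  # -1.0
```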

19 Olympic medals (Sydney): [Plot: log(#medals) vs. rank]

20 More power laws: areas (Korcak’s law). Scandinavian lakes: area vs. complementary cumulative count (log-log axes). [Plot: log(count(>= area)) vs. log(area)]
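The y-axis here is a complementary cumulative count: for each area a, the number of lakes with area >= a. Below is a minimal sketch of that computation, plus a log-log slope that assumes a least-squares fit. The function names are hypothetical, and the "lake areas" are toy numbers chosen so that the slope is exactly -1, not the Scandinavian data.

```python
import math


def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return num / sum((a - mx) ** 2 for a in lx)


def ccdf(values):
    """(x, number of values >= x) for each distinct x, in increasing x."""
    xs = sorted(values)
    n = len(xs)
    out = []
    for i, x in enumerate(xs):
        if i == 0 or x != xs[i - 1]:
            out.append((x, n - i))  # xs[i:] are exactly the values >= x
    return out


def korcak_exponent(areas):
    """Slope of log(count(>= area)) vs. log(area); areas must be positive."""
    pts = ccdf(areas)
    return loglog_slope([a for a, _ in pts], [c for _, c in pts])


lake_areas = [1, 1, 1, 1, 2, 2, 4, 8]  # hypothetical toy areas
print(ccdf(lake_areas))  # [(1, 8), (2, 4), (4, 2), (8, 1)]
print(round(korcak_exponent(lake_areas), 4))  # -1.0
```

Plotting the CCDF instead of a raw histogram avoids empty or noisy bins in the sparse large-area tail.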


22 Time evolution: Do these laws hold over time?

23 Time evolution: rank exponent R. The rank exponent has not changed! [Plot: domain level, log(degree) vs. log(rank), slope -0.82; att.com and ibm.com labeled]

24 Outline: Problem definition / Motivation; Graphs and power laws; Streams and forecasting; Conclusions

25 Why care about streams?
Sensor devices:
– Temperature, weather measurements
– Road traffic data
– Geological observations
– Patient physiological data
Embedded devices:
– Network routers
– Intelligent (active) disks

26 Modeling bursty traffic: Given a signal (e.g., bytes over time), model bursty traffic and generate realistic traces (Poisson does not work). [Plot: # bytes vs. time, compared with Poisson]
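One way to see why Poisson traces look unrealistically smooth is to aggregate them at coarser time scales. A sum of m independent Poisson(λ) counts is Poisson(mλ), so its coefficient of variation (std / mean) is 1/sqrt(mλ), shrinking like 1/sqrt(m). This is a stdlib-only sketch: the sampler is Knuth's method, the rate 5.0 is an arbitrary assumption, and `poisson_sample`, `aggregate`, and `coeff_of_variation` are hypothetical helpers.

```python
import math
import random


def poisson_sample(rng, lam):
    """One Poisson(lam) draw via Knuth's multiply-uniforms method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def aggregate(series, m):
    """Sum non-overlapping blocks of length m (drops any partial block at the end)."""
    return [sum(series[i:i + m]) for i in range(0, len(series) - m + 1, m)]


def coeff_of_variation(xs):
    """Population standard deviation divided by the mean."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return math.sqrt(var) / mean


rng = random.Random(0)
traffic = [poisson_sample(rng, 5.0) for _ in range(2 ** 14)]  # bytes per time slot
for m in (1, 4, 16, 64):
    # Theory for Poisson counts: CV = 1 / sqrt(5.0 * m), i.e. about 0.447, 0.224, 0.112, 0.056.
    print(m, round(coeff_of_variation(aggregate(traffic, m)), 3))
```

A self-similar trace, by contrast, keeps its burstiness as it is aggregated.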

27 Modeling bursty traffic (cont.): Given a signal (e.g., bytes over time), give guarantees: [Plot: # bytes vs. time, compared with Poisson]

28 Solution: self-similarity. [Plots: # bytes vs. time]

29 Approach. Q1: How do we generate a sequence that is:
– bursty
– self-similar
– and has ‘realistic’ queue-length distributions?

30 Binary multifractals [figure: 20 / 80 split]

31 Binary multifractals [figure: 20 / 80 split] [Mengzhi Wang, best student paper award, PEVA’02]
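A minimal sketch of a 20 / 80 binary multifractal generator. It assumes the usual recursive construction: split the total volume between the two halves of the time interval in an 80:20 ratio (a coin flip picks which half gets the larger share), then repeat inside each half. `b_model`, `aggregate`, and the `bias` parameter are hypothetical names. This is an illustration, not the code from the PEVA’02 paper. The loop at the end shows the self-similarity: sibling blocks split 80/20 at every time scale.

```python
import random


def b_model(total, levels, bias=0.8, rng=None):
    """Binary multifractal trace with 2**levels time slots.

    At every level, each slot's volume is split between its two halves in the
    ratio bias : (1 - bias); a coin flip decides which half gets the larger share.
    """
    rng = rng if rng is not None else random.Random()
    volumes = [float(total)]
    for _ in range(levels):
        split = []
        for v in volumes:
            big, small = bias * v, (1 - bias) * v
            split.extend((big, small) if rng.random() < 0.5 else (small, big))
        volumes = split
    return volumes


def aggregate(series, m):
    """Sum non-overlapping blocks of length m, i.e. view the trace at a coarser time scale."""
    return [sum(series[i:i + m]) for i in range(0, len(series) - m + 1, m)]


trace = b_model(1_000_000, levels=10, rng=random.Random(42))  # 1024 time slots
for m in (1, 8, 64):
    coarse = aggregate(trace, m)
    shares = [max(a, b) / (a + b) for a, b in zip(coarse[0::2], coarse[1::2])]
    print(m, round(min(shares), 6), round(max(shares), 6))  # 0.8 and 0.8 at every scale
```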

32 Forecasting: AWSOM, using wavelets and AutoRegression [Papadimitriou+, 2003]
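AWSOM combines wavelets with autoregression, and the slides give no further detail. As a hedged illustration of the wavelet ingredient only, here is a multi-level Haar decomposition that uses pairwise averages and half-differences (one common unnormalized convention), along with its exact inverse. `haar_decompose` and `haar_reconstruct` are hypothetical names. This sketch is not AWSOM itself.

```python
def haar_decompose(signal):
    """Multi-level Haar transform with pairwise averages and half-differences.

    Returns (overall_average, details), where details[0] is the finest level.
    """
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = [float(x) for x in signal]
    details = []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / 2 for a, b in pairs])  # local fluctuations
        approx = [(a + b) / 2 for a, b in pairs]  # smoother, half-length signal
    return approx[0], details


def haar_reconstruct(average, details):
    """Invert haar_decompose: each (a, w) pair becomes (a + w, a - w)."""
    approx = [average]
    for d in reversed(details):  # coarsest level first
        approx = [v for a, w in zip(approx, d) for v in (a + w, a - w)]
    return approx


avg, det = haar_decompose([4, 2, 5, 5])
print(avg, det)  # 4.0 [[1.0, 0.0], [-1.0]]
print(haar_reconstruct(avg, det))  # [4.0, 2.0, 5.0, 5.0]
```

Each level's detail coefficients form a shorter series of their own, and these series are what an autoregressive model can then be fitted to.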

33 Results: real data, sunspots. Sunspot intensity has a slightly time-varying “period”.
– AR captures the wrong trend (the average).
– Seasonal ARIMA captures an immediate, wrong downward trend, and requires a human to determine the (fixed) period of the seasonal component.
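The AR baseline in this comparison predicts each value as a linear combination of the previous p values. Below is a minimal least-squares AR(p) sketch with a forecasting loop that feeds predictions back in as inputs. Omitting the intercept and solving the normal equations directly are simplifying assumptions, and `fit_ar`, `solve`, and `forecast` are hypothetical names. It implements neither seasonal ARIMA nor AWSOM. The noise-free toy series is used so that the true coefficients can be recovered exactly.

```python
def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ar(series, p):
    """Least-squares coefficients a_1..a_p for x_t ~ sum_i a_i * x_{t-i} (no intercept)."""
    rows = [series[t - p:t][::-1] for t in range(p, len(series))]  # [x_{t-1}, ..., x_{t-p}]
    ys = series[p:]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def forecast(series, coeffs, steps):
    """Roll the AR recursion forward, feeding each prediction back in as input."""
    history = list(series)
    for _ in range(steps):
        history.append(sum(a * history[-1 - i] for i, a in enumerate(coeffs)))
    return history[len(series):]


# Noise-free damped oscillation: x_t = 1.5 x_{t-1} - 0.75 x_{t-2}.
xs = [1.0, 1.0]
while len(xs) < 40:
    xs.append(1.5 * xs[-1] - 0.75 * xs[-2])
coeffs = fit_ar(xs[:30], 2)
print([round(c, 6) for c in coeffs])  # [1.5, -0.75]
```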

34 Results: real data, sunspots. Sunspot intensity has a slightly time-varying “period”.

35 Conclusions: Graphs and streams pose fascinating problems. Self-similarity, fractals, and power laws work when textbook methods fail!

36 Other ongoing projects: video data mining [Pan + Yang]; disk traffic modeling [Ailamaki; Ganger]

37 Books: Manfred Schroeder, Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise. W.H. Freeman and Company, 1991. (Probably the BEST book on fractals!)

38 Contact info: christos@cs.cmu.edu; www.cs.cmu.edu/~christos; Wean Hall 7107; Ph#: x8.1457

