CMU SCS Graph and stream mining Christos Faloutsos CMU.

Presentation transcript:

CMU SCS Graph and stream mining Christos Faloutsos CMU

CMU SCS CALD IC 2004© C. Faloutsos (2004)2 CONGRATULATIONS!

Outline: Problem definition / Motivation; Graphs and power laws; Streams and forecasting; Conclusions

Motivation. Data mining: ~ find patterns. What do real graphs look like? What do (numerical) streams look like?

Joint work with Deepayan Chakrabarti (CMU - CALD)

Graphs - why should we care? Examples: Internet map [lumeta.com]; food web [Martinez ’91]; protein interactions [genomebiology.com]; friendship network [Moody ’01]

Problem #1 - network and graph mining. What does the Internet look like? What does the web look like? What constitutes a ‘normal’ social network? What is the ‘network value’ of a customer? Which gene/species affects the others the most?

Problem #1. Given a graph: which node to market to / defend / immunize first? Are there unnatural subgraphs (criminals’ rings or terrorist cells)? How do P2P networks evolve?

Solution #1. A1: Power law in the degree distribution [SIGCOMM99]. Plot: log(degree) vs. log(rank) for Internet domains; labeled points: att.com, ibm.com.
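
A minimal sketch of how a rank exponent like the one on this slide could be estimated: sort the node degrees in decreasing order and fit a straight line to log(degree) versus log(rank) by least squares. The function names and the synthetic degree sequence are assumptions for illustration, not code or data from the talk.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    return sxy / sxx

def rank_exponent(degrees):
    """Sort degrees in decreasing order and fit log(degree) vs. log(rank)."""
    ordered = sorted(degrees, reverse=True)
    ranks = range(1, len(ordered) + 1)
    return loglog_slope(ranks, ordered)

# Synthetic input (an assumption): degree = 1000 * rank^(-0.8)
synthetic = [1000.0 * r ** -0.8 for r in range(1, 201)]
slope = rank_exponent(synthetic)
print(f"fitted rank exponent: {slope:.3f}")
```

On real data the fit is only meaningful if the log-log plot is close to a straight line, so it is worth plotting before trusting the slope.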

Solution #1’: Eigen exponent E. Power law in the eigenvalues of the adjacency matrix; E = exponent = slope. Plot: eigenvalue vs. rank of decreasing eigenvalue (May 2001).

But: Q1: How about graphs from other domains? Q2: How about temporal evolution?

More power laws: citation counts (citeseer.nj.nec.com, 6/2001). Plot: log(count) vs. log(#citations); labeled point: Ullman.

More power laws: web hit counts [w/ A. Montgomery]. Plot: Web Site Traffic, log(count) vs. log(freq); labels: Zipf, users, sites.

The Peer-to-Peer Topology. Frequency versus degree: the number of adjacent peers follows a power law [Jovanovic+]

epinions.com who-trusts-whom [Richardson + Domingos, KDD 2001]. Plot: count vs. (out) degree.

More power laws. They also hold for other web graphs [Barabasi+], [Tomkins+], with additional ‘rules’ (bipartite cores follow power laws)

A famous power law: Zipf’s law. Bible: rank vs. frequency (log-log); similarly in many other languages, for customers and sales volume, city populations, etc. Plot: log(freq) vs. log(rank).
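
To make the rank-versus-frequency plot concrete, here is a small sketch that counts word occurrences in a text and returns (rank, frequency) pairs, which could then be plotted on log-log axes. The tokenization rule and the sample sentence are assumptions, not what was used for the Bible plot in the talk.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, frequency) pairs, most frequent word first."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    freqs = sorted(counts.values(), reverse=True)
    return list(enumerate(freqs, start=1))

sample = "the cat and the dog and the bird"
pairs = rank_frequency(sample)
for rank, freq in pairs:
    print(rank, freq)
```

Under Zipf's law, frequency falls off roughly as a power of rank, so these pairs would lie near a straight line on log-log axes for a large enough corpus.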

Olympic medals (Sydney). Plot: log(#medals) vs. rank.

More power laws: areas – Korcak’s law. Scandinavian lakes: area vs. complementary cumulative count (log-log axes). Plot: log(count(>= area)) vs. log(area).
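
The complementary cumulative count on this slide, count(>= area), can be computed with a single sort. The sketch below is illustrative only; the function name and the toy values are assumptions and do not come from the lake data.

```python
def ccdf_counts(values):
    """Return (value, number of items >= value) for each distinct value."""
    ordered = sorted(values)
    n = len(ordered)
    pairs = []
    for i, v in enumerate(ordered):
        # At the first occurrence of v, exactly n - i items are >= v.
        if i == 0 or v != ordered[i - 1]:
            pairs.append((v, n - i))
    return pairs

toy_areas = [1, 1, 2, 5]  # toy values, not real lake areas
print(ccdf_counts(toy_areas))
```

Plotting the logarithm of both coordinates gives the log(count(>= area)) versus log(area) view; a straight line there is what Korcak's law describes.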

Time evolution. Do these laws hold over time?

Time evolution: rank R. The rank exponent has not changed! Domain level. Plot: log(degree) vs. log(rank); labeled points: att.com, ibm.com.

Outline: Problem definition / Motivation; Graphs and power laws; Streams and forecasting; Conclusions

Why care about streams?
Sensor devices
–Temperature, weather measurements
–Road traffic data
–Geological observations
–Patient physiological data
Embedded devices
–Network routers
–Intelligent (active) disks

Modeling bursty traffic. Given a signal (e.g., bytes over time): model bursty traffic; generate realistic traces (Poisson does not work). Plot: # bytes vs. time; Poisson.
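
One way to see why a Poisson model looks too smooth for bursty traffic: its per-bin counts have a variance-to-mean ratio close to 1. The sketch below simulates Poisson arrivals from exponential inter-arrival times and measures that ratio. The rate, bin count and seed are arbitrary assumptions, and the helper names are hypothetical.

```python
import random

def poisson_counts(rate, n_bins, seed=0):
    """Count Poisson arrivals per unit-length time bin."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    t = rng.expovariate(rate)  # time of the first arrival
    while t < n_bins:
        counts[int(t)] += 1
        t += rng.expovariate(rate)  # exponential gap to the next arrival
    return counts

def dispersion(counts):
    """Variance-to-mean ratio of the per-bin counts."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / n
    return var / mean

counts = poisson_counts(rate=10.0, n_bins=2000)
ratio = dispersion(counts)
print(f"variance/mean for the Poisson trace: {ratio:.2f}")
```

A measured trace whose ratio is far above 1 is burstier than a Poisson model with the same mean would produce.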

Modeling bursty traffic. Given a signal (e.g., bytes over time), give guarantees: (plot: # bytes vs. time; Poisson)

Solution: self-similarity. Plots: # bytes vs. time.

Approach
Q1: How to generate a sequence that is
–bursty
–self-similar
–and has ‘realistic’ queue length distributions

Binary multifractals. Figure labels: 20, 80.

Binary multifractals [Mengzhi Wang, best student paper award, PEVA’02]
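
A hedged sketch of a binary multifractal generator, assuming the "20 / 80" labels on the previous slide mean that each interval passes 80% of its volume to one half and 20% to the other, recursively. Choosing the heavier half at random, the parameter names and the default values are assumptions; this is not the code from the cited paper.

```python
import random

def binary_multifractal(total, levels, bias=0.8, seed=0):
    """Recursively split `total` into 2**levels bins with a bias/(1-bias) split."""
    rng = random.Random(seed)
    trace = [float(total)]
    for _ in range(levels):
        nxt = []
        for v in trace:
            big, small = v * bias, v * (1.0 - bias)
            # Assumption: the heavier half is picked at random at each split.
            if rng.random() < 0.5:
                nxt.extend((big, small))
            else:
                nxt.extend((small, big))
        trace = nxt
    return trace

trace = binary_multifractal(total=1000.0, levels=10)
print(len(trace), max(trace), min(trace))
```

The total volume is preserved at every level, while the largest bin keeps a `bias ** levels` share of it, which is what makes the trace bursty at every time scale.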

Forecasting: AWSOM, using wavelets and autoregression [Papadimitriou+, 2003]

Results: real data – Sunspot
Sunspot intensity – slightly time-varying “period”
AR captures wrong trend (average)
Seasonal ARIMA
–Captures immediate wrong downward trend
–Requires human to determine seasonal component period (fixed)
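
The remark that plain AR captures only the average can be illustrated with a first-order autoregression: its multi-step forecast decays geometrically toward the series mean, so a periodic signal is forecast as a flat line. This is a plain AR(1) sketch on a synthetic sine wave, not AWSOM and not the sunspot data; all names and parameters are assumptions.

```python
import math

def fit_ar1(series):
    """Least-squares AR(1) fit on the mean-centered series; returns (mean, phi)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    num = sum(centered[t] * centered[t - 1] for t in range(1, n))
    den = sum(c * c for c in centered[:-1])
    return mean, num / den

def forecast_ar1(series, horizon):
    """Forecast `horizon` steps ahead: mean + phi**horizon * (last - mean)."""
    mean, phi = fit_ar1(series)
    return mean + (phi ** horizon) * (series[-1] - mean)

# Synthetic periodic signal (an assumption): period 20, 200 samples.
signal = [math.sin(2 * math.pi * t / 20) for t in range(200)]
mean, phi = fit_ar1(signal)
far = forecast_ar1(signal, 200)
print(f"phi = {phi:.3f}, forecast 200 steps ahead = {far:.5f}, mean = {mean:.5f}")
```

Because |phi| < 1, the forecast at long horizons is essentially the mean, even though the true signal keeps oscillating.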

Results: real data – Sunspot. Sunspot intensity – slightly time-varying “period”

Conclusions. Graphs & streams pose fascinating problems. Self-similarity, fractals and power laws work when textbook methods fail!

Other ongoing projects: video data mining [Pan + Yang]; disk traffic modeling [Ailamaki; Ganger]

Books. Manfred Schroeder: Fractals, Chaos, Power Laws: Minutes from an Infinite Paradise. W.H. Freeman and Company, 1991 (probably the BEST book on fractals!)

Contact info: Wean Hall 7107, Ph#: x8.1457