CMU SCS Bio-informatics, Graph and Stream mining Christos Faloutsos CMU.

Presentation transcript:

CMU SCS Bio-informatics, Graph and Stream mining Christos Faloutsos CMU

CMU SCS IC '07C. Faloutsos2 CONGRATULATIONS!

CMU SCS IC '07C. Faloutsos3 Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting [Scalability: Gb and Tb of data…] Conclusions

CMU SCS IC '07C. Faloutsos4 Motivation Data mining: ~ find patterns (rules, outliers) Trends in fly embryo gene expression? What do real graphs look like? What do (numerical) streams look like?

CMU SCS IC '07C. Faloutsos5 FEMine: Mining Fly Embryos

CMU SCS IC '07C. Faloutsos6 Problem Given –fly embryo images –and their labels (e.g., 1h, 2h, radiated, etc.) Find patterns and trends. Ultimate goal: how genes affect each other.

CMU SCS IC '07C. Faloutsos7 With Eric Xing (CMU CS) Bob Murphy (CMU – Bio) Tim Pan (CMU -> Google) Andre Balan (U. Sao Paulo)

CMU SCS IC '07C. Faloutsos8 Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting [Scalability: Gb and Tb of data…] Conclusions

CMU SCS IC '07C. Faloutsos9 Graphs - why should we care?

CMU SCS IC '07C. Faloutsos10 Graphs - why should we care? Internet Map [lumeta.com] Food Web [Martinez ’91] Protein Interactions [genomebiology.com] Friendship Network [Moody ’01]

CMU SCS IC '07C. Faloutsos11 Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)

CMU SCS IC '07C. Faloutsos12 Problem: network and graph mining What does the Internet look like? What does the web look like? What constitutes a ‘normal’ social network? What is ‘normal’/‘abnormal’? Which patterns/laws hold?

CMU SCS IC '07C. Faloutsos13 Graph mining Are real graphs random?

CMU SCS IC '07C. Faloutsos14 Laws and patterns NO!! –Diameter –in- and out-degree distributions –other (surprising) patterns

CMU SCS IC '07C. Faloutsos15 Laws – degree distributions Q: avg degree is ~3 - what is the most probable degree? [Plot: count vs. degree; the average, 3, is marked on the degree axis]

CMU SCS IC '07C. Faloutsos16 Laws – degree distributions Q: avg degree is ~3 - what is the most probable degree? [Plots: count vs. degree; 3 is marked on the degree axis]

CMU SCS IC '07C. Faloutsos17 Solution: The plot is linear in log-log scale [FFF’99]: freq = degree^(-2.15), where the exponent O is the slope of the line. [Plot: Frequency vs. Outdegree, log-log, Nov’]
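The "linear in log-log scale" observation can be checked numerically. The following is a minimal sketch, assuming a plain least-squares line fitted to log(frequency) against log(degree); the function name `fit_loglog_slope` and the synthetic data are illustrative assumptions, not the measurements behind the slide.

```python
import math

def fit_loglog_slope(xs, ys):
    """Least-squares slope and intercept of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic, noise-free data (an assumption for illustration only):
# frequencies generated exactly as freq = 1000 * degree^(-2.15).
degrees = list(range(1, 51))
freqs = [1000.0 * d ** -2.15 for d in degrees]

slope, intercept = fit_loglog_slope(degrees, freqs)
print("fitted exponent:", round(slope, 2))
```

On real degree counts the points scatter around the line, especially in the tail, so a simple fit like this is a first look rather than a rigorous estimate.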

CMU SCS IC '07C. Faloutsos18 But: Q1: How about graphs from other domains? Q2: How about temporal evolution?

CMU SCS IC '07C. Faloutsos19 The Peer-to-Peer Topology: frequency versus degree. The number of adjacent peers follows a power law [Jovanovic+]

CMU SCS IC '07C. Faloutsos20 More power laws: citation counts (citeseer.nj.nec.com, 6/2001) [Plot: log(count) vs. log(#citations); ‘Ullman’ is marked]

CMU SCS IC '07C. Faloutsos21 More power laws: web hit counts [w/ A. Montgomery] Web Site Traffic [Plot: log(count) vs. log(in-degree); labels: Zipf, users, sites, ``ebay’’]

CMU SCS IC '07C. Faloutsos22 But: Q1: How about graphs from other domains? Q2: How about temporal evolution?

CMU SCS IC '07C. Faloutsos23 Time evolution with Jure Leskovec (CMU) and Jon Kleinberg (Cornell) (‘best paper’ KDD05)

CMU SCS IC '07C. Faloutsos24 Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: –diameter ~ O(log N) –diameter ~ O(log log N) What is happening in real data?

CMU SCS IC '07C. Faloutsos25 Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: –diameter ~ O(log N) –diameter ~ O(log log N) What is happening in real data? Diameter shrinks over time –As the network grows the distances between nodes slowly decrease
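To make "diameter" concrete, here is a minimal sketch that computes the plain diameter (the longest shortest path) of a small undirected graph with breadth-first search. The toy graphs and the helper names `build_adj`, `bfs_distances` and `diameter` are assumptions for illustration; they are not the datasets or the exact diameter measure used in the talk.

```python
from collections import deque

def build_adj(edges):
    """Adjacency sets for an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, start):
    """Hop counts from start to every node reachable from it."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Longest shortest path; assumes the graph is connected."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

# Toy snapshots (hypothetical): a 5-node path, then the same graph
# after one more node and a few extra edges have arrived.
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
early = build_adj(path_edges)
later = build_adj(path_edges + [(4, 5), (0, 3), (1, 5)])

print(diameter(early), diameter(later))
```

In this constructed example the later snapshot has more nodes yet a smaller diameter, because the new edges act as shortcuts. All-pairs BFS is only practical for small graphs; large graphs need sampling or approximation.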

CMU SCS IC '07C. Faloutsos26 Diameter – ArXiv citation graph: citations among physics papers, 1992–2003. One graph per year. [Plot: diameter vs. time in years]

CMU SCS IC '07C. Faloutsos27 Diameter – “Patents”: patent citation network, 25 years of data. [Plot: diameter vs. time in years]

CMU SCS IC '07C. Faloutsos28 Temporal Evolution of the Graphs N(t) … nodes at time t E(t) … edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t)

CMU SCS IC '07C. Faloutsos29 Temporal Evolution of the Graphs N(t) … nodes at time t E(t) … edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t) A: over-doubled! –But obeying the ``Densification Power Law’’

CMU SCS IC '07C. Faloutsos30 Densification – Physics Citations: citations among physics papers. 2003: –29,555 papers, 352,807 citations. [Plot: E(t) vs. N(t); exponent 1.69]

CMU SCS IC '07C. Faloutsos31 Densification – Patent Citations: citations among patents granted. 1999: –2.9 million nodes –16.5 million edges. Each year is a datapoint. [Plot: E(t) vs. N(t); exponent 1.66]
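The "over-doubled" answer can be worked out from the densification power law. The sketch below assumes the usual form E(t) proportional to N(t)^a and reads the numbers 1.69 and 1.66 on the two densification slides as the exponent a; the function name `edge_growth_factor` is an illustrative assumption.

```python
def edge_growth_factor(node_growth, exponent):
    """If E is proportional to N**exponent, multiplying N by
    node_growth multiplies E by node_growth**exponent."""
    return node_growth ** exponent

# Exponents as read from the two slides (assumed to be the fitted a).
exponents = {"physics citations": 1.69, "patent citations": 1.66}

factors = {name: edge_growth_factor(2, a) for name, a in exponents.items()}
for name, factor in factors.items():
    print(f"{name}: doubling the nodes multiplies the edges by about {factor:.2f}")
```

With an exponent of exactly 1 the edges would merely double; any exponent above 1 means the average degree grows as the graph grows.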

CMU SCS IC '07C. Faloutsos32 Outline Problem definition / Motivation Biological image mining Graphs and power laws –Time evolving graphs + tensors Streams and forecasting [Scalability: Gb and Tb of data…] Conclusions

CMU SCS IC '07C. Faloutsos33 Tensors for time evolving graphs [Jimeng Sun+ KDD’06] [ “, SDM’07] [ CF, Kolda, Sun, SDM’07 tutorial]

CMU SCS IC '07C. Faloutsos34 Application: Network Anomaly Detection Anomaly detection Data –TCP flows collected at CMU backbone –500GB with compression – –1200 timestamps (hours) –‘Tensor’ destination source

CMU SCS IC '07C. Faloutsos35 with Jimeng Sun Hui Zhang Yinglian Xie (Dave Anderson)

CMU SCS IC '07C. Faloutsos36 Network anomaly detection: identify when and where anomalies occurred. The prominent difference between normal and abnormal ones is mainly due to unusual scanning activity (confirmed by the campus admin). [Plots: error vs. time (hour); source vs. destination for the abnormal and normal cases, with scanners marked]

CMU SCS IC '07C. Faloutsos37 Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting [Scalability: Gb and Tb of data…] Conclusions

CMU SCS IC '07C. Faloutsos38 Why care about streams? Sensor devices –Temperature, weather measurements –Road traffic data –Geological observations –Patient physiological data –Chlorine in drinking water (*) Embedded devices –Network routers

CMU SCS IC '07C. Faloutsos39 SPIRIT / InteMon http://warsteiner.db.cs.cmu.edu/demo/intemon.jsp self-* storage system (PDL/CMU) with Jimeng Sun (CMU/CS), Evan Hoke (CMU/CS-ug), Prof. Greg Ganger (CMU/CS/ECE), John Strunk (CMU/ECE)

CMU SCS IC '07C. Faloutsos40 self-* CMU >200 nodes, 40 racks of computing equipment, 774 kW of power. Target: 1 PetaByte. Goal: self-correcting, self-securing, self-monitoring, self-...

CMU SCS IC '07C. Faloutsos41 System Architecture

CMU SCS IC '07C. Faloutsos42 Data center room monitoring: an abnormal dehumidification and reheating cycle is identified. [Plots: Temperature, Humidity]

CMU SCS IC '07C. Faloutsos43 Outline Problem definition / Motivation Biological image mining Graphs and power laws Streams and forecasting [Scalability: Gb and Tb of data…] Conclusions

CMU SCS IC '07C. Faloutsos44 Our emphasis: scalability. Gb (->Tb) of data. Dream: to exploit ‘map-reduce’/hadoop by re-designing D.M. algorithms for such an environment, within the D.I.S.C. effort (Data Intensive Scientific Computing)
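As one illustration of what "re-designing D.M. algorithms" for map-reduce can look like, the sketch below computes an out-degree distribution as two map/shuffle/reduce passes. It is an in-memory simulation in plain Python, not Hadoop code; the helper `run_job`, the mapper and reducer names, and the toy edge list are all assumptions made for illustration.

```python
from collections import defaultdict

def shuffle(pairs):
    """Group mapped (key, value) pairs by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def run_job(records, mapper, reducer):
    """Run one map -> shuffle -> reduce pass entirely in memory."""
    mapped = (pair for record in records for pair in mapper(record))
    return dict(reducer(key, values) for key, values in shuffle(mapped).items())

def map_edge(edge):
    """Emit (source, 1): each edge adds one to its source's out-degree."""
    src, _dst = edge
    yield src, 1

def map_degree(item):
    """Emit (degree, 1): each node adds one to the count for its degree."""
    _node, degree = item
    yield degree, 1

def reduce_sum(key, values):
    """Sum all values that were emitted for the same key."""
    return key, sum(values)

# Hypothetical toy edge list of (source, destination) pairs.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("a", "d")]

out_degree = run_job(edges, map_edge, reduce_sum)               # node -> out-degree
degree_hist = run_job(out_degree.items(), map_degree, reduce_sum)  # degree -> #nodes
print(out_degree, degree_hist)
```

Each mapper looks at a single record and each reducer at a single key, which is the property that lets a real framework spread the work over many machines. Nodes with no outgoing edges, such as "d" here, never appear as a key and would need separate handling.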

CMU SCS IC '07C. Faloutsos45 D.I.S.C. Randy Bryant (SCS Dean) Dave O’Hallaron (dir. of INTEL Pittsburgh) Garth Gibson Greg Ganger ++ INTEL: 15 nodes; Yahoo: ~100 nodes, running hadoop

CMU SCS IC '07C. Faloutsos46 DM for Tera- and Peta-bytes Two-way street: <- DM can use such infrastructures to find patterns -> DM can help such infrastructures become self-healing, self-adjusting, ‘self-*’

CMU SCS IC '07C. Faloutsos47 Conclusions: Biological images, graphs & streams pose fascinating problems. Self-similarity, fractals and power laws work when other methods fail! SCALABILITY creates fascinating research problems!

CMU SCS IC '07C. Faloutsos48 Contact info Wean Hall 7107 Ph#: x and, again WELCOME!

CMU SCS IC '07C. Faloutsos51 A famous power law: Zipf’s law. Bible: rank vs. frequency (log-log); similarly in many other languages, for customers and sales volume, city populations, etc. [Plot: log(freq) vs. log(rank)]
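A rank-frequency table of the kind behind a Zipf plot can be built in a few lines. This is a minimal sketch, assuming a simple lowercase word tokenizer; the sample sentence is a toy stand-in, not the Bible text from the slide, and the function names are illustrative.

```python
import math
import re
from collections import Counter

def rank_frequency(text):
    """Return (rank, frequency) pairs, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    freqs = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(freqs, start=1))

def loglog_points(pairs):
    """Convert (rank, frequency) pairs to (log rank, log frequency)."""
    return [(math.log(rank), math.log(freq)) for rank, freq in pairs]

# Toy text (an assumption for illustration only).
sample = "the cat and the dog and the bird saw the cat"

pairs = rank_frequency(sample)
points = loglog_points(pairs)
print(pairs)
```

A text this short cannot show the law; on a large corpus the log-log points are expected to fall close to a straight line.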

CMU SCS IC '07C. Faloutsos52 Olympic medals (Sydney’00, Athens’04) [Plot: log(#medals) vs. log(rank)]

CMU SCS IC '07C. Faloutsos54 Diameter – “Autonomous Systems”: graph of the Internet, one graph per day, 1997–2000. [Plot: diameter vs. number of nodes]

CMU SCS IC '07C. Faloutsos55 Diameter – “Affiliation Network”: graph of collaborations in physics, with authors linked to papers; 10 years of data. [Plot: diameter vs. time in years]

CMU SCS IC '07C. Faloutsos56 Densification – Autonomous Systems: graph of the Internet. 2000: –6,000 nodes –26,000 edges. One graph per day. [Plot: E(t) vs. N(t); exponent 1.18]

CMU SCS IC '07C. Faloutsos57 Densification – Affiliation Network: authors linked to their publications. 2002: –60,000 nodes (20,000 authors, 38,000 papers) –133,000 edges. [Plot: E(t) vs. N(t); exponent 1.15]