Social Networks and Graph Mining Christos Faloutsos CMU - MLD.

Slides:



Advertisements
Similar presentations
RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School.
Advertisements

1 Dynamics of Real-world Networks Jure Leskovec Machine Learning Department Carnegie Mellon University
Jurij Leskovec, CMU Jon Kleinberg, Cornell Christos Faloutsos, CMU
1 Realistic Graph Generation and Evolution Using Kronecker Multiplication Jurij Leskovec, CMU Deepay Chakrabarti, CMU/Yahoo Jon Kleinberg, Cornell Christos.
CMU SCS : Multimedia Databases and Data Mining Lecture #26: Graph mining - patterns Christos Faloutsos.
School of Computer Science Carnegie Mellon Sensor and Graph Mining Christos Faloutsos Carnegie Mellon University & IBM
Lecture 21 Network evolution Slides are modified from Jurij Leskovec, Jon Kleinberg and Christos Faloutsos.
CS728 Lecture 5 Generative Graph Models and the Web.
Exp. vs. Scale-Free Poisson distribution Exponential Network Power-law distribution Scale-free Network.
The structure of the Internet. How are routers connected? Why should we care? –While communication protocols will work correctly on ANY topology –….they.
CMU SCS C. Faloutsos (CMU)#1 Large Graph Algorithms Christos Faloutsos CMU McGlohon, Mary Prakash, Aditya Tong, Hanghang Tsourakakis, Babis Akoglu, Leman.
NetMine: Mining Tools for Large Graphs Deepayan Chakrabarti Yiping Zhan Daniel Blandford Christos Faloutsos Guy Blelloch.
CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU.
1 Epidemic Spreading in Real Networks: an Eigenvalue Viewpoint Yang Wang Deepayan Chakrabarti Chenxi Wang Christos Faloutsos.
CMU SCS Mining Large Graphs Christos Faloutsos CMU.
CMU SCS APWeb 07(c) 2007, C. Faloutsos 1 Copyright notice Copyright (c) 2007, Christos Faloutsos - all rights preserved. Permission to use all or some.
CMU SCS Large Graph Mining Christos Faloutsos CMU.
Mapping the Internet Topology Via Multiple Agents.
CMU SCS Graph mining: Patterns, Generators and Tools Christos Faloutsos CMU.
Analysis of the Internet Topology Michalis Faloutsos, U.C. Riverside (PI) Christos Faloutsos, CMU (sub- contract, co-PI) DARPA NMS, no
CMU SCS Bio-informatics, Graph and Stream mining Christos Faloutsos CMU.
CMU SCS Graph and stream mining Christos Faloutsos CMU.
CMU SCS Graph Mining and Influence Propagation Christos Faloutsos CMU.
The structure of the Internet. How are routers connected? Why should we care? –While communication protocols will work correctly on ANY topology –….they.
The structure of the Internet. The Internet as a graph Remember: the Internet is a collection of networks called autonomous systems (ASs) The Internet.
CMU SCS Multimedia and Graph mining Christos Faloutsos CMU.
CMU SCS Graph Mining: Laws, Generators and Tools Christos Faloutsos CMU.
CMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos Faloutsos CMU.
CMU SCS : Multimedia Databases and Data Mining Lecture #28: Graph mining - patterns Christos Faloutsos.
CMU SCS Data Mining in Streams and Graphs Christos Faloutsos CMU.
Measurement and Evolution of Online Social Networks Review of paper by Ophir Gaathon Analysis of Social Information Networks COMS , Spring 2011,
CMU SCS Large Graph Mining - Patterns, Tools and Cascade Analysis Christos Faloutsos CMU.
CMU SCS Big (graph) data analytics Christos Faloutsos CMU.
CMU SCS Graph Mining: Laws, Generators and Tools Christos Faloutsos CMU.
Weighted Graphs and Disconnected Components Patterns and a Generator IDB Lab 현근수 In KDD 08. Mary McGlohon, Leman Akoglu, Christos Faloutsos.
CMU SCS Large Graph Mining Christos Faloutsos CMU.
CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P0-1 Large Graph Mining: Power Tools and a Practitioner’s guide Christos Faloutsos Gary Miller Charalampos.
CMU SCS Mining Billion-node Graphs: Patterns, Generators and Tools Christos Faloutsos CMU.
CMU SCS Large Graph Mining Christos Faloutsos CMU.
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Jon Kleinberg (Cornell), Christos.
CMU SCS Mining Billion-Node Graphs Christos Faloutsos CMU.
CMU SCS Graph Mining - surprising patterns in real graphs Christos Faloutsos CMU.
CMU SCS Mining Billion Node Graphs Christos Faloutsos CMU.
CMU SCS Large Graph Mining Christos Faloutsos CMU.
Winner-takes-all: Competing Viruses or Ideas on fair-play Networks B. Aditya Prakash, Alex Beutel, Roni Rosenfeld, Christos Faloutsos Carnegie Mellon University,
On-line Social Networks - Anthony Bonato 1 Dynamic Models of On-Line Social Networks Anthony Bonato Ryerson University WAW’2009 February 13, 2009 nt.
Carnegie Mellon Finding patterns in large, real networks Christos Faloutsos CMU
CMU SCS Finding patterns in large, real networks Christos Faloutsos CMU.
R-MAT: A Recursive Model for Graph Mining Deepayan Chakrabarti Yiping Zhan Christos Faloutsos.
CMU SCS Graph Mining: patterns and tools for static and time-evolving graphs Christos Faloutsos CMU.
RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School.
CMU SCS Graph Mining Christos Faloutsos CMU. CMU SCS iCAST, Jan. 09C. Faloutsos 2 Thank you! Prof. Hsing-Kuo Kenneth Pao Eric, Morgan, Ian, Teenet.
CMU SCS Mining Large Social Networks: Patterns and Anomalies Christos Faloutsos CMU.
CMU SCS Graph Mining: Laws, Generators and Tools Christos Faloutsos CMU.
CMU SCS Graph Mining: Laws, Generators and Tools Christos Faloutsos CMU.
CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P9-1 Large Graph Mining: Power Tools and a Practitioner’s guide Christos Faloutsos Gary Miller Charalampos.
Abstract Networks. WWW (2000) Scientific Collaboration Girvan & Newman (2002)
Graph Models Class Algorithmic Methods of Data Mining
NetMine: Mining Tools for Large Graphs
Finding patterns in large, real networks
Part 1: Graph Mining – patterns
Lecture 13 Network evolution
15-826: Multimedia Databases and Data Mining
R-MAT: A Recursive Model for Graph Mining
Graph and Tensor Mining for fun and profit
Graph and Tensor Mining for fun and profit
Algorithms for Large Graph Mining
Large Graph Mining: Power Tools and a Practitioner’s guide
Lecture 21 Network evolution
Modelling and Searching Networks Lecture 2 – Complex Networks
Presentation transcript:

Social Networks and Graph Mining Christos Faloutsos CMU - MLD

MLD-AB '072 Outline Problem definition / Motivation Graphs and power laws [Virus propagation] [e-bay fraud detection] Conclusions

MLD-AB '073 Motivation Data mining: ~ find patterns (rules, outliers) Problem#1: How do real graphs look like? Problem#2: How do viruses propagate? Problem#3: How to spot fraudsters in e-bay?

MLD-AB '074 Problem#1: Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)

MLD-AB '075 Graphs - why should we care? Internet Map [lumeta.com] Food Web [Martinez ’91] Protein Interactions [genomebiology.com] Friendship Network [Moody ’01]

MLD-AB '076 Graphs - why should we care? network of companies & board-of-directors members ‘viral’ marketing web-log (‘blog’) news propagation computer network security: /IP traffic and anomaly detection....

MLD-AB '077 Problem #1 - network and graph mining How does the Internet look like? How does the web look like? What constitutes a ‘normal’ social network? What is ‘normal’/‘abnormal’? which patterns/laws hold?

MLD-AB '078 Graph mining Are real graphs random?

MLD-AB '079 Laws and patterns NO!! Diameter in- and out- degree distributions other (surprising) patterns

MLD-AB '0710 Solution Power law in the degree distribution [SIGCOMM99] log(rank) log(degree) internet domains att.com ibm.com

MLD-AB '0711 But: Q1: How about graphs from other domains? Q2: How about temporal evolution?

MLD-AB '0712 The Peer-to-Peer Topology Frequency versus degree Number of adjacent peers follows a power-law [Jovanovic+]

MLD-AB '0713 More power laws: citation counts: (citeseer.nj.nec.com 6/2001) log(#citations) log(count) Ullman

MLD-AB '0714 Swedish sex-web Nodes: people (Females; Males) Links: sexual relationships Liljeros et al. Nature Swedes; 18-74; 59% response rate. Albert Laszlo Barabasi Publication%20Categories/ 04%20Talks/2005-norway- 3hours.ppt

MLD-AB '0715 More power laws: web hit counts [w/ A. Montgomery] Web Site Traffic log(in-degree) log(count) Zipf users sites ``ebay’’

MLD-AB '0716 epinions.com who-trusts-whom [Richardson + Domingos, KDD 2001] (out) degree count trusts-2000-people user

MLD-AB '0717 But: Q1: How about graphs from other domains? Q2: How about temporal evolution?

MLD-AB '0718 Time evolution with Jure Leskovec (CMU/MLD) and Jon Kleinberg (Cornell – CMU)

MLD-AB '0719 Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: –diameter ~ O(log N) –diameter ~ O(log log N) What is happening in real data?

MLD-AB '0720 Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: –diameter ~ O(log N) –diameter ~ O(log log N) What is happening in real data? Diameter shrinks over time –As the network grows the distances between nodes slowly decrease

MLD-AB '0721 Diameter – ArXiv citation graph Citations among physics papers 1992 –2003 One graph per year time [years] diameter

MLD-AB '0722 Diameter – “Autonomous Systems” Graph of Internet One graph per day 1997 – 2000 number of nodes diameter

MLD-AB '0723 Diameter – “Affiliation Network” Graph of collaborations in physics – authors linked to papers 10 years of data time [years] diameter

MLD-AB '0724 Diameter – “Patents” Patent citation network 25 years of data time [years] diameter

MLD-AB '0725 Temporal Evolution of the Graphs N(t) … nodes at time t E(t) … edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t)

MLD-AB '0726 Temporal Evolution of the Graphs N(t) … nodes at time t E(t) … edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t) A: over-doubled! –But obeying the ``Densification Power Law’’

MLD-AB '0727 Densification – Physics Citations Citations among physics papers 2003: –29,555 papers, 352,807 citations N(t) E(t) ??

MLD-AB '0728 Densification – Physics Citations Citations among physics papers 2003: –29,555 papers, 352,807 citations N(t) E(t) 1.69

MLD-AB '0729 Densification – Physics Citations Citations among physics papers 2003: –29,555 papers, 352,807 citations N(t) E(t) : tree

MLD-AB '0730 Densification – Physics Citations Citations among physics papers 2003: –29,555 papers, 352,807 citations N(t) E(t) 1.69 clique: 2

MLD-AB '0731 Densification – Patent Citations Citations among patents granted 1999 –2.9 million nodes –16.5 million edges Each year is a datapoint N(t) E(t) 1.66

MLD-AB '0732 Densification – Autonomous Systems Graph of Internet 2000 –6,000 nodes –26,000 edges One graph per day N(t) E(t) 1.18

MLD-AB '0733 Densification – Affiliation Network Authors linked to their publications 2002 –60,000 nodes 20,000 authors 38,000 papers –133,000 edges N(t) E(t) 1.15

MLD-AB '0734 Outline Problem definition / Motivation Graphs and power laws [Virus propagation] [e-bay fraud detection] Conclusions

MLD-AB '0735 Virus propagation How do viruses/rumors propagate? Will a flu-like virus linger, or will it become extinct soon?

MLD-AB '0736 The model: SIS ‘Flu’ like: Susceptible-Infected-Susceptible Virus ‘strength’ s=  /  Infected Healthy NN1 N3 N2 Prob.  Prob. β Prob. 

MLD-AB '0737 Epidemic threshold  of a graph: the value of , such that if strength s =  /  <  an epidemic can not happen Thus, given a graph compute its epidemic threshold

MLD-AB '0738 Epidemic threshold  What should  depend on? avg. degree? and/or highest degree? and/or variance of degree? and/or third moment of degree? and/or diameter?

MLD-AB '0739 Epidemic threshold [Theorem] We have no epidemic, if β/δ <τ = 1/ λ 1,A

MLD-AB '0740 Epidemic threshold [Theorem] We have no epidemic, if β/δ <τ = 1/ λ 1,A largest eigenvalue of adj. matrix A attack prob. recovery prob. epidemic threshold Proof: [Wang+03]

MLD-AB '0741 Experiments (Oregon)  /  > τ (above threshold)  /  = τ (at the threshold)  /  < τ (below threshold)

MLD-AB '0742 Outline Problem definition / Motivation Graphs and power laws [Virus propagation] [e-bay fraud detection] Conclusions

MLD-AB '0743 E-bay Fraud detection w/ Polo Chau, CMU

MLD-AB '0744 E-bay Fraud detection - NetProbe

MLD-AB '0745 Conclusions Graphs pose fascinating problems self-similarity/fractals and power laws work, when textbook methods fail! Need: ML/AI, Stat, NA, DB (Gb/Tb), Systems (Networks+), sociology, ++…

MLD-AB '0746 Contact info