Social Networks and Graph Mining Christos Faloutsos CMU - MLD
MLD-AB '07 Outline: Problem definition / Motivation; Graphs and power laws; [Virus propagation]; [eBay fraud detection]; Conclusions
Motivation. Data mining: ~ find patterns (rules, outliers). Problem #1: What do real graphs look like? Problem #2: How do viruses propagate? Problem #3: How to spot fraudsters on eBay?
Problem #1: Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)
Graphs - why should we care? Internet Map [lumeta.com]; Food Web [Martinez ’91]; Protein Interactions [genomebiology.com]; Friendship Network [Moody ’01]
Graphs - why should we care? Network of companies & board-of-directors members; ‘viral’ marketing; web-log (‘blog’) news propagation; computer network security: /IP traffic and anomaly detection; ...
Problem #1 - network and graph mining. What does the Internet look like? What does the web look like? What constitutes a ‘normal’ social network? What is ‘normal’/‘abnormal’? Which patterns/laws hold?
Graph mining: Are real graphs random?
Laws and patterns: NO!! Real graphs are not random. Patterns: diameter; in- and out-degree distributions; other (surprising) patterns
Solution: Power law in the degree distribution [SIGCOMM99]. Plot of log(degree) versus log(rank) for internet domains; labeled points: att.com, ibm.com
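A rough way to look for such a power law in other data is to rank the nodes by degree and fit a straight line in log-log space; the slope then estimates the exponent. The sketch below is only an illustration under assumptions made here: the function names `degree_rank_pairs` and `loglog_slope` are hypothetical, the graph is treated as undirected, and the plain least-squares fit is not claimed to be the procedure of [SIGCOMM99].

```python
import math
from collections import Counter

def degree_rank_pairs(edges):
    """Return (rank, degree) pairs for an undirected edge list; rank 1 = highest degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    ordered = sorted(degree.values(), reverse=True)
    return [(rank, d) for rank, d in enumerate(ordered, start=1)]

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x); all values must be positive."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var

# Synthetic check (made-up data): degree = 100 * rank^(-0.8)
ranks = list(range(1, 51))
degrees = [100.0 * r ** -0.8 for r in ranks]
slope = loglog_slope(ranks, degrees)  # recovers the exponent of the synthetic data
```

On real graphs the tail of the distribution is noisy, so a least-squares slope should be read as a first look rather than a careful estimate.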
But: Q1: How about graphs from other domains? Q2: How about temporal evolution?
The Peer-to-Peer Topology. Plot of frequency versus degree: the number of adjacent peers follows a power law [Jovanovic+]
More power laws: citation counts (citeseer.nj.nec.com, 6/2001). Plot of log(count) versus log(#citations); labeled point: Ullman
Swedish sex-web. Nodes: people (Females; Males). Links: sexual relationships. Liljeros et al., Nature. Swedes; 18-74; 59% response rate. Source: Albert Laszlo Barabasi, Publication%20Categories/04%20Talks/2005-norway-3hours.ppt
More power laws: web hit counts [w/ A. Montgomery]. Web Site Traffic: plot of log(count) versus log(in-degree); Zipf; users, sites; labeled point: ``ebay’’
epinions.com who-trusts-whom [Richardson + Domingos, KDD 2001]. Plot of count versus (out) degree; labeled point: trusts-2000-people user
But: Q1: How about graphs from other domains? Q2: How about temporal evolution?
Time evolution: with Jure Leskovec (CMU/MLD) and Jon Kleinberg (Cornell – CMU)
Evolution of the Diameter. Prior work on power-law graphs hints at a slowly growing diameter: –diameter ~ O(log N) –diameter ~ O(log log N). What is happening in real data? The diameter shrinks over time: –as the network grows, the distances between nodes slowly decrease
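To make the claim concrete, the diameter of each graph snapshot can be measured with breadth-first search from every node. The sketch below assumes a connected, undirected graph stored as a dictionary of neighbour sets; the names `eccentricity`, `diameter` and `add_edge` are hypothetical, and the deck does not state which definition of diameter its plots use (an exact longest shortest path is assumed here).

```python
from collections import deque

def eccentricity(adj, source):
    """Longest shortest-path distance (in hops) from source, via breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def diameter(adj):
    """Exact diameter of a connected, undirected graph given as {node: set_of_neighbours}."""
    return max(eccentricity(adj, s) for s in adj)

def add_edge(adj, u, v):
    """Insert an undirected edge, creating the endpoints if needed."""
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

# Toy example: a 4-node path, then one extra edge that closes it into a cycle.
graph = {}
for u, v in [(0, 1), (1, 2), (2, 3)]:
    add_edge(graph, u, v)
before = diameter(graph)  # path 0-1-2-3
add_edge(graph, 0, 3)
after = diameter(graph)   # cycle 0-1-2-3-0
```

In this toy case one added edge lowers the diameter from 3 to 2, the kind of effect densification can have. All-pairs BFS costs O(N * E), so large graphs would need sampling or an approximation.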
Diameter – ArXiv citation graph. Citations among physics papers, 1992–2003; one graph per year. Plot of diameter versus time [years]
Diameter – “Autonomous Systems”. Graph of the Internet; one graph per day, 1997–2000. Plot of diameter versus number of nodes
Diameter – “Affiliation Network”. Graph of collaborations in physics (authors linked to papers); 10 years of data. Plot of diameter versus time [years]
Diameter – “Patents”. Patent citation network; 25 years of data. Plot of diameter versus time [years]
Temporal Evolution of the Graphs. N(t) … nodes at time t; E(t) … edges at time t. Suppose that N(t+1) = 2 * N(t). Q: what is your guess for E(t+1)? Is it 2 * E(t)? A: over-doubled! –But obeying the ``Densification Power Law’’
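The doubling question can be worked through numerically. The sketch below assumes the Densification Power Law has the form E(t) proportional to N(t)^a; under that assumption, doubling the nodes multiplies the edges by 2^a. The function names `edge_growth_factor` and `exponent_from_snapshots` are hypothetical, and the value 1.69 is the slope this deck reports for the physics citation graph.

```python
import math

def edge_growth_factor(a, node_factor=2.0):
    """If E is proportional to N**a, multiplying N by node_factor multiplies E by node_factor**a."""
    return node_factor ** a

def exponent_from_snapshots(n1, e1, n2, e2):
    """Two-point estimate of the exponent a from snapshots (N1, E1) and (N2, E2)."""
    return math.log(e2 / e1) / math.log(n2 / n1)

tree_like = edge_growth_factor(1.0)    # a = 1: edges exactly double
clique_like = edge_growth_factor(2.0)  # a = 2: edges quadruple
physics = edge_growth_factor(1.69)     # a = 1.69: edges grow by a bit more than 3x
```

A two-point estimate is sensitive to noise; with one snapshot per year or per day, a fit over all snapshots would be the more robust choice.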
Densification – Physics Citations. Citations among physics papers; 2003: –29,555 papers, 352,807 citations. Plot of E(t) versus N(t): slope 1.69; reference slopes: 1 for a tree, 2 for a clique
Densification – Patent Citations. Citations among patents granted; 1999: –2.9 million nodes –16.5 million edges. Each year is a datapoint. Plot of E(t) versus N(t): slope 1.66
Densification – Autonomous Systems. Graph of the Internet; 2000: –6,000 nodes –26,000 edges. One graph per day. Plot of E(t) versus N(t): slope 1.18
Densification – Affiliation Network. Authors linked to their publications; 2002: –60,000 nodes (20,000 authors, 38,000 papers) –133,000 edges. Plot of E(t) versus N(t): slope 1.15
Outline: Problem definition / Motivation; Graphs and power laws; [Virus propagation]; [eBay fraud detection]; Conclusions
Virus propagation: How do viruses/rumors propagate? Will a flu-like virus linger, or will it become extinct soon?
The model: SIS. ‘Flu’-like: Susceptible-Infected-Susceptible. Virus ‘strength’ s = β/δ. Diagram: infected and healthy nodes (N, N1, N2, N3); an infected node attacks each neighbor with prob. β and recovers with prob. δ
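The SIS model can be simulated in a few lines. The sketch below is a minimal illustration under assumptions made here: time is discrete, every infected node attacks each healthy neighbour with probability beta and then recovers with probability delta within the same step, and the names `sis_step` and `simulate` are hypothetical. The exact ordering of events in the authors' model may differ.

```python
import random

def sis_step(adj, infected, beta, delta, rng):
    """One synchronous SIS time step; returns the new set of infected nodes."""
    newly_infected = set()
    recovered = set()
    for u in infected:
        for v in adj[u]:
            # Attack each healthy neighbour with probability beta.
            if v not in infected and rng.random() < beta:
                newly_infected.add(v)
        # Recover with probability delta; u becomes susceptible again.
        if rng.random() < delta:
            recovered.add(u)
    return (infected - recovered) | newly_infected

def simulate(adj, seeds, beta, delta, steps, seed=0):
    """Run the model for a number of steps; returns the infected count after each step."""
    rng = random.Random(seed)
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        infected = sis_step(adj, infected, beta, delta, rng)
        history.append(len(infected))
    return history

# Toy graph: a 5-node ring, with two extreme parameter settings.
ring = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
no_spread = simulate(ring, {0}, beta=0.0, delta=1.0, steps=3)
full_spread = simulate(ring, {0}, beta=1.0, delta=0.0, steps=3)
```

The two extreme settings are deterministic: with beta = 0 and delta = 1 the infection dies after one step, and with beta = 1 and delta = 0 it covers the ring in two steps. Intermediate values depend on the random seed.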
Epidemic threshold of a graph: the value of τ such that, if strength s = β/δ < τ, an epidemic cannot happen. Thus, given a graph, compute its epidemic threshold
Epidemic threshold: What should τ depend on? avg. degree? and/or highest degree? and/or variance of degree? and/or third moment of degree? and/or diameter?
Epidemic threshold [Theorem]: We have no epidemic if β/δ < τ = 1/λ_{1,A}, where β is the attack prob., δ is the recovery prob., τ is the epidemic threshold, and λ_{1,A} is the largest eigenvalue of the adj. matrix A. Proof: [Wang+03]
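The threshold in the theorem can be computed directly from a graph. The sketch below estimates the largest eigenvalue of the adjacency matrix by power iteration and returns its reciprocal. It assumes an undirected, connected graph stored as a dictionary of neighbour sets and delta > 0; the function names are hypothetical, and iterating with A + I instead of A is a choice made here so the iteration also converges on bipartite graphs.

```python
import math

def largest_eigenvalue(adj, iters=1000, tol=1e-12):
    """Largest adjacency eigenvalue of an undirected graph {node: set_of_neighbours}."""
    nodes = list(adj)
    x = {v: 1.0 / math.sqrt(len(nodes)) for v in nodes}
    lam = 0.0
    for _ in range(iters):
        # Multiply by (A + I); the shift avoids oscillation on bipartite graphs.
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in nodes}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
        # Rayleigh quotient x^T A x for the unit vector x.
        new_lam = sum(x[v] * sum(x[u] for u in adj[v]) for v in nodes)
        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return lam

def epidemic_threshold(adj):
    """tau = 1 / lambda_1; infinite for a graph with no edges."""
    lam = largest_eigenvalue(adj)
    return math.inf if lam == 0 else 1.0 / lam

def below_threshold(adj, beta, delta):
    """True if the virus strength beta/delta is under the threshold (delta > 0)."""
    return beta / delta < epidemic_threshold(adj)

# Toy graphs: a 4-node clique (lambda_1 = 3) and a star with 4 leaves (lambda_1 = 2).
clique4 = {i: {j for j in range(4) if j != i} for i in range(4)}
star4 = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
```

For the star the threshold is 1/2, so a virus with beta = 0.1 and delta = 0.5 (strength 0.2) falls below it. For large sparse graphs a sparse eigensolver from a numerical library would be the usual choice.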
Experiments (Oregon): β/δ > τ (above threshold); β/δ = τ (at the threshold); β/δ < τ (below threshold)
Outline: Problem definition / Motivation; Graphs and power laws; [Virus propagation]; [eBay fraud detection]; Conclusions
eBay fraud detection: w/ Polo Chau, CMU
eBay fraud detection - NetProbe
Conclusions: Graphs pose fascinating problems; self-similarity/fractals and power laws work when textbook methods fail! Need: ML/AI, Stat, NA, DB (Gb/Tb), Systems (Networks+), sociology, ++…
Contact info