Social Networks and Graph Mining Christos Faloutsos CMU - MLD.


1 Social Networks and Graph Mining Christos Faloutsos CMU - MLD

2 MLD-AB '07 Outline: Problem definition / Motivation; Graphs and power laws; [Virus propagation]; [e-bay fraud detection]; Conclusions

3 Motivation. Data mining: ~ find patterns (rules, outliers). Problem #1: What do real graphs look like? Problem #2: How do viruses propagate? Problem #3: How to spot fraudsters on e-bay?

4 Problem #1: Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)

5 Graphs - why should we care? Internet Map [lumeta.com]; Food Web [Martinez ’91]; Protein Interactions [genomebiology.com]; Friendship Network [Moody ’01]

6 Graphs - why should we care? Network of companies & board-of-directors members; ‘viral’ marketing; web-log (‘blog’) news propagation; computer network security: email/IP traffic and anomaly detection; ...

7 Problem #1 - network and graph mining. What does the Internet look like? What does the web look like? What constitutes a ‘normal’ social network? What is ‘normal’/‘abnormal’? Which patterns/laws hold?

8 Graph mining: Are real graphs random?

9 Laws and patterns: NO!! Diameter; in- and out-degree distributions; other (surprising) patterns

10 Solution: Power law in the degree distribution [SIGCOMM99]. [Plot: internet domains, axes log(rank) and log(degree), slope -0.82; labeled points: att.com, ibm.com]
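
The slope on slide 10 comes from a straight-line fit on log-log axes. A minimal sketch of such a fit, assuming ordinary least squares on log(degree) versus log(rank); the function name and the synthetic degree list are illustrative and not taken from the talk:

```python
import math

def rank_degree_slope(degrees):
    """Least-squares slope of log(degree) versus log(rank)."""
    # Rank 1 is the highest-degree node; zero-degree nodes are skipped
    # because their logarithm is undefined.
    ranked = sorted((d for d in degrees if d > 0), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(d) for d in ranked]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic (made-up) degrees that follow degree = 1000 * rank^(-0.82) exactly.
synthetic_degrees = [1000.0 * rank ** -0.82 for rank in range(1, 501)]
slope = rank_degree_slope(synthetic_degrees)
```

On real data the points scatter around the line, so the fitted slope is only an estimate of the rank exponent.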

11 But: Q1: How about graphs from other domains? Q2: How about temporal evolution?

12 The Peer-to-Peer Topology [Jovanovic+]: Number of adjacent peers follows a power law. [Plot: frequency versus degree]

13 More power laws: citation counts (citeseer.nj.nec.com 6/2001). [Plot: axes log(#citations) and log(count); labeled point: Ullman]

14 Swedish sex-web. Nodes: people (Females; Males). Links: sexual relationships. Liljeros et al., Nature 2001. 4781 Swedes; 18-74; 59% response rate. Source: Albert Laszlo Barabasi, http://www.nd.edu/~networks/Publication%20Categories/04%20Talks/2005-norway-3hours.ppt

15 More power laws: web hit counts [w/ A. Montgomery]. Web Site Traffic. [Plot: axes log(in-degree) and log(count); Zipf; users, sites; labeled point: "ebay"]

16 epinions.com: who-trusts-whom [Richardson + Domingos, KDD 2001]. [Plot: axes (out) degree and count; annotation: trusts-2000-people user]

17 But: Q1: How about graphs from other domains? Q2: How about temporal evolution?

18 Time evolution, with Jure Leskovec (CMU/MLD) and Jon Kleinberg (Cornell, sabb. @ CMU)

19 Evolution of the Diameter. Prior work on power-law graphs hints at slowly growing diameter: diameter ~ O(log N); diameter ~ O(log log N). What is happening in real data?

20 Evolution of the Diameter. Prior work on power-law graphs hints at slowly growing diameter: diameter ~ O(log N); diameter ~ O(log log N). What is happening in real data? Diameter shrinks over time: as the network grows, the distances between nodes slowly decrease.

21 Diameter – ArXiv citation graph. Citations among physics papers, 1992-2003. One graph per year. [Plot: diameter versus time [years]]

22 Diameter – “Autonomous Systems”. Graph of Internet. One graph per day, 1997-2000. [Plot: diameter versus number of nodes]

23 Diameter – “Affiliation Network”. Graph of collaborations in physics: authors linked to papers. 10 years of data. [Plot: diameter versus time [years]]

24 Diameter – “Patents”. Patent citation network. 25 years of data. [Plot: diameter versus time [years]]
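
The transcript does not say which diameter definition underlies these plots. A minimal sketch, assuming the plain definition (the longest shortest path between reachable node pairs) and a hypothetical adjacency-dict representation of one graph snapshot, shows how a single data point could be measured with breadth-first search:

```python
from collections import deque

def eccentricity(adj, source):
    """Largest BFS distance from source to any node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return max(dist.values())

def diameter(adj):
    """Longest shortest path over all reachable pairs (one BFS per node)."""
    return max(eccentricity(adj, node) for node in adj)

# Toy snapshots (made up): a 4-node path, then the same nodes with one extra edge.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
cycle_graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

diameter_before = diameter(path_graph)
diameter_after = diameter(cycle_graph)
```

Adding the single edge 0-3 turns the path into a cycle and shortens the longest shortest path, a toy version of how added edges can shrink distances. One BFS per node costs O(N * E) in total, so graphs with millions of nodes would need sampling or an approximation.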

25 Temporal Evolution of the Graphs. N(t): nodes at time t. E(t): edges at time t. Suppose that N(t+1) = 2 * N(t). Q: what is your guess for E(t+1)? Is it 2 * E(t)?

26 Temporal Evolution of the Graphs. N(t): nodes at time t. E(t): edges at time t. Suppose that N(t+1) = 2 * N(t). Q: what is your guess for E(t+1)? Is it 2 * E(t)? A: over-doubled! But obeying the “Densification Power Law”.

27 Densification – Physics Citations. Citations among physics papers. 2003: 29,555 papers, 352,807 citations. [Plot: E(t) versus N(t); slope: ??]

28 Densification – Physics Citations. Citations among physics papers. 2003: 29,555 papers, 352,807 citations. [Plot: E(t) versus N(t); slope 1.69]

29 Densification – Physics Citations. Citations among physics papers. 2003: 29,555 papers, 352,807 citations. [Plot: E(t) versus N(t); slope 1.69; reference slope 1: tree]

30 Densification – Physics Citations. Citations among physics papers. 2003: 29,555 papers, 352,807 citations. [Plot: E(t) versus N(t); slope 1.69; reference slope 2: clique]

31 Densification – Patent Citations. Citations among patents granted. 1999: 2.9 million nodes, 16.5 million edges. Each year is a datapoint. [Plot: E(t) versus N(t); slope 1.66]

32 Densification – Autonomous Systems. Graph of Internet. 2000: 6,000 nodes, 26,000 edges. One graph per day. [Plot: E(t) versus N(t); slope 1.18]

33 Densification – Affiliation Network. Authors linked to their publications. 2002: 60,000 nodes (20,000 authors, 38,000 papers), 133,000 edges. [Plot: E(t) versus N(t); slope 1.15]
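
The slopes on the densification slides can be read as the exponent a in E(t) ~ c * N(t)^a, with a = 1 for a tree-like graph and a = 2 for a clique. A minimal sketch, assuming a least-squares fit on log-log axes; the function names and the snapshot data are hypothetical and not taken from the talk:

```python
import math

def densification_exponent(snapshots):
    """Fit a in E ~ c * N^a from (nodes, edges) pairs via log-log least squares."""
    xs = [math.log(nodes) for nodes, _ in snapshots]
    ys = [math.log(edges) for _, edges in snapshots]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def edge_growth_factor(exponent, node_factor=2.0):
    """Factor by which edges grow when the node count grows by node_factor."""
    return node_factor ** exponent

# Made-up snapshots generated from E = 0.5 * N^1.69 (not real measurements).
snapshots = [(n, 0.5 * n ** 1.69) for n in (1000, 2000, 4000, 8000, 16000)]
exponent = densification_exponent(snapshots)
factor = edge_growth_factor(exponent)
```

With an exponent of 1.69, doubling the nodes multiplies the edges by 2^1.69, a little more than 3, which is the "over-doubled" answer on slide 26.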

34 Outline: Problem definition / Motivation; Graphs and power laws; [Virus propagation]; [e-bay fraud detection]; Conclusions

35 Virus propagation. How do viruses/rumors propagate? Will a flu-like virus linger, or will it become extinct soon?

36 The model: SIS. ‘Flu’ like: Susceptible-Infected-Susceptible. Virus ‘strength’ s = β/δ. [Figure: node N with neighbors N1, N2, N3; node states Infected / Healthy; labels Prob. β (attack) and Prob. δ (recovery)]
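
A minimal simulation sketch of the SIS model, assuming discrete synchronous time steps in which each infected neighbor attacks a healthy node independently with probability β and each infected node recovers with probability δ. The update order, function names, and the toy ring graph are assumptions for illustration, not details from the talk:

```python
import random

def sis_step(adj, infected, beta, delta, rng):
    """One synchronous SIS step; returns the new set of infected nodes."""
    next_infected = set()
    for node in adj:
        if node in infected:
            # An infected node stays infected unless it recovers (prob. delta).
            if rng.random() >= delta:
                next_infected.add(node)
        else:
            # Each currently infected neighbor attacks with prob. beta.
            for neighbor in adj[node]:
                if neighbor in infected and rng.random() < beta:
                    next_infected.add(node)
                    break
    return next_infected

def simulate_sis(adj, seeds, beta, delta, steps, seed=0):
    """Return the number of infected nodes at step 0, 1, ..., steps."""
    rng = random.Random(seed)
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        infected = sis_step(adj, infected, beta, delta, rng)
        history.append(len(infected))
    return history

# Toy graph (made up): a ring of 20 nodes with a single initial infection.
ring = {i: [(i - 1) % 20, (i + 1) % 20] for i in range(20)}
history = simulate_sis(ring, seeds=[0], beta=0.3, delta=0.4, steps=50)
```

Plotting `history` for several values of β/δ gives the kind of infected-count curves used to check whether the infection lingers or dies out.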

37 Epidemic threshold τ of a graph: the value of τ such that, if strength s = β/δ < τ, an epidemic cannot happen. Thus, given a graph, compute its epidemic threshold.

38 Epidemic threshold τ. What should τ depend on? Avg. degree? And/or highest degree? And/or variance of degree? And/or third moment of degree? And/or diameter?

39 Epidemic threshold [Theorem]: We have no epidemic if β/δ < τ = 1/λ_{1,A}

40 Epidemic threshold [Theorem]: We have no epidemic if β/δ < τ = 1/λ_{1,A}, where β is the attack prob., δ the recovery prob., τ the epidemic threshold, and λ_{1,A} the largest eigenvalue of the adj. matrix A. Proof: [Wang+03]
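
The theorem reduces the threshold to one number, the largest eigenvalue of the adjacency matrix. A minimal sketch, assuming an undirected graph stored as an adjacency dict and using power iteration on A + I (the identity shift is an implementation choice here to avoid oscillation on bipartite graphs; all names are hypothetical):

```python
import math

def largest_eigenvalue(adj, iterations=1000, tol=1e-12):
    """Estimate the largest adjacency eigenvalue of an undirected graph."""
    nodes = list(adj)
    vec = {v: 1.0 for v in nodes}
    eig = 0.0
    for _ in range(iterations):
        # Multiply by (A + I), then normalize to unit length.
        new = {v: vec[v] + sum(vec[u] for u in adj[v]) for v in nodes}
        norm = math.sqrt(sum(x * x for x in new.values()))
        new = {v: x / norm for v, x in new.items()}
        # Rayleigh quotient x^T A x of the unit vector estimates lambda_1 of A.
        estimate = sum(new[v] * sum(new[u] for u in adj[v]) for v in nodes)
        converged = abs(estimate - eig) < tol
        eig = estimate
        vec = new
        if converged:
            break
    return eig

def epidemic_threshold(adj):
    """tau = 1 / lambda_1; infinite for a graph without edges."""
    lam = largest_eigenvalue(adj)
    return math.inf if lam == 0 else 1.0 / lam

def below_threshold(adj, beta, delta):
    """True if beta/delta < tau, i.e. the theorem predicts no epidemic."""
    return beta / delta < epidemic_threshold(adj)

# Toy graphs (made up): a star with 4 leaves and a 4-node clique.
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
clique = {i: [j for j in range(4) if j != i] for i in range(4)}

tau_star = epidemic_threshold(star)
tau_clique = epidemic_threshold(clique)
```

The star's largest eigenvalue is sqrt(4) = 2 and the clique's is 3, so the denser clique has the lower threshold and is easier to infect. For large sparse graphs a library eigensolver would be the more practical choice.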

41 Experiments (Oregon): β/δ > τ (above threshold); β/δ = τ (at the threshold); β/δ < τ (below threshold)

42 Outline: Problem definition / Motivation; Graphs and power laws; [Virus propagation]; [e-bay fraud detection]; Conclusions

43 E-bay Fraud detection, w/ Polo Chau, CMU

44 E-bay Fraud detection - NetProbe

45 Conclusions. Graphs pose fascinating problems. Self-similarity/fractals and power laws work when textbook methods fail! Need: ML/AI, Stat, NA, DB (Gb/Tb), Systems (Networks+), sociology, ++…

46 Contact info: christos@cs.cmu.edu, www.cs.cmu.edu/~christos

