Presentation is loading. Please wait.

Presentation is loading. Please wait.

CMU SCS Mining Large Social Networks: Patterns and Anomalies Christos Faloutsos CMU.

Similar presentations


Presentation on theme: "CMU SCS Mining Large Social Networks: Patterns and Anomalies Christos Faloutsos CMU."— Presentation transcript:

1 CMU SCS Mining Large Social Networks: Patterns and Anomalies Christos Faloutsos CMU

2 CMU SCS Thank you The Department of Informatics Happy 20-th! Prof. Yannis Manolopoulos Prof. Kostas Tsichlas Mrs. Nina Daltsidou AUTH, May 30, 2012C. Faloutsos (CMU) 2

3 CMU SCS International-caliber friends among AUTH alumni Prof. Evimaria Terzi (U. Boston) Prof. Kyriakos Mouratidis (SMU) Dr. Michalis Vlachos (IBM) … AUTH, May 30, 2012C. Faloutsos (CMU) 3

4 CMU SCS C. Faloutsos (CMU) 4 Outline Introduction – Motivation Problem#1: Patterns in graphs Problem#2: Tools Problem#3: Scalability Conclusions AUTH, May 30, 2012

5 CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food Web [Martinez ’91] AUTH, May 30, 2012 $10s of BILLIONS revenue >500M users

6 CMU SCS C. Faloutsos (CMU) 6 Graphs - why should we care? IR: bi-partite graphs (doc-terms) web: hyper-text graph... and more: D1D1 DNDN T1T1 TMTM... AUTH, May 30, 2012

7 CMU SCS C. Faloutsos (CMU) 7 Graphs - why should we care? web-log (‘blog’) news propagation computer network security: email/IP traffic and anomaly detection.... [subject-verb-object:  graph] Graph == relational table with 2 columns (src, dst) BIG DATA – big graphs AUTH, May 30, 2012

8 CMU SCS C. Faloutsos (CMU) 8 Outline Introduction – Motivation Problem#1: Patterns in graphs –Static graphs –Weighted graphs –Time evolving graphs Problem#2: Tools Problem#3: Scalability Conclusions AUTH, May 30, 2012

9 CMU SCS C. Faloutsos (CMU) 9 Problem #1 - network and graph mining What does the Internet look like? What does FaceBook look like? What is ‘normal’/‘abnormal’? which patterns/laws hold? AUTH, May 30, 2012

10 CMU SCS C. Faloutsos (CMU) 10 Graph mining Are real graphs random? AUTH, May 30, 2012

11 CMU SCS C. Faloutsos (CMU) 11 Laws and patterns Are real graphs random? A: NO!! –Diameter –in- and out- degree distributions –other (surprising) patterns So, let’s look at the data AUTH, May 30, 2012

12 CMU SCS C. Faloutsos (CMU) 12 Solution# S.1 Power law in the degree distribution [SIGCOMM99] log(rank) log(degree) internet domains att.com ibm.com AUTH, May 30, 2012

13 CMU SCS C. Faloutsos (CMU) 13 Solution# S.1 Power law in the degree distribution [SIGCOMM99] log(rank) log(degree) -0.82 internet domains att.com ibm.com AUTH, May 30, 2012

14 CMU SCS C. Faloutsos (CMU) 14 But: How about graphs from other domains? AUTH, May 30, 2012

15 CMU SCS C. Faloutsos (CMU) 15 More power laws: web hit counts [w/ A. Montgomery] Web Site Traffic in-degree (log scale) Count (log scale) Zipf users sites ``ebay’’ AUTH, May 30, 2012

16 CMU SCS And numerous more Who-trusts-whom (epinions.com) Income [Pareto] –’80-20 distribution’ Duration of downloads [Bestavros+] Duration of UNIX jobs (‘mice and elephants’) Size of files of a user … ‘Black swans’ AUTH, May 30, 2012C. Faloutsos (CMU) 16

17 CMU SCS C. Faloutsos (CMU) 17 Outline Introduction – Motivation Problem#1: Patterns in graphs –Static graphs degree, diameter, eigen, Triangles –Time evolving graphs Problem#2: Tools AUTH, May 30, 2012

18 CMU SCS C. Faloutsos (CMU) 18 Solution# S.3: Triangle ‘Laws’ Real social networks have a lot of triangles AUTH, May 30, 2012

19 CMU SCS C. Faloutsos (CMU) 19 Solution# S.3: Triangle ‘Laws’ Real social networks have a lot of triangles –Friends of friends are friends Any patterns? AUTH, May 30, 2012

20 CMU SCS C. Faloutsos (CMU) 20 Triangle Law: #S.3 [Tsourakakis ICDM 2008] SNReuters Epinions X-axis: degree Y-axis: mean # triangles n friends -> ~n 1.6 triangles AUTH, May 30, 2012

21 CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 21 AUTH, May 30, 2012 21 C. Faloutsos (CMU)

22 CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 22 AUTH, May 30, 2012 22 C. Faloutsos (CMU)

23 CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 23 AUTH, May 30, 2012 23 C. Faloutsos (CMU)

24 CMU SCS C. Faloutsos (CMU) 24 Outline Introduction – Motivation Problem#1: Patterns in graphs –Static graphs –Time evolving graphs Problem#2: Tools … AUTH, May 30, 2012

25 CMU SCS C. Faloutsos (CMU) 25 Problem: Time evolution with Jure Leskovec (CMU -> Stanford) and Jon Kleinberg (Cornell – sabb. @ CMU) AUTH, May 30, 2012

26 CMU SCS C. Faloutsos (CMU) 26 T.1 Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: –diameter ~ O(log N) –diameter ~ O(log log N) What is happening in real data? AUTH, May 30, 2012

27 CMU SCS C. Faloutsos (CMU) 27 T.1 Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: –diameter ~ O(log N) –diameter ~ O(log log N) What is happening in real data? Diameter shrinks over time AUTH, May 30, 2012

28 CMU SCS C. Faloutsos (CMU) 28 T.1 Diameter – “Patents” Patent citation network 25 years of data @1999 –2.9 M nodes –16.5 M edges time [years] diameter AUTH, May 30, 2012

29 CMU SCS C. Faloutsos (CMU) 29 Outline Introduction – Motivation Problem#1: Patterns in graphs Problem#2: Tools –Belief Propagation Problem#3: Scalability Conclusions AUTH, May 30, 2012

30 CMU SCS AUTH, May 30, 2012C. Faloutsos (CMU) 30 E-bay Fraud detection w/ Polo Chau & Shashank Pandit, CMU [www’07]

31 CMU SCS AUTH, May 30, 2012C. Faloutsos (CMU) 31 E-bay Fraud detection

32 CMU SCS AUTH, May 30, 2012C. Faloutsos (CMU) 32 E-bay Fraud detection

33 CMU SCS AUTH, May 30, 2012C. Faloutsos (CMU) 33 E-bay Fraud detection - NetProbe

34 CMU SCS Popular press And less desirable attention: E-mail from ‘Belgium police’ (‘copy of your code?’) AUTH, May 30, 2012C. Faloutsos (CMU) 34

35 CMU SCS C. Faloutsos (CMU) 35 Outline Introduction – Motivation Problem#1: Patterns in graphs Problem#2: Tools Problem#3: Scalability -PEGASUS Conclusions AUTH, May 30, 2012

36 CMU SCS AUTH, May 30, 2012C. Faloutsos (CMU) 36 Scalability Google: > 450,000 processors in clusters of ~2000 processors each [ Barroso, Dean, Hölzle, “Web Search for a Planet: The Google Cluster Architecture” IEEE Micro 2003 ] Yahoo: 5Pb of data [Fayyad, KDD’07] Problem: machine failures, on a daily basis How to parallelize data mining tasks, then? A: map/reduce – hadoop (open-source clone) http://hadoop.apache.org/ http://hadoop.apache.org/

37 CMU SCS C. Faloutsos (CMU) 37 Outline Introduction – Motivation Problem#1: Patterns in graphs Problem#2: Tools Problem#3: Scalability –PEGASUS –Radius plot Conclusions AUTH, May 30, 2012

38 CMU SCS HADI for diameter estimation Radius Plots for Mining Tera-byte Scale Graphs U Kang, Charalampos Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec, SDM’10 Naively: diameter needs O(N**2) space and up to O(N**3) time – prohibitive (N~1B) Our HADI: linear on E (~10B) –Near-linear scalability wrt # machines –Several optimizations -> 5x faster C. Faloutsos (CMU) 38 AUTH, May 30, 2012

39 CMU SCS ???? 19+ [Barabasi+] 39 C. Faloutsos (CMU) Radius Count AUTH, May 30, 2012 ~1999, ~1M nodes

40 CMU SCS YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) Largest publicly available graph ever studied. ???? 19+ [Barabasi+] 40 C. Faloutsos (CMU) Radius Count AUTH, May 30, 2012 ?? ~1999, ~1M nodes

41 CMU SCS YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) Largest publicly available graph ever studied. ???? 19+? [Barabasi+] 41 C. Faloutsos (CMU) Radius Count AUTH, May 30, 2012 14 (dir.) ~7 (undir.)

42 CMU SCS YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) 7 degrees of separation (!) Diameter: shrunk ???? 19+? [Barabasi+] 42 C. Faloutsos (CMU) Radius Count AUTH, May 30, 2012 14 (dir.) ~7 (undir.)

43 CMU SCS YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) Q: Shape? ???? 43 C. Faloutsos (CMU) Radius Count AUTH, May 30, 2012 ~7 (undir.)

44 CMU SCS 44 C. Faloutsos (CMU) YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) effective diameter: surprisingly small. Multi-modality (?!) AUTH, May 30, 2012

45 CMU SCS Radius Plot of GCC of YahooWeb. 45 C. Faloutsos (CMU)AUTH, May 30, 2012

46 CMU SCS 46 C. Faloutsos (CMU) YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) effective diameter: surprisingly small. Multi-modality: probably mixture of cores. AUTH, May 30, 2012

47 CMU SCS 47 C. Faloutsos (CMU) YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) effective diameter: surprisingly small. Multi-modality: probably mixture of cores. AUTH, May 30, 2012 EN ~7 Conjecture: DE BR

48 CMU SCS 48 C. Faloutsos (CMU) YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) effective diameter: surprisingly small. Multi-modality: probably mixture of cores. AUTH, May 30, 2012 ~7 Conjecture:

49 CMU SCS C. Faloutsos (CMU) 49 Outline Introduction – Motivation Problem#1: Patterns in graphs Problem#2: Tools Problem#3: Scalability Conclusions AUTH, May 30, 2012

50 CMU SCS C. Faloutsos (CMU) 50 OVERALL CONCLUSIONS – low level: Several new patterns (shrinking diameters, triangle-laws, etc) New tools: –Fraud detection (belief propagation) Scalability: PEGASUS / hadoop AUTH, May 30, 2012

51 CMU SCS C. Faloutsos (CMU) 51 OVERALL CONCLUSIONS – medium-level BIG DATA: Large datasets reveal patterns/outliers that are invisible otherwise AUTH, May 30, 2012

52 CMU SCS C. Faloutsos (CMU) 52 Project info Akoglu, Leman Chau, Polo Kang, U McGlohon, Mary Tong, Hanghang Prakash, Aditya AUTH, May 30, 2012 Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC ; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab www.cs.cmu.edu/~pegasus Koutra, Danae

53 CMU SCS Thank you for the honor! Congratulations for 20-th anniversary and… AUTH, May 30, 2012C. Faloutsos (CMU) 53

54 CMU SCS High-level conclusion: Collaborations Sociology + CS (triangles) Civil engineering + CS (sensor placement) fMRI/medical + graphs (medical db’s) … AUTH, May 30, 2012C. Faloutsos (CMU) 54

55 CMU SCS Never stop learning AUTH, May 30, 2012C. Faloutsos (CMU) 55 Socrates Plato Aristotle 


Download ppt "CMU SCS Mining Large Social Networks: Patterns and Anomalies Christos Faloutsos CMU."

Similar presentations


Ads by Google