CMU SCS Mining Large Social Networks: Patterns and Anomalies Christos Faloutsos CMU.

Slides:



Advertisements
Similar presentations
CMU SCS Large Graph Mining - Patterns, tools and cascade analysis Christos Faloutsos CMU.
Advertisements

CMU SCS Large Graph Mining - Patterns, Explanations and Cascade Analysis Christos Faloutsos CMU.
1 Dynamics of Real-world Networks Jure Leskovec Machine Learning Department Carnegie Mellon University
CMU SCS Large Graph Mining - Patterns, Explanations and Cascade Analysis Christos Faloutsos CMU.
CMU SCS I2.2 Large Scale Information Network Processing INARC 1 Overview Goal: scalable algorithms to find patterns and anomalies on graphs 1. Mining Large.
School of Computer Science Carnegie Mellon University National Taiwan University of Science & Technology Unifying Guilt-by-Association Approaches: Theorems.
CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU.
CMU SCS : Multimedia Databases and Data Mining Lecture #26: Graph mining - patterns Christos Faloutsos.
CMU SCS : Multimedia Databases and Data Mining Extra: intro to hadoop C. Faloutsos.
CMU SCS Mining Billion-Node Graphs - Patterns and Algorithms Christos Faloutsos CMU.
CMU SCS Large Graph Mining - Patterns, Tools and Cascade Analysis Christos Faloutsos CMU.
CMU SCS C. Faloutsos (CMU)#1 Large Graph Algorithms Christos Faloutsos CMU McGlohon, Mary Prakash, Aditya Tong, Hanghang Tsourakakis, Babis Akoglu, Leman.
CMU SCS Mining Billion-node Graphs Christos Faloutsos CMU.
Social Networks and Graph Mining Christos Faloutsos CMU - MLD.
CMU SCS Mining Large Graphs Christos Faloutsos CMU.
CMU SCS APWeb 07(c) 2007, C. Faloutsos 1 Copyright notice Copyright (c) 2007, Christos Faloutsos - all rights preserved. Permission to use all or some.
CMU SCS Large Graph Mining Christos Faloutsos CMU.
CMU SCS Graph mining: Patterns, Generators and Tools Christos Faloutsos CMU.
CMU SCS Bio-informatics, Graph and Stream mining Christos Faloutsos CMU.
CMU SCS Graph and stream mining Christos Faloutsos CMU.
CMU SCS Graph Mining and Influence Propagation Christos Faloutsos CMU.
CMU SCS Yahoo/Hadoop, 2008#1 Peta-Graph Mining Christos Faloutsos Prakash, Aditya Shringarpure, Suyash Tsourakakis, Charalampos Appel, Ana Chau, Polo Leskovec,
CMU SCS Multimedia and Graph mining Christos Faloutsos CMU.
CMU SCS Graph Mining: Laws, Generators and Tools Christos Faloutsos CMU.
CMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos Faloutsos CMU.
CMU SCS Large Graph Mining – Patterns, Tools and Cascade analysis Christos Faloutsos CMU.
CMU SCS : Multimedia Databases and Data Mining Lecture #28: Graph mining - patterns Christos Faloutsos.
CMU SCS Data Mining in Streams and Graphs Christos Faloutsos CMU.
CMU SCS Large Graph Mining - Patterns, Tools and Cascade Analysis Christos Faloutsos CMU.
CMU SCS Big (graph) data analytics Christos Faloutsos CMU.
CMU SCS Graph Mining: Laws, Generators and Tools Christos Faloutsos CMU.
Weighted Graphs and Disconnected Components Patterns and a Generator IDB Lab 현근수 In KDD 08. Mary McGlohon, Leman Akoglu, Christos Faloutsos.
CMU SCS Large Graph Mining Christos Faloutsos CMU.
CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P0-1 Large Graph Mining: Power Tools and a Practitioner’s guide Christos Faloutsos Gary Miller Charalampos.
CMU SCS Mining Billion-node Graphs: Patterns, Generators and Tools Christos Faloutsos CMU.
CMU SCS Large Graph Mining Christos Faloutsos CMU.
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Jon Kleinberg (Cornell), Christos.
CMU SCS Mining Billion-Node Graphs Christos Faloutsos CMU.
CMU SCS Mining Billion-Node Graphs: Patterns and Algorithms Christos Faloutsos CMU.
CMU SCS Graph Mining - surprising patterns in real graphs Christos Faloutsos CMU.
CMU SCS Big (graph) data analytics Christos Faloutsos CMU.
CMU SCS Mining Billion Node Graphs Christos Faloutsos CMU.
CMU SCS Large Graph Mining Christos Faloutsos CMU.
CMU SCS Talk 3: Graph Mining Tools – Tensors, communities, parallelism Christos Faloutsos CMU.
CMU SCS Mining Large Graphs: Fraud Detection, and Algorithms Christos Faloutsos CMU.
CMU SCS KDD '09Faloutsos, Miller, Tsourakakis P5-1 Large Graph Mining: Power Tools and a Practitioner’s guide Task 5: Graphs over time & tensors Faloutsos,
CMU SCS Graph Mining: patterns and tools for static and time-evolving graphs Christos Faloutsos CMU.
RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School.
CMU SCS Graph Mining Christos Faloutsos CMU. CMU SCS iCAST, Jan. 09C. Faloutsos 2 Thank you! Prof. Hsing-Kuo Kenneth Pao Eric, Morgan, Ian, Teenet.
CMU SCS Patterns, Anomalies, and Fraud Detection in Large Graphs Christos Faloutsos CMU.
CMU SCS Graph Mining: Laws, Generators and Tools Christos Faloutsos CMU.
CMU SCS Graph Mining: Laws, Generators and Tools Christos Faloutsos CMU.
1 Patterns of Cascading Behavior in Large Blog Graphs Jure Leskoves, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst SDM 2007 Date:2008/8/21.
CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P9-1 Large Graph Mining: Power Tools and a Practitioner’s guide Christos Faloutsos Gary Miller Charalampos.
CMU SCS Panel: Social Networks Christos Faloutsos CMU.
CMU SCS KDD '09Faloutsos, Miller, Tsourakakis P8-1 Large Graph Mining: Power Tools and a Practitioner’s guide Task 8: hadoop and Tera/Peta byte graphs.
CMU SCS Anomaly Detection in Large Graphs Christos Faloutsos CMU.
Graph Models Class Algorithmic Methods of Data Mining
Anomaly detection in large graphs
15-826: Multimedia Databases and Data Mining
Anomaly detection in large graphs
NetMine: Mining Tools for Large Graphs
Kijung Shin1 Mohammad Hammoud1
Part 1: Graph Mining – patterns
15-826: Multimedia Databases and Data Mining
Graph and Tensor Mining for fun and profit
Christos Faloutsos CMU
Graph and Tensor Mining for fun and profit
Algorithms for Large Graph Mining
Large Graph Mining: Power Tools and a Practitioner’s guide
Presentation transcript:

CMU SCS Mining Large Social Networks: Patterns and Anomalies Christos Faloutsos CMU

CMU SCS Thank you The Department of Informatics Happy 20-th! Prof. Yannis Manolopoulos Prof. Kostas Tsichlas Mrs. Nina Daltsidou AUTH, May 30, 2012C. Faloutsos (CMU) 2

CMU SCS International-caliber friends among AUTH alumni Prof. Evimaria Terzi (U. Boston) Prof. Kyriakos Mouratidis (SMU) Dr. Michalis Vlachos (IBM) … AUTH, May 30, 2012C. Faloutsos (CMU) 3

CMU SCS C. Faloutsos (CMU) 4 Outline Introduction – Motivation Problem#1: Patterns in graphs Problem#2: Tools Problem#3: Scalability Conclusions AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 5 Graphs - why should we care? Internet Map [lumeta.com] Food Web [Martinez ’91] AUTH, May 30, 2012 $10s of BILLIONS revenue >500M users

CMU SCS C. Faloutsos (CMU) 6 Graphs - why should we care? IR: bi-partite graphs (doc-terms) web: hyper-text graph... and more: D1D1 DNDN T1T1 TMTM... AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 7 Graphs - why should we care? web-log (‘blog’) news propagation computer network security: /IP traffic and anomaly detection.... [subject-verb-object:  graph] Graph == relational table with 2 columns (src, dst) BIG DATA – big graphs AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 8 Outline Introduction – Motivation Problem#1: Patterns in graphs –Static graphs –Weighted graphs –Time evolving graphs Problem#2: Tools Problem#3: Scalability Conclusions AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 9 Problem #1 - network and graph mining What does the Internet look like? What does FaceBook look like? What is ‘normal’/‘abnormal’? which patterns/laws hold? AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 10 Graph mining Are real graphs random? AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 11 Laws and patterns Are real graphs random? A: NO!! –Diameter –in- and out- degree distributions –other (surprising) patterns So, let’s look at the data AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 12 Solution# S.1 Power law in the degree distribution [SIGCOMM99] log(rank) log(degree) internet domains att.com ibm.com AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 13 Solution# S.1 Power law in the degree distribution [SIGCOMM99] log(rank) log(degree) internet domains att.com ibm.com AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 14 But: How about graphs from other domains? AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 15 More power laws: web hit counts [w/ A. Montgomery] Web Site Traffic in-degree (log scale) Count (log scale) Zipf users sites ``ebay’’ AUTH, May 30, 2012

CMU SCS And numerous more Who-trusts-whom (epinions.com) Income [Pareto] –’80-20 distribution’ Duration of downloads [Bestavros+] Duration of UNIX jobs (‘mice and elephants’) Size of files of a user … ‘Black swans’ AUTH, May 30, 2012C. Faloutsos (CMU) 16

CMU SCS C. Faloutsos (CMU) 17 Outline Introduction – Motivation Problem#1: Patterns in graphs –Static graphs degree, diameter, eigen, Triangles –Time evolving graphs Problem#2: Tools AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 18 Solution# S.3: Triangle ‘Laws’ Real social networks have a lot of triangles AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 19 Solution# S.3: Triangle ‘Laws’ Real social networks have a lot of triangles –Friends of friends are friends Any patterns? AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 20 Triangle Law: #S.3 [Tsourakakis ICDM 2008] SNReuters Epinions X-axis: degree Y-axis: mean # triangles n friends -> ~n 1.6 triangles AUTH, May 30, 2012

CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 21 AUTH, May 30, C. Faloutsos (CMU)

CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 22 AUTH, May 30, C. Faloutsos (CMU)

CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 23 AUTH, May 30, C. Faloutsos (CMU)

CMU SCS C. Faloutsos (CMU) 24 Outline Introduction – Motivation Problem#1: Patterns in graphs –Static graphs –Time evolving graphs Problem#2: Tools … AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 25 Problem: Time evolution with Jure Leskovec (CMU -> Stanford) and Jon Kleinberg (Cornell – CMU) AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 26 T.1 Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: –diameter ~ O(log N) –diameter ~ O(log log N) What is happening in real data? AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 27 T.1 Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: –diameter ~ O(log N) –diameter ~ O(log log N) What is happening in real data? Diameter shrinks over time AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 28 T.1 Diameter – “Patents” Patent citation network 25 years of –2.9 M nodes –16.5 M edges time [years] diameter AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 29 Outline Introduction – Motivation Problem#1: Patterns in graphs Problem#2: Tools –Belief Propagation Problem#3: Scalability Conclusions AUTH, May 30, 2012

CMU SCS AUTH, May 30, 2012C. Faloutsos (CMU) 30 E-bay Fraud detection w/ Polo Chau & Shashank Pandit, CMU [www’07]

CMU SCS AUTH, May 30, 2012C. Faloutsos (CMU) 31 E-bay Fraud detection

CMU SCS AUTH, May 30, 2012C. Faloutsos (CMU) 32 E-bay Fraud detection

CMU SCS AUTH, May 30, 2012C. Faloutsos (CMU) 33 E-bay Fraud detection - NetProbe

CMU SCS Popular press And less desirable attention: from ‘Belgium police’ (‘copy of your code?’) AUTH, May 30, 2012C. Faloutsos (CMU) 34

CMU SCS C. Faloutsos (CMU) 35 Outline Introduction – Motivation Problem#1: Patterns in graphs Problem#2: Tools Problem#3: Scalability -PEGASUS Conclusions AUTH, May 30, 2012

CMU SCS AUTH, May 30, 2012C. Faloutsos (CMU) 36 Scalability Google: > 450,000 processors in clusters of ~2000 processors each [ Barroso, Dean, Hölzle, “Web Search for a Planet: The Google Cluster Architecture” IEEE Micro 2003 ] Yahoo: 5Pb of data [Fayyad, KDD’07] Problem: machine failures, on a daily basis How to parallelize data mining tasks, then? A: map/reduce – hadoop (open-source clone)

CMU SCS C. Faloutsos (CMU) 37 Outline Introduction – Motivation Problem#1: Patterns in graphs Problem#2: Tools Problem#3: Scalability –PEGASUS –Radius plot Conclusions AUTH, May 30, 2012

CMU SCS HADI for diameter estimation Radius Plots for Mining Tera-byte Scale Graphs U Kang, Charalampos Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec, SDM’10 Naively: diameter needs O(N**2) space and up to O(N**3) time – prohibitive (N~1B) Our HADI: linear on E (~10B) –Near-linear scalability wrt # machines –Several optimizations -> 5x faster C. Faloutsos (CMU) 38 AUTH, May 30, 2012

CMU SCS ???? 19+ [Barabasi+] 39 C. Faloutsos (CMU) Radius Count AUTH, May 30, 2012 ~1999, ~1M nodes

CMU SCS YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) Largest publicly available graph ever studied. ???? 19+ [Barabasi+] 40 C. Faloutsos (CMU) Radius Count AUTH, May 30, 2012 ?? ~1999, ~1M nodes

CMU SCS YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) Largest publicly available graph ever studied. ???? 19+? [Barabasi+] 41 C. Faloutsos (CMU) Radius Count AUTH, May 30, (dir.) ~7 (undir.)

CMU SCS YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) 7 degrees of separation (!) Diameter: shrunk ???? 19+? [Barabasi+] 42 C. Faloutsos (CMU) Radius Count AUTH, May 30, (dir.) ~7 (undir.)

CMU SCS YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) Q: Shape? ???? 43 C. Faloutsos (CMU) Radius Count AUTH, May 30, 2012 ~7 (undir.)

CMU SCS 44 C. Faloutsos (CMU) YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) effective diameter: surprisingly small. Multi-modality (?!) AUTH, May 30, 2012

CMU SCS Radius Plot of GCC of YahooWeb. 45 C. Faloutsos (CMU)AUTH, May 30, 2012

CMU SCS 46 C. Faloutsos (CMU) YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) effective diameter: surprisingly small. Multi-modality: probably mixture of cores. AUTH, May 30, 2012

CMU SCS 47 C. Faloutsos (CMU) YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) effective diameter: surprisingly small. Multi-modality: probably mixture of cores. AUTH, May 30, 2012 EN ~7 Conjecture: DE BR

CMU SCS 48 C. Faloutsos (CMU) YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges) effective diameter: surprisingly small. Multi-modality: probably mixture of cores. AUTH, May 30, 2012 ~7 Conjecture:

CMU SCS C. Faloutsos (CMU) 49 Outline Introduction – Motivation Problem#1: Patterns in graphs Problem#2: Tools Problem#3: Scalability Conclusions AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 50 OVERALL CONCLUSIONS – low level: Several new patterns (shrinking diameters, triangle-laws, etc) New tools: –Fraud detection (belief propagation) Scalability: PEGASUS / hadoop AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 51 OVERALL CONCLUSIONS – medium-level BIG DATA: Large datasets reveal patterns/outliers that are invisible otherwise AUTH, May 30, 2012

CMU SCS C. Faloutsos (CMU) 52 Project info Akoglu, Leman Chau, Polo Kang, U McGlohon, Mary Tong, Hanghang Prakash, Aditya AUTH, May 30, 2012 Thanks to: NSF IIS , IIS , CTA-INARC ; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab Koutra, Danae

CMU SCS Thank you for the honor! Congratulations for 20-th anniversary and… AUTH, May 30, 2012C. Faloutsos (CMU) 53

CMU SCS High-level conclusion: Collaborations Sociology + CS (triangles) Civil engineering + CS (sensor placement) fMRI/medical + graphs (medical db’s) … AUTH, May 30, 2012C. Faloutsos (CMU) 54

CMU SCS Never stop learning AUTH, May 30, 2012C. Faloutsos (CMU) 55 Socrates Plato Aristotle 