Presentation is loading. Please wait.

Presentation is loading. Please wait.

CMU SCS Anomaly Detection in Large Graphs Christos Faloutsos CMU.

Similar presentations


Presentation on theme: "CMU SCS Anomaly Detection in Large Graphs Christos Faloutsos CMU."— Presentation transcript:

1 CMU SCS Anomaly Detection in Large Graphs Christos Faloutsos CMU

2 CMU SCS Thank you! Prof. Bill Scherlis Sharon Blazevich hotSoS, 2016(c) 2016, C. Faloutsos 2

3 CMU SCS (c) 2016, C. Faloutsos 3 Roadmap Introduction – Motivation –Why study (big) graphs? Part#1: Patterns in graphs Part#2: time-evolving graphs; tensors Conclusions hotSoS, 2016

4 CMU SCS (c) 2016, C. Faloutsos 4 Graphs - why should we care? Internet Map [lumeta.com] hotSoS, 2016 computer network security: Email traffic IP traffic (src, dst, dst-port, t) Malware propagation (machine-id, infected-file-id)

5 CMU SCS (c) 2016, C. Faloutsos 5 Graphs - why should we care? >$10B; ~1B users hotSoS, 2016

6 CMU SCS (c) 2016, C. Faloutsos 6 Graphs - why should we care? hotSoS, 2016 U Kang, Jay-Yoon Lee, Danai Koutra, and Christos Faloutsos. Net-Ray: Visualizing and Mining Billion-Scale Graphs PAKDD 2014, Tainan, Taiwan. ~1B nodes (web sites) ~6B edges (http links) ‘YahooWeb graph’

7 CMU SCS (c) 2016, C. Faloutsos 7 Graphs - why should we care? web-log (‘blog’) news propagation Recommendation systems Documents & terms (and plagiarism; fake appstore reviews,....) Many-to-many db relationship -> graph hotSoS, 2016

8 CMU SCS Motivating problems P1: patterns? Fraud detection? P2: patterns in time-evolving graphs / tensors hotSoS, 2016(c) 2016, C. Faloutsos 8 time source destination

9 CMU SCS Motivating problems P1: patterns? Fraud detection? P2: patterns in time-evolving graphs / tensors hotSoS, 2016(c) 2016, C. Faloutsos 9 time source destination Patterns anomalies

10 CMU SCS (c) 2016, C. Faloutsos 10 Roadmap Introduction – Motivation –Why study (big) graphs? Part#1: Patterns & fraud detection Part#2: time-evolving graphs; tensors Conclusions hotSoS, 2016

11 CMU SCS hotSoS, 2016(c) 2016, C. Faloutsos 11 Part 1: Patterns, & fraud detection

12 CMU SCS (c) 2016, C. Faloutsos 12 Laws and patterns Q1: Are real graphs random? hotSoS, 2016

13 CMU SCS (c) 2016, C. Faloutsos 13 Laws and patterns Q1: Are real graphs random? A1: NO!! –Diameter (‘6 degrees’; ‘Kevin Bacon’) –in- and out- degree distributions –other (surprising) patterns So, let’s look at the data hotSoS, 2016

14 CMU SCS (c) 2016, C. Faloutsos 14 Solution# S.1 Power law in the degree distribution [Faloutsos x 3 SIGCOMM99] log(rank) log(degree) internet domains att.com ibm.com hotSoS, 2016

15 CMU SCS (c) 2016, C. Faloutsos 15 Solution# S.1 Power law in the degree distribution [Faloutsos x 3 SIGCOMM99; + Siganos] log(rank) log(degree) -0.82 internet domains att.com ibm.com hotSoS, 2016

16 CMU SCS 16 S2: connected component sizes Connected Components – 4 observations: Size Count (c) 2016, C. Faloutsos hotSoS, 2016 1.4B nodes 6B edges

17 CMU SCS 17 S2: connected component sizes Connected Components Size Count (c) 2016, C. Faloutsos hotSoS, 2016 1) 10K x larger than next

18 CMU SCS 18 S2: connected component sizes Connected Components Size Count (c) 2016, C. Faloutsos hotSoS, 2016 2) ~0.7B singleton nodes

19 CMU SCS 19 S2: connected component sizes Connected Components Size Count (c) 2016, C. Faloutsos hotSoS, 2016 3) SLOPE!

20 CMU SCS 20 S2: connected component sizes Connected Components Size Count 300-size cmpt X 500. Why? 1100-size cmpt X 65. Why? (c) 2016, C. Faloutsos hotSoS, 2016 4) Spikes!

21 CMU SCS 21 S2: connected component sizes Connected Components Size Count suspicious financial-advice sites (not existing now) (c) 2016, C. Faloutsos hotSoS, 2016

22 CMU SCS (c) 2016, C. Faloutsos 22 Roadmap Introduction – Motivation Part#1: Patterns in graphs –Patterns: Degree; Triangles –Anomaly/fraud detection Part#2: time-evolving graphs; tensors Conclusions hotSoS, 2016

23 CMU SCS (c) 2016, C. Faloutsos 23 Solution# S.3: Triangle ‘Laws’ Real social networks have a lot of triangles hotSoS, 2016

24 CMU SCS (c) 2016, C. Faloutsos 24 Solution# S.3: Triangle ‘Laws’ Real social networks have a lot of triangles –Friends of friends are friends Any patterns? –2x the friends, 2x the triangles ? hotSoS, 2016

25 CMU SCS (c) 2016, C. Faloutsos 25 Triangle Law: #S.3 [Tsourakakis ICDM 2008] SNReuters Epinions X-axis: degree Y-axis: mean # triangles n friends -> ~n 1.6 triangles hotSoS, 2016

26 CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 26 hotSoS, 2016 26 (c) 2016, C. Faloutsos

27 CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 27 hotSoS, 2016 27 (c) 2016, C. Faloutsos

28 CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 28 hotSoS, 2016 28 (c) 2016, C. Faloutsos

29 CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 29 hotSoS, 2016 29 (c) 2016, C. Faloutsos

30 CMU SCS MORE Graph Patterns hotSoS, 2016(c) 2016, C. Faloutsos 30 ✔ ✔ ✔ RTG: A Recursive Realistic Graph Generator using Random Typing Leman Akoglu and Christos Faloutsos. PKDD’09.

31 CMU SCS MORE Graph Patterns hotSoS, 2016(c) 2016, C. Faloutsos 31 Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. in "Social Network Data Analytics” (Ed.: Charu Aggarwal) Deepayan Chakrabarti and Christos Faloutsos, Graph Mining: Laws, Tools, and Case Studies Oct. 2012, Morgan Claypool. Graph Mining: Laws, Tools, and Case Studies

32 CMU SCS (c) 2016, C. Faloutsos 32 Roadmap Introduction – Motivation Part#1: Patterns in graphs –Patterns –Anomaly / fraud detection CopyCatch Spectral methods (‘fBox’) Belief Propagation Part#2: time-evolving graphs; tensors Conclusions hotSoS, 2016 Patterns anomalies

33 CMU SCS Fraud Given –Who ‘likes’ what page, and when Find –Suspicious users and suspicious products hotSoS, 2016(c) 2016, C. Faloutsos 33 CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks, Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos WWW, 2013.

34 CMU SCS Fraud Given –Who ‘likes’ what page, and when Find –Suspicious users and suspicious products hotSoS, 2016(c) 2016, C. Faloutsos 34 CopyCatch: Stopping Group Attacks by Spotting Lockstep Behavior in Social Networks, Alex Beutel, Wanhong Xu, Venkatesan Guruswami, Christopher Palow, Christos Faloutsos WWW, 2013. Likes

35 CMU SCS Our intuition ▪ Lockstep behavior: Same Likes, same time Graph Patterns and Lockstep Behavior Likes hotSoS, 2016 35 (c) 2016, C. Faloutsos

36 CMU SCS Our intuition ▪ Lockstep behavior: Same Likes, same time Graph Patterns and Lockstep Behavior Likes hotSoS, 2016 36 (c) 2016, C. Faloutsos

37 CMU SCS Our intuition ▪ Lockstep behavior: Same Likes, same time Graph Patterns and Lockstep Behavior Suspicious Lockstep Behavior Likes hotSoS, 2016 37 (c) 2016, C. Faloutsos

38 CMU SCS MapReduce Overview ▪ Use Hadoop to search for many clusters in parallel: 1. Start with randomly seed 2. Update set of Pages and center Like times for each cluster 3. Repeat until convergence Likes hotSoS, 2016 38 (c) 2016, C. Faloutsos

39 CMU SCS Deployment at Facebook ▪ CopyCatch runs regularly (along with many other security mechanisms, and a large Site Integrity team) 3 months of CopyCatch @ Facebook #users caught time hotSoS, 2016 39 (c) 2016, C. Faloutsos

40 CMU SCS Deployment at Facebook Manually labeled 22 randomly selected clusters from February 2013 Most clusters (77%) come from real but compromised users Fake acct hotSoS, 2016 40 (c) 2016, C. Faloutsos

41 CMU SCS (c) 2016, C. Faloutsos 41 Roadmap Introduction – Motivation Part#1: Patterns in graphs –Patterns –Anomaly / fraud detection CopyCatch Spectral methods (‘fBox’) Belief Propagation Part#2: time-evolving graphs; tensors Conclusions hotSoS, 2016

42 CMU SCS (c) 2016, C. Faloutsos 42 Problem: Social Network Link Fraud hotSoS, 2016 Target: find “stealthy” attackers missed by other algorithms Clique Bipartite core 41.7M nodes 1.5B edges

43 CMU SCS (c) 2016, C. Faloutsos 43 Problem: Social Network Link Fraud hotSoS, 2016 Neil Shah, Alex Beutel, Brian Gallagher and Christos Faloutsos. Spotting Suspicious Link Behavior with fBox: An Adversarial Perspective. ICDM 2014, Shenzhen, China. Target: find “stealthy” attackers missed by other algorithms Takeaway: use reconstruction error between true/latent representation!

44 CMU SCS (c) 2016, C. Faloutsos 44 Roadmap Introduction – Motivation Part#1: Patterns in graphs –Patterns –Anomaly / fraud detection CopyCatch Spectral methods (‘fBox’) Belief Propagation Part#2: time-evolving graphs; tensors Conclusions hotSoS, 2016

45 CMU SCS hotSoS, 2016(c) 2016, C. Faloutsos 45 E-bay Fraud detection w/ Polo Chau & Shashank Pandit, CMU [www’07]

46 CMU SCS hotSoS, 2016(c) 2016, C. Faloutsos 46 E-bay Fraud detection

47 CMU SCS hotSoS, 2016(c) 2016, C. Faloutsos 47 E-bay Fraud detection

48 CMU SCS hotSoS, 2016(c) 2016, C. Faloutsos 48 E-bay Fraud detection - NetProbe

49 CMU SCS Popular press And less desirable attention: E-mail from ‘Belgium police’ (‘copy of your code?’) hotSoS, 2016(c) 2016, C. Faloutsos 49

50 CMU SCS (c) 2016, C. Faloutsos 50 Roadmap Introduction – Motivation Part#1: Patterns in graphs –Patterns –Anomaly / fraud detection CopyCatch Spectral methods (‘fBox’) Belief Propagation; antivirus app Part#2: time-evolving graphs; tensors Conclusions hotSoS, 2016

51 CMU SCS Polo Chau Machine Learning Dept Carey Nachenberg Vice President & Fellow Jeffrey Wilhelm Principal Software Engineer Adam Wright Software Engineer Prof. Christos Faloutsos Computer Science Dept Polonium: Tera-Scale Graph Mining and Inference for Malware Detection PATENT PENDING SDM 2011, Mesa, Arizona

52 CMU SCS Polonium: The Data 60+ terabytes of data anonymously contributed by participants of worldwide Norton Community Watch program 50+ million machines 900+ million executable files Constructed a machine-file bipartite graph (0.2 TB+) 1 billion nodes (machines and files) 37 billion edges hotSoS, 2016 52 (c) 2016, C. Faloutsos

53 CMU SCS Polonium: Key Ideas Use Belief Propagation to propagate domain knowledge in machine-file graph to detect malware Use “guilt-by-association” (i.e., homophily) –E.g., files that appear on machines with many bad files are more likely to be bad Scalability: handles 37 billion-edge graph hotSoS, 2016 53 (c) 2016, C. Faloutsos

54 CMU SCS Polonium: One-Interaction Results 84.9% True Positive Rate 1% False Positive Rate True Positive Rate % of malware correctly identified 54 Ideal hotSoS, 2016(c) 2016, C. Faloutsos False Positive Rate % of non-malware wrongly labeled as malware

55 CMU SCS (c) 2016, C. Faloutsos 55 Roadmap Introduction – Motivation Part#1: Patterns in graphs Part#2: time-evolving graphs; tensors Conclusions hotSoS, 2016

56 CMU SCS hotSoS, 2016(c) 2016, C. Faloutsos 56 Part 2: Time evolving graphs; tensors

57 CMU SCS Graphs over time -> tensors! Problem #2: –Given who calls whom, and when –Find patterns / anomalies hotSoS, 2016(c) 2016, C. Faloutsos 57 smith johnson

58 CMU SCS Graphs over time -> tensors! Problem #2: –Given who calls whom, and when –Find patterns / anomalies hotSoS, 2016(c) 2016, C. Faloutsos 58

59 CMU SCS Graphs over time -> tensors! Problem #2: –Given who calls whom, and when –Find patterns / anomalies hotSoS, 2016(c) 2016, C. Faloutsos 59 Mon Tue

60 CMU SCS Graphs over time -> tensors! Problem #2: –Given who calls whom, and when –Find patterns / anomalies hotSoS, 2016(c) 2016, C. Faloutsos 60 callee caller time

61 CMU SCS Graphs over time -> tensors! Problem #2’: –Given author-keyword-date –Find patterns / anomalies hotSoS, 2016(c) 2016, C. Faloutsos 61 keyword author date MANY more settings, with >2 ‘modes’

62 CMU SCS Graphs over time -> tensors! Problem #2’’: –Given subject – verb – object facts –Find patterns / anomalies hotSoS, 2016(c) 2016, C. Faloutsos 62 object subject verb MANY more settings, with >2 ‘modes’

63 CMU SCS Graphs over time -> tensors! Problem #2’’’: –Given –Find patterns / anomalies hotSoS, 2016(c) 2016, C. Faloutsos 63 mode2 mode1 mode3 MANY more settings, with >2 ‘modes’ (and 4, 5, etc modes)

64 CMU SCS (c) 2016, C. Faloutsos 64 Roadmap Introduction – Motivation Part#1: Patterns in graphs Part#2: time-evolving graphs; tensors –Intro to tensors –Results –Speed Conclusions hotSoS, 2016

65 CMU SCS Answer to both: tensor factorization Recall: (SVD) matrix factorization: finds blocks hotSoS, 2016(c) 2016, C. Faloutsos 65 N users M products ‘meat-eaters’ ‘steaks’ ‘vegetarians’ ‘plants’ ‘kids’ ‘cookies’ ~ + +

66 CMU SCS Answer to both: tensor factorization PARAFAC decomposition hotSoS, 2016(c) 2016, C. Faloutsos 66 = + + subject object verb politicians artistsathletes

67 CMU SCS Answer: tensor factorization PARAFAC decomposition Results for who-calls-whom-when –4M x 15 days hotSoS, 2016(c) 2016, C. Faloutsos 67 = + + caller callee time ??

68 CMU SCS Anomaly detection in time- evolving graphs Anomalous communities in phone call data: –European country, 4M clients, data over 2 weeks ~200 calls to EACH receiver on EACH day! 1 caller5 receivers4 days of activity hotSoS, 2016 68 (c) 2016, C. Faloutsos =

69 CMU SCS Anomaly detection in time- evolving graphs Anomalous communities in phone call data: –European country, 4M clients, data over 2 weeks ~200 calls to EACH receiver on EACH day! hotSoS, 2016 69 (c) 2016, C. Faloutsos = 1 caller5 receivers4 days of activity

70 CMU SCS Anomaly detection in time- evolving graphs Anomalous communities in phone call data: –European country, 4M clients, data over 2 weeks ~200 calls to EACH receiver on EACH day! hotSoS, 2016 70 (c) 2016, C. Faloutsos = Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, Danai Koutra. Com2: Fast Automatic Discovery of Temporal (Comet) Communities. PAKDD 2014, Tainan, Taiwan.

71 CMU SCS (c) 2016, C. Faloutsos 71 Roadmap Introduction – Motivation Part#1: Patterns in graphs Part#2: time-evolving graphs; tensors –Tensors for intrusion detection –Inter-arrival times Conclusions hotSoS, 2016

72 CMU SCS Honeypots MalSpot: Multi2 Malicious Network Behavior Patterns Analysis Ching-Hao Mao, Chung-Jung Wu, Kuo-Chen Lee, Evangelos E. Papalexakis, Christos Faloutsos, and Tien-Cheu Kao, PAKDD’14, Tainan, TW hotSoS, 2016(c) 2016, C. Faloutsos 72

73 CMU SCS Data The typical: –IP source –IP destination (honeypot) –Operation id (eg., ssh, ftp) –timestamp hotSoS, 2016(c) 2016, C. Faloutsos 73 128.1.1.1 128.3.3.3 128.2.2.2 ftp ssh 128.4.4.4

74 CMU SCS Preliminary analysis/visualization hotSoS, 2016(c) 2016, C. Faloutsos 74 Op-id Honeypot ‘A’Honeypot ‘B’Honeypot ‘C’ time

75 CMU SCS Tensor analysis hotSoS, 2016(c) 2016, C. Faloutsos 75 1 st latent variable 2 nd latent variable attacker, brute-force on POP3 (port 110) attacker on port 25 =

76 CMU SCS (c) 2016, C. Faloutsos 76 Roadmap Introduction – Motivation –Why study (big) graphs? Part#1: Patterns in graphs Part#2: time-evolving graphs; tensors –Inter-arrival time patterns Acknowledgements and Conclusions hotSoS, 2016

77 CMU SCS RSC: Mining and Modeling Temporal Activity in Social Media Alceu F. Costa * Yuto Yamaguchi Agma J. M. Traina Caetano Traina Jr. Christos Faloutsos Universidade de São Paulo KDD 2015 – Sydney, Australia * alceufc@icmc.usp.br

78 CMU SCS Reddit Dataset Time-stamp from comments 21,198 users 20 Million time-stamps Twitter Dataset Time-stamp from tweets 6,790 users 16 Million time-stamps Pattern Mining: Datasets For each user we have: Sequence of postings time-stamps: T = (t 1, t 2, t 3, …) Inter-arrival times (IAT) of postings: (∆ 1, ∆ 2, ∆ 3, …) 78 t1t1 t2t2 t3t3 t4t4 ∆1∆1 ∆2∆2 ∆3∆3 time hotSoS, 2016(c) 2016, C. Faloutsos

79 CMU SCS Pattern Mining Pattern 1: Distribution of IAT is heavy-tailed Users can be inactive for long periods of time before making new postings IAT Complementary Cumulative Distribution Function (CCDF) (log-log axis) 79 Reddit UsersTwitter Users hotSoS, 2016(c) 2016, C. Faloutsos

80 CMU SCS Pattern Mining Pattern 2: Bimodal IAT distribution Users have highly active sections and resting periods Log-binned histogram of postings IAT 80 Twitter Users 1 st Mode (1min) 2 nd Mode (3h) hotSoS, 2016(c) 2016, C. Faloutsos

81 CMU SCS Human? Robots? log linear hotSoS, 2016(c) 2016, C. Faloutsos 81

82 CMU SCS Human? Robots? log linear 2’ 3h 1day hotSoS, 2016(c) 2016, C. Faloutsos 82

83 CMU SCS Experiments: Can RSC-Spotter Detect Bots? Precision vs. Sensitivity Curves Good performance: curve close to the top 83 Precision > 94% Sensitivity > 70% With strongly imbalanced datasets # humans >> # bots Precision > 94% Sensitivity > 70% With strongly imbalanced datasets # humans >> # bots Twitter hotSoS, 2016(c) 2016, C. Faloutsos

84 CMU SCS Experiments: Can RSC-Spotter Detect Bots? Precision vs. Sensitivity Curves Good performance: curve close to the top 84 Precision > 96% Sensitivity > 47% With strongly imbalanced datasets # humans >> # bots Precision > 96% Sensitivity > 47% With strongly imbalanced datasets # humans >> # bots Reddit hotSoS, 2016(c) 2016, C. Faloutsos

85 CMU SCS Work under progress Active Directory log data From a large Asian institution (user-id, operation-id, timestamp) hotSoS, 2016(c) 2016, C. Faloutsos 85 Hemank Lamba

86 CMU SCS 30sec, 1min and 2min. - > scripts/attacks? IAT (log scale) PDF (log scale)

87 CMU SCS (c) 2016, C. Faloutsos 87 Roadmap Introduction – Motivation –Why study (big) graphs? Part#1: Patterns in graphs Part#2: time-evolving graphs; tensors Acknowledgements and Conclusions hotSoS, 2016

88 CMU SCS (c) 2016, C. Faloutsos 88 Thanks hotSoS, 2016 Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC ; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab Disclaimer: All opinions are mine; not necessarily reflecting the opinions of the funding agencies

89 CMU SCS (c) 2016, C. Faloutsos 89 Cast Akoglu, Leman Chau, Polo Kang, U Prakash, Aditya hotSoS, 2016 Koutra, Danai Beutel, Alex Papalexakis, Vagelis Shah, Neil Lee, Jay Yoon Araujo, Miguel

90 CMU SCS (c) 2016, C. Faloutsos 90 CONCLUSION#1 – Big data Patterns Anomalies Large datasets reveal patterns/outliers that are invisible otherwise hotSoS, 2016

91 CMU SCS (c) 2016, C. Faloutsos 91 CONCLUSION#2 – tensors powerful tool hotSoS, 2016 = 1 caller5 receivers4 days of activity

92 CMU SCS (c) 2016, C. Faloutsos 92 References D. Chakrabarti, C. Faloutsos: Graph Mining – Laws, Tools and Case Studies, Morgan Claypool 2012 http://www.morganclaypool.com/doi/abs/10.2200/S004 49ED1V01Y201209DMK006 hotSoS, 2016

93 CMU SCS TAKE HOME MESSAGE: Cross-disciplinarity hotSoS, 2016(c) 2016, C. Faloutsos 93 =

94 CMU SCS Cross-disciplinarity hotSoS, 2016(c) 2016, C. Faloutsos 94 = Thank you!


Download ppt "CMU SCS Anomaly Detection in Large Graphs Christos Faloutsos CMU."

Similar presentations


Ads by Google