Early Detection of Outgoing Spammers in Large-Scale Service Provider Networks
Yehonatan Cohen, Daniel Gordon and Danny Hendler, Ben-Gurion University


1 Title slide
Yehonatan Cohen, Daniel Gordon and Danny Hendler, Ben-Gurion University
Yehonatan Cohen, Daniel Gordon and Danny Hendler, DIMVA 2013

2 Talk outline
- Preliminaries
- ErDOS: An Early Detection Scheme for Outgoing Spam
- Evaluation
- Conclusions and Future Work

3 Preliminaries
- Spam: unsolicited mail, typically sent in large quantities
- Hazards:
  - Malware distribution
  - Phishing
  - Resource consumption
  - Poor user experience
- Detection may be attempted when:
  - mail is sent (outgoing spam detection)
  - mail is received (incoming spam detection)

4 Outgoing spam detection
- Spam can be blocked before it leaves the Email Service Provider (ESP)
- Advantages:
  - Reduces load on the ESP's infrastructure
  - Prevents damage to the ESP's reputation
  - Detection may be based on the activity of the accounts the ESP hosts

5 Outgoing spam filtering techniques
- Contents-based filtering: learns and identifies textual patterns typical of spam messages
- May be tricked by manipulating spam content:
  - image-based spam
  - random string insertion (hash busters)
- Result: a non-negligible false-negative rate
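
To illustrate why hash busters work, the sketch below assumes the simplest possible content-based check: an exact-match fingerprint (SHA-256 of the message body) compared against previously seen spam. This is a deliberately naive stand-in, not the contents-based detector referred to in the talk; the spam text and the random suffixes are made up.

```python
import hashlib


def fingerprint(body: str) -> str:
    """Exact-match fingerprint of a message body (illustrative only)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


# Hypothetical spam template; the spammer appends a random string to each copy.
template = "Cheap watches, limited offer, click the link"
copy_a = template + " xq7f3k"
copy_b = template + " m2p9zt"

# The filter has already seen copy_a and stored its fingerprint.
known_spam = {fingerprint(copy_a)}

# copy_b carries the same pitch, but the inserted random string changes
# its fingerprint, so the exact-match check lets it through.
print(fingerprint(copy_a) in known_spam)  # True
print(fingerprint(copy_b) in known_spam)  # False
```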

6 Outgoing spam filtering techniques (cont'd)
- Inter-account communication pattern analysis:
  - models accounts' behaviour based on their social interactions with other accounts
  - typically uses machine-learning techniques
  - may leverage the ESP's account identification

7 Our goals
- Devise an effective detector of outgoing spammers for large ESPs (the ErDOS detector)
- Emphasis on early detection: detect spammers before the contents-based filter does
- Short training periods: highly adaptive to changing spamming patterns

8 Most relevant related work
- Lam & Yeung, CEAS 2007
  - Introduce “social-network”-based outgoing spam detection
  - Use the k-NN classifier
  - Relatively small dataset (Enron)
  - Labeling based on simulated spammer accounts
- Tseng & Chen, CSE 2009
  - Use the same set of features
  - Use an SVM classifier
  - Larger, non-ESP dataset (university email server)
  - Incremental model update
  - Labeling based on pure accounts
  - Account identification based on the "From" header field

9 Comparison with data sets of previous work

               Our data set   Our data set
               (in/out)       (outgoing)     NTU          Enron
#mails         9.86E7         2.13E8         2.86E6       5.17E5
#accounts      5.63E7         5.81E7         6.37E5       3.67E4
#edges         7.40E7         12.90E7        -            3.68E5
Time period    4 days         26 days        10 days      3.5 years
Contents       spam & ham     spam & ham     spam & ham   ham

- Collected by a very large ESP
- Consists of incoming and outgoing log files: 4 days of bidirectional data plus 22 days of outgoing traffic only
- Both incoming and outgoing messages are labeled as spam/ham by a content-based detector


11 Talk outline
- Preliminaries
- ErDOS: An Early Detection Scheme for Outgoing Spam
  - Computation Flow
  - Features
- Evaluation
- Conclusions and Future Work

12 The ErDOS detector: computation flow
1. Pre-processing: compute account feature values based on a single day of email logs
2. Determine the accounts' classification, producing a classified data set
3. Undersampling: extract all spammers and an equal number of legitimate accounts as the training set
4. Build a rotation forest model from the training set (the classification model)
5. Assign scores to the remainder of the accounts (those not in the training set) using the classification model
6. Construct a suspect-accounts list of configurable size from the scored accounts
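
The undersampling step and the final suspects list can be sketched as follows. This is a minimal illustration under stated assumptions: labels are a plain account-to-boolean map, and the trained model is passed in as a scoring callable. The rotation forest itself is not implemented here, and the names `undersample` and `suspects_list` are hypothetical.

```python
import random
from typing import Callable, Dict, List, Set, Tuple


def undersample(labels: Dict[str, bool], seed: int = 0) -> Tuple[Set[str], Set[str]]:
    """Split accounts into (training set, remainder).

    labels maps account -> True if the account is labeled a spammer.
    The training set holds all spammers plus an equal number of randomly
    chosen legitimate accounts; every other account goes to the remainder.
    """
    spammers = sorted(a for a, is_spam in labels.items() if is_spam)
    legit = sorted(a for a, is_spam in labels.items() if not is_spam)
    rng = random.Random(seed)
    sampled_legit = rng.sample(legit, min(len(spammers), len(legit)))
    training = set(spammers) | set(sampled_legit)
    return training, set(labels) - training


def suspects_list(remainder: Set[str], score: Callable[[str], float], size: int) -> List[str]:
    """Rank the remainder accounts by model score (highest first) and keep the top `size`."""
    return sorted(remainder, key=lambda acct: (-score(acct), acct))[:size]


# Usage with toy data; `scores` stands in for the classification model's output.
labels = {"a": True, "b": False, "c": False, "d": False}
training, remainder = undersample(labels)
scores = {"b": 0.2, "c": 0.9, "d": 0.5}
print(suspects_list(remainder, lambda acct: scores.get(acct, 0.0), size=1))
```

Which legitimate account lands in the training set depends on the random seed, so the printed suspect varies with it; the list size is the configurable parameter from step 6.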

13 Talk outline
- Preliminaries
- ErDOS: An Early Detection Scheme for Outgoing Spam
  - Computation Flow
  - Features
- Evaluation
- Conclusions and Future Work

14 ErDOS features: IOR
- Legitimate users:
  - maintain social interactions
  - often belong to mailing lists
- Spammers: their sent messages are seldom replied to
- An account's IOR (incoming/outgoing ratio) = #incoming mails / #outgoing mails
- A low IOR is characteristic of spammers
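
A minimal sketch of the IOR computation, assuming one day of logs given as (sender, recipient) pairs. Only accounts that sent at least one message get a value here, since the ratio is undefined for an account with no outgoing mail; that choice is an assumption of this sketch.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def compute_ior(log: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Return #incoming / #outgoing for every account that sent mail in the log."""
    outgoing: Counter = Counter()
    incoming: Counter = Counter()
    for sender, recipient in log:
        outgoing[sender] += 1
        incoming[recipient] += 1
    # Counter returns 0 for accounts that received nothing.
    return {acct: incoming[acct] / n_out for acct, n_out in outgoing.items()}


# Toy log: alice and bob correspond; "spammer" sends three unanswered messages.
log = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("bob", "carol"),
    ("spammer", "u1"),
    ("spammer", "u2"),
    ("spammer", "u3"),
]
print(compute_ior(log))  # {'alice': 1.0, 'bob': 0.5, 'spammer': 0.0}
```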

15 ErDOS features: IOR (cont'd)

16 ErDOS features: IOR versus CR
- Communication Reciprocity (CR): the fraction of recipients who responded to an account's emails, as defined by Gomes et al.
- IOR is superior to CR for short training periods
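
For comparison with IOR, here is a sketch of CR over the same kind of (sender, recipient) log. It assumes that a recipient "responded" if it sent any message back to the account within the same log window; message ordering and reply threading are ignored, which is a simplification rather than necessarily Gomes et al.'s exact definition.

```python
from typing import Iterable, Optional, Tuple


def communication_reciprocity(log: Iterable[Tuple[str, str]], account: str) -> Optional[float]:
    """Fraction of the account's distinct recipients that sent mail back to it.

    Returns None if the account sent no mail in the log.
    """
    log = list(log)
    recipients = {r for s, r in log if s == account}
    if not recipients:
        return None
    responders = {s for s, r in log if r == account and s in recipients}
    return len(responders) / len(recipients)


log = [
    ("alice", "bob"),
    ("bob", "alice"),
    ("bob", "carol"),
    ("spammer", "u1"),
    ("spammer", "u2"),
    ("spammer", "u3"),
]
print([communication_reciprocity(log, a) for a in ("alice", "bob", "spammer")])  # [1.0, 0.5, 0.0]
```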

17 ErDOS features: IEBC
- IEBC (Internal/External Behaviour Consistency)
- An account can send/receive emails to/from:
  - internal addresses (accounts hosted by the ESP)
  - external addresses
- Legitimate accounts show a correlation between their internal and external IOR; spammers show less correlation

18 ErDOS features: #outgoing messages
- Number of outgoing messages
- Spamming accounts send more emails than legitimate ones
- Insufficient on its own for detecting low-volume spammers

19 ErDOS: Sender Accounts' Characteristics
- A large fraction of spammers' incoming mail is spam!
  - Legitimate accounts seldom send emails to spamming accounts
  - Dictionary attacks may cause spammers to spam each other
- Therefore, analyse the characteristics of the accounts that send mail to each account
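
One simple way to capture the observation above is the fraction of an account's incoming mail that the contents-based detector tagged as spam. The slides do not say which sender statistics ErDOS aggregates, so this feature and the (sender, recipient, tagged_as_spam) record layout are assumptions made for illustration.

```python
from typing import Iterable, Optional, Tuple


def incoming_spam_fraction(
    log: Iterable[Tuple[str, str, bool]], account: str
) -> Optional[float]:
    """Fraction of messages received by `account` that were tagged as spam.

    Returns None if the account received nothing in the log.
    """
    tags = [is_spam for _sender, recipient, is_spam in log if recipient == account]
    return sum(tags) / len(tags) if tags else None


log = [
    ("s1", "victim", True),
    ("s2", "victim", True),
    ("friend", "victim", False),
    ("friend", "alice", False),
]
print(round(incoming_spam_fraction(log, "victim"), 3))  # 0.667
print(incoming_spam_fraction(log, "alice"))  # 0.0
print(incoming_spam_fraction(log, "nobody"))  # None
```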

20 Talk outline
- Preliminaries
- ErDOS: An Early Detection Scheme for Outgoing Spam
- Evaluation
- Conclusions and Future Work

21 Accuracy for single-day training
- Evaluate the accuracy attained using single-day logs
- Email accounts are labeled based on the tags of the contents-based detector
- True Positive (TP) and False Positive (FP) values are averaged over the 4 available days of bidirectional data

Detector   TP     FP
ErDOS      71     8.9
LY-knn     76.3   47.8
MailNET    22.6   44.2
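
The averaging described on this slide can be sketched as follows, assuming TP and FP denote rates: the share of labeled spammers the detector flags, and the share of legitimate accounts it wrongly flags. These are computed per day and then averaged. The per-day dictionary layout and function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def tp_fp_rates(predicted: Set[str], spammers: Set[str], accounts: Set[str]) -> Tuple[float, float]:
    """TP rate = flagged spammers / all spammers; FP rate = flagged legitimate / all legitimate."""
    legit = accounts - spammers
    tp_rate = len(predicted & spammers) / len(spammers) if spammers else 0.0
    fp_rate = len(predicted & legit) / len(legit) if legit else 0.0
    return tp_rate, fp_rate


def averaged_over_days(days: List[Dict[str, Set[str]]]) -> Tuple[float, float]:
    """Average the per-day rates; each day has 'predicted', 'spammers' and 'accounts' sets."""
    rates = [tp_fp_rates(d["predicted"], d["spammers"], d["accounts"]) for d in days]
    n = len(rates)
    return sum(r[0] for r in rates) / n, sum(r[1] for r in rates) / n


days = [
    {"accounts": {"a", "b", "c", "d"}, "spammers": {"a", "b"}, "predicted": {"a", "c"}},
    {"accounts": {"a", "b", "c", "d"}, "spammers": {"a"}, "predicted": {"a"}},
]
print(averaged_over_days(days))  # (0.75, 0.25)
```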

22 Early detection evaluation
- Early detection: spamming accounts detected before the contents-based detector flags them, i.e. accounts suspected by the detector that send messages tagged as spam only on later days
- The evaluation uses all 26 days of data
- Early-detection quality criteria:
  - e-Precision: the fraction of early-detected accounts in the suspects list
  - Enrichment Factor (EF): the ratio between the detector's e-Precision and that of a random accounts list
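
The two criteria can be written down directly. This sketch assumes that the e-Precision of a random accounts list is its expected value: the fraction of later spammers in the whole population. "Later spammers" are accounts that send content-tagged spam only on a later day. The function names and toy data are hypothetical.

```python
from typing import Iterable, Set


def e_precision(suspects: Iterable[str], later_spammers: Set[str]) -> float:
    """Fraction of the suspects list that are early detections."""
    suspects = list(suspects)
    hits = sum(1 for acct in suspects if acct in later_spammers)
    return hits / len(suspects)


def enrichment_factor(suspects: Iterable[str], later_spammers: Set[str], population_size: int) -> float:
    """Detector e-Precision divided by the expected e-Precision of a random accounts list."""
    random_e_precision = len(later_spammers) / population_size
    return e_precision(suspects, later_spammers) / random_e_precision


# Toy data: 10 suspects, 3 of which are later spammers;
# 20 later spammers in a population of 1000 accounts.
suspects = [f"u{i}" for i in range(10)]
later_spammers = {"u0", "u1", "u2"} | {f"v{i}" for i in range(17)}
print(e_precision(suspects, later_spammers))  # 0.3
print(f"{enrichment_factor(suspects, later_spammers, 1000):.1f}")  # 15.0
```

Applied to the figures on slide 23 (9 early detections in a list of 100 suspects, against a population-wide e-Precision of 0.0053), the same ratio gives 0.09 / 0.0053 ≈ 17, in line with the reported EF.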

23 Early detection
- Early detection results, averaged over 4 days:

                    ErDOS's suspects   Entire population
#accounts           100
Early detections    9                  0.53
e-Precision         0.09               0.0053

- Prior art's early-detection results compared to ErDOS:

              ErDOS   LY-knn   MailNET
e-Precision   0.09    0.012    0.025
EF            16.9    2.3      4.7

24 Early detection (cont'd)
- e-Precision for varying suspects-list lengths

25 Talk outline
- Preliminaries
- ErDOS: An Early Detection Scheme for Outgoing Spam
- Evaluation
- Conclusions and Future Work

26 Conclusions and Future Work
- Conclusions:
  - Outgoing spam detection for ESPs has its own unique characteristics
  - Contents-based filtering is not enough
  - Early detection of spamming accounts can be achieved by combining a contents-based filter with a network-level detector
- Future work:
  - Enhance ErDOS's early-detection performance with additional features
  - An expert detector for low-volume spammers, based on ErDOS's computation flow and features

