Spam: An Analysis of Spam Filters Joe Chiarella Jason O’Brien Advisors: Professor Wills and Professor Claypool.


2 Spam: An Analysis of Spam Filters Joe Chiarella Jason O’Brien Advisors: Professor Wills and Professor Claypool

3 Project Goals To analyze the effectiveness of different kinds of spam filters. Focused on SpamAssassin and Bogofilter.

4 SpamAssassin Rule-based filter – over 400 rules. Each rule has an associated weight. The score of an email is the sum of the weights of all matching rules. User-adjustable threshold.
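
The scoring scheme on this slide can be sketched in a few lines of Python. This is only an illustration of "sum the weights of matching rules, then compare against a threshold": the three rules, their weights, and the default threshold of 3.0 are invented for the example and are not taken from SpamAssassin's rule set.

```python
import re

# Hypothetical rules as (name, pattern, weight). They are made up for this
# sketch and are NOT SpamAssassin's real rules or weights.
RULES = [
    ("FREE_OFFER", re.compile(r"\bfree\b", re.IGNORECASE), 1.5),
    ("ALL_CAPS_SUBJECT", re.compile(r"^Subject: [A-Z !]+$", re.MULTILINE), 2.0),
    ("MENTIONS_MEETING", re.compile(r"\bmeeting\b", re.IGNORECASE), -1.0),
]


def score_email(text, rules=RULES):
    """Return the sum of the weights of every rule whose pattern matches."""
    return sum(weight for _, pattern, weight in rules if pattern.search(text))


def is_spam(text, threshold=3.0, rules=RULES):
    """Flag the email when its score reaches the user-adjustable threshold."""
    return score_email(text, rules) >= threshold
```

Raising `threshold` makes the filter more conservative (fewer ham emails flagged, more spam let through); lowering it does the opposite. Negative weights, as on the hypothetical `MENTIONS_MEETING` rule, let a rule count as evidence for ham.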

5 Bogofilter Bayesian filter. Calculates probability that an email is spam using past email. Looks at frequency of words (not order of words). Accuracy should improve over time.
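
To make the idea of a word-frequency Bayesian filter concrete, here is a minimal sketch using a plain naive Bayes calculation with add-one smoothing. It is a stand-in chosen for illustration, not Bogofilter's actual algorithm; the class name `WordFrequencyFilter`, its methods, and the whitespace tokenizer are all assumptions made for this example.

```python
import math
from collections import Counter


class WordFrequencyFilter:
    """Toy naive Bayes filter (hypothetical; not Bogofilter's implementation)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}  # word counts per class
        self.totals = {"spam": 0, "ham": 0}                  # emails seen per class

    def train(self, text, label):
        """Learn from one past email; label is "spam" or "ham"."""
        self.counts[label].update(text.lower().split())
        self.totals[label] += 1

    def spam_probability(self, text):
        """Estimate P(spam | words), ignoring word order."""
        vocab_size = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        n_emails = self.totals["spam"] + self.totals["ham"]
        log_score = {}
        for label in ("spam", "ham"):
            n_words = sum(self.counts[label].values())
            # Smoothed class prior, then one smoothed likelihood term per word.
            logp = math.log((self.totals[label] + 1) / (n_emails + 2))
            for word in text.lower().split():
                logp += math.log(
                    (self.counts[label][word] + 1) / (n_words + vocab_size + 1)
                )
            log_score[label] = logp
        # Normalize the two log scores into a probability for the spam class.
        return 1.0 / (1.0 + math.exp(log_score["ham"] - log_score["spam"]))
```

An untrained filter returns 0.5 for any text, and each call to `train` shifts the estimates, which is the sense in which accuracy is expected to improve as more email is seen. Because only word counts enter the calculation, reordering the words of an email does not change its score.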

6 Data Collection Email collected from students, professors, small business employees, and free email accounts. 4626 ham emails, 5010 spam emails, separated into ham and spam mailboxes for each user.

7 Methodology Compared accuracy of SpamAssassin and Bogofilter for each user’s email. Tested same number of ham emails and spam emails from each user. Ignored results from first 50 emails to allow Bogofilter to learn.
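
One way to picture the warm-up step is a test-then-train loop that discards the first 50 results. The slides do not describe the actual test harness, so everything below is an assumption: the function name `evaluate_with_warmup`, the `classify` and `train` callbacks, and the choice to train on every email immediately after classifying it.

```python
def evaluate_with_warmup(emails, classify, train, warmup=50):
    """Score a filter on a stream of emails, ignoring the first `warmup` results.

    emails   -- iterable of (text, is_spam) pairs for one user
    classify -- hypothetical callback: text -> True if predicted spam
    train    -- hypothetical callback: (text, is_spam) -> None
    """
    counts = {"ham_total": 0, "ham_correct": 0, "spam_total": 0, "spam_correct": 0}
    for index, (text, is_spam) in enumerate(emails):
        predicted = classify(text)   # predict before the label is revealed
        train(text, is_spam)         # then let the filter learn from this email
        if index < warmup:
            continue                 # warm-up period: result is not counted
        kind = "spam" if is_spam else "ham"
        counts[kind + "_total"] += 1
        if predicted == is_spam:
            counts[kind + "_correct"] += 1
    return counts
```

Keeping separate ham and spam tallies matches the way the results are reported, with one comparison for ham and one for spam. For a filter that does not learn, `train` can simply be a function that does nothing.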

8 Comparison of Bogofilter and SpamAssassin on Ham (chart). Legend: CP = Company Person, PR = Professor, ST = Student, FE = Free Email.

9 Comparison of Bogofilter and SpamAssassin on Spam (chart). Legend: CP = Company Person, PR = Professor, ST = Student, FE = Free Email.

10 SpamAssassin Score Analysis

11 Conclusion The effectiveness of Bogofilter and SpamAssassin depends greatly on the user. Neither filter outperformed the other in all cases. Filtering spam is hard.

12 Questions?

