Justin Mason, SpamAssassin Project & Deersoft

Filtering Spam With Justin Mason, SpamAssassin Project & Deersoft

What Is Spam?
Best description: "Unsolicited Bulk E-mail"
In human terms: bulk mail you didn't want, and didn't ask for
Mailing lists, newsletters, "latest offers": not spam, if you asked for them in the first place
Name courtesy of Monty Python: “spam, spam, spam and spam”

Why Bother Filtering Spam?
Seems to be about 30% to 60% of mail traffic, and increasing
Users are forced to waste time wading through their inbox
  costs their employers money
Impossible to unsubscribe
  “unsubscribe” addresses work only 37% of the time, according to the FTC
Legal retaliation not possible, yet
Just plain irritating!

Spam Volume Is Increasing
(data from Brightmail.com)

Filtering: Homebrew Blacklists
First round of "spam filters": internal blacklists, maintained by in-house admin staff
Match addresses, and delete those from known spammers
Later, match "bad words" (Viagra, porn)
Quite hard to configure; centralised; lots of work to keep up to date

Filtering: DNS Blacklists
Identify spam source computers by IP address
Allow mail system to look up a public database on the internet as mail arrives
Block the message, if its sender's address is blacklisted
Now at least 20 DNS blacklists, with varying reliability
Many false positives
  eircom.net's main mail server!
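The lookup described on this slide is conventionally done by reversing the octets of the sending IP address and querying that name under the blacklist's DNS zone. A minimal Python sketch follows, assuming IPv4 only; the zone `dnsbl.example.org` and the function names are hypothetical, and the resolver is passed in so the logic can be tried without network access.

```python
import ipaddress

def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a blacklist zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"

def is_listed(ip, zone, resolve):
    """Return True if `resolve` finds a record for this IP in the zone.

    `resolve` is any callable that takes a host name and raises OSError
    when the name does not exist, e.g. socket.gethostbyname.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except OSError:
        return False
```

With a real resolver this could be called as `is_listed("192.0.2.1", "dnsbl.example.org", socket.gethostbyname)`. Given the false positives noted above, the result is better treated as one signal among several than as a reason to block outright.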

SpamAssassin Concepts
Zero-configuration where possible
Lots of rules to determine if a mail is spam or not
"Fuzzy logic": rules are assigned scores, based on our confidence in their accuracy
These are combined to produce an overall score for each message
If over a user-defined threshold, the mail is judged as spam
No one rule, alone, can mark a mail as spam
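The scoring idea on this slide can be sketched in a few lines of Python. The rule names, patterns, scores and the threshold of 5.0 below are all made up for illustration and are not SpamAssassin's real rule set; the scores are simply summed, which is one straightforward way to "combine" them.

```python
import re

THRESHOLD = 5.0  # hypothetical user-defined threshold

# Hypothetical rules: (name, pattern, score). Every score is below THRESHOLD,
# so no single rule can mark a message as spam on its own.
RULES = [
    ("MENTIONS_VIAGRA", re.compile(r"viagra", re.I), 2.5),
    ("SHOUTING_SUBJECT", re.compile(r"^Subject: [A-Z !]{10,}$", re.M), 1.8),
    ("CLICK_HERE", re.compile(r"click here", re.I), 1.2),
    ("MONEY_BACK", re.compile(r"money[- ]back guarantee", re.I), 1.6),
]

def score_message(text):
    """Return (total score, names of the rules that fired)."""
    hits = [(name, score) for name, pattern, score in RULES if pattern.search(text)]
    return sum(score for _, score in hits), [name for name, _ in hits]

def is_spam(text, threshold=THRESHOLD):
    """A message is judged spam only if its combined score is over the threshold."""
    total, _ = score_message(text)
    return total > threshold
```

A message that trips only `CLICK_HERE` stays well under the threshold, while one that also shouts in the subject line and mentions Viagra goes over it.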

SpamAssassin Concepts, pt.2
Combines many systems for a "broad-spectrum" approach:
  Detect forged headers
  Spam-tool signatures in headers
  Text keyword scanner in the message body
  DNS blacklists
  Razor, DCC (Distributed Checksum Clearinghouse), Pyzor
Spammers cannot aim to defeat 1 system; the others will catch them out

Integration Into Mail Systems
Wrote SpamAssassin with flexibility of integration in mind
Many have been written:
  Integration into Mail Transfer Agents (sendmail, qmail, Exim, Postfix, Microsoft Exchange)
  Integration into virus-scanner MTA plug-ins (MIMEDefang, amavisd-new)
  IMAP/POP proxies and clients
  Commercial plug-ins for Windows clients (Eudora, MS Outlook)
  And many more I don't know about!

Accuracy and False Positives
The big issue with filtering to date:
  not just “how much spam does it catch?”
  but “how many legitimate mails get caught, too?”
Many systems do not pay attention to this problem
Some blacklists even use "false positives" as a weapon against service providers selling to spammers
FPs are much worse than spam getting through
  much more inconvenient to user

Evolving a Better Filter
SpamAssassin assigns scores using a genetic algorithm
Given a big collection of human-classified mail, determine what tests each mail triggers
Use this to "evolve" an efficient score set
Exactly the kind of problem a genetic algorithm is good at
Allows "shotgun" rules to be scored low, where they cannot do damage
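To make the "evolve a score set" idea concrete, here is a toy genetic algorithm in Python. It is a sketch under stated assumptions, not the project's actual scoring tool: the eight-message corpus, the four rules, the 10x penalty for false positives, the score cap of 4.9 and all GA parameters are invented for illustration.

```python
import random

THRESHOLD = 5.0
FP_WEIGHT = 10   # assumption: one false positive costs as much as ten missed spams
N_RULES = 4

# Toy hand-classified corpus: (indices of the rules a mail triggered, is_spam).
CORPUS = [
    ({0, 1, 2}, True), ({0, 2}, True), ({1, 2, 3}, True), ({0, 1}, True),
    ({3}, False), ({1, 3}, False), (set(), False), ({2, 3}, False),
]

def cost(scores):
    """Weighted error count of a score set on the corpus (lower is better)."""
    total = 0
    for hits, is_spam in CORPUS:
        flagged = sum(scores[i] for i in hits) > THRESHOLD
        if flagged and not is_spam:
            total += FP_WEIGHT
        elif is_spam and not flagged:
            total += 1
    return total

def evolve(generations=60, pop_size=30, seed=0):
    """Evolve a score set; returns the lowest-cost individual found."""
    rng = random.Random(seed)
    # Start from a neutral all-zero score set plus random candidates.
    population = [[0.0] * N_RULES]
    population += [[rng.uniform(0.0, 4.0) for _ in range(N_RULES)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=cost)
        survivors = population[: pop_size // 2]   # keep the better half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, N_RULES)       # one-point crossover
            child = a[:cut] + b[cut:]
            i = rng.randrange(N_RULES)            # mutate one score
            # Cap below THRESHOLD so no single rule can flag a mail alone.
            child[i] = min(4.9, max(0.0, child[i] + rng.gauss(0.0, 0.5)))
            children.append(child)
        population = survivors + children
    return min(population, key=cost)
```

Because the better half of each generation survives unchanged, the best cost never gets worse from one generation to the next. Rule 3 fires on most of the legitimate mail in the toy corpus, so a low score for it avoids false positives, which is the "shotgun rules scored low" effect the slide describes.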

False Positive Rate
SpamAssassin is 98.5% accurate on our test corpora, with default settings
  0.6% false positives
  91% of all spam caught correctly
  with network tests on, spam hit-rate probably increases to about 93-95%
Highest rate available among present tools
Tunable by the user -- reduce FPs by increasing the threshold, ditto vice-versa
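The trade-off in the last bullet can be illustrated with a small Python sketch. The scored messages below are hypothetical stand-ins for a test corpus and have nothing to do with the figures quoted on the slide.

```python
# Hypothetical (score, is_spam) pairs standing in for a scored test corpus.
SCORED = [
    (0.3, False), (1.1, False), (2.4, False), (4.2, False), (5.6, False),
    (3.8, True), (5.1, True), (6.9, True), (9.4, True), (14.0, True),
]

def rates(threshold):
    """Return (false-positive rate, spam hit rate) at the given threshold."""
    ham_scores = [s for s, is_spam in SCORED if not is_spam]
    spam_scores = [s for s, is_spam in SCORED if is_spam]
    fp_rate = sum(s > threshold for s in ham_scores) / len(ham_scores)
    hit_rate = sum(s > threshold for s in spam_scores) / len(spam_scores)
    return fp_rate, hit_rate

for t in (3.0, 5.0, 7.0):
    fp_rate, hit_rate = rates(t)
    print(f"threshold {t}: false positives {fp_rate:.0%}, spam caught {hit_rate:.0%}")
```

On this toy data, raising the threshold removes false positives but also lets more spam through, so the right setting depends on how costly each kind of error is to the user.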

Effect of the Threshold Setting

What To Do When You've Caught It
Since classifiers are imperfect, blind deletion is bad
Better to mark the mails, and allow user to check over them infrequently
Also good to mark for legal reasons
  In the UK, it may be illegal to hold mail (even spam) for more than 3 days
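Marking rather than deleting can be as simple as adding headers that the user's mail client can filter on. The Python sketch below uses the standard `email` package; the header names follow the `X-Spam-*` convention, but the exact header text, the `[SPAM]` subject tag and the function name are illustrative assumptions rather than SpamAssassin's actual output.

```python
from email import message_from_string

def tag_message(raw, score, threshold=5.0):
    """Return the message with spam markers added; nothing is ever discarded."""
    msg = message_from_string(raw)
    is_spam = score > threshold
    # Drop any markers already present, so a sender cannot pre-set them.
    del msg["X-Spam-Flag"]
    del msg["X-Spam-Status"]
    verdict = "Yes" if is_spam else "No"
    msg["X-Spam-Status"] = f"{verdict}, score={score:.1f} required={threshold:.1f}"
    if is_spam:
        msg["X-Spam-Flag"] = "YES"
        subject = msg.get("Subject", "")
        del msg["Subject"]
        msg["Subject"] = f"[SPAM] {subject}"  # hypothetical tag for easy sorting
    return msg.as_string()
```

The message still reaches the user, who can file tagged mail into a folder and glance over it occasionally for mistakes.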

Features For Large-Scale Use: "spamd"
Client-server interface to SpamAssassin
Pre-loads, so much faster for high volumes
Can load user preferences from an SQL database
Can load-balance -- uses TCP/IP
Deployed at several large organisations and ISPs:
  The Well, Salon.com, Panix, Transmeta, SourceForge, Stanford
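As a rough picture of the client-server exchange, the Python sketch below builds a request and parses a reply without touching the network. The wire format shown (a `CHECK SPAMC/1.2` request line, a `Content-length` header, and a `Spam: True ; score / threshold` reply line) is written from memory of the spamd protocol and should be checked against the protocol documentation shipped with SpamAssassin; the function names are hypothetical.

```python
import re

def build_check_request(message):
    """Frame a raw message (bytes) as a CHECK request for the daemon."""
    head = f"CHECK SPAMC/1.2\r\nContent-length: {len(message)}\r\n\r\n"
    return head.encode("ascii") + message

# Assumed reply line format: "Spam: True ; 15.0 / 5.0"
_SPAM_LINE = re.compile(
    r"^Spam:\s*(True|False)\s*;\s*(-?[\d.]+)\s*/\s*(-?[\d.]+)", re.M | re.I
)

def parse_check_response(reply):
    """Return (is_spam, score, threshold) from the daemon's reply bytes."""
    text = reply.decode("ascii", errors="replace")
    match = _SPAM_LINE.search(text)
    if match is None:
        raise ValueError("no Spam: line found in reply")
    return match.group(1).lower() == "true", float(match.group(2)), float(match.group(3))
```

In use, the request bytes would be written to a TCP connection to the daemon and the reply read back. Because the client only needs a host and port, several daemons can sit behind a load balancer, which is the point of the TCP/IP bullet above.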

Large-Scale Filtering For Your Network
Different from filtering for yourself
  Many users get little spam
  Should use conservative settings
Better to use “opt-out by default”
  notify that spam filtering is available, and ask them if they want it

How Can Network Administrators Fight Spam?
Scan for Open Relays & Proxies on your network
Block proxy ports at the firewall
Audit web servers for “FormMail” or other insecure web-to-mail scripts
Spam traps reporting to network blacklists: Razor, DCC, Pyzor
Run SpamAssassin, or SpamAssassin Pro!

How Do The Spammers Feel?
Already hurting, according to CBS:
  “[I’ve gone through] unbelievable hardships [to keep spamming] ... My operating costs have gone up 1,000% this year, just so I can figure out how to get around all these filters”
Spam relies on low overheads and extremely cheap delivery
Disrupt the equation and they will give up!

Future Directions
Learning filters (Bayesian probability etc.)
  Learn automatically, to detect what "good" mail to your network looks like
"Hash-cash"
  Sending mail currently more-or-less free
  With hash-cash, each recipient requires CPU time for the sender
  SpamAssassin can provide "bonus points" for hash-cash users
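The hash-cash idea is a proof of work: the sender must find a stamp, bound to the recipient, whose hash starts with a given number of zero bits. The Python sketch below shows the principle only; the `recipient:counter` stamp layout and the 12-bit difficulty are simplifying assumptions, not the real hash-cash stamp format.

```python
import hashlib
from itertools import count

def leading_zero_bits(digest):
    """Number of zero bits at the start of a digest (bytes)."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def mint(recipient, bits=12):
    """Search for a stamp whose SHA-1 hash has at least `bits` leading zero bits.

    Expected work doubles with every extra bit, and the work must be
    repeated for each recipient.
    """
    for counter in count():
        stamp = f"{recipient}:{counter}"
        if leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits:
            return stamp

def verify(stamp, recipient, bits=12):
    """Checking costs a single hash, however long minting took."""
    if not stamp.startswith(recipient + ":"):
        return False
    return leading_zero_bits(hashlib.sha1(stamp.encode()).digest()) >= bits
```

The asymmetry is the point: a legitimate sender pays a small CPU cost per recipient, while a spammer mailing millions of addresses pays it millions of times. A filter could then lower the score of mail carrying a valid stamp, which is one reading of the "bonus points" bullet.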

Fin
http://spamassassin.org/
  SpamAssassin for UNIX (free software)
http://www.deersoft.com/
  SpamAssassin Pro: MS Outlook, Exchange (commercial version) (my employers!)
