Fighting spam: the thin grey line Alun Jones, firstname.lastname@example.org
Constraints at Aber
● The recipient hates spam and wants us to block it all.
● The recipient hates incorrectly blocked messages, and wants us never to do it.
● The recipient must have the choice whether to receive suspected spam.
● Suspected spam must not be dropped silently.
Implications
● We need effective filters.
● We need a method which allows the recipients to register their filtering choices.
● We must accept mail at SMTP time whether or not we suspect it to be spam.
● If the recipient has opted to block spam, we must do something with it.
Filtering preferences
● Web page that users can use to:
– choose which filters to use
– choose what to do with detected spam (allow, block, flag, refile)
How we cheat...
● All that stuff is strictly within the constraints. It's not 100% effective and requires a lot of maintenance.
● We could do a lot better at SMTP time and never actually block any legitimate mail.
Cheat 1
● We're now quite strict in what we accept. We reject:
– Mail claiming to be from aber.ac.uk, but not from an existing Aber address.
– Mail with too many nonexistent recipient addresses.
Cheat 2
● “Teergrube”, or tarpit. We put artificial delays onto SMTP responses when:
– The message comes from a DNS blacklisted site.
– The message comes from an IP address which doesn't have an rDNS entry.
– The mail has lots of recipients.
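The three tarpit triggers above could be combined roughly as follows. This is only an illustrative sketch: the function name `tarpit_delay`, the 10-second delay and the 20-recipient threshold are assumptions made for the example, not values from the talk, and the DNS blacklist and rDNS lookups are assumed to have been done already.

```python
def tarpit_delay(dnsbl_listed, has_rdns, recipient_count,
                 delay_seconds=10, max_recipients=20):
    """Return how many seconds to stall before sending an SMTP response.

    delay_seconds and max_recipients are hypothetical defaults; the slides
    do not say what values are used in practice.
    """
    triggered = (
        dnsbl_listed                         # sender is on a DNS blacklist
        or not has_rdns                      # sender IP has no rDNS entry
        or recipient_count > max_recipients  # "lots of recipients"
    )
    return delay_seconds if triggered else 0
```

A well-behaved sender with rDNS and a single recipient gets no delay; any one of the three conditions is enough to slow the conversation down.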
Cheat 3 – the one that works: Greylisting
● Advantages:
– Never blocks mail completely.
– Almost no processing overhead.
– Blocks 95% of spam at SMTP time.
● Disadvantages:
– Causes delivery delays.
– Config problems at the other end can interact badly with the system.
So how does it work?
● SMTP is robust – temporary problems can be handled within the protocol.
● Spammers must get mail through quickly and they use forgery to hide their identity.
● Spammers almost never use a full-featured mail system to send their messages.
● When a new mail comes in, for each recipient:
– Take a hash of sender+recipient.
– Look it up in a database.
– If not present
  ● fake a temporary problem for that recipient and store the hash and the time in the database.
– Else if hash was stored < 1 hour ago
  ● fake the same temporary problem for that recipient.
– Else
  ● accept the message for that recipient.
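The per-recipient decision above can be sketched in a few lines. This is a minimal illustration, not the Perl module used at Aber: a plain dict stands in for the MySQL table, SHA-256 is an arbitrary choice of hash, and the names `pair_key` and `greylist_decision` are invented for the example. Only the one-hour threshold comes from the slide.

```python
import hashlib

RETRY_DELAY_SECONDS = 3600  # the "1 hour" threshold from the slide


def pair_key(sender, recipient):
    """Hash of sender+recipient (SHA-256 is an assumption, any hash would do)."""
    data = f"{sender}\n{recipient}".encode("utf-8")
    return hashlib.sha256(data).hexdigest()


def greylist_decision(store, sender, recipient, now):
    """Return "defer" or "accept" for one sender/recipient pair.

    store maps pair hash -> time first seen (seconds); a dict stands in
    for the database here.
    """
    key = pair_key(sender, recipient)
    first_seen = store.get(key)
    if first_seen is None:
        store[key] = now  # remember when we first saw this pair
        return "defer"    # fake a temporary problem
    if now - first_seen < RETRY_DELAY_SECONDS:
        return "defer"    # retried too soon, same temporary problem
    return "accept"


# Times as seconds since midnight; the addresses are placeholders.
store = {}
nine = 9 * 3600
first = greylist_decision(store, "fred@sender.example", "user@rcpt.example", nine)
second = greylist_decision(store, "fred@sender.example", "user@rcpt.example", nine + 20 * 60)
third = greylist_decision(store, "fred@sender.example", "user@rcpt.example", nine + 80 * 60)
```

With attempts at 09:00, 09:20 and 10:20, `first` and `second` are deferrals and `third` is an acceptance. A "defer" would be reported to the sending server as a temporary (4xx) SMTP failure, which a full-featured mail system retries on its own.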
Example: Legit mail
email@example.com mails firstname.lastname@example.org for the first time at 09:00.
09:00 email@example.com => firstname.lastname@example.org - Not in database, fake a temporary error and add to database. Remote server tries again automatically.
09:20 email@example.com => firstname.lastname@example.org - In database but retry was too soon, fake a temporary error. Remote server tries again automatically.
10:20 email@example.com => firstname.lastname@example.org - In database, retry OK - accept message, albeit late.
All subsequent messages from Fred to this recipient are accepted without delay.
Example: Spamming software
Spammer tries to mail email@example.com using forgery and dedicated spamming software:
09:00 firstname.lastname@example.org => email@example.com - Probably not in database, fake a temporary error. Spam software probably gives up trying.
Or hits us later with a different forged address:
10:20 firstname.lastname@example.org => email@example.com - Probably not in database, fake a temporary error.
If the spam software doesn't implement retries, it never gets the messages through.
Implementation
● Exim MTA software talking via Unix domain socket to
● Perl daemon which uses a
● Perl module to make deferral decisions using hashes stored in a
● MySQL database
Results
Week 21st - 28th March
Total sender/recipient pairs tried: 519,221
Total delivered: 204,096
Total delivered without delay: 165,702 (81%)
Total delivered within 2 hours: 93%
Uncompleted: 315,125
Complaints received about undelivered mail: 0
Assumed spam: 61% of all mail attempted.
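The percentages follow directly from the three raw counts. The short check below recomputes them; the variable names are illustrative, and it assumes the 81% is measured against delivered mail and the 61% against all attempted pairs, which is what the figures work out to.

```python
attempted = 519_221  # total sender/recipient pairs tried
delivered = 204_096  # total delivered
undelayed = 165_702  # delivered without delay

uncompleted = attempted - delivered                      # never retried successfully
undelayed_pct = round(100 * undelayed / delivered)       # share of delivered mail not delayed
assumed_spam_pct = round(100 * uncompleted / attempted)  # share of attempts never completed

print(uncompleted, undelayed_pct, assumed_spam_pct)  # 315125 81 61
```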