Monday, May 19, 2008


How not to fight spam

There’s been a discussion on the IETF working group chairs mailing list about some of the criteria that are used to look for spam sent to IETF mailing lists. In particular, the IT department is using these four rules:

  1. Check that the HELO hostname has an A or MX record, defer if not.
  2. Check that the client IP address maps to a name; defer if not.
  3. Check that that name maps back to an IP address; defer if not.
  4. Check that the name-to-address mapping matches the client IP; defer if not (that is, check that steps 2 and 3 give consistent results).
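
For readers who want to see exactly what the four rules test, here is a minimal Python sketch. It is not the IETF’s actual configuration, and all function and parameter names are hypothetical. Lookups go through the standard `socket` module, which only covers IPv4 A records and PTR records. The standard library has no MX lookup, so `no_mx_lookup` is a placeholder (an assumption) that you would need to replace with a real DNS library.

```python
import socket
from typing import Callable, List, Optional


def a_records(name: str) -> List[str]:
    """IPv4 addresses for a name via the system resolver; [] if none."""
    try:
        return socket.gethostbyname_ex(name)[2]
    except OSError:  # socket.gaierror and socket.herror subclass OSError
        return []


def ptr_name(ip: str) -> Optional[str]:
    """Reverse (PTR) name for an address via the system resolver; None if none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def no_mx_lookup(name: str) -> bool:
    # Placeholder (assumption): the standard library cannot query MX records,
    # so this always reports "no MX". Swap in a real DNS library here.
    return False


def screen_connection(
    helo: str,
    client_ip: str,
    forward: Callable[[str], List[str]] = a_records,
    reverse: Callable[[str], Optional[str]] = ptr_name,
    has_mx: Callable[[str], bool] = no_mx_lookup,
) -> str:
    """Apply the four rules in order; return 'accept' or a 'defer: ...' reason."""
    # Rule 1: the HELO hostname must have an A or MX record.
    if not forward(helo) and not has_mx(helo):
        return "defer: HELO name has no A or MX record"
    # Rule 2: the client IP address must map to a name.
    name = reverse(client_ip)
    if name is None:
        return "defer: client IP has no reverse name"
    # Rule 3: that name must map back to at least one address.
    addresses = forward(name)
    if not addresses:
        return "defer: reverse name does not resolve"
    # Rule 4: the forward lookup must include the client IP (steps 2 and 3 agree).
    if client_ip not in addresses:
        return "defer: forward and reverse lookups disagree"
    return "accept"
```

The lookups are passed in as parameters so the logic can be exercised against a small in-memory table instead of live DNS.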

These rules are currently blocking some legitimate posts to the mailing lists, and the discussion is about whether the rules are reasonable. I’ve sent the comments below as my contribution to that discussion, and I’m posting them here too, because they lead up to a significant point about how we fight spam.

The HELO command in an SMTP email session serves only to open the session and to identify the sending host for the log file. The domain given there is used for nothing: nothing at all. It’s tempting to think it’s reasonable at least to make sure it’s a valid domain to send mail from (the purpose of test 1, above), but the check is bypassed by putting any domain that exists in DNS there. I can send email using HELO whitehouse.gov, and the receiving mail server will be happy.
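
From the client side, the HELO name is simply whatever string the sending software is configured to announce. In Python’s standard `smtplib`, for example, it is the `local_hostname` argument. This is a minimal illustration. `mail.example.net` is a hypothetical receiving server, and the code never actually connects to it.

```python
import smtplib

# With no host argument, the constructor does not open a connection.
client = smtplib.SMTP(local_hostname="whitehouse.gov")

# Whatever string we passed is what helo()/ehlo() will announce.
print(client.local_hostname)  # whitehouse.gov

# After client.connect("mail.example.net", 25), calling client.helo()
# would send "whitehouse.gov" as the HELO argument.
client.close()
```

The receiving server sees only that string. Rule 1 is satisfied by any name that has an A or MX record, no matter who actually owns it.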

There’ve been proposals to validate the sending server’s authority to use that domain (CSV, for example), and to actually use the domain for some purpose, such as checking it against a whitelist or blacklist. But unless it’s both validated and used, there’s no value in doing the DNS check.

When a legitimate user at example.com sends email to the IETF, it will usually be a dedicated email server that makes the connection and transmits the mail. Let’s say that server is email1.example.com, and that its Internet address (IP address) is 2.4.68.1. The IETF mail server will see that the connection came from 2.4.68.1, and step 2 will resolve that to email1.example.com. Step 3 will then resolve the name back to the IP address, and step 4 will confirm that it got that same 2.4.68.1 back. That’s fine.

But if mail is sent illegitimately, by some user within example.com, bypassing the normal mail servers — common if spam is sent by an infected zombie computer — it’s possible that one or both of the DNS resolutions in 2 and 3 will fail.

That’s the theory, anyway. In practice, the transient IP addresses that ISPs assign to users’ computers often do resolve in DNS, because many ISPs publish generic names for their whole address pools, so zombies pass steps 2 through 4 anyway. Meanwhile, small domains often don’t list all their computers in DNS (they don’t expect anyone to want or need to resolve them), so their legitimate servers fail the checks, wind up as false positives, and get their mail blocked.
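
A toy model makes both failure modes concrete. The zone data below is entirely hypothetical. The `.example` names are reserved for illustration, and only email1.example.com and 2.4.68.1 come from the scenario above. The function `fcrdns_ok` covers steps 2 through 4: the round trip from address to name and back.

```python
# Hypothetical zone data, for illustration only.
A_RECORDS = {
    "email1.example.com": ["2.4.68.1"],        # legitimate, well-configured server
    "dsl-2-4-99-7.isp.example": ["2.4.99.7"],  # generic ISP name on a zombie's address
    "mail.small.example": ["2.4.77.5"],        # small domain: forward record only
}
PTR_RECORDS = {
    "2.4.68.1": "email1.example.com",
    "2.4.99.7": "dsl-2-4-99-7.isp.example",
    # No PTR for 2.4.77.5: the small domain never published one.
}


def fcrdns_ok(ip: str) -> bool:
    """Steps 2-4: reverse-resolve the IP, forward-resolve the name, compare."""
    name = PTR_RECORDS.get(ip)
    return name is not None and ip in A_RECORDS.get(name, [])


for label, ip in [
    ("legitimate server", "2.4.68.1"),
    ("zombie on ISP pool", "2.4.99.7"),
    ("small-domain server", "2.4.77.5"),
]:
    print(f"{label}: {'accept' if fcrdns_ok(ip) else 'defer'}")
```

This prints accept, accept, defer. The zombie sails through, and the legitimate small-domain server is the one that gets deferred.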

And step 4 just seems odd. If the DNS resolutions work in 2 and 3, a failure in 4 would most likely indicate a configuration error, rather than a “bad guy”. It seems strange to use a check like that to block mail. Can that really be catching any spam?

All of these checks seem wrong to me as currently implemented. And in general, I think it’s a bad idea to use anti-spam techniques aimed at current spammer tactics, which the spammers can easily change once they notice what we’re doing. It’s rather like rejecting mail that contains certain keywords: the spammers just start misspelling the words. We long ago learned that that’s a pointless skirmish, and that we need to fight more strategic battles.
