Wednesday, December 24, 2008


Having trouble viewing this blog?

I’ve been getting a great deal of spam lately that has a particular characteristic: the body of the message has nothing more than an image with alternative text that says, “Having trouble viewing this email? Click here to view as a webpage.” The image (and the alt-text) is a clickable link to the web page they want you to go to. The spammers have clearly clued in to the fact that many email readers now refuse to automatically show images from unknown sources.
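To make that message shape concrete, here is a minimal sketch of how one might detect an HTML body that is nothing but a linked image. It uses Python's standard `html.parser`; the class name, function name, and `max_text_chars` parameter are my own inventions for illustration, not anything from a particular spam filter.

```python
from html.parser import HTMLParser


class _BodyProfile(HTMLParser):
    """Counts images that sit inside links, and visible text characters."""

    def __init__(self):
        super().__init__()
        self.link_depth = 0      # how many <a> elements are currently open
        self.linked_images = 0   # <img> tags seen while inside an <a>
        self.text_chars = 0      # non-whitespace-trimmed text outside of tags

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        elif tag == "img" and self.link_depth > 0:
            self.linked_images += 1

    def handle_endtag(self, tag):
        if tag == "a" and self.link_depth > 0:
            self.link_depth -= 1

    def handle_data(self, data):
        # Alt text lives in an attribute, so it is not counted here.
        self.text_chars += len(data.strip())


def is_image_only_link(html, max_text_chars=0):
    """True if the body has a clickable image and (almost) no visible text."""
    profile = _BodyProfile()
    profile.feed(html)
    profile.close()
    return profile.linked_images > 0 and profile.text_chars <= max_text_chars
```

A check like this would only ever be one signal among several, since plenty of legitimate newsletters are also mostly images.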

Spam like that — essentially image only — is more difficult to filter than text is, but we do have technology for it. Here’s an example of one of these images from recent email (click to see it full-sized):

Image from a spam message

Note a few things about it:

  1. While I don’t have other images to compare this one to, there doesn’t appear to have been any attempt to use background variations or funny colour mixtures in order to make this image different from those in other spam messages.
  2. The image contains easy-to-read text. Nothing’s been distorted.
  3. The message is straightforward, but short; there isn’t a lot of text to work with.
  4. What text there is would be considered very spam-like if it were plain text being analyzed by a typical spam filter.

The first level of technology we have is similarity checking: how similar is this image to images that appear in known spam messages? In this case, it’s likely that the image is identical, or nearly so, to known spam, so it will have been caught by that.
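One common way to do this kind of similarity check is a perceptual hash. The sketch below uses a simple "average hash" with a Hamming-distance threshold; that choice is an assumption for illustration, not a description of what any particular filter does. It also assumes each image has already been decoded and shrunk to the same small grid of grayscale values (0 to 255), and the function names and `max_distance` default are hypothetical.

```python
def average_hash(pixels):
    """Return one bit per pixel: 1 where the pixel is brighter than the mean.

    `pixels` is a list of rows of grayscale values, assumed to be already
    downscaled to a small fixed size (for example 8x8).
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if value > mean else 0 for value in flat)


def hamming_distance(hash_a, hash_b):
    """Count the positions at which two equal-length hashes differ."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must be the same length")
    return sum(a != b for a, b in zip(hash_a, hash_b))


def is_similar_to_known_spam(pixels, known_hashes, max_distance=2):
    """True if the image's hash is within max_distance bits of a known one."""
    candidate = average_hash(pixels)
    return any(
        hamming_distance(candidate, known) <= max_distance
        for known in known_hashes
    )
```

Because the hash depends only on which pixels are brighter than average, small changes in compression or brightness leave it nearly unchanged, which is what lets a "nearly identical" image still match.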

The next step is to do character recognition within the image. We can easily pick out clean, clear, non-obfuscated text from images, and then treat it as plain text (we can also do it with some degree of obfuscation, though we’re not as accurate with that). This image should certainly have been no problem there, and it will have been caught by this technique as well.
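The "recognize, then treat as plain text" pipeline can be sketched as below. The OCR engine is deliberately left as a callable you pass in, because the post does not name one. The phrase list, the threshold, and the function names are all made up for illustration; a real filter would feed the recognized text to its normal text classifier rather than a fixed list.

```python
import re

# Hypothetical phrase list, for illustration only.
SPAMMY_PHRASES = ("click here", "act now", "no prescription", "lowest price")


def score_text(text):
    """Count how many of the illustrative spam phrases occur in the text."""
    # Lower-case and collapse whitespace so line breaks in OCR output
    # do not hide a phrase.
    normalized = re.sub(r"\s+", " ", text.lower())
    return sum(phrase in normalized for phrase in SPAMMY_PHRASES)


def image_text_looks_spammy(image_bytes, ocr, threshold=2):
    """Run OCR on an image, then score the result as ordinary text.

    `ocr` is any callable mapping image bytes to a string; which OCR
    engine sits behind it is an assumption left to the caller.
    """
    return score_text(ocr(image_bytes)) >= threshold
```

Keeping the OCR step separate from the scoring step means that text pulled out of an image goes through exactly the same analysis as text that arrived as text.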

Finally, there’s image analysis that looks for certain characteristics, such as shapes, colours, element edges, and the like. The success with this sort of thing is a bit hit-and-miss. It’s the sort of thing that might be used to guess that there are naked bodies, and classify an image as pornography... though it will often get that wrong. It probably could have detected the MasterCard logo, but it wouldn’t be reasonable to declare a message as spam just because of that.

What’s more, a great deal of this spam is sent as purportedly coming from me: from my own email address. That, too, will often trip the spam filters, and likely did (all of these messages were, indeed, classified as spam). One wonders why spammers persist in doing this. While it’s certainly reasonable, and sometimes common, for someone to send email to herself, such mail is not likely to come from arbitrary places on the Internet.
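A rough sketch of that heuristic, using Python's standard `email` package: flag a message whose From address matches one of its recipients unless it arrived from somewhere trusted. How "trusted" gets decided (authenticated submission, a known relay, and so on) is outside this sketch, so it is passed in as a flag; the function names and that flag are assumptions for illustration.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def claims_to_be_from_recipient(msg):
    """True if the From address also appears among the To/Cc addresses."""
    sender = parseaddr(msg.get("From", ""))[1].lower()
    recipient_headers = msg.get_all("To", []) + msg.get_all("Cc", [])
    recipients = {addr.lower() for _, addr in getaddresses(recipient_headers)}
    return bool(sender) and sender in recipients


def suspicious_self_mail(msg, arrived_from_trusted_host):
    """Self-addressed mail is only suspicious when its origin is untrusted."""
    return claims_to_be_from_recipient(msg) and not arrived_from_trusted_host


# Example: a message that claims to come from its own recipient.
example = message_from_string(
    "From: Me <me@example.com>\nTo: me@example.com\nSubject: Re: Message\n\nbody"
)
flagged = suspicious_self_mail(example, arrived_from_trusted_host=False)
```

The trusted-origin flag is what keeps genuine notes-to-self from being penalized.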

Of course, they also use the usual spammer trick of various innocuous sounding subject lines, often with the “Re:” prefix to try to make the recipient think it’s actually a reply to an earlier communication. Surely, no one is fooled by that any more. Here are some of the subjects I’ve recently seen for these types of messages:

Re: Message
Re: Order status
Re: Your inquiry
Delivery Status Notification (failure)

That last, of course, is meant to make me think this is a “bounce” message, and that some important message I sent to someone never got delivered... so I’d better look to see what it was.
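Both subject-line tricks leave traces in the headers that can be checked. A genuine reply normally carries an In-Reply-To or References header, and a standards-conforming delivery status notification has the content type multipart/report. The sketch below tests for those; the function names are hypothetical, and since not every real mail client or mail server follows these conventions, the results are hints, not verdicts.

```python
import re
from email import message_from_string

# Matches a subject that starts with "Re:" in any letter case.
REPLY_PREFIX = re.compile(r"^\s*re\s*:", re.IGNORECASE)


def looks_like_fake_reply(msg):
    """Subject says "Re:" but no threading headers back the claim up."""
    subject = msg.get("Subject", "")
    has_thread_headers = bool(msg.get("In-Reply-To") or msg.get("References"))
    return bool(REPLY_PREFIX.match(subject)) and not has_thread_headers


def looks_like_fake_bounce(msg):
    """Subject claims a delivery status notification, but the message
    is not structured as a multipart/report."""
    subject = msg.get("Subject", "").lower()
    claims_dsn = "delivery status notification" in subject
    return claims_dsn and msg.get_content_type() != "multipart/report"


# Example: a "bounce" whose body is really just HTML.
example = message_from_string(
    "Subject: Delivery Status Notification (failure)\n"
    "Content-Type: text/html\n\n<img>"
)
flagged = looks_like_fake_bounce(example)
```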

What’s good is that we’re pretty much catching all of these. What’s bad is that the spammers just keep sending more and more and more of them.

1 comment:

Frisky070802 said...

Yeah, a few months ago my gmail address leaked out and started getting spam, and the last few weeks almost all this spam has been from "me". Very easy to ignore it, and I do wonder why they even try it. Certainly gmail does a good job of classifying these ...