Image SPAM and the future of the anti-SPAM battle

We were doing so well!

Most anti-SPAM solutions are getting pretty good at categorizing SPAM by doing statistical analysis on the text found in the email. Run a message through a Bayesian filter, do a few regular expression checks, and you can be relatively sure whether a message is SPAM or not.
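To make the "statistical analysis" idea concrete, here is a minimal sketch of a naive Bayes text filter. It is an illustration only, not how any particular anti-SPAM product works: the class name, the tokenizer, and the tiny training messages are all made up for this example, and a real filter would train on thousands of messages.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer: lowercase words, digits, "$" and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9$']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class NaiveBayesFilter:
    """Toy naive Bayes classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label must be "spam" or "ham"
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        # Start from the (smoothed) prior odds of spam vs. ham.
        log_odds = math.log((self.docs["spam"] + 1) / (self.docs["ham"] + 1))
        tokens = tokenize(text)
        for label, sign in (("spam", 1), ("ham", -1)):
            total = sum(self.counts[label].values())
            for tok in tokens:
                # The "+ 1" in the denominator reserves mass for unseen tokens.
                likelihood = (self.counts[label][tok] + 1) / (total + vocab + 1)
                log_odds += sign * math.log(likelihood)
        # Clamp so math.exp cannot overflow on very long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1 / (1 + math.exp(-log_odds))


f = NaiveBayesFilter()
f.train("buy cheap stock now", "spam")
f.train("hot stock tip buy now", "spam")
f.train("meeting notes attached", "ham")
f.train("lunch at noon tomorrow", "ham")
```

With this toy training data, `f.spam_probability("buy this stock now")` comes out above 0.5 and `f.spam_probability("lunch meeting tomorrow")` below it. Note that the filter only ever sees text, which is exactly the assumption that image SPAM breaks.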

So what happens when the “message” in the SPAM message isn’t text but an image instead? The spammers have come to the conclusion that text-based SPAM isn’t working anymore, so now they have moved on to a new strategy: image SPAM. I’m sure you’ve all seen it. A typical image SPAM email contains a small image with fuzzy text that promotes a “pump & dump” stock scheme. Beneath the image are usually several paragraphs of random text, which are meant to fool anti-SPAM software into allowing the message to pass as a legitimate email.

Image SPAM emails pose a really interesting problem for anti-SPAM software because they almost force us into using Optical Character Recognition (OCR) technology to try to suck the text out of the image and determine if it is “SPAMMY”. The images are intentionally made fuzzy and obscure in order to make life more difficult for OCR software. The real problem is that OCR is processor-intensive, and an onslaught of image SPAM can potentially bring a mail server to its knees.

Short Term Solutions

So what is the short term solution? By all means, use OCR to try to catch the offending images. However, the best case scenario would be if we could stop relying so heavily on the content of the email to determine a message’s SPAM-i-ness. Currently, that means we’re left with RBLs (Real-time Blackhole Lists) that track known SPAM-sending IP addresses, HELO checks, greylisting, checks for forged message headers, and Sender Policy Framework (SPF) checks. Essentially, we need to make the most efficient use of all the tools available to us which don’t involve analyzing the actual body content of the message.
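Two of these content-independent checks are easy to sketch. The example below is a rough illustration, not a production implementation: the zone name `dnsbl.example.org`, the function names, and the 300-second delay are all assumptions made for this example. The RBL part follows the usual DNS blocklist convention of reversing the IPv4 octets and looking the result up under the list's zone; the greylisting part temporarily rejects the first delivery attempt from an unknown (client IP, sender, recipient) triplet and accepts a retry after a minimum delay.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up, e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    addr = ipaddress.ip_address(ip)
    if addr.version != 4:
        raise ValueError("this sketch handles IPv4 only")
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """True if the blocklist zone returns an address record for this IP.

    Any lookup failure (including timeouts) is treated as "not listed",
    which is a simplification a real mail server may not want.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
    except OSError:
        return False
    return True


class Greylist:
    """Accept a triplet only once it has been retried after min_delay seconds."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay
        self.first_seen = {}  # triplet -> timestamp of first attempt

    def should_accept(self, client_ip, sender, recipient, now):
        key = (client_ip, sender, recipient)
        first = self.first_seen.setdefault(key, now)
        return now - first >= self.min_delay
```

A caller would pass `time.time()` as `now` and answer with a temporary SMTP failure whenever `should_accept` returns `False`. Neither check looks at the message body, so neither cares whether the payload is text or an image.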

Long Term Solutions

In the long term it seems clear that we need some sort of universal system for establishing the trustworthiness of email senders. Not only that, but the solution is likely going to need to come from the open source community and gain endorsement from a group of major technology vendors in order to meet industry-wide acceptance. A solution along the lines of Microsoft’s Sender ID is needed, but it’s not likely that anything from Microsoft is going to gain acceptance from the open source community.

One thing is clear: Every time the anti-SPAM camp makes progress, the spammers regroup and come back with refreshed vigor. The problem of SPAM isn’t going to go away anytime soon, and it is likely to get worse before it gets better.

2 thoughts on “Image SPAM and the future of the anti-SPAM battle”

  1. Image spam isn’t all that difficult to deal with; the open source community already has a solution. The combination of gocr and fuzzyocr integrated into your spam scanner will do a fairly effective job of identifying image spam. Some researchers have made headway with identifying the rather repetitive MIME structure of the pump-and-dump messages as well, since the spammers aren’t yet varying the structure of the message, just the content.

    I did want to take exception to your comment about the open source community not accepting a solution put forth by Microsoft. Of course they will, so long as Microsoft releases it so the world can take advantage of it. To date Microsoft has failed to realize that significantly lessening spam is in its best interest and has subsequently failed to release Sender-ID under an open license. If they do produce a working solution under a reasonable license, the world will beat a path to their door.

    It’s not clear (to me) that Sender-ID is a workable solution either, but it most certainly isn’t going to be adopted as long as Microsoft has their foot on the wallets of anyone who adopts their protocol. This is not what the Internet is about.

  2. I think that the open source community has a general distrust of Microsoft’s profit motives and would thus be fairly unlikely to accept any solution put forward. The only way it would likely be fully accepted is if other companies and a few open source heavyweights got behind it. Not saying it won’t happen… but I’m not holding my breath!

    Yes, OCR is dealing with some of the image SPAM. But the spammers will get better. Then we’ll get better. There is no end in sight to the see-saw battle.

    My point is that something really does need to be done to bring email into the 21st century. SMTP mail was designed based on “I trust everyone”, and it really needs to be changed to “I trust nobody unless they can prove otherwise”!
