Fighting Image SPAM: FuzzyOCR Resources

I posted recently regarding the battle against image SPAM. One of the comments pointed out that defeating image SPAM is actually fairly straightforward if OCR software is used to scan image attachments. I agree, but the real problem is that very few people have access to an Optical Character Recognition (OCR) based scanning solution. Either you pay the BIG bucks for a commercial SPAM filtering solution, or you get an uber-geek to integrate FuzzyOCR into your existing SpamAssassin-based solution. I would guess that 95%+ of email inboxes are not currently protected by OCR, so it appears the image spammers aren’t yet too concerned about fooling OCR-based anti-SPAM solutions.
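To make the "fuzzy" part of the idea concrete, here is a minimal Python sketch of approximate word matching on text that an OCR engine has already extracted from an image. It is not FuzzyOCR's actual code or algorithm. The word list, the `fuzzy_hits` function name, and the 0.75 threshold are all assumptions made up for illustration, and the OCR step itself is left out entirely.

```python
from difflib import SequenceMatcher

# Hypothetical word list for illustration; a real filter would maintain its own.
SPAM_WORDS = ["viagra", "pharmacy", "stock", "investor"]


def fuzzy_hits(ocr_text, words=SPAM_WORDS, threshold=0.75):
    """Return the listed words that approximately match a token in ocr_text.

    ocr_text is assumed to be plain text already produced by an OCR engine.
    threshold is an arbitrary similarity cutoff between 0 and 1.
    """
    # Split on whitespace, trim common punctuation, and lowercase each token.
    tokens = [t.strip(".,!?:;").lower() for t in ocr_text.split()]
    hits = set()
    for word in words:
        for token in tokens:
            # ratio() is 1.0 for identical strings and drops as they diverge.
            if SequenceMatcher(None, word, token).ratio() >= threshold:
                hits.add(word)
                break
    return sorted(hits)


if __name__ == "__main__":
    # OCR output is often slightly wrong, e.g. "1" for "i" or "rn" for "m".
    print(fuzzy_hits("Buy v1agra from our pharrnacy today"))
```

Exact string matching would miss both misread words in that sample, which is why approximate matching matters when the input comes from OCR or from spammers who deliberately distort the text.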

I admit that I haven’t gotten around to implementing FuzzyOCR yet. I guess I just assumed (wrongly?) that the spammers would test their messages against FuzzyOCR before sending, and that this would make them difficult to catch. Apparently that is not necessarily the case, so I started searching around the web for resources on implementing FuzzyOCR. My preferred platform is FreeBSD, but I also run several Ubuntu servers, so that is an option as well. Here’s what I came up with:
