Procmail and obfuscated spam

As a result of the torrent of spam I've been receiving from the Sobig.F virus, my tolerance for spam is at an all-time low. Like most people, I get my share of 'medical' spam offering products to increase, decrease or otherwise modify various parts of my anatomy. In the past most of it went to an email address I keep for web use and was therefore easy to catch, but I'm now starting to get it on my primary email address as well. I therefore decided to whip up a procmail recipe to deal with it, using a list of keywords and procmail scoring.

However, as I soon learned, the spammers try to prevent this by obfuscating the contents of the spam: they send HTML-format emails and obfuscate the HTML so that a simple keyword match won't work. With a small perl script and a little bit of procmail magic, this was easily circumvented. I've written this up because I think it shows some useful and underused features of both perl and procmail. If you are interested, read on.

