Running a blog – or indeed any website which allows people to submit content and comments – pretty much guarantees that you’re going to receive comment spam. You’ll likely find that comments are submitted on popular or prominent posts, but that these comments often don’t make sense, are off topic, or are simply illegible.
Since launching this website, I've received well over 100 obviously spam comments - and whilst this isn't the end of the world, I'd rather spend my time writing than digging through adverts for questionably effective pharmaceuticals.
The Challenge
Whilst it can be feasible to manage and manually delete small amounts of this kind of spam, this becomes impractical when dealing with large volumes of comments. It’s also not always as easy as you might think to identify spam content – the quality of spam ranges from illegible characters and broken text through to clever, targeted messaging; intelligent spam is often remarkably easy to miss. The time required to determine whether or not a seemingly legitimate comment may in fact be spam by, e.g., ensuring that the author's website appears to match their poster profile and contains legitimate content, becomes a significant time sink when trying to keep on top of large numbers of comments.

This is loosely on-topic, has a name, a face, and isn't obviously linking out to a spam domain - it could easily have slipped through if I was dealing with large volumes of comments and wasn't reading them closely.
As such, an automated solution which can identify and stop spam is a must-have. Before jumping in at the deep end and picking plugins, systems and solutions, it’s valuable to understand what comment spam is, why it exists, and how to avoid some pitfalls.
Why does comment spam exist, and who are the spammers?
The most important thing to understand about spam is that it’s rarely random, and almost always has commercial intent. More often than not, this kind of spam is an SEO technique. The spammers are attempting to gain a large volume and variety of links pointing back to their own websites in order to increase the authority of those sites, and in turn to sell links to other websites (including those of legitimate businesses) which want to increase theirs. The more websites upon which they can place links pointing back to their sites, or to intermediary sites pointing back to their clients and contacts, the more performance and profit they can achieve.
Thankfully, the major search engines got savvy to this kind of manipulation very early on and introduced the much misunderstood 'nofollow' link attribute, which instructs search engines that the destination of the link isn't explicitly endorsed (or is explicitly not endorsed, depending on your perspective), and that it shouldn't be treated in the same way as a link from within, e.g., an authored blog post*. Contrary to popular belief, nofollow links actually tend to be 'followed' (or crawled) in the same way as other links on a page, but no value, context or authority is thought to be passed through them in the same way that it might be through a normal link. It should be noted that there is a theory that the presence of nofollow links within an overall link profile correlates with higher rankings - this may be a direct relationship, or simply a correlation with a more organic and naturally segmented link profile.
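To make the attribute concrete, here is a minimal Python sketch of how a comment system might add rel="nofollow" to every link in submitted comment HTML. The function name add_nofollow is hypothetical, and the regular-expression approach is an assumption made for brevity; a production system would more likely use a proper HTML parser.

```python
import re

# Matches an opening <a ...> tag and captures its attribute string.
_ANCHOR = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# Matches an existing rel="..." or rel='...' attribute.
_REL = re.compile(r"""\brel\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def add_nofollow(html):
    """Return html with rel="nofollow" present on every <a> tag (hypothetical helper)."""

    def fix(match):
        attrs = match.group(1)
        rel = _REL.search(attrs)
        if rel is None:
            # No rel attribute yet: append one.
            return "<a" + attrs + ' rel="nofollow">'
        values = rel.group(2).split()
        if "nofollow" in [v.lower() for v in values]:
            # Already marked: leave the tag untouched.
            return match.group(0)
        values.append("nofollow")
        new_rel = 'rel="' + " ".join(values) + '"'
        return "<a" + attrs[:rel.start()] + new_rel + attrs[rel.end():] + ">"

    return _ANCHOR.sub(fix, html)


print(add_nofollow('<a href="http://example.com">my site</a>'))
```

Existing rel values are kept and 'nofollow' is appended to them, so a link already marked rel="external" ends up as rel="external nofollow".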
*This distinction was used historically to manipulate the flow of value through internal links within a website - a process known as 'PageRank sculpting'. This worked on the premise that, in line with a very basic model of PageRank, value flowed proportionately between all links on a page; and adding a nofollow attribute to some links would result in proportionately more value flowing through the remaining links. Google announced in 2009 that this 'redistribution' of value had actually been removed as far back as 2008, and that the proportional value simply 'evaporates'. This again is over-simplistic, and based on a very simple PageRank model - the reality is likely to be vastly more complex.
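The arithmetic behind the two models in this footnote can be sketched in a few lines of Python. This is a deliberately simplified illustration: it ignores the damping factor and everything else in a real PageRank calculation, and the function name and the example numbers are my own assumptions rather than anything Google has published.

```python
def value_per_followed_link(page_value, total_links, nofollow_links, redistribute):
    """Value passed through each followed link under a very basic PageRank model.

    redistribute=True  -> the old 'sculpting' premise: value is shared only
                          between the followed links.
    redistribute=False -> the 'evaporation' model: value is divided by all
                          links, and the nofollowed share is simply lost.
    """
    followed = total_links - nofollow_links
    if followed <= 0:
        return 0.0
    divisor = followed if redistribute else total_links
    return page_value / divisor


# A page with 10 links, 5 of them nofollowed, and a notional value of 1.0:
print(value_per_followed_link(1.0, 10, 5, redistribute=True))   # sculpting premise
print(value_per_followed_link(1.0, 10, 5, redistribute=False))  # evaporation model
```

Under the sculpting premise each of the five followed links receives a fifth of the page's value; under the evaporation model each receives only a tenth, and the other half of the value goes nowhere.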
Frustratingly, spam often leads to more spam - once you've received just a few comments, you'll likely find that significantly more will follow. The presence of spam comments can subtly alter the inbound and outbound link profiles of a site sufficiently for it to be easily detectable by the processes used by comment spammers to find likely prospects. The increased (albeit off-topic) content created by the spam comments is also liable to make the page look more on-topic in terms of what the spammers are looking for - if your authoritative post has content discussing illicit pharmaceuticals, you'll find spammers queuing at your door to add their links.
The example spam comment I've highlighted at the top of this post is particularly interesting, as the motivation isn't entirely clear. They're linking out to a Facebook profile (which upon investigation is either fake, or in no way associated with the pseudo-identity which presents the comment - in this case, it looks like the profile of a teenage Japanese girl), which would suggest that either they're working to promote that account itself (possibly as part of a multi-layered spam network where that profile in turn links to and promotes a subsequent URL), or they're looking to add a semi-legitimate comment in the hopes of increasing the relevance of the affected post page for terms they're looking to target - and they'll follow up with a second round of comments once the page appears to be more on topic for 'SEO Pack for WordPress' in the example above (and yes, I've probably just done their job for them).

I'm still not sure if this one is actually spam, or just poorly written - except that the domain is heavily promoting some kind of muscle building voodoo... That is some impressively targeted spam.
It should be noted that, historically, spam comments are rarely submitted by humans – the effort required to manually write, complete and submit comment forms across pages spanning thousands of websites (with no guarantee that your comments would get through any spam filters, or be accepted by website owners) is a huge manual overhead. Spammers are often software programmers who create bots, crawlers and scrapers in an attempt to find valuable websites and pages with comment forms, and to intelligently submit content through these forms.
More rarely, comment spam that isn't designed as an SEO tactic may simply be targeting your readers, attempting to incentivise them to click through to a targeted link, and to ultimately spend their money (all the while attempting to look like a legitimate comment).
Disclaimer: As a technically-minded SEO practitioner, I'm absolutely fascinated by this kind of tactic, the nuances of the approach, and the mechanics required to execute it at a scale which makes the production effort and maintenance overhead worthwhile... However, as an SEO practitioner with an aspirational outlook and business-consultancy focused approach, I do not endorse or encourage this kind of tactic.
Comment spam is not the kind of SEO I'd ever recommend to a client or to any legitimate business - moral and ethical considerations aside, it's my firm opinion that no matter how technically clever and scalable the approach, the return and value generated (both direct and secondary) can never be as high as the returns generated through undertaking strategic improvements to a business, website, content strategy, product/service offering and online marketing mix; even the best comment spam infrastructure will never rival a genuinely synergistic SEO and Social strategy which connects real consumers with a business, regardless of how many links those comments might acquire.
Creating a system capable of making this work is a pretty cool, geeky pipe dream, and a criminal waste of time; even if it makes you rich overnight (after months of solid development), there's an increasing chance that it will become utterly redundant in any number of ways just as quickly, as Google focuses increasingly on authorship, link quality and user signals.
Why does my WordPress blog get lots of spam?
As a blogger using WordPress as a CMS, I have some considerable advantages and disadvantages when it comes to spam. WordPress' size and popularity mean that there are all manner of tools, plugins and processes for managing spam, which makes resolving or burying the problem reasonably achievable; and more are developed, released and improved each day to tackle the latest challenges (much like an anti-virus software solution on your computer). However, consider that the primary mechanic of most comment spam systems is to find ways to inject comments through web forms - the proliferation of WordPress websites (which all share a universal back-end system, and where a vast majority of sites use off-the-shelf comment systems) means that generally speaking, if the spammers can work out a way to submit comment spam through a single plugin or website, they can roll it out in principle to the 15% of the Internet which is powered by WordPress (according to W3Techs' recent [Feb 2012] report on the utilisation of content management systems across the web). Comment spammers, especially in the case of WordPress sites, are working at incredible economies of scale.

W3Techs reporting WordPress as the underlying platform for nearly 16% of platforms - Feb 2012
This weakness or vulnerability of scale isn't just skin-deep; it takes WordPress' greatest strength and uses it against the platform.
From a development perspective, one of WordPress' key features is that every element of functionality and behaviour can be tapped into, controlled, modified, extended, changed or removed entirely through the use of 'hooks' - bits you can grab onto and take control of. For example, there are hooks for when posts are saved, when pages are viewed, when settings change, and when comments are submitted - and these hooks are universal 'under the hood'.
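For readers who haven't worked with hooks, the idea can be sketched as a tiny registry. The following is a Python analogy, not WordPress code: WordPress itself is written in PHP, and the hook name 'incoming_comment' and both callbacks are hypothetical, chosen purely to show the pattern of registering functions against a named event and running them in priority order.

```python
from collections import defaultdict

# Maps a hook name to a list of (priority, callback) pairs.
_hooks = defaultdict(list)


def add_filter(name, callback, priority=10):
    """Register a callback against a named hook."""
    _hooks[name].append((priority, callback))


def apply_filters(name, value):
    """Pass value through every callback on the hook, lowest priority number first."""
    for _, callback in sorted(_hooks[name], key=lambda item: item[0]):
        value = callback(value)
    return value


def trim_content(comment):
    # Tidy up surrounding whitespace.
    return {**comment, "content": comment["content"].strip()}


def hold_if_linked(comment):
    # Flag any comment containing a link for manual moderation.
    held = "http://" in comment["content"] or "https://" in comment["content"]
    return {**comment, "held_for_moderation": held}


# Two independent 'plugins' attach to the same hook without knowing about each other.
add_filter("incoming_comment", hold_if_linked, priority=20)
add_filter("incoming_comment", trim_content, priority=10)

result = apply_filters("incoming_comment", {"content": "  Nice post! http://example.com  "})
print(result)
```

The point of the pattern is that the core system only ever calls the hook; whatever is attached to it can change, but the entry point stays the same on every site.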
If I develop a plugin which changes the presentation, functionality and behaviour of the way in which my comments system works, chances are that I'll produce that plugin by building on WordPress' existing hooks - regardless of the look, feel, bells and whistles, nothing's changed at the very core of the system. As such, the vast majority of WordPress websites, regardless of their design, functionality or comments system utilise a single back-end system, which is designed to play nicely with whatever their plugins and addons want it to achieve.
If I know and can anticipate that WordPress is expecting to process a piece of POST data for a 'comment' field (i.e., somebody has filled in a textarea with a name attribute of 'comment' and submitted a form), then it's almost irrelevant what the form looks and feels like, and how it functions - I can bypass the front end entirely. In fact, there are some scenarios where a page doesn't even need to be published and/or accessible in order to target it and submit spam comments to that post.
Essentially, all a comment spam system needs to be able to do is to anticipate what the WordPress comment system hooks are anticipating (in terms of POST data), and to submit that appropriately. Clever captchas, varying field layouts and ID attributes and other differences between individual websites, plugins and approaches are generally minor hurdles.
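Seen from the server's side, a submitted comment is nothing more than a set of named values, as this small Python sketch illustrates. Only the 'comment' field is taken from the discussion above; the other field names are my assumption of what WordPress' default comment form sends, and parse_comment_post is a hypothetical helper rather than anything WordPress provides.

```python
from urllib.parse import parse_qs

# Assumed field names, based on WordPress' default comment form.
EXPECTED_FIELDS = ("author", "email", "url", "comment", "comment_post_ID")


def parse_comment_post(body):
    """Decode a form-encoded POST body into the fields a comment handler expects."""
    parsed = parse_qs(body, keep_blank_values=True)
    # parse_qs returns a list per key; take the first value, or "" if absent.
    return {name: parsed.get(name, [""])[0] for name in EXPECTED_FIELDS}


# The raw body carries no trace of the form's design, layout or styling.
body = "author=Jane&email=jane%40example.com&url=&comment=Great+post&comment_post_ID=42"
print(parse_comment_post(body))
```

Nothing in that body records which theme, plugin or form layout produced it, which is why differences at the front end offer so little protection.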
WordPress' flexibility makes it a wide, open target for this kind of spam.
How do I stop spam?
Intelligent spam is incredibly difficult to stop completely - given the spammers' economies of scale, commercial drive and myriad attack vectors, the best you can hope for is to stay ahead of (or, at least, in line with) the latest trends.

Machines might struggle to read this, but cheap labour sure can - and at scale.
Even solutions such as CAPTCHAs aren't fool-proof - though I've alluded to the fact that much spam is generated by software rather than humans, human labour can certainly form part of an automated spam attack. CAPTCHAs are great at blocking purely automated comment spam, but with the rise of cheap, digitally managed labour (such as that available from Mechanical Turk or oDesk), human-powered CAPTCHA-breaking can be easily integrated into an automated system at an incredibly low cost, where mass labour can complete CAPTCHAs at a rate of hundreds - if not thousands - per hour. Aside from the poor usability implications of this approach, it's by no means a robust enough solution to significantly reduce or solve your spam problem.
The increasing availability of this kind of labour is making human-generated and managed spam much more commercially viable; where machine-based spam is failing, simply hiring low cost labour to write spam by hand, to brief, and on topic is often more cost effective. This has worrying implications, and will become increasingly challenging to tackle and avoid.
So, what are the options?
Plugins to the rescue - Akismet and Bad Behavior
Akismet is shipped with all WordPress installations as standard, and it represents one-half of the most comprehensive but hands-free solution to spam that's easily available off-the-shelf.
It works against spam in much the same way that spam itself works - they maintain and evolve a central, scalable system which attempts to identify and intercept spam in increasingly intelligent ways, by monitoring the nature and behaviour of spam and attempting to continually keep up with and overtake the spammers. Every time a comment is submitted to your website, Akismet uses WordPress' comment submission hook to intercept the comment, check it against their spam detection algorithm and software (hosted on their servers), and determine whether it's spam or not - and process it appropriately in your admin area.
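As a rough illustration of that round trip, here is a Python sketch which builds (but does not send) a spam-check request. The endpoint URL, the field names and the plain-text 'true'/'false' response reflect my understanding of Akismet's public comment-check API and should be treated as assumptions to verify against their current documentation; the function names and example values are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint; check Akismet's documentation before relying on it.
COMMENT_CHECK_URL = "https://rest.akismet.com/1.1/comment-check"


def build_comment_check(api_key, blog_url, user_ip, user_agent, content, author=""):
    """Build a POST request describing one submitted comment (nothing is sent)."""
    fields = {
        "api_key": api_key,
        "blog": blog_url,
        "user_ip": user_ip,
        "user_agent": user_agent,
        "comment_type": "comment",
        "comment_author": author,
        "comment_content": content,
    }
    data = urlencode(fields).encode("utf-8")
    return Request(COMMENT_CHECK_URL, data=data, method="POST")


def is_spam(response_body):
    """Interpret the response, assuming a body of 'true' means spam."""
    return response_body.strip() == "true"


request = build_comment_check(
    api_key="your-api-key",          # placeholder
    blog_url="https://example.com",  # placeholder
    user_ip="203.0.113.7",
    user_agent="Mozilla/5.0",
    content="Buy pills now",
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would perform the actual check; the sketch stops short of that so it can be run without an API key.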
Akismet is massively underutilised, primarily due to its requirement for entering an API key. Whilst Akismet is free for bloggers, it isn't obvious on their website (despite them mentioning it frequently) how to get a free key. Registering an account with WordPress.com entitles you to a key as part of your account, but this journey isn't well signposted - and the perceived 'technicalness' of this process means that whilst all WP sites come with Akismet installed, it's rarely activated, and rarer still properly configured.
It should be noted that Akismet isn't limited to WordPress (and that there are a myriad of ways to tap into their servers, API and systems), but that usage beyond free, not-for-profit blogging requires (and deserves) a licensed subscription.
Bad Behavior (note the American spelling) represents the other half of our magic solution - it operates in a completely different way from Akismet, and tackles spam by stopping it before it even arrives on your website.
Every time a user (or piece of software) requests a URL on a website, it must perform a handshake with the server - the requesting device presents its credentials, exchanges details, and then proceeds on to load the webpage. Bad Behavior assesses these credentials and looks for anomalies, missing elements, or signals consistent with spam behaviour. As Bad Behavior's website points out, the development quality and adherence to web standards of spam systems is generally low, and they tend to leave an obvious fingerprint in their server requests.
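The general idea of fingerprinting a request can be sketched in Python as follows. These three rules are hypothetical examples of the kind of anomaly such a tool might look for; they are not Bad Behavior's actual checks, and suspicious_signals is a name invented for this illustration.

```python
def suspicious_signals(headers):
    """Return a list of reasons why a request's headers look unlike a real browser's."""
    # Header names are case-insensitive, so normalise them first.
    h = {name.lower(): value for name, value in headers.items()}
    signals = []

    user_agent = h.get("user-agent", "").strip()
    if not user_agent:
        signals.append("missing User-Agent")
    if "accept" not in h:
        signals.append("missing Accept header")
    # Something claiming to be a browser would normally state a language preference.
    if "Mozilla" in user_agent and "accept-language" not in h:
        signals.append("browser User-Agent without Accept-Language")

    return signals


browser = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Accept": "text/html",
    "Accept-Language": "en-GB",
}
bot = {"User-Agent": "Mozilla/5.0"}

print(suspicious_signals(browser))
print(suspicious_signals(bot))
```

A request which trips one or more of these checks can be refused before any page is generated, which is where the bandwidth saving comes from.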
Aside from stopping the spam systems ever getting near your website, the fact that these systems and users aren't requesting resources, downloading HTML and images, and using your website means that you're saving a nice bit of bandwidth. Win/win.
(If you're installing Bad Behavior, don't forget to un-tick the 'show statistics in blog footer' option in the settings page, unless you're really keen on adding that information to all of your pages)
The combination of both Akismet and Bad Behavior means that you stop a significant chunk of spam at the door, and anything that gets as far as actually submitting a comment will be subject to rigorous testing and checking - anything that gets through probably deserves to get posted just for sheer ingenuity and perseverance.
Moral of the story
I'll admit that I've cheated on this one, and actually installed both of these plugins several days ago, as the half-dozen painfully low quality spam submissions that I was receiving each day were driving me mad - I've not had a single spam comment since.