Search bots and third-party spiders are generally pretty stupid. They request files which don’t exist, get stuck in dead ends, don’t take into account site- or platform-specific signals
which might help them crawl more effectively, and generally prefer brute force (more crawl depth and resources) over finesse. This behaviour not only clogs up your server logs with 404s and adds unnecessary server strain (you’re still using processing and bandwidth overhead to serve those 404s, and they generally aren’t cached), but also means that search engines and evaluative tools aren’t getting a good understanding of your website. More sophisticated bots are problematic in other ways – Google, for example, proactively manipulates URL strings and submits forms to discover pages and URLs which it would otherwise miss; but in doing so, it too generates overhead and errors.
Bot-by-bot, this shouldn’t affect you greatly. However, it’s not unheard of for a relatively large website to get hit by many dozens of bots per minute, generating many thousands of
erroneous records (and causing SEO/social/user/technical issues) per day. If you’re an SEO perfectionist, or looking to squeeze an extra drop of performance and visibility from your website, you’ll want to make sure that any time or resources which search engines spend crawling your site are used as effectively as possible. That means anticipating the kinds of mistakes they’ll make, and catching them before they happen.
The following is a list of ‘core redirects’ which I’ve compiled from my own experience – if you have other ‘must have’ redirects, let me know in the comments!
If you’re running WordPress, the Redirection plugin is perfect for setting up and managing your redirects and error management behaviour. Otherwise, your .htaccess file is a great place to start.
Lastly, bear in mind that, as ever, these are tailored for my sites and my needs. Implementing them as-is may cause redirect loops or errors. They’re meant as learning material to inspire you to craft your own solutions, rather than as a copy-paste resource.
My Core Redirects
Redirects which tidy up erroneous filename requests (make sure that the correct/canonical version of the file exists, and/or adapt the rules to fit your own unique circumstances!):
- Redirect erroneous requests for an iOS icon to the correct file.
- Redirect erroneous requests for a favicon to the correct file.
- Redirect requests to invalid/incorrect XML sitemap filenames (modify as appropriate to suit the correct version(s) for your site).
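As a sketch of how these might look in .htaccess – the icon and sitemap filenames here are illustrative assumptions, so swap in your own canonical files before using anything like this:

```apache
# Assumes mod_rewrite is available and these rules live in the site root's .htaccess.
<IfModule mod_rewrite.c>
RewriteEngine On

# Collapse the many apple-touch-icon-*.png variants (sizes, '-precomposed')
# down to a single canonical icon file.
RewriteRule ^apple-touch-icon-.+\.png$ /apple-touch-icon.png [NC,R=301,L]

# Catch favicon requests made against subdirectories or with odd casing.
RewriteCond %{REQUEST_URI} !^/favicon\.ico$
RewriteRule favicon\.ico$ /favicon.ico [NC,R=301,L]

# Point common sitemap filename guesses at the real sitemap
# (assumes /sitemap.xml is your canonical sitemap).
RewriteRule ^(sitemap_index|sitemaps|site-map)\.xml$ /sitemap.xml [NC,R=301,L]
</IfModule>
```

Note that the patterns deliberately exclude the canonical filenames themselves, so a request for the correct file is never redirected (which would loop).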
Redirects which clean up generally unfavourable behaviour:
- Requests to paginated children of, or date-based queries against, the root URL (these don’t make sense on non-blog websites – also bear in mind that this rule is intentionally quite greedy)
- Requests for empty search strings (triggered by form submissions)
- Breaking requests for images with malformed parameters (frequently used by Bing!)
- Redirect all feed requests (if you don’t use feeds)
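A hedged sketch of rules along these lines – the URL patterns assume a WordPress-style setup (`/page/2/` pagination, `?s=` searches, `/feed/` endpoints), so adjust or drop any that don’t match your platform:

```apache
<IfModule mod_rewrite.c>
RewriteEngine On

# Send paginated or date-based requests for the homepage back to the root.
# Intentionally greedy - don't use this on a blog where /page/2/ or /2024/01/
# are real archive pages.
RewriteRule ^(page/\d+|\d{4}(/\d{2}(/\d{2})?)?)/?$ / [R=301,L]

# Redirect empty search submissions (a bare ?s= from a form) to the homepage;
# the trailing '?' in the target strips the query string.
RewriteCond %{QUERY_STRING} ^s=$
RewriteRule ^$ /? [R=301,L]

# Strip query strings from image requests entirely (malformed parameters on
# images are a frequent Bing habit). Don't use this if your images legitimately
# take parameters, e.g. for a resizing service.
RewriteCond %{QUERY_STRING} .
RewriteRule \.(jpe?g|png|gif|webp)$ %{REQUEST_URI}? [NC,R=301,L]

# Redirect all feed requests to the root - only if you don't serve feeds.
RewriteRule (^|/)feed/?$ / [NC,R=301,L]
</IfModule>
```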
Utility redirects which solve specific problems caused by third party bots/networks
- (Sometimes) breaking requests from Facebook containing post tracking parameters (do not use, or at least modify, if you rely on these strings for tracking/analysis or other functionality).
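For example, Facebook appends an `fbclid` click-tracking parameter to outbound links, which can break naive URL handling. A blunt sketch of a rule for it – note that, as written, this strips the *entire* query string whenever `fbclid` is present, so don’t use it (or rewrite it to preserve other parameters) if you rely on query strings for tracking or functionality:

```apache
<IfModule mod_rewrite.c>
RewriteEngine On

# If the query string contains Facebook's fbclid tracking parameter,
# redirect to the same URL with the whole query string dropped.
RewriteCond %{QUERY_STRING} (^|&)fbclid= [NC]
RewriteRule ^ %{REQUEST_URI}? [R=301,L]
</IfModule>
```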
Things I haven’t covered
This is really just the tip of the iceberg; at the moment, for Days Of The Year, for example, I’m running upwards of 600 redirect rules – however, it’s the rules I’ve outlined here which do most of the heavy lifting, and catch the majority of issues and problems.
Beyond these, there are definitely other areas you should think about; here are just a few places you’ll want to consider turning over some rocks:
- Requests to your own, unique legacy/old/changed URLs and URL patterns (if you’re using Redirection, don’t rely on its automatic redirect creation for changed post slugs; it’s a little flaky sometimes).
- Pattern rules for when you change your image/thumbnail sizes, e.g., -76[4-5]x382\.(jpg|png) –> -800x600.jpg
- Pattern rules for when you update dependencies and libraries, e.g., jquery-1\.8\.js –> jquery-1.9.js
- Pattern rules for security probes, such as for ‘backup.tar.gz’-type file structures – but these should typically be caught upstream (by something like iThemes Security for WordPress, and/or Cloudflare)
- Fixing broken or malformed internal/inbound links, but this is already covered off superbly elsewhere
- Rules for platform-level problems (or symptoms thereof) which can/should be solved elsewhere, like WordPress’ tendency to allow for empty archives, or page-zero concepts.
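The thumbnail- and library-version patterns above lend themselves to simple mod_alias rules. A sketch – the paths, sizes, and version numbers are purely illustrative (the uploads directory assumes WordPress), so substitute your own:

```apache
<IfModule mod_alias.c>
# Old 764x382/765x382 thumbnails are now generated at 800x600.
RedirectMatch 301 ^/wp-content/uploads/(.*)-76[45]x382\.(jpg|png)$ /wp-content/uploads/$1-800x600.$2

# Requests for a retired jQuery build point at the current one.
RedirectMatch 301 ^/js/jquery-1\.8\.js$ /js/jquery-1.9.js
</IfModule>
```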