A client recently asked me to chip in on a debate they were having with an internal tech team about retiring old 301 redirects. Concerns had been raised about ageing and legacy CMS functionality which needed to be replaced, and as part of this process the IT folks pushed to remove (and not replace) the thousands of redirects which the system was powering. The only hint as to why they’d think this would be a good/necessary idea was a quote from the proponents stating that as Google had seen, visited and clearly indexed the destination URLs for some time, the redirects were no longer necessary, and their removal shouldn’t cause any harm. At best, they’d concede that the SEO guys needed to maintain a smaller list of redirects which were specifically set up ‘for SEO purposes’ (such as forcing http/s and lowercase, and a handful of specific page-level redirects), but that otherwise they’d need to ‘trim the fat’.
Of course, this instantly set alarm bells ringing – the business dominates its market largely because of the performance of the destination URLs for the redirects in question, and so any negative impact whatsoever on the equity flowing to those pages could have huge real-world, commercial implications.
This is a type of scenario which I see frequently. There’s always huge reluctance from the IT/developer community to implement, support or maintain ‘bulky’ redirect lists, either on a day-to-day basis or as part of website launches or significant overhauls, and it’s invariably a painful experience fighting to justify why “let’s just redirect the top 100 most visited URLs on the website” isn’t a viable option. Because it’s hard to quantify the tangible opportunity cost of failing to implement redirects, it’s an incredibly challenging argument to have, and it often descends into pleading for people to take leaps of faith.
Given how many times I’ve fought this corner in my own experience, I thought I’d go all out, and attempt to construct a comprehensive argument which weighed indisputably in favour of maintaining legacy redirects. Here’s the (slightly edited to preserve anonymity) transcript.
Don’t ever remove redirects. There’s no reason to, and no benefit in doing so. The performance gloom and doom your developers preach is nonsense.
Don’t ever remove redirects. URLs still get hit/crawled/found/requested years after they’re gone. Removing them risks harming equity flow, perceived site quality, and crawl behaviour.
Don’t ever remove redirects. It gives a bad user experience, which impacts the bottom line.
There are a few core considerations from my perspective (from my agency experience, technical SEO knowledge, and Linkdex experience):
- Tech teams are always highly skeptical of redirects, as I’m sure you’re well aware. They’re often hesitant to implement even a foundational set of global redirects (such as case-forcing, http/s, etc.) due to one or more of:
- A lack of understanding/consideration of the user experience and/or the way in which Google discovers, crawls and indexes (and the way in which equity flows in this process)
- Concern over server performance resulting from high levels of redirect lookups on each request
- A desire to ‘trim the fat’ as a part of general housekeeping because it’s good practice to do that kind of thing in other scenarios and dev processes
- In each of these cases, I’ve found that focusing on the user experience component is the most effective approach: establishing the expectation that any scenario in which a user hits a 404 error page has a negative commercial impact on the business removes a lot of the ‘SEO politics’ from the equation. You can even go so far as to quantify this through surveying (e.g., “would seeing an error page on our site affect how much you trust the brand?”) and ascribe a £loss value to each resultant 404 per hit, based on estimated impact on conversion rates, average order values, etc. This gives you lots of ammunition to justify the continued existence of the redirects in a context which they can understand and buy into.
- This also goes some way to resolving the performance concerns; the performance impact of large volumes of redirects is always marginal at most, and vastly less than they anticipate, so any small investment into optimising it goes a long way. I should note that I’ve personally handled and argued for implementing/retaining large redirect sets on a number of huge site launches/relaunches, and in each case experienced these kinds of reactions; I’ve never once seen any actual performance impact following the successful deployment of redirects numbering in the low to mid thousands (formed of mixes of static and pattern-matching/regex rules). London Stock Exchange were particularly hesitant to implement ~1k redirects as part of a large site relaunch, but were surprised to see no measurable performance impact following their implementation.
- If performance concerns are still a barrier, I typically resort to identifying equivalent performance opportunities, load speed improvements and technical optimisations which can offset the perceived cost; and there’s never any shortage of opportunities for simple things like improved resource caching, client-side rendering efficiencies, etc. If it’s helpful, I can assist/support on reeling off some suggestions in this arena, from hardcore back-end stuff to softer front-end stuff. If they can quantify the ‘damage’, it’s easy enough to offset.
- It’s also worth considering that almost all of the major CMS platforms operate by looking up the requested URL against a database to find a pattern matched result; as such, depending on the approach to implementation, having a large list of redirects to check against needn’t necessarily represent an extra lookup, process, or performance hit. If your team are particularly sophisticated, emulating the load-balancer style approach with layers of high-level caching can further mitigate performance concerns.
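To illustrate the point, here’s a minimal sketch (in Python, with entirely illustrative URLs and rules) of how a redirect table can piggyback on the URL lookup a CMS already performs on every request – note that the exact-match table costs a single hash lookup regardless of how large it grows:

```python
import re

# Illustrative redirect table: exact-match rules plus compiled regex patterns.
# A real CMS would load these from the same database it already queries on
# every request, so a large table needn't mean an extra lookup per rule.
EXACT_REDIRECTS = {
    "/old-about-page": "/about",
    "/products/widget-2009": "/products/widget",
}
PATTERN_REDIRECTS = [
    (re.compile(r"^/blog/\d{4}/\d{2}/(?P<slug>[\w-]+)$"), r"/articles/\g<slug>"),
]

def resolve_redirect(path):
    """Return the redirect target for `path`, or None if no rule matches."""
    target = EXACT_REDIRECTS.get(path)  # O(1) hash lookup, whatever the table size
    if target:
        return target
    for pattern, replacement in PATTERN_REDIRECTS:
        if pattern.match(path):
            return pattern.sub(replacement, path)
    return None
```

The regex list is the only part that scales linearly, which is one more reason to keep the pattern rules few and broad (see the consolidation point further down).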
In an ideal scenario, you’d have a solution in place which monitors how many times a redirect has been hit, and the date/time at which this last occurred; I use a system like this on a number of my own sites, and periodically review and retire redirects which have been stale for over 6 months and have a low overall hit count [I use the Redirection plugin for WordPress, it’s one of my favourite tools].
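For anyone without access to a plugin like that, the review logic is simple enough to sketch. The thresholds below are illustrative – the 6-month staleness matches the policy I describe above, but the hit-count cutoff is something you’d tune to your own traffic:

```python
from datetime import datetime, timedelta

STALE_AFTER = timedelta(days=180)   # ~6 months, per the review policy above
LOW_HIT_THRESHOLD = 10              # illustrative: tune to your traffic levels

def redirects_safe_to_retire(redirects, now=None):
    """Given per-rule hit stats, return the source URLs which meet both
    retirement criteria: stale for over 6 months AND a low overall hit count.

    `redirects` is a list of dicts like:
        {"source": "/old-url", "hits": 3, "last_hit": datetime(...) or None}
    """
    now = now or datetime.utcnow()
    retire = []
    for rule in redirects:
        never_hit = rule["last_hit"] is None
        stale = never_hit or (now - rule["last_hit"]) > STALE_AFTER
        if stale and rule["hits"] <= LOW_HIT_THRESHOLD:
            retire.append(rule["source"])
    return retire
```

The important design choice is that both conditions must hold: a rule with a big historical hit count stays even if it has gone quiet, because (as the findings below show) old equity-bearing URLs keep resurfacing.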
What I’ve found interesting from running this system for a number of years over multiple sites is that:
- Any URLs which used to have any real levels of equity, links and/or performance keep getting hit for years.
- Redirects which have been removed and return a 404 keep getting hit by search engines, indefinitely.
- I run ~4k redirects on one of my own sites at present, with no measurable performance difference between having them all on or off.
- Removing redirects in these scenarios has a definite impact on SEO, even if only an indirect one: a bot entering the site at a dead URL means reduced discovery, reduced crawlability, and poorer crawl equity distribution.
- People still link to dead content; much of the web is stale. We also know that URLs which used to be linked to still maintain some degree of equity, so removing any of these redirects is liable to negatively impact the destination page’s equity.
- Any marginal performance overhead for running this kind of system (the performance fears resurface here in the context of maintaining these kinds of logs) is vastly offset by the value retention and improved user experience.
I think that crawl quota is a particularly relevant point for you guys; with a site of your size and scale, any activity which is likely to result in Google allocating lower crawl resource is liable to have enormous consequences to indexation levels and speed, which is obviously to be avoided at all costs.
I’d expect a site of your scale to want to maintain as many of these legacy redirects as possible, for as long as possible, until the impact of them is measurably and demonstrably detrimental and this cannot be offset elsewhere. Upwards of 6,000 sounds like a lot, but I think that it’s a reasonable and realistic volume given your site, legacy, etc. Having said that, I’d definitely want to be putting processes in place in the future to minimise the likelihood of URL structures expiring, and planning for more future-proofed patterning (obviously easier said than done!). Good practice suggests that no URL should ever change or die, which is a card you might be able to play, but I suspect that may lead to blame games around who-chose/designed-which-legacy-URL-structure, which may cause more harm than good at this stage!
If there’s continued pressure to ‘trim the fat’, I’d personally want to investigate, query and quantify every single rule which is proposed for deletion, to understand whether there’s still anything linking to it, whether the URL has/had any social shares, and whether server logs indicate that it’s been hit recently. This is something Linkdex could definitely help with; however, it’s likely to be a resource-intensive process and may not provide much value – even if the vast majority of URLs turn out to be ‘duds’, all of the above rationale around user experience and equity management still applies.
I wonder – as part of the migration process, is there any opportunity to combine any of the redirects into pattern matching rules? E.g., if there are 100 URLs with a shared folder root, it may be prudent to craft a single rule based on matching that pattern. This should reduce quantities significantly.
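As a quick sketch of what I mean (with hypothetical folder names), a hundred one-to-one rules sharing a root can collapse into a single pattern rule:

```python
import re

# Instead of 100 one-to-one rules like:
#   /old-shop/red-widget  -> /shop/red-widget
#   /old-shop/blue-widget -> /shop/blue-widget
#   ... (98 more) ...
# a single pattern rule covers the whole shared folder root:
FOLDER_RULE = (re.compile(r"^/old-shop/(?P<rest>.+)$"), r"/shop/\g<rest>")

def apply_folder_rule(path):
    """Return the rewritten path if the folder pattern matches, else None."""
    pattern, replacement = FOLDER_RULE
    return pattern.sub(replacement, path) if pattern.match(path) else None
```

The caveat, of course, is that this only works where the old and new structures map cleanly – any URLs in the old folder which moved somewhere else entirely still need their own static rules.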
From a completely alternate perspective, if deletion is to proceed, I’d potentially consider returning a 410 (‘Gone’) rather than a 404 (‘Not Found’) status on the URLs in question. This may help send a stronger signal that a deliberate decision has been made to remove those pages, rather than a large section of your website having broken/vanished (with the negative connotations associated with that). I’m not convinced that this would make any realistic difference, but it feels like a low-risk, low-effort change which may help out.
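The mechanics are trivial – a sketch (the removed URLs here are hypothetical): keep a list of deliberately retired URLs and serve 410 for those, falling back to a standard 404 for everything else:

```python
# Hypothetical list of URLs that were deliberately removed; anything else
# that fails to resolve falls through to a standard 404.
DELIBERATELY_REMOVED = {"/discontinued-range", "/old-campaign-2009"}

def status_for_missing_url(path):
    """410 Gone signals a deliberate removal; 404 just says 'not found'."""
    return 410 if path in DELIBERATELY_REMOVED else 404
```

In practice this would sit in whatever error-handling layer the CMS exposes, and the set of removed URLs is exactly the list of redirect rules being retired.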
Hopefully some of this can help bolster your arguments.
As a predominantly technical person, I’ve often argued in favour of redirects from an equity-preservation perspective; it’s only in writing the email that I realised that the two biggest points here are the tangible impact on user experience, and the expected (and somewhat measurable) impact on crawl quotas. I think that next time I run into this scenario, I might not talk about ‘link juice’ or the difference between different HTTP status codes at all, and just focus on these elements. Who’d have thought?