One of the first decisions I make when installing a new WordPress site (and even prior to this step, if proper consideration has been given to the information hierarchy and content structure) is to configure the permalink structure. Configuring permalinks defines how WordPress produces, structures and manages the URLs of pages, posts and content.
From a technical perspective, it’s important to understand that regardless of the URL you type in, WordPress ‘routes’ every request through a central index.php file (via the default WordPress .htaccess configuration), which then attempts to match the requested URL to one of the WordPress template files.
This means that as long as URLs have a predictable pattern, it doesn’t matter what that pattern is – URLs can utilise any logical structure or pattern, including date-based systems (where, e.g., URLs might reflect the year, month and day of a post), author based approaches, or entirely custom structures.
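To make the ‘single entry point’ idea concrete, here’s a minimal sketch in Python (not WordPress’s actual PHP, and not its real rewrite rules). The patterns, the template labels and the route() function are all my own assumptions for illustration; the point is only that any predictable pattern can be matched and mapped to a template.

```python
import re

# Hypothetical rewrite rules (illustrative only, not WordPress's real rules).
# They are checked in order, and the first pattern that matches wins.
REWRITE_RULES = [
    (re.compile(r"^(\d{4})/(\d{2})/(\d{2})/([^/]+)/?$"), "single post (date-based)"),
    (re.compile(r"^author/([^/]+)/?$"), "author archive"),
    (re.compile(r"^(\d{4})/?$"), "yearly archive"),
]


def route(path):
    """Map a requested path to a (template label, captured parts) pair."""
    trimmed = path.strip("/")
    for pattern, template in REWRITE_RULES:
        match = pattern.match(trimmed)
        if match:
            return template, match.groups()
    # Nothing matched: fall through to a 'not found' template.
    return "404", ()
```

Every request goes through the same function; only the list of patterns decides which structures the site understands.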
Whilst WordPress provides a handful of reasonable defaults, I almost always change from the default of ‘?p=n’ (where n is a numeric identifier for the post) to a custom structure of /%category%/%postname%/.
This creates URLs which output the name of the page and, where applicable, prefix it with any categorisation (or levels of categorisation) that the page has. It changes the URL of this very post from ?p=16 to /blog-updates/permalinks-category-pagination/. You’ll notice that I’ve also removed some redundant words from the string – ‘change-1-configuring-permalink-types-category-pagination-fix-plugin’ is a little unwieldy, and I don’t need all of those ‘stop words’ in place to accurately convey the core concepts of the post. Equally, I’ve avoided stripping out so much information that the URL becomes generic, which would risk large numbers of very similar URLs addressing very different topics.
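As a rough illustration of how a structure string expands into a URL, here’s a small Python sketch. The build_permalink() helper and its parameters are hypothetical names of my own, and the handling of a page with no category is an assumption rather than a description of what WordPress does internally.

```python
def build_permalink(structure, category_path, post_slug):
    """Expand a structure such as /%category%/%postname%/ into a URL path.

    category_path is a list of category slugs, outermost first
    (an empty list means the page has no category).
    """
    category = "/".join(category_path)
    url = structure.replace("%category%", category).replace("%postname%", post_slug)
    # Assumption: collapse the doubled slash left behind when there is no category.
    while "//" in url:
        url = url.replace("//", "/")
    return url
```

For example, build_permalink("/%category%/%postname%/", ["blog-updates"], "permalinks-category-pagination") gives /blog-updates/permalinks-category-pagination/, and adding a second slug to the list adds a sub-category level to the URL.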
The /%category%/%postname%/ configuration is more adaptive than it might seem, and automatically aligns itself to different content types, requests and structures – date-based content (e.g., an archive page listing all posts from 2012) follows a date-based hierarchy even though there are no category or post-name components in the URL, and tag, taxonomy and custom structures all follow suit. In effect, ‘/%category%/%postname%/’ outputs a structured, hierarchical URL regardless of content type, request or page.
This approach has a number of advantages, not least of which is that it passes the ‘gran’ rule of thumb test, where an elderly relative should be able to follow instructions over the phone to enter a URL. At a very basic level, this is a much more comprehensible structure from a human perspective, and that should always be a key priority.
Secondarily, this approach makes it easy to identify your present location – regardless of where you are in the site, the URL acts as a clear breadcrumb and signpost; any URL with multiple levels represents a page which has ancestors residing at a ‘level above’ the current page, whereas ?p=n, or a date, author or other solution, may not provide this clarity of location.
From an SEO perspective, this approach can be beneficial, as it contains relevant keywords in the URL and clearly demonstrates the location of the page in relation to other pages to search engines and crawlers. Whilst Google aren’t fussed about the hierarchy of your content (and rather look much more closely at the internal linking and relationships between pages), it still helps to have a ‘structured’ hierarchy for housekeeping and to clarify the relationship. Interestingly, Bing (and Yahoo) do allude to having a ‘maximum crawl depth’, and suggest that they’ll tend not to crawl deeper than 5 folders down. They’re not clear on whether this explicitly relates to the number of components in the URL, or the depth of the content when navigated down through (or a hybrid – or neither). Food for thought!
Analytics tools also benefit here, as many (including Google Analytics) provide facilities to analyse content ‘silos’ through ‘drilldown’ reports – these are all based on the premise of categorised, hierarchical content, and whilst these features can be configured to work with non-hierarchical or uncategorised URLs, it’s a hefty piece of work to get that kind of integration off the ground and continually maintained.
Conversely, this hierarchical approach does sometimes cause challenges by introducing a significant level of depth to URLs. Attempting to navigate deeply through site sections might result in complex or long URLs involving both categorisation and sub-categorisation (as well as pagination, date refinement and so forth), leading to results such as /blog-updates/subcategory-name/long-post-name-here/. This is arguably borderline on our ‘gran’ test, and in an ideal world would be much ‘flatter’ (i.e., it wouldn’t contain the sub-category string, and/or the category, or perhaps even both). This is fine in principle, but the more we try to artificially manipulate the hierarchy, the less maintainable the whole becomes; each tweak and change results in an increasingly fragmented experience and a less coherent hierarchy.
There is a wider argument here around whether a straight hierarchy is the best way to structure content. There is a strong school of thought around date-based structures (e.g., /yyyy/mm/dd/post-name/), and another around completely ‘flat’ structures – there are definitely cases where these kinds of approaches make sense (e.g., where content has a significant temporal or transient nature), but I’d argue that for a generic ‘blog’ the hierarchical structure is the best all-round approach.
Unfortunately, this type of permalink structure is somewhat inefficient from a technical perspective. Because all URL requests are routed through a central index file, the system (server and database) must work to process the URL and to ‘map’ it to the correct content. With the default URL structure this is easy, as ‘?p=n’ or indeed ‘?cat=c’ tells the file precisely which page, template and content it should return – however, if we’re not explicitly identifying the post or category ID, the system must search for, find and retrieve the correct content with each request. On small sites this may represent only a minimal performance hit, but on larger sites (which may have similarly named pages, category names which are similar to page names, and large databases) it can take some serious processing power to retrieve and serve the correct content, and it might not even always get it right.
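The difference between the two kinds of lookup can be sketched in a few lines of Python. This is a toy model under my own assumptions (the POSTS and CATEGORIES data, the ID 23 and both function names are invented for illustration), not a picture of WordPress’s real database queries, but it shows why a name-based URL needs searching and can be ambiguous where an ID cannot.

```python
# Toy data, invented for illustration: post ID -> (category path, slug).
POSTS = {
    16: (("blog-updates",), "permalinks-category-pagination"),
    23: ((), "news"),  # a page with no category, named like a category
}
CATEGORIES = {("blog-updates",), ("news",)}


def resolve_by_id(post_id):
    """?p=n style: one direct lookup, no searching and no ambiguity."""
    return POSTS.get(post_id)


def resolve_by_path(path):
    """Pretty-URL style: compare the path against every category and post."""
    segments = tuple(s for s in path.strip("/").split("/") if s)
    candidates = []
    if segments in CATEGORIES:
        candidates.append(("category", segments))
    for post_id, (category_path, slug) in POSTS.items():
        if segments == category_path + (slug,):
            candidates.append(("post", post_id))
    # More than one candidate means the URL alone cannot decide what to serve.
    return candidates
```

In this toy model, /news/ comes back with two candidates (a category and a page), which is the ‘might not even always get it right’ problem in miniature.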
Fortunately, WordPress are aware of these issues, and are continuing to release performance improvements which specifically address efficiencies in this area. This argument has historically been one of the very few compelling reasons not to make this permalink structure a standardised approach, as owners of large sites would find that as content built up, not only would the site slow down as the system attempted to process the URLs, but it might also increasingly tend to return entirely incorrect content for some fringe case URLs (e.g., where posts have similar character strings, etc.).
One final drawback is that this approach breaks WordPress pagination within categories. Because the permalink structure is /%category%/%postname%/, any component after ‘category’ which isn’t a subcategory is assumed to be a post name – however, in the case of paginated results (such as /category/page/2/), the word ‘page’ shouldn’t be considered as part of the structure from a functional perspective – it’s a human conceit to maintain the hierarchical, structured approach. Thankfully, there’s an easy fix for this available in the form of the Category Pagination Fix plugin which simply rewrites requested URLs to get them to ignore the ‘page’ component.
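I haven’t reproduced the plugin’s own code here, but the general idea of ignoring the ‘page’ component can be sketched in Python as follows. The split_pagination() function and the pattern it uses are my own assumptions about one way to do it: peel a trailing /page/N/ off the URL before anything tries to interpret ‘page’ as a post name.

```python
import re

# Assumed pattern: a trailing /page/<number>/ marks pagination, not a post name.
PAGINATION = re.compile(r"/page/(\d+)/?$")


def split_pagination(path):
    """Return (path without the pagination suffix, page number)."""
    match = PAGINATION.search(path)
    if not match:
        # No pagination suffix: treat the request as page 1.
        return path, 1
    base = path[:match.start()] + "/"
    return base, int(match.group(1))
```

With this in place, /blog-updates/page/2/ is resolved as the /blog-updates/ category at page 2, rather than as a post called ‘page’.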
Bingo: clean, optimal, (mostly) efficient URLs.
I’m reasonably convinced that, with the exclusion of some fringe cases involving non-standard content, this is the best approach for URL structures on blog-like sites. What do you think?