WordPress CMS 404 Error On External Files

It’s no secret that as part of our day-to-day work we have found ourselves continually recommending that small to medium businesses start blogging as a way to connect with customers and associates.  When it comes to blogging software, we simply can’t go past WordPress.  It is easy to use even for those with limited technical abilities, and as developers we find it highly configurable.

In recent times we have also started expanding how our customers use WordPress by turning it into a simple content management system that they are already familiar with.  Many of the sites we have been converting already used WordPress for their blog/news area, usually in a dedicated section (e.g. site.com.au/news/).  When it came time to push it further as a content management system, we knew we could use many WordPress functions within their existing pages, even though those pages sat outside the folder where WordPress was installed.

Using WordPress externally is a pretty simple cut-and-paste:

Add this to the top of your document:

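<?php
define('WP_USE_THEMES', false);          // don't load the WordPress theme templates
require('../news/wp-blog-header.php');   // load WordPress from its /news/ folder
query_posts('page_id=5');                // select the WordPress page to display (ID 5)
?>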

Add this where you want the page content to appear:

<?php if ( have_posts() ) : while ( have_posts() ) : the_post(); ?>
<?php the_content(); ?>
<?php endwhile; else: ?>
<?php endif; ?>

The solution works great: you don’t have to change any of your page structure, and the information can now be updated by the site owner through a system they are already comfortable using.  At least, we thought it was working great, until we noticed that certain versions of Internet Explorer were not displaying some of the pages correctly, and that the search engines were no longer indexing those same pages.

It turned out that the affected pages were returning a 404 status code to browsers and search engines.  With permalinks enabled, WordPress relies on the page’s URL to retrieve content from its database, and it was not recognizing the URL of these external pages as one that exists in its own structure.

The technicalities of how WordPress works are not important here; what matters is how to make this work.  Through trial and error I noticed an interesting pattern.

http://www.site.com.au/company/ was showing correctly and returning the correct 200 response code.
http://www.site.com.au/hotel/ was not showing correctly and returning a 404 error code.
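We spotted the problem through the browser, but you can also check status codes directly.  Below is a minimal PHP sketch (PHP 7.1 or later) built on PHP’s own get_headers(), which returns the response headers with the status line as the first element.  The helper names status_code() and check_pages() are hypothetical, and the URLs are the placeholder ones above.

```php
<?php
// Hypothetical helper: extract the numeric code from a status line
// such as "HTTP/1.1 404 Not Found". Returns null if the line is not one.
function status_code(string $statusLine): ?int
{
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Hypothetical helper: request each URL and print its status code.
// get_headers() makes a real HTTP request, so this needs network access.
function check_pages(array $urls): void
{
    foreach ($urls as $url) {
        $headers = @get_headers($url); // false if the request fails
        if ($headers === false || !isset($headers[0])) {
            echo "$url: request failed\n";
            continue;
        }
        echo $url . ': ' . (status_code($headers[0]) ?? 'unknown') . "\n";
    }
}
```

For example, check_pages(['http://www.site.com.au/company/', 'http://www.site.com.au/hotel/']); should report 200 and 404 respectively on an affected site.  Note that get_headers() follows redirects by default, so for a redirected URL the first element is the status of the first response.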

When I looked at how the pages were named in the WordPress database, I noticed that the pages that were not showing correctly had page slugs (page names) that differed from the folder names used in their URLs.  The hotel page’s slug within WordPress was “the-hotel”.  When I renamed it to “hotel”, everything started working correctly.
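To make the pattern concrete, here is a simplified PHP model of the comparison.  It assumes, as our testing suggested, that WordPress treats the external page’s URL path as a page name and returns a 404 when no page has that slug.  This is not WordPress’s actual rewrite code, which handles many more cases, and the function names and the path-to-slug list are hypothetical.

```php
<?php
// Simplified model (an assumption, not WordPress's real rewrite code):
// the request path, minus its surrounding slashes, is the page name
// WordPress looks for, e.g. "/hotel/" -> "hotel".
function requested_pagename(string $requestUri): string
{
    // Drop any query string, then trim the slashes.
    $path = parse_url($requestUri, PHP_URL_PATH) ?? '';
    return trim($path, '/');
}

// Hypothetical audit: map each external page's URL path to the slug of the
// WordPress page it displays, and return the pairs that will not match.
function mismatched_pages(array $pathToSlug): array
{
    $mismatches = [];
    foreach ($pathToSlug as $path => $slug) {
        if (requested_pagename($path) !== $slug) {
            $mismatches[$path] = $slug;
        }
    }
    return $mismatches;
}

print_r(mismatched_pages([
    '/company/' => 'company',   // slug matches the folder name
    '/hotel/'   => 'the-hotel', // slug differs, so expect a 404
]));
```

Under this model only '/hotel/' is reported.  Re-running a check like this after the site owner adds pages or edits slugs is one way to catch the problem early.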

Now this is fine if you are aware of it, but what if a customer adds another page or unknowingly changes a page slug?  We are back to square one.  We need to force WordPress into doing the right thing.

This can be achieved by either bypassing some of the WordPress functionality and only calling the specific functions that we need to extract information into the pages, or forcing the correct status codes after calling the WordPress functions.

Option 1: To bypass some of the WordPress functionality you can use the following alternate code at the top of the page:

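<?php
// Load the WordPress settings, which also create the global $wp object
require('../news/wp-config.php');
// Run the request steps one at a time, skipping send_headers() and
// handle_404(), the steps that send the HTTP headers (including the 404)
$wp->init();
$wp->parse_request();
$wp->query_posts();
$wp->register_globals();
query_posts('page_id=5'); // then select the page content as before
?>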

Option 2: To force your page to display the correct status code do the following:

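<?php
define('WP_USE_THEMES', false);
require('../news/wp-blog-header.php');
// WordPress may already have sent a 404 status; override it with 200
header("HTTP/1.1 200 OK");
header("Status: 200"); // for servers running PHP as CGI
query_posts('page_id=5'); // then select the page content as before
?>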

So what is the best way to get this job done?

Well, the first thing is to make sure that your page slugs match the folder names used in your page URLs wherever possible.  Secondly, there are advantages and disadvantages to both option 1 (bypassing some functionality) and option 2 (forcing WordPress to send the correct status codes).  In terms of forward compatibility I would opt for forcing the correct status codes.  The trouble with the first option is that it calls WordPress’s internal functions directly, so if the WordPress team decides to change its naming conventions (which they have been known to do), a simple upgrade could easily bring down your entire site.

With all these changes you would think it would be smooth sailing from here on in... you’d be wrong!  There is one other issue you need to take care of: preventing the search engines from indexing duplicate content.

The duplicate content problem!

Because we are adapting an existing site that already has a blog (News area) into a fully fledged content management system, we need to be aware that each page we create within WordPress will also have its own unique address within the WordPress structure.  For example, we have installed WordPress in its own folder, “news”, and all of the news posts have an address like “http://www.site.com.au/news/2007/11/23/story-name/”.  As we start adding pages for the CMS, WordPress gives each one its own address along the lines of “http://www.site.com.au/news/company/”.  When we then pull this content into the existing page at http://www.site.com.au/company/, the same content is available at two addresses, hence the duplicate content problem.

The easiest way around this is a simple conditional statement in the WordPress theme that tells the search engines not to index, or follow links from, any of these WordPress-generated pages.  Because is_page() is only true for WordPress pages, the news posts are still indexed as normal, and the external pages are unaffected since they do not load the theme.  This can be achieved by adding the following code inside the head section of the header.php file of your active theme:

<?php if ( is_page() ) { ?>
<meta name="robots" content="noindex, nofollow" />
<?php } ?>

So there you go: a simple and effective content management system, without the unexpected bugs.  In the future I might write a few more posts about the additional steps we take to extend WordPress’s ability as a search engine friendly content management system.

© 2008 eMedia Worx | ABN 22 114 262 601
1 Tuckeroo Drive Ballina NSW 2478 Australia
Phone: +61 2 6686 6262