cfheader, 404 status codes, and why you shouldn't use them.

{ Posted By : Eric Cobb on August 11, 2009 }
Related Categories: CFML

In a previous post, I wrote about using cfheader to specify different HTTP status codes and how you can use them to guide the search engines. I briefly touched on using a 410 "Gone" status code versus a 404 "Not Found", and I want to expand on that a little here.

People often mistakenly use a 404 status code when they should be using a 410. In fact, there are very few (if any) cases where you would want to programmatically return a 404 error. There is a definite distinction between a 404 and a 410, and it's important to understand just what each one is actually telling the search engines. Just to be clear, I'm talking about the HTTP status codes themselves, the responses given by the web server (or CF Server) to the search engines, not the error handler page that gets displayed when an HTTP error occurs. I am specifically addressing the status codes that are handed back to the search engines via cfheader.

According to the Wikipedia list of HTTP response status codes, the 404 and 410 are defined as:

  • 404 Not Found - The requested resource could not be found but may be available again in the future. Subsequent requests by the client are permissible.
  • 410 Gone - Indicates that the resource requested is no longer available and will not be available again. This should be used when a resource has been intentionally removed.

Note that the 404 Status Code specifies "The requested resource could not be found but may be available again in the future." What this basically means is your web server is telling the search engines "Whatever you're looking for, it's not here. I don't know where it is or what happened to it, but just keep coming back because it may eventually show back up." I can't think of any good reason why you would actually program a page to return that message to a search engine. The fact that there would even be a page that returns a 404 status defeats the very definition of a 404! In my mind, the web server itself should be the only thing that ever returns a 404, and it does so because there is literally nothing there to return, so it throws up its hands and quits.

The 410 Status Code, on the other hand, tells the search engines exactly what has happened. "The resource requested is no longer available and will not be available again." That's pretty cut and dried. If you've removed a page and you know it's not coming back (and you're not doing a 301 redirect), then you should use the 410 status code to tell the search engines what is going on.
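As a rough illustration, here is a minimal sketch of what returning a 410 via cfheader might look like. The template name and the message text are made up for the example, and the statustext attribute is optional (newer CFML engines may ignore or deprecate it).

```cfml
<!--- Hypothetical template: /old-product.cfm, a page that was removed on purpose --->
<!--- Tell the client (and the search engines) that this resource is gone for good --->
<cfheader statuscode="410" statustext="Gone">

<!--- Optional friendly message for any humans who land here --->
<cfoutput>
<h1>This page has been removed</h1>
<p>The content that used to live here is no longer available.</p>
</cfoutput>

<!--- Stop processing so nothing else is appended to the response --->
<cfabort>
```

The status code travels in the response header, so the visible message can say whatever you like; it is the 410 itself that the search engines act on.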

Now, does using a 404 mean that the search engines will keep your old, dead links indexed forever? No. They'll eventually pass away and be dropped from the indexes...eventually. But why would you want your site to sit there and continuously throw errors until Google or Yahoo! finally gives up and decides that the content it's after won't be coming back? Just use the proper status codes to tell them that it is gone, and where it went if it has moved.
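For the "where it went" case, a 301 can be sent the same way. This is only a sketch under assumptions: the template name and the destination URL are placeholders, not anything from a real site.

```cfml
<!--- Hypothetical template: /old-page.cfm, whose content now lives at a new URL --->
<!--- Placeholder destination; substitute the real new location --->
<cfset newUrl = "http://www.example.com/new-page.cfm">

<!--- 301 tells the search engines the move is permanent --->
<cfheader statuscode="301" statustext="Moved Permanently">

<!--- The Location header tells them where the content went --->
<cfheader name="Location" value="#newUrl#">

<!--- Stop processing; a redirect needs no page body --->
<cfabort>
```

Setting the two headers explicitly keeps the status code visible in the template, which makes it harder to send a temporary redirect by accident when a permanent one was intended.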

Now, you're probably asking yourself, "Why should I pay attention to the HTTP status codes, and is this stuff really important?" (I know you are!) Well, you should put that question to Toys R Us, who just threw away $5.1 million because they didn't use a proper redirect status code when they should have!

Comments
Excellent stuff. I rarely think about the status codes that are getting returned since I do mostly non-public-facing work. But this is great to know. I should probably update my blog redirects across the board.
# Posted By Ben Nadel | 8/19/09 11:03 AM
It seems a little presumptuous to know that a URL is 'gone' forever. I can understand if it is a gibberish typo, but sometimes broken links are right filename, wrong folder. Someday, that URL may exist. Many search engines will recognize the 404 status and not present the link to end users, so unless you are receiving an email for every broken link--which would indeed be a nuisance--I think the issue is resolved from a search engine user perspective. Search engines *should* be checking back again because the web is a dynamic place, which is why the 404 status is so prevalent and the 410 is so obscure. Reading your post, I worry folks will run out and replace the 404 cfheader references in their cfm missing file handlers, not realizing that the 410 is not a shotgun solution. Nevertheless, I learned something reading your post. Good to know about the 410--I had never heard about it.
# Posted By Brian | 10/21/09 11:55 AM