Using htaccess to open the door for the Google bot / spider

Search engine bots, commonly known as “spiders”, are the tools search engines use to find and index web pages across the internet. For a spider to index a page, it must first discover it, either through manual submission to the search engine or, more commonly, through a link from a site the spider has already indexed. A common problem with large-scale dynamic sites today is that their links rely heavily on query strings. Spiders take specific precautions when dealing with these types of URLs, which makes it difficult for them to index deep within these sites.

An example of a NON search friendly URL

http://www.mysite.com/page.php?v=value1&p=value2&c=value3

An example of a search friendly URL

http://www.mysite.com/page-value1-value2-value3.html

This white paper details how to use the .htaccess file on Apache-based web servers to turn dynamic, query-heavy URLs into clean, efficient, search engine friendly URL structures.

What are the benefits of search friendly URLs?

To put it simply, search friendly URLs mean more of your pages get indexed, which ultimately results in more search engine traffic. It is very likely that even some of the most popular large-scale sites you visit today receive only a fraction of their potential search engine traffic because of non search friendly URLs.

Where do we begin?

To begin, you must first find out what web server software you are running. This white paper addresses the most popular web server software, Apache. Apache is most commonly run on Unix-based systems, though it is also available for Windows.

Apache and the .htaccess file

To create search friendly URLs on the Apache platform, you can use the .htaccess file. The .htaccess file is a distributed configuration file, primarily used to make configuration changes on a per-directory basis. In this demonstration we will use the .htaccess file to change the way incoming URLs are processed.

Creating your .htaccess file

Creating the .htaccess file is fairly easy. First, create a blank file at the root of your web site and name it .htaccess. If you edit files on Windows, you may get a message that does not allow you to save a file without a name before the extension. In that case, name it htaccess.txt and, once it is saved, right-click the file and rename it to just .htaccess.

Once the file has been created, it is time to add the instructions that transform query-heavy URLs into clean, search friendly URLs. For this demonstration I will use the example URLs listed above. The following .htaccess code makes the page reachable at the search friendly URL even though the underlying URL is truly dynamic and must receive several query values.

Options +FollowSymlinks
RewriteEngine on
RewriteRule ^page-(.*)-(.*)-(.*).html$ /page.php?v=$1&p=$2&c=$3

That’s it. The 3 lines of code above let a search friendly URL serve the content of the dynamic URL.

How it works

Although there are 3 lines of code in the .htaccess file, only 1 line does the actual work.

Line 1 contains “Options +FollowSymlinks”. mod_rewrite requires the FollowSymLinks option (or SymLinksIfOwnerMatch) to be enabled before rewrite rules in a .htaccess file will work.

Line 2 has the following code: “RewriteEngine on”. This line turns on the rewrite engine, which gives us the ability to begin creating URL rewrite rules.

Line 3 has the following code: “RewriteRule ^page-(.*)-(.*)-(.*).html$ /page.php?v=$1&p=$2&c=$3”. This is the real meat of the file: it specifies the format of the search friendly URL and then the request that the server should actually handle. The first part, “^page-(.*)-(.*)-(.*).html$”, is the search friendly URL format, written as a regular expression; each (.*) captures a value to pass into the dynamic URL. The second part, “/page.php?v=$1&p=$2&c=$3”, is the dynamic URL, the request Apache processes internally when the search friendly URL is used. Each query value is filled by a variable such as $1, $2, or $3, which correspond, in order, to the values captured from the search friendly URL.

As an example, let’s say I went to the URL “http://www.mysite.com/page-article-business-taxes.html”. It would be the equivalent of typing in the URL “http://www.mysite.com/page.php?v=article&p=business&c=taxes”.
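
The (.*) groups in the rule above are greedy and will happily swallow hyphens, so a URL with an extra hyphen is still matched, just not the way you might expect. The following is a stricter variant, offered as a sketch rather than as part of the original three-line file. It assumes that none of your values contain a hyphen; the [L,QSA] flags are an optional addition.

```apache
# A stricter variant of the rule above (a sketch, not the original rule).
Options +FollowSymLinks
RewriteEngine On

# [^-]+ matches one or more characters that are not a hyphen, so each
# capture holds exactly one hyphen-free value.
# \. matches a literal dot; an unescaped . matches any single character.
# L stops processing later rewrite rules once this one matches.
# QSA appends any query string on the friendly URL to the rewritten URL.
RewriteRule ^page-([^-]+)-([^-]+)-([^-]+)\.html$ /page.php?v=$1&p=$2&c=$3 [L,QSA]
```

With this pattern, page-article-business-taxes.html still maps to page.php?v=article&p=business&c=taxes, while page-a-b-c-d.html no longer matches at all. With the original (.*) groups, that second URL would match with v=a-b, p=c and c=d. The trade-off is that a value containing a hyphen can no longer be passed through the friendly URL.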

Possible Problems

Few things in the tech world ever go without a hitch, and URL rewriting is no exception. If problems occur when creating search friendly URLs, be sure to check the following items:

  1. Make sure you are using the Apache web server.
  2. Check whether your Apache server has been configured to allow .htaccess files, or whether a different file name is used for them. Many hosting setups enable .htaccess, but current Apache versions ship with overrides disabled (AllowOverride None) by default.
  3. Check your .htaccess syntax. One misspelled word or misplaced character can break your .htaccess file; typically you will see a 500 error if this is the case.
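
For item 2, the relevant settings live in the main server configuration (httpd.conf or a virtual host file), not in .htaccess itself. The fragment below is a minimal sketch of what to look for. The directory /var/www/html and the module path are assumptions for illustration; substitute the values from your own installation.

```apache
# Main server configuration, not .htaccess.
# /var/www/html is a placeholder for your site's document root.

# mod_rewrite must be loaded; the module path varies by installation.
LoadModule rewrite_module modules/mod_rewrite.so

# Name of the per-directory configuration file (.htaccess is the default).
AccessFileName .htaccess

<Directory "/var/www/html">
    # FileInfo permits the RewriteEngine and RewriteRule directives in .htaccess.
    # Options permits the Options directive used on line 1 of the file.
    AllowOverride FileInfo Options
</Directory>
```

Changes to the main configuration take effect only after Apache is restarted or reloaded, whereas the .htaccess file is read on each request. On shared hosting you may not have access to these settings, in which case your hosting provider can confirm them.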

I don’t have Apache running my website

This white paper covered the creation of search friendly URLs specifically on the Apache platform. If you need to create search friendly URLs on other platforms (e.g. IIS or ColdFusion), please feel free to contact us at the information below. USWeb has extensive experience in increasing search engine visibility on a wide variety of platforms and content management systems.

Get help from experts

If you operate a large data-driven site and need to increase its search visibility, please feel free to contact us. USWeb works with some of the largest publishers in the world and can help your business increase its search traffic.

About the author

Shaun Shull serves as the Vice President of Emerging Technologies for USWeb. With over 7 years of intricate industry experience Shaun has played a vital role in expanding USWeb’s services and intellectual property. Shaun is responsible for managing and implementing strategic tools, products, and properties utilized by USWeb and its clients. During Shaun’s off-time he is a frequent blogger and operates his own personal site at shaunshull.com.
