Wednesday, September 22, 2010

The Definitive Guide to Robots.txt for SEO

A robots.txt file provides restrictions to the search engine robots (known as “bots”) that crawl the web. Search engines use these robots to find content to index in their databases.

Surprisingly, many website owners forget to create, let alone maintain, a robots.txt file for their websites, so here’s a guide to what a robots.txt file is and how best to use it for SEO purposes.

These bots are automated, and before they access any section of a site, they check whether a robots.txt file exists that prevents them from crawling certain pages.

The robots.txt file is a plain text file (no HTML) that must be placed in your site’s root directory (for example, www.example.com/robots.txt). There are three primary reasons for using a robots.txt file on your website:

  • Keeping information you don’t want made public out of search results
  • Avoiding duplicate content issues
  • Managing bandwidth usage
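
To illustrate, a single robots.txt along these lines could cover all three cases (the directory names here are hypothetical, not a recommendation for any particular site):

```
User-agent: *
# Keep unpublished material out of search results
Disallow: /private/
# Printer-friendly duplicates of existing pages
Disallow: /print/
# Large files that consume crawl bandwidth
Disallow: /downloads/
```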

To create your own robots.txt file, open a new document in a plain text editor (e.g. Notepad).

The content of a robots.txt file consists of “records”, which tell specific search engine robots which parts of the site they may and may not access.

Each record consists of two fields: a User-agent line (which specifies the robot the record applies to) and one or more Disallow lines.
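
For example, a record that keeps one particular robot out of a single directory might look like this (the directory is illustrative):

```
User-agent: Googlebot
Disallow: /cgi-bin/
```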

For SEO purposes, you’ll generally want all search engines to index the same content, so using “User-agent: *” is the best strategy.
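
Two common wildcard patterns, each a complete robots.txt file on its own, show the extremes. An empty Disallow line permits everything:

```
# Allow all robots to crawl the entire site
User-agent: *
Disallow:
```

while a lone “/” blocks the entire site:

```
# Block all robots from the entire site
User-agent: *
Disallow: /
```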

If you want to check that your robots.txt file is implemented correctly, visit your Google Webmaster Center, which lets you test your robots.txt. Google retrieves the robots.txt from your website automatically and in real time.
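
You can also sanity-check your rules locally before uploading. As a rough sketch, Python’s standard-library robots.txt parser can evaluate a set of rules against sample URLs (the rules and URLs below are made up for illustration):

```python
from urllib.robotparser import RobotFileParser

# Example rules – in practice you would paste in your own robots.txt
rules = """
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A blocked path is reported as not fetchable, everything else is allowed
print(parser.can_fetch("*", "https://www.example.com/private/page.html"))  # False
print(parser.can_fetch("*", "https://www.example.com/public/page.html"))   # True
```

This catches typos such as a missing leading slash before the file ever goes live.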

