Posts Tagged ‘google’

Should I use a robots.txt file?

Sunday, July 1st, 2007

Well, yes, you should use a robots.txt file on every website you can.

This is important because it gives you control over what is indexed on your site and what isn't. It may also make your site look more complete and professional to the search engines.

I have worked on some sites that, for no apparent reason, were not being indexed by certain search engines. Once we created and uploaded a robots.txt file and made a few other tweaks, those search engines began to list the sites. It was not clear which change caused the indexing.

Given how little time it takes to create and upload a robots.txt file, it is definitely worth it.

Many sites do not have one and still are indexed well and rank well for their keywords. It is not absolutely necessary, but it is good practice to create one. It also lets you tell search engines which pages you do not want indexed. This is not absolute, though: many web designers have seen search engines ignore the robots.txt file and index the whole site. If you have information you do not want indexed by a search engine, such as employee names and details or social security numbers, you should consult a security consultant about how to make it available to your employees while keeping it hidden from the search engines. Like much of Internet marketing, robots.txt is an attempt to control the search engines, which remain quite mysterious but can be guided with solid, proven practices.
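To illustrate why robots.txt is a request rather than a lock, here is a minimal Python sketch using the standard library's urllib.robotparser module. The /private/ directory, the ExampleBot user-agent name and the pages_to_crawl helper are hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules asking every crawler to stay out of /private/.
RULES = [
    "User-agent: *",
    "Disallow: /private/",
]


def pages_to_crawl(urls, obey_robots=True, agent="ExampleBot"):
    """Return the URLs a crawler would request.

    A polite crawler filters the list through robots.txt; a crawler that
    ignores the file (obey_robots=False) requests everything. The file
    itself cannot block anyone.
    """
    if not obey_robots:
        return list(urls)
    parser = RobotFileParser()
    parser.parse(RULES)
    return [url for url in urls if parser.can_fetch(agent, url)]


urls = [
    "http://www.yoursite.com/index.html",
    "http://www.yoursite.com/private/staff.html",
]
print(pages_to_crawl(urls))
# ['http://www.yoursite.com/index.html']
print(pages_to_crawl(urls, obey_robots=False))
# ['http://www.yoursite.com/index.html', 'http://www.yoursite.com/private/staff.html']
```

A polite crawler skips /private/staff.html, but one that ignores the file fetches it anyway. Because robots.txt is itself a public file that anyone can open in a browser, listing a sensitive directory in it can also point people straight to that directory, which is another reason to protect such information by other means.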

How to write a robots.txt for a website

Sunday, June 3rd, 2007

A good example of a robots.txt file comes straight from the popular auction site eBay. You can see the file at http://www.ebay.com/robots.txt. At the time of writing, its content was as follows:

### BEGIN FILE ###
#
# allow-all
#
#
# The use of robots or other automated means to access the eBay site
# without the express permission of eBay is strictly prohibited.
# Notwithstanding the foregoing, eBay may permit automated access to
# access certain eBay pages but soley for the limited purpose of
# including content in publicly available search engines. Any other
# use of robots or failure to obey the robots exclusion standards set
# forth at <http://www.robotstxt.org/wc/exclusion.html> is strictly
# prohibited.
# v3
#
 
User-agent: *
Disallow: /help/confidence/
Disallow: /help/policies/
Disallow: /disney/
 
### END FILE ###

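This is a typical robots.txt: it lets search engines index everything except the three directories named on the Disallow lines (/help/confidence/, /help/policies/ and /disney/). The first 15 lines are remarks, since crawlers ignore any line that starts with a pound sign (#), and they make a good guide for anyone not used to writing these files. If you are not sure what to put in your own robots.txt, note that the last four lines, the ones without a pound sign, are the only ones search engines actually read: "User-agent: *" means the rules apply to every crawler, and each "Disallow:" line names a path they should stay out of.

You can create or edit these files with a simple text editor such as Notepad, which comes with Windows. Save the file as “robots.txt”, then upload it to the main (root) directory of your site, the directory that contains your home page, usually index.html.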
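As a rough sketch of these steps, the Python below saves the four active lines from the eBay example as robots.txt in a site folder and then reads the file back with the standard library's urllib.robotparser, which skips comment lines just as crawlers do. The write_robots_txt and load_rules helpers and the www.yoursite.com addresses are hypothetical; in practice you would still upload the saved file with your usual FTP or hosting tool.

```python
import tempfile
from pathlib import Path
from urllib.robotparser import RobotFileParser

# The four active lines of the eBay example, plus one comment line.
# Crawlers skip anything after a "#", so only the last four lines count.
ROBOTS_TXT = """\
# Comment lines like this one are ignored by crawlers.
User-agent: *
Disallow: /help/confidence/
Disallow: /help/policies/
Disallow: /disney/
"""


def write_robots_txt(site_root: Path) -> Path:
    """Save the rules as robots.txt in the site's main (root) directory."""
    path = site_root / "robots.txt"
    path.write_text(ROBOTS_TXT, encoding="utf-8")
    return path


def load_rules(path: Path) -> RobotFileParser:
    """Parse a saved robots.txt the way a well-behaved crawler would."""
    parser = RobotFileParser()
    parser.parse(path.read_text(encoding="utf-8").splitlines())
    return parser


# Demo: a temporary folder stands in for the site's root directory.
with tempfile.TemporaryDirectory() as site_root:
    rules = load_rules(write_robots_txt(Path(site_root)))
    for url in ("http://www.yoursite.com/index.html",
                "http://www.yoursite.com/disney/page.html"):
        print(url, "allowed" if rules.can_fetch("*", url) else "disallowed")
```

The home page comes back as allowed and the /disney/ page as disallowed, which confirms that the comment line had no effect on the rules.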

I then recommend going directly to your new file to make sure it is online. You can check this at http://www.yoursite.com/robots.txt, which should show exactly the text you saved. Few people will ever visit this page, but search engine spiders request it before anything else when they arrive at your site.
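If you would rather check from a script than a browser, here is a minimal Python sketch using only the standard library. The robots_url and check_robots_online helpers are hypothetical names, and http://www.yoursite.com/ is the same placeholder address used above.

```python
from urllib.parse import urljoin
from urllib.request import urlopen


def robots_url(site: str) -> str:
    """Build the robots.txt address; it always sits at the root of the host."""
    return urljoin(site, "/robots.txt")


def check_robots_online(site: str, timeout: float = 10.0) -> bool:
    """Print the live robots.txt and return True if it downloads with HTTP 200."""
    try:
        with urlopen(robots_url(site), timeout=timeout) as response:
            text = response.read().decode("utf-8", errors="replace")
            print(text)  # should match the file you uploaded
            return response.status == 200
    except OSError as err:
        # URLError, HTTPError (e.g. a 404) and socket timeouts are all OSError subclasses.
        print("Could not fetch robots.txt:", err)
        return False
```

For example, check_robots_online("http://www.yoursite.com/") prints the file and returns True if the server answers with HTTP 200. Because urljoin is given an absolute path, the request always goes to /robots.txt at the root of the site, even if you pass in the address of a deeper page.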

