If you review your website statistics or server logs, you may see that some requested files were not found, such as favicon.ico or robots.txt. For many new web designers, seeing that it was not found is the first time they hear of the robots.txt file. robots.txt is a file placed in the root directory of a web server that tells search engines which parts of the site to read and index for their listings. Search engines send out automated processes across the web that are called “spiders”. They are called this because they crawl the web and travel along its paths, following links from one site to another and from page to page.
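Because the file lives in the root directory, its address can be worked out from any page on the same site. The short Python sketch below illustrates this; the function name robots_txt_url and the example.com address are made up for illustration, and it assumes the site is reached over an ordinary http or https URL.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the address where a spider would look for robots.txt.

    Hypothetical helper: it keeps the scheme and host of the page URL
    and replaces the path with /robots.txt at the site root.
    """
    parts = urlsplit(page_url)
    # Drop the original path, query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
```

For a page such as https://www.example.com/blog/post.html, the helper returns https://www.example.com/robots.txt, which is the request that shows up as "not found" in the logs when the file is missing.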
The spiders review a portion of the World Wide Web and weigh each site for its importance and for the keywords that each page is about. By comparing all of these, the search engine arrives at a ranking for every keyword. Naturally, anything that affects how search engines rank sites matters to Internet marketers and site owners. See the next post in this series to learn how to write a robots.txt file.
One of the first things that robots look for when visiting a site is the robots.txt file. Its text is often quite simple, but it tells the search engine which pages to visit and consider for its listings. The robots.txt file can also list pages or directories that the site owner would prefer not to have indexed because they do not contain information the owner would like directly linked. Many web designers tell the spiders not to index the cgi-bin directory, a photo gallery, or part of a framed page.
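To make this concrete, here is a rough Python sketch of how a well-behaved spider might read such a file, using the standard library's urllib.robotparser module. The sample file contents, the /gallery/ directory name and the helper allowed_paths are assumptions chosen for illustration, not rules taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: ask all spiders to stay out of
# the cgi-bin directory and a photo gallery (illustrative paths only).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /gallery/
"""


def allowed_paths(robots_text, paths, user_agent="*"):
    """Map each path to True if the rules let user_agent fetch it."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines.
    parser.parse(robots_text.splitlines())
    return {path: parser.can_fetch(user_agent, path) for path in paths}
```

Calling allowed_paths(SAMPLE_ROBOTS_TXT, ["/index.html", "/cgi-bin/form.pl"]) reports the home page as allowed and the cgi-bin script as disallowed. Keep in mind that the file is a request rather than a lock: polite spiders honor it, but nothing forces a robot to do so.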
Tags: robots.txt, search engines