What Is A Robots.txt File?

Written by Garry Conn on January 10th, 2008 | 9 Comments


OK, so you want to know what a robots.txt file is and how to use it? I am sure by now many of you have heard about robots.txt files, but what in the world is one, and most importantly, what does it do and how will it benefit you? This article answers those questions.

 

The Simple Explanation

 

A robots.txt file is a very small text file that resides in the root directory of your site and tells webcrawlers, such as the famous Googlebot, which directories and files they are allowed to crawl (and, as a result, what they will typically index). It is important to use a robots.txt file because it puts you (the publisher/webmaster) in control of where well-behaved webcrawlers are allowed to visit on your website. Keep in mind that the file is a request rather than a lock: reputable crawlers honor it, but nothing forces a crawler to obey. A more detailed explanation can be found on Wikipedia, at www.robotstxt.org, and on Matt Cutts’s blog.
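To make the idea concrete, here is a minimal sketch of how a polite crawler might consult a robots.txt file before fetching a page. It uses Python's standard urllib.robotparser module; the rules, the example.com URLs and the is_allowed helper are all made up for illustration, and real crawlers such as Googlebot use their own parsers that may behave differently in edge cases.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: every crawler is asked to stay out of /private/.
RULES = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(rules: str, user_agent: str, url: str) -> bool:
    """Return True if the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


# A page under /private/ is refused, everything else is fine.
print(is_allowed(RULES, "Googlebot", "http://example.com/private/notes.html"))
print(is_allowed(RULES, "Googlebot", "http://example.com/about/"))
```

The first call reports that the page is off limits and the second that it may be crawled. Nothing stops a badly behaved crawler from skipping this check entirely.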

 

How Is It Useful?

 

Using a robots.txt file is useful (especially for WordPress users) because it lets you say where webcrawlers are allowed or NOT allowed to visit. By drawing this map for webcrawlers you do two things:

  1. You help the webcrawler by keeping it away from things that have no business in search results, such as the files in your wp-includes directory.
  2. You help your important content get indexed the way you want it to, and you reduce duplicate content in the index by disallowing sections such as your archives, categories or tags.

Every blog is unique, and every author, yourself included, places a different importance on the various sections of a blog. It is up to you to decide which sections of your blog, and which directories on your server, you want to open or close to webcrawlers. On some of my blogs I disallow everything except the homepage and the individual post pages. On others I don’t bother, and I trust the webcrawlers to crawl everything and to index and rank things according to their importance.
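As a rough illustration of the two points above, the sketch below holds one possible rule set for a WordPress blog and checks a few URLs against it with Python's standard urllib.robotparser module. The paths are assumptions: /wp-admin/ and /wp-includes/ are standard WordPress directories, but /category/, /tag/ and /archives/ depend on your permalink settings, and example.com and the crawlable helper are placeholders. Treat it as a starting point, not a recommended configuration.

```python
from urllib.robotparser import RobotFileParser

# Assumed WordPress layout; adjust the paths to match your own permalinks.
WORDPRESS_RULES = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /category/
Disallow: /tag/
Disallow: /archives/
"""

SITE = "http://example.com"  # placeholder domain


def crawlable(path: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the rules above let user_agent fetch SITE + path."""
    parser = RobotFileParser()
    parser.parse(WORDPRESS_RULES.splitlines())
    return parser.can_fetch(user_agent, SITE + path)


# Print a small report: posts and the homepage stay open, the rest is blocked.
for path in [
    "/",
    "/2008/01/10/what-is-a-robots-txt-file/",
    "/wp-includes/js/jquery.js",
    "/tag/seo/",
    "/category/news/",
]:
    print(f"{path:45} {'allowed' if crawlable(path) else 'blocked'}")
```

Running a check like this before uploading the file is a cheap way to catch a rule that blocks more (or less) than you intended.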

 

How Do You Implement It?

If you are a WordPress blogger, your blog already has a robots.txt built in. By default, the file is set to allow access to all directories on your server. Overriding the default is easy: create a plain text file using NotePad, WordPad, TextPage, etc. (saved as plain text), name it robots.txt, and upload it to the root directory of your site. From there, visit this page to learn how you can quickly start adding instructions to it.
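If you would rather script the step than use a text editor, the file can be generated with a few lines of Python. This is only a sketch: write_robots_txt and ALLOW_ALL are hypothetical names, and the temporary directory stands in for your real site root, which you would still need to upload to (for example over FTP).

```python
import tempfile
from pathlib import Path

# An "allow everything" file: an empty Disallow line blocks nothing.
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def write_robots_txt(site_root: Path, rules: str = ALLOW_ALL) -> Path:
    """Write rules to <site_root>/robots.txt and return the file's path."""
    target = site_root / "robots.txt"
    target.write_text(rules, encoding="utf-8")
    return target


# Demo against a temporary directory standing in for the real site root.
with tempfile.TemporaryDirectory() as tmp:
    path = write_robots_txt(Path(tmp))
    print(path.name)
    print(path.read_text(encoding="utf-8"))
```

The file name must be exactly robots.txt, in lowercase, and crawlers only look for it at the top level of the site, so a copy placed inside a subdirectory is ignored.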

What Sections Will You Allow or Disallow?

Now that you know about the robots.txt file, which sections of your WordPress blog are you going to allow or disallow? If you already use a robots.txt file, I invite you to share your input by dropping a comment. I’d like to know what has been most effective for your blog, and I look forward to reading your comments. If you have any questions, feel free to ask; I’ll be standing by, and I am sure that many of my readers will be happy to pitch in and help answer them too.

GarryConn


9 Comments »

Comment by Frank H M
2008-01-10 05:26:17

So far I haven’t used robots.txt on any of my blogs. I know that I have to create one to prevent indexing of the wp* directories, but for the content itself, I have so far relied on using excerpts for tag, category and archives to prevent the same content from showing up in too many places.
If I then switch to using excerpts on the front page as well, I might not need to spend too much time on a robots.txt?

These are my thoughts on robots.txt at the moment. (Being a newbie in blogging, they are of course subject to change, if somebody makes a convincing argument for changing it).

 
Comment by Kyle Eslick
2008-01-10 08:52:35

You can view my Robots.txt file here.

It is actually a pretty standard one, though I disallowed a few files that are unique to my site that the search engines didn’t need to see like redirected affiliate links and such.

One of my New Year’s resolutions was to go through and update it, which hasn’t happened yet.

 
Comment by Jason L
2008-01-10 15:40:56

Just keep in mind that anyone can read your robots.txt file. Disallowing directories can entice others to try to access them to figure out what you don’t want to share. =) It’s a common practice, which is why I recommend that if you use subdirectories for development work, you have a separate testing server instead.

 
Comment by Josh Spaulding
2008-01-10 15:48:36

I’m glad you mentioned it because I need to update mine. I use it to disallow archives, trackback links, images, wp-admin etc. etc.

nofollow is somewhat effective, but not all SE’s acknowledge it.

@ Frank - It’s not just about duplicate content. It’s also to consolidate good pages in order to get the spiders deeper into your blog and to direct the flow of authority to those money pages.

 
Comment by Zath
2008-01-12 12:13:17

I’ve done a fair bit of work on my robots file. I spent a good few months trying to sort it out: I was trying to exclude stuff, but because of the order I placed it in the file, it wasn’t being picked up. Once I sorted that out, though, it has worked quite well so far.

My ultimate aim is to have content shown on the individual pages only, while still indexing other pages of interest.

I’ve had a vbulletin forum for years now that my friends and I have mainly used, however I’ve now decided to drop the forum at my next renewal time in March since it doesn’t get enough use for the money it costs per year - with that in mind I’ve now added the forum pages to the robots file in the hope that once I do remove it, Google doesn’t penalise the rest of the site when the hundreds of forum pages suddenly disappear from the site!

Overall, in the last few months I’ve gone from around 1900 indexed pages (with lots of blog and forum duplications) down to around 1000 pages - this should continue to drop now all forum pages are being removed.

How big a difference it makes, I can’t say for sure, but I’ve now got to the point where keeping my Google indexed pages is a project and a challenge in itself! ;)

 
Comment by Garry Conn
2008-01-12 21:06:02

Thanks for the comments and feedback so far, everyone. This is great stuff. Kyle, thanks for showing us your file. Jason L, you have a very valid point.

@ Zath,

Can you contact me by email and let me know where this forum is located? I’d like to check it out and see if I can help you out a little bit before you shut it down.

 