How To Use a Robots.txt File To Prevent Duplicate Content In WordPress From Indexing In Google

with 12 comments

WordPress creates a lot of duplicate content. I don’t believe that is as much of an issue today as it was a few years ago, because WordPress is obviously very popular and I don’t think Google is going to penalize millions of blogs for something that publishers aren’t aware of. Instead, Google does its best to figure out which version of the duplicate content is the main copy. Allowing this to happen can produce less-than-desired results.

In this video, I show you how you can use a robots.txt file to prevent duplicate content on your WordPress blog. Also, if you would like to view the video in full quality and even download a copy to your computer, I have it available in my membership site.

Here are the footnotes as promised in the video:

  1. http://www.garryconn.com/robots.txt
  2. http://www.robotstxt.org/
  3. http://www.google.com/search?q=how+to+create+a+robots.txt+file
  4. http://www.thesitewizard.com/archive/robotstxt.shtml

Here is a screenshot of an example of a /comment-page-1 entry indexed in Google:

comment-page-1

One of the things that has been bugging me the most is the strange /comment-page-1 entries that have been getting indexed in Google. This seems to have started around WP 2.7.1. At times it is very annoying because the comment-page-1 version of my post will get indexed and NOT the actual post itself.
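As a rough illustration of the idea, here is a minimal Python sketch that holds a small set of robots.txt rules and checks which paths they would tell a crawler not to fetch. The rules, the permalink layout, and the helper names (`disallow_patterns`, `pattern_to_regex`, `is_blocked`) are all assumptions made up for this example; they are not the contents of the robots.txt file linked in the footnotes. The matching follows the wildcard style that Google documents (`*` matches any run of characters, a trailing `$` anchors the end of the path), and it only handles Disallow lines, not Allow rules or per-bot groups.

```python
import re

# Hypothetical rules for illustration only. They are NOT copied from the
# robots.txt file referenced in the footnotes above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*/comment-page-
Disallow: /category/
Disallow: /tag/
"""


def disallow_patterns(robots_txt):
    """Collect every Disallow pattern in the file (all user-agent groups lumped together)."""
    patterns = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            patterns.append(value.strip())
    return patterns


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    Everything else is matched literally, starting at the beginning of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, robots_txt=ROBOTS_TXT):
    """Return True if any Disallow pattern matches the start of the given URL path."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns(robots_txt))


if __name__ == "__main__":
    # The example paths assume a date-based permalink structure.
    for path in ("/2009/06/my-post/", "/2009/06/my-post/comment-page-1/", "/category/seo-tips/"):
        print(path, "blocked" if is_blocked(path) else "allowed")
```

With rules like these, the post itself stays crawlable while its /comment-page-1 copy and the category and tag archives do not. Whether you want to block category or tag archives depends on how your own blog is set up, so treat the rule list as a starting point rather than a recommendation.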

Written by Garry Conn

June 2nd, 2009 at 10:53 pm

Posted in Seo Tips

  • What Is A Robots.txt File?

    Ok, so you want to know what a robots.txt file is and how to use it? I am sure by now many have heard about robots.txt files, but what in the world is it and most importantly, what does it do and how will it benefit you? All the answers...
  • How Do You Have Your Sitemap.xml and Robots.txt File Configured?

    There are a lot of personal preferences in how people configure their sitemap.xml file and their robots.txt file. Some people say it’s bad to allow Googlebot to index all sections of your blog because it can create duplicate content issues or cause your content to pull rank on a...
  • Search Engine Friendly Wordpress Video

    Here is an excellent video I found of Michael Gray, author and writer of Gray Wolf’s SEO Blog. In this video Gray offers tips on how to avoid duplicate content and set up theming or siloing on your Wordpress blog installation. To summarize what is being communicated here, Gray...
  • Increase Google Rankings With Robot Control

    Do you want to increase your Google rankings? I have discovered an easy way to get your self-hosted Wordpress blog and posts a better ranking in Google. A self-hosted Wordpress blog can consist of hundreds, if not thousands, of individual posts and pages. Google is standing by to rank...

12 Responses to 'How To Use a Robots.txt File To Prevent Duplicate Content In WordPress From Indexing In Google'


  1. I can’t seem to find the video in the membership forum with the other videos. Am I looking in the right place?

    pete

    3 Jun 09 at 9:53 am

  2. I have duplicate pages on my website. (pages repeated in different categories).

    It seems Google is selecting one or the other, will Google penalise me or does it know this is what happens on the Internet and get on with its job?

    DIY

    3 Jun 09 at 10:30 am

  3. I haven’t loaded it yet.

    Garry Conn

    3 Jun 09 at 10:43 am

  4. From what it seems, Google will choose one or the other for indexing as well as ranking. Sometimes they index both and then choose which one to rank. I don’t think there is a MAJOR issue of being penalized because there are millions of blogs on the Internet and most publishers don’t know anything about duplicate content, SEO, or things like that. If you want to control what gets indexed and what doesn’t, then you can do that with your robots.txt file.

    Garry Conn

    3 Jun 09 at 10:45 am

  5. I have been a regular reader of your blog but it disappoints me that you haven’t published the video freely but on the membership site.

    dura

    3 Jun 09 at 1:09 pm

  6. Thanks for that post. I too have duplicate pages on some of the web properties using different keywords each time. Do I need to completely change all the articles I put?

    Jason

    3 Jun 09 at 3:10 pm

  7. What do you mean exactly?

    Garry Conn

    3 Jun 09 at 3:41 pm

  8. OMG… can you like give me maybe a day before you jump my ass. Umm… I get kinda busy too you know. LOL!!!

    :)

    Garry Conn

    3 Jun 09 at 6:42 pm

  9. This information is available for free. Just google the post title and you’ll find tons of information.

    Joe Hernandez

    4 Jun 09 at 3:14 am

  10. Yeah, Garry has been saying in many posts that he doesn’t often make posts to sell products, but almost every second post is selling or promoting some kind of product or service.

    dura

    4 Jun 09 at 4:51 pm

  11. Just google the post title and you’ll find tons of information.

    ECommerce

    6 Jun 09 at 3:38 am

  12. Ah ok,

    I think I will leave it for now. Maybe chat to the web designer next time I see him.

    Thanks.

    DIY

    8 Jun 09 at 4:13 am
