I was recently in a discussion about the “duplicate content issue”. It revolved around what exactly duplicate content is, whether Google penalizes it, and how to deal with it. Afterwards I felt that there is a lot of ambiguity around duplicate content and that people perceive it very differently.
In this post I would like to clear up some of the doubts most people have about duplicate content and describe some techniques for dealing with it.
What exactly is duplicate content?
According to the definition from Google:
Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar.
Based on this definition, duplicate content can be divided into two broad categories:
- Intra-domain duplicate content: Blocks of duplicate content found under the same domain. For example, the printer-friendly version of a webpage has much the same content as the browser-friendly version of the page.
- Inter-domain duplicate content: Blocks of content duplicated across different domains.
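To make “appreciably similar” a little more concrete, here is a minimal sketch of one way to compare two blocks of text: split each into overlapping three-word sequences (shingles) and measure how many they share (Jaccard similarity). This is only an illustration. It is not how Google detects duplicates, and the function names and the shingle size of 3 are my own assumptions.

```python
import re


def shingles(text, size=3):
    """Return the set of overlapping word sequences (shingles) in text."""
    # Lowercase and keep only letters and digits, so case and
    # punctuation differences do not affect the comparison.
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        # Text shorter than one shingle: use the whole text, if any.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a, b, size=3):
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a, size), shingles(b, size)
    union = sa | sb
    if not union:
        # Nothing to compare (both texts empty): report no similarity.
        return 0.0
    return len(sa & sb) / len(union)


browser_version = "The quick brown fox jumps over the lazy dog."
printer_version = "the quick brown fox jumps over the lazy dog"
print(jaccard_similarity(browser_version, printer_version))
```

A score close to 1.0 suggests the two pages are near copies of each other, while a score close to 0.0 suggests they share very little. Where to draw the line is a judgment call.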
The key point to note here is that not all duplicate content is malicious. The printer-friendly version mentioned above is a non-malicious example of intra-domain duplicate content.
Even duplicate content across domains is not always malicious. Citations and references, for instance, are a very common form of duplicate content in the blogging world. Generally, the following types of duplicate content are considered non-malicious:
- Discussion forums: Multiple forum pages, on the same domain or across domains, may discuss similar topics, and participants use citations and references to present their points of view.
- Online shopping websites: Multiple online shopping sites sell the same product, so they show duplicate content about it, such as product specifications and pricing.
- Printer-friendly versions of webpages: Websites offer printer-friendly versions of their pages so that those pages print better.
If the duplicate content you are worried about falls under one of these non-malicious categories, there is no need to worry. In such cases Google will not penalize you in search results.
Fix duplicate content for better performance
Google does not penalize non-malicious duplicate content, contrary to what most webmasters believe. You can read more about it in the Google Webmasters Blog article demystifying the duplicate content penalty.
But you should not ignore it, because your position in search results may still suffer. Google tries its best to present unique results to the searcher, so when it finds pages with duplicate content, it is up to Google to decide which of them is shown.
If you know you have duplicate content, tell Google beforehand which URL you prefer to have indexed for search results and which you do not. Telling search engines about your preferred URL is called canonicalization.
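One common way to state that preference on the page itself is a rel="canonical" link element placed in the page's head section. I do not cover it elsewhere in this post, so treat the following as a minimal sketch. The helper name canonical_link_tag is hypothetical, and the URL is just an example.

```python
from html import escape


def canonical_link_tag(preferred_url):
    """Build a <link rel="canonical"> element for the page's <head>."""
    # Escape characters such as & and " so the URL is a valid attribute value.
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


# Every duplicate of the page would carry the same tag,
# pointing at the one URL you want indexed.
print(canonical_link_tag("http://www.site.com/page"))
```

The same tag goes on the preferred page and on each of its duplicates, for example the printer-friendly version.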
The following are some corrective measures you should adopt if your site has duplicate content:
- Use 301 redirects: If you have restructured your site, use 301 redirects to send the old indexed URLs to the new ones. Otherwise the old URL and the new URL for the same page would be considered duplicates.
- Use consistent URLs for linking: When linking to pages within your own website (internal linking), always be consistent in the URL format you use. For example, http://site.com/page, http://site.com/page/ and http://site.com/page/default.htm should not be used interchangeably for internal linking. Choose one format and stick to it.
- Tell Google how you want your site indexed: Using Google Webmaster Tools you can specify the preferred domain that Google will use when indexing your webpages. For example, you can tell Google whether you prefer http://www.site.com or http://site.com.
- Avoid duplicate content: Review your site for pages that have duplicate or similar content. Either elaborate on each such page and add value to it so that it gains a distinct character, or merge the similar pages into one.
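The first two measures above can be sketched in a few lines of code: reduce every variant of a URL to one preferred form, and answer requests for any other variant with a 301 redirect. This is only a rough sketch under my own assumptions: the preferred host www.site.com, the list of default document names, and the function names are all hypothetical, and a real site would normally do this in its web server or framework configuration.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only: adjust to your own site.
PREFERRED_HOST = "www.site.com"
DEFAULT_DOCUMENTS = ("default.htm", "index.html")


def canonical_url(url):
    """Reduce the variants of one of this site's URLs to a single form."""
    parts = urlsplit(url)
    path = parts.path or "/"
    # /page/default.htm and /page/ refer to the same page.
    for doc in DEFAULT_DOCUMENTS:
        if path.endswith("/" + doc):
            path = path[: -len(doc)]
            break
    # /page/ and /page refer to the same page; keep the bare root "/".
    if len(path) > 1:
        path = path.rstrip("/")
    # Always use the preferred host; the fragment is dropped.
    return urlunsplit((parts.scheme, PREFERRED_HOST, path, parts.query, ""))


def redirect_for(url):
    """Return (301, headers) if url is not the preferred form, else None."""
    target = canonical_url(url)
    if target == url:
        return None
    return 301, {"Location": target}


for variant in ("http://site.com/page",
                "http://site.com/page/",
                "http://site.com/page/default.htm"):
    print(variant, "->", redirect_for(variant))
```

All three variants end up at http://www.site.com/page, which is also the only form you would then use for internal links.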
What if you don’t deal with duplicate content properly?
If everything I have just discussed, along with Google’s recommendations and best practices for duplicate content, does not seem worth your precious time and you plan to ignore it completely, there is still no need to worry: Google will handle it for you. It will do its best to choose the best version among the pages with duplicate content and list that one in search results.
Google will help you manage duplicate content as long as it considers the duplication non-malicious. But if Google finds that your site is intentionally duplicating content in an attempt to rank higher in search results and get more traffic, your domain may be removed from the search results completely.
Many webmasters are very worried about duplicate content and how to fix it, but Google offers a lot of help here. If your site is not involved in any deceptive practice, you need not worry about a penalty from Google. Still, if you have duplicate content on your site, it is highly recommended that you follow Google’s best practices to resolve it.
Please share your insights on duplicate content and add value to the post. If you like the post, do consider sharing it.