Microsys
        

Duplicate Content and Duplicate URLs in Website and XML Sitemaps


Solve Website Pages and URLs with Duplicate Content

Search engines and software like A1 Sitemap Generator will follow all internal links and redirects within your website.

Ways to solve problems with duplicate content and page URLs:

  • Avoid having pages with duplicate content.
  • Avoid linking to pages that duplicate content.
  • Redirect duplicate pages to appropriate URLs, e.g. using mod_rewrite.
  • Redirect the www and non-www URLs of the same page to the same destination.
  • Use canonical link tags, noindex meta tags, or robots.txt rules to filter pages with duplicate content.
  • Filter URLs in XML sitemap creator software using analysis and output filters.
  • The links analysis tools in our software can help you discover and solve website issues.


Problems with Duplicate Page URLs

Even though search engines like Google will not always penalize you for duplicate content, they still have to choose one of the URLs / pages to show in search results and ignore all other duplicates. That means all links pointing to the wrong URL are ignored and no longer weighted in when Google, Yahoo etc. determine the importance of a page.


Using Port Numbers and WWW vs Non-WWW in URLs

Do you have inconsistent usage of with-www vs non-www, e.g. http://example.com and http://www.example.com?
Do you have inconsistent usage of port numbers, e.g. http://www.example.com:80 and http://www.example.com?

With our sitemap generator software you can configure root path aliases to cover variations such as those described above. However, even this may cause a problem, since all internal URLs in the website scan output will be converted to either with-www or non-www, which may differ from the URLs that search engines such as Google already know about and have indexed.

Note: Normally you will not need to change settings such as root path aliases. Instead, A1 Sitemap Generator by default handles URLs that mix the default HTTP port number and www/non-www through these options:

  • Scan website | Crawler options | Fix "internal" URLs with default port number explicitly defined
  • Scan website | Crawler options | Fix "internal" URLs with "www." incorrect compared to website root
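
To illustrate the kind of normalization these two options describe, here is a minimal Python sketch. It is not the code used by A1 Sitemap Generator; the function name normalize_host_and_port and the DEFAULT_PORTS table are assumptions made for this example. It rewrites a URL to the www or non-www form used by the website root and drops a port number that is the default for the scheme.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports per scheme (illustrative table, not taken from the software).
DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_host_and_port(url: str, root: str) -> str:
    """Use the root's www/non-www host form and drop default port numbers."""
    parts = urlsplit(url)
    root_host = (urlsplit(root).hostname or "").lower()
    host = (parts.hostname or "").lower()

    def strip_www(name: str) -> str:
        return name[4:] if name.startswith("www.") else name

    # Same site apart from the "www." prefix: adopt the form used by the root.
    if strip_www(host) == strip_www(root_host):
        host = root_host

    # Keep the port only when it is not the default for the scheme.
    port = parts.port
    if port is not None and port != DEFAULT_PORTS.get(parts.scheme):
        host = f"{host}:{port}"

    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))
```

For example, with the root http://www.example.com, both http://example.com:80/page.html and http://www.example.com/page.html end up as the same URL, while URLs on other hosts are left alone.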


Index File and Directory Website Paths

Mixing URLs like example/ with example/index.html is often a bad idea. Various search engine tools will sometimes flag these as duplicate URLs. If you want our sitemap generator tool to handle this automatically, use the option: Scan website | Crawler options | Directory index file names

URLs that fit above get response code -9 : RedirectIndexFileDirRoot.
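
The idea of treating a directory index file and its directory as the same URL can be sketched in Python as follows. This is only an illustration, not the crawler's implementation; the function name collapse_directory_index and the list of index file names are assumptions for this example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed list of directory index file names; adjust to match your server.
INDEX_FILE_NAMES = ("index.html", "index.htm", "index.php", "default.aspx")


def collapse_directory_index(url: str, index_names=INDEX_FILE_NAMES) -> str:
    """Map a URL ending in a directory index file to its directory URL."""
    parts = urlsplit(url)
    directory, _, last_segment = parts.path.rpartition("/")
    if last_segment in index_names:
        # Replace ".../index.html" with ".../" and keep everything else.
        return urlunsplit(
            (parts.scheme, parts.netloc, directory + "/", parts.query, parts.fragment)
        )
    return url
```

With this, http://www.example.com/example/index.html and http://www.example.com/example/ map to the same URL, so they can be listed only once.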


HTML Meta Refresh Redirects

While page meta refresh redirects generally are frowned upon by search engines, they are still sometimes used. Example of meta refresh redirect HTML code:
<meta http-equiv="refresh" content="0;URL=http://www.example.com/new-url.html">

The crawler engine in our sitemap generator software recognizes meta refresh redirects when:
  • The meta refresh redirect wait time is set to 0.
  • The meta refresh redirect goes to a different page.

URLs that fit above get response code -11 : MetaRefreshRedirect.
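
The two conditions above can be checked with a short Python sketch built on the standard html.parser module. It is an illustration under assumptions, not the crawler's actual code: the names MetaRefreshFinder and meta_refresh_redirect are made up for this example, and only the first meta refresh tag on a page is considered.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class MetaRefreshFinder(HTMLParser):
    """Remember the first <meta http-equiv="refresh"> that names a target URL."""

    def __init__(self):
        super().__init__()
        self.refresh = None  # (delay in seconds, target URL)

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.refresh is not None:
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("http-equiv", "").lower() != "refresh":
            return
        # The content value looks like "0;URL=http://www.example.com/new-url.html".
        delay, _, rest = attr.get("content", "").partition(";")
        name, _, target = rest.strip().partition("=")
        if name.strip().lower() != "url":
            return
        try:
            seconds = float(delay.strip())
        except ValueError:
            return
        self.refresh = (seconds, target.strip().strip("'\""))


def meta_refresh_redirect(page_url: str, html: str):
    """Return the redirect target if wait time is 0 and it is another page."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    if finder.refresh is None:
        return None
    seconds, target = finder.refresh
    absolute = urljoin(page_url, target)
    # Ignore fragments when deciding whether the target is a different page.
    if seconds == 0 and urldefrag(absolute).url != urldefrag(page_url).url:
        return absolute
    return None
```

Called with the example tag shown earlier and a page URL of http://www.example.com/old-url.html, the function returns the new URL; with a wait time above 0, or a target equal to the page itself, it returns None.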


Mixed Case in Website URLs

If you mix case in URLs and links:
  • Web servers on Unix, such as Apache, often respond with 404 errors if the case differs from the actual path.
  • Web servers on Windows, such as IIS, often respond with the same content, giving you lots of duplicate URLs and duplicate content.

Recommendations to solve the problem:
  • It is normally a good idea to fix internal links so they use a single case in URLs, e.g. lower case URLs.
  • You may want to set up redirects for requests that use incorrect case, e.g. from old external backlinks to your website.
  • When building XML sitemaps in A1 Sitemap Generator, you can disable the option: Website scan | Crawler options | Consider internal paths case sensitive.
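
To find internal links that differ only by case before fixing them, a list of crawled URLs can be grouped by a lower-cased key. The following Python sketch shows the idea; the function name find_case_duplicates is an assumption for this example, and the query string is deliberately compared case sensitively since its meaning depends on the website.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def find_case_duplicates(urls):
    """Group URLs whose scheme, host and path differ only by letter case."""
    groups = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        key = (parts.scheme.lower(), parts.netloc.lower(), parts.path.lower(), parts.query)
        groups[key].add(url)
    # Only groups with more than one spelling are potential duplicates.
    return [sorted(variants) for variants in groups.values() if len(variants) > 1]
```

Each returned group lists the spellings of one path found in the input, which makes it easy to pick a single case and update the links that use the others.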


 © Copyright 1997-2012 Microsys