Duplicate Content and Duplicate URLs in XML Sitemaps
Having multiple URLs that serve the same content is bad for rankings in search engines. Duplicate URLs and content should be resolved before creating and uploading XML sitemaps.
Solve Website Pages and URLs with Duplicate Content

Search engines and software like TechSEO360 will follow all internal links and redirects within your website.
      
Ways to solve problems with duplicate content and page URLs:

- Avoid having pages with duplicate content.
- Avoid linking to pages that duplicate content.
- Redirect duplicate pages to appropriate URLs, e.g. using mod_rewrite.
- Redirect www and non-www URLs of the same page to the same destination.
- You can use canonical, noindex and the robots text file to filter pages with duplicate content.
- Filter URLs both during and after the crawl using analysis and output filters.
- The links analysis tools in our software can help you discover and solve website issues.
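
To illustrate the canonical and noindex options in the list above, here is a minimal Python sketch that reads both hints from a page's HTML. It is not how TechSEO360 works internally; the names DuplicateHintParser and read_duplicate_hints are hypothetical and chosen only for this example.

```python
from html.parser import HTMLParser


class DuplicateHintParser(HTMLParser):
    """Collects the canonical URL and the robots noindex flag from a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attributes = {name.lower(): (value or "") for name, value in attrs}
        if tag == "link" and "canonical" in attributes.get("rel", "").lower().split():
            self.canonical = attributes.get("href") or None
        elif tag == "meta" and attributes.get("name", "").lower() == "robots":
            directives = [d.strip().lower() for d in attributes.get("content", "").split(",")]
            if "noindex" in directives:
                self.noindex = True


def read_duplicate_hints(html):
    """Return (canonical_url_or_None, noindex_flag) for an HTML document."""
    parser = DuplicateHintParser()
    parser.feed(html)
    return parser.canonical, parser.noindex
```

A page whose canonical URL differs from its own URL, or which is marked noindex, is usually one you would leave out of the XML sitemap.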
Problems with Duplicate Page URLs
 
      Even if search engines like Google do not always penalize you for duplicate content,
      they still have to choose one of the URLs / pages to show in search results and ignore all other duplicates.
      That means all links pointing to the wrong URL will potentially be ignored and no longer fully counted when search engines like Google and Bing calculate importance.
      
 Problems with Duplicate Content
 
      You can check for duplicate content in the title tag, h1 tag, h2 tag, meta tag description and meta tag keywords across all page URLs in your website.
      
      
      To check for duplicate page titles, descriptions or headers:
      
        - Select the appropriate filter in the dropdown.
- Press the filter button. All URLs with duplicate titles now get shown and grouped together.
 (Note: Screenshot is from A1 Website Analyzer meaning more features are available)
      
      You can view the results after applying the quick filtering of the data collected during the website crawl.
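
The grouping that the filter performs can be sketched in a few lines of Python. This is only an illustration under assumptions: the pages dictionary (URL mapped to collected fields) and the function name group_duplicates are hypothetical and do not reflect the data format used by TechSEO360.

```python
from collections import defaultdict


def group_duplicates(pages, field="title"):
    """Return {value: [urls]} for every value of `field` shared by 2+ URLs."""
    groups = defaultdict(list)
    for url, data in pages.items():
        # Compare case-insensitively and ignore surrounding whitespace.
        value = data.get(field, "").strip().lower()
        if value:
            groups[value].append(url)
    return {value: sorted(urls) for value, urls in groups.items() if len(urls) > 1}
```

Passing field="description" or field="h1" would group by those fields instead, assuming they were collected for each page.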
      
Duplicate Content and Similarity Analysis
 
        You can often visually see which pages have similar content by enabling the following option before starting the site scan:
        Scan website | Data collection | Perform keyword density analysis of all pages
        
         
      
                                            
        If you have language detection working correctly, you can further improve the quality of this analysis.
        Simply set Keyword tools | Content keyword analysis | Select stop words to either match the main language of your website or auto if your website uses multiple languages.
        Doing this allows the crawler to exclude all common words and only analyse content words.
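
To show why excluding stop words helps when comparing pages, here is a minimal Python sketch. It uses Jaccard similarity on sets of content words, which is a swapped-in technique for illustration and not the keyword density analysis the software performs. The stop word list and function names are assumptions.

```python
import re

# Small illustrative English stop word list; a real list is much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on"}


def content_words(text, stop_words=STOP_WORDS):
    """Return the set of lower-cased words in `text` that are not stop words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {word for word in words if word not in stop_words}


def similarity(text_a, text_b):
    """Jaccard similarity of content words: 0.0 (no overlap) to 1.0 (identical sets)."""
    a, b = content_words(text_a), content_words(text_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

Without the stop word filter, two unrelated pages would share many common words and appear more similar than they are.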
      
Using Port Numbers and WWW vs Non-WWW in URLs
 
       Do you have inconsistent usage of with-www vs non-www, e.g. http://example.com and http://www.example.com?

       Do you have inconsistent usage of port numbers, e.g. http://www.example.com:80 and http://www.example.com?
       
       With our technical SEO software you can configure root path aliases such as those described.
       However, even this may cause a problem since all internal URLs found in the website scan will be converted to either with-www or non-www,
       which may differ from the paths search engines such as Google already know about and have indexed.
      
      
Note: Normally you will not need to change settings such as root path aliases.
      Instead, TechSEO360 by default handles URLs that mix the default HTTP port number and www/non-www through these options:

- Scan website | Crawler options | Fix "internal" URLs with default port number explicitly defined
- Scan website | Crawler options | Fix "internal" URLs with ".www" incorrect compared to website root
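
The kind of fix these two options describe can be sketched in Python as follows. This is an assumption-laden illustration, not the software's implementation: the function name normalize_url and the root_host parameter are hypothetical, and any user name or password in the URL is dropped.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url, root_host="www.example.com"):
    """Drop an explicit default port and align www/non-www with the website root."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # Treat "example.com" and "www.example.com" as the same site as the root.
    bare_root = root_host[4:] if root_host.startswith("www.") else root_host
    bare_host = host[4:] if host.startswith("www.") else host
    if bare_host == bare_root:
        host = root_host

    # Keep the port only when it is not the default for the scheme.
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

URLs on other hosts are left untouched apart from removing a default port, so external links are not rewritten to the website root.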
Index File and Directory Website Paths
 
      Mixing usage of similar URLs with duplicate content is often a bad idea.
      Various search engines will flag that as duplicate URLs and your site rankings may suffer from it.
      You should always fix such issues in your website.
      
      Some example issues and the options found in our technical SEO tool to handle them automatically:

      1) If mixing example/ with example/index.html use option:
      Scan website | Crawler options | Consider non-redirected index file name URLs as "duplicates"
      URLs that fit above get response code -9 : RedirectIndexFileDirRoot.

      2) If mixing dir/example/ with dir/example use option:
      Scan website | Crawler options | Consider non-redirected with-slash and non-slash URLs as "duplicates"
      URLs that fit above get response code -14 : rcRedirectNoSlashDirRoot.
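
Both cases above can be detected with a simple grouping key, sketched here in Python. The list of index file names and the function names are assumptions for illustration; the sketch only groups URLs and does not assign the response codes mentioned above.

```python
from urllib.parse import urlsplit

# Assumed index file names; adjust to what your web server actually serves.
INDEX_FILES = ("index.html", "index.htm", "index.php")


def directory_key(url):
    """Map example/, example/index.html and example to the same key."""
    parts = urlsplit(url)
    path = parts.path
    for name in INDEX_FILES:
        if path.endswith("/" + name):
            path = path[: -len(name)]  # strip the index file name, keep the slash
            break
    return parts.netloc.lower() + path.rstrip("/")


def find_path_duplicates(urls):
    """Return groups of URLs that differ only by index file name or trailing slash."""
    groups = {}
    for url in urls:
        groups.setdefault(directory_key(url), []).append(url)
    return [group for group in groups.values() if len(group) > 1]
```

Each returned group is a candidate for a redirect to one preferred URL form.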
      
HTML Meta Refresh Redirects
 
      While page meta refresh redirects generally are frowned upon by search engines, they are still sometimes used.
      Example of meta refresh redirect HTML code:
      
      
      <meta http-equiv="refresh" content="0;URL=http://www.example.com/new-url.html">
      
      
      The crawler engine in our technical SEO software recognizes meta refresh redirects when:

- The meta refresh redirect wait time is set to 0.
- The meta refresh redirect goes to a different page.

      URLs that fit above get response code -11 : MetaRefreshRedirect.
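
The two conditions above can be checked with a short Python sketch. The parsing rules, the class name MetaRefreshParser and the function name meta_refresh_redirect are assumptions made for this example and may differ from how the crawler engine parses pages.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Matches content values such as "0;URL=http://www.example.com/new-url.html".
REFRESH_PATTERN = re.compile(
    r"^\s*(\d+)\s*[;,]\s*(?:url\s*=\s*)?['\"]?([^'\"]+?)['\"]?\s*$", re.IGNORECASE
)


class MetaRefreshParser(HTMLParser):
    """Stores the first meta refresh found as (delay_seconds, target)."""

    def __init__(self):
        super().__init__()
        self.refresh = None

    def handle_starttag(self, tag, attrs):
        attributes = {name: (value or "") for name, value in attrs}
        if tag == "meta" and attributes.get("http-equiv", "").lower() == "refresh":
            match = REFRESH_PATTERN.match(attributes.get("content", ""))
            if match and self.refresh is None:
                self.refresh = (int(match.group(1)), match.group(2))


def meta_refresh_redirect(page_url, html):
    """Return the target URL if the wait time is 0 and the target is a different page."""
    parser = MetaRefreshParser()
    parser.feed(html)
    if parser.refresh is None:
        return None
    delay, target = parser.refresh
    target_url = urljoin(page_url, target)  # resolve relative targets
    if delay == 0 and target_url != page_url:
        return target_url
    return None
```

A refresh with a wait time above 0, or one that reloads the same page, returns None and is not treated as a redirect in this sketch.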
      
Mixed Case in Website URLs
 
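        If you mix case in URLs and links:

- Unix servers like Apache often treat URL paths as case sensitive and respond with an error, e.g. 404 : Not Found, when a request uses the wrong case.
- Fix internal links to a single case in URLs, e.g. to lower case URLs.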
- You may want to set up redirects for those requests that use incorrect case, e.g. from old external backlinks to your website.

In TechSEO360 you can disable the option: Scan website | Crawler options | Consider internal paths case sensitive
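
To find internal links that differ only by case, a minimal Python sketch could group the URLs by their lower-cased form. The function name find_case_variants is hypothetical, and the sketch lower-cases the whole URL including any query string, which may be too aggressive for some websites.

```python
from collections import defaultdict


def find_case_variants(urls):
    """Return {lower_cased_url: [variants]} for URLs that differ only by case."""
    groups = defaultdict(set)
    for url in urls:
        groups[url.lower()].add(url)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}
```

Each group lists the variants that should be unified to a single case, with redirects added for the ones no longer used.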
 
        
This help page is maintained by Thomas Schulz. As one of the lead developers, his hands have touched most of the code in the software from Microsys. If you email any questions, chances are that he will be the one answering.