Duplicate Content and Duplicate URLs in Websites
Having multiple URLs with the same content is bad for search engine rankings, so duplicate content issues should be fixed, even though they are not directly related to how you use your website search engine.
Note: We have a video tutorial.
Even though the video demonstration uses A1 Website Analyzer, some of it also applies to users of A1 Website Search Engine.

Solve Website Pages and URLs with Duplicate Content
Search engines and software like A1 Website Search Engine will follow all internal links and redirects within your website.
Ways to solve problems with duplicate content and page URLs:
- Avoid having pages with duplicate content.
- Avoid linking to pages that duplicate content.
- Redirect duplicate pages to appropriate URLs, e.g. using mod_rewrite.
- Redirect www and non-www URLs of the same page to the same destination.
- You can use canonical tags, noindex and robots.txt rules to filter out pages with duplicate content.
- Filter URLs both during and after crawl using analysis and output filters.
- The links analysis tools in our software can help you discover and solve website issues.
Problems with Duplicate Page URLs
Even if search engines like Google do not always penalize you for duplicate content, they still have to choose one of the URLs / pages to show in search results and ignore all other duplicates.
That means all links pointing to the wrong URL will potentially be ignored and no longer fully counted when search engines like Google and Bing calculate importance.
Problems with Duplicate Content
You can check for duplicate content in title tag, meta tag description and meta tag keywords across all page URLs in your website.
Note: Our sibling product A1 Website Analyzer features more options for finding duplicate content and headers.
To check for duplicate page titles, descriptions or headers:
- Select the appropriate filter in the dropdown.
- Press the filter button. All URLs with duplicate titles are now shown and grouped together.

You can view the results after applying the quick filtering of the data collected during the website crawl.
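To illustrate what grouping URLs by duplicate titles means, here is a minimal Python sketch. It is not how A1 Website Search Engine works internally. The function name `find_duplicate_titles` and the `pages` mapping of URL to title are assumptions made for this example only.

```python
from collections import defaultdict


def find_duplicate_titles(pages):
    """Group URLs by title and return only titles used by two or more URLs.

    `pages` is a hypothetical mapping of URL -> page title, e.g. built
    from the data collected during a crawl.
    """
    groups = defaultdict(list)
    for url, title in pages.items():
        # Collapse whitespace and ignore case so trivial differences still match.
        key = " ".join(title.split()).lower()
        groups[key].append(url)
    return {title: sorted(urls) for title, urls in groups.items() if len(urls) > 1}


pages = {
    "http://www.example.com/": "Example Home",
    "http://www.example.com/index.html": "Example Home",
    "http://www.example.com/about.html": "About Us",
}
duplicates = find_duplicate_titles(pages)
```

The same approach can be applied to meta descriptions or meta keywords by passing those values instead of titles.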
Using Port Numbers and WWW vs Non-WWW in URLs
Do you have inconsistent usage of with-www vs non-www, e.g. http://example.com and http://www.example.com?
Do you have inconsistent usage of port numbers, e.g. http://www.example.com:80 and http://www.example.com?
With our website search engine software you can configure root path aliases for cases such as those described. However, even this may cause a problem since all internal URLs found in the website scan will be converted to either with-www or non-www, which may differ from the paths search engines such as Google already know about and have indexed.
Note: Normally you will not need to change settings such as root path aliases. Instead, A1 Website Search Engine by default handles URLs that mix the default HTTP port number and www/non-www through these options:
- Scan website | Crawler options | Fix "internal" URLs with default port number explicitly defined
- Scan website | Crawler options | Fix "internal" URLs with ".www" incorrect compared to website root
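The kind of URL fixing described above can be sketched in Python as follows. This only illustrates the idea and does not show the actual implementation in A1 Website Search Engine. The function name `normalize_host_and_port`, the `root` parameter and the port table are assumptions for this example.

```python
from urllib.parse import urlsplit, urlunsplit

# Default ports per scheme (assumption: only http and https are relevant here).
DEFAULT_PORTS = {"http": 80, "https": 443}


def _without_www(host):
    """Return the host name without a leading "www." prefix."""
    return host[4:] if host.startswith("www.") else host


def normalize_host_and_port(url, root="http://www.example.com/"):
    """Drop an explicit default port and align www/non-www with the website root."""
    parts = urlsplit(url)
    root_host = urlsplit(root).hostname or ""
    host = parts.hostname or ""
    # Treat "example.com" and "www.example.com" as the same website.
    if _without_www(host) == _without_www(root_host):
        host = root_host
    # Keep the port only when it is not the default for the scheme.
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        netloc = host
    else:
        netloc = f"{host}:{port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


fixed = normalize_host_and_port("http://example.com:80/page.html")
```

URLs on other hosts are left unchanged, and non-default ports such as 8080 are kept. Note that this sketch drops any user name or password embedded in the URL.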
Index File and Directory Website Paths
Mixing usage of similar URLs with duplicate content is often a bad idea.
Various search engines will flag that as duplicate URLs and your site rankings may suffer from it.
You should always fix such issues in your website.
Some example issues and the options found in our website search engine tool to handle them automatically:
1) If mixing example/ with example/index.html use option:
Scan website | Crawler options | Consider non-redirected index file name URLs as "duplicates"
URLs that fit above get response code -9 : RedirectIndexFileDirRoot.
2) If mixing dir/example/ with dir/example use option:
Scan website | Crawler options | Consider non-redirected with-slash and non-slash URLs as "duplicates"
URLs that fit above get response code -14 : rcRedirectNoSlashDirRoot.
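The two cases above can be illustrated with a short Python sketch that flags such URLs in a list of crawled URLs. It is a simplified illustration, not the logic used by A1 Website Search Engine. The function name `flag_directory_duplicates`, the list of index file names, and the choice of the with-slash URL as the preferred one are all assumptions. Query strings are not handled.

```python
# Assumed list of index file names; real servers may be configured differently.
INDEX_NAMES = ("index.html", "index.htm", "index.php")


def flag_directory_duplicates(urls):
    """Return {url: preferred_url} for URLs that duplicate a directory URL.

    Only flags a URL when the preferred with-slash URL was also crawled.
    """
    seen = set(urls)
    flagged = {}
    for url in urls:
        directory, _, name = url.rpartition("/")
        if name in INDEX_NAMES and directory + "/" in seen:
            # Case 1: example/index.html duplicates example/
            flagged[url] = directory + "/"
        elif name and url + "/" in seen:
            # Case 2: dir/example duplicates dir/example/
            flagged[url] = url + "/"
    return flagged


crawled = [
    "http://www.example.com/example/",
    "http://www.example.com/example/index.html",
    "http://www.example.com/dir/example",
    "http://www.example.com/dir/example/",
]
flagged = flag_directory_duplicates(crawled)
```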
HTML Meta Refresh Redirects
While page meta refresh redirects generally are frowned upon by search engines, they are still sometimes used. Example of meta refresh redirect HTML code:
<meta http-equiv="refresh" content="0;URL=http://www.example.com/new-url.html">
The crawler engine in our website search engine software recognizes meta refresh redirects when:
- The meta refresh redirect wait time is set to 0.
- The meta refresh redirect goes to a different page.
URLs that fit above get response code -11 : MetaRefreshRedirect.
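The two conditions above can be sketched in Python using only the standard library. This is an illustration under assumptions and not the crawler's actual code. The names `MetaRefreshParser` and `meta_refresh_redirect` are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Record the first <meta http-equiv="refresh"> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.refresh = None  # (delay_seconds, target) once found

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag != "meta" or (attrs.get("http-equiv") or "").lower() != "refresh":
            return
        if self.refresh is not None:
            return
        # Content looks like "0;URL=http://www.example.com/new-url.html".
        delay, _, rest = (attrs.get("content") or "").partition(";")
        target = rest.strip()
        if target.lower().startswith("url="):
            target = target[4:].strip()
        try:
            self.refresh = (float(delay), target)
        except ValueError:
            pass  # Ignore a wait time that cannot be parsed.


def meta_refresh_redirect(page_url, html):
    """Return the redirect target if wait time is 0 and it goes to a different page."""
    parser = MetaRefreshParser()
    parser.feed(html)
    if parser.refresh is None:
        return None
    delay, target = parser.refresh
    absolute = urljoin(page_url, target)
    if delay == 0 and target and absolute != page_url:
        return absolute
    return None


html = '<meta http-equiv="refresh" content="0;URL=http://www.example.com/new-url.html">'
target = meta_refresh_redirect("http://www.example.com/old-url.html", html)
```

A meta refresh with a wait time above 0, or one that reloads the same page, returns `None` in this sketch.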
Mixed Case in Website URLs
If you mix case in URLs and links:
- Web servers on Unix, such as Apache, usually treat URL paths as case sensitive. Fix internal links so they use a single case in URLs, e.g. lower case URLs.
- You may want to set up redirects for requests that use incorrect case, e.g. from old external backlinks to your website. In A1 Website Search Engine you can disable the option: Website scan | Crawler options | Consider internal paths case sensitive
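To find internal URLs that differ only by case, something like the following Python sketch could be used on a list of crawled URLs. The function name `find_case_variants` is an assumption for this example, and the sketch is not part of A1 Website Search Engine.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def find_case_variants(urls):
    """Group URLs whose host and path differ only by letter case."""
    groups = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        # Compare host and path case-insensitively; keep the query string as-is.
        key = (parts.scheme, parts.netloc.lower(), parts.path.lower(), parts.query)
        groups[key].add(url)
    return [sorted(group) for group in groups.values() if len(group) > 1]


variants = find_case_variants([
    "http://www.example.com/About.html",
    "http://www.example.com/about.html",
    "http://www.example.com/contact.html",
])
```

Each returned group lists URLs that should probably be linked using one single case.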

