Microsys

Sitemap Generator Analysis Filters (Crawler Filters) in Website Scan

Sitemap Generator Website Analysis Filters Overview

Analysis filters determine which pages have their content analyzed for links and other data. You can use analysis filters instead of, or in conjunction with, webmaster filters (robots.txt, noindex, nofollow etc.) and output filters.

Note: If a URL is only linked from pages that are not analyzed due to filters, it will not be found during the website scan.

Note: For changes in analysis filters to take effect, you will need to crawl your website again.
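The first note above can be illustrated with a small sketch. It is not the crawler's actual implementation: the link graph, the `crawl` function and the `is_analyzed` callback are hypothetical names made up for this example, and the crawl is assumed to be a simple breadth-first walk.

```python
from collections import deque

def crawl(start, links, is_analyzed):
    """Return the set of URLs found by a toy breadth-first crawl.

    links: dict mapping a page to the pages it links to (hypothetical data).
    is_analyzed: callable deciding whether a page's content is read for links.
    """
    found = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if not is_analyzed(page):
            continue  # the page itself is found, but its links are never read
        for target in links.get(page, []):
            if target not in found:
                found.add(target)
                queue.append(target)
    return found

# Toy site: "/blogs/post/" is linked only from "/blogs/".
links = {
    "/": ["/blogs/", "/shop/"],
    "/blogs/": ["/blogs/post/"],
    "/shop/": [],
}

# Assumed filter for the example: do not analyze anything under "/blogs/".
found = crawl("/", links, lambda page: not page.startswith("/blogs/"))
```

In this toy run "/blogs/" is still found because the analyzed start page links to it, while "/blogs/post/" is missed because its only referrer was never analyzed.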


Limit Internal URLs to Those in These Directories

  • Links encountered by the crawler are normally grouped into two categories: sitemap and external.
  • With this option, you can decide which pages belong in the sitemap category.
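As a rough sketch of the idea, the grouping could look like the following. The page does not describe how directories are compared, so the prefix comparison, the `classify` function and the directory list are all assumptions made for illustration.

```python
def classify(url, internal_dirs):
    """Group a URL as "sitemap" or "external".

    Assumption: a URL counts as internal when it starts with one of the
    listed directory URLs. The real tool may compare differently.
    """
    for directory in internal_dirs:
        if url.startswith(directory):
            return "sitemap"
    return "external"

# Hypothetical setting: only the blogs directory is treated as internal.
internal_dirs = ["http://www.microsystools.com/blogs/"]

group_a = classify("http://www.microsystools.com/blogs/sitemap-generator/", internal_dirs)
group_b = classify("http://www.microsystools.com/shop/", internal_dirs)
```

Under these assumptions the first URL lands in the sitemap group and the second in the external group.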


Analyze Files With File Extension

URLs with file extensions not found in the list will not be analyzed during the website scan.
If you remove all file extensions from the list, the file extension filter accepts all files.
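The two rules above can be sketched as a small check. The function name `should_analyze` and the extension set are hypothetical, and how the tool treats URLs without any file extension (such as directory URLs) is not stated on this page, so letting them through is an assumption of the sketch.

```python
import posixpath
from urllib.parse import urlsplit

def should_analyze(url, allowed_extensions):
    """Return True if the URL passes the file extension list filter."""
    if not allowed_extensions:
        return True  # an empty list accepts all files
    path = urlsplit(url).path
    extension = posixpath.splitext(path)[1].lower()
    if extension == "":
        return True  # assumption: URLs without an extension are not filtered
    return extension in allowed_extensions

# Hypothetical extension list for the example.
allowed = {".html", ".htm", ".php"}
```

For example, `should_analyze("http://www.microsystools.com/logo.png", allowed)` is rejected, while the same URL is accepted when the list is empty.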



Do Not Analyze URLs That Match Paths / Strings / Regex

  • Strings:
    • blogs matches relative paths that contain "blogs".
    • @ matches relative paths that contain "@".
    • ? matches relative paths that contain "?".
  • Special:
    • : when alone matches relative paths that contain ":" before any "?".
  • Paths:
    • :s matches relative paths that start with "s" such as http://www.microsystools.com/services/ and http://www.microsystools.com/shop/.
    • :blogs/ matches relative paths that start with "blogs/" such as http://www.microsystools.com/blogs/.
  • Subpaths:
    • :blogs/* matches relative paths that start with "blogs/", excluding "blogs/" itself, such as http://www.microsystools.com/blogs/sitemap-generator/.
  • Regular expression:
    • ::blog(s?)/ matches relative paths with regex such as http://www.microsystools.com/blogs/ and http://www.microsystools.com/blog/.
    • ::blogs/(2007|2008)/ matches relative paths with regex such as http://www.microsystools.com/blogs/2007/ and http://www.microsystools.com/blogs/2008/.
    • ::blogs/.*?keyword matches relative paths with regex such as http://www.microsystools.com/blogs/category/products/a1-keyword-research/.

From the above examples it can be seen that:
  • : alone = special match.
  • : at start = path match.
  • : at start and * at end = subpath match.
  • :: at start = regular expression match.
  • None of the above = normal string text match.
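The rules summarized above can be sketched in a few lines. This is only an illustration, not the program's own matching code: `relative_path` and `matches_filter` are hypothetical helpers, the relative path is assumed to be everything after the host name, and regular expressions are assumed to match anywhere in the relative path (the examples on this page would also be consistent with matching from the start).

```python
import re

def relative_path(url):
    """Strip scheme and host, keeping path and query (assumed definition)."""
    rest = url.split("://", 1)[-1]
    _, _, rel = rest.partition("/")
    return rel

def matches_filter(pattern, rel):
    """Return True if the relative path matches one filter item."""
    if pattern == ":":
        # Special: a ":" must occur before any "?".
        colon = rel.find(":")
        query = rel.find("?")
        return colon != -1 and (query == -1 or colon < query)
    if pattern.startswith("::"):
        # Regular expression (assumption: searched anywhere in the path).
        return re.search(pattern[2:], rel) is not None
    if pattern.startswith(":"):
        if pattern.endswith("*"):
            # Subpaths: starts with the prefix, but is not the prefix itself.
            prefix = pattern[1:-1]
            return rel.startswith(prefix) and rel != prefix
        # Paths: starts with the text after ":".
        return rel.startswith(pattern[1:])
    # Strings: plain substring match.
    return pattern in rel

blogs = relative_path("http://www.microsystools.com/blogs/")
post = relative_path("http://www.microsystools.com/blogs/sitemap-generator/")
```

With these helpers, `matches_filter(":blogs/", blogs)` and `matches_filter(":blogs/*", post)` both match, while `matches_filter(":blogs/*", blogs)` does not, because the subpath form excludes the directory itself.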

To add a list filter item to the dropdown: Type it and use the [+] button.
To remove a list filter item from the dropdown: Select it and use the [-] button.
You can view more information about the user interface controls used by A1 Sitemap Generator.


 © Copyright 1997-2010 Microsys