Microsys
        

Sitemap Generator Output Filters (List Filters) in Website Scan

Output filters let you define which pages are listed in the sitemap section of the sitemap generator after a website scan.

Sitemap Generator Website Output Filters Overview

Output filters are usually applied just after website crawling has finished. This means the website scanner can crawl pages that are not necessarily included in the final output. You can use output filters instead of, or in conjunction with, webmaster filters (robots.txt, noindex, nofollow etc.) and analysis filters.

  • To change how and when these filters are applied, use the option Crawler options | Apply webmaster and output filters after website scan.

  • Exclude URLs in both output filters and analysis filters to minimize crawl time, HTTP requests and memory usage.

  • Note: For changes in output filters to take effect, you need to scan your website again. The reason is that the website crawler is what tags all found URLs with flags such as "excluded by output filter".


Limit Internal URLs to Those in These Directories

  • Links encountered by the crawler are normally grouped into the categories sitemap and external.
  • With this option, you can decide which pages belong in sitemap.



List Files With File Extension in Output

URLs with file extensions not found in the list are not included in the website scan results.
If you remove all file extensions from the list, the file extension filter accepts all files.
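
The rule above can be illustrated with a minimal Python sketch. This is not code from A1 Sitemap Generator: the function name `extension_allowed` and the example URLs are hypothetical, and keeping URLs that have no file extension at all (such as directory URLs) is an assumption, since the text above only covers URLs that do have an extension.

```python
import posixpath
from urllib.parse import urlsplit


def extension_allowed(url: str, allowed: set[str]) -> bool:
    """Return True if the URL passes a file extension list filter.

    `allowed` holds extensions without the leading dot, in lower case.
    """
    # An empty list accepts all files, as described above.
    if not allowed:
        return True
    # Take the extension of the last path segment, ignoring any query string.
    ext = posixpath.splitext(urlsplit(url).path)[1].lstrip(".").lower()
    # Assumption: URLs without any file extension are kept.
    if not ext:
        return True
    return ext in allowed


if __name__ == "__main__":
    allowed = {"html", "htm", "php"}
    for url in (
        "http://www.microsystools.com/page.html",
        "http://www.microsystools.com/manual.pdf",
        "http://www.microsystools.com/services/",
    ):
        print(url, extension_allowed(url, allowed))
```

The extension is compared in lower case so that "PAGE.HTML" and "page.html" are treated alike; whether the actual tool is case-insensitive here is likewise an assumption.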



Do Not List URLs That Match Paths / Strings / Regex

You can narrow down the data shown after a website scan has finished by excluding URLs from the output. A URL is excluded if it fully or partially matches a text string, path or regular expression pattern in this list.


  • Strings:
    • blogs matches relative paths that contain "blogs".
    • @ matches relative paths that contain "@".
    • ? matches relative paths that contain "?".
  • Special:
    • : when alone matches relative paths that have a ":" before any "?".
  • Paths:
    • :s matches relative paths that start with "s" such as http://www.microsystools.com/services/ and http://www.microsystools.com/shop/.
    • :blogs/ matches relative paths that start with "blogs/" such as http://www.microsystools.com/blogs/.
  • Subpaths:
    • :blogs/* matches relative paths that start with "blogs/", excluding "blogs/" itself, such as http://www.microsystools.com/blogs/sitemap-generator/.
  • Regular expression:
    • ::blog(s?)/ matches relative paths with regex such as http://www.microsystools.com/blogs/ and http://www.microsystools.com/blog/.
    • ::blogs/(2007|2008)/ matches relative paths with regex such as http://www.microsystools.com/blogs/2007/ and http://www.microsystools.com/blogs/2008/.
    • ::blogs/.*?keyword matches relative paths with regex such as http://www.microsystools.com/blogs/category/products/a1-keyword-research/.
    • ::^$ matches the empty relative path (i.e. the root) with regex such as http://www.microsystools.com/.

You can download a project file demonstrating various output filters.
From the examples above it can be seen that:
  • : alone = special match.
  • : at start = paths match.
  • : at start and * at end = turns the paths match into a subpaths match.
  • :: at start = regular expression match.
  • None of the above = normal text string match.
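
The prefix rules summarized above can be sketched in Python as follows. This is only an illustration of the documented behavior, not the implementation used by A1 Sitemap Generator. The helper names `relative_path` and `filter_matches` are hypothetical, and two details are assumptions: that the relative path is the URL path without the leading "/" plus any query string, and that regular expressions are searched anywhere in the relative path rather than having to match all of it.

```python
import re
from urllib.parse import urlsplit


def relative_path(url: str) -> str:
    """Return the URL path relative to the site root, query string included."""
    parts = urlsplit(url)
    rel = parts.path.lstrip("/")
    if parts.query:
        rel += "?" + parts.query
    return rel


def filter_matches(filter_item: str, rel: str) -> bool:
    """Return True if one list filter item matches a relative path."""
    if filter_item == ":":
        # Special match: a ":" that occurs before any "?".
        return ":" in rel.split("?", 1)[0]
    if filter_item.startswith("::"):
        # Regular expression match (assumed to be a search, not a full match).
        return re.search(filter_item[2:], rel) is not None
    if filter_item.startswith(":"):
        prefix = filter_item[1:]
        if prefix.endswith("*"):
            # Subpaths match: starts with the prefix, but is not the prefix itself.
            prefix = prefix[:-1]
            return rel.startswith(prefix) and rel != prefix
        # Paths match: starts with the prefix.
        return rel.startswith(prefix)
    # Normal text string match: contained anywhere in the relative path.
    return filter_item in rel


if __name__ == "__main__":
    site = "http://www.microsystools.com/"
    checks = [
        (":s", site + "services/"),
        (":blogs/*", site + "blogs/"),
        (":blogs/*", site + "blogs/sitemap-generator/"),
        ("::blog(s?)/", site + "blog/"),
        ("::^$", site),
    ]
    for item, url in checks:
        print(item, url, filter_matches(item, relative_path(url)))
```

The order of the checks matters: ":" alone and "::" must be tested before the single ":" prefix, otherwise every regular expression item would be treated as a paths match.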

To add list filter item in dropdown: Type it and use the [+] button.
To remove list filter item in dropdown: Select it and use the [-] button.
You can view more information about the user interface controls used by A1 Sitemap Generator.


Add URLs to Output Filters The Easy Way

If you do not need any of the advanced options for output filters, you can use the Delete and filter button after you have scanned a website. This is useful in cases where you want or need to scan a website multiple times.

 © Copyright 1997-2014 Microsys