Microsys
  

Website Download Analysis Filters (Crawler Filters) in Site Scan

The website scan analysis filters let you define which pages are analyzed for content and links during a website scan in A1 Website Download.

To see all available options, you must switch off easy mode.

For options that use a dropdown list, the [+] and [-] buttons next to it add and remove items in the list itself.

Website Download Analysis Filters Overview

Analysis filters determine which pages have their content analyzed for links and other data. You can use analysis filters instead of, or in conjunction with, webmaster filters (robots.txt, noindex, nofollow, etc.) and output filters.

  • Exclude URLs in both analysis filters and output filters to minimize crawl time, HTTP requests and memory usage.

  • Note: If a URL is only linked from pages that are not analyzed due to filters, it will not be found during the website scan.

  • Note: For changes in analysis filters to take effect, you will need to crawl your website again.
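To make the second note concrete, here is a small Python sketch of a breadth-first crawl over a hypothetical in-memory link graph (`links`), with a hypothetical `is_analyzed` predicate standing in for the analysis filters. It is not how A1 Website Download is implemented; it only illustrates why a page that is linked solely from unanalyzed pages is never discovered.

```python
from collections import deque


def crawl(start: str, links: dict[str, list[str]], is_analyzed) -> set[str]:
    """Breadth-first crawl over a hypothetical in-memory link graph.

    `links` maps a relative path to the relative paths it links to;
    `is_analyzed` stands in for the analysis filters (True = analyze page).
    """
    found = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if not is_analyzed(page):
            # The page itself is known, but its links are never extracted.
            continue
        for target in links.get(page, []):
            if target not in found:
                found.add(target)
                queue.append(target)
    return found


links = {
    "": ["blogs/", "shop/"],
    "blogs/": ["blogs/post-1/"],
    "shop/": ["shop/item-1/"],
}
# Hypothetical analysis filter: do not analyze anything under blogs/.
found = crawl("", links, lambda rel: not rel.startswith("blogs/"))
print(sorted(found))  # ['', 'blogs/', 'shop/', 'shop/item-1/']
```

Here `blogs/` itself is still found because the root page links to it, but `blogs/post-1/` is not, since its only inbound link sits on a page whose links are never extracted.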


Limit Internal URLs to Those in These Directories

  • Links encountered by the crawler are normally grouped into two categories: sitemap and external.
  • With this option, you can decide which pages belong in the sitemap category.
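The grouping can be pictured as a simple prefix test. The sketch below assumes each configured directory is given as an absolute URL and that a link counts as a sitemap (internal) URL exactly when it lies under one of those directories; `categorize` and `internal_dirs` are hypothetical names for illustration, not part of the program.

```python
def categorize(url: str, internal_dirs: list[str]) -> str:
    """Return "sitemap" if url lies under one of internal_dirs, else "external".

    Assumption: directories are plain URL prefixes; a trailing "/" is added
    so that ".../blog" does not accidentally match ".../blogs/".
    """
    for d in internal_dirs:
        prefix = d if d.endswith("/") else d + "/"
        if url.startswith(prefix):
            return "sitemap"
    return "external"


dirs = ["http://www.microsystools.com/blogs/"]
print(categorize("http://www.microsystools.com/blogs/sitemap-generator/", dirs))  # sitemap
print(categorize("http://www.microsystools.com/shop/", dirs))  # external
```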


Analyze Files With File Extension

URLs with file extensions not found in the list will not be analyzed during the website scan.
If you remove all file extensions from the list, the file extension filter accepts all files.
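As a rough model of this rule, the Python sketch below checks a URL's extension against the list and treats an empty list as "accept everything". The function name `should_analyze` is hypothetical, and several details are assumptions rather than documented behavior: matching is case-insensitive, the query string is ignored, and an empty string `""` in the list stands for URLs without an extension (such as directory URLs ending in "/").

```python
import posixpath
from urllib.parse import urlsplit


def should_analyze(url: str, extensions: list[str]) -> bool:
    """Return True if the URL's file extension passes the extension filter.

    Assumptions: comparison is case-insensitive, the query string is ignored,
    and "" in the list stands for URLs with no extension (e.g. ".../blogs/").
    """
    if not extensions:
        # An empty list accepts all files.
        return True
    path = urlsplit(url).path
    ext = posixpath.splitext(path)[1].lower()
    return ext in {e.lower() for e in extensions}


allowed = [".html", ".htm", ".php", ""]
print(should_analyze("http://www.microsystools.com/index.html", allowed))  # True
print(should_analyze("http://www.microsystools.com/setup.zip", allowed))   # False
print(should_analyze("http://www.microsystools.com/setup.zip", []))        # True
```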



Do Not Analyze URLs That Match Paths / Strings / Regex

Excluding URLs from analysis when they fully or partially match a text string, path or regular expression pattern is often a good way to limit the crawl.


  • Strings:
    • blogs matches relative paths that contain "blogs".
    • @ matches relative paths that contain "@".
    • ? matches relative paths that contain "?".
  • Special:
    • : on its own matches relative paths that contain ":" before any "?".
  • Paths:
    • :s matches relative paths that start with "s" such as http://www.microsystools.com/services/ and http://www.microsystools.com/shop/.
    • :blogs/ matches relative paths that start with "blogs/" such as http://www.microsystools.com/blogs/.
  • Subpaths:
    • :blogs/* matches relative paths that start with "blogs/", excluding "blogs/" itself, such as http://www.microsystools.com/blogs/sitemap-generator/.
  • Regular expression:
    • ::blog(s?)/ matches relative paths with regex such as http://www.microsystools.com/blogs/ and http://www.microsystools.com/blog/.
    • ::blogs/(2007|2008)/ matches relative paths with regex such as http://www.microsystools.com/blogs/2007/ and http://www.microsystools.com/blogs/2008/.
    • ::blogs/.*?keyword matches relative paths with regex such as http://www.microsystools.com/blogs/category/products/a1-keyword-research/.
    • ::^$ matches the empty relative path (i.e. the root) with regex such as http://www.microsystools.com/.

From the examples above:
  • : alone = special match.
  • : at start = path match.
  • : at start and * at end = subpath match.
  • :: at start = regular expression match.
  • None of the above = normal string match.
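The rules summarized above can be expressed as a small dispatch function. This Python sketch is one reading of the documented behavior, not the program's actual code. It assumes that a "relative path" is everything after the site root (including any query string) and that regular expressions are applied with `re.search`, so they are unanchored unless the pattern uses `^` or `$`. The names `relative_path`, `matches` and `is_excluded` are hypothetical.

```python
import re


def relative_path(url: str, root: str) -> str:
    """Hypothetical helper: the part of `url` after the site root."""
    if not url.startswith(root):
        raise ValueError(f"{url!r} is not under {root!r}")
    return url[len(root):]


def matches(pattern: str, rel: str) -> bool:
    """Apply one filter item to a relative path, following the summary rules."""
    if pattern == ":":
        # Special match: ":" occurs before any "?" (in the path, not the query).
        return ":" in rel.split("?", 1)[0]
    if pattern.startswith("::"):
        # Regular expression match (assumed unanchored, like re.search).
        return re.search(pattern[2:], rel) is not None
    if pattern.startswith(":") and pattern.endswith("*"):
        # Subpath match: starts with the prefix but is not the prefix itself.
        prefix = pattern[1:-1]
        return rel.startswith(prefix) and rel != prefix
    if pattern.startswith(":"):
        # Path match: relative path starts with the given text.
        return rel.startswith(pattern[1:])
    # Normal string match: text appears anywhere in the relative path.
    return pattern in rel


def is_excluded(rel: str, patterns: list[str]) -> bool:
    """True if any filter item matches, so the URL would not be analyzed."""
    return any(matches(p, rel) for p in patterns)


ROOT = "http://www.microsystools.com/"
filters = [":blogs/*", "::^$"]
for url in (ROOT, ROOT + "blogs/", ROOT + "blogs/2007/", ROOT + "shop/"):
    print(url, is_excluded(relative_path(url, ROOT), filters))
# prints True for the root and blogs/2007/, False for blogs/ and shop/
```

The bare ":" check has to come first: otherwise the general path rule would treat it as an empty prefix that matches every URL.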

To add a filter item to the dropdown list, type it and click the [+] button.
To remove a filter item from the dropdown list, select it and click the [-] button.


Add URLs to Analysis Filters The Easy Way

If you do not need any of the advanced options for analysis filters, you can use the Delete and filter button after you have finished crawling a site. This makes it easy to optimize settings and limit the number of pages analyzed the next time you crawl the website.

This help page is maintained by

As one of the lead developers, he has worked on most of the code in the software from Microsys.

If you email any questions, chances are that he will be the one answering them.
About A1 Website Download

Download complete websites and take them with you to browse on offline media. Copy and store entire sites for backup, archive and documentation purposes. Never lose a website again.
     
 © Copyright 1997-2016 Microsys