Using a firewall program? You need to configure it if the website scan returns only a few URLs and they all have response code -4 : CommError.
Are you mixing www and non-www usage in website links and redirects? Check the externals tab to find out.
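To illustrate what mixed www and non-www usage looks like, here is a minimal Python sketch that classifies a link relative to the scan start URL. The function name `host_variant` and the example URLs are hypothetical, and this is not how the program itself decides what is external. It assumes absolute links.

```python
from urllib.parse import urlsplit


def host_variant(start_url: str, link: str) -> str:
    """Classify an absolute link as 'same', 'www-variant' or 'external'.

    Hypothetical helper for illustration only; relative links have no
    host and are therefore reported as 'external' by this sketch.
    """
    start_host = (urlsplit(start_url).hostname or "").lower()
    link_host = (urlsplit(link).hostname or "").lower()

    if link_host == start_host:
        return "same"

    def strip_www(host: str) -> str:
        # Drop a leading "www." so both host variants compare equal.
        return host[4:] if host.startswith("www.") else host

    if strip_www(link_host) == strip_www(start_host):
        return "www-variant"
    return "external"


if __name__ == "__main__":
    start = "http://www.example.com/"
    for link in ("http://www.example.com/a.html",
                 "http://example.com/b.html",
                 "http://other.example.org/"):
        print(link, "->", host_variant(start, link))
```

Links reported as `www-variant` are the ones a crawler may treat as belonging to a different website than the one being scanned.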
Does your website use cloaking, i.e. serve different content depending on the user agent string sent by the crawler?
If so, change the sitemapper user agent string in:
Scan website | Crawler identification | User agent ID.
Does your website, or any page in it, redirect to or get content from another domain (e.g. through <frame> or <iframe>)?
Check the externals tab to find out.
Does the website content have no links into a whole area of pages? In that case, it does not help that the hidden pages are all cross-linked with each other!
To solve this, you can use multiple start search paths.
Does the website rely on JavaScript or uncommon types of HTML link tags for navigation, e.g. <iframe>, <form> and <button>?
Solution: Enable checking these for links in Scan website | Crawler options.
Do website links use // instead of / without the webserver responding with an error or redirect? The problem cascades if the linked documents use relative paths.
Solution: Configure Scan website | Crawler options to handle this situation.
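As an illustration of the // problem, the following Python sketch collapses repeated slashes in the path of a URL while leaving the scheme separator and the query string untouched. The function name `collapse_slashes` is hypothetical, and the crawler option above may well work differently.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url: str) -> str:
    """Return the URL with runs of '/' in the path reduced to a single '/'.

    Illustrative sketch only: the '//' after the scheme, the query
    string and the fragment are left unchanged.
    """
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path,
                       parts.query, parts.fragment))


if __name__ == "__main__":
    print(collapse_slashes("http://example.com//dir//page.html?id=1"))
```

Without such normalization, `http://example.com//dir/page.html` and `http://example.com/dir/page.html` count as two different URLs even though the server returns the same document for both.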
Do dynamic pages generate unique links based on input from GET (?) data? This can sometimes cause an endless loop of unique URLs!
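One way to spot such a loop in a list of scanned URLs is to count how many distinct query strings each page has. The Python sketch below assumes you have the URLs as a plain list; the function names and the threshold value are hypothetical choices for illustration, not features of the program.

```python
from collections import Counter
from urllib.parse import urlsplit


def query_variant_counts(urls):
    """Count distinct query-string variants per page (URL without query)."""
    counts = Counter()
    for url in set(urls):  # ignore exact duplicates
        parts = urlsplit(url)
        if parts.query:
            page = f"{parts.scheme}://{parts.netloc}{parts.path}"
            counts[page] += 1
    return counts


def suspicious_pages(urls, threshold=100):
    """Return pages whose number of query variants reaches the threshold."""
    return [page for page, n in query_variant_counts(urls).items()
            if n >= threshold]


if __name__ == "__main__":
    sample = [f"http://example.com/calendar.php?day={i}" for i in range(500)]
    sample.append("http://example.com/about.html")
    print(suspicious_pages(sample))
```

A page such as a calendar script that shows up with hundreds of query variants is a good candidate for a URL filter.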
Have you configured analysis and output filters and forgotten about them?
Besides URL filtering support, you can also configure when filtered URLs are removed:
Website scan results: Scan website | Crawler options | Apply "webmaster" and "list" filters after website scan.
Building sitemaps: Create sitemap | Document options | Remove URLs excluded by "webmaster" and "list" filters.
Are you scanning a website subdirectory which contains no links to pages within that directory? Check the externals tab to find out.
Consider whether your website is using non-standard file extensions. If you know which ones, you can add them.
Alternatively, clear all file extensions in the analysis and output filters, but keep the default MIME filters in both places. Then try the scan again.
Do you have directories with response code 0 : VirtualItem in scan results?
Check the information about internal website linking.
Are there many URLs with errors in the website scan results?
If the webserver is causing some URLs to give error response codes, e.g. because of server bandwidth throttling, you can try resume scan until all errors are gone. This will most likely lead to more links and pages being found.
Another way to deal with URLs that give error responses is to experiment with the options found in Scan website | Crawler engine | Advanced engine settings.
Some common settings which often help:
Increasing timeout values, using GET only, and enabling or disabling GZip/deflate support.
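The idea behind resuming a scan until the errors are gone can be sketched in Python as a retry loop over the failed URLs only. This is an assumption-laden illustration, not the program's implementation: `retry_failed`, its parameters and the `fetch` callback (which should return True on success) are all hypothetical.

```python
import time


def retry_failed(urls, fetch, max_rounds=3, delay=0.0):
    """Re-fetch failing URLs for up to max_rounds; return those still failing.

    `fetch(url)` is a caller-supplied function returning True on success.
    `delay` is a pause in seconds between rounds, useful when the server
    throttles bandwidth.
    """
    pending = list(urls)
    for _ in range(max_rounds):
        # Keep only the URLs that failed in this round.
        pending = [url for url in pending if not fetch(url)]
        if not pending:
            break
        time.sleep(delay)
    return pending


if __name__ == "__main__":
    attempts = {}

    def flaky_fetch(url):
        # Stub for demonstration: every URL succeeds on its second attempt.
        attempts[url] = attempts.get(url, 0) + 1
        return attempts[url] >= 2

    print(retry_failed(["http://example.com/a", "http://example.com/b"],
                       flaky_fetch))
```

A longer delay between rounds plays a similar role to raising timeout values: it gives a throttled server time to recover before the next attempt.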