Microsys
        

Crawl Error Response Code URLs and Pages with Sitemap Generator

Some websites place important links on the pages they return for error response codes, e.g. 404 - not found. You can have A1 Sitemap Generator scan such error pages for links by checking the option: scan website | crawler options | crawl error pages.

Please note that the sitemapper program ignores links that are relative to the current path when analyzing error pages. It does so to avoid getting caught in an endless crawling loop. To understand the reason, take a look at the following example of the process in a naive website crawler:

    • Crawler detects URL http://www.example.com/directory/ gives 404 - not found.
    • Crawler finds the error page returned for http://www.example.com/directory/ links to directory/something.
    • Crawler concatenates http://www.example.com/directory/ and directory/something into http://www.example.com/directory/directory/something.
    • Crawler detects URL http://www.example.com/directory/directory/something gives 404 - not found.
    • Crawler finds the error page returned for that URL again links to directory/something.
    • Crawler concatenates the current path http://www.example.com/directory/directory/ and directory/something into http://www.example.com/directory/directory/directory/something.
    • This is a classic spider trap that continues forever.
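
The loop described in the list above can be reproduced with Python's standard urllib.parse.urljoin. This is only a minimal sketch of a naive crawler, not how A1 Sitemap Generator works internally: the function name naive_crawl and the assumption that every URL returns the same 404 page with the same link are made up for illustration.

```python
from urllib.parse import urljoin

def naive_crawl(start_url, link_on_error_page, max_steps):
    """Follow the same path-relative link from each error page.

    Assumption for illustration: every URL visited returns a 404 page
    that contains the identical link `link_on_error_page`.
    """
    visited = []
    url = start_url
    for _ in range(max_steps):
        visited.append(url)
        # A naive crawler resolves the link against the current URL's path,
        # so each step yields a new, deeper URL that has not been seen before.
        url = urljoin(url, link_on_error_page)
    return visited

trap = naive_crawl("http://www.example.com/directory/", "directory/something", 3)
for url in trap:
    print(url)
```

Without the max_steps limit the loop would never run out of new URLs, because each resolved URL is one directory level deeper than the previous one.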

To have links found on error pages followed, write them in one of the following forms instead:
  • /directory/something (relative to the website root)
  • http://www.example.com/directory/something (absolute)
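
As a rough illustration of why these two forms are safe, the sketch below resolves each of them against several different error page URLs using Python's urllib.parse. The helper is_path_relative and the variable names are hypothetical and only approximate the kind of check a crawler could apply. They are not taken from A1 Sitemap Generator.

```python
from urllib.parse import urljoin, urlsplit

def is_path_relative(link):
    """Return True if the link is relative to the current path.

    Hypothetical check: no scheme, no host, and the path does not
    start at the website root.
    """
    parts = urlsplit(link)
    return not parts.scheme and not parts.netloc and not parts.path.startswith("/")

# Error page URLs at different depths (illustrative values).
bases = [
    "http://www.example.com/directory/",
    "http://www.example.com/directory/directory/something",
]

results = {}
for link in ("/directory/something",
             "http://www.example.com/directory/something",
             "directory/something"):
    # Collect every distinct URL the link resolves to across all bases.
    results[link] = {urljoin(base, link) for base in bases}
    print(link, is_path_relative(link), sorted(results[link]))
```

The root-relative and absolute links resolve to a single URL no matter which error page they appear on, while the path-relative link resolves to a different URL for each page, which is what feeds the endless loop.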


 © Copyright 1997-2012 Microsys