Microsys
  

TechSEO360 Obeys Noindex, Nofollow, Canonical and Robots.txt

TechSEO360 is a desktop technical SEO tool that can scan websites. It has optional support for obeying the robots.txt file, noindex and nofollow in meta tags, and nofollow in link tags.

TechSEO360 and Webmaster Crawl Filters

The website crawler in TechSEO360 has many tools and options to ensure it can scan complex websites. These include complete support for the robots.txt file, noindex and nofollow in meta tags, and nofollow in link tags.

Tip: Downloading robots.txt will often make webservers and analytics software identify you as a website crawler robot.

[Screenshot: crawl robots noindex nofollow]

In connection with these, you can also control how they get applied:
  • Disable Scan website | Crawler options | Apply "webmaster" and "output filters" after website scan stops.
  • Enable Create sitemap | Document options | Remove URLs excluded by "webmaster" and "output" filters.

If you use the crawler's pause and resume functionality, you can avoid crawling the same URLs repeatedly by keeping all URLs between scans.


HTML Code for Canonical, NoIndex and NoFollow

  • Canonical:
    <link rel="canonical" href="http://www.example.com/list.php?sort=az" />
    Useful when two different URLs serve the same content. Consider reading about duplicate URLs, as there may be better solutions than canonical instructions, e.g. redirects.

  • NoFollow:
    • <a href="http://www.example.com/" rel="nofollow">bad link</a>
    • <meta name="robots" content="nofollow" />

  • NoIndex:
    <meta name="robots" content="noindex" />



Include and Exclude List and Analysis Filters

To learn about analysis and output filters, see our online help for TechSEO360.


Match Behavior and Wildcards Support in Robots.txt

The match behavior in the website crawler used by TechSEO360 is similar to that of most search engines.

Support for wildcard symbols in robots.txt file:
  • Standard: Match from the beginning, up to the length of the filter (a prefix match).
    gre will match: greyfox, greenfox and green/fox.
  • Wildcard *: Match any characters until the rest of the filter can match.
    gr*fox will match: greyfox, grayfox, growl-fox and green/fox.
    Tip: Wildcard filters in robots.txt are often incorrectly configured and a source of crawling problems.
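The two match rules above can be sketched in a few lines of Python. This is only an illustration of prefix and `*` wildcard matching as described here, not the code used by TechSEO360; the function name `robots_filter_matches` is hypothetical, and other robots.txt extensions are not handled.

```python
import re


def robots_filter_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt style filter matches the given path.

    The filter is matched from the beginning of the path, and each "*"
    matches any run of characters (including none).
    """
    # Escape the literal pieces, then join them with ".*" for each wildcard.
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    regex = ".*".join(literal_parts)
    # re.match anchors at the start only, which gives the prefix behavior.
    return re.match(regex, path) is not None


for candidate in ["greyfox", "greenfox", "green/fox", "grayfox"]:
    print(candidate, robots_filter_matches("gre", candidate))

for candidate in ["greyfox", "grayfox", "growl-fox", "green/fox", "growl"]:
    print(candidate, robots_filter_matches("gr*fox", candidate))
```

Checking a few sample paths against a filter in this way is a quick method to catch wildcard filters that block more than intended.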

The crawler in our technical SEO tool will obey the following user agent IDs in the robots.txt file:
  • Exact match against user agent selected in: General options | Internet crawler | User agent ID.
  • User-agent: TechSEO360 if the product name is in the above-mentioned HTTP user agent string.
  • User-agent: miggibot if the crawler engine name is in the above-mentioned HTTP user agent string.
  • User-agent: *.
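As a rough sketch of the list above, the following Python function returns the robots.txt user agent IDs that would be obeyed for a given HTTP user agent string. The function name, the case-insensitive comparison and the sample user agent string are assumptions made for this example; the page does not state how TechSEO360 compares names or whether the list implies a priority order.

```python
def obeyed_user_agent_ids(http_user_agent: str) -> list[str]:
    """List the robots.txt User-agent IDs to obey for one HTTP user agent."""
    # 1. Exact match against the configured user agent ID.
    ids = [http_user_agent]
    lowered = http_user_agent.lower()  # assumption: case-insensitive check
    # 2. Product name, only if it appears in the user agent string.
    if "techseo360" in lowered:
        ids.append("TechSEO360")
    # 3. Crawler engine name, only if it appears in the user agent string.
    if "miggibot" in lowered:
        ids.append("miggibot")
    # 4. The catch-all group always applies.
    ids.append("*")
    return ids


# Hypothetical user agent string, for illustration only.
print(obeyed_user_agent_ids("Mozilla/5.0 (compatible; TechSEO360; miggibot)"))
print(obeyed_user_agent_ids("CustomAgent/1.0"))
```

If you change the user agent ID so that it no longer contains the product or crawler engine name, only the exact match and `User-agent: *` remain in the list.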

All disallow instructions found in robots.txt are internally converted into both analysis and output filters in TechSEO360.
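How the conversion works internally is not described here, but the first step, collecting the disallow instructions that apply to the crawler, could look roughly like the Python sketch below. The function name `disallow_filters` is hypothetical, and merging the rules of every matching user agent group is an assumption of this example rather than documented TechSEO360 behavior.

```python
def disallow_filters(robots_txt: str, obeyed_ids: set[str]) -> list[str]:
    """Collect Disallow values from every group that names an obeyed ID."""
    obeyed = {agent.lower() for agent in obeyed_ids}
    filters = []
    group_agents = []   # user agents named by the current group
    in_rules = False    # True once the current group has started its rules
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if in_rules:
                # A user-agent line after rules starts a new group.
                group_agents = []
                in_rules = False
            group_agents.append(value.lower())
        elif field in ("allow", "disallow"):
            in_rules = True
            # An empty Disallow value means nothing is blocked.
            if field == "disallow" and value and obeyed.intersection(group_agents):
                filters.append(value)
    return filters


sample = """\
User-agent: *
Disallow: /private/

User-agent: othercrawler
Disallow: /only-other/

User-agent: miggibot
User-agent: TechSEO360
Allow: /tmp/public
Disallow: /tmp/
"""

print(disallow_filters(sample, {"*", "TechSEO360"}))
```

Each returned value could then be used as an exclude filter and tested against URL paths with the match rules described earlier on this page.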


Review Results After Website Scan

After a scan, you can see the state flags the crawler detected for every URL. These flags reflect the options set in Webmaster filters, Analysis filters and Output filters.

For details of a specific URL, select it and view all information in Core data and related tabs:

[Screenshot: crawl filter state flags]

For an overview of all URLs, you can hide or show the data columns you want, including URL content state flags:

[Screenshot: show results with URL state flags information]

You can also apply a custom filter after the scan to show only URLs with a certain combination of URL state flags:

[Screenshot: filter results on URL state flags information]

This help page is maintained by

As one of the lead developers, his hands have touched most of the code in the software from Microsys.

If you email any questions, chances are that he will be the one answering them.
 © Copyright 1997-2018 Microsys