Complete Guide on Website Audits with A1 Website Analyzer

Analyzing your own website is often the first thing to do in any SEO audit. Here is a step-by-step guide on using A1 Website Analyzer for this.

To see all the options available, you will have to switch off Simplified easy mode.

With options that use a dropdown list, any [+] or [-] button next to it adds or removes items in the list itself.

Note: We have a video tutorial covering this guide.
Getting Started - Scan Website

The first screen you see is where you can type in the website address and start the crawl:

finding and configuring scan options

By default, most of the advanced options are hidden, and the software will use its default settings.

However, if you want to change the settings, e.g. to collect more data or to increase the crawl speed by raising the maximum number of connections, you can make all the options visible by switching off Simplified easy mode.

In the screenshot below, we have turned up worker threads and simultaneous connections to the max:

finding and configuring scan options


After Scan - Controlling Visible Data Columns

Before we do further post-scan analysis of the website, we need to know how to show and hide data columns, since seeing them all at once can be a little overwhelming.

The image below shows where you can hide and show columns.

controlling visible data columns


Discover Internal Linking Errors

When checking for errors on a new website, it is often fastest to use quick filters. In the following example, we are using the option Only show URLs with filter-text found in "response code" columns, combined with "404" as the filter text, and clicking the filtering icon.

By doing the above, we get a list of URLs with response code 404 as shown here:

inspect internal linking

To the right, you can see details of how the 404 URL selected to the left was discovered. You can see all URLs that linked to, used (usually via the src attribute in HTML tags) or redirected to the URL.

Note: To also have external links checked, enable these options:
  • Scan website | Data collection | Store found external links option
  • Scan website | Crawler options | Verify external URLs exist (and analyze if applicable)

If you want to use this for exports (explained further below), you can also enable columns that will show you the most important internal backlinks and anchor texts.

exporting internal linking data


See Line Numbers, Anchor Texts and Follow / Nofollow for All Links

For all links found in the website scanned, it is possible to see the following information:
  • The line number in the page source where the link resides.
  • The anchor text associated with the link.
  • Whether the link is follow or nofollow.

extended link information
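
If you want to understand how this kind of link data can be derived from page source, here is a minimal sketch using only Python's standard library. It illustrates the concept; it is not the parsing code the software itself uses.

  from html.parser import HTMLParser

  class LinkAuditParser(HTMLParser):
      # Collects (line number, href, nofollow, anchor text) for each <a> tag.
      def __init__(self):
          super().__init__()
          self.links = []
          self._open = None  # the link currently being read

      def handle_starttag(self, tag, attrs):
          if tag == "a":
              attrs = dict(attrs)
              line, _ = self.getpos()  # line number in the page source
              nofollow = "nofollow" in (attrs.get("rel") or "")
              self._open = [line, attrs.get("href", ""), nofollow, ""]

      def handle_data(self, data):
          if self._open is not None:
              self._open[3] += data  # accumulate the anchor text

      def handle_endtag(self, tag):
          if tag == "a" and self._open is not None:
              self.links.append(tuple(self._open))
              self._open = None

  parser = LinkAuditParser()
  parser.feed('<p>See <a href="/page" rel="nofollow">our page</a>.</p>')
  print(parser.links)  # [(1, '/page', True, 'our page')]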

To ensure nofollow links are included during website crawling, uncheck the following options in Scan website | Webmaster filters:
  • Obey meta tag "robots" nofollow
  • Obey a tag "rel" nofollow


See Which Images Are Referenced Without "alt" Text

When using images in websites, it is often an advantage to use markup that describes them, i.e. use the alt attribute in the <img> HTML tag.

A1 Website Analyzer has a built-in report called Show only images where some "linked-by" or "used-by" miss anchors / "alt". This report will list all images that are:
  • Used without an alternative text
  • Linked without an anchor text.

It achieves this by:
  • Only showing relevant data columns.
  • Enabling the filter Only show URLs that are images.
  • Enabling the filter Only show URLs where "linked-by" or "used-by" miss anchors or "alt".

When viewing results, the extended details show where each image is referenced without one of the above-mentioned text types. In the screenshot below, we are inspecting the used by data, which originates from sources like <img src="example.png" alt="example">.

images with missing alternative text
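
As a companion to the report, here is a small sketch (again standard-library Python, not the software's internals) of how <img> tags missing a usable alt attribute can be flagged:

  from html.parser import HTMLParser

  class MissingAltParser(HTMLParser):
      # Collects (line number, src) for images without a non-empty alt text.
      def __init__(self):
          super().__init__()
          self.missing = []

      def handle_starttag(self, tag, attrs):
          if tag == "img":
              attrs = dict(attrs)
              if not (attrs.get("alt") or "").strip():
                  line, _ = self.getpos()
                  self.missing.append((line, attrs.get("src", "")))

  parser = MissingAltParser()
  parser.feed('<img src="a.png" alt="logo"><img src="b.png">')
  print(parser.missing)  # [(1, 'b.png')]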


Understand Internal Navigation and Link Importance

Understanding how your internal website structure helps search engines and humans find your content can be very useful.

Humans


To see how many clicks it takes for a human to reach a specific page from the front page, use the data column clicks to navigate.

Search engines


While PageRank sculpting is mostly a thing of the past, your internal links and the link juice passed around still help search engines understand which content and pages you consider to be the most important within your website.

Our software automatically calculates importance scores for all URLs using these steps:
  1. More weight is given to links found on pages with many incoming links.
  2. The link juice a page can pass on will be shared among its outgoing links.
  3. The scores are converted to a logarithmic base and scaled to 0...10.
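
To make the steps concrete, here is a simplified PageRank-style sketch in Python. The damping factor, iteration count and dangling-page handling are illustrative assumptions; the software's exact algorithm and constants are not shown here.

  import math

  def importance_scores(links, damping=0.85, iterations=50):
      # links maps each page to the list of pages it links out to.
      pages = set(links) | {p for targets in links.values() for p in targets}
      score = {p: 1.0 / len(pages) for p in pages}
      for _ in range(iterations):
          new = {p: (1.0 - damping) / len(pages) for p in pages}
          for page, targets in links.items():
              # Step 2: a page's link juice is shared among its outgoing links.
              for target in targets:
                  # Step 1: links on pages with many incoming links weigh more.
                  new[target] += damping * score[page] / len(targets)
          score = new
      # Step 3: convert to a logarithmic base and scale to 0...10.
      lo = math.log(min(score.values()))
      hi = math.log(max(score.values()))
      return {p: 10 * (math.log(s) - lo) / ((hi - lo) or 1)
              for p, s in score.items()}

  site = {"/": ["/a", "/b"], "/a": ["/"], "/b": ["/", "/a"]}
  print(importance_scores(site))  # "/" ends up with the highest score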

internal linking and link juice

You can affect the algorithm through the menu options Options | Program options | URL importance algorithm:
  • Links "reduce" - weigh repeated links on the same page less and less,
  • Links "noself" - ignore links going to the same page as the link is located at.

To include nofollow links (which are given significantly lower weight than follow links) un-check these options in Scan website | Webmaster filters:
  • Obey meta tag "robots" nofollow
  • Obey a tag "rel" nofollow


See All Redirects, Canonical, NoIndex and Similar

It is possible to see site-wide information on which URLs and pages are:
  • HTTP Redirects.
  • Meta refresh redirects.
  • Excluded by robots.txt.
  • Marked as canonical pointing to itself, canonical pointing to a URL other than itself, noindex or nofollow, noarchive, nosnippet.
  • Duplicates of some sort, e.g. index or missing slash page URLs.
  • And more.

The above data is mainly retrieved from meta tags, HTTP headers and program analysis of URLs.

To see all the data, finish the website scan and enable visibility of these columns:
  • Core data | Path
  • Core data | Response code
  • Core data | URL content state flags detected
  • URL references | Redirects to path
  • URL references | Redirects to response code
    (This in particular is useful in making sure your redirect destinations are set up correctly.)

canonical and similar information

Notice that in the above screenshot we have switched off tree view and instead see all URLs in list view mode.

To only list URLs with the specific state canonical, set the quick filter text to "canonical" and use the quick filter option Filter on URL state flags column.


Check for Duplicate Content


Duplicate Page Titles, Headers etc.


It is generally a bad idea to have multiple pages share the same title, headers or description. To find such pages, you can use the quick filter feature after the initial website crawl has finished.

In the screenshot below, we have limited our quick filter to only show pages with duplicate titles that also contain the string "the game" in one of their data columns.

page titles and duplicate content

Duplicate Page Content


Some tools can perform a simple MD5 hash check of all pages in a website. However, that will only tell you about pages that are 100% identical, which is not very likely on most websites.
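
The difference is easy to demonstrate. Below is a sketch contrasting an exact MD5 check with a simple word-overlap (Jaccard) similarity; the latter only illustrates the general idea of similarity grouping and is not the measure the software uses.

  import hashlib

  def md5_duplicates(pages):
      # Only catches byte-for-byte identical content.
      seen = {}
      for url, text in pages.items():
          digest = hashlib.md5(text.encode("utf-8")).hexdigest()
          seen.setdefault(digest, []).append(url)
      return [urls for urls in seen.values() if len(urls) > 1]

  def jaccard(a, b):
      # Word-set overlap: 1.0 = same vocabulary, 0.0 = nothing shared.
      wa, wb = set(a.lower().split()), set(b.lower().split())
      return len(wa & wb) / len(wa | wb)

  pages = {"/a": "the game review 2016", "/b": "the game review for 2016"}
  print(md5_duplicates(pages))              # [] - not byte-identical
  print(jaccard(pages["/a"], pages["/b"]))  # 0.8 - clearly near-duplicates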

Instead, A1 Website Analyzer can sort and group pages with similar content. In addition, you can see a visual representation of the most prominent page elements. Together, this makes a useful combination for finding pages that may have duplicate content. To use this:
  • Enable option Scan website | Data collection | Perform keyword density analysis of all pages before you scan the website.
  • Enable visibility of data column Page Content Similarity.

pages with similar content are grouped together

Before starting a site scan, you can increase the accuracy by setting the following options in Analyze Website | Keyword analysis:
  • Set Select stop words to match the main language of your website or select auto if it uses multiple languages.
  • Set Stop words usage to Removed from content.
  • Set Site analysis | Max words in phrase to 2.
  • Set Site analysis | Max results per count type to a higher value than the default, e.g. 40.

Note: If you use multiple languages in your website, read this about how page language detection works in A1 Website Analyzer.

Duplicate URLs


Many websites contain pages that can be accessed from multiple unique URLs. Such URLs should redirect or otherwise point search engines to the primary source. If you enable visibility of the data column Crawler flags, you can see all page URLs that:
  • Explicitly redirect or point to other URLs using canonical, HTTP redirect or meta refresh.
  • Are similar to other URLs, e.g. example/dir/, example/dir and example/dir/index.html. For these, the primary and duplicate URLs are calculated and shown based on HTTP response codes and internal linking.
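
A hypothetical helper like the one sketched below (not the software's actual algorithm) shows how such URL variants can be reduced to a shared key for duplicate grouping:

  def duplicate_key(url):
      # Treat example/dir/, example/dir and example/dir/index.html as one URL.
      url = url.split("#")[0].rstrip("/")
      for index_page in ("/index.html", "/index.htm", "/default.asp"):
          if url.lower().endswith(index_page):
              url = url[: -len(index_page)]
      return url.lower()

  variants = ["http://example.com/dir/",
              "http://example.com/dir",
              "http://example.com/dir/index.html"]
  print({duplicate_key(u) for u in variants})  # one key for all three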


Optimize Pages for Better SEO Including Title Sizes

For those who want to do on-page SEO of all pages, there is a built-in report which will show you the most important data columns including:
  • Word count in page content.
  • Text versus code percentage.
  • Title and description length in characters.
  • Title and description length in pixels.
  • Internal linking and page scores.
  • Clicks on links required to reach a page from the domain root.

some of the most SEO relevant data columns

Note: It is possible to filter the data in various ways - e.g. so you only see pages where titles are too long to be shown in Google search results.
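
As a rough sketch of such a filter in Python - the 600 pixel cutoff and the average character width below are commonly cited approximations, not values taken from the software or from Google:

  AVERAGE_CHAR_PX = 8.5  # assumed average glyph width in result titles
  MAX_TITLE_PX = 600     # commonly cited truncation point in Google results

  def title_too_long(title):
      return len(title) * AVERAGE_CHAR_PX > MAX_TITLE_PX

  print(title_too_long("Short title"))                  # False
  print(title_too_long("A very long page title " * 4))  # True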


Custom Search Website for Text and Code

Before you start the initial website scan, you can configure various text/code patterns you want to search for as pages are analyzed and crawled.

You can configure this in Scan website | Data collection, and it is possible to use both pre-defined patterns and make your own. This can be very useful to see if e.g. Google Analytics has been installed correctly on all pages.

Notice that we have to name each of our search patterns so we can later distinguish among them.

In our screenshot, we have a pattern called ga_new that searches for Google Analytics using a regular expression. (If you do not know regular expressions, simply writing a snippet of the text or code you want to find will often work as well.)
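
For illustration, here is what such a pattern could look like in Python. This regular expression for a classic Universal Analytics tracking ID is an assumption, not the exact expression behind the ga_new preset:

  import re

  GA_PATTERN = re.compile(r"UA-\d{4,10}-\d{1,4}")  # e.g. UA-1234567-1

  page_source = "<script>ga('create', 'UA-1234567-1', 'auto');</script>"
  print(GA_PATTERN.findall(page_source))  # ['UA-1234567-1']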

When adding and removing patterns, be sure you have added/removed them from the dropdown list using the [+] and [-] buttons.

defining custom searches

After the website scan has finished, you will be able to see how many times each added search pattern was found across all pages.

custom search results


View The Most Important Keywords in All Website Content

It is possible to extract the top words of all pages during the site crawl.

To do so, tick option Scan website | Data collection | Perform keyword density analysis of all pages.

The algorithm that calculates keyword scores takes the following things into consideration:
  • It tries to detect the page language and apply the correct list of stop words.
  • The keyword density in the complete page text.
  • Text inside important HTML elements is given more weight than normal text.
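
A toy sketch of these ideas follows; the tiny stop word list and the title boost factor are assumptions chosen for illustration, not the software's real weights:

  from collections import Counter

  STOP_WORDS = {"the", "a", "and", "of", "to", "in"}

  def keyword_scores(body_text, title_text, title_boost=2.0):
      words = [w for w in body_text.lower().split() if w not in STOP_WORDS]
      counts = Counter(words)
      title_words = set(title_text.lower().split())
      # Density in the full text, boosted when the word also sits in the title.
      return {w: c / len(words) * (title_boost if w in title_words else 1.0)
              for w, c in counts.items()}

  print(keyword_scores("the game review and game tips", "Game Review"))
  # {'game': 1.0, 'review': 0.5, 'tips': 0.25}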

The scores you see are formatted in a way that is readable to humans, but which is also easy to do further analysis on with custom scripts and tools. (Which is useful if you want to export the data.)

site-wide content and keyword analysis

If you would rather get a detailed breakdown of keywords on single pages, you can get that as well:

keyword analysis of single pages

This is also where you can configure how keywords scores are calculated. To learn more about this, view the A1 Keyword Research help page about on-page analysis of keywords.


Spell Check Entire Websites

If you choose to do spell checking in Scan website | Data collection, you can also see the number of spelling errors for all pages after the crawl has finished.

To see the specific errors, you can view the source code of the page, followed by clicking Tools | Spell check document.

how spelling works overview

As can be seen, the dictionary files in A1 Website Analyzer cannot include everything, so you will often benefit from making a preliminary scan where you add common words specific to your website niche to its dictionary.
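
The same workflow can be mimicked outside the software with the third-party pyspellchecker package (an assumption for illustration; A1 Website Analyzer ships its own dictionary files):

  from spellchecker import SpellChecker

  spell = SpellChecker()
  words = "the microsys crawler scaned the site".split()
  print(spell.unknown(words))  # flags typos and unknown niche words alike

  # Teach the checker your niche vocabulary, mirroring the preliminary
  # scan where you add site-specific words to the dictionary:
  spell.word_frequency.load_words(["microsys"])
  print(spell.unknown(words))  # only genuine typos remain flagged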

add words to spelling dictionary


Validate HTML and CSS of All Pages

A1 Website Analyzer can use multiple HTML/CSS page checkers, including W3C/HTML, W3C/CSS, Tidy/HTML, CSE/HTML and CSE/CSS.

Since HTML/CSS validation can slow website crawls, these options are unchecked by default.

List of options used for HTML/CSS validation:
  • Scan website | Data collection | Enable HTML/CSS validation
  • General options | Tool paths | TIDY executable path
  • General options | Tool paths | CSE HTML Validator command line executable path
  • Scan website | Data collection | Validate HTML using W3C HTML Validator
  • Scan website | Data collection | Validate CSS using W3C CSS Validator
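
For context, the W3C checker can also be called directly over HTTP. The sketch below uses the third-party requests package against the public W3C Nu checker API; it illustrates the kind of check performed, not how the software integrates with the validators:

  import requests

  html = "<!DOCTYPE html><html><head><title>Test</title></head><body><p>Hi</p></body></html>"
  response = requests.post(
      "https://validator.w3.org/nu/?out=json",
      data=html.encode("utf-8"),
      headers={"Content-Type": "text/html; charset=utf-8",
               "User-Agent": "Mozilla/5.0 (compatible; audit-sketch)"},
  )
  for message in response.json().get("messages", []):
      print(message.get("type"), "-", message.get("message"))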

When you have finished a website scan with HTML/CSS validation enabled, the result will look similar to this:

HTML and CSS validation


Integration With Online Tools

To ease the day-to-day workflow, the software has a separate tab with various 3rd party online tools.

Depending on the URL selected and data available, you can select one of the online tools in the dropdown list, and the correct URL including query parameters will automatically be opened in an embedded browser.
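
Composing such a tool URL is straightforward; a small sketch (using the real W3C checker's doc parameter, with the selected URL as a placeholder example):

  from urllib.parse import quote

  selected = "http://example.com/page"  # the URL selected in the list
  tool_url = "https://validator.w3.org/nu/?doc=" + quote(selected, safe="")
  print(tool_url)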

various online tools available


Crawl Sites Using a Custom User Agent ID and Proxy

Sometimes it can be useful to hide the user agent ID and IP address used when crawling websites.

Possible reasons can be if a website:
  • Returns different content for crawlers than humans, i.e. website cloaking.
  • Uses IP address ranges to detect the visitor's country, followed by redirecting to another page/site.

You can configure these things in General options | Internet crawler.
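
The same two ideas look like this at the HTTP level, sketched with the requests package; the user agent string and the proxy address are placeholder assumptions:

  import requests

  response = requests.get(
      "http://example.com/",
      headers={"User-Agent": "Mozilla/5.0 (compatible; MyAuditBot/1.0)"},
      proxies={"http": "http://127.0.0.1:8080",    # placeholder proxy
               "https": "http://127.0.0.1:8080"},
  )
  print(response.status_code)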

configure proxy and user agent ID


Import List of URLs and Only Check Them

Sometimes it can be useful to import a list of URLs from either the same website or different domains. You can achieve this through the menu File | Import URLs from file.

When you import, all URLs will automatically be placed in either the internal or external tabs.

If most URLs are from one specific domain, those URLs will be placed in the "Sitemap" tab, and the rest will go into the "External" tab.

To start a website crawl from the imported URLs, you can either tick Scan website | Recrawl or Scan website | Recrawl (listed only) where the latter of the two options will avoid adding any new URLs to the analysis queue and results output.

To further limit the crawl to only some of the imported URLs in the "Sitemap" tab, select the URLs and click the limit analysis and output button.

limit crawl to imported URLs

To crawl the imported URLs in the "external" tab, tick options:
  • Scan website | Data collection | Store found external links option
  • Scan website | Crawler options | Verify external URLs exist (and analyze if applicable)


Export Data to CSV and Tools Like Excel

Generally speaking, you can export the content of any data control by focusing/clicking it, followed by using the File | Export... button.

The default is to export data as standard .CSV files, but in case the program you intend to import the .CSV files into has specific needs, or if you would like to include e.g. column headers, you can adjust the settings in the menu Options | Program options | Data import/export.

The data you will most often want to export is all the URLs + details found in the main URLs tree/list view.

To do this efficiently, simply adjust the visible columns and activate the quick filters you want (e.g. 404 errors, duplicate titles or similar) before exporting. (This is the best method for creating custom reports containing just the data you need.)
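
Once exported, the .CSV file is easy to post-process in scripts. A sketch using Python's csv module is shown below; the file name and column headers are assumptions and must match the columns you actually exported:

  import csv

  with open("exported-urls.csv", newline="", encoding="utf-8") as f:
      for row in csv.DictReader(f):
          if row.get("Response code") == "404":  # assumed header name
              print(row.get("Path"), "-", row.get("Linked by"))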

Alternatively, you can also use the built-in reporting button that contains various presets:

presets for common reports

Note: You can create many more data views if you learn how to configure filters and visible columns.


See URLs with Page and AJAX Fragments

Quick explanation of fragments in URLs:
  1. Page-relative-fragments: Relative links within a page:
    http://example.com/somepage#relative-page-link
  2. AJAX-fragments: client-side JavaScript that queries server-side code and replaces content in the browser:
    http://example.com/somepage#lookup-replace-data
  3. AJAX-fragments-Google-initiative: Part of the Google initiative Making AJAX Applications Crawlable:
    http://example.com/somepage#!lookup-replace-data
    This solution has since been deprecated by Google themselves.
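
The fragment kinds listed above can be told apart with Python's standard library, as in this small sketch:

  from urllib.parse import urldefrag

  for url in ["http://example.com/somepage#relative-page-link",
              "http://example.com/somepage#!lookup-replace-data"]:
      base, fragment = urldefrag(url)
      kind = ("crawlable #! (deprecated scheme)" if fragment.startswith("!")
              else "standard #")
      print(base, "->", kind)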

Before website scan:
  • Standard fragments # are stripped when using default settings. To change this, uncheck:
    • In Scan website | Crawler Options | Cutout "#" in internal links
    • In Scan website | Crawler Options | Cutout "#" in external links
  • Crawlable fragments #! are always included.

After website scan:
  • For an easy way to see all URLs with #, use the quick filter.
  • If you use #! for AJAX as suggested by Google, you can benefit from:
    1. Enable visibility of data column Core data | URL content state flags detected.
    2. Filter or search for the flags "[ajaxbyfragmentmeta]" and "[ajaxbyfragmenturl]".

show all URLs with fragments


Windows, Mac and Linux

A1 Website Analyzer is available as native software for Windows and Mac OS X.

The Windows installer automatically selects the best binary available depending on the Windows version used, e.g. 32 bit versus 64 bit.

On Linux, you can often run A1 Website Analyzer as well by using virtualization/emulation solutions such as WINE.


Free Trial, Price, Upgrades and Installation

To try the 30-day free trial, simply download and install. There are no artificial page or URL limits.

You can also buy now. The price is $69 USD, which includes:
  • All releases within version series 7.x.
  • There is no monthly or yearly subscription price.
  • You can continue to use the software for as long as you please.

Depending on when you purchase, it may also include access to the 8.x series if the first 8.x version is released within a year of your 7.x purchase. If not, there will be a discounted upgrade price available.

Note: If you already have e.g. version 7.0.0 installed, you can update to the newest release 7.7.0 simply by downloading and installing it.


Sibling Tools

If there is a feature A1 Website Analyzer does not have, chances are we have a sibling tool that does. Some common features for all our A1 tools are:
  • Similar user interface.
  • Can share project files and data.
  • Cross-sale discounts available during the purchase checkout process.

Sitemaps


While A1 Website Analyzer will not create sitemaps, its sibling tool A1 Sitemap Generator will. This includes XML sitemaps, video sitemaps, image sitemaps, HTML sitemaps and some other formats as well.