
TechSEO360 Guide on Website and SEO Audits

Analyzing your own website is often the first thing to do in any SEO audit. Here is a step-by-step guide.

Note: We have a video tutorial: Overview of TechSEO360



Getting Started - Scan Website

The first screen you see is where you can type in the website address and start the crawl:

finding and configuring scan options

By default, most of the advanced options are hidden, and the software will use its default settings.

However, if you want to change the settings, e.g. to collect more data or to speed up the crawl by raising the maximum number of connections, you can make all the options visible by switching off Simplified easy mode.

In the screenshot below, we have turned up worker threads and simultaneous connections to the max:

finding and configuring scan options


Quick Reports

This dropdown shows a list of predefined "quick reports" that can be used after scanning a website.

quick reports

These predefined reports configure the following options:
  1. Which data columns are visible.
  2. Which "quick filter options" are active.
  3. The "quick filter text" used.
  4. Whether quick filtering is activated.

You can also set all these manually to create your own custom reports. This guide includes various examples of this as you read through it.


Controlling Visible Data Columns

Before we do further post-scan analysis of the website, we need to know how to switch data columns on and off, since seeing them all at once can be a little overwhelming.

The image below shows where you can hide and show columns.

controlling visible data columns

You may also want to enable or disable the following options:
  • View | Allow big URL lists in data columns
  • View | Allow relative paths inside URL lists in data columns
  • View | Only show page URLs inside URL lists in data columns


Discover Internal Linking Errors

When checking for errors on a new website, it is often fastest to use quick filters. In the following example, we use the option Only show URLs with filter-text found in "response code" columns, combined with "404" as filter text, and click the filtering icon.

By doing the above, we get a list of URLs with response code 404, as shown here:

inspect internal linking

If you select a 404 URL on the left side, you can see details on the right of how and where it was discovered: all URLs that linked to, used (usually via the src attribute in HTML tags) or redirected to the 404 URL.

Note: To also have external links checked, enable these options:
  • Scan website | Data collection | Store found external links option
  • Scan website | Data collection | Verify external URLs exist (and analyze if applicable)

If you want to use this for exports (explained later), you can also enable columns that show the most important internal backlinks and anchor texts.

exporting internal linking data
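
If you want to sanity-check a handful of the reported URLs outside the program, a short script can fetch their status codes. The following is a minimal sketch using Python's requests library; the URL list is a made-up example:

  import requests

  # Made-up list of URLs copied from a 404 report
  urls = [
      "https://example.com/missing-page",
      "https://example.com/old-article",
  ]

  for url in urls:
      try:
          # A HEAD request is usually enough to read the status code
          response = requests.head(url, allow_redirects=True, timeout=10)
          print(url, response.status_code)
      except requests.RequestException as error:
          print(url, "request failed:", error)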


See Line Numbers, Anchor Texts and Follow / Nofollow for All Links

For all links found in the website scanned, it is possible to see the following information:
  • The line number in the page source where the link resides.
  • The anchor text associated with the link.
  • Whether the link is follow or nofollow.

extended link information

To ensure nofollow links are included during website crawling, uncheck the following options in Scan website | Webmaster filters:
  • Obey meta tag "robots" nofollow
  • Obey a tag "rel" nofollow


See Which Images Are Referenced Without "alt" Text

When using images in websites, it is often an advantage to use markup that describes them, i.e. use the alt attribute in the <img> HTML tag.

For this you can use the built-in report called Show only images where some "linked-by" or "used-by" miss anchors / "alt". This report will list all images that are:
  • Used without an alternative text.
  • Linked without an anchor text.

It achieves this by:
  • Only showing relevant data columns.
  • Enabling the filter: Only show URLs that are images.
  • Enabling the filter: Only show URLs where "linked-by" or "used-by" miss anchors or "alt".

When viewing the results, the extended details show where each image is referenced without one of the above-mentioned text types. In the screenshot below, we are inspecting the used by data, which originates from markup like <img src="example.webp" alt="example">.

images with missing alternative text


Understand Internal Navigation and Link Importance

Understanding how your internal website structure helps search engines and humans find your content can be very useful.

Humans


To see how many clicks it takes for a human to reach a specific page from the front page, use the data column clicks to navigate.

Search engines


While PageRank sculpting is mostly a thing of the past, your internal links and the link juice they pass around still help search engines understand which content and pages you consider the most important within your website.

Our software automatically calculates importance scores for all URLs using these steps:
  1. More weight is given to links found on pages with many incoming links.
  2. The link juice a page can pass on will be shared among its outgoing links.
  3. The scores are converted to a logarithmic base and scaled to 0...10.

internal linking and link juice
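
The calculation is conceptually related to a simplified PageRank. The sketch below shows that general idea in Python; it is not the exact formula used by TechSEO360, and the link graph is a made-up example:

  import math

  # Made-up internal link graph: page -> pages it links to
  links = {
      "/": ["/products", "/blog"],
      "/products": ["/", "/blog"],
      "/blog": ["/"],
  }

  pages = list(links)
  scores = {page: 1.0 / len(pages) for page in pages}
  damping = 0.85

  # Repeatedly share each page's "link juice" among its outgoing links
  for _ in range(30):
      new_scores = {page: (1 - damping) / len(pages) for page in pages}
      for page, outgoing in links.items():
          share = scores[page] / max(len(outgoing), 1)
          for target in outgoing:
              new_scores[target] += damping * share
      scores = new_scores

  # Convert to a logarithmic 0...10 scale, similar in spirit to the scaled score column
  top = max(scores.values())
  for page, score in scores.items():
      print(page, round(10 * math.log10(1 + 9 * score / top), 2))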

You can affect the algorithm through the menu options:
  • Tools | Importance algorithm option: Links "reduce": Gives repeated links on the same page less and less weight, and gives links placed further down in the content less and less weight.
  • Tools | Importance algorithm option: Links "noself": Ignore links going to the same page as the link is located at.

To include nofollow links (which are given significantly lower weight than follow links) uncheck these options in Scan website | Webmaster filters:
  • Obey meta tag "robots" nofollow
  • Obey a tag "rel" nofollow


See All Redirects, Canonical, NoIndex and Similar

It is possible to see site-wide information on which URLs and pages are:
  • HTTP Redirects.
  • Meta refresh redirects.
  • Excluded by robots.txt.
  • Marked canonical pointing to itself, canonical pointing to another URL than itself, noindex or nofollow, noarchive, nosnippet.
  • Duplicates of some sort, e.g. index or missing slash page URLs.
  • And more.

The above data is mainly retrieved from meta tags, HTTP headers and the program's analysis of URLs.

To see all the data, finish the website scan and enable visibility of these columns:
  • Core data | Path
  • Core data | Response code
  • Core data | URL content state flags detected
  • URL references | Redirects count
  • URL references | Redirects to path
  • URL references | Redirects to response code
  • URL references | Redirects to path (final)
  • URL references | Redirects to response code (final)
    (This in particular is useful for making sure your redirect destinations are set up correctly.)

canonical and similar information

Notice that in the above screenshot we have switched off tree view and instead show all URLs in list view mode.

To set up a comprehensive filter that shows all pages that redirect in any way:
  1. First enable options:
    • View | Data filter options | Only show URLs with all [filter-text] found in "URL state flags" column
    • View | Data filter options | Only show URLs with any filter-text-number found in "response code" column
    • View | Data filter options | Only show URLs that are pages
  2. After that use the following as the quick filter text:
    [httpredirect|canonicalredirect|metarefreshredirect] -[noindex] 200 301 302 307

This configures the filters so URLs are only shown if they match the following conditions:
  • The URL has to be a page - it can not, for example, be an image.
  • The URL has to point to another page via HTTP redirect, meta refresh or canonical.
  • The URL can not contain a noindex instruction.
  • The URL HTTP response code has to be either 200, 301, 302 or 307.
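
If you later export this data, the same conditions can be reproduced in a small script. The following is a minimal sketch in Python; the file name and column names are hypothetical and should be matched to your actual export:

  import csv

  # Hypothetical export file and column names - adjust them to match your own export
  with open("techseo360-export.csv", newline="", encoding="utf-8") as handle:
      for row in csv.DictReader(handle):
          flags = row.get("URL content state flags detected", "")
          code = row.get("Response code", "")
          redirecting = any(flag in flags for flag in
                            ("httpredirect", "canonicalredirect", "metarefreshredirect"))
          # The "URL has to be a page" condition is left out here for brevity
          if redirecting and "noindex" not in flags and code in {"200", "301", "302", "307"}:
              print(row.get("Path", ""), code, flags)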


Check for Duplicate Content


Duplicate Page Titles, Headers etc.


It is generally a bad idea to have multiple pages share the same title, headers and descriptions. To find such pages, you can use the quick filter feature after the initial website crawl has finished.

In the screenshot below, we have limited our quick filter to only show pages with duplicate titles that also contain the string "the game" in one of their data columns.

page titles and duplicate content

Duplicate Page Content


Some tools can perform a simple MD5 hash check of all pages in a website. However, that will only identify pages that are 100% identical, which is unlikely on most websites.

Instead, TechSEO360 can sort and group pages with similar content. In addition, you can see a visual representation of the most prominent page elements. Together, this makes a useful combination for finding pages that may have duplicate content. To use this:
  • Enable option Scan website | Data collection | Perform keyword density analysis of all pages before you scan the website.
  • Enable visibility of data column Page Content Similarity.

pages with similar content are grouped together
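
To illustrate why exact hashing falls short, here is a minimal sketch in Python, unrelated to TechSEO360's internal algorithm, that compares an MD5 check with a simple word-shingle similarity:

  import hashlib

  page_a = "Buy the best running shoes in our online store today"
  page_b = "Buy the best running shoes in our online shop today"

  # MD5 only detects pages that are 100% identical
  print(hashlib.md5(page_a.encode()).hexdigest() == hashlib.md5(page_b.encode()).hexdigest())

  def shingles(text, size=3):
      words = text.lower().split()
      return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

  # A Jaccard similarity over word shingles also catches near-duplicates
  a, b = shingles(page_a), shingles(page_b)
  print(round(len(a & b) / len(a | b), 2))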

Before starting a site scan, you can increase the accuracy by setting the following options in Analyze Website | Keyword analysis:
  • Set Select stop words to match the main language of your website or select auto if it uses multiple languages.
  • Set Stop words usage to Removed from content.
  • Set Site analysis | Max words in phrase to 2.
  • Set Site analysis | Max results per count type to a higher value than the default, e.g. 40.

Note: If your website uses multiple languages, read about how page language detection works in TechSEO360.

Duplicate URLs


Many websites contain pages that can be accessed from multiple unique URLs. Such URLs should redirect or otherwise point search engines to the primary source. If you enable visibility of the data column Crawler flags, you can see all page URLs that:
  • Explicitly redirect or point to other URLs using canonical, HTTP redirect or meta refresh.
  • Are similar to other URLs, e.g. example/dir/, example/dir and example/dir/index.html. For these, the primary and duplicate URLs are calculated and shown based on HTTP response codes and internal linking.
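
As an illustration of the kind of URL variants involved, here is a minimal sketch in Python, assuming a simple rule set, that collapses common duplicate URL forms into one:

  from urllib.parse import urlsplit, urlunsplit

  def normalize(url):
      # Treat /dir, /dir/ and /dir/index.html as the same resource
      scheme, host, path, query, _fragment = urlsplit(url)
      if path.endswith("/index.html"):
          path = path[: -len("index.html")]
      if not path.endswith("/"):
          path += "/"
      return urlunsplit((scheme, host.lower(), path, query, ""))

  for variant in ("https://example.com/dir",
                  "https://example.com/dir/",
                  "https://example.com/dir/index.html"):
      print(normalize(variant))  # All three print the same URL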


Optimize Pages for Better SEO Including Title Sizes

For those who want to do on-page SEO of all pages, there is a built-in report which will show you the most important data columns including:
  • Word count in page content.
  • Text versus code percentage.
  • Title and description length in characters.
  • Title and description length in pixels.
  • Internal linking and page scores.
  • Clicks on links required to reach a page from the domain root.

some of the most SEO relevant data columns

Note: It is possible to filter the data in various ways - e.g. so you only see pages where titles are too long to be shown in search results.
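
As a rough illustration of such a filter, the sketch below flags titles that exceed a commonly cited character guideline. The threshold and sample data are assumptions for the example; TechSEO360's pixel-based columns are more precise:

  # Hypothetical (URL, title) pairs taken from an export
  pages = [
      ("/shoes", "Running shoes"),
      ("/sale", "Huge end-of-season sale on running shoes, hiking boots and sandals for the whole family"),
  ]

  MAX_TITLE_CHARS = 60  # Rough guideline; search engines actually truncate by pixel width

  for url, title in pages:
      if len(title) > MAX_TITLE_CHARS:
          print(f"{url}: title is {len(title)} characters and may be truncated in search results")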


Custom Search Website for Text and Code

Before you start the initial website scan, you can configure various text/code patterns you want to search for as pages are analyzed and crawled.

You can configure this in Scan website | Data collection, and it is possible to use both pre-defined patterns and make your own. This can be very useful to see if e.g. Google Analytics has been installed correctly on all pages.

Notice that we have to name each of our search patterns, so we can later distinguish among them.

In our screenshot, we have a pattern called ga_new that searches for Google Analytics using a regular expression. (If you do not know regular expressions, simply writing a snippet of the text or code you want to find will often work as well.)

When adding and removing patterns, be sure you have added/removed them from the dropdown list using the [+] and [-] buttons.

custom searches

After the website scan has finished, you will be able to see how many times each added search pattern was found on all pages.

custom search results
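
For those curious what such a regular expression might look like, here is a small Python example that matches a classic Google Analytics tracking ID. It is only an illustration, not the exact expression behind the ga_new pattern shown above:

  import re

  # Example pattern for a classic "UA-XXXXXXX-X" Google Analytics tracking ID
  pattern = re.compile(r"UA-\d{4,10}-\d{1,4}")

  html = '<script>ga("create", "UA-1234567-1", "auto");</script>'
  print(pattern.findall(html))  # ['UA-1234567-1']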


View The Most Important Keywords in All Website Content

It is possible to extract the top words of all pages during site crawl.

To do so, tick option Scan website | Data collection | Perform keyword density analysis of all pages.

The algorithm that calculates keyword scores takes the following things into consideration:
  • Tries to detect language and apply the correct list of stop words.
  • The keyword density in the complete page text.
  • Text inside important HTML elements is given more weight than normal text.
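
The sketch below shows the general keyword density idea in Python, assuming a tiny stop word list and a simple weighting bonus for title text; it is not the exact scoring used by TechSEO360:

  from collections import Counter
  import re

  STOP_WORDS = {"the", "and", "in", "for", "of", "a", "to", "on", "our", "are"}  # Example stop words

  def keyword_density(title, body, title_weight=3):
      # Count words, weighting title words higher, and return density percentages
      counts = Counter()
      for text, weight in ((title, title_weight), (body, 1)):
          for word in re.findall(r"[a-z0-9']+", text.lower()):
              if word not in STOP_WORDS:
                  counts[word] += weight
      total = sum(counts.values())
      return {word: round(100 * count / total, 1) for word, count in counts.most_common(5)}

  print(keyword_density("Trail running shoes", "Our trail shoes are built for running on rough trails."))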

The scores you see are formatted in a way that is readable to humans but also easy for custom scripts and tools to analyze further. (This is useful if you want to export the data.)

site-wide content and keyword analysis

If you would rather get a detailed breakdown of keywords on single pages, you can get that as well:

keyword analysis of single pages

This is also where you can configure how keywords scores are calculated. To learn more about this, view the A1 Keyword Research help page about on-page analysis of keywords.


Generate and Manage Keyword Lists

If you ever need to create or maintain keyword lists, TechSEO360 includes powerful built-in keyword tools you can use to generate, combine and clean keyword lists.

keyword list tools


Spell Check Entire Websites

If you choose to do spell checking in Scan website | Data collection, you can also see the number of spelling errors for all pages after the crawl has finished.

To see the specific errors, you can view the source code of the page followed by clicking Tools | Spell check document.

how spelling works overview

As can be seen, the dictionary files can not include everything, so you will often benefit from making a preliminary scan where you add common words specific to your website niche to the dictionary.

add words to spelling dictionary


Validate HTML and CSS of All Pages

TechSEO360 can use multiple different HTML/CSS page checkers including W3C/HTML, W3C/CSS, Tidy/HTML, CSE/HTML and CSE/CSS.

Since HTML/CSS validation can slow website crawls, these options are unchecked by default.

List of options used for HTML/CSS validation:
  • Scan website | Data collection | Enable HTML/CSS validation
  • General options and tools | Tool paths | TIDY executable path
  • General options and tools | Tool paths | CSE HTML Validator command line executable path
  • Scan website | Data collection | Validate HTML using W3C HTML Validator
  • Scan website | Data collection | Validate CSS using W3C CSS Validator

When you have finished a website scan with HTML/CSS validation enabled, the result will look similar to this:

HTML and CSS validation
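
For single pages, the W3C checkers can also be called directly over HTTP. The following is a minimal sketch in Python against the W3C Nu HTML Checker; TechSEO360 automates this kind of check site-wide when the options above are enabled:

  import requests

  # Fetch a page and send its markup to the W3C Nu HTML Checker
  html = requests.get("https://example.com/", timeout=10).text

  result = requests.post(
      "https://validator.w3.org/nu/?out=json",
      data=html.encode("utf-8"),
      headers={"Content-Type": "text/html; charset=utf-8"},
      timeout=30,
  )

  for message in result.json().get("messages", []):
      print(message.get("type"), message.get("message"))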


Integration With Online Tools

To ease the day-to-day workflow, the software has a separate tab with various 3rd party online tools.

Depending on the URL selected and data available, you can select one of the online tools in the drop down list, and the correct URL including query parameters will automatically be opened in an embedded browser.

various online tools available


Crawl Sites Using a Custom User Agent ID and Proxy

Sometimes it can be useful to hide the user agent ID and IP address used when crawling websites.

Possible reasons can be if a website:
  • Returns different content for crawlers than for humans, i.e. website cloaking.
  • Uses IP address ranges to detect country, followed by redirecting to another page/site.

You can configure these things in General options and tools | Internet crawler.

configure proxy and user agent ID
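
For comparison, here is the same idea in a small Python script; the user agent string and proxy address below are example placeholders:

  import requests

  # Example placeholders - use your own user agent string and proxy endpoint
  headers = {"User-Agent": "Mozilla/5.0 (compatible; MyCrawler/1.0)"}
  proxies = {"http": "http://127.0.0.1:8080", "https": "http://127.0.0.1:8080"}

  response = requests.get("https://example.com/", headers=headers, proxies=proxies, timeout=10)
  print(response.status_code)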


Import Data from 3rd Party Services

You can import URLs and extended data from 3rd parties through the menu File | Import URLs from text/log/csv.

Depending on what you import, all the URLs will be placed in either the internal or external tabs.

Importing can be used for both adding more information to existing crawl data and for seeding new crawls.

Additional data is imported when the source data originates from:
  • Apache server logs (see the sketch after this list):
    • Which pages have been accessed by GoogleBot. This is shown by [googlebot] in data column URL Flags.
    • Which URLs are not internally linked or used. This is shown by [orphan] in data column URL Flags.
  • Google Search Console CSV exports:
    • Which pages are indexed by Google. This is shown by [googleindexed] in data column URL Flags.
    • Clicks of each URL in Google Search Results - this is shown in data column Clicks.
    • Impressions of each URL in Google Search Results - this is shown in data column Impressions.
  • Majestic CSV exports:
    • Link score of all URLs - this is shown in data column Backlinks score. When available the data is used to further improve calculations behind the data columns Importance score calculated and Importance score scaled.
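
As an example of the kind of information hiding in server logs, here is a minimal sketch in Python that pulls Googlebot hits out of an Apache "combined" format access log; the log line is a made-up example:

  import re

  log_line = ('66.249.66.1 - - [10/Jan/2024:13:55:36 +0000] "GET /products HTTP/1.1" '
              '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

  match = re.match(r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
                   r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"', log_line)

  if match and "Googlebot" in match.group("agent"):
      # A real check should also verify the IP via reverse DNS, since user agents can be spoofed
      print("Googlebot requested", match.group("path"), "status", match.group("status"))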

To start a website crawl from the imported URLs you can:
  • Check option Scan website | Recrawl.
  • Check option Scan website | Recrawl (listed only) - this will avoid adding any new URLs to the analysis queue and results output.

To crawl the imported URLs in the external tab, tick options:
  • Scan website | Data collection | Store found external links option
  • Scan website | Data collection | Verify external URLs exist (and analyze if applicable)



Export Data to HTML, CSV and Tools Like Excel

Generally speaking, you can export the content of any data control by focusing/clicking it followed by using the File | Export selected data to file... or File | Export selected data to clipboard... menu items.

The default is to export data as standard .CSV files, but in case the program you intend to import the .CSV files into has specific needs, or if you would like to have e.g. column headers listed as well, you can adjust the settings in the menu File | Export and import options.

The data you will usually export is found in the main view. Here you can create custom exports and reports that contain just the information you need. Just select which columns are visible and activate the quick filters you want (e.g. only 404 not found errors, duplicate titles or similar) before exporting.
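
As a small example of further processing, the sketch below loads an export into pandas and lists duplicate titles; the file name and column names are hypothetical and should be matched to your actual export:

  import pandas as pd

  # Hypothetical export file and column names - adjust them to match your own export
  data = pd.read_csv("techseo360-export.csv")

  # List titles that occur on more than one page
  duplicates = data[data.duplicated(subset=["Title"], keep=False)]
  print(duplicates[["Path", "Title"]].sort_values("Title"))

  # The same frame can be written straight to an Excel workbook
  duplicates.to_excel("duplicate-titles.xlsx", index=False)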

Alternatively, you can also use the built-in reporting buttons that contain various configuration presets:

presets for common reports

Note: You can create many more data views if you learn how to configure filters and visible columns.


How to Create Sitemaps

Use the Quick presets... button to optimize your crawl - e.g. to create a video sitemap for a website with externally hosted videos.

Afterwards, just click the Start scan button to initiate a website crawl.

create sitemap scan website

When the website scan has finished, pick the sitemap file kind you want to create and click the Build selected button.

build website xml sitemap

You can find a complete list of tutorials for each sitemap file kind in our online help.
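
For reference, an XML sitemap is just a small XML document. The sketch below generates one by hand in Python; TechSEO360's Build selected step produces this for you from the crawl data, so this is only to show the format:

  import xml.etree.ElementTree as ET

  urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")

  for loc, lastmod in [("https://example.com/", "2024-01-10"),
                       ("https://example.com/products", "2024-01-08")]:
      url = ET.SubElement(urlset, "url")
      ET.SubElement(url, "loc").text = loc
      ET.SubElement(url, "lastmod").text = lastmod

  ET.ElementTree(urlset).write("sitemap.xml", encoding="utf-8", xml_declaration=True)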


See URLs with AJAX Fragments and Content

Quick explanation of fragments in URLs:
  1. Page-relative-fragments: Relative links within a page:
    http://example.com/somepage#relative-page-link
  2. AJAX-fragments: client-side JavaScript that queries server-side code and replaces content in the browser:
    http://example.com/somepage#lookup-replace-data
    http://example.com/somepage#!lookup-replace-data
  3. AJAX-fragments-Google-initiative: Part of the Google initiative Making AJAX Applications Crawlable:
    http://example.com/somepage#!lookup-replace-data
    This solution has since been deprecated by Google themselves.
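
To make the distinction concrete, here is a minimal sketch of how a crawler might separate the fragment from the rest of a URL in Python:

  from urllib.parse import urldefrag

  for url in ("http://example.com/somepage#relative-page-link",
              "http://example.com/somepage#!lookup-replace-data"):
      base, fragment = urldefrag(url)
      kind = "hashbang (AJAX-style)" if fragment.startswith("!") else "plain fragment"
      print(base, "->", kind, fragment)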

Before website scan:
  • Hash fragments # are stripped when using default settings. To change this, uncheck:
    • In Scan website | Crawler Options | Cutout "#" in internal links
    • In Scan website | Crawler Options | Cutout "#" in external links
  • Hashbang fragments #! are always kept and included.
  • If you want to analyze AJAX content fetched immediately after the initial page load:
    • Windows: In Scan website | Crawler engine select HTTP using WinInet + IE browser
    • Mac: In Scan website | Crawler engine select HTTP using Mac OS API + browser

After website scan:
  • For an easy way to see all URLs with #, use the quick filter.
  • If you use #! for AJAX URLs, you can benefit from:
    1. Enable visibility of data column Core data | URL content state flags detected.
    2. Filter or search for the flags "[ajaxbyfragmentmeta]" and "[ajaxbyfragmenturl]".

show all URLs with fragments


Windows, Mac and Linux

TechSEO360 is available as native software for Windows and Mac.

The Windows installer automatically selects the best binary available depending on the Windows version used, e.g. 32 bit versus 64 bit.

On Linux, you can often instead use virtualization and emulation solutions such as WINE.

