Microsys
  

Language Detection, Similar Content and "hreflang" in XML Sitemaps

Correctly identifying the language a page uses is useful for many things, including creating XML sitemaps with hreflang information.

Page Language Detection in TechSEO360

Our software determines the primary page language by checking the following things:
  1. The webserver response is checked for a Content-Language HTTP response header:
    • PHP pages: Insert this code: <?php header("Content-Language: en"); ?>

  2. The page is checked for content-language META tag:
    <meta http-equiv="content-language" content="en">

  3. The page is checked for lang inside the HTML tag:
    <html lang="en">

  4. The page is searched for the Open Graph Protocol property og:locale inside META tags.

  5. The page is checked for xml:lang inside the HTML tag:
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

  6. The page URL is checked for common language/culture and country codes.

    Note: This requires enabling the option Scan website | Data collection | Inspect URLs to detect language.

  7. Planned: Compare content against word lists for each language. Select best match.
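
The checks in steps 1 to 5 above can be sketched in Python. This is only an illustration, not the TechSEO360 implementation: the function and class names are hypothetical, and it assumes the first hint found wins, which this page does not state explicitly.

```python
from html.parser import HTMLParser


class _LangHints(HTMLParser):
    """Collects language hints from <html> and <meta> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hints = {}

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        if tag == "html":
            if a.get("lang"):
                self.hints.setdefault("html_lang", a["lang"])
            if a.get("xml:lang"):
                self.hints.setdefault("xml_lang", a["xml:lang"])
        elif tag == "meta":
            if a.get("http-equiv", "").lower() == "content-language":
                self.hints.setdefault("meta", a.get("content", ""))
            elif a.get("property", "").lower() == "og:locale":
                self.hints.setdefault("og_locale", a.get("content", ""))


def detect_language(headers, html):
    """Return (language, source) for the first hint found, else (None, None).

    Assumed order: HTTP header, META tag, html lang, og:locale, xml:lang.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    header_lang = lowered.get("content-language", "").strip()
    if header_lang:
        return header_lang, "http-header"
    parser = _LangHints()
    parser.feed(html)
    for source in ("meta", "html_lang", "og_locale", "xml_lang"):
        value = parser.hints.get(source, "").strip()
        if value:
            return value, source
    return None, None
```

For example, calling detect_language({"Content-Language": "en"}, page_html) returns the header value without looking at the markup, while an empty headers dictionary falls through to the tags in the page.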



XML Sitemaps and Hreflang Information

XML sitemaps allow each page URL to include a list of URLs with its alternate language versions.
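
As an illustration of what such a sitemap looks like, the Python sketch below writes one url entry per language version and lists every alternate (including the page itself) as an xhtml:link element. It is not the code TechSEO360 uses: the function name and the input shape (a list of dictionaries mapping hreflang code to URL) are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(groups):
    """Build sitemap XML from groups of {hreflang code: URL} dictionaries."""
    # Register prefixes so the output uses xmlns="..." and xmlns:xhtml="...".
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for group in groups:
        for url in group.values():
            entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
            # Each language version lists all versions in its group.
            for code, alternate in group.items():
                ET.SubElement(
                    entry,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": code, "href": alternate},
                )
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_hreflang_sitemap(
    [{"en": "http://example.com/en/", "de": "http://example.com/de/"}]
)
```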

To enable this feature in TechSEO360, check the option Create sitemap | Include "hreflang" alternate URLs in sitemap files before you build your XML sitemap.


When generating the XML sitemaps, page URLs that are variations of each other in different languages get associated by:
  • HTML page markup including <link rel="alternate" href="http://example.com" hreflang="xxx" />.
  • Page language detection based on a variety of methods.
  • URL page similarity between different language variations.
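
One way to picture the URL-based association is the Python sketch below. It is a simplified assumption, not the similarity measure TechSEO360 actually uses (which this page does not describe): it removes a known language code from the URL path and groups URLs whose remaining parts are identical. The code list and function names are hypothetical.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical short list; a real list of language/culture codes is much longer.
KNOWN_CODES = {"en", "de", "fr", "es", "da", "en-us", "en-gb"}


def split_language(url):
    """Return (language or None, URL key with the language segment removed)."""
    parts = urlsplit(url)
    language = None
    kept = []
    for segment in parts.path.split("/"):
        if language is None and segment.lower() in KNOWN_CODES:
            language = segment.lower()
        else:
            kept.append(segment)
    return language, parts.netloc + "/".join(kept)


def group_language_variants(urls):
    """Group URLs that differ only by a language code in the path."""
    groups = defaultdict(dict)
    for url in urls:
        language, key = split_language(url)
        if language:
            groups[key][language] = url
    # Only groups with more than one language version are alternates.
    return [group for group in groups.values() if len(group) > 1]
```

Each returned group maps a language code to the URL of that language version, which is the information an hreflang entry needs.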


How Language Information Is Used in News Sitemaps

When generating Google News Sitemaps, one of the XML fields for each news and article URL is the language used on the page.
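
The Python sketch below shows where the language ends up in a news sitemap entry: the news:language element inside news:publication. The function name and the sample values are made up for illustration, and this is not the code TechSEO360 uses.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_url(loc, publication, language, published, title):
    """Return one <url> element carrying Google News metadata."""
    url = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
    news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
    pub = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
    ET.SubElement(pub, f"{{{NEWS_NS}}}name").text = publication
    # The detected page language goes here.
    ET.SubElement(pub, f"{{{NEWS_NS}}}language").text = language
    ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = published
    ET.SubElement(news, f"{{{NEWS_NS}}}title").text = title
    return url


ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("news", NEWS_NS)
entry = build_news_url(
    "http://example.com/en/article.html",  # made-up sample values
    "Example Times",
    "en",
    "2018-01-31",
    "Example headline",
)
xml_text = ET.tostring(entry, encoding="unicode")
```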

Options you can set that will help the crawler:
  • Set the option Select stop words to match the main language of your website, or select auto if the website uses multiple languages.


Duplicate Content and Similarity Analysis

When the language is detected correctly, the crawler can exclude common words (stop words) for that language and analyse only content words. This improves content similarity analysis.

Options you can set that will help the crawler:
  • Set the option Select stop words to match the main language of your website, or select auto if the website uses multiple languages.
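
To see why stop words matter, the Python sketch below compares two texts with a simple Jaccard similarity over word sets, with and without stop word removal. The stop word list is a tiny made-up sample, and Jaccard similarity is used here only as an example measure; this page does not say which measure TechSEO360 uses.

```python
import re

# Hypothetical, tiny stop word list; real lists are longer and per language.
STOP_WORDS = {"en": {"the", "a", "an", "is", "on", "and", "of", "in"}}


def content_words(text, language):
    """Return the set of words in text, minus stop words for the language."""
    words = re.findall(r"\w+", text.lower())
    stop = STOP_WORDS.get(language, set())
    return {word for word in words if word not in stop}


def jaccard_similarity(text_a, text_b, language):
    """Shared content words divided by all content words in both texts."""
    a = content_words(text_a, language)
    b = content_words(text_b, language)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


first = "The cat is on the mat"
second = "A dog is on the rug"
with_stop_words_removed = jaccard_similarity(first, second, "en")  # 0.0
without_a_stop_list = jaccard_similarity(first, second, "xx")      # 0.375
```

With the correct language, the two sentences share no content words. With an unknown language no stop words are removed, so the shared words "the", "is" and "on" make the texts look more similar than they are.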
 © Copyright 1997-2018 Microsys