HREFLang XML Sitemaps and Language Detection

Correctly identifying the primary language of a page is useful both for hreflang alternate-language information and for Google News sitemaps.

XML Sitemaps and Hreflang Sitemaps

XML sitemaps allow each page URL to include a list of URLs for its alternate language versions. This is often referred to as an hreflang sitemap, and it helps search engines find translations of the same content.
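As a rough illustration of what such a file looks like, the following sketch builds a small hreflang sitemap with Python's standard library. It uses the sitemap namespace together with xhtml:link elements, which is the markup search engines document for hreflang in sitemaps. The function name build_hreflang_sitemap and the example URLs are hypothetical, and the output is not claimed to match what A1 Sitemap Generator writes byte for byte.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(groups):
    """Build sitemap XML from groups of translated pages.

    groups: list of dicts, each mapping an hreflang code to the URL
    of that language version (hypothetical input format).
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for group in groups:
        # Every language version gets its own <url> entry ...
        for url in group.values():
            url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
            # ... and each entry lists all versions, including itself.
            for lang, alt_url in group.items():
                ET.SubElement(
                    url_el,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": lang, "href": alt_url},
                )
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_hreflang_sitemap([
        {"en": "http://example.com/en/page.html",
         "de": "http://example.com/de/page.html"},
    ]))
```

Listing every language version under every URL, including the URL itself, keeps the alternate references reciprocal.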

To enable this feature in A1 Sitemap Generator, check the option Create sitemap | Include "hreflang" alternate URLs in sitemap files before you build your XML sitemap.

Note: You can view this tutorial if you are unsure how to create standard XML sitemaps.

Page URLs are identified as containing the same content, translated into different languages, in the following ways:
  • HTML page markup including <link rel="alternate" href="http://example.com" hreflang="xxx" />.
  • Page language detection based on a variety of methods described below.
  • URL similarity between different language variations.
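To make the first method more concrete, here is a minimal sketch of how alternate language links could be read from page markup with Python's standard html.parser module. The class HreflangLinkParser and the helper extract_alternates are hypothetical names used for illustration only; this is not the code A1 Sitemap Generator uses.

```python
from html.parser import HTMLParser


class HreflangLinkParser(HTMLParser):
    """Collects <link rel="alternate" hreflang="..."> entries from a page."""

    def __init__(self):
        super().__init__()
        self.alternates = {}  # hreflang code -> href

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel may hold several space-separated tokens; values may be None.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        hreflang = attributes.get("hreflang")
        href = attributes.get("href")
        if "alternate" in rel_tokens and hreflang and href:
            self.alternates[hreflang.lower()] = href


def extract_alternates(html):
    """Return a dict mapping hreflang codes to alternate URLs."""
    parser = HreflangLinkParser()
    parser.feed(html)
    return parser.alternates
```

Pages whose extracted dictionaries point at each other can then be grouped as translations of the same content.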

Page Language Detection in A1 Sitemap Generator

Our software determines the primary page language by checking the following things:
  1. The webserver response is checked for a Content-Language HTTP response header:
    • PHP pages: Insert this code <?php header("Content-Language: en"); ?>.

  2. The page is checked for content-language META tag:
    <meta http-equiv="content-language" content="en">

  3. The page is checked for lang inside the HTML tag:
    <html lang="en">

  4. The page is checked for the Open Graph Protocol property og:locale inside META tags.

  5. The page is checked for xml:lang inside the HTML tag:
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">

  6. The page is checked for alternate / hreflang inside the link tag:
    <link rel="alternate" href="http://example.com/name-of-page.html" hreflang="en">

  7. The page URL is checked for common language/culture and country codes.

    Note: This requires enabling option Scan website | Data collection | Inspect URLs to detect language. For more info see:

  8. Planned: Compare content against word lists for each language. Select best match.
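The checks above can be sketched as a first-match cascade. The example below is a simplified illustration in Python, not the actual implementation in A1 Sitemap Generator. It makes several assumptions: that the checks are applied in the listed order, that for check 6 the page's own language is taken from the alternate link whose href equals the page URL, and that the URL check only looks for a two-letter path segment with an optional country code. The names LanguageHintParser and detect_language are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class LanguageHintParser(HTMLParser):
    """Collects language hints from <html>, <meta> and <link> tags."""

    def __init__(self):
        super().__init__()
        self.hints = {}       # hint name -> language value
        self.alternates = {}  # href -> hreflang

    def handle_starttag(self, tag, attrs):
        a = {name: (value or "") for name, value in attrs}
        if tag == "html":
            if a.get("lang"):
                self.hints.setdefault("html_lang", a["lang"])
            if a.get("xml:lang"):
                self.hints.setdefault("xml_lang", a["xml:lang"])
        elif tag == "meta" and a.get("content"):
            if a.get("http-equiv", "").lower() == "content-language":
                self.hints.setdefault("meta", a["content"])
            if a.get("property", "").lower() == "og:locale":
                self.hints.setdefault("og_locale", a["content"])
        elif tag == "link":
            is_alternate = "alternate" in a.get("rel", "").lower().split()
            if is_alternate and a.get("hreflang") and a.get("href"):
                self.alternates[a["href"]] = a["hreflang"]


def detect_language(url, headers, html):
    """Return the first language hint found, or None (assumed priority order)."""
    parser = LanguageHintParser()
    parser.feed(html)
    # Header names are case-insensitive; keep only the first listed language.
    header = {k.lower(): v for k, v in headers.items()}.get("content-language")
    # Assumption: a path segment such as /de/ or /en-US/ marks the language.
    match = re.search(r"/([a-z]{2}(?:[-_][a-zA-Z]{2})?)(?:/|$)", urlparse(url).path)
    candidates = [
        header.split(",")[0] if header else None,  # 1. HTTP header
        parser.hints.get("meta"),                  # 2. META content-language
        parser.hints.get("html_lang"),             # 3. <html lang>
        parser.hints.get("og_locale"),             # 4. og:locale
        parser.hints.get("xml_lang"),              # 5. <html xml:lang>
        parser.alternates.get(url),                # 6. self-referencing hreflang
        match.group(1) if match else None,         # 7. URL path segment
    ]
    for value in candidates:
        if value and value.strip():
            return value.strip().replace("_", "-").lower()
    return None
```

The URL pattern in this sketch is deliberately naive and would misread any unrelated two-letter folder name as a language code, which may be one reason URL inspection is an opt-in setting.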

How Language Information Is Used in News Sitemaps

When generating Google News Sitemaps, one of the XML fields for each news and article URL is the language used on the page.
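For orientation, the sketch below builds a single Google News sitemap entry in Python and shows where the detected language ends up: in the news:language element inside news:publication. The function name build_news_entry and all sample values are hypothetical, and the exact set of fields written by A1 Sitemap Generator may differ.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_entry(loc, publication_name, language, publication_date, title):
    """Return sitemap XML containing one news URL entry (illustrative only)."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = loc
    news = ET.SubElement(url_el, f"{{{NEWS_NS}}}news")
    publication = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
    ET.SubElement(publication, f"{{{NEWS_NS}}}name").text = publication_name
    # The detected page language is written here.
    ET.SubElement(publication, f"{{{NEWS_NS}}}language").text = language
    ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = publication_date
    ET.SubElement(news, f"{{{NEWS_NS}}}title").text = title
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_news_entry(
        "http://example.com/news/article.html",
        "Example News",
        "en",
        "2024-01-15",
        "Example article title",
    ))
```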

In addition to the page language detection described above, the following option can help the crawler:
  • Set Keyword tools | Content keyword analysis | Select stop words to match the main language of your website, or select auto if your website uses multiple languages.
