Language Detection and Analysing Pages
You can improve the analysis of similar content by making sure the language of each page is identified correctly.
Page Language Detection in A1 Website Analyzer
Our software determines the primary language of a page by checking the following:
- The web server response is checked for a Content-Language HTTP response header:
- PHP pages: To send this header, insert this code before any page output: <?php header("Content-Language: en"); ?>
- The page is checked for a content-language META tag:
<meta http-equiv="content-language" content="en">
- The page is checked for a lang attribute inside the HTML tag:
<html lang="en">
- The page is checked for the Open Graph Protocol property og:locale inside META tags.
- The page is checked for an xml:lang attribute inside the HTML tag:
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
- The page is checked for alternate / hreflang attributes inside link tags:
<link rel="alternate" href="http://example.com/name-of-page.html" hreflang="en">
- The page URL is checked for common language/culture and country codes.
Note: This requires enabling the option Scan website | Data collection | Inspect URLs to detect language. For more information, see:
- Language culture codes: https://msdn.microsoft.com/en-us/library/ee825488.aspx
- Country codes: http://ec.europa.eu/eurostat/statistics-explained/index.php/Glossary:Country_codes
- Planned: The page content is compared against word lists for each language, and the best match is selected.
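
The checks listed above can be illustrated with a short Python sketch. This is not the code used by A1 Website Analyzer. It is a minimal approximation under several assumptions: the checks are tried in the order they are listed and the first hit wins, every value is reduced to its primary language code (en-US becomes en), an hreflang link only counts when its href equals the page URL, and the URL check only looks at path segments. The names detect_language, KNOWN_CODES and inspect_urls are hypothetical and chosen for illustration only.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical, deliberately tiny list of codes accepted by the URL check.
KNOWN_CODES = {"en", "de", "fr", "es", "da"}


class _LangSignals(HTMLParser):
    """Collects language hints from <html>, <meta> and <link> tags."""

    def __init__(self):
        super().__init__()
        self.signals = {}     # signal name -> raw attribute value
        self.alternates = []  # (href, hreflang) pairs from <link rel="alternate">

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "html":
            if a.get("lang"):
                self.signals["html_lang"] = a["lang"]
            if a.get("xml:lang"):
                self.signals["xml_lang"] = a["xml:lang"]
        elif tag == "meta":
            if a.get("http-equiv", "").lower() == "content-language":
                self.signals["meta"] = a.get("content", "")
            elif a.get("property", "").lower() == "og:locale":
                self.signals["og_locale"] = a.get("content", "")
        elif tag == "link":
            rel = a.get("rel", "").lower().split()
            if "alternate" in rel and a.get("hreflang"):
                self.alternates.append((a.get("href", ""), a["hreflang"]))


def _primary(value):
    """Reduce 'en-US', 'en_US' or 'en, de' to the primary code 'en'."""
    first = value.split(",")[0].strip()
    return re.split(r"[-_]", first)[0].lower()


def detect_language(url, headers, html, inspect_urls=False):
    """Return a primary language code for the page, or None if no check matches."""
    # 1. Content-Language HTTP response header.
    header = {k.lower(): v for k, v in headers.items()}.get("content-language")
    if header and _primary(header):
        return _primary(header)

    # 2-5. META content-language, <html lang>, og:locale, <html xml:lang>.
    parser = _LangSignals()
    parser.feed(html)
    for name in ("meta", "html_lang", "og_locale", "xml_lang"):
        value = parser.signals.get(name)
        if value and _primary(value):
            return _primary(value)

    # 6. hreflang of an alternate link that points at this very page.
    for href, hreflang in parser.alternates:
        if href == url and hreflang.lower() != "x-default":
            return _primary(hreflang)

    # 7. Language or culture code in the URL path (optional, off by default).
    if inspect_urls:
        for segment in urlparse(url).path.lower().split("/"):
            m = re.fullmatch(r"([a-z]{2})(?:[-_][a-z]{2})?", segment)
            if m and m.group(1) in KNOWN_CODES:
                return m.group(1)
    return None
```

A call such as detect_language("http://example.com/fr/page.html", {}, "", inspect_urls=True) falls through to the URL check, which mirrors the note above that URL inspection has to be enabled explicitly.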
