Microsys
  

XML Sitemaps Generated in UTF-8 or ASCII Character Format

The sitemaps protocol specifies that XML sitemap documents must be UTF-8 encoded and that the URLs in them must not contain characters outside the ASCII range.

ASCII is a subset of UTF-8

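The first 128 characters (code points 0 to 127) are encoded in UTF-8 exactly as in ASCII: as single bytes with the same values. Any file that contains only ASCII characters is therefore also a valid UTF-8 file.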
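As a small illustration (a sketch, not part of the sitemap generator software), the following Python snippet checks that every ASCII character produces the same byte in ASCII and UTF-8, and shows that characters outside that range need two or more bytes in UTF-8. The function names are hypothetical.

```python
def ascii_matches_utf8() -> bool:
    """True if code points 0..127 encode to identical bytes in ASCII and UTF-8."""
    return all(chr(cp).encode("ascii") == chr(cp).encode("utf-8") for cp in range(128))


def utf8_byte_length(ch: str) -> int:
    """Number of bytes UTF-8 uses to encode a single character."""
    return len(ch.encode("utf-8"))


if __name__ == "__main__":
    print(ascii_matches_utf8())   # True
    print(utf8_byte_length("a"))  # 1 (ASCII range)
    print(utf8_byte_length("é"))  # 2 (U+00E9)
    print(utf8_byte_length("€"))  # 3 (U+20AC)
```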


UTF-8 documents and BOM

Some UTF-8 files start with a so-called BOM (byte order mark) that identifies them as Unicode UTF-8 documents.

The BOM is not required for XML or UTF-8 documents. It merely helps many Unicode tools recognize and handle the text correctly, although document parsers that only understand ASCII may choke on it.

The UTF-8 BOM is the byte sequence $EF $BB $BF (hexadecimal). Because most text editors hide it, you need a tool such as a hex editor to see the BOM in XML files such as sitemaps.
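If you prefer a script over a hex editor, a minimal Python sketch like the one below can check whether a file starts with the UTF-8 BOM, print its first bytes in hex, and strip the BOM if present. It relies only on the standard `codecs.BOM_UTF8` constant; the helper names are hypothetical.

```python
import codecs  # codecs.BOM_UTF8 == b"\xef\xbb\xbf"


def has_utf8_bom(data: bytes) -> bool:
    """True if the raw file bytes begin with EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM (unchanged if there is none)."""
    return data[len(codecs.BOM_UTF8):] if has_utf8_bom(data) else data


def hex_prefix(data: bytes, n: int = 3) -> str:
    """Show the first n bytes as uppercase hex pairs, like a hex editor would."""
    return " ".join(f"{b:02X}" for b in data[:n])
```

Usage, assuming a local file named `sitemap.xml`: read it in binary mode (`open("sitemap.xml", "rb").read()`) and pass the bytes to `has_utf8_bom` or `hex_prefix`. Reading in binary mode matters, because decoding as text may hide the BOM.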

You can configure how the sitemap generator software saves XML sitemaps.
Under Create sitemap | Document options | Character set and type, you will find these options:
  • Always save sitemap files as UTF-8.
  • Save UTF-8 sitemap files with BOM.
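To show what the BOM option changes in the saved file, here is a hedged Python sketch (not how the software itself is implemented) that encodes the same minimal sitemap with and without a BOM. Python's built-in `utf-8-sig` codec prepends EF BB BF when encoding, while plain `utf-8` does not. The sample URL and the `sitemap_bytes` function are placeholders.

```python
# Minimal sitemap document; https://example.com/ is a placeholder URL.
SITEMAP = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    "  <url><loc>https://example.com/</loc></url>\n"
    "</urlset>\n"
)


def sitemap_bytes(xml: str, with_bom: bool) -> bytes:
    """Encode the sitemap as UTF-8, optionally prefixed with the BOM (EF BB BF)."""
    return xml.encode("utf-8-sig" if with_bom else "utf-8")


if __name__ == "__main__":
    print(sitemap_bytes(SITEMAP, with_bom=True)[:3])   # b'\xef\xbb\xbf'
    print(sitemap_bytes(SITEMAP, with_bom=False)[:5])  # b'<?xml'
```

Apart from the three leading bytes, both outputs are identical, which is why leaving the BOM out is the safer choice when ASCII-only parsers may read the file.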


URL Encode Non-ASCII Characters in XML Sitemaps

The sitemaps protocol requires all non-ASCII characters in URLs to be URL encoded, even though the XML sitemap file itself is declared as UTF-8. This causes no conflict: because ASCII is a subset of UTF-8, a file containing only ASCII characters is also valid UTF-8. To read more, see our article about XML sitemap URL encoding.
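A minimal sketch of this kind of URL encoding, using Python's standard `urllib.parse.quote`: each non-ASCII character is converted to its UTF-8 bytes written as %XX escapes. The `SAFE_CHARS` set and the `url_encode` name are assumptions for illustration; reserved URL delimiters and `%` are kept so that existing escapes are not encoded twice. Note that `quote` also escapes unsafe ASCII characters such as spaces, and that it does not handle internationalized domain names, which use Punycode rather than percent-encoding.

```python
from urllib.parse import quote

# Reserved URL delimiters (RFC 3986) plus "%", left as-is so that
# already-encoded sequences such as %C3%A9 are not double-encoded.
SAFE_CHARS = ":/?#[]@!$&'()*+,;=%"


def url_encode(url: str) -> str:
    """Percent-encode non-ASCII (and unsafe ASCII) characters using UTF-8 bytes."""
    return quote(url, safe=SAFE_CHARS)


if __name__ == "__main__":
    print(url_encode("https://example.com/café?q=ü"))
    # https://example.com/caf%C3%A9?q=%C3%BC
```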
This help page is maintained by

As one of the lead developers, his hands have touched most of the code in the software from Microsys.

If you email any questions, chances are that he will be the one answering them.
 © Copyright 1997-2018 Microsys
