Microsys

XML Sitemaps Generated in UTF-8 or ASCII Character Format

ASCII is a subset of UTF-8

The first 128 characters in UTF-8 (code points 0..127) are the same as in ASCII, and each one is stored as the same single byte.
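The overlap can be checked directly. The following is a minimal Python sketch, not part of the sitemap generator; the function name and the example URLs are made up for illustration.

```python
def is_same_in_ascii_and_utf8(text: str) -> bool:
    """Return True if text encodes to identical bytes in ASCII and UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains a character above code point 127,
        # so it cannot be encoded as ASCII at all.
        return False


# Each of the 128 ASCII code points is one identical byte in UTF-8.
assert all(chr(cp).encode("utf-8") == bytes([cp]) for cp in range(128))

print(is_same_in_ascii_and_utf8("http://www.example.com/page.html"))  # True
print(is_same_in_ascii_and_utf8("http://www.example.com/café.html"))  # False
```

A file that contains only ASCII characters is therefore already a valid UTF-8 file, byte for byte.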


UTF-8 documents and BOM

Some UTF-8 files start with a so-called BOM (byte order mark), which identifies the file as a Unicode UTF-8 document.

The BOM is not required for XML or UTF-8 documents. It just helps many Unicode tools handle the text correctly, although document parsers that only expect ASCII may choke on it.

The BOM for UTF-8 consists of three bytes, in hexadecimal: $EF $BB $BF. To view the BOM in XML document files such as sitemaps, you will need a tool such as a hex editor.
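If no hex editor is at hand, a few lines of script can perform the same check. This is a sketch in Python under the assumption that the sitemap is available as a local file or byte string; the helper names are hypothetical.

```python
import codecs

# codecs.BOM_UTF8 is the three-byte sequence EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM."""
    return data.startswith(UTF8_BOM)


def file_has_utf8_bom(path: str) -> bool:
    """Read only the first three bytes of a file and compare them to the BOM."""
    with open(path, "rb") as f:
        return f.read(3) == UTF8_BOM


def hex_preview(data: bytes, count: int = 8) -> str:
    """Format the first bytes roughly the way a hex editor would show them."""
    return " ".join(f"{b:02X}" for b in data[:count])


sample = UTF8_BOM + b'<?xml version="1.0" encoding="UTF-8"?>'
print(has_utf8_bom(sample))  # True
print(hex_preview(sample))   # EF BB BF 3C 3F 78 6D 6C
```

The files must be opened in binary mode ("rb"); a text-mode read may decode or hide the BOM depending on the chosen encoding.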

You can configure how the sitemap generator software creates XML sitemaps.
Under Create sitemap | Document options | Character set and type you will find these options:
  • Always save sitemap files as UTF-8.
  • Save UTF-8 sitemap files with BOM.
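What the two options change in the saved file can be illustrated outside the program. The sketch below uses Python's standard "utf-8" and "utf-8-sig" codecs; it does not show how the sitemap generator itself is implemented, and the function names and sample sitemap are assumptions for illustration.

```python
SAMPLE_SITEMAP = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    "  <url><loc>http://www.example.com/</loc></url>\n"
    "</urlset>\n"
)


def encode_sitemap(xml_text: str, with_bom: bool = False) -> bytes:
    """Encode sitemap text as UTF-8, optionally with a leading BOM."""
    # "utf-8-sig" prepends EF BB BF when encoding; plain "utf-8" does not.
    encoding = "utf-8-sig" if with_bom else "utf-8"
    return xml_text.encode(encoding)


def save_sitemap(path: str, xml_text: str, with_bom: bool = False) -> None:
    """Write the encoded sitemap to disk in binary mode."""
    with open(path, "wb") as f:
        f.write(encode_sitemap(xml_text, with_bom))


without_bom = encode_sitemap(SAMPLE_SITEMAP, with_bom=False)
with_bom = encode_sitemap(SAMPLE_SITEMAP, with_bom=True)
print(len(with_bom) - len(without_bom))  # 3
```

The only difference between the two outputs is the three BOM bytes at the start of the file; the XML content is identical.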


URL Encode Non-ASCII Characters in XML Sitemaps

The sitemaps protocol requires all non-ASCII characters in URLs to be URL encoded, even though the XML sitemap file itself is defined as UTF-8. That is not a problem: the encoded URLs contain only ASCII characters, and ASCII is a subset of UTF-8. To read more, check our article about XML sitemaps URL encoding.
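As a rough illustration of URL encoding, the Python sketch below percent-encodes the UTF-8 bytes of each non-ASCII character with the standard library function urllib.parse.quote. The function name and the list of characters kept unencoded are assumptions made for this example, and internationalized host names are outside its scope.

```python
from urllib.parse import quote

# Characters left untouched: the usual URL delimiters, plus "%" so that
# escapes already present in the URL are not encoded a second time.
SAFE = ":/?#[]@!$&'()*+,;=%"


def url_encode_for_sitemap(url: str) -> str:
    """Percent-encode the UTF-8 bytes of every non-ASCII character.

    quote() also encodes a few ASCII characters that are not in SAFE,
    such as spaces and double quotes.
    """
    return quote(url, safe=SAFE)


encoded = url_encode_for_sitemap("http://www.example.com/ümlat.html")
print(encoded)            # http://www.example.com/%C3%BCmlat.html
print(encoded.isascii())  # True
```

The result contains only ASCII characters, so it can be stored unchanged in a sitemap file saved as either ASCII or UTF-8.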


 © Copyright 1997-2010 Microsys