Microsys
        

Sitemaps URL Encode of Characters with Percentage Encoding

This page explains URL encoding, what percent encoding does, and why sitemaps and search engines often convert URLs to their URL encoded form.

Quick Explanation of URL Encoding

Characters in URLs are usually URL encoded when:
  • The character appears in a context where its usage is reserved. This is often seen in GET parameter values.
  • The character is not ASCII, i.e. it does not fit within 7 bits. In such cases, the character is converted to UTF-8, and every byte of the character is then encoded into the URL.
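
The two cases above can be sketched with Python's standard library. This is only an illustration: the sample strings are made up, and passing safe="" is a choice made here so that every reserved character gets encoded.

```python
from urllib.parse import quote

# Case 1: a reserved character ("&") inside a GET parameter value.
# safe="" tells quote() to treat no reserved character as safe.
value = "fish & chips"
encoded_value = quote(value, safe="")
print(encoded_value)  # fish%20%26%20chips

# Case 2: a non-ASCII character is first converted to UTF-8 bytes,
# and each of those bytes is then percent encoded.
word = "café"
print(word.encode("utf-8"))  # b'caf\xc3\xa9'
encoded_word = quote(word, safe="")
print(encoded_word)  # caf%C3%A9
```

Note that "é" takes two bytes in UTF-8, so it becomes two percent-encoded groups.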


URL Encode Uses Hex Percentage Encoding for Characters

With URL encoding, each ASCII character, or each byte of each UTF-8 character, is converted into hexadecimal notation. In URLs, this is written as % followed by two hexadecimal digits, each in the range 0-9 or A-F.

Examples:
  • The ASCII space character has byte value 32, which becomes %20 when URL encoded:
    • In decimal: 32 = 3*10 + 2*1.
    • In hexadecimal: 20 = 2*16 + 0*1.
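
The same arithmetic can be checked in a few lines of Python. The helper name percent_encode_byte is hypothetical and only exists for this sketch.

```python
def percent_encode_byte(byte_value: int) -> str:
    """Return the %XX form of a single byte value (0-255)."""
    if not 0 <= byte_value <= 255:
        raise ValueError("byte value must be in the range 0..255")
    # 02X: two uppercase hexadecimal digits, zero padded.
    return "%{:02X}".format(byte_value)

space = ord(" ")                # decimal byte value of the space character
high, low = divmod(space, 16)   # the two hexadecimal digits: 32 = 2*16 + 0*1
encoded_space = percent_encode_byte(space)
print(space, high, low, encoded_space)  # 32 2 0 %20

# A non-ASCII character is encoded byte by byte after conversion to UTF-8.
encoded_e_acute = "".join(percent_encode_byte(b) for b in "é".encode("utf-8"))
print(encoded_e_acute)  # %C3%A9
```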


URL Encoding in Website and Page HTML Source

If you are unsure whether you are using URL encoding, or perhaps unnecessary URL encoding, check the page source output first. Most browsers offer a view source option.

With link checker and sitemap tools such as A1 Sitemap Generator, it is debatable whether links with illegal or non-standard URL encoding should be ignored or converted before they are shown in website scan results. You can therefore use the following options to control whether URLs are percent encoded during the website scan:
  • Scan website | Crawler options | Ensure URL "path" component is percentage encoded.
  • Scan website | Crawler options | Ensure URL "query" component is percentage encoded.
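
To illustrate what ensuring that the "path" and "query" components are percent encoded can mean, here is a minimal Python sketch. It is not how A1 Sitemap Generator implements these options; the function name ensure_percent_encoded and the chosen sets of safe characters are assumptions made for this example.

```python
from urllib.parse import quote, urlsplit, urlunsplit

def ensure_percent_encoded(url: str) -> str:
    """Percent encode the path and query of a URL where needed (sketch)."""
    parts = urlsplit(url)
    # "%" is kept safe so that existing escapes such as %20 are not
    # encoded a second time. A stray "%" that is not followed by two
    # hexadecimal digits is therefore left untouched as well.
    path = quote(parts.path, safe="/%:@!$&'()*+,;=")
    query = quote(parts.query, safe="/?%:@!$&'()*+,;=")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))

fixed = ensure_percent_encoded("http://example.com/my page/é.html?q=a b&x=1")
print(fixed)  # http://example.com/my%20page/%C3%A9.html?q=a%20b&x=1

# An already encoded URL passes through unchanged.
unchanged = ensure_percent_encoded("http://example.com/a%20b?q=c%26d")
print(unchanged)  # http://example.com/a%20b?q=c%26d
```

The host name is deliberately left alone, since the options above only concern the path and query components.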

Note: If you are fixing linking errors in your website, remember you can see information about all internal links and redirects.


URL Encoding in XML Sitemaps and Webserver

If you have URLs that need to be URL encoded, it is an error not to encode them. Some search engines, web crawlers, browsers, servers etc. can correctly interpret URLs that are not properly encoded, but it is always safer to have your URLs properly URL encoded / URL escaped.

Quote from the official sitemaps protocol website:
In addition, all URLs (including the URL of your Sitemap) must be URL-escaped and encoded for readability by the web server on which they are located.
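
As a rough illustration of the quoted requirement, the Python sketch below builds a single loc element for an XML sitemap. The function name sitemap_loc and the example URL are hypothetical. The sketch also applies XML entity escaping (for example & becomes &amp;), which the sitemaps protocol requires for all data values in the XML file.

```python
from urllib.parse import quote, urlsplit, urlunsplit
from xml.sax.saxutils import escape

def sitemap_loc(url: str) -> str:
    """Return a <loc> element with a URL-escaped, entity-escaped URL (sketch)."""
    parts = urlsplit(url)
    # Step 1: URL escape the path and query; "%" stays safe to avoid
    # double encoding URLs that are already escaped.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    url_escaped = urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
    # Step 2: entity escape for XML, which turns "&" into "&amp;".
    return "<loc>" + escape(url_escaped) + "</loc>"

loc = sitemap_loc("http://www.example.com/ümlat.html?q=name&desc=a b")
print(loc)
# <loc>http://www.example.com/%C3%BCmlat.html?q=name&amp;desc=a%20b</loc>
```

The order matters: URL escaping comes first, then entity escaping, so that the "&" of "&amp;" is not itself percent encoded.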


Further Reading About URL Character Encoding

Before you start reading:
  • Rules for URL encoding vary depending on the place and context in the URL.
  • There are a few inconsistencies in the RFC standards due to updates and revisions.

Resources about percent encoding in URLs:
  • RFC 1738 - Uniform Resource Locators (URL). RFC 1738 is from December 1994.
  • RFC 2396 - Uniform Resource Identifiers (URI): Generic Syntax. RFC 2396 is from August 1998.
  • RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax. RFC 3986 is from January 2005.
  • Percent Encoding - Wikipedia about percent encoding / hexadecimal % URL encoding.
Help page primarily maintained and written by

As one of the lead developers in Microsys, his hands have touched almost all the code in the software available at this website. If you email any questions, chances are he will be the one answering them.
 © Copyright 1997-2014 Microsys