Microsys
  

URL Encode Characters with Percentage Encoding

Learn about URL encoding in sitemaps and what percentage encoding does. Understand why generated XML sitemaps and search engines often convert characters in URLs to their URL encoded form.

Quick Explanation of URL Encoding

Characters in URLs are usually URL encoded when:
  • The character appears in a context where its usage is reserved. This is often seen in GET parameter values.
  • The character is not ASCII, i.e. it does not fit within 7 bits. In such cases, the character is converted to UTF-8, and each byte of the resulting sequence is then encoded into the URL.
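
Both cases can be illustrated with a minimal Python sketch using the standard library function urllib.parse.quote. The sample values ("fish & chips" and "é") are made up for illustration and are not taken from any particular website.

```python
from urllib.parse import quote

# Case 1: a reserved character ("&") inside a GET parameter value.
# Left unencoded, it would be read as a separator between parameters.
# safe="" asks quote() to encode every reserved character.
encoded_value = quote("fish & chips", safe="")

# Case 2: a non-ASCII character. It is first converted to UTF-8 bytes,
# and each of those bytes is then percent encoded.
utf8_bytes = "é".encode("utf-8")
encoded_char = quote("é")

print(encoded_value)   # fish%20%26%20chips
print(utf8_bytes)      # b'\xc3\xa9'
print(encoded_char)    # %C3%A9
```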


URL Encode Uses Hex Percentage Encoding for Characters

With URL encoding, each ASCII character, or each byte of each UTF-8 encoded character, is converted into hexadecimal notation. In URLs, a hexadecimal value is written as % followed by two symbols, each in the range 0-9 or A-F.

Examples:
  • The ASCII space character has byte value 32, which becomes %20 when URL encoded:
    • In decimal: 32 = 3*10 + 2*1.
    • In hexadecimal: 20 = 2*16 + 0*1.
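
The arithmetic above can be sketched in Python as follows. The helper name percent_encode_byte is hypothetical and only serves to spell out the conversion by hand. In practice a library function such as urllib.parse.quote would normally be used instead.

```python
HEX_DIGITS = "0123456789ABCDEF"

def percent_encode_byte(value: int) -> str:
    """Write one byte (0-255) as % followed by two hexadecimal symbols."""
    if not 0 <= value <= 255:
        raise ValueError("a byte must be in the range 0-255")
    # Split the value into a count of 16s and a count of 1s.
    sixteens, ones = divmod(value, 16)
    return "%" + HEX_DIGITS[sixteens] + HEX_DIGITS[ones]

# Space has byte value 32 = 2*16 + 0*1, so it becomes %20.
space = percent_encode_byte(ord(" "))

# A non-ASCII character is encoded one UTF-8 byte at a time.
euro = "".join(percent_encode_byte(b) for b in "€".encode("utf-8"))

print(space)  # %20
print(euro)   # %E2%82%AC
```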


URL Encoding in Website and Page HTML Source

If you are unsure whether you are using URL encoding, perhaps even unnecessary URL encoding, check the output page source first. Most browsers support a view source option.

With link checker and sitemap tools such as TechSEO360, it can be argued whether links with illegal or non-standard URL encoding should be ignored or converted before they are shown in the website scan results. You can therefore use the following options to control whether URLs are percentage encoded during the website scan:
  • Scan website | Crawler options | Ensure URL "path" component is percentage encoded.
  • Scan website | Crawler options | Ensure URL "query" component is percentage encoded.
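
To give an idea of what ensuring that the "path" and "query" components are percentage encoded can mean, here is a minimal Python sketch. It is not how TechSEO360 implements these options. The function name ensure_percent_encoded, the example URL and the chosen sets of safe characters are all assumptions made for illustration.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left untouched in each component (an assumption for this sketch).
# "%" is kept so that existing escapes such as %20 are not encoded twice.
# A stray "%" that is not part of an escape is therefore left as-is.
PATH_SAFE = "/%:@!$&'()*+,;="
QUERY_SAFE = PATH_SAFE + "?"

def ensure_percent_encoded(url: str) -> str:
    """Percent encode the path and query components of a URL."""
    parts = urlsplit(url)
    path = quote(parts.path, safe=PATH_SAFE)
    query = quote(parts.query, safe=QUERY_SAFE)
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))

fixed = ensure_percent_encoded("http://example.com/my page/café.html?q=a b&x=1")
print(fixed)  # http://example.com/my%20page/caf%C3%A9.html?q=a%20b&x=1
```

Running the function on an already encoded URL returns it unchanged, because existing % escapes are preserved.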

Note: If you are fixing linking errors in your website, remember you can see information about all internal links and redirects.



URL Encoding in XML Sitemaps and Webservers

If you have URLs that need to be URL encoded, it is an error not to encode them. Some search engines, web crawlers, browsers, servers etc. can correctly interpret URLs that are not properly encoded, but it is always safer to have your URLs properly URL encoded / URL escaped with percentage encoding.

Quote from the official sitemaps protocol website:
In addition, all URLs (including the URL of your sitemap) must be URL-escaped and encoded for readability by the web server on which they are located.
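
As a rough sketch of what this means when writing a sitemap entry, the Python example below first percent encodes the URL and then applies XML entity escaping, since a sitemap is an XML file. The function name sitemap_url_entry and the example URL are hypothetical, and the set of characters kept unencoded is an assumption for this sketch.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

def sitemap_url_entry(url: str) -> str:
    """Build a <url> entry with a percent encoded, XML escaped <loc> value."""
    # Step 1: percent encode, keeping common URL delimiters and existing escapes.
    encoded = quote(url, safe=":/?&=%")
    # Step 2: XML escape, so that "&" becomes "&amp;" inside the XML file.
    return "<url><loc>" + escape(encoded) + "</loc></url>"

entry = sitemap_url_entry("http://www.example.com/ümlat.php?a=1&q=name")
print(entry)
# <url><loc>http://www.example.com/%C3%BCmlat.php?a=1&amp;q=name</loc></url>
```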


Note: We have seen some tools that do not percentage encode URLs using UTF-8 byte values, but erroneously use byte values from another document character set or internal data representation.
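
The difference can be seen in a small Python sketch. ISO-8859-1 (latin-1) is used here only as an example of another character set. The note above does not say which character sets the affected tools used.

```python
from urllib.parse import quote, unquote

# Correct: percent encode the UTF-8 bytes of the character.
correct = quote("é", encoding="utf-8")

# Incorrect: percent encode the byte value from another character set.
wrong = quote("é", encoding="latin-1")

# A receiver that decodes as UTF-8 cannot recover the original character
# from the wrong form and ends up with a replacement character instead.
decoded_wrong = unquote(wrong)

print(correct)              # %C3%A9
print(wrong)                # %E9
print(repr(decoded_wrong))  # '\ufffd'
```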


Further Reading About URL Character Encoding

Before you start reading:
  • Rules for URL encoding vary depending on the place and context in the URL.
  • There are a few inconsistencies in RFC standards due to updates and revisions.
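
One common example of context dependent rules is how the same text is encoded in a path versus in a form style query value. The Python sketch below uses the standard library functions quote and quote_plus. Note that writing a space as "+" comes from the HTML form encoding convention rather than from the generic URI syntax, and the sample text is made up.

```python
from urllib.parse import quote, quote_plus

text = "a b/c"

# In a path, "/" separates segments and is kept, while space becomes %20.
in_path = quote(text)

# In a form style query value, space becomes "+" and "/" is encoded.
in_query_value = quote_plus(text)

print(in_path)         # a%20b/c
print(in_query_value)  # a+b%2Fc
```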

Resources about percent encoding in URLs:
  • RFC 1738 - Uniform Resource Locators (URL). RFC 1738 is from December 1994.
  • RFC 2396 - Uniform Resource Identifiers (URI): Generic Syntax. RFC 2396 is from August 1998.
  • RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax. RFC 3986 is from January 2005.
  • Percent Encoding - Wikipedia about percent encoding / hexadecimal % URL encoding.
 © Copyright 1997-2024 Microsys