Text, HTML, ROR, RSS and XML Sitemaps Compared
Guide about sitemaps. Comparison of text, HTML, RSS, ROR and XML sitemaps. All sitemap differences explained.
HTML sitemaps can be:
- Viewed in all browsers, including Firefox, IE and Opera.
- Crawled by all search engines including Google, Bing and Yahoo.
Some HTML sitemap tips and tricks:
- HTML documents can be generated by PHP, ASP etc. It is the output format that matters.
- Limit yourself to a few hundred links per page. This makes it easier for visitors to find your important pages.
- You can read our article about creating HTML sitemaps for more detailed information.
Code example of HTML:
<html lang="en">
<head><title>This is a site map</title></head>
<body>
<h1>header of HTML site map</h1>
<p>site map paragraph with links
</body>
</html>
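Since the tips above note that only the output format matters, not the language that generates it, here is a minimal Python sketch of producing such a page. The function name `build_html_sitemap` and the `(url, label)` input shape are assumptions made for illustration, not part of any sitemap tool.

```python
from html import escape


def build_html_sitemap(title, links):
    """Return a minimal HTML sitemap page as a string.

    `links` is a list of (url, label) pairs (a hypothetical input shape).
    URLs and labels are escaped so characters like & stay valid in HTML.
    """
    items = "\n".join(
        f'<li><a href="{escape(url, quote=True)}">{escape(label)}</a></li>'
        for url, label in links
    )
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"<h1>{escape(title)}</h1>\n"
        f"<ul>\n{items}\n</ul>\n"
        "</body>\n"
        "</html>\n"
    )
```

Calling `build_html_sitemap("Site map", [("http://www.example.com/", "Home")])` returns the page text, which can then be saved as a UTF-8 file.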
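XHTML is the HTML specification reformulated to follow the XML standard.
Example of an XHTML sitemap file. Compared with the HTML example above, it adds an XML declaration, a DOCTYPE, namespace and xml:lang attributes on the html element, and a closing </p> tag: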
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>This is a site map</title></head>
<body>
<h1>header of XHTML site map</h1>
<p>site map paragraph with links</p>
</body>
</html>
Text sitemaps contain one website URL per line.
Many search engines, including Google and Yahoo, can read text sitemaps.
Improve compatibility between text sitemaps and search engines:
- For Yahoo, name the primary text sitemap file urllist.txt.
- Save text file sitemaps as UTF-8 documents, especially if you have website URLs with non-English characters.
- Each text sitemap file should contain no more than 50,000 URLs.
Example of a text sitemap file:
http://www.example.com/
http://www.example.com/some-directory/
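The compatibility tips above (UTF-8 encoding, one URL per line, a cap of 50,000 URLs per file) can be sketched in Python as follows. The function name and the `urllist-2.txt` style naming for additional files are assumptions for illustration only; the text above only names `urllist.txt` for the primary file.

```python
from pathlib import Path

MAX_URLS_PER_FILE = 50_000  # per-file limit quoted in the tips above


def write_text_sitemaps(urls, directory, max_urls=MAX_URLS_PER_FILE):
    """Write URLs one per line into UTF-8 text files and return the paths."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(urls), max_urls):
        chunk = urls[start:start + max_urls]
        index = start // max_urls
        # The first file is urllist.txt; the numeric suffix for later
        # files is a hypothetical naming scheme, not a standard.
        name = "urllist.txt" if index == 0 else f"urllist-{index + 1}.txt"
        path = directory / name
        path.write_text("\n".join(chunk) + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

With fewer than 50,000 URLs this writes a single `urllist.txt`; larger lists are split into several files.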
Be sure to check our text sitemap tutorial, so you can easily generate URL list text files for all your websites.
The RSS protocol is often used in feed files for blogs, forums etc.
The RSS file format uses XML and has evolved over multiple versions and names, all fairly compatible with each other:
- Really Simple Syndication (RSS 2.0)
- RDF Site Summary (RSS 1.0 and RSS 0.90)
- Rich Site Summary (RSS 0.91)
After Google and Yahoo adopted RSS feeds as a kind of website sitemap, more search engines followed.
Note: There is no official standard for splitting RSS feed sitemaps into multiple files.
However, if your RSS sitemap feed is too large, you can create one RSS feed file per website category instead of splitting it like a normal sitemap file. (If you use a sitemap generator tool, try its include/exclude filters.)
Example of an RSS feed sitemap file:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
<channel>
<title>Website title</title>
<link>http://www.example.com</link>
<generator>A1 Sitemap Generator</generator>
<lastBuildDate>Tue, 13 Mar 2007 22:28:20 GMT</lastBuildDate>
<item>
<title>Page 1</title>
<link>http://www.example.com/page1.html</link>
</item>
<item>
<title>Page 2</title>
<link>http://www.example.com/page2.html</link>
</item>
</channel>
</rss>
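A feed like the one above can also be produced programmatically. The following Python sketch uses only the standard library; the function name `build_rss_sitemap` and its parameters are assumptions for illustration. It mirrors the example's elements and leaves out the generator element.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss_sitemap(site_title, site_url, pages, built=None):
    """Return an RSS 2.0 feed as text.

    `pages` is a list of (title, url) pairs. `built` must be a
    timezone-aware UTC datetime; it defaults to the current time.
    """
    built = built or datetime.now(timezone.utc)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_url
    # RFC 822 style date, as used in the example feed above.
    ET.SubElement(channel, "lastBuildDate").text = format_datetime(built, usegmt=True)
    for title, url in pages:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = url
    # The declaration is added by hand; save the result as a UTF-8 file.
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(rss, encoding="unicode")
```

Note that the RSS 2.0 specification also lists a channel description element as required. It is omitted here only to mirror the example above.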
ROR expands on the RSS protocol with its own extensions.
The standard file extension for ROR files is .ror. All search engines that understand RSS sitemap files continue to understand the RSS parts of ROR files. However, no major search engine, if any at all, currently supports the ROR sitemap extensions. If you know of any major search engine that states it supports ROR sitemaps, please write.
Example of a ROR sitemap file. The ROR extensions of RSS are the xmlns:ror namespace declaration and the elements prefixed with ror:
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:ror="http://rorweb.com/0.1/">
<channel>
<title>Website title</title>
<link>http://www.example.com</link>
<generator>A1 Sitemap Generator</generator>
<lastBuildDate>Tue, 13 Mar 2007 22:28:20 GMT</lastBuildDate>
<item>
<title>Page 1</title>
<link>http://www.example.com/page1.html</link>
<ror:keywords>page1-keyword1, page1-keyword2, page1-keyword3</ror:keywords>
<ror:updatePeriod>day</ror:updatePeriod>
</item>
<item>
<title>Page 2</title>
<link>http://www.example.com/page2.html</link>
<ror:keywords>page2-keyword1, page2-keyword2, page2-keyword3</ror:keywords>
<ror:updatePeriod>day</ror:updatePeriod>
</item>
</channel>
</rss>
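Because the ROR elements live in their own XML namespace, a reader has to ask for them explicitly, while RSS-only readers simply ignore them. A minimal Python sketch of reading both parts is shown here; the function name and the returned dictionary keys are assumptions for illustration, and the namespace URI is the one used in the example above.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the ROR example above.
ROR_NS = {"ror": "http://rorweb.com/0.1/"}


def read_ror_items(xml_text):
    """Return one dict per <item>, including the ROR extension values."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        keywords = item.findtext("ror:keywords", default="", namespaces=ROR_NS)
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            # Keywords are a comma-separated string in the example.
            "keywords": [k.strip() for k in keywords.split(",") if k.strip()],
            # None when the item has no ror:updatePeriod element.
            "update_period": item.findtext("ror:updatePeriod", namespaces=ROR_NS),
        })
    return items
```

Items without ROR elements still yield a title and link, which matches the point that the RSS parts remain readable on their own.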
In 2005 Google started its own sitemaps protocol based on XML. It was called Google Sitemaps.
Google later convinced more search engines to follow, and the standard was renamed to the XML sitemaps protocol. Currently Google, Yahoo, Bing, Ask, IBM and possibly others support XML sitemaps.
It is likely that more search engines will implement support for XML sitemaps.
The XML sitemaps protocol also defines autodiscovery, i.e. how search engines can automatically discover a website's XML sitemaps.
The answer is to link to the XML sitemap, e.g. sitemap.xml, from robots.txt:
User-agent: *
Sitemap: http://www.example.com/sitemap.xml
Instead of pointing to just one XML sitemap file for autodiscovery, you can list multiple sitemaps:
Sitemap: http://www.example.com/sitemap-1.xml
Sitemap: http://www.example.com/sitemap-2.xml
Or point to an XML sitemap index file:
Sitemap: http://www.example.com/sitemap-index.xml
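To illustrate how a crawler might perform this autodiscovery, here is a minimal Python sketch that extracts the Sitemap lines from the text of a robots.txt file. The function name `find_sitemaps` is an assumption, and the parsing is deliberately simplified: it treats everything after a `#` as a comment.

```python
def find_sitemaps(robots_txt):
    """Return the sitemap URLs declared in the text of a robots.txt file."""
    urls = []
    for line in robots_txt.splitlines():
        # Simplification: everything after "#" is treated as a comment.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        # The field name is matched case-insensitively.
        if sep and field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls
```

Each returned URL may be either a regular XML sitemap or a sitemap index file, so a caller would still need to fetch and inspect it.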
Information about the XML sitemaps protocol:
- Each XML sitemap file can contain at most 50,000 URLs. The uncompressed file size limit is 50 MB under the current sitemaps.org protocol (it was originally 10 MB).
- A sitemap index file can link multiple XML sitemaps, up to 50,000 under the current protocol (early versions allowed 1,000).
- You can read our article about page priorities in XML sitemaps.
- XML sitemap files and sitemap index files have to be stored as UTF-8 documents.
Example of an XML sitemap file:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<priority>1.0</priority>
<changefreq>weekly</changefreq>
<lastmod>2007-06-18</lastmod>
</url>
<url>
<loc>http://www.example.com/blogs/</loc>
<priority>0.8</priority>
<changefreq>weekly</changefreq>
<lastmod>2007-06-21</lastmod>
</url>
</urlset>
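As a rough illustration of generating these files, the Python sketch below builds an XML sitemap and a sitemap index with the standard library. The function names and the dictionary-based input are assumptions for illustration; the namespace URI is taken from the example above. The code does not enforce the protocol limits on URL count or file size.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XML_DECLARATION = '<?xml version="1.0" encoding="UTF-8"?>\n'


def build_xml_sitemap(entries):
    """Return an XML sitemap as text.

    `entries` is a list of dicts with a required 'loc' key and optional
    'lastmod', 'changefreq' and 'priority' keys (a hypothetical shape).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in entry:
                ET.SubElement(url, tag).text = str(entry[tag])
    # Save the returned text as a UTF-8 file.
    return XML_DECLARATION + ET.tostring(urlset, encoding="unicode")


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index file that links the given sitemap URLs."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for sitemap_url in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = sitemap_url
    return XML_DECLARATION + ET.tostring(index, encoding="unicode")
```

Characters such as `&` in URLs are escaped automatically when the tree is serialized, which is one reason to build the XML with a library rather than by string concatenation.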
You can find many derived formats of the standard XML sitemaps protocol, most created by Google.
If you are interested in creating XML sitemaps or any of its derived formats, check these tutorials: