Website Scraper and Joomla Websites
How to create sitemaps for Joomla websites using our sitemap generator
If your website is built using Joomla, possibly also with some Joomla plugins, you can use A1 Website Scraper.
Note: If you encounter any problems, try the Joomla website CMS scan preset, found under the
Scan website | Quick presets button, before you start the website crawl.
Feedback from customers of A1 Website Scraper and its sibling tools suggests that some Joomla installations use a crawler throttling system.
When triggered, it returns 403 Access Forbidden responses while the Joomla website is being crawled.
Here is a list of modules and settings to configure in Joomla:
- Sh404SEF plugin (for SEO): Disable anti-flood configuration.
There are also various Joomla plugins that create duplicate URLs. To work around throttling and duplicate URLs, configure the following in A1 Website Scraper:
- Scan website | Crawler engine: Set max simultaneous connections/threads to one.
- Scan website | Crawler engine: Set number of milliseconds "crawl delay" between connections to 2000.
- Scan website | Crawler settings: Check Consider non-redirected with-slash and non-slash URLs as "duplicates".
- Scan website | Crawler settings: Check Consider non-redirected index file names as "duplicates".
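To illustrate what treating with-slash/non-slash URLs and index file names as "duplicates" means, here is a minimal Python sketch. It is not how A1 Website Scraper works internally; the function names and the `INDEX_NAMES` list are assumptions made for this example only.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical list of index file names, chosen for this example only.
INDEX_NAMES = {"index.php", "index.html", "index.htm"}


def canonical_key(url: str) -> str:
    """Return a key that is identical for URLs considered duplicates."""
    parts = urlsplit(url)
    path = parts.path or "/"

    # Drop a trailing index file name: /blog/index.php -> /blog/
    head, _, last = path.rpartition("/")
    if last.lower() in INDEX_NAMES:
        path = head + "/"

    # Treat with-slash and non-slash variants alike: /blog/ -> /blog
    if len(path) > 1 and path.endswith("/"):
        path = path.rstrip("/") or "/"

    # Keep the query string (Joomla URLs often depend on it); drop the fragment.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def dedupe(urls):
    """Keep the first URL seen for each canonical key, preserving order."""
    seen = {}
    for url in urls:
        seen.setdefault(canonical_key(url), url)
    return list(seen.values())
```

With this sketch, `http://example.com/blog/`, `http://example.com/blog` and `http://example.com/blog/index.php` all collapse to a single entry, while URLs that differ in their query string stay separate.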
More settings to configure if you still have problems after applying the settings above:
- Set General options and tools | Internet crawler | User agent ID to Googlebot/2.1 (+http://www.google.com/bot.html).
- Check the full list of configuration solutions on our help page for problematic websites.
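The throttling-related settings (one connection at a time, a 2000 millisecond crawl delay, and a Googlebot user agent ID) can be pictured with the following Python sketch. It only illustrates the idea and is not the tool's implementation; `build_request`, `crawl_sequentially` and `fetch_status` are hypothetical names introduced for this example.

```python
import time
import urllib.error
import urllib.request

# Values taken from the settings described on this page.
USER_AGENT = "Googlebot/2.1 (+http://www.google.com/bot.html)"
CRAWL_DELAY_SECONDS = 2.0  # 2000 milliseconds


def build_request(url, user_agent=USER_AGENT):
    """Create a request that identifies itself with the given user agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url):
    """Return the HTTP status code for a URL, including error codes."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # For example 403 when an anti-flood system blocks the crawler.
        return err.code


def crawl_sequentially(urls, fetch=fetch_status,
                       delay=CRAWL_DELAY_SECONDS, sleep=time.sleep):
    """Fetch URLs one at a time, pausing between consecutive requests."""
    statuses = {}
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay)  # wait before every request except the first
        statuses[url] = fetch(url)
    return statuses
```

A URL that still reports 403 in the returned dictionary suggests that the anti-flood configuration on the Joomla side remains active.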
You may also want to add the following exclusions to