Microsys
  

Website Scraper and Joomla Websites

How to scrape Joomla websites using our website scraper

Joomla Websites

If your website is built with Joomla, with or without additional Joomla plugins, you can crawl it with A1 Website Scraper.

Note: If you encounter any problems, it often helps to apply the Joomla website CMS scan preset, found under the Scan website | Quick presets button, before you start the website crawl.


Joomla Website Scraper Troubleshooting

Feedback from customers of A1 Website Scraper and its sibling tools indicates that some Joomla installations use a crawler throttling system. When it triggers, the server answers requests with HTTP 403 Forbidden responses while the Joomla website is being crawled.

Here is a list of modules and settings to configure in Joomla:
  • Sh404SEF plugin (for SEO): Disable anti-flood configuration.

There are also various Joomla plugins that create duplicate URLs, i.e. several different addresses that serve the same page.
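To illustrate what duplicate URLs can look like, here is a minimal Python sketch that groups addresses differing only by a trailing slash or an index file name. It is not how A1 Website Scraper works internally; the function names and the list of index file names are assumptions chosen for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed index file names; adjust to match the server configuration.
INDEX_NAMES = ("index.php", "index.html", "index.htm")


def canonical_url(url):
    """Collapse with-slash/non-slash and index-file variants into one key."""
    parts = urlsplit(url)
    path = parts.path
    # Treat ".../index.php" the same as ".../"
    for name in INDEX_NAMES:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    # Treat ".../blog/" the same as ".../blog"
    if len(path) > 1:
        path = path.rstrip("/")
    if path == "":
        path = "/"
    # Drop the fragment; keep the query string untouched.
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, parts.query, ""))


def group_duplicates(urls):
    """Return only those canonical keys that have more than one variant."""
    groups = {}
    for url in urls:
        groups.setdefault(canonical_url(url), []).append(url)
    return {key: members for key, members in groups.items() if len(members) > 1}
```

Calling group_duplicates() on a list of crawled URLs shows which addresses are likely variants of the same page. Whether such variants really serve identical content depends on the website, so treat the result as a hint rather than proof.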


Website Scraper Program Settings for Joomla


  • Scan website | Crawler engine: Set max simultaneous connections/threads to one.
  • Scan website | Crawler engine: Set number of milliseconds "crawl delay" between connections to 2000.
  • Scan website | Crawler settings: Check Consider non-redirected with-slash and non-slash URLs as "duplicates".
  • Scan website | Crawler settings: Check Consider non-redirected index file names as "duplicates".
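The effect of the first two settings, one connection at a time with a 2000 millisecond pause between requests, can be sketched in Python as follows. This is only an illustration of the idea, not the crawler engine in A1 Website Scraper; crawl_sequentially and fetch_status are hypothetical names, and the fetch and sleep functions are passed in so the loop can be tried without network access.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url):
    """Return the HTTP status code for url (e.g. 200 or 403)."""
    try:
        with urllib.request.urlopen(url, timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the status is still of interest.
        return err.code


def crawl_sequentially(urls, fetch=fetch_status, delay_ms=2000, sleep=time.sleep):
    """Fetch URLs one at a time, pausing delay_ms between requests."""
    results = {}
    for position, url in enumerate(urls):
        if position > 0:
            sleep(delay_ms / 1000.0)  # the "crawl delay" between connections
        results[url] = fetch(url)
    return results
```

Because only one request is ever in flight and each request waits for the delay, an anti-flood system on the server sees a slow, steady visitor instead of a burst of parallel connections.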

More settings to configure if you still have problems after applying the settings above:
  • Set General options and tools | Internet crawler | User agent ID to Googlebot/2.1 (+http://www.google.com/bot.html).
  • Check the full list of configuration solutions on our help page for problematic websites.
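For readers who want to see what the user agent setting changes on the wire, the following Python sketch builds a request that carries the same user agent string in its User-Agent header. The helper name build_request is an assumption for this example; the program itself only needs the setting described above.

```python
import urllib.request

# The user agent string from the setting above.
GOOGLEBOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"


def build_request(url, user_agent=GOOGLEBOT_UA):
    """Create a request object that identifies itself with user_agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```

Passing the returned object to urllib.request.urlopen() sends the request with that header. Some servers treat requests differently depending on this value, which is why changing it can help when a crawl is blocked.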

You may also want to add the following exclusions to analysis filters and output filters:
  • ::(^|/)itemlist/tag/
  • ::(^|/)item/[0-9]+
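To show which paths these two exclusions are meant to catch, here is a small Python sketch. It assumes that the leading "::" marks the rest of the filter as a regular expression and that filters are matched against the URL path; compile_filter and is_excluded are hypothetical helper names, and Python's regex engine may differ in details from the one used by the program.

```python
import re

FILTERS = ["::(^|/)itemlist/tag/", "::(^|/)item/[0-9]+"]


def compile_filter(rule):
    """Turn a filter string into a compiled pattern."""
    if rule.startswith("::"):
        # Assumption: "::" introduces a regular expression.
        return re.compile(rule[2:])
    # Anything else is treated as plain text in this sketch.
    return re.compile(re.escape(rule))


def is_excluded(path, rules=FILTERS):
    """Return True if any filter matches somewhere in path."""
    return any(compile_filter(rule).search(path) for rule in rules)


for sample in ["itemlist/tag/news", "blog/item/42-title", "blog/items/42", "about"]:
    print(sample, is_excluded(sample))
```

The (^|/) part requires "itemlist" or "item" to begin a path segment, so a path such as "blog/items/42" is not excluded by the second filter.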

A1 Website Scraper
Extract data from sites into CSV files. By scraping websites, you can grab data on websites and transform it into CSV files ready to be imported anywhere, e.g. SQL databases
This help page is maintained by
As one of the lead developers, his hands have touched most of the code in the software from Microsys. If you email any questions, chances are that he will be the one answering.
 © Copyright 1997-2024 Microsys
