Microsys
  

Firewall Causes Problems for Crawler in Website Scraper

While our website scraper program uses normal HTTP internet connections for crawling websites, some firewall solutions will still block our software unless you take direct action.

 To see all the options available, you will have to switch off easy mode 

 With options that use a dropdown list, any [+] or [-] button next to it adds or removes items in the list itself 

How Firewalls Interact with Internet Enabled Software

By default, most firewall programs silently block all internet-enabled applications unless they are explicitly allowed in the firewall configuration. This includes most Windows programs, such as our website scraper program.
  • If the website crawl and related tools find no URLs, a firewall is often the reason.
  • If you get flaky results, odd errors, and similar, network traffic filtering can be the reason.

One hint that a firewall or internet security software is the cause is response codes like these listed in the website scan results:
  • 500 : Internal Server Error
  • 503 : Service Temporarily Unavailable
  • -4 : CommError

Note: Another possible reason for the above problems is modules installed on the webserver or website that block unknown crawlers.
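
If you export or copy the scan results, the response codes listed above can be checked with a few lines of script. The following is only a minimal sketch: the function names, the (URL, response code) input format, and the 50% threshold are assumptions made for illustration and are not part of A1 Website Scraper.

```python
from collections import Counter

# Response codes this help page lists as hints of firewall interference.
SUSPECT_CODES = {
    500: "Internal Server Error",
    503: "Service Temporarily Unavailable",
    -4: "CommError",
}


def count_suspect_codes(results):
    """Count suspect response codes in (url, response_code) pairs."""
    counts = Counter(code for _url, code in results if code in SUSPECT_CODES)
    return dict(counts)


def looks_filtered(results, threshold=0.5):
    """Return True if the scan looks like it was blocked or filtered.

    The threshold is an arbitrary assumption: at least half of the
    scanned URLs returning a suspect code counts as suspicious.
    An empty result (no URLs found) is treated as suspicious too.
    """
    results = list(results)
    if not results:
        return True
    suspect = sum(1 for _url, code in results if code in SUSPECT_CODES)
    return suspect / len(results) >= threshold


if __name__ == "__main__":
    scan = [
        ("http://example.com/", 200),
        ("http://example.com/a", 503),
        ("http://example.com/b", -4),
    ]
    print(count_suspect_codes(scan))
    print(looks_filtered(scan))
```

A result flagged this way is only a hint. As the note above says, the webserver itself may be blocking unknown crawlers.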


Firewall Solutions to Get Website Scraper Working

NOD32 client version 3:
  • View advanced mode
  • Select Setup
  • Click Antivirus and antispyware
  • In Web access protection click Configure
  • Expand HTTP and click Web browsers
  • NOD32 will automatically consider "A1 Website Scraper" a web browser (checked). You must uncheck it for A1 Website Scraper to work.


Norton 360:
  • Whitelist / add the program A1 Website Scraper in Program Rules


ESET Smart Security:
Solution:
  • Set it to learning mode.


Kaspersky anti-virus:
Symptoms:
  • URL timeouts with Indy HTTP engine.
  • URL 404 response codes with WinInet HTTP engine
Solution:
  • Pause Kaspersky protection while the scan runs.


Other software and hardware firewall solutions:
Symptoms:
  • Various errors and/or no crawling
Solution: Mimic user browser behavior (like some other programs also do):
  • In Scan website | Crawler engine, select HTTP using WinInet engine and settings (Internet Explorer).
  • In General Options | Internet Crawler, set the user agent to Mozilla/4.0 (compatible; MSIE 8.0; Win32).
  • In Scan website | Crawler engine, lower the number of simultaneous connections, possibly all the way down to one.
  • In Scan website | Crawler engine, increase the amount of time between active connections.
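
To see whether browser-like behavior gets through your firewall at all, the same idea can be tried outside the program. The sketch below uses only the Python standard library and is not how A1 Website Scraper works internally: it sends the user agent string from the list above, fetches one URL at a time, and pauses between requests. The function names, the two-second delay, and the use of -4 for connection failures (borrowed from the CommError code above) are assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request

# User agent string taken from the settings listed above.
BROWSER_UA = "Mozilla/4.0 (compatible; MSIE 8.0; Win32)"


def build_request(url, user_agent=BROWSER_UA):
    """Create a request that identifies itself like a browser."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, timeout=30.0):
    """Return the HTTP status code, or -4 if the connection itself fails."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500 or 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # No usable answer at all (blocked, timed out, refused).
        return -4


def polite_crawl(urls, delay_seconds=2.0, fetch=fetch_status, sleep=time.sleep):
    """Fetch URLs strictly one at a time, pausing between requests."""
    results = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)
        results[url] = fetch(url)
    return results


if __name__ == "__main__":
    print(polite_crawl(["http://example.com/"]))
```

If this script gets normal responses while the scan does not, the firewall is probably filtering by program rather than by traffic type, and whitelisting the program is the more direct fix.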
About A1 Website Scraper

Extract data from sites into CSV files. By scraping websites, you can grab data on websites and transform it into CSV files ready to be imported anywhere, e.g. SQL databases
 © Copyright 1997-2016 Microsys