Abstract: Crawling blogs and forums such as SMF, vBulletin etc. can sometimes take a long time. However, proper configuration of the website download can speed up the forum scan.
General Website Download Tips for Crawling Forums and Blogs
Forums and blogs are no different from other websites. Rarely will you ever need to configure website download in a special way.
However, here is a list of common topics for large and/or database-driven websites:
Use resume scan support in the sitemap generator tool.
Note that you can improve resuming by disabling the option Scan website | Crawler options | Apply "webmaster" and "output filters" after website scan stops.
Pages marked noindex can still be filtered out when creating the forum XML sitemap.
Include content that is otherwise available only to subscribers by using login support for password-protected pages.
Use output filters to exclude certain URLs from being included in generated forum sitemaps.
Use analysis filters to prevent certain URLs from being crawled / analyzed.
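The difference between analysis filters and output filters can be sketched in Python. This is only an illustration under stated assumptions, not how the tool is implemented: the link graph, the `crawl` and `matches` names, and the plain substring matching rule are all hypothetical.

```python
from collections import deque

# Hypothetical link graph standing in for a small forum: page -> links found on it.
LINKS = {
    "/": ["/forum/", "/login.php"],
    "/forum/": ["/forum/topic-1", "/profile.php?u=1"],
    "/forum/topic-1": [],
    "/login.php": ["/members-only"],
    "/profile.php?u=1": [],
    "/members-only": [],
}

def matches(url, filters):
    """Assumed matching rule: a filter matches if it occurs anywhere in the URL."""
    return any(f in url for f in filters)

def crawl(start, analysis_exclude, output_exclude):
    """Return (pages that were crawled, pages listed in the generated sitemap)."""
    seen, queue, crawled = {start}, deque([start]), []
    while queue:
        url = queue.popleft()
        if matches(url, analysis_exclude):
            continue  # never analyzed, so links on this page are never discovered
        crawled.append(url)
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    # Output filters only hide pages from the result; they were still crawled.
    listed = [u for u in crawled if not matches(u, output_exclude)]
    return crawled, listed

crawled, listed = crawl("/", analysis_exclude=["login.php"],
                        output_exclude=["profile.php"])
print(crawled)
print(listed)
```

In this sketch the profile page is crawled but left out of the listing, while the login page is never analyzed, so the page it links to is never found at all.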
Website Download Example Settings for Popular Forums and Blogs
The following settings are for demonstration purposes.
Most likely you will never need to configure these options.
Should you need to configure settings, take time to review the topics listed above
and decide what you need. Then look at the examples below for inspiration. Remember, few blogs and
forums are exactly the same.
vBulletin
Configure login
Post form data : vb_login_username=yourusername&vb_login_password=yourpassword&cookieuser=1&s=&do=login&vb_login_md5password=&vb_login_md5password_utf=
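If your real username or password contains characters such as & or =, the post form data string has to be URL encoded or the fields will be split in the wrong places. A minimal Python sketch of building such a string is shown here; the field names are copied from the example above, while the helper name `build_post_data` is hypothetical and the field list may differ between forum versions.

```python
from urllib.parse import urlencode

def build_post_data(username, password):
    """Build URL-encoded login form data using the field names from the example."""
    fields = [
        ("vb_login_username", username),
        ("vb_login_password", password),
        ("cookieuser", "1"),
        ("s", ""),                      # empty fields are kept as "name="
        ("do", "login"),
        ("vb_login_md5password", ""),
        ("vb_login_md5password_utf", ""),
    ]
    return urlencode(fields)

post_data = build_post_data("yourusername", "yourpassword")
print(post_data)

# A password with reserved characters is escaped instead of breaking the string.
print(build_post_data("yourusername", "p&ss word"))
```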
Configure crawler/analysis and output/list exclude filters
Necessary
:login.php?logout
Recommended
:profile.php
:login.php
WordPress
Configure login
Login path : http://blog.example.com/wp-login.php
Post form data : log=yourusername&pwd=yourpassword&rememberme=forever&submit=Login+%C2%BB&redirect_to=wp-admin%2F
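The escaped parts of the post form data above (`+`, `%C2%BB`, `%2F`) are ordinary URL encoding. As a quick check before pasting such a string into the tool, it can be decoded with Python's standard library. This is only a sketch for inspecting the string; the variable names are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode

wp_post_data = ("log=yourusername&pwd=yourpassword&rememberme=forever"
                "&submit=Login+%C2%BB&redirect_to=wp-admin%2F")

# Decode into (name, value) pairs; "+" becomes a space and %XX escapes are resolved.
decoded = parse_qsl(wp_post_data, keep_blank_values=True)
for name, value in decoded:
    print(f"{name} = {value!r}")

# Encoding the pairs again reproduces the original string.
roundtrip = urlencode(decoded)
print(roundtrip == wp_post_data)
```

The decoded `submit` value is the label of the login button and `redirect_to` is the path `wp-admin/`.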
Configure crawler/analysis and output/list exclude filters
Necessary
:wp-admin/
:wp-login.php?action=logout
Recommended
:wp-login.php
Note
If you do not exclude the "admin" section using filters, try to avoid crawling edit, delete, logout and related link types.
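One way to review a list of URLs for such link types is a simple keyword check, sketched below in Python. The keyword list and the `is_risky` helper are assumptions for illustration; a plain substring test can also flag harmless URLs (for example a path containing "credit"), so treat the result as a hint for writing filters rather than a final answer.

```python
from urllib.parse import urlsplit

# Assumed keyword list; adjust it to the link patterns used by your blog or forum.
RISKY_WORDS = ("edit", "delete", "logout")

def is_risky(url):
    """Return True if the path or query of the URL contains a risky keyword."""
    parts = urlsplit(url)
    haystack = (parts.path + "?" + parts.query).lower()
    return any(word in haystack for word in RISKY_WORDS)

urls = [
    "http://blog.example.com/wp-login.php?action=logout",
    "http://blog.example.com/wp-admin/post.php?post=12&action=edit",
    "http://blog.example.com/2010/05/hello-world/",
]
flagged = [u for u in urls if is_risky(u)]
print(flagged)
```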