Microsys
        

Import and Export XML and CSV Data Files in Sitemap Generator

Options for Import and Export Data

You can find general options for importing and exporting website data in menu: Options - Data import/export:
  • Export CSV Data with Headers
  • Export CSV Data with URL
  • Export CSV as UTF8 with BOM


Import and Export Website Data in CSV Files

You can enable the Export and Import buttons by clicking/selecting/focusing the control whose data you wish to import or export.
  • Most lists, text boxes, tree views, grid views, etc. can export the data they contain as-is to text or CSV files.
  • Some controls also support importing data, e.g. the website structure shown in the Analyze Website tab.

How to import or export website data as CSV or text:
  1. Select the control, e.g. by clicking on it with the mouse.
  2. The Export and Import buttons are now enabled where applicable. (They are also found in the File menu.)

Screenshot: the active/selected control is the "tree view" to the left.
Note that you can change the visible data columns and filter the visible URLs.


Exporting Website Structure Data to CSV

Screenshot: the active/selected control is the "tree view" to the left.
You can hide/show columns before export.
Note: The screenshot is from A1 Website Analyzer. Only that program has all of the data columns shown.


Unicode UTF-8 CSV Files and OpenOffice or Microsoft Office Import

Depending on the CSV UTF-8 BOM setting you selected in Options - Data import/export, A1 Sitemap Generator outputs CSV files as either UTF-8 without BOM (identical to a pure ASCII text file when the data contains only ASCII characters) or UTF-8 with BOM.

Even so, some versions of OpenOffice and Microsoft Office can have problems importing CSV data. If you experience problems (unlikely for e.g. English website data exports), you can use the import dialog in the Office tool to choose the character encoding.
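
If you process exported CSV files in your own scripts instead of an Office tool, the BOM setting matters in the same way. The following is a minimal Python sketch, not part of the program itself. It assumes a comma-separated export with a header row; the column names `URL` and `Title` and the sample row are hypothetical. Python's `utf-8-sig` codec accepts input both with and without a BOM.

```python
import csv
import io

def read_exported_csv(raw_bytes):
    """Decode CSV bytes that may or may not start with a UTF-8 BOM."""
    # "utf-8-sig" strips a leading BOM if present; otherwise it acts like plain UTF-8.
    text = raw_bytes.decode("utf-8-sig")
    # newline="" leaves line endings untouched so the csv module can handle them.
    return list(csv.DictReader(io.StringIO(text, newline="")))

# Hypothetical sample data standing in for an exported file with headers.
sample = "URL,Title\r\nhttp://example.com/,Caf\u00e9\r\n"
with_bom = b"\xef\xbb\xbf" + sample.encode("utf-8")  # UTF-8 with BOM
without_bom = sample.encode("utf-8")                 # UTF-8 without BOM

rows_a = read_exported_csv(with_bom)
rows_b = read_exported_csv(without_bom)
```

Decoding with plain `utf-8` instead would leave the BOM character glued to the first header name, which is a common cause of "missing column" errors.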



Project Website Data Saved in XML

Structured data extracted from a resource is often called metadata or "data about data". When you save projects in A1 Sitemap Generator, a vast amount of such data is saved into XML files.

Because it is XML, you can easily perform data analysis and data mining (mining the data for more information). XML parser libraries exist for almost all languages, e.g. Java, PHP, C#, Visual Basic, Delphi etc.

  • Website project meta data is stored in XML documents perfect for data mining. Some examples:
    • Totals data:
      • Total amount of links within a site
      • Total amount of pages that link within a site
      • Minimum amount of links any page has to it
      • Maximum amount of links any page has to it
      • Minimum amount of pages any page has linking to it
      • Maximum amount of pages any page has linking to it

    • Items collection data:
      • Amount of items found. This can be pages, images, etc.
      • Item data:
        • Page title
        • Response headers
        • Response code
        • Response text
        • Response time
        • Download time
        • Full path
        • Relative path (within site)
        • File extension
        • File kind
        • File size
        • Charset
        • Last modified (HTTP header)
        • Links found list
        • Linked to from list (includes a list and count of all pages and links)
        • Used as source from list (e.g. the pages from which an image or JavaScript file is used)
        • Redirected to from list (view all and full redirection chains)
        • Summary data about what was found within a directory; file types, how many of these not found, etc.
        • Calculated page importance. Raw value and 0-10 scaled. For details, see the "website data" section.

If you have saved your project to c:\projects\myproject.ini, you can find the XML files at c:\projects\myproject\.
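
As a small illustration of this naming rule, the data directory can be derived from the project file path by dropping the `.ini` extension. This is only a sketch in Python; the helper name `project_data_dir` is hypothetical, and the names of the XML files inside the directory are not assumed here.

```python
from pathlib import PureWindowsPath

def project_data_dir(project_file):
    """Return the directory that holds the XML files for a project .ini file."""
    # Dropping the ".ini" suffix yields a directory named after the project.
    return PureWindowsPath(project_file).with_suffix("")

data_dir = project_data_dir(r"c:\projects\myproject.ini")
print(data_dir)
```

On a real Windows system you could use `pathlib.Path` instead and list the files with `glob("*.xml")`.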

If you prefer easy-to-read field names and indented XML, uncheck Options - Favour save/load XML speed. However, if you have huge websites and use software to perform further data mining, you may want to leave this option checked since it decreases XML document sizes by up to 30%.


XML File Structure and Documentation

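Columns: Field name, Speed config (the shorter field name written when Options - Favour save/load XML speed is checked), Description. Leading dashes show the nesting depth; an asterisk (*) marks an element that can occur more than once.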
<data>    
 
----<meta>    
--------<version>    
--------<fast>    
--------<dataexrefs>    
----</meta>    
 
----<structure>    
--------<rootpath>    
--------<checkedlevel>    
----</structure>    
 
----<totals>    
--------<linked>    
------------<allpagesto>    
------------<minpagesto>    
------------<maxpagesto>    
------------<allrefersto>    
------------<minrefersto>    
------------<maxrefersto>    
--------</linked>    
----</totals>    
 
----<items>    
 
--------<item> *    
 
------------<imb>   information meta data
----------------<fs_ar>   analysis required
----------------<fs_as>   analysis started
----------------<fs_ac>   analysis completed
------------</imb>    
 
------------<checked>   whether the address has been verified by a "request and response"
------------<title>    
------------<allheaderstext> <allht>  
------------<responsecode> <recode>  
------------<responsetimeouter> <reto>  
------------<downloadtimeouter> <doto>  
------------<downloadtimeouter>    
------------<pathroot>    
------------<pathrela>    
------------<realext>    
------------<kindext>    
------------<valerrs>    
------------<charset>    
------------<sizeexpected> <sizeex>  
------------<sizeconfirmed> <sizeco>  
------------<lastmodified> <lastmo>  
------------<revisitaftermins> <revmins>  
 
------------<linkstotalall> <lksta>  
------------<linkstotalto> <lkstt>  
------------<linkstolist> <lkstl>  
----------------<linkstoitem> * <lksti>  
------------</linkstolist> </lkstl>  
 
------------<linkedtotalall> <lnkta>  
------------<linkedtotalfrom> <lnktf>  
------------<linkedfromlist> <lnkfl>  
----------------<linkedfromitem> * <lnkfi>  
------------</linkedfromlist> </lnkfl>  
 
------------<sourcedtotalall> <srcta>  
------------<sourcedtotalfrom> <srctf>  
------------<sourcedfromlist> <srcfl>  
----------------<sourcedfromitem> * <srcfi>  
------------</sourcedfromlist> </srcfl>  
 
------------<redirectedtotalall> <redta>  
------------<redirectedtotalfrom> <redtf>  
------------<redirectedfromlist> <redfl>  
----------------<redirectedfromitem> * <redfi>  
--------------------<redirectedfromitemfrom> <redfif>  
--------------------<redirectedfromitemtype> <redfit>  
--------------------<redirectedfromitemchain> <redfic>  
------------------------<redirectedfromitemring> * <redfir>  
--------------------</redirectedfromitemchain> </redfic>  
----------------</redirectedfromitem> </redfi>  
------------</redirectedfromlist> </redfl>  
 
------------<importancescore>    
------------<importancescorescaled>    
------------<changefreqscorescaled>    
 
------------<summaryfoundall>    
------------<summaryfoundlist>    
----------------<summaryfounditem> *    
--------------------<summaryfounditemisdir>    
--------------------<summaryfounditemextreal>    
--------------------<summaryfounditemextkind>    
--------------------<summaryfounditemresponsecode>    
--------------------<summaryfounditemcount>    
----------------</summaryfounditem>    
------------</summaryfoundlist>    
--------</item>    
 
----</items>    
 
</data>    
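
The structure above can be read with any XML parser. The following Python sketch uses the standard library and accepts both the long field names and their speed config equivalents from the table. It is an illustration under assumptions, not an official reader: the sample document and all of its values are hypothetical, and the helper names (`find_field`, `field_text`, `summarize_items`) are made up for this example.

```python
import xml.etree.ElementTree as ET

# Long field name -> short name used with the speed option (taken from the table).
SHORT_NAMES = {
    "responsecode": "recode",
    "linkedfromlist": "lnkfl",
}

def find_field(element, long_name):
    """Find a child by its long name, falling back to its short name."""
    child = element.find(long_name)
    if child is None and long_name in SHORT_NAMES:
        child = element.find(SHORT_NAMES[long_name])
    return child

def field_text(element, long_name, default=""):
    """Return the text of a field, or a default if it is missing or empty."""
    child = find_field(element, long_name)
    return child.text if child is not None and child.text else default

def summarize_items(xml_text):
    """Collect title, response code and linked-from count for every item."""
    root = ET.fromstring(xml_text.strip())  # root element is <data>
    summary = []
    for item in root.findall("./items/item"):
        linked_from = find_field(item, "linkedfromlist")
        summary.append({
            "title": field_text(item, "title"),
            "responsecode": field_text(item, "responsecode"),
            # Number of child entries in the linked-from list (0 if absent).
            "linked_from_count": len(linked_from) if linked_from is not None else 0,
        })
    return summary

# Hypothetical sample: the first item uses long names, the second short names.
SAMPLE_XML = """
<data>
  <items>
    <item>
      <title>Home</title>
      <responsecode>200</responsecode>
      <linkedfromlist><linkedfromitem>1</linkedfromitem></linkedfromlist>
    </item>
    <item>
      <title>About</title>
      <recode>404</recode>
      <lnkfl><lnkfi>0</lnkfi><lnkfi>0</lnkfi></lnkfl>
    </item>
  </items>
</data>
"""

summary = summarize_items(SAMPLE_XML)
```

Extending `SHORT_NAMES` with further rows from the table lets the same helpers read other fields regardless of how the project was saved.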

 © Copyright 1997-2012 Microsys