Microsys
  

Export XML and CSV Data Files in Website Search Engine

A1 Website Search Engine - Export website data to XML and CSV files

 To see all the options available, you will have to switch off easy mode 

 With options that use a dropdown list, any [+] or [-] button next to it adds or removes items in the list itself 

Export Website Data into CSV Files

You can enable the Export button by clicking/selecting/focusing the control which contains the data you wish to export.
  • Most lists, text boxes, tree views, grid views, etc. can export the data they contain as-is to text or CSV files.

How to export website data to CSV and text:
  1. Select the control, e.g. by clicking it with the mouse.
  2. The Export button is now enabled if applicable. (Also found in the File menu.)
  3. Choose between saving as comma-separated values (.csv), tab-separated values (.tsv), .html and more.

In the screenshots below you can see:
  • The active/selected control is the tree view on the left.
  • Notice you can change visible data columns and filter visible URLs.

website data export xml csv

website data export xml csv
(Note: The screenshot is from A1 Website Analyzer. Only that program has all the data columns shown.)


Format Options for CSV Data Export

You can find the options for CSV file export in website search engine under Options - Data export:
  • Data included:
    • Export CSV Data with Headers
    • Export CSV Data with URL
    • Wrap cells with line breaks in "" (instead of converting line breaks to spaces)
  • Character format and encoding:
    • UTF-8 with optional BOM. (ASCII is a subset of UTF-8. Ideal for English documents.)
    • UTF-16 LE (UCS-2) with optional BOM. (Used internally in current Windows systems.)
    • Local ANSI codepage. (May not always be portable to other platforms and languages.)

A1 export csv data files as Unicode or codepage
(Selecting ANSI for CSV export in website search engine)
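
When a script rather than an office program reads the exported CSV, it has to handle the same three encoding choices. The following is a minimal Python sketch, not part of the product: the function names `sniff_encoding` and `read_exported_csv` are hypothetical, and the `cp1252` fallback is only an assumption about the local ANSI codepage, which differs between systems. It checks for a BOM first, then tries UTF-8, and only then falls back to the codepage.

```python
import codecs
import csv
import io


def sniff_encoding(raw: bytes, fallback: str = "cp1252") -> str:
    """Guess the encoding of exported CSV bytes.

    The fallback codepage is an assumption; pass the one your system uses.
    """
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and drops the BOM
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"  # the BOM tells the codec the byte order
    try:
        raw.decode("utf-8")  # plain ASCII also passes this check
        return "utf-8"
    except UnicodeDecodeError:
        return fallback


def read_exported_csv(raw: bytes, fallback: str = "cp1252") -> list:
    """Decode the bytes and parse them into a list of rows."""
    text = raw.decode(sniff_encoding(raw, fallback))
    # newline="" leaves line breaks untouched, so cells wrapped in ""
    # keep their embedded line breaks.
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + 'URL,Title\r\nhttp://example.com/,"Line one\nLine two"\r\n'.encode("utf-8")
    for row in read_exported_csv(sample):
        print(row)
```

A file exported without a BOM that is not valid UTF-8 cannot be identified reliably, which is why the fallback is left as a parameter.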


Unicode CSV Files and OpenOffice or Microsoft Office Import

Some versions of Open Office (Libre Office) and Microsoft Office can have problems importing CSV data because they do not automatically detect the character encoding. If you experience problems (unlikely with e.g. English website data exports), you can select the encoding in the import dialog of the Office tools:

office import csv unicode utf8
(Selecting UTF-8 for CSV Import in Open Office / Libre Office dialog)

MS office import csv ansi
(Selecting ANSI for CSV Import in Microsoft Office dialog)


Project Website Data is Saved as XML

Structured data extracted from a resource is often called metadata or "data about data". When you save projects in A1 Website Search Engine, a vast amount of such data is saved into the XML files.

Because the data is XML, you can easily analyze it and mine it for more information. XML parsers and wrappers exist for almost all languages, e.g. Java, PHP, C#, Visual Basic, Delphi etc.

  • Website project meta data is stored in XML documents perfect for data mining. Some examples:
    • Totals data:
      • Total amount of links within a site
      • Total amount of pages that link within a site
      • Minimum amount of links any page has to it
      • Maximum amount of links any page has to it
      • Minimum amount of pages any page has linking to it
      • Maximum amount of pages any page has linking to it

    • Items collection data:
      • Amount of items found. This can be pages, images, etc.
      • Item data:
        • Page title
        • Response headers
        • Response code
        • Response text
        • Response time
        • Download time
        • Full path
        • Relative path (within site)
        • File extension
        • File kind
        • File size
        • Charset
        • Last modified (HTTP header)
        • Links found list
        • Linked to from list (includes a list and count of all pages and links)
        • Used as source from list (e.g. where an image or JavaScript file is used from)
        • Redirected to from list (view all and full redirection chains)
        • Summary data about what was found within a directory; file types, how many of these not found, etc.
        • Calculated page importance. Raw value and 0-10 scaled. For details, see the "website data" section.

If you have saved your project to c:\projects\myproject.ini, you can find the XML files at c:\projects\myproject\.
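
The path rule above can be sketched in Python as follows. This is only an illustration: the helper name `xml_dir_for_project` is hypothetical, and it assumes the XML folder always shares the project file's name with the extension removed, as in the example path.

```python
from pathlib import PureWindowsPath


def xml_dir_for_project(project_file: str) -> PureWindowsPath:
    """Return the folder named like the project file, minus its extension."""
    # PureWindowsPath parses Windows-style paths on any operating system.
    return PureWindowsPath(project_file).with_suffix("")


if __name__ == "__main__":
    print(xml_dir_for_project(r"c:\projects\myproject.ini"))
```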

If you prefer easy-to-read field names and indented XML, uncheck Options - Favour save/load XML speed. However, if you have huge websites and use software for further data mining, you may want to leave this option checked since it reduces XML document sizes by up to 30%.


XML File Structure and Documentation

Field name Speed config Description
<data>    
 
----<meta>    
--------<version>    
--------<fast>    
--------<dataexrefs>    
----</meta>    
 
----<structure>    
--------<rootpath>    
--------<checkedlevel>    
----</structure>    
 
----<totals>    
--------<linked>    
------------<allpagesto>    
------------<minpagesto>    
------------<maxpagesto>    
------------<allrefersto>    
------------<minrefersto>    
------------<maxrefersto>    
--------</linked>    
----</totals>    
 
----<items>    
 
--------<item> *    
 
------------<imb>   information meta data
----------------<fs_ar>   analysis required
----------------<fs_as>   analysis started
----------------<fs_ac>   analysis completed
------------</imb>    
 
------------<checked>   whether the address has been verified by a "request and response"
------------<title>    
------------<allheaderstext> <allht>  
------------<responsecode> <recode>  
------------<responsetimeouter> <reto>  
------------<downloadtimeouter> <doto>  
------------<downloadtimeouter>    
------------<pathroot>    
------------<pathrela>    
------------<realext>    
------------<kindext>    
------------<valerrs>    
------------<charset>    
------------<sizeexpected> <sizeex>  
------------<sizeconfirmed> <sizeco>  
------------<lastmodified> <lastmo>  
------------<revisitaftermins> <revmins>  
 
------------<linkstotalall> <lksta>  
------------<linkstotalto> <lkstt>  
------------<linkstolist> <lkstl>  
----------------<linkstoitem> * <lksti>  
------------</linkstolist> </lkstl>  
 
------------<linkedtotalall> <lnkta>  
------------<linkedtotalfrom> <lnktf>  
------------<linkedfromlist> <lnkfl>  
----------------<linkedfromitem> * <lnkfi>  
------------</linkedfromlist> </lnkfl>  
 
------------<sourcedtotalall> <srcta>  
------------<sourcedtotalfrom> <srctf>  
------------<sourcedfromlist> <srcfl>  
----------------<sourcedfromitem> * <srcfi>  
------------</sourcedfromlist> </srcfl>  
 
------------<redirectedtotalall> <redta>  
------------<redirectedtotalfrom> <redtf>  
------------<redirectedfromlist> <redfl>  
----------------<redirectedfromitem> * <redfi>  
--------------------<redirectedfromitemfrom> <redfif>  
--------------------<redirectedfromitemtype> <redfit>  
--------------------<redirectedfromitemchain> <redfic>  
------------------------<redirectedfromitemring> * <redfir>  
--------------------</redirectedfromitemchain> </redfic>  
----------------</redirectedfromitem> </redfi>  
------------</redirectedfromlist> </redfl>  
 
------------<importancescore>    
------------<importancescorescaled>    
------------<changefreqscorescaled>    
 
------------<summaryfoundall>    
------------<summaryfoundlist>    
----------------<summaryfounditem> *    
--------------------<summaryfounditemisdir>    
--------------------<summaryfounditemextreal>    
--------------------<summaryfounditemextkind>    
--------------------<summaryfounditemresponsecode>    
--------------------<summaryfounditemcount>    
----------------</summaryfounditem>    
------------</summaryfoundlist>    
--------</item>    
 
----</items>    
 
</data>    
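
As a rough illustration of mining this structure, the Python sketch below lists items by response code and accepts both the long field names and the short "speed config" names from the table. Several things here are assumptions rather than documented behavior: that each field stores its value as element text, the tiny `SAMPLE` document itself, and the helper names `field` and `items_with_code`. Check a real project file before relying on it.

```python
import xml.etree.ElementTree as ET

# Long field name -> short "speed config" name, taken from the table above.
SHORT_NAMES = {
    "responsecode": "recode",
    "lastmodified": "lastmo",
    "sizeconfirmed": "sizeco",
}


def field(item, name):
    """Return a child field's text, trying the long name, then the short one."""
    node = item.find(name)
    if node is None and name in SHORT_NAMES:
        node = item.find(SHORT_NAMES[name])
    return node.text if node is not None else None


def items_with_code(root, code):
    """Collect (relative path, title) for every item with the given response code."""
    return [
        (field(item, "pathrela"), field(item, "title"))
        for item in root.iter("item")  # finds <item> elements at any depth
        if field(item, "responsecode") == code
    ]


# Hypothetical sample: the values and their storage as element text are assumed.
SAMPLE = """<data>
  <items>
    <item><title>Home</title><pathrela>/</pathrela><responsecode>200</responsecode></item>
    <item><title>Gone</title><pathrela>/old.html</pathrela><recode>404</recode></item>
  </items>
</data>"""

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(items_with_code(root, "404"))
```

For a saved project, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`; extending `SHORT_NAMES` with more rows from the table lets the same helper read files saved with either setting.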
This help page is maintained by

As one of the lead developers, his hands have touched most of the code in the software from Microsys.

If you email any questions, chances are that he will be the one answering them.
 © Copyright 1997-2016 Microsys