OpenHarvest

OpenHarvest is an open-source web robot (spider) that extracts information from a range of data sources. It was initially developed to collect Dublin Core metadata from web pages and has since been extended to analyze geospatial information and full text.
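
To illustrate what collecting Dublin Core metadata from a web page involves, here is a minimal sketch in Python. It is not OpenHarvest's own code: the class and function names are hypothetical, and it assumes the common convention of embedding Dublin Core elements in HTML as meta tags whose name starts with "DC.".

```python
from html.parser import HTMLParser


class DublinCoreParser(HTMLParser):
    """Collect <meta name="DC.*" content="..."> elements from an HTML page.

    Hypothetical helper for illustration; not part of OpenHarvest.
    """

    def __init__(self):
        super().__init__()
        # Maps an element name such as "DC.title" to a list of its values.
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        name = attr_map.get("name") or ""
        content = attr_map.get("content")
        # Assumption: Dublin Core elements carry a "DC." prefix in the name.
        if name.lower().startswith("dc.") and content is not None:
            self.metadata.setdefault(name, []).append(content)


def extract_dublin_core(html_text):
    """Return a dict of Dublin Core element names to lists of values."""
    parser = DublinCoreParser()
    parser.feed(html_text)
    return parser.metadata


SAMPLE = """<html><head>
<meta name="DC.title" content="Annual Report">
<meta name="DC.creator" content="A. Author">
<meta name="DC.creator" content="B. Author">
<meta name="description" content="not Dublin Core">
</head><body></body></html>"""

print(extract_dublin_core(SAMPLE))
```

Values are stored as lists because Dublin Core elements such as creator may be repeated on a single page.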

OpenHarvest harvests information from HTML web pages and can be extended to other formats through a plug-in mechanism. Plug-ins exist for MS Office (e.g. .doc, .xls), OpenOffice, and PDF documents.
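
As a rough picture of how a format plug-in mechanism can work, the sketch below registers one handler per file extension and dispatches on the extension of the harvested file. OpenHarvest's actual plug-in interface is not described here, so every name in this example (PLUGINS, register, dispatch, and the handlers) is an assumption made for illustration only.

```python
import os

# Hypothetical registry: maps a lowercase file extension to a handler function.
PLUGINS = {}


def register(*extensions):
    """Decorator that registers a handler for one or more file extensions."""
    def decorator(func):
        for ext in extensions:
            PLUGINS[ext.lower()] = func
        return func
    return decorator


@register(".html", ".htm")
def handle_html(path):
    # Placeholder: a real handler would parse the page and return metadata.
    return f"html handler: {path}"


@register(".pdf")
def handle_pdf(path):
    # Placeholder: a real handler would extract text or document properties.
    return f"pdf handler: {path}"


def dispatch(path):
    """Pick a handler by file extension; return None if none is registered."""
    ext = os.path.splitext(path)[1].lower()
    handler = PLUGINS.get(ext)
    return handler(path) if handler else None


print(dispatch("report.PDF"))
print(dispatch("notes.txt"))
```

With this design, supporting a new format means adding one decorated function, and the harvesting loop itself does not change.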

Status:

Released

License:

GPL