Work on the British Library's free software archival crawler!

Mark sez, "I run the web archiving programme at the British Library and I've just posted a tender for the development of a smart archiving crawler. The smart crawler is to be free software under the GNU General Public License (GPL). The project may be of interest to BB readers in the search, document classification and ranking, digital library, or archiving space."

The British Library and the Bibliothèque Nationale de France are embarking on a programme to archive resources on the World Wide Web in their respective national domains. To achieve this programme, the British Library as lead partner wishes to tender for a contract to multiple suppliers to provide development services and/or software technology for a Smart Archiving Crawler. This will comprise a framework controlling and interacting with Heritrix, the Internet Archive's open source archiving web crawler, and modules which provide prioritisation capabilities using document thematic analysis and link weighting.

113k Word Link

(Thanks, Mark!)