Introduction
The Metadata Extraction Tool was developed by the National Library of New Zealand to programmatically extract preservation metadata from a range of file formats, such as PDF documents, image files, sound files, Microsoft Office documents, and many others.
The tool was initially developed in 2003 and released as open source software in 2007. The current version can be downloaded from the SourceForge download page.
Purpose of the Metadata Extraction Tool
The Tool builds on the Library's work on digital preservation and on its logical preservation metadata schema. It is designed to:
The Tool was designed for preservation processes and activities, but it can also be used for other tasks, such as extracting metadata for resource discovery.
Supported File Formats
The Metadata Extraction Tool includes a number of 'adapters' that extract metadata from specific file types. Adapters are currently provided for:
If a file type is unknown, the tool applies a generic adapter, which extracts the data that the host system 'knows' about any given file (such as its size, filename, and creation date).
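The adapter arrangement described above can be pictured as a registry that tries each format-specific adapter in turn and falls back to a generic one. The following Java sketch is illustrative only: MetadataAdapter, ExtensionAdapter, GenericAdapter, and AdapterRegistry are hypothetical names, not the Tool's actual classes, and matching files by extension is an assumption made to keep the example short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical adapter contract, for illustration only.
interface MetadataAdapter {
    boolean accepts(Path file);
    Map<String, String> extract(Path file);
}

// Hypothetical format-specific adapter that claims files by extension.
// Its extract() is a stub; a real adapter would parse the format's header.
class ExtensionAdapter implements MetadataAdapter {
    private final String extension;

    ExtensionAdapter(String extension) {
        this.extension = extension.toLowerCase(Locale.ROOT);
    }

    public boolean accepts(Path file) {
        String name = file.getFileName().toString().toLowerCase(Locale.ROOT);
        return name.endsWith("." + extension);
    }

    public Map<String, String> extract(Path file) {
        return Map.of("format", extension);
    }
}

// Fallback adapter: reports only what the host file system knows
// about the file (name, size, creation time).
class GenericAdapter implements MetadataAdapter {
    public boolean accepts(Path file) {
        return true; // accepts any file
    }

    public Map<String, String> extract(Path file) {
        try {
            BasicFileAttributes attrs = Files.readAttributes(file, BasicFileAttributes.class);
            Map<String, String> md = new LinkedHashMap<>();
            md.put("filename", file.getFileName().toString());
            md.put("size", Long.toString(attrs.size()));
            md.put("created", attrs.creationTime().toString());
            return md;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

class AdapterRegistry {
    private final List<MetadataAdapter> specific;
    private final MetadataAdapter generic = new GenericAdapter();

    AdapterRegistry(List<MetadataAdapter> specific) {
        this.specific = List.copyOf(specific);
    }

    // The first format-specific adapter that accepts the file wins;
    // unknown file types fall through to the generic adapter.
    MetadataAdapter select(Path file) {
        for (MetadataAdapter adapter : specific) {
            if (adapter.accepts(file)) {
                return adapter;
            }
        }
        return generic;
    }
}
```

For example, a registry built with ExtensionAdapter("pdf") would hand report.pdf to that adapter and send notes.xyz to GenericAdapter. Keeping the generic adapter outside the list means every file receives at least basic file-system metadata.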
Capabilities
The tool has both a Microsoft Windows interface and a UNIX command-line interface, so files can be processed either automatically in batches or individually, as required.
The application opens all files as read-only, ensuring that the original files are never altered. Because the tool reads only header information, extraction is quick.
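Reading only the start of a file, without ever opening it for writing, is enough to recognise many formats from their signature bytes ("magic numbers"). The sketch below assumes detection by magic number; it is not taken from the Tool's source, and HeaderSniffer and its methods are hypothetical. The signatures themselves (%PDF, the PNG signature, GIF8, and JPEG's FF D8 FF) are the standard ones.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

// Hypothetical helper: reads a few header bytes read-only and guesses the format.
final class HeaderSniffer {
    static final int HEADER_BYTES = 8;

    private HeaderSniffer() {}

    // Opens the file for reading only and reads at most HEADER_BYTES bytes.
    static byte[] readHeader(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file, StandardOpenOption.READ)) {
            return in.readNBytes(HEADER_BYTES);
        }
    }

    // Maps a few well-known magic numbers to a format label.
    static String identify(byte[] header) {
        if (startsWith(header, new byte[] {'%', 'P', 'D', 'F'})) return "PDF";
        if (startsWith(header, new byte[] {(byte) 0x89, 'P', 'N', 'G'})) return "PNG";
        if (startsWith(header, new byte[] {'G', 'I', 'F', '8'})) return "GIF";
        if (startsWith(header, new byte[] {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF})) return "JPEG";
        return "unknown";
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

Because readHeader requests at most eight bytes and opens the stream with StandardOpenOption.READ only, the cost per file stays small regardless of file size, and the file's contents cannot be modified through that stream.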
Open Source Development
The Tool is written in Java and XML and is distributed under the Apache License (version 2.0).
Developers may be interested in extending key components of the Metadata Extraction Tool, for example by extending existing adapters or developing new ones to process other file types, or by creating new XSLT files to generate different XML output formats.
Please refer to the Developers Guide for more information on these components.
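Generating a different XML output format from the same extracted metadata comes down to applying another XSLT stylesheet. The following sketch uses the standard javax.xml.transform API; the input element names (File, FileName, FileSize), the output record format, and the XsltRenderer class are hypothetical examples, not the Tool's real schema or code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class XsltRenderer {
    // Hypothetical extracted metadata, standing in for the Tool's native XML.
    static final String SAMPLE_XML =
            "<File><FileName>report.pdf</FileName><FileSize>1024</FileSize></File>";

    // Hypothetical stylesheet that rewrites it into a compact attribute-based record.
    static final String SAMPLE_XSLT =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
            + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
            + "<xsl:template match=\"/File\">"
            + "<record name=\"{FileName}\" bytes=\"{FileSize}\"/>"
            + "</xsl:template>"
            + "</xsl:stylesheet>";

    private XsltRenderer() {}

    // Applies an XSLT stylesheet to an XML document and returns the result as text.
    static String transform(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

Calling XsltRenderer.transform(SAMPLE_XML, SAMPLE_XSLT) should produce a single record element carrying name and bytes attributes. Supporting another output format would only require a new stylesheet string; the Java code stays unchanged.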