The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is an XML-based web-service protocol that allows clients to fetch metadata about the contents of digital repositories.
For a complete description of the protocol, see the official OAI-PMH pages and the specification document.
EHRI's OAI-PMH endpoint is located at
The protocol consists of six "verbs":
- Identify: show information about the current repository (the default verb)
- ListSets: list supported record sets (record groupings that can be independently harvested)
- ListIdentifiers: list unique identifiers for records within this repository
- ListRecords: return a set of records, plus a resumption token for fetching subsequent sets when the total exceeds the maximum allowable page size (see paging)
- ListMetadataFormats: list the metadata formats supported by this repository
- GetRecord: fetch metadata for a specific record given its unique identifier
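As a rough illustration of how a request is formed, the sketch below builds the GET URL for a verb and its arguments. The base URL is a hypothetical placeholder rather than EHRI's actual endpoint, the helper name `build_request_url` is invented for this example, and `oai_dc` is the metadata prefix the OAI-PMH specification assigns to Dublin Core.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the real endpoint URL.
BASE_URL = "https://example.org/oai-pmh"

# The six verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListSets", "ListIdentifiers",
    "ListRecords", "ListMetadataFormats", "GetRecord",
}

def build_request_url(verb, base_url=BASE_URL, **params):
    """Return the GET URL for an OAI-PMH request with the given verb."""
    if verb not in OAI_VERBS:
        raise ValueError(f"Unknown OAI-PMH verb: {verb}")
    # The verb is sent as an ordinary query parameter alongside its arguments.
    query = {"verb": verb, **params}
    return f"{base_url}?{urlencode(query)}"

print(build_request_url("ListSets"))
print(build_request_url("ListRecords", metadataPrefix="oai_dc"))
```

The resulting URL can be fetched with any HTTP client; the response is an XML document in the OAI-PMH namespace.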
Some of these verbs require additional parameters. For example, the GetRecord, ListIdentifiers, and ListRecords
verbs all require a metadataPrefix parameter.
Pagination & Resumption Tokens
The list-based verbs return only a partial data set if the total size of the set
exceeds a fixed value. In that case the response includes a resumptionToken element,
whose value can be supplied as the resumptionToken parameter of a follow-up request
to retrieve the next part of the data. Note: the resumption token implicitly carries
the values of all parameters other than the verb, so those parameters must not be
supplied alongside the token.
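The paging behaviour can be sketched as a loop that keeps requesting pages until no token is returned. This is a minimal sketch, not a full harvester: the function names and the `fetch` parameter are invented for illustration, the base URL passed in is up to the caller, and OAI-PMH error responses are not handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# XML namespace used by all OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def http_fetch(url):
    """Fetch a URL and return the raw response body."""
    with urlopen(url) as response:
        return response.read()

def harvest_identifiers(base_url, metadata_prefix, fetch=http_fetch):
    """Yield every record identifier, following resumption tokens."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        for header in root.iter(f"{OAI_NS}header"):
            yield header.findtext(f"{OAI_NS}identifier")
        token = root.find(f".//{OAI_NS}resumptionToken")
        # A missing or empty token means this was the last page.
        if token is None or not (token.text or "").strip():
            break
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListIdentifiers",
                  "resumptionToken": token.text.strip()}
```

Passing `fetch` as an argument keeps the loop independent of the HTTP client, which also makes it easy to exercise against canned responses.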
EHRI's OAI-PMH endpoint supports both Dublin Core (DC) and Encoded Archival Description (EAD) 2002 format archival descriptions. While the DC descriptions only return the top level of the archival hierarchy (e.g. the description of the fonds), EAD descriptions include levels below the fonds, if present. EAD is also typically more extensive and specific than DC. As a result, the EAD description of a fonds, whilst technically a single document, can in practice contain a very large amount of information. This should be borne in mind when using, for example, harvesting tools which may not expect large XML payloads.
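One way a client might cope with large payloads is to parse the response incrementally instead of loading the whole document into memory. The sketch below uses Python's standard `iterparse` for this; the function name and the inline sample response are made up for illustration and are not taken from EHRI's output.

```python
import io
import xml.etree.ElementTree as ET

# XML namespace used by all OAI-PMH 2.0 responses.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def iter_record_identifiers(stream):
    """Yield each record's identifier while parsing the stream incrementally."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == f"{OAI_NS}record":
            yield elem.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
            # Drop the record's children (including its metadata subtree)
            # once it has been processed.
            elem.clear()

# Made-up sample response standing in for a real ListRecords payload.
sample = io.BytesIO(
    b'<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
    b'<record><header><identifier>example-1</identifier></header>'
    b'<metadata><ead xmlns="urn:isbn:1-931666-22-9"/></metadata></record>'
    b'</ListRecords></OAI-PMH>'
)
for identifier in iter_record_identifiers(sample):
    print(identifier)
```

Any file-like object with a `read` method can be passed as `stream`, so the same function should work on an open HTTP response or a file saved to disk.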
Sets allow you to selectively harvest a portion of a repository's records. Since EHRI is a metadata aggregator, we support two
types of set: country and repository. Country set identifiers consist of lower-case ISO 3166 alpha-2 (2-letter) codes. Repository
set identifiers are compound, consisting of the country code, a colon, and the repository's EHRI ID (which also contains the country
code), for example
Run it as a curl command:
In addition to the standard parameters, the
until parameters to specify UTC dates for selective harvesting in