API V1 Documentation
Overview
The EHRI portal has an experimental web API for searching and retrieving a subset of EHRI data in structured JSON format. Although the scope of the API is expected to broaden in future, it prioritises convenience over semantic precision, providing a somewhat simplified view of EHRI's data relative to that offered by the HTML site.
At present, information is only available for the following types of item:
- Countries (type: Country)
- Institutions (type: Repository)
- Archival descriptions (type: DocumentaryUnit)
- Virtual archival descriptions (type: VirtualUnit)
- Authorities (also known as Historical Agents, type: HistoricalAgent)
- Keywords (also known as Controlled Vocabulary Concepts, type: CvocConcept)
The base API URL is /api/v1.
Actions
Four "actions" are currently available:
- Global search at /search: Intended for a simple text query of all country report, institution, and archival description information in the portal. Optionally, the search can be limited to items of specific types.
- Retrieving item info by ID at /{ID}: If item IDs are known in advance (or determined via a search), information about each item can be fetched individually.
- Item-scoped search at /{ID}/search: Intended for searching via simple text query within the "scope" of a particular item, retrieving matching child items. For example, a country can be searched for specific repositories, and repositories and archival descriptions for, respectively, top-level and sub-level descriptions.
- Related item search at /{ID}/related: Intended for searching via simple text query within the set of items related to a given item. For example, archival descriptions which are related to a particular authority or keyword.
The format of returned data conforms to the http://jsonapi.org/ specification and has content-type application/vnd.api+json.
Global Search
The Global search action (/search) allows you to search all available item types.
Five parameters are currently supported:
- q: A text query, following the same rules as searching on the portal site.
- type: One of the available data types. Can be used multiple times.
- page: Since results are paginated, this number selects the desired page.
- limit: The number of results to fetch per page, up to a maximum of 100.
- facet: Enables faceting statistics for one or more of the available facet types: type, lang, country, holder, dates. Like type, these values can also be used as parameters to filter the search results using the available facet values.
Test it!
Run it as a curl command:
curl "https://portal.ehri-project.eu/api/v1/search"
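The same request URL can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; the function name search_url is hypothetical, the base URL is taken from the curl example above, and the facet parameter is left out because its exact value syntax is not spelled out here.

```python
from urllib.parse import urlencode

# Base URL as used in the curl example above.
BASE_URL = "https://portal.ehri-project.eu/api/v1"


def search_url(query=None, types=(), page=None, limit=None):
    """Build a global search URL from the q, type, page and limit parameters."""
    params = []
    if query:
        params.append(("q", query))
    # "type" can be used multiple times, so it is emitted once per value.
    params.extend(("type", item_type) for item_type in types)
    if page is not None:
        params.append(("page", page))
    if limit is not None:
        # The documented maximum number of results per page is 100.
        if not 1 <= limit <= 100:
            raise ValueError("limit must be between 1 and 100")
        params.append(("limit", limit))
    query_string = urlencode(params)
    return f"{BASE_URL}/search" + (f"?{query_string}" if query_string else "")


if __name__ == "__main__":
    print(search_url("Amsterdam", types=["Repository", "DocumentaryUnit"], limit=50))
```

Building the query string with urlencode, rather than by hand, takes care of escaping spaces and other reserved characters in the query text.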
Retrieve an item
To retrieve individual items (of any type), the /{ID} action is provided, where {ID} is the global EHRI identifier of the item you want.
Test it!
Run it as a curl command:
curl "https://portal.ehri-project.eu/api/v1/us-005578"
Item Search
The item search action (/{ID}/search) allows you to search within an individual item, for example searching the archival descriptions within a particular repository.
The same five parameters as the Global Search action are supported:
- q: A text query, following the same rules as searching on the portal site.
- type: One of the available data types. Can be used multiple times.
- page: Since results are paginated, this number selects the desired page.
- limit: The number of results to fetch per page, up to a maximum of 100.
- facet: Enables faceting statistics for one or more of the available facet types: type, lang, country, holder, dates. Like type, these values can also be used as parameters to filter the search results using the available facet values.
Test it!
Run it as a curl command:
curl "https://portal.ehri-project.eu/api/v1/us-005578/search"
Related Item Search
The related item search action (/{ID}/related) allows you to search items related to a given item, for example searching the archival descriptions related to a particular authority or keyword.
The same five parameters as the Global Search action are supported:
- q: A text query, following the same rules as searching on the portal site.
- type: One of the available data types. Can be used multiple times.
- page: Since results are paginated, this number selects the desired page.
- limit: The number of results to fetch per page, up to a maximum of 100.
- facet: Enables faceting statistics for one or more of the available facet types: type, lang, country, holder, dates. Like type, these values can also be used as parameters to filter the search results using the available facet values.
Structure of responses
The responses from the API conform to the http://jsonapi.org specification, so read the documentation there for an overview of what to expect.
The response is a JSON object with up to four fields:
- data: contains the main body of the response, and is either a list of items or a single item.
- links: the top-level "links" field contains links to API actions related to this one. For example, it contains links to the first, last, next, and previous pages if the data is paginated.
- included: contains a list of related items. For example, when searching within an item, the item itself is included with the response, for convenience.
- meta: contains additional relevant metadata, for example the total number of items and the total number of pages.
Each item type has a different set of possible fields. The field names for DocumentaryUnit, HistoricalAgent, and Repository items generally conform to the ISAD(G), ISAAR, and ISDIAH standards respectively. Look at the example responses for an idea of what to expect.
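As an illustration of the four top-level fields, the sketch below walks a small, made-up response document. The sample identifiers, the link URLs, and the key names inside meta are all hypothetical; only the top-level field names and the JSON:API id and type members come from the description above and the JSON:API specification.

```python
import json

# Hypothetical response body shaped after the four fields described above.
SAMPLE = json.loads("""
{
  "data": [
    {"id": "example-1", "type": "DocumentaryUnit", "attributes": {}},
    {"id": "example-2", "type": "DocumentaryUnit", "attributes": {}}
  ],
  "links": {
    "first": "/api/v1/example-repo/search?page=1",
    "next": "/api/v1/example-repo/search?page=2"
  },
  "included": [{"id": "example-repo", "type": "Repository", "attributes": {}}],
  "meta": {"total": 4, "pages": 2}
}
""")


def summarise(doc):
    """Collect item ids, the next-page link and included ids from a response."""
    data = doc.get("data")
    # "data" is either a list of items or a single item (a JSON object).
    if isinstance(data, list):
        items = data
    elif data:
        items = [data]
    else:
        items = []
    return {
        "ids": [item["id"] for item in items],
        "next": doc.get("links", {}).get("next"),  # absent on the last page
        "included_ids": [inc["id"] for inc in doc.get("included", [])],
        "meta": doc.get("meta", {}),
    }


if __name__ == "__main__":
    print(summarise(SAMPLE))
```

Treating every field as optional, as done here with dict.get, matches the statement that a response has up to four fields rather than always all four.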
Additional Parameters
Attribute Filtering
Some datatypes can include a lot of data in their attributes. If you only care about particular attributes you can specify them using a query string parameter of the form fields[TYPE]=field1,field2.
For example, to return only the name attribute for the Repository type, add a query parameter of the form fields[Repository]=name to the request URL. Multiple attributes can be specified as a comma-separated list.
Note: although omitted for clarity above, the square brackets must be percent-encoded, e.g. fields%5BRepository%5D=name.
Also note that using an empty parameter, e.g. fields%5BDocumentaryUnit%5D=, will remove all attributes from the response.
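The percent-encoding of the square brackets does not need to be done by hand. As a minimal sketch, Python's urllib.parse.urlencode produces the encoded form directly; the helper name fields_param is hypothetical.

```python
from urllib.parse import urlencode


def fields_param(item_type, attributes):
    """Return a percent-encoded fields[TYPE]=a,b query parameter.

    An empty attribute list yields an empty value, which (as noted above)
    removes all attributes from the response.
    """
    key = f"fields[{item_type}]"
    # safe="," keeps the comma-separated list readable; brackets are encoded.
    return urlencode({key: ",".join(attributes)}, safe=",")


if __name__ == "__main__":
    print(fields_param("Repository", ["name"]))
    print(fields_param("DocumentaryUnit", []))
```

The resulting string can be appended to any of the search or item URLs after a ? or & separator.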
Geospatial Bounding Box
For datatypes that have associated geospatial information, such as repositories, it is possible to specify a bounding box to constrain the search results. The bounding box is a query parameter of the form bbox=MIN-LAT,MIN-LON,MAX-LAT,MAX-LON. For example, to search repositories just within London, one would use:
bbox=51.28,-0.489,51.686,0.236
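Because the order of the four values is easy to get wrong, a small helper can assemble and sanity-check the parameter. This is a sketch only: the function name bbox_param is hypothetical, and the range checks are a client-side convenience rather than documented API behaviour.

```python
def bbox_param(min_lat, min_lon, max_lat, max_lon):
    """Return a bbox=MIN-LAT,MIN-LON,MAX-LAT,MAX-LON query parameter."""
    if not -90 <= min_lat <= max_lat <= 90:
        raise ValueError("latitudes must satisfy -90 <= min <= max <= 90")
    # Simplification: boxes that cross the antimeridian are rejected here.
    if not -180 <= min_lon <= max_lon <= 180:
        raise ValueError("longitudes must satisfy -180 <= min <= max <= 180")
    return f"bbox={min_lat},{min_lon},{max_lat},{max_lon}"


if __name__ == "__main__":
    # The London bounding box from the example above.
    print(bbox_param(51.28, -0.489, 51.686, 0.236))
```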
Sorting
By default, responses to the /search actions are sorted by relevance, that is, the degree to which a given result matches the input query. However, responses can be ordered differently using the sort parameter with one of the following values:
- id: Orders items by their alphanumeric identifier string.
- name: Orders items alphabetically by name.
- updated: Orders most recently updated/modified items first.
- location: In combination with the latlng parameter (see below), orders items with location information by proximity to a given point.
Distance sort
To specify a latitude and longitude as a point from which location-aware items should be sorted by distance, provide the latlng parameter with a value of the form latlng=LAT,LON. Note: the sort parameter with a value of location must also be given for this to have any effect. For example, to sort items by proximity to King's College London's Strand Campus use:
sort=location&latlng=51.51,-0.116
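Since the point only takes effect together with sort=location, it can be convenient to generate both parameters at once. A minimal sketch follows, with a hypothetical helper name and the parameter spelled latlng as in the worked example above.

```python
from urllib.parse import urlencode


def distance_sort_query(lat, lon):
    """Return the query string that sorts results by proximity to a point."""
    if not -90 <= lat <= 90 or not -180 <= lon <= 180:
        raise ValueError("latitude or longitude out of range")
    # Both parameters are emitted together: the point alone has no effect.
    return urlencode([("sort", "location"), ("latlng", f"{lat},{lon}")], safe=",")


if __name__ == "__main__":
    # King's College London, Strand Campus, as in the example above.
    print(distance_sort_query(51.51, -0.116))
```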
Example usage with Python
The Python script shown below creates a tab-delimited (TSV) file containing three fields: the EHRI item ID, its title, and the scope and content field of EHRI archival descriptions.
You can copy the code to a file called scopecontent.py and run it with the URL of a search of EHRI archival descriptions, e.g.:
python3 scopecontent.py "https://portal.ehri-project.eu/api/v1/search?type=DocumentaryUnit&q=title:Amsterdam"
This will run a search for documentary unit items with "Amsterdam" in the title and download all the subsequent pages of items, transforming the selected data to TSV and printing it to the console.
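A minimal sketch of a script along these lines is given here; it is not necessarily identical to the original scopecontent.py. It assumes that each item lists its descriptions under attributes.descriptions, that each description has name and scopeAndContent keys, and that the URL of the following page is found at links.next. These key names are assumptions and should be checked against a real response.

```python
#!/usr/bin/env python3
"""Sketch: print id, title and scope-and-content of search results as TSV."""
import csv
import json
import sys
import urllib.request
from urllib.parse import urljoin


def rows_from_page(page):
    """Yield (id, title, scope and content) tuples from one parsed page.

    Assumed layout: attributes.descriptions is a list of objects with
    "name" and "scopeAndContent" keys.
    """
    for item in page.get("data", []):
        for desc in item.get("attributes", {}).get("descriptions", []):
            yield (item["id"], desc.get("name", ""), desc.get("scopeAndContent") or "")


def fetch_pages(url):
    """Yield parsed pages, following links.next until it is absent."""
    while url:
        with urllib.request.urlopen(url) as response:
            page = json.load(response)
        yield page
        next_url = page.get("links", {}).get("next")
        # urljoin leaves absolute URLs unchanged and resolves relative ones.
        url = urljoin(url, next_url) if next_url else None


def main(start_url):
    writer = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
    for page in fetch_pages(start_url):
        writer.writerows(rows_from_page(page))


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(sys.argv[1])
    else:
        print("usage: scopecontent.py <api-search-url>", file=sys.stderr)
```

Fields containing tabs or line breaks are quoted by the csv module, so multi-line scope and content text still yields one logical record per description.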