Bulk data retrieval
It's possible to efficiently retrieve a large number of records, or even all records, through the scroll API.
The implementation uses the Elasticsearch scroll API under the hood, which is referred to in the documentation.
The same concepts apply, most importantly:
A scroll is a temporary view into the data, whose search context is kept open for a limited time only.
A scroll can be consumed in batches.
A scroll is a temporary resource that can be addressed by its scroll ID.
The Te Papa Collections API imposes some additional restrictions to avoid too much strain on the cluster:
A scroll expiry limit of at most 15 minutes.
A maximum page size of 10,000 records.
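The two limits above can be checked on the client before a request is sent. This is a minimal sketch, assuming that the duration is expressed in whole minutes and that both values must be at least 1; the function name and the lower bounds are illustrative and not part of the API.

```python
MAX_DURATION_MINUTES = 15  # scroll expiry limit stated above
MAX_PAGE_SIZE = 10_000     # maximum page size stated above


def check_scroll_params(duration: int, size: int) -> None:
    """Raise ValueError if the values fall outside the documented limits.

    Assumption: duration is given in minutes and both values are >= 1.
    """
    if not 1 <= duration <= MAX_DURATION_MINUTES:
        raise ValueError(
            f"duration must be between 1 and {MAX_DURATION_MINUTES} minutes"
        )
    if not 1 <= size <= MAX_PAGE_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_PAGE_SIZE} records")
```

Failing early on the client avoids a round trip for a request the server would presumably reject anyway.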
A scroll is requested through any of the _scroll APIs. For example, by using /objects/_scroll the results are pre-filtered to only contain collection objects. To retrieve all object types, use the /search/_scroll API.
A scroll is opened with a POST request to a _scroll API, for example:
This requests a scroll that is kept open for 1 minute (duration) and contains 1 result per page (size).
The result looks like an ordinary search result, with one addition: the _metadata.query.scrollId field, which contains the unique scroll ID.
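As a rough sketch of opening a scroll from Python, the helper below builds the POST request and reads the scroll ID from a parsed response. Several details are assumptions rather than confirmed behaviour: the API root URL, the x-api-key header, passing duration and size as query-string parameters, and sending an empty JSON object when no search body is given. Only the endpoint paths and the _metadata.query.scrollId field come from the description above.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Assumed API root; adjust to the actual deployment.
API_ROOT = "https://data.tepapa.govt.nz/collection"


def build_scroll_request(
    endpoint: str,
    duration: int,
    size: int,
    api_key: str,
    body: Optional[dict] = None,
) -> urllib.request.Request:
    """Build the POST request that opens a scroll.

    endpoint is a _scroll API path such as "/objects/_scroll".
    """
    # Assumption: duration and size are passed in the query string.
    query = urllib.parse.urlencode({"duration": duration, "size": size})
    url = f"{API_ROOT}{endpoint}?{query}"
    # Assumption: an empty JSON object is accepted as "no search criteria".
    data = json.dumps(body or {}).encode("utf-8")
    headers = {
        "x-api-key": api_key,  # assumed authentication header
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def scroll_id_from(result: dict) -> str:
    """Extract the scroll ID from a parsed scroll response body."""
    return result["_metadata"]["query"]["scrollId"]
```

The request can then be sent with urllib.request.urlopen, and the JSON body of the response passed to scroll_id_from.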
The next page of the scroll can then be retrieved through the GET scroll API. An API-root relative link to the next page is included in the Location header of the initial scroll response, or can be built from the scroll ID in the response body. Example:
Once the scroll is exhausted, the GET scroll API returns an HTTP 204 No Content response.
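The paging loop described above could look roughly like this in Python. The sketch assumes that the link from the initial Location header stays valid for later pages unless a response supplies a new Location, and that an x-api-key header is required; neither is confirmed here. The names iterate_scroll and http_fetch are illustrative.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional, Tuple

# (HTTP status, Location header or None, parsed JSON body or None)
Page = Tuple[int, Optional[str], Optional[dict]]


def iterate_scroll(
    first_body: dict, next_link: str, fetch: Callable[[str], Page]
) -> Iterator[dict]:
    """Yield every page of a scroll, starting with the initial response body."""
    yield first_body
    while True:
        status, location, body = fetch(next_link)
        if status == 204:  # scroll exhausted
            return
        yield body
        if location:
            # Assumption: a page may carry an updated link to the next page.
            next_link = location


def http_fetch(api_root: str, api_key: str) -> Callable[[str], Page]:
    """Return a fetch function that issues GET requests relative to api_root."""

    def fetch(link: str) -> Page:
        request = urllib.request.Request(
            api_root + link, headers={"x-api-key": api_key}  # assumed header
        )
        with urllib.request.urlopen(request) as response:
            raw = response.read()
            body = json.loads(raw) if raw else None
            return response.status, response.headers.get("Location"), body

    return fetch
```

Because the search context is only kept open for the requested duration, each page should be fetched before the scroll expires. Passing the fetch function in as a parameter keeps the loop testable without network access.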
An arbitrary search request can be used to control which records are included in a scroll result.
For example, to only retrieve objects that have been modified recently, a date range query can be added to the initial request:
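The exact query syntax is defined by the API documentation. The sketch below only shows how a "modified in the last N days" range might be computed and attached to a request body. The body layout (filters, field, range, from, to) and the default field name are placeholders, not the real schema, and the timestamp format is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # assumed timestamp format


def modified_since_range(days: int, now: Optional[datetime] = None) -> dict:
    """Return a from/to pair covering the last `days` days, in UTC."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=days)
    return {"from": start.strftime(ISO_FORMAT), "to": now.strftime(ISO_FORMAT)}


def recent_changes_body(
    days: int,
    field: str = "HYPOTHETICAL_MODIFIED_FIELD",  # placeholder field name
    now: Optional[datetime] = None,
) -> dict:
    """Build a placeholder search body restricting results to recent changes.

    The structure is hypothetical; replace it with the documented query syntax.
    """
    return {"filters": [{"field": field, "range": modified_since_range(days, now)}]}
```

The resulting dictionary would be sent as the JSON body of the initial POST request that opens the scroll.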
The relevant API documentation is here: