Search strategies

Douglas Campbell edited this page Sep 5, 2018 · 2 revisions

Basic searching

Specify your query terms in the q query string parameter. You can run a search against a single API endpoint or use the /search endpoint for a wide search.

GET https://data.tepapa.govt.nz/collection/search/?q=kiwi&size=1

{
  "results": [
    {
      "id": "1514",
      "type": "Category",
      "title": "kahu kiwi",
      "creditLine": "Matauranga Māori Thesaurus",
      "prefLabel": "kahu kiwi",
      "scopeNote": "Kiwi feather cloak",
      "pid": "tepapa:collection/category/1514",
      "iri": "http://tepapa.govt.nz/collection/category/1514",
      "href": "https://data.tepapa.govt.nz/collection/category/1514",
      "_meta": {
        "created": "2011-10-31T05:08:15.000+0000",
        "modified": "2012-05-24T04:29:49.000+0000",
        "qualityScore": 1
      },
      "_api": {
          "score": 18.588493
      }
    }
  ],
  "_metadata": {
    "resultset": {
      "count": 1999,
      "from": 0,
      "size": 1
    }
  }
}
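
As a rough illustration, the basic search above could be issued from Python with only the standard library. The helper names `build_search_url` and `run_search` are hypothetical, and any authentication the API may require is not covered here.

```python
import json
import urllib.parse
import urllib.request

# Base URL taken from the example request above.
BASE_URL = "https://data.tepapa.govt.nz/collection"


def build_search_url(query, size=None, endpoint="search"):
    """Build a GET search URL with the query terms in the q parameter."""
    params = {"q": query}
    if size is not None:
        params["size"] = size
    return f"{BASE_URL}/{endpoint}?{urllib.parse.urlencode(params)}"


def run_search(url):
    """Fetch the URL and decode the JSON response (makes a network call)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


print(build_search_url("kiwi", size=1))
```

Calling `run_search(build_search_url("kiwi", size=1))` should then return a dictionary shaped like the response shown above, with the records under `results` and the total under `_metadata.resultset.count`.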

Basic query syntax

Query type              Example
Keyword                 q=kiwi
Phrase                  q="new plymouth"
Any word                q=batch crib
All words               q=batch AND crib
Wildcards               q=aptery*
Range                   q=id:[400 TO 500]
                        q=_meta.created:>2016-03
Field search            q=title:crib
                        q=prefLabel:"new plymouth"
                        q=title:(john AND smith)
Limit by field search   q=kiwi AND collection:TaongaMāori

For more details on searching, see Collections Online's search tips.

For the full syntax available see the Elastic query string syntax guide.
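
The query types in the table can be composed programmatically. The sketch below uses small hypothetical helpers (`phrase`, `field`, `group`, `all_of`) that only concatenate strings; they do not escape special characters such as embedded quotes, so treat them as a starting point rather than a complete query builder.

```python
def phrase(text):
    """Wrap text in double quotes so it is matched as an exact phrase."""
    return f'"{text}"'


def field(name, value):
    """Restrict a term, phrase, or bracketed group to a single field."""
    return f"{name}:{value}"


def group(expression):
    """Parenthesise an expression so it is treated as one unit."""
    return f"({expression})"


def all_of(*terms):
    """Require every term by joining them with the boolean AND operator."""
    return " AND ".join(terms)


# Rebuild three of the example queries from the table above.
print(field("prefLabel", phrase("new plymouth")))
print(field("title", group(all_of("john", "smith"))))
print(all_of("kiwi", field("collection", "TaongaMāori")))
```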

Filtering

Simple filtering with AND

Simple filtering is supported within the basic search query by using the boolean AND operator to add your filter.

/object?q=kiwi AND collection:TaongaMāori
/search?q=kiwi AND (type:Place OR type:Organisation)

However, this is only possible with a few root-level fields, such as type, collection and species.
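
Filter queries like these contain spaces, colons, brackets and non-ASCII characters, so they should be percent-encoded before being sent. A minimal sketch using Python's standard library follows; the `encode_query` helper is hypothetical.

```python
import urllib.parse


def encode_query(path, query):
    """Return a relative URL with the query text percent-encoded in q."""
    encoded = urllib.parse.urlencode({"q": query})
    return f"{path}?{encoded}"


url = encode_query("/search", "kiwi AND (type:Place OR type:Organisation)")
print(url)

# Decoding the URL recovers the original query text.
decoded = urllib.parse.parse_qs(urllib.parse.urlsplit(url).query)["q"][0]
print(decoded)
```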

Rich filtering with a complex POST query

The Collections API accepts more complex query statements in the body of a POST request. See more details on making a POST search request whose body follows the ApiSearchRequest query data format.

This allows you to filter on facetable fields and nested fields, which isn't possible in simple filtering with the AND boolean operator. See below for details on facetable fields.

POST https://data.tepapa.govt.nz/collection/search
Content-Type: application/json

{
  "query" : "kiwi",
  "filters": [ 
   {
    "field": "production.facetCreatedDate.decadeOfCentury",
    "keyword": "1970s"
   }
 ]
}

Response for the filtered query:

{
  "results": [
  {
    "id": 376710,
    "type": "Object",
    "title": "LP Record \"Haka and Poi: Maori Concert Parties of Queen Victoria and St. Stephen's Schools\"",
    "production": [
      {
        "title": "Queen Victoria and St. Stephen's Schools; musician; 1974",
        "createdDate": "1974-01-01",
        "verbatimCreatedDate": "1974",
        "facetCreatedDate": {
          "century": "20th century",
          "dayOfWeek": "Tuesday",
          "decadeOfCentury": "1970s",
          "era": "Common Era (CE)",
          "monthOfYear": "January",
          "temporal": "1974-01-01",
          "verbatim": "01 Jan 1974 / 31 Dec 1974",
          "year": "1974"
        }
      }
    ]
  },
  ...
  ],
  "_metadata": { ... }
}
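
The filtered POST request above might be prepared in Python as sketched below. The `build_filtered_request` helper is hypothetical, the request is only built and not sent, and any authentication the API may require is left out.

```python
import json
import urllib.request

# Search endpoint taken from the example request above.
SEARCH_URL = "https://data.tepapa.govt.nz/collection/search"


def build_filtered_request(query, filters):
    """Build (but do not send) a POST search request with a JSON body.

    filters is a list of (field, keyword) pairs.
    """
    body = {
        "query": query,
        "filters": [{"field": name, "keyword": value} for name, value in filters],
    }
    return urllib.request.Request(
        SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_filtered_request(
    "kiwi", [("production.facetCreatedDate.decadeOfCentury", "1970s")]
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To send it, pass the request to `urllib.request.urlopen` and decode the reply with `json.load`.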

Faceting

Faceting dynamically groups the data in a field into categories (or "terms"). It summarises all the values that exist in a field to show the most common values. Often this is used as a first step, to show the possible entry points for 'drilling down' further into the query results, and then one of those values is selected to actually filter the query by.

You can see faceting in action in a Collections Online website search. The search results page automatically includes facets for the type and collection fields, showing the common values found and how many there are of each.

Type:
Object (732)
Specimen (1225)
Topic (56)
Publication (9)

Faceted query

In addition to performing basic queries, the advanced search interface allows you to perform a faceted search. The faceting implementation utilises Elasticsearch Term aggregations under the hood.

Specify the faceted fields, along with the number of results you want to receive for each facet, in the facets parameter of the search request:

POST https://data.tepapa.govt.nz/collection/search
Content-Type: application/json

{
  "query" : "James Cook",
  "facets": [ {
    "field": "production.facetCreatedDate.decadeOfCentury",
    "size": 3
  }, {
    "field": "production.spatial.title",
    "size": 3
  } ]
}

The response contains the top 3 values for each requested facet in the facets field, along with the number of matching documents for each value. NB: Facet values are returned in alphabetical order, not in order of count.

{
  "results": [ ... ],
  "facets": {
    "production.facetCreatedDate.decadeOfCentury": {
      "1940s": 11751,
      "1960s": 12736,
      "1970s": 11751
    },
    "production.spatial.title.verbatim": {
      "New Zealand": 71945,
      "North Island (New Zealand)": 5171,
      "United Kingdom": 6031
    }
  },
  "_metadata": { ... }
}
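
Because facet values come back in alphabetical order, a client that wants the most common values first has to sort them itself. A minimal sketch, using the decade facet from the response above as sample data (the `facets_by_count` helper is hypothetical):

```python
def facets_by_count(facet_values):
    """Return (value, count) pairs ordered from most to least common.

    Python's sort is stable, so values with equal counts keep the order
    in which the API returned them.
    """
    return sorted(facet_values.items(), key=lambda item: item[1], reverse=True)


# Sample data copied from the example response above.
facets = {
    "production.facetCreatedDate.decadeOfCentury": {
        "1940s": 11751,
        "1960s": 12736,
        "1970s": 11751,
    }
}

decades = facets["production.facetCreatedDate.decadeOfCentury"]
for value, count in facets_by_count(decades):
    print(value, count)
```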

Notes on faceted queries

Only some fields are facetable. Faceting only makes sense on fields that have a small finite number of distinct values. If you request a facet on an unfacetable field, for example a long text field, an error is returned instead:

{
  "status": 422,
  "developerMessage": "An exception occurred: Field at 'productionUsedTechnique.scopeNote' is not facetable - type is not facetable: text",
  [..]
}

The name of the facet returned may not exactly match the name you requested.
If a requested field is not facetable then the API may select a suitable sub-field to return. In the example above, the faceting request for production.spatial.title returned a facet for production.spatial.title.verbatim due to the internal field mappings that are used.
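
Both notes can be handled defensively in client code. The sketch below assumes that a substituted facet name is always the requested name followed by a dot and a sub-field (as in the `.verbatim` example), and that error bodies carry the `status` and `developerMessage` keys shown above. The helper names are hypothetical.

```python
def find_facet(facets, requested_field):
    """Return (name, values) for the facet answering a requested field.

    Accepts an exact name match or a sub-field of the requested field,
    e.g. "production.spatial.title.verbatim" for "production.spatial.title".
    Returns None when no matching facet is present.
    """
    for name, values in facets.items():
        if name == requested_field or name.startswith(requested_field + "."):
            return name, values
    return None


def facet_error(response_body):
    """Return the developerMessage of a 422 error body, or None otherwise."""
    if response_body.get("status") == 422:
        return response_body.get("developerMessage")
    return None


# Sample data copied from the example response above.
facets = {
    "production.spatial.title.verbatim": {
        "New Zealand": 71945,
        "North Island (New Zealand)": 5171,
        "United Kingdom": 6031,
    }
}
print(find_facet(facets, "production.spatial.title"))
```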

Filtering by facet

To retrieve the records that match your chosen facet value, run a new POST query containing a filter.

See Filtering above for an example.

Searching by date

Dates can be tricky for us to record in our catalogue - the dates may be unknown, partially known, fuzzy (e.g. only contain a textual description), or contain mistakes recorded by our curators over 100 years ago.

We have tried to standardise dates as much as possible in the API, using a range of date fields:

Date type      Description                        Example field                             Example values
Verbatim date  Original text as recorded          verbatimBirthDate                         11 June 1865
Encoded date   Converted to ISO8601 (YYYY-MM-DD)  birthDate                                 1865-06-11
Faceted date   Each part separated out            facetBirthDate                            year:1865 monthOfYear:June decadeOfCentury:1860s
Nested date    Reduced details (no faceted)       production.contributor.verbatimBirthDate
                                                  production.contributor.birthDate

Verbatim and encoded date fields

Verbatim dates are the most accurate. They are human-readable and suitable for display:

  • 14 Aug 1940
  • March 2004
  • circa 2011
  • c 1940
  • active 1920
  • 1870-1872
  • circa 132-137 million years ago

Their equivalent encoded dates follow the ISO 8601 date standard, though they may only be precise to a month or a year. Note that these are auto-generated, so they are less accurate than the equivalent verbatim date.

  • 1940-08-14
  • 2004-03
  • 2011
  • 1940
  • 1920
  • 1870
  • 132
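
Since encoded dates vary in precision, client code may want to check how much of a date is present before using it. The sketch below classifies a value by its shape. The patterns are an assumption based only on the examples listed above, and the function name is hypothetical. Anything that does not look like a four-digit year, such as the `132` derived from "circa 132-137 million years ago", is reported as unknown rather than guessed at.

```python
import re

# Shapes of encoded dates seen in the examples above (an assumption, not a
# complete description of what the API can return).
_PATTERNS = [
    ("day", re.compile(r"\d{4}-\d{2}-\d{2}")),
    ("month", re.compile(r"\d{4}-\d{2}")),
    ("year", re.compile(r"\d{4}")),
]


def encoded_date_precision(value):
    """Return "day", "month", "year" or "unknown" for an encoded date string."""
    for precision, pattern in _PATTERNS:
        if pattern.fullmatch(value):
            return precision
    return "unknown"


for example in ["1940-08-14", "2004-03", "2011", "132"]:
    print(example, encoded_date_precision(example))
```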

In version 1, only a few root-level dates are searchable in basic queries:

  • verbatimBirthDate
  • birthDate
  • verbatimDeathDate
  • deathDate
  • publicationDate

GET https://data.tepapa.govt.nz/collection/search?q=verbatimBirthDate:((Aug OR August) AND 1940)
GET https://data.tepapa.govt.nz/collection/search?q=birthDate:"1940-08"

{
  "results": [
    {
      "id": 2533,
      "type": "Person",
      "title": "Dr Alan Baker",
      "birthPlace": "Inglewood",
      "verbatimBirthDate": "14 Aug 1940",
      "birthDate": "1940-08-14",
      "facetBirthDate": {
        "century": "20th century",
        "dayOfWeek": "Wednesday",
        "decadeOfCentury": "1940s",
        "era": "Common Era (CE)",
        "monthOfYear": "August",
        "temporal": "1940-08-14",
        "verbatim": "14 Aug 1940",
        "year": "1940"
      }
      ...
    }
  ]
}

To search other date fields, you will need to search the faceted date fields in an advanced query (see the next section).

Faceted date fields

A faceted date field contains sub-fields describing aspects of the date, such as century, dayOfWeek, etc. Sometimes these values contain Unknown or are just approximations. Usually there are also verbatim and encoded date fields.

Faceted date fields are still experimental:

We have included the faceted date fields in the hope of assisting you in data analysis or approximate categorisation.

However, you should not rely on the faceted date fields to represent the "truth" about a date; display the equivalent verbatim date field instead. Note that the verbatim sub-field inside a faceted date is an auto-generated value, so use the named field (e.g. verbatimCreatedDate) instead.

An example of a production.facetCreatedDate field:

 "createdDate": "1906-01-01",
 "facetCreatedDate": {
    "century": "20th century",
    "dayOfWeek": "Monday",
    "decadeOfCentury": "1900s",
    "era": "Common Era (CE)",
    "monthOfYear": "January",
    "temporal": "1906-01-01",
    "verbatim": "01 Jan 1906 / 31 Dec 1906",
    "year": "1906"
  },
  "verbatimCreatedDate": "1906"

Here the original value from our catalogue is verbatimCreatedDate: 1906. The createdDate is an ISO 8601 date approximation of that value (which may be a poor approximation if there is not enough precision in the original data). The facetCreatedDate contains faceted values for that date, and date range approximations:

  • century - The century that the original date falls into, here 20th century
  • dayOfWeek - The day of the week. Use with caution as we currently base this on our temporal field, which may not have the correct precision level. In this case a more accurate label for day of week would have been Unknown, as the original verbatim date was just a year, not a particular day
  • decadeOfCentury - The decade of the century, e.g. 1900s
  • era - The era, either Common Era (CE) (a.k.a. 'AD') or Before Common Era (BCE) (a.k.a. 'BC')
  • monthOfYear - The month of the year. Use with caution as we currently base this on our temporal field (see more details in dayOfWeek above)
  • temporal - Usually equal to the "parsed" version of a date, e.g. birthDate. Use with caution
  • verbatim - This is different to the main verbatim date as it is auto-generated by combining two date fields that represent a date range: the "earliest" and "latest" date. In our example, the production date was determined to be between 01 Jan 1906 and 31 Dec 1906
  • year - Approximation of the year.
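
One way to apply the cautions above is to compare the two ends of the auto-generated verbatim range before trusting dayOfWeek or monthOfYear. The sketch below is a heuristic, not documented API behaviour. It assumes the facet's verbatim value is either a single date or an "earliest / latest" pair separated by a slash, as in the examples on this page. The helper names are hypothetical.

```python
def facet_date_range(facet):
    """Split the facet's auto-generated verbatim value into (earliest, latest).

    A value without a slash is treated as a single date, so both ends match.
    """
    parts = [part.strip() for part in facet["verbatim"].split("/")]
    return parts[0], parts[-1]


def is_single_date(facet):
    """True when earliest and latest are identical, i.e. not a date range."""
    earliest, latest = facet_date_range(facet)
    return earliest == latest


# Values copied from the examples on this page.
created_1906 = {"verbatim": "01 Jan 1906 / 31 Dec 1906", "dayOfWeek": "Monday"}
born_1940 = {"verbatim": "14 Aug 1940", "dayOfWeek": "Wednesday"}

for facet in (created_1906, born_1940):
    trusted = facet["dayOfWeek"] if is_single_date(facet) else "Unknown"
    print(facet_date_range(facet), trusted)
```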

Faceted date query

Through the advanced search interface you can request the facets available on a date field. The final size: 0 in the request below means: don't show results, just the facets.

POST https://data.tepapa.govt.nz/collection/search
Content-Type: application/json

{
  "query" : "poster",
  "facets": [ {
    "field": "production.facetCreatedDate.decadeOfCentury",
    "size": 5
  } ],
  "size": 0
}

This returns the values of the requested facet that occur in the result set, with the number of matching records for each:

  "facets": {
    "production.facetCreatedDate.decadeOfCentury": {
      "2010s": 270,
      "1910s": 295,
      "1940s": 693,
      "1950s": 156,
      "1980s": 368
    }
  },

Filtering by date facet

To search by a particular date facet value, add it as a filter to your query in the advanced search interface.

POST https://data.tepapa.govt.nz/collection/search
Content-Type: application/json

{
  "query" : "poster",
  "filters": [ 
   {
    "field": "production.facetCreatedDate.decadeOfCentury",
    "keyword": "1940s"
   }
 ]
}

See above for more details on Faceting.
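
The two steps, requesting a date facet and then filtering by one of its values, can be chained. The sketch below only builds the two request bodies as Python dictionaries. The helper names are hypothetical, and choosing the most common value is just one possible drill-down strategy.

```python
def facet_only_body(query, field, size):
    """Request body asking for one facet and no result records (size 0)."""
    return {"query": query, "facets": [{"field": field, "size": size}], "size": 0}


def drill_down_body(query, field, facet_values):
    """Request body filtering the query by the most common facet value."""
    top_value = max(facet_values, key=facet_values.get)
    return {"query": query, "filters": [{"field": field, "keyword": top_value}]}


FIELD = "production.facetCreatedDate.decadeOfCentury"

# Facet counts copied from the example response above.
decades = {"2010s": 270, "1910s": 295, "1940s": 693, "1950s": 156, "1980s": 368}

print(facet_only_body("poster", FIELD, 5))
print(drill_down_body("poster", FIELD, decades))
```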

Nested date fields

When we include an entity nested inside another entity, we only include a few fields as a summary, so we have excluded faceted date fields. To see the full record, retrieve the entity directly using the href field.

GET https://data.tepapa.govt.nz/collection/search/?q=lomu

  "refersTo": [
    {
      "id": 44748,
      "type": "Person",
      "title": "Jonah Lomu",
      "verbatimBirthDate": "12 May 1975",
      "birthDate": "1975-05-12",
      "verbatimDeathDate": "18 Nov 2015",
      "pid": "tepapa:collection/agent/44748",
      "iri": "http://tepapa.govt.nz/collection/agent/44748",
      "href": "https://data.tepapa.govt.nz/collection/agent/44748"
    }
   ]

GET https://data.tepapa.govt.nz/collection/agent/44748

{
  "id": 44748,
  "type": "Person",
  "title": "Jonah Lomu",
  "birthPlace": "Auckland",
  "verbatimBirthDate": "12 May 1975",
  "birthDate": "1975-05-12",
  "facetBirthDate": {
    "century": "20th century",
    "dayOfWeek": "Monday",
    "decadeOfCentury": "1970s",
    "era": "Common Era (CE)",
    "monthOfYear": "May",
    "temporal": "1975-05-12",
    "verbatim": "12 May 1975",
    "year": "1975"
    },
  "deathPlace": "Auckland",
  "verbatimDeathDate": "18 Nov 2015",
  "deathDate": "2015-11-18",
  "facetDeathDate": {
    "century": "21st century",
    "dayOfWeek": "Wednesday",
    "decadeOfCentury": "2010s",
    "era": "Common Era (CE)",
    "monthOfYear": "November",
    "temporal": "2015-11-18",
    "verbatim": "18 Nov 2015",
    "year": "2015"
    },
  "ethnicity": [
    "Tongan"
    ],
  "nationality": [
    "New Zealander"
    ],
  "familyName": "Lomu",
  "givenName": "Jonah",
  "gender": "Male",
  "pid": "tepapa:collection/agent/44748",
  "iri": "http://tepapa.govt.nz/collection/agent/44748",
  "href": "https://data.tepapa.govt.nz/collection/agent/44748",
  "_meta": {
    "created": "2011-08-31T06:00:49Z",
    "modified": "2018-02-27T20:54:56Z",
    "qualityScore": 1.9
    }
}
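
To move from nested summaries to full records in code, collect the href of each nested entity and fetch it separately. A minimal sketch follows, using a trimmed copy of the refersTo example above as sample data. The helper names are hypothetical, and the check for faceted dates simply looks for keys named like facetBirthDate.

```python
def nested_hrefs(record, key="refersTo"):
    """Collect the href of each entity nested under the given key."""
    return [entity["href"] for entity in record.get(key, []) if "href" in entity]


def has_faceted_dates(entity):
    """True if the entity has any field named like facetBirthDate."""
    return any(name.startswith("facet") and name.endswith("Date") for name in entity)


# Trimmed copy of the nested entity from the search result above.
record = {
    "refersTo": [
        {
            "id": 44748,
            "type": "Person",
            "title": "Jonah Lomu",
            "verbatimBirthDate": "12 May 1975",
            "birthDate": "1975-05-12",
            "href": "https://data.tepapa.govt.nz/collection/agent/44748",
        }
    ]
}

for href in nested_hrefs(record):
    print(href)  # fetch this URL to get the full record with faceted dates
print(has_faceted_dates(record["refersTo"][0]))
```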