support keyword sets in record model #70

Closed
tomkralidis opened this issue Jan 21, 2021 · 18 comments

@tomkralidis
Contributor

Currently, the record model results in the following for keywords:

"properties": {
    ...
    "keywords": ["foo", "bar", "baz"],
    "keywords-codespace": "https://codes.example.org/vocab1"
    ...
}

It is common for metadata providers to provide "sets" of keywords. For example, in MSC metadata we provide the following sets of keywords:

  • open data freetext type keywords (Canadian core subject thesaurus)
  • WMO vocabulary keywords
  • etc.

ISO 19115 allows for 1..n sets of keywords. The value proposition here is that metadata providers can manage their metadata once and provide keyword sets specific to a given community, thus satisfying the keyword requirements of multiple catalogues.

In our MetOcean Best Practices for OGC API - Records work, we have also identified this as key for a given metadata record. The proposed model would look like:

  "properties": {
    ...
    "keywords": [
      {
        "keywords": [ "foo1", "bar1", "baz1"],
        "codespace": "https://codes.example.org/vocab1"
      },
      {
        "keywords": [ "foo2", "bar2", "baz2"],
        "codespace": "https://codes.example.org/vocab2"
      }
    ]
    ...
  }

For an items query, a client querying keywords would be querying across all the given sets.
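A minimal sketch of that cross-set matching, assuming records shaped like the proposal above. The function name `record_matches_keyword` is hypothetical, and this is client-side filtering for illustration only, not how a server would have to implement the items query:

```python
def record_matches_keyword(record, term):
    """Return True if `term` appears in any keyword set of the record.

    Assumes the proposed model: properties.keywords is a list of objects,
    each with a "keywords" list and an optional "codespace".
    """
    keyword_sets = record.get("properties", {}).get("keywords", [])
    return any(term in kw_set.get("keywords", []) for kw_set in keyword_sets)


# Example record using the two keyword sets from the proposal.
record = {
    "properties": {
        "keywords": [
            {
                "keywords": ["foo1", "bar1", "baz1"],
                "codespace": "https://codes.example.org/vocab1",
            },
            {
                "keywords": ["foo2", "bar2", "baz2"],
                "codespace": "https://codes.example.org/vocab2",
            },
        ]
    }
}

found = record_matches_keyword(record, "bar2")      # matched in the second set
missing = record_matches_keyword(record, "other")   # not in any set
```

The match ignores the codespace entirely, which is the "querying across all the given sets" behaviour described above.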

Is this something we should consider in core or is it asking too much?

We could certainly have this as an extension in our best practice (i.e. metoc:keywords), but it may provide value to other information communities, hence this proposal. For simple requirements, the above would simply be a single set/object.
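To illustrate the single set case, here is a rough sketch of how the current flat model could map onto the proposed one. The helper name `to_keyword_sets` is hypothetical and the mapping is only an assumption about how a migration might look:

```python
def to_keyword_sets(properties):
    """Map the current flat model onto the proposed keyword-set model.

    A flat list of strings plus an optional "keywords-codespace" becomes
    a single keyword-set object. Properties already in the proposed form
    are returned unchanged (as a shallow copy).
    """
    props = dict(properties)
    keywords = props.get("keywords", [])
    if keywords and all(isinstance(k, str) for k in keywords):
        kw_set = {"keywords": list(keywords)}
        codespace = props.pop("keywords-codespace", None)
        if codespace is not None:
            kw_set["codespace"] = codespace
        props["keywords"] = [kw_set]
    return props


flat = {
    "keywords": ["foo", "bar", "baz"],
    "keywords-codespace": "https://codes.example.org/vocab1",
}
converted = to_keyword_sets(flat)
```

Here `converted["keywords"]` holds exactly one object carrying the three keywords and the codespace.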

Thoughts? I'm happy to update the specification/schemas/etc. here if there is consensus that this would be valuable in core.

@m-mohr
Contributor

m-mohr commented Jan 21, 2021

In STAC we just have a field with keywords (in Collections), which I think is enough for the core. Codespaces/vocabulary sounds like an extension to me.

@chris-little

As there are a number of overlapping authoritative vocabularies used in real-world safety critical regimes, I suggest it is in core, but as a separate conformance class.
Example 1: meteorology&aviation&volcanoes.
Example 2: meteorology&nucleotides&chemicals

@pvgenuchten
Contributor

In DCAT, a reference to a concept from a thesaurus is called a theme, and a keyword is just a string. Maybe that is something to adopt: use keyword as a string and also allow a property theme, which is a concept from a thesaurus. See also https://w3c.github.io/dxwg/dcat/#Property:resource_theme

@mhogeweg
Contributor

Separating keywords in metadata by some sort of grouping has been around forever, and I would support continuing that in the new API. INSPIRE requires specific keywords from a specific vocabulary (GEMET), and ISO has had its topic category codes. Those are typically complemented with organization-specific terms that may come from one or more different vocabularies. Being able to distinguish between those groups of keywords in a search will be useful.

@ByronCinNZ
Contributor

ByronCinNZ commented Jan 22, 2021 via email

@tomkralidis
Contributor Author

Good question @ByronCinNZ. In XML metadata, I've typically seen nested keywords with paths as part of a keyword list, like:

<keyword>type/subtype/foo</keyword>

<keyword>type.subtype.foo</keyword>

In this proposal, this could yield, say:

  "properties": {
    ...
    "keywords": [
      {
        "keywords": [
            "type/subtype/foo1",
            "type/subtype/bar1",
            "type/subtype/baz1"
        ],
        "codespace": "https://codes.example.org/vocab1"
      }
    ]
    ...
  }

@pvretano
Contributor

25-JAN-2021: The keywords key should remain as is, but text should be added to clarify that it is a freeform list of tags. Actions: (1) Remove the keywords-codespace key. (2) Add informative text that keywords is just a freeform list of strings. (3) Rework the theme key to match the requirements that @tomkralidis has expressed here.

@pvretano pvretano self-assigned this Jan 25, 2021
@tomkralidis
Contributor Author

From discussions in our MetOcean best practice work, this is acceptable. I'll update the spec accordingly and send a PR.

@mhogeweg
Contributor

If we remove the codespace key, how do we distinguish keywords from the marine community from those from the EO community or from any other community? ISO and FGDC metadata have had the keyword thesaurus element for years. Does that not get lost without a codespace key in the record information model?

@pvretano
Contributor

@mhogeweg the discussion at the last SWG meeting was that the keywords key would be a free-form, informal list of keywords associated with the record. Keywords/tags/identifiers from formal vocabularies would be specified using the "theme" key, which @tomkralidis is updating. The structure of the updated "theme" key would include a reference to some formal vocabulary/taxonomy/classification scheme/etc. and a list of tokens/tags/identifiers taken from the referenced vocabulary. The SWG is trying to balance a simple core with extensibility for the more sophisticated user. What do you think about the approach?
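As a rough illustration of that split, here is a sketch of a record with free-form keywords alongside a "theme" key. The field names `scheme` and `concepts` and the helper `concepts_for_scheme` are assumptions made for this example only; the actual structure of "theme" is whatever the updated specification defines:

```python
# Hypothetical record: "scheme" and "concepts" are assumed field names,
# not taken from the specification.
record = {
    "properties": {
        # Free-form, informal tags with no vocabulary attached.
        "keywords": ["foo", "bar", "baz"],
        # Terms drawn from formal vocabularies, grouped by vocabulary.
        "theme": [
            {
                "scheme": "https://codes.example.org/vocab1",
                "concepts": ["foo1", "bar1"],
            },
            {
                "scheme": "https://codes.example.org/vocab2",
                "concepts": ["foo2"],
            },
        ],
    }
}


def concepts_for_scheme(record, scheme):
    """Return all concepts in the record that come from the given scheme."""
    themes = record.get("properties", {}).get("theme", [])
    return [
        concept
        for theme in themes
        if theme.get("scheme") == scheme
        for concept in theme.get("concepts", [])
    ]


vocab2_terms = concepts_for_scheme(record, "https://codes.example.org/vocab2")
```

With a structure along these lines, a client could still tell one community's terms from another's by filtering on the vocabulary reference, while keywords stays a simple list of strings.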

@mhogeweg
Contributor

It seems a bit odd that a concept so core to metadata for decades (keyword codespace) is left out of the core of a catalog spec, especially considering the uptake of semantic ideas across the industry. And while the codespace key is dropped, some inferred hierarchy logic is inserted into the values of the keywords themselves.

@pvretano
Contributor

@mhogeweg I'll raise the issue one more time at the next SWG meeting ...

@m-mohr
Contributor

m-mohr commented Jan 27, 2021

Maybe the naming is misleading? I understand keywords, as described by Peter above, as an informal list of free-form text, like the tags assigned to a WordPress article. Also, in my domain keyword codespaces are not a thing and thus not core to metadata. So that seems to be domain-specific and thus sounds like an extension to me.

@mhogeweg
Contributor

Perhaps the flat list of keywords should be called tags, and then the vocabulary-enabled keywords would be keywords. I would not consider INSPIRE a domain fitting in an extension. All of Europe needs to provide metadata with a specific thesaurus for at least INSPIRE theme keywords (https://inspire.ec.europa.eu/glossary/MetadataElement-Keyword:1). It is also part of the North American Profile of ISO metadata (https://www.fgdc.gov/nap/metadata/register/registerItems.html#RI_65), etc.

@m-mohr
Contributor

m-mohr commented Jan 27, 2021

While I think a rename could make sense, it's likely that we can't rename it from "keywords" to "tags" any longer in STAC. That is something to consider if there's a desire to align Records and STAC.

@bradh
Contributor

bradh commented Jan 30, 2021

While it would be nice if Records and STAC were identical, trivial relabelling doesn't seem like a big deal. It'd be a concern if there were non-trivial semantic differences.

@pvretano
Contributor

pvretano commented Feb 8, 2021

SWG MEETING 08-FEB-2021: The SWG feels that we should proceed with our original proposal. That is, remove the keywords-codespace key and leave keywords as a flat list of significant words. If there is a need to tag the record with keywords from a controlled vocabulary, then use the theme key. See the following references:
  • https://www.w3.org/TR/vocab-dcat/#Property:resource_keyword
  • https://www.w3.org/TR/vocab-dcat/#Property:resource_theme
  • https://www.merriam-webster.com/dictionary/key%20word

@pvretano
Contributor

@tomkralidis can this one be closed?
