views: FAIR signposting level 1 support (HTTP Link headers) #2938
def _get_signposting_authors(record):
    authors = []
    # Limit authors to the first 10.
    for creator in islice(record["metadata"]["creators"], 0, 10):
        for identifier in creator["person_or_org"].get("identifiers", []):
            if identifier["scheme"] == "orcid":
                authors.append(
                    _get_header(
                        "author", "https://orcid.org/" + identifier["identifier"]
                    )
                )
    return authors
Lars suggested that we might choose to not include authors at all since the list might be long and the full list can be found in the linkset.
Maybe we could apply some sensible limit? E.g. if less than 50 authors, include, otherwise don't include at all and basically have people rely on the explicit authors linkset?
[ ] Include authors if up to 50, otherwise do not include.
I'm now relying on invenio_rdm_records/resources/serializers/signposting/schema.py's serialize_author which serializes all the authors.
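For reference, the "include up to 50, otherwise none" rule from this thread could be sketched roughly as below. This is only an illustration: `get_header` and `get_signposting_authors` are hypothetical stand-ins (not the PR's `_get_header` or the `serialize_author` in invenio-rdm-records), and the threshold of 50 is simply the number suggested above.

```python
MAX_AUTHORS = 50  # assumed threshold, taken from the suggestion in this thread


def get_header(rel, url, link_type=None):
    """Format one Link header entry (hypothetical stand-in for _get_header)."""
    header = f'<{url}> ; rel="{rel}"'
    if link_type:
        header += f' ; type="{link_type}"'
    return header


def get_signposting_authors(record, max_authors=MAX_AUTHORS):
    """Return author links, or no links at all when there are too many creators."""
    creators = record["metadata"].get("creators", [])
    if len(creators) > max_authors:
        # Too many authors: leave them out and let clients use the linkset.
        return []
    authors = []
    for creator in creators:
        for identifier in creator["person_or_org"].get("identifiers", []):
            if identifier["scheme"] == "orcid":
                authors.append(
                    get_header(
                        "author", "https://orcid.org/" + identifier["identifier"]
                    )
                )
    return authors
```

Unlike the `islice(..., 0, 10)` version, this never emits a truncated author list: a client either sees every ORCID-identified author or is left to fetch the linkset.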
                    _get_header(
                        "author", "https://orcid.org/" + identifier["identifier"]
                    )
                )
Should we support other schemes like ROR, etc?
It might be that the safer option would be to use something like idutils.to_url(identifier, scheme), which will consistently produce a link.
[ ] Use idutils.to_url for authors.
I'm now relying on invenio_rdm_records/resources/serializers/signposting/schema.py's serialize_author which picks the first linkable ID.
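The "first linkable ID" idea can be illustrated with a small sketch. To stay self-contained it uses a hand-written prefix table instead of idutils; `URL_PREFIXES` and `first_linkable_url` are made-up names, and the real `serialize_author` may well behave differently.

```python
# Hypothetical URL prefixes; in the real code idutils.to_url would play this role.
URL_PREFIXES = {
    "orcid": "https://orcid.org/",
    "ror": "https://ror.org/",
}


def first_linkable_url(identifiers):
    """Return a URL for the first identifier whose scheme is known, else None."""
    for identifier in identifiers:
        prefix = URL_PREFIXES.get(identifier.get("scheme"))
        if prefix:
            return prefix + identifier["identifier"]
    return None
```

An identifier with an unknown scheme is skipped rather than turned into a broken link, which is the main advantage over concatenating "https://orcid.org/" unconditionally.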
        # then try to get the optional `link` from the custom license.
        url = right.get("props", {}).get("url") or right.get("link")
        if url:
            licenses.append(_get_header("license", url))
The FAIR Signposting docs recommend using an SPDX license identifier (e.g. https://spdx.org/licenses/CC0-1.0).
However, in Zenodo we store URLs like https://creativecommons.org/publicdomain/zero/1.0/legalcode and not spdx.org URLs.
If props["scheme"] == "spdx", I think we can safely generate the URL like https://spdx.org/licenses/{right["id"]}. We might have licenses (or even non-SPDX licenses), in which case just using url like here would be ok.
Unfortunately our IDs are lower-cased (e.g. antlr-pd-fallback) while the SPDX URLs are mixed-case and case-sensitive (e.g. https://spdx.org/licenses/ANTLR-PD-fallback.html).
Ouch, I tried in the browser and copy-pasting URLs for some reason kept the original case... OK, this is a bummer; I think we'll have to add the original spdx ID with the exact case as a props.spdx_id field or similar...
I think it would be fine to shelve this and just use the url; it depends on whether we want to spend more time re-importing SPDX and updating the existing license vocabulary (funnily, the dump we have is from more than a year ago).
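If the vocabulary were ever re-imported with the case-preserved ID, the lookup could look roughly like this. It is a sketch only: `props.spdx_id` does not exist today (it is the field floated above), and `get_license_url` is a made-up helper name.

```python
def get_license_url(right):
    """Pick a license URL, preferring a case-exact SPDX URL when available."""
    props = right.get("props", {})
    # Hypothetical field: the SPDX ID with its original casing.
    spdx_id = props.get("spdx_id")
    if props.get("scheme") == "spdx" and spdx_id:
        return f"https://spdx.org/licenses/{spdx_id}"
    # Fall back to the stored URL, then to the custom license's optional link.
    return props.get("url") or right.get("link")
```

The lower-cased `right["id"]` is deliberately never used to build the URL, since SPDX URLs are case-sensitive.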
def _get_signposting_linkset(pid_value):
    api_url = record_url_for(_app="api", pid_value=pid_value)
    return _get_header("linkset", api_url, "application/linkset+json")
Note: this is required for level 2 support and was already added in a previous pull request.
Here we only include a link of type "application/linkset+json", but the docs require us to also include a link of type "application/linkset".
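Emitting both linkset flavours could be as small as the sketch below. It assumes that the same API URL would serve both media types through content negotiation, which this PR does not establish; `get_signposting_linksets` is a hypothetical name.

```python
LINKSET_MEDIA_TYPES = ("application/linkset+json", "application/linkset")


def get_signposting_linksets(api_url):
    """Return one rel="linkset" entry per media type for the given URL."""
    return [
        f'<{api_url}> ; rel="linkset" ; type="{media_type}"'
        for media_type in LINKSET_MEDIA_TYPES
    ]
```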
        ],
        resource_type["id"],
    )
    url_schema_org = props.get("schema.org")
Not sure if there's a better way to do this lookup.
I followed what's done in invenio_rdm_records/resources/serializers/signposting/schema.py.
Perhaps just check that it is cached so we don't query the DB on every landing page request.
From what I see these are indeed cached here, which is also mentioned in get_vocabulary_props.
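As a generic illustration of the concern (not of how get_vocabulary_props actually caches), a memoized lookup only hits the backing store once per key. Everything in this sketch is hypothetical: the fake vocabulary, the counter, and the function name.

```python
from functools import lru_cache

# Hypothetical stand-in for the vocabulary table; real data lives in the database.
_FAKE_VOCABULARY = {
    "image-photo": {"schema.org": "https://schema.org/Photograph"},
}
stats = {"db_hits": 0}  # counts simulated database reads


@lru_cache(maxsize=256)
def cached_resource_type_props(resource_type_id):
    """Look up props once per ID; repeat calls are served from the cache."""
    stats["db_hits"] += 1
    return _FAKE_VOCABULARY.get(resource_type_id, {})
```

Two caveats apply to this pattern: the cached dict is shared between callers, so it should be treated as read-only, and `lru_cache` never expires entries, so vocabulary updates would need an explicit `cache_clear()`.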
This is quite a lot of methods added to decorators.py; should it be moved to a signposting-specific file?
👍 Agree, I thought there was already some signposting-related directory.
[ ] Move the code to a signposting-related file or directory.
There is much less code in decorators.py now that I rely on invenio_rdm_records/resources/serializers/signposting/schema.py.
0929cec to 15672de
LGTM, some minor comments only
tests/ui/test_signposting_ui.py
Outdated
    api_url = f"https://127.0.0.1:5000/api/records/{record_with_file.id}"
    filename = "article.txt"

    res = client.head(f"/records/{record_with_file.id}")
question/comment: I think Flask/Invenio implements HEAD by treating it as a GET request and skipping the body of the response. In that case, we're not saving anything in terms of computation/performance (if that was the original goal of testing only the HEAD response).
IMHO, it's OK to keep as is, since none of the logic for generating the header links is much more complex or adds much overhead compared to the rest of the GET response.
The Link header should be included in both GET and HEAD responses, as the FAIR Signposting docs state:
In addition to being available via HTTP GET requests, the HTTP header that contains Link is accessible via the HTTP HEAD request, which only returns transaction metadata not a resource representation. As such machine agents can obtain a map for their journey by issuing a HTTP HEAD even against resources that have access restrictions. All the while saving bandwidth and hence energy.
- Modify the tests to assert not only HEAD, but also GET.
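The check being asked for is simply "same Link header for both methods, no body for HEAD". A framework-free sketch of that check, using a toy WSGI app rather than the Invenio views (`app`, `fetch`, `LINK` and the example.org URL are all invented for illustration):

```python
from wsgiref.util import setup_testing_defaults

LINK = '<https://example.org/api/records/1> ; rel="linkset" ; type="application/linkset+json"'


def app(environ, start_response):
    """Tiny WSGI app that sends the same headers for GET and HEAD."""
    start_response("200 OK", [("Content-Type", "text/html"), ("Link", LINK)])
    # HEAD gets the headers but no body.
    if environ["REQUEST_METHOD"] == "HEAD":
        return []
    return [b"<html>landing page</html>"]


def fetch(method):
    """Call the app directly and return (headers, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["REQUEST_METHOD"] = method
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["headers"], body


for method in ("GET", "HEAD"):
    headers, _body = fetch(method)
    assert headers["Link"] == LINK
```

In the actual test module the same loop would presumably go through the Flask test client (`client.get` and `client.head`) instead of calling a WSGI callable by hand.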
…ofileLvl1Serializer)
def add_signposting_content_resources(f):
    """Add signposting links to the content resources view's response headers."""

    @wraps(f)
    def view(*args, **kwargs):
        response = make_response(f(*args, **kwargs))

        # Relies on other decorators having operated before it
        pid_value = kwargs["pid_value"]
        signposting_link = record_url_for(_app="api", pid_value=pid_value)

        response.headers["Link"] = (
            f'<{signposting_link}> ; rel="linkset" ; type="application/linkset+json"'  # fmt: skip
        )
        signposting_headers = [
            _get_signposting_collection(pid_value),
            _get_signposting_linkset(pid_value),
        ]

        response.headers["Link"] = " , ".join(signposting_headers)

        return response

    return view


def add_signposting_metadata_resources(f):
    """Add signposting links to the metadata resources view's response headers."""

    @wraps(f)
    def view(*args, **kwargs):
        response = make_response(f(*args, **kwargs))

        # Relies on other decorators having operated before it
        pid_value = kwargs["pid_value"]

        signposting_headers = [
            _get_signposting_describes(pid_value),
            _get_signposting_linkset(pid_value),
        ]

        response.headers["Link"] = " , ".join(signposting_headers)
Note that, unlike the Landing Page, which relies on invenio_rdm_records.resources.serializers.signposting, the Content Resources and Metadata Resources do not rely on invenio_rdm_records.resources.serializers.signposting because:
- ContentResourceSchema and MetadataResourceSchema expect the record to be passed via context={"record_dict"}, which makes it more difficult to reuse here.
- The logic is pretty simple to add only the collection, describes and linkset headers, so re-implementing it here is not that bad.
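To show how little logic is involved, here is a rough standalone sketch of the two header values. The helper names are invented, and pointing `collection` and `describes` at the landing page with type "text/html" is an assumption based on the FAIR Signposting profile rather than something taken from this diff.

```python
def link(rel, url, link_type=None):
    """Format one Link header entry (hypothetical helper)."""
    entry = f'<{url}> ; rel="{rel}"'
    if link_type:
        entry += f' ; type="{link_type}"'
    return entry


def content_resource_links(landing_page_url, api_url):
    """Link header value for a file: points back to its landing page."""
    return " , ".join(
        [
            link("collection", landing_page_url, "text/html"),
            link("linkset", api_url, "application/linkset+json"),
        ]
    )


def metadata_resource_links(landing_page_url, api_url):
    """Link header value for a metadata export: says what it describes."""
    return " , ".join(
        [
            link("describes", landing_page_url, "text/html"),
            link("linkset", api_url, "application/linkset+json"),
        ]
    )
```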
        # The test record does not have a license.
        '<https://schema.org/Photograph> ; rel="type"',
        '<https://schema.org/AboutPage> ; rel="type"',
        f'<{api_url}> ; rel="linkset" ; type="application/linkset+json"',
The logic for the landing page is implemented in FAIRSignpostingProfileLvl1Serializer in invenio-rdm-records and is already tested there (see inveniosoftware/invenio-rdm-records#1908).
It still makes sense to at least issue the HTTP call to the endpoint here, to make sure that the decorator is working properly, but maybe the assertion should be less detailed to avoid having to adapt this test every time we modify the other module?
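One way to loosen the assertion would be to compare only the set of relation types present in the header. A minimal sketch, with `link_rels` as a hypothetical test helper:

```python
import re


def link_rels(link_header):
    """Return the set of rel values found in a Link header string."""
    return set(re.findall(r'rel="([^"]+)"', link_header))


header = (
    '<https://schema.org/Photograph> ; rel="type" , '
    '<https://example.org/api/records/1> ; rel="linkset" ; type="application/linkset+json"'
)
# Only require that the decorator produced the relations we care about.
assert {"type", "linkset"} <= link_rels(header)
```

The regex is intentionally naive: a `rel` carrying several space-separated relation types would come back as one string, which is fine for a smoke test but not for a general Link header parser.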