-
Couldn't load subscription status.
- Fork 154
fix: zenodo adapter for new file paths #1746
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Hello, it would be much appreciated if i could get some help regarding the zenodo portal tests! @pdelboca @pierrecamilleri , i saw you were active relatively recently, do you have any advice or contacts to push this further? |
Hi @coroa, thanks for the PR, I am sorry for the delay. I'll have a look at it next week ! |
|
Thank you for the heads-up about the discussion; that is very relevant for my plans. You mention there that the zenodo portal functionality was moved into other tools. Do you remember which tools you were talking about there? |
Sorry if my answer is vague, I am clearly out of my comfort zone here. If I remember well, in the course of our exchange, we mentioned the strategy that consisted in having portals adopt datapackage natively rather than I believe Zenodo has now native support of data packages, but I could not find any hint about that - maybe I misunderstood? @pdelboca do you have more context on that? |
|
Thanks a lot for this contribution, and again really sorry about my late reaction.
Yes, IMO this behavior is indeed problematic for many remote resources (related issue).
Needing an API key to update the tests is already not ideal, but the tests depending on the API key used is indeed not an option. There is no added value to test the exact deposition ID, I think what matters is that the URL is well-formed and the deposition ID matches the URL. I would suggest something like that : # I am not sure of the exact pattern an ID follows, from the examples it looks like its a 7-digit ID, I guess checking it's all digits is already sufficient
ZENODO_URL_PATTERN = r"https://zenodo\.org/deposit/(\d+)"
@pytest.mark.vcr
def test_zenodo_adapter_write(tmp_path):
control = portals.ZenodoControl(
metafn="data/zenodo/metadata.json",
tmp_path=tmp_path,
)
package = Package("data/datapackage.json")
result = package.publish(control=control)
# Check URL matches pattern with ID
match = re.match(ZENODO_URL_PATTERN, result.url)
assert match is not None, f"URL {result.url} does not match expected pattern"
# Extract ID from URL and compare with deposition_id
deposition_id = result.context.get("deposition_id")
url_deposition_id = int(match.group(1))
assert deposition_id == url_deposition_idWhat do you think ? |
Then we are just testing that we reached the end of: https://github.com/coroa/frictionless-py/blob/82a9e8cf4c710725f69aab63140a87d1beba07de/frictionless/portals/zenodo/adapter.py#L137-L141 . Just using |
use title from zenodo package metadata and name from zenodo control, if given.
/record/<recid> became /records/<recid> /deposit/<depid> became /upload/<depid>
|
Write tests are mostly working. Catalog tests need some love. |
Fixes #1682 .
Overview
Description
Hi there,
as already investigated in #1682. The zenodo api has been updated and no longer exhibits the old bucket structure where all the files that have been uploaded are accessable at urls like
but rather take a more intuitive form:
where
file_keyrefers to the name of the file, likecapitals.csv. Without the/contentsuffix one accesses instead a short metadata json, with information like the mimetype, size and last modified dates and further details.That suffixed form unfortunately does not work naturally with the current structure of frictionless-py which often tries to infer the ressource or more generally source type by the path ending. I investigated shortly whether adding a new scheme to the zenodo plugin so that the application would support new links like
zenodo://<record_id>/<file_key>, but this proved to be relatively difficult and also did not help with building a datapackage from a zenodo entry which does have an explicit datapackage.json descriptor (since the source resolution does not reuse the scheme plugins).The easier solution taken here is to use instead the html endpoint for retrieving the files, which have the following url structure, which worked naturally:
This is implemented and working.
I then noticed that tests had been completely disabled for the last two years :(. After removing the cassettes to regenerate them, all the read tests are now working fine again, but the write tests need more work. Writing seems to work fine, but the so-called deposition ids that are being generated (and asserted against) are user dependent and therefore with my personal token, they are failing all the tests.
I am reluctant to update all of them to my deposition ids now, since that would tie the testing to my personal token. That does seem to be the wrong solution!
Could someone advise how they intended those tests to work? @roll ? @shashigharti ?
Thank you