Skip to content

Conversation

@coroa
Copy link

@coroa coroa commented Jun 1, 2025

Fixes #1682 .

Overview

  • Update zenodo cassettes
  • Fix zenodo adapter
  • Write tests not working yet

Description

Hi there,

as already investigated in #1682. The zenodo api has been updated and no longer exhibits the old bucket structure where all the files that have been uploaded are accessable at urls like

https://zenodo.org/api/files/<bucket_id>/<file_key>

but rather take a more intuitive form:

https://zenodo.org/api/record/<record_id>/files/<file_key>/content

where file_key refers to the name of the file, like capitals.csv. Without the /content suffix one accesses instead a short metadata json, with information like the mimetype, size and last modified dates and further details.

That suffixed form unfortunately does not work naturally with the current structure of frictionless-py which often tries to infer the ressource or more generally source type by the path ending. I investigated shortly whether adding a new scheme to the zenodo plugin so that the application would support new links like zenodo://<record_id>/<file_key>, but this proved to be relatively difficult and also did not help with building a datapackage from a zenodo entry which does have an explicit datapackage.json descriptor (since the source resolution does not reuse the scheme plugins).

The easier solution taken here is to use instead the html endpoint for retrieving the files, which have the following url structure, which worked naturally:

https://zenodo.org/record/<record_id>/files/<file_key>

This is implemented and working.

I then noticed that tests had been completely disabled for the last two years :(. After removing the cassettes to regenerate them, all the read tests are now working fine again, but the write tests need more work. Writing seems to work fine, but the so-called deposition ids that are being generated (and asserted against) are user dependent and therefore with my personal token, they are failing all the tests.

I am reluctant to update all of them to my deposition ids now, since that would tie the testing to my personal token. That does seem to be the wrong solution!

Could someone advise how they intended those tests to work? @roll ? @shashigharti ?

Thank you

@coroa
Copy link
Author

coroa commented Jun 20, 2025

Hello, it would be much appreciated if i could get some help regarding the zenodo portal tests!

@pdelboca @pierrecamilleri , i saw you were active relatively recently, do you have any advice or contacts to push this further?

@pierrecamilleri
Copy link
Collaborator

Hello, it would be much appreciated if i could get some help regarding the zenodo portal tests!

@pdelboca @pierrecamilleri , i saw you were active relatively recently, do you have any advice or contacts to push this further?

Hi @coroa, thanks for the PR, I am sorry for the delay. I'll have a look at it next week !
Be aware that the Zenodo adapter may be deprecated in the next version (see discussion)

@coroa
Copy link
Author

coroa commented Jun 23, 2025

Thank you for the heads-up about the discussion; that is very relevant for my plans.

You mention there that the zenodo portal functionality was moved into other tools. Do you remember which tools you were talking about there?

@pierrecamilleri
Copy link
Collaborator

pierrecamilleri commented Jun 23, 2025

Thank you for the heads-up about the discussion; that is very relevant for my plans.

You mention there that the zenodo portal functionality was moved into other tools. Do you remember which tools you were talking about there?

Sorry if my answer is vague, I am clearly out of my comfort zone here.

If I remember well, in the course of our exchange, we mentioned the strategy that consisted in having portals adopt datapackage natively rather than frictionless-py having to specifically interface with them.

I believe Zenodo has now native support of data packages, but I could not find any hint about that - maybe I misunderstood? @pdelboca do you have more context on that?

@pierrecamilleri
Copy link
Collaborator

Thanks a lot for this contribution, and again really sorry about my late reaction.
Your changes to the code looks good, thanks for the documentation.

That suffixed form unfortunately does not work naturally with the current structure of frictionless-py which often tries to infer the ressource or more generally source type by the path ending.

Yes, IMO this behavior is indeed problematic for many remote resources (related issue).

I am reluctant to update all of them to my deposition ids now, since that would tie the testing to my personal token. That does seem to be the wrong solution!

Needing an API key to update the tests is already not ideal, but the tests depending on the API key used is indeed not an option. There is no added value to test the exact deposition ID, I think what matters is that the URL is well-formed and the deposition ID matches the URL.

I would suggest something like that : 

# I am not sure of the exact pattern an ID follows, from the examples it looks like its a 7-digit ID, I guess checking it's all digits is already sufficient  
ZENODO_URL_PATTERN = r"https://zenodo\.org/deposit/(\d+)"

@pytest.mark.vcr
def test_zenodo_adapter_write(tmp_path):
    control = portals.ZenodoControl(
        metafn="data/zenodo/metadata.json",
        tmp_path=tmp_path,
    )
    package = Package("data/datapackage.json")
    result = package.publish(control=control)

    
    # Check URL matches pattern with ID
    match = re.match(ZENODO_URL_PATTERN, result.url)
    assert match is not None, f"URL {result.url} does not match expected pattern"
    
    # Extract ID from URL and compare with deposition_id
    deposition_id = result.context.get("deposition_id")
    url_deposition_id = int(match.group(1))
    assert deposition_id == url_deposition_id

What do you think ?

@coroa
Copy link
Author

coroa commented Jul 26, 2025

What do you think?

Then we are just testing that we reached the end of: https://github.com/coroa/frictionless-py/blob/82a9e8cf4c710725f69aab63140a87d1beba07de/frictionless/portals/zenodo/adapter.py#L137-L141 .

Just using isinstance(result, PublishResult) and isinstance(result.context["deposition_id"], int) achieves that, too, but is a lot easier.

@coroa
Copy link
Author

coroa commented Jul 26, 2025

Write tests are mostly working. Catalog tests need some love.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

get_package in Zenodo adapter out of date?

2 participants