Skip to content

Conversation

@tbkr
Copy link
Contributor

@tbkr tbkr commented Nov 7, 2025

Description

I'm aware that this is a somewhat mid-sized PR 😉 It's also a complete rewrite with tests and documentation.

Porting FDB to PyBind11, making the Python-API more user-friendly and introducing tests for the API layer which are testing core functionality of the FDB in an ephemeral FDB setup.

Documentation is added to the code base as well as the sphinx generated side.

The coverage was tested and is close to 100% testing all provided use-cases and functionalities.

For now I keep this PR in WIP status and I'm open to discuss Implementation details.

Notes for reviewers:

  • Think about the current API design and raise issues, esp. if you are aware of use-cases which aren't currently implemented
  • Read the documentation carefully and check whether the written functionalities are correctly mirrored from the FDB.
  • Feel free to raise potential issues I'm not aware of at this point in time.

Contributor Declaration

By opening this pull request, I affirm the following:

  • All authors agree to the Contributor License Agreement.
  • The code follows the project's coding standards.
  • I have performed self-review and added comments where needed.
  • I have added or updated tests to verify that my changes are effective and functional.
  • I have run all existing tests and confirmed they pass.

🌈🌦️📖🚧 Documentation FDB 🚧📖🌦️🌈
https://sites.ecmwf.int/docs/dev-section/fdb/pull-requests/PR-193

@tbkr tbkr changed the title Feature/pyfdb_integration Porting PyFDB to Pybind 11 WIP: Feature/pyfdb_integration Porting PyFDB to Pybind 11 Nov 7, 2025
@codecov-commenter
Copy link

codecov-commenter commented Nov 7, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.07%. Comparing base (e1d58dd) to head (c21a48d).
⚠️ Report is 16 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #193      +/-   ##
===========================================
+ Coverage    72.93%   73.07%   +0.14%     
===========================================
  Files          362      362              
  Lines        21737    21738       +1     
  Branches      2242     2242              
===========================================
+ Hits         15853    15886      +33     
+ Misses        5884     5852      -32     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tbkr tbkr force-pushed the feature/pyfdb-integration branch 28 times, most recently from 91dfeff to 2be9698 Compare November 14, 2025 10:24
@tbkr tbkr force-pushed the feature/pyfdb-integration branch from 5812380 to 21f1845 Compare January 26, 2026 10:08
tbkr added 3 commits January 29, 2026 15:08
Works now with shutil and there is the option to zero-copy read into a
buffer.
@tbkr tbkr force-pushed the feature/pyfdb-integration branch from 51f9ba9 to 1fea9bb Compare January 30, 2026 12:54
Copy link
Contributor

@simondsmart simondsmart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more comments for you.

Please, where you have dealt with previous comments, can you resolve the relevant comments.

z3fdb/index

:ref:`FDB <FDB_Introduction>` itself is part of `ECMWF <https://www.ecmwf.int>`__’s
high‑performance data infrastructure: it stores each GRIB message as a field,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The FDB is not restricted to GRIBs. I don't want this to be suggested in here.

I note that z3fdb does have ties to gridded data formats. That is a separate discussion.


.. code-block:: python

builder = pyfdb.SelectionBuilder()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced by this builder being the recommended way of doing this.

I don't understand why we can't have the typical case just being a

mars_selection = {
"class": "od",
...
"step": [x for x in range(1, 241)]
}

And hiding all of the grunge internally. That is not to say that we can't have a builder to make it easier to do more complex stuff - but I don't think it should be required.

We should probably also accept a simple string, and let metkit do its thing...

fdb = pyfdb.FDB(fdb_config_path)
filename = data_path / "x138-300.grib"

fdb.archive(filename.read_bytes())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to have an example where the key is supplied along with the data.

"time": "1800",
}

with fdb.retrieve(selection) as data_handle:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the context manager doing here? What happens when we leave the managed section.

I was expecting context managers on the archive pathway, but this is less obvious.

URI[scheme=toc,name=/<path-to-db-store>/ea:0001:oper:20200101:1800:g].


You can see that the ``ControlIdentifier`` with value ``4`` is active for the given entry of the ``FDB``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Erm, I see the thumbs up, but not that anything has changed.

@@ -0,0 +1,148 @@
from pyfdb.pyfdb_type import MarsSelectionMapper

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be removed.

@@ -0,0 +1,181 @@
from pyfdb import FDB

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about range selections? Lists where there isn't data. Lists that select a subset of what is present?

from pyfdb.pyfdb_type import WildcardMarsSelection


def test_index_axis_string(read_only_fdb_setup):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some tests where you explicitly construct the IndexAxis object, and test equality with the returned value from the fdb.axes() call.

def test_from_selection():
fdb_tool_request = FDBToolRequest.from_mars_selection({"key-1": "value-1"})

print(fdb_tool_request)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't test anything except for that the code doesn't crash...

key_values=selection, all=isinstance(selection, WildcardMarsSelection)
)

def ____repr__(self) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't look right (quadruple '_')

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants