Open mfdataset enhancement #9955

Open · pratiman-91 wants to merge 39 commits into main

Conversation

@pratiman-91 (Author)

Added a new argument to open_mfdataset to better handle invalid files.

errors : {'ignore', 'raise', 'warn'}, default 'raise'
        - If 'raise', an invalid dataset will raise an exception.
        - If 'ignore', invalid datasets will be ignored.
        - If 'warn', a warning will be issued for each invalid dataset.
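
A minimal usage sketch of the proposed argument (the file names are hypothetical, and errors= only exists with this PR):

import xarray as xr

# Skip files that cannot be opened instead of failing the whole call;
# errors="warn" would additionally emit one warning per invalid file,
# and errors="raise" (the default) keeps the current behaviour.
ds = xr.open_mfdataset(
    ["good_1.nc", "corrupt.nc", "good_2.nc"],  # hypothetical file names
    errors="ignore",
)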

welcome bot commented Jan 16, 2025

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

@max-sixty (Collaborator)

I'm not the expert, but this looks reasonable! Any other thoughts?

Assuming no one thinks it's a bad idea, we would need tests.
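
For what it's worth, a rough sketch of what such a test might look like, assuming the proposed errors argument and that the "warn" mode emits a UserWarning (both assumptions at this stage):

import pytest
import xarray as xr

def test_open_mfdataset_errors_warn(tmp_path):
    good = tmp_path / "good.nc"
    xr.Dataset({"v": ("x", [1, 2])}).to_netcdf(good)
    # the second path does not exist; "warn" should skip it with a warning
    with pytest.warns(UserWarning):
        ds = xr.open_mfdataset(
            [str(good), str(tmp_path / "missing.nc")],
            combine="nested",
            concat_dim="x",
            errors="warn",
        )
    assert "v" in ds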

@headtr1ck (Collaborator) left a comment

I think it is a good idea.

But the way it is implemented here seems overly complicated and repetitive. I would suggest reverting the logic: first build up the list wrapped in a single try, and then handle the three cases in the except block.
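
A self-contained sketch of that shape (open_all and its arguments are hypothetical stand-ins for the relevant open_mfdataset internals, not the actual code):

import warnings

import xarray as xr

def open_all(paths1d, errors="raise", **open_kwargs):
    # build up the list, wrapping each open in a single try,
    # and handle the three error modes in one except block
    datasets = []
    for path in paths1d:
        try:
            datasets.append(xr.open_dataset(path, **open_kwargs))
        except OSError:
            if errors == "raise":
                raise
            if errors == "warn":
                warnings.warn(f"Could not open {path!r}; skipping.")
            # errors == "ignore": skip the file silently
    return datasets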

@headtr1ck (Collaborator) left a comment

Almost there.

Also, we should add tests for this.

@pratiman-91 (Author)

@headtr1ck Thanks for the suggestions. I have added two tests (ignore and warn). Also, while testing, I found that the new argument broke combine="nested" due to invalid ids. I have now modified it to produce the correct ids, and it passes the tests. Please review the tests and the latest version.

@pratiman-91 (Author)

Hi @headtr1ck, I have been thinking about the handling of ids. The current version looks like patchwork (I am not happy with it). I think we can create the ids after removing all the invalid datasets from paths1d within the combine == "nested" block. Please let me know what you think.
Thanks!

@pratiman-91 (Author)

@max-sixty Can you please go through the PR? Thanks!

@max-sixty (Collaborator)

I'm admittedly much less familiar with this section of the code. Nothing seems wrong though!

I think we should bias towards merging, so if no one has concerns then I'd vote to merge.

Could we fix the errors in the docs?

@pratiman-91 (Author)

It seems one test failed: test_sparse_dask_dataset_repr (xarray.tests.test_sparse.TestSparseDataArrayAndDataset). It is not related to this PR.

@max-sixty (Collaborator)

Can we fix the errors in the docs?

@pratiman-91 (Author)

pratiman-91 commented May 30, 2025

Can we fix the errors in the docs?

I just rebased the branch. Let's see if this resolves the docs errors. Ah, found the problem.

@pratiman-91 (Author)

@max-sixty all tests passed.

@kmuehlbauer (Contributor) left a comment

@pratiman-91 Thanks for sticking with this. This seems all reasonable to me. I've added a couple of comments and suggestions.

For the tests, I think it would be great to also test for "raise".

@pratiman-91 (Author)

pratiman-91 commented May 30, 2025

@kmuehlbauer Thank you for the suggestions and for fixing the typos. I have made changes based on your suggestions. Please check.

For the tests, I did not write a test for "raise" since it is the default and already being tested by various other tests. If you think it would be beneficial to add another test, then I can write it.

pratiman-91 requested a review from kmuehlbauer on May 30, 2025, 12:26
@kmuehlbauer (Contributor)

For the tests, I did not write a test for "raise" since it is the default and already being tested by various other tests. If you think it would be beneficial to add another test, then I can write it.

Yes, you are right, this is and was the default. No need to add another test.

@kmuehlbauer (Contributor)

kmuehlbauer commented May 30, 2025

@pratiman-91 I've thought a bit about removing the problematic files in the nested case.

To my understanding, we have two corresponding 1D lists (ids and paths1d).

Wouldn't it work to just remember the indices for the failing files and remove them from the existing lists afterwards? Along these lines:

remove = []
for i, path in enumerate(paths1d):
    try:
        open(...)  # shorthand: attempt to open the dataset at path
    except OSError:
        if combine == "nested":
            remove.append(i)

# later, in the combine == "nested" branch

if remove:
    for i in sorted(remove, reverse=True):
        del ids[i]
        del paths1d[i]

Update: we could even use remove for the checks; no need for another variable.

@pratiman-91 (Author)

pratiman-91 commented May 30, 2025

@kmuehlbauer I have thought about this approach before. However, it creates a problem with the 2x2 nested case. Therefore, I settled on recreating the IDs and paths. You can also check here: 3bfaaee

@kmuehlbauer (Contributor)

Thanks for the pointer @pratiman-91. Yes, this has a twist.

I see this is a special case for the nested combine. All our good faith doesn't help in these cases: if the user provides a 2x2 (or NxM) nested list and one or more files are missing from one or more of the sublists, the whole thing explodes. So I'd rather go for "raise" in those cases than have the user see ValueError: The supplied objects do not form a hypercube because sub-lists do not have consistent lengths along dimension X.
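
For illustration (not from the PR), a minimal reproduction of the ragged nested case that ends in this error:

import numpy as np
import xarray as xr

ds = xr.Dataset({"v": ("x", np.arange(2))})
# a ragged nested list, as left behind after silently dropping one file
try:
    xr.combine_nested([[ds, ds], [ds]], concat_dim=["y", "x"])
except ValueError as e:
    print(e)  # ... do not form a hypercube ... inconsistent lengths ...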

Maybe others have some opinion here?

original.isel(x=slice(5), y=slice(4, 8)).to_netcdf(tmp3)
original.isel(x=slice(5, 10), y=slice(4, 8)).to_netcdf(tmp4)
with open_mfdataset(
    [[tmp1, tmp2], ["non-existent-file.nc", tmp3, tmp4]],
@kmuehlbauer (Contributor)

If you remove tmp3 here, your test will fail. So it seems that the user has to provide the correct number of files for the concat/merges they want to achieve, or am I missing something?

@pratiman-91 (Author)

You are correct @kmuehlbauer, removing tmp3 will cause the test to fail. The current implementation expects the input to align with the intended concat/merge structure. However, if a file is missing or invalid and its removal still results in a valid nested list structure, the process should still work as expected. Otherwise, an error is raised to indicate that the input is inconsistent and the concat/merge is not possible.
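
Schematically (reusing the test's variable names):

# [[tmp1, tmp2], ["non-existent-file.nc", tmp3, tmp4]]
#   -> errors="ignore" drops the bad entry ->
# [[tmp1, tmp2], [tmp3, tmp4]]    # consistent sub-lists: combine works
#
# [[tmp1, tmp2], ["non-existent-file.nc", tmp4]]
#   -> dropping the bad entry leaves ->
# [[tmp1, tmp2], [tmp4]]          # ragged sub-lists: the hypercube error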

@kmuehlbauer (Contributor)

I agree with that. The question is whether xarray should take on that maintenance burden for this IMHO rare corner case of nested combines. But let's hear others' opinions.

@pratiman-91 (Author)

@kmuehlbauer @max-sixty Any update on this PR?
Thanks!

-Pratiman

Successfully merging this pull request may close these issues:

better handling of invalid files in open_mfdataset