Datatree import #8656

flamingbear · 2024-01-24T16:17:02Z

This PR imports xarray-contrib/datatree and its history to pydata/xarray/xarray/datatree_

Step one of issue #8572.
This imports the datatree code without exposing DataTree as a public API.

git-filter-repo (https://github.com/newren/git-filter-repo) was used to preserve some history for the merge.

Datatree tags were renamed to legacy-datatree-{tag}:

git filter-repo --tag-rename '':'legacy-datatree-'
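
As a rough illustration of what the rename does, here is a minimal Python sketch. It assumes `--tag-rename OLD:NEW` replaces a leading `OLD` with `NEW`, so an empty `OLD` simply prepends the new prefix. The helper `rename_tag` and the tag `v0.0.1` are hypothetical and only used for illustration.

```python
def rename_tag(tag: str, old_prefix: str, new_prefix: str) -> str:
    """Replace a leading old_prefix with new_prefix; leave other tags alone."""
    if tag.startswith(old_prefix):
        return new_prefix + tag[len(old_prefix):]
    return tag


# Every string starts with the empty prefix, so every tag gets the new prefix.
# "v0.0.1" is an illustrative tag name, not necessarily one from datatree.
print(rename_tag("v0.0.1", "", "legacy-datatree-"))
```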

Links to xarray-contrib/datatree original PRs are preserved by rewriting the messages using a replace-message file.

# standard github style pull request (#39)
regex:\(#(\d+)\)==>https://github.com/xarray-contrib/datatree/pull/\1

# "Merge pull request #11 from TomNicholas/single_datanode_class"
regex:pull request #(\d+)==>https://github.com/xarray-contrib/datatree/pull/\1
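
To see what these two rules do to a commit message, here is a minimal Python sketch. It assumes each `regex:PATTERN==>REPLACEMENT` line behaves like a Python `re.sub` call and that the rules are applied in the order listed; `rewrite_message` is a hypothetical helper, not part of git-filter-repo.

```python
import re

DATATREE = "https://github.com/xarray-contrib/datatree"

# (pattern, replacement) pairs copied from the replace-message rules above.
MESSAGE_RULES = [
    (r"\(#(\d+)\)", DATATREE + r"/pull/\1"),
    (r"pull request #(\d+)", DATATREE + r"/pull/\1"),
]


def rewrite_message(message: str) -> str:
    """Apply each rule in order, as a sketch of the message rewriting."""
    for pattern, replacement in MESSAGE_RULES:
        message = re.sub(pattern, replacement, message)
    return message


# The two sample messages are the ones quoted in the rule comments.
for sample in (
    "standard github style pull request (#39)",
    "Merge pull request #11 from TomNicholas/single_datanode_class",
):
    print(rewrite_message(sample))
```

GitHub renders such URLs in its short cross-repository form, which would explain commit titles like "Merge xarray-contrib/datatree#14 from TomNicholas/all_properties" in the commit list.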

Links to the xarray-contrib/datatree Issues are preserved with a replace-text file.

# standard comment change "  # see issue #38"
regex:(\s*?#.*[ ])#(\d+)==>\1https://github.com/xarray-contrib/datatree/issues/\2

# also "    @pytest.mark.xfail(reason="Indexing needs to return whole tree (GH #77)")"
regex:(\(GH #(\d+)\))==>(GH https://github.com/xarray-contrib/datatree/issues/\2)
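
The effect of the two replace-text rules on a source line can be sketched in Python. This assumes the rules behave like `re.sub` substitutions applied in order, one line at a time; `rewrite_line` is a hypothetical helper used only for illustration.

```python
import re

ISSUES = "https://github.com/xarray-contrib/datatree/issues"

# (pattern, replacement) pairs copied from the replace-text rules above.
TEXT_RULES = [
    (r"(\s*?#.*[ ])#(\d+)", r"\1" + ISSUES + r"/\2"),
    (r"(\(GH #(\d+)\))", "(GH " + ISSUES + r"/\2)"),
]


def rewrite_line(line: str) -> str:
    """Apply each rule in order to a single line of file content."""
    for pattern, replacement in TEXT_RULES:
        line = re.sub(pattern, replacement, line)
    return line


# The two sample lines are the ones quoted in the rule comments.
print(rewrite_line("  # see issue #38"))
print(rewrite_line(
    '    @pytest.mark.xfail(reason="Indexing needs to return whole tree (GH #77)")'
))
```

The first rule only fires when a comment marker `#` precedes the issue number on the same line, so the `xfail` line is left for the second rule to handle.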

The datatree repo was relocated to a subdirectory

git-filter-repo --to-subdirectory xarray/datatree_

and the prepared datatree repository was added as a remote and merged into xarray.

git merge prepared-datatree/main --no-commit --allow-unrelated-histories

This should allow work to begin on the rest of the steps in #8572


* first attempt at to_netcdf
* lint
* add test for roundtrip and support empty nodes
* Apply suggestions from code review

Co-authored-by: Tom Nicholas <[email protected]>

* update roundtrip test, improves empty node handling in IO

Co-authored-by: Tom Nicholas <[email protected]>

* pseudocode ideas for generalizing map_over_subtree
* pseudocode for a generalized map_over_subtree (still only one return arg) + a new mapping.py file
* pseudocode for mapping but now multiple return values
* pseudocode for mapping but with multiple return values
* check_isomorphism works and has tests
* cleaned up the mapping tests a bit
* remove WIP from oter branch
* ensure tests pass
* map_over_subtree in the public API properly
* linting

* add test for roundtrip and support empty nodes
* update roundtrip test, improves empty node handling in IO
* add zarr read/write support
* support netcdf4 or h5netcdf
* netcdf is optional, zarr too!
* Apply suggestions from code review

Co-authored-by: Tom Nicholas <[email protected]>

Co-authored-by: Tom Nicholas <[email protected]>

* pseudocode ideas for generalizing map_over_subtree
* pseudocode for a generalized map_over_subtree (still only one return arg) + a new mapping.py file
* pseudocode for mapping but now multiple return values
* pseudocode for mapping but with multiple return values
* check_isomorphism works and has tests
* cleaned up the mapping tests a bit
* tests for mapping over multiple trees
* incorrect pseudocode attempt to map over multiple subtrees
* small improvements
* fixed test
* zipping of multiple arguments
* passes for mapping over a single tree
* successfully maps over multiple trees
* successfully returns multiple trees
* filled out all tests
* checking types now works for trees with only one node
* improved docstring


keewis · 2024-01-25T19:53:32Z

basically, what I was thinking was that we'd prepare the merge in a branch that contains only the datatree history, and once we're ready to do the merge, we'd create the merge commit that combines the two histories and push to main. Not sure if that's the best strategy, but it would result in a cleaner history.

max-sixty · 2024-01-26T01:29:18Z

.pre-commit-config.yaml

@@ -1,6 +1,7 @@
 # https://pre-commit.com/
 ci:
    autoupdate_schedule: monthly
+exclude: 'xarray/datatree_.*'


What fails here?

Ideally we want to promote exclusions into the tools themselves — like we have for mypy — rather than in the pre-commit-config. Since editors will run many of these by default, and will only pick up on exclusions from the tools' configs...

Though if someone wants to own fixing the issues soon after merging, no need to slow down the initial merge...

keewis · 2024-01-26

ruff is the one that fails. However, pre-commit passes all (changed) files explicitly, so I'm not sure how easy it would be to exclude directly in ruff (we could add an additional exclude to ruff's config, though).

flamingbear · 2024-01-29

I was unable to write an exclude in ruff's config alone that actually seemed to be respected.

In this exclude block https://github.com/flamingbear/xarray/blob/datatree-import/pyproject.toml#L233
I tried variations on "xarray/datatree_.*", "xarray/datatree_", and "datatree_"

I'm open to suggestions.

edited: It may have been a conflict with project.toml. I'm looking again.
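
For reference, pre-commit documents `exclude` as a Python regular expression matched against repository-relative paths. The sketch below shows which files the pattern from the diff in this thread would skip, assuming `re.search` semantics. The helper `is_excluded` is hypothetical, and the paths other than `py.typed` are illustrative.

```python
import re

# Top-level exclude pattern from the .pre-commit-config.yaml diff.
EXCLUDE = re.compile(r"xarray/datatree_.*")


def is_excluded(path: str) -> bool:
    """Return True if a repo-relative path would be skipped by the pattern."""
    return EXCLUDE.search(path) is not None


for path in (
    "xarray/datatree_/datatree/py.typed",
    "xarray/core/dataset.py",
    "pyproject.toml",
):
    print(path, is_excluded(path))
```

Because the pattern is searched rather than anchored, the trailing `.*` is not strictly needed for this to match everything under the directory.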

max-sixty · 2024-01-26T01:29:30Z

xarray/datatree_/datatree/py.typed

Remove

flamingbear · 2024-01-29T18:53:22Z

And there it is, I've made a complete hash of this PR by merging in main from xarray.
And the attempt to exclude datatree_ in ruff alone is still failing.

I suspect I will update with the full pre-commit.ci exclusion like I had before, but we may want a cleaner history. I will wait for some advice.


keewis · 2024-01-30T17:05:14Z

I think if we have the branch in this PR in its final state, cleaning it should be pretty easy: replay the merge commit on top of current main, then apply all the additional changes you did on top of that (by cherry-picking). I'd only attempt to do that once we're ready to actually merge, though (just tell me when it's ready and I'll have a go).

flamingbear · 2024-01-30T17:14:24Z

I don't think we want the pre-commit.ci changes that happened in 8eb2aa3, though. I am fine doing this again in a new branch (I've already done it in flamingbear:import-datatree-attempt2 to work on open_datatree) with a single additional clean commit that has the information for skipping mypy, doctest and pre-commit.ci for datatree_, but there has been a lot of conversation in this PR that maybe should be preserved?

edit: and actually if you wanted me to push up the final prepared datatree repo, you could work with that. I'm not sure what needs to happen to make this ready for merging.

keewis · 2024-01-30T17:57:01Z

The conversation on this PR will stay, but won't make it into any of the commits. So I don't think we lose much by closing it instead of merging (in the end, PRs are just a way to discuss the changes from branches / commits). We could link to this PR from the merge commit, though.

As far as making it ready for merging, I believe we only need to make sure it doesn't interfere with the rest of the repository (in particular CI and pre-commit / linters, but we should also make sure we don't include it in releases).

So I guess the packaging issue aside this should already be ready?

flamingbear · 2024-01-30T19:42:23Z

Yes, I think this is ready, or I am ready to create a new one that references this one as needed. I am not sure what to do about excluding datatree_ in packaging, but I can look into that.

flamingbear · 2024-01-30T23:06:38Z

> So I guess the packaging issue aside this should already be ready?

My naive approach would be to delete the directory before each build in the workflows?

.github/workflows/nightly-wheels.yml, .github/workflows/pypi-release.yaml

probably

      - name: Build tarball and wheels
        run: |
          git clean -xdf
          git restore -SW .
          rm -rf xarray/datatree_
          python -m build

keewis · 2024-01-31T09:57:56Z

it took me a while to figure that out, but a MANIFEST.in file is able to tell setuptools-scm to not include this particular directory (see the most recent commit)
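
One way to confirm that the exclusion took effect is to inspect the built source distribution. This is a minimal sketch, assuming the sdist is a gzip-compressed tarball whose member names start with a project directory. The helper names, the `xarray-0.0.0` prefix and the file names are hypothetical stand-ins, not taken from this PR, and the in-memory archive only stands in for a real build artifact.

```python
import io
import tarfile


def datatree_members(sdist: tarfile.TarFile, excluded: str = "xarray/datatree_/") -> list[str]:
    """Return archive members that live under the excluded directory."""
    return [name for name in sdist.getnames() if excluded in name]


def make_fake_sdist(paths: list[str]) -> tarfile.TarFile:
    """Build an in-memory tarball of empty files for demonstration purposes."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path in paths:
            archive.addfile(tarfile.TarInfo(name=path))  # zero-length member
    buffer.seek(0)
    return tarfile.open(fileobj=buffer, mode="r:gz")


# Hypothetical archive contents; a real check would open the file in dist/.
sdist = make_fake_sdist([
    "xarray-0.0.0/xarray/__init__.py",
    "xarray-0.0.0/xarray/datatree_/README.md",
])
print(datatree_members(sdist))
```

An empty result for the real tarball would indicate that the directory was left out of the release.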

keewis · 2024-01-31T10:26:11Z

I've tried multiple ways to cherry-pick / rebase a merge commit, but I don't think this is going to work very well. So that leaves us with redoing the merge, and cherry-picking the commits you/we did to prepare the final merge on top of it (my goal is to do a fast-forward merge into main).

Would you be up for doing that, @flamingbear? Otherwise I'll do it, but could you confirm that all the preparation you did on the datatree commits are stored in flamingbear/rewritten-datatree?

flamingbear · 2024-01-31T15:05:55Z

> Would you be up for doing that, @flamingbear? Otherwise I'll do it, but could you confirm that all the preparation you did on the datatree commits are stored in flamingbear/rewritten-datatree?

I'm prepared to do that again. And I can confirm that rewritten-datatree is ~~NOT~~ now the final version (I updated). I ~~can/will~~ have pushed up the final version into that location, but that version is the first test version I did that does not include the final commits from datatree.

keewis · 2024-01-31T15:15:29Z

Well, if you're prepared to do that yourself I don't even need to know, I'll just verify the result and do the merge (and I've put the decision on whether to merge the cleaned version of this on the agenda of the meeting today)

flamingbear · 2024-01-31T15:29:00Z

@keewis I have updated rewritten-datatree with the repository used in the cleaner PR: #8688
Let me know how that looks. So hopefully, close this, merge that, is my thinking

flamingbear · 2024-01-31T15:34:34Z

> it took me a while to figure that out, but a MANIFEST.in file is able to tell setuptools-scm to not include this particular directory (see the most recent commit)

I missed this, I will add this to the skips ci commit in #8688

Edit: ✅ complete

keewis · 2024-01-31T15:53:28Z

I hope you didn't understand this as me saying you have to use the state from rewritten-datatree, I just wanted to know if you had the rewritten and updated state somewhere. So if we have all the commits on datatree this looks fine to me. I'll comment one tiny thing on the other PR, but other than that I think we're ready.

flamingbear · 2024-01-31T15:55:01Z

Nope, I didn't think that, but in the interest of transparency I pushed up the full repo I had prepared locally. I'll go look at the note on the other pr.

TomNicholas and others added 30 commits August 24, 2021 12:15

Merge branch 'main' of https://github.com/TomNicholas/datatree

a5a5428

just do it the manual way for now

478f193

Merge xarray-contrib/datatree#14 from TomNicholas/all_properties

b38e97e

Define all Dataset properties on DataTree

remove list of dataset properties to add

6df675e

Update status of project in readme

0cdcdab

add_api_in_class_definition

448ead4

Merge xarray-contrib/datatree#19 from TomNicholas/add_api_in_class_de…

7c73ea9

…finition Add API methods in class definition

add basic ci setup

51e89d3

add long description from readme

f160cc2

switch to conda

4cd5e2d

refactored to add methods at class definition time

fca03b1

add pytest-cov

5bcb5c0

now also inherits from a mapped version of DataWithCoords

d5035d8

now also inherits from a mapped version of DataWithCoords

e784f91

dont try and import ops that we cant define on a dataset

0c3ceab

lists of methods to define shouldn't be stored as attributes

4eed833

test reduce ops

d765c99

Merge xarray-contrib/datatree#10 from TomNicholas/expose_dataset_ops

5f0fe4e

Expose dataset reduce operations

add developers note on class structure of DataTree

d3bb49e

Merge xarray-contrib/datatree#20 from jhamman/ci

c531432

Add basic CI setup

Linting xarray-contrib/datatree#21

bbdd7fc

* black reformatting
* add setup.cfg to configure flake8/black/isort/mypy
* add setup.cfg to configure flake8/black/isort/mypy xarray-contrib/datatree#22
* passes flake8
* disabled mypy for now

Co-authored-by: Joseph Hamman <[email protected]>

updated developer's note

c1785e0

subtree_nodes -> subtree

75511c7

hotfix + test for bug in DataTree.__init__

afc29f5

don't need to special case root when saving to netcdf

2aea280

skips tests if it doesn't have the correct dependency xarray-contrib/…

5cc4bb1

…datatree#34

max-sixty reviewed Jan 26, 2024

xarray/datatree_/datatree/py.typed (outdated)

Remove

TomNicholas added the topic-DataTree label (Related to the implementation of a DataTree class) Jan 26, 2024

flamingbear and others added 4 commits January 29, 2024 10:58

DAS-2060: Exclude xarray/datatree_ from ruff only

e137a35

DAS-2060: Exclude xarray/datatree_ from ruff only

25e8eab

Merge branch 'main' into datatree-import

087400b

[pre-commit.ci] auto fixes from pre-commit.com hooks

8eb2aa3

for more information, see https://pre-commit.ci

flamingbear and others added 3 commits January 29, 2024 14:04

DAS-2060: Sets up pre-commit.ci exclusion for datatree_

5eeef57

returns the exclusion to all pre-commit.ci

Merge branch 'pydata:main' into datatree-import

d9948e5

Merge branch 'main' into datatree-import

3599a14

use the manifest to exclude the datatree_ directory

3de55c6

flamingbear mentioned this pull request Jan 31, 2024

datatree import (clean) #8688

Merged

keewis closed this Jan 31, 2024

flamingbear deleted the datatree-import branch January 31, 2024 18:42

keewis mentioned this pull request Feb 11, 2024

add open_datatree to xarray #8697

Merged

4 tasks
