Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Datatree import #8656

Closed
wants to merge 294 commits into from
Closed

Datatree import #8656

wants to merge 294 commits into from

Conversation

flamingbear
Copy link
Member

This PR imports xarray-contrib/datatree and its history to pydata/xarray/xarray/datatree_

Step one of issue #8572.
This imports the datatree code without exposing DataTree as a public API.

git filter repo https://github.com/newren/git-filter-repo was used to preserve some history for the merge.

Datatree tags were renamed to legacy-datatree-{tag}:

git filter-repo --tag-rename '':'legacy-datatree-'

Links to xarray-contrib/datatree original PRs are preserved by rewriting the messages using a replace-message file.

# standard github style pull request (#39)
regex:\(#(\d+)\)==>https://github.com/xarray-contrib/datatree/pull/\1

# "Merge pull request #11 from TomNicholas/single_datanode_class"
regex:pull request #(\d+)==>https://github.com/xarray-contrib/datatree/pull/\1

Links to the xarray-contrib/datatree Issues are preserved with a replace-text file.

# standard comment change "  # see issue #38"
regex:(\s*?#.*[ ])#(\d+)==>\1https://github.com/xarray-contrib/datatree/issues/\2

# also "    @pytest.mark.xfail(reason="Indexing needs to return whole tree (GH #77)")"
regex:(\(GH #(\d+)\))==>(GH https://github.com/xarray-contrib/datatree/issues/\2)

The datatree repo was relocated to a subdirectory

git-filter-repo --to-subdirectory xarray/datatree_

and the prepared datatree repository was added as a remote and merged into xarray.

git merge prepared-datatree/main --no-commit --allow-unrelated-histories

This should allow work to begin on the rest of the steps in #8572

TomNicholas and others added 30 commits August 24, 2021 12:15
Define all Dataset properties on DataTree
…finition

Add API methods in class definition
* black reformatting

* add setup.cfg to configure flake8/black/isort/mypy

* add setup.cfg to configure flake8/black/isort/mypy xarray-contrib/datatree#22

* passes flake8

* disabled mypy for now

Co-authored-by: Joseph Hamman <[email protected]>
* first attempt at to_netcdf

* lint

* add test for roundtrip and support empty nodes

* Apply suggestions from code review

Co-authored-by: Tom Nicholas <[email protected]>

* update roundtrip test, improves empty node handling in IO

Co-authored-by: Tom Nicholas <[email protected]>
* pseudocode ideas for generalizing map_over_subtree

* pseudocode for a generalized map_over_subtree (still only one return arg) + a new mapping.py file

* pseudocode for mapping but now multiple return values

* pseudocode for mapping but with multiple return values

* check_isomorphism works and has tests

* cleaned up the mapping tests a bit

* remove WIP from oter branch

* ensure tests pass

* map_over_subtree in the public API properly

* linting
* add test for roundtrip and support empty nodes

* update roundtrip test, improves empty node handling in IO

* add zarr read/write support

* support netcdf4 or h5netcdf

* netcdf is optional, zarr too!

* Apply suggestions from code review

Co-authored-by: Tom Nicholas <[email protected]>

Co-authored-by: Tom Nicholas <[email protected]>
* pseudocode ideas for generalizing map_over_subtree

* pseudocode for a generalized map_over_subtree (still only one return arg) + a new mapping.py file

* pseudocode for mapping but now multiple return values

* pseudocode for mapping but with multiple return values

* check_isomorphism works and has tests

* cleaned up the mapping tests a bit

* tests for mapping over multiple trees

* incorrect pseudocode attempt to map over multiple subtrees

* small improvements

* fixed test

* zipping of multiple arguments

* passes for mapping over a single tree

* successfully maps over multiple trees

* successfully returns multiple trees

* filled out all tests

* checking types now works for trees with only one node

* improved docstring
@flamingbear
Copy link
Member Author

And there it is, I've made a complete hash of this PR by merging in main from xarray.
And the attempt to exclude datatree_ in ruff alone is still failing.

I suspect I will update with the full pre-commit.ci exclusion like I had before, but we may want a cleaner history. I will wait for some advice.

returns the exclusion to all pre-commit.ci
@keewis
Copy link
Collaborator

keewis commented Jan 30, 2024

I think if we have the branch in this PR in its final state, cleaning it should be pretty easy: replay the merge commit on top of current main, then apply all the additional changes you did on top of that (by cherry-picking). I'd only attempt to do that once we're ready to actually merge, though (just tell me when its ready and I'll have a go).

@flamingbear
Copy link
Member Author

flamingbear commented Jan 30, 2024

I don't think we want the pre-commit.ci changes that happened in 8eb2aa3 though. I am fine doing this again in a new branch (I've already done it in flamingbear:import-datatree-attempt2 to work on open_datatree) with an additional single clean commit that has the information for skipping mypy, doctest and pre-commit.ci for datatree_ but there has been a lot of conversation in this PR that maybe should be preserved?

edit: and actually if you wanted me to push up the final prepared datatree repo, you could work with that. I'm not sure what needs to happen to make this ready for merging.

@keewis
Copy link
Collaborator

keewis commented Jan 30, 2024

The conversation on this PR will stay, but won't make it into any of the commits. So I don't think we lose much by closing it instead of merging (in the end, PRs are just a way to discuss the changes from branches / commits). We could link to this PR from the merge commit, though.

As far as making it ready for merging, I believe we only need to make sure it doesn't interfere with the rest of the repository (in particular CI and pre-commit / linters, but we should also make sure we don't include it in releases).

So I guess the packaging issue aside this should already be ready?

@flamingbear
Copy link
Member Author

Yes, I think this is ready, or I am ready to create a new one that references this one as needed. I am not sure what to do about excluding datatree_ in packaging, but I can look into that.

@flamingbear
Copy link
Member Author

flamingbear commented Jan 30, 2024

So I guess the packaging issue aside this should already be ready?

My naive approach would be to delete the directory before each build in the workflows?

.github/workflows/nightly-wheels.yml, .github/workflows/pypi-release.yaml

probably

      - name: Build tarball and wheels
        run: |
          git clean -xdf
          git restore -SW .
          rm -rf xarray/datatree_
          python -m build

@keewis
Copy link
Collaborator

keewis commented Jan 31, 2024

it took me a while to figure that out, but a MANIFEST.in file is able to tell setuptools-scm to not include this particular directory (see the most recent commit)

@keewis
Copy link
Collaborator

keewis commented Jan 31, 2024

I've tried multiple ways to cherry-pick / rebase a merge commit, but I don't think this is going to work very well. So that leaves us with redoing the merge, and cherry-picking the commits you/we did to prepare the final merge on top of it (my goal is to do a fast-forward merge into main).

Would you be up for doing that, @flamingbear? Otherwise I'll do it, but could you confirm that all the preparation you did on the datatree commits are stored in flamingbear/rewritten-datatree?

@flamingbear
Copy link
Member Author

flamingbear commented Jan 31, 2024

Would you be up for doing that, @flamingbear? Otherwise I'll do it, but could you confirm that all the preparation you did on the datatree commits are stored in flamingbear/rewritten-datatree?

I'm prepared to do that again. And I can confirm that rewritten-datatree is NOT now the file version (I updated). I can/will have pushed up the final version into that location, but that version is the first test version I did that does not include the final commits from datatree.

@keewis
Copy link
Collaborator

keewis commented Jan 31, 2024

Well, if you're prepared to do that yourself I don't even need to know, I'll just verify the result and do the merge (and I've put the decision on whether to merge the cleaned version of this on the agenda of the meeting today)

@flamingbear
Copy link
Member Author

flamingbear commented Jan 31, 2024

@keewis I have updated rewritten-datatree with the repository used in the cleaner PR: #8688
Let me know how that looks. So hopefully, close this, merge that, is my thinking

@flamingbear
Copy link
Member Author

flamingbear commented Jan 31, 2024

it took me a while to figure that out, but a MANIFEST.in file is able to tell setuptools-scm to not include this particular directory (see the most recent commit)

I missed this, I will add this to the skips ci commit in #8688

Edit: ✅ complete

@keewis
Copy link
Collaborator

keewis commented Jan 31, 2024

I hope you didn't understand this as me saying you have to use the state from rewritten-datatree, I just wanted to know if you had the rewritten and updated state somewhere. So if we have all the commits on datatree this looks fine to me. I'll comment one tiny thing on the other PR, but other than that I think we're ready.

@flamingbear
Copy link
Member Author

Nope, I didn't think that, but in the interest of transparency I pushed up the full repo I had prepared locally. I'll go look at the note on the other pr.

@keewis keewis closed this Jan 31, 2024
@flamingbear flamingbear deleted the datatree-import branch January 31, 2024 18:42
@keewis keewis mentioned this pull request Feb 11, 2024
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-DataTree Related to the implementation of a DataTree class
Projects
Development

Successfully merging this pull request may close these issues.