Migrate iterators.py for datatree. #8879

owenlittlejohns · 2024-03-26T18:14:53Z

This PR continues the overall work of migrating DataTree into xarray.

iterators.py does not have direct tests. In discussions with @TomNicholas and @flamingbear, we concurred that other unit tests utilise this functionality.

Closes the migration step for iterators.py in "Track merging datatree into xarray" (#8572)
~~Tests added~~
User visible changes (including notable bug fixes) are documented in whats-new.rst
~~New functions/methods are listed in api.rst~~

owenlittlejohns · 2024-03-26T19:23:50Z

xarray/core/iterators.py

 from abc import abstractmethod
 from collections import abc
-from typing import Callable, Iterator, List, Optional
+from collections.abc import Iterator


This change looked a bit unexpected, but typing.Iterator is a deprecated alias for collections.abc.Iterator.
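For context, here is a minimal sketch of the import pattern this diff moves to. The `countdown` function is a hypothetical example, not part of the PR; it only shows `collections.abc.Iterator` used as a return annotation (subscripting it this way assumes Python 3.9 or later).

```python
from collections.abc import Iterator


def countdown(start: int) -> Iterator[int]:
    """Yield start, start - 1, ..., 1 (hypothetical helper for illustration)."""
    while start > 0:
        yield start
        start -= 1


# Generators satisfy the Iterator ABC, so the annotation matches at runtime too.
values = list(countdown(3))
is_iterator = isinstance(countdown(1), Iterator)
```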

Illviljan · 2024-03-26T20:19:51Z

xarray/core/iterators.py

@@ -11,9 +14,9 @@ class AbstractIter(abc.Iterator):
    def __init__(


Considering this __init__, using @dataclass could be an option here.

@Illviljan - sorry for not replying to this comment before. In the latest PR, the AbstractIter class was removed in favour of importing directly from anytree, so I think this comment is now addressed. Let me know if not, though.

If you want, LevelOrderIter can still use @dataclass.

I wasn't familiar with this decorator prior to this PR, but it looks like it generates some magic methods for the class. I'm a little wary of adding them (for example, do we want LevelOrderIter.__eq__?). Is there a strong argument here for this class needing them?

We could make this a dataclass, but I don't think we need to bother. The main advantage of using dataclass is automatically defining a bunch of methods that you know you want, but here we aren't defining a bunch of property methods / comparison operators so it wouldn't save us many lines of code. It's also nice to be able to just say "this came directly from anytree as-is".
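To make the trade-off in this thread concrete, the sketch below compares a hand-written class with a `@dataclass` equivalent. Both classes (`PlainConfig`, `DataConfig`) and their fields are hypothetical and unrelated to the real `LevelOrderIter`; the point is only that `@dataclass` generates `__init__`, `__repr__` and `__eq__` automatically, which is the behaviour being questioned above.

```python
from dataclasses import dataclass
from typing import Optional


class PlainConfig:
    """Hand-written __init__; equality falls back to object identity."""

    def __init__(self, node: str, maxlevel: Optional[int] = None) -> None:
        self.node = node
        self.maxlevel = maxlevel


@dataclass
class DataConfig:
    """Same fields, but __init__, __repr__ and __eq__ are generated."""

    node: str
    maxlevel: Optional[int] = None


# Two distinct plain instances compare unequal (identity comparison).
plain_equal = PlainConfig("root") == PlainConfig("root")
# Two dataclass instances with equal fields compare equal (generated __eq__).
data_equal = DataConfig("root") == DataConfig("root")
data_repr = repr(DataConfig("root"))
```

Whether a generated `__eq__` is desirable for an iterator class is exactly the judgment call discussed above; the sketch does not settle it.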

TomNicholas · 2024-03-27T04:37:53Z

As this module comes directly from the anytree library, we might need to copy its license into xarray/licenses.

TomNicholas · 2024-04-09T16:32:22Z

xarray/core/treenode.py


-        return iterators.PreOrderIter(self)
+        return PreOrderIter(self)


So we currently have two patterns of iteration used: depth-first (PreOrderIter) and breadth-first (LevelOrderIter). It looks like PreOrderIter is used in TreeNode.subtree, and LevelOrderIter is used in mapping.diff_treestructure. diff_treestructure is called inside check_isomorphic, and every other time we iterate over the tree it just calls .subtree.

Why the distinction? In diff_treestructure, when comparing two trees with multiple points at which their structure deviates, the order of iteration will affect which deviation is raised as an error first. @flamingbear and I think it is more intuitive here to raise errors from the top level of the tree first, so we agree that LevelOrderIter is the right choice here.

For mapping over the nodes of the tree in .subtree, I had previously thought that there was a good reason to use PreOrderIter. My reasoning was to do with how dt['/a/b/c/d/'] = data would immediately create the nodes /a, /a/b, /a/b/c even if they didn't already exist. But actually I don't think that's relevant here.

Another reason to use PreOrderIter might be around using map_over_subtree to map an operation that then fails. For example, taking dt.mean(dim='time') when some of the nodes don't have a time dimension. Again, the order of iteration affects which node will raise the first error.

However, if we actually think that LevelOrderIter is fine for the .subtree case too, we can potentially simplify this PR considerably by replacing the whole AbstractIter/PreOrderIter/LevelOrderIter framework with just a single iterate_breadth_first function that returns an iterator, and use that in all cases.

(Also then we wouldn't need the anytree license)
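The difference between the two orders discussed in this comment can be sketched as follows. The `Node` class and both function bodies are hypothetical illustrations, not the anytree or xarray implementations; `iterate_breadth_first` borrows the name proposed above but is only one plausible way to write it.

```python
from collections import deque
from collections.abc import Iterator


class Node:
    """Hypothetical minimal tree node; not xarray's TreeNode."""

    def __init__(self, name: str, children: "list[Node] | None" = None) -> None:
        self.name = name
        self.children = children or []


def iterate_depth_first(root: Node) -> Iterator[Node]:
    """Pre-order: visit a node, then fully descend into each child in turn."""
    stack = [root]
    while stack:
        node = stack.pop()
        yield node
        # Reverse so that the first child is popped (visited) first.
        stack.extend(reversed(node.children))


def iterate_breadth_first(root: Node) -> Iterator[Node]:
    """Level-order: visit every node at one depth before going deeper."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        yield node
        queue.extend(node.children)


tree = Node("a", [Node("b", [Node("d"), Node("e")]), Node("c", [Node("f")])])
depth_first = [n.name for n in iterate_depth_first(tree)]
breadth_first = [n.name for n in iterate_breadth_first(tree)]
```

In this toy tree the depth-first order reaches `d` and `e` before `c`, while the breadth-first order visits `b` and `c` first. An error raised at the first failing node would therefore surface from the top level of the tree under breadth-first iteration.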

@TomNicholas - I've made some updates (or rather nabbed some work @flamingbear did and made some mypy tweaks).

A couple of things here:

This now contains the single class, but hasn't wrapped it in a function. I think it gets at what you were after, but just FYI. (Although, part of me wonders if this now warrants a module all of its own?)

I left the anytree license in the PR, because the LevelOrderIter is still 99% the code from anytree.

This now contains the single class, but hasn't wrapped it in a function.

That's fine, thanks.

(Although, part of me wonders if this now warrants a module all of its own?)

If you wanted to move it to treenode.py instead that would also make sense. But I don't have a strong opinion, and it's trivial to change later.

Okay - I'm happy to leave this in its own module for now.

Illviljan · 2024-04-10T20:20:18Z

xarray/core/iterators.py

+    """Iterate over tree applying level-order strategy starting at `node`.
+       This is the iterator used by `DataTree` to traverse nodes.


Suggested change

"""Iterate over tree applying level-order strategy starting at `node`.

This is the iterator used by `DataTree` to traverse nodes.

"""

Iterate over tree applying level-order strategy starting at `node`.

This is the iterator used by `DataTree` to traverse nodes.

Illviljan · 2024-04-10T20:20:36Z

xarray/core/iterators.py

+        for ``node``.
+    maxlevel : int, optional
+        Maximum level to descend in the node hierarchy.
+    Examples


Suggested change

Examples

Examples

Illviljan · 2024-04-10T20:20:50Z

xarray/core/iterators.py

+
+    """


Suggested change

"""

"""

I made the suggested whitespace changes in this commit.

Well I made 2.5 of them - I left the first line of the class documentation string for LevelOrderIter on the same line, as this seems consistent with other documentation strings in the repository. I did de-dent the second line, though, because that definitely looked a bit off.
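For illustration, the layout described here (summary on the same line as the opening quotes, the second line aligned with it, and a blank line before the closing quotes) might look like the sketch below. `ExampleIter` is a hypothetical stand-in that reuses only the docstring fragments quoted in the review; it is not the actual `LevelOrderIter` source.

```python
import inspect


class ExampleIter:
    """Iterate over tree applying level-order strategy starting at `node`.
    This is the iterator used by `DataTree` to traverse nodes.

    Parameters
    ----------
    maxlevel : int, optional
        Maximum level to descend in the node hierarchy.

    """


# inspect.cleandoc strips the common indentation, as documentation tools do.
doc_lines = inspect.cleandoc(ExampleIter.__doc__).splitlines()
summary_line = doc_lines[0]
second_line = doc_lines[1]
```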

Illviljan · 2024-04-10T20:30:27Z

xarray/tests/test_treenode.py

@@ -337,12 +320,12 @@ def test_descendants(self):
        descendants = root.descendants
        expected = [
            "b",
+            "c",


This test has no typing, why doesn't mypy complain?

New test files shouldn't be allowed to stay untyped:

xarray/pyproject.toml

Line 158 in a07e16c

"xarray.tests.*",

With the values of expected, I'm guessing that because it's all just built-in Python types (a list of str values) mypy can understand that (here is a random similar example from test_dataset.py).

For the places where create_test_tree is called (so root in this test), the typing is done within that test function, so my guess is that mypy is pulling the typing from there (see L242 - L250).

Let's leave worrying about mypy on tests for #8926

TomNicholas · 2024-04-11T14:36:32Z

xarray/tests/test_datatree.py

+            "/set2",
+            "/set3",


Nice. Interesting how few tests explicitly rely on the order of iteration.

TomNicholas

This looks good to me.

TomNicholas · 2024-04-11T14:37:40Z

xarray/tests/test_treenode.py

@@ -337,12 +320,12 @@ def test_descendants(self):
        descendants = root.descendants
        expected = [
            "b",
+            "c",


Let's leave worrying about mypy on tests for #8926

TomNicholas · 2024-04-11T14:42:23Z

xarray/core/iterators.py

@@ -11,9 +14,9 @@ class AbstractIter(abc.Iterator):
    def __init__(


We could make this a dataclass, but I don't think we need to bother. The main advantage of using dataclass is automatically defining a bunch of methods that you know you want, but here we aren't defining a bunch of property methods / comparison operators so it wouldn't save us many lines of code. It's also nice to be able to just say "this came directly from anytree as-is".

flamingbear · 2024-04-11T15:26:23Z

This looks good to me now too. @TomNicholas, are you waiting for another approval?

owenlittlejohns added 3 commits March 26, 2024 14:10

Migrate iterators.py for datatree.

6730dd9

Add __future__.annotations for Python 3.9.

d02a83e

Fix documentation typo in GitHub URL.

a9cd6db

Illviljan reviewed Mar 26, 2024

owenlittlejohns added 2 commits March 26, 2024 18:49

Improve type hints and documentation strings.

dc8ad95

Fix DataTree docstring examples.

7172008

Illviljan added the topic-DataTree Related to the implementation of a DataTree class label Mar 27, 2024

owenlittlejohns and others added 4 commits March 28, 2024 18:52

Add anytree license.

cf9d07a

Merge branch 'main' into DAS-2063-migrate-iterators

6aed4b2

Merge branch 'main' into DAS-2063-migrate-iterators

a9d07f1

Merge branch 'main' into DAS-2063-migrate-iterators

fee97b9

TomNicholas reviewed Apr 9, 2024

TomNicholas mentioned this pull request Apr 9, 2024

Track merging datatree into xarray #8572

Closed

27 tasks

DAS-2063: Changes to use just LevelOrderIter

2dfafdb

Illviljan reviewed Apr 10, 2024

owenlittlejohns and others added 2 commits April 10, 2024 16:46

Minor whitespace tweaks.

4ca7339

Merge branch 'main' into DAS-2063-migrate-iterators

22b6e36

TomNicholas reviewed Apr 11, 2024

TomNicholas approved these changes Apr 11, 2024

TomNicholas merged commit 1d43672 into pydata:main Apr 11, 2024
31 checks passed

owenlittlejohns deleted the DAS-2063-migrate-iterators branch April 15, 2024 16:23
