-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate iterators.py for datatree. #8879
Migrate iterators.py for datatree. #8879
Conversation
from abc import abstractmethod | ||
from collections import abc | ||
from typing import Callable, Iterator, List, Optional | ||
from collections.abc import Iterator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looked a bit unexpected, but typing.Iterator
is a deprecated alias for collections.abc.Iterator
@@ -11,9 +14,9 @@ class AbstractIter(abc.Iterator): | |||
def __init__( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering this __init__
, using @dataclass
could be an option here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Illviljan - sorry for not replying to this comment before. In the latest PR, the AbstractIter
class was removed in favour or importing directly from anytree
, so I think this comment is now addressed. Let me know if not, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want, LevelOrderIter
can still use @dataclass
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't familiar with this decorator prior to this PR, but it looks like it generates some magic methods for the class. I'm a little wary of adding them (for example, do we want LevelOrderIter.__eq__
?). Is there a strong argument here for this class needing them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make this a dataclass, but I don't think we need to bother. The main advantage of using dataclass
is automatically defining a bunch of methods that you know you want, but here we aren't defining a bunch of property methods / comparison operators so it wouldn't save us many lines of code. It's also nice to be able to just say "this came directly from anytree as-is".
As this module comes directly from the anytree library, we might need to copy its license into xarray/licenses. |
xarray/core/treenode.py
Outdated
|
||
return iterators.PreOrderIter(self) | ||
return PreOrderIter(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we currently have two patterns of iteration used: depth-first (PreOrderIter
) and breadth-first (LevelOrderIter
). It looks like PreOrderIter
is used in TreeNode.subtree
, and LevelOrderIter
is used in mapping.diff_treestructure
. diff_treestructure
is called inside check_isomorphic
, and every other time we iterate over the tree it just calls .subtree
.
Why the distinction? In diff_treestructure
when comparing two trees with multiple points at which their structure deviates, the order of iteration will affect which deviation is raised as an error first. @flamingbear and I think here is it more intuitive to raise errors from the top-level of the tree first, so we agree that LevelOrderIter
is the right choice here.
For mapping over the nodes of the tree in .subtree
, I had previously thought that there was a good reason to use PreOrderIter
. My reasoning was to do with how dt['/a/b/c/d/'] = data
would immediately create the nodes /a
, /a/b
, /a/b/c
even if they didn't already exist. But actually I don't think that's relevant here.
Another reason to use PreOrderIter
might be around using map_over_subtree
to map an operation that then fails. For example taking dt.mean(dim='time')
when some of the nodes doesn't have a time dimension. Again the order of iteration affects which node will raise the first error.
However, if we actually think that LevelOrderIter
is fine for the .subtree
case too, we can potentially simplify this PR considerably by replacing the whole AbstractIter/PreOrderIter/LevelOrderIter
framework with just a single iterate_breadth_first
function that returns an iterator, and use that in all cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Also then we wouldn't need the anytree license)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomNicholas - I've made some updates (or rather nabbed some work @flamingbear did and made some mypy
tweaks).
A couple of things here:
- This now contains the single class, but hasn't wrapped it in a function. I think it gets at what you were after, but just FYI. (Although, part of me wonders if this now warrants a module all of it's own?)
- I left the
anytree
license in the PR, because theLevelOrderIter
is still 99% the code fromanytree
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This now contains the single class, but hasn't wrapped it in a function.
That's fine, thanks.
(Although, part of me wonders if this now warrants a module all of it's own?)
If you wanted to move it to treenode.py
instead that would also make sense. But I don't have a strong opinion, and it's trivial to change later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay - I'm happy to leave this in it's own module for now.
xarray/core/iterators.py
Outdated
"""Iterate over tree applying level-order strategy starting at `node`. | ||
This is the iterator used by `DataTree` to traverse nodes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Iterate over tree applying level-order strategy starting at `node`. | |
This is the iterator used by `DataTree` to traverse nodes. | |
""" | |
Iterate over tree applying level-order strategy starting at `node`. | |
This is the iterator used by `DataTree` to traverse nodes. |
for ``node``. | ||
maxlevel : int, optional | ||
Maximum level to descend in the node hierarchy. | ||
Examples |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Examples | |
Examples |
xarray/core/iterators.py
Outdated
|
||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
""" | |
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the suggested whitespace changes in this commit.
Well I made 2.5 of them - I left the first line of the class documentation string for LevelOrderIter
on the same line, as this seems consistent with other documentation strings in the repository. I did de-dent the second line, though, because that definitely looked a bit off.
@@ -337,12 +320,12 @@ def test_descendants(self): | |||
descendants = root.descendants | |||
expected = [ | |||
"b", | |||
"c", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test has no typing, why doesn't mypy complain?
New test files shouldn't be allowed to stay untyped:
Line 158 in a07e16c
"xarray.tests.*", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the values of expected
, I'm guessing that because it's all just built-in Python types (a list
of str
values) mypy
can understand that (here is a random similar example from test_dataset.py.
For the places where create_test_tree
is called (so root
in this test), the typing is done within that test function, so my guess is that mypy
is pulling the typing from there (see L242 - L250).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave worrying about mypy on tests for #8926
"/set2", | ||
"/set3", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice. Interesting how few tests explicit rely on the order of iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me.
@@ -337,12 +320,12 @@ def test_descendants(self): | |||
descendants = root.descendants | |||
expected = [ | |||
"b", | |||
"c", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave worrying about mypy on tests for #8926
@@ -11,9 +14,9 @@ class AbstractIter(abc.Iterator): | |||
def __init__( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could make this a dataclass, but I don't think we need to bother. The main advantage of using dataclass
is automatically defining a bunch of methods that you know you want, but here we aren't defining a bunch of property methods / comparison operators so it wouldn't save us many lines of code. It's also nice to be able to just say "this came directly from anytree as-is".
This looks good to me now too @TomNicholas are you waiting for another approval? |
This PR continues the overall work of migrating DataTree into xarray.
iterators.py
does not have direct tests. In discussions with @TomNicholas and @flamingbear, we concurred that other unit tests utilise this functionality.iterators.py
Track merging datatree into xarray #8572Tests addedwhats-new.rst
New functions/methods are listed inapi.rst