What happened?
I'm not sure if this is a bug or a feature, but I was expecting this example to work, since the new coord is just a slight rewrite of the original dimension coordinate:
import xarray as xr
ds = xr.tutorial.open_dataset("air_temperature")
# Change the first time value:
ds["air_new"] = ds.air.copy()
air_new_changed = ds.air_new[{"time": 0}] * 3
ds.air_new.loc[air_new_changed.coords] = air_new_changed # Works! :)
# Add another coord along the time axis and change
# the first time value:
ds["air_new"] = ds.air.copy().assign_coords(
{"time_float": ds.time.astype(float)}
)
air_new_changed = ds.air_new[{"time": 0}] * 4
ds.air_new.loc[air_new_changed.coords] = air_new_changed # Error! :(
Traceback (most recent call last):
Cell In[25], line 5
ds.air_new.loc[air_new_changed.coords] = air_new_changed
File ~\AppData\Local\mambaforge\envs\jw\lib\site-packages\xarray\core\dataarray.py:222 in __setitem__
dim_indexers = map_index_queries(self.data_array, key).dim_indexers
File ~\AppData\Local\mambaforge\envs\jw\lib\site-packages\xarray\core\indexing.py:182 in map_index_queries
grouped_indexers = group_indexers_by_index(obj, indexers, options)
File ~\AppData\Local\mambaforge\envs\jw\lib\site-packages\xarray\core\indexing.py:144 in group_indexers_by_index
raise KeyError(f"no index found for coordinate {key!r}")
KeyError: "no index found for coordinate 'time_float'"
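For readers without access to the tutorial dataset, the same failure can probably be reproduced with a small synthetic array. This is only a sketch: the array `air`, its values, and the `lat` coordinate are illustrative assumptions, not the original data. It also lists which coordinates carry an index, which is what `.loc` relies on.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the tutorial data (an assumption, not the original dataset).
time = pd.date_range("2013-01-01", periods=4, freq="D")
air = xr.DataArray(
    np.arange(1.0, 13.0).reshape(4, 3),
    dims=("time", "lat"),
    coords={"time": time, "lat": [10.0, 20.0, 30.0]},
    name="air",
)

# Add a helper coordinate along the time axis, as in the report.
air_new = air.copy().assign_coords({"time_float": air.time.astype(float)})

# Names of the coordinates that have an index attached.
indexed = set(air_new.xindexes)
print(indexed)

# Passing all coordinates to .loc includes the unindexed "time_float".
air_new_changed = air_new[{"time": 0}] * 4
raised = False
try:
    air_new.loc[air_new_changed.coords] = air_new_changed
except KeyError as err:
    raised = True
    print("KeyError:", err)
```

Only the dimension coordinates (`time` and `lat` here) show up in `xindexes`; `time_float` does not.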
benbovy commented on Aug 2, 2023
I'd say it is rather a feature. Despite being very similar to the original one, the new coord is not a dimension coordinate, which is thus not indexed by default, so `loc` or other label-based indexing won't work with it.
benbovy commented on Aug 2, 2023
That said, we could probably make the error message a bit more user-friendly and verbose than just saying that no index is found, e.g.,
"No index found for coordinate 'x', which therefore cannot be used with `.loc` or `.sel` (label-based indexing). Set an index first for that coordinate using either `set_index` or `set_xindex`. Note that you can still use `.iloc` and `.isel` (integer-based indexing) when no index is set."
Illviljan commented on Aug 3, 2023
Would you mind showing the intended way for my example? Because I don't find this intuitive at all.
Other tests:
benbovy commented on Aug 3, 2023
Haha yes agreed that is confusing. So improving the error messages will require more coordination across the codebase and thinking more about the possible cases. That is not an easy task.
Regarding your example, just setting a single index for "time_float" does not work: currently you cannot call `loc` or `sel` and pass labels at once for two or more independently indexed coordinates that share the same dimension (the error message mostly makes sense, except for the "dimension" term, which is not accurate here). This is because it is easier to raise now when such a case occurs and maybe figure out later how we can merge results from different index queries along the same dimension.
(My bad, Xarray has no `iloc` indeed... I'm mostly using isel/sel).
I'm not sure what exactly you're trying to achieve in your example, actually.
- Item assignment on label selection using both the "time" and "time_float" coordinates? Then set a multi-index from both of them and call `loc`. Or maybe chain two calls to `loc`, each with one of these coordinates, but I'm not sure at all that it will work.
- Item assignment on label selection using only the "time" coordinate? Then be sure to not pass any mapping to `loc` that contains a key referring to an unindexed coordinate (i.e., do not include "time_float" there).
Illviljan commented on Aug 9, 2023
I simply want to modify a few values in an array. :) A similar example in numpy-land:
Now in xarray land this `ds["air_new"]` happens to have a few helper coordinates. This is what I came up with and it took longer than I'd like to admit:
Is this the most elegant way? How would you have done it?
Using masks doesn't have this issue for some reason:
Why would I want to pass around helper coordinates?
ds["relative_time"] = ds.time - ds.time[0]
ds["time_float"] = ds.time.astype(float)
benbovy commented on Aug 10, 2023
Yes, helper coordinates are useful. However, I don't really understand why you want to pass labels to `.loc` for all of those coordinates (or all the indexed ones) at once.
Since your helper coordinates are all derived from the same original coordinate, couldn't you just select data using the one that is the most convenient to you? E.g.,
This should yield the same results as (and is much more elegant than)
Both the "time" and "time_float" coordinates share the same dimension and should be subset / propagated the same way.
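As a rough illustration of that point (a sketch on synthetic data, not the snippet originally posted here; the array `air` and its values are assumptions), selecting with the indexed "time" coordinate alone also subsets "time_float", and an assignment through `.loc` goes through once only labels for indexed coordinates are passed:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the tutorial data (an assumption, not the original dataset).
time = pd.date_range("2013-01-01", periods=4, freq="D")
air = xr.DataArray(
    np.arange(1.0, 13.0).reshape(4, 3),
    dims=("time", "lat"),
    coords={"time": time, "lat": [10.0, 20.0, 30.0]},
    name="air",
)
air_new = air.copy().assign_coords({"time_float": air.time.astype(float)})

# Selecting by "time" subsets every coordinate along that dimension,
# including the unindexed helper "time_float".
first_by_time = air_new.sel(time=time[0])
same_helper = float(first_by_time.time_float) == float(air_new.time_float[0])

# Assign through .loc, keeping only labels of coordinates that have an index.
changed = air_new[{"time": 0}] * 4
labels = {
    name: changed[name].values
    for name in changed.coords
    if name in air_new.xindexes
}
air_new.loc[labels] = changed
print(air_new.isel(time=0).values)
```

Filtering on `xindexes` is just one way to drop "time_float" from the mapping; passing `{"time": ...}` explicitly would do the same job here.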
I doubt that you want to update values on a selection using arbitrary (possibly non-overlapping) labels for each of those coordinates, do you? E.g.,
We don’t support that (yet) because it is more complicated. How should we join the integer array indices selected on the "time" dimension from the given "time" and "time_float" labels? By union, intersection or difference? Or do we want an exact match and raise otherwise? It might get even more complicated when multiple indexes, coordinates and/or dimensions are involved. In theory, we could support it but it would represent quite a lot of work and the implementation would quickly get much more complicated.
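To make the ambiguity concrete, here is a small sketch using plain pandas indexes rather than Xarray internals. The two indexes and the labels are hypothetical; the point is only that the positions found through each coordinate can disagree, and each join rule gives a different answer.

```python
import numpy as np
import pandas as pd

# Two independent indexes along the same (hypothetical) "time" dimension.
time = pd.date_range("2013-01-01", periods=5, freq="D")
time_float = pd.Index(np.arange(5, dtype=float))

# Integer positions selected by non-overlapping labels on each coordinate.
pos_from_time = time.get_indexer(pd.to_datetime(["2013-01-02", "2013-01-03"]))
pos_from_float = time_float.get_indexer([2.0, 3.0])

# Candidate ways of joining the two sets of positions.
intersection = np.intersect1d(pos_from_time, pos_from_float)
union = np.union1d(pos_from_time, pos_from_float)
difference = np.setdiff1d(pos_from_time, pos_from_float)
exact_match = np.array_equal(pos_from_time, pos_from_float)

print(intersection, union, difference, exact_match)
```

With these labels the "time" query selects positions 1 and 2 while the "time_float" query selects 2 and 3, so an exact-match policy would have to raise.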
I can imagine how just passing all coordinates like
would be convenient as (with a carefully chosen default behavior) it removes some manual steps or thinking, similarly to Xarray auto-alignment. However, after we start considering more use cases we will quickly see that there is no easy generic solution, like for alignment (#7045). Note that we also don't support alignment when multiple (independent) indexes are found along the same dimension (example here) for the same reasons.
xindex
? #9703