overriding coordinates with assign_coords does not drop indexes #8056

Closed
4 tasks done
rj678 opened this issue Aug 8, 2023 · 5 comments · Fixed by #8094

Comments

@rj678

rj678 commented Aug 8, 2023

What happened?

Hello, the code snippet shared below (see the Minimal Complete Verifiable Example) was tested a while back with version 2022.3.

With the current version of xarray, it throws the following error on the last line of the snippet:

xarray ValueError: cannot re-index or align objects with conflicting indexes found for the following dimensions

I'm able to run the code with version 2022.3, but I've only recently started using xarray, and it would help me greatly if someone could suggest the edit required to make the code run on the current version of xarray.

I'm running the code on some local sample data that I have shared below, but I believe this error is pretty general and should be reproducible with a different sample dataset - some folks have encountered a similar error before:

#6881

please let me know if I can provide any further information - thank you!

What did you expect to happen?

I expected/would like the code to run with the most recent version of xarray the way it does with version 2022.3

Minimal Complete Verifiable Example

import xarray as xr

BOX = [-45, -15, 55, 65]
month = 2

# file available here: https://github.com/iuryt/NorthAtlanticBloom/blob/main/data/external/bioargo/GL_PR_PF_6901647.nc

fname = './temp.nc' 

dsi = xr.open_dataset(fname)
dsi = dsi.assign_coords(LONGITUDE=("TIME",dsi.LONGITUDE.values), LATITUDE=("TIME",dsi.LATITUDE.values))
dsi = dsi.drop("POSITION_QC")

dsi = dsi.where(dsi.TIME.dt.month==month, drop=True)[["CPHL_ADJUSTED", "PRES"]]

MVCE confirmation

  • Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
  • Complete example — the example is self-contained, including all data and the text of any traceback.
  • Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
  • New issue — a search of GitHub Issues suggests this is not a duplicate.

Relevant log output

ValueError                                Traceback (most recent call last)
Cell In[1], line 12
      9 dsi = dsi.assign_coords(LONGITUDE=("TIME",dsi.LONGITUDE.values), LATITUDE=("TIME",dsi.LATITUDE.values))
     10 dsi = dsi.drop("POSITION_QC")
---> 12 dsi = dsi.where(dsi.TIME.dt.month==month, drop=True)[["CPHL_ADJUSTED", "PRES"]]

File ~/anaconda3/envs/env_1/lib/python3.9/site-packages/xarray/core/common.py:1125, in DataWithCoords.where(self, cond, other, drop)
   1120 if not isinstance(cond, (Dataset, DataArray)):
   1121     raise TypeError(
   1122         f"cond argument is {cond!r} but must be a {Dataset!r} or {DataArray!r}"
   1123     )
-> 1125 self, cond = align(self, cond)  # type: ignore[assignment]
   1127 def _dataarray_indexer(dim: Hashable) -> DataArray:
   1128     return cond.any(dim=(d for d in cond.dims if d != dim))

File ~/anaconda3/envs/env_1/lib/python3.9/site-packages/xarray/core/alignment.py:787, in align(join, copy, indexes, exclude, fill_value, *objects)
    591 """
    592 Given any number of Dataset and/or DataArray objects, returns new
    593 objects with aligned indexes and dimension sizes.
   (...)
    777 
    778 """
    779 aligner = Aligner(
    780     objects,
...
ValueError: cannot re-index or align objects with conflicting indexes found for the following dimensions: 'TIME' (3 conflicting indexes)
Conflicting indexes may occur when
- they relate to different sets of coordinate and/or dimension names
- they don't have the same type
- they may be used to reindex data along common dimensions

Anything else we need to know?

the netCDF file can be downloaded from here:

https://github.com/iuryt/NorthAtlanticBloom/blob/main/data/external/bioargo/GL_PR_PF_6901647.nc

Environment

INSTALLED VERSIONS

commit: None
python: 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:39:03)
[GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.15.90.1-microsoft-standard-WSL2
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.1
libnetcdf: 4.9.2

xarray: 2023.7.0
pandas: 2.0.3
numpy: 1.25.2
scipy: 1.11.1
netCDF4: 1.6.4
pydap: None
h5netcdf: None
...
pytest: None
mypy: None
IPython: 8.8.0
sphinx: None

@rj678 added the bug and needs triage labels Aug 8, 2023
@benbovy
Member

benbovy commented Aug 9, 2023

Problem

For some reason (bug?),

dsi = dsi.assign_coords(LONGITUDE=("TIME",dsi.LONGITUDE.values), LATITUDE=("TIME",dsi.LATITUDE.values))

keeps an index for both the LONGITUDE and LATITUDE coordinates.

Then, your example doesn't work because xarray currently doesn't support aligning two datasets that have multiple (independent) indexed coordinates sharing common dimensions (cf. the error message "they may be used to reindex data along common dimensions").

Workaround

You could drop the indexes for LONGITUDE and LATITUDE first to make it work:

dsi = xr.open_dataset(fname)
dsi = dsi.assign_coords(LONGITUDE=("TIME",dsi.LONGITUDE.values), LATITUDE=("TIME",dsi.LATITUDE.values))
dsi = dsi.drop("POSITION_QC")
dsi = dsi.drop_indexes(["LATITUDE", "LONGITUDE"])  # drop the indexes here!

dsi.where(dsi.TIME.dt.month==month, drop=True)[["CPHL_ADJUSTED", "PRES"]]
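For readers without the netCDF file, the workaround can be sketched on synthetic data. This is only an illustration: the dates, values, and the DEPTH dimension below are made up, and the layout (LATITUDE and LONGITUDE starting out as their own dimension coordinates) is an assumption about the file, not something checked against it. The index check keeps the sketch usable on releases where assign_coords already drops the stale indexes.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the float data (hypothetical values and layout).
time = pd.to_datetime(["2021-01-20", "2021-02-05", "2021-02-20", "2021-03-10"])
ds = xr.Dataset(
    data_vars={
        "CPHL_ADJUSTED": (("TIME", "DEPTH"), np.arange(8.0).reshape(4, 2)),
        "PRES": (("TIME", "DEPTH"), np.arange(8.0).reshape(4, 2) * 10),
    },
    coords={
        "TIME": time,
        "LATITUDE": [58.0, 58.5, 59.0, 59.5],       # starts as its own dimension
        "LONGITUDE": [-30.0, -29.5, -29.0, -28.5],  # starts as its own dimension
    },
)

# Re-attach the positions to the TIME dimension, as in the original snippet.
ds = ds.assign_coords(
    LONGITUDE=("TIME", ds.LONGITUDE.values),
    LATITUDE=("TIME", ds.LATITUDE.values),
)

# Drop whatever indexes are still attached to the overridden coordinates.
stale = [name for name in ("LATITUDE", "LONGITUDE") if name in ds.xindexes]
if stale:
    ds = ds.drop_indexes(stale)

# With TIME as the only index along its dimension, where() can align.
subset = ds.where(ds.TIME.dt.month == 2, drop=True)[["CPHL_ADJUSTED", "PRES"]]
```

Here subset keeps only the February rows, and LATITUDE and LONGITUDE remain available as plain (non-indexed) coordinates along TIME.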

Smaller example

Without any data to load from a file.

ds = xr.Dataset(coords={"x": [1, 2, 3], "y": [4, 5, 6]})
ds = ds.assign_coords(y=("x", ds.y.values))

ds
# <xarray.Dataset>
# Dimensions:  (x: 3)
# Coordinates:
#   * x        (x) int64 1 2 3
#   * y        (x) int64 4 5 6
# Data variables:
#     *empty*

ds.xindexes
# Indexes:
#     x        PandasIndex
#     y        PandasIndex

xr.align(ds, ds)
# ValueError: cannot re-index or align objects with conflicting indexes found for the following dimensions: 'x' (2 conflicting indexes)
# -> x and y are both indexed coordinates along the x dimension!
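The same workaround applied to this smaller example might look like the sketch below. It assumes nothing beyond the calls already used in this thread (assign_coords, xindexes, drop_indexes, align); the membership check is there because releases that include the fix may no longer keep an index on y at all.

```python
import xarray as xr

ds = xr.Dataset(coords={"x": [1, 2, 3], "y": [4, 5, 6]})
ds = ds.assign_coords(y=("x", ds.y.values))

# On affected releases "y" still carries an index here; drop it if present.
stale = [name for name in ds.xindexes if name != "x"]
if stale:
    ds = ds.drop_indexes(stale)

# Only "x" is indexed along the x dimension now, so alignment has no conflict.
aligned, _ = xr.align(ds, ds)
remaining = list(aligned.xindexes)
```

After this, y is still a coordinate along x; it just no longer takes part in alignment.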

@benbovy changed the title from "Request to fix breaking changes post version 2022.3" to "overriding coordinates with assign_coords does not drop indexes" Aug 9, 2023
@benbovy added the topic-indexing label and removed the needs triage label Aug 9, 2023
@rj678
Author

rj678 commented Aug 9, 2023

thank you so much for the detailed response and the workaround @benbovy - I'll try your suggestion

do you have any quick thoughts on why this is only throwing an error post version 2022.3?

I'm going to spend more time working with xarray and will hopefully get more familiar with not just using it, but the source code as well

@benbovy
Member

benbovy commented Aug 9, 2023

> do you have any quick thoughts on why this is only throwing an error post version 2022.3?

Because of a big refactor of how we deal with indexes in Xarray. Before that refactor, it was not possible to have multiple indexed coordinates along the same dimension (only one indexed coordinate, and potentially many non-indexed coordinates, was allowed for each dimension).

It is still not fully documented and a bit rough around the edges, though.
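As a rough illustration of what the refactor allows, a second coordinate along the same dimension can be given its own index on purpose. The sketch below assumes a release that provides Dataset.set_xindex, which is not mentioned in this thread; the coordinate names and values are reused from the smaller example above.

```python
import xarray as xr

# "y" starts as a plain, non-indexed coordinate along the "x" dimension.
ds = xr.Dataset(coords={"x": [1, 2, 3], "y": ("x", [4, 5, 6])})
before = list(ds.xindexes)

# Explicitly build an index for "y": two indexed coordinates now share "x".
indexed = ds.set_xindex("y")
after = set(indexed.xindexes)

# The extra index allows label-based selection through "y".
picked = indexed.sel(y=5)
```

This is the intentional counterpart of the situation in this issue, where the second index was left behind by accident rather than requested.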

@rj678
Author

rj678 commented Aug 9, 2023

thank you again for your feedback - I'll spend some time to understand this better

@rj678
Author

rj678 commented Aug 29, 2023

thank you for fixing this issue!
