missing dask-elem imputation for concat not working with differently sized input objects #1842

Open
3 tasks done
ilan-gold opened this issue Jan 27, 2025 · 0 comments · May be fixed by #1843
ilan-gold commented Jan 27, 2025

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of anndata.
  • (optional) I have confirmed this bug exists on the master branch of anndata.

Report

Introduced by #1780
Code:

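import anndata as ad
import dask.array as da
import numpy as np

# ad1 has 10 variables and no layers.
ad1 = ad.AnnData(X=np.ones((5, 10)))
# ad2 has 5 variables and a dask-backed layer "a" that ad1 lacks.
ad2 = ad.AnnData(X=np.zeros((5, 5)), layers={"a": da.ones((5, 5))})

# Outer join: layer "a" has to be imputed for ad1; this call raises the IndexError.
result1 = ad.concat([ad1, ad2], join="outer")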

Traceback:

src/anndata/_core/merge.py:1356: in concat
    layers = concat_aligned_mapping(
src/anndata/_core/merge.py:978: in outer_concat_aligned_mapping
    result[k] = concat_arrays(
src/anndata/_core/merge.py:846: in concat_arrays
    f(x, fill_value=fill_value, axis=1 - axis)
src/anndata/_core/merge.py:528: in __call__
    return self.apply(el, axis=axis, fill_value=fill_value)
src/anndata/_core/merge.py:545: in apply
    return self._apply_to_dask_array(el, axis=axis, fill_value=fill_value)
src/anndata/_core/merge.py:568: in _apply_to_dask_array
    sub_el = _subset(el, make_slice(indexer, axis, len(shape)))
../../../miniconda3/lib/python3.12/functools.py:909: in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
src/anndata/_core/index.py:180: in _subset_dask
    return a[subset_idx]
venv/lib/python3.12/site-packages/dask/array/core.py:1999: in __getitem__
    index2 = normalize_index(index, self.shape)
venv/lib/python3.12/site-packages/dask/array/slicing.py:839: in normalize_index
    check_index(axis, i, d)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

axis = 1, ind = array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), dimension = 5

    def check_index(axis, ind, dimension):
        """Check validity of index for a given dimension
    
        Examples
        --------
        >>> check_index(0, 3, 5)
        >>> check_index(0, 5, 5)
        Traceback (most recent call last):
        ...
        IndexError: Index 5 is out of bounds for axis 0 with size 5
    
        >>> check_index(1, 6, 5)
        Traceback (most recent call last):
        ...
        IndexError: Index 6 is out of bounds for axis 1 with size 5
    
        >>> check_index(1, -1, 5)
        >>> check_index(1, -6, 5)
        Traceback (most recent call last):
        ...
        IndexError: Index -6 is out of bounds for axis 1 with size 5
    
        >>> check_index(0, [1, 2], 5)
        >>> check_index(0, [6, 3], 5)
        Traceback (most recent call last):
        ...
        IndexError: Index is out of bounds for axis 0 with size 5
    
        >>> check_index(1, slice(0, 3), 5)
    
        >>> check_index(0, [True], 1)
        >>> check_index(0, [True, True], 3)
        Traceback (most recent call last):
        ...
        IndexError: Boolean array with size 2 is not long enough for axis 0 with size 3
        >>> check_index(0, [True, True, True], 1)
        Traceback (most recent call last):
        ...
        IndexError: Boolean array with size 3 is not long enough for axis 0 with size 1
        """
        if isinstance(ind, list):
            ind = np.asanyarray(ind)
    
        # unknown dimension, assumed to be in bounds
        if np.isnan(dimension):
            return
        elif is_dask_collection(ind):
            return
        elif is_arraylike(ind):
            if ind.dtype == bool:
                if ind.size != dimension:
                    raise IndexError(
                        f"Boolean array with size {ind.size} is not long enough "
                        f"for axis {axis} with size {dimension}"
                    )
            elif (ind >= dimension).any() or (ind < -dimension).any():
>               raise IndexError(
                    f"Index is out of bounds for axis {axis} with size {dimension}"
E                   IndexError: Index is out of bounds for axis 1 with size 5

venv/lib/python3.12/site-packages/dask/array/slicing.py:902: IndexError
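
The failing bounds check can be looked at in isolation, without dask or anndata. The following is a minimal sketch, assuming only NumPy and reusing the values pytest printed for the failing frame (`axis = 1`, `ind = array([0, ..., 9])`, `dimension = 5`). The names `out_of_bounds` and `offending` are illustrative and not part of dask.

```python
import numpy as np

# Values copied from the failing frame in the traceback.
axis = 1
ind = np.arange(10)  # indexer covering 10 columns: 0..9
dimension = 5        # the dask array being indexed has only 5 columns

# Same condition as the non-boolean branch of check_index in the traceback.
out_of_bounds = bool((ind >= dimension).any() or (ind < -dimension).any())

# Positions in the indexer that do not exist in a 5-column array.
offending = ind[ind >= dimension]

print(out_of_bounds)       # True
print(offending.tolist())  # [5, 6, 7, 8, 9]
```

Read this way, the indexer appears to have been built for the 10-variable object while the dask array it is applied to has only 5 columns. That reading is an inference from the traceback values, not a confirmed diagnosis.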

Versions

session_info    1.0.0
----    ----
parso   0.8.4
stdlib-list     0.11.0
rich    13.9.4
h5py    3.12.1
pandas  2.2.3
traitlets       5.14.3
msgpack 1.1.0
setuptools      75.8.0
prompt_toolkit  3.0.50
legacy-api-wrap 1.4.1
hatchling       1.27.0
toolz   1.0.0
zarr    2.18.4
stack-data      0.6.3
setuptools-scm  8.1.0
jaraco.collections      5.1.0
Jinja2  3.1.5
pyarrow 19.0.0
pure_eval       0.2.3
asttokens       3.0.0
cloudpickle     3.1.1
python-dateutil 2.9.0.post0
scipy   1.15.1
numcodecs       0.15.0
wcwidth 0.2.13
asciitree       0.3.3
fsspec  2024.12.0
wrapt   1.17.2
MarkupSafe      3.0.2
natsort 8.4.0
jaraco.context  5.3.0
pluggy  1.5.0
charset-normalizer      3.4.1
session-info2   0.1.2
pytz    2024.2
jedi    0.19.2
awkward_cpp     43
more-itertools  10.3.0
ipython 8.31.0
jaraco.text     3.12.1
pathspec        0.12.1
packaging       24.2
numpy   2.1.3
psutil  6.1.1
jaraco.functools        4.0.1
Deprecated      1.2.16
PyYAML  6.0.2
executing       2.2.0
Pygments        2.19.1
attrs   25.1.0
dask    2025.1.0
awkward 2.7.2
tblib   3.0.0
six     1.17.0
hatch-vcs       0.4.0
decorator       5.1.1
crc32c  2.7.1
----    ----
Python  3.12.3 | packaged by Anaconda, Inc. | (main, May  6 2024, 14:46:42) [Clang 14.0.6 ]
OS      macOS-15.1-arm64-arm-64bit
Updated 2025-01-27 15:53
ilan-gold added this to the 0.11.4 milestone Jan 27, 2025
ilan-gold self-assigned this Jan 27, 2025
ilan-gold linked a pull request Jan 27, 2025 that will close this issue