[BUG-REPORT] df.count error with selections and no limits #2151

Open
arunpersaud opened this issue Aug 4, 2022 · 2 comments · May be fixed by #2152
@arunpersaud
Contributor

Description
df.count results in an error when called with a list of selections and no limits, but works when limits are given.

Software information

  • Vaex version (import vaex; vaex.__version__):
    {'vaex': '4.11.1', 'vaex-core': '4.11.1', 'vaex-viz': '0.5.2', 'vaex-hdf5': '0.12.3', 'vaex-server': '0.8.1', 'vaex-astro': '0.9.1', 'vaex-jupyter': '0.8.0', 'vaex-ml': '0.18.0'}
  • Vaex was installed via: pip
  • OS: OS X 12.4 (M1)

Additional information
Here is some example code:

import vaex as vx
df = vx.example()

# this works
hist_x = df.count("*", binby="x", shape=1024, selection=None)

# this gives an IndexError (see below)
hist_x = df.count("*", binby="x", shape=1024, selection=[None])

# this works again
hist_x = df.count("*", binby="x", shape=1024, selection=[None], limits=(1, 10))

The error I'm getting is:

>>> hist_x = df.count("*", binby="x", shape=1024, selection=[None])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/homebrew/lib/python3.9/site-packages/vaex/dataframe.py", line 965, in count
    return self._compute_agg('count', expression, binby, limits, shape, selection, delay, edges, progress, array_type=array_type)
  File "/opt/homebrew/lib/python3.9/site-packages/vaex/dataframe.py", line 939, in _compute_agg
    return self._delay(delay, progressbar.exit_on(var))
  File "/opt/homebrew/lib/python3.9/site-packages/vaex/dataframe.py", line 1779, in _delay
    return task.get()
  File "/opt/homebrew/lib/python3.9/site-packages/aplus/__init__.py", line 170, in get
    raise self._reason
  File "/opt/homebrew/lib/python3.9/site-packages/vaex/promise.py", line 121, in callAndReject
    ret.fulfill(failure(r))
  File "/opt/homebrew/lib/python3.9/site-packages/vaex/progress.py", line 91, in error
    raise arg
  File "/opt/homebrew/lib/python3.9/site-packages/vaex/promise.py", line 121, in callAndReject
    ret.fulfill(failure(r))
  File "/opt/homebrew/lib/python3.9/site-packages/vaex/delayed.py", line 38, in _wrapped
    raise exc
  File "/opt/homebrew/lib/python3.9/site-packages/vaex/promise.py", line 121, in callAndReject
    ret.fulfill(failure(r))
  File "/opt/homebrew/lib/python3.9/site-packages/vaex/delayed.py", line 38, in _wrapped
    raise exc
  File "/opt/homebrew/lib/python3.9/site-packages/vaex/promise.py", line 121, in callAndReject
    ret.fulfill(failure(r))
  File "/opt/homebrew/lib/python3.9/site-packages/vaex/delayed.py", line 38, in _wrapped
    raise exc
  File "/opt/homebrew/lib/python3.9/site-packages/vaex/promise.py", line 106, in callAndFulfill
    ret.fulfill(success(v))
  File "/opt/homebrew/lib/python3.9/site-packages/vaex/delayed.py", line 82, in call
    return f(*args_real, **kwargs_real)
  File "/opt/homebrew/lib/python3.9/site-packages/vaex/dataframe.py", line 5575, in create_binner
    return self._binner_scalar(expression, limits, shape)
  File "/opt/homebrew/lib/python3.9/site-packages/vaex/dataframe.py", line 5581, in _binner_scalar
    return BinnerScalar(expression, limits[0], limits[1], shape, dtype)
IndexError: index 1 is out of bounds for axis 0 with size 1
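
The last frame unpacks limits[0] and limits[1], so the message suggests that the auto-computed limits arrive with an extra leading axis when selection is a list. That is an assumption read off the traceback, not something confirmed in the vaex source. A minimal NumPy-only sketch that produces the same message (the limit values and the helper name unpack_limits are made up for illustration):

```python
import numpy as np

# Assumption: with selection=[None] and no limits, the computed limits have
# one row per selection, i.e. shape (1, 2) instead of shape (2,).
limits = np.array([[-10.0, 10.0]])  # made-up values


def unpack_limits(limits):
    # Hypothetical helper mirroring the limits[0], limits[1] access
    # on the last line of the traceback.
    return limits[0], limits[1]


try:
    unpack_limits(limits)
except IndexError as exc:
    message = str(exc)
    print(message)

# Dropping the leading axis yields a plain (min, max) pair again.
vmin, vmax = unpack_limits(limits[0])
```

If that reading is right, passing limits explicitly sidesteps the problem because the binner then receives a flat (min, max) pair.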
@JovanVeljanoski
Member

JovanVeljanoski commented Aug 4, 2022

Well.. I can admit that this is technically a bug.. but this is also you abusing the system..

In principle selection=[None] should be the same as selection=None, I suppose.

Edit: although I see that count crashes for any list of selections without limits.. ok let's see if we can fix it. Thank you for the report!

@arunpersaud
Contributor Author

I just used selection=[None] as an example; a more realistic one would perhaps be:

# works
df.count("*", binby="x", shape=1024, selection=[df.x>0, df.x<0], limits=[-10,10])

# doesn't work
df.count("*", binby="x", shape=1024, selection=[df.x>0, df.x<0])

The error for the second one that I'm getting is:

>>> df.count("*", binby="x", shape=1024, selection=[df.x>0, df.x<0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/arun/.local/lib/python3.8/site-packages/vaex/dataframe.py", line 962, in count
    return self._compute_agg('count', expression, binby, limits, shape, selection, delay, edges, progress, array_type=array_type)
  File "/home/arun/.local/lib/python3.8/site-packages/vaex/dataframe.py", line 936, in _compute_agg
    return self._delay(delay, progressbar.exit_on(var))
  File "/home/arun/.local/lib/python3.8/site-packages/vaex/dataframe.py", line 1775, in _delay
    self.execute()
  File "/home/arun/.local/lib/python3.8/site-packages/vaex/dataframe.py", line 417, in execute
    self.executor.execute()
  File "/home/arun/.local/lib/python3.8/site-packages/vaex/execution.py", line 308, in execute
    for _ in self.execute_generator():
  File "/home/arun/.local/lib/python3.8/site-packages/vaex/execution.py", line 345, in execute_generator
    tasks = _merge(tasks)
  File "/home/arun/.local/lib/python3.8/site-packages/vaex/execution.py", line 137, in _merge
    tasks_merged.extend(_merge_tasks_for_df(tasks_df, df))
  File "/home/arun/.local/lib/python3.8/site-packages/vaex/execution.py", line 151, in _merge_tasks_for_df
    tasks_agg_per_grid[task.binners].append(task)
  File "/home/arun/.local/lib/python3.8/site-packages/vaex/dataframe.py", line 7199, in __hash__
    return hash((self.__class__.__name__, self.expression, self.minimum, self.maximum, self.count, self.dtype))
TypeError: unhashable type: 'numpy.ndarray'
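
Until this is fixed, one possible workaround is to compute the limits once and pass them explicitly, since the calls with limits work. A rough sketch, assuming df.minmax on a single column returns one (min, max) pair; the helper name count_with_explicit_limits is made up for illustration:

```python
def count_with_explicit_limits(df, binby, selections, shape=1024):
    """Workaround sketch: pass explicit limits so df.count never has to
    derive them for a list of selections.

    Assumes `df` is a vaex DataFrame and `binby` is a single column name.
    """
    # Min/max over the whole column, ignoring the selections.
    vmin, vmax = df.minmax(binby)
    # Plain Python floats, so no NumPy array ends up inside the limits.
    limits = [float(vmin), float(vmax)]
    return df.count("*", binby=binby, shape=shape,
                    selection=selections, limits=limits)
```

With the example above this would be called as count_with_explicit_limits(df, "x", [df.x > 0, df.x < 0]). Note that the limits then span the whole column rather than each selection separately.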

@JovanVeljanoski linked a pull request on Aug 4, 2022 that will close this issue