Why result different between groupby and count? #1989

tommyhj217 · 2022-03-25T12:35:34Z

tommyhj217
Mar 25, 2022

Hi

Thanks for making such an amazing library.

I tested the count() function and compared its output with the groupby result.

I used the data below.
taxi_path = 's3://vaex/taxi/yellow_taxi_2012_zones.hdf5?anon=true'

All the other values are the same, but the last value along each binned column comes out as zero in the count() result.
Is there a way to solve this problem?

Thanks
Hyunjun

Answered by maartenbreddels

Apr 11, 2022

maartenbreddels · 2022-04-11T11:40:13Z

maartenbreddels
Apr 11, 2022
Maintainer

This is expected: binby uses regular bins between a min and a max, with half-open intervals, e.g. [0, 1), ..., [5, 6).
The bin index is then calculated as int((data - min) / (max - min) * shape). With min = 0, max = 6 and shape = 7, the value 6 gives int(6 / 6 * 7) = int(7.0) = 7, which falls outside the last bin, and 5 gives int(5 / 6 * 7) = int(5.8333) = 5. So no data falls in bin 6.
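
The arithmetic above can be checked with a small sketch in plain Python. The helper `bin_index` is hypothetical: it only mirrors the formula quoted in this answer and is not vaex's actual implementation. It also assumes the default limits are the data min and max (0 and 6 for pickup_day).

```python
def bin_index(value, vmin, vmax, shape):
    """Hypothetical helper mirroring the bin formula quoted above."""
    return int((value - vmin) / (vmax - vmin) * shape)


# Assumption: pickup_day takes the values 0..6 and the limits default to (0, 6).
vmin, vmax, shape = 0, 6, 7
indices = {day: bin_index(day, vmin, vmax, shape) for day in range(7)}

for day, idx in indices.items():
    status = "ok" if idx < shape else "outside the last bin"
    print(f"day {day} -> bin {idx} ({status})")
```

Days 0 to 5 map to bins 0 to 5, day 6 maps to index 7, and nothing maps to bin 6, which is why the last count is zero.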

If you pass in the limits, e.g.:

df.count('*', binby=['pickup_hour', 'pickup_day'], shape=[24,7], limits=([-0.5, 23.5], [-0.5, 6.5]))

You get:

Does that make sense?
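
To see why the widened limits help, here is a minimal sketch in plain Python that compares a groupby-style count with binned counts for both sets of limits. The function `binned_counts` and the sample data are made up for illustration; the function imitates the bin formula described in this answer and is not vaex code.

```python
import math
from collections import Counter


def binned_counts(values, vmin, vmax, shape):
    """Hypothetical stand-in for a binned count over half-open bins."""
    counts = [0] * shape
    for v in values:
        idx = math.floor((v - vmin) / (vmax - vmin) * shape)
        if 0 <= idx < shape:  # values that land outside the grid are dropped
            counts[idx] += 1
    return counts


days = [0, 1, 1, 2, 3, 4, 5, 6, 6, 6]  # made-up pickup_day values
grouped = Counter(days)  # plays the role of the groupby result

default_counts = binned_counts(days, min(days), max(days), 7)
widened_counts = binned_counts(days, -0.5, 6.5, 7)

print(default_counts)  # [1, 2, 1, 1, 1, 1, 0]
print(widened_counts)  # [1, 2, 1, 1, 1, 1, 3]
```

With limits of [-0.5, 6.5] and 7 bins, each bin has width 1 and is centred on an integer, so every value from 0 to 6 lands in its own bin and the counts agree with the grouped counts.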

1 reply

tommyhj217 Jul 8, 2022
Author

Thanks for replying to my question.
It seems to be one of the best tools available on a single computer.
