
Why is the result different between groupby and count? #1989

Answered by maartenbreddels
tommyhj217 asked this question in Q&A

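This is expected. binby uses regular (equal-width) bins between a min and a max, as half-open intervals [left, right). Without explicit limits, the min and max of the data are used. So for pickup_day (0 to 6) with shape 7, the bins are [0, 6/7), [6/7, 12/7), ..., [36/7, 6), and the upper edge 6 itself is not inside any bin.
The bin index is calculated as int((data - min) / (max - min) * shape). With min = 0, max = 6 and shape = 7, a 6 gives int(6 / 6 * 7) = int(7.0) = 7, which is one past the last bin (index 6), so those rows are not counted. A 5 gives int(5 / 6 * 7) = int(5.8333) = 5. So no data falls in bin 6.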
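Here is a minimal pure-Python sketch of the regular-binning formula above. It assumes pickup_day takes the integer values 0 to 6 and that the default limits are its min and max. The helper `regular_bin_counts` is hypothetical and only illustrates the arithmetic; it is not vaex's internal implementation. It uses `math.floor` rather than `int` so that values just below `vmin` are not truncated toward zero into bin 0. For in-range values the two agree.

```python
import math


def regular_bin_counts(values, vmin, vmax, shape):
    """Hypothetical helper: count values into `shape` equal-width,
    half-open bins covering [vmin, vmax).

    Values whose bin index falls outside 0..shape-1 are dropped,
    which is what happens to a value exactly equal to vmax.
    """
    counts = [0] * shape
    dropped = []
    for x in values:
        index = math.floor((x - vmin) / (vmax - vmin) * shape)
        if 0 <= index < shape:
            counts[index] += 1
        else:
            dropped.append(x)
    return counts, dropped


# pickup_day values 0..6; default limits are the data's min and max (0 and 6).
days = [0, 1, 2, 3, 4, 5, 6]
counts, dropped = regular_bin_counts(days, vmin=0, vmax=6, shape=7)
print(counts)   # [1, 1, 1, 1, 1, 1, 0]  (bin 6 stays empty)
print(dropped)  # [6]  (index 7 is past the last bin)
```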

If you pass in the limits, e.g.:

```python
df.count('*', binby=['pickup_hour', 'pickup_day'], shape=[24, 7], limits=([-0.5, 23.5], [-0.5, 6.5]))
```

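With these limits every bin is exactly 1 wide and centered on an integer: hour h falls in [h - 0.5, h + 0.5) and day d in [d - 0.5, d + 0.5). No value sits on an upper edge, so no rows are dropped.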

Does that make sense?
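To check the reasoning, this sketch bins a hypothetical sample of pickup_day values with the same half-open formula. The helper is again hypothetical, not vaex code. It then compares the result with a plain per-value count, which is what a groupby on pickup_day reports. With the default limits (0, 6) the rows with day 6 are lost. With the half-shifted limits (-0.5, 6.5) the bin counts match the per-value counts.

```python
import math
from collections import Counter


def regular_bin_counts(values, vmin, vmax, shape):
    # Hypothetical helper: half-open, equal-width bins over [vmin, vmax).
    # Values whose index lands outside 0..shape-1 are silently dropped.
    counts = [0] * shape
    for x in values:
        index = math.floor((x - vmin) / (vmax - vmin) * shape)
        if 0 <= index < shape:
            counts[index] += 1
    return counts


# Hypothetical sample of pickup_day values (0..6), some days repeated.
days = [0, 1, 1, 2, 3, 3, 3, 4, 5, 6, 6]

# Per-value counts, as a groupby on pickup_day would report them.
per_value = Counter(days)
expected = [per_value[d] for d in range(7)]

# Default-style limits: min and max of the data.
default_counts = regular_bin_counts(days, vmin=0, vmax=6, shape=7)
# Limits shifted by half a unit, as in the df.count call above.
shifted_counts = regular_bin_counts(days, vmin=-0.5, vmax=6.5, shape=7)

print(expected)        # [1, 2, 1, 3, 1, 1, 2]
print(default_counts)  # [1, 2, 1, 3, 1, 1, 0]
print(shifted_counts)  # [1, 2, 1, 3, 1, 1, 2]
```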

Answer selected by tommyhj217