I just ran into this somewhat obscure bug: if you use integers to label your sets, then under certain circumstances `UpSet.plot()` returns wrong intersection sizes. I tracked the issue down to the `reformat._aggregate_data` function and how it handles integer set names. Here's a minimal example:
Notice how set 2 has no overlap with any of the other sets, yet `agg` reports 3 items shared between 2 and 20. It also wrongly reports 2 items exclusive to set 10. Note: if you change the set names to strings, i.e. `'2'`, `'10'`, `'20'`, it works out fine.
It all comes down to the fact that I used integers to label the sets, and in particular that one of them (2) is also a valid level position: with three sets, the positions are 0, 1 and 2.
Here's the relevant line in reformat._aggregate_data:
We're grouping by `level=[0,1,2]`. Notice how `2` is ambiguous here: it's supposed to refer to the level position (in this case the 3rd set, i.e. set 20), but it is ALSO the name of a set (the 1st set, i.e. set 2)! `groupby()` seems to give priority to the set name rather than the level position, so set 2 is used as a grouping key twice and we're basically intersecting the set with itself.
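The name-versus-position ambiguity can be seen in plain pandas, without upsetplot. The sketch below uses a made-up three-level boolean index (not the data from my example) and `MultiIndex.get_level_values`, which accepts either a level name or an integer position. How `groupby(level=...)` itself resolves such keys may depend on the pandas version, so treat this only as an illustration of the lookup rule.

```python
import pandas as pd

# Hypothetical membership index: three boolean levels named with the integers 2, 10, 20.
mi = pd.MultiIndex.from_tuples(
    [(True, False, False), (False, True, True), (False, True, False)],
    names=[2, 10, 20],
)

# 0 and 1 are not level names, so they are treated as positions.
by_pos_0 = mi.get_level_values(0)
by_pos_1 = mi.get_level_values(1)

# 2 IS a level name, so it resolves to the level named 2 (position 0),
# not to the third level (the one named 20).
by_ambiguous_2 = mi.get_level_values(2)

print(by_pos_0.name, by_pos_1.name, by_ambiguous_2.name)  # 2 10 2
```

Asking for "levels 0, 1 and 2" therefore yields set 2, set 10 and set 2 again; set 20 is never reached by position.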
Two options to fix this:
1. Disallow integer set names.
2. Fix the `groupby` operation, e.g. `groupby()` on actual column names, rather than level indices:
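As a rough sketch of what grouping on column names could look like (this is not the actual upsetplot code; `count_by_membership` is a hypothetical helper and the data is made up), the index can be flattened into columns with string labels, so that no grouping key can be mistaken for a position:

```python
import pandas as pd


def count_by_membership(df):
    """Count rows per membership combination, whatever the levels are named.

    Hypothetical helper for illustration only; not part of upsetplot.
    """
    original_names = list(df.index.names)
    # Turn the index levels into ordinary columns and label them with strings,
    # so a key such as "2" can only ever mean the column of that name.
    flat = df.index.to_frame(index=False)
    flat.columns = [str(name) for name in original_names]
    sizes = flat.groupby(list(flat.columns), sort=False).size()
    # Restore the caller's original (possibly integer) level names.
    sizes.index = sizes.index.set_names(original_names)
    return sizes


# Made-up data: 3 items only in set 2, 2 items in sets 10 and 20, 1 item only in set 10.
mi = pd.MultiIndex.from_tuples(
    [(True, False, False)] * 3 + [(False, True, True)] * 2 + [(False, True, False)],
    names=[2, 10, 20],
)
df = pd.DataFrame({"value": range(len(mi))}, index=mi)

sizes = count_by_membership(df)
print(sizes)
```

This only covers plain counting; whether the same approach carries over to the other aggregation paths is not something the sketch checks.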
Not entirely sure if option 2 would cause any trouble with the other functionality (weighted aggregates, summing categories, etc.).