Compact printing in cut #381

bkamins · 2022-02-25T14:15:32Z

Currently we have:

julia> cut(1:10, 3)
10-element CategoricalArray{String,1,UInt32}:
 "Q1: [1.0, 4.0)"
 "Q1: [1.0, 4.0)"
 "Q1: [1.0, 4.0)"
 "Q2: [4.0, 6.999999999999999)"
 "Q2: [4.0, 6.999999999999999)"
 "Q2: [4.0, 6.999999999999999)"
 "Q3: [6.999999999999999, 10.0]"
 "Q3: [6.999999999999999, 10.0]"
 "Q3: [6.999999999999999, 10.0]"
 "Q3: [6.999999999999999, 10.0]"

which is not nice. It would be better to use compact printing.

Though we should make sure to correctly do this case:

julia> cut(1:10^-12:1+10^-11, 3)
11-element CategoricalArray{String,1,UInt32}:
 "Q1: [1.0, 1.0000000000033333)"
 "Q1: [1.0, 1.0000000000033333)"
 "Q1: [1.0, 1.0000000000033333)"
 "Q1: [1.0, 1.0000000000033333)"
 "Q2: [1.0000000000033333, 1.0000000000066667)"
 "Q2: [1.0000000000033333, 1.0000000000066667)"
 "Q2: [1.0000000000033333, 1.0000000000066667)"
 "Q3: [1.0000000000066667, 1.00000000001]"
 "Q3: [1.0000000000066667, 1.00000000001]"
 "Q3: [1.0000000000066667, 1.00000000001]"
 "Q3: [1.0000000000066667, 1.00000000001]"
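To make the concern concrete, here is a minimal sketch, using only Base Julia, of what compact printing does to the break values in the two outputs above. The helpers `compact_str` and `compact_label` are hypothetical names for illustration and are not part of CategoricalArrays.

```julia
# Render a number the way `show` does when the IO context has :compact => true.
compact_str(x) = sprint(show, x; context = :compact => true)

# Hypothetical helper: build an interval label from compactly printed bounds.
function compact_label(from, to; rightclosed::Bool = false)
    return string("[", compact_str(from), ", ", compact_str(to),
                  rightclosed ? "]" : ")")
end

# First example: the noisy break prints cleanly.
println(compact_label(4.0, 6.999999999999999))   # [4.0, 7.0)

# Second example: both bounds collapse to the same text,
# so the label no longer distinguishes the interval.
println(compact_label(1.0, 1.0000000000033333))  # [1.0, 1.0)
```

The second call shows why the tightly spaced case needs special handling: compact printing alone would produce an empty-looking interval.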

andreasnoack · 2024-08-14T07:34:26Z

I just hit a similar case where the many digits made plot labels look ugly. What about supporting rounding in the cut(array, ngroups) method instead of just changing the printing? I guess it might be confusing if the strings here don't reflect the actual cuts.

nalimilan · 2024-12-29T18:57:12Z

This is tricky. Rounding could easily change radically the size of the groups if values are very close. AFAIK other packages don't do that. What R's cut does is that it tries with 3 digits by default, and if some breaks end up represented the same it increases the number of digits up to 12. This can be tweaked via the dig.lab argument. This sounds reasonable to me at least for quantiles, as you probably don't care about the exact value in that case. OTOH it could be more surprising when specifying breaks manually.

At the very least we could add that dig.lab argument.
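A rough sketch of the digit-escalation idea described above, written in Base Julia. It is an approximation of the approach, not R's implementation and not an existing CategoricalArrays function; the name `format_breaks` and its keyword arguments are assumptions made for this example.

```julia
# Hypothetical helper: format breaks with as few significant digits as possible,
# starting at `digits` and increasing until all labels differ or `maxdigits`
# is reached.
function format_breaks(breaks; digits::Integer = 3, maxdigits::Integer = 12)
    for d in digits:maxdigits
        labels = [string(round(b, sigdigits = d)) for b in breaks]
        allunique(labels) && return labels
    end
    # Give up: return the labels at the maximum precision, even if some coincide.
    return [string(round(b, sigdigits = maxdigits)) for b in breaks]
end

# Breaks from the first example in this issue.
println(format_breaks([1.0, 4.0, 6.999999999999999, 10.0]))

# Breaks from the second example: 12 significant digits are not enough here.
tight = [1.0, 1.0000000000033333, 1.0000000000066667, 1.00000000001]
println(allunique(format_breaks(tight)))                  # false
println(allunique(format_breaks(tight; maxdigits = 13)))  # true
```

In this sketch the tightly spaced breaks only become distinguishable at 13 significant digits, so a cap of 12 would still leave duplicate labels for that case.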

andreasnoack · 2024-12-30T12:52:55Z

> Rounding could easily change radically the size of the groups if values are very close.

I don't see the issue if it's just an option. Then the user can decide if the rounding is acceptable.

nalimilan · 2024-12-30T13:01:24Z

If you want quantiles and rounding ends up creating classes with different sizes, you no longer get quantiles. ;-) So the printing "Q1: ..." would be misleading. Rounding only the display seems less dramatic. Though of course we could support both as options.
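The effect of rounding the breaks themselves, rather than only their display, can be illustrated with a small Base Julia sketch. The data, the breaks, and the `bin_counts` helper are made up for this example; the binning rule (left-closed intervals, last one closed on the right) mirrors the labels shown in the first post but is not taken from the package source.

```julia
# Hypothetical helper: count values per interval [b[i], b[i+1]),
# with the last interval also closed on the right.
function bin_counts(x, breaks)
    n = length(breaks) - 1
    counts = zeros(Int, n)
    for v in x
        for i in 1:n
            inlast = i == n && v == breaks[end]
            if (breaks[i] <= v < breaks[i + 1]) || inlast
                counts[i] += 1
                break
            end
        end
    end
    return counts
end

x = [1.01, 1.02, 1.03, 1.04, 1.05, 1.06, 1.07, 1.08, 1.09]
breaks = [1.01, 1.04, 1.07, 1.09]            # three groups of equal size
rounded = round.(breaks, sigdigits = 2)      # [1.0, 1.0, 1.1, 1.1]

println(bin_counts(x, breaks))   # [3, 3, 3]
println(bin_counts(x, rounded))  # [0, 9, 0]
```

With the rounded breaks, every value falls into the middle interval, so the groups are no longer equally sized.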

andreasnoack · 2024-12-30T14:30:50Z

The Qx label doesn't specify the probability of the quantile anyway. Actually, I'd prefer a label without Qx part. I'm mostly interested in the interval information. It is easy to incorrectly think that Q1 is the first quartile.

nalimilan · 2024-12-30T16:57:50Z

I added Qx because when allowempty=true some intervals may be identical (in the presence of many duplicates), but we can't have levels with equal names. But that's a corner case so we could stop doing that by default. Still, returning quantiles that are not real quantiles due to rounding would be confusing IMO.
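As an illustration of the duplicate-interval corner case, the following sketch uses `quantile` from the Statistics standard library on made-up data with many repeated values. The label strings are built by hand here and only imitate the format shown in this issue; they are not produced by `cut`.

```julia
using Statistics

# Data with many duplicates: the tercile breaks coincide.
x = [1, 1, 1, 1, 1, 1, 1, 1, 2, 3]
breaks = quantile(x, [0, 1/3, 2/3, 1])

n = length(breaks) - 1
# Interval-only labels (illustrative format).
plain = [string("[", breaks[i], ", ", breaks[i + 1], i == n ? "]" : ")") for i in 1:n]
# The same labels with a group-index prefix.
prefixed = [string("Q", i, ": ", plain[i]) for i in 1:n]

println(plain)
println(prefixed)
println(allunique(plain))     # false: two intervals have identical text
println(allunique(prefixed))  # true: the prefix makes the names distinct
```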

nalimilan · 2024-12-30T20:42:32Z

This is related to a discussion we had in 2020 at #245. @bkamins Do you have an opinion? I see several options:

Keep adding Qx.
Provide a formatter that adds Qx and advise using it when we throw an error because of duplicate intervals.
Support an argument doing that.
Automatically use that formatter by default when allowempty=true is passed.

bkamins · 2024-12-30T21:33:52Z

Given the discussion - I think adding an argument is most flexible.

nalimilan · 2024-12-30T21:45:45Z

Can you elaborate why you prefer an argument over a formatter? I find it hard to decide which is best.

bkamins · 2024-12-31T06:19:16Z

Ah - now I understand the issue. I understand that you want to predefine a function that you would pass to labels and it would provide a different way of handling this. I think this is OK. For me the key thing is to give something predefined to the user (i.e. do not just say that the user could do this themselves by writing a proper formatter function).
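For reference, a predefined formatter of the kind discussed here could look roughly like the sketch below. It assumes the function form of `labels` described in the CategoricalArrays documentation, `f(from, to, i; leftclosed, rightclosed)`; the name `rounded_formatter` and its keywords `sigdigits` and `qprefix` are hypothetical and do not exist in the package.

```julia
# Hypothetical factory: returns a label function that rounds the displayed
# bounds only (the actual breaks are untouched) and optionally adds "Qi: ".
function rounded_formatter(; sigdigits::Integer = 3, qprefix::Bool = false)
    return function (from, to, i; leftclosed, rightclosed)
        lo = round(from, sigdigits = sigdigits)
        hi = round(to, sigdigits = sigdigits)
        interval = string(leftclosed ? "[" : "(", lo, ", ", hi,
                          rightclosed ? "]" : ")")
        return qprefix ? string("Q", i, ": ", interval) : interval
    end
end

f = rounded_formatter()
println(f(4.0, 6.999999999999999, 2; leftclosed = true, rightclosed = false))

g = rounded_formatter(qprefix = true)
println(g(6.999999999999999, 10.0, 3; leftclosed = true, rightclosed = true))

# Assumed usage, not run here:
# cut(x, 3; labels = rounded_formatter(sigdigits = 3))
```

Because only the text is rounded, two distinct intervals can still end up with the same label when the breaks are very close, which is the situation where the prefix or more digits would be needed.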

nalimilan mentioned this issue Dec 30, 2024

another take at cut #314

Closed
