Problem with summary values if exponentials are used #1004

Open
danyx23 opened this issue Mar 13, 2020 · 3 comments

Comments

@danyx23

danyx23 commented Mar 13, 2020

Hey there! I really like Guesstimate and I use it to estimate some values related to the current coronavirus outbreak. I noticed, however, that if you use a value from a distribution as an exponent, the summary values of that cell are not the correct values for the 50th, 5th and 95th percentiles.

To reproduce: create a cell (10 to 20). Create another cell referencing this cell with (2^othercell). Check the summary values against the percentiles in the expanded view. Below is a screenshot.

Let me know if I can help in fixing this. Thanks for a great app!

[Screenshot (2020-03-13): Rough model to estimate true SARS-COV-2 infections from deaths for a given region (read the reasoning]
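
The reproduction steps can be approximated outside the app. The sketch below is not Guesstimate's code: it assumes that "10 to 20" behaves like a normal distribution whose 5th and 95th percentiles are 10 and 20, and the sample count, seed, and helper names are made up for illustration. It samples the exponent, computes 2^x for each sample, and prints the mean next to the percentiles so the gap between them is visible.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

N_SAMPLES = 5000  # arbitrary sample count chosen for this sketch
Z_95 = 1.6449     # standard normal z-score of the 95th percentile

# Assumption: "10 to 20" is a normal distribution with its
# 5th and 95th percentiles at 10 and 20.
low, high = 10.0, 20.0
mu = (low + high) / 2
sigma = (high - low) / (2 * Z_95)


def percentile(sorted_values, p):
    """Nearest-rank style percentile of an already sorted list (p in 0..100)."""
    index = min(len(sorted_values) - 1, int(p / 100 * len(sorted_values)))
    return sorted_values[index]


# First cell: the exponent. Second cell: 2^othercell.
exponents = [random.gauss(mu, sigma) for _ in range(N_SAMPLES)]
powers = sorted(2 ** x for x in exponents)

summary = {
    "mean": statistics.fmean(powers),
    "p5": percentile(powers, 5),
    "median": percentile(powers, 50),
    "p95": percentile(powers, 95),
}

for name, value in summary.items():
    print(f"{name:>6}: {value:,.0f}")
```

Under these assumptions the mean lands well above the 50th percentile, which matches what the issue describes for the summary number.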

@danyx23
Author

danyx23 commented Mar 13, 2020

I realized that what is shown in the big summary number is the mean, not the median. I think the median is the better number in this case - would you agree?

@OAGr
Member

OAGr commented Mar 13, 2020

Hi danyx23!

Glad to hear you like Guesstimate.

Correct that the number is the mean. I'm really not sure the median is better. In many cases people are interested in expected values and similar. I think the mean is generally preferable, except that in cases with long tails (like the one above), the mean is pretty random, which is unfortunate.

Changing it without a big announcement would also be quite confusing to users who are used to it as is, so I'm hesitant to do so.

I recommend trying to organize things so they avoid long tails; Guesstimate doesn't handle them particularly well at this point.
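
To illustrate the "pretty random" part, here is a rough sketch (not Guesstimate's implementation) that repeats the sampling many times and compares how much the mean and the median of 2^x move from run to run. The exponent distribution (normal with mean 15 and standard deviation 3.04, roughly a 90% interval of 10 to 20), the run count, and the sample count are all assumptions picked for the example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

RUNS = 100        # number of independent re-samplings (arbitrary)
N_SAMPLES = 2000  # samples per run (arbitrary)
MU, SIGMA = 15.0, 3.04  # assumption: normal exponent, ~90% interval 10..20


def one_run():
    """Draw one batch of 2^x samples and return its (mean, median)."""
    values = [2 ** random.gauss(MU, SIGMA) for _ in range(N_SAMPLES)]
    return statistics.fmean(values), statistics.median(values)


means, medians = zip(*(one_run() for _ in range(RUNS)))


def relative_spread(values):
    """Standard deviation across runs, divided by the average across runs."""
    return statistics.stdev(values) / statistics.fmean(values)


mean_spread = relative_spread(means)
median_spread = relative_spread(medians)

print(f"run-to-run spread of the mean:   {mean_spread:.1%}")
print(f"run-to-run spread of the median: {median_spread:.1%}")
```

With a long-tailed result like this one, a handful of very large samples dominates the mean, so it varies noticeably more between runs than the median does.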

@danyx23
Author

danyx23 commented Mar 14, 2020

Thanks for your quick reply! I thought a bit more about this and I'm pretty sure the median is the more correct number to report. For Gaussians etc. it doesn't matter much, but for skewed distributions the expected value is the one in the middle of all samples, which is the median.

I can understand a hesitation to make this change for all old Guesstimate documents, but maybe you could add a setting per Guesstimate document that indicates which value is shown? Or at least a preference for whether the scientific view should be enabled by default when people view the document?
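
For reference, the relation between the two statistics in this example can be worked out in closed form. This assumes the exponent cell is normal, $X \sim \mathcal{N}(\mu, \sigma^{2})$, which is an assumption about how "10 to 20" is interpreted rather than something confirmed in this thread. For $X$ itself the mean and the median are both $\mu$. For the derived cell:

$$
Y = 2^{X} = e^{X \ln 2}, \qquad \operatorname{median}(Y) = 2^{\mu}, \qquad \mathbb{E}[Y] = 2^{\mu}\, e^{(\sigma \ln 2)^{2}/2}
$$

With $\mu = 15$ and $\sigma \approx 3.04$ (a 90% interval of 10 to 20), the factor $e^{(\sigma \ln 2)^{2}/2}$ is about $e^{2.22} \approx 9.2$, so under this assumption the expected value (the mean) sits roughly nine times above the median, while the median stays at the middle of the samples.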
