How to turn off CIs for check_model? #642

Open
jbohenek opened this issue Oct 25, 2023 · 4 comments
Labels
3 investigators ❔❓ Need to look further into this issue

Comments

@jbohenek

jbohenek commented Oct 25, 2023

I am using check_model() for a course, and it's a wonderful teaching tool. However, the confidence bands around the LOESS fit from geom_smooth() are sometimes so wide that they stretch the axes and hide any pattern in the residuals. Is there a quick and easy way to turn off the CIs so the y-axis behaves? The only fixes I found were extracting components from check_model() and replotting with geom_smooth(se=F), or forcing a linear fit with geom_smooth(method="lm").

reprex:

library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.2.3
#> Warning: package 'ggplot2' was built under R version 4.2.3
#> Warning: package 'tibble' was built under R version 4.2.3
#> Warning: package 'readr' was built under R version 4.2.3
#> Warning: package 'purrr' was built under R version 4.2.3
#> Warning: package 'dplyr' was built under R version 4.2.3
#> Warning: package 'lubridate' was built under R version 4.2.3
library(easystats)
#> Warning: package 'easystats' was built under R version 4.2.3
#> # Attaching packages: easystats 0.6.0 (red = needs update)
#> ✔ bayestestR  0.13.1   ✔ correlation 0.8.4 
#> ✔ datawizard  0.9.0    ✔ effectsize  0.8.6 
#> ✖ insight     0.19.5   ✔ modelbased  0.8.6 
#> ✔ performance 0.10.5   ✔ parameters  0.21.2
#> ✔ report      0.5.7    ✔ see         0.8.0 
#> 
#> Restart the R-Session and update packages in red with `easystats::easystats_update()`.
df<-read_csv("https://raw.githubusercontent.com/jbohenek/biol_5130/main/viagra_data.csv") |> mutate(dose=factor(dose))
#> Rows: 30 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> dbl (3): dose, libido, partnerLibido
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

lm(libido ~ dose, data=df) |> 
  check_model()

Created on 2023-10-25 with reprex v2.0.2
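
The manual workaround described in the question can be sketched roughly as follows. This is a minimal illustration, not the code that check_model() runs internally: it recomputes fitted values and residuals directly from the model instead of extracting them from the check_model() object, and it uses the built-in PlantGrowth data (one three-level factor) as a stand-in for the course data. The names diag_df and p are made up for this example.

```r
library(ggplot2)

# Built-in data with a single categorical predictor (3 groups);
# an assumed stand-in for the course data set.
fit <- lm(weight ~ group, data = PlantGrowth)

# Recompute the quantities behind a residuals-vs-fitted plot.
diag_df <- data.frame(
  fitted = fitted(fit),
  resid  = residuals(fit)
)

p <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  # se = FALSE drops the confidence band; method = "lm" replaces the
  # LOESS curve with a straight line, so nothing can bend between groups.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(x = "Fitted values", y = "Residuals")

print(p)
```

With the band removed, the y-axis range is set by the residuals themselves, which is the behavior the question asks for.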

@strengejacke strengejacke added the 3 investigators ❔❓ Need to look further into this issue label Oct 25, 2023
@strengejacke
Member

strengejacke commented Feb 16, 2024

@IndrajeetPatil We could add a ci argument or similar to performance::check_model(), save it as an attribute, and then in see we would set se = FALSE in geom_smooth().
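
A rough sketch of the mechanism proposed here, under stated assumptions: the two functions below are hypothetical stand-ins for the data-preparation step (performance) and the plotting step (see). Neither their names nor the show_ci argument and attribute exist in those packages; they only illustrate passing a flag along as an attribute and reading it when the smoother is drawn.

```r
library(ggplot2)

# Hypothetical stand-in for the data-preparation step (not the performance API).
prepare_linearity_data <- function(model, show_ci = TRUE) {
  out <- data.frame(fitted = fitted(model), resid = residuals(model))
  # Store the user's choice on the returned object so the plotting
  # code can pick it up later.
  attr(out, "show_ci") <- show_ci
  out
}

# Hypothetical stand-in for the plotting step (not the see API).
plot_linearity_data <- function(x) {
  show_ci <- isTRUE(attr(x, "show_ci"))
  ggplot(x, aes(x = fitted, y = resid)) +
    geom_point() +
    # The stored flag decides whether the confidence band is drawn.
    geom_smooth(method = "loess", formula = y ~ x, se = show_ci)
}

fit <- lm(mpg ~ wt + hp, data = mtcars)
p <- plot_linearity_data(prepare_linearity_data(fit, show_ci = FALSE))
print(p)
```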

@bwiernik
Contributor

That would be good.

I wonder if we should also try to detect whether the number of distinct fitted values is small or all the predictors are categorical, and then omit the linearity plot and change the homogeneity plot to something better suited to categorical regression?
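
One way such a detection step might look, as a sketch only: both helpers below are hypothetical and not part of performance or see, the cutoff of 10 distinct fitted values is an arbitrary assumption, and the predictor check assumes a plain lm() fit without weights or offsets.

```r
# Hypothetical helper: TRUE when the model produces only a handful of
# distinct fitted values. The default cutoff is an arbitrary assumption.
has_few_fitted_values <- function(model, max_distinct = 10) {
  # Round to guard against floating-point noise within a group.
  n_distinct <- length(unique(round(fitted(model), 8)))
  n_distinct <= max_distinct
}

# Hypothetical helper: TRUE when every predictor variable is categorical.
# Assumes a plain lm() fit, where the first data class is the response.
all_predictors_categorical <- function(model) {
  classes <- attr(terms(model), "dataClasses")[-1]
  length(classes) > 0 &&
    all(classes %in% c("factor", "ordered", "character", "logical"))
}

fit_cat <- lm(weight ~ group, data = PlantGrowth)  # one factor predictor
fit_num <- lm(mpg ~ wt, data = mtcars)             # one continuous predictor

has_few_fitted_values(fit_cat)
all_predictors_categorical(fit_cat)
has_few_fitted_values(fit_num)
all_predictors_categorical(fit_num)
```

The first two calls flag the one-way design as categorical, while the continuous model passes through unflagged.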

@jbohenek
Author

jbohenek commented Feb 17, 2024

Upon further testing, it's not always just the SE that causes visualization issues. Sometimes the LOESS curve bends in odd ways between discrete values, which produces the same effect as seen above. So in addition to se=F, maybe also method="lm"? I know a linear fit isn't ideal when evaluating these things, but it's better than nothing when the y-axis goes haywire. For example, see the check_model() outlier plot of a 2x2 factorial ANOVA below (this can of course happen with any of the plots that use a LOESS curve).


library(tidyverse)
library(easystats)
#> Warning: package 'easystats' was built under R version 4.2.3
#> # Attaching packages: easystats 0.7.0
#> ✔ bayestestR  0.13.2   ✔ correlation 0.8.4 
#> ✔ datawizard  0.9.1    ✔ effectsize  0.8.6 
#> ✔ insight     0.19.8   ✔ modelbased  0.8.7 
#> ✔ performance 0.10.9   ✔ parameters  0.21.5
#> ✔ report      0.5.8    ✔ see         0.8.2
df<-read_csv("https://raw.githubusercontent.com/jbohenek/biol_5130/main/opsin.csv")
#> Rows: 33 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (2): population, water
#> dbl (1): sws1
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
fit<-lm(sws1 ~ water*population, data=df)
check_model(fit)

[image: check_model() output for the 2x2 factorial model]
Created on 2024-02-17 with reprex v2.1.0

@bwiernik
Contributor

We should probably just have alternative visualizations for categorical models. The current plots really only work for models with continuous predictors, where there are numerous fitted values on the x-axis.
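
As one possible reading of "alternative visualizations", residuals could be shown per design cell rather than against fitted values. The sketch below is an assumption about what such a plot might look like, not an existing performance or see feature; it uses the built-in warpbreaks data (a 2x3 factorial) in place of the opsin data, and cell_df and p are names made up for the example.

```r
library(ggplot2)

# Built-in factorial data as an assumed stand-in for the opsin example.
fit <- lm(breaks ~ wool * tension, data = warpbreaks)

# One row per observation, labelled by its design cell.
cell_df <- data.frame(
  cell  = interaction(warpbreaks$wool, warpbreaks$tension, sep = " / "),
  resid = residuals(fit)
)

p <- ggplot(cell_df, aes(x = cell, y = resid)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  # Boxes show the spread per cell; outliers are hidden here because
  # the jittered points already display every residual.
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.1, height = 0, alpha = 0.6) +
  labs(x = "Design cell (wool / tension)", y = "Residuals")

print(p)
```

Comparing box heights across cells gives a direct view of homogeneity of variance without fitting any smoother.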
