Check for influential observations of GLM w/o numeric variables #735

Open
arodionoff opened this issue Jun 25, 2024 · 0 comments
Labels
3 investigators ❔❓ Need to look further into this issue Bug 🐛 Something isn't working

Comments

@arodionoff

The `performance::check_outliers()` function cannot check for influential observations in a logistic regression whose response variable is a factor and whose predictors include no numeric variable.
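Influence measures such as Cook's distance are computed from the fitted model (its Pearson residuals and leverages), not from raw numeric columns, so in principle they are available for any GLM. The base-R sketch below illustrates this on simulated data; the column names `grade`, `term` and `y` are hypothetical stand-ins for the smbinning variables, and the 4/n cutoff is a common rule of thumb assumed here, not necessarily the threshold `check_outliers()` applies.

```r
# Simulated stand-in data: every variable is a factor (hypothetical names)
set.seed(42)
n <- 200
d <- data.frame(
  grade = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  term  = factor(sample(c("short", "long"), n, replace = TRUE))
)
# Outcome probability depends only on the factor levels
p <- plogis(-0.5 + 1.2 * (d$grade == "A") - 0.8 * (d$term == "long"))
d$y <- factor(ifelse(runif(n) < p, "good", "bad"), levels = c("bad", "good"))

# Logistic regression with a factor response and factor predictors only
m <- glm(y ~ grade + term, data = d, family = binomial())

# Cook's distance is derived from the fit, not from numeric data columns
cd <- cooks.distance(m)

# Flag observations above a rule-of-thumb cutoff (assumption: 4 / n)
cutoff <- 4 / nobs(m)
influential <- which(cd > cutoff)
head(sort(cd, decreasing = TRUE))
```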

If the model contains at least one numeric variable (even if it is only the response), everything works:

```r
# install.packages(c("smbinning", "randomForest", "performance"))
# Load library and its dataset
library(smbinning)
# Sampling
pop=smbsimdf1 # Population
train=subset(pop,rnd<=0.7) # Training sample
# Generate binning object to generate variables
smbcbs1=smbinning(train,x="cbs1",y="fgood")
smbcbinq=smbinning.factor(train,x="cbinq",y="fgood")
pop=smbinning.gen(pop,smbcbs1,"g1cbs1")
pop=smbinning.factor.gen(pop,smbcbinq,"g1cbinq")
# Resample
train=subset(pop,rnd<=0.7) # Training sample
test=subset(pop,rnd>0.7) # Testing sample
# Run logistic regression with factors
modlogisticsmb=glm(fgood ~ cbinq + cbterm + inc, data = train, family = binomial())
summary(modlogisticsmb)

library(performance)
plot(performance::check_outliers(modlogisticsmb))
```

[Screenshot: outlier plot produced by `check_outliers()`]

But once no numeric variable is left, for example after converting the response to a factor, an error is raised:

```r
train$fgood <- as.factor(train$fgood)
# Run logistic regression with factors
modlogisticsmb=glm(fgood ~ cbinq + cbterm + inc, data = train, family = binomial())
summary(modlogisticsmb)

# Error in performance::check_outliers()
plot(performance::check_outliers(modlogisticsmb))
```

```
Error: No numeric variables found. No data to check for outliers.
```
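The two models above are the same fit: with a binomial family, `glm()` treats the first factor level as failure and all other levels as success, so a 0/1 integer response and a two-level factor response give identical coefficients. What changes is the column type in the model frame. Assuming (not confirmed from the package source) that `check_outliers()` keeps only numeric columns of the model data before running its checks, a factor-only model leaves nothing to keep, which would match the error message. The base-R sketch below shows both points on hypothetical simulated data; `n_numeric()` is a helper defined only for illustration.

```r
# Hypothetical data: factor predictors, integer 0/1 response
set.seed(1)
n <- 300
d <- data.frame(
  grade = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  term  = factor(sample(c("short", "long"), n, replace = TRUE))
)
d$fgood <- rbinom(n, 1, plogis(0.3 + 0.9 * (d$grade == "A")))  # integer 0/1

m_int <- glm(fgood ~ grade + term, data = d, family = binomial())

# Same data, response converted to a factor (levels "0", "1"; "1" = success)
d_fac <- d
d_fac$fgood <- factor(d_fac$fgood)
m_fac <- glm(fgood ~ grade + term, data = d_fac, family = binomial())

# Identical coefficients, hence identical influence diagnostics
all.equal(coef(m_int), coef(m_fac))

# Hypothetical helper: count numeric columns in a model's model frame
n_numeric <- function(model) sum(vapply(model.frame(model), is.numeric, logical(1)))
n_numeric(m_int)  # 1: the integer response is the only numeric column
n_numeric(m_fac)  # 0: nothing numeric remains
```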

However, the same check can be run through `performance::check_model()`:

```r
performance::check_model(modlogisticsmb, check = c('outliers'), residual_type = 'normal')
```

[Screenshot: outlier plot produced by `check_model()`]

The only annoyance is that in this case the plot fills only the left side of the plotting area instead of the full width.
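As a stopgap that does not depend on variable types, base R's `stats::influence.measures()` accepts `glm` fits and flags observations by DFBETAS, DFFIT, covariance ratio, Cook's distance and leverage, using its own built-in cutoffs (which are not the ones performance uses). A minimal sketch on hypothetical simulated data:

```r
# Hypothetical data: factor response and factor predictors only
set.seed(7)
n <- 250
d <- data.frame(
  grade = factor(sample(c("A", "B", "C"), n, replace = TRUE)),
  term  = factor(sample(c("short", "long"), n, replace = TRUE))
)
d$y <- factor(rbinom(n, 1, plogis(-0.2 + 1.1 * (d$term == "short"))))

m <- glm(y ~ grade + term, data = d, family = binomial())

# Influence statistics for every observation (glm objects are supported)
infl <- influence.measures(m)

# Row indices flagged by at least one criterion
flagged <- which(apply(infl$is.inf, 1, any))

# Print only the potentially influential observations
summary(infl)
```

The raw statistics are in `infl$infmat` if a custom threshold is preferred over the built-in flags.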

@strengejacke added the labels Bug 🐛 Something isn't working and 3 investigators ❔❓ Need to look further into this issue on Jul 2, 2024