compare_performance: logloss and score issue #917

Description

@LIHO42

It seems that there are two related bugs affecting compare_performance() and performance_score().

  • Bug 1: logloss is ranked in the wrong direction:
    Currently the best model (lowest log-loss) receives the lowest Performance_Score (0%) and the worst model receives 100%.
    Because a lower log-loss is better, logloss should enter the score in reversed order (like RMSE).

reprex:

library(performance)
m1 <- glm(vs ~ 1, data = mtcars, family = "binomial")
m2 <- glm(vs ~ wt, data = mtcars, family = "binomial")
m3 <- glm(vs ~ wt + mpg, data = mtcars, family = "binomial")
# correct order:
performance::compare_performance(m1, m2, m3, metrics = "RMSE", rank = TRUE)
#> # Comparison of Model Performance Indices
#> 
#> Name | Model |  RMSE | Performance-Score
#> ----------------------------------------
#> m3   |   glm | 0.359 |           100.00%
#> m2   |   glm | 0.410 |            62.75%
#> m1   |   glm | 0.496 |             0.00%
performance::compare_performance(m1, m2, m3, metrics = "R2", rank = TRUE)
#> # Comparison of Model Performance Indices
#> 
#> Name | Model | Tjur's R2 | Performance-Score
#> --------------------------------------------
#> m3   |   glm |     0.478 |           100.00%
#> m2   |   glm |     0.328 |            68.59%
#> m1   |   glm |     0.000 |             0.00%
# wrong order:
performance::compare_performance(m1, m2, m3, metrics = "LOGLOSS", rank = TRUE)
#> # Comparison of Model Performance Indices
#> 
#> Name | Model | Log_loss | Performance-Score
#> -------------------------------------------
#> m1   |   glm |    0.685 |           100.00%
#> m2   |   glm |    0.490 |            32.69%
#> m3   |   glm |    0.395 |             0.00%

Created on 2026-06-22 with reprex v2.1.1
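
To illustrate the ordering expected for Bug 1, here is a minimal sketch of a min-max rescaling that flips the direction for metrics where lower is better. `rescale_metric()` and its `lower_is_better` argument are hypothetical names made up for this example; this is not the package's internal code, and the log-loss values are the rounded ones from the output above.

```r
# Hypothetical helper, NOT part of the performance package:
# rescale a metric to [0, 1] and flip it when lower values are better.
rescale_metric <- function(x, lower_is_better = FALSE) {
  rng <- range(x)
  if (diff(rng) == 0) {
    return(rep(NA_real_, length(x)))  # all models tie, no ranking possible
  }
  scaled <- (x - rng[1]) / diff(rng)
  if (lower_is_better) 1 - scaled else scaled
}

# Rounded log-loss values copied from the reprex output above
logloss <- c(m1 = 0.685, m2 = 0.490, m3 = 0.395)

# m3 (lowest log-loss) should now get the highest score, m1 the lowest
sort(rescale_metric(logloss, lower_is_better = TRUE), decreasing = TRUE)
```

With the flip applied, the log-loss ranking agrees with the RMSE and R2 rankings shown above (m3, then m2, then m1).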

  • Bug 2: scale and order of score_log/score_spherical:
    Again the worst model gets a 100% performance score. The values also seem off, most likely because of line 175 of the package's performance_score() function (quadrat_p <- sum(p_y^2)), which should probably be an average rather than a sum. As a result, in [quadratic = mean(2 * p_y + quadrat_p),
    spherical = mean(p_y / sqrt(quadrat_p))] quadratic appears to grow with the sample size, while spherical goes to zero.

reprex:

library(performance)
# again m1 is ranked best, and the values look odd
m1 <- glm(vs ~ 1, data = mtcars, family = "binomial")
m2 <- glm(vs ~ wt, data = mtcars, family = "binomial")
m3 <- glm(vs ~ wt + mpg, data = mtcars, family = "binomial")
performance::compare_performance(m1, m2, m3, metrics = "SCORE", rank = TRUE)
#> # Comparison of Model Performance Indices
#> 
#> Name | Model | Score_log | Score_spherical | Performance-Score
#> --------------------------------------------------------------
#> m1   |   glm |    -7.010 |           0.130 |           100.00%
#> m2   |   glm |    -9.834 |           0.067 |            32.11%
#> m3   |   glm |   -14.903 |           0.095 |            21.71%
# same data stacked three times: the fitted probabilities stay the same, only n changes
mtcars2 <- rbind(mtcars, mtcars, mtcars)
m1 <- glm(vs ~ 1, data = mtcars2, family = "binomial")
m2 <- glm(vs ~ wt, data = mtcars2, family = "binomial")
m3 <- glm(vs ~ wt + mpg, data = mtcars2, family = "binomial")
performance::compare_performance(m1, m2, m3, metrics = "SCORE", rank = TRUE)
#> # Comparison of Model Performance Indices
#> 
#> Name | Model | Score_log | Score_spherical | Performance-Score
#> --------------------------------------------------------------
#> m1   |   glm |   -22.640 |           0.070 |           100.00%
#> m2   |   glm |   -31.966 |           0.031 |            31.72%
#> m3   |   glm |   -48.156 |           0.037 |             7.16%

Created on 2026-06-22 with reprex v2.1.1
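
The sample-size dependence described in Bug 2 can be reproduced in base R without the package. The sketch below reuses the two formulas quoted above and only swaps the aggregation of `p_y^2` between `sum()` and `mean()`. It assumes that `p_y` is the probability the model assigns to each observed outcome; `prob_observed()`, `quadratic()`, `spherical()` and the `agg` argument are hypothetical names for this example, not the package's internals. Whether `mean()` is the right fix is only the suggestion made above.

```r
# Assumption: p_y = probability the model assigns to the observed outcome
prob_observed <- function(model) {
  y <- model$y          # 0/1 response stored by glm()
  p <- fitted(model)    # predicted P(y = 1)
  ifelse(y == 1, p, 1 - p)
}

# Formulas as quoted in the issue, with the aggregation made swappable
quadratic <- function(p_y, agg = sum) mean(2 * p_y + agg(p_y^2))
spherical <- function(p_y, agg = sum) mean(p_y / sqrt(agg(p_y^2)))

fit_1x <- glm(vs ~ wt + mpg, data = mtcars, family = "binomial")
fit_3x <- glm(vs ~ wt + mpg, data = rbind(mtcars, mtcars, mtcars),
              family = "binomial")

p1 <- prob_observed(fit_1x)
p3 <- prob_observed(fit_3x)

# sum(): the scores change although the fitted probabilities are the same
c(quadratic(p1, sum), quadratic(p3, sum))    # grows with n
spherical(p1, sum) / spherical(p3, sum)      # close to sqrt(3)

# mean(): the scores are essentially independent of n
c(quadratic(p1, mean), quadratic(p3, mean))
spherical(p1, mean) / spherical(p3, mean)    # close to 1
```

Stacking the data three times leaves the maximum-likelihood fit unchanged, so any difference between the two fits comes from the aggregation step alone.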
