Fix handling single class in chunk for CBPE #384

michael-nml · 2024-05-02T10:23:30Z

This PR fixes an error when calculating business value, confusion matrix & specificity for binary classification problems where a chunk only contains 1 class.

Previously this would fail with:

nannyml.exceptions.CalculatorException: failed while fitting nannyml.performance_estimation.confidence_based.cbpe.CBPE.
not enough values to unpack (expected 4, got 1)

This happens because the sklearn.metrics.confusion_matrix function NannyML uses internally bases its output on the number of classes present in the input. If only a single class is present, only 1 value is returned where we normally expect 4 for a binary classification problem. This PR resolves this by explicitly providing the expected classes in the labels argument. These expected classes are currently hard-coded as [0, 1] but we may want to change this to derive values from the input if/when we support string-based classes for binary classification.

Additionally, this PR resolves an issue with F1 sampling error calculation when there are no positive cases present in the input. This previously resulted in a ZeroDivisionError. Now the sampling error resolves to NaN instead.

The `confusion_matrix` function used in various CBPE metrics returns values for each class/label present in the input. For binary classification this means we expect 4 values (TP, FP, FN, TN). However, if only one class is represented in the input, the function returns only one value. This commit addresses that failure case by explicitly providing the expected labels to the `confusion_matrix` function. Currently these values are hard-coded for binary classification, but we may want to derive them from the input later on if we start supporting string-based pass/fail classes.
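For reviewers who want to reproduce the behaviour outside NannyML, here is a minimal sketch using only `sklearn.metrics.confusion_matrix`. The arrays `y_true` and `y_pred` are made-up illustration data, not taken from the NannyML test suite, and the snippet does not mirror the actual CBPE code path. It only shows how the `labels` argument changes the shape of the result.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Illustrative chunk (assumed data): every target and prediction is class 0.
y_true = np.array([0, 0, 0, 0])
y_pred = np.array([0, 0, 0, 0])

# Without `labels`, the matrix is sized by the classes found in the data,
# so a single-class chunk yields a 1x1 matrix.
inferred = confusion_matrix(y_true, y_pred)

try:
    tn, fp, fn, tp = inferred.ravel()
except ValueError as exc:
    # Unpacking a single value into four names fails here.
    print(f"unpacking failed: {exc}")

# With the expected binary labels passed explicitly, the matrix is always 2x2
# and the usual four-way unpacking works.
explicit = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = explicit.ravel()
print(tn, fp, fn, tp)
```

With `labels=[0, 1]` the counts for the absent class are simply zero, which is what downstream metrics such as specificity or business value need in order to be computed per chunk.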

codecov · 2024-05-02T10:32:25Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.67%. Comparing base (13ace29) to head (730ae35).

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #384      +/-   ##
==========================================
+ Coverage   78.52%   78.67%   +0.15%     
==========================================
  Files         110      110              
  Lines        8562     8567       +5     
  Branches     1522     1523       +1     
==========================================
+ Hits         6723     6740      +17     
+ Misses       1476     1468       -8     
+ Partials      363      359       -4


michael-nml added 3 commits May 2, 2024 11:53

Add test case for single class CBPE fitting

a793cf4

Fix F1 sampling error when no positive cases

730ae35

michael-nml requested review from nnansters and nikml as code owners May 2, 2024 10:23

nnansters approved these changes May 2, 2024

nnansters merged commit d064916 into main May 2, 2024
7 of 8 checks passed
