
fix: handle mixed str/int label types in calculate_matrix #1840

Open
Aftabbs wants to merge 1 commit into evidentlyai:main from Aftabbs:fix/classification-metrics-mixed-type-label-sort

Aftabbs commented Mar 1, 2026

Problem

Closes #1085.

When a DataFrame with dtype="string" contains numeric-looking label values (e.g. "101", "102"), newer NumPy can coerce those strings to integers during np.union1d / np.unique, producing a labels list that mixes Python str and int. Python 3 does not support < between str and int, so sorted(labels) raises TypeError inside calculate_matrix:

File .../evidently/calculations/classification_performance.py, in calculate_matrix
    sorted_labels = sorted(labels)
TypeError: '<' not supported between instances of 'str' and 'int'
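The failure itself is plain Python behaviour and can be seen without evidently or NumPy. The snippet below is a minimal sketch: the label values and the `try_sort` helper are hypothetical and exist only for illustration.

```python
def try_sort(values):
    """Return the sorted list, or the error text if the elements cannot be compared."""
    try:
        return sorted(values)
    except TypeError as exc:
        return f"TypeError: {exc}"


# Hypothetical labels: one value has become an int, the rest are str.
mixed_result = try_sort(["foo", "bar", 101, "102"])

# The same labels when every value is a str.
homogeneous_result = try_sort(["foo", "bar", "101", "102"])

print(mixed_result)
print(homogeneous_result)
```

Only the mixed list fails; the all-str list sorts lexicographically, with the numeric-looking strings ahead of the alphabetic ones.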

Reproduction (from the issue):

import pandas as pd
from evidently.report import Report
from evidently.metrics import ClassificationQualityMetric, ClassificationConfusionMatrix

label_target  = ['foo', 'bar', 'fun', 'foo', 'fun', 'foo', '101', '102']
label_predict = ['foo', 'bar', 'fun', 'bar', 'fun', 'fun', '101', '101']
data_df = pd.DataFrame({'target': label_target, 'prediction': label_predict}, dtype="string")

report = Report(metrics=[ClassificationQualityMetric(), ClassificationConfusionMatrix()])
report.run(reference_data=None, current_data=data_df)  # TypeError before fix

Root cause

Newer NumPy (≥ 2.0) uses hash-based deduplication in np.unique, which can process object arrays with mixed types without raising an error. The resulting Python list from .tolist() then contains both str and int objects. sorted() on that list fails in Python 3.

sklearn.metrics.confusion_matrix also sorts the labels internally with np.sort, so passing key=str to our own sorted() call is not sufficient: the internal NumPy sort would still fail on a mixed-type array.
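The difference between sorting with a key and casting the data can be checked on NumPy alone. This is a sketch with a hypothetical object array; it exercises np.sort directly and does not call sklearn.

```python
import numpy as np

# Hypothetical mixed-type labels. dtype=object keeps the int as an int
# instead of letting NumPy coerce the whole array to a string dtype.
mixed = np.array(["foo", 101, "bar"], dtype=object)

try:
    np.sort(mixed)
    numpy_sort_failed = False
except TypeError:
    # Object arrays are sorted with Python comparisons, so str < int fails.
    numpy_sort_failed = True

# A key function only affects this one sorted() call; the values keep their types.
python_sorted = sorted(mixed.tolist(), key=str)

# Casting the data itself gives a homogeneous array that NumPy can sort.
as_str_sorted = np.sort(mixed.astype(str)).tolist()

print(numpy_sort_failed, python_sorted, as_str_sorted)
```

Because `key=str` leaves the int in place, any later code that sorts the same values without a key hits the same error, which is why the fix casts the data rather than only changing the sort.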

Fix

Catch the TypeError and fall back to:

  1. Converting all labels to str for a consistent, sortable representation.
  2. Casting the target and prediction Series to str so that sklearn.metrics.confusion_matrix receives a homogeneous label set.

# Before
sorted_labels = sorted(labels)

# After
try:
    sorted_labels = sorted(labels)
except TypeError:
    sorted_labels = sorted(str(label) for label in labels)
    target = target.astype(str)
    prediction = prediction.astype(str)

The homogeneous (all-str or all-int) paths are unchanged.
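The fallback can be tried in isolation with a small stand-in. In this sketch `sort_labels_with_fallback` is a hypothetical helper, not a function in evidently, and plain lists stand in for the pandas Series that the real code casts with `.astype(str)`.

```python
def sort_labels_with_fallback(labels, target, prediction):
    """Hypothetical helper mirroring the try/except from the diff."""
    try:
        # Homogeneous labels: nothing is converted.
        return sorted(labels), target, prediction
    except TypeError:
        # Mixed str/int labels: normalise labels and data to str.
        return (
            sorted(str(label) for label in labels),
            [str(value) for value in target],
            [str(value) for value in prediction],
        )


# All-int labels take the unchanged path and keep numeric order.
int_labels, int_target, int_prediction = sort_labels_with_fallback(
    [10, 9, 2], [10, 9, 2], [10, 2, 2]
)

# Mixed labels take the fallback path and come back as strings.
mixed_labels, mixed_target, mixed_prediction = sort_labels_with_fallback(
    ["foo", 10, 9], ["foo", 10, 9], [9, "foo", 9]
)

print(int_labels, mixed_labels)
```

One consequence of the fallback is that label order becomes lexicographic, so "10" sorts before "9". That only applies on the mixed-type path; all-int label sets keep their numeric order.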

Tests

Five new tests added to CalculateMatrixMixedTypeLabelsTest in tests/calculations/test_classification_performance.py:

| Test | What it verifies |
| --- | --- |
| test_all_string_labels_return_correct_matrix | Regression guard: pure-string path unchanged |
| test_all_integer_labels_return_correct_matrix | Regression guard: pure-integer path unchanged |
| test_mixed_str_int_labels_do_not_raise | Mixed labels no longer raise TypeError |
| test_mixed_str_int_labels_confusion_matrix_shape | Confusion matrix has the correct shape for mixed labels |
| test_string_dtype_dataframe_end_to_end | Exact reproduction of issue #1085 passes end-to-end |

All 9 tests in the file pass.

When a DataFrame with dtype="string" contains numeric-looking label
values (e.g. "101", "102"), newer NumPy can coerce those strings to
integers during np.union1d / np.unique, producing a labels list that
mixes Python str and int.  Python 3 does not support '<' between str
and int, so sorted(labels) raises TypeError inside calculate_matrix.

The fix catches the TypeError and falls back to:
  - Converting all labels to str for consistent sorting.
  - Casting the target and prediction Series to str so that
    sklearn.metrics.confusion_matrix receives a homogeneous label set
    (sklearn calls np.sort internally, which also fails on mixed types).

This restores the expected behaviour for string-typed columns with
numeric-looking label names without changing the code path for
all-string or all-integer label sets.

Adds five tests to CalculateMatrixMixedTypeLabelsTest:
  - all-string labels: regression guard
  - all-integer labels: regression guard
  - mixed str/int labels do not raise TypeError
  - confusion matrix has the correct shape for mixed labels
  - end-to-end test reproducing the exact scenario from issue evidentlyai#1085

Fixes evidentlyai#1085
Development

Successfully merging this pull request may close these issues.

Bug: Classification metrics do not support label names containing numbers