Error using TestSuites for numerical data #1122

jeric250 · 2024-05-23T11:39:34Z

Hi there, first time opening an issue so bear with me (and let me know if more info is needed).

Basic information:
Package version used: 0.4.20
Operating system and version: macOS VSCode
Programming language and version used: Python 3.12.2

Code snippet:

from evidently.calculations.stattests import StatTest
from evidently.test_suite import TestSuite
from evidently.tests import *

data_drift_dataset_tests = TestSuite(tests=[
    TestShareOfDriftedColumns(stattest='psi'),
])

# ref_df: represents reference pandas DataFrame data (only numerical features)
# curr_df: represents current pandas DataFrame data (only numerical features)
data_drift_dataset_tests.run(reference_data=ref_df, current_data=curr_df)
data_drift_dataset_tests

The above code is based on Evidently documentation: https://github.com/evidentlyai/evidently/blob/main/examples/how_to_questions/how_to_specify_stattest_for_a_testsuite.ipynb

Error message:

The snippet above is run on a pandas DataFrame that contains only numerical columns (dtypes 'float64' and 'int64'). When I run the exact same code on only categorical columns (dtypes 'object' and 'category'), it works fine and a report is generated.

I checked whether the numerical data contains any unexpected values, and that doesn't seem to be the case. For example, to find records with non-numeric values:
ref_df[~ref_df.applymap(np.isreal).all(1)]

What am I missing? Any advice?


elenasamuylova · 2024-05-23T11:58:29Z

Hi @jeric250, could you try to run pd.to_numeric on your input columns?
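A minimal sketch of one way to run that check, assuming the data is a plain pandas DataFrame. The helper name `non_numeric_rows` and the sample values are hypothetical and only for illustration; the idea is to coerce every column with pd.to_numeric and list the rows where a non-null value failed to convert:

```python
import pandas as pd


def non_numeric_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows holding a non-null value that pd.to_numeric cannot parse."""
    # errors="coerce" turns unparseable values into NaN instead of raising
    converted = df.apply(pd.to_numeric, errors="coerce")
    # a failure is a value that was present before coercion but is NaN after it
    failed = converted.isna() & df.notna()
    return df[failed.any(axis=1)]


# hypothetical sample data: one string hiding in an otherwise numeric column
sample = pd.DataFrame({"AGE": [32, "forty", 40], "SCORE": [0.5, 0.7, 0.9]})
bad = non_numeric_rows(sample)
print(bad)
print(sample.dtypes)  # a column that mixes numbers and strings shows up as 'object'
```

If this returns no rows and every dtype is numeric, the string in the error is probably not coming from the cell values themselves.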

jeric250 · 2024-05-23T12:23:43Z

Thanks @elenasamuylova for responding so quickly. Forgot to mention, I did try pd.to_numeric as well, something like:
ref_df = ref_df.apply(pd.to_numeric, errors='coerce')
However, the same error still occurred. There are also no null values in the dataset.

When I tested a single numerical column, I got the same error.

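from evidently.report import Report
from evidently.metrics import ColumnDriftMetric, ColumnValuePlot

# test on the AGE column, which holds people's ages (e.g. 32, 40)
data_drift_column_report = Report(metrics=[
    ColumnDriftMetric('AGE'),
    ColumnValuePlot('AGE'),
])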

data_drift_column_report.run(reference_data=ref_df, current_data=curr_df)
data_drift_column_report

Error:
UFuncTypeError: ufunc 'multiply' did not contain a loop with signature matching types (dtype('<U14'), dtype('float64')) -> None

Same error when I tried DataDriftTable:

from evidently.report import Report
from evidently.metrics import DataDriftTable

data_drift_dataset_report = Report(metrics=[
    DataDriftTable(num_stattest='wasserstein', cat_stattest='psi'),
])

data_drift_dataset_report.run(reference_data=ref_df, current_data=curr_df)
data_drift_dataset_report

When I limit DataDriftTable to just categorical columns, it works fine and a report is generated.
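For reference, the UFuncTypeError quoted above is what NumPy raises when a string array is multiplied by a float array: dtype('<U14') denotes a unicode string array whose longest element has 14 characters. A minimal sketch that triggers the same kind of error with NumPy alone; the array contents are made up, and this does not show where in the Evidently call the string operand originates:

```python
import numpy as np

# hypothetical operands: a 14-character string array and a float64 array
labels = np.array(["fourteen_chars"])
weights = np.array([0.5])

print(labels.dtype, weights.dtype)

try:
    np.multiply(labels, weights)
    raised = False
except TypeError as exc:
    # UFuncTypeError is a subclass of TypeError
    raised = True
    print(type(exc).__name__, exc)
```

So at least one operand reaching the multiplication is text rather than a number, even when the DataFrame's value columns are float64 or int64.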
