pandas + pandera + structlog + rich + raised exception = infinite loop #679

Open
hackermandh opened this issue Nov 20, 2024 · 0 comments

As silly as the title is, I think it's about the most condensed way to describe my problem.

  1. If I have a Pandas dataframe
  2. And I use a Pandera DataFrameModel class to validate said dataframe
  3. And I try to log the exception

Then the script never stops (or at least it runs past the 5 minutes I was willing to wait).

To make this easier, I prepared a minimal example that triggers this behavior, packaged as a uv script:

Save it as script.py and run it with uv run --script script.py. Normally I wouldn't do this, but I know you're a fan of uv, as I am, so this should make both our lives easier 😉.

# /// script
# requires-python = ">=3.11" # python version does not seem to matter (3.7, 3.11 and 3.13 tested)
# dependencies = [
#     "pandas>=2.2.0", # 2.2.0 minimal version
#     "pandera>=0.20.0", # 0.20.0 minimal version
#     "rich>=13.9.4",  # if you disable `rich`, it'll run as expected, or:
#     "structlog>=24.4.0", # version 21.1.0 works as well, anything after it breaks.
# ]
# # run this file with "uv run --script script.py"
# ///
print("1. loading imports")
import pandas as pd
import pandera as pa
from pandera.typing import Series
from pandera.errors import SchemaError
from structlog.stdlib import get_logger

print("2. loading logger")
logger = get_logger(__name__)


print("3. loading schema")
class MySchema(pa.DataFrameModel):
    my_floats: Series[float] = pa.Field(
        alias="my_floats", check_name=True, nullable=False
    )

    class Config:
        coerce = True


print("4. loading dict")
MY_DICT = {
    "my_floats": {
        1: "tHiS iS nOt A fLoAt",
    },
}


print("5. dict to dataframe")
df = pd.DataFrame.from_dict(MY_DICT)

try:
    print("6. validation")
    MySchema.validate(df)
except SchemaError as schema_error:
    print(
        "7. logging the exception (cancel the script after 30 seconds, as it'll run forever)"
    )
    logger.exception("ingestion-validation-unsuccessful")
    print("8. you'll never reach this point")
    raise schema_error
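
To check the hang without waiting by hand, something like the following could wrap the run in a timeout. This is only a sketch using the standard library; the helper name `finishes_within` and the timeout values are my own choices, not part of structlog, rich, or uv.

```python
import subprocess
import sys


def finishes_within(cmd: list[str], timeout_s: float) -> bool:
    """Return True if `cmd` exits on its own within `timeout_s` seconds.

    A non-zero exit code still counts as "finished": the reproduction is
    expected to end by re-raising the SchemaError once logging returns.
    """
    try:
        subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before raising. If that child
        # is a launcher, the interpreter it started may need separate cleanup.
        print(f"still running after {timeout_s}s: {cmd}", file=sys.stderr)
        return False
    return True
```

Assuming the script above is saved as script.py, `finishes_within(["uv", "run", "--script", "script.py"], 60)` should return False while the hang is present and True once the script gets past step 7.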