
hash() list of string PanicException #21523

Open
2 tasks done
cmdlineluser opened this issue Feb 28, 2025 · 1 comment
Labels
bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars

Comments

@cmdlineluser
Contributor

Checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of Polars.

Reproducible example

import polars as pl

pl.select(pl.lit(["foo"]).hash())
# PanicException: Hashing a list with a non-numeric inner type not supported. Got dtype: List(String)

Log output

thread '<unnamed>' panicked at crates/polars-ops/src/chunked_array/list/hash.rs:49:9:
Hashing a list with a non-numeric inner type not supported. Got dtype: List(String)

Issue description

I think it's just a matter of propagating the error, i.e. raising a regular exception instead of panicking.
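Here is a minimal pure-Python sketch of what "propagating the error" would mean: check the list's inner dtype first and raise an ordinary, catchable exception instead of aborting. Everything in it is hypothetical. `hash_list_values`, the dtype-name strings, and the local `InvalidOperationError` class are illustration only. The class is named after Polars' `pl.exceptions.InvalidOperationError` but is not that class. Python's built-in `hash` stands in for the real hasher.

```python
# Hypothetical sketch, not Polars code. Dtypes are modeled as plain strings.
NUMERIC_INNER = {
    "Int8", "Int16", "Int32", "Int64",
    "UInt8", "UInt16", "UInt32", "UInt64",
    "Float32", "Float64",
}


class InvalidOperationError(Exception):
    """Local stand-in for a regular, catchable error (not Polars' own class)."""


def hash_list_values(inner_dtype: str, rows: list[list]) -> list[int]:
    """Hash each list row, rejecting unsupported inner dtypes with an exception."""
    if inner_dtype not in NUMERIC_INNER:
        # Report the problem to the caller instead of panicking
        raise InvalidOperationError(
            "Hashing a list with a non-numeric inner type not supported. "
            f"Got dtype: List({inner_dtype})"
        )
    # Built-in hash() of a tuple stands in for the real hasher
    return [hash(tuple(row)) for row in rows]


try:
    hash_list_values("String", [["foo"]])
except InvalidOperationError as exc:
    print(f"caught: {exc}")
```

The caller can handle the failure with an ordinary `try`/`except`, which a panic does not reliably allow.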

We can cast to Categorical before hashing.

pl.select(pl.lit(["foo"]).cast(pl.List(pl.Categorical)).hash())
# shape: (1, 1)
# ┌─────────────────────┐
# │ literal             │
# │ ---                 │
# │ u64                 │
# ╞═════════════════════╡
# │ 2451233427067226091 │
# └─────────────────────┘

Expected behavior

No panic.

Installed versions

--------Version info---------
Polars:              1.23.0
Index type:          UInt32
Platform:            macOS-13.6.1-arm64-arm-64bit-Mach-O
Python:              3.13.0 (main, Oct  7 2024, 05:02:14) [Clang 15.0.0 (clang-1500.1.0.2.5)]
LTS CPU:             False

----Optional dependencies----
adbc_driver_manager  <not installed>
altair               <not installed>
boto3                <not installed>
cloudpickle          <not installed>
connectorx           <not installed>
deltalake            <not installed>
fastexcel            0.12.0
fsspec               <not installed>
gevent               <not installed>
google.auth          <not installed>
great_tables         0.14.0
matplotlib           <not installed>
nest_asyncio         <not installed>
numpy                2.1.3
openpyxl             3.1.5
pandas               2.2.3
pyarrow              18.0.0
pydantic             <not installed>
pyiceberg            <not installed>
sqlalchemy           <not installed>
torch                <not installed>
xlsx2csv             <not installed>
xlsxwriter           3.2.0
@cmdlineluser cmdlineluser added bug Something isn't working needs triage Awaiting prioritization by a maintainer python Related to Python Polars labels Feb 28, 2025
@cmdlineluser cmdlineluser changed the title hash list of string PanicException hash() list of string PanicException Feb 28, 2025
@mcrumiller
Contributor

> We can cast to Categorical before hashing.

You can, but I'd be careful with that. A categorical's hash is simply the hash of its underlying physical representation (the integer category code), so it is completely independent of the string itself.
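The claim above can be modeled in a few lines of plain Python. This is a simplification. It assumes that each Series without a shared cache numbers its categories from 0 in order of first appearance. BLAKE2b from `hashlib` stands in for Polars' internal hasher. `hash_u64`, `encode_local` and `hash_categorical` are hypothetical names. Because only the codes are hashed, any two single-string Series collide.

```python
import hashlib


def hash_u64(value: int) -> int:
    """Deterministic 64-bit hash of an integer (stand-in for Polars' hasher)."""
    digest = hashlib.blake2b(value.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def encode_local(strings: list[str]) -> list[int]:
    """Assign codes per Series, starting at 0, in order of first appearance."""
    mapping: dict[str, int] = {}
    return [mapping.setdefault(s, len(mapping)) for s in strings]


def hash_categorical(strings: list[str]) -> list[int]:
    # Only the physical codes reach the hasher; the string contents never do
    return [hash_u64(code) for code in encode_local(strings)]


print(hash_categorical(["test"]) == hash_categorical(["Completely different string"]))  # True
print(hash_categorical(["test"]) == hash_categorical(["yet another string"]))  # True
```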

>>> import polars as pl
>>> pl.Series(["test"], dtype=pl.Categorical).hash()
Series: '' [u64]
[
        16718381540940362211
]
>>> pl.Series(["Completely different string"], dtype=pl.Categorical).hash()
Series: '' [u64]
[
        16718381540940362211
]
>>> pl.Series(["yet another string"], dtype=pl.Categorical).hash()
Series: '' [u64]
[
        16718381540940362211
]

These series were not created under the same global string cache, so each one has its own independent physical mapping whose codes start at 0. As a result, two Series holding different strings hash identically:

>>> s1 = pl.Series(["first string"], dtype=pl.Categorical)
>>> s2 = pl.Series(["a new different string"], dtype=pl.Categorical)
>>> s1.hash() == s2.hash()
Series: '' [bool]
[
        true
]
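A toy model in plain Python shows why a shared cache only partly helps. In Polars the shared cache is enabled with the `pl.StringCache()` context manager. Here it is modeled, as an assumption, as one mapping reused by every Series. `SharedCache` and `hash_u64` are hypothetical names, and BLAKE2b stands in for Polars' hasher. Distinct strings then get distinct codes. A code still records insertion order rather than content, though, so the same string can hash differently in another cache or process.

```python
import hashlib


def hash_u64(value: int) -> int:
    """Deterministic 64-bit hash of an integer (stand-in for Polars' hasher)."""
    digest = hashlib.blake2b(value.to_bytes(8, "little"), digest_size=8).digest()
    return int.from_bytes(digest, "little")


class SharedCache:
    """Hypothetical model of a global string cache: one mapping for all Series."""

    def __init__(self) -> None:
        self.mapping: dict[str, int] = {}

    def encode(self, strings: list[str]) -> list[int]:
        # New strings get the next free code; known strings reuse theirs
        return [self.mapping.setdefault(s, len(self.mapping)) for s in strings]

    def hash_strings(self, strings: list[str]) -> list[int]:
        # Still hashes codes, not the strings themselves
        return [hash_u64(code) for code in self.encode(strings)]


cache = SharedCache()
h1 = cache.hash_strings(["first string"])            # code 0
h2 = cache.hash_strings(["a new different string"])  # code 1
print(h1 == h2)  # False: distinct codes give distinct hashes

other = SharedCache()
other.hash_strings(["a new different string"])       # inserted first here, so code 0
print(h1 == other.hash_strings(["first string"]))    # False: same string, code 1 here
```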
