Summary
Large integral string values lose precision in two runtime conversion paths because they round-trip through float before converting to int.
This causes valid 64-bit values to be corrupted, including LongType.max.
Affected paths
- StringLiteral.to(LongType) in pyiceberg/expressions/literals.py
- partition_to_py for LongType, TimestampType, TimestampNanoType, TimestamptzType, and TimestamptzNanoType in pyiceberg/conversions.py
Minimal repro
from pyiceberg.expressions.literals import literal
from pyiceberg.conversions import partition_to_py
from pyiceberg.types import LongType, TimestampNanoType
print(literal(str(LongType.max)).to(LongType()))
print(literal("9007199254740993").to(LongType()).value)
print(partition_to_py(LongType(), "9007199254740993"))
print(partition_to_py(TimestampNanoType(), "9007199254740993"))
Actual
LongAboveMax
9007199254740992
9007199254740992
9007199254740992
Expected
LongLiteral(9223372036854775807)
9007199254740993
9007199254740993
9007199254740993
Why this happens
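Both code paths currently do int(float(...)). A 64-bit IEEE-754 float has a 53-bit significand, so it represents every integer exactly only up to 2^53 (9007199254740992). Larger integer strings are rounded to the nearest representable float before being converted back to int.
In particular, float("9223372036854775807") rounds up to 2^63 (9223372036854775808), which is one above LongType.max, so literal(str(LongType.max)).to(LongType()) returns LongAboveMax even though LongType.max is valid by definition.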
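The mechanism can be reproduced without pyiceberg. The following is a minimal sketch in plain Python; `via_float`, `exact`, and `LONG_MAX` are illustrative names for this example, not pyiceberg APIs, and `LONG_MAX` simply stands in for LongType.max.

```python
# Plain Python, no pyiceberg required.
LONG_MAX = 2**63 - 1  # 9223372036854775807, stands in for LongType.max


def via_float(text: str) -> int:
    # The lossy pattern: the string is first rounded to the nearest float64.
    return int(float(text))


def exact(text: str) -> int:
    # Exact parsing: Python ints have arbitrary precision, so nothing is rounded.
    return int(text)


for text in ("9007199254740993", str(LONG_MAX)):
    print(f"{text}: via float = {via_float(text)}, exact = {exact(text)}")
```

For the first input the float path lands on 2^53, one below the original value; for the second it lands on 2^63, one above LONG_MAX, which is why a valid long is reported as out of range.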
Cross-implementation note
I checked the same behavior in the other Iceberg implementations:
- Java uses exact integer parsing for partition strings via Long.valueOf(asString) / Integer.valueOf(asString), and its core StringLiteral.to(...) path does not support string-to-int/long conversion.
- Go uses strconv.ParseInt(..., 10, 64) for string-to-int32 / string-to-int64 literal conversion.
- Rust parses string integers with parse::<i128>() and then narrows with explicit above/below-range handling.
- C++ does not expose the same string-to-int/long cast path in its expression literal API; numeric literals are already typed.
So this looks Python-specific rather than an implementation detail shared across Iceberg clients.
Notes
This looks like one underlying bug rather than two separate issues, since both paths have the same precision-loss mechanism.
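Since both paths share the mechanism, one possible direction is a single exact-parsing helper used by both. The sketch below is only an illustration of that idea: `parse_long_exact`, `LONG_MIN`, and `LONG_MAX` are hypothetical names, not existing pyiceberg functions, and raising on out-of-range input is an assumption of the sketch. The real code would presumably keep whatever out-of-range handling it already has (for example the LongAboveMax result shown above). Whether non-integral strings such as "1.0" should still be accepted is a separate question; this sketch rejects them.

```python
LONG_MIN = -(2**63)   # assumed lower bound of a 64-bit signed long
LONG_MAX = 2**63 - 1  # assumed upper bound, stands in for LongType.max


def parse_long_exact(text: str) -> int:
    """Hypothetical helper: parse a base-10 integer string without going through float."""
    # int() parses the digits exactly; it raises ValueError for text such as "1.5".
    value = int(text)
    # Range check done on the exact integer, so LONG_MAX itself is accepted.
    if value > LONG_MAX or value < LONG_MIN:
        raise ValueError(f"{text} is outside the 64-bit signed range")
    return value


print(parse_long_exact("9007199254740993"))
print(parse_long_exact(str(LONG_MAX)))
```

Doing the range check after exact parsing mirrors the parse-then-narrow approach described for the Rust implementation above.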