
Large integral string values lose precision in literal and partition conversion #3404

@fallintoplace

Description

Summary

Large integral string values lose precision in two runtime conversion paths because they round-trip through float before converting to int.

This causes valid 64-bit values to be corrupted, including LongType.max.

Affected paths

  • StringLiteral.to(LongType) in pyiceberg/expressions/literals.py
  • partition_to_py for LongType, TimestampType, TimestampNanoType, TimestamptzType, and TimestamptzNanoType in pyiceberg/conversions.py
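All five partition types are stored as 64-bit integers. For TimestampType and TimestamptzType the value is microseconds since the epoch. For the Nano variants it is nanoseconds since the epoch.

The sketch below uses an arbitrary instant that does not come from the issue. It suggests why the nanosecond types are especially exposed. Present-day microsecond values still sit below 2**53. Present-day nanosecond values are around 1.7e18, well past it, so `int(float(...))` corrupts most of them.

```python
from datetime import datetime, timezone

EXACT_LIMIT = 2**53  # above this, doubles can no longer represent every integer

# Arbitrary example instant (illustrative only), with sub-second digits.
instant = datetime(2024, 1, 1, 12, 0, 0, 123456, tzinfo=timezone.utc)
delta = instant - datetime(1970, 1, 1, tzinfo=timezone.utc)

# Exact integer offsets from the epoch, computed with integer arithmetic only.
micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
nanos = micros * 1_000 + 789  # pretend the source had sub-microsecond precision

for unit, value in (("micros", micros), ("nanos", nanos)):
    text = str(value)  # partition values arrive at partition_to_py as strings
    survives = int(float(text)) == value
    print(f"{unit}: {value} above 2**53: {value > EXACT_LIMIT} survives float round-trip: {survives}")

# micros: 1704110400123456 above 2**53: False survives float round-trip: True
# nanos: 1704110400123456789 above 2**53: True survives float round-trip: False
```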

Minimal repro

from pyiceberg.expressions.literals import literal
from pyiceberg.conversions import partition_to_py
from pyiceberg.types import LongType, TimestampNanoType

print(literal(str(LongType.max)).to(LongType()))
print(literal("9007199254740993").to(LongType()).value)
print(partition_to_py(LongType(), "9007199254740993"))
print(partition_to_py(TimestampNanoType(), "9007199254740993"))

Actual

LongAboveMax
9007199254740992
9007199254740992
9007199254740992

Expected

LongLiteral(9223372036854775807)
9007199254740993
9007199254740993
9007199254740993

Why this happens

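Both code paths currently do int(float(...)). An IEEE-754 double has a 53-bit significand, so it represents every integer exactly only up to 2**53 (9007199254740992). Above that, float() silently rounds to the nearest representable value before int() ever sees the number.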
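This minimal sketch uses plain Python with no PyIceberg imports. It compares the float round-trip with a direct int() parse for the repro value 9007199254740993 (2**53 + 1). That value lies exactly halfway between two representable doubles, and round-half-to-even picks 2**53.

```python
LIMIT = 2**53  # 9007199254740992, last point where every integer is exact as a double

text = "9007199254740993"  # LIMIT + 1, the value used in the repro

via_float = int(float(text))  # current behaviour: round-trips through float
exact = int(text)             # exact parse: Python ints have arbitrary precision

print(via_float)  # 9007199254740992
print(exact)      # 9007199254740993
```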

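In particular, this makes literal(str(LongType.max)).to(LongType()) return LongAboveMax even though LongType.max is valid by definition. The reason is that float("9223372036854775807") rounds up to 2**63, which is one greater than LongType.max.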
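The sketch below hardcodes 2**63 - 1 instead of importing LongType so it stays self-contained. Doubles near 2**63 are spaced 1024 apart, so the nearest representable value to LongType.max is 2**63 itself. After int() that result is out of range, which explains the LongAboveMax result.

```python
LONG_MAX = 2**63 - 1  # 9223372036854775807, the same value as LongType.max

via_float = int(float(str(LONG_MAX)))
print(via_float)             # 9223372036854775808 (== 2**63)
print(via_float > LONG_MAX)  # True, so a range check reports "above max"

print(int(str(LONG_MAX)) == LONG_MAX)  # True: an exact parse stays in range
```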

Cross-implementation note

I checked the same behavior in the other Iceberg implementations:

  • Java uses exact integer parsing for partition strings via Long.valueOf(asString) / Integer.valueOf(asString), and its core StringLiteral.to(...) path does not support string-to-int/long conversion.
  • Go uses strconv.ParseInt(..., 10, 64) for string-to-int32 / string-to-int64 literal conversion.
  • Rust parses string integers with parse::<i128>() and then narrows with explicit above/below-range handling.
  • C++ does not expose the same string-to-int/long cast path in its expression literal API; numeric literals are already typed.
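A Python analogue of the Rust approach might look like the sketch below: parse at arbitrary precision, then narrow with explicit above/below checks. It is not PyIceberg code. `parse_long_exact` and the `Overflow` enum are hypothetical names, and `Overflow` stands in for PyIceberg's LongAboveMax / LongBelowMin. Python's int() already has arbitrary precision, so it plays the role of Rust's `parse::<i128>()`.

```python
from enum import Enum
from typing import Union

LONG_MIN = -(2**63)
LONG_MAX = 2**63 - 1


class Overflow(Enum):
    """Hypothetical stand-ins for LongAboveMax / LongBelowMin."""

    ABOVE_MAX = "above max"
    BELOW_MIN = "below min"


def parse_long_exact(text: str) -> Union[int, Overflow]:
    """Parse a base-10 integer string exactly, then narrow to the signed 64-bit range.

    Hypothetical helper for illustration only; not part of pyiceberg.
    """
    value = int(text)  # exact, no float round-trip; raises ValueError for "1.5" or "1e3"
    if value > LONG_MAX:
        return Overflow.ABOVE_MAX
    if value < LONG_MIN:
        return Overflow.BELOW_MIN
    return value


print(parse_long_exact("9223372036854775807"))  # 9223372036854775807
print(parse_long_exact("9007199254740993"))     # 9007199254740993
print(parse_long_exact("9223372036854775808"))
```

One design point to weigh: int(float(...)) accepts strings such as "1e3" or "2.0". int() rejects them with ValueError. Whether that stricter behaviour is acceptable would be a decision for the maintainers.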

So this looks Python-specific rather than an implementation detail shared across Iceberg clients.

Notes

This looks like one underlying bug rather than two separate issues, since both paths have the same precision-loss mechanism.
