
@sanjay20m commented Sep 19, 2025

This PR improves Python integer type inference in Arrow.

Previously, TypeInferrer would always default to int64, which caused
an OverflowError when encountering values larger than 2**63 - 1.

Changes in this PR:

  • Introduced InferIntegerType helper to select the smallest fitting type
    (int8, int16, int32, int64, uint8, uint16, uint32, uint64).
  • Extended TypeInferrer to track min_int_ and max_int_ while processing integers.
  • Replaced the unconditional int64 fallback with InferIntegerType.
  • Added safe PyObject reference handling for min/max tracking.

Impact:

  • Fixes incorrect inference for large positive integers by enabling uint64.
  • Prevents overflows and improves efficiency by choosing smaller integer types
    when possible.

- Added InferIntegerType helper to determine the smallest fitting integer
  or unsigned integer type based on observed min/max Python integer values.
- Updated TypeInferrer to track min_int_ and max_int_ during inference.
- Replaced default int64 inference with InferIntegerType to prevent
  incorrect narrowing and to support uint64 for values > 2**63 - 1.
- Ensured correct reference management of tracked PyObjects.
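
For readers who want to see the selection rule concretely, here is a minimal Python sketch of the idea described above: track the smallest and largest integer seen, then pick the narrowest type whose range covers both. It is not the C++ `InferIntegerType` from this PR. The names `track_min_max` and `infer_integer_type`, the tie-breaking order (signed before unsigned at equal width), the `int64` result for empty input, and returning `None` when nothing fits are all assumptions made for illustration.

```python
# Illustrative sketch only; the real logic lives in the C++ TypeInferrer.
# Candidates are ordered by width, signed before unsigned at equal width.
# That ordering is an assumption, not something the PR description states.
_CANDIDATES = [
    ("int8", -2**7, 2**7 - 1),
    ("uint8", 0, 2**8 - 1),
    ("int16", -2**15, 2**15 - 1),
    ("uint16", 0, 2**16 - 1),
    ("int32", -2**31, 2**31 - 1),
    ("uint32", 0, 2**32 - 1),
    ("int64", -2**63, 2**63 - 1),
    ("uint64", 0, 2**64 - 1),
]


def track_min_max(values):
    """Return (min, max) over the integers seen, or (None, None) if empty."""
    min_int = max_int = None
    for v in values:
        if min_int is None or v < min_int:
            min_int = v
        if max_int is None or v > max_int:
            max_int = v
    return min_int, max_int


def infer_integer_type(values):
    """Return the name of the narrowest candidate type that fits all values."""
    min_int, max_int = track_min_max(values)
    if min_int is None:
        # Assumption: with no integers observed, keep the old int64 default.
        return "int64"
    for name, lo, hi in _CANDIDATES:
        if lo <= min_int and max_int <= hi:
            return name
    # Assumption: signal "no single integer type fits" with None.
    return None
```

Under these assumptions, a value of `2**63` alone maps to `uint64`, while mixing a negative value with `2**63` fits no candidate, since no 64-bit type covers both ends of that range.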

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

@sanjay20m
Author

@raulcd @AlenkaF @rok Please review

@AlenkaF changed the title from "Improve integer type inference with min/max tracking" to "GH-47607: [C++][Python] Improve integer type inference with min/max tracking" on Sep 23, 2025

⚠️ GitHub issue #47607 has been automatically assigned in GitHub to PR creator.

Member

@raulcd left a comment

Thanks for the PR.
Could you add tests that reproduce the behavior this PR is meant to fix?
