fix: trim before parsing numbers#9537

Open
aryan-212 wants to merge 1 commit into apache:main from aryan-212:trim-num-str

@aryan-212 aryan-212 commented Mar 11, 2026

Which issue does this PR close?

Rationale for this change

The Parser::parse implementations for numeric types did not trim whitespace before parsing. This caused values like " 42 " or " 1.5 " to fail parsing and return None, even though they represent valid numbers.
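The failure mode described above can be reproduced with the standard library alone. The following is a minimal sketch, assuming `str::parse` as a stand-in for the crate's internal parsing routines (which this PR description does not show); the function names are hypothetical and exist only for illustration.

```rust
/// Parse without trimming: whitespace-padded input is rejected.
/// (Hypothetical helper; uses std's `str::parse`, not arrow-cast internals.)
pub fn parse_i32_strict(s: &str) -> Option<i32> {
    s.parse::<i32>().ok()
}

/// Parse after trimming leading and trailing whitespace
/// (spaces, tabs, newlines and other Unicode whitespace).
pub fn parse_i32_trimmed(s: &str) -> Option<i32> {
    s.trim().parse::<i32>().ok()
}

/// The same idea for a float type.
pub fn parse_f64_trimmed(s: &str) -> Option<f64> {
    s.trim().parse::<f64>().ok()
}
```

With these helpers, `parse_i32_strict(" 42 ")` yields `None` while `parse_i32_trimmed(" 42 ")` yields `Some(42)`. Whitespace inside the number (for example `"4 2"`) is still rejected, since `trim` only touches the ends of the string.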

What changes are included in this PR?

  • Added .trim() calls before parsing in FloatType Parser implementations.
  • Added string.trim() at the top of the parser_primitive! macro, which covers all integer and duration types.
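To illustrate why a single `trim()` at the top of a macro covers many types at once, here is a rough sketch. It does not reproduce the real `parser_primitive!` macro, whose body is not shown in this PR description; the trait `SketchParser`, the macro `sketch_parser_primitive!`, and the use of `str::parse` are all assumptions made for the example.

```rust
/// Hypothetical stand-in for the crate's `Parser` trait.
pub trait SketchParser: Sized {
    fn parse_str(string: &str) -> Option<Self>;
}

/// Hypothetical macro: generates one impl per listed type.
macro_rules! sketch_parser_primitive {
    ($($t:ty),* $(,)?) => {
        $(
            impl SketchParser for $t {
                fn parse_str(string: &str) -> Option<Self> {
                    // Trim once here, so every type generated by the
                    // macro accepts whitespace-padded input.
                    let string = string.trim();
                    // Assumption: std parsing stands in for the real parser.
                    string.parse::<$t>().ok()
                }
            }
        )*
    };
}

sketch_parser_primitive!(i8, i16, i32, i64, u8, u16, u32, u64);
```

Because the trim lives in the macro body, adding a new type to the invocation list picks up the same behaviour without any per-type change. Inputs that are invalid for other reasons, such as a negative value for an unsigned type, still return `None`.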

Are these changes tested?

Yes. Added test_parse_trimmed_whitespace covering:

  • Float types with leading/trailing spaces and tabs/newlines
  • Signed and unsigned integer types with whitespace
  • Negative integers with whitespace
  • Whitespace-only strings returning None

DataFusion changes

For the following SQL:

 SELECT
    substring('Suite 28', 6) AS extracted,
    length(substring('Suite 28', 6)) AS extracted_length,
    CAST(substring('Suite 28', 6) AS INT) AS extracted_int,
    CAST(substring('Suite 28', 6) AS INT) + 1 AS plus_one;

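DataFusion used to return the following. The extracted value keeps its leading space (' 28'), which is why its length is 3 and why the cast failed:

extracted  extracted_length  extracted_int  plus_one
 28        3                 null           null

After these changes, it returns:

extracted  extracted_length  extracted_int  plus_one
 28        3                 28             29

This behaviour is now aligned with Databricks.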
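The query result can be traced by hand with a small model of the two steps involved. This is only a sketch under stated assumptions: `sql_substring_from` is a hypothetical helper approximating SQL's 1-based `substring(s, start)`, and the two cast helpers use std's `str::parse` rather than DataFusion's or arrow-cast's actual cast kernels.

```rust
/// Hypothetical model of SQL `substring(s, start)` with a 1-based start.
pub fn sql_substring_from(s: &str, start: usize) -> String {
    s.chars().skip(start.saturating_sub(1)).collect()
}

/// Cast model before the change: no trimming, so " 28" fails.
pub fn cast_int_strict(s: &str) -> Option<i32> {
    s.parse::<i32>().ok()
}

/// Cast model after the change: trim first, then parse.
pub fn cast_int_trimmed(s: &str) -> Option<i32> {
    s.trim().parse::<i32>().ok()
}
```

Position 6 of 'Suite 28' is the space, so `sql_substring_from("Suite 28", 6)` gives `" 28"` (three characters). The strict cast maps that to `None` (SQL null), while the trimmed cast gives `Some(28)`, and adding one gives `Some(29)`.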

Are there any user-facing changes?

Yes. Numeric parsing now accepts strings with leading/trailing whitespace. This is a relaxation of the previous behaviour (previously None, now Some(value)), so it is not a breaking change.

@tustvold
Contributor

Have you run the benchmarks for this?

@aryan-212
Author

> Have you run the benchmarks for this?

Sorry, I'm new here. Could you tell me how to run them? 😅

@Rafferty97
Contributor

>> Have you run the benchmarks for this?

> Sorry, I'm new here. Could you tell me how to run them? 😅

I think "cargo bench -p arrow-cast" should be sufficient.

Development

Successfully merging this pull request may close these issues.

arrow-cast numeric parsers fail to parse whitespace-padded strings

3 participants