Skip to content

ORC-2166: [C++] Guard C++ integer arithmetic against overflow#2622

Open
ffacs wants to merge 3 commits into
apache:mainfrom
ffacs:string-direct-length-overflow
Open

ORC-2166: [C++] Guard C++ integer arithmetic against overflow#2622
ffacs wants to merge 3 commits into
apache:mainfrom
ffacs:string-direct-length-overflow

Conversation

@ffacs
Copy link
Copy Markdown
Contributor

@ffacs ffacs commented May 12, 2026

What changes were proposed in this pull request?

This PR adds reusable checked integer arithmetic helpers for the C++ code: addWithOverflow and multiplyWithOverflow.

The patch uses these helpers to reject overflow-prone arithmetic in reader and buffer paths, including direct string length accumulation, list/map length accumulation, dictionary offsets, float/double skip byte counts, union child counts, DataBuffer allocation sizes, BlockBuffer size/capacity calculations, and list/map vector offset capacity calculations.

Regression tests were added for the overflow helpers and malformed string/list/map length streams.

Why are the changes needed?

Malformed ORC files can provide invalid or extremely large values in length streams. Negative lengths or overflowing length sums can cause unchecked integer wraparound, incorrectly sized allocations, wrapped offsets, or unsafe skip/copy behavior.

Centralizing checked arithmetic makes these cases fail cleanly before unsafe allocation or memory access. Reader-facing malformed input now fails with ParseError.

How was this patch tested?

Ran the full C++ test target:

 cmake --build build --target test-out

Result:

1/2 Test #1: orc-test .... Passed
2/2 Test #2: tool-test .... Passed

100% tests passed, 0 tests failed out of 2

Also ran targeted tests during development for the new overflow checks.

Was this patch authored or co-authored using generative AI tooling?

Yes. Generated with OpenAI Codex.

Comment thread c++/src/ColumnReader.cc Outdated
hasNegativeLength |= value < 0;
size_t length = static_cast<size_t>(value);
size_t nextTotalLength = totalLength + length;
hasLengthOverflow |= nextTotalLength < totalLength;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like skip() method, shall we throw here instead of waiting for the whole loop is finished?

@ffacs ffacs changed the title ORC-2166: [C++] Validate string lengths in direct encoding reader ORC-2166: [C++] Guard C++ integer arithmetic against overflow May 24, 2026
@ffacs ffacs force-pushed the string-direct-length-overflow branch from ffa2af2 to 0c3a0c7 Compare May 24, 2026 13:39
@ffacs ffacs force-pushed the string-direct-length-overflow branch from 0c3a0c7 to 266b26b Compare May 24, 2026 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants