IPC code writes data with insufficient alignment #5553

Closed
@hzuo

Describe the bug

FileWriter and StreamWriter should ensure that the data is written with appropriate alignment such that arrays can be used without copying to a more-aligned buffer.

In particular, as of Rust 1.77.0 and LLVM 18, i128 has a 16-byte alignment requirement even on x86 (ARM always had this requirement), i.e. std::mem::align_of::<i128>() == 16. Decimal128Array values must therefore be aligned to a 16-byte boundary when serialized into an IPC buffer. The pad_to_8 helper used everywhere in the IPC code pads only to an 8-byte boundary, which is insufficient.
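To make the padding arithmetic concrete, here is a minimal sketch of a helper that pads to an arbitrary power-of-two alignment rather than a fixed 8 bytes. The name `pad_to_alignment` is an assumption for illustration, not necessarily the helper used in the crate or in the PR.

```rust
/// Returns the number of zero bytes needed to advance `len` to the next
/// multiple of `alignment`. `alignment` is assumed to be a power of two.
/// (Hypothetical helper for illustration; not taken from the arrow crates.)
fn pad_to_alignment(alignment: usize, len: usize) -> usize {
    debug_assert!(alignment.is_power_of_two());
    let mask = alignment - 1;
    // Round `len` up to a multiple of `alignment`, then take the difference.
    ((len + mask) & !mask) - len
}
```

For example, if 24 bytes precede a Decimal128 buffer, `pad_to_alignment(8, 24)` is 0, so the values would start at offset 24, which is not a multiple of 16. `pad_to_alignment(16, 24)` is 8, which moves the start to offset 32.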

This prevents readers of the IPC data generated by this crate from doing true zero-copy reads (e.g. mmapping) since the data may be insufficiently aligned.
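A reader can only reinterpret a byte range in place if its starting address meets the element type's alignment. The sketch below illustrates that check using only the standard library. `AlignedBytes` is a hypothetical stand-in for an mmapped region with a known base alignment, and `starts_aligned` is an illustrative helper, not an arrow API.

```rust
/// Backing storage with a known 64-byte base alignment, standing in for an
/// mmapped IPC file. (Hypothetical type for illustration.)
#[repr(align(64))]
struct AlignedBytes([u8; 128]);

/// Returns true if `bytes` starts at an address that is a multiple of
/// `alignment` (in bytes). `alignment` must be non-zero.
fn starts_aligned(bytes: &[u8], alignment: usize) -> bool {
    (bytes.as_ptr() as usize) % alignment == 0
}
```

With a 64-byte-aligned base, a slice starting at offset 8 passes `starts_aligned(.., 8)` but fails `starts_aligned(.., 16)`, so a reader would have to copy it into a more-aligned buffer before viewing it as i128 values.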

On some platforms, SIMD may also be significantly slower if the beginning of the IPC block isn't aligned to a 16-, 32-, or 64-byte boundary (as discussed in the Arrow spec document).

To Reproduce

See the test test_decimal128_alignment8_is_unaligned in PR #5554 - the fact that this test throws an error shows that alignment is not currently respected.

Expected behavior

See the test test_decimal128_alignment16 in PR #5554 - increasing alignment should allow us to do "true" zero-copy reads.

Additional context

IpcWriteOptions already has an "alignment" field but it is not being respected throughout the IPC code.
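As a rough sketch of what respecting the option could look like, the code below threads a configurable alignment through a buffer-writing loop. `WriteOptions` and `write_buffers` are hypothetical names used only for illustration. They are not the arrow-ipc `IpcWriteOptions` type or the writer code changed in the PR.

```rust
/// Hypothetical stand-in for a writer option carrying the alignment.
struct WriteOptions {
    /// Required alignment, in bytes, for the start of every buffer (non-zero).
    alignment: usize,
}

/// Appends each buffer to `out`, zero-padding so that every buffer starts at
/// a multiple of `options.alignment` relative to the start of `out`.
/// Returns the starting offset of each buffer.
fn write_buffers(out: &mut Vec<u8>, buffers: &[&[u8]], options: &WriteOptions) -> Vec<usize> {
    let mut offsets = Vec::with_capacity(buffers.len());
    for buffer in buffers {
        // Pad up to the configured alignment instead of a hard-coded 8 bytes.
        let start = out.len().next_multiple_of(options.alignment);
        out.resize(start, 0);
        offsets.push(start);
        out.extend_from_slice(buffer);
    }
    // Pad the tail so that whatever follows also starts aligned.
    let end = out.len().next_multiple_of(options.alignment);
    out.resize(end, 0);
    offsets
}
```

Writing a 24-byte buffer followed by a 16-byte buffer yields offsets 0 and 32 with `alignment: 16`, but 0 and 24 with `alignment: 8`. The offsets are relative to the start of the output, so the reader's base address (for example, an mmapped file) must itself be suitably aligned.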

Related PRs and issues:

Labels: arrow (Changes to the arrow crate), arrow-flight (Changes to the arrow-flight crate), bug
