Optimize arrow-ipc #10029

@Rich-T-kid

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I want to gain more visibility into the runtime performance of arrow-flight. The crate currently has no benchmarks in-tree, which makes a few things hard:

  • Validating that future changes don't regress encode/decode or roundtrip performance.
  • Characterizing where time is actually spent in a Flight roundtrip: gRPC frame assembly, IPC decode, alignment-related copies, etc.
  • Verifying that the zero-copy properties Flight advertises actually hold end-to-end.
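One way the zero-copy and alignment checks could be expressed is by comparing buffer addresses: a decoded buffer that was not copied must point inside the memory of the received frame. The sketch below uses only the standard library; `decode_borrowed`, `decode_copied`, and the 8-byte header are hypothetical stand-ins for illustration, not arrow-flight APIs.

```rust
/// Returns true if `inner` lies entirely within the memory occupied by `outer`.
/// Intended for non-empty slices; empty slices carry no meaningful address.
fn is_subslice_of(inner: &[u8], outer: &[u8]) -> bool {
    let outer_start = outer.as_ptr() as usize;
    let outer_end = outer_start + outer.len();
    let inner_start = inner.as_ptr() as usize;
    let inner_end = inner_start + inner.len();
    inner_start >= outer_start && inner_end <= outer_end
}

/// Returns true if the buffer's start address is a multiple of `align` (must be non-zero).
fn is_aligned(buf: &[u8], align: usize) -> bool {
    (buf.as_ptr() as usize) % align == 0
}

/// Hypothetical zero-copy "decode": borrows the payload that follows an
/// assumed 8-byte header, without copying it.
fn decode_borrowed(frame: &[u8]) -> &[u8] {
    &frame[8..]
}

/// Hypothetical copying "decode": allocates a new buffer for the same payload.
fn decode_copied(frame: &[u8]) -> Vec<u8> {
    frame[8..].to_vec()
}
```

A test would assert `is_subslice_of(decode_borrowed(&frame), &frame)` for the zero-copy path and expect the opposite for the copying path. Against the real crate, the same idea would presumably apply to the data pointers of the decoded Arrow buffers and the bytes of the received message.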

Describe the solution you'd like

  1. Add a benchmark suite to arrow-flight that covers:
  • End-to-end roundtrip benchmarks: a full roundtrip (client → server → client) over a real gRPC channel, measuring throughput and per-batch latency for a representative DoGet / DoPut flow.
  • Encode-only and decode-only benchmarks: isolate the IPC encode and decode steps so they're measured independently and regressions can be attributed cleanly.
  • Tunable batch shape: the benchmarks should be parameterized over the number of columns (and ideally batch size and column types) so we can see how cost scales with schema width. Wide and narrow batches stress different per-column overheads.
  2. A follow-up PR that removes unnecessary copies and makes any other performance optimizations.
  • The benchmarks should demonstrate that these changes are faster.
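
To make the shape of the proposed suite concrete, here is a minimal sketch of separate encode-only and decode-only measurements swept over the number of columns. It uses only the standard library: `make_batch`, `encode`, and `decode` are hypothetical stand-ins (a length-prefixed little-endian layout, not Arrow IPC), and `time_it` is a naive timer. A real suite would more likely use a benchmarking harness such as Criterion and the actual arrow-ipc / arrow-flight code paths.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a record batch: `num_columns` columns of `num_rows` i64 values.
fn make_batch(num_columns: usize, num_rows: usize) -> Vec<Vec<i64>> {
    (0..num_columns)
        .map(|c| (0..num_rows).map(|r| (c * num_rows + r) as i64).collect())
        .collect()
}

/// Stand-in encoder (NOT Arrow IPC): a u64 row count per column, then little-endian values.
fn encode(batch: &[Vec<i64>]) -> Vec<u8> {
    let mut out = Vec::new();
    for col in batch {
        out.extend_from_slice(&(col.len() as u64).to_le_bytes());
        for v in col {
            out.extend_from_slice(&v.to_le_bytes());
        }
    }
    out
}

/// Copies the first 8 bytes of `bytes` into a fixed-size array.
fn read_8(bytes: &[u8]) -> [u8; 8] {
    let mut a = [0u8; 8];
    a.copy_from_slice(&bytes[..8]);
    a
}

/// Stand-in decoder matching `encode`; panics on truncated input.
fn decode(bytes: &[u8]) -> Vec<Vec<i64>> {
    let mut cols = Vec::new();
    let mut pos = 0;
    while pos < bytes.len() {
        let len = u64::from_le_bytes(read_8(&bytes[pos..])) as usize;
        pos += 8;
        let col: Vec<i64> = bytes[pos..pos + len * 8]
            .chunks_exact(8)
            .map(|c| i64::from_le_bytes(read_8(c)))
            .collect();
        pos += len * 8;
        cols.push(col);
    }
    cols
}

/// Calls `f` `iters` times (must be > 0) and returns the mean duration per call.
fn time_it<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed() / iters
}

/// For each column count, measures encode and decode separately.
/// Returns (num_columns, mean encode time, mean decode time).
fn run_sweep(column_counts: &[usize], num_rows: usize, iters: u32) -> Vec<(usize, Duration, Duration)> {
    column_counts
        .iter()
        .map(|&n| {
            let batch = make_batch(n, num_rows);
            let bytes = encode(&batch);
            let enc = time_it(iters, || encode(black_box(&batch)));
            let dec = time_it(iters, || decode(black_box(&bytes)));
            (n, enc, dec)
        })
        .collect()
}
```

For example, `run_sweep(&[1, 8, 64, 512], 8192, 100)` would give one encode and one decode timing per schema width. Keeping the two timings separate is what lets a regression be attributed to one side or the other.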

Describe alternatives you've considered

N/A

Additional context

These resources should provide the background knowledge needed to understand arrow-flight:
Introducing Apache Arrow Flight
Arrow Flight RPC
Arrow IPC
What are flat buffers

TODO: fill in remaining context when I get a chance. Non blocking

Metadata

Labels

enhancement: Any new improvement worthy of an entry in the changelog
