Performance is important in Apache Arrow, so benchmarks are useful when
developing an Apache Arrow implementation.
* Add benchmarks for file and streaming writers.
* Remove redundant type arguments from array constructors.
Here are benchmark results from my environment.
The pure Ruby implementation is about 2.2-2.5x slower than the release
build C++ implementation but about 2.1-2.6x faster than the debug build
C++ implementation.
Release build C++/GLib:
File format:
```console
$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/file-writer.yaml
ruby 4.1.0dev (2026-03-26T07:27:31Z master c5ab2114df) +PRISM [x86_64-linux]
Warming up --------------------------------------
Arrow::Table#save 348.499 i/s - 374.000 times in 1.073175s (2.87ms/i)
Arrow::RecordBatchFileWriter 353.426 i/s - 385.000 times in 1.089337s (2.83ms/i)
ArrowFormat::FileWriter 133.293 i/s - 140.000 times in 1.050314s (7.50ms/i)
Calculating -------------------------------------
Arrow::Table#save 336.984 i/s - 1.045k times in 3.101035s (2.97ms/i)
Arrow::RecordBatchFileWriter 338.695 i/s - 1.060k times in 3.129655s (2.95ms/i)
ArrowFormat::FileWriter 134.640 i/s - 399.000 times in 2.963462s (7.43ms/i)
Comparison:
Arrow::RecordBatchFileWriter: 338.7 i/s
Arrow::Table#save: 337.0 i/s - 1.01x slower
ArrowFormat::FileWriter: 134.6 i/s - 2.52x slower
```
Streaming format:
```console
$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/streaming-writer.yaml
ruby 4.1.0dev (2026-03-26T07:27:31Z master c5ab2114df) +PRISM [x86_64-linux]
Warming up --------------------------------------
Arrow::Table#save 356.995 i/s - 385.000 times in 1.078447s (2.80ms/i)
Arrow::RecordBatchStreamWriter 347.891 i/s - 374.000 times in 1.075050s (2.87ms/i)
ArrowFormat::StreamingWriter 156.709 i/s - 160.000 times in 1.021004s (6.38ms/i)
Calculating -------------------------------------
Arrow::Table#save 350.743 i/s - 1.070k times in 3.050665s (2.85ms/i)
Arrow::RecordBatchStreamWriter 345.821 i/s - 1.043k times in 3.016011s (2.89ms/i)
ArrowFormat::StreamingWriter 160.022 i/s - 470.000 times in 2.937090s (6.25ms/i)
Comparison:
Arrow::Table#save: 350.7 i/s
Arrow::RecordBatchStreamWriter: 345.8 i/s - 1.01x slower
ArrowFormat::StreamingWriter: 160.0 i/s - 2.19x slower
```
Debug build C++/GLib:
File format:
```console
$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/file-writer.yaml
ruby 4.1.0dev (2026-03-26T07:27:31Z master c5ab2114df) +PRISM [x86_64-linux]
Warming up --------------------------------------
Arrow::Table#save 63.290 i/s - 66.000 times in 1.042815s (15.80ms/i)
Arrow::RecordBatchFileWriter 62.655 i/s - 66.000 times in 1.053389s (15.96ms/i)
ArrowFormat::FileWriter 138.082 i/s - 140.000 times in 1.013891s (7.24ms/i)
Calculating -------------------------------------
Arrow::Table#save 63.165 i/s - 189.000 times in 2.992143s (15.83ms/i)
Arrow::RecordBatchFileWriter 61.773 i/s - 187.000 times in 3.027220s (16.19ms/i)
ArrowFormat::FileWriter 134.709 i/s - 414.000 times in 3.073285s (7.42ms/i)
Comparison:
ArrowFormat::FileWriter: 134.7 i/s
Arrow::Table#save: 63.2 i/s - 2.13x slower
Arrow::RecordBatchFileWriter: 61.8 i/s - 2.18x slower
```
Streaming format:
```console
$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/streaming-writer.yaml
ruby 4.1.0dev (2026-03-26T07:27:31Z master c5ab2114df) +PRISM [x86_64-linux]
Warming up --------------------------------------
Arrow::Table#save 63.252 i/s - 66.000 times in 1.043439s (15.81ms/i)
Arrow::RecordBatchStreamWriter 61.272 i/s - 66.000 times in 1.077162s (16.32ms/i)
ArrowFormat::StreamingWriter 152.598 i/s - 160.000 times in 1.048506s (6.55ms/i)
Calculating -------------------------------------
Arrow::Table#save 61.016 i/s - 189.000 times in 3.097525s (16.39ms/i)
Arrow::RecordBatchStreamWriter 63.024 i/s - 183.000 times in 2.903642s (15.87ms/i)
ArrowFormat::StreamingWriter 160.416 i/s - 457.000 times in 2.848846s (6.23ms/i)
Comparison:
ArrowFormat::StreamingWriter: 160.4 i/s
Arrow::RecordBatchStreamWriter: 63.0 i/s - 2.55x slower
Arrow::Table#save: 61.0 i/s - 2.63x slower
```
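The headline ratios can be rechecked from the "Calculating" figures above. The following is a minimal sketch rather than part of the PR: the `RESULTS` table, the `PURE_RUBY` pattern, and the `speed_ratios` helper are names invented here for illustration, and the throughput numbers are copied by hand from the console output.

```ruby
# "Calculating" throughput in iterations per second, copied from the
# benchmark output above. The scenario keys are labels chosen for this sketch.
RESULTS = {
  "release/file" => {
    "Arrow::Table#save" => 336.984,
    "Arrow::RecordBatchFileWriter" => 338.695,
    "ArrowFormat::FileWriter" => 134.640,
  },
  "release/streaming" => {
    "Arrow::Table#save" => 350.743,
    "Arrow::RecordBatchStreamWriter" => 345.821,
    "ArrowFormat::StreamingWriter" => 160.022,
  },
  "debug/file" => {
    "Arrow::Table#save" => 63.165,
    "Arrow::RecordBatchFileWriter" => 61.773,
    "ArrowFormat::FileWriter" => 134.709,
  },
  "debug/streaming" => {
    "Arrow::Table#save" => 61.016,
    "Arrow::RecordBatchStreamWriter" => 63.024,
    "ArrowFormat::StreamingWriter" => 160.416,
  },
}.freeze

# Entries in the ArrowFormat namespace are the pure Ruby writers.
PURE_RUBY = /\AArrowFormat::/

# For each scenario, divide every C++/GLib throughput by the pure Ruby
# throughput. A ratio above 1 means the C++/GLib writer is faster;
# below 1 means the pure Ruby writer is faster.
def speed_ratios(results)
  results.transform_values do |entries|
    ruby_ips = entries.find { |name, _| name.match?(PURE_RUBY) }.last
    entries.reject { |name, _| name.match?(PURE_RUBY) }
           .transform_values { |ips| (ips / ruby_ips).round(2) }
  end
end

RATIOS = speed_ratios(RESULTS)

RATIOS.each do |scenario, ratios|
  ratios.each do |name, ratio|
    puts format("%-18s %-32s %.2fx", scenario, name, ratio)
  end
end
```

For the debug build the ratios fall below 1, so taking the reciprocal gives the "faster" factor reported in the comparison sections.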
Pull request overview
This PR adds new benchmark-driver scenarios for Arrow file/streaming writers in the red-arrow-format Ruby implementation, and refactors several ArrowFormat::*Type#build_array paths to avoid passing redundant type objects into array constructors.
Changes:
- Added benchmark-driver YAML benchmarks for file and streaming writer performance comparisons.
- Updated `*Type#build_array` implementations to instantiate arrays without redundant `type` arguments for singleton types.
- Refactored several `ArrowFormat::*Array` subclasses to derive their `type` from singleton type instances internally.
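As a rough illustration of the refactoring described above, the sketch below uses simplified stand-in classes inside a hypothetical `SingletonTypeSketch` module. Only `NullType.singleton` and the `super(NullType.singleton, size, nil)` call come from the diff in this PR; the `build_array(size)` signature, the `validity_buffer` parameter name, and the base class body are assumptions made for the example.

```ruby
module SingletonTypeSketch
  # Simplified stand-in for the base array class (assumed shape).
  class Array
    attr_reader :type, :size

    def initialize(type, size, validity_buffer)
      @type = type
      @size = size
      @validity_buffer = validity_buffer
    end
  end

  class NullType
    # One shared instance, since a null type carries no parameters.
    def self.singleton
      @singleton ||= new
    end

    # Assumed signature. Before the refactoring the call site would have
    # passed the type explicitly, e.g. NullArray.new(self, size).
    def build_array(size)
      NullArray.new(size)
    end
  end

  class NullArray < Array
    # The array looks up its own type instead of receiving it.
    def initialize(size)
      super(NullType.singleton, size, nil)
    end
  end
end

SKETCH_TYPE = SingletonTypeSketch::NullType.singleton
SKETCH_ARRAY = SKETCH_TYPE.build_array(3)
```

Because the type has exactly one instance, the array built this way reports the same `type` object that the caller would otherwise have passed in.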
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| ruby/red-arrow-format/lib/arrow-format/type.rb | Updates build_array to remove redundant self type arguments for singleton-backed arrays. |
| ruby/red-arrow-format/lib/arrow-format/array.rb | Adds/updates array constructors to infer type from singleton type instances instead of receiving it as an argument. |
| ruby/red-arrow-format/benchmark/file-writer.yaml | New benchmark-driver config covering Arrow file writer performance comparisons. |
| ruby/red-arrow-format/benchmark/streaming-writer.yaml | New benchmark-driver config covering Arrow streaming writer performance comparisons. |
     class NullArray < Array
    -  def initialize(type, size)
    -    super(type, size, nil)
    +  def initialize(size)
    +    super(NullType.singleton, size, nil)
       end
`NullArray` (and other arrays in this file) now hard-depend on `*Type.singleton` constants (e.g., `NullType.singleton`) but `array.rb` doesn't `require_relative "type"`. This makes `require "arrow-format/array"` (or any changed load order) fragile and can raise `NameError` when these constructors are called before `type.rb` is loaded. Consider making the dependency explicit by requiring `type` from this file (or otherwise ensuring types are loaded before these constructors can run).
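The load-order concern can be reproduced in a single file. This is a hedged sketch, not the library's code: the `LoadOrderSketch` module and its class bodies are stand-ins, and reopening the module in step 2 only simulates what loading `type.rb` would provide.

```ruby
module LoadOrderSketch
  # Simplified stand-in for the base array class (assumed shape).
  class Array
    attr_reader :type, :size

    def initialize(type, size, validity_buffer)
      @type = type
      @size = size
      @validity_buffer = validity_buffer
    end
  end

  class NullArray < Array
    def initialize(size)
      # NullType is resolved when this line runs, not when the class
      # is defined, so a missing definition only surfaces at call time.
      super(NullType.singleton, size, nil)
    end
  end
end

# Step 1: the array side is defined, the type side is not yet.
ERROR_BEFORE_TYPES =
  begin
    LoadOrderSketch::NullArray.new(3)
    nil
  rescue NameError => error
    error.class
  end

# Step 2: stand-in for the definitions that loading the type file provides.
module LoadOrderSketch
  class NullType
    def self.singleton
      @singleton ||= new
    end
  end
end

# Step 3: the same constructor call now succeeds.
ARRAY_AFTER_TYPES = LoadOrderSketch::NullArray.new(3)
```

Requiring the type definitions from the array file, as the comment suggests, would make step 2 happen before any constructor can be called.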
Are these changes tested?
Yes.
Are there any user-facing changes?
Yes.