GH-49656: [Ruby] Add benchmark for writers #49657

Open
kou wants to merge 1 commit into apache:main from kou:ruby-benchmark-writer

kou commented Apr 3, 2026

Rationale for this change

Performance is important in Apache Arrow, so benchmarks are useful when developing an Apache Arrow implementation.

What changes are included in this PR?

  • Add benchmarks for file and streaming writers.
  • Remove redundant type arguments from array constructors.

Here are the benchmark results in my environment.

The pure Ruby implementation is about 2.2-2.5x slower than the release build of the C++ implementation but about 2.1-2.6x faster than the debug build of the C++ implementation.

Release build C++/GLib:

File format:

$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/file-writer.yaml
ruby 4.1.0dev (2026-03-26T07:27:31Z master c5ab2114df) +PRISM [x86_64-linux]
Warming up --------------------------------------
           Arrow::Table#save    348.499 i/s -     374.000 times in 1.073175s (2.87ms/i)
Arrow::RecordBatchFileWriter    353.426 i/s -     385.000 times in 1.089337s (2.83ms/i)
     ArrowFormat::FileWriter    133.293 i/s -     140.000 times in 1.050314s (7.50ms/i)
Calculating -------------------------------------
           Arrow::Table#save    336.984 i/s -      1.045k times in 3.101035s (2.97ms/i)
Arrow::RecordBatchFileWriter    338.695 i/s -      1.060k times in 3.129655s (2.95ms/i)
     ArrowFormat::FileWriter    134.640 i/s -     399.000 times in 2.963462s (7.43ms/i)

Comparison:
Arrow::RecordBatchFileWriter:       338.7 i/s
           Arrow::Table#save:       337.0 i/s - 1.01x  slower
     ArrowFormat::FileWriter:       134.6 i/s - 2.52x  slower

Streaming format:

$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/streaming-writer.yaml
ruby 4.1.0dev (2026-03-26T07:27:31Z master c5ab2114df) +PRISM [x86_64-linux]
Warming up --------------------------------------
             Arrow::Table#save    356.995 i/s -     385.000 times in 1.078447s (2.80ms/i)
Arrow::RecordBatchStreamWriter    347.891 i/s -     374.000 times in 1.075050s (2.87ms/i)
  ArrowFormat::StreamingWriter    156.709 i/s -     160.000 times in 1.021004s (6.38ms/i)
Calculating -------------------------------------
             Arrow::Table#save    350.743 i/s -      1.070k times in 3.050665s (2.85ms/i)
Arrow::RecordBatchStreamWriter    345.821 i/s -      1.043k times in 3.016011s (2.89ms/i)
  ArrowFormat::StreamingWriter    160.022 i/s -     470.000 times in 2.937090s (6.25ms/i)

Comparison:
             Arrow::Table#save:       350.7 i/s
Arrow::RecordBatchStreamWriter:       345.8 i/s - 1.01x  slower
  ArrowFormat::StreamingWriter:       160.0 i/s - 2.19x  slower

Debug build C++/GLib:

File format:

$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/file-writer.yaml
ruby 4.1.0dev (2026-03-26T07:27:31Z master c5ab2114df) +PRISM [x86_64-linux]
Warming up --------------------------------------
           Arrow::Table#save     63.290 i/s -      66.000 times in 1.042815s (15.80ms/i)
Arrow::RecordBatchFileWriter     62.655 i/s -      66.000 times in 1.053389s (15.96ms/i)
     ArrowFormat::FileWriter    138.082 i/s -     140.000 times in 1.013891s (7.24ms/i)
Calculating -------------------------------------
           Arrow::Table#save     63.165 i/s -     189.000 times in 2.992143s (15.83ms/i)
Arrow::RecordBatchFileWriter     61.773 i/s -     187.000 times in 3.027220s (16.19ms/i)
     ArrowFormat::FileWriter    134.709 i/s -     414.000 times in 3.073285s (7.42ms/i)

Comparison:
     ArrowFormat::FileWriter:       134.7 i/s
           Arrow::Table#save:        63.2 i/s - 2.13x  slower
Arrow::RecordBatchFileWriter:        61.8 i/s - 2.18x  slower

Streaming format:

$ ruby -v -S benchmark-driver ruby/red-arrow-format/benchmark/streaming-writer.yaml
ruby 4.1.0dev (2026-03-26T07:27:31Z master c5ab2114df) +PRISM [x86_64-linux]
Warming up --------------------------------------
             Arrow::Table#save     63.252 i/s -      66.000 times in 1.043439s (15.81ms/i)
Arrow::RecordBatchStreamWriter     61.272 i/s -      66.000 times in 1.077162s (16.32ms/i)
  ArrowFormat::StreamingWriter    152.598 i/s -     160.000 times in 1.048506s (6.55ms/i)
Calculating -------------------------------------
             Arrow::Table#save     61.016 i/s -     189.000 times in 3.097525s (16.39ms/i)
Arrow::RecordBatchStreamWriter     63.024 i/s -     183.000 times in 2.903642s (15.87ms/i)
  ArrowFormat::StreamingWriter    160.416 i/s -     457.000 times in 2.848846s (6.23ms/i)

Comparison:
  ArrowFormat::StreamingWriter:       160.4 i/s
Arrow::RecordBatchStreamWriter:        63.0 i/s - 2.55x  slower
             Arrow::Table#save:        61.0 i/s - 2.63x  slower
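
The ratios in the Comparison sections can be re-derived from the reported iterations per second. The following is a minimal sketch, not part of the PR: the `RESULTS` table and the `slowdowns` helper are hypothetical names, and the numbers are copied from the "Calculating" sections above.

```ruby
# Iterations per second copied from the "Calculating" sections above.
# RESULTS and slowdowns are hypothetical names used only for this sketch.
RESULTS = {
  "release/file" => {
    "Arrow::Table#save" => 336.984,
    "Arrow::RecordBatchFileWriter" => 338.695,
    "ArrowFormat::FileWriter" => 134.640,
  },
  "release/streaming" => {
    "Arrow::Table#save" => 350.743,
    "Arrow::RecordBatchStreamWriter" => 345.821,
    "ArrowFormat::StreamingWriter" => 160.022,
  },
  "debug/file" => {
    "Arrow::Table#save" => 63.165,
    "Arrow::RecordBatchFileWriter" => 61.773,
    "ArrowFormat::FileWriter" => 134.709,
  },
  "debug/streaming" => {
    "Arrow::Table#save" => 61.016,
    "Arrow::RecordBatchStreamWriter" => 63.024,
    "ArrowFormat::StreamingWriter" => 160.416,
  },
}.freeze

# For each entry in a group, compute how many times slower it is than
# the fastest entry in the same group (the fastest entry gets 1.0).
def slowdowns(group)
  fastest = group.values.max
  group.transform_values { |ips| (fastest / ips).round(2) }
end

RESULTS.each do |name, group|
  puts name
  slowdowns(group).each do |label, ratio|
    puts format("  %-32s %.2fx", label, ratio)
  end
end
```

Under these inputs the pure Ruby writers come out 2.52x (file) and 2.19x (streaming) slower than the fastest release-build writer, and the debug-build writers come out between 2.13x and 2.63x slower than the pure Ruby writers.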

Are these changes tested?

Yes.

Are there any user-facing changes?

Yes.

Copilot AI review requested due to automatic review settings April 3, 2026 07:15

github-actions bot commented Apr 3, 2026

⚠️ GitHub issue #49656 has been automatically assigned in GitHub to PR creator.

Copilot AI left a comment

Pull request overview

This PR adds new benchmark-driver scenarios for Arrow file/streaming writers in the red-arrow-format Ruby implementation, and refactors several ArrowFormat::*Type#build_array paths to avoid passing redundant type objects into array constructors.

Changes:

  • Added benchmark-driver YAML benchmarks for file and streaming writer performance comparisons.
  • Updated *Type#build_array implementations to instantiate arrays without redundant type arguments for singleton types.
  • Refactored several ArrowFormat::*Array subclasses to derive their type from singleton type instances internally.
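
The idea behind the refactoring summarized above can be sketched as follows. This is an illustration under assumptions, not the code from the PR: the `BuildArraySketch` module, the class names, and the `build_array(size, validity_buffer)` signature are all hypothetical. It also assumes that parameterized types, whose instances differ, still pass themselves to the array constructor.

```ruby
# Hypothetical stand-in for the library; none of these names or
# signatures are taken from the actual red-arrow-format sources.
module BuildArraySketch
  # Minimal base class: every array keeps its type, size and validity buffer.
  class Array
    attr_reader :type, :size, :validity_buffer

    def initialize(type, size, validity_buffer)
      @type = type
      @size = size
      @validity_buffer = validity_buffer
    end
  end

  # A parameter-free type: one shared instance describes every such array.
  class BooleanType
    def self.singleton
      @singleton ||= new
    end

    def build_array(size, validity_buffer)
      # Before the refactoring this would have passed the type explicitly:
      #   BooleanArray.new(self, size, validity_buffer)
      BooleanArray.new(size, validity_buffer)
    end
  end

  class BooleanArray < Array
    def initialize(size, validity_buffer)
      # The type is implied by the array class, so look it up here.
      super(BooleanType.singleton, size, validity_buffer)
    end
  end

  # A parameterized type: instances differ by unit, so the array cannot
  # infer its type and the type argument is not redundant.
  class TimestampType
    attr_reader :unit

    def initialize(unit)
      @unit = unit
    end

    def build_array(size, validity_buffer)
      TimestampArray.new(self, size, validity_buffer)
    end
  end

  # Inherits the three-argument constructor from Array.
  class TimestampArray < Array
  end
end

booleans = BuildArraySketch::BooleanType.singleton.build_array(2, nil)
puts booleans.type.equal?(BuildArraySketch::BooleanType.singleton)
```

The design choice is that an array class tied to exactly one type instance can find that instance itself, which removes one argument from every call site.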

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| ruby/red-arrow-format/lib/arrow-format/type.rb | Updates build_array to remove redundant self type arguments for singleton-backed arrays. |
| ruby/red-arrow-format/lib/arrow-format/array.rb | Adds/updates array constructors to infer type from singleton type instances instead of receiving it as an argument. |
| ruby/red-arrow-format/benchmark/file-writer.yaml | New benchmark-driver config covering Arrow file writer performance comparisons. |
| ruby/red-arrow-format/benchmark/streaming-writer.yaml | New benchmark-driver config covering Arrow streaming writer performance comparisons. |

Comment on lines 142 to 145
   class NullArray < Array
-    def initialize(type, size)
-      super(type, size, nil)
+    def initialize(size)
+      super(NullType.singleton, size, nil)
     end

Copilot AI Apr 3, 2026

NullArray (and other arrays in this file) now hard-depend on *Type.singleton constants (e.g., NullType.singleton) but array.rb doesn’t require_relative "type". This makes require "arrow-format/array" (or any changed load order) fragile and can raise NameError when these constructors are called before type.rb is loaded. Consider making the dependency explicit by requiring type from this file (or otherwise ensuring types are loaded before these constructors can run).
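
The load-order concern in this comment can be reproduced in a single file. The sketch below uses a hypothetical `LoadOrderSketch` module rather than the real library. It relies on Ruby resolving the `NullType` constant when the constructor runs, not when the class body is read. In a multi-file library, the fix suggested in the comment would be a `require_relative "type"` at the top of the array file.

```ruby
# Hypothetical stand-in for the library, kept in one file so that the
# load order can be controlled explicitly.
module LoadOrderSketch
  class Array
    attr_reader :type, :size

    def initialize(type, size, validity_buffer)
      @type = type
      @size = size
      @validity_buffer = validity_buffer
    end
  end

  class NullArray < Array
    def initialize(size)
      # NullType is looked up here, at call time. Defining this class
      # succeeds even though NullType does not exist yet.
      super(NullType.singleton, size, nil)
    end
  end
end

# Returns the NameError raised by the constructor, or nil on success.
def null_array_error
  LoadOrderSketch::NullArray.new(3)
  nil
rescue NameError => error
  error
end

# The "array" part is loaded but the "type" part is not: construction fails.
EARLY_ERROR = null_array_error

# Simulates loading the type definitions afterwards. In separate files,
# requiring the type file from the array file would guarantee this order.
module LoadOrderSketch
  class NullType
    def self.singleton
      @singleton ||= new
    end
  end
end

# With the type loaded, the same constructor call succeeds.
LATE_ERROR = null_array_error

puts "before type is loaded: #{EARLY_ERROR.class}"
puts "after type is loaded:  #{LATE_ERROR.inspect}"
```

Whether this failure can occur in practice depends on how the library's entry point orders its requires, which this sketch does not model.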
