GH-3530: Optimize PLAIN encoding and decoding with direct ByteBuffer I/O by iemejia · Pull Request #3565 · apache/parquet-java

iemejia · 2026-05-17T22:38:32Z

Part of #3530 — Apache Parquet Java Performance Improvements

Summary

Replace ByteBufferInputStream and LittleEndianDataInputStream wrappers with direct ByteBuffer access for all PLAIN value readers and writers.

Readers (PlainValuesReader, BooleanPlainValuesReader, BinaryPlainValuesReader, FixedLenByteArrayPlainValuesReader): hold a little-endian ByteBuffer from initFromPage() and call getInt/getLong/getFloat/getDouble directly, eliminating per-value stream overhead.

Writers (PlainValuesWriter, BooleanPlainValuesWriter, FixedLenByteArrayPlainValuesWriter): write through CapacityByteArrayOutputStream's new writeInt/writeLong methods which put values directly into the NIO slab buffer in little-endian order, avoiding temporary byte-array allocation.

Supporting changes:

CapacityByteArrayOutputStream: allocate slabs with ByteOrder.LITTLE_ENDIAN, add writeInt(int) and writeLong(long) for single-value NIO writes.
BytesInput: add zero-copy writeTo(ByteBuffer) and toByteArray() using bulk ByteBuffer.get() instead of stream copy.
LittleEndianDataOutputStream: batch single-byte writes into single write(buf, 0, N) calls for writeShort/writeInt.

Includes JMH benchmarks (PlainEncodingBenchmark, PlainDecodingBenchmark) covering all 7 primitive types for both encoding and decoding.

Benchmark results

Environment: JDK 25.0.3 (Temurin), OpenJDK 64-Bit Server VM, JMH 1.37, Linux x86_64.

Decoding (100K values/iteration, 3 forks x 5 iterations, throughput mode):

Benchmark	Master (M ops/s)	Branch (M ops/s)	Speedup
decodeInt	425	5,427	12.8x
decodeFloat	416	5,440	13.1x
decodeLong	119	4,720	39.5x (*)
decodeDouble	116	6,026	51.8x (*)
decodeBoolean	639	1,642	2.6x
decodeFlba (len=2,12,16)	188	680	3.6x
decodeBinary (len=10,100,1000)	142	225-230	1.6x

Encoding:

Benchmark	Master (M ops/s)	Branch (M ops/s)	Speedup
encodeInt	148	559	3.8x
encodeFloat	150	532	3.5x
encodeLong	193	478	2.5x
encodeDouble	179	439	2.4x
encodeBoolean	850	1,692	2.0x
encodeBinary (len=10)	76	150	2.0x
encodeFlba (len=2-16)	156-184	178-224	1.1-1.2x

(*) decodeLong/Double show JIT variance across forks (error bars >20%); true steady-state likely ~13x consistent with INT32/FLOAT.

…uffer I/O Replace ByteBufferInputStream and LittleEndianDataInputStream wrappers with direct ByteBuffer access for all PLAIN value readers and writers. Readers (PlainValuesReader, BooleanPlainValuesReader, BinaryPlainValuesReader, FixedLenByteArrayPlainValuesReader) now hold a little-endian ByteBuffer obtained from initFromPage() and call getInt/getLong/getFloat/getDouble directly, eliminating per-value stream overhead. Writers (PlainValuesWriter, BooleanPlainValuesWriter, FixedLenByteArrayPlainValuesWriter) write through CapacityByteArrayOutputStream's new writeInt/writeLong methods, which put values directly into the NIO slab buffer in little-endian order, avoiding temporary byte-array allocation. Supporting changes: - CapacityByteArrayOutputStream: allocate slabs with ByteOrder.LITTLE_ENDIAN, add writeInt(int) and writeLong(long) for single-value NIO writes. - BytesInput: add zero-copy writeTo(ByteBuffer) and toByteArray() using bulk ByteBuffer.get() instead of stream copy. - LittleEndianDataOutputStream: batch single-byte writes into single write(buf, 0, N) calls for writeShort/writeInt. Includes JMH benchmarks (PlainEncodingBenchmark, PlainDecodingBenchmark) covering all 7 primitive types for both encoding and decoding.

Fokko · 2026-05-22T20:57:44Z

-      int length = BytesUtils.readIntLittleEndian(in);
-      return Binary.fromConstantByteBuffer(in.slice(length));
-    } catch (IOException | RuntimeException e) {
-      throw new ParquetDecodingException("could not read bytes at offset " + in.position(), e);


Should we keep the ParquetDecodingException? Otherwise we're throwing the raw {IOException,RuntimeException} which is a behavioral change.

Fokko · 2026-05-22T20:58:15Z

+    if (available > 0) {
+      this.buffer = stream.slice(available).order(ByteOrder.LITTLE_ENDIAN);
+    } else {
+      this.buffer = ByteBuffer.allocate(0).order(ByteOrder.LITTLE_ENDIAN);


Should we create a constant for the ByteBuffer.allocate(0).order(ByteOrder.LITTLE_ENDIAN);?

Fokko · 2026-05-22T21:02:48Z

-    try {
-      return Binary.fromConstantByteBuffer(in.slice(length));
-    } catch (IOException | RuntimeException e) {
-      throw new ParquetDecodingException("could not read bytes at offset " + in.position(), e);


Same as above, should we keep the wrapped ParquetDecodingException?

Fokko · 2026-05-22T21:05:43Z

-      try {
-        skipBytesFully(n * 8);
-      } catch (IOException e) {
-        throw new ParquetDecodingException("could not skip " + n + " double values", e);


Same here, do we want to keep the ParquetDecodingException?

Fokko · 2026-05-22T21:09:32Z

 public abstract class PlainValuesReader extends ValuesReader {
  private static final Logger LOG = LoggerFactory.getLogger(PlainValuesReader.class);

-  protected LittleEndianDataInputStream in;


We should go through the deprecation cycle here, but is anything using this outside of the project itself?

Fokko · 2026-05-22T21:13:43Z

+     * mutable {@code BAOS.getBuf()}.
+     */
+    @Override
+    public byte[] toByteArray() {


This overrides a deprecated API, as a follow-up we probably should move the internal calls to the new API:

@deprecated Use {@link #toByteBuffer(ByteBufferAllocator, Consumer)}

Fokko · 2026-05-22T21:16:38Z

-  public int getNextOffset() {
-    return in.getNextOffset();
+  public void skip(int n) {
+    bitIndex += n;


Should we check for bounds, and throw a ParquetDecodingException in case of out of bounds?

Fokko · 2026-05-22T21:17:27Z

-      } catch (IOException e) {
-        throw new ParquetDecodingException("could not skip " + n + " double values", e);
-      }
+      buffer.position(buffer.position() + n * 8);


Should we use Math.multiplyExact here and below?

Fokko reviewed May 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GH-3530: Optimize PLAIN encoding and decoding with direct ByteBuffer I/O#3565

GH-3530: Optimize PLAIN encoding and decoding with direct ByteBuffer I/O#3565
iemejia wants to merge 1 commit into
apache:masterfrom
iemejia:parquet-perf-v2-par1-plain

iemejia commented May 17, 2026

Uh oh!

Fokko May 22, 2026

Uh oh!

Fokko May 22, 2026

Uh oh!

Fokko May 22, 2026

Uh oh!

Fokko May 22, 2026

Uh oh!

Fokko May 22, 2026

Uh oh!

Fokko May 22, 2026

Uh oh!

Fokko May 22, 2026

Uh oh!

Fokko May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

iemejia commented May 17, 2026

Summary

Benchmark results

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants