Skip to content

fix(parquet): add WriteBatchSpacedWithError to surface spaced-write failures#852

Merged
zeroshade merged 4 commits into
apache:mainfrom
zeroshade:fix/parquet-writebatchspaced-error
Jun 18, 2026
Merged

fix(parquet): add WriteBatchSpacedWithError to surface spaced-write failures#852
zeroshade merged 4 commits into
apache:mainfrom
zeroshade:fix/parquet-writebatchspaced-error

Conversation

@zeroshade

@zeroshade zeroshade commented Jun 11, 2026

Copy link
Copy Markdown
Member

Rationale for this change

Closes #826.

Across all 8 generated *ColumnChunkWriter types in parquet/file/column_writer_types.gen.go, WriteBatch and WriteBatchSpaced are an asymmetric pair:

  • WriteBatch returns (valueOffset int64, err error) and opens with a defer recover() that converts any panic into a returned error via utils.FormatRecoveredError.
  • WriteBatchSpaced returns nothing, has no defer recover(), and on the doBatches (non-byte-array) path silently discards the error returned by commitWriteAndCheckPageLimit.

So a write failure during a spaced write is either silently discarded (numeric/boolean/fixed paths) or escapes as a panic out of the calling goroutine (the byte-array branch's panic(err), plus other internal panics such as dictionary-fallback bookkeeping) — because, unlike WriteBatch, there is no recover() to turn it into an error.

This is the same class of "panic escapes / error lost on the public API" bug as #820 (fixed in #824), on a different code path. It is reachable through pqarrow(*pqarrow.FileWriter) drives WriteBatchSpaced for nullable columns — so for consumers writing nullable columns to a network-attached sink (GCS, S3 multipart, HDFS, etc.), a transient downstream failure on the spaced path crashes the worker process or silently drops the error instead of returning it.

The Boolean-only WriteBitmapBatchSpaced has the same void-signature / no-recover shape and is fixed here too.

What changes are included in this PR?

  1. New method WriteBatchSpacedWithError(values, defLevels, repLevels, validBits, validBitsOffset) (int64, error) generated for all 8 column-writer types from column_writer_types.gen.go.tmpl. It mirrors WriteBatch:

    • opens with defer func() { if r := recover(); r != nil { err = utils.FormatRecoveredError(...) } }(), and
    • checks the error from commitWriteAndCheckPageLimit on every branch (the doBatches branch previously ignored it), panic-ing so the deferred recover surfaces it as a returned error.
  2. New method WriteBitmapBatchSpacedWithError(...) (int64, error) on *BooleanColumnChunkWriter, applying the same treatment to the bitmap spaced-write path.

  3. WriteBatchSpaced and WriteBitmapBatchSpaced are left byte-for-byte unchanged — same signatures, same behavior. This keeps the change non-breaking and purely additive, exactly the approach fix(parquet): return error instead of panicking on first-write failure #824 took with NewParquetWriter / NewParquetWriterWithError. Existing callers are unaffected; callers that want error handling opt into the …WithError variants.

  4. pqarrow's dense-array encoder (writeDenseArrow) now calls WriteBatchSpacedWithError at all 12 spaced-write sites and propagates the error through its existing (err error) return, so nullable-column writes report sink failures as errors instead of crashing the goroutine or losing the error. (WriteBitmapBatchSpaced has no internal callers, so nothing to rewire there.)

The diff to the generated file and the template is additions-only (no existing generated method bodies change).

Are these changes tested?

Yes. Two new tests in parquet/file/column_writer_test.go use the existing mockpagewriter to fail WriteDataPage, with a tiny DataPageSize and dictionary disabled to force a flush mid-call:

  • TestWriteBatchSpacedWithErrorPropagatesWriteFailure
  • TestWriteBitmapBatchSpacedWithErrorPropagatesWriteFailure

Each asserts that the …WithError variant returns the wrapped failure (errors.Is(err, failureErr)) rather than panicking or discarding it, and that the legacy method does not panic out of the call (its original behavior is preserved).

The full parquet/file/... and parquet/pqarrow/... suites pass (PARQUET_TEST_DATA pointed at parquet-testing/data), and go vet is clean.

Are there any user-facing changes?

No breaking changes.

  • WriteBatchSpaced(...) and WriteBitmapBatchSpaced(...) — signatures and behavior unchanged.
  • WriteBatchSpacedWithError(...) (int64, error) (all 8 types) and WriteBitmapBatchSpacedWithError(...) (int64, error) (Boolean) — new exported methods. Adding exported methods is non-breaking in Go.
  • pqarrow public signatures are unchanged; the dense-array nullable write path now returns transient sink failures as an error (callers already checking err get strictly better behavior).

…ailures

WriteBatchSpaced returns nothing, so a write failure on the spaced path
was either silently discarded (the doBatches branch ignored the error
from commitWriteAndCheckPageLimit) or escaped as a panic out of the
calling goroutine -- unlike WriteBatch, it had no deferred recover. This
is the same class of bug as apache#820 (fixed in apache#824) on a different code
path, reachable through pqarrow when writing nullable columns to a sink
that may transiently fail (e.g. a cloud-storage upload writer).

Add WriteBatchSpacedWithError(...) (int64, error) to each generated
ColumnChunkWriter via the template: it mirrors WriteBatch, recovering
panics into a returned error and checking the commit error on every
branch. WriteBatchSpaced is left byte-for-byte unchanged (same signature
and behavior), so this is a non-breaking, additive change -- exactly as
apache#824 did with NewParquetWriter / NewParquetWriterWithError.

pqarrow's dense-array encoder now uses WriteBatchSpacedWithError so
nullable-column writes propagate sink failures as errors instead of
crashing the goroutine or losing the error.
WriteBitmapBatchSpaced (Boolean) shares the same void-signature, no-recover
shape as WriteBatchSpaced: it discarded the error from
commitWriteAndCheckPageLimit. Add WriteBitmapBatchSpacedWithError mirroring
WriteBatchSpacedWithError, leaving WriteBitmapBatchSpaced byte-for-byte
unchanged (additive, non-breaking). It has no internal callers, so no
rewiring is required; a regression test covers the new method.
@zeroshade zeroshade requested a review from lidavidm June 11, 2026 20:13

@lidavidm lidavidm left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WriteBatchSpaced is internal, right? Why keep it at this point if it's not a public API?

@zeroshade

Copy link
Copy Markdown
Member Author

Both are technically part of the public API and I'm trying to avoid making breaking changes, it isn't inside of an internal package and it starts with a capital letter, so it's publicly accessible.

// failure as an error instead of panicking or silently discarding it,
// mirroring the error handling of WriteBatch. Prefer this method on write
// paths whose sink may fail (for example a network-attached writer).
func (w *{{.Name}}ColumnChunkWriter) WriteBatchSpacedWithError(values []{{.name}}, defLevels, repLevels []int16, validBits []byte, validBitsOffset int64) (valueOffset int64, err error) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(1) Could we deprecate WriteBatchSpaced?
(2) Could WriteBatchSpaced be written in terms of WriteBatchSpacedWithError instead of duplicating the code entirely? (e.g. defer to WriteBatchSpacedWithError and panic if it returns an error)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 975d052.

  1. Deprecated — both WriteBatchSpaced (all 8 column-writer types) and WriteBitmapBatchSpaced now carry a // Deprecated: notice pointing at their …WithError counterparts.
  2. No more duplication — each now just defers to the …WithError variant and panics if it returns an error, so the duplicated bodies are gone (−265 lines in the generated file; template + regen).

One behavior note worth your eyes: the legacy methods now fail loudly on a write error instead of silently discarding it on the numeric/boolean/fixed paths (the byte-array path already panicked). Because the delegation routes through …WithError's recover, the panic payload is now the FormatRecoveredError("unknown error type", …) wrapper rather than the raw error — errors.Is(…, original) still resolves through the %w chain, but the exact value differs for the byte-array/FLBA paths that previously panicked with the raw error. Happy to re-panic the unwrapped cause for those two if you'd rather preserve their exact original payload.

Internal callers are off the deprecated API: pqarrow already uses WriteBatchSpacedWithError, and the spaced-write test helper in internal/testutils was migrated too. The spaced-write failure tests now assert the deprecated methods panic with the underlying error.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed f4b92fc5 to address the panic-payload nuance I mentioned above.

The deprecated WriteBatchSpaced / WriteBitmapBatchSpaced now re-panic the unwrapped cause (via a small shared panicCause helper) instead of panic(err), so the byte-array / FLBA paths surface the original error value again rather than the FormatRecoveredError("unknown error type", …) wrapper they'd otherwise inherit from delegating through …WithError. errors.Is(…, original) still resolves, and non-error panics fall back to the wrapper.

Net result: those two paths keep their exact pre-deprecation panic payload, while the numeric/boolean paths still fail loud instead of silently dropping the error. go build/vet/staticcheck are clean and the parquet/file + parquet/pqarrow suites pass.

…gate to WithError variants

Per review feedback on apache#852, deprecate WriteBatchSpaced (all column writer types) and the Boolean WriteBitmapBatchSpaced, and rewrite them to delegate to their *WithError counterparts and panic on error rather than duplicating the full method bodies. The legacy paths now fail loudly instead of silently discarding write errors (the byte-array path already panicked), and ~265 lines of generated duplication are removed.

The internal spaced-write test helper now calls WriteBatchSpacedWithError, and the spaced-write failure tests assert the deprecated methods panic with the underlying error.
…riters

WriteBatchSpaced and WriteBitmapBatchSpaced delegate to their *WithError variants, whose recover wraps the original panic via FormatRecoveredError. Re-panic the unwrapped cause through a shared panicCause helper so the deprecated methods surface the original error value (matching their pre-deprecation behavior on the byte-array/FLBA paths) instead of the wrapper. errors.Is against the original still resolves; non-error panics fall back to the wrapper.
@zeroshade zeroshade merged commit a51d098 into apache:main Jun 18, 2026
28 of 31 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

parquet/file: WriteBatchSpaced panics escape the API and silently discards commit-write errors

2 participants