fix(parquet): add WriteBatchSpacedWithError to surface spaced-write failures#852
Conversation
…ailures WriteBatchSpaced returns nothing, so a write failure on the spaced path was either silently discarded (the doBatches branch ignored the error from commitWriteAndCheckPageLimit) or escaped as a panic out of the calling goroutine -- unlike WriteBatch, it had no deferred recover. This is the same class of bug as apache#820 (fixed in apache#824) on a different code path, reachable through pqarrow when writing nullable columns to a sink that may transiently fail (e.g. a cloud-storage upload writer). Add WriteBatchSpacedWithError(...) (int64, error) to each generated ColumnChunkWriter via the template: it mirrors WriteBatch, recovering panics into a returned error and checking the commit error on every branch. WriteBatchSpaced is left byte-for-byte unchanged (same signature and behavior), so this is a non-breaking, additive change -- exactly as apache#824 did with NewParquetWriter / NewParquetWriterWithError. pqarrow's dense-array encoder now uses WriteBatchSpacedWithError so nullable-column writes propagate sink failures as errors instead of crashing the goroutine or losing the error.
WriteBitmapBatchSpaced (Boolean) shares the same void-signature, no-recover shape as WriteBatchSpaced: it discarded the error from commitWriteAndCheckPageLimit. Add WriteBitmapBatchSpacedWithError mirroring WriteBatchSpacedWithError, leaving WriteBitmapBatchSpaced byte-for-byte unchanged (additive, non-breaking). It has no internal callers, so no rewiring is required; a regression test covers the new method.
lidavidm
left a comment
There was a problem hiding this comment.
WriteBatchSpaced is internal, right? Why keep it at this point if it's not a public API?
|
Both are technically part of the public API and I'm trying to avoid making breaking changes, it isn't inside of an |
| // failure as an error instead of panicking or silently discarding it, | ||
| // mirroring the error handling of WriteBatch. Prefer this method on write | ||
| // paths whose sink may fail (for example a network-attached writer). | ||
| func (w *{{.Name}}ColumnChunkWriter) WriteBatchSpacedWithError(values []{{.name}}, defLevels, repLevels []int16, validBits []byte, validBitsOffset int64) (valueOffset int64, err error) { |
There was a problem hiding this comment.
(1) Could we deprecate WriteBatchSpaced?
(2) Could WriteBatchSpaced be written in terms of WriteBatchSpacedWithError instead of duplicating the code entirely? (e.g. defer to WriteBatchSpacedWithError and panic if it returns an error)
There was a problem hiding this comment.
Done in 975d052.
- Deprecated — both
WriteBatchSpaced(all 8 column-writer types) andWriteBitmapBatchSpacednow carry a// Deprecated:notice pointing at their…WithErrorcounterparts. - No more duplication — each now just defers to the
…WithErrorvariant and panics if it returns an error, so the duplicated bodies are gone (−265 lines in the generated file; template + regen).
One behavior note worth your eyes: the legacy methods now fail loudly on a write error instead of silently discarding it on the numeric/boolean/fixed paths (the byte-array path already panicked). Because the delegation routes through …WithError's recover, the panic payload is now the FormatRecoveredError("unknown error type", …) wrapper rather than the raw error — errors.Is(…, original) still resolves through the %w chain, but the exact value differs for the byte-array/FLBA paths that previously panicked with the raw error. Happy to re-panic the unwrapped cause for those two if you'd rather preserve their exact original payload.
Internal callers are off the deprecated API: pqarrow already uses WriteBatchSpacedWithError, and the spaced-write test helper in internal/testutils was migrated too. The spaced-write failure tests now assert the deprecated methods panic with the underlying error.
There was a problem hiding this comment.
Pushed f4b92fc5 to address the panic-payload nuance I mentioned above.
The deprecated WriteBatchSpaced / WriteBitmapBatchSpaced now re-panic the unwrapped cause (via a small shared panicCause helper) instead of panic(err), so the byte-array / FLBA paths surface the original error value again rather than the FormatRecoveredError("unknown error type", …) wrapper they'd otherwise inherit from delegating through …WithError. errors.Is(…, original) still resolves, and non-error panics fall back to the wrapper.
Net result: those two paths keep their exact pre-deprecation panic payload, while the numeric/boolean paths still fail loud instead of silently dropping the error. go build/vet/staticcheck are clean and the parquet/file + parquet/pqarrow suites pass.
…gate to WithError variants Per review feedback on apache#852, deprecate WriteBatchSpaced (all column writer types) and the Boolean WriteBitmapBatchSpaced, and rewrite them to delegate to their *WithError counterparts and panic on error rather than duplicating the full method bodies. The legacy paths now fail loudly instead of silently discarding write errors (the byte-array path already panicked), and ~265 lines of generated duplication are removed. The internal spaced-write test helper now calls WriteBatchSpacedWithError, and the spaced-write failure tests assert the deprecated methods panic with the underlying error.
…riters WriteBatchSpaced and WriteBitmapBatchSpaced delegate to their *WithError variants, whose recover wraps the original panic via FormatRecoveredError. Re-panic the unwrapped cause through a shared panicCause helper so the deprecated methods surface the original error value (matching their pre-deprecation behavior on the byte-array/FLBA paths) instead of the wrapper. errors.Is against the original still resolves; non-error panics fall back to the wrapper.
Rationale for this change
Closes #826.
Across all 8 generated
*ColumnChunkWritertypes inparquet/file/column_writer_types.gen.go,WriteBatchandWriteBatchSpacedare an asymmetric pair:WriteBatchreturns(valueOffset int64, err error)and opens with adefer recover()that converts any panic into a returnederrorviautils.FormatRecoveredError.WriteBatchSpacedreturns nothing, has nodefer recover(), and on thedoBatches(non-byte-array) path silently discards theerrorreturned bycommitWriteAndCheckPageLimit.So a write failure during a spaced write is either silently discarded (numeric/boolean/fixed paths) or escapes as a panic out of the calling goroutine (the byte-array branch's
panic(err), plus other internal panics such as dictionary-fallback bookkeeping) — because, unlikeWriteBatch, there is norecover()to turn it into anerror.This is the same class of "panic escapes / error lost on the public API" bug as #820 (fixed in #824), on a different code path. It is reachable through
pqarrow—(*pqarrow.FileWriter)drivesWriteBatchSpacedfor nullable columns — so for consumers writing nullable columns to a network-attached sink (GCS, S3 multipart, HDFS, etc.), a transient downstream failure on the spaced path crashes the worker process or silently drops the error instead of returning it.The Boolean-only
WriteBitmapBatchSpacedhas the same void-signature / no-recover shape and is fixed here too.What changes are included in this PR?
New method
WriteBatchSpacedWithError(values, defLevels, repLevels, validBits, validBitsOffset) (int64, error)generated for all 8 column-writer types fromcolumn_writer_types.gen.go.tmpl. It mirrorsWriteBatch:defer func() { if r := recover(); r != nil { err = utils.FormatRecoveredError(...) } }(), anderrorfromcommitWriteAndCheckPageLimiton every branch (thedoBatchesbranch previously ignored it),panic-ing so the deferred recover surfaces it as a returnederror.New method
WriteBitmapBatchSpacedWithError(...) (int64, error)on*BooleanColumnChunkWriter, applying the same treatment to the bitmap spaced-write path.WriteBatchSpacedandWriteBitmapBatchSpacedare left byte-for-byte unchanged — same signatures, same behavior. This keeps the change non-breaking and purely additive, exactly the approach fix(parquet): return error instead of panicking on first-write failure #824 took withNewParquetWriter/NewParquetWriterWithError. Existing callers are unaffected; callers that want error handling opt into the…WithErrorvariants.pqarrow's dense-array encoder (writeDenseArrow) now callsWriteBatchSpacedWithErrorat all 12 spaced-write sites and propagates the error through its existing(err error)return, so nullable-column writes report sink failures as errors instead of crashing the goroutine or losing the error. (WriteBitmapBatchSpacedhas no internal callers, so nothing to rewire there.)The diff to the generated file and the template is additions-only (no existing generated method bodies change).
Are these changes tested?
Yes. Two new tests in
parquet/file/column_writer_test.gouse the existingmockpagewriterto failWriteDataPage, with a tinyDataPageSizeand dictionary disabled to force a flush mid-call:TestWriteBatchSpacedWithErrorPropagatesWriteFailureTestWriteBitmapBatchSpacedWithErrorPropagatesWriteFailureEach asserts that the
…WithErrorvariant returns the wrapped failure (errors.Is(err, failureErr)) rather than panicking or discarding it, and that the legacy method does not panic out of the call (its original behavior is preserved).The full
parquet/file/...andparquet/pqarrow/...suites pass (PARQUET_TEST_DATApointed atparquet-testing/data), andgo vetis clean.Are there any user-facing changes?
No breaking changes.
WriteBatchSpaced(...)andWriteBitmapBatchSpaced(...)— signatures and behavior unchanged.WriteBatchSpacedWithError(...) (int64, error)(all 8 types) andWriteBitmapBatchSpacedWithError(...) (int64, error)(Boolean) — new exported methods. Adding exported methods is non-breaking in Go.pqarrowpublic signatures are unchanged; the dense-array nullable write path now returns transient sink failures as anerror(callers already checkingerrget strictly better behavior).