Describe the bug, including details regarding any error messages, version, and platform.
(*file.Writer).startFile() panics with "failed to write magic number" when the underlying sink's first Write returns an error or short-writes. Because NewParquetWriter calls startFile synchronously inside the constructor — and NewParquetWriter has no error return — there is no way for a consumer to recover via the standard Go error handling pattern.
Code (parquet/file/file_writer.go:201-204):
n, err := fw.sink.Write(magic)
if n != 4 || err != nil {
	panic("failed to write magic number")
}
Constructor signature (parquet/file/file_writer.go:72):
func NewParquetWriter(w io.Writer, sc *schema.GroupNode, opts ...WriteOption) *Writer
The higher-level pqarrow.NewFileWriter does have an error return, but it lets the panic from file.NewParquetWriter propagate unconverted, so callers of either constructor must wrap the call in a deferred function that calls recover() to handle a flaky sink.
Why this matters in practice
When the sink is backed by a network-attached writer — e.g., a *storage.Writer from cloud.google.com/go/storage, a Cloud Storage resumable upload, an HDFS client, an S3 multipart upload — a transient network blip on the first 4-byte write reliably crashes the consumer process.
Concrete example: a Kubernetes-deployed Go service that writes thousands of Parquet files per day to GCS sees roughly 1-2 pod crashes per 5 pod-hours under normal cluster conditions, all from this panic. Each crash:
- Loses the entire worker process (not just the failing call)
- Forces a container restart
- Leaves any in-flight resumable upload session orphaned in GCS until session timeout
- Requires application-level orphan-recovery machinery to reclaim work
The panic stack consistently looks like:
panic: failed to write magic number
goroutine N [running]:
github.com/apache/arrow-go/v18/parquet/file.(*Writer).startFile(...)
	github.com/apache/arrow-go/v18@v18.6.0/parquet/file/file_writer.go:203 +0x257
github.com/apache/arrow-go/v18/parquet/file.NewParquetWriter(...)
	github.com/apache/arrow-go/v18@v18.6.0/parquet/file/file_writer.go:90 +0x2d4
github.com/apache/arrow-go/v18/parquet/pqarrow.NewFileWriter(...)
	github.com/apache/arrow-go/v18@v18.6.0/parquet/pqarrow/file_writer.go:166 +0x34d
<consumer code calling pqarrow.NewFileWriter>
Workarounds exist (a deferred recover() at the call site), but they are brittle:
- Tied to an undocumented panic-string format
- Forced on every consumer of the public API
- Leave any partial sink state (open upload session, buffered bytes) for the caller to clean up
Expected behavior
Per Go's errors-as-values convention and the contract suggested by pqarrow.NewFileWriter's (*FileWriter, error) signature, a failure to write the magic header should surface as an error to the caller, not a panic.
Suggested fixes (smallest change first)
- Add a sibling constructor that returns (*Writer, error), e.g.:
  func NewParquetWriterWithError(w io.Writer, sc *schema.GroupNode, opts ...WriteOption) (*Writer, error)
  Non-breaking. Existing callers keep current panic semantics; new callers opt into clean error handling. pqarrow.NewFileWriter would switch to the new constructor internally so that its error return becomes meaningful for this failure mode.
- Defer startFile() until first append. Move the magic-header write to lazy execution on the first AppendRowGroup / AppendBufferedRowGroup call, both of which already have error paths. Slightly larger change, but it doesn't add a new public symbol.
- Lazy error pattern. Have startFile store its error in an unexported Writer.initErr field, and have every other public method (AppendRowGroup, Close, WritePageIndex, …) short-circuit on it. Non-breaking, but it touches every public method.
Reproduction sketch
// Synthetic flaky sink that fails the first Write.
package main

import (
	"errors"
	"log"

	"github.com/apache/arrow-go/v18/parquet/file"
)

// flakySink fails its first Write and succeeds on every later one.
type flakySink struct{ writes int }

func (f *flakySink) Write(p []byte) (int, error) {
	f.writes++
	if f.writes == 1 {
		return 0, errors.New("synthesized network blip")
	}
	return len(p), nil
}

func main() {
	// ... build a trivial schema.GroupNode ...
	defer func() {
		if r := recover(); r != nil {
			log.Printf("recovered: %v", r) // r is "failed to write magic number"
		}
	}()
	_ = file.NewParquetWriter(&flakySink{}, schemaNode) // panics here
}
Component(s)
Parquet
Version
github.com/apache/arrow-go/v18@v18.6.0 (also reproducible on current main; the code at lines 201-203 is unchanged).