Skip to content

parquet/file: NewParquetWriter panics on transient sink.Write errors during file initialization #820

@AaronReboot

Description

@AaronReboot

Describe the bug, including details regarding any error messages, version, and platform.

(*file.Writer).startFile() panics with "failed to write magic number" when the underlying sink's first Write returns an error or short-writes. Because NewParquetWriter calls startFile synchronously inside the constructor — and NewParquetWriter has no error return — there is no way for a consumer to recover via the standard Go error handling pattern.

Code (parquet/file/file_writer.go:201-204):

n, err := fw.sink.Write(magic)
if n != 4 || err != nil {
    panic("failed to write magic number")
}

Constructor signature (parquet/file/file_writer.go:72):

func NewParquetWriter(w io.Writer, sc *schema.GroupNode, opts ...WriteOption) *Writer

The higher-level pqarrow.NewFileWriter does have an error return, but it propagates the panic from file.NewParquetWriter through unconverted — so callers of either constructor must wrap the call in defer recover() to handle a flaky sink.

Why this matters in practice

When the sink is backed by a network-attached writer — e.g., a *storage.Writer from cloud.google.com/go/storage, a Cloud Storage resumable upload, an HDFS client, an S3 multipart upload — a transient network blip on the first 4-byte write reliably crashes the consumer process.

Concrete example: a Kubernetes-deployed Go service that writes ~thousands of Parquet files per day to GCS sees ~1-2 pod crashes per 5 pod-hours under normal cluster conditions, all from this panic. Each crash:

  1. Loses the entire worker process (not just the failing call)
  2. Forces a container restart
  3. Leaves any in-flight resumable upload session orphaned in GCS until session timeout
  4. Requires application-level orphan-recovery machinery to reclaim work

The panic stack consistently looks like:

panic: failed to write magic number

goroutine N [running]:
github.com/apache/arrow-go/v18/parquet/file.(*Writer).startFile(...)
	github.com/apache/arrow-go/v18@v18.6.0/parquet/file/file_writer.go:203 +0x257
github.com/apache/arrow-go/v18/parquet/file.NewParquetWriter(...)
	github.com/apache/arrow-go/v18@v18.6.0/parquet/file/file_writer.go:90 +0x2d4
github.com/apache/arrow-go/v18/parquet/pqarrow.NewFileWriter(...)
	github.com/apache/arrow-go/v18@v18.6.0/parquet/pqarrow/file_writer.go:166 +0x34d
<consumer code calling pqarrow.NewFileWriter>

Workarounds exist (defer recover() at the call site) but they're brittle:

  • Tied to an undocumented panic-string format
  • Forced on every consumer of the public API
  • Leave any partial sink state (open upload session, buffered bytes) for the caller to clean up

Expected behavior

Per Go's errors-as-values convention and the contract suggested by pqarrow.NewFileWriter's (*FileWriter, error) signature, a failure to write the magic header should surface as an error to the caller, not a panic.

Suggested fixes (smallest change first)

  1. Add a sibling constructor that returns (*Writer, error), e.g.:

    func NewParquetWriterWithError(w io.Writer, sc *schema.GroupNode, opts ...WriteOption) (*Writer, error)

    Non-breaking. Existing callers keep current panic semantics; new callers opt into clean error handling. pqarrow.NewFileWriter switches to the new constructor internally so its error return becomes meaningful for this failure mode.

  2. Defer startFile() until first append. Move the magic-header write to lazy execution on the first AppendRowGroup / AppendBufferedRowGroup call — both of which already have error paths. Slightly larger change but doesn't add a new public symbol.

  3. Lazy error pattern. Have startFile store its error in an unexported Writer.initErr field, and have every other public method (AppendRowGroup, Close, WritePageIndex, …) short-circuit on it. Non-breaking but touches every public method.

Reproduction sketch

// Synthetic flaky sink that fails the first Write.
type flakySink struct{ writes int }

func (f *flakySink) Write(p []byte) (int, error) {
    f.writes++
    if f.writes == 1 {
        return 0, errors.New("synthesized network blip")
    }
    return len(p), nil
}

func main() {
    // ... build a trivial schema.GroupNode ...
    defer func() {
        if r := recover(); r != nil {
            log.Printf("recovered: %v", r) // prints "failed to write magic number"
        }
    }()
    _ = file.NewParquetWriter(&flakySink{}, schemaNode) // panics here
}

Component(s)

Parquet

Version

github.com/apache/arrow-go/v18@v18.6.0 (also reproducible on current main; the code at line 201-203 is unchanged).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions