Double close of ParquetFileWriter in ParquetWriter #2935

Open
hellishfire opened this issue Jun 27, 2024 · 0 comments

hellishfire commented Jun 27, 2024

ParquetWriter.close() invokes InternalParquetRecordWriter.close(), which has the following logic:

  public void close() throws IOException, InterruptedException {
    if (!closed) {
      try {
        if (aborted) {
          return;
        }
        flushRowGroupToStore();
        FinalizedWriteContext finalWriteContext = writeSupport.finalizeWrite();
        Map<String, String> finalMetadata = new HashMap<String, String>(extraMetaData);
        String modelName = writeSupport.getName();
        if (modelName != null) {
          finalMetadata.put(ParquetWriter.OBJECT_MODEL_NAME_PROP, modelName);
        }
        finalMetadata.putAll(finalWriteContext.getExtraMetaData());
        parquetFileWriter.end(finalMetadata);
      } finally {
        AutoCloseables.uncheckedClose(columnStore, pageStore, bloomFilterWriteStore, parquetFileWriter);
        closed = true;
      }
    }
  }

parquetFileWriter is closed twice here: the first time by
parquetFileWriter.end(finalMetadata), which eventually calls parquetFileWriter.close(),

and the second time by
AutoCloseables.uncheckedClose(columnStore, pageStore, bloomFilterWriteStore, parquetFileWriter); in the finally block.

The second close causes the underlying PositionOutputStream in ParquetFileWriter to be flushed again after it has already been closed, which may raise an exception depending on the stream implementation:

  public void close() throws IOException {
    try (PositionOutputStream temp = out) {
      temp.flush();
      if (crcAllocator != null) {
        crcAllocator.close();
      }
    }
  }
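
One possible way to avoid this would be to make ParquetFileWriter.close() idempotent, so the extra close from AutoCloseables.uncheckedClose() becomes a no-op. This is only a minimal sketch, not an actual patch; the alreadyClosed field is hypothetical:

  // Sketch only: guard close() with a flag so a second call does nothing
  // instead of flushing an already-closed stream.
  private boolean alreadyClosed; // hypothetical new field on ParquetFileWriter

  public void close() throws IOException {
    if (alreadyClosed) {
      return; // second close (e.g. from AutoCloseables.uncheckedClose) is a no-op
    }
    try (PositionOutputStream temp = out) {
      temp.flush();
      if (crcAllocator != null) {
        crcAllocator.close();
      }
    } finally {
      alreadyClosed = true;
    }
  }

Alternatively, InternalParquetRecordWriter could simply not pass parquetFileWriter to AutoCloseables.uncheckedClose() once end() has completed successfully.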

Sample exception:

Caused by: org.apache.parquet.util.AutoCloseables$ParquetCloseResourceException: Unable to close resource
at org.apache.parquet.util.AutoCloseables.uncheckedClose(AutoCloseables.java:85)
at org.apache.parquet.util.AutoCloseables.uncheckedClose(AutoCloseables.java:94)
at org.apache.parquet.hadoop.InternalParquetRecordWriter.close(InternalParquetRecordWriter.java:144)
at org.apache.parquet.hadoop.ParquetWriter.close(ParquetWriter.java:437)
... 70 more
Caused by: java.io.IOException: stream is already closed
(-------- specific stream implementation ----------------)
at java.io.FilterOutputStream.flush(FilterOutputStream.java:140)
at java.io.DataOutputStream.flush(DataOutputStream.java:123)
at org.apache.parquet.hadoop.util.HadoopPositionOutputStream.flush(HadoopPositionOutputStream.java:59)
at org.apache.parquet.hadoop.ParquetFileWriter.close(ParquetFileWriter.java:1659)
at org.apache.parquet.util.AutoCloseables.close(AutoCloseables.java:49)
at org.apache.parquet.util.AutoCloseables.uncheckedClose(AutoCloseables.java:83)

This issue has been observed since 1.14.0, and I suspect PARQUET-2496 is caused by the same underlying problem.
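
For illustration, a stream like the hypothetical one below reproduces the failure mode: it rejects flush() after close(), so the second close of parquetFileWriter fails exactly as in the stack trace above. StrictPositionOutputStream is an assumption for demonstration, not the stream from the trace:

  import java.io.IOException;
  import org.apache.parquet.io.PositionOutputStream;

  // Illustration only: a PositionOutputStream that, like many real streams,
  // throws if it is flushed after being closed. Written bytes are discarded.
  class StrictPositionOutputStream extends PositionOutputStream {
    private long pos = 0;
    private boolean closed = false;

    @Override
    public long getPos() throws IOException {
      return pos;
    }

    @Override
    public void write(int b) throws IOException {
      if (closed) {
        throw new IOException("stream is already closed");
      }
      pos++; // bytes are not stored; only the position is tracked
    }

    @Override
    public void flush() throws IOException {
      if (closed) {
        // this is what turns the second ParquetFileWriter.close() into an error
        throw new IOException("stream is already closed");
      }
    }

    @Override
    public void close() throws IOException {
      closed = true;
    }
  }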

@hellishfire hellishfire changed the title Double close of parquetFileWriter in ParquetWriter Double close of ParquetFileWriter in ParquetWriter Jun 27, 2024