-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should deny further write if ParquetWriter is aborted #3011
Comments
@hellishfire, How users would not notice the corruption when the related exception is thrown? My change is related to auto-closing to not to write anything automatically (at closing) after an exception happened. The users of the API shall not continue writing data after an exception. |
We found that some users still try to write after exception, then successfully close the file, only to find out later that the file can not be opened for read because of missing file footer. |
@gszadovszky On another note, in InternalParquetRecordWriter.java
Should we narrow the try-catch scope to exclude exceptions thrown by writeSupport (i.e. illegal input values denied by writeSupport), which is irrelevant to actual file io operation? |
I don't think parquet-java is fool-proof in any ways. And how do you think we should handle the case of continue writing with a writer that is already "aborted"? Throw yet another exception? I am not against it but cannot see the point. Also, it shall not significantly impact write performance.
It would make sense but I am not sure if we can 100% differentiate the potential exceptions and whether we already had impact on the file writing at the throwing time or not. Feel free to put up a PR if you would like to work on this. I'm happy to review. |
OK. Would it be possible for you to elaborate a bit on why aborted state was introduced in the first place? Like, which user case triggered a bug that led you to fix it? It was not explained clearly in your commit message, so I somewhat lack the context of that commit. |
@hellishfire, my bad, should have explained in more details. In many cases, Parquet writers are used in a try-with-resources construct. It invokes |
Describe the bug, including details regarding any error messages, version, and platform.
After PARQUET-2437, InternalParquetRecordWriter will switch to aborted state if write throws, but still accept further writes even though these writes will not be flushed during close, leaving the file incomplete. Users will not notice the corrupted file until they actually read the file.
Should we check for aborted state and reject further writes to avoid such surprises?
@gszadovszky
Component(s)
No response
The text was updated successfully, but these errors were encountered: