[contrib/pzstd] Prevent hangs when there are errors #4080

Merged
terrelln merged 1 commit into facebook:dev on Jan 13, 2025


yotann
Contributor

@yotann yotann commented Jun 20, 2024

When two threads are using a WorkQueue and the reader thread exits due to an error, it must call WorkQueue::finish() to wake up the writer thread. Otherwise, if the queue is full and the writer thread is waiting for a free slot, it could hang forever.

This can happen in practice when decompressing a large, corrupted file that does not contain pzstd skippable frames.
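
The hang described above can be sketched with a small bounded queue. This is a hypothetical, simplified stand-in and not pzstd's actual WorkQueue: the class `BoundedQueue`, its `int` payload, and the helper `writerPushesBeforeReaderFailure()` are assumptions made for illustration. The reader consumes one item, hits a simulated error, and calls `finish()` so that the writer blocked in `push()` wakes up and stops.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical, simplified bounded queue (not pzstd's actual WorkQueue).
class BoundedQueue {
 public:
  explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

  // Blocks while the queue is full. Returns false once finish() was called.
  bool push(int item) {
    {
      std::unique_lock<std::mutex> lock(mutex_);
      notFull_.wait(lock, [this] { return done_ || queue_.size() < capacity_; });
      if (done_) {
        return false;
      }
      queue_.push(item);
    }
    notEmpty_.notify_one();
    return true;
  }

  // Blocks while the queue is empty. Returns false once finished and drained.
  bool pop(int& item) {
    {
      std::unique_lock<std::mutex> lock(mutex_);
      notEmpty_.wait(lock, [this] { return done_ || !queue_.empty(); });
      if (queue_.empty()) {
        return false;
      }
      item = queue_.front();
      queue_.pop();
    }
    notFull_.notify_one();
    return true;
  }

  // Marks the queue as done and wakes every waiting thread.
  void finish() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    notEmpty_.notify_all();
    notFull_.notify_all();
  }

 private:
  std::mutex mutex_;
  std::condition_variable notFull_;
  std::condition_variable notEmpty_;
  std::queue<int> queue_;
  std::size_t capacity_;
  bool done_ = false;
};

// Returns how many items the writer pushed before the reader gave up.
int writerPushesBeforeReaderFailure() {
  BoundedQueue queue(2);
  int pushed = 0;
  std::thread writer([&] {
    for (int i = 0; i < 1000; ++i) {
      if (!queue.push(i)) {
        break;  // The reader called finish(); stop producing.
      }
      ++pushed;
    }
  });

  // Reader side: consume one item, then simulate an error.
  int item = 0;
  bool ok = queue.pop(item);
  assert(ok);
  // Without this call the writer would block forever in push() once the
  // queue fills up, and join() below would never return.
  queue.finish();
  writer.join();
  return pushed;
}
```

Under these assumptions the writer pushes at most three items (the queue capacity plus the one item the reader consumed) before `push()` returns false. Removing the `finish()` call makes `writer.join()` block indefinitely.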

@embg
Contributor

embg commented Jan 10, 2025

Why is in->finish() here not sufficient?

auto inGuard = makeScopeGuard([&] { in->finish(); });

I do agree that the changes in writeFile() are necessary.

@embg
Contributor

embg commented Jan 10, 2025

OK, I see what's going on now. In both asyncCompressChunks() and asyncDecompressFrames(), the scope guard is in the same scope as readData(). So the scope guard destructor won't run if readData() is stuck, which can cause a deadlock.
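
A minimal sketch of why the guard alone is not enough: a scope guard only fires when its enclosing scope exits. The `ScopeGuard` class and `traceGuardOrder()` below are hypothetical illustrations, and pzstd's own `makeScopeGuard` may be implemented differently. Two plain statements stand in for the start and end of `readData()`.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal scope guard; pzstd's own makeScopeGuard may differ.
template <typename F>
class ScopeGuard {
 public:
  explicit ScopeGuard(F f) : f_(std::move(f)) {}
  ScopeGuard(ScopeGuard&& other)
      : f_(std::move(other.f_)), active_(other.active_) {
    other.active_ = false;  // The moved-from guard must not fire.
  }
  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;
  ~ScopeGuard() {
    if (active_) {
      f_();
    }
  }

 private:
  F f_;
  bool active_ = true;
};

template <typename F>
ScopeGuard<F> makeScopeGuard(F f) {
  return ScopeGuard<F>(std::move(f));
}

// Records the order in which the guarded work and the guard itself run.
std::vector<std::string> traceGuardOrder() {
  std::vector<std::string> events;
  {
    auto guard = makeScopeGuard([&] { events.push_back("finish"); });
    events.push_back("readData begins");
    // If the call standing in for readData() blocked here forever, the
    // closing brace below would never be reached and the guard would
    // never run.
    events.push_back("readData returns");
    assert(events.size() == 2);  // The guard has not fired yet.
  }  // The guard's destructor runs here, after readData has returned.
  return events;
}
```

The recorded order is "readData begins", "readData returns", "finish", so a thread stuck inside the guarded call never reaches the cleanup.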

The only remaining question I have is whether it's safe to finish() a WorkQueue twice. I see you added a code comment addressing this question. Will take a look on Monday.

@embg
Contributor

embg commented Jan 13, 2025

OK, I don't see any reason why we can't call finish() twice on the same WorkQueue. Locking twice is safe, setting done_ twice is safe, and notify_all() doesn't have any pre-conditions according to C++ docs.
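
The reasoning can be checked with a small sketch. It assumes `finish()` does only the three things listed (take the lock, set the flag, notify); the `DoneFlag` class and `finishTwice()` helper are hypothetical and are not pzstd code. The lock is released at the end of each call, so the second call acquires it again rather than locking recursively.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical reduction of a queue's shutdown state to a single flag.
class DoneFlag {
 public:
  // Safe to call more than once: the lock is released before returning,
  // done_ is simply set to true again, and notify_all() has no preconditions.
  void finish() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_all();
  }

  // Blocks until finish() has been called at least once.
  void waitUntilDone() {
    std::unique_lock<std::mutex> lock(mutex_);
    cv_.wait(lock, [this] { return done_; });
  }

  bool isDone() {
    std::lock_guard<std::mutex> lock(mutex_);
    return done_;
  }

 private:
  std::mutex mutex_;
  std::condition_variable cv_;
  bool done_ = false;
};

// Calls finish() twice while another thread waits; returns the final state.
bool finishTwice() {
  DoneFlag flag;
  std::thread waiter([&] { flag.waitUntilDone(); });
  flag.finish();
  assert(flag.isDone());
  flag.finish();  // Second call: no deadlock, no state change.
  waiter.join();
  return flag.isDone();
}
```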

LGTM.

@terrelln
Contributor

Thanks for the PR @yotann & thanks for reviewing @embg!

@terrelln terrelln merged commit 80af41e into facebook:dev Jan 13, 2025
93 checks passed