[C++][Dataset] Fix DatasetWriter deadlock on writing batch greater than max_rows_queued #46139

Open: wants to merge 2 commits into base: main

gitmodimo
Contributor

Rationale for this change

DatasetWriter deadlocks when a written batch has more rows than max_rows_queued. This also causes a deadlock in Acero and propagates the paused state.

What changes are included in this PR?

Changed the throttle threshold calculation to allow batch.length > max_rows_queued.
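
The effect of such a threshold change can be illustrated with a minimal sketch. This is not the actual Arrow code: `RowThrottle`, `TryAcquireStrict`, `TryAcquireRelaxed`, and `Release` are hypothetical names, and the clamping rule is only one plausible way to admit a batch larger than the limit. With the strict check, an oversized batch is never admitted, even when nothing is queued, so a writer waiting on it would wait forever. With the relaxed check, the same batch is admitted once the queue is empty.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical row-count throttle, for illustration only.
// It does not mirror Arrow's DatasetWriter internals.
class RowThrottle {
 public:
  explicit RowThrottle(int64_t max_rows_queued) : max_rows_(max_rows_queued) {}

  // Strict policy: admit a batch only if it fits under max_rows_.
  // A batch with rows > max_rows_ is rejected even on an empty queue,
  // so a caller that waits for admission would never make progress.
  bool TryAcquireStrict(int64_t rows) {
    if (in_flight_ + rows > max_rows_) return false;
    in_flight_ += rows;
    return true;
  }

  // Relaxed policy (assumed fix): raise the threshold to the batch size
  // when the batch alone exceeds max_rows_. An oversized batch is then
  // admitted, but only when nothing else is queued.
  bool TryAcquireRelaxed(int64_t rows) {
    const int64_t limit = std::max(max_rows_, rows);
    if (in_flight_ + rows > limit) return false;
    in_flight_ += rows;
    return true;
  }

  // Called when queued rows have been written out.
  void Release(int64_t rows) { in_flight_ -= rows; }

  int64_t in_flight() const { return in_flight_; }

 private:
  int64_t max_rows_;
  int64_t in_flight_ = 0;
};
```

For batches no larger than the limit, both policies behave identically, so back-pressure is preserved in the common case.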

Are these changes tested?

Yes. A test that provokes the deadlock was added.

Are there any user-facing changes?

No.

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

@gitmodimo gitmodimo force-pushed the DatasetWriterMaxRowsQueued branch from 84d6b8b to c5189ae Compare April 14, 2025 19:27
@gitmodimo gitmodimo force-pushed the DatasetWriterMaxRowsQueued branch from c5189ae to 2a6a913 Compare April 14, 2025 19:49