fix: set finish before notifying load in LanceArrowWriter setFinished method #92
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Recently I was testing the integration between Lance and Spark.
I found that the Lance writer has a certain probability of hanging. After some troubleshooting, I discovered this is related to the LanceArrowWriter.setFinished method.
The original code appears to have a bug where it sets the finished status after notifying loadNextBatch, which could cause loadNextBatch to hang.
Root Cause
The ideal flow should be:
However, there's a chance it becomes:
If the second scenario occurs, thread 2 will hang indefinitely and cannot receive new notifications. jstack will show stacks hanging in LanceDataWriter.commit.
Reproduction
This issue is hard to reproduce. I encountered it in a very low-resource environment (Spark executor with only 1 core 4g) when creating a new table and writing 600 rows of data at once, where one column is a 1024-dimensional vector column.
It also occurs intermittently.
Further Confirmation
Although the current fix seems reasonable, I hope to get confirmation from maintainers to avoid introducing new unknown issues.
Any comments about this. @jackye1995