fix: concurrency issues when uploading large number of files #197

Merged
quitrk merged 1 commit into master from tavram/async-extract
Apr 3, 2025

Conversation

quitrk
Collaborator

@quitrk quitrk commented Apr 2, 2025

No description provided.

@quitrk quitrk force-pushed the tavram/async-extract branch from 0690f54 to 69fd86f Compare April 2, 2025 18:55
) -> list[str]:
"""
Extract all files from the given list of compressed files.
Runs the entire extraction process in a separate thread.
Member

👍

@quitrk quitrk force-pushed the tavram/async-extract branch 3 times, most recently from cbe6078 to 069de50 Compare April 3, 2025 12:12
@quitrk quitrk changed the title from "wip" to "fix: concurrency issues when uploading large number of files" Apr 3, 2025
* moved execution of archive extraction to a separate thread
* batched text extraction into smaller chunks. With a larger number of files this step became blocking, because charset_normalizer performs encoding detection synchronously
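The two changes listed above can be illustrated with a minimal sketch. This is not the code from this PR: the names `extract_archives`, `extract_text_batched`, `_decode`, and the `batch_size` parameter are hypothetical, zip archives are assumed, and `_decode` is a plain stdlib stand-in for the encoding detection that charset_normalizer performs in the real project.

```python
import asyncio
import zipfile
from pathlib import Path


def _extract_all_sync(archives: list[str], dest: str) -> list[str]:
    """Blocking helper: extract every file from every archive into dest."""
    extracted: list[str] = []
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                # ZipFile.extract returns the path of the extracted file.
                extracted.append(zf.extract(info, dest))
    return extracted


async def extract_archives(archives: list[str], dest: str) -> list[str]:
    """Run the entire extraction in a separate thread (hypothetical name)."""
    return await asyncio.to_thread(_extract_all_sync, archives, dest)


def _decode(raw: bytes) -> str:
    """Assumed stand-in for synchronous encoding detection."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so this never fails.
        return raw.decode("latin-1")


def _read_batch(paths: list[str]) -> list[str]:
    """Blocking helper: read and decode one batch of files."""
    return [_decode(Path(p).read_bytes()) for p in paths]


async def extract_text_batched(paths: list[str], batch_size: int = 10) -> list[str]:
    """Decode files in small batches, awaiting between batches."""
    texts: list[str] = []
    for start in range(0, len(paths), batch_size):
        batch = paths[start:start + batch_size]
        # Each batch runs off the event loop; awaiting here lets other
        # coroutines make progress between batches.
        texts.extend(await asyncio.to_thread(_read_batch, batch))
    return texts
```

Whether the merged change offloads each batch to a thread or only yields between batches is not stated in the conversation; the sketch picks the former as one plausible way to keep synchronous detection from stalling other work.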
@quitrk quitrk force-pushed the tavram/async-extract branch from 069de50 to 11f276c Compare April 3, 2025 12:14
@quitrk quitrk merged commit 054a87f into master Apr 3, 2025
3 checks passed
@quitrk quitrk deleted the tavram/async-extract branch April 3, 2025 13:41
2 participants