Display correct progress bar when resuming with batch #197
```python
# Excerpt from the diff; indentation and surrounding lines are not shown.
self.remaining_batch_ids.remove(batch_id)
response_files_found += 1
if response_files_found > 0:
tasks = [self.resume(batch_id, all_response_files) for batch_id in self.remaining_batch_ids]
```
Can we rename this to `.update_finished_batches` or something more descriptive? It confuses me a little, since I thought it was actually resuming the batch downloads.
```python
pbar.n += self.tracker.n_completed_in_progress_requests
pbar.n += self.tracker.n_failed_in_progress_requests
pbar.refresh()
self.pbar.n = 0
```
We should also do this pbar update outside of the loop; otherwise I think the log is a bit confusing when all batches hit the cache and the loop is never entered:
```text
Completed OpenAI requests in batches: 0%| | 0/3 [00:00<?, ?request/s]
2024-12-04 17:59:40,228 - bespokelabs.curator.request_processor.openai_batch_request_processor - INFO - File /home/trung/.cache/curator/47f1b0209c82f62b/responses_2.jsonl found for batch batch_67509878d7f48191882c1c055f27a8e1, skipping status check and download.
2024-12-04 17:59:40,229 - bespokelabs.curator.request_processor.openai_batch_request_processor - INFO - File /home/trung/.cache/curator/47f1b0209c82f62b/responses_1.jsonl found for batch batch_67509878f4a48191b133673cfddc8f27, skipping status check and download.
2024-12-04 17:59:40,230 - bespokelabs.curator.request_processor.openai_batch_request_processor - INFO - File /home/trung/.cache/curator/47f1b0209c82f62b/responses_0.jsonl found for batch batch_675098793790819197d298dc42ace1e1, skipping status check and download.
2024-12-04 17:59:40,233 - bespokelabs.curator.request_processor.openai_batch_request_processor - INFO - Found 3 out of 3 completed batches, resuming polling for the remaining 0 batches.
Completed OpenAI requests in batches: 0%| | 0/3 [00:00<?, ?request/s]
2024-12-04 17:59:40,240 - bespokelabs.curator.request_processor.base_request_processor - INFO - Using existing dataset file /home/trung/.cache/curator/47f1b0209c82f62b/ef46db3751d8e999.arrow
```
Note how "Completed OpenAI requests" stays at 0% even though all batches hit the cache, because we never actually entered the loop.
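A minimal sketch of the suggested fix, assuming a simplified resume flow. `Tracker`, `ProgressStub`, and `resume` are hypothetical stand-ins, not curator's actual classes; the stub models only the `n` attribute and `refresh()` call that the diff uses on the tqdm bar. The idea is to sync the bar from the tracker's counters once, after the loop, so it is also correct when every batch is served from the cache.

```python
from dataclasses import dataclass


@dataclass
class Tracker:
    """Hypothetical tracker; field names mirror the counters used in the diff."""

    n_completed_in_progress_requests: int = 0
    n_failed_in_progress_requests: int = 0


class ProgressStub:
    """Stand-in for a tqdm bar (assumption): models only `n`, `total`, and `refresh()`."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.n = 0

    def refresh(self) -> None:
        # A real tqdm bar would redraw itself here; the stub just prints its state.
        print(f"{self.n}/{self.total}")


def resume(batch_request_counts: dict, cached_batch_ids: set, tracker: Tracker, pbar: ProgressStub) -> list:
    """Hypothetical resume flow: count cached batches, collect the rest for polling,
    then sync the progress bar once, after the loop."""
    remaining = []
    for batch_id, n_requests in batch_request_counts.items():
        if batch_id in cached_batch_ids:
            # Response file already on disk: count its requests as completed and
            # skip the status check and download.
            tracker.n_completed_in_progress_requests += n_requests
        else:
            remaining.append(batch_id)

    # Polling and downloading of the `remaining` batches would happen here (omitted).

    # Sync outside any per-batch loop, so the bar is updated even when
    # `remaining` is empty, i.e. every batch was served from the cache.
    pbar.n = tracker.n_completed_in_progress_requests + tracker.n_failed_in_progress_requests
    pbar.refresh()
    return remaining
```

With three cached batches of one request each, `resume` returns an empty list and the bar reads 3/3 instead of 0/3.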
Ah, I feel like this should just be an in-memory database.
Why an in-memory database? I think the current logic works fine; it just needs that one fix.
Yeah, it's not necessary. It's just that keeping track of everything across all these places is confusing even me.
Addressing this now in #198
Fixes #196