Display correct progress bar when resuming with batch #197

Closed
wants to merge 1 commit into from

Conversation

RyanMarten
Contributor

Fixes #196

self.remaining_batch_ids.remove(batch_id)
response_files_found += 1
if response_files_found > 0:
    tasks = [self.resume(batch_id, all_response_files) for batch_id in self.remaining_batch_ids]
Contributor

can we rename this to .update_finished_batches or something more descriptive? this confuses me a little bit, since i thought it was actually resuming the batch downloads.

pbar.n += self.tracker.n_completed_in_progress_requests
pbar.n += self.tracker.n_failed_in_progress_requests
pbar.refresh()
self.pbar.n = 0
Contributor

we should also do this pbar update outside of the loop, otherwise i think the log is a bit confusing if all batches hit the cache and this loop doesn't get triggered

Completed OpenAI requests in batches:   0%|                                                                            | 0/3 [00:00<?, ?request/s]2024-12-04 17:59:40,228 - bespokelabs.curator.request_processor.openai_batch_request_processor - INFO - File /home/trung/.cache/curator/47f1b0209c82f62b/responses_2.jsonl found for batch batch_67509878d7f48191882c1c055f27a8e1, skipping status check and download.
2024-12-04 17:59:40,229 - bespokelabs.curator.request_processor.openai_batch_request_processor - INFO - File /home/trung/.cache/curator/47f1b0209c82f62b/responses_1.jsonl found for batch batch_67509878f4a48191b133673cfddc8f27, skipping status check and download.
2024-12-04 17:59:40,230 - bespokelabs.curator.request_processor.openai_batch_request_processor - INFO - File /home/trung/.cache/curator/47f1b0209c82f62b/responses_0.jsonl found for batch batch_675098793790819197d298dc42ace1e1, skipping status check and download.
2024-12-04 17:59:40,233 - bespokelabs.curator.request_processor.openai_batch_request_processor - INFO - Found 3 out of 3 completed batches, resuming polling for the remaining 0 batches.
Completed OpenAI requests in batches:   0%|                                                                            | 0/3 [00:00<?, ?request/s]
2024-12-04 17:59:40,240 - bespokelabs.curator.request_processor.base_request_processor - INFO - Using existing dataset file /home/trung/.cache/curator/47f1b0209c82f62b/ef46db3751d8e999.arrow

note how Completed OpenAI requests is 0% even though all batches hit the cache because we never actually entered the loop
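
A minimal sketch of the suggested fix, assuming a tqdm-style bar that exposes a count `n` and a `refresh()` method. `FakeBar`, `resume_batches`, and the batch ids are hypothetical names for illustration, not the actual curator code, and the sketch counts batches rather than individual requests to keep it short. The point is only that the cached count is applied before the polling loop, so the bar is correct even when the loop body never runs:

```python
from dataclasses import dataclass


@dataclass
class FakeBar:
    """Hypothetical stand-in for a tqdm-style bar: tracks `n` out of `total`."""
    total: int
    n: int = 0

    def refresh(self) -> str:
        # A real bar would redraw itself; here we just return the text.
        return f"{self.n}/{self.total}"


def resume_batches(batch_ids, cached_ids, bar):
    """Credit cached batches up front, then poll only the remaining ones."""
    remaining = [b for b in batch_ids if b not in cached_ids]

    # Update the bar outside the loop, so a fully cached run still
    # reports its completed batches instead of staying at 0.
    bar.n = len(batch_ids) - len(remaining)
    bar.refresh()

    for _batch_id in remaining:
        # Real code would check status and download here (assumption:
        # this sketch treats every remaining batch as finishing at once).
        bar.n += 1
        bar.refresh()
    return remaining


bar = FakeBar(total=3)
remaining = resume_batches(["b0", "b1", "b2"], {"b0", "b1", "b2"}, bar)
print(bar.refresh())  # all three batches hit the cache, so this prints 3/3
```

With all three batches cached the loop is skipped entirely, yet the bar still ends at 3/3 because the cached count was applied before the loop.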

@RyanMarten
Contributor Author

ah, feel like this should just be an in-memory database

@vutrung96
Contributor

why in-memory database? i think the current logic works fine, just needs that one fix?

@RyanMarten
Contributor Author

Yeah, it's not necessary. It's just that even I get confused by all the places where you need to keep track of everything.
I'll send you the fixed version shortly

@RyanMarten
Contributor Author

Addressing this now in #198

@RyanMarten RyanMarten closed this Dec 5, 2024
@RyanMarten RyanMarten deleted the ryam/batch-pbar-resume branch December 5, 2024 01:09
2 participants