In case of errors, the `InferenceClient.do_bulk_inference` method
will now return `None` for the affected objects instead of aborting
the entire bulk inference operation (and discarding any successfully
processed objects).
Fixes issue #68
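A minimal sketch of the new behavior, assuming a hypothetical `predict_chunk` helper that stands in for the per-chunk inference request (the real SDK's chunking and HTTP details are omitted):

```python
from typing import List, Optional

CHUNK_SIZE = 2  # small value for illustration only


def predict_chunk(chunk: List[dict]) -> List[dict]:
    # Hypothetical stand-in for the per-chunk inference call; raises on failure.
    if any(obj.get("bad") for obj in chunk):
        raise RuntimeError("inference failed for this chunk")
    return [{"prediction": "ok"} for _ in chunk]


def do_bulk_inference(objects: List[dict]) -> List[Optional[dict]]:
    results: List[Optional[dict]] = []
    for start in range(0, len(objects), CHUNK_SIZE):
        chunk = objects[start:start + CHUNK_SIZE]
        try:
            results.extend(predict_chunk(chunk))
        except RuntimeError:
            # Instead of aborting the whole operation, mark the affected
            # objects with None and continue with the remaining chunks.
            results.extend([None] * len(chunk))
    return results
```

The result list stays positionally aligned with the input, so callers can match each `None` back to the object that failed.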
The fix for #68 is different from what is described in #68. Instead of
using a generator-based approach, which would require SDK consumers to
implement error handling themselves, the SDK itself now handles the
errors. The downside of not using a generator is a larger memory footprint,
since the results are accumulated in a list. As an alternative, we could
use a generator that yields either the successfully processed inference
results or `None` entries for failed objects. This approach would save memory.
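The generator alternative could look roughly like this; `predict_chunk` is again a hypothetical stand-in for the per-chunk inference call, not the SDK's actual implementation:

```python
from typing import Iterator, List, Optional


def predict_chunk(chunk: List[dict]) -> List[dict]:
    # Hypothetical per-chunk inference call; raises on failure.
    if any(obj.get("bad") for obj in chunk):
        raise RuntimeError("inference failed for this chunk")
    return [{"prediction": "ok"} for _ in chunk]


def do_bulk_inference_lazy(
    objects: List[dict], chunk_size: int = 2
) -> Iterator[Optional[dict]]:
    """Yield one result (or None) per input object without
    accumulating the full result list in memory."""
    for start in range(0, len(objects), chunk_size):
        chunk = objects[start:start + chunk_size]
        try:
            results = predict_chunk(chunk)
        except RuntimeError:
            results = [None] * len(chunk)
        yield from results
```

Only one chunk's worth of results is ever held at a time, at the cost of pushing iteration (and any retry logic) onto the consumer.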
Additionally, this commit introduces parallel processing in `InferenceClient.do_bulk_inference`.
This greatly improves performance. Due to the non-lazy implementation of
`ThreadPoolExecutor.map`, it also increases memory usage slightly ([cpython issue #74028]).
[cpython issue #74028]: python/cpython#74028
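A sketch of the parallel variant, again using a hypothetical `predict_chunk` that already maps failed chunks to `None` entries:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional


def predict_chunk(chunk: List[dict]) -> List[Optional[dict]]:
    # Hypothetical per-chunk inference call; returns None entries on failure
    # so one bad chunk does not abort the whole operation.
    if any(obj.get("bad") for obj in chunk):
        return [None] * len(chunk)
    return [{"prediction": "ok"} for _ in chunk]


def do_bulk_inference_parallel(
    objects: List[dict], chunk_size: int = 2, workers: int = 4
) -> List[Optional[dict]]:
    chunks = [
        objects[i:i + chunk_size] for i in range(0, len(objects), chunk_size)
    ]
    results: List[Optional[dict]] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map consumes its input eagerly and buffers results
        # (cpython issue #74028), hence the extra memory footprint noted above.
        for chunk_result in pool.map(predict_chunk, chunks):
            results.extend(chunk_result)
    return results
```

`pool.map` preserves input order, so results still line up positionally with the input objects even though chunks complete out of order.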
The current implementation tries to perform inference for all chunks; if an exception occurs, all progress is lost.
By yielding the individual chunks, the caller can take charge of error handling.
We can either change the existing API or introduce a separate method (and implement
`do_bulk_inference` using the new method). See also #62
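One possible shape for such a separate method, sketched with a hypothetical `predict_chunk` helper: yield each chunk together with either its result or the exception it raised, so the caller decides how to react and progress made before a failure is preserved.

```python
from typing import Iterator, List, Tuple, Union


def predict_chunk(chunk: List[dict]) -> List[dict]:
    # Hypothetical per-chunk inference call; raises on failure.
    if any(obj.get("bad") for obj in chunk):
        raise RuntimeError("inference failed for this chunk")
    return [{"prediction": "ok"} for _ in chunk]


def iter_bulk_inference(
    objects: List[dict], chunk_size: int = 2
) -> Iterator[Tuple[List[dict], Union[List[dict], Exception]]]:
    """Yield (chunk, result_or_exception) pairs, leaving error
    handling and retries entirely to the caller."""
    for start in range(0, len(objects), chunk_size):
        chunk = objects[start:start + chunk_size]
        try:
            yield chunk, predict_chunk(chunk)
        except RuntimeError as exc:
            yield chunk, exc
```

A `do_bulk_inference` wrapper could then be a thin loop over this generator that substitutes `None` entries for any yielded exception.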