Customers have observed significant speedups when they need to generate from multiple prompts using batched generation.
Currently the FIR/IIR state is maintained without a batch index, so to support batching we would need to introduce a batch index into inference_context.fir_state (and related buffers) in the inference kernels.
Add a batch index to the FIR/IIR state (inference_context.fir_state etc.) in NeMo.
Add test coverage for batched inference.
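To illustrate the proposed change, here is a minimal sketch (not NeMo's actual implementation) of an inference context whose FIR state carries a leading batch dimension, so several prompts can be stepped in one call. The class name, method, and shapes are assumptions for illustration only.

```python
import numpy as np

class InferenceContext:
    """Hypothetical sketch: FIR state kept with a leading batch dimension,
    rather than a single unbatched buffer, so multiple prompts can be
    decoded in one step."""

    def __init__(self, batch_size, channels, taps):
        # The leading batch dimension is the proposed change; an unbatched
        # buffer would have shape (channels, taps).
        self.fir_state = np.zeros((batch_size, channels, taps))

    def fir_step(self, x, weights):
        # x: (batch, channels) new samples; weights: (channels, taps).
        # Shift the state window left and append the new sample per row.
        self.fir_state = np.concatenate(
            [self.fir_state[:, :, 1:], x[:, :, None]], axis=-1
        )
        # Batched dot product over the tap dimension.
        return np.einsum("bct,ct->bc", self.fir_state, weights)

# Two prompts advanced in a single batched step.
ctx = InferenceContext(batch_size=2, channels=3, taps=4)
w = np.ones((3, 4)) / 4.0
y = ctx.fir_step(np.ones((2, 3)), w)
print(y.shape)  # (2, 3)
```

The same pattern would apply to the IIR state: each per-sequence buffer gains a batch axis, and the kernel indexes into it with the request's batch index.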
Expected Benefits
Significant (10x+) performance gains for many shorter generations.