Open
Description
🚀 The feature
It seems that initial handlers are loaded sequentially across different models (handlers for the same model are loaded in parallel, though). When serving many models in production, this significantly slows down the spin-up of a new server. Is it possible to load all handlers in parallel? For example, on a 32-core machine, the server should ideally start 32 workers in parallel at startup. This would dramatically decrease startup time and let the service scale up faster during a traffic surge.
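To illustrate the difference, here is a rough sketch in plain Python. It is not the server's actual loading code: `load_model` is a hypothetical stand-in that just sleeps to simulate handler initialization, and the model names and timings are made up. It only shows how loading across models could be scheduled concurrently instead of one after another.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def load_model(name: str, load_seconds: float = 0.05) -> str:
    """Hypothetical stand-in for initializing one model's handler."""
    time.sleep(load_seconds)  # simulate the time spent loading the handler
    return name


def load_sequentially(names):
    """Load models one after another (the behavior described above)."""
    return [load_model(name) for name in names]


def load_in_parallel(names, max_workers):
    """Load up to max_workers models at the same time (the requested behavior)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input names
        return list(pool.map(load_model, names))


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    models = [f"model_{i}" for i in range(8)]
    _, seq_time = timed(load_sequentially, models)
    _, par_time = timed(load_in_parallel, models, 8)
    print(f"sequential: {seq_time:.2f}s, parallel: {par_time:.2f}s")
```

Threads are used here only because the simulated load is a sleep. Real workers are typically separate processes, so the actual speedup would depend on CPU, memory, and disk contention during startup.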
Motivation, pitch
See above.
Alternatives
No response
Additional context
No response