Hi @hljjjmssyh, I think our framework can support your use case. You can try the FSDP backend with vLLMRollout, set tensor_parallel_size=8, and tune gpu_memory_utilization along with the other hyper-parameters.
If you encounter OOM, you can turn on param offload for the reference and reward models, similar to the qwen2_7b example. You can also enable param/grad/optimizer offload on the actor/critic models if you prefer a larger micro-batch size for training.
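As a rough sketch, the settings above would be passed as Hydra-style overrides on the training launch command. The exact config keys below (and the `main_ppo` entry point) are assumptions based on the verl examples, so check them against the qwen2_7b example script before use:

```shell
# Hypothetical launch sketch -- key names mirror the qwen2_7b example but may differ in your version.
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.tensor_model_parallel_size=8 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    reward_model.model.fsdp_config.param_offload=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=True \
    actor_rollout_ref.actor.fsdp_config.grad_offload=True \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=True
```

Lowering `gpu_memory_utilization` leaves more headroom for training-time activations, while the offload flags trade host-device transfer time for GPU memory, which is what makes a larger micro-batch feasible.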
as mentioned in the title.