
Does this framework support full parameter PPO tuning for the Qwen2.5-14B model on 8-A100 GPUs with 80GB memory each? #40

Open
hljjjmssyh opened this issue Dec 7, 2024 · 1 comment

Comments

@hljjjmssyh

As mentioned in the title.

@PeterSH6
Collaborator

PeterSH6 commented Dec 9, 2024

Hi @hljjjmssyh, I think our framework can support your needs. You can try using FSDP backend + vLLMRollout with tensor_parallel_size=8 and tune the gpu_memory_utilization and other hyper-parameters.
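A rough sketch of what such a launch command could look like, modeled on the framework's example scripts. The exact config keys (e.g. `tensor_model_parallel_size`, `gpu_memory_utilization`) are assumptions based on typical FSDP + vLLM rollout examples and may differ by version, so please cross-check against the qwen2_7b example script in the repo:

```shell
# Illustrative only: key names follow the Hydra-style overrides used in the
# framework's example scripts and may not match your installed version exactly.
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-14B \
    actor_rollout_ref.actor.strategy=fsdp \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.tensor_model_parallel_size=8 \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.5 \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1
```

Lowering `gpu_memory_utilization` leaves more headroom for FSDP training states at the cost of smaller vLLM KV-cache capacity, so it is worth sweeping a few values.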

If you encounter OOM, you can turn on param offload in the reference and reward models, similar to the qwen2_7b example. You can also try param/grad/optimizer offload in the actor/critic models if you prefer a larger micro-batch size for training.
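As a sketch, the offload switches described above might be added as extra overrides like the following. The `fsdp_config.*_offload` key names are assumptions inferred from the qwen2_7b example mentioned above, not a verified API, so check them against your version of the example script:

```shell
# Illustrative offload overrides (key names assumed, verify against the repo):
#   - param offload for the frozen reference and reward models,
#   - param/grad/optimizer offload for the trained actor and critic.
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    reward_model.model.fsdp_config.param_offload=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=True \
    actor_rollout_ref.actor.fsdp_config.grad_offload=True \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=True \
    critic.model.fsdp_config.param_offload=True
```

Offloading trades GPU memory for host-device transfer time, so enabling it only where needed (frozen models first, then the actor/critic) usually gives the best throughput.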
