NeMo-Gym Integration #4848
Signed-off-by: Christian Munley <cmunley@nvidia.com>
Signed-off-by: cmunley1 <cmunley@nvidia.com>
sergiopaniego
left a comment
Super cool!
Some initial ideas:
- We could add the training script to the list of examples in here (file).
- Maybe we could rename the training script to be more explanatory: train_multi_agent.py or something similar.
- It would be super cool if we could add a guide to the docs covering this integration (similar to this but specific for NeMo-Gym).
🚀 🚀 🚀 🚀
… into cmunley1/nemo_gym_on_policy
Thanks for reviewing, @sergiopaniego. What do you think about the changes in grpo_trainer and vllm_serve?
sergiopaniego
left a comment
Thanks @cmunley1, I'll review the changes.
There's a conflict on tests/test_vllm_client_server.py, could you take a look at it?
Adding @kashif @qgallouedec here for review.
Suggested change:
- uvicorn.run(
+ # Start the server
+ uvicorn.run(
sergiopaniego
left a comment
Could we rename the integration file to nemo_gym.md?
We need to add this new file to _toctree.yml to display it on the documentation.
Co-authored-by: Kashif Rasul <kashif.rasul@gmail.com>
Also tagging @lewtun (multi-env)
This integration supports training language models in NeMo-Gym environments using TRL GRPO. Both single-step and multi-step tasks are supported, including multi-environment training. NeMo-Gym orchestrates the rollouts and returns token IDs and logprobs to TRL through the rollout function for training. Currently, this integration is only supported through TRL's vLLM server mode.
See docs/source/nemo_gym_integration.md for a guide.
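For readers skimming the description, here is a rough sketch of the data flow it describes: an environment runs the rollout and hands token IDs and per-token logprobs back through a rollout function. Everything in it is an assumption made for illustration. `fake_env_rollout` is a hypothetical stand-in for a NeMo-Gym call, and the function signature and dictionary keys (`prompt_ids`, `completion_ids`, `logprobs`, `reward`) are not confirmed to match what TRL or this PR actually use.

```python
from typing import Callable


def fake_env_rollout(prompt: str) -> dict:
    """Hypothetical stand-in for an environment rollout (not a real NeMo-Gym API)."""
    return {
        "prompt_ids": [ord(c) % 100 for c in prompt],  # toy "tokenization"
        "completion_ids": [1, 2, 3],                   # toy completion tokens
        "logprobs": [-0.1, -0.2, -0.3],                # one logprob per completion token
        "reward": 1.0,                                 # toy scalar reward
    }


def rollout_func(prompts: list[str], env: Callable[[str], dict] = fake_env_rollout) -> dict:
    """Run one rollout per prompt and batch the fields a trainer would consume."""
    batch = {"prompt_ids": [], "completion_ids": [], "logprobs": [], "reward": []}
    for prompt in prompts:
        result = env(prompt)
        # Per-token logprobs must line up one-to-one with the completion tokens.
        if len(result["completion_ids"]) != len(result["logprobs"]):
            raise ValueError("completion_ids and logprobs must have the same length")
        for key in batch:
            batch[key].append(result[key])
    return batch


if __name__ == "__main__":
    out = rollout_func(["hi", "hello"])
    print(len(out["completion_ids"]), out["reward"])
```

The point of the sketch is only that the environment side, not the trainer, produces the tokens and logprobs; the real interface is described in the guide referenced above.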