Ready for testing 🧪 Multi-policy training support #181
Merged
Changes from 13 commits

Commits (14, all by Ivan-267):
4cd1b84 Multi-policy training support added
e5a5d50 Create TRAINING_MULTIPLE_POLICIES.md
3e080b2 Update TRAINING_MULTIPLE_POLICIES.md
8a6a8d8 Update TRAINING_MULTIPLE_POLICIES.md
f4b1d88 Update hyperparameters in rllib_config.yaml
beb1203 Update rllib_config.yaml hyperparameters
fe60a8b Auto-set num_envs_per_worker in rllib_example.py
668b88d Update rllib_config.yaml
7ade3b0 Added basic calculation for train_batch_size to rllib_config.yaml
0fd70c2 Multiple observation spaces fix
39c5d91 Adds support for multidiscrete actions with sb3
322b398 Removes init variables in ray_wrapper.py
e7489ad Removes register_env arguments - ray_wrapper.py
26532d4 Update ADV_RLLIB.md
TRAINING_MULTIPLE_POLICIES.md (new file, +57 lines)
This is a brief guide on training multiple policies, focusing specifically on RLlib. If you don't require agents with different action/obs spaces, you might also consider using Sample Factory (it's fully supported on Linux), or, for simpler multi-agent envs, SB3 might work using a single shared policy for all agents.

## Installation and configuration:

### Install dependencies:

`pip install https://github.com/edbeeching/godot_rl_agents/archive/refs/heads/main.zip` (to get the latest version)

`pip install ray[rllib]`

`pip install PettingZoo`

### Download the examples file and config file:

From https://github.com/edbeeching/godot_rl_agents/tree/main/examples, you will need `rllib_example.py` and `rllib_config.yaml`.
### Open the config file:

If your env has multiple different policies you wish to train (explained below), set `env_is_multiagent: true`; otherwise keep it `false`.

Change `env_path: None # Set your env path here (exported executable from Godot) - e.g. 'env_path.exe' on Windows` to point to your exported env from Godot. In-editor training with this script is not recommended: the script launches the env multiple times (to get info about the different policy names, to train, and to export to ONNX after training), so while in-editor training is possible, you would need to press `Play` in the Godot editor multiple times during the process.

You can also adjust the stop criteria (set to 1200 seconds by default) and other settings.
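As a rough sketch (the values and path below are illustrative placeholders, not the file's exact defaults), the keys you will typically edit in `rllib_config.yaml` are:

```yaml
env_is_multiagent: true  # true for multiple named policies, false for a single shared policy

stop:
  time_total_s: 1200  # stop criterion in seconds; adjust as needed

config:
  env_config:
    env_path: 'path/to/your_exported_env.exe'  # placeholder - point this at your exported Godot executable
```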
## Configuring and exporting the Godot Env:

### Multipolicy env design differences:

When you set `env_is_multiagent` to `true`, any agent (AIController) that has set `done = true` will receive actions with zeros as values until all agents have set `done = true` at least once during that episode. At that point RLlib considers the episode to be done for all agents, sends a reset signal (this sets `needs_reset = true` in each AIController), and displays the episode rewards in the stats.

If you notice individual agents standing still or behaving oddly (depending on what zero-valued actions do in your game), it's possible that some agents had `done = true` set earlier in the episode while others are still active.

In the example env, we have a training manager script that sets `done` to true for all agents at the same time after a fixed number of steps, and we ignore the `needs_reset = true` signal since we manually reset all agents once the episode is done. Alternatively, you could handle resetting agents in your env whenever `needs_reset` is set to `true` (keep in mind that AIControllers also automatically set it to `true` after `reset_after` steps; you can override this behavior if needed).

**The behavior described above is different from setting `env_is_multiagent` to `false`, or e.g. using the [SB3 example to train](https://github.com/edbeeching/godot_rl_agents/blob/main/docs/ADV_STABLE_BASELINES_3.md)**, in which case a single policy is trained as a vectorized environment: each agent can have its own episode length and will continue to receive actions even after setting `done = true`, as agents are considered to auto-reset in the env itself (the reset needs to be implemented in Godot, as in the example envs).
### Setting policy names:
For each AIController, you can set a different policy name in Godot. Policies will be assigned to agents based on this name. E.g. if you have 10 agents assigned to `policy1`, they will all use policy 1, and if you have one agent assigned to `policy2`, it will use policy 2.

![setting-policy-names](https://github.com/edbeeching/godot_rl_agents/assets/61947090/13eb9b46-f7fb-467c-ad16-8609cda9f292)

Screenshot from the [MultiAgent Simple env](https://github.com/edbeeching/godot_rl_agents_examples/tree/main/examples/MultiAgentSimple).

> [!IMPORTANT]
> All agents that have the same policy name must have the same observation and action space.
## Training:
After installing the prerequisites and adjusting the config, you can start training with `python rllib_example.py` in your conda env/venv (if you are in the same folder as the script).
RLlib will print useful info to the console, such as the command to start `Tensorboard` to view the training logs for the session.
ONNX files will automatically be exported once training is done, and their paths will be printed near the bottom of the console log (you can also stop mid-training with `CTRL+C`, but if you press it twice in a row, saving/exporting will not be done).

For an example of a multi-policy env with 2 policies, check out the [MultiAgent Simple env](https://github.com/edbeeching/godot_rl_agents_examples/tree/main/examples/MultiAgentSimple).

Additional arguments:
- You can change the folder for logging, checkpoints, and ONNX files by using `--experiment_dir [experiment_path]`,
- You can resume stopped sessions by using the `--restore [resume_path]` argument (RLlib will print out the path to resume in the console if you stop training),
- You can set the config file location using `--config_file [path_to_config.yaml]` (default is `rllib_config.yaml`).
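For instance, a run and a later resume might look like this (`[resume_path]` stands for the path RLlib prints to the console when training stops, as above):

```
# Start training, writing logs, checkpoints, and ONNX files under logs/rllib
python rllib_example.py --config_file rllib_config.yaml --experiment_dir logs/rllib

# Resume a previously stopped session
python rllib_example.py --restore [resume_path]
```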
rllib_config.yaml (new file, +60 lines)
algorithm: PPO

# Multi-agent-env setting:
# If true:
# - Any AIController with done = true will receive zeroes as action values until all AIControllers are done; the episode ends at that point.
# - ai_controller.needs_reset will also be set to true every time a new episode begins (but you can ignore it in your env if needed).
# If false:
# - AIControllers auto-reset in Godot and will receive actions after setting done = true.
# - Each AIController has its own episodes that can end/reset at any point.
# Set to false if you have a single policy name set for all agents in AIControllers.
env_is_multiagent: false

checkpoint_frequency: 20

# You can set one or more stopping criteria
stop:
  #episode_reward_mean: 0
  #training_iteration: 1000
  #timesteps_total: 10000
  time_total_s: 10000000

config:
  env: godot
  env_config:
    env_path: null # Set your env path here (exported executable from Godot) - e.g. env_path: 'env_path.exe' on Windows
    action_repeat: null # Doesn't need to be set here, you can set this in the sync node in the Godot editor as well
    show_window: true # Displays the game window while training. Might be faster when false in some cases; turning it off also reduces GPU usage if you don't need rendering.
    speedup: 30 # Speeds up Godot physics

  framework: torch # ONNX models exported with torch are compatible with the current Godot RL Agents Plugin

  lr: 0.0003
  lambda: 0.95
  gamma: 0.99

  vf_loss_coeff: 0.5
  vf_clip_param: .inf
  #clip_param: 0.2
  entropy_coeff: 0.0001
  entropy_coeff_schedule: null
  #grad_clip: 0.5

  normalize_actions: False
  clip_actions: True # During onnx inference we simply clip the actions to the [-1.0, 1.0] range; set here to match

  rollout_fragment_length: 32
  sgd_minibatch_size: 128
  num_workers: 4
  num_envs_per_worker: 1 # This will be set automatically if not multi-agent. If multi-agent, changing this changes how many envs to launch per worker.
  # The value below needs changing per env
  # Basic calculation for this value can be rollout_fragment_length * num_workers * num_envs_per_worker (how many AIControllers you have if not multi_agent, otherwise the value you set)
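  # Worked example (hypothetical agent count): with rollout_fragment_length 32, num_workers 4,
  # and 16 AIControllers per env (so num_envs_per_worker is auto-set to 16), 32 * 4 * 16 = 2048.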
  train_batch_size: 2048

  num_sgd_iter: 4
  batch_mode: truncate_episodes

  num_gpus: 0
  model:
    vf_share_layers: False
    fcnet_hiddens: [64, 64]
rllib_example.py (new file, +116 lines)
# Rllib Example for single and multi-agent training for GodotRL with onnx export,
# needs rllib_config.yaml in the same folder or --config_file argument specified to work.

import argparse
import os
import pathlib

import ray
import yaml
from ray import train, tune
from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.env.wrappers.pettingzoo_env import ParallelPettingZooEnv
from ray.rllib.policy.policy import PolicySpec

from godot_rl.core.godot_env import GodotEnv
from godot_rl.wrappers.petting_zoo_wrapper import GDRLPettingZooEnv
from godot_rl.wrappers.ray_wrapper import RayVectorGodotEnv

if __name__ == "__main__":
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--config_file", default="rllib_config.yaml", type=str, help="The yaml config file")
    parser.add_argument("--restore", default=None, type=str, help="the location of a checkpoint to restore from")
    parser.add_argument(
        "--experiment_dir",
        default="logs/rllib",
        type=str,
        help="The name of the experiment directory, used to store logs.",
    )
    args, extras = parser.parse_known_args()

    # Get config from file
    with open(args.config_file) as f:
        exp = yaml.safe_load(f)

    is_multiagent = exp["env_is_multiagent"]

    # Register env
    env_name = "godot"
    env_wrapper = None

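    # Each (worker_index, vector_index) pair maps to a unique offset from
    # GodotEnv.DEFAULT_PORT, so every Godot instance launched for training
    # listens on its own port.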
    def env_creator(env_config):
        index = env_config.worker_index * exp["config"]["num_envs_per_worker"] + env_config.vector_index
        port = index + GodotEnv.DEFAULT_PORT
        seed = index
        if is_multiagent:
            return ParallelPettingZooEnv(GDRLPettingZooEnv(config=env_config, port=port, seed=seed))
        else:
            return RayVectorGodotEnv(config=env_config, port=port, seed=seed)

    tune.register_env(env_name, env_creator)

    policy_names = None
    num_envs = None
    tmp_env = None

    if is_multiagent:  # Make temp env to get info needed for multi-agent training config
        print("Starting a temporary multi-agent env to get the policy names")
        tmp_env = GDRLPettingZooEnv(config=exp["config"]["env_config"], show_window=False)
        policy_names = tmp_env.agent_policy_names
        print("Policy names for each Agent (AIController) set in the Godot Environment", policy_names)
    else:  # Make temp env to get info needed for setting num_workers training config
        print("Starting a temporary env to get the number of envs and auto-set the num_envs_per_worker config value")
        tmp_env = GodotEnv(env_path=exp["config"]["env_config"]["env_path"], show_window=False)
        num_envs = tmp_env.num_envs

    tmp_env.close()

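    # Map each agent id from the PettingZoo wrapper to the policy name that was
    # set on its AIController in the Godot env (see policy_names above).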
    def policy_mapping_fn(agent_id: int, episode, worker, **kwargs) -> str:
        return policy_names[agent_id]

    ray.init(_temp_dir=os.path.abspath(args.experiment_dir))

    if is_multiagent:
        exp["config"]["multiagent"] = {
            "policies": {policy_name: PolicySpec() for policy_name in policy_names},
            "policy_mapping_fn": policy_mapping_fn,
        }
    else:
        exp["config"]["num_envs_per_worker"] = num_envs

    tuner = None
    if not args.restore:
        tuner = tune.Tuner(
            trainable=exp["algorithm"],
            param_space=exp["config"],
            run_config=train.RunConfig(
                storage_path=os.path.abspath(args.experiment_dir),
                stop=exp["stop"],
                checkpoint_config=train.CheckpointConfig(checkpoint_frequency=exp["checkpoint_frequency"]),
            ),
        )
    else:
        tuner = tune.Tuner.restore(
            trainable=exp["algorithm"],
            path=args.restore,
            resume_unfinished=True,
        )
    result = tuner.fit()

    # Onnx export after training if a checkpoint was saved
    checkpoint = result.get_best_result().checkpoint

    if checkpoint:
        result_path = result.get_best_result().path
        ppo = Algorithm.from_checkpoint(checkpoint)
        if is_multiagent:
            for policy_name in set(policy_names):
                ppo.get_policy(policy_name).export_model(f"{result_path}/onnx_export/{policy_name}_onnx", onnx=12)
                print(
                    f"Saving onnx policy to {pathlib.Path(f'{result_path}/onnx_export/{policy_name}_onnx').resolve()}"
                )
        else:
            ppo.get_policy().export_model(f"{result_path}/onnx_export/single_agent_policy_onnx", onnx=12)
            print(
                f"Saving onnx policy to {pathlib.Path(f'{result_path}/onnx_export/single_agent_policy_onnx').resolve()}"
            )
Review comment:
I think this should be `examples/rllib_config.yaml`

Reply:
I usually call the example from within the examples folder, so the default was based on my usage. If calling from the GDRL repository directly, then it should be changed.
If someone installs GDRL using `pip install` and then just downloads the example file and config file, they might not have the entire repository, but I'm not sure how common this is. I leave this up to you, I can definitely change the default.