Merge pull request #5 from Microgorath/development
Improve README
Microgorath authored Jul 30, 2024
2 parents e561118 + 60bd6ed commit a5e7768
Showing 3 changed files with 94 additions and 18 deletions.
44 changes: 44 additions & 0 deletions .devcontainer/devcontainer-cpu.json
@@ -0,0 +1,44 @@
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
// README at: https://github.com/devcontainers/templates/tree/main/src/docker-existing-dockerfile
{
"name": "poke-rl-cpu",
"image": "tensorflow/tensorflow:2.15.0-jupyter",

"runArgs": ["--shm-size=50gb"],

// Features to add to the dev container. More info: https://containers.dev/features.
// "features": {},

// Use 'forwardPorts' to make a list of ports inside the container available locally.
// Showdown uses port 8000 by default.
"forwardPorts": [8000],

// Uncomment the next line to run commands after the container is created.
// "postCreateCommand": "bash .devcontainer/install-pokemon-showdown.sh",
// Uncomment the next line to run commands each time the container is started, before postAttachCommand
// "postStartCommand": "",
	// Running in an interactive shell (-i) allows it to be used as soon as it is installed.
"postAttachCommand": "bash -i .devcontainer/start-pokemon-showdown.sh",

"updateContentCommand": "bash .devcontainer/install-dev-tools.sh",

// Configure tool-specific properties.
"customizations": {
"vscode": {
"extensions": [
"ms-azuretools.vscode-docker",
"ms-python.python",
"ms-toolsai.jupyter",
"ms-toolsai.vscode-jupyter-cell-tags",
"ms-toolsai.jupyter-keymap",
"ms-toolsai.jupyter-renderers",
"ms-toolsai.vscode-jupyter-slideshow",
"ms-toolsai.tensorboard",
"ms-python.vscode-pylance"
]
}
}

// Uncomment to connect as an existing user other than the container default. More info: https://aka.ms/dev-containers-non-root.
// "remoteUser": "devcontainer"
}
60 changes: 46 additions & 14 deletions README.md
@@ -1,21 +1,24 @@
# Pokemon Battle Reinforcement Learning With Poke-Env, RLlib, and Pokemon Showdown
This repo contains a Jupyter Notebook that trains, evaluates, and tests a basic DQN reinforcement learning model on Pokemon Showdown using [poke-env](https://github.com/hsahovic/poke-env) and the [Ray Reinforcement Learning Library](https://github.com/ray-project/ray/tree/master/rllib).
Using RLlib allows for compatibility with the rest of the Ray suite, notably Tune's hyperparameter tuning and Ray's resource scaling.
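
For instance, a minimal sketch of handing the config to Tune (assuming `config` is the DQNConfig built in the notebook; older Ray versions may want `config.to_dict()` instead):
```python
from ray import tune

# Sketch: Tune schedules and checkpoints the RLlib training runs.
tuner = tune.Tuner("DQN", param_space=config)
results = tuner.fit()
```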

Currently, only the DQN algorithm is supported, but it is easy to replace the DQNConfig with any algorithm RLlib supports.
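
For example, a minimal sketch of such a swap (assuming the notebook's later config calls are algorithm-agnostic):
```python
from ray.rllib.algorithms.ppo import PPOConfig

# Sketch: any RLlib algorithm config can stand in for DQNConfig here;
# the environment and env_runner settings from the notebook apply unchanged.
config = PPOConfig()
```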

A more generalized, modular version of the notebook is planned, with the goal of custom multi-agent reinforcement learning.

## Installation

### Requirements Overview
This notebook uses RLlib, an open-source scalable reinforcement learning library in the Ray framework.
RLlib currently supports Python 3.9 - 3.12.
RLlib supports both PyTorch and Tensorflow, so either may be used. This setup assumes a GPU will be used, but one is not necessary for most algorithms: training DQN with a GPU was found to be slightly slower than using the CPU alone. GPU use is most likely only worthwhile for large models with longer inference or backprop times.
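
As a minimal sketch (assuming the notebook's RLlib `config` object), keeping training CPU-only is a one-liner:
```python
# Sketch: keep the learner on CPU; set num_gpus=1 to place it on a GPU instead.
config = config.resources(num_gpus=0)
```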

To implement Pokemon battles, this notebook uses the Pokemon reinforcement learning environment [poke-env](https://github.com/hsahovic/poke-env). Without poke-env, this project would not exist!
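
As an illustration only (a hypothetical SketchPlayer, assuming poke-env's Gen9EnvSinglePlayer Gym interface; the notebook's actual SimpleRLPlayer differs):
```python
import numpy as np
from gymnasium.spaces import Box
from poke_env.player import Gen9EnvSinglePlayer

class SketchPlayer(Gen9EnvSinglePlayer):
    """Illustrative only -- not the notebook's SimpleRLPlayer."""

    def calc_reward(self, last_battle, current_battle) -> float:
        # poke-env's built-in reward helper; the weights here are made up.
        return self.reward_computing_helper(
            current_battle, fainted_value=2.0, victory_value=30.0
        )

    def embed_battle(self, battle):
        # Toy observation: fraction of unfainted Pokemon on each side.
        return np.array([
            len([m for m in battle.team.values() if not m.fainted]) / 6,
            len([m for m in battle.opponent_team.values() if not m.fainted]) / 6,
        ], dtype=np.float32)

    def describe_embedding(self) -> Box:
        return Box(low=0.0, high=1.0, shape=(2,), dtype=np.float32)
```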

Pokemon Showdown must also be installed for poke-env to function. If using the devcontainer, this will be done for you. For all other installation options, follow the steps on installing Pokemon Showdown in poke-env's [getting started doc](https://poke-env.readthedocs.io/en/stable/getting_started.html).
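
For reference, the installation steps from that doc look roughly like this (clone location and config tweaks are up to you):
```
git clone https://github.com/smogon/pokemon-showdown.git
cd pokemon-showdown
npm install
cp config/config-example.js config/config.js
node pokemon-showdown start --no-security
```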

### PyTorch GPU Installation
PyTorch works just fine without WSL2 or Docker. Training time is about the same as with Tensorflow, but Tensorboard support is limited. Set up PyTorch with GPU support in a new Conda environment, however you normally would.
```
conda create -n poke-rl-torch python=3.11
conda activate poke-rl-torch
```
@@ -29,5 +32,34 @@ config = config.framework(framework="tf2")

The resulting conda environment is 5.6 GB.

### PyTorch CPU-only Installation
The steps are the same as the PyTorch GPU installation, except you install the regular, non-CUDA build of PyTorch.
```
conda create -n poke-rl-torch python=3.11
conda activate poke-rl-torch
pip3 install --user torch
pip3 install --user -r requirements.txt
```
Once installed, be sure to change ```"tf2"``` to ```"torch"``` in this line of basic_rl.ipynb:
```python
config = config.framework(framework="tf2")
```

### Tensorflow GPU Installation / Development Container
A dev container is provided that sets up a Linux Tensorflow 2.15.0-gpu-jupyter Docker container with everything needed for Tensorflow GPU support, and starts its own local Pokemon Showdown server each time the container starts. The Showdown server is port-forwarded to the host at http://localhost:8000.
This requires Docker Desktop with the Nvidia Container Toolkit set up.

If on Windows, also requires WSL2. Follow [this guide](https://gdevakumar.medium.com/setup-windows-10-11-machines-for-deep-learning-with-docker-and-gpu-using-wsl-9349f0224971) to set up Docker Desktop with WSL2 and Nvidia Container Toolkit. The CUDA toolkit version installed on the local WSL2 does not matter, as the Docker image installs its own CUDA Toolkit and cuDNN automatically.
As of Tensorflow 2.11, GPU use on Windows is no longer supported, which is why WSL2 is required.

The resulting container is 7.3 GB.

### Optional Local Showdown Server Setup
Open the Pokemon Showdown config that you copied into the pokemon-showdown config directory during setup. For the devcontainer, it is pokemon-showdown/config/config.js.

Decrease exports.inactiveuserthreshold to 1000. This shortens the time before an inactive user's username is freed for reuse to 1 second. Since no two users are allowed to share a username, this is necessary to prevent exceptions when rerunning notebook cells multiple times.

Increase exports.simulatorprocesses to allow more battles to be processed concurrently. I set this to 4, but some experimentation may be needed for your system and for your algorithm config's number of concurrent player environments, which is num_env_runners * num_envs_per_env_runner * max_concurrent_trials.
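
Put together, the relevant config.js lines might look like this (the values shown are suggestions, not requirements):
```
// pokemon-showdown/config/config.js
exports.inactiveuserthreshold = 1000; // free inactive usernames after 1 second
exports.simulatorprocesses = 4; // allow more battles to run concurrently
```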

## Future RLlib Support
RLlib is currently in the process of updating to 3.0, which moves to a new API stack. However, the new API stack does not yet support all algorithms, so this notebook uses a mix of the new and old API stacks in order to run the DQN algorithm.
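
As a hedged sketch only (flag names vary between Ray releases, so check your version's docs), pinning DQN to the old stack might look like:
```python
from ray.rllib.algorithms.dqn import DQNConfig

# Sketch: explicitly opt DQN out of the new API stack. On some Ray
# versions the equivalent switch is a config.experimental(...) flag instead.
config = DQNConfig().api_stack(
    enable_rl_module_and_learner=False,
    enable_env_runner_and_connector_v2=False,
)
```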
8 changes: 4 additions & 4 deletions notebooks/basic_rl.ipynb
@@ -5,6 +5,7 @@
"metadata": {},
"source": [
"#### Start Local Pokemon Showdown Server\n",
"If on the devcontainer, this is done automatically every time the container is started. Otherwise: \n",
"cd into your pokemon-showdown directory \n",
"node pokemon-showdown start --no-security"
]
@@ -198,7 +199,6 @@
"from ray.rllib.algorithms.dqn import DQNConfig\n",
"from ray import tune, train\n",
"import os\n",
"import ray\n",
"\n",
"# This is passed to each environment (SimpleRLPlayer) during training.\n",
"# 'player_config' is passed as a kwarg to the super().__init__() of SimpleRLPlayer's Gen9EnvSinglePlayer superclass.\n",
@@ -245,9 +245,9 @@
"config = config.env_runners(\n",
" # Number of cpus assigned to each env_runner. Does not improve sampling speed very much on its own. \n",
" num_cpus_per_env_runner=1,\n",
" # Number of workers to run environments. 0 forces rollouts onto the local worker.\n",
" # Number of workers to run environments. 0 forces rollouts onto the local worker. Each uses the above number of cpus.\n",
" num_env_runners=4,\n",
" # Number of environments on each env_runner worker, higher drastically improves sampling speed.\n",
" # Number of environments on each env_runner worker, increasing this drastically improves sampling speed.\n",
" num_envs_per_env_runner=4,\n",
" # Don't cut off episodes before they finish when batching.\n",
" # As a result, the batch size hyperparameter acts as a minimum and batches may vary in size.\n",
@@ -542,7 +542,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.0rc1"
"version": "3.11.0"
}
},
"nbformat": 4,
