Code for my walkthrough of Reinforcement Learning: An Introduction by Richard Sutton and Andrew Barto (http://incompleteideas.net/book/the-book.html)
Install uv:

curl -LsSf https://astral.sh/uv/install.sh | sh
Or install it with pip:

pip install uv
See the uv documentation for installation instructions on other operating systems.
Then install the project dependencies:

uv sync
Run commands using the rlbook environment via uv:
uv run run.py
or by first activating the rlbook venv (this is my preferred workflow):
source .venv/bin/activate
Sign up for an account at wandb: https://app.wandb.ai/login?signup=true
Copy the API key from: https://wandb.ai/authorize
Log in to wandb via:
wandb login
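Once logged in, runs can be uploaded by setting experiment.upload=true (as in the command below). For reference, here is a minimal sketch of logging results via the wandb Python API; the project name, tags, and metric are illustrative assumptions, not the repo's actual configuration:

```python
import wandb

# Illustrative only: project, tags, and metric names are assumptions,
# not rlbook's actual configuration.
run = wandb.init(project="rlbook", tags=["fig2.2"], config={"epsilon": 0.1})
for step, avg_reward in enumerate([0.2, 0.5, 0.8]):
    wandb.log({"avg_reward": avg_reward}, step=step)
run.finish()
```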
Algorithm implementations are located in the src/ directory, while the scaffolding code and notebooks for recreating and exploring Sutton & Barto experiments are organized under the experiments/ directory.
E.g., to recreate Figure 2.3, navigate to experiments/ch2_bandits/ and run:
python run.py -m run.steps=1000 run.n_runs=2000 +bandit.epsilon=0,0.01,0.1 +bandit.random_argmax=true experiment.tag=fig2.2 experiment.upload=true
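The -m flag and comma-separated values appear to follow Hydra's multirun syntax, so this single command sweeps three epsilon settings across 2000 independent runs of 1000 steps each. For orientation, a minimal, self-contained sketch of the epsilon-greedy, sample-average agent such a sweep exercises (independent of the repo's actual classes in src/):

```python
import numpy as np

def run_bandit(epsilon: float, steps: int = 1000, k: int = 10, seed: int = 0) -> np.ndarray:
    """One run of the k-armed testbed with an epsilon-greedy, sample-average agent."""
    rng = np.random.default_rng(seed)
    q_true = rng.normal(0.0, 1.0, k)  # true action values, drawn per the testbed setup
    q_est = np.zeros(k)               # incremental sample-average estimates
    counts = np.zeros(k)
    rewards = np.empty(steps)
    for t in range(steps):
        if rng.random() < epsilon:
            a = int(rng.integers(k))           # explore: random arm
        else:
            a = int(np.argmax(q_est))          # exploit: greedy (first-occurrence ties)
        r = rng.normal(q_true[a], 1.0)
        counts[a] += 1
        q_est[a] += (r - q_est[a]) / counts[a]  # incremental mean update
        rewards[t] = r
    return rewards

# Averaging rewards over many runs per epsilon reproduces the shape of the figure.
curves = {eps: np.mean([run_bandit(eps, seed=s) for s in range(50)], axis=0)
          for eps in (0.0, 0.01, 0.1)}
```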
Figure 2.3 (rlbook): The +bandit.random_argmax=true
flag switches to an argmax implementation that breaks ties randomly, rather than always taking the first occurrence as NumPy's default argmax does, to better align with the original example.
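A minimal sketch of such a tie-breaking argmax (the function name is illustrative; the repo's actual implementation lives under src/):

```python
import numpy as np

def random_argmax(values: np.ndarray, rng: np.random.Generator) -> int:
    """Index of a maximal element, breaking ties uniformly at random.

    np.argmax always returns the first maximal index, which biases early
    action selection toward lower-numbered arms when estimates are tied.
    """
    ties = np.flatnonzero(values == values.max())
    return int(rng.choice(ties))

rng = np.random.default_rng(0)
print(random_argmax(np.array([0.0, 1.0, 1.0]), rng))  # 1 or 2, uniformly
```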
Link to wandb artifact
Further details on experimental setup and results can be found in the corresponding chapter READMEs.