DP-MuZero is an implementation of the MuZero reinforcement learning algorithm for the CartPole environment. MuZero combines Monte Carlo Tree Search (MCTS) with deep neural networks to achieve superhuman performance in various domains without prior knowledge of the environment dynamics.
This project implements the core MuZero algorithm with a focus on the CartPole control problem, demonstrating how model-based reinforcement learning can be applied to classic control tasks.
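At its core, MuZero learns three functions: a representation network h that encodes an observation into a latent state, a dynamics network g that rolls the latent state forward given an action and predicts the reward, and a prediction network f that maps a latent state to a policy and a value. A minimal sketch of that structure, assuming simple MLP layers sized for CartPole (the class names and dimensions are illustrative, not the project's exact code):

```python
import torch
import torch.nn as nn

OBS_DIM, HIDDEN_DIM, NUM_ACTIONS = 4, 64, 2  # CartPole-v1: 4-dim observations, 2 actions

class Representation(nn.Module):
    """h: observation -> latent state."""
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, HIDDEN_DIM), nn.ReLU())

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class Dynamics(nn.Module):
    """g: (latent state, one-hot action) -> (next latent state, reward)."""
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(nn.Linear(HIDDEN_DIM + NUM_ACTIONS, HIDDEN_DIM), nn.ReLU())
        self.reward_head = nn.Linear(HIDDEN_DIM, 1)

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        next_state = self.net(torch.cat([state, action], dim=-1))
        return next_state, self.reward_head(next_state)

class Prediction(nn.Module):
    """f: latent state -> (policy logits, value)."""
    def __init__(self) -> None:
        super().__init__()
        self.policy_head = nn.Linear(HIDDEN_DIM, NUM_ACTIONS)
        self.value_head = nn.Linear(HIDDEN_DIM, 1)

    def forward(self, state: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        return self.policy_head(state), self.value_head(state)
```

During planning, only the dynamics and prediction networks are evaluated; the real environment is never queried inside the tree search, which is what "without prior knowledge of the environment dynamics" refers to.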
*Training loss visualization from Weights & Biases*
- MuZero Algorithm Implementation: Complete implementation of the MuZero algorithm with representation, dynamics, and prediction networks
- CartPole Environment: Integration with the CartPole-v1 environment from Gymnasium
- Monte Carlo Tree Search: Efficient MCTS implementation for action selection (a scoring sketch follows this list)
- Training Visualization: Integration with Weights & Biases for experiment tracking
- Docker Support: Containerized application with separate frontend and backend services
- Web Interface: React frontend for visualization and interaction
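Within the tree search, each child node is ranked by a pUCT score that trades off its estimated value against its policy prior and visit count. A sketch of that scoring rule, using the constants from the MuZero paper (the function name and arguments are illustrative, not the project's exact code):

```python
import math

def puct_score(parent_visits: int, child_visits: int,
               child_value: float, prior: float,
               c1: float = 1.25, c2: float = 19652.0) -> float:
    """pUCT score: exploitation (value) plus prior-weighted exploration."""
    exploration = prior * math.sqrt(parent_visits) / (1 + child_visits)
    exploration *= c1 + math.log((parent_visits + c2 + 1) / c2)
    return child_value + exploration
```

At each step down the tree, the action whose child maximizes this score is selected; unvisited children (child_visits == 0) lean entirely on the prior, so the policy network guides exploration early on.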
The project consists of two main components:

- Backend (Python/FastAPI):
  - MuZero algorithm implementation
  - Environment integration
  - Training and inference API (a minimal endpoint sketch follows this list)
- Frontend (React/TypeScript):
  - Visualization of training progress
  - Interactive environment for testing trained models
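A minimal sketch of the backend's health-check endpoint (only the /api/ping route is implied by this README; the handler body is an assumption):

```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/ping")
def ping() -> dict[str, str]:
    # nginx proxies /api/ping on the frontend port to this route.
    return {"status": "ok"}
```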
To run the project, you will need:

- Docker and Docker Compose
- Git
- A Weights & Biases account for experiment tracking
- Clone the repository:

  ```bash
  git clone https://github.com/CogitoNTNU/DeepTactics-Muzero.git
  cd DeepTactics-Muzero
  ```

- Build and start the containers:

  ```bash
  docker compose up --build
  ```

- Install wandb:

  ```bash
  pip install wandb
  ```

- Log in to wandb (you'll need to provide your API key):

  ```bash
  wandb login
  ```
The project uses wandb to track the following (a logging sketch follows this list):

- Model gradients and parameters
- Training losses (value, reward, policy, total)
- Learning rate
- Game statistics (episode rewards and lengths)
- Training configuration
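A hedged sketch of what that logging might look like (the project name matches the dashboard linked further down; the metric keys, config values, and loop are illustrative):

```python
import wandb

run = wandb.init(project="muzero-cartpole", config={"lr": 1e-3, "num_simulations": 50})
# wandb.watch(model, log="all")  # would track gradients and parameters of a real model

for step in range(10):  # stand-in for the actual training loop
    losses = {"value": 0.4 / (step + 1), "reward": 0.3 / (step + 1), "policy": 0.3 / (step + 1)}
    wandb.log({
        "loss/total": sum(losses.values()),
        **{f"loss/{name}": value for name, value in losses.items()},
        "lr": run.config["lr"],
    }, step=step)

run.finish()
```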
The easiest way to run the project is using Docker Compose:

```bash
docker compose up --build
```

This will:

- Build the backend container with the MuZero implementation
- Build the frontend container with the React application
- Start both services with the appropriate networking

Once the services are running, the application is available at:

- Frontend: http://localhost:9135
- Backend API: http://localhost:9135/api/ping (proxied through nginx)
Training progress can be monitored in real time through the Weights & Biases dashboard. The implementation tracks:
- Total loss and component losses (value, reward, policy)
- Model gradients and parameters
- Episode rewards and lengths
- Learning rate changes
Visit https://wandb.ai/adisinghwork/muzero-cartpole to view training metrics.
To run the test suite:

```bash
docker compose run backend python -m pytest
```

For quick local testing without rebuilding the Docker image:

```bash
pytest backend/tests
```
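For flavor, a hypothetical test in the same spirit, checking that the CartPole environment wires up as expected (the file name and assertions are illustrative, not taken from the actual suite):

```python
# backend/tests/test_env.py -- hypothetical example
import gymnasium as gym

def test_cartpole_observation_shape():
    env = gym.make("CartPole-v1")
    obs, info = env.reset(seed=0)
    assert obs.shape == (4,)  # cart position/velocity, pole angle/angular velocity
    env.close()
```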
When contributing to the project, please use type hints in all methods:

```python
import numpy as np
import torch

def dummy_fun(a: int, b: np.ndarray, c: torch.Tensor) -> list[int]:
    # Placeholder body; a real implementation would return a list of ints.
    return []
```
This project was developed by the following contributors:
- ChristianFredrikJohnsen
- ludvigovrevik
- Eiriksol
- kristiancarlenius
- BrageHK
- adisinghstudent
- Nicolai9897
- Vegardhgr
- SverreNystad
For users with access to NTNU's IDUN supercomputer:
```bash
# Add your GitHub SSH key to the agent
ssh-add ~/.ssh/your_github_key

# Connect to IDUN with agent forwarding
ssh -A idun.hpc.ntnu.no

# Test GitHub connectivity
ssh -T git@github.com

# Submit a job using the provided SLURM scripts
sbatch job.slurm
```
Distributed under the MIT License. See LICENSE for more information.