DP-MuZero is an implementation of the MuZero reinforcement learning algorithm for the CartPole environment. MuZero combines Monte Carlo Tree Search (MCTS) with deep neural networks to achieve superhuman performance in various domains without prior knowledge of the environment dynamics.
This project implements the core MuZero algorithm with a focus on the CartPole control problem, demonstrating how model-based reinforcement learning can be applied to classic control tasks.
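At its core, MuZero learns three functions: a representation network h that encodes an observation into a latent state, a dynamics network g that rolls the latent state forward given an action and predicts the reward, and a prediction network f that maps a latent state to a policy and a value. A minimal sketch of that structure, assuming simple MLP layers sized for CartPole (the class names and dimensions are illustrative, not the project's exact code):

```python
import torch
import torch.nn as nn

OBS_DIM, HIDDEN_DIM, NUM_ACTIONS = 4, 64, 2  # CartPole-v1: 4-dim observations, 2 actions

class Representation(nn.Module):
    """h: observation -> latent state."""
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, HIDDEN_DIM), nn.ReLU())

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)

class Dynamics(nn.Module):
    """g: (latent state, one-hot action) -> (next latent state, reward)."""
    def __init__(self) -> None:
        super().__init__()
        self.net = nn.Sequential(nn.Linear(HIDDEN_DIM + NUM_ACTIONS, HIDDEN_DIM), nn.ReLU())
        self.reward_head = nn.Linear(HIDDEN_DIM, 1)

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        next_state = self.net(torch.cat([state, action], dim=-1))
        return next_state, self.reward_head(next_state)

class Prediction(nn.Module):
    """f: latent state -> (policy logits, value)."""
    def __init__(self) -> None:
        super().__init__()
        self.policy_head = nn.Linear(HIDDEN_DIM, NUM_ACTIONS)
        self.value_head = nn.Linear(HIDDEN_DIM, 1)

    def forward(self, state: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        return self.policy_head(state), self.value_head(state)
```

During planning, only the dynamics and prediction networks are evaluated; the real environment is never queried inside the tree search, which is what "without prior knowledge of the environment dynamics" refers to.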
*Training loss visualization from Weights & Biases*
- MuZero Algorithm Implementation: Complete implementation of the MuZero algorithm with representation, dynamics, and prediction networks
- CartPole Environment: Integration with the CartPole-v1 environment from Gymnasium
- Monte Carlo Tree Search: Efficient MCTS implementation for action selection (a scoring sketch follows this list)
- Training Visualization: Integration with Weights & Biases for experiment tracking
- Docker Support: Containerized application with separate frontend and backend services
- Web Interface: React frontend for visualization and interaction
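Within the tree search, each child node is ranked by a pUCT score that trades off its estimated value against its policy prior and visit count. A sketch of that scoring rule, using the constants from the MuZero paper (the function name and arguments are illustrative, not the project's exact code):

```python
import math

def puct_score(parent_visits: int, child_visits: int,
               child_value: float, prior: float,
               c1: float = 1.25, c2: float = 19652.0) -> float:
    """pUCT score: exploitation (value) plus prior-weighted exploration."""
    exploration = prior * math.sqrt(parent_visits) / (1 + child_visits)
    exploration *= c1 + math.log((parent_visits + c2 + 1) / c2)
    return child_value + exploration
```

At each step down the tree, the action whose child maximizes this score is selected; unvisited children (child_visits == 0) lean entirely on the prior, so the policy network guides exploration early on.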
The project consists of two main components:

- Backend (Python/FastAPI):
  - MuZero algorithm implementation
  - Environment integration
  - Training and inference API (a minimal endpoint sketch follows this list)
- Frontend (React/TypeScript):
  - Visualization of training progress
  - Interactive environment for testing trained models
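A minimal sketch of the backend's health-check endpoint (only the /api/ping route is implied by this README; the handler body is an assumption):

```python
from fastapi import FastAPI

app = FastAPI()

@app.get("/ping")
def ping() -> dict[str, str]:
    # nginx proxies /api/ping on the frontend port to this route.
    return {"status": "ok"}
```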
To run the project, you will need:

- Docker and Docker Compose
- Git
- A Weights & Biases account for experiment tracking
- Clone the repository:

  ```bash
  git clone https://github.com/CogitoNTNU/DeepTactics-Muzero.git
  cd DeepTactics-Muzero
  ```

- Build and start the containers:

  ```bash
  docker compose up --build
  ```

- Install wandb:

  ```bash
  pip install wandb
  ```

- Log in to wandb (you'll need to provide your API key):

  ```bash
  wandb login
  ```
The project uses wandb to track the following (a logging sketch follows this list):

- Model gradients and parameters
- Training losses (value, reward, policy, total)
- Learning rate
- Game statistics (episode rewards and lengths)
- Training configuration
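A hedged sketch of what that logging might look like (the project name matches the dashboard linked further down; the metric keys, config values, and loop are illustrative):

```python
import wandb

run = wandb.init(project="muzero-cartpole", config={"lr": 1e-3, "num_simulations": 50})
# wandb.watch(model, log="all")  # would track gradients and parameters of a real model

for step in range(10):  # stand-in for the actual training loop
    losses = {"value": 0.4 / (step + 1), "reward": 0.3 / (step + 1), "policy": 0.3 / (step + 1)}
    wandb.log({
        "loss/total": sum(losses.values()),
        **{f"loss/{name}": value for name, value in losses.items()},
        "lr": run.config["lr"],
    }, step=step)

run.finish()
```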
The easiest way to run the project is using Docker Compose:

```bash
docker compose up --build
```

This will:

- Build the backend container with the MuZero implementation
- Build the frontend container with the React application
- Start both services with the appropriate networking

Once the services are running, the application is available at:

- Frontend: http://localhost:9135
- Backend API: http://localhost:9135/api/ping (proxied through nginx)
Training progress can be monitored in real time through the Weights & Biases dashboard. The implementation tracks:
- Total loss and component losses (value, reward, policy)
- Model gradients and parameters
- Episode rewards and lengths
- Learning rate changes
Visit https://wandb.ai/adisinghwork/muzero-cartpole to view training metrics.
To run the test suite:

```bash
docker compose run backend python -m pytest
```

For quick local testing without rebuilding the Docker image:

```bash
pytest backend/tests
```
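For flavor, a hypothetical test in the same spirit, checking that the CartPole environment wires up as expected (the file name and assertions are illustrative, not taken from the actual suite):

```python
# backend/tests/test_env.py -- hypothetical example
import gymnasium as gym

def test_cartpole_observation_shape():
    env = gym.make("CartPole-v1")
    obs, info = env.reset(seed=0)
    assert obs.shape == (4,)  # cart position/velocity, pole angle/angular velocity
    env.close()
```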
When contributing to the project, please use type hints in all methods:

```python
import numpy as np
import torch

def dummy_fun(a: int, b: np.ndarray, c: torch.Tensor) -> list[int]:
    # Placeholder body; a real implementation would return a list of ints.
    return []
```
This project was developed by the following contributors:
- ChristianFredrikJohnsen
- ludvigovrevik
- Eiriksol
- kristiancarlenius
- BrageHK
- adisinghstudent
- Nicolai9897
- Vegardhgr
- SverreNystad
For users with access to NTNU's IDUN supercomputer:
```bash
# Add your GitHub SSH key to the agent
ssh-add ~/.ssh/your_github_key

# Connect to IDUN with agent forwarding
ssh -A idun.hpc.ntnu.no

# Test GitHub connectivity
ssh -T git@github.com

# Submit a job using the provided SLURM scripts
sbatch job.slurm
```
Distributed under the MIT License. See LICENSE for more information.