
DP-MuZero: Deep Policy MuZero Implementation


MuZero Logo
Description

DP-MuZero is an implementation of the MuZero reinforcement learning algorithm for the CartPole environment. MuZero combines Monte Carlo Tree Search (MCTS) with deep neural networks, learning a model of the environment's dynamics rather than being given one, and has achieved superhuman performance in domains such as Go, chess, shogi, and Atari.

This project implements the core MuZero algorithm with a focus on the CartPole control problem, demonstrating how model-based reinforcement learning can be applied to classic control tasks.
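At the core of MuZero are three learned functions: a representation network h (observation to latent state), a dynamics network g (latent state and action to next latent state and reward), and a prediction network f (latent state to policy and value). A minimal PyTorch sketch of these interfaces for CartPole; layer sizes and structure are illustrative, not the ones used in this repository:

```python
import torch
import torch.nn as nn

class MuZeroNets(nn.Module):
    """Sketch of MuZero's three networks for CartPole
    (4-dim observation, 2 actions). Sizes are illustrative."""

    def __init__(self, obs_dim: int = 4, action_dim: int = 2, hidden: int = 32):
        super().__init__()
        # h: observation -> latent state
        self.representation = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        # g: (latent state, one-hot action) -> next latent state; reward read off it
        self.dynamics = nn.Sequential(nn.Linear(hidden + action_dim, hidden), nn.ReLU())
        self.reward_head = nn.Linear(hidden, 1)
        # f: latent state -> policy logits and value
        self.policy_head = nn.Linear(hidden, action_dim)
        self.value_head = nn.Linear(hidden, 1)

    def initial_inference(self, obs: torch.Tensor):
        s = self.representation(obs)
        return s, self.policy_head(s), self.value_head(s)

    def recurrent_inference(self, s: torch.Tensor, action: torch.Tensor):
        s_next = self.dynamics(torch.cat([s, action], dim=-1))
        return s_next, self.reward_head(s_next), self.policy_head(s_next), self.value_head(s_next)
```

MCTS only ever calls `initial_inference` at the root and `recurrent_inference` inside the tree, so planning never touches the real environment.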

Training Loss

Training loss visualization from Weights & Biases

Features

  • MuZero Algorithm Implementation: Complete implementation of the MuZero algorithm with representation, dynamics, and prediction networks
  • CartPole Environment: Integration with the CartPole-v1 environment from Gymnasium
  • Monte Carlo Tree Search: Efficient MCTS implementation for action selection
  • Training Visualization: Integration with Weights & Biases for experiment tracking
  • Docker Support: Containerized application with separate frontend and backend services
  • Web Interface: React frontend for visualization and interaction
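During search, MuZero-style MCTS selects child nodes with the PUCT rule, trading the current value estimate against a prior-weighted exploration bonus. A minimal sketch of that selection step; the node layout is hypothetical, not this repository's data structure:

```python
import math

def puct_score(parent_visits: int, child_visits: int, prior: float,
               q_value: float, c_puct: float = 1.25) -> float:
    """PUCT rule used by MuZero-style MCTS: exploit the current value
    estimate, explore in proportion to the policy prior."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q_value + exploration

def select_action(children: dict[int, dict]) -> int:
    """Pick the child action with the highest PUCT score.
    `children` maps action -> {'visits', 'prior', 'q'} (hypothetical layout)."""
    total = sum(c["visits"] for c in children.values())
    return max(children, key=lambda a: puct_score(
        total, children[a]["visits"], children[a]["prior"], children[a]["q"]))
```

Unvisited children get the full exploration bonus, so high-prior actions are tried early even when their value estimate is still zero.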

Architecture

The project consists of two main components:

  1. Backend (Python/FastAPI):

    • MuZero algorithm implementation
    • Environment integration
    • Training and inference API
  2. Frontend (React/TypeScript):

    • Visualization of training progress
    • Interactive environment for testing trained models

Getting Started

Prerequisites

  • Docker and Docker Compose (the project runs fully containerized)
  • Git, to clone the repository
  • (Optional) A Weights & Biases account for experiment tracking

Installation

  1. Clone the repository:

    git clone https://github.com/CogitoNTNU/DeepTactics-Muzero.git
    cd DeepTactics-Muzero
  2. Build and start the containers:

    docker compose up --build

Weights & Biases Setup

  1. Install wandb:

    pip install wandb
  2. Log in to wandb (you'll need your API key):

    wandb login
  3. The project uses wandb to track:

    • Model gradients and parameters
    • Training losses (value, reward, policy, total)
    • Learning rate
    • Game statistics (episode rewards and lengths)
    • Training configuration
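The metrics above are typically pushed from the training loop with `wandb.log`. A hedged sketch of such a helper; the metric names and the function itself are illustrative, not taken from this repository:

```python
from typing import Any

try:
    import wandb  # metrics reach W&B only when it is installed and initialised
except ImportError:
    wandb = None

def log_training_step(step: int, value_loss: float, reward_loss: float,
                      policy_loss: float, lr: float) -> dict[str, Any]:
    """Collect one training step's metrics and forward them to wandb
    when a run is active (metric names are illustrative)."""
    metrics = {
        "loss/value": value_loss,
        "loss/reward": reward_loss,
        "loss/policy": policy_loss,
        "loss/total": value_loss + reward_loss + policy_loss,
        "lr": lr,
    }
    if wandb is not None and wandb.run is not None:
        wandb.log(metrics, step=step)
    return metrics
```

Gradients and parameters are usually captured separately, e.g. by calling `wandb.watch(model)` once after `wandb.init`.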

Usage

Running with Docker Compose

The easiest way to run the project is using Docker Compose:

docker compose up --build

This will:

  1. Build the backend container with the MuZero implementation
  2. Build the frontend container with the React application
  3. Start both services with the appropriate networking

Accessing the Application

Once the containers are running, the frontend and backend are reachable on the host ports mapped in docker-compose.yml.

Training Visualization

Training progress can be monitored in real-time through the Weights & Biases dashboard. The implementation tracks:

  • Total loss and component losses (value, reward, policy)
  • Model gradients and parameters
  • Episode rewards and lengths
  • Learning rate changes

Visit https://wandb.ai/adisinghwork/muzero-cartpole to view training metrics.

Testing

To run the test suite:

docker compose run backend python -m pytest

For quick local testing without rebuilding the Docker image:

pytest backend/tests
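Tests in the suite follow the usual pytest convention: plain `test_*` functions with bare asserts, discovered automatically. As a hypothetical example in that style, checking an n-step value-target helper (both functions are illustrative, not the repository's code):

```python
def n_step_return(rewards: list[float], bootstrap_value: float,
                  gamma: float = 0.997) -> float:
    """Discounted n-step return, the kind of value target MuZero trains
    against (hypothetical helper; the repo's target code may differ)."""
    g = bootstrap_value
    for r in reversed(rewards):
        g = r + gamma * g
    return g

def test_n_step_return_undiscounted():
    # With gamma=1 the target is just the sum of rewards plus bootstrap.
    assert n_step_return([1.0, 1.0, 1.0], 2.0, gamma=1.0) == 5.0
```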

Development

Code Style

When contributing to the project, please add type hints to all functions and methods:

import numpy as np
import torch

def dummy_fun(a: int, b: np.ndarray, c: torch.Tensor) -> list[int]:
    # Annotate every parameter and the return type; the body must
    # actually return the annotated type.
    return [a, b.size, c.numel()]

Team

This project was developed by the following contributors:

ChristianFredrikJohnsen
ludvigovrevik
Eiriksol
kristiancarlenius
BrageHK
adisinghstudent
Nicolai9897
Vegardhgr
SverreNystad

IDUN HPC Usage

For users with access to NTNU's IDUN supercomputer:

# Add your GitHub SSH key to the agent
ssh-add ~/.ssh/your_github_key

# Connect to IDUN
ssh -A idun.hpc.ntnu.no

# Test GitHub connectivity
ssh -T git@github.com

# Submit a job using the provided SLURM scripts
sbatch job.slurm
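The repository ships its own SLURM scripts; for orientation, a CartPole training job on IDUN would look roughly like the fragment below. The partition, module version, and entry point are illustrative guesses, so check the provided job.slurm for the real values.

```bash
#!/bin/bash
#SBATCH --job-name=muzero-cartpole
#SBATCH --partition=GPUQ          # illustrative; check IDUN's current partition names
#SBATCH --gres=gpu:1
#SBATCH --time=02:00:00
#SBATCH --mem=16G

module purge
module load Python/3.10.8-GCCcore-12.2.0   # illustrative module version

python backend/train.py   # hypothetical entry point
```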

License

Distributed under the MIT License. See LICENSE for more information.

About

Implementation of MuZero from Google DeepMind. Issues are tracked at https://app.plane.so/neattactics/projects/c1e69ebe-3696-49ee-b88e-1e1141c5ca4c/issues/
