CRForge Gymnasium Environment

Python RL training framework for the CRForge Clash Royale simulator.

Python (Gymnasium)  --ZMQ-->  Java (CRForge Simulation)
   env.step()       JSON       GameEngine.tick()
   env.reset()      PAIR       30 FPS deterministic

Prerequisites

Python 3.10+
Java 17 (for the bridge server)
The crforge project, built with ./gradlew build

Installation

# Basic install (env + bridge client)
pip install -e python/

# With training dependencies (SB3 + TensorBoard)
pip install -e "python/[train]"

Quick Start

1. Start the Java bridge server

# macOS: select JDK 17 (on Linux, point JAVA_HOME at your JDK 17 install instead)
export JAVA_HOME=$(/usr/libexec/java_home -v 17)
./gradlew :gym-bridge:run

The server listens on tcp://localhost:9876 by default.

2. Run random episodes (smoke test)

python python/examples/run_episodes.py

This runs 3 episodes with random actions and prints per-episode stats.
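
For reference, a random-action episode loop against any Gymnasium-style environment might look like the sketch below. It is not the contents of run_episodes.py: the function name run_random_episode and the max_steps cap are illustrative assumptions, and the code relies only on the standard Gymnasium reset()/step() signatures.

```python
def run_random_episode(env, seed=None, max_steps=10_000):
    """Play one episode with uniformly random actions.

    Returns (total_reward, steps). Works with any env that follows the
    Gymnasium API: reset() -> (obs, info), step() -> 5-tuple.
    max_steps is a hypothetical safety cap, not a CRForge setting.
    """
    obs, info = env.reset(seed=seed)
    total_reward = 0.0
    steps = 0
    for _ in range(max_steps):
        action = env.action_space.sample()  # random MultiDiscrete action
        obs, reward, terminated, truncated, info = env.step(action)
        total_reward += float(reward)
        steps += 1
        if terminated or truncated:
            break
    return total_reward, steps
```

With the bridge server running, calling run_random_episode(env) on an env built as in the Configuration section should return the episode's total reward and step count.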

3. Train a PPO agent

python python/examples/train_ppo.py --timesteps 50000

Monitor training:

tensorboard --logdir logs/ppo_crforge

4. Evaluate a trained model

python python/examples/evaluate.py --model models/ppo_crforge --episodes 50

5. Watch a trained model play (AI Visualizer)

The desktop visualizer can run in AI mode, where Python controls the game via the same ZMQ protocol as the headless bridge. This lets you watch the trained model deploy cards with full rendering.

# Terminal 1: Start the desktop visualizer in AI mode
export JAVA_HOME=$(/usr/libexec/java_home -v 17)
./gradlew :desktop:run --args="--ai-port 9876"

# Terminal 2: Run the trained model (same command as headless evaluation)
python python/examples/evaluate.py --model models/ppo_crforge

The visualizer renders each step in real time: when the model sends a step message, the engine ticks for that step are spread across render frames, so you can watch entities move, projectiles fly, and cards deploy.

Controls during AI playback:

Key	Action
SPACE	Pause/resume (the Python client blocks while paused)
+/-	Speed up/slow down (0.25x-8x)
P	Toggle path visualization
O	Toggle attack range circles
D	Toggle floating damage numbers
A	Toggle AOE damage indicators
H	Toggle HP numbers

Any script that connects to the bridge server works -- run_episodes.py, evaluate.py, or your own custom loop. The Python side requires no changes; it cannot tell whether the server is headless or rendering.

Environment Details

Action Space

MultiDiscrete([2, 4, 18, 32])

Index	Meaning	Values
0	action_type	0=no-op, 1=play card
1	hand_index	0-3 (which card slot)
2	tile_x	0-17 (arena column)
3	tile_y	0-31 (arena row)
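
The four action components can be given names and range-checked before they are sent to the env. The sketch below is one way to do that; the Action tuple and decode_action helper are hypothetical names for illustration and are not part of crforge_gym.

```python
from typing import NamedTuple, Sequence

# Upper bounds taken from MultiDiscrete([2, 4, 18, 32]).
ACTION_DIMS = (2, 4, 18, 32)


class Action(NamedTuple):
    play_card: bool  # action_type: False = no-op, True = play card
    hand_index: int  # 0-3, which card slot
    tile_x: int      # 0-17, arena column
    tile_y: int      # 0-31, arena row


def decode_action(raw: Sequence[int]) -> Action:
    """Validate a raw 4-element action and give its fields names."""
    if len(raw) != len(ACTION_DIMS):
        raise ValueError(f"expected {len(ACTION_DIMS)} values, got {len(raw)}")
    values = [int(v) for v in raw]
    for value, bound in zip(values, ACTION_DIMS):
        if not 0 <= value < bound:
            raise ValueError(f"value {value} outside [0, {bound})")
    action_type, hand_index, tile_x, tile_y = values
    return Action(bool(action_type), hand_index, tile_x, tile_y)


print(decode_action([1, 2, 9, 16]))
# Action(play_card=True, hand_index=2, tile_x=9, tile_y=16)
```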

Observation Space

All values are float32 for SB3 compatibility. Spatial coordinates are normalized to [0, 1].

Key	Shape	Description
frame	(1,)	Current simulation frame
game_time	(1,)	Game time in seconds (0-600)
is_overtime	(1,)	1.0 if overtime
elixir	(2,)	[blue, red] elixir (0-10)
crowns	(2,)	[blue, red] crown count
hand_costs	(4,)	Card costs / 10 (normalized)
hand_types	(4,)	0=troop, 1=spell, 2=building
next_card_cost	(1,)	Next card cost / 10
next_card_type	(1,)	Next card type
towers	(6, 4)	[hp_frac, x_norm, y_norm, alive]
entities	(64, 7)	[team, type, move, x, y, hp, shield]
num_entities	(1,)	Active entity count
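
Because entities is a fixed-size (64, 7) array, consumers usually want only the rows that are in use. The sketch below reads them into named tuples using the column order from the table. It assumes active entities are packed at the front of the array and the remaining rows are padding, which this README does not confirm; Entity and active_entities are hypothetical names, and the sample values are made up.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    team: float
    kind: float    # the "type" column
    move: float
    x: float       # normalized to [0, 1]
    y: float       # normalized to [0, 1]
    hp: float
    shield: float


def active_entities(obs):
    """Return the first num_entities rows of obs["entities"] as Entity tuples.

    Assumption: active rows come first and the rest is padding.
    Accepts NumPy arrays or plain nested lists.
    """
    count = int(obs["num_entities"][0])
    return [Entity(*(float(v) for v in row)) for row in obs["entities"][:count]]


# Hand-made observation fragment, for illustration only.
obs = {
    "num_entities": [2.0],
    "entities": [
        [0.0, 0.0, 1.0, 0.5, 0.25, 1.0, 0.0],
        [1.0, 0.0, 0.0, 0.5, 0.75, 0.5, 0.0],
    ] + [[0.0] * 7 for _ in range(62)],
}
for entity in active_entities(obs):
    print(entity)
```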

Reward Structure

Source	Magnitude	Purpose
Tower damage	+0.005/HP	Incentivize attacking
Crown earned	+1.0	Major milestone
Win	+5.0	Terminal reward
Loss	-5.0	Terminal penalty
Unit kill	+0.05/kill	Reward killing enemy units
Unit damage	+0.001/HP	Reward damaging enemy units
Elixir waste	-0.005/step	Penalize capping at 10
Time penalty	-0.0001/step	Discourage passive play
Invalid action	-0.01/step	Penalize unaffordable plays
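
As a worked example of how these magnitudes combine, the sketch below sums the table entries for a single step. It assumes the components are simply additive and that the time penalty applies on every step; the env's actual reward code may differ, and the names REWARD_WEIGHTS and step_reward are hypothetical.

```python
# Per-event magnitudes copied from the table above.
REWARD_WEIGHTS = {
    "tower_damage_hp": 0.005,
    "crowns_earned": 1.0,
    "unit_kills": 0.05,
    "unit_damage_hp": 0.001,
    "elixir_capped": -0.005,   # 1 if elixir sat at 10 this step
    "invalid_action": -0.01,   # 1 if the attempted play failed
}
TIME_PENALTY = -0.0001
WIN_REWARD = 5.0
LOSS_REWARD = -5.0


def step_reward(events, outcome=None):
    """Sum the shaped reward for one step (assumes additive components).

    events maps keys of REWARD_WEIGHTS to counts or amounts; missing keys
    count as 0. outcome is "win", "loss", or None for a non-terminal step.
    """
    total = TIME_PENALTY
    for name, weight in REWARD_WEIGHTS.items():
        total += weight * events.get(name, 0)
    if outcome == "win":
        total += WIN_REWARD
    elif outcome == "loss":
        total += LOSS_REWARD
    return total


# 200 HP of tower damage plus one kill in a single step:
# 200 * 0.005 + 1 * 0.05 - 0.0001 = 1.0499
print(round(step_reward({"tower_damage_hp": 200, "unit_kills": 1}), 4))
```

Note the scale: under these weights, 1000 HP of tower damage is worth the same as a win (+5.0), so the dense terms can dominate the terminal reward over a long episode.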

Configuration

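from crforge_gym.env import CRForgeEnv  # import path assumed from the module layout under Architecture

env = CRForgeEnv(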
    endpoint="tcp://localhost:9876",  # Bridge server address
    blue_deck=["knight", "archer", ...],  # 8 card IDs
    red_deck=["knight", "archer", ...],   # 8 card IDs
    level=11,                        # Card/tower level (1-15)
    ticks_per_step=6,                # Sim ticks per step (default: 6 = ~5 decisions/sec)
    opponent="random",               # "random", "noop", or callable
    invalid_action_penalty=-0.01,    # Penalty for failed actions
)
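
The effect of ticks_per_step on decision rate and episode length is simple arithmetic. The sketch below works it out from the 30 FPS simulation rate and the 0-600 s game_time range listed in this README; treating 600 s as a hard upper bound on episode length is an assumption, and both helper names are hypothetical.

```python
SIM_FPS = 30  # simulation rate from the architecture diagram


def decisions_per_second(ticks_per_step, fps=SIM_FPS):
    """How many env.step() decisions correspond to one second of game time."""
    return fps / ticks_per_step


def max_steps_per_episode(ticks_per_step, max_game_seconds=600, fps=SIM_FPS):
    """Upper bound on env.step() calls, assuming game_time tops out at 600 s."""
    return (max_game_seconds * fps) // ticks_per_step


for ticks in (1, 6, 15):
    print(ticks, decisions_per_second(ticks), max_steps_per_episode(ticks))
# 1 30.0 18000
# 6 5.0 3000
# 15 2.0 1200
```

Larger values shorten episodes in steps, which speeds up training, at the cost of coarser control over when a card is played.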

Deterministic Seeding

Pass a seed to env.reset() for reproducible episodes:

obs, info = env.reset(seed=42)  # Same seed -> same deck shuffle
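
One way to check reproducibility is to reset twice with the same seed and compare the first observations. The sketch below does that for any env with a Gymnasium-style reset(). ShuffleStubEnv is a stand-in that only mimics a seeded deck shuffle so the example runs without the Java server; it is not CRForge code, and the helper names are hypothetical.

```python
import random


def to_plain(value):
    """Convert arrays (anything with .tolist()) to plain lists for comparison."""
    return value.tolist() if hasattr(value, "tolist") else value


def first_obs(env, seed):
    """Reset with the given seed and return the first observation as plain data."""
    obs, _info = env.reset(seed=seed)
    return {key: to_plain(val) for key, val in obs.items()}


def is_reproducible(env, seed=42):
    """True if two resets with the same seed give identical first observations."""
    return first_obs(env, seed) == first_obs(env, seed)


class ShuffleStubEnv:
    """Stand-in env: shuffles 8 made-up card costs and deals the first 4."""

    def reset(self, seed=None):
        rng = random.Random(seed)
        costs = [1, 2, 3, 3, 4, 4, 5, 6]
        rng.shuffle(costs)
        return {"hand_costs": [c / 10 for c in costs[:4]]}, {}


print(is_reproducible(ShuffleStubEnv(), seed=42))  # True
```

Against the real environment, pass the CRForgeEnv instance instead of the stub. A full determinism check would also replay a fixed action sequence and compare every step, since this sketch covers only the initial state.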

FlattenedObsWrapper

For SB3's MlpPolicy, wrap the env to flatten the Dict observation into a single vector:

from crforge_gym.wrappers import FlattenedObsWrapper
env = FlattenedObsWrapper(CRForgeEnv(...))
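
The length of the flattened vector can be estimated from the Observation Space table. The sketch below assumes the wrapper concatenates every key and adds nothing else; its actual key ordering and layout are not documented here, and flattened_size is a hypothetical helper.

```python
from math import prod

# Shapes copied from the Observation Space table.
OBS_SHAPES = {
    "frame": (1,),
    "game_time": (1,),
    "is_overtime": (1,),
    "elixir": (2,),
    "crowns": (2,),
    "hand_costs": (4,),
    "hand_types": (4,),
    "next_card_cost": (1,),
    "next_card_type": (1,),
    "towers": (6, 4),
    "entities": (64, 7),
    "num_entities": (1,),
}


def flattened_size(shapes):
    """Total number of scalars across all observation keys."""
    return sum(prod(shape) for shape in shapes.values())


print(flattened_size(OBS_SHAPES))  # 490
```

Of those 490 values, 448 come from the entities array, so most of the MLP's input is entity slots, many of which may be padding early in a game.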

Integration Tests

The integration tests require the Java bridge server to be running:

# Start server in one terminal
./gradlew :gym-bridge:run

# Run tests in another
CRFORGE_INTEGRATION=1 pytest python/tests/ -v

Architecture

Java bridge (gym-bridge/): ZMQ PAIR server, wraps GameEngine in a step/reset API
Python bridge (crforge_gym/bridge.py): ZMQ PAIR client, JSON protocol
Gymnasium env (crforge_gym/env.py): Wraps bridge client in standard Gym interface
Wrappers (crforge_gym/wrappers.py): FlattenedObsWrapper for SB3 compatibility
