
CodeRepairRL - Reinforcement Learning for Program Repair

Overview

CodeRepairRL leverages recent advances in applying Reinforcement Learning (RL) to Large Language Models (LLMs) to fine-tune them for domain-specific tasks. Our ultimate goal is to develop models similar to RepairLLama and Llama-3-SWE-RL, which "punch above their weight class" for their parameter count, demonstrating exceptional performance on software engineering benchmarks.

The project uses a two-stage training approach:

  1. Supervised Fine-Tuning (SFT): Initial fine-tuning on high-quality code repair demonstrations
  2. Group Relative Policy Optimization (GRPO): Reinforcement learning to further improve performance on specific tasks

For more details on the project's objectives, conceptual background, and implementation specifics, see docs/PROJECT.md.

Academic Paper

The methodology and findings of this project are documented in an academic paper. The LaTeX repository for the paper is available at CodeRepairRL-Paper.

Getting Started

Building the Container

To build the Apptainer container:

# Build the training container 
apptainer build crrl.sif scripts/train_container.def

(the build process may take several minutes)
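
Once the image is built, you can quickly sanity-check it before submitting any jobs. This is a minimal sketch, assuming a Python interpreter is available on PATH inside the container (the build definition is not shown here):

# Quick check that the container runs and exposes a Python interpreter
apptainer exec crrl.sif python --version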

Reproducing on different compute setups

Using our Apptainer/SLURM setup

Before launching jobs, set CRRL_WORKDIR in your environment; otherwise, large files such as model weights are downloaded to $HOME/.cache:

# Choose your working directory (pick a location with plenty of fast storage)
export CRRL_WORKDIR="/path/to/your/crrl_workspace"

Then follow the container build and SLURM job submission steps above. This ensures that large model files and datasets are stored in a location with sufficient space rather than your home directory.
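
If you want other tooling to share the same workspace, you can also point common cache directories at CRRL_WORKDIR. The variables below are standard HuggingFace and Apptainer cache settings, not something the repository's scripts require; treat them as an optional, illustrative addition:

# Optional: route common caches into the same workspace (illustrative, not required by the scripts)
export HF_HOME="$CRRL_WORKDIR/hf_cache"                    # HuggingFace model/dataset cache
export APPTAINER_CACHEDIR="$CRRL_WORKDIR/apptainer_cache"  # Apptainer build/pull cache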

Alternative: Local reproduction with uv

If you do not have Apptainer/SLURM or want to reproduce runs locally, you can use uv. Below are self-contained bash snippets.

1) Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

2) Create the environment and install dependencies

# Install project dependencies (creates/uses a virtualenv automatically)
uv sync --extra vllm --extra flash
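
Before starting a long run, it can be worth confirming that the environment sees your GPUs. A minimal check, assuming the vllm/flash extras pulled in a CUDA-enabled PyTorch build:

# Verify that PyTorch has CUDA support and can see the GPUs
uv run python -c "import torch; print(torch.cuda.is_available(), torch.cuda.device_count())"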

3) Exact 14B GRPO reproduction (3× ≥80 GB GPUs) — run in two terminals

  • Requires 3 GPUs with at least 80 GB VRAM each (e.g., A100 80GB/H100 80GB)
  • Terminal 1 runs the vLLM server on GPU 0; Terminal 2 runs training on GPUs 1–2

Terminal 1 (vLLM server on GPU 0):

CUDA_VISIBLE_DEVICES=0 uv run trl vllm-serve-async \
  --model "Qwen/Qwen3-14B" \
  --max-model-len 14336 \
  --gpu-memory-utilization 0.94 \
  --async-scheduling \
  --enable-prefix-caching \
  --max-num-seqs 16 \
  --max-num-batched-tokens 8192 \
  --long-prefill-token-threshold 2048 \
  --disable_log_stats \
  --enable_auto_tool_choice \
  --reasoning_parser qwen3 \
  --tool_call_parser hermes
# Leave this terminal running
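
Before moving on to Terminal 2, you can optionally check that the server is accepting requests. This sketch assumes the async server exposes the usual vLLM OpenAI-compatible routes on the default port 8000; neither the port nor the route is specified above, so adjust as needed:

# Optional: confirm the server is reachable (port and endpoint are assumptions)
curl http://localhost:8000/v1/models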

Terminal 2 (trainer on GPUs 1–2):

CUDA_VISIBLE_DEVICES=1,2 uv run accelerate launch \
  --config_file scripts/deepspeed/zero2.yaml \
  --num_processes 2 \
  --module src.train_grpo -- \
        run=repo_repair \
        model=medium_qwen \
        agent.time_limit=60 \
        grpo=multi_turn_gspo \
        grpo.max_prompt_length=1024 \
        grpo.max_completion_length=12288 \
        grpo.num_train_epochs=10 \
        grpo.num_generations=8 \
        grpo.generation_batch_size=8 \
        grpo.per_device_train_batch_size=4 \
        grpo.gradient_accumulation_steps=4 \
        grpo.optim=adamw_torch \
        grpo.run_name="your-run-name"

Notes:

  • If you plan to push to the HuggingFace Hub, run huggingface-cli login first and drop the run.push_to_hub=false override.
  • You can override any config at the CLI via Hydra (e.g., change model, learning rate, batch sizes, etc.).
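
As a concrete illustration, a cheaper smoke-test configuration could reuse the trainer command from step 3 with smaller values for the generation-related overrides. The keys below are the ones already shown above; the values are illustrative, not a tuned setup:

# Illustrative smoke-test overrides (same entrypoint as step 3; values are examples only)
CUDA_VISIBLE_DEVICES=1,2 uv run accelerate launch \
  --config_file scripts/deepspeed/zero2.yaml \
  --num_processes 2 \
  --module src.train_grpo -- \
        run=repo_repair \
        model=medium_qwen \
        grpo=multi_turn_gspo \
        grpo.num_generations=4 \
        grpo.generation_batch_size=4 \
        grpo.max_completion_length=8192 \
        grpo.run_name="smoke-test"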

Running Supervised Fine-Tuning (SFT)

Before GRPO training, you can optionally run SFT to create a better starting point:

# Run SFT training job (small model)
sbatch scripts/small_sft_lora_train_job.sh

# Run SFT training job (large model)
sbatch scripts/large_sft_lora_train_job.sh

# Or run locally for testing
uv run -m src.train_sft

The SFT stage uses curated datasets of high-quality code repair examples to provide the model with a strong foundation before RL training.
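
Hydra overrides apply to the SFT entrypoint in the same way. The example below assumes the model config group is shared with the GRPO configs (not verified here), so check the repository's Hydra configs before relying on it:

# Illustrative local SFT run with an override (model config group assumed shared with GRPO)
uv run -m src.train_sft model=large_qwen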

Running GRPO Training Jobs

We provide specialized SLURM scripts for different model sizes, each pre-configured with appropriate compute resource allocations:

# For small models (8B), defaults to Qwen/Qwen3-8B
sbatch scripts/small_grpo_train_job.sh       # Full model training (5 GPUs)
sbatch scripts/small_grpo_lora_train_job.sh  # LoRA training (2 GPUs)

# For large models (32B), defaults to Qwen/Qwen3-32B
sbatch scripts/large_grpo_train_job.sh       # GRPO training (4 GPUs)

Each script is pre-configured with GRPO parameters tuned for the corresponding model size. The scripts support three task types:

  • detection: Binary vulnerability detection
  • repair: Single-file code repair with search-replace diffs
  • repo_repair: Repository-level code repair using agentic approaches

You can customize training with Hydra overrides:

# Change task type
sbatch scripts/small_grpo_train_job.sh run=detection

# Use a different model
sbatch scripts/large_grpo_train_job.sh model=large_qwen

# Override the automatic run name with a custom one
sbatch scripts/small_grpo_lora_train_job.sh grpo.run_name="custom-experiment-name"

Local Development

For "local" development and testing without Apptainer containers, you can use uv directly.

Installing uv

Install the uv package manager with:

MacOS / Linux

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows (project not tested on Windows)

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

Testing

# run all tests
uv run pytest

# run a specific test file
uv run pytest tests/test_search_replace_diff.py

# run a specific test
uv run pytest tests/test_search_replace_diff.py::test_specific_function
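
Standard pytest selection flags also pass through uv, for example keyword selection with fail-fast (these are plain pytest options, not project-specific):

# run tests matching a keyword and stop at the first failure
uv run pytest -k "search_replace" -x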

Documentation Structure

This repository uses several Markdown files to organize information:

  • README.md: (This file) Provides a high-level overview, setup instructions, and basic usage examples.
  • docs/PROJECT.md: Contains detailed information about the project's goals, implementation notes, theoretical background, and conceptual insights.
  • docs/DIARY.md: A development diary tracking progress, challenges, and decisions.
  • docs/AGENT_RL_INTEGRATION.md: Describes our approach to integrating agent frameworks into RL training loops using OpenAI-compatible API servers.
  • docs/DATASETS.md: Describes the datasets used in the project.
  • docs/RESOURCES.md: Lists relevant research papers, literature and broader resources reviewed for the project.
  • docs/VOCABULARY.md: Defines key terms and concepts used throughout the project.
  • docs/PAPER.md: Outlines the structure and key points for the academic paper.
