VeriReason Repository

This repository contains tools and configurations for training language models for the paper:

VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation

Our Hugging Face models and datasets: huggingface repo

Project Description

This study introduces VeriReason, an approach that uses reinforcement learning with testbench feedback to improve pre-trained models for Verilog RTL code generation. VeriReason-Qwen2.5-3B is a 3B-parameter model based on Qwen2.5-Coder-3B that combines supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning, tailored specifically for RTL code generation. The model integrates explicit reasoning with reinforcement learning for Verilog generation, establishing a new state of the art for automated RTL synthesis at this model size. Trained on our curated high-quality examples with a feedback-driven reward model, the 3B model delivers strong performance on Verilog generation tasks while remaining efficient.
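
To make the feedback loop concrete, the sketch below shows one way a testbench-based reward could be computed: compile a candidate module together with a golden testbench and score it by whether it elaborates and whether simulation reports any failures. This is an illustration only, assuming Icarus Verilog (iverilog/vvp) is installed; the function name, reward values, and the "FAIL" convention are hypothetical and are not the repository's actual verilog_rewards_tb.py implementation.

    # Illustrative reward sketch (not the repository's verilog_rewards_tb.py):
    # score a generated Verilog module by simulating it against a testbench.
    import os
    import subprocess
    import tempfile

    def testbench_reward(candidate_code: str, testbench_code: str, timeout: int = 10) -> float:
        """Hypothetical shaping: 1.0 if all checks pass, 0.2 if it only compiles, 0.0 otherwise."""
        with tempfile.TemporaryDirectory() as tmp:
            dut = os.path.join(tmp, "dut.v")
            tb = os.path.join(tmp, "tb.v")
            sim = os.path.join(tmp, "sim.vvp")
            with open(dut, "w") as f:
                f.write(candidate_code)
            with open(tb, "w") as f:
                f.write(testbench_code)
            try:
                # Compile DUT and testbench with Icarus Verilog.
                compiled = subprocess.run(["iverilog", "-o", sim, dut, tb],
                                          capture_output=True, text=True, timeout=timeout)
                if compiled.returncode != 0:
                    return 0.0  # syntax or elaboration error
                # Simulate; assume the testbench prints "FAIL" on any mismatch.
                ran = subprocess.run(["vvp", sim],
                                     capture_output=True, text=True, timeout=timeout)
            except subprocess.TimeoutExpired:
                return 0.0
            if ran.returncode != 0 or "FAIL" in ran.stdout:
                return 0.2
            return 1.0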

VeriReason Workflow

Training Options

Supervised Fine-Tuning (SFT)

You can use either of the following methods to train an SFT model:

Using LLaMA-Factory

llamafactory-cli train qwen2.5_7b.yaml

Using OpenR1

  1. Move sft_rtl to the folder: src/open_r1/
  2. Make the training script executable:
    chmod +x run_rtl_training.sh
  3. Run the training script:
    ./run_rtl_training.sh

GRPO Training

For GRPO (Guided Reward Proximal Optimization) training:

  1. Move the necessary files to the OpenR1 directory:

    mv verilog_rewards_tb.py verilog_train_tb.py src/open_r1/
  2. Create a new directory for the Verilog recipe:

    mkdir verilog_recipe
    mv verilog_grpo_tb.yaml verilog_recipe/
  3. Example training command:

    NCCL_DEBUG=INFO TORCH_DISTRIBUTED_DEBUG=DETAIL CUDA_VISIBLE_DEVICES=5,6,7 ACCELERATE_USE_NCCL=1 accelerate launch --config_file recipes/accelerate_configs/zero3.yaml --num_processes=3 src/open_r1/verilog_train_rtlcoder.py --config verilog_recipe/verilog_grpo_tb.yaml --use_vllm=false
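
For orientation, the sketch below shows how a custom Verilog reward function plugs into the TRL GRPOTrainer that open-r1 builds on: the trainer samples completions and calls each reward function to score them. The model id, data file, and the trivial reward are placeholders; the real scoring lives in verilog_rewards_tb.py and the real hyperparameters in verilog_grpo_tb.yaml.

    # Illustrative wiring only, assuming the TRL GRPOTrainer interface used by open-r1;
    # the data file and reward below are placeholders, not the repository's actual setup.
    from datasets import load_dataset
    from trl import GRPOConfig, GRPOTrainer

    def verilog_reward(completions, **kwargs):
        # Placeholder: the repository scores completions with testbench feedback
        # (verilog_rewards_tb.py); here every completion just gets 0.0.
        return [0.0 for _ in completions]

    dataset = load_dataset("json", data_files="rtl_train.jsonl", split="train")  # hypothetical file

    trainer = GRPOTrainer(
        model="Qwen/Qwen2.5-Coder-3B",          # base model named in the project description
        reward_funcs=verilog_reward,
        args=GRPOConfig(output_dir="verireason-grpo", per_device_train_batch_size=1),
        train_dataset=dataset,
    )
    trainer.train()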

Datasets

The following datasets are available on Hugging Face:

  • RTL-Coder_small: filtered dataset with no reasoning (Link)
  • RTL-Coder_7b_reasoning_tb_simple: VeriReason simple dataset with reasoning and testbench (Link)
  • RTL-Coder_7b_reasoning_tb: VeriReason hard dataset with reasoning and testbench (Link)
  • RTL-Coder_7b_reasoning_tb_combined: VeriReason combined dataset with reasoning and testbench (Link)
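
As a quick start, any of these datasets can be pulled with the datasets library. The repository id below is a placeholder; substitute the actual id from the corresponding link above.

    # Illustrative only: the repo id is a placeholder taken from the list above.
    from datasets import load_dataset

    ds = load_dataset("<hf-namespace>/RTL-Coder_7b_reasoning_tb", split="train")  # placeholder id
    print(ds.column_names)
    print(ds[0])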

Model checkpoints

The following fine-tuned models are available on Hugging Face:

  • VeriReason-Qwen2.5-1.5B: 1.5B parameter model based on Qwen2.5 (Link)
  • VeriReason-Qwen2.5-3B: 3B parameter model based on Qwen2.5 with RTL GRPO (Link)
  • VeriReason-Qwen2.5-7b: 7B parameter model based on Qwen2.5 with SFT reasoning (Link)
  • VeriReason-Llama-7b: 7B parameter model based on Code Llama (Link)
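
These checkpoints can be run with the standard transformers APIs. The repo id and prompt below are placeholders; substitute the actual id from the corresponding link above.

    # Illustrative only: the repo id is a placeholder taken from the list above.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "<hf-namespace>/VeriReason-Qwen2.5-3B"  # placeholder id
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    prompt = "Write a Verilog module for a 4-bit synchronous up counter with active-high reset."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=512)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))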

Requirements

  • CUDA-compatible GPUs
  • PyTorch with CUDA support
  • Accelerate library
  • NCCL for distributed training
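
A quick sanity check that the CUDA and NCCL requirements are met (illustrative, not part of the repository):

    # Illustrative check that PyTorch sees the GPUs and NCCL needed for distributed training.
    import torch
    import torch.distributed as dist

    print("CUDA available:", torch.cuda.is_available())
    print("GPU count:", torch.cuda.device_count())
    print("NCCL available:", dist.is_nccl_available())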

Citation

@misc{wang2025verireasonreinforcementlearningtestbench,
      title={VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation}, 
      author={Yiting Wang and Guoheng Sun and Wanghao Ye and Gang Qu and Ang Li},
      year={2025},
      eprint={2505.11849},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2505.11849}, 
}
