This repository contains tools and configurations for training language models for the paper:
VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation
Our Hugging Face models and datasets: huggingface repo
This study introduces VeriReason, a novel approach that uses reinforcement learning with testbench feedback to improve pre-trained models for Verilog RTL code generation. VeriReason-Qwen2.5-3B is a 3B parameter model based on Qwen2.5-Coder-3B that combines supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning, tailored specifically for RTL code generation. The model integrates explicit reasoning capabilities with reinforcement learning for Verilog generation, establishing a new state of the art for automated RTL synthesis at this model size. By combining our curated high-quality training examples with a feedback-driven reward model, the 3B parameter model delivers exceptional performance on Verilog generation tasks while remaining efficient.
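If you just want to try a released checkpoint, the sketch below shows minimal inference with the Hugging Face `transformers` library. The model repository id is a placeholder; substitute the actual repo from the model table further down.

```python
# Minimal inference sketch; assumes transformers, accelerate, and a CUDA-enabled torch install.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/VeriReason-Qwen2.5-3B"  # placeholder: use the actual Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

prompt = "Write a Verilog module for a 4-bit synchronous up-counter with active-high reset."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```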
You can use either of the following methods to train an SFT model.

Option 1: train with LLaMA-Factory:

```bash
llamafactory-cli train qwen2.5_7b.yaml
```
Option 2: train with the open-r1 pipeline:

- Move `sft_rtl` to the folder `src/open_r1/`
- Make the training script executable: `chmod +x run_rtl_training.sh`
- Run the training script: `./run_rtl_training.sh`
For GRPO (Guided Reward Proximal Optimization) training:
- Move the necessary files to the OpenR1 directory:

  ```bash
  mv verilog_rewards_tb.py verilog_train_tb.py src/open_r1/
  ```
- Create a new directory for the Verilog recipe and move the config into it:

  ```bash
  mkdir verilog_recipe
  mv verilog_grpo_tb.yaml verilog_recipe/
  ```
- Example training command:

  ```bash
  NCCL_DEBUG=INFO TORCH_DISTRIBUTED_DEBUG=DETAIL CUDA_VISIBLE_DEVICES=5,6,7 ACCELERATE_USE_NCCL=1 \
    accelerate launch --config_file recipes/accelerate_configs/zero3.yaml --num_processes=3 \
    src/open_r1/verilog_train_rtlcoder.py --config verilog_recipe/verilog_grpo_tb.yaml --use_vllm=false
  ```
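The GRPO reward comes from testbench feedback (implemented in `verilog_rewards_tb.py`). As a rough illustration only, the sketch below shows one way such a reward could be structured: compile the candidate RTL with a reference testbench and score it by the simulation outcome. The function name, the use of Icarus Verilog, and the score values are assumptions for illustration, not the repository's actual reward definition.

```python
import os
import subprocess
import tempfile

def testbench_reward(candidate_rtl: str, testbench: str) -> float:
    """Illustrative testbench-feedback reward: 0.0 if the candidate fails to
    compile, partial credit if it compiles, full credit if simulation passes.
    This is a simplified sketch, not the reward in verilog_rewards_tb.py."""
    with tempfile.TemporaryDirectory() as tmp:
        dut = os.path.join(tmp, "dut.v")
        tb = os.path.join(tmp, "tb.v")
        sim = os.path.join(tmp, "sim.out")
        with open(dut, "w") as f:
            f.write(candidate_rtl)
        with open(tb, "w") as f:
            f.write(testbench)

        # Compile the design together with its testbench using Icarus Verilog.
        compiled = subprocess.run(["iverilog", "-o", sim, dut, tb],
                                  capture_output=True, text=True, timeout=60)
        if compiled.returncode != 0:
            return 0.0

        # Run the simulation; treat any reported error as a functional failure.
        ran = subprocess.run(["vvp", sim], capture_output=True, text=True, timeout=60)
        if ran.returncode == 0 and "error" not in ran.stdout.lower():
            return 1.0
        return 0.2
```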
The following datasets are available on Hugging Face:
| Dataset | Description | Link |
|---|---|---|
| RTL-Coder_small | Filtered dataset with no reasoning | Link |
| RTL-Coder_7b_reasoning_tb_simple | VeriReason simple dataset with reasoning and testbench | Link |
| RTL-Coder_7b_reasoning_tb | VeriReason hard dataset with reasoning and testbench | Link |
| RTL-Coder_7b_reasoning_tb_combined | VeriReason combined dataset with reasoning and testbench | Link |
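To inspect one of these datasets, you can load it with the `datasets` library. The dataset id below is a placeholder; use the actual repository from the table above.

```python
from datasets import load_dataset

# Placeholder dataset id: replace with the actual Hugging Face repo from the table above.
ds = load_dataset("your-org/RTL-Coder_7b_reasoning_tb_combined", split="train")
print(ds)      # number of rows and column names
print(ds[0])   # a single example (fields depend on the dataset schema)
```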
The following fine-tuned models are available on Hugging Face:
| Model | Description | Link |
|---|---|---|
| VeriReason-Qwen2.5-1.5B | 1.5B parameter model based on Qwen2.5 | Link |
| VeriReason-Qwen2.5-3B | 3B parameter model based on Qwen2.5 with RTL GRPO | Link |
| VeriReason-Qwen2.5-7b | 7B parameter model based on Qwen2.5 with SFT Reasoning | Link |
| VeriReason-Llama-7b | 7B parameter model based on Code Llama | Link |
Hardware and software requirements:

- CUDA-compatible GPUs
- PyTorch with CUDA support
- Accelerate library
- NCCL for distributed training
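Before launching distributed training, a quick sanity check that PyTorch sees the GPUs and that NCCL is available can save a failed run (a minimal sketch, assuming a standard CUDA-enabled PyTorch install):

```python
import torch
import torch.distributed as dist

print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
print("NCCL available:", dist.is_nccl_available())
```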
@misc{wang2025verireasonreinforcementlearningtestbench,
title={VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation},
author={Yiting Wang and Guoheng Sun and Wanghao Ye and Gang Qu and Ang Li},
year={2025},
eprint={2505.11849},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2505.11849},
}