This repository contains tools and configurations for training language models for the paper:
VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation
Our Hugging Face models and datasets: huggingface repo
This study introduces VeriReason, a novel approach that uses reinforcement learning with testbench feedback to improve pre-trained models for Verilog RTL code generation. VeriReason-Qwen2.5-3B is a 3B parameter model based on Qwen2.5-Coder-3B that combines supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning, tailored specifically for RTL code generation. The model integrates explicit reasoning capabilities with reinforcement learning for Verilog generation, establishing a new state of the art for automated RTL synthesis at this model size. By combining our curated high-quality training examples with a feedback-driven reward model, the 3B parameter model delivers exceptional performance on Verilog generation tasks while remaining efficient.
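If you just want to try a released checkpoint, the sketch below shows minimal inference with the Hugging Face `transformers` library. The model repository id is a placeholder; substitute the actual repo from the model table further down.

```python
# Minimal inference sketch; assumes transformers, accelerate, and a CUDA-enabled torch install.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/VeriReason-Qwen2.5-3B"  # placeholder: use the actual Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

prompt = "Write a Verilog module for a 4-bit synchronous up-counter with active-high reset."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```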
You can use either of the following methods to train an SFT model.

Option 1: train with LLaMA-Factory:

```bash
llamafactory-cli train qwen2.5_7b.yaml
```
Option 2: train with the open-r1 pipeline:

- Move `sft_rtl` to the folder `src/open_r1/`
- Make the training script executable: `chmod +x run_rtl_training.sh`
- Run the training script: `./run_rtl_training.sh`
For GRPO (Guided Reward Proximal Optimization) training:
- Move the necessary files to the OpenR1 directory:

  ```bash
  mv verilog_rewards_tb.py verilog_train_tb.py src/open_r1/
  ```
- Create a new directory for the Verilog recipe and move the config into it:

  ```bash
  mkdir verilog_recipe
  mv verilog_grpo_tb.yaml verilog_recipe/
  ```
- Example training command:

  ```bash
  NCCL_DEBUG=INFO TORCH_DISTRIBUTED_DEBUG=DETAIL CUDA_VISIBLE_DEVICES=5,6,7 ACCELERATE_USE_NCCL=1 \
    accelerate launch --config_file recipes/accelerate_configs/zero3.yaml --num_processes=3 \
    src/open_r1/verilog_train_rtlcoder.py --config verilog_recipe/verilog_grpo_tb.yaml --use_vllm=false
  ```
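The GRPO reward comes from testbench feedback (implemented in `verilog_rewards_tb.py`). As a rough illustration only, the sketch below shows one way such a reward could be structured: compile the candidate RTL with a reference testbench and score it by the simulation outcome. The function name, the use of Icarus Verilog, and the score values are assumptions for illustration, not the repository's actual reward definition.

```python
import os
import subprocess
import tempfile

def testbench_reward(candidate_rtl: str, testbench: str) -> float:
    """Illustrative testbench-feedback reward: 0.0 if the candidate fails to
    compile, partial credit if it compiles, full credit if simulation passes.
    This is a simplified sketch, not the reward in verilog_rewards_tb.py."""
    with tempfile.TemporaryDirectory() as tmp:
        dut = os.path.join(tmp, "dut.v")
        tb = os.path.join(tmp, "tb.v")
        sim = os.path.join(tmp, "sim.out")
        with open(dut, "w") as f:
            f.write(candidate_rtl)
        with open(tb, "w") as f:
            f.write(testbench)

        # Compile the design together with its testbench using Icarus Verilog.
        compiled = subprocess.run(["iverilog", "-o", sim, dut, tb],
                                  capture_output=True, text=True, timeout=60)
        if compiled.returncode != 0:
            return 0.0

        # Run the simulation; treat any reported error as a functional failure.
        ran = subprocess.run(["vvp", sim], capture_output=True, text=True, timeout=60)
        if ran.returncode == 0 and "error" not in ran.stdout.lower():
            return 1.0
        return 0.2
```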
The following datasets are available on Hugging Face:
| Dataset | Description | Link |
|---|---|---|
| RTL-Coder_small | Filtered dataset with no reasoning | Link |
| RTL-Coder_7b_reasoning_tb_simple | VeriReason simple dataset with reasoning and testbench | Link |
| RTL-Coder_7b_reasoning_tb | VeriReason hard dataset with reasoning and testbench | Link |
| RTL-Coder_7b_reasoning_tb_combined | VeriReason combined dataset with reasoning and testbench | Link |
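To inspect one of these datasets, you can load it with the `datasets` library. The dataset id below is a placeholder; use the actual repository from the table above.

```python
from datasets import load_dataset

# Placeholder dataset id: replace with the actual Hugging Face repo from the table above.
ds = load_dataset("your-org/RTL-Coder_7b_reasoning_tb_combined", split="train")
print(ds)      # number of rows and column names
print(ds[0])   # a single example (fields depend on the dataset schema)
```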
The following fine-tuned models are available on Hugging Face:
| Model | Description | Link |
|---|---|---|
| VeriReason-Qwen2.5-1.5B | 1.5B parameter model based on Qwen2.5 | Link |
| VeriReason-Qwen2.5-3B | 3B parameter model based on Qwen2.5 with RTL GRPO | Link |
| VeriReason-Qwen2.5-7b | 7B parameter model based on Qwen2.5 with SFT Reasoning | Link |
| VeriReason-Llama-7b | 7B parameter model based on Code Llama | Link |
Hardware and software requirements:

- CUDA-compatible GPUs
- PyTorch with CUDA support
- Accelerate library
- NCCL for distributed training
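Before launching distributed training, a quick sanity check that PyTorch sees the GPUs and that NCCL is available can save a failed run (a minimal sketch, assuming a standard CUDA-enabled PyTorch install):

```python
import torch
import torch.distributed as dist

print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
print("NCCL available:", dist.is_nccl_available())
```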
@misc{wang2025verireasonreinforcementlearningtestbench,
title={VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation},
author={Yiting Wang and Guoheng Sun and Wanghao Ye and Gang Qu and Ang Li},
year={2025},
eprint={2505.11849},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2505.11849},
}