CyberAgentAILab/Consensus-GRPO

Consensus Group Relative Policy Optimization for Text Generation

Scripts for Minimum Bayes Risk (MBR) decoding and Consensus GRPO (C-GRPO) experiments.

The experiments were conducted on NVIDIA A100 GPUs with 80 GB of VRAM.

Structure

  • src/: main training/evaluation code
  • scripts/: runnable shell scripts
  • data/: data prep and utilities
  • dataset/: dataset directory (after download/creation)

Usage

  • Python 3.12
  • cuda:12.6.1-devel-ubuntu22.04
  1. Install dependencies
     bash scripts/setup.sh
  2. Run
     bash scripts/run_c_grpo.sh

To run MBR decoding instead, use scripts/run_mbr.sh. Arguments can be edited at the top of each script.
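For readers unfamiliar with MBR decoding, here is a minimal sketch of the general idea: sample several candidates, then pick the one with the highest average utility against the others. It is an illustration only, not the code in src/. The function names `unigram_f1` and `mbr_decode` and the toy unigram-overlap utility are assumptions made for this example; real experiments typically use a stronger utility metric.

```python
from collections import Counter
from typing import Callable, Sequence


def unigram_f1(hypothesis: str, reference: str) -> float:
    """Toy utility (assumption): unigram F1 between whitespace-tokenized strings."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(hyp) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(hyp), overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def mbr_decode(
    candidates: Sequence[str],
    utility: Callable[[str, str], float] = unigram_f1,
) -> str:
    """Return the candidate with the highest average utility against the others."""
    if not candidates:
        raise ValueError("candidates must be non-empty")
    if len(candidates) == 1:
        return candidates[0]

    def expected_utility(i: int) -> float:
        # The other samples act as pseudo-references for candidate i.
        others = [c for j, c in enumerate(candidates) if j != i]
        return sum(utility(candidates[i], o) for o in others) / len(others)

    best = max(range(len(candidates)), key=expected_utility)
    return candidates[best]


if __name__ == "__main__":
    samples = ["the cat sat", "the cat sat down", "the cat slept", "dogs bark"]
    print(mbr_decode(samples))  # prints: the cat sat
```

The selected output is the sample that agrees most with the rest of the pool, so an outlier such as "dogs bark" is never chosen unless every sample disagrees equally.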

[Figure: C-GRPO]
