franzscherr/minimal-grpo-game24

Minimal GRPO for the game of 24

This is a minimal implementation of GRPO (Group Relative Policy Optimization) for training LLMs to solve the game of 24, intended as an educational example.

The game of 24 requires the LLM to combine four numbers (repeats allowed), e.g. 2, 2, 7, 12, with arithmetic operations to reach 24, using each number exactly once. Here, one solution is 2 * 7 - 2 + 12.
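The rule above can be checked mechanically, which is what makes the game a convenient training target. The following is a minimal sketch of such a check, not the code from this repository: the function name `is_valid_solution`, the restriction to `+`, `-`, `*`, `/` and parentheses, and the use of exact fractions are all assumptions made for illustration.

```python
import ast
from collections import Counter
from fractions import Fraction

# Supported binary operators (an assumption: only +, -, *, / are allowed).
_OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
}


def _evaluate(node, used):
    """Evaluate an expression tree exactly, recording every integer literal."""
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    raise ValueError("unsupported expression")


def is_valid_solution(expression, numbers, target=24):
    """Return True if `expression` uses each of `numbers` exactly once
    and evaluates to `target`."""
    try:
        tree = ast.parse(expression, mode="eval")
        used = []
        value = _evaluate(tree.body, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    # Compare as multisets so repeated numbers such as 2, 2 are handled.
    return Counter(used) == Counter(numbers) and value == target


print(is_valid_solution("2 * 7 - 2 + 12", [2, 2, 7, 12]))
print(is_valid_solution("2 * 12", [2, 2, 7, 12]))
```

The first call checks the example solution from the text. The second expression also evaluates to 24 but leaves two of the numbers unused, so it is rejected. Parsing with `ast` rather than calling `eval` keeps arbitrary model output from being executed, and `Fraction` avoids floating-point error in solutions that pass through non-integer intermediate values.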
