This is a minimal implementation of training LLMs to solve the game of 24, intended as an educational example.
In the game of 24, the LLM must combine four given numbers (e.g. 2, 2, 7, 12) with arithmetic operations (typically +, -, *, /) so that the result is 24, using each number exactly once.
For these numbers, one solution is 2 * 7 - 2 + 12 = 24.
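To make these rules concrete, a training setup needs a way to decide whether an expression the model proposes is a valid solution. In a reinforcement-learning setup, such a check could serve as the reward signal. The sketch below is an illustration under stated assumptions and is not this project's actual code. The names `check_24` and `_evaluate` are hypothetical. It assumes the usual game-of-24 rules: only +, -, *, / and parentheses are allowed, and unary minus, exponentiation, and digit concatenation are not.

```python
import ast
from collections import Counter
from fractions import Fraction

# Assumed operator set for this sketch: the usual game-of-24 operations.
_OPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
}


def _evaluate(node, used):
    """Recursively evaluate a parsed expression with exact Fraction arithmetic,
    appending every integer literal encountered to `used`."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, used)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    # Accept plain integer literals only (rejects floats and booleans).
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return Fraction(node.value)
    # Anything else (unary minus, function calls, names, **, ...) is rejected.
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def check_24(expression, numbers, target=24):
    """Return True if `expression` uses each of `numbers` exactly once
    and evaluates exactly to `target`; otherwise return False."""
    try:
        tree = ast.parse(expression, mode="eval")
        used = []
        value = _evaluate(tree, used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return False
    # Multiset comparison: duplicates such as the two 2s must both be used.
    return Counter(used) == Counter(numbers) and value == target
```

For example, `check_24("2 * 7 - 2 + 12", [2, 2, 7, 12])` returns `True`, and `check_24("2 * 12", [2, 2, 7, 12])` returns `False` because 7 and one 2 are unused. The sketch uses exact `Fraction` arithmetic so that solutions involving division, such as `8 / (3 - 8 / 3)` for the numbers 3, 3, 8, 8, compare equal to 24. With floating-point arithmetic, rounding could cause them to miss 24. Parsing with `ast` instead of calling `eval` means the model's output is never executed as arbitrary Python.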