PRIME-RL
Researching scalable reinforcement learning (RL) methods for language models.
Repositories
- RL-Compositionality Public
FROM $f(x)$ AND $g(x)$ TO $f(g(x))$: LLMs Learn New Skills in RL by Composing Old Ones
- Entropy-Mechanism-of-RL Public
The Entropy Mechanism of Reinforcement Learning for Large Language Model Reasoning.