labels: experimental, llm, agentic, post-training
Train a lightweight model to predict whether the next token will be a reflection token, as a mechanism for distilling reflection behavior into an isolated component that can be used in conjunction with any model.
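A minimal sketch of what such a predictor could look like, assuming each decoding position is summarized by a fixed-size feature vector (for example, an embedding of the context so far) and labeled 1 when the next token in a reference trace begins a reflection. Everything here is hypothetical: the `ReflectionProbe` class, the `make_toy_data` helper, and the synthetic features are for illustration only, and plain logistic regression stands in for whatever lightweight model is eventually chosen.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class ReflectionProbe:
    """Hypothetical lightweight classifier: P(next token starts a reflection)."""

    def __init__(self, dim: int) -> None:
        self.w = [0.0] * dim
        self.b = 0.0

    def predict_proba(self, x: list[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def fit(self, xs, ys, lr: float = 0.1, epochs: int = 50) -> None:
        # Plain stochastic gradient descent on the log loss.
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                err = self.predict_proba(x) - y
                self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= lr * err

    def should_reflect(self, x: list[float], threshold: float = 0.5) -> bool:
        # The host model would insert a reflection when this returns True.
        return self.predict_proba(x) >= threshold


def make_toy_data(n: int, dim: int, seed: int = 0):
    """Synthetic stand-in for (context features, reflection label) pairs.

    Assumption: real features would come from traces of a model that
    already exhibits the reflection behavior.
    """
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        y = rng.random() < 0.5
        # Only the first coordinate carries signal in this toy setup.
        x = [
            rng.gauss((2.0 if y else -2.0) if i == 0 else 0.0, 0.5)
            for i in range(dim)
        ]
        xs.append(x)
        ys.append(1.0 if y else 0.0)
    return xs, ys


xs, ys = make_toy_data(n=200, dim=4)
probe = ReflectionProbe(dim=4)
probe.fit(xs, ys)
accuracy = sum(
    probe.should_reflect(x) == bool(y) for x, y in zip(xs, ys)
) / len(xs)
```

Because the probe only consumes a feature vector and returns a boolean, it stays decoupled from the generating model, provided the features themselves can be computed for that model. How to obtain features that transfer across models is left open here.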